Coding Dojo Report: Writing a version control system

February 2011 • Daniel Lucraft

It’s always difficult to come up with an original idea for a fun Coding Dojo. At Songkick our dojos tend to follow a simple pattern: here’s a new language you have never used; try to get your head around it in 2 hours, well enough to solve a small problem.

And “new language dojos” are fun. Previously I’ve run Clojure and Factor dojos, and others have run Scala and Haskell dojos. But although they’re really good for getting better at quickly grasping something new, they’re not so good at deepening your coding skills.

So for last week’s dojo I wanted to do two things. First, go back to Ruby. We’re still pure Ruby at Songkick, so let’s stick with that and improve our software skills rather than grasping at something new.

Second, come up with something a lot more interesting than the usual Fibonacci or bowling-style problem. We are using Ruby! Four developers should be able to accomplish much more than that in 2 hours.

The Premise

Here’s what I came up with: the Git object system is very simple. Let’s write enough of a version control system that we can replicate that object model. We’ll learn about Git and practice developing Ruby object models at the same time.
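To show how simple that core is, here is a small sketch of how Git itself names and stores a loose object. Git prefixes the content with a header of the form `<type> <bytesize>\0`, takes the SHA1 of header plus content as the object's name, and stores the zlib-deflated bytes under `.git/objects/` using the first two hex characters as a subdirectory. The helper names (`git_object`, `git_sha1`, `loose_object_path`) are our own, and RVC's format may differ from Git's in the details.

```ruby
require "digest/sha1"
require "zlib"

# Git's loose-object encoding: header "<type> <bytesize>\0" followed by the raw content.
def git_object(type, content)
  "#{type} #{content.bytesize}\0#{content}"
end

# The object's name is the SHA1 of header + content (what `git hash-object` prints).
def git_sha1(type, content)
  Digest::SHA1.hexdigest(git_object(type, content))
end

# Loose objects live at .git/objects/<first 2 hex chars>/<remaining 38 chars>.
def loose_object_path(sha)
  File.join(".git", "objects", sha[0, 2], sha[2..])
end

sha = git_sha1("blob", "")
sha                    # => "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391" (the well-known empty blob)
loose_object_path(sha) # => ".git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391"

# On disk the file holds the zlib-deflated header + content.
compressed = Zlib::Deflate.deflate(git_object("blob", ""))
Zlib::Inflate.inflate(compressed) == git_object("blob", "") # => true
```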

Now, Git layers enough optimizations and enhancements over that simple object model that 2 hours is probably not enough to write something that reads a real .git repository directory.

So I quickly wrote a simple Ruby version control library that has just the core object model and can only commit, checkout, log and show. The repository generated by that program is the starting point for our problem.

Here’s the backstory.

The Dojo Backstory

Our dojo master has written a git-like version control system in Ruby, called Ruby Version Control (RVC).

As a good developer, he committed the code as he went along.

AND he used RVC to track the development of itself, once that was possible. He thought to himself, this gets me extra points for being SELF-HOSTING. (A magic incantation that instantly makes your project many times cooler.)

But our developer was soon humbled. You see, he accidentally deleted all the RVC code files from the working directory of his RVC project.

No problem, he said to himself, for I have been diligently committing as I went along, so I can retrieve the code from the RVC … repository …

Wait.

If the code is stored in the RVC repository, and the only thing that is able to retrieve code from an RVC repository is RVC itself, and I have just deleted all the files that make up RVC from the disk… how do I read the RVC repository and get the files back???

At this point readers might find a diagram useful. Before the mishap, the RVC project our developer was writing looked like this:

rvc/
  .rvc/
    - lots of binary object files...
  bin/
    rvc
  lib/
    rvc/
    rvc.rb
  spec/
    spec_helper.rb
    rvc_spec.rb

So all the versions of the code were being stored in .rvc, and the current version of the code lives in the working directory. The bin/rvc binary is used to read from and commit to the .rvc repository.

After the mishap, all that is left is the .rvc repository:

rvc/
  .rvc/
    - lots of binary object files...

But because the rvc binary and library have been deleted, there is nothing that can read these files, so the situation appears hopeless…

Fortunately, there is a team of experienced developers at hand, who have been looking for an interesting problem to take on! Perhaps they can help?

Running the Dojo

So, in practice, the dojo participants were given a .rvc directory and the challenge of reading the version info from it and restoring the project.

A little bit of RVC/Git knowledge is required. It is enough to know these things:

RVC has three types of object, the Commit, the Tree (representing a Directory), and the Blob (representing a File).
A Commit contains a SHA1 pointer to the Tree (directory) that is checked in, a message, a username, a timestamp, and a SHA1 pointer to the parent commit.
A Tree contains a list of Trees and Blobs that are inside it.
A Blob contains file data.
The HEAD is the most recent Commit.
Objects are compressed with zlib.
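A minimal sketch of that object model in Ruby, assuming objects are held in a plain Hash keyed by SHA1 and hashed from a Marshal dump. RVC's real serialization isn't described here, so the struct fields and the `store` and `log` helpers are hypothetical. It shows how following parent pointers back from HEAD yields the commit log.

```ruby
require "digest/sha1"

# Hypothetical RVC object types; the field names are illustrative, not RVC's own.
Blob   = Struct.new(:data)                                  # file contents
Tree   = Struct.new(:entries)                               # { name => SHA1 of a Tree or Blob }
Commit = Struct.new(:tree, :message, :user, :time, :parent) # parent is nil for the first commit

# Assumption: objects live in a Hash keyed by the SHA1 of their Marshal dump.
def store(objects, obj)
  sha = Digest::SHA1.hexdigest(Marshal.dump(obj))
  objects[sha] = obj
  sha
end

# Follow parent pointers from HEAD back to the first commit, newest first.
def log(objects, head)
  commits = []
  sha = head
  while sha
    commit = objects.fetch(sha)
    commits << commit
    sha = commit.parent
  end
  commits
end

objects = {}
blob  = store(objects, Blob.new("puts 'hi'\n"))
tree  = store(objects, Tree.new({ "rvc.rb" => blob }))
first = store(objects, Commit.new(tree, "Initial commit", "dan", Time.at(0), nil))
head  = store(objects, Commit.new(tree, "Self-hosting!", "dan", Time.at(60), first))
log(objects, head).map(&:message) # => ["Self-hosting!", "Initial commit"]
```

Because every reference is a SHA1 rather than a Ruby object reference, the same walk works unchanged once objects come from files instead of a Hash.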

Results

I was a bit unsure how well this would go down with the Songkickers. But everyone seemed to have fun and by the end they had recreated enough of RVC to read the repository, inspect the logs and restore the working directory. Saved! ;)
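Restoring the working directory comes down to a recursive walk from the HEAD commit's Tree: write each Blob out as a file and recurse into each sub-Tree as a directory. This is a minimal sketch, again assuming a hypothetical in-memory Hash of objects. The keys `"c1"`, `"t1"` and so on are placeholders standing in for real SHA1s, and the `restore` helper is illustrative, not RVC's actual API.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical object types: a Tree maps entry names to the SHA1s of Trees or Blobs.
Blob   = Struct.new(:data)
Tree   = Struct.new(:entries)
Commit = Struct.new(:tree, :message, :user, :time, :parent)

# Recreate the directory described by tree_sha underneath dir.
def restore(objects, tree_sha, dir)
  FileUtils.mkdir_p(dir)
  objects.fetch(tree_sha).entries.each do |name, sha|
    path = File.join(dir, name)
    case (obj = objects.fetch(sha))
    when Tree then restore(objects, sha, path)   # sub-directory: recurse
    when Blob then File.binwrite(path, obj.data) # file: write its contents
    end
  end
end

# Placeholder keys stand in for SHA1s: HEAD -> c1 -> t1 { lib/ -> t2 { rvc.rb -> b1 } }
objects = {
  "b1" => Blob.new("module RVC; end\n"),
  "t2" => Tree.new({ "rvc.rb" => "b1" }),
  "t1" => Tree.new({ "lib" => "t2" }),
  "c1" => Commit.new("t1", "Add lib", "dan", Time.at(0), nil),
}
head = "c1"

Dir.mktmpdir do |root|
  restore(objects, objects.fetch(head).tree, root)
  puts File.read(File.join(root, "lib", "rvc.rb")) # prints module RVC; end
end
```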

Along the way, we learned some useful things:

We learned more viscerally how Git’s object model works. RVC is similar enough to git that if you have worked through the RVC problem you now know quite a bit about how git stores things internally.
We learned about the Zlib standard library, which is very useful when you need it.
We had a debate about how to represent the object system of RVC, and about the merits of an ObjectFactory (the team couldn’t bring themselves to name a Ruby class somethingFactory, so they called it a StorageAdapter. Same difference.)
We started off not writing tests, lost track of what we were doing, and then recommitted to TDD halfway through.
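The Zlib side of the problem is small. Below is a hedged sketch of a storage class named after the team's StorageAdapter. Its methods, the Marshal serialization and the flat one-file-per-object layout are all assumptions for illustration, not the dojo solution's code or RVC's real on-disk format. Each object is serialized, named by the SHA1 of the uncompressed bytes (as Git does), and written zlib-deflated.

```ruby
require "digest/sha1"
require "fileutils"
require "tmpdir"
require "zlib"

# Hypothetical adapter: one zlib-compressed file per object, named by its SHA1.
class StorageAdapter
  def initialize(dir)
    @dir = dir
    FileUtils.mkdir_p(@dir)
  end

  # Serialize, hash the uncompressed bytes, then store them deflated.
  def write(obj)
    raw = Marshal.dump(obj)
    sha = Digest::SHA1.hexdigest(raw)
    File.binwrite(path(sha), Zlib::Deflate.deflate(raw))
    sha
  end

  # Inflate the stored bytes and rebuild the Ruby object.
  def read(sha)
    Marshal.load(Zlib::Inflate.inflate(File.binread(path(sha))))
  end

  private

  def path(sha)
    File.join(@dir, sha)
  end
end

Dir.mktmpdir do |dir|
  storage = StorageAdapter.new(File.join(dir, ".rvc"))
  sha = storage.write({ "message" => "Initial commit" })
  storage.read(sha)["message"] # => "Initial commit"
end
```

Marshal keeps the sketch short, but it is Ruby-specific and unsafe to load from untrusted files, so a real tool would want an explicit serialization format.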

So I’m counting this Dojo as a success!

Playing along at home

If you would like to play along at home, you can check out the project on GitHub. There are several branches:

master: Contains just the .rvc directory. Your task is to extract the information in this repository.

The other branches contain solutions, so don’t peek if you want to play!

dojo-solution: This is the solution the Songkick devs came up with over the course of 2 hours.
original-rvc: This contains the original code for RVC that I wrote to be able to create the .rvc repo in the first place!