oskarth.com

Grok LOC?

This is the seventh post in my series on Grokking xv6. In this post we will look at how I tested one of my original hypothesis starting out: understanding every single line of code.

The last two months I have been going through xv6. Here’s a quote on one of my hypotheses from my initial post:

  1. With dedicated study, I’ll be able to grok xv6 in about a month.

To make the concept of grokking more concrete:

  • understand every single line in the source code
  • teach it or parts of it to someone else
  • rewrite or extend parts of the source code

In this post I want to talk specifically about understanding every single line of code. Before we go into that there’s two observations I want to make:

  1. Time frame: I originally set out to go through xv6 in about a month. In reality it took close to two months, and not counting vacation around six weeks. This is still “about” a month but was a bit longer than expected and desired.

  2. Every single line of code: I realized after a while that this was a bit over-ambitious so I mentally adjusted the bar to understanding 95% of the code.

Methodology

With that said, how do we test a hypothesis like this? And what does it mean to understand a line of code? I decided to use a sampling method and classify lines of code into ones I either understood or didn’t. This test was somewhat subjective - essentially I asked myself if I understood what a line’s purpose was, and if I understood all the components of the line. For example, it was fine to check how related functions were defined and to look up basic documentation, but anything that required googling was a ‘No’. The idea is that if I wrote a piece of code myself from scratch this is about what I would expect to understand.

Since the xv6 code booklet was numbered with lines of code from about 1 to 10000, I simply generated a bunch of random numbers from 1 to 10000:

(repeatedly 50 #(rand-int 10000)) ; Clojurescript

This ensured uniform sampling. Next I removed all the lines that matched one of the following: empty lines, comments, #include statements, and lines of code with very few characters in them, such as { or }. The goal of this was to get real lines of code only, and not ones that have a very generic explanation.

I repeated the above process until I had 20 lines of real code. 20 was chosen as being a good balance between getting enough of a sample for my purposes, and not taking too long to go through.

Since I would only be checking a sample as opposed to the whole population (i.e. all ~ 10 000 lines of code), I would only get an estimate for my understanding of the code.

I decided that I could consider my hypothesis false if the observed value was more than two standard deviations away (outside a 95% confidence interval) from the expected value. Assuming the hypothesis that I would understand 95% of the code and that I would look at 20 lines of code, I used a little utility I wrote, rrange, to see what the range would be.

~$ rrange 0.95 20    # unix util
Around 19 ~ [17, 20]

This means that if I have a 95% understanding of the whole code base and I look at 20 random lines of code, I should expect to understand 17 to 20 of them.

When making statistical claims such as this, one has to be careful about not confusing the sample and population. For example, if I have a real understanding of 70% of the code, it wouldn’t be that unlikely to get 18 out of 20 ‘Yes’s’.

~$ rrange 0.7 20
Around 14 ~ [10, 18]

Another potential source of error is that I performed the test on myself, and I have a vested interest in getting a good outcome. A more objective test would be desirable.

Results

Here’s a list of the 20 samples, together with a note on its context and a brief comment on my understanding of the line and its purpose. Sometimes, in the case of a return statement, the test is whether I understand why that thing is being returned.

You can find the source code here.

Conclusion and further work

My understanding of the sample code was within two standard deviations of the estimated value, so I failed to reject the hypothesis that I understand 95% of the xv6 source code. Furthermore, it seems unlikely that I understand less than 70% of the source code.

Initially I was skeptical about my ability to test this hypothesis, but I’m pretty happy with the method used in this article. A few weeks ago I did a trial run, and I found that I got more ‘No’s’ on samples related to the filesystem, which I had yet to study by then. A similar number of samples related to the file system were present this time around, and I got more ‘Yes’s’ on those samples, which seems to reflect my deepened understanding of that part of the code base. This suggests that the test for understanding that I’m using is not completely unreasonable.

The test still leaves a lot to desire though, primarily because of two reasons: (a) it lacks objectivity, and (b) it doesn’t touch on the essence of programming. The essence of programming is to program, as opposed to reading other people’s programs. A different direction that I think would be interesting to pursue is to re-create an OS or part of it from scratch. However, I think this test, along with the related homework assignments and the other posts in this series, are good enough for my present purposes.

We are almost coming to an end to the series. The next step will be to use my knowledge and do something new with it.

(If you liked this, you might enjoy Writing a Lisp Interpreter.)

Subscribe to stay up to date