Monday, 22 March 2010

An Analytical Anniversary

Today is my anniversary.  I have been at Symplectic Ltd for one of your Earth "years".  And a very busy one it has been, what with writing repository integration tools for our research management system to deposit content into DSpace, EPrints and Fedora, plus supporting the integration into a number of other platforms.  I thought it would be fun to do a bit of a breakdown of the code that I've written from scratch in the last 12 months (which I'm counting as 233 working days).  I'm going to do an analysis of the following areas of productivity:

  • lines of code
  • lines of inline code commentary
  • number of A4 pages of documentation (end user, administrator and technical)
  • number of version control commits

Let's start from the bottom and work upwards.

Number of version control commits

Total: 700

Per day: 3

I tend to commit units of work, so this might suggest that I do 3 bits of functionality every day.  In reality I quite often also commit quick bug fixes (so that I can record in the commit log the fix details), or at the end of a day/week, when I want to know that my code is safe from hardware theft, nuclear disaster, etc.

Number of A4 pages of documentation

Total: 72

Per day: 0.31

Not everyone writes their documentation in A4 form any more, and it's true that some of my dox take the form of web pages, but as a commercial software house we tend to produce nicely formatted end-user and administrator documentation.  In addition, at a geek level I rather enjoy a nice printable document that's well laid out, so I do my technical dox that way too.

The amount of documentation is relatively small, but it doesn't take into account a lot of informal documentation.  More importantly, though, at the back end of the first version of our Repository Tools software, the documentation is still in development.  I expect the number of pages to probably triple or quadruple over the next few weeks.

Lines of Code and Lines of Commentary

I wrote a script which analysed my outputs.  Ironically, it's written in Python, which isn't one of the languages that I use professionally, so it's not included in this analysis (and none of my personal programming projects are therefore included).  This analysis covers all of my final code on my anniversary (23rd March), and does not take into account prototyping or refactoring of any kind.  Note also that blank lines are not counted.
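For a flavour of what such a script does, here is a minimal sketch of a line classifier for Java-style source.  It is not the actual script: the names `Counts` and `count_lines` are made up for illustration, and it assumes a line is a comment only if it starts with `//` or sits inside a `/* ... */` block (a trailing comment on a code line counts as code).

```python
from dataclasses import dataclass


@dataclass
class Counts:
    code: int = 0
    comments: int = 0
    blank: int = 0


def count_lines(text: str) -> Counts:
    """Classify each line of Java-style source as code, comment or blank."""
    counts = Counts()
    in_block = False  # True while inside a /* ... */ block comment
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            counts.blank += 1
        elif in_block:
            counts.comments += 1
            if "*/" in line:
                in_block = False
        elif line.startswith("//"):
            counts.comments += 1
        elif line.startswith("/*"):
            counts.comments += 1
            # Stay in block mode unless the comment closes on the same line.
            in_block = "*/" not in line[2:]
        else:
            counts.code += 1
    return counts


sample = """\
/**
 * Deposit an item.
 */
public void deposit() {

    // send it
    send();
}
"""

print(count_lines(sample))
```

Blank lines are tallied separately here so that they can be left out of the code and comment figures, as in the counts below.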

Line Counts:

XML (107 Files) :: Lines of Code: 17819; Lines of Inline Comments: 420

XML isn't really programming, but it was interesting to see how much I actually work with it.  This figure is not used in any of the statistics below.  Some of these files are large metadata documents and some are configuration (Maven build files, Ant build files, web server config, etc).

XSLT (36 Files) :: Lines of Code: 8502; Lines of Inline Comments: 2762
JAVA (181 Files) :: Lines of Code: 22350; Lines of Inline Comments: 7565
JSP (16 Files) :: Lines of Code: 2847; Lines of Inline Comments: 1
PERL (58 Files) :: Lines of Code: 6506; Lines of Inline Comments: 1699
TOTAL (291 Files) :: Lines of Code: 40205; Lines of Inline Comments: 12027

I remember once being told that 30k lines of code a year was pretty reasonable for a developer.  I feel quite chuffed!

Lines of code/comments per day:

XSLT :: Lines of Code: 36; Lines of Inline Comments: 12
JAVA :: Lines of Code: 96; Lines of Inline Comments: 32
JSP :: Lines of Code: 12; Lines of Inline Comments: 0
PERL :: Lines of Code: 28; Lines of Inline Comments: 7
TOTAL :: Lines of Code: 173; Lines of Inline Comments: 52

It looks much less impressive when you look at it on a daily basis.  We just have to remember that this is 173 wonderful lines of code every day!
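The daily figures are just the totals divided by the 233 working days and rounded to the nearest whole line.  A small sketch of that arithmetic, using the totals from the table above (the names `WORKING_DAYS`, `totals` and `per_day` are only for illustration):

```python
WORKING_DAYS = 233

# (lines of code, lines of inline comments) per language, from the table above
totals = {
    "XSLT": (8502, 2762),
    "JAVA": (22350, 7565),
    "JSP": (2847, 1),
    "PERL": (6506, 1699),
}


def per_day(count: int, days: int = WORKING_DAYS) -> int:
    """Average lines per working day, rounded to the nearest whole line."""
    return round(count / days)


for lang, (code, comments) in totals.items():
    print(f"{lang} :: code/day: {per_day(code)}; comments/day: {per_day(comments)}")

total_code = sum(code for code, _ in totals.values())
total_comments = sum(comments for _, comments in totals.values())
print(f"TOTAL :: code/day: {per_day(total_code)}; comments/day: {per_day(total_comments)}")
```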

Comment to code ratio (comments/code):

XSLT :: 0.32
JAVA :: 0.34
JSP :: 0
PERL :: 0.26
TOTAL :: 0.30

It was interesting to see that my commenting ratio is fairly stable at about 0.3 lines of comment for every line of code.  I didn't plan that or anything.  This includes block comments for classes and methods, and inline programmer documentation.  The reason for the shortfall in Perl is suggested below.  Notice that I wrote virtually no comments in the JSPs (a single line), because I only use this code for testing, so it is less carefully curated.
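Note that the ratio here is comment lines divided by code lines, which is not the same thing as the share of all counted lines that are comments.  A quick sketch of both calculations, using the totals from the line count table (the function names `comment_ratio` and `comment_share` are made up for this example):

```python
def comment_ratio(code: int, comments: int) -> float:
    """Comment lines per line of code (the ratio used in this post)."""
    return round(comments / code, 2)


def comment_share(code: int, comments: int) -> float:
    """Comment lines as a fraction of all non-blank lines."""
    return round(comments / (code + comments), 2)


# Totals across XSLT, Java, JSP and Perl from the line count table
total_code, total_comments = 40205, 12027

print("comments/code:", comment_ratio(total_code, total_comments))
print("comments/all lines:", comment_share(total_code, total_comments))
```

The second figure is always the smaller of the two, so it is worth saying which one you mean when comparing notes with other people.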

Some Perl comments don't start with a comment marker on each line: they are block comments (POD) that start with a line like =xxx and end with =cut, which makes them difficult to pick out in a simple line-by-line analysis.  As a result the Perl figures overestimate the lines of code and underestimate the lines of comments.  More likely figures, keeping the same combined total of 8205 lines and assuming a 0.33 comment to code ratio, are:

PERL (58 Files) :: Lines of Code: 6169; Lines of Inline Comments: 2036
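Picking out those =xxx ... =cut blocks is possible with a little state tracking.  The following is a rough sketch of how it could be done, not what the analysis script actually does: `count_perl_lines` is a hypothetical helper, and it assumes a block starts at any line beginning with `=` followed by a letter and runs until a line beginning with `=cut`.

```python
import re

# A block comment opens with a line such as "=pod" or "=head1 NAME"
POD_START = re.compile(r"^=[a-zA-Z]")


def count_perl_lines(text: str) -> tuple[int, int]:
    """Return (code lines, comment lines) for Perl source, skipping blanks."""
    code = comments = 0
    in_pod = False  # True while inside a =xxx ... =cut block
    for raw in text.splitlines():
        if not raw.strip():
            continue  # blank lines are not counted
        if in_pod:
            comments += 1
            if raw.startswith("=cut"):
                in_pod = False
        elif POD_START.match(raw):
            comments += 1
            in_pod = not raw.startswith("=cut")
        elif raw.lstrip().startswith("#"):
            # Note: a "#!" shebang line would also land here.
            comments += 1
        else:
            code += 1
    return code, comments


sample = """\
=head1 NAME

deposit - send an item

=cut
# do the work
my $x = 1;
print $x;
"""

print(count_perl_lines(sample))
```

A counter that only looks for lines starting with `#` would have treated the three non-blank lines of the block as code, which is exactly the skew described above.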

Amount of testing code (testing/production):

9937 / 30268 = 0.33

This is the total amount of code that I wrote to test the other code that I wrote.  So nearly 10k lines of code are there purely to demonstrate that the other 30k lines of code are working.  I'm not going to suggest that this 33% is a linear relationship as the projects increase in size, but maybe we'll find out next year.  Incidentally, the test code that I analysed was the third version of my test framework, so in reality I wrote quite a few more lines of code (perhaps 3 or 4k) before reaching the final version used above.

Note that I'm a big fan of Behaviour Driven Development, and this does tend to cause testing code to be fairly extensive in its own right.

Number of new files per day:

XSLT :: 0.15
JAVA :: 0.78
JSP :: 0.07
PERL :: 0.25
TOTAL :: 1.25

In reality, of course, I create lots and lots of new files over a short period of time, and then nothing for ages.

Average file length:

Excluding blank lines: 179
Including blank lines: 211
Spaciousness (including/excluding): 1.18

What is spaciousness?  It's a measure of how I tend to space my code.  Everyone, I have noticed, is fairly different in this regard - I wonder what other people's spaciousness is?
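If you want to work out your own spaciousness, the calculation is simply total lines divided by non-blank lines.  A minimal sketch (the function name `spaciousness` is my own label for it, and it assumes you pass in the text of one file or of several files joined together):

```python
def spaciousness(text: str) -> float:
    """Ratio of all lines to non-blank lines; 1.0 means no blank lines at all."""
    lines = text.splitlines()
    non_blank = [line for line in lines if line.strip()]
    if not non_blank:
        raise ValueError("no non-blank lines to measure")
    return len(lines) / len(non_blank)


# Three non-blank lines spread over six lines in total
sample = "a\n\nb\n\n\nc\n"
print(spaciousness(sample))

# The averages above: 211 lines per file including blanks, 179 excluding
print(round(211 / 179, 2))
```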

Source Code

Do you want to have a go at this yourself?  Blogger doesn't make attaching files particularly easy, so you can get this from the nice folks at pastebin, who say this shouldn't ever time out: