I recently watched an interesting video, "What we know about software development", and read a blog post called "Norris Numbers".

One of the results presented in the video is that we can't usefully review more than about 200 lines of code in one sitting.

The blog post describes thresholds at which a project becomes hard to manage: 2,000 lines of code for something quickly hacked together and in a bit of a mess, and 20,000 for something with well-designed internal APIs.

So, to paraphrase software commandment number 15: write less code.

Good ways of doing this are:

  1. Pick a language appropriate to the task.

    If you're tying simple programs together and the most complicated data structure you need to care about is a string, shell is a good choice. C isn't, because of all the complexity of allocating memory, juggling file descriptors, and assembling command-line arguments.

  2. Use appropriate libraries.

    Code in a separate library with a well-defined API doesn't count towards the size of your project, since you only need to worry about using the API correctly, not about how the library works.

  3. In extreme cases, use a Domain Specific Language.

    Relational databases are usually accessed through SQL queries. These let you manipulate and retrieve data without, in general, having to worry about how it's stored.

    The benefit is that you don't have to worry about locking to make concurrent access safe, how the data is laid out on disk, indexing it so it can be retrieved faster, caching results so they can be reused, or spreading the database across multiple machines so that you can still access your data if one machine goes down.
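To make the last two suggestions concrete, here is a minimal sketch in Python, chosen only because its standard library ships SQLite (NetSurf itself is C). The sqlite3 module is the library and SQL is the domain specific language; the table, column and index names are made up for illustration, and the row counts are borrowed from NetSurf's sloccount figures. Notice what the code doesn't contain: no storage layout, no hand-maintained index, no sorting code.

```python
import sqlite3

# An in-memory database: SQLite decides how rows are stored, so this code
# never deals with file formats or on-disk layout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (name TEXT PRIMARY KEY, sloc INTEGER)")
conn.executemany(
    "INSERT INTO projects (name, sloc) VALUES (?, ?)",
    [
        ("netsurf", 187047),
        ("libdom", 111597),
        ("libcss", 37841),
        ("libhubbub", 13802),
        ("libnsfb", 11178),
    ],
)

# The index is declared, not hand-written; SQLite keeps it up to date.
conn.execute("CREATE INDEX projects_by_sloc ON projects (sloc)")

# One declarative query stands in for a hand-written loop, filter and sort.
# The 20,000 threshold echoes the Norris number for well-designed code.
rows = conn.execute(
    "SELECT name, sloc FROM projects WHERE sloc > ? ORDER BY sloc DESC",
    (20000,),
).fetchall()
conn.close()

for name, sloc in rows:
    print(f"{name}: {sloc}")
```

The query says what is wanted (projects over 20,000 lines, largest first) and leaves how to find them to the database engine.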

As a case study, I'm going to look at the NetSurf web browser.

So let's look at how it fares against the three approaches suggested above.

Choice of language

NetSurf is primarily written in C. This isn't ideal for reducing the amount of code, but it is appropriate, as NetSurf needs to run on a variety of platforms, some of which have little CPU power or RAM.

The verbosity of the language is offset by the other two approaches to reducing the amount of code.

Using appropriate libraries

NetSurf initially used existing libraries but, for various reasons, has since written its own to replace them, and has also split code out of the main project into new libraries.

The following table was generated by the sloccount tool:

SLOC    Directory       SLOC-by-Language (Sorted)
187047  netsurf         ansic=171269,objc=8341,cpp=5716,perl=980,sh=447,
111597  libdom          xml=81901,ansic=28064,perl=1269,sh=250,python=113
37841   libcss          ansic=37773,perl=68
13802   libhubbub       ansic=12531,jsp=1156,perl=97,sh=10,python=8
11178   libnsfb         ansic=11168,sh=10
5577    nsgenbind       ansic=3564,yacc=1509,lex=479,sh=25
5218    libparserutils  ansic=5099,perl=119
3175    librufl         ansic=3153,perl=22
2645    libsvgtiny      ansic=2645
1621    buildsystem     perl=1492,sh=108,ansic=21
1076    libnsbmp        ansic=1076
1015    libnsgif        ansic=935,perl=80
887     libpencil       ansic=887
857     librosprite     ansic=857
624     libwapcaplet    ansic=624

It shows the main netsurf project at around 190,000 lines of code, with a lot of code split out into support libraries.
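A quick tally of the table, as a sketch: the dictionary just copies the SLOC column, and everything outside the netsurf directory is counted as split out of the main project. That grouping is my own simplification, since nsgenbind and the buildsystem aren't libraries, strictly speaking.

```python
# SLOC column copied from the sloccount output above.
SLOC = {
    "netsurf": 187047,
    "libdom": 111597,
    "libcss": 37841,
    "libhubbub": 13802,
    "libnsfb": 11178,
    "nsgenbind": 5577,
    "libparserutils": 5218,
    "librufl": 3175,
    "libsvgtiny": 2645,
    "buildsystem": 1621,
    "libnsbmp": 1076,
    "libnsgif": 1015,
    "libpencil": 887,
    "librosprite": 857,
    "libwapcaplet": 624,
}

main = SLOC["netsurf"]
# Everything outside the netsurf directory counts as "split out" here,
# including the nsgenbind tool and the buildsystem.
split_out = sum(n for name, n in SLOC.items() if name != "netsurf")
total = main + split_out

print(f"main project: {main:>7,}")
print(f"split out:    {split_out:>7,}")
print(f"total:        {total:>7,}")
print(f"share split out: {split_out / total:.0%}")
```

By this count, the split-out code (197,113 lines) slightly outweighs the main project (187,047 lines), out of 384,160 lines in total.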

Domain specific languages

The table above doesn't include everything: NetSurf also uses a Domain Specific Language to bind its JavaScript engine to its Document Object Model (DOM).

There are 2,641 lines of .bnd code in the netsurf project, which the nsgenbind program parses to produce 13,597 lines of C.

This comes out only a little ahead in total lines of code, because nsgenbind itself is 5,577 lines. However, I am assured that the gains will be greater in future as the bindings grow.
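The trade-off can be worked through in a few lines of Python. The first three figures come from the paragraphs above; the rest is a sketch under an assumption of mine, not a claim about nsgenbind: that each further line of .bnd expands into roughly the same amount of C as it does today, and that hand-written C would be as long as the generated C. Under that assumption nsgenbind is a fixed cost, which is why the saving should grow with the bindings.

```python
# Figures from the paragraphs above.
BND_LINES = 2641     # hand-written .bnd in the netsurf project
GENERATED_C = 13597  # C that nsgenbind generates from it
NSGENBIND = 5577     # size of nsgenbind itself

hand_maintained = BND_LINES + NSGENBIND
saving = GENERATED_C - hand_maintained
print(hand_maintained, saving)  # 8218 5379

# ASSUMPTION (not from the source): every further .bnd line expands into
# as much C as it does today, and hand-written C would be as long as the
# generated C.
RATIO = GENERATED_C / BND_LINES  # roughly 5 lines of C per .bnd line


def saving_for(bnd_lines):
    """Lines saved, under the assumption above, for a given amount of .bnd."""
    return bnd_lines * RATIO - (bnd_lines + NSGENBIND)


# nsgenbind is a fixed cost, so it pays for itself once the bindings pass
# this size, and the saving keeps growing after that.
break_even = NSGENBIND / (RATIO - 1)

print(round(saving_for(2 * BND_LINES)))  # 16335
print(round(break_even))                 # 1344
```

Under that assumption, doubling the bindings to 5,282 lines of .bnd would save about 16,335 lines rather than 5,379, and the generator paid for itself at around 1,344 lines of .bnd.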

The approach has other benefits too. Binding code is both tricky and dull, so it is best left to automation, and generating it allows NetSurf to support two different JavaScript APIs, with support for a third planned.


Your homework is to take a project you work on and use sloccount or cloc to see how many lines of code it has. If it's too big, think about how you could split it up to make it more manageable.
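If neither tool is to hand, a crude count is easy to sketch. The Python below is a hypothetical stand-in, not how sloccount or cloc work: it counts non-blank lines that don't start with a single-line comment marker, for a small made-up table of extensions, and it ignores block comments, docstrings and strings entirely.

```python
from collections import Counter
from pathlib import Path

# Hypothetical, deliberately small table of extensions and their
# single-line comment markers; real tools know far more languages.
COMMENT_PREFIXES = {
    ".c": ("//",),
    ".h": ("//",),
    ".py": ("#",),
    ".sh": ("#",),
}


def count_lines(root):
    """Count non-blank, non-comment lines per extension under root.

    A rough approximation: block comments (/* ... */), docstrings and
    comment markers inside strings are not handled.
    """
    counts = Counter()
    for path in Path(root).rglob("*"):
        prefixes = COMMENT_PREFIXES.get(path.suffix)
        if prefixes is None or not path.is_file():
            continue
        with path.open(encoding="utf-8", errors="replace") as f:
            for line in f:
                stripped = line.strip()
                if stripped and not stripped.startswith(prefixes):
                    counts[path.suffix] += 1
    return counts
```

Calling count_lines("netsurf").most_common() lists extensions by line count. Expect its numbers to run higher than sloccount's for C, since lines inside /* ... */ comments are counted as code.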