I recently watched a video called "What we know about software development" and read a blog post called "Norris Numbers".
One of the more striking results in the video is that we can't usefully review more than about 200 lines of code in one sitting.
The blog post describes various thresholds for the manageability of a project: about 2,000 lines of code for something that's quickly hacked together and is in a bit of a mess, and about 20,000 for something with well designed internal APIs.
So, to paraphrase the software commandment number 15, write less code.
Good ways of doing this are:
Pick a language appropriate to the task.
If you're tying simple programs together, and the most complicated data structure you need to care about is a string, shell is a good choice. C isn't, because of all the complexities of allocating memory, juggling file descriptors and putting command line arguments together.
Use appropriate libraries.
Code from another library with a well defined API doesn't count towards your project, as you only need to worry about using the API correctly, rather than how the library works.
In extreme cases, use a Domain Specific Language.
Relational databases tend to be accessed by SQL queries. These let you manipulate data and retrieve results, without generally having to worry about how it's stored.
The benefit of this is that you can avoid having to worry about locking to make concurrent access safe, how it's stored on disk, indexing your data to make it faster to retrieve, caching results so you can re-use them, or spreading your database across multiple machines, so if one machine goes down, you can still access your data.
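As a minimal sketch of what that looks like in practice, here is Python's built-in sqlite3 module running a query against an in-memory database. The table, its columns and its rows are made up for the example. The thing to notice is what is missing: there is no code for file formats, indexes or locking.

```python
import sqlite3

# In-memory database; the table, columns and rows are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE modules (name TEXT PRIMARY KEY, sloc INTEGER)")
conn.executemany(
    "INSERT INTO modules (name, sloc) VALUES (?, ?)",
    [("parser", 4200), ("ui", 950), ("net", 1800)],
)
conn.commit()

# The query states what we want, not how to find it. Storage layout,
# index use and locking are the database's problem, not ours.
rows = conn.execute(
    "SELECT name, sloc FROM modules WHERE sloc < ? ORDER BY sloc DESC",
    (2000,),
).fetchall()

for name, sloc in rows:
    print(name, sloc)

conn.close()
```

Doing the same filtering and sorting by hand over a file on disk would take far more code, before even starting on concurrent access.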
As a case study, I'm going to look at the NetSurf web browser.
So, looking at each of the 3 suggested approaches for reducing lines of code in turn:
Choice of language
NetSurf is primarily written in C. This is not ideal for reducing the amount of code, but it is appropriate, as NetSurf needs to run on a variety of platforms, some of which haven't got a lot of CPU power or RAM.
The verbosity of the language is offset by other approaches to reduce the amount of code.
Using appropriate libraries
NetSurf initially used existing libraries, but for various reasons the project has written its own libraries to replace them, and has split out code from the main project into new libraries.
The following was generated by the sloccount tool.
SLOC    Directory       SLOC-by-Language (Sorted)
187047  netsurf         ansic=171269,objc=8341,cpp=5716,perl=980,sh=447,asm=288,php=6
111597  libdom          xml=81901,ansic=28064,perl=1269,sh=250,python=113
37841   libcss          ansic=37773,perl=68
13802   libhubbub       ansic=12531,jsp=1156,perl=97,sh=10,python=8
11178   libnsfb         ansic=11168,sh=10
5577    nsgenbind       ansic=3564,yacc=1509,lex=479,sh=25
5218    libparserutils  ansic=5099,perl=119
3175    librufl         ansic=3153,perl=22
2645    libsvgtiny      ansic=2645
1621    buildsystem     perl=1492,sh=108,ansic=21
1076    libnsbmp        ansic=1076
1015    libnsgif        ansic=935,perl=80
887     libpencil       ansic=887
857     librosprite     ansic=857
624     libwapcaplet    ansic=624
It shows NetSurf hovering around the 190,000 lines of code mark, with a lot of support libraries.
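To see how these directories sit against the two thresholds quoted from the Norris Numbers post, here is a rough sketch that buckets the per-directory totals. The totals are copied from the sloccount output above; the bucket labels and the choice to treat each threshold as an inclusive upper limit are my own assumptions.

```python
# Per-directory SLOC totals, copied from the sloccount output above.
TOTALS = {
    "netsurf": 187047,
    "libdom": 111597,
    "libcss": 37841,
    "libhubbub": 13802,
    "libnsfb": 11178,
    "nsgenbind": 5577,
    "libparserutils": 5218,
    "librufl": 3175,
    "libsvgtiny": 2645,
    "buildsystem": 1621,
    "libnsbmp": 1076,
    "libnsgif": 1015,
    "libpencil": 887,
    "librosprite": 857,
    "libwapcaplet": 624,
}

# Thresholds as quoted earlier in this post; treating them as inclusive
# upper limits is an assumption made for this sketch.
HACKED_LIMIT = 2_000
DESIGNED_LIMIT = 20_000
LABELS = ("<= 2,000", "<= 20,000", "> 20,000")


def bucket(sloc: int) -> str:
    """Return the label of the size bucket a line count falls into."""
    if sloc <= HACKED_LIMIT:
        return LABELS[0]
    if sloc <= DESIGNED_LIMIT:
        return LABELS[1]
    return LABELS[2]


buckets = {label: [] for label in LABELS}
for name, sloc in TOTALS.items():
    buckets[bucket(sloc)].append(name)

for label in LABELS:
    print(f"{label:>10}: {', '.join(buckets[label])}")
```

Only netsurf, libdom and libcss land above the larger threshold, and most of libdom's count is XML rather than C.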
Domain specific languages
The above list doesn't include everything, since NetSurf uses a Domain Specific Language for binding its JavaScript engine to its Document Object Model (DOM).
There are 2,641 lines of .bnd code in the netsurf project, which are parsed by the nsgenbind program to produce 13,597 lines of C code.
This comes out a little ahead in terms of total lines of code, since nsgenbind is 5,577 lines: that makes 2,641 + 5,577 = 8,218 lines to maintain, against 13,597 lines of generated C. I am also assured that there will be greater gains in the future as the bindings increase in size.
It also has some other benefits. Binding code is both tricky and dull, so it is best left to automation. Generating it also allows NetSurf to support 2 different JavaScript APIs, and NetSurf is looking to support a third.
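To give a feel for why a binding generator pays off, here is a toy sketch of the idea. It is not nsgenbind, and the description format is not the real .bnd syntax. The C function names in the template (js_get_*, dom_*_get_*, js_push_*) are invented purely for illustration.

```python
# Hypothetical binding description: one "Interface.property: type" per line.
# This is NOT the real .bnd syntax, only a stand-in for the idea.
SPEC = """\
Node.nodeName: string
Node.nodeType: number
Element.tagName: string
"""

# C emitted for each property. Every identifier in here is made up.
TEMPLATE = """\
static int js_get_{iface}_{prop}(js_ctx *ctx, void *obj)
{{
    dom_{ctype} value;
    if (dom_{iface}_get_{prop}(obj, &value) != 0)
        return -1;
    js_push_{ctype}(ctx, value);
    return 1;
}}
"""


def generate(spec: str) -> str:
    """Expand each line of the description into a C getter stub."""
    chunks = []
    for line in spec.splitlines():
        line = line.strip()
        if not line:
            continue
        target, ctype = (part.strip() for part in line.split(":"))
        iface, prop = target.split(".")
        chunks.append(
            TEMPLATE.format(iface=iface.lower(), prop=prop, ctype=ctype)
        )
    return "\n".join(chunks)


generated = generate(SPEC)
spec_lines = len(SPEC.splitlines())
generated_lines = len(generated.splitlines())
print(f"{spec_lines} lines of description -> {generated_lines} lines of C")
```

The generator is a fixed cost, while every extra line of description expands into several lines of C, which is why the gains grow with the size of the bindings.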
Summary
Your homework is to take a project you work on and measure how many lines of code it has with sloccount or cloc. If it's too big, think about how you could split it up to be more manageable.
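If you have neither tool to hand, a very rough count can be sketched in a few lines of Python. This is not how sloccount or cloc count: it only skips blank lines and lines that start with a single-line comment marker, and the mapping from file extension to comment marker is my own assumption.

```python
from pathlib import Path

# Assumed single-line comment marker per file extension. Block comments
# (/* ... */) are not handled, so C counts will be on the high side.
COMMENT_PREFIX = {".c": "//", ".h": "//", ".py": "#", ".sh": "#"}


def count_lines(text: str, comment_prefix: str) -> int:
    """Count lines that are neither blank nor whole-line comments."""
    count = 0
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_prefix):
            count += 1
    return count


def count_tree(root: Path) -> dict[str, int]:
    """Total the rough line count per known extension under root."""
    totals: dict[str, int] = {}
    for path in root.rglob("*"):
        prefix = COMMENT_PREFIX.get(path.suffix)
        if prefix is None or not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        totals[path.suffix] = totals.get(path.suffix, 0) + count_lines(text, prefix)
    return totals


# Small in-memory sample so the sketch runs without touching the disk.
SAMPLE = """\
// greet the user
#include <stdio.h>

int main(void)
{
    puts("hello");
    return 0;
}
"""

print(count_lines(SAMPLE, "//"))
```

Calling count_tree(Path("path/to/project")) gives a per-extension total for a whole directory, which is enough to tell which side of a threshold you are on.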