Robust Software Version Numbering

Posted by Jeremy ⋅ December 5, 2008

This article isn’t directly database-related, but I think it’s a great software engineering topic, so it seemed worth writing about. Right now I’m working on a project that releases software packages of a few different flavors. Some of them are other people’s software that we’re re-packaging (like Oracle database binaries), and some are code that was written and is maintained entirely in-house. As we’ve been working through the software development process, this question has recently come to mind: how should we assign version numbers?

Actually, it’s part of a bigger question of how we develop elegant software. Other related design issues in this project are clear end-to-end traceability (requirement -> source code version -> released binary package), solid change control, an integrated testing framework and readiness for automation (build and test). But the one I’m interested in right now is this: how can I develop a robust version numbering strategy?

First of all, there really isn’t a one-size-fits-all “correct” version numbering scheme. The best way to choose version numbers is influenced by your software development philosophy (and resultant development lifecycle), project management approach and development team. (Globally distributed informal volunteer network or local engineers on payroll?) So the best solution for my current project might not be the best solution for my next project.

But let’s take a closer look at this project. First, a little background on the art of version numbering. In the open-source world, many projects follow a three-part version: MAJOR.MINOR.PATCH. It’s also clear that the version number is closely tied to the development lifecycle. You can find plenty of example discussions from open-source projects such as the Linux kernel, Linux desktop distros, Eclipse, Apache APR and FarCry CMS. Before the 2.6 series, the Linux kernel used an even-odd scheme to distinguish between stable and unstable releases (an even minor number meant stable, an odd one meant development), but that has now been abandoned and most people seem to regard this as a good thing.
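To make the MAJOR.MINOR.PATCH idea concrete, here is a minimal Python sketch. The names Version, parse_version and is_development_series are my own for illustration, not from any of the projects mentioned, and the parser assumes a version string is exactly three dot-separated non-negative integers.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """A three-part MAJOR.MINOR.PATCH version (hypothetical helper type)."""
    major: int
    minor: int
    patch: int


def parse_version(text: str) -> Version:
    """Parse 'MAJOR.MINOR.PATCH' into a Version of integers."""
    parts = text.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {text!r}")
    major, minor, patch = (int(p) for p in parts)
    return Version(major, minor, patch)


def is_development_series(version: Version) -> bool:
    """Old even-odd convention: an odd minor number marks a development series."""
    return version.minor % 2 == 1


# Named tuples compare field by field, so ordering follows major, minor, patch.
print(parse_version("1.10.0") > parse_version("1.9.0"))    # True
print(is_development_series(parse_version("2.5.1")))       # True
print(is_development_series(parse_version("2.4.20")))      # False
```

Comparing the parts as integers rather than as text is the design choice that matters here: it is what lets 1.10.0 come after 1.9.0.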

The best overview of the topic that I could find was the Wikipedia article on Software Versioning. The section about pre-release versions does a good job of describing one requirement of my current project. I also found the section about internal version numbers interesting; I’ve worked on projects in the past with that strategy.

I found an article on linux.com written by Nathan Willis a few years ago offering some principles. I agree with his first and third suggestions: “Pick a numbering scheme, then don’t change it. Ever.” and “Make friends with infinity.” It’s best to think this through at the front end of a project (before 1.1), and you shouldn’t be bothered by big version numbers. However, I don’t buy his second point about the meaning of the decimal; I think most everyone today is used to the meaning of the decimal in version numbers.

For some alternate viewpoints, I came across something Zack Weinberg wrote back in 2002. He states strongly that the decimal point isn’t mathematical, which I agree with. However, contrary to his article, in our development lifecycle the test releases do need version numbers. He also suggests using date-based numbers for development snapshots – that’s an interesting idea which I’ll have to think about. His list of ways to do it wrong has a good point about distinguishing between version x and version x.0, although I’m not in full agreement with the other list items. I actually find the GNU suggestion for numbering test versions very interesting – I’ll have to give that some thought. (Test versions for 4.6 are numbered 4.5.90 – 4.5.99.)
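Both points, the non-mathematical decimal and the GNU-style test numbering, are easy to see in a few lines of Python. This is only a sketch: version_key is a hypothetical helper of mine, and it assumes versions consist purely of dot-separated integers.

```python
def version_key(text: str) -> tuple:
    """Turn '4.5.90' into (4, 5, 90) so each part compares as an integer."""
    return tuple(int(part) for part in text.split("."))


# Read as decimal fractions, 1.10 is smaller than 1.9 ...
print(float("1.10") < float("1.9"))                  # True
# ... but as version numbers, 1.10 comes after 1.9.
print(version_key("1.10") > version_key("1.9"))      # True

# GNU-style test versions for 4.6 fall between the 4.5 series and 4.6 itself.
versions = ["4.6", "4.5.90", "4.5", "4.5.99", "4.5.1"]
print(sorted(versions, key=version_key))
# ['4.5', '4.5.1', '4.5.90', '4.5.99', '4.6']
```

Note that this particular key treats "1" and "1.0" as different versions, (1,) versus (1, 0), with "1" sorting first. Whether the two should be equal is exactly the kind of decision a numbering scheme has to make up front.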

Ultimately, I don’t think either of these two sets of guidelines is robust enough for me – especially for repackaging the Oracle database binaries. Some of these might seem obvious, but I thought a good place to start might be by listing what I would consider to be requirements for an elegant and robust version numbering scheme:

elegant, clear, understandable version numbers that are as simple as possible
support for one-part or two-part versions (1, 2, 3, 4 or 1.0, 1.1, 1.2, 2.0)
versions for pre-release builds such as test builds (for the quality process) and early-access beta builds.
very clear difference between pre-release and release versions, so someone doesn’t mistakenly install a pre-release build when they want a production build.
names need to sort correctly – pre-release before release. It’s critical that RPM and RHN sort them right; it would be nice if unix “ls” did too.
support for continuing development. After release 1.0, the 2.0-beta version should sort after release 1.0 and before release 2.0
support for special branch versions – version 1.0+patch12345, where patch12345 is not included in version 2.0 which has different patches
clear difference between pre-release and release versions on special branches.
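
To get a feel for the sorting requirements, here is a rough Python sketch of a sort key that puts pre-release builds before the release they lead up to. Everything in it is an assumption for illustration: the "X.Y-stageN" naming, the STAGE_ORDER table (test builds before beta builds) and the sort_key function are mine, not an existing standard. It also says nothing about how RPM itself compares versions; RPM applies its own comparison rules, so any real scheme would have to be checked against the actual tool. Branch versions such as 1.0+patch12345 are not handled.

```python
import re

# Hypothetical naming: "2.0" is a release, "2.0-beta3" is a pre-release build.
_PATTERN = re.compile(r"^(\d+(?:\.\d+)*)(?:-([a-z]+)(\d+))?$")

# Assumed lifecycle order of pre-release stages (an illustration only).
STAGE_ORDER = {"test": 0, "beta": 1}


def sort_key(version: str) -> tuple:
    """Build a key that sorts pre-release builds before their release."""
    match = _PATTERN.match(version)
    if match is None:
        raise ValueError(f"unrecognised version: {version!r}")
    numbers, stage, build = match.groups()
    release = tuple(int(n) for n in numbers.split("."))
    if stage is None:
        # A real release sorts after every pre-release of the same number.
        return (release, 1, 0, 0)
    if stage not in STAGE_ORDER:
        raise ValueError(f"unknown pre-release stage: {stage!r}")
    return (release, 0, STAGE_ORDER[stage], int(build))


builds = ["2.0", "1.0", "2.0-beta1", "2.0-test2", "2.0-beta10", "2.0-beta2"]
print(sorted(builds, key=sort_key))
# ['1.0', '2.0-test2', '2.0-beta1', '2.0-beta2', '2.0-beta10', '2.0']

# A plain string sort gets the order wrong: the release lands before its beta.
print(sorted(["2.0-beta1", "2.0"]))
# ['2.0', '2.0-beta1']
```

The last two lines show why the naming matters so much: with a suffix like "-beta1", a plain text sort puts the release first, which is the opposite of what I want.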

I have to be honest – this is tough. I’ll be giving it some thought over the next few weeks. Let me know if you have any suggestions. :)