Darwin's Theories Blog

New Theories for a New Time

The Failure of Software Packaging

2007-03-19
Some Software Projects don't get the needs of others
There are hundreds of thousands of open source projects out there (SourceForge.net alone hosts more than 132,000 distinct projects), more than anybody could use in a lifetime. Most are good, some are great, some are not so great. As the head of the OpenBSD project says, "Choice is Good". But I've had to face some issues related to packaging recently.
  1. There is a massive failure to provide consistent packaging of open source software. RPM and DEB and several other formats exist even in the confines of a single operating system - Linux. Even Ubuntu, which uses the Debian format, seems to mostly re-create all the packages for its users. Most of this work is done by volunteers, who in the process probably use up more energy than a small city, making a non-zero contribution to global warming. Why do we need a dozen different binary packagings of my file command, for example? They all do basically the same thing.
  2. Most of the packaging formats don't do a very good job of capturing everything needed to rebuild a package. The BSD distributions have learned to do a better job of this; they run on so many different CPUs (not just PCs) that they have to make it easier for users to build packages. The BSD packaging takes the form of a Makefile and patches, which they ship on CD and make available for download. So in "BSD land" it's pretty easy to build a package (if you can't find a pre-compiled one already, which you usually can). Of course they also support installing and upgrading binary packages without building, for those who don't want to bother.
  3. In BSD-land there is a strong emphasis on repeatable, automated building. Files are downloaded, checked with MD5 or SHA1, and then extracted; the software is then configured, made (compiled), installed in a test area, packaged, and installed - all with no manual intervention. And of course its prerequisites are also downloaded automatically, so BSD users using the ports/packages system never have to go looking for a -dev version of something - you just type "make" and it all just works.
At least, most of the time. One thing that makes it harder to do repeatable, automated package building is the refusal of a software project to make available a complete, immutable-once-version-numbered, standard format (tar.gz, tar.bz2, or .zip) distribution. Once you put up version 1.2.34 of your software package, please understand that hundreds of thousands of users will get copies of the MD5 of that file. If you change the file, please change the version number. Otherwise many people will get frustrated when they try to build it. The MD5/SHA1 verification is there to protect your project as well as the BSD systems from trojan horse infestations, and also to ensure that you didn't change a line of code that will cause a patch to fail to apply. In short, if you change a distribution file, please always change its number.
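
To make the point concrete, here is a rough sketch in Python of the kind of check a ports-style build performs before it will touch a distribution file. This is not the actual BSD ports code; the function names (file_digest, verify_distfile) and the file name example-1.2.34.tar.gz are made up for illustration. It only assumes that the port records a digest once and compares every later download against it.

```python
import hashlib
import os
import tempfile


def file_digest(path, algorithm="sha1", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_distfile(path, expected, algorithm="sha1"):
    """True only if the file still matches the digest recorded for it."""
    return file_digest(path, algorithm) == expected.strip().lower()


if __name__ == "__main__":
    # Hypothetical distfile; the contents are stand-in bytes, not a real tarball.
    workdir = tempfile.mkdtemp()
    distfile = os.path.join(workdir, "example-1.2.34.tar.gz")
    with open(distfile, "wb") as f:
        f.write(b"original release contents")

    # The port maintainer records the digest once, when the port is created.
    recorded = file_digest(distfile)
    print("unchanged file verifies:", verify_distfile(distfile, recorded))

    # Upstream quietly replaces the file without bumping the version number.
    with open(distfile, "wb") as f:
        f.write(b"original release contents, plus one small fix")
    print("silently changed file verifies:", verify_distfile(distfile, recorded))
```

The second check fails even though the file name and version number are identical, and that is exactly where an automated build stops and a human has to work out whether the change was a harmless re-roll or something nastier.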

Another annoyance is the actual lack of a downloadable source distribution. And as much as I like Java, it seems that some Java projects are among the worst offenders here. For example, a well-known teaching tool until recently only provided downloads in Windows .exe and in a Java-based interactive-installation format. You simply could not get the source for this program without running a GUI-based installer, interacting with it as per the usual installation script time-waster, and telling it to extract the source code. How does that support the premise of being able to inspect the source code before running it? To their credit (and the reason I'm not naming them here), when I mentioned this and explained the problem it was causing, they had a .zip file up on their SourceForge site (and the SourceForge mirror network) in under an hour.

Another example is Sun's Project Glassfish, a "Java EE" server. You download a big distfile (which is versioned properly), but as soon as you try to build it, the Maven build tool first starts scribbling in its private directory in your home directory, and then starts downloading newer versions of some files! Folks, you cannot get to repeatable automated building if you don't leave the platform-specific ports maintainers the flexibility to get distributions, install them in a known location, and just use them. On a bad day I think that Maven is one of the biggest roadblocks to repeatable automated building. Maybe it isn't, but today it seems like it.