Packaging

Packaging
Prev	Chapter 7. Packaging, Releasing, and Daily Development	Next

The canonical form for distribution of free software is as source code. This is true regardless of whether the software normally runs in source form (i.e., interpreted, like Perl, Python, PHP, etc) or is typically compiled first (like C, C++, Java, Rust, etc). With compiled software, most users will probably not compile the sources themselves, but will instead install from pre-built binary packages (see the section called “Binary Packages”). However, those binary packages are still derived from a particular source distribution. The point of the source package is to unambiguously define the release. When the project distributes "Scanley 2.5.0", what it means, specifically, is "The tree of source code files that, when compiled (if necessary) and installed, produces Scanley 2.5.0."

There is a fairly strict standard for how source releases should look. One will occasionally see deviations from this standard, but they are the exception, not the rule. Unless there is a compelling reason to do otherwise, your project should follow this standard too.

Format

The source code should be shipped in the standard formats for transporting directory trees. For Unix and Unix-like operating systems, the convention is to use TAR format, compressed by compress, gzip, bzip or bzip2. For MS Windows, the standard method for distributing directory trees is zip format, which compresses automatically. For JavaScript projects, it is customary to ship the "minified"^[111] versions of the files together with the human-readable source files.

Name and Layout

The name of the package should consist of the software's name plus the release number, plus the format suffixes appropriate for the archive type. For example, Scanley 2.5.0, packaged for Unix using GNU Zip (gzip) compression, would look like this:

scanley-2.5.0.tar.gz

or for Windows using zip compression:

scanley-2.5.0.zip

Either of these archives, when unpacked, should create a single new directory tree named scanley-2.5.0 in the current directory. Underneath the new directory, the source code should be arranged in a layout ready for compilation (if compilation is needed) and installation. In the top level of new directory tree, there should be a plain text README file explaining what the software does and what release this is, and giving pointers to other resources, such as the project's web site, other files of interest, etc. Among those other files should be an INSTALL file, sibling to the README file, giving instructions on how to build and install the software for all the operating systems it supports. As mentioned in the section called “How to Apply a License to Your Software”, there should also be a LICENSE or COPYING file, giving the software's terms of distribution.^[112]

There should also be a CHANGES file (sometimes called NEWS or something else), explaining what's new in this release.^[113] The CHANGES file accumulates changelists for all releases, in reverse chronological order, so that the list for this release appears at the top of the file. Completing that list is usually the last thing done on a stabilizing release branch; some projects write the list piecemeal as they're developing, others prefer to save it all up for the end and have one person write it, getting information by combing the version control logs. The list looks something like this:

Version 2.5.0
(20 December 2022, from branch 2.5.x)
http://scanley.org/repos/tags/2.5.0/

 New features and enhancements:
    * Added regular expression queries (issue #53)
    * Added support for UTF-16 documents
    * Documentation translated into Malagasy, Polish, Russian
    * ...

 Bugfixes:
    * fixed reindexing bug (issue #945)
    * fixed some query bugs (issues #815, #1007, #1008)
    * ...

The list can be as long as necessary, but don't bother to describe every little bugfix and feature enhancement in detail. The point is to give users an overview of what they would gain by upgrading to the new release, and to tell them about any incompatible changes. In fact, the changelist is customarily included in the announcement email (see the section called “Testing and Releasing”), so write it with that audience in mind.

The actual layout of the source code inside the tree should be the same as, or as similar as possible to, the source code layout one would get by checking out the project directly from its version control repository. Sometimes there are a few differences, for example because the package contains some generated files needed for configuration and compilation (see the section called “Compilation and Installation”), or because the distribution includes third-party software that is not maintained by the project, but that is required and that users are not likely to already have. But even if the distributed tree corresponds exactly to some development tree in the version control repository, the distribution itself should not be a working copy (see working copy or working files). The release is supposed to represent a static reference point — a particular, unchangeable configuration of source files. If it were a working copy, the danger would be that the user might update it, and afterward think that he still has the release when in fact he has something different.

The package should be the same regardless of the packaging. The release — that is, the precise entity referred to when someone says "Scanley 2.5.0" — is the tree created by unpacking a zip file or tarball. So the project might offer all of these for download:

scanley-2.5.0.tar.bz2
scanley-2.5.0.tar.gz
scanley-2.5.0.zip

...but the source tree created by unpacking them would be the same. That source tree itself is the distribution; the form in which it is downloaded is merely a matter of convention or convenience. Certain minor differences between source packages are allowable: for example, in the Windows package, text files may have lines ending with CRLF (Carriage Return and Line Feed), while Unix packages would use just LF. The trees may be arranged slightly differently between source packages destined for different operating systems, too, if those operating systems require different sorts of layouts for compilation. However, these are all basically trivial transformations. The basic source files should be the same across all the packagings of a given release.

To Capitalize or Not to Capitalize

When referring to a project by name, people generally capitalize it as a proper noun, and capitalize acronyms if there are any: "MySQL 5.0", "Scanley 2.5.0", etc. Whether this capitalization is reproduced in the package name is up to the project. Either Scanley-2.5.0.tar.gz or scanley-2.5.0.tar.gz would be fine, for example (I personally prefer the latter, because I don't like to make people hit the shift key, but plenty of projects ship capitalized packages). The important thing is that the directory created by unpacking the tarball use the same capitalization. There should be no surprises: the user must be able to predict with perfect accuracy the name of the directory that will be created when she unpacks a distribution.

Pre-Releases

When shipping a pre-release or candidate release, the qualifier is a part of the release number, so include it in the name of the package's name. For example, the ordered sequence of alpha and beta releases given earlier in the section called “Release Number Components” would result in package names like this:

scanley-2.3.0-alpha1.tar.gz
scanley-2.3.0-alpha2.tar.gz
scanley-2.3.0-beta1.tar.gz
scanley-2.3.0-beta2.tar.gz
scanley-2.3.0-beta3.tar.gz
scanley-2.3.0.tar.gz

The first would unpack into a directory named scanley-2.3.0-alpha1, the second into scanley-2.3.0-alpha2, and so on.

Compilation and Installation

For software requiring compilation or installation from source, there are usually standard procedures that experienced users expect to be able to follow. For example, for programs written in C, C++, or certain other compiled languages, the standard for a long time under Unix-like systems was for the user to type:

   $ ./configure
   $ make
   $ sudo make install

The first command autodetects as much about the environment as it can and prepares for the build process, the second command builds the software in place (but does not install it), and the last command installs it on the system.

This is not the only standard, though it has historically been one of the most widespread. These days there are often instructions for how to deploy into a popular container environment such as Docker as well. Furthermore, other programming languages have their own standards for building and installing packages. If it's not obvious to you what the applicable standards are for your project, ask an experienced developer; you can safely assume that some standard applies, even if you don't yet know it.

Whatever the appropriate standards for your project are, don't deviate from them unless you absolutely must. Standard installation procedures are practically spinal reflexes for a lot of system administrators. If they see familiar invocations documented in your project's INSTALL file, that instantly raises their faith that your project is generally aware of conventions, and that it is likely to have gotten other things right as well. Also, as discussed in the section called “Downloads”, having a standard build procedure pleases potential developers.

On Windows, the standards for building and installing are a bit less settled. For projects requiring compilation, the general convention seems to be to ship a tree that can fit into the workspace/project model of the standard Microsoft development environments (Developer Studio, Visual Studio, VS.NET, MSVC++, etc). Depending on the nature of your software, it may be possible to offer a Unix-like build option on Windows using MinGW or Cygwin. And of course, if you're using a language or programming framework that comes with its own build and install conventions — e.g., Python — you should simply use whatever the standard method is for that framework, whether on Windows, Unix, Mac OS X, or any other operating system.

Be willing to put in a lot of extra effort in order to make your project conform to the relevant build or installation standards. Building and installing is an entry point: it's okay for things to get harder after that, if they absolutely must, but it would be a shame for the user's or developer's very first interaction with the software to require unexpected steps.

Binary Packages

Although the formal release is a source code package, users often install software from binary packages, either provided by their operating system's software distribution mechanism, or obtained manually from the project web site or from some third party. Here, "binary" doesn't necessarily mean "compiled"; it's a general term for a pre-configured form of the package that allows the user to install it on her computer without going through the usual source-based build and install procedures. On RedHat GNU/Linux, it is the RPM system; on Debian GNU/Linux, it is the APT (.deb) system; etc.

Whether these binary packages are assembled by people closely associated with the project, or by distant third parties, users are going to treat them as equivalent to the project's official releases, and will file tickets in the project's bug tracker based on the behavior of the binary packages. Therefore, it is in the project's interest to provide packagers with clear guidelines, and work closely with them to ensure that what they produce represents the software fairly and accurately.

The main thing packagers need to know is that they should always base their binary packages on an official source release. Sometimes packagers are tempted to pull an unstable incarnation of the code from the repository, or to include selected changes that were committed after the release was made, in order to provide users with certain bug fixes or other improvements. The packager thinks he is doing his users a favor by giving them the more recent code, but actually this practice can cause a great deal of confusion. Projects are prepared to receive reports of bugs found in released versions, and bugs found in recent mainline and major branch code (that is, found by people who deliberately run bleeding edge code). When a bug report comes in from these sources, the responder will often be able to confirm immediately that the bug is known to be present in that snapshot, and perhaps that it has since been fixed and that the user should upgrade or wait for the next release. If it is a previously unknown bug, knowing the precise release makes it easier to reproduce and easier to categorize in the tracker.

However, projects are not prepared to receive bug reports based on unspecified intermediate or hybrid versions. Such bugs can be hard to reproduce; also, they may be due to unexpected interactions between individual changes pulled together from different development stages, and thereby cause misbehaviors that the project's developers should not have to take the blame for. I have even seen dismayingly large amounts of time wasted because a bug was absent when it should have been present: someone was running a slightly patched-up version, based on (but not identical to) an official release, and when the predicted bug did not happen, everyone had to dig around a lot to figure out why.

Still, there will sometimes be circumstances when a packager insists that modifications to the source release are necessary.^[114] Packagers should be encouraged to bring this up with the project's developers and describe their plans. They may get approval, but failing that, they will at least have notified the project of their intentions, so the project can watch out for unusual bug reports. The developers may respond by putting a disclaimer on the project's web site, and may ask that the packager do the same thing in the appropriate place, so that users of that binary package know what they are getting is not exactly the same as what the project officially released. There need be no animosity in such a situation, though unfortunately there often is. It's just that packagers have a slightly different set of goals from developers. The packagers mainly want the best out-of-the-box experience for their users. The developers want that too, of course, but they also need to ensure that they know what versions of the software are out there, so they can receive coherent bug reports and make compatibility guarantees. Sometimes these goals conflict. When they do, it's good to keep in mind that the project has no control over the packagers, and that the bonds of obligation run both ways. It's true that the project is doing the packagers a favor simply by producing the software. But the packagers are also doing the project a favor, by taking on a mostly unglamorous job in order to make the software more widely available — often orders of magnitude more available. It's fine to disagree with packagers, but don't flame them; just try to work things out as best you can.

^[111]See https://en.wikipedia.org/wiki/Minification_%28programming%29.

^[112]Your all-caps files — README, INSTALL, etc — may of course have ".md" extensions to indicate Markdown (https://daringfireball.net/projects/markdown/) format, or ".txt" to indicate plain text, etc.

^[113]Sumana Harihareswara points out that there is a distinction between a detailed CHANGES file and release notes. Her full post at harihareswara.net/posts/2024/changelogs-and-release-notes is very much worth reading. In the part I'll quote here, she uses the word "changelog", but she means by it what I called a CHANGES or NEWS file:

"Release notes are a prose summary of what's changed, which exist in addition to a changelog (the release notes might link to the changelog or include a copy at the bottom), and which focus on changes the user might perceive. ... If a changelog is too overwhelming and an end user is left thinking, 'but what matters to me?' they can go read the release notes."

^[114] https://en.wikipedia.org/wiki/Mozilla_Corporation_software_rebranded_by_the_Debian_project#Iceweasel gives a well-known example of this.

Prev	Up	Next
Stabilizing a Release	Home	Testing and Releasing