Chapter 3. Technical Infrastructure

Table of Contents

What a Project Needs
Web Site
Canned Hosting
Choosing a canned hosting site
Hosting on fully open source infrastructure
Anonymity and involvement
Mailing Lists / Message Forums
Choosing the Right Forum Management Software
Spam Prevention
Identification and Header Management
The Great Reply-to Debate
Archiving
Mailing List / Message Forum Software
Version Control
Version Control Vocabulary
Choosing a Version Control System
Using the Version Control System
Version everything
Browsability
Use branches to avoid bottlenecks
Singularity of information
Authorization
Tools for commit review
Pull requests
TBD (Gerrit et al)
Commit emails
Bug Tracker
Interaction with Email
Pre-Filtering the Bug Tracker
IRC / Real-Time Chat Systems
IRC Bots
Commit Notifications in IRC
Archiving IRC
RSS Feeds
Wikis
Wikis and Spam
Choosing a Wiki
Q&A Forums
Social Networking Services

Free software projects rely on technologies that support the selective capture and integration of information. The more skilled you are at using these technologies, and at persuading others to use them, the more successful your project will be.

This only becomes more true as the project grows. Good information management is what prevents open source projects from collapsing under the weight of Brooks' Law[25], which states that adding manpower to a late software project makes it later. Fred Brooks observed that the complexity of communications in a project increases as the square of the number of participants. When only a few people are involved, everyone can easily talk to everyone else, but when hundreds of people are involved, it is no longer possible for each person to remain constantly aware of what everyone else is doing. If good free software project management is about making everyone feel like they're all working together in the same room, the obvious question is: what happens when everyone in a crowded room tries to talk at once?

This problem is not new. In non-metaphorical crowded rooms, the solution is parliamentary procedure: formal guidelines for how to have real-time discussions in large groups, how to make sure important dissents are not lost in floods of "me-too" comments, how to form subcommittees, how to recognize and record when decisions are made, etc. An important part of parliamentary procedure is specifying how the group interacts with its information management system. Some remarks are made "for the record", others are not. The record itself is subject to direct manipulation, and is understood to be not a literal transcript of what occurred, but a representation of what the group is willing to agree occurred. The record is not monolithic, but takes different forms for different purposes. It comprises the minutes of individual meetings, the complete collection of all minutes of all meetings, summaries, agendas and their annotations, committee reports, reports from correspondents not present, lists of action items, etc.

Because the Internet is not really a room, we don't have to worry about replicating those parts of parliamentary procedure that keep some people quiet while others are speaking. But when it comes to information management techniques, well-run open source projects are parliamentary procedure on steroids. Since almost all communication in open source projects happens in writing, elaborate systems have evolved for routing and labeling data appropriately, for minimizing repetitions so as to avoid spurious divergences, for storing and retrieving data, for correcting bad or obsolete information, and for associating disparate bits of information with each other as new connections are observed. Active participants in open source projects internalize many of these techniques, and will often perform complex manual tasks to ensure that information is routed correctly. But the whole endeavor ultimately depends on sophisticated software support. As much as possible, the communications media themselves should do the routing, labeling, and recording, and should make the information available to humans in the most convenient way possible. In practice, of course, humans will still need to intervene at many points in the process, and it's important that the software make such interventions convenient too. But in general, if the humans take care to label and route information accurately on its first entry into the system, then the software should be configured to make as much use of that metadata as possible.

The advice in this chapter is intensely practical, based on experiences with specific software and usage patterns. But the point is not just to teach a particular collection of techniques. It is also to demonstrate, by means of many small examples, the overall attitude that will best encourage good information management in your project. This attitude will involve a combination of technical skills and people skills. The technical skills are essential because information management software always requires configuration, plus a certain amount of ongoing maintenance and tweaking as new needs arise (for example, see the discussion of how to handle project growth in the section called “Pre-Filtering the Bug Tracker” later in this chapter). The people skills are necessary because the human community also requires maintenance: it's not always immediately obvious how to use these tools to full advantage, and in some cases projects have conflicting conventions (for example, see the discussion of setting Reply-to headers on outgoing mailing list posts, in the section called “Mailing Lists / Message Forums”). Everyone involved with the project will need to be encouraged, at the right times and in the right ways, to do their part to keep the project's information well organized. The more involved the contributor, the more complex and specialized the techniques she can be expected to learn.

Information management has no cut-and-dried solution. There are too many variables. You may finally get everything configured just the way you want it, and have most of the community participating, but then project growth will make some of those practices unscalable. Or project growth may stabilize, and the developer and user communities settle into a comfortable relationship with the technical infrastructure, but then someone will come along and invent a whole new information management service, and pretty soon newcomers will be asking why your project doesn't use it—for example, this happened to a lot of free software projects that predate the invention of the wiki (see en.wikipedia.org/wiki/Wiki), and more recently has been happening to projects whose workflows were developed before the rise of GitHub PRs (see the section called “Pull requests”) as the canonical way to package proposed contributions. Many infrastructure questions are matters of judgement, involving tradeoffs between the convenience of those producing information and the convenience of those consuming it, or between the time required to configure information management software and the benefit it brings to the project.

Beware of the temptation to over-automate, that is, to automate things that really require human attention. Technical infrastructure is important, but what makes a free software project work is care—and intelligent expression of that care—by the humans involved. The technical infrastructure is mainly about giving humans easy ways to apply care.

What a Project Needs

Most open source projects offer at least a minimum, standard set of tools for managing information:

Web site

Primarily a centralized, one-way conduit of information from the project out to the public. The web site may also serve as an administrative interface for other project tools.

Mailing lists / Message forums

Usually the most active communications forum in the project, and the "medium of record."

Version control

Enables developers to manage code changes conveniently, including reverting and "change porting". Enables everyone to watch what's happening to the code.

Bug tracking

Enables developers to keep track of what they're working on, coordinate with each other, and plan releases. Enables everyone to query the status of bugs and record information (e.g., reproduction recipes) about particular bugs. Can be used for tracking not only bugs, but also tasks, releases, new features, etc.

Real-time chat

A place for quick, lightweight discussions and question/answer exchanges. Not always archived completely.

Each tool in this set addresses a distinct need, but their functions are also interrelated, and the tools must be made to work together. Below we will examine how they can do so, and more importantly, how to get people to use them.

You may be able to avoid a lot of the headache of choosing and configuring many of these tools by using a canned hosting site: an online service that offers prepackaged, templatized web services with some or all of the collaboration tools needed to run a free software project. See the section called “Canned Hosting” later in this chapter for a discussion of the advantages and disadvantages of canned hosting.