Message Forums / Mailing Lists

Not all projects need to use discussion forum software. For relatively small, focused projects that are organized around a single code repository, the email gateway features of the bug tracker (as discussed in the section called “Bug Tracker” later in this chapter) may be enough to sustain most conversations. When a non-technical topic needs to be discussed, someone can just create an issue ticket — a fake bug report, essentially — for the topic and conduct the discussion there. So if you think your project will get along fine without forums, you can skip this section and just try that. It will be obvious pretty quickly if you do need them.

Larger and more complex projects, however, will almost always benefit from having dedicated discussion forums. This is partly because there will be many conversations that are not attached to a specific bug, and partly because the larger the project, the more important it is to keep the bug tracker focused on actual bugs and have a separate place for other kinds of discussions.

For a long time, discussion forums were mainly mailing lists, but the distinction between mailing lists and Web-based forums is, thankfully, slowly disappearing. Services like Google Groups (https://groups.google.com/), which is not itself open source, and Discourse (http://www.discourse.org/), which is, have established that cross-accessibility of message forums as mailing lists and vice versa is the minimum bar to meet, and modern discussion management systems reflect this.

Because of this nearly-completed unification between email lists and web-based forums[43], I will use the terms message forum and mailing list more or less interchangeably. They refer to any kind of message-based forum where posts are linked together in threads (topics), people can subscribe, archives of past messages can be browsed, and the forum can be interacted with via email or via a web browser.

If a user is exposed to any channel besides a project's web pages, it is most likely to be one of the project's message forums. But before she experiences the forum itself, she will experience the process of finding the right forum. Your project should have a prominently-placed description of all the available public forums, to give newcomers guidance in deciding which ones to browse or post to first. A typical such description might say something like this:

The mailing lists are the main day-to-day communication channels for the Scanley community. You don't have to be subscribed to post to a list, but if it's your first time posting (whether you're subscribed or not), your message may be held in a moderation queue until a human moderator has a chance to confirm that the message is not spam. We're sorry for this delay; blame the spammers who make it necessary.

Scanley has the following lists:

users {_AT_} scanley.org:

Discussion about using Scanley or programming with the Scanley API, suggestions of possible improvements, etc. You can browse the users@ archives at <<<link to archive>>> or subscribe here: <<<link to subscribe>>>.

dev {_AT_} scanley.org:

Discussion about developing Scanley. Maintainers and contributors are subscribed to this list. You can browse the dev@ archives at <<<link to archive>>> or subscribe here: <<<link to subscribe>>>.

(Sometimes threads cross over between users@ and dev@, and Scanley's developers will often participate in discussions on both lists. In general if you're unsure where a question or post should go, start it out on users@. If it should be a development discussion, someone will suggest moving it over to dev@.)

announcements {_AT_} scanley.org:

This is a low-traffic, subscribe-only list. The Scanley developers post announcements of new releases and occasional other news items of interest to the entire Scanley community here, but followup discussion takes place on users@ or dev@. <<<link to subscribe>>>.

notifications {_AT_} scanley.org:

All code commit messages, bug tracker tickets, automated build/integration failures, etc., are sent to this list. Most developers should subscribe: <<<link to subscribe>>>.

There is also a non-public list you may need to send to, although only developers are subscribed:

security {_AT_} scanley.org:

Where the Scanley project receives confidential reports of security vulnerabilities. Of course, the report will be made public eventually, but only after a fix is released; see our security procedures page for more [...]

Choosing the Right Forum Management Software

It's worth investing some time in choosing the right mailing list management system for your project. Modern list management tools (some of which are listed later in the section called “Mailing List / Message Forum Software”) offer at least the following features:

Both email- and web-based access

Users should be able to subscribe to the forums by email, and read them on the web (where they are organized into conversations or "threads", just as they would be in a mailreader).

Moderation features

To "moderate" is to check posts, especially first-time posts, to make sure they are not spam before they go out to the entire list. Moderation necessarily involves human administrators, but software can do a great deal to make it easier on the moderators. There is more said about moderation in the section called “Spam Prevention” later in this chapter.

Rich administrative interface

There are many things administrators need to do besides spam moderation — for example, removing obsolete addresses, a task that can become urgent when a recipient's address starts sending "I am no longer at this address" bounces back to the list in response to every list post (though some systems can even detect this and unsubscribe the person automatically). If your forum software doesn't have decent administrative capabilities, you will quickly realize it, and should consider switching to software that does.

Header manipulation

Some people have sophisticated filtering and replying rules set up in their mail readers, and rely on the forum adding or manipulating certain standard headers. See the section called “Identification and Header Management” later in this chapter for more on this.

Archiving

All posts to the managed lists are stored and made available on the web (see the section called “Conspicuous Use of Archives” for more on the importance of public archives). Usually the archiver is a native part of the message forum system; occasionally, it is a separate tool that needs to be integrated.

The point of the above list is really just to show that forum management is a complex problem that has already been given a lot of thought, and to some degree been solved. You don't need to become an expert, but you will have to learn at least a little bit about it, and you should expect list management to occupy your attention from time to time in the course of running any free software project. Below we'll examine a few of the most common issues.

Spam Prevention

A mailing list that takes no spam prevention measures at all will quickly be submerged in junk emails, to the point of unusability. Spam prevention is mandatory. It is really two distinct functions: preventing spam posts from appearing on your mailing lists, and preventing your mailing list from being a source of new email addresses for spammers' harvesters.

Filtering posts

There are three basic techniques for preventing spam posts, and most mailing list software offers all three. They are best used in tandem:

  1. Only auto-allow postings from list subscribers.

    This is effective as far as it goes, and also involves very little administrative overhead, since it's usually just a matter of changing a setting in the mailing list software's configuration. But note that posts which aren't automatically approved must not be simply discarded. Instead, they should go into a moderation queue, for two reasons. First, you want to allow non-subscribers to post: a person with a question or suggestion should not need to subscribe to a mailing list just to ask a question there. Second, even subscribers may sometimes post from an address other than the one by which they're subscribed. Email addresses are not a reliable method of identifying people, and shouldn't be treated as such.

  2. Filter posts through spam-detection software.

    If the mailing list software makes it possible (most do), you can have posts filtered by spam-filtering software. Automatic spam-filtering is not perfect, and never will be, since there is a never-ending arms race between spammers and filter writers. However, it can greatly reduce the amount of spam that makes it through to the moderation queue. Since the longer that queue is the more time humans must spend examining it, any amount of automated filtering is beneficial.

    There is not space here for detailed instructions on setting up spam filters. You will have to consult your mailing list software's documentation for that (see the section called “Mailing List / Message Forum Software”). List software often comes with some built-in spam prevention features, but you may want to add some third-party filters. I've had good experiences with SpamAssassin (https://spamassassin.apache.org/). That is not a comment on the many other open source spam filters out there, some of which are apparently also quite good; I just happen to have used SpamAssassin myself and been satisfied with it.

  3. Moderation.

    For mails that aren't automatically allowed by virtue of being from a list subscriber, and which make it through the spam filtering software, if any, the last stage is moderation: the mail is routed to a special holding area, where a human examines it and confirms or rejects it.

    Confirming a post usually takes one of two forms: you can accept the sender's post just this once, or you can tell the system to allow this and all future posts from the same sender. You almost always want to do the latter, in order to reduce the future moderation burden — after all, someone who has made a valid post to a forum is unlikely to suddenly turn into a spammer later.

    Rejecting is done either by marking the item to be discarded, or by explicitly telling the system the message was spam so the system can improve its ability to recognize future spam. Sometimes you also have the option to automatically discard future mails from the same sender without them ever being held in the moderation queue, but there is rarely any point in doing this, since spammers don't send from the same address twice anyway.

    Oddly, most message-forum systems have not yet given the moderation queue administrative interface the attention it deserves, considering how common the task is, so moderation often still requires more clicks and UI gestures than it should. I hope this situation will improve in the future. In the meantime, perhaps knowing you're not alone in your frustration will temper your disappointment somewhat.
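
The three techniques above can be sketched as a single triage step. This is only an illustration of how the stages fit together, not how any particular list manager implements them: the names ListPolicy, triage, approve, and no_filter are hypothetical, the spam scorer is a placeholder for whatever filter you plug in, and the 0.95 threshold is an arbitrary assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Verdict(Enum):
    DELIVER = "deliver"   # sent on to the list immediately
    HOLD = "hold"         # placed in the moderation queue for a human
    DISCARD = "discard"   # dropped by the automated filter


def no_filter(body: str) -> float:
    """Placeholder scorer (hypothetical): treats every post as non-spam."""
    return 0.0


@dataclass
class ListPolicy:
    subscribers: set = field(default_factory=set)
    approved: set = field(default_factory=set)      # senders a moderator allowed
    spam_score: Callable[[str], float] = no_filter  # returns a value in [0, 1]
    discard_at: float = 0.95                        # assumed threshold

    def triage(self, sender: str, body: str) -> Verdict:
        sender = sender.strip().lower()
        # Stage 1: auto-allow subscribers and previously approved senders.
        if sender in self.subscribers or sender in self.approved:
            return Verdict.DELIVER
        # Stage 2: let the spam filter remove the most obvious junk.
        if self.spam_score(body) >= self.discard_at:
            return Verdict.DISCARD
        # Stage 3: everything else waits for a human; it is never silently dropped.
        return Verdict.HOLD

    def approve(self, sender: str, remember: bool = True) -> None:
        """Moderator accepts a held post; optionally allow all future posts too."""
        if remember:
            self.approved.add(sender.strip().lower())
```

Note that a non-subscriber's legitimate post ends up held rather than discarded, and that approving with remember=True is what keeps the queue from growing with repeat posters.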

Identification and Header Management

When interacting with the forum by email, subscribers often want to filter mails from the list into custom inboxes. Their mail reading software can do this automatically by examining the mail's headers. The headers are the fields at the top of the mail that indicate the sender, recipient, subject, date, and various other things about the message. Certain headers are well known and are effectively mandatory:

From: ...
To: ...
Subject: ...
Date: ...

Others are optional, though still quite standard. For example, emails are not strictly required to have the

Reply-to: sender@email.address.here

header, but most do, because it gives recipients a foolproof way to reach the author (it is especially useful when the author had to send from an address other than the one to which replies should be directed).

Some mail reading software offers an easy-to-use interface for filing mails based on patterns in the Subject header. This leads people to request that the mailing list add an automatic prefix to all Subjects, so they can set their readers to look for that prefix and automatically file the mails in the right folder. The idea is that the original author would write:

Subject: Making the 2.5 release.

but the mail would show up on the list looking like this:

Subject: [Scanley Discuss] Making the 2.5 release.

Although most list management software offers the option to do this, you may decide against turning the option on. The problem it solves can often be solved in less obtrusive ways (see below), and there is a cost to eating space in the Subject field. Experienced mailing list users typically scan the Subjects of the day's incoming list mail to decide what to read and/or respond to. Prepending the list's name to the Subject can push the right side of the Subject off the screen, rendering it invisible. This obscures information that people depend on to decide what mails to open, thus reducing the overall functionality of the mailing list for everyone.
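
To make the cost concrete, here is a rough sketch of what the prefixing option does and how much of the Subject it can push out of view. The function names and the 30-character column width are assumptions chosen for illustration; real mail readers vary widely in how much of the Subject they display.

```python
def add_list_prefix(subject: str, tag: str = "[Scanley Discuss]") -> str:
    """Prepend the list tag unless the Subject already carries it."""
    if tag in subject:
        return subject  # e.g. a reply whose Subject was tagged on the way out
    return f"{tag} {subject}"


def visible_part(subject: str, width: int = 30) -> str:
    """What a reader sees if the Subject column shows only `width` characters."""
    return subject[:width]


original = "Making the 2.5 release."
tagged = add_list_prefix(original)
print(visible_part(original))  # the whole Subject fits in the column
print(visible_part(tagged))    # the end of the Subject is cut off
```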

Instead of munging the Subject header, people could take advantage of the other standard headers, starting with the To header, which should say the mailing list's address:

To: <discuss@lists.example.org>

Any mail reader that can filter on Subject should be able to filter on To just as easily.

There are a few other optional-but-standard headers expected for mailing lists; most mailreader software does not display them by default, but they are present nonetheless. Filtering on them is even more reliable than using the "To" or "Cc" headers, and since these headers are added to each post by the mailing list management software itself, some users may be counting on their presence:

List-Help: <mailto:discuss-help@lists.example.org>
List-Unsubscribe: <mailto:discuss-unsubscribe@lists.example.org>
List-Post: <mailto:discuss@lists.example.org>
List-Id: <discuss.lists.example.org>
Delivered-To: mailing list discuss@lists.example.org
Mailing-List: contact discuss-help@lists.example.org; run by ezmlm

For the most part, they are self-explanatory. See http://www.nisto.com/listspec/list-manager-intro.html for more explanation, or if you need the really detailed, formal specification, see http://www.faqs.org/rfcs/rfc2369.html.
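
As a small illustration of filtering on these headers, the sketch below files a message by its List-Id and falls back to the To and Cc headers only when List-Id is absent. It uses Python's standard email package; the folder naming scheme and the folder_for function are made up for this example, and the sample message reuses the example.org addresses shown above.

```python
from email import message_from_string
from email.utils import getaddresses

LIST_ADDRESS = "discuss@lists.example.org"

SAMPLE = """\
From: someone@example.com
To: discuss@lists.example.org
Subject: Making the 2.5 release.
List-Id: <discuss.lists.example.org>
List-Post: <mailto:discuss@lists.example.org>

Body text.
"""


def folder_for(raw_message: str) -> str:
    """Pick a folder name for a message (hypothetical naming scheme)."""
    msg = message_from_string(raw_message)
    list_id = msg.get("List-Id", "")
    # A List-Id value may carry descriptive text before the <...> identifier,
    # so take only what is inside the last pair of angle brackets.
    if "<" in list_id and ">" in list_id:
        ident = list_id[list_id.rfind("<") + 1 : list_id.rfind(">")]
        return f"lists/{ident}"
    # Less reliable fallback: look for the list address among the recipients.
    recipients = msg.get_all("To", []) + msg.get_all("Cc", [])
    for _name, addr in getaddresses(recipients):
        if addr.lower() == LIST_ADDRESS:
            return "lists/discuss.lists.example.org"
    return "inbox"
```

The List-Id check works even for posts that reached the subscriber through a Bcc or a forwarding alias, which is why it is the more dependable of the two tests.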

Having said all that, these days I find that most subscribers just request that the Subject header include a list-identifying prefix. That's increasingly how people are accustomed to filtering email: Subject-based filtering is what many of the major online email services (like Gmail) offer users by default, and those services tend not to make it easy to see the presence of less commonly used headers like the ones I mentioned above, which makes it less likely that people would realize they even have the option of filtering on those other headers.

Therefore, reluctantly, I recommend using a Subject prefix (keep it as short as you can) when that's what your community wants. But if your project is highly technical and most of its participants are comfortable filtering on other headers, then do that and leave the Subject line undisturbed.

Some mailing list software offers an option to append unsubscription instructions to the bottom of every post. If that option is available, turn it on. It causes only a couple of extra lines per message, in a harmless location, and it can save you a lot of time, by cutting down on the number of people who mail you — or worse, mail the list! — asking how to unsubscribe.

The Great Reply-to Debate

Earlier, in the section called “Avoid Private Discussions”, I stressed the importance of making sure discussions stay in public forums, and talked about how active measures are sometimes needed to prevent conversations from trailing off into private email threads; furthermore, this chapter is all about setting up project communications software to do as much of the work for people as possible. Therefore, if the mailing list management software offers a way to automatically cause discussions to stay on the list, you would think turning on that feature would be the obvious choice.

Well, not quite. There is such a feature, but it has some pretty severe disadvantages. The question of whether or not to use it is one of the hottest debates in mailing list management — admittedly, not a controversy that's likely to make the evening news in your city, but it can flare up from time to time in free software projects. Below, I will describe the feature, give the major arguments on both sides, and make the best recommendation I can.

The feature itself is very simple: the mailing list software can, if you wish, automatically set the Reply-to header on every post to redirect replies to the mailing list. That is, no matter what the original sender puts in the Reply-to header (or even if they don't include one at all), by the time the list subscribers see the post, the header will contain the list address:

Reply-to: discuss@lists.example.org

On its face, this seems like a good thing. Because virtually all mail reading software pays attention to the Reply-to header, now when anyone responds to a post, their response will be automatically addressed to the entire list, not just to the sender of the message being responded to. Of course, the responder can still manually change where the message goes, but the important thing is that by default replies are directed to the list. It's a perfect example of using technology to encourage collaboration.

Unfortunately, there are some disadvantages. The first is known as the Can't Find My Way Back Home problem: sometimes the original sender will put their "real" email address in the Reply-to field, because for one reason or another they send email from a different address than where they receive it. People who always read and send from the same location don't have this problem, and may be surprised that it even exists. But for those who have unusual email configurations, or who cannot control how the From address on their mails looks (perhaps because they send from work and do not have any influence over the IT department), using Reply-to may be the only way they have to ensure that responses reach them. When such a person posts to a mailing list that she's not subscribed to, her setting of Reply-to becomes essential information. If the list software overwrites it,[44] she may never see the responses to her post.
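
The difference between overwriting the header and the gentler alternative of appending to it can be shown in a few lines. This is a sketch using Python's standard email package, not the behavior of any specific list manager; the munge_reply_to function and its append flag are hypothetical, and the addresses are examples.

```python
from email.message import EmailMessage

LIST_ADDRESS = "discuss@lists.example.org"


def munge_reply_to(msg: EmailMessage, append: bool = False) -> None:
    """Point replies at the list, either replacing or extending Reply-To."""
    original = msg.get("Reply-To")
    del msg["Reply-To"]  # deleting a header that is absent is not an error
    if append and original:
        # Keep the author's chosen address and add the list alongside it.
        msg["Reply-To"] = f"{original}, {LIST_ADDRESS}"
    else:
        # Plain munging: the author's own Reply-To, if any, is lost.
        msg["Reply-To"] = LIST_ADDRESS


post = EmailMessage()
post["From"] = "worker@corp.example"
post["Reply-To"] = "home@example.net"
munge_reply_to(post)
print(post["Reply-To"])  # the address the author asked for is gone
```

With the default (overwriting) behavior, a non-subscriber who relies on Reply-To has no remaining header that leads back to the mailbox she actually reads.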

The second disadvantage has to do with expectations, and in my opinion is the most powerful argument against Reply-to munging. Most experienced mail users are accustomed to two basic methods of replying: reply-to-all and reply-to-author. All modern mail reading software has separate keys for these two actions. Users know that to reply to everyone (that is, including the list), they should choose reply-to-all, and to reply privately to the author, they should choose reply-to-author. Although you want to encourage people to reply to the list whenever possible, there are certainly circumstances where a private reply is the responder's prerogative — for example, they may want to say something confidential to the author of the original message, something that would be inappropriate on the public list.

Now consider what happens when the list has overridden the original sender's Reply-to. The responder hits the reply-to-author key, expecting to send a private message back to the original author. Because that's the expected behavior, he may not bother to look carefully at the recipient address in the new message. He composes his private, confidential message, one which perhaps says embarrassing things about someone on the list, and hits the send key. Unexpectedly, a few minutes later his message appears on the mailing list! True, in theory he should have looked carefully at the recipient field, and should not have assumed anything about the Reply-to header. But authors almost always set Reply-to to their own personal address (or rather, their mail software sets it for them), and many longtime email users have come to expect that. In fact, when a person deliberately sets Reply-to to some other address, such as the list, she usually makes a point of mentioning this in the body of her message, so people won't be surprised at what happens when they reply.

Because of the possibly severe consequences of this unexpected behavior, my own preference is to configure list management software to never touch the Reply-to header. This is one instance where using technology to encourage collaboration has, it seems to me, potentially dangerous side-effects. However, there are also some powerful arguments on the other side of this debate. Whichever way you choose, you will occasionally get people posting to your list asking why you didn't choose the other way. Since this is not something you ever want as the main topic of discussion on your list, it might be good to have a canned response ready, of the sort that's more likely to stop discussion than encourage it. Make sure you do not insist that your decision, whichever it is, is obviously the only right and sensible one (even if you think that's the case). Instead, point out that this is a very old debate, there are good arguments on both sides, no choice is going to satisfy all users, and therefore you just made the best decision you could. Politely ask that the subject not be revisited unless someone has something genuinely new to say, then stay out of the thread and hope it dies a natural death. (See also the section called “Avoid Holy Wars”.)

Someone may suggest a vote to choose one way or the other. You can do that if you want, but I personally do not feel that counting heads is a satisfactory solution in this case. The penalty for someone who is surprised by the behavior is so huge (accidentally sending a private mail to a public list), and the inconvenience for everyone else is fairly slight (occasionally having to remind someone to respond to the whole list instead of just to you), that it's not clear that a majority should be able to put a minority at such risk.

I have not addressed all aspects of this issue here, just the ones that seemed most important. For a full discussion, see these two canonical documents, which are the ones people always cite when they're having this debate:

Despite the mild preference indicated above, I do not feel there is a "right" answer to this question,[45] and happily participate in many lists that do set Reply-to. The most important thing you can do is settle on one way or the other early, and try not to get entangled in debates about it after that. When the debate re-arises every few years, as it inevitably will, you can point people to the archived discussion from last time.

Archiving

Every discussion forum should be fully archived. It's common for new discussions to refer to old ones, and often people doing an Internet search will find a solution to a problem by stumbling across a message that had been casually posted to a mailing list by some stranger. Archives also provide history and context for new users and developers who are becoming more involved in the project.

The technical details of setting up archiving are specific to the software that's running the forum, and are beyond the scope of this book. If you need to choose or configure an archiver, consider these properties:

Prompt updating

People will often want to refer to an archived message that was posted recently. If possible, the archiver should archive each post instantaneously, so that by the time a post appears on the mailing list, it's already present in the archives. If that option isn't available, then at least try to set the archiver to update itself every hour or so. (By default, some archivers run their update processes once per night, but in practice that's far too much lag time for an active mailing list.)

Referential stability

Once a message is archived at a particular URL, it should remain accessible at that exact same URL forever. Even if the archives are rebuilt, restored from backup, or otherwise fixed, any URLs that have already been made publicly available should remain the same. Stable references make it possible for Internet search engines to index the archives, which is a major boon to users looking for answers. Stable references are also important because mailing list posts and threads are often linked to from other places, such as from the bug tracker (see the section called “Bug Tracker”) or from other project documents.

Ideally, mailing list software would include a message's archive URL, or at least the message-specific portion of the URL, in a header or footer when it distributes the message to recipients. That way people who have a copy of the message would be able to instantly know its archive location without having to actually visit the archives, which would be helpful because any operation that involves web browsing is automatically time-consuming. Whether any mailing list software actually offers this feature, I don't know; unfortunately, the ones I have used do not. However, it's something to look for (or, if you write mailing list software, it's a feature to consider implementing, please).
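
One way such a feature could work, sketched under some assumptions: derive the message-specific part of the URL from the Message-ID, which never changes, rather than from a sequence number that might shift when the archives are rebuilt. The base URL, the hashing scheme, and the function names below are invented for illustration. The Archived-At header is a real standard header (defined in RFC 5064), though whether your list software sets it is something you would need to check.

```python
import hashlib
from email.message import EmailMessage

# Hypothetical archive location; substitute your own.
ARCHIVE_BASE = "https://lists.example.org/archives/discuss"


def archive_url(message_id: str) -> str:
    """Compute a permanent URL that depends only on the Message-ID."""
    canonical = message_id.strip().strip("<>")
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{ARCHIVE_BASE}/{digest[:32]}"


def stamp_archive_location(msg: EmailMessage) -> None:
    """Record the archive URL in the outgoing copy of the message."""
    msg["Archived-At"] = f"<{archive_url(str(msg['Message-ID']))}>"
```

Because the URL is a pure function of the Message-ID, rebuilding or restoring the archive regenerates exactly the same links, and the list software can compute the link at delivery time without asking the archiver.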

Thread support

It should be possible to go from any individual message to the thread (group of related messages) that the original message is part of. Each thread should have its own URL too, separate from the URLs of the individual messages in the thread.
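
A minimal sketch of how an archiver might group messages into threads, assuming each message records the Message-ID of the post it replies to (as the In-Reply-To header normally does). The thread_roots function and the idea of naming a thread after its root message are assumptions made for this example; real archivers typically also consult the References header and sometimes the Subject.

```python
from typing import Dict, Optional


def thread_roots(parents: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Map every Message-ID to the Message-ID of its thread's first post.

    `parents` maps each archived Message-ID to the ID it replies to,
    or to None when the message started a new thread.
    """
    roots: Dict[str, str] = {}
    for message_id in parents:
        current = message_id
        seen = set()
        # Walk up the reply chain until we reach a message with no known parent.
        while (
            parents.get(current) is not None
            and parents[current] in parents   # parent is actually in the archive
            and current not in seen           # guard against malformed reply loops
        ):
            seen.add(current)
            current = parents[current]
        roots[message_id] = current
    return roots


archive = {"a": None, "b": "a", "c": "b", "d": "not-archived"}
print(thread_roots(archive))
```

Given the root, a thread URL can be built from the root's identifier alone, which keeps it distinct from, and as stable as, the URLs of the individual messages.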

Searchability

An archiver that doesn't support searching — on the bodies of messages, as well as on authors and subjects — is close to useless. Note that some archivers support searching by simply farming the work out to an external search engine such as Google. This is acceptable, but direct search support is usually more fine-tuned, because it allows the searcher to specify that the match must appear in a subject line versus the body, for example.
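
The kind of fine-tuning meant here can be illustrated with a toy search over an in-memory archive, where the searcher chooses which field must match. The ArchivedPost type and the search function are hypothetical, and a real archiver would use a proper index rather than scanning every post.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ArchivedPost:
    author: str
    subject: str
    body: str


def search(posts: List[ArchivedPost], term: str, where: str = "any") -> List[ArchivedPost]:
    """Case-insensitive substring search restricted to one field, or to all."""
    fields = ("author", "subject", "body") if where == "any" else (where,)
    needle = term.lower()
    return [
        post for post in posts
        if any(needle in getattr(post, name).lower() for name in fields)
    ]


posts = [
    ArchivedPost("alice@example.org", "Making the 2.5 release.", "Branch is cut."),
    ArchivedPost("bob@example.org", "Build failure", "Is this blocking the release?"),
]
print(len(search(posts, "release")))             # matches in any field
print(len(search(posts, "release", "subject")))  # matches in the Subject only
```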

The above is just a technical checklist to help you evaluate and set up an archiver. Getting people to actually use the archiver to the project's advantage is discussed in later chapters, in particular the section called “Conspicuous Use of Archives”.

Mailing List / Message Forum Software

Here are some tools for running message forums. If the site where you're hosting your project already has a default setup, then you can just use that and avoid having to choose. But if you need to install one yourself, below are some possibilities. (Of course, there are probably other tools out there that I just didn't happen to find, so don't take this as a complete list.)

  • Discourse — https://discourse.org/

    Discourse was built to be the One True Discussion System for Web and mobile, and so far it seems to be living up to its promise. It is open source, supports both browser-based and email-based participation in discussions, and is under active development with commercial support available. You can purchase hosted Discourse if you don't want to set it up yourself.

  • Sympa — https://www.sympa.org/

    Sympa is developed and maintained by a consortium of French universities. It is designed for a given instance to handle both very large lists (> 1,000,000 members) and a large number of lists. Sympa can work with a variety of dependencies; for example, you can run it with sendmail, postfix, qmail or exim as the underlying message transfer agent. It has built-in Web-based archiving.

  • Mailman — http://www.list.org/

    For many years, Mailman was the standard for open source project mailing lists. It comes with a built-in archiver and has hooks for plugging in external archivers. Mailman is very reliable in terms of message delivery and other under-the-hood functionality, but its reputation suffered for a while because of various user interface issues in its aging 2.x code base (especially for spam moderation and subscription moderation), and delays in shipping its long-awaited 3.0 release.

    However, Mailman 3.0 has now shipped, and is worth a look. It should solve many of the problems of Mailman 2, and may make Mailman a reasonable choice again. This excellent article by Sumana Harihareswara describes the major improvements: https://lwn.net/Articles/638090/.

  • Google Groups — https://groups.google.com/

    Listing Google Groups here was a tough call. The service is not itself open source, and a few of its administrative functions can be a bit hard to use. However, its advantages are substantial: your group's archives are always online and searchable; you don't have to worry about scalability, backups, or other run-time infrastructure issues; the moderation and spam-prevention features are pretty good (with the latter constantly being improved, which is important in the never-ending spam arms race); and Google Groups are easily accessible via both email and web, in ways that are likely to be already familiar to many participants. If you just want to get your project started, and don't want to spend too much time thinking about what message forum software or service to use, Google Groups is a good default choice.



[43] Which was a long time coming — see http://www.rants.org/2008/03/06/thread_theory/ for more. And no, I'm not too dignified to refer to my own blog post.

[44] In theory, the list software could add the list's address to whatever Reply-to destination were already present, if any, instead of overwriting. In practice, for reasons I don't know, most list software overwrites instead of appending.

[45] Although there is, of course, a right answer, and it is to leave the original author's Reply-to untouched. The relevant standards document, http://www.ietf.org/rfc/rfc2822.txt, says "When the 'Reply-To:' field is present, it indicates the mailbox(es) to which the author of the message suggests that replies be sent."