Web Site

Web Site
Prev	Chapter 3. Technical Infrastructure	Next

For our purposes, the web site means web pages devoted to helping people participate in the project as developers, documenters, etc. Note that this may be different from the main user-facing web site. In many projects, users have different needs and often (statistically speaking) a different mentality from the developers. The kinds of web pages most helpful to users are not always the same as those helpful for developers. Don't try to make a "one size fits all" web site just to save some writing and maintenance effort: you'll end up with a site that is not quite right for either audience.

The two types of sites should cross-link, of course, and in particular it's important that the user-oriented site have, tucked a way in a corner somewhere, a clear link to the developers' site, since most new developers will start out at the user-facing pages and look for a path from there to the developers' area.

An example may make this clearer. As of this writing in February 2022, the office suite LibreOffice has its main user-oriented web site at https://www.libreoffice.org/, as you'd expect. If you were a user wanting to download and install LibreOffice, you'd start there, go straight to the "Download" link, and so on. But if you were a developer looking to fix a bug in LibreOffice, you might start at https://www.libreoffice.org/, but you'd be looking for a link that says something like "Developers", or "Development", or "Get Involved" — in other words, you'd be looking for the gateway to the development area.

LibreOffice, like other large projects, has a few different gateways to developer-land. There's a prominent link partway down the page that says "Get Involved", and at the top there's also a dropdown menu named "Improve It" that offers a number of paths to participation, including a "Developers" item.

The "Get Involved" page is aimed at the broadest possible range of potential contributors: developers, yes, but also documenters, quality-assurance testers, marketing helpers, web infrastructure experts, financial or in-kind donors, interface designers, support forum helpers, etc. This frees up the "Developers" page to target the rather narrower audience of programmers interested in improving the LibreOffice code. The set of links and short descriptions provided on both pages is admirably clear and concise: you can tell immediately from looking whether you're in the right place for what you want do, and if so what the next thing to click on is. The "Development" page gives some information about where to find the code, how to contact the other developers, how to file bugs, and things like that, but most importantly it points to what most seasoned open source contributors would instantly recognize as the real gateway to actively-maintained development information: the development wiki at https://wiki.documentfoundation.org/Development.

This division into two contributor-facing gateways, one for all kinds of contributions and another for coders specifically, is probably right for a large, multi-faceted project like LibreOffice. You'll have to use your judgement as to whether that kind of subdivision is appropriate for your project; at least at the beginning, it probably isn't. It's better to start with one unified contributor gateway, aimed at all the types of contributors you expect, and if that page ever gets large enough or complex enough to feel unwieldy — listen carefully for complaints about it, since you and other long-time participants will be naturally desensitized to weaknesses in introductory pages! — then you can divide it up however seems best.

From a technical point of view there is not much to say about setting up the project web site. Web hosting is easy to come by, and most of the important things to say about layout and arrangement were covered in the previous chapter. The web site's main function is to present a clear and welcoming overview of the project, and to bind together the various collaboration tools (the version control system, bug tracker, etc.). To save time and effort, many projects just use one of the canned hosting services, as described below.

Canned Hosting

A canned hosting site is an online service that offers some or all of the online collaboration tools needed to run a free software project. At a minimum, a canned hosting site offers public version control repositories and bug tracking; most also offer wiki space, many offer mailing list hosting^[36] too, and some offer continuous integration testing^[37] and other services^[38]. For many projects, canned hosting provides a perfectly adequate developer-oriented entry point to the project, and there is no need to set up a separate web site.

There are two main advantages to using a canned site. The first is server maintenance: uptime monitoring, operating system upgrades, etc. Having someone else handle that is one less thing to worry about. The second advantage is simplicity. They have already chosen a bug tracker, a version control system, perhaps discussion forum software, and everything else you need to run a project. They've configured the tools, arranged single-sign-on authentication where appropriate, are taking care of backups for all the data stored in the tools, etc. You don't need to make many decisions. All you have to do is fill in a registration form, press a button, and suddenly you've got a project development web site.

These are pretty significant benefits. The disadvantage, of course, is that you must accept their choices and configurations, even if something different would be better for your project. Usually canned sites are adjustable within certain narrow parameters, but you will never get the fine-grained control you would have if you set up the site yourself and had full administrative access to the server.

A perfect example of this is the handling of generated files. Certain project web pages may be generated files — for example, there are systems for keeping FAQ data in an easy-to-edit master format, from which HTML, PDF, and other presentation formats can be generated. As explained in the section called “Version Everything”, you wouldn't want to version the generated formats, only the master file. But when your web site is hosted on someone else's server, it may be difficult to set up a custom hook to regenerate the online HTML version of the FAQ whenever the master file is changed.

If you choose a canned site, try to leave open the option of switching to a different site later, by using a custom domain name as the project's development home address. You can forward that URL to the canned site, or have a fully customized development home page at the main URL and link to the canned site for specific functionality. Just try to arrange things such that if you later decide to use a different hosting solution, the project's main address doesn't need to change.

If you're not sure whether to use canned hosting, then you should probably use canned hosting. These sites have integrated their services in myriad ways (just one example: if a commit mentions a bug ticket number using a certain format, then people browsing that commit later will find that it automatically links to that ticket), ways that would be laborious for you to reproduce, especially if it's your first time running an open source project. The universe of possible configurations of collaboration tools is vast and complex, but the same set of choices has faced everyone running an open source project and there are some settled solutions now. Each of the canned hosting sites implements a reasonable subset of that solution space, and unless you have reason to believe you can do better, your project will probably run best by just using one of those sites.

Choosing a Canned Hosting Site

There are now so many sites providing free-of-charge canned hosting for projects released under open source licenses that there is not space here to review the field.

So I'll make this easy:

If you don't know what to choose, then choose GitHub (https://github.com/). It's by far the most popular and appears set to stay that way for some years to come. It has a good set of features and integrations. Many developers are already familiar with GitHub and have an account there. It offers APIs at https://develop.github.com/ for interacting programmatically with project resources, and starting in 2020 it introduced message forums^[39].

If you're not convinced by GitHub (for example because your project uses, say, Mercurial instead of Git for version control), but you aren't sure where to host, take a look at Wikipedia's thorough comparison at https://en.wikipedia.org/wiki/Comparison_of_open_source_software_hosting_facilities; it's the first place to look for up-to-date, comprehensive information on open source project hosting options.

Hosting on Fully Open Source Infrastructure

Although all the canned hosting sites use plenty of free software in their stack, most of them also wrote some proprietary code to glue it all together. In these cases the hosting environment itself is not fully open source, and thus cannot be easily reproduced by others. For example, while Git itself is free software, GitHub is a hosted service running partly with proprietary software — if you leave GitHub, you can't take a copy of their infrastructure with you, at least not all of it.

Some projects would prefer a canned hosting site that runs an entirely free software infrastructure. This might be to preserve and signal their commitment to software freedom, and in some cases might also be due to immediate utilitarian considerations — for example, politically sensitive projects that are worried about being deplatformed want to know that they can reproduce their project's hosting independently should it ever become necessary.

Fortunately, there are places to obtain fully free-software commercial hosting. I will list a few examples below (as of early 2020), albeit with no pretense of completeness.

GitLab (https://gitlab.com/): GitLab offers an excellent collaboration platform that comes in two versions: fully free-software (they call this their "Community Edition") and proprietary (which they call their "Enterprise Edition"^[40]. The proprietary edition is hosted by GitLab.com, and has a few features the open source edition doesn't have. Interestingly, GitLab.com themselves don't offer hosting of the strictly open source edition, but some other companies do. Two of them are GitLabHost BV (https://www.gitlabhost.com/) and 2nd Watch (https://www.2ndwatch.com/); you can probably find others by searching https://partners.gitlab.com/. (It's also pretty easy to set up your own instance of GitLab — my own company did so at https://code.librehq.com/ and it was no trouble.)
Sourcehut (https://sourcehut.org/ and https://sr.ht/): Sourcehut offers project hosting with both Git and Mercurial available as version control systems. It is designed to be light, fast, and developer-focused: there is no tracking nor advertising, all of its features work without in-browser Javascript, and many of its features work without even requiring a user account (e.g., some email-driven interactions with the bug tracker). As of early 2022, it's still in "public alpha", but it seems to be reasonably stable and has projects using it.
Redmine (http://www.redmine.org/) or Trac (https://trac.edgewall.org/): Any service that offers hosting of either of the Redmine or Trac code collaboration platforms effectively offers fully freedom-preserving project hosting, because those platforms include most of the features needed to run an open source project. Some companies offer that kind of commercial platform hosting with a zero-cost or very cheap rate for open source projects.

Should you host your project on fully open source infrastructure? I can't answer that question for you, since it ultimately depends on you and your project's philosophical positions. However, as a practical matter, I cannot say I've seen any evidence that the degree of software-freedom of the hosting platform has much effect on a project's success. The vast majority of developers who work on free software projects seem to be willing to participate through a non-free hosting platform when that's what the project is using.

Whether the hosting platform is itself free software or not, it is crucial to be able to interact with project data in automatable ways, and to have a way to export data out of the hosting platform. A site that meets these criteria can never truly lock you in, and will even be somewhat extensible, via its programmatic interface.

Of course, all the above applies only to the servers of the hosting site. Your project itself should never require participants to run proprietary software on their own machines.^[41]

Anonymity and Involvement

A problem that is not strictly limited to the canned sites, but is most often found there, is the over-requirement of user registration to participate in various aspects of the project. The proper degree of requirement is a bit of a judgement call. User registration helps prevent spam, for one thing, and even if every commit gets reviewed you still probably don't want anonymous^[42] strangers pushing changes into your repository, for example.

But sometimes user registration ends up being required for tasks that ought to be permitted to unregistered visitors, especially the ability to file tickets in the bug tracker, and to comment on existing tickets. By requiring a logged-in username for such actions, the project raises the involvement bar for what should be quick, convenient tasks. It also changes the demographics of who files bugs, since those who take the trouble to set up a user account at the project site are hardly a random sample even from among users who are willing to file bugs (who in turn are already a biased subset of all the project's users). Of course, one wants to be able to contact someone who's entered data into the ticket tracker, but having a field where she can enter her email address (if she wants to) would be sufficient for that. If a new user spots a bug and wants to report it, she'll only be annoyed at having to fill out an account creation form before she can enter the bug into the tracker. She may simply decide not to file the bug at all.

If you have control over which actions can be done anonymously, make sure that at least all read-only actions are permitted to non-logged-in visitors, and if possible that data entry portals, such as the bug tracker, that tend to bring information from users to developers, can also be used anonymously, although of course anti-spam techniques, such as captchas, may still be necessary.

^[36]Note that even when a canned hosting site doesn't offer message forums as a standalone feature, it will usually offer rich notification and subscription/watch features attached to its bug tracker and version control system, such that participants can effectively have a message-forum-style discussion centered around a particular bug or change. While these features are very useful, they are not a full substitute for first-class message forums as described in the section called “Message Forums / Mailing Lists”.

^[37]See automated-testing.

^[38]Note that for successful free software projects, interested commercial entities will eventually often step up to fund many of these services anyway; see the section called “Providing Build Farms and Development Servers” for further discussion of this.

^[39]That is, message forums as in the section called “Message Forums / Mailing Lists”. The feature's name is "GitHub Discussions"; you have to turn it on for your repository, as it's not currently on by default.

^[40]See the section called “"Commercial" vs "Proprietary"” for why this terminology deserves scare quotes.

^[41]The exception to this is proprietary Javascript code that is received from the hosting site and run confined or "sandboxed" in one tab in the user's browser. The question of whether such code is conceptually an extension of the server, or should be thought of as running on the client machine even though in some senses it has more access to server resources than it does to client resources, is a deep and ongoing debate. We won't settle it here, but the issue is at least more complex than just which CPU is executing the instructions.

^[42]Pseudonymous is another matter. As long as a consistent identity has accrued reputation, you may not need to know who it actually is.