Version Control


A version control system (or revision control system) is a combination of technologies and practices for tracking and controlling changes to a project's files, in particular to source code, documentation, and web pages. If you have never used version control before, the first thing you should do is find someone who has, and persuade them to join your project. These days, everyone expects at least the project's source code to be under version control, and your project will not be taken seriously if it does not use such a system competently.

Version control has become standard because it helps with virtually every aspect of running a project: communication between developers, release management, bug management, code stability, experimental development, and the attribution and authorization of changes. The version control system provides a central coordinating force across all of these areas. The core of version control is change management: identifying each discrete change made to the project's files, annotating each change with metadata such as its date and author, and then replaying these facts to whoever asks, in whatever way they ask. It is a communications mechanism in which a change is the basic unit of information.

This section does not discuss every aspect of using a version control system. The subject is so broad that it is addressed topic by topic throughout the book. Here, we concentrate on choosing and setting up a version control system in a way that fosters cooperative development.

Version Control Vocabulary

This book cannot teach you how to use version control if you have never used it before, but it would be impossible to discuss the subject without a few key terms. These terms are useful independently of any particular version control system: they are the basic nouns and verbs of networked collaboration, and they will be used generically throughout the rest of this book. Even if there were no version control systems in the world, the problem of change management would remain, and these terms give us a language for talking about that problem concisely.

"Version" Versus "Revision"

The word version is sometimes used as a synonym for "revision". I will not use it that way in this book, because it is too easily confused with "version" in the sense of a version of a piece of software, that is, the release or edition number, as in "Version 1.0". However, since the phrase "version control" is already standard, I will continue to use it as a synonym for "revision control" and "change control".

Commit

To make a change to the project; more formally, to store a change in the version control database so that it can be incorporated into future releases of the project. "Commit" can be used as a verb or a noun. As a noun, it is essentially synonymous with "change". For example: "I just committed a fix for the server crash bug people have been reporting on Mac OS X. Jay, could you please review the commit and check that I haven't got the allocation wrong?"

Log message

A bit of commentary attached to each commit, describing the nature and purpose of the commit. Log messages are among the most important documents in any project: they are the bridge between the highly technical language of individual code changes and the more accessible language of features, bug fixes, and project progress. Later in this section, we'll look at ways to distribute log messages to the appropriate audiences; also, the section called "Codifying Tradition" in Chapter 6, Communications discusses ways to encourage contributors to write concise and useful log messages.

Update

To ask that other people's changes (commits) be incorporated into your local copy of the project; that is, to bring your copy up to date. This is a routine operation: most developers update their code several times a day, so that they know they are running roughly the same thing as everyone else, and so that if they see a bug, they can be fairly sure it hasn't already been fixed. For example: "Hey, I noticed the indexing code always drops the last number. Is this a new bug?" "Yes, but it was fixed last week. Try updating and it should go away."

Repository

A database in which changes are stored. Some version control systems are centralized: there is a single master repository that stores all changes to the project. Others are decentralized: each developer has their own repository, and changes can be exchanged between repositories arbitrarily. The version control system keeps track of dependencies between changes, and when it is time to make a release, a particular set of changes is approved for that release. The question of which is better, centralized or decentralized, is one of the enduring holy wars of software development; try not to get drawn into that debate on your project's mailing lists.

Checkout

The process of obtaining a copy of the project from a repository. A checkout usually produces a directory tree called a "working copy" (see below), from which changes can be committed back to the original repository. In some decentralized version control systems, each working copy is itself a repository, and changes can be pushed out to (or pulled into) any repository that is willing to accept them.

Working copy

A developer's private directory tree containing the project's source code files, and possibly its web pages or other documents. A working copy also contains a small amount of metadata managed by the version control system, telling the working copy which repository it comes from, which "revisions" (see below) of the files are present, and so on. Generally, each developer has their own working copy, in which they make and test changes, and from which they commit.

Revision, change, changeset

A "revision" is usually one specific incarnation of a particular file or directory. For example, if the project starts out with revision 6 of file F, and then someone commits a change to F, this produces revision 7 of F. Some systems also use "revision", "change", or "changeset" to refer to a set of changes committed together as one conceptual unit.

These terms occasionally have distinct technical meanings in different version control systems, but the general idea is always the same: they give a way to speak precisely about exact points in time in the history of a file or a set of files (say, immediately before and after a bug is fixed). For example: "Oh yes, she fixed that in revision 10" or "She fixed that in revision 10 of foo.c." When someone talks about a file or set of files without specifying a particular revision, it is generally understood that they mean the most recent revision.

Diff

A textual representation of a change. A diff shows which lines were changed and how, plus a few lines of surrounding context on either side. A developer who is already familiar with the code can usually read a diff against that code and understand what the change did, and even spot bugs.
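To make the idea concrete, here is a minimal sketch that produces a diff with Python's standard difflib module. The file name, the revision numbers, and the file contents are invented for illustration; real version control systems generate diffs with their own tools.

```python
import difflib

# Two hypothetical revisions of the same file, as lists of lines.
old_revision = [
    "def last_index(items):\n",
    "    return len(items)\n",
]
new_revision = [
    "def last_index(items):\n",
    "    return len(items) - 1\n",
]

def make_diff(old_lines, new_lines, name="index.py"):
    """Return a unified diff between two revisions as one string."""
    return "".join(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile=f"{name} (revision 6)",
            tofile=f"{name} (revision 7)",
            n=3,  # lines of unchanged context kept around each change
        )
    )

diff_text = make_diff(old_revision, new_revision)
print(diff_text)
```

Lines starting with "-" were removed, lines starting with "+" were added, and unmarked lines are context.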

Tag

A label for a particular collection of files at specified revisions. Tags are usually used to mark significant snapshots of the project. For example, a tag is usually made for each public release, so that one can obtain, directly from the version control system, the exact set of files and revisions comprising that release. Common tag names are things like Release_1_0, Delivery_00456, etc.

Branch

A copy of the project, under version control but isolated, so that changes made to the branch don't affect the rest of the project (and vice versa), except when changes are deliberately "merged" from one side to the other (see below). Branches are also known as "lines of development". Even when a project has no explicit branches, development is still considered to be happening on the "main branch", also known as the "main line" or "trunk".

Branches offer a way to isolate different lines of development from each other. For example, a branch can be used for experimental development that would be too destabilizing for the main trunk. Or conversely, a branch can be used to stabilize a new release. During the release process, regular development continues uninterrupted in the main branch of the repository, while on the release branch no changes are accepted except those approved by the release managers. This way, preparing the release does not interfere with ongoing development work. See the section called "Use branches to avoid bottlenecks" later in this chapter for a more detailed discussion of branching.

Merge (a.k.a. port)

To move a change from one branch to another. This includes merging from the main trunk to some other branch, or vice versa. In fact, those are the most common kinds of merges; it is rare to port a change between two non-main branches. See the section called "Singularity of information" for more about this kind of merging.

"Merge" has a second, related meaning: it is what the version control system does when it sees that two people have changed the same file in non-overlapping places. Since the two changes do not interfere with each other, when one of the people updates their copy of the file (already containing their own changes), the other person's changes are merged in automatically. This is very common, especially on projects where several people are working on the same code. When two different changes do overlap, the result is a "conflict"; see below.

Conflict

What happens when two people try to make changes to the same place in the code. All version control systems detect conflicts automatically and notify at least one of the people responsible for the conflicting changes. It is then up to a human to resolve the conflict and to communicate that resolution to the version control system.
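The distinction between an automatic merge and a conflict can be sketched as follows. This is a simplified illustration, not the algorithm of any particular version control system: it assumes each edit is compared line by line against a common base, and it cautiously treats changes that merely touch each other as overlapping. The function names and sample data are made up for the example.

```python
import difflib

def changed_ranges(base, edited):
    """Return (start, end) index ranges of `base` lines that `edited` changes."""
    matcher = difflib.SequenceMatcher(a=base, b=edited, autojunk=False)
    return [
        (i1, i2)
        for tag, i1, i2, _j1, _j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

def has_conflict(base, mine, theirs):
    """Report whether two edits of the same base touch overlapping lines.

    Ranges that only touch are counted as overlapping, which is a
    deliberately cautious simplification.
    """
    return any(
        s1 <= e2 and s2 <= e1
        for s1, e1 in changed_ranges(base, mine)
        for s2, e2 in changed_ranges(base, theirs)
    )

base = ["alpha", "beta", "gamma", "delta", "epsilon"]
mine = ["ALPHA", "beta", "gamma", "delta", "epsilon"]    # edits the first line
theirs = ["alpha", "beta", "gamma", "delta", "EPSILON"]  # edits the last line
clashing = ["alpha", "beta", "GAMMA?", "delta", "epsilon"]
also_clashing = ["alpha", "beta", "GAMMA!", "delta", "epsilon"]

print(has_conflict(base, mine, theirs))             # edits in separate places
print(has_conflict(base, clashing, also_clashing))  # edits in the same place
```

In the first case a tool could combine both edits automatically; in the second, a person has to decide which version of the line should win.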

Lock

A way to declare an exclusive intent to change a particular file or directory. For example: "I can't commit any changes to the web pages right now. It seems Alfred has them locked while he changes their background image." Not all version control systems offer the ability to lock, and of those that do, not all require the locking feature to be used. This is because parallel, simultaneous development is the norm, and locking other people out of files is (usually) contrary to this ideal.

Version control systems that require locking before changes can be committed are said to use the lock-modify-unlock model. Those that do not are said to use the copy-modify-merge model. An excellent in-depth explanation and comparison of the two models can be found at http://svnbook.red-bean.com/svnbook-1.0/ch02s02.html. In general, the copy-modify-merge model is better suited to open source development, and all the version control systems discussed in this book support that model.

Choosing a Version Control System

As of this writing, the two most popular version control systems in the free software world are Concurrent Versions System (CVS, http://www.cvshome.org/) and Subversion (SVN, http://subversion.tigris.org/).

CVS has been around for a long time. Most experienced developers are already familiar with it, it does more or less what you need, and since it's been popular for a long time, you probably won't end up in any long debates about whether or not it was the right choice. CVS has some disadvantages, however. It doesn't provide an easy way to refer to multi-file changes; it doesn't allow you to rename or copy files under version control (so if you need to reorganize your code tree after starting the project, it can be a real pain); it has poor merging support; it doesn't handle large files or binary files very well; and some operations are slow when large numbers of files are involved.

None of CVS's flaws is fatal, and it is still quite popular. However, in the last few years the more recent Subversion has been gaining ground, especially in newer projects.^[14] If you're starting a new project, I recommend Subversion.

On the other hand, since I'm involved in the Subversion project, my objectivity might reasonably be questioned. And in the last few years a number of new open-source version control systems have appeared. Appendix A, Free Version Control Systems lists all the ones I know of, in rough order of popularity. As the list makes clear, deciding on a version control system could easily become a lifelong research project. Possibly you will be spared the decision because it will be made for you by your hosting site. But if you must choose, consult with your other developers, ask around to see what people have experience with, then pick one and run with it. Any stable, production-ready version control system will do; you don't have to worry too much about making a drastically wrong decision. If you simply can't make up your mind, then go with Subversion. It's fairly easy to learn, and is likely to remain a standard for at least a few years.

Using the Version Control System

The recommendations in this section are not targeted toward a particular version control system, and should be simple to implement in any of them. Consult your specific system's documentation for details.

Version everything

Keep not only your project's source code under version control, but also its web pages, documentation, FAQ, design notes, and anything else that people might want to edit. Keep them right next to the source code, in the same repository tree. Any piece of information worth writing down is worth versioning—that is, any piece of information that could change. Things that don't change should be archived, not versioned. For example, an email, once posted, does not change; therefore, versioning it wouldn't make sense (unless it becomes part of some larger, evolving document).

The reason versioning everything together in one place is important is so people only have to learn one mechanism for submitting changes. Often a contributor will start out making edits to the web pages or documentation, and move to small code contributions later, for example. When the project uses the same system for all kinds of submissions, people only have to learn the ropes once. Versioning everything together also means that new features can be committed together with their documentation updates, that branching the code will branch the documentation too, etc.

Don't keep generated files under version control. They are not truly editable data, since they are produced programmatically from other files. For example, some build systems create configure based on the template configure.in. To make a change to the configure, one would edit configure.in and then regenerate; thus, only the template configure.in is an "editable file." Just version the templates—if you version the result files as well, people will inevitably forget to regenerate when they commit a change to a template, and the resulting inconsistencies will cause no end of confusion.^[15]
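One way to catch this mistake is a small check over the list of versioned files. The sketch below is only an illustration: it assumes, as in the configure example, that a template is named after its generated file with an ".in" suffix, and the function name and file list are hypothetical.

```python
def generated_files_under_version_control(tracked_paths, template_suffix=".in"):
    """Return tracked files that also have a tracked template alongside them.

    Assumes the hypothetical convention that "NAME.in" generates "NAME".
    """
    tracked = set(tracked_paths)
    return sorted(
        path for path in tracked
        if path + template_suffix in tracked
    )

# Hypothetical list of files currently under version control.
tracked_paths = ["configure.in", "configure", "src/main.c", "README"]
print(generated_files_under_version_control(tracked_paths))
```

Any path the check reports is a candidate for removal from version control, since it can be regenerated from its template.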

The rule that all editable data should be kept under version control has one unfortunate exception: the bug tracker. Bug databases hold plenty of editable data, but for technical reasons generally cannot store that data in the main version control system. (Some trackers have primitive versioning features of their own, however, independent of the project's main repository.)

Browsability

The project's repository should be browsable on the Web. This means not only the ability to see the latest revisions of the project's files, but to go back in time and look at earlier revisions, view the differences between revisions, read log messages for selected changes, etc.

Browsability is important because it is a lightweight portal to project data. If the repository cannot be viewed through a web browser, then someone wanting to inspect a particular file (say, to see if a certain bugfix had made it into the code) would first have to install version control client software locally, which could turn their simple query from a two-minute task into a half-hour or longer task.

Browsability also implies canonical URLs for viewing specific revisions of files, and for viewing the latest revision at any given time. This can be very useful in technical discussions or when pointing people to documentation. For example, instead of saying "For tips on debugging the server, see the www/hacking.html file in your working copy," one can say "For tips on debugging the server, see http://subversion.apache.org/docs/community-guide/," giving a URL that always points to the latest revision of the hacking.html file. The URL is better because it is completely unambiguous, and avoids the question of whether the addressee has an up-to-date working copy.

Some version control systems come with built-in repository-browsing mechanisms, while others rely on third-party tools to do it. Three such tools are ViewCVS (http://viewcvs.sourceforge.net/), CVSWeb (http://www.freebsd.org/projects/cvsweb.html), and WebSVN (http://websvn.tigris.org/). The first works with both CVS and Subversion, the second with CVS only, and the third with Subversion only.

Commit emails

Every commit to the repository should generate an email showing who made the change, when they made it, what files and directories changed, and how they changed. The email should go to a special mailing list devoted to commit emails, separate from the mailing lists to which humans post. Developers and other interested parties should be encouraged to subscribe to the commits list, as it is the most effective way to keep up with what's happening in the project at the code level. Aside from the obvious technical benefits of peer review (see the section called "Practice Conspicuous Code Review"), commit emails help create a sense of community, because they establish a shared environment in which people can react to events (commits) that they know are visible to others as well.

The specifics of setting up commit emails will vary depending on your version control system, but usually there's a script or other packaged facility for doing it. If you're having trouble finding it, try looking for documentation on hooks, specifically a post-commit hook, also called the loginfo hook in CVS. Post-commit hooks are a general means of launching automated tasks in response to commits. The hook is triggered by an individual commit, is fed all the information about that commit, and is then free to use that information to do anything—for example, to send out an email.

With pre-packaged commit email systems, you may want to modify some of the default behaviors:

Some commit mailers don't include the actual diffs in the email, but instead provide a URL to view the change on the web using the repository browsing system. While it's good to provide the URL, so the change can be referred to later, it is also very important that the commit email include the diffs themselves. Reading email is already part of people's routine, so if the content of the change is visible right there in the commit email, developers will review the commit on the spot, without leaving their mail reader. If they have to click on a URL to review the change, most won't do it, because that requires a new action instead of a continuation of what they were already doing. Furthermore, if the reviewer wants to ask something about the change, it's vastly easier to hit reply-with-text and simply annotate the quoted diff than it is to visit a web page and laboriously cut-and-paste parts of the diff from web browser to email client.
(Of course, if the diff is huge, such as when a large body of new code has been added to the repository, then it makes sense to omit the diff and offer only the URL. Most commit mailers can do this kind of limiting automatically. If yours can't, then it's still better to include diffs, and live with the occasional huge email, than to leave the diffs off entirely. Convenient reviewing and commenting is a cornerstone of cooperative development, much too important to do without.)
The commit emails should set their Reply-to header to the regular development list, not the commit email list. That is, when someone reviews a commit and writes a response, their response should be automatically directed toward the human development list, where technical issues are normally discussed. There are a few reasons for this. First, you want to keep all technical discussion on one list, because that's where people expect it to happen, and because that way there's only one archive to search. Second, there might be interested parties not subscribed to the commit email list. Third, the commit email list advertises itself as a service for watching commits, not for watching commits and occasional technical discussions. Those who subscribed to the commit email list did not sign up for anything but commit emails; sending them other material via that list would violate an implicit contract. Fourth, people often write programs that read the commit email list and process the results (for display on a web page, for example). Those programs are prepared to handle consistently-formatted commit emails, but not inconsistent human-written mails.
Note that this advice to set Reply-to does not contradict the recommendations in the section called "The Great Reply-to Debate" earlier in this chapter. It's always okay for the sender of a message to set Reply-to. In this case, the sender is the version control system itself, and it sets Reply-to in order to indicate that the appropriate place for replies is the development mailing list, not the commit list.
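The two recommendations above (include the diff unless it is huge, and direct replies to the development list) could be sketched like this in a post-commit hook. This is an assumption-laden illustration using Python's standard email package, not the code of any existing commit mailer: the list addresses, the size threshold, the URL, and the function name are all placeholders, and actually sending the message is left out.

```python
from email.message import EmailMessage

def build_commit_email(author, revision, log_message, diff_text, change_url,
                       commits_list="commits@example.org",
                       dev_list="dev@example.org",
                       max_diff_chars=100_000):
    """Build a commit notification; all addresses and limits are placeholders."""
    lines = log_message.strip().splitlines()
    summary = lines[0] if lines else "(no log message)"

    # Include the diff itself unless it is very large; the URL is always given.
    if len(diff_text) > max_diff_chars:
        diff_section = "(diff omitted because of its size; see the URL above)"
    else:
        diff_section = diff_text

    msg = EmailMessage()
    msg["From"] = author
    msg["To"] = commits_list
    msg["Reply-To"] = dev_list  # replies go to the human development list
    msg["Subject"] = f"commit {revision}: {summary}"
    msg.set_content(
        f"Author: {author}\n"
        f"Revision: {revision}\n"
        f"URL: {change_url}\n\n"
        f"Log message:\n{log_message.strip()}\n\n"
        f"{diff_section}\n"
    )
    return msg

message = build_commit_email(
    author="jrandom@example.org",
    revision="r1729",
    log_message="Fix off-by-one error in the indexing code.",
    diff_text="-    return len(items)\n+    return len(items) - 1\n",
    change_url="https://example.org/viewvc?revision=1729",
)
print(message)
```

A real hook would obtain the author, log message, and diff from the version control system and hand the finished message to a mail server.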

CIA: Another Change Publication Mechanism

Commit emails are not the only way to propagate change news. Recently, another mechanism called CIA (http://cia.navi.cx/) has been developed. CIA is a real-time commit statistics aggregator and distributor. The most popular use of CIA is to send commit notifications to IRC channels, so that people logged into those channels see the commits happening in real time. Though of somewhat less technical utility than commit emails, since observers might or might not be around when a commit notice pops up in IRC, this technique is of immense social utility. People get the sense of being part of something alive and active, and feel that they can see progress being made right before their eyes.

The way it works is that you invoke the CIA notifier program from your post-commit hook. The notifier formats the commit information into an XML message and sends it to a central server (typically cia.navi.cx). That server then distributes the commit information to other forums.
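The formatting step might look roughly like the sketch below. The element names are invented for illustration and are not CIA's actual message schema; consult the CIA documentation for the real format. Only the general shape (commit details serialized as XML before being sent) comes from the description above.

```python
import xml.etree.ElementTree as ET

def format_commit_notice(project, author, revision, log_message, files):
    """Serialize commit details as XML. Element names are illustrative only."""
    root = ET.Element("commit-notice")
    ET.SubElement(root, "project").text = project
    ET.SubElement(root, "author").text = author
    ET.SubElement(root, "revision").text = str(revision)
    ET.SubElement(root, "log").text = log_message
    files_element = ET.SubElement(root, "files")
    for path in files:
        ET.SubElement(files_element, "file").text = path
    return ET.tostring(root, encoding="unicode")

notice = format_commit_notice(
    project="example-project",          # hypothetical project name
    author="jrandom",
    revision=1729,
    log_message="Fix off-by-one error in the indexing code.",
    files=["src/index.c", "doc/index.txt"],
)
print(notice)
```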

CIA can also be configured to send out RSS feeds. See the documentation at http://cia.navi.cx/ for details.

To see an example of CIA in action, point your IRC client at irc.freenode.net, channel #commits.

Use branches to avoid bottlenecks

Non-expert version control users are sometimes a bit afraid of branching and merging. This is probably a side effect of CVS's popularity: CVS's interface for branching and merging is somewhat counterintuitive, so many people have learned to avoid those operations entirely.

If you are among those people, resolve right now to conquer any fears you may have and take the time to learn how to do branching and merging. They are not difficult operations, once you get used to them, and they become increasingly important as a project acquires more developers.

Branches are valuable because they turn a scarce resource—working room in the project's code—into an abundant one. Normally, all developers work together in the same sandbox, constructing the same castle. When someone wants to add a new drawbridge, but can't convince everyone else that it would be an improvement, branching makes it possible for her to go to an isolated corner and try it out. If the effort succeeds, she can invite the other developers to examine the result. If everyone agrees that the result is good, they can tell the version control system to move ("merge") the drawbridge from the branch castle over to the main castle.

It's easy to see how this ability helps collaborative development. People need the freedom to try new things without feeling like they're interfering with others' work. Equally importantly, there are times when code needs to be isolated from the usual development churn, in order to get a bug fixed or a release stabilized (see the section called "Stabilizing a Release" and the section called "Maintaining Multiple Release Lines" in Chapter 7, Packaging, Releasing, and Daily Development) without worrying about tracking a moving target.

Use branches liberally, and encourage others to use them. But also make sure that a given branch is only active for exactly as long as needed. Every active branch is a slight drain on the community's attention. Even those who are not working in a branch still maintain a peripheral awareness of what's going on in it. Such awareness is desirable, of course, and commit emails should be sent out for branch commits just as for any other commit. But branches should not become a mechanism for dividing the development community. With rare exceptions, the eventual goal of most branches should be to merge their changes back into the main line and disappear.

Singularity of information

Merging has an important corollary: never commit the same change twice. That is, a given change should enter the version control system exactly once. The revision (or set of revisions) in which the change entered is its unique identifier from then on. If it needs to be applied to branches other than the one on which it entered, then it should be merged from its original entry point to those other destinations—as opposed to committing a textually identical change, which would have the same effect in the code, but would make accurate bookkeeping and release management impossible.

The practical effects of this advice differ from one version control system to another. In some systems, merges are special events, fundamentally distinct from commits, and carry their own metadata with them. In others, the results of merges are committed the same way other changes are committed, so the primary means of distinguishing a "merge commit" from a "new change commit" is in the log message. In a merge's log message, don't repeat the log message of the original change. Instead, just indicate that this is a merge, and give the identifying revision of the original change, with at most a one-sentence summary of its effect. If someone wants to see the full log message, she should consult the original revision.

The reason it's important to avoid repeating the log message is that log messages are sometimes edited after they've been committed. If a change's log message were repeated at each merge destination, then even if someone edited the original message, she'd still leave all the repeats uncorrected—which would only cause confusion down the road.

The same principle applies to reverting a change. If a change is withdrawn from the code, then the log message for the reversion should merely state that some specific revision(s) is being reverted, not describe the actual code change that results from the reversion, since the semantics of the change can be derived by reading the original log message and change. Of course, the reversion's log message should also state the reason why the change is being reverted, but it should not duplicate anything from the original change's log message. If possible, go back and edit the original change's log message to point out that it was reverted.

All of the above implies that you should use a consistent syntax for referring to revisions. This is helpful not only in log messages, but in emails, the bug tracker, and elsewhere. If you're using CVS, I suggest "path/to/file/in/project/tree:REV", where REV is a CVS revision number such as "1.76". If you're using Subversion, the standard syntax for revision 1729 is "r1729" (file paths are not needed because Subversion uses global revision numbers). In other systems, there is usually a standard syntax for expressing the changeset name. Whatever the appropriate syntax is for your system, encourage people to use it when referring to changes. Consistent expression of change names makes project bookkeeping much easier (as we will see in Chapter 6, Communications and Chapter 7, Packaging, Releasing, and Daily Development), and since a lot of the bookkeeping will be done by volunteers, it needs to be as easy as possible.
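A consistent syntax also makes references easy for tools to find. As a rough sketch, the two conventions suggested above can be recognized with regular expressions; the patterns, the function name, and the sample log message are assumptions made for this example, and a real project might need stricter patterns.

```python
import re

# "r1729": Subversion-style global revision number.
SVN_REVISION = re.compile(r"\br(\d+)\b")
# "path/to/file:1.76": CVS-style path plus per-file revision number.
CVS_REVISION = re.compile(r"(?P<path>[\w./-]+):(?P<rev>\d+(?:\.\d+)+)")

def find_revision_references(text):
    """Return the revision references found in `text`, grouped by syntax."""
    svn_refs = ["r" + m.group(1) for m in SVN_REVISION.finditer(text)]
    cvs_refs = [
        f"{m.group('path')}:{m.group('rev')}"
        for m in CVS_REVISION.finditer(text)
    ]
    return {"subversion": svn_refs, "cvs": cvs_refs}

log_message = "Merge r1729 from trunk; supersedes src/server/main.c:1.76."
print(find_revision_references(log_message))
```

A bug tracker or mail archive could use the same idea to turn such references into links to the repository browser.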

Authorization

Most version control systems offer a feature whereby certain people can be allowed or disallowed from committing in specific sub-areas of the repository. Following the principle that when handed a hammer, people start looking around for nails, many projects use this feature with abandon, carefully granting people access to just those areas where they have been approved to commit, and making sure they can't commit anywhere else. (See the section called "Committers" in Chapter 8, Managing Volunteers for how projects decide who can commit where.)

There is probably little harm done by exercising such tight control, but a more relaxed policy is fine too. Some projects simply use an honor system: when a person is granted commit access, even for a sub-area of the repository, what they actually receive is a password that allows them to commit anywhere in the project. They're just asked to keep their commits in their area. Remember that there is no real risk here: in an active project, all commits are reviewed anyway. If someone commits where they're not supposed to, others will notice it and say something. If a change needs to be undone, that's simple enough—everything's under version control anyway, so just revert.

There are several advantages to the relaxed approach. First, as developers expand into other areas (which they usually will if they stay with the project), there is no administrative overhead to granting them wider privileges. Once the decision is made, the person can just start committing in the new area right away.

Second, expansion can be done in a more fine-grained manner. Generally, a committer in area X who wants to expand to area Y will start posting patches against Y and asking for review. If someone who already has commit access to area Y sees such a patch and approves of it, they can just tell the submitter to commit the change directly (mentioning the reviewer/approver's name in the log message, of course). That way, the commit will come from the person who actually wrote the change, which is preferable from both an information management standpoint and from a crediting standpoint.

Last, and perhaps most important, using the honor system encourages an atmosphere of trust and mutual respect. Giving someone commit access to a subdomain is a statement about their technical preparedness—it says: "We see you have expertise to make commits in a certain domain, so go for it." But imposing strict authorization controls says: "Not only are we asserting a limit on your expertise, we're also a bit suspicious about your intentions." That's not the sort of statement you want to make if you can avoid it. Bringing someone into the project as a committer is an opportunity to initiate them into a circle of mutual trust. A good way to do that is to give them more power than they're supposed to use, then inform them that it's up to them to stay within the stated limits.

The Subversion project has operated on the honor system for more than four years, with 33 full and 43 partial committers as of this writing. The only distinction the system actually enforces is between committers and non-committers; further subdivisions are maintained solely by humans. Yet we've never had a problem with someone deliberately committing outside their domain. Once or twice there's been an innocent misunderstanding about the extent of someone's commit privileges, but it's always been resolved quickly and amiably.

Obviously, in situations where self-policing is impractical, you must rely on hard authorization controls. But such situations are rare. Even when there are millions of lines of code and hundreds or thousands of developers, a commit to any given code module should still be reviewed by those who work on that module, and they can recognize if someone committed there who wasn't supposed to. If regular commit review isn't happening, then the project has bigger problems to deal with than the authorization system anyway.

In summary, don't spend too much time fiddling with the version control authorization system, unless you have a specific reason to. It usually won't bring much tangible benefit, and there are advantages to relying on human controls instead.

None of this should be taken to mean that the restrictions themselves are unimportant, of course. It would be bad for a project to encourage people to commit in areas where they're not qualified. Furthermore, in many projects, full (unrestricted) commit access has a special status: it implies voting rights on project-wide questions. This political aspect of commit access is discussed more in the section called "Who Votes?" in Chapter 4, Social and Political Infrastructure.

^[14]See http://cia.vc/stats/vcs and http://subversion.tigris.org/svn-dav-securityspace-survey.html for evidence of this growth.

^[15]For a different opinion on the question of versioning configure files, see Alexey Makhotkin's post "configure.in and version control" at http://versioncontrolblog.com/2007/01/08/configurein-and-version-control/.
