Streamed Lines: Branching Patterns for Parallel Software Development

Copyright © 1998 by Brad Appleton, Stephen Berczuk, Ralph Cabrera, and Robert Orenstein.
Permission is granted to copy for the PLoP '98 conference.

Abstract: Most software version control systems provide mechanisms for branching into multiple lines of development and merging source code from one development line into another. However, the techniques, policies and guidelines for using these mechanisms are often misapplied or not fully understood. This is unfortunate, since the use or misuse of branching and merging can make or break a parallel software development project. Streamed Lines is a pattern language for organizing related lines of development into appropriately diverging and converging streams of source code changes.

Keywords: Branching, Parallel Development, Patterns, Software Configuration Management, Version Control



Introduction to SCM Patterns

An introduction to SCM patterns is presented in a separate section. It describes our motivation and progress in developing an SCM pattern language, and includes a diagram showing the relationships between SCM patterns. Skip ahead to the next section if you want to stay focused on parallel development and branching.

Parallel Development

Any software project beyond a certain team and system size will invariably require at least some work to be conducted in parallel. Large projects require many roles to be filled: developers, architects, build managers, quality assurance personnel, and other participants all make contributions. Multiple releases must be maintained, and many platforms may need to be supported. It is often claimed that parallel development will boost team productivity and coordination, but these are not the only reasons for developing in parallel. As [Perry98] points out, parallel development is inevitable in projects with more than one developer. The question is not "should we conduct a parallel development effort?" but "how should a parallel development effort best be conducted?"

[Perry98] suggests that many of the basic parallel development problems that arise can be traced back to the essential problems of system evolution, scale, multiple dimensionality, and knowledge distribution.

Thus, a fundamental and important problem in building and evolving complex, large-scale software systems is how to manage the phenomenon of parallel changes. How do we support the people making these parallel changes through organizational structures, project management, process, and technology?

Branching for Effective Parallel Development

If parallel development is a fact of life for any large software project, then how can developers making changes to the system in parallel be supported by project management, organizational structures, and technology? Streamed Lines is a pattern language that attempts to provide at least a partial answer to this question by presenting branching and merging patterns for decomposing a project's workflow into separate lines of development, and then later recomposing these lines back into the main workstream. The patterns describe recurring solutions for deciding how and when development paths should diverge (branch) and converge (merge).

Streamed Lines does not describe a complete solution to all the problems encountered during parallel development; it merely attempts to reveal the ways in which branches can be used to help create an effective parallel development solution. What do we even mean by "effective parallel development"? [Atria95] defines effective parallel development as:

... the ability for a software team to undertake multiple, related development activities -- designing, coding, building, merging, releasing, porting, testing, bug-fixing, documenting, etc. -- at the same time, often for multiple releases that use a common software base, with accuracy and control.

Note that this definition extends to include teams that span multiple locations, an increasingly common situation for many organizations. It encompasses all elements of a software system and all phases of the development lifecycle. Inherent within the definition is the concept of integration, in which parallel development activities and projects merge back into the common software base. Also, the definition of effective parallel development includes process control -- the policies and "rules of the road" that help assure a controlled, accurate development environment.

So how can branching help us achieve effective parallel development? Branches may be used to isolate changes, and to insulate developers from others' changes that have yet to be integrated, built, tested, and baselined. Branches may also be used to organize the decomposition of work into change-tasks and work-streams, and to control the integration of changes from tasks and streams into other streams. When used appropriately in this manner, branching helps address problems of communication, visibility, project planning and tracking, and ultimately risk management.

Introduction to Branching

The following is a brief introduction to the concepts of file checkin/checkout, and to branching and merging. If you are already familiar with these concepts you may safely skip this section.

Project-Oriented Branching

Most VC tools supporting branches do so at the granularity of a lone file or element. The revisions and branches for each file form a version tree which depicts the evolution of a single file. This is called file-oriented branching: branches are used, organized, and viewed in the context of a single file. While there may be a loose or coincidental similarity between the version trees of different files, file-oriented branching focuses primarily on physical modifications to individual files as the unit of change and change-flow.

But branching is most conceptually powerful when viewed from a project-wide or system-wide perspective, where the resultant version tree reflects the evolution of an entire project or system. We call this project-oriented branching. With project-oriented branching, branches are used, organized, and viewed in the context of an entire project, product, or system. Project-oriented branching imposes a more or less uniform structure on the version trees for all the files in the system. Instead of emphasizing modifications to individual files, project-oriented branching focuses primarily on the flow of logical changes across the entire system. Logical changes flow through and between streams of work in which product and component versions are integrated, built, baselined, and released.
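One way to see how a project-wide branch maps onto file-level version trees is the following minimal Python sketch. It assumes, hypothetically, that each file's version tree is reduced to a mapping from branch path to the latest revision on that branch, and that a file with no revisions of its own on a branch falls back to the latest revision on the nearest ancestor branch. Every file therefore gets a version for the same project-wide branch name, whether or not it was ever modified there. Real tools express such selection rules in their own ways; the function names here are illustrative only.

```python
from typing import Dict, Optional

# Hypothetical model: branch path -> latest revision number on that branch.
VersionTree = Dict[str, int]

def select_version(tree: VersionTree, branch: str) -> Optional[str]:
    """Resolve a project-wide branch for one file, falling back to ancestors."""
    path = branch
    while path:
        if path in tree:
            return f"{path}/{tree[path]}"
        path = path.rsplit("/", 1)[0]   # "/main/rel1" -> "/main" -> ""
    return None  # the file has no version on this branch or any ancestor

def select_configuration(trees: Dict[str, VersionTree],
                         branch: str) -> Dict[str, Optional[str]]:
    """Project-oriented view: one version per file for the same branch name."""
    return {name: select_version(tree, branch) for name, tree in trees.items()}


trees = {
    "parser.c": {"/main": 4, "/main/rel1": 2},  # modified on /main/rel1
    "lexer.c": {"/main": 7},                    # never touched on /main/rel1
}
print(select_configuration(trees, "/main/rel1"))
# {'parser.c': '/main/rel1/2', 'lexer.c': '/main/7'}
```

The uniformity comes from the selection rule rather than from every file physically carrying every branch: only `parser.c` has revisions on `/main/rel1`, yet the whole system can be viewed "on" that branch.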

Dimensions of Branching

There are essentially five different forms of branching, each of which may be represented using the file-based branching of most VC tools:

Physical:
Branching of the system's physical configuration - branches are created for files, components, and subsystems

Functional:
Branching of the system's functional configuration - branches are created for features, logical changes (bug-fixes and enhancements), and other significant units of deliverable functionality (e.g., patches, releases, and products)

Environmental:
Branching of the system's operating environment - branches are created for various aspects of the build and run-time platforms (e.g. compilers, windowing systems, libraries, hardware, operating systems, etc.) and/or for the entire platform

Organizational:
Branching of the team's work efforts - branches are created for activities/tasks, subprojects, roles, and groups

Procedural:
Branching of the team's work behaviors - branches are created to support various policies, processes, and states

Specific instances of each type of branching will be discussed in many of the patterns that follow. There is frequent overlap between the above types of branching. For example, a branch created for a particular bug-fix may be regarded both as a bug-fix branch and as an activity-branch, because the set of changes that constitute the fix is performed as a single task. A branch created for an integration effort, on the other hand, won't always correspond to a single fix or feature. Still, it is quite common for a branch to correspond to more than one type of branching. The important thing is to recognize which type is perceived as the primary intent of the branch.

It should also be mentioned that using branches for more than 2-3 of these dimensions at the same time is discouraged because it can necessitate a combinatorial explosion of branches spawned from the same origination point (which is quite unwieldy). [Conradi96] discusses this inherent weakness of hierarchical branching and version-trees: a hierarchical organization is often convenient, but it quickly breaks down when variance occurs simultaneously along multiple dimensions.
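The arithmetic behind the combinatorial explosion is simple: if every combination of variants needs its own branch from the same origination point, the branch count is the product of the number of variants along each dimension. A minimal Python sketch, using made-up dimensions and a hypothetical naming scheme:

```python
from itertools import product
from math import prod

# Hypothetical variants along three branching dimensions.
dimensions = {
    "release": ["rel1", "rel2", "rel3"],     # functional
    "platform": ["unix", "win32", "mac"],    # environmental
    "team": ["gui", "db", "net", "core"],    # organizational
}

def branch_names(dims):
    """One branch per combination of variants, all spawned from /main."""
    return ["/main/" + "-".join(combo) for combo in product(*dims.values())]

names = branch_names(dimensions)
print(len(names), names[0])   # 36 /main/rel1-unix-gui

# Growth as each additional dimension is branched on: 3, then 9, then 36.
sizes = [len(v) for v in dimensions.values()]
print([prod(sizes[:k]) for k in range(1, len(sizes) + 1)])   # [3, 9, 36]
```

Two dimensions with a handful of variants each remain manageable; each additional dimension multiplies the total, which is why branching along more than two or three dimensions at once is discouraged.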

Branching Terms and Notation

We use various terms and notation throughout this paper. Where possible, we have tried to use names and concepts that frequently recur in practice.

Branches, Change-Tasks, and Codelines

In general, when a branch corresponds to a line of development containing (or intended for) multiple sets of logical changes, we refer to the branch as a codeline, even though it need not be limited to source-code artifacts. Often a branch is used only for a single logical change (also called a change-task). If a branch is used for a single change-task and is then immediately merged back to its parent, we call it an activity-branch, or simply a branch or subbranch. In theory, the terms "branch" and "codeline" may be used as synonyms. When describing branching patterns, however, we try to be consistent in using the term "codeline" to refer to a longer-lived workstream, and using the term "branch" to mean a single activity-branch or a subbranch of a codeline.

Versions, Change-Packages, and Baselevels

A version may refer to a revision of a single file, or to a set of file revisions that make up the entire project (or one of its components/subsystems). A change-package is the group of revisions that were modified or created as part of a change-task. A baselevel is a named configuration of the project that is self-consistent enough to serve as a stable base for subsequent development efforts. A baseline is a baselevel that is suitable for a formal internal or external release.
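The relationships among these terms can be summarized as a small data model. This is a minimal Python sketch with hypothetical class and field names; in particular, what makes a configuration "self-consistent" or "suitable for release" is reduced to two boolean flags rather than any specific build or approval process.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class ChangePackage:
    """Revisions modified or created by one change-task."""
    task: str
    revisions: Dict[str, str] = field(default_factory=dict)  # file -> version name

@dataclass
class Configuration:
    """A project version: one selected revision per file."""
    name: str
    files: Dict[str, str]            # file -> version name
    self_consistent: bool = False    # stable enough to base further work on
    release_ready: bool = False      # suitable for a formal release

    def is_baselevel(self) -> bool:
        return self.self_consistent

    def is_baseline(self) -> bool:
        # Every baseline is a baselevel, but not every baselevel is a baseline.
        return self.is_baselevel() and self.release_ready


fix = ChangePackage("fix232", {"parser.c": "/main/rel1-maint/fix232/4"})
bl = Configuration("BL-17", {"parser.c": "/main/rel1-maint/3"}, self_consistent=True)
print(bl.is_baselevel(), bl.is_baseline())   # True False
```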

Merging, Propagating, and Syncing

Merging is the process of integrating the revisions in a change-package into the contents of a codeline. Sometimes a change in one codeline needs to be incorporated into another codeline. For example, a bug-fix in a maintenance codeline may also be needed in the corresponding development codeline for the next major release. We refer to this as change propagation, or simply propagation. When the entire contents of a codeline are merged into another codeline, or into a developer's workspace, we call this particular kind of merging syncing with the codeline, or just syncing.
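The three operations differ mainly in what moves where. The following minimal Python sketch models each codeline or workspace only by the set of change-tasks it contains, ignoring file contents and merge conflicts entirely; the class and function names are hypothetical.

```python
from typing import Optional, Set

class Line:
    """A codeline or a workspace, modeled only by the change-tasks it contains."""
    def __init__(self, name: str, changes: Optional[Set[str]] = None):
        self.name = name
        self.changes: Set[str] = set(changes or ())

def merge(change_task: str, codeline: Line) -> None:
    """Integrate one change-package into a codeline."""
    codeline.changes.add(change_task)

def propagate(change_task: str, source: Line, target: Line) -> None:
    """Incorporate a change that is already on one codeline into another."""
    if change_task not in source.changes:
        raise ValueError(f"{change_task} is not on {source.name}")
    target.changes.add(change_task)

def sync(codeline: Line, workspace: Line) -> Set[str]:
    """Merge the entire codeline into a workspace; return what was new."""
    new = codeline.changes - workspace.changes
    workspace.changes |= new
    return new


maint = Line("/main/rel1-maint")
dev = Line("/main/rel2-dev", {"feat17"})
ws = Line("my-workspace", {"feat17"})
merge("fix232", maint)           # the bug-fix lands on the maintenance codeline
propagate("fix232", maint, dev)  # and is propagated to the next release
print(sync(dev, ws))             # {'fix232'}
```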

Version Tree Diagrams

Since revision names like "1.4.1.2", used by VC tools like RCS (and many others), aren't particularly mnemonic, we use more symbolic branch names consisting of letters, numbers, and some other characters. We also use the '/' character to indicate the beginning of a branch name, so that versions can be uniquely identified with a name such as "/main/rel1-maint/fix232/4". Hence a fully specified version name resembles a directory path in Unix or DOS. A few VC tools (most notably ClearCase and Perforce) use the same or similar conventions for version naming.
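To make the naming convention concrete, here is a minimal Python sketch that splits a fully specified version name into its branch path and the revision number on the last branch. The function name `parse_version` is hypothetical, not part of any VC tool, and the sketch assumes (as in the example above) that the final component is always a numeric revision.

```python
from typing import List, Tuple

def parse_version(version: str) -> Tuple[List[str], int]:
    """Split "/main/rel1-maint/fix232/4" into (["main", "rel1-maint", "fix232"], 4).

    Hypothetical helper: assumes the last component is a numeric revision and
    every earlier component is a branch name.
    """
    if not version.startswith("/"):
        raise ValueError(f"not a fully specified version name: {version!r}")
    branch_path, _, revision = version.rpartition("/")
    if not branch_path or not revision.isdigit():
        raise ValueError(f"expected /<branch>/.../<revision>: {version!r}")
    # Drop the leading '/' so the first element is the root branch ("main").
    return branch_path[1:].split("/"), int(revision)


branches, rev = parse_version("/main/rel1-maint/fix232/4")
print(branches, rev)   # ['main', 'rel1-maint', 'fix232'] 4
```

The branch path also records each branch's ancestry: `fix232` was spawned from `rel1-maint`, which in turn was spawned from `main`.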

Figure 5: Notation used for version trees

When drawing codelines, branches, change-tasks, and their relationships, we use a tree structure with branch-names inside boxes and version-names inside circles (a "box" or "circle" with no name inside is considered "anonymous"). Branches and codelines are indicated with solid lines, whereas merges and propagations are indicated with dashed lines. These version-tree diagrams are reminiscent of interaction sequence diagrams in the UML; but we draw the timeline from left to right instead of from top to bottom (to conserve space).

Branch names always appear at the beginning of the timeline for the branch, and are preceded by a '/'. A "box" appearing in the middle of a timeline for a branch corresponds to a change-task that was performed "on-line" (directly on the codeline, instead of on its own branch), and there is no leading slash in front of the name for such a change-task. The length of a change-task "box" may be used to indicate its duration relative to other change-tasks.

Forces of Branching and Parallel Development

Parallel development raises several important issues and concerns for the success of the development projects. These risk-factors are briefly identified here, and are described in detail in a separate section.

Teamwork
Reusability
Safety
Liveness
SCM Tool Support

Branching Patterns and their Participants

The patterns in Streamed Lines are divided into categories of branching policy, branch creation, and branching structures. These categories loosely correspond to the [GoF] pattern categories of: behavioral, creational, and structural (respectively). In addition, many of the patterns refer to some basic types of branches and codelines. We define all of these categories below:

Basic Branch/Line Elements
Some basic varieties of branches and codelines that serve as lower-level building blocks for various patterns; these are not necessarily patterns per se, but they nevertheless participate in one or more patterns in the language

Branching Policy Patterns
Patterns describing behavioral policies to establish or preserve the conceptual or physical characteristics of a codeline

Branch Creation Patterns
Patterns describing when to create a new kind of branch or codeline

Branch Structuring Patterns
Patterns describing the collaborations between two or more related branches in a branching structure

The participants in Streamed Lines are distributed among these four categories as follows:

(Table: participants grouped under Basic Branch/Line Elements, Branching Policy Patterns, Branch Creation Patterns, and Branch Structuring Patterns.)

The full pattern descriptions appear in Appendix A.

Using the Branching Patterns

We have presented a series of patterns for managing branching in parallel development projects. Certain subsets of these patterns represent conflicting styles and may not mesh well together in the same project; the patterns selected for a particular project depend on the needs of the organization and of the project itself. In this section, we provide some guidelines on which patterns to select for your project. Which patterns you use will largely depend upon selected tradeoffs between safety and productivity (or "liveness"). More conservative strategies tend to trade off productivity for safety, while more optimistic strategies may do the opposite.

Generally speaking, using more branches for greater isolation reduces safety risks, but at the expense of more merging and integration effort. More merging and integration also requires more communication and greater visibility of changes and baselines. Using fewer branches reduces merging and integration efforts, but at the expense of less isolation and less safety. Merging sooner rather than later flushes out risks early on, while there is more time to address them, but requires continual effort to regularly monitor and address such risks.

In short, you will have to confront and manage risks concerning safety, productivity, and communication no matter what you do. Time and effort must be invested to manage these risks. The three basic ways to do this are to pay now, to pay later, or to pay-as-you-go.

The most productive overall strategies attempt to invest a reasonably small amount up front, and then pay the rest as they go. The larger and more critical and risk-averse your project is, the more you will need to invest in "up front" planning and policies, while still employing a pay-as-you-go strategy throughout the lifetime of the project (which includes regular monitoring and feedback to make incremental corrections). Such an approach essentially tries to offload back-end costs (of deferred or unmanaged risks) by handling the most critical risks "up front" as a minimal initial investment, and to amortize the remaining costs using a "just-in-time" approach.

Here, then, are the important strategic decisions to make while planning the branching and merging road-map for your parallel development efforts. Be aware that performing less up-front planning requires more attentive and visible monitoring and feedback, while more up-front planning often results in more things that need to be corrected later on. These differences should decrease, and eventually converge, as the project evolves and its parallel development policies and procedures become more stable and mature.

Determine Your Risk Tolerance

Before making any important strategic decisions, probably the first and most important thing to do is determine the amount and kind of risk your project can tolerate within its environment. Look at all of the forces of branching and parallel development described earlier and try to get a good picture of how and where each of them applies to your project and its development environment. Which risks apply to you? Which ones seem important and which ones seem secondary?

Typically, the most fundamentally important tradeoff to consider will be that of safety versus liveness. To get an idea of how much safety risk you can tolerate, ask yourself how much time and effort is required to back-out an unwanted or detrimental change from one of your codelines and builds. How many people does it impact and how soon (and how critically) are they impacted? How much rework and rebuilding is required and how much time and staff are required to perform that rework? How much additional communication overhead does the rework impose?

If the answer to these questions leads you to believe it would be a very significant, or even monumental undertaking to back-out an unwanted change, then your project probably has a very low threshold for safety risks. If on the other hand it seems that only a select few people would be affected and it wouldn't take very much time to correct the problem, then you may have a very high threshold for safety risks.

Don't forget to consider how your risk-threshold will change and evolve as the project evolves and matures! It is exceedingly common for a project to tolerate more risk (and sometimes have greater time-to-market pressures) before it has been deployed to a broad base of customers than after it has been deployed and several releases are being supported and maintained. Also, if the size of the team or of the system is expected to grow considerably, it may make more sense to take some preventive measures early on, before it becomes too difficult to impose non-trivial changes on the team's process and behavior. At the very least, you will need to plan to migrate from a process that tolerates more risk to a process that eventually tolerates less risk.

Select an Appropriate Branching Style

The first strategic decision to make is whether to adopt the strategy of Early Branching or Deferred Branching. These are the two different "branching styles" underlying the majority of the branching patterns in Streamed Lines. Early Branching is better suited to larger or more formal efforts that require a high degree of fine-grained isolation and control; you assume less safety risk but pay the price of additional merging and propagation. Deferred Branching is good for projects that can afford to risk losing a bit of safety in order to gain more productivity; less branching and integration means less overhead, but also less isolation and verification.

The choice of early or deferred branching also affects the visibility with which teamwork and workflow can be communicated from a file's version tree. Deferred branching may hide the intent of a change or set of changes to go into specific releases. Early branching makes this intent clear early on, but requires more effort to follow through with that intent and propagate the change to more codelines than would be required if you had waited longer before branching.

The branching style that you decide is best suited for your environment will dictate a complementary set of patterns and pattern variants:

Regardless of the branching style selected, Codeline Policy and Codeline Ownership should be used for every branch and codeline created. These two practices need to be employed in a way that is readily visible to the team, and that can be communicated quickly and easily.

Patterns like Parallel Maintenance/Development and Overlapping Releases are typically the first branching structures many shops encounter. They can be applied using either branching style; the difference lies primarily in when you branch (early or late) and in which effort goes on the branch and which stays on the parent codeline.

Early branching tends to keep the release or major release as the invariant for each codeline. So instead of splitting development and maintenance across codelines, it keeps the same release on the same codeline, regardless of whether or not it is development effort or maintenance effort for the given release.

For deferred branching, the releasing/maintenance effort is always the one that branches off, allowing the latest and greatest development to continue on the same line as before. This way of thinking may seem peculiar to those accustomed to an early branching style that uses separate codelines for each release; they may have difficulty understanding why it is coherent. With deferred branching, it's not the release that remains invariant on the branch, it's the recency of the effort on the branch: either the latest development efforts or the latest maintenance efforts.
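The difference in what stays invariant can be sketched as a function that decides which codeline receives a given piece of work. The branch names and the two-valued `activity` parameter below are hypothetical, chosen only to mirror the description above.

```python
def codeline_for(style: str, release: str, activity: str) -> str:
    """Return the (hypothetical) codeline that receives work for `release`."""
    if activity not in ("development", "maintenance"):
        raise ValueError(f"unknown activity: {activity}")
    if style == "early":
        # The release is the invariant: one codeline per release holds both
        # its development and its maintenance.
        return f"/main/{release}"
    if style == "deferred":
        # Recency is the invariant: the latest development stays on /main,
        # and maintenance branches off once the release ships.
        return "/main" if activity == "development" else f"/main/{release}-maint"
    raise ValueError(f"unknown branching style: {style}")


for style in ("early", "deferred"):
    print(style, codeline_for(style, "rel2", "development"),
          codeline_for(style, "rel1", "maintenance"))
# early /main/rel2 /main/rel1
# deferred /main /main/rel1-maint
```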

Select Appropriate Merging Styles

Along with selecting a branching style, you will need to select appropriate merging styles to match your branching preferences. A higher tolerance for safety risks and a preference for minimal effort imply a relaxed policy toward codelines and require fewer integration lines; a lower tolerance for safety risks implies stricter codeline policies, more codelines, and more integration effort.

Although the choice of merging style often follows from the chosen branching style, a higher risk branching style does not necessarily imply a higher risk merging style. In fact, you may wish to offset high risk in one with low risk in the other. If you take more risks when splitting things apart, you may want to take less risk when putting things back together.

Remember that every time you add another line of integration, you are, in effect, adding another level of indirection: you gain more isolation and a cleaner conceptual organization, but you spend more time merging. Note that a Virtual Codeline is somewhat merge-evasive and may be used to simulate just about any kind of codeline. The merging patterns that are more suited to each merging style are as follows:

In either case, frequent incremental integration is always a good idea (using Merge Early and Often or one of its variants), but the merging frequency and ownership will differ between the two styles. The relaxed style favors liveness and assumes higher risk by having people merge and propagate their own changes across codelines. The more restricted style favors safety and has more codelines, each with more restricted access, and with codeline-owners performing most of the merges.
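Codeline Policy and Codeline Ownership, combined with the choice of merging style, determine who may merge into a codeline. A minimal Python sketch under stated assumptions: the `CodelinePolicy` record, its fields, and the rule that a relaxed codeline accepts merges of a listed committer's own changes are illustrative, not a prescribed policy format, and "owners perform most of the merges" is simplified to "owners perform all of them".

```python
from dataclasses import dataclass
from typing import FrozenSet

@dataclass(frozen=True)
class CodelinePolicy:
    codeline: str
    purpose: str
    owner: str
    restricted: bool                          # restricted vs. relaxed merging style
    committers: FrozenSet[str] = frozenset()  # who may merge under the relaxed style

    def may_merge(self, user: str, own_change: bool) -> bool:
        if user == self.owner:
            return True                       # the owner can always merge
        if self.restricted:
            return False                      # restricted: the owner performs merges
        # Relaxed: people merge and propagate their own changes.
        return own_change and user in self.committers


dev = CodelinePolicy("/main/rel2-dev", "integration of rel2 features", "alice",
                     restricted=False, committers=frozenset({"bob", "carol"}))
rel = CodelinePolicy("/main/rel1-maint", "rel1 bug-fixes only", "dave", restricted=True)
print(dev.may_merge("bob", own_change=True), rel.may_merge("bob", own_change=True))
# True False
```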

Unlike the branching styles, the merging styles may be mixed and matched to achieve a gradual progression from high-activity codelines with relaxed policies to lower-activity codelines with restricted policies. This can be accomplished with patterns such as Docking Line, Subproject Line, Component Line and Remote Line. But with a more relaxed style, each of these kinds of codelines will typically merge back to the development line, while a more restricted style is more likely to use each one as a member of a set of Staged Integration Lines.

Start Simple

By choosing appropriate branching and merging styles, you have effectively chosen risk management strategies for organizing and integrating work activities (and even for visibly communicating the status of codelines and baselines to a large extent). Now you are ready to create some specific codelines. It is extremely rare for a single project to use all of the branching patterns presented here. The majority of parallel development projects will typically use the following "core set" of branching patterns (or one of their variants):

Take Baby Steps

Many parallel development efforts will require little more than the above patterns, along with one of MYOC, Docking Line, or Staged Integration Lines. Other projects will have more sophisticated needs. They may start out with the above and be okay for a while, but they will eventually need to progress to the next tier of branching patterns, or their variants (often in the following order):

Once again, one or more of the following merging patterns will be used with the above: MYOC, Docking Line, or Staged Integration Lines.

Evolving Integration Needs

Often, the project will take on more risk during early development and then gradually tolerate less and less risk as it grows in team-size, project size/complexity, or moves more and more into maintenance mode. In addition to requiring more of the second-tier branching patterns above, merging styles may need to become less forgiving and more cautiously controlled:

Special Project Needs

The following patterns are usually for "special needs" only:

You may need them very rarely, or only for certain kinds of projects and project teams. But when the project does require them, they often have a very profound impact on the overall shape of the project-wide version tree, and on the overall organization of parallel development efforts. These patterns (along with Change Propagation Queues) should be used sparingly, and only as the need arises. This is especially true of platform-lines since it is often better to handle multi-platform issues with separate files and/or directories than with separate branches.

Revisit, Refactor, and Realign

As the project evolves, there will always be the need to periodically revisit, refactor, and realign the branching/merging structures adopted and their corresponding policies. You will also want to look at the overall picture of the project-wide version tree and check to see if the tree looks too wide, too unwieldy, or too disjointed. Prudent use of codeline propagation and retirement into the Mainline will help guard against the tree becoming too wide. The patterns Subproject Line and Policy Branch can help to correct a version tree that has become too complex and unwieldy. MYOC and Docking Lines can help remedy development that has become too isolated or disjoint.

General Advice and Recurring Themes

The branching patterns in Streamed Lines don't cover every possible contingency. Situations will arise where the correct pattern or variant to use is not at all obvious, or may not even exist. However, even in these cases, some of the recurring themes that underlie many of the branching patterns presented here may still be broadly applicable to your particular problem. These are as follows.

Use Meaningful Branch Names

Just like variable names in a program, each branch should have a meaningful name which communicates its purpose or its policy. Meaningful names help to more clearly and visibly communicate intent and status, particularly when the names appear in tool generated reports, queries, and diagrams (especially version trees). If your VC tool doesn't directly support named branches, then floating labels (sometimes called sticky labels) can be used to the same effect. See the pattern Virtual Codeline.
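A floating label differs from an ordinary label in that it is moved forward as new revisions are checked in, so that it always marks the current tip of the line of development it names. A minimal Python sketch with hypothetical names; real VC tools provide their own label commands, which are not reproduced here.

```python
from typing import Dict

class FloatingLabel:
    """A label re-applied to newer revisions, naming a virtual codeline."""
    def __init__(self, name: str):
        self.name = name
        self.selected: Dict[str, int] = {}   # file -> revision currently labeled

    def check_in(self, filename: str, revision: int) -> None:
        # Moving the label on every check-in keeps it at the tip of the line.
        self.selected[filename] = revision


label = FloatingLabel("REL2_DEV")
label.check_in("parser.c", 3)
label.check_in("lexer.c", 7)
label.check_in("parser.c", 4)   # the label floats from revision 3 to 4
print(label.selected)           # {'parser.c': 4, 'lexer.c': 7}
```

Selecting every file by the label `REL2_DEV` then yields the same configuration a named branch would.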

Prefer Branching over Freezing

Don't suspend all activities on a particular codeline when many of those activities could continue unaffected on a separate branch, without impacting the efforts on the original codeline. There is no need to hinder productivity this way; see Parallel Releasing/Development Lines for an example. Branching instead of freezing in fact increases productivity while imposing very little additional safety risk and only modest additional integration effort.

Integrate Early and Often

Frequent, incremental integration is one of the signposts of success, and its absence is often a characteristic of failure. Current project management methods tend to avoid strict waterfall models and embrace the spiral-like models of iterative/incremental development and evolutionary delivery. Incremental integration strategies, like Merge Early and Often and its variants, are a form of risk management that tries to flush out risk earlier in the lifecycle when there is more time to respond to it. The regularity of the rhythm between integrations is seen by [Booch], [McCarthy], and [McConnell] as a leading indicator of project health (like a "pulse" or a "heartbeat").

Not only does early and frequent integration flush out risk sooner and in smaller "chunks," it also communicates changes between teammates. Every time a developer integrates a new baseline into their workspace, or a new change into the baseline, they learn something about what has happened to the system and where it has changed. In this sense, integration turns out to be a very real form of communication, albeit an indirect one. For this reason, it is crucial that the presence of new baselines and baselevels be clearly and visibly communicated to all concerned, and that the completion of important changes that are ready to be built and baselined also be clearly and visibly communicated.
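Syncing a workspace with a new baseline is also a chance to tell the developer what changed. A minimal Python sketch, assuming (hypothetically) that a baseline is just a mapping from file name to selected version name:

```python
from typing import Dict, List

def baseline_changes(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Summarize what differs between two baselines, file by file."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "modified": sorted(f for f in old.keys() & new.keys() if old[f] != new[f]),
    }


bl16 = {"parser.c": "/main/3", "lexer.c": "/main/7", "util.c": "/main/1"}
bl17 = {"parser.c": "/main/4", "lexer.c": "/main/7", "cache.c": "/main/1"}
print(baseline_changes(bl16, bl17))
# {'added': ['cache.c'], 'removed': ['util.c'], 'modified': ['parser.c']}
```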

So perhaps a corollary to "integrate early and often" would be "commit changes visibly and clearly." This includes changes that have been committed for inclusion in a particular baseline or codeline, as well as baselines that are now ready to be synced into developers' workspaces.

Branch on Incompatibilities

Often, the best way to resolve risks that arise from opposing forces (or competing concerns) is to create a new branch for the competing concern. Such incompatibilities may result from access policies, dueling ownerships, integration frequency, activity-load, activity-type, and platform. Examples of this include Policy Branch, Inside/Outside Lines, Component Line, Parallel Maintenance/Development, and Platform Line.

Add Another Level of Integration

Sometimes branching on incompatibility isn't enough. Divergence will often require frequent convergence, or continuous mediation. In this case, it is often necessary to add another level of indirection, by adding another line of integration between the two opposing forces or competing codelines. Examples are: Subproject Line, Docking Line, Remote Development Line, Staged Integration Lines, and Mainline.

This will help reduce risk by isolating variation along the appropriate dimension of work. While this does help to control and contain the amount of variation to a locally manageable region, it does impose an additional integration burden later on. (So does branching on incompatibility.) The theory here is that the integration overhead at the end will be minimized by the continual control that is more easily afforded by isolating the change.

KISS (Keep It Simple Stupid!)

Avoid branching hierarchies that are extremely wide or dense! (Think of "branch and bound.") Try for minimal reconciliation by creating new branches only when the added benefit is worth the added synchronization overhead. Use additional branches to provide greater isolation between tasks and changes; and use integration-lines to add additional verification and validation of merged changes.

But don't use branches to solve all your problems! Many problems are best addressed by different means. For example, numerous multi-platform issues are better solved by using extra files and directories rather than platform-branches. Don't use branches as a "hammer" to make every problem look like a nail, and don't "sow" a new branch unless you can reap the benefits.

Preserve Integrity and Consistency

Preserve the conceptual integrity of the branch! When delegating volatile aspects of high-impact variation to separate branches, keep each aspect logically consistent within its own branch: keep codeline usage consistent with its policy, and keep codeline policy consistent with its purpose. Occasional "fine-tuning" and remedial actions are to be expected, but avoid changes that violate the spirit of the codeline's intent.

Preserve the physical integrity of the branch! Don't merge incomplete or inconsistent changes into the codeline; and don't leave codelines in inconsistent states. When the configuration of a codeline is inconsistent or incorrect it can adversely impact all users of the codeline. Try to keep codelines reliably consistent, and consistently reliable.

Choose optimistic or pessimistic branching policies and stick with them! For a given project, strike a sensible balance of trade-offs between safety (isolation, access control, code integrity, and risk mitigation) and liveness (productivity, integration overhead, working "on-line") and then apply them in a consistent manner. The balance may need to be dynamically adjusted over time; but at any given time, the policies should be consistent with one another.

Isolate Change

You may recall that one of the recurring themes in the [GoF] Design Patterns book is: "Encapsulate the thing that varies." Branching doesn't achieve encapsulation of information so much as it achieves isolation of changes. So a recurring theme in most of these branching patterns is: Isolate the thing that varies! Each branch and codeline isolates one or more of the following dimensions over a given time-period:

Isolate Work, not People

Perhaps most importantly, the branching policies and patterns described here do not remove the need for communication between project team members. These patterns should facilitate communication, not eliminate it! The goal of these patterns is to help isolate work, not people. People working together on a project need to remain socially connected and coordinated, and to maintain awareness of the impact of their efforts downstream and throughout the entire lifecycle. Jeopardize this and you jeopardize team synergy, and ultimately, team success.

If you isolate people from their work, systemic disconnection may result: developers lose touch with the effects of their own efforts on the overall project. If you segregate people from each other according to their work tasks, social isolation may occur: people lose touch with one another and with the overall project team. The purpose of parallelization is not to isolate people from people, or people from their work, but to isolate work from other work. Conway's Law (see [Cope95]) applies just as much to the architecture of the project's version tree as it does to the architecture of the system. Use this wisdom to your advantage (and ignore it at your peril).

Branching Traps and Pitfalls to Avoid

There are some common traps and pitfalls to watch out for when using branching for parallel development. Some of them are the result of naive approaches which seem "right" at first glance, but which deeper understanding reveals to be a "dead end." Others are the result of using the various branching patterns inappropriately (or overzealously) in the wrong context. The description of each pitfall tries to include some analysis of its root cause and of its cure or prevention. Ultimately, though, all of them seem traceable to some combination of poor planning, poor communication, or poor management.

Effects of the Branching Patterns

Similarities with Concurrent/Parallel Programming

[McKenney95] writes of the forces for and against parallelizing a software program, breaking them down into: Speedup, Contention, Overhead, Economics, Complexity, and a few others. Most of these forces apply equally to concurrent/parallel software development. In fact, designing parallel development strategies for concurrent software development bears a striking resemblance to designing parallel programming strategies for concurrent object systems. The latter deals with multiple collaborating objects running in multiple threads of execution across multiple address spaces in a parallel software program; the former deals with multiple collaborating individuals working in multiple threads of development across multiple workspaces in a parallel software development project.

As [Lea96] describes, some of the most basic tradeoffs to be made when designing concurrent object systems are those of safety ("The property that nothing bad ever happens") and liveness ("The property that anything ever happens at all"). These tradeoffs are essentially the same for software development:

From either direction, the goal is to assure liveness across the broadest possible set of contexts without sacrificing safety.

The need to apply such strategies across the broadest possible set of contexts ties into their reusability across the project, and between projects. Hence all the same issues and concerns mentioned by [Lea96] regarding safety, liveness, and reusability also arise during parallel development.

Isolation and Risk Mitigation

Branching is an optimistic concurrency control strategy for parallel development. It tries to mitigate the risk associated with such optimism by separating concurrent/parallel efforts into isolated paths of development. Branching off into separate workstreams is fairly easy to do with minimal interference, and it removes the need for development tasks to "block" while waiting for checkout-locks to be released. Rejoining the two paths after they've been separated is done via integration (merging). The inherent risk in resynchronization is mitigated by allowing it to happen in a well-insulated context at a more convenient time.
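The contrast between pessimistic checkout locks and optimistic branching can be sketched with two toy repositories in Python. Both classes are hypothetical models, not any real tool's API: the locking version makes a second developer wait, while the branching version lets everyone proceed and merely records where the deferred merge will carry risk (paths changed on more than one branch).

```python
from collections import Counter
from typing import Dict, Set


class LockingRepository:
    """Pessimistic control: a file may be checked out by one developer at a time."""

    def __init__(self) -> None:
        self.locks: Dict[str, str] = {}

    def checkout(self, path: str, dev: str) -> bool:
        holder = self.locks.get(path)
        if holder is not None and holder != dev:
            return False  # the caller must block until the lock is released
        self.locks[path] = dev
        return True

    def checkin(self, path: str, dev: str) -> None:
        if self.locks.get(path) == dev:
            del self.locks[path]


class BranchingRepository:
    """Optimistic control: every developer edits a private copy; nobody waits."""

    def __init__(self, files: Dict[str, str]) -> None:
        self.base = dict(files)
        self.branches: Dict[str, Dict[str, str]] = {}

    def branch(self, dev: str) -> Dict[str, str]:
        self.branches[dev] = dict(self.base)
        return self.branches[dev]

    def conflicting_paths(self) -> Set[str]:
        """Paths changed on more than one branch: the risk deferred to merge time."""
        counts = Counter(
            path
            for files in self.branches.values()
            for path, text in files.items()
            if text != self.base.get(path)
        )
        return {path for path, n in counts.items() if n > 1}


repo = BranchingRepository({"parser.c": "v1", "lexer.c": "v1"})
ann, bob = repo.branch("ann"), repo.branch("bob")  # neither branch blocks the other
ann["parser.c"] = "ann's edit"
bob["parser.c"] = "bob's edit"
bob["lexer.c"] = "bob's edit"
print(repo.conflicting_paths())  # {'parser.c'}
```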

In effect, every codeline and branch represents a form of risk management by isolating how functionality, environment, knowledge, teamwork, responsibility, and reliability are distributed and disseminated across time and space.

Managing Complexity with Hierarchy

Branching and merging hierarchically decompose and recompose parallel development into more manageable chunks! By isolating things along various dimensions in a hierarchical fashion, we are attempting to manage dynamically evolving complexity and dependencies. First we decompose the parallel development problem into codelines and branches and subbranches, then we recompose the subparts back into the larger whole by progressively merging subbranches back to branches, branches back to codelines, and codelines back into the mainstream.
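The recomposition half of this hierarchy amounts to a post-order walk of the branch tree: every subbranch is merged into its parent before that parent is itself merged upward. A minimal Python sketch, with hypothetical branch names:

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Branch:
    name: str
    children: List["Branch"] = field(default_factory=list)


def merge_order(node: Branch) -> List[Tuple[str, str]]:
    """Return (source, target) merges so that each subtree is recomposed before it moves up."""
    steps: List[Tuple[str, str]] = []
    for child in node.children:
        steps.extend(merge_order(child))      # first fold the child's own subbranches into it
        steps.append((child.name, node.name))  # then merge the child into this node
    return steps


mainline = Branch("mainline", [
    Branch("rel-2.0", [Branch("task-17"), Branch("task-18")]),
    Branch("rel-1.1-maint"),
])
for source, target in merge_order(mainline):
    print(f"merge {source} -> {target}")
```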

Integration Overhead

Regardless of whether changes are reconciled and synchronized immediately, or deferred to a more convenient time and place, there is always a risk of compromising the integrity of the codeline during a merge. This is the price for such an optimistic concurrency mechanism. The usual laws of thermodynamics (regarding entropy and enthalpy) apply here as well: it is usually harder to put things back together than it was to take them apart. For every branch created, there is almost always an opposing merge to be reckoned with!
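A file-level three-way merge shows where this overhead comes from. The sketch below assumes each configuration is a mapping from path to content, with a missing path meaning the file is absent; real merge tools also reconcile changes line by line within a file, which this simplified version does not attempt. Whenever both sides changed a path in different ways, it keeps the common ancestor's content (or omits the file if the ancestor had none) and reports a conflict for manual resolution.

```python
from typing import Dict, List, Tuple


def three_way_merge(base: Dict[str, str], ours: Dict[str, str],
                    theirs: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Merge two descendants of `base` file by file; return (merged files, conflicting paths)."""
    merged: Dict[str, str] = {}
    conflicts: List[str] = []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o        # both sides agree (possibly both unchanged)
        elif o == b:
            result = t        # only "theirs" changed this file
        elif t == b:
            result = o        # only "ours" changed this file
        else:
            conflicts.append(path)
            result = b        # both changed it differently: keep base, flag for a human
        if result is not None:  # None means the file is absent (or was deleted)
            merged[path] = result
    return merged, conflicts
```

Every path that lands in `conflicts` is integration work that the original branch deferred but did not eliminate.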

Integrity and Reproducibility

By separating development into isolated development paths and change-tasks, branching eases the burden of tracing changes (both physical and functional) and their dependencies. This makes configurations, features and faults easier to track, verify and reproduce. Although each merge carries with it some additional risk to codeline safety, intelligent use of branching and merging really can help to preserve codeline integrity (physical integrity, as well as conceptual integrity).

Communication, Coordination, and Productivity

If your VC tool supports symbolic branch names (rather than numeric ones) then mnemonic branch names can serve as an effective and highly visible form of communication that describes the intent of the branch and the work taking place upon it. If you aren't using such a VC tool you may need to find a way to work around this, either using a technical solution (like Virtual Codeline) or a social convention among the project team.
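Where the team relies on a social convention instead, a small check can keep branch names mnemonic. The `<purpose>-<identifier>` scheme and the four purposes below are purely hypothetical examples of such a convention; a script like this could run wherever the team creates branches (for example, from a trigger, if the VC tool supports one).

```python
import re

# Hypothetical convention: <purpose>-<identifier>, e.g. "rel-2.0" or "task-bug-1234".
BRANCH_NAME = re.compile(r"(rel|maint|task|plat)-([a-z0-9][a-z0-9.\-]*)")

PURPOSES = {
    "rel": "release line",
    "maint": "maintenance line",
    "task": "change task",
    "plat": "platform line",
}


def describe(name: str) -> str:
    """Explain what a branch is for, or reject a name that breaks the convention."""
    match = BRANCH_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"{name!r} does not follow the <purpose>-<identifier> convention")
    purpose, identifier = match.groups()
    return f"{PURPOSES[purpose]}: {identifier}"


print(describe("task-bug-1234"))  # change task: bug-1234
```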

Branching also helps communication and collaboration be effectively organized, synchronized, and parallelized. If used properly so that it isolates work instead of people, branching promotes effective teamwork and really can reduce time-to-release. If you thoughtfully apply risk-aware strategies for the selection of branching and merging styles, and periodically take a step back to review and revise the overall branching-tree, you should be able to reap the benefits of parallel development (shorter cycle-time) and keep the amount of synchronization overhead (and risk) to a manageable level.

Parallelization with Concurrency Constructs

Despite the fact that many VC tools consider branching to be one of their nicer and more advanced features, branching is in fact a somewhat low-level construct used for concurrency control. Most VC tools implement file-oriented branching but not project-oriented branching. File-oriented branching is not ideally suited for parallelization of work and workflow at coarser-grained levels beyond a single file or directory.

Using file-oriented branching to represent project-oriented branching results in a fair amount of trivial merging where revision contents need to be propagated from branch to branch with little or no difference between them (often causing unnecessary rebuilds when in fact file-contents have not changed between revisions). Good merging tools can minimize the pain and overhead associated with this, but the overhead can still be significant.
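One way to avoid the unnecessary rebuilds mentioned above is to decide what to rebuild from file contents rather than from revision identifiers, so that a trivial merge which creates a new revision with identical contents triggers nothing. This Python sketch assumes the build records a SHA-256 digest of each file it compiled; the revision identifiers in the usage lines are made up.

```python
import hashlib
from typing import Dict, List, Tuple


def digest(content: str) -> str:
    """Content fingerprint, independent of how the VC tool numbers revisions."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def files_to_rebuild(revisions: Dict[str, Tuple[str, str]],
                     last_built: Dict[str, str]) -> List[str]:
    """revisions: path -> (revision id, content); last_built: path -> digest at last build.

    A file is rebuilt only if its content differs from what was last built,
    even when a merge has given it a new revision id.
    """
    return sorted(
        path
        for path, (_revision, content) in revisions.items()
        if last_built.get(path) != digest(content)
    )


last_built = {"main.c": digest("int x;"), "util.c": digest("old")}
revisions = {
    "main.c": ("1.4.1.1", "int x;"),  # trivial merge: new revision, same content
    "util.c": ("1.7.1.1", "new"),     # real change
}
print(files_to_rebuild(revisions, last_built))  # ['util.c']
```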

Unfortunately, the majority of readily available VC tools don't provide the user with anything better. It would be far more suitable if one's VC or SCM tool provided predefined constructs which directly map to the conceptual notions of: change-sets, activities, and activity-streams, without being dependent upon branches. Then we could use the SCM tool to directly model parallel effort and workflow and let the tool itself worry about how to handle the low-level concurrency control (branching) with the help of some user-supplied policy preferences. There are a select few tools which actually do provide this capability but they are presently in the minority. So unless you are using such a tool, branching tends to be the next best mechanism for supporting parallelism.
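To make those concepts concrete, here is a minimal Python model of change-sets grouped into activities and delivered to an activity stream, with no branches in sight. All names are hypothetical. A real tool would map such deliveries onto its own low-level branching, whereas this sketch simply lets later change-sets overwrite earlier ones instead of detecting conflicts.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ChangeSet:
    """A logical change: the new contents of every file it touches."""
    ident: str
    edits: Dict[str, str]


@dataclass
class Activity:
    """A unit of work (a task, fix, or feature) made up of one or more change-sets."""
    name: str
    change_sets: List[ChangeSet] = field(default_factory=list)


@dataclass
class ActivityStream:
    """A baseline plus the activities delivered to it, in delivery order."""
    name: str
    baseline: Dict[str, str]
    delivered: List[Activity] = field(default_factory=list)

    def deliver(self, activity: Activity) -> None:
        self.delivered.append(activity)

    def configuration(self) -> Dict[str, str]:
        """Compute the stream's current files by replaying delivered change-sets."""
        files = dict(self.baseline)
        for activity in self.delivered:
            for change_set in activity.change_sets:
                files.update(change_set.edits)
        return files


stream = ActivityStream("rel-2.0", {"a.c": "v1", "b.c": "v1"})
stream.deliver(Activity("fix-bug-42", [ChangeSet("cs1", {"a.c": "v2"}),
                                       ChangeSet("cs2", {"b.c": "v2"})]))
print(stream.configuration())
```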

Branching Topology Comprises Workflow

The result of using all these branching patterns is a version branch tree structure that, for the most part, represents the intended structure of activity workflow for the project. One might regard this as a simple byproduct of Conway's Law, namely that "Architecture follows Organization" (see [Cope95]). In the case of branching for parallel software development, we might rename this as a corollary to Conway's Law and call it "Branching Topology Comprises Workflow."

What this means is that tool-generated diagrams and queries/reports can show version trees which closely conform to the intended work breakdown structure (WBS) for the project team. This helps visibly track and communicate status and progress in "real-time" to all users of the VC tool and repository.

The branching tree of a project represents the structure of its evolution in terms of change-flows. The flow of work activities is also an important project structure. Streamed Lines attempts to coordinate these two sets of structures so that activity and workflow conveniently map to change-flows (using branches as the grouping mechanism). This helps make the project's development and evolution easier to conceptualize and manage. In this manner, Streamed Lines assists in bringing some of the architectural and management structures of a software project into alignment.


Appendix A - Streamed Lines: The Patterns


Acknowledgements

The authors would like to give special thanks to the following people for their significant contributions to Streamed Lines:


References

