Tag Archives: DVCS

The Evolution of the Landing Cycle

The canonical commit/push/land cycle at Mozilla

[This is an experiment in publishing a doc piece by piece as blog entries. Please refer to the main page for additional context.]

Untangling the terminology

In the old days, before DVCS, “commit” had only one real purpose. It was how you published your work to the rest of the world (or your project’s world at least). With DVCS, you are likely committing quite often, but still only occasionally publishing.

In the early days of hg at Mozilla, you would “push your changes”. But with the advent of social coding sites such as bitbucket & github, you often push your changes there as well. Or you push to our try servers.

Informally, Mozillians use the term “landing” to describe committing a change to one of the official repositories (also known as “repository of record” or RoR). So while most of the online documentation refers to “commit”, I’ll use the less ambiguous term “landing” here.

The Evolution of the Landing Cycle

It happened slowly, so we may not have noticed it, but the distinctions between “commit” and “land” have changed over time. Can we benefit from embracing these changes and extending them further? I believe so, even though that might require us to view our toolsets differently.

The old version

With centralized VCS, the cycle was roughly like:

digraph old_cycle {
    rankdir="BT"
    RoR_1 [label="RoR T(0)"]
    RoR_2 [label="RoR T(1..n-2)"]
    RoR_3 [label="RoR T(n-1)"]
    RoR_n [label="RoR T(n)"]

    { rank=same
        other_pc [label="everyone else's PC"]
        dev_pc [label="Dev's PC", fontcolor="green", color="green"]
    }

    { rank=same
        edge [style="dotted", color="blue"]
        RoR_1 -> RoR_2 -> RoR_3
        RoR_n
    }

    {
        RoR_1 -> other_pc [style="invis", weight="50"]
        edge [penwidth="1", style="dashed", color="cyan"]
        other_pc -> RoR_2
        other_pc -> RoR_2
        other_pc -> RoR_2
        other_pc -> RoR_3
    }

    edge [penwidth="4", color="green", fontcolor="green"]
    RoR_1 -> dev_pc [label="initial checkout"]
    RoR_3 -> dev_pc [label="update\nbefore landing"]
    dev_pc -> RoR_n [label="landing change"]
}

The new version

With DVCS, the cycle has been expanded to account for all the “social coding” that goes on between developers directly and indirectly via (public or private) hosted repositories.

NOTE

Nothing has changed w.r.t. how developers interact with the RoR!

digraph new_cycle {
    rankdir="BT"
    RoR_1 [label="RoR T(0)"]
    RoR_2 [label="RoR T(1..n-2)"]
    RoR_3 [label="RoR T(n-1)"]
    RoR_n [label="RoR T(n)"]

    { rank=same
        edge [style="dotted", color="blue"]
        RoR_1 -> RoR_2 -> RoR_3
        RoR_n
    }

    { rank=same
        other_pc [label="everyone else's PC"]
        dev_pc [label="Dev's PC", fontcolor="green", color="green"]
    }

    { rank=same
        bitbucket [label="Bitbucket", color="crimson"]
        github [label="GitHub", color="crimson"]
    }

    {
        RoR_1 -> other_pc [style="invis", weight="50"]
        edge [penwidth="1", style="dashed", color="cyan"]
        other_pc -> RoR_2
        other_pc -> RoR_2
        other_pc -> RoR_2
        other_pc -> RoR_3
    }

    {
        edge [penwidth="4", color="green", fontcolor="green"]
        RoR_1 -> dev_pc [label="initial checkout"]
        RoR_3 -> dev_pc [label="update\nbefore landing"]
        dev_pc -> RoR_n [label="landing change"]
    }

    {
        edge [color="crimson"]
        other_pc -> bitbucket
        bitbucket -> other_pc
        other_pc -> bitbucket
        github -> other_pc
        other_pc -> github
        github -> other_pc
        dev_pc -> github
        dev_pc -> bitbucket
        dev_pc -> other_pc
        other_pc -> dev_pc
    }
}

So What?

Besides demonstrating my skill level at graphviz, what can be learned from these diagrams? I think something very special:

There are two separate work flows here, and they interact only in well defined ways.

or

The workflow of development is distinct from the workflow of landing changes into the official repositories.

I assert that we can make the following conclusion from that observation:

Conclusion

There is no hard requirement that both the developer and the official repositories use identical tooling. The hard requirement is that the contract between the two continues to be met.

This really shouldn’t be newsworthy – it is actually the way things have always been. Before DVCS, every developer did their own source code management on their computer. Maybe it was multiple checkouts, tarballs, patch sets, but chances are it had nothing to do with the tool used for the RoR. Developers even managed to exchange information with each other, again via tarballs, emailed patches, directory permissions and a host of other methods outside of the RoR tooling. (Personally, my first production use of hg was to follow the pattern Mozilla documented for using hg with CVS. That was wonderfully freeing, as it gave me the safety net of local commits and local branches. As a developer, I wanted and could make good use of far more flexibility than any well-designed release system would let me have.)

But wait, there’s more

What isn’t shown above is that there is another interface on the other side of the repositories of record. That interface is the contract for using the buildfarm/testfarm/tbpl/L10N machinery that is being run by various groups, all pulling from the repositories of record. (For a zoomed view of this diagram, see this post.)

digraph layers {
    node [shape="octagon"]
    dev_world
    repositories_of_record [shape=rectangle]
    buildbot_world

    dev_world -> repositories_of_record [label="landing work flow",
                                         arrowhead="vee"]
    repositories_of_record -> dev_world [arrowhead="vee"]
    repositories_of_record -> buildbot_world [label="buildbot work flows",
                                              arrowhead="vee"]
    buildbot_world -> repositories_of_record [arrowhead="vee"]
}

Because this “backend API” exists, even if we change the tooling used to land changesets, it would have to be done in a manner that continues to support the automated usage of the RoRs. That workflow is not a “natural fit” for any DVCS (arguably especially not git). [1]

Another way to put that: if we can find a clean, quick way for developers to continue using hg as the only repository of record, we isolate them from tracking and complying with changes in the RoR -> buildbot contract. That’s the path I’m currently following.

Decoupling. It’s a good thing!

Footnotes:

[1]

Yes, the “backend API” could, theoretically, be changed. At the very least, that is a large (hence long term – possibly years) project, due to both the number of systems involved, and the need to keep those systems up 24x7x365.

To provide more timely improvements in the process, I’m focussed on shorter term deliveries.

… proposed changes to commit workflow

With all the changes to support git, how will that affect a committer’s workflow? (For developer impact, see this post.)

The primary goal is to work within the existing Mozilla commit policy [1]. Working within that constraint, the idea is "as little as possible", and this post will try to describe how big "as little" is.

Remember: all existing ways of working with hg will continue to work! These are just going to be some additional options for folks who prefer to use github & bitbucket.

Introduction

To start things off, here are the simplified workflows that describe how things currently work with the Mercurial repositories on hg.m.o and following commit policy. The major differences between the committer’s workflow and the developer’s workflow are:

  • the source of the changes (could be patches from developers without commit access, in addition to development work done directly by the committer).
  • the ultimate destination repository is the repository of record hosted on hg.mozilla.org.

The following table should match today’s workflow for landing someone else’s (or their own) approved patch:

Committer Action        | Steps Involved                | Pseudo Commands
------------------------|-------------------------------|----------------
First time checkout [2] |                               |
                        | get personal copy             |
                        | get copy on PC                | hg clone hg.m.o PC
                        | tweak for 1 step commit to hg | add hg.m.o as pushable path
Work Cycle              |                               |
                        | incorporate r+ work [3]       | hg patch OR pull from github or bitbucket (may not match r+ attachment) [4]
                        | resolve conflicts             |
                        | local work                    | hg commit
Commit Process          |                               |
                        |                               | hg push hg.m.o
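
The "add hg.m.o as pushable path" step can be sketched roughly as below. This is only an illustration, not the documented Mozilla setup: the repository name `mozilla-central`, the `ssh://` push URL, and the `REPO_DIR` scratch directory are my assumptions. The `[paths]` section and its `default-push` key are standard Mercurial configuration.

```shell
#!/bin/sh
# Sketch only: give a local clone a dedicated push destination.
# ASSUMPTIONS: the repo name, the push URL and REPO_DIR are illustrative.
# REPO_DIR defaults to a scratch directory standing in for a real clone.
set -eu

REPO_DIR="${REPO_DIR:-$(mktemp -d)}"
HGRC="$REPO_DIR/.hg/hgrc"
mkdir -p "$REPO_DIR/.hg"

# [paths] holds named repository locations for this clone.
# default-push is the location a plain "hg push" sends changesets to.
cat >> "$HGRC" <<'EOF'
[paths]
default-push = ssh://hg.mozilla.org/mozilla-central
EOF

echo "push path recorded in $HGRC"
```

With a path like this recorded in a real clone, the final "hg push" row of the table needs no URL on the command line.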

Changes for Bitbucket

As bitbucket uses mercurial as its primary DVCS, you might expect there to be not too many changes. That’s correct – there are just some additional steps on bitbucket so other users can see your work.

The changes from hg.m.o are italicized.

Committer Action        | Steps Involved                | Pseudo Commands
------------------------|-------------------------------|----------------
First time checkout [2] |                               |
                        | get personal copy             | bitbucket fork
                        | get copy on PC                | hg clone bitbucket PC
                        | tweak for 1 step commit to hg | add hg.m.o as pushable path
Work Cycle              |                               |
                        | incorporate r+ work [3]       | hg patch OR pull from github or bitbucket (may not match r+ attachment) [4]
                        | resolve conflicts             |
                        | local work                    | hg commit
Commit Process          |                               |
                        |                               | hg push hg.m.o

Changes for Github

For github, there are a few additional steps that will be supported by scripts. As above, the differences from hg.m.o are italicized.

Committer Action        | Steps Involved                | Pseudo Commands
------------------------|-------------------------------|----------------
First time checkout [2] |                               |
                        | get personal copy             | github fork
                        | get copy on PC                | git clone github PC
                        | tweak for 1 step commit to hg | run script to setup hg repo & git post commit hooks & magic_refs [5]
Work Cycle              |                               |
                        | incorporate r+ work [3]       | apply patch manually OR pull from github or bitbucket (but may not match r+ attachment) [4]
                        | resolve conflicts             |
                        | local work                    | git commit
Commit Process          |                               |
                        |                               | git push magic_ref [5]

Notes

[1] There is a strong case to be made for revising the commit policy to account for both new tooling (e.g. github/bitbucket) and new Mozilla projects (e.g. everything beyond Firefox/Thunderbird). A policy discussion involves a lot more groups, and is explicitly out of scope of this effort.
[2] Assumes the committer keeps a "clean repo" for all push-to-moz actions (best practice).
[3] Due to patch rot, the committer may need to make adjustments to the patch. It’s up to the authorized committer to decide if such changes require re-approval.
[4] Directly incorporating a pull request (from github, bitbucket, or any other hosting service) is straightforward with the hg-git extension installed in Mercurial, if the other repo was based on the same original. This requires coordination among the team members.
[5] This is how several developers now work; what’s new is the wrapper script to make it easy to set up. See the reference section of the main project page for links to their original code (which will be the basis for this project).
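
Note [4] leans on the hg-git extension. As a minimal sketch, assuming hg-git is already installed, enabling it is a two-line Mercurial config entry. The `HGRC_FILE` variable is a scratch file I use here for illustration; a real setup would edit the user's own hgrc.

```shell
#!/bin/sh
# Sketch only: enable the hg-git extension in a Mercurial config file.
# ASSUMPTIONS: hg-git is already installed; HGRC_FILE is a scratch file
# standing in for the user's real hgrc.
set -eu

HGRC_FILE="${HGRC_FILE:-$(mktemp)}"

# Leaving the value after "hggit =" empty asks Mercurial to find the
# extension on its normal search path.
cat >> "$HGRC_FILE" <<'EOF'
[extensions]
hggit =
EOF

echo "hg-git enabled in $HGRC_FILE"
```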

… proposed changes to dev workflow

[Refer to the main page for additional context.]

With all the changes to support git, how will that affect a developer’s workflow? (The committer’s workflow will be covered in a future post.)

The idea is "not much at all", and this post will try to define "not much".

Remember: all existing ways of working with hg will continue to work! These are just going to be some additional options for folks who prefer to use github & bitbucket.

Introduction

To start things off, here are the simplified workflows that describe how things currently work with the Mercurial repositories on hg.m.o and following commit policy.

This should match today’s workflow for development (not commit):

Developer Action    | Steps Involved               | Pseudo Commands
--------------------|------------------------------|----------------
First time checkout |                              |
                    | get copy on PC               | hg clone hg.m.o PC
                    | tweak for 1 step push to try | add try as pushable path
Work Cycle          |                              |
                    | local work [1]               | hg commit
                    | push to try [1]              | hg push magic_try_path
Review Process      |                              |
                    | create patch                 | attach patch to bug
Commit Process      |                              |
                    | nag committer                |

Changes for Bitbucket

As bitbucket uses mercurial as its primary DVCS, you might expect there to be not too many changes. That’s correct – there are just some additional steps on bitbucket so other users can see your work.

The changes from hg.m.o are italicized.

Developer Action    | Steps Involved               | Pseudo Commands
--------------------|------------------------------|----------------
First time checkout |                              |
                    | get personal copy            | fork on bitbucket
                    | get copy on PC               | hg clone bitbucket PC
                    | tweak for 1 step push to try | add try as pushable path
Work Cycle          |                              |
                    | local work [1]               | hg commit
                    | push to try [1]              | hg push magic_try_path
Review Process      |                              |
                    | create patch                 | attach patch to bug OR hg push bitbucket and bitbucket pull request
Commit Process      |                              |
                    | nag committer                |

Changes for Github

For github, there are a few additional steps that will be supported by scripts. As above, the differences from hg.m.o are italicized.

Developer Action    | Steps Involved               | Pseudo Commands
--------------------|------------------------------|----------------
First time checkout |                              |
                    | get personal copy            | github fork
                    | get copy on PC               | git clone github PC
                    | tweak for 1 step push to try | run script to setup hg repo & git post commit hooks & refs [2]
Work Cycle          |                              |
                    | local work [1]               | git commit
                    | push to try [1]              | git push magic_try_remote
Review Process      |                              |
                    | create patch                 | attach patch to bug OR git push github and github pull request
Commit Process      |                              |
                    | nag committer                |

Notes

[1] These actions affect the local repository only and may actually be ‘mq’ operations for hg and ‘developer branch’ operations for git. The workflow for commit to the master repository will be in a following post.
[2] A number of existing developers have already developed scripts to support this. These will be unified and documented as part of the rollout.

…DVCS Survey Summary

[Refer to the main page for additional context.]

Summary

A long time ago (December of 2011), I sent out a brief survey on DVCS usage to Mozilla folks (and asked them to spread wider). While there were only 42 responses, there were some interesting patterns.

Disclaimer!

I am neither a statistician nor a psychometrician.

I believe you can see the raw summary via this link. What follows are the informal inferences I drew from the results (remember the disclaimer).

Commit to git is not the issue:

  • Several level 3 committers volunteered [1] the information that commit to git would be of little benefit to their productivity, due to tooling they have.
  • There are several developer written toolkits being used to ease working with hg.m.o. They include transparent push-to-try from git.

Git is not the issue:

  • The number of "passionate for git" is about equal to "passionate for hg" (small survey size).
  • One respondent volunteered [1] that he found git very difficult to explain to new community members; others felt the opposite.

The Issue Is…

The issue is some level of official support of social coding sites:

  • 86% mentioned "pull requests" as a better review process than patch submittal.
  • A number of volunteered comments expressed some level of confusion, including which github repo to use, how to start a new project, what workflows to use to interact with hg.m.o, etc.

A Plan

These survey results, plus some technical investigation into what was possible with given tools, led to forming the (very aggressive) 2012 Q1 goal captured in bug 713782. A looser summary of that follows.

End the confusion!

  • Provide officially supported repositories for releng-supported products on github & bitbucket
  • Coordinate tooling to allow git to work with hg.m.o
  • Document "how to set up git to work with releng" [2]

Prepare for disaster

  • ensure every releng supported product has an inside-the-firewall repository of record.

Notes

[1] When the survey allowed text input, and responses were specific beyond the question, I count that as a "volunteered" answer of greater "weight" than a checkbox response.
[2] There are several categories of Mozilla source code on github: Product code, SAAS code, and experimental code are the big ones. The scope of these recommendations is only for product code that will be built, tested, and released by the releng machinery.

Wowza! That’s a Feature!

 

tl;dr

Wowza! I found the killer feature in git – you can have your cake and eat it, too!

Every time I’ve had to move to a new VCS, there’s never been enough time available to move the complete history correctly. Linux had this problem in spades when they moved off BitKeeper onto git in a very short time.

The solution? Take your time to convert the history correctly (or not, you can correct later), then allow developers who want it to prepend it on their machines, without making their repo “different” from the latest one.

I only found a reference to this feature last Friday – it used to require breaking the warranty, but is now supported as a first-class command. More later, but to see how (and create your own prepended history), see: https://github.com/hwine/git_playground/tree/master/prepend_history

Much thanks to Scott Chacon and his writeup on “git replace” at http://progit.org/2010/03/17/replace.html

As my README shows, I have some more details to learn, but for now, just WOWZA!!!!
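
For the curious, here is a minimal sketch of the prepend idea using `git replace --graft`. It is my own throwaway illustration, not the script from the playground repo linked above. The repo names, file names, and commit messages are all made up, and it assumes a git recent enough to have the `--graft` option.

```shell
#!/bin/sh
# Sketch only: prepend an "old history" onto a repo with `git replace --graft`.
# ASSUMPTIONS: repo names, file names and commit messages are made up;
# needs a git that supports `git replace --graft`.
set -eu

WORK="$(mktemp -d)"
# Identity for the throwaway commits below.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# 1. The carefully converted old history: two commits.
git init -q "$WORK/history"
cd "$WORK/history"
echo v1 > file.txt; git add file.txt; git commit -q -m "old: first"
echo v2 > file.txt; git commit -q -am "old: second"
OLD_TIP="$(git rev-parse HEAD)"

# 2. The repo everyone uses: starts from a snapshot, then moves on.
git init -q "$WORK/current"
cd "$WORK/current"
echo v2 > file.txt; git add file.txt; git commit -q -m "new: import snapshot"
NEW_ROOT="$(git rev-parse HEAD)"
echo v3 > file.txt; git commit -q -am "new: more work"
TIP_BEFORE="$(git rev-parse HEAD)"

# 3. A developer who wants the old history fetches its objects ...
git fetch -q "$WORK/history" HEAD
# ... and records locally: "treat NEW_ROOT as if its parent were OLD_TIP".
git replace --graft "$NEW_ROOT" "$OLD_TIP"

TIP_AFTER="$(git rev-parse HEAD)"
COUNT="$(git rev-list --count HEAD)"
echo "commits visible from HEAD: $COUNT"
echo "tip before: $TIP_BEFORE"
echo "tip after:  $TIP_AFTER"
```

The point to notice is that the tip commit id is the same before and after the graft. The replacement lives in a local `refs/replace/` ref, so the developer's repo is not “different” from everyone else's, yet `git log` now walks into the old history.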

Release “As-Is” at high level

[Refer to the main page for additional context.]

Where we are in January 2012

The purpose of this post is to present a very high level picture of the current Firefox build & release process as a set of requirements. Some of these services are provided or supported by groups outside of releng (particularly IT & webdev). This diagram will be useful in understanding the impact of changes.

System Scope

Here is how I’m defining the boundaries of the current system:

  • Applies to anything & everything involved in producing artifacts that are directly installed on an end user’s device.
    • In: Firefox, Thunderbird, Seamonkey
    • Not in: services such as browser id, sync, addons.m.o, apps, and other code that executes on Mozilla’s servers.
    • Not in: extensions themselves (they are installed into FF/TB, not directly onto the end user’s device). Yes, that is a hair I want to split. [1]
  • But only if the entire process happens "inside the Mozilla firewall". (For example, the download mirrors are not in scope.)
[1] There may be some extensions, such as Thunderbird’s Lightning, that cross this line. For now, I’ll ignore those.

The Developer Services

N.B. these are only repository related services that a developer commonly accesses.

With this setup, there is official support for:

  • pulling from Mercurial repositories hosted at mozilla.org (hg.m.o)
  • commit access from developer’s own clone of the Mercurial repository (with committer policy enforcement)
  • "try server" access from developer’s own clone of the Mercurial repository
  • continuous integration based on commits to hg.m.o
  • tracking of results of commit compilation & testing, via tbpl.m.o.
  • web viewing of repositories
  • web cross reference of many repositories on mxr.m.o

The Picture

Block diagram

The Ideal DVCS Goal

[Refer to the main page for additional context.]

Based on discussions to date, everyone seems to have similar ideas about what "supporting git for releng" means. Later posts will highlight the work needed to ensure the ideal can be achieved, and how to arrive there.

For this post, I intend to limit the viewpoint and scope to that of the developer impact. Release notions (such as "system of record") and scaling issues won’t be mentioned here. (N.B. Those concerns will be a key part of the path to verifying feasibility, but do not change the goal.)

As a reminder, I’m just talking about repositories that are used to produce products. [1]

The Ideal Future

The Ideal: In the future, each contributor will be as free to choose their DVCS of choice as they currently are to choose a text editor.

Since that’s a pretty general statement, let me limit it a bit with some caveats that I expect will apply.

  • This may only be true at a technical level, as various teams might have social conventions or alternate tooling that will limit DVCS choices.
  • Initial planning will be for products [1] that are released via the Release Engineering team. [2]
  • The current set of supported DVCS would be hg and git. By supported, I think folks roughly mean:
    • an officially supported way to get the latest revision of any official branch/repository, without any extra work on the part of the developer.
    • an officially supported way to interact with existing Mozilla hosted tools, such as the try server, tbpl, etc.
    • an officially supported way to commit from the DVCS, that supports the current commit workflow (and policy).
  • There will be support of social coding sites. I think support means providing an officially supported mechanism to keep "current" instances of each official branch in the social coding sites. The goal would be to allow developers to utilize the enhanced services offered by those sites to view, clone, and share work prior to doing an official commit.

Next Steps

  1. This post will help validate that the above vision has a wider consensus. Please pass this link along and add your comments below.
  2. Keep an eye on the main page for more material.

[1] I’m using "products" to distinguish binaries Mozilla provides to the end users to install on their computers from "services" Mozilla hosts. Product examples include Firefox and Thunderbird. Service examples include Sync and BrowserID.
[2] The rationale for starting with existing products is that this is where the majority of our developers work. Additionally, the existing continuous build & test infrastructure will place some constraints on DVCS workflows. Once those constraints are documented, other projects can make informed decisions on their usage.

Releng & Git Project Overview

This is the first in a series of posts about the "support git in releng" project. The goal of this project, as stated in bug 713782, is:

… The idea here is to see if we can support git in Mozilla’s RelEng infrastructure, to at least the same standard (or better) as we already currently support hg.

My hope is that blog posts will be a better forum for discussion than the tracking bug 713782, or a wiki page, at this stage.

These posts will highlight the various issues, so that the vague definitions above become clear, as do the intermediate steps needed to achieve completion.

Glossary

The project goal (above) has a few terms that need defining, so they mean the same thing to all of us. And the discussion will introduce some terms as well. As these come up, they will be added here.

Mozilla Products
    Software delivered to the end user for installation directly on their computer.
Releng Infrastructure
    The processes & machines that are used to produce Mozilla Products, from the source code repository through continuous integration and testing to the product binaries ready for the end user.

Survey Surprise

[Just a quick note, because I love being surprised. More details on the survey after I scrub it some more.]

As some of you know, I put out a survey to learn a bit about how Mozillians are using hg & git & github. (Don’t worry if you missed it, as further incarnations of it are likely to be coming as I work through the support-git project.) First, a disclaimer: this survey was not rigorous (I’m not a psychometrician) and the sample size was quite small (42 respondents), from a skewed selection of the community (mostly employees).

I expected to find quite a few folks answering John’s question with "it’s not git, it’s github", and I did. But what did surprise me is why they liked github. It had little to do with defect tracking, code review mechanisms, or other proprietary workflow enhancements. (These were cited by some, and will be in further analysis.)

It had everything to do with "social coding", and the positive impact it has on Community!

Two thirds of respondents who expressed an opinion on github cited "social coding" as a major reason. WOW! Some of the more exciting answers to "What I like best about github is" included:

How you can easily see what people are doing with forks of your project.

Discoverability. Put your [stuff] on GitHub and people will not only find it but also fix it. Simple as that.

incredibly low barrier to hosting code in a highly visible way, and collaborating with other projects.

Ease of collaboration, I get pull requests from people I don’t even know.

I had previously heard from project leaders that they perceived that social coding sites (like github & bitbucket) increased community involvement. Some of these comments suggest we may be able to collect data on the effect.

I do enjoy surprises like this – they help clear away the cobwebs of "stuff that ain’t so" from my brain. This point may well help me focus the abstract "bring git to parity" into "remove impediments to using social coding sites".