This planet aggregates posts related to version control and
distro packaging.
Please refrain from using it as a discussion
forum.
You can add your own feed to the list of feeds.
The other day, Romain shared his concerns about using Git for team-maintained packaging. His comment system is currently broken, so I wrote an e-mail reply, which I would like to share.
I agree with Romain that the design decision to not support subtree checkouts like SVN is not without problems. As opposed to a single SVN repo with components in subdirectories that you can individually check out, you might end up with a hundred Git repos, and making the same change to all of them then requires iterating over all 100.
I’d like to make the distinction between trivial changes (e.g.
s/© 2008/&-2009/g) and those that might not be
(e.g. Standards-Version, or something even more
elaborate).
In case of the former, there’s no question, it can be painful to operate across a hundred repos. Tools like mr make that a bit easier, but it’s still far from optimal.
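To make the trivial case concrete, here is a minimal Python sketch of that substitution applied to every checkout below one directory. The function names, the `debian/copyright` path and the one-sub-directory-per-repository layout are assumptions made for illustration; committing the result in each repository is deliberately left out.

```python
import re
from pathlib import Path
from typing import List


def bump_copyright(text: str) -> str:
    """Python equivalent of sed's s/© 2008/&-2009/g ("&" is the whole match)."""
    return re.sub(r"© 2008", r"\g<0>-2009", text)


def bump_all(parent: Path, relative: str = "debian/copyright") -> List[Path]:
    """Rewrite `relative` in every sub-directory of `parent`; return changed files."""
    changed = []
    for repo in sorted(p for p in parent.iterdir() if p.is_dir()):
        target = repo / relative
        if not target.is_file():
            continue  # this checkout has no such file
        old = target.read_text(encoding="utf-8")
        new = bump_copyright(old)
        if new != old:
            target.write_text(new, encoding="utf-8")
            changed.append(target)
    return changed
```

The rewrite itself is the easy part; what remains tedious is the per-repository commit and push, which is where a tool like mr comes in.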
The latter, however (updating Standards-Version
and adding the appropriate changelog entry), is not really
comparable. Neither would be, e.g., changing a file location in 100
different repos. In those cases, every single package needs manual
intervention, if only for quality assurance and testing. In this
sense, I actually think that a single SVN checkout with all the
subtrees and the possibility to easily commit the result of a
recursive action is counter-productive.
On the other hand, I don’t say that I am pleased with the workflow Git (or any other DVCS for that matter) imposes. It’s sometimes quite painful, as Romain says. We are missing higher-level tools that allow for easier and more intuitive bulk operations. I think that they should be implemented outside of the VCS-tool though, true to the Unix principles. SVN integrates it all into a monolithic piece of software, and that often isn’t ideal either (think size and slowness, or backup weight, or chance of corruption, or granular access control, or the impossibility to properly track files across subtrees).
mr is a step in the right direction, and we need
more tools along those lines. First, however, I think that people
need to figure out how exactly to use DVCS for packaging, such that
there is any chance of consolidating workflows across a larger
number of packages; if everyone does it their own, slightly unique
way, then that goal is infinitely far away. This is the reason I
started vcs-pkg.org, and even
though we’re still far from anywhere, I am quite pleased with what
we’ve done so far.
If you’re at Debcamp or DebConf, maybe you could join the discussion.
Romain also mentioned that distributed VCS don’t allow for the same sort of centralisation as SVN does. I disagree: you can use Git in exactly that way, as a centralised repo from which packages are built. The nice advantage over SVN (one which svk tried to close) is the ability for everyone to easily branch/fork, or work offline.
Once you start down that path, it somehow inherently becomes everyone’s own responsibility to ensure that one’s changes end up in the central repository (where commit hooks might verify the build-ability, ensure that the test suite still passes, or run simple format/consistency checks).
This sort of workflow is very different from the one with a self-appointed benevolent dictator at the top, who (like Linus, or Junio for Git) sometimes forgets to include patches because he is flooded with them. The question is really: Given that you need some sort of centralised release coordination, do you want a human or a repo to be the central entity (and single point of failure)?
I really prefer the repo, since that places the sole responsibility on the leaves, on the contributors, who need to see their code through all the way.
It’s a whole lot more rewarding to commit/push, get a rejection, pull, merge, commit/push, and be done, rather than to send a patch to upstream, wait, reping, notice that it’s not in the new release, ask, ping, change, reping, get angry, ping, hope, wait, ping, wonder why the heck you are still doing this, write angry email but don’t send it, reping, ask, and finally notice that it’s been accepted after all.
NP: Deep Purple: Made in Japan
Posted Fri 19 Jun 2009 06:28:01 UTC
This is a continuation from before. I am digressing a little in this post. One of the things I want to get out of this exercise is to learn more about ontologies and ontology editors. On the principle that you can never learn something unless you build something with it (aka bone knowledge), this post gathers my thoughts to get started on creating an ontology for package building. Perhaps this has been done before, and better, but I’ll probably learn more trying to create my own.
Also, I am playing around with code, an odd melange of my
package building porcelain, and gitpkg, and other
ideas bruited on IRC, and I don’t want to blog about
something that would be embarrassing in the long run if some of the
concepts I have milling around turn out to not meet the challenge
of first contact with reality.
I want to create an ontology related to packaging software. It should be general enough to cater to the needs of any packaging effort in a distribution agnostic and version control agnostic manner. It should enable us to talk about packaging schemes and mechanisms, compare different methods, and perhaps to work towards a common interchange mechanism good enough for people to share the efforts spent in packaging software.
The ontology should be able to describe common practices in packaging, concepts of upstream sources, versioning, commits, package versions, and other meta-data related to packages.
I am doing this ontology primarily for myself, but I hope this might be useful for other folks involved in packaging software.
So, here follow a set of concepts related to packaging software, people who like pretty pictures can click on the thumbnail on the right:
- software is a general term used to describe a collection of computer programs, procedures and documentation that perform some tasks on a computer system.
- software is what we are trying to package
- software has names
- software may exist as
- source code
- executable code
- packaged code
- source code is any collection of statements or declarations written in some human-readable computer programming language.
- source code is usually held in one or more text files (blobs).
- A large collection of source code files may be organized into a directory tree, in which case it may also be known as a source tree.
- The source code may be converted into an executable format by a compiler, or executed on the fly from the human readable form with the aid of an interpreter.
- executable format is the form software must be in in order to be run. Running means to cause a computer “to perform indicated tasks according to encoded instructions.”
- software source code has one or more lines of
development. Some common lines of development
for the software to be packaged are:
- upstream line of development
- feature branch is a line of development related to a new feature under development. Often the goal is to merge the feature branches into the upstream line of development
- usually, all feature branches are merged into the integration branch, and the package is created from the integration branch.
- integration branch is the line of development of software that is to be packaged
- some software lines of development have releases
- releases have release dates
- some releases have release versions
- source code may be stored in a version control repository, and maintain history.
- Trees are a collection of blobs and other trees (directories and sub-directories). A tree object describes the state of a directory hierarchy at a particular given time.
- Blobs are simply chunks of binary data - they are the contents of files.
- a tree can be converted into an archive and back
- In git, directories are represented by tree objects. They refer to blobs that hold the contents of files (the file name, access mode, etc. are all stored in the tree), and to other trees for sub-directories.
- Commits (or “changesets”) mark points in the history of a line of development, and hold references to parent commits.
- A commit refers to a tree that represents the state of the files at the time of the commit.
- HEAD is the most recent commit in a line of development or branch.
- A working directory is a directory that corresponds, but might not be identical, to a commit in the version control repository
- Commits from the version control system can be checked out into the working directory
- uncommitted changes are changes in the working directory that make it different from the corresponding commit. A working directory with such changes is said to be in a “dirty” state.
- uncommitted changes can be checked in to the version control system, creating a new commit
- The working directory may contain an ignore file
- ignore file contains the names of files in the working directory that should be “ignored” by the version control system.
- In git, a commit may also contain references to
parent commits.
- If there is more than one parent commit, then the commit is a merge
- If there are no parent commits, it is an initial commit
- references, or heads, or branches, are movable references to a commit. On a fresh commit, the head or branch reference is moved to the new commit.
- lines of development are usually stored as a branch in the version control repository.
- A new branch may be created by branching from an existing branch
- a patch is a file that contains difference listings between two trees.
- A patch file can be used to transform (patch) one tree into another (tree).
- A quilt series is a method of representing an integration branch as a collection of a series of patches. These patches can be applied in sequence to the upstream branch to produce the integration branch.
- A tag is a named reference to a specific commit, and is not normally moved to point to a different commit.
- A package is an archive format of software created to be installed by a package management system or a self-sufficient installer, derived by transforming a tree associated with an integration branch.
- packages have package names
- package names are related to upstream software names
- packages have package versions
- package versions may have
- an upstream version component
- a distribution or packaging specific component
- package versions are related to upstream software versions
- helper packages provide libraries and other support facilities to help compile an integration branch ultimately yielding a package
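The version control part of these concepts can be written down as a small data model. The following Python sketch is only one possible reading of the list above, not an implementation of any existing tool: the class and method names are invented here, and the label "ordinary" for a single-parent commit is my own term.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple, Union


@dataclass
class Blob:
    data: bytes  # file contents only; the name and mode belong to the tree


@dataclass
class Tree:
    # maps an entry name to a blob (file) or to another tree (sub-directory)
    entries: Dict[str, Union[Blob, "Tree"]] = field(default_factory=dict)


@dataclass
class Commit:
    tree: Tree                           # state of the files at commit time
    parents: Tuple["Commit", ...] = ()   # references to parent commits
    message: str = ""

    def kind(self) -> str:
        if not self.parents:
            return "initial"             # no parent commits
        if len(self.parents) > 1:
            return "merge"               # more than one parent commit
        return "ordinary"


@dataclass
class Branch:
    name: str
    head: Commit                         # movable reference to a commit

    def commit(self, tree: Tree, message: str = "") -> Commit:
        """Create a commit on top of the head and move the branch to it."""
        new = Commit(tree, (self.head,), message)
        self.head = new
        return new
```

A tag would look like `Branch` without the `commit` method, since it names a commit and is not normally moved.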
This is a continuation from before.
Before I go plunging into writing code for a generic
vcs-pkg implementation, I wanted to take a close look
at my current, working, non-generic implementation: making sure
that the generic implementation can support at least this one
concrete work-flow will keep me grounded.
One of the features of my home grown porcelain for building
packages has been that I use a fixed layout for all the packages I
maintain. There is a top level directory for all working trees.
Each package gets a sub-directory under this working area. And in
each package sub-directory are the upstream versions, the checked
out VCS working directory, and anything else package related. With
this layout, knowing the package name is enough to locate the
working directory. This enables me to, for example, hack away at a
package in Emacs, and when done, go to any open terminal window,
and say stage_release kernel-package or
tag_releases ucf without needing to know what the
current directory is (usually, the package’s working directory is
several levels deep:
/usr/local/git/debian/make-dfsg/make-dfsg-3.91, for
instance).
However, this is less palatable for a generic tool – imposing a
directory structure layout is pretty heavy. And I guess I can
always create a function called cdwd, or something, to
take away the tedium of typing out long cd
commands.
Anyway, looking at my code, here is the information that the scripts seem to need in order to do their work.
- Staging area. This is where software to be built is
exported (and this area is visible from my build virtual machine).
- User specified (configuration)
- Working Area. This is the location where all my
packaging work happens. Each package I work on has a sub-directory
in here, and the working directories for each package live in the
package sub-directory. Note: Should not be needed.
- User specified.
- Working directory. This is the checked out tree from the
VCS, and this is the place where we get the source tree from which
the package can be built.
- Since we know the location of the working area, if the package
name is known, we can just look in the package’s sub-directory in
the working area.
  - For rpm based sources, look for the spec file
  - For Debian sources, locate debian/rules
- If package name is not known, look for spec or debian/rules in the current directory, and parse either the spec file or debian/changelog.
- If in a VCS directory, look for the base of the tree
  - tla tree-root
  - bzr info
  - git rev-parse --show-cdup
  - hg root
  - You have to climb the tree for subversion
- If you are in a debian directory, and changelog and rules files exist
Then, look for the spec file or debian/rules in the base directory
- package name
- User specified, on the command line
- If in the working directory of the package, can be parsed from
the spec or changelog files.
- upstream tar archive
- Usually located in the parent directory of the working directory (the package specific sub-directory of the working area)
- If pristine-tar is in use, given two trees (branches, commits, etc.), namely:
  - a tree for upstream (default: the branch ~upstream~)
  - a tree for the delta (default: the branch ~pristine-tar~)
  the tar archive can be generated
- Given an upstream tree (default: the branch ~upstream~),
a tar archive can be generated, but is likely to be not bit-for-bit
identical to the original tar archive.
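The VCS-independent way of finding the working directory (climb the tree until a spec file or debian/rules shows up) can be sketched in a few lines of Python. This is an illustration rather than code from my scripts; the function name `find_package_root` is made up, and it assumes that a `*.spec` file or a `debian/rules` file marks the base of a package.

```python
from pathlib import Path
from typing import Optional


def find_package_root(start: Path) -> Optional[Path]:
    """Climb from `start` until a directory holding debian/rules or a
    *.spec file is found; return None if the filesystem root is reached."""
    current = start.resolve()
    for candidate in (current, *current.parents):
        if (candidate / "debian" / "rules").is_file():
            return candidate          # Debian source tree
        if any(candidate.glob("*.spec")):
            return candidate          # rpm based source tree
    return None
```

Starting inside the debian directory itself also works, because the first match is then its parent.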
So, if I do away with the whole working area layout convention, this can be reduced to just requiring the user to:
- Specify Staging area
- Call the script in the working directory (dpkg-buildpackage imposes this too).
- Either use pristine-tar or have the upstream tar archive in the parent directory of the working directory
Hmm. One user specified directory, where the results are dumped.
I can live with that. However, gitpkg has a different
concept: it works purely on the git objects. You feed it up to three
tree objects, the first being the tree with the sources to build; the
second and third trees are looked at only if the upstream tar
archive cannot be located, in which case they are passed to pristine-tar to
reconstruct the upstream tar archive. The package name and version are
constructed after the source tar archive is extracted
to the staging area. I like the minimality of this.
This is continued here.
Posted Sun 19 Apr 2009 04:07:32 UTC
I have been involved in vcs-pkg.org since around
the time it started, a couple of years ago. The discussion has been
interesting, and I learned a lot about the benefits and
disadvantages of serializing patches (and collecting integration
deltas in the feature branches and the specific ordering of
the feature branches) and maintaining integration branches (where
the integration deltas are collected purely in the integration
branch, but might tend to get lost in the history, and a fresh
integration branch having to re-invent the integration deltas
afresh).
However, one of the things we have been lax about is getting
down to brass tacks and getting around to being able to create
generic packaging tools (though for the folks on the serializing
patches side of the debate we have the excellent quilt
and the topgit packages).
I have recently mostly automated my git based work-flow, and
have built fancy porcelain around my git repository setup. During
IRC discussion, the gitpkg script came up. This seems
almost usable, apart from not having any built-in
pristine-tar support, and also not supporting
git submodules, which makes it a less useful
alternative than my current porcelain.
But it seems to me that we are pretty close to being able to create a distribution, layout, and patch handler agnostic script that builds distribution packages directly from version control, as long as we take care not to bind people into distributions or tool specific straitjackets. To these ends, I wanted to see what are the tasks that we want a package building script to perform. Here is what I came up with.
- Provide a copy of one or more upstream source tar-balls in the staging area where the package will be built. This staging area may or may not be the working directory checked out from the underlying VCS; my experience has been that most tools of the ilk have a temporary staging directory of some kind.
- Provide a directory tree of the sources from which the package is to be built in the staging area
- Run one or more commands or shell scripts in the staging area to create the package. This series of commands might be very complex, creating and running virtual machines, chroot jails, satisfying build dependencies, using copy-on-write mechanisms, running unit tests and lintian/piuparts checks on the results. But the package building script may just delegate all of this to a user specified hook.
The first and third steps above are pretty straight forward, and fairly uncontroversial.
The upstream sources may be handled by one of these three alternatives:
- compressed tar archives of the upstream sources are available, and may be copied.
- There is a pristine-tar VCS branch, which in conjunction with the upstream branch, may be used to reproduce the upstream tar archive
- Export and create an archive from the upstream branch, which may not have the same checksum as the original archive
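The three alternatives form a natural fallback chain. The Python sketch below only decides which command would be run and returns it as an argument list; it executes nothing. The function and parameter names are hypothetical, and it assumes a git repository in which the upstream branch is called `upstream` and a git recent enough to support `git archive --format=tar.gz`.

```python
import os
from typing import List


def upstream_tarball_command(tarball: str, staging: str, prefix: str,
                             have_tarball: bool, have_pristine_tar: bool,
                             upstream: str = "upstream") -> List[str]:
    """Return the command that would place the upstream archive in `staging`."""
    destination = os.path.join(staging, os.path.basename(tarball))
    if have_tarball:
        # 1. the original archive is available: just copy it
        return ["cp", tarball, destination]
    if have_pristine_tar:
        # 2. regenerate the original archive from the pristine-tar branch
        return ["pristine-tar", "checkout", destination]
    # 3. export the upstream branch; the checksum will likely differ
    return ["git", "archive", "--format=tar.gz",
            "--prefix=" + prefix + "/",
            "--output=" + destination, upstream]
```

For example, `upstream_tarball_command("../foo_1.2.3.orig.tar.gz", "/tmp/stage", "foo-1.2.3", False, True)` selects the pristine-tar route.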
The command to run may be supplied by the user in a
configuration file or option, and may default based on the native
distribution, to dpkg-buildpackage or
rpm. There are a number of already mature mechanisms
to take a source directory and upstream tar archive and produce
packages from that point, and the wheel need not be
re-invented.
So the hardest part of the task is to present, in the staging area, for further processing, a directory tree of the source package, ready for the distribution specific build commands. This part of the solution is likely to be VCS specific.
This post is getting long, so I’ll defer presenting my evolving
implementation of a generic vcs-pkg tool,
git flavour, to the next blog post.
This is continued here.
Posted Thu 16 Apr 2009 21:52:36 UTC
There are a lot of little git scripts and tools being written by a lot of people. Including a lot of tools written by people I have a lot of respect for. And yet, they are mostly useless for me. Take git-pkg. Can’t use it. Does not work with git submodules. Then there is our nice, new, shiny, incredibly bodacious “3.0 (git)” source format. Again, useless: does not cater to submodules.
I like submodules. They are nice. They allow for projects to take upstream sources, add Debian packaging instructions, and put them into git. They allow you to stitch together disparate projects, with different authors, and different release schedules and goals, into a coherent, integrated, software project.
Yes, I use git submodules for my Debian packaging. I think it is
conceptually and practically the correct solution. Why submodules?
Well, one of the first things I discovered was that most of the
packaging for my packages was very similar – but not identical.
Unfortunately, in the previous incarnation of my packages, with a
monolithic rules file in each ./debian/ directory, it
was easy for the rules files in packages to get out of sync – and
there was no easy way to merge changes in the common portions in
any sane automated fashion. The ./debian/ directories
for all my packages package that they are instrumental in
packaging. So, since I make the ./debian/ directories
branches of the same project, it is far easier to package a new
package, or to roll out a new feature when policy changes – the
same commit can be applied across all the branches, and thus all my
source packages, easily. With a separate debian-dir
project, I can separate the management of the packaging rules from
the package code itself.
Also, I have abstracted out the really common bits across all my
packages into a ./debian.common directory, which is
yet another project, and included in as a submodule in all the
packages – so there is a central place to change the common bits,
without having to duplicate my efforts 30-odd times.
Now people are complaining since they have no idea how to clone
my package repositories, since apparently no one actually pays
attention to a file called .gitmodules, and even when
they do, they, and the tools they use, have no clue what to do with
it. I am tired of sending emails with one-off cluebats, and I am
building my own porcelain around something I hope to present as a
generic vcs-pkg implementation soon. The first step is
a wrapper around git-clone, that understands git
submodules.
So,
here is the browsable code (there is a link in there to the
downloadable sources too). Complete with a built in man page. Takes
the same arguments as git-clone, but with fewer
options. Have fun.
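For readers who just want the gist of what such a wrapper has to do, here is a minimal Python sketch. It is not the script linked above; the function names are invented, and it assumes a git that supports the `-C` option. It clones the repository and then initialises every submodule listed in .gitmodules, recursively.

```python
import subprocess
from typing import List, Optional


def clone_commands(url: str, directory: Optional[str] = None) -> List[List[str]]:
    """Return the git commands needed to clone `url` including submodules."""
    if directory is None:
        # same default as git clone: last path component without ".git"
        directory = url.rstrip("/").rsplit("/", 1)[-1]
        if directory.endswith(".git"):
            directory = directory[:-len(".git")]
    return [
        ["git", "clone", url, directory],
        ["git", "-C", directory, "submodule", "update", "--init", "--recursive"],
    ]


def clone_with_submodules(url: str, directory: Optional[str] = None) -> None:
    """Run the commands, stopping at the first failure."""
    for argv in clone_commands(url, directory):
        subprocess.run(argv, check=True)
```

Current versions of git can do both steps at once with `git clone --recurse-submodules`.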
I have been meaning to write this up for a long time now, since
I
vaguely made a promise to do so last Debconf. I
have also been wondering about the inefficiencies in my work-flow,
but I kept postponing my analysis since there were still large gaps
in my packaging automation since I moved off Arch as my SCM of
choice. However, recently I have taken a sabbatical from Debian, so
I’ve had time to complete bits and pieces of my package building
framework, enough so that I could no longer justify putting off the
analysis. I tried writing it up, but the result confused even me;
so I instead recorded every shell command during a recent series of
packaging tasks, and converted that into a nice, detailed, activity
diagram that you see over here. This is as efficient a work-flow as
I have been able to come up with.
Along with a git commit hook script, that parses the commit log and adds pending tags to bugs closed in the commit, the figure above represents my complete work-flow – down to the details of every cd command I executed. I think there are too many steps still.
Feedback and commentary would be appreciated, as well as any suggestions to improve efficiency.
Posted Wed 25 Feb 2009 06:55:03 UTC
New features for debcheckout, … now with TopGit support!
Today I’ve spent some time hacking on debcheckout,
which for weird reasons happens to be at the bottom of a stack of
chained things that I need to do in the forthcoming days. Also, I
had neglected debcheckout for a while, and the other
devscripts
folks were ready to shout at me because of that.
Well, it has been fun, and besides having fixed all the
outstanding bugs, debcheckout has grown some cute new
features:
- the ability to query a VCS repository (using -d/--details) for details; at the very minimum it will parse the Vcs-* fields for you, but it is expected that in the future it will be able to tell you more, and it already does so for TopGit …
- … and speaking about that, debcheckout now has support for TopGit, in two ways. The first one is using -d, which will tell you whether a Git repo is TopGit-enabled or not and, if so, also the list of available top-bases. For instance:
    zack@usha:~$ debcheckout -d topgit
    type git
    url git://git.debian.org/git/collab-maint/topgit.git
    top-bases debian/locations
    topgit yes
  or even more brutally
    zack@usha:~$ debcheckout -d git://git.debian.org/git/pkg-ocaml-maint/packages/ocaml-batteries.git
    type git
    url git://git.debian.org/git/pkg-ocaml-maint/packages/ocaml-batteries.git
    top-bases features/flexi-build
    topgit yes
  The other way in which TopGit is supported is that when checking out a Git repo which is detected to be a TopGit one as well, population of top-bases (i.e., TopGit local initialization) is performed automatically.
  … yes, a while ago I fell in love with TopGit, is it that evident?
- it is now possible to specify custom rules for authenticated mode; this way you can use -a also on packages not hosted on well known Debian/Ubuntu VCS servers
- finally, you can now ask debcheckout to automatically enable remote tracking of remote Git branches, which is usually what a maintainer wants to do when doing a fresh checkout
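If you want to use those details from a script, the output is easy to turn into a dictionary. The Python sketch below is not part of devscripts; the function name is invented, and it assumes that `debcheckout -d` prints one field per line, with the field name separated from its value by whitespace.

```python
from typing import Dict


def parse_details(output: str) -> Dict[str, str]:
    """Turn the "name value" lines printed by debcheckout -d into a dict."""
    details = {}
    for line in output.splitlines():
        parts = line.split(None, 1)   # split on the first run of whitespace
        if len(parts) == 2:
            details[parts[0]] = parts[1].strip()
    return details
```

With the first example above, `parse_details(...)["topgit"]` gives `"yes"`, which is enough for a script to decide whether TopGit initialization is needed.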
Enjoy!
(ah, of course all this is not uploaded yet, but you can
grab a
preview from devscripts’ VCS or, better, doing
debcheckout devscripts which is soooo bootstrapy.
SCNR.)
During LCA2008, Ed Borland of the Byte Into It show on Melbourne-based Triple R FM took me aside for an interview and asked some good questions about Debian and my work on cross-distro collaboration. The interview was recorded and is now available as an Ogg Vorbis file from the 14 May 2008 issue of Byte Into It.
I am looking forward to any feedback.
Thanks to Ed and Phil Wales for their time and help.
NP: Mono: You Are There
Posted Tue 20 May 2008 16:31:28 UTC
We were given another chance to meet in Extremadura to discuss vcs-pkg issues, after the first opportunity was too short notice.
Currently, the tentatively scheduled dates are 2-7 September 2008. You can get the details from the wiki page. If you’re interested, please reserve those dates and add yourself to the list of participants.
NP: Hooverphonic: The Magnificent Tree
Posted Sun 18 May 2008 13:16:57 UTC
“Are you rebasing or merging?” seems to be the 64 thousand dollar question over in vcs-pkg discussions. Various people have offered their preferences, and indeed, several case studies of work flows have been presented; what is lacking is an analysis of the work-flow: an exploration of which methodology has advantages, and whether there are scenarios in which the other work flow would have been better.
Oh, what are all these work flows about, you ask? Most of the issues with packaging software for distributions have a few things in common: there is a mainline or upstream source of development. There are zero or more independent lines of development or ongoing bug fixes that are to be managed. And then there is the tree from which the distribution package is to be built. All this talk about packaging software work flows is about how best to manage asynchronous development upstream and in the independent lines of development, and how to create a coherent, debuggable, integrated tree from which to build the distribution’s package.
The rebasing question goes to the heart of how to handle the independent lines of development using git, since these lines of development are based off the main line of development, and must be periodically synchronized. Here is a first look at a couple of important factors that will have bearing on that question, and on packaging software for a distribution using Git in general. This is heavily geared towards git (nothing else does rebases so easily, I think), but some of the concepts should be generic. I am not considering the stacked set of quilt patches source controlled with Git in this article (I don’t understand that model well enough to do an analysis).
As a teaser, there is a third answer: neither. You can just add an independent line of development, and just let it sit: don’t rebase, and don’t merge; and in some circumstances that is a winning strategy.
Posted Fri 04 Apr 2008 19:48:10 UTC
I have been using Arch to package my Debian packages since 2003, which means that Arch has had a good long run as my SCM of choice. I had been using CVS for a few years before I moved to Arch, and the migration took me about six months, since it involved a whole new philosophy of packaging; I am hoping that migrating to Git will not involve such a major paradigm shift, and will thus be less disruptive and time consuming. What follows is a narrative of my efforts to get educated about Git.
This article is meant to be an annotated, selective, organized set of links to information about Git. How does it differ from the myriad of other link collections about Git proliferating on the web? Well, the value add is in the annotations and the organization: while not quite a narrative of my exploration, this is an idealized version of what I think my discovery process should have been, to be most effective. Staging the information is important; google finds one lots of information that is incomprehensible to someone just coming to Git. This selection of links is actually selective; I have included only pointers to resources that fed me information at the level that I could handle at that stage, and I have eliminated links to information that was not new at that point. I have tried to select the best (in terms of information and clarity) of breed for each kind of information source I have come across so far.
There is a caveat: while still a beginner, though I am able to better judge now what is confusing to a beginner than I shall be when I have become more familiar with the system, I am still enough of a novice not to trust my judgement on what really is best practice. I can fix the latter as I gain experience, but then I’ll need to be careful not to overload on complexity too early in the learning curve.
On the down side, this selection is subjective, and probably shall be even in the long term: I include what appealed to me, and will probably miss loads of pointers to information that I have not yet come across. However, I hope this will make it easier for other people to reach the same goal: use git for their version control needs.
Have fun.
Posted Wed 02 Apr 2008 04:17:27 UTC
This slightly evil hack to bzr-svn allows using bzr-builddeb as a drop-in replacement for svn-buildpackage, making it recognize the “mergeWithUpstream” property svn-buildpackage uses.
cp: Jeff Healey - Mess O’ Blues
Posted Wed 26 Mar 2008 15:21:20 UTC
Moving a git layout from debian-only to debian+upstream
… and live happily with git-buildpackage
Let’s say you have a git repository you have used thus far to maintain only the Debian part of some package (i.e. no upstream sources in sight).
OK, you’re right, that would be quite an uncommon scenario for a git repository. In fact the truth is that you are in such a situation because the git repo was obtained converting from a subversion repo which was using the mergeWithUpstream stuff of svn-buildpackage.
Now that you have git’s space efficiency you want to change
this, import upstream sources, and possibly adhere to a branch
layout which is compatible with git-buildpackage
(which is very simple in its minimal requirements: an
upstream
branch containing just upstream sources and a
master
branch containing a debianized source tree).
Out of the box git-import-orig won’t work since you don’t have the upstream branch. Creating it by branching from master (or somewhere else) won’t work either, unfortunately. Indeed, after branching you will need to remove the debian/ dir from upstream, and when you merge upstream with master you will be merging the removal as well: bad idea.
By myself I’ve found some horrifying solutions involving using git-filter-branch on the upstream branch merged from master and then squashing all the history of that branch with git-rebase, but they are so horrible that I won’t mention them here more than this.
The nice solution came from a hint by madduck on the creation of branches without ancestry. Here is the complete recipe (assumed to be run from the master branch of the repo, before creating any upstream branch):
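# start a new branch named upstream that has no ancestry
$ git symbolic-ref HEAD refs/heads/upstream
# empty the index so that the first upstream commit contains no files
$ git rm --cached -r .
$ git commit --allow-empty -m 'initial upstream branch'
# return to master and merge the empty upstream history into it
# (Git 2.9 and later refuse this unless --allow-unrelated-histories is given)
$ git checkout -f master
$ git merge upstream
# now the upstream tarball can be imported
$ git-import-orig --no-dch ../foo_1.2.3.orig.tar.gz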
In short: you should create an upstream branch without any ancestry, in it you should create an empty commit, then merge it (vacuously) in master, and now you’re ready to call git-import-orig to the rescue.
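To see what the recipe does to the history, here is a minimal sketch that replays it in a throw-away repository. The file contents and identity are made up, and the `--allow-unrelated-histories` flag is an assumption about current Git (2.9 and later refuse such a merge without it); the recipe above predates that check.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # force the branch name used above
git config user.name 'Example Maintainer'
git config user.email 'maintainer@example.org'

# A Debian-only history, as left behind by a mergeWithUpstream conversion.
mkdir debian
echo 'foo (1.2.3-1) unstable; urgency=low' > debian/changelog
git add debian
git commit -q -m 'debian-only packaging'

# The recipe: point HEAD at a branch that does not exist yet, empty the
# index, and record an empty root commit.
git symbolic-ref HEAD refs/heads/upstream
git rm --cached -r -q .
git commit -q --allow-empty -m 'initial upstream branch'

# Back to master; -f overwrites the now-untracked debian/ files.
git checkout -q -f master
# Assumption: needed on Git >= 2.9 to merge a branch without common ancestry.
git merge -q --allow-unrelated-histories -m 'merge empty upstream' upstream

git log --oneline --graph --all
```

The graph should show master with two parents, one of which is the parentless, empty upstream commit that git-import-orig can then build on.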
Feature request on git-buildpackage to support this out of the box is on the go: bug #471560.
Posted Wed 19 Mar 2008 15:51:06 UTC
If you are interested in using version control for distro packaging, you
- might like http://vcs-pkg.org.
- might want to join #vcs-pkg on irc.oftc.net.
- should be signed up to our mailing list.
If you read the mailing list, you know about the upcoming Extremadura meeting 2-6 April 2008.
If this is news to you, well, it isn’t anymore.
If you think you should be in Extremadura when this party takes
place, don’t hesitate and reply. The message ID is
20080311193428.GA25745@piper.oerlikon.madduck.net.
Update: mostly due to the short notice, I had to call off the meeting. I will make a run for the next slot and hopefully announce it a lot earlier.
Posted Tue 11 Mar 2008 23:27:57 UTC
I speculate that most of what we do for Debian squares with what others do for their respective distro. Thus, it should be possible to identify a conceptual workflow applicable to all distros, consolidate individual workflows on a per-package basis, and profit from each other. Jonathan let me have the after-afternoon-coffee slot of the Distro Summit for an impromptu discussion on the various workflows used by distros for packaging.
The discussion round was very short-notice and despite the announcement sent to the conference mailing list, only ten people showed up: two people familiar with Fedora, and (“versus”) eight Debianites.
Regardless, I think the discussion was successful and fruitful. We were able to identify a one-to-one mapping between the Fedora and Debian workflows, even though we use different techniques:
- both distros separate original software (“orig tarball”) from modifications made to fit the software in with the rest of the distro.
- Fedora keeps the .spec file, which references the original tarball, alongside any patch files in a per-release directory in their CVS tree, e.g. /mdadm/fedora8 and /glibc/rawhide. To obtain a source tree, the contributor checks out the CVS subtree, downloads the tarball (from their own cache so as to not be at the mercy of upstream) according to the .spec file, and merges the two. There is a tool to automate this, obviously. This process is regularly executed to produce “source RPMs”.
- Debian keeps the original tarball next to a diff.gz file on the mirrors, along with a dsc file which refers to them both. Tools like dget take the URL to the dsc file to download all three, then invoke dpkg-source to unpack the tarball and apply the diff. Individual patch files are either stored in ./debian/patches/ (and applied by the diff), or they don’t exist (meaning that all modifications are concatenated in the diff.gz file).
Many Debian package maintainers use version control systems to
maintain the ./debian directory, and if patch files
are stored in ./debian/patches/, then Debian and
Fedora both store patch files in a version control repository,
which seems awful.
Just as I am only one of many who are experimenting with VCS-based workflows for Debian packaging, the Fedora people are also considering the use of version control for packaging. Unlike Fedora, who seem to try to standardise on bzr, I try to cater for the plethora of version control systems in use in Debian, anticipating the impossibility of standardising/converging on a single tool across the entire project.
Update: Toshio Kuratomi wrote in to tell that Fedora has not settled on bzr: “the things that have been tried have spanned most of the current major vcs’s (darcs being the one exception due to it’s not meeting our requirements for keeping history intact.)”
It seems that our two projects are both at the start of a new phase in packaging, a “paradigm shift”. What better time could there be for us to listen to each other and come up with a workflow that works for both projects?
My suggestion currently centres around a common repository for each package across all (participating) distros, and feature branches. Specifically, given an upstream source tree, modifications made during packaging for a given distro fall into four categories:
- upstream changes, such as bug fixes in the original code, or simple things like manpage typos.
- (Linux) distro stuff, such as init.d scripts or Linux-ifications, which upstream doesn’t care about or doesn’t want.
- .deb/.rpm-specific changes, like the ./debian directory or the .spec file.
- distro-specific modifications, like policy compliance and the like.
Given a version control system with sufficient branching
support, I imagine having different namespaces for branches:
upstream-patches/*, distro/*, deb/* or rpm/*, and
debian/* or fedora/*. Now, when building the
Debian package, I’d apply upstream-patches/*,
distro/*, deb/* and debian/*
in order, while my colleague from the Fedora project would apply
upstream-patches/*, distro/*,
rpm/* and fedora/*, before calling the
build tools and uploading the package.
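The ordering is the whole point of the namespaces, so it may help to see it written down. This is only a dry-run sketch: the function name `namespaces_for` is hypothetical, and a real tool would merge or apply the matching branches instead of printing them.

```shell
# Print the branch namespaces to apply, in order, for a given distro.
# The function name and its argument are hypothetical.
namespaces_for() {
    case "$1" in
        debian) echo 'upstream-patches distro deb debian' ;;
        fedora) echo 'upstream-patches distro rpm fedora' ;;
        *) echo "unknown distro: $1" >&2; return 1 ;;
    esac
}

# Dry run: show which ref patterns a build tool would process, and in which order.
for ns in $(namespaces_for debian); do
    echo "apply refs/heads/$ns/*"
done
```

Both distros share the first two namespaces, which is where the cooperation would happen; only the last two differ.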
There are surely problems to be overcome. Pascal Hakim mentioned patch dependencies, and I can’t necessarily say with a clear conscience that my workflow isn’t too complicated to be unleashed into the public… yet. But if we find a conceptual workflow applicable to more than one distro, it should be possible to write a higher-level tool to implement it.
Also, the above is basically patch maintenance, not the entire workflow. Bug tracking system integration is going to play a role, as well as other aspects of daily distro packaging. I’ll leave those for a future time.
For me, this is the start of a potentially fruitful cooperation and I hope that interested parties from other distros jump on. For now, I suggest my mailing list for discussion. You can also find some links on the Debian wiki.
Posted Tue 29 Jan 2008 06:30:14 UTC
Previously, I demonstrated a Debian packaging workflow using Git and I mentioned the possibility of a follow-up post; well, here it is: you want to use my workflow (or one that’s related) for a package that is currently maintained with Subversion on svn.debian.org and you’d like to keep the history during the conversion.
Make sure to read the previous post before this one.
I am again using the example of mdadm since its
Git
packaging repository is in a state of shambles and I want to
restart to get it right and import the history from the
previous Subversion
repository. What better way than to write a blog post as I do
so? Well, plenty actually. This kind of post isn’t
really made for a blog, and I have started work on setting up
ikiwiki on madduck.net, but it’s not yet ready, so
I’ll stick with the blog for now. I will make sure that links don’t
break as I move content over, so feel free to bookmark this…
Importing the package into Git
Thanks to git-svn, the initial step of getting your package imported into Git is a breeze:
$ git-svn clone --stdlayout --no-metadata \
svn+ssh://svn.debian.org/svn/pkg-mdadm/mdadm mdadm
Sit back and enjoy. If that command exits prematurely with an error such as the following:
Malformed network data: Malformed network data at /usr/local/bin/git-svn line 1029
then you should upgrade to a newer Git version, or have a look
here. If
your Git does not know --stdlayout then upgrade as
well (or use -T trunk -t tags -b branches
instead).
Sam Vilain notes that it is important to “get the
attribution right with the final SVN import - getting the authors
map right”. I didn’t do that. If you look at the repository
resulting from the above command, you’ll notice strange commit
authors, such as madduck@some-unique-uuid-from-svn.
git-svn allows you to map these to real names with
real email addresses, which ensures that the attributions are good
for the whole world to see.
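For those who want to get it right, here is a minimal sketch of such an authors map. The login names and addresses are placeholders, not the actual pkg-mdadm committers; the file format (`login = Name <email>`) and the `--authors-file` option are those of git-svn. The clone command is only printed so it can be reviewed first.

```shell
# Map SVN login names to real identities (placeholder data).
cat > authors.txt <<'EOF'
madduck = Example Maintainer <maintainer@example.org>
guest = Example Contributor <contributor@example.org>
EOF

# Print the clone command rather than run it.
printf '%s\n' \
    'git svn clone --stdlayout --no-metadata --authors-file=authors.txt \' \
    '    svn+ssh://svn.debian.org/svn/pkg-mdadm/mdadm mdadm'
```

Every SVN committer needs an entry; as far as I know, git-svn stops when it meets a login name that is missing from the file.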
When done, switch to the repository and run git-branch
-r. As you’ll see, git-svn imported all SVN
branches and tags as remote branches. You need those if you want to
bidirectionally track the Subversion repository, but we are
converting, as you may have guessed by the
--no-metadata switch above.
Therefore, we resort to the Dinosaur method
of converting branches to tags, which I’ll simplify for
mdadm. We also just delete all remote branches after
tagging, since mdadm never used branches in the
SVN repository. Your mileage may vary.
git branch -r | sed -rne 's, *tags/([^@]+)$,\1,p' | while read tag; do
echo "git tag debian/$tag tags/${tag}^; git branch -r -d tags/$tag"
done
git branch -r | while read tag; do
echo "git branch -r -d $tag"
done
If that seems to work alright, then you can execute the commands.
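If the sed expression looks opaque, the following sketch feeds it some sample `git branch -r` output instead of a real repository. The tag names are made up, and `-E` is used as the equivalent of `-r` that BSD sed also understands.

```shell
# Sample `git branch -r` output; the tag names are made up for illustration.
branches='  tags/2.6.2-1
  tags/2.6.2-1@1234
  tags/2.6.2-2
  trunk'

# Keep only plain tags/NAME entries, dropping the NAME@REV leftovers
# and anything that is not a tag.
commands=$(printf '%s\n' "$branches" |
    sed -Ene 's, *tags/([^@]+)$,\1,p' |
    while read -r tag; do
        echo "git tag debian/$tag tags/${tag}^; git branch -r -d tags/$tag"
    done)
printf '%s\n' "$commands"
```

Only the two plain tags produce commands; the `@1234` entry and `trunk` are skipped by the first loop (the second loop above then deletes whatever remote branches remain).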
Sam Vilain (again) pointed me to
git-pack-refs, followed by editing
.git/packed-refs in an editor. This certainly leaves
more room for errors but might be significantly
faster.
Cleaning up the SVN references
Even though we passed --no-metadata to
git-svn, it did leave some traces in
.git/, which we can now safely remove:
$ git config --remove-section svn-remote.svn
$ rm -r .git/svn
Setting things straight
You can skip this section unless you want to know a bit about how to fix up stuff with Git.
There were actually some nasty tagging errors leading up to the
2.5.6-9 release for etch and I could
never be bothered to fix those in SVN, but now I can
(I love Git!):
$ git tag -d debian/2.5.6-10 # never existed
$ git tag -f debian/2.5.6-8 debian/2.5.6-8~2 # mistagged
$ git checkout -b maint/etch debian/2.5.6-8 # this is when we diverged
$ git apply < /tmp/mdadm-2.5.6-8..2.5.6-9.diff
$ git add debian/po/gl.po debian/po/pt.po debian/changelog
$ git commit -s
$ git tag debian/2.5.6-9
Now that that’s fixed, there is one other thing to worry about,
namely the very last commit to SVN, which obsoletes
the repository and points to the Git repository. But that’s not all
of it. I was also silly enough to include a fix in the
same commit. Let’s see what Git can do. Since the process
of obsoletion involved nothing but adding a file, we can simply
undo that, --amend the last commit, and provide a new log
message:
$ git checkout master
$ git rm OBSOLETE debian/OBSOLETE
$ git commit --amend
Now the repository is in an acceptable state.
Making ends meet
The pkg-mdadm
effort on svn.debian.org only maintained the
./debian/ directory, separate from the upstream code,
and boy was that a bad idea. Just to give one example: think about
what’s involved in preparing a Debian-specific patch against the
upstream code… this has to end, and we can make it end right here;
let’s import upstream’s code (again not using his ADSL line, but
the upstream branch of the pkg-mdadm Git
repository; see the previous
post for details):
$ git remote add upstream-repo git://git.debian.org/git/pkg-mdadm/mdadm
$ git config remote.upstream-repo.fetch \
+refs/heads/upstream:refs/remotes/upstream-repo/upstream
$ git fetch upstream-repo
$ git checkout -b upstream upstream-repo/upstream
Now we have two unconnected ancestries in our repository, and
it’s time to join them together. The most logical way seems to be
to use the last upstream tag for which we have a Debian tag:
2.6.2.
For this, we branch off the corresponding Debian tag
(2.6.2-1) and merge upstream’s 2.6.2 tag
into the new branch. This will be a temporary branch. Then, we
rebase (remember, nothing has been published yet) the master branch
on top of this temporary branch, before we end that branch’s short
life. The Debian tag stays where it is since it describes the state
of the repository at time of the release of
2.6.2-1.
$ git checkout -b tmp/join debian/2.6.2-1
$ git merge mdadm-2.6.2
$ git rebase tmp/join master
$ git branch -d tmp/join
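The same join can be rehearsed on a toy repository before touching the real one. This is a sketch under stated assumptions: the package name `foo`, the tag `foo-1.0` and all file contents are made up, and `--allow-unrelated-histories` is needed on Git 2.9 and later, which did not exist when this was written.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name 'Example Maintainer'
git config user.email 'maintainer@example.org'

# Debian-only history: one tagged release plus one later commit.
mkdir debian
echo 'foo (1.0-1)' > debian/changelog
git add debian && git commit -q -m 'release 1.0-1'
git tag debian/1.0-1
echo 'foo (1.0-2)' > debian/changelog
git commit -q -a -m 'release 1.0-2'

# Unrelated upstream history with its own tag (hypothetical names).
git checkout -q --orphan upstream
git rm -r -f -q .
echo 'int main(void) { return 0; }' > foo.c
git add foo.c && git commit -q -m 'upstream 1.0'
git tag foo-1.0

# Join: branch off the Debian tag, merge the upstream tag, rebase master.
git checkout -q -b tmp/join debian/1.0-1
git merge -q --allow-unrelated-histories -m 'join upstream 1.0' foo-1.0
git rebase -q tmp/join master
git branch -q -d tmp/join

git log --oneline --graph master
```

After the rebase, master contains both the upstream code and the later packaging commit, while the debian/1.0-1 tag still points at the Debian-only tree.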
It just so happens that the head of the SVN
repository, which is identical to the tip of our
master branch, corresponds to Debian release
2.6.2-2, so we tag it:
$ git tag debian/2.6.2-2
We are now also “born” in the sense that maintenance in Git has started. Let’s mark that point in history. There is no real reason I can foresee for this yet, but nonetheless:
$ git tag -s git-birth
Turning dpatch files into feature branches
We want to turn dpatch files into feature branches,
and we want to do it “properly”. We could branch, apply the patch,
delete the patch file, checkout master and delete the
patch file there as well, but that appears “improper” to me at
least; so instead, we’ll cherry-pick:
$ git checkout -b deb/conffile-location
$ debian/patches/01-mdadm.conf-location.dpatch -apply
$ git rm debian/patches/01-mdadm.conf-location.dpatch
$ git commit -s
$ git commit -s $(git ls-files --others --modified)
I should quickly intervene to make sure you are following. I am
making use of Git’s index here. Applying the patch makes the
changes in the working tree, but we did not tell Git that we want
those to be part of the commit just yet. Instead, we delete the
dpatch with git-rm, which automatically
registers the deletion with the index. Thus, the first
git-commit creates a commit which deletes the
dpatch, while the second git-commit
creates a commit with all the changes from the dpatch,
using git-ls-files to identify new and modified
files.
But for now, let’s move on. We have two commits in the
deb/conffile-location branch, and since one of those is
relevant to the master branch, we cherry-pick it:
$ git cherry-pick deb/conffile-location^
If you’re confused, let me explain: our goal is to have a number
of feature branches, of which master is the one in
which most of ./debian/ is maintained. All the
branches later come together in the long-living build
branch, so deb/conffile-location will never be merged
back into master. However, once we applied the
dpatch to the feature branch, we can delete it from
there and the master branch. By cherry-picking, we
“import” the deletion to the master branch.
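Here is the whole dance on a toy repository, as a minimal sketch. The file names are invented, overwriting `config.txt` stands in for running the dpatch with -apply, and I stage the patched files with `git add -A` before the second commit, which is a variation on the commands shown earlier.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name 'Example Maintainer'
git config user.email 'maintainer@example.org'

printf 'config=/etc/mdadm.conf\n' > config.txt        # stand-in for upstream code
mkdir -p debian/patches
printf 'placeholder for a dpatch\n' > debian/patches/01-example.dpatch
git add . && git commit -q -m 'initial state'

git checkout -q -b deb/conffile-location
# Stand-in for applying the dpatch: it edits the working tree only.
printf 'config=/etc/mdadm/mdadm.conf\n' > config.txt
git rm -q debian/patches/01-example.dpatch
git commit -q -m 'remove dpatch file'          # commits the staged deletion only
git add -A
git commit -q -m 'apply changes from dpatch'   # commits the patched code

# Import only the deletion into master.
git checkout -q master
git cherry-pick 'deb/conffile-location^' >/dev/null
```

Afterwards master has lost the dpatch file but still carries the unpatched `config.txt`, while the feature branch holds both commits.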
I repeat the same procedure for deb/docs, merging
all the documentation-related dpatches, but I’ll spare
you the details.
… and then Git let me down
In the next step, I found I had misunderstood Git merging: I
thought Git was smart, but Linus had his reasons for calling Git
the “stupid content tracker” (more on that later). Read on as I am
obsoleting dpatch files that upstream had merged:
99-*-FIX.dpatch.
For consistency, I wanted to cherry-pick each of the appropriate
upstream commits into the master branch along with
deleting the corresponding dpatch file. Here is one
example: 99-monitor-6+10-FIX.dpatch was obsoleted by
upstream’s commit 66f8bbb; the -x records
the original commit ID in the log:
$ git cherry-pick -x 66f8bbb
$ git rm debian/patches/99-monitor-6+10-FIX.dpatch
$ git commit -s -m"remove dpatch obsoleted by $(git rev-parse --short HEAD)"
I repeated the procedure for the other dpatch
files, removed the dpatch infrastructure, and then
went on to merge it all into build to build the
package.
The build branch is a long-living branch off
upstream, but which upstream? I’ll
fast-forward you past a
segfault problem with mdadm, which upstream
(thought to have) resolved with commit 23dc1ae after
2.6.3, but he had not yet released 2.6.4.
Looking at the commits between 23dc1ae and upstream’s
HEAD at the time, I decided to include them all and
snapshot 4450e59:
$ git fetch upstream-repo
$ git checkout upstream
$ git merge upstream-repo
$ git tag mdadm-2.6.3+200709292116+4450e59 4450e59
$ git checkout master
$ git merge --no-commit mdadm-2.6.3+200709292116+4450e59
$ dch -v 2.6.3+200709292116+4450e59-1
$ git add debian/changelog
$ git commit -s
And then I called poor-mans-gitbuild, which merges
master and then deb/* into
build. Here is when stuff blew up.
I’ll make a long story short (read my description of the problem and Linus’ answer if you want to know more): I thought Git was smart enough to identify changes common to both branches and do the right thing, but it turns out that Git does not care at all about commits; it only worries about content and the end result. In our case, unfortunately (or fortunately), the outcome meant a conflict because the upstream branch introduced a simple change (last hunk) in the lines surrounding the patch we cherry-picked, and Git can’t handle it.
The solution is either not to cherry-pick at all, to cherry-pick
all commits touching the context of the
dpatch, or simply to merge upstream into
all our feature branches. In our case, the first is the easiest
solution and since importing dpatch files is a
one-time thing (thank $DEITY), I’ll leave it at
that.
Almost.
I have spent two days thinking about this more than I should have. And it was this point Linus made which made me appreciate Git even more:
Conflicts aren’t bad - they’re good. Trying to aggressively resolve them automatically when two branches have done slightly different things in the same area is stupid and just results in more problems. Instead, git tries to do what I don’t think anybody else has done: make the conflicts easy to resolve, by allowing you to work with them in your normal working tree, and still giving you a lot of tools to help you see what’s going on.
The end
This concludes today’s report. Importing the changes from the old Git repo, tagging and merging the branches is all covered in my previous post, or at least you’ll find enough information there to complete the exercise.
I would like to specifically thank Sam Vilain and Linus Torvalds
for their help in preparing this post, as well as the
#git/freenode inhabitants, as always.
If you are interested in the topic of using version control for
distro packaging, I invite you to join the vcs-pkg mailing
list and/or the #vcs-pkg/irc.oftc.net IRC
channel.
Also, if you are interested in Git in general, you can find a list of blog posts on the Git wiki.
NP: The Police: Zenyatta Mondatta
Posted Sun 14 Oct 2007 14:30:10 UTC
Introduction
I gave a joint presentation with Manoj at Debconf7 about using distributed version control for Debian packaging, and I volunteered to do an on-line workshop about using Git for the task, so it’s about time that I should know how to use Git for Debian packaging, but it turns out that I don’t. Or well, didn’t.
After I made a pretty good mess out of the mdadm packaging repository (which is not a big problem as it’s just ugly history up to the point when I start to get it right), I decided to get down with the topic and figure it out once and for all. I am writing this post as I put the pieces together. It’s been cooking for a week, simply so I could gather enough feedback. I am aware that Git is not exactly a showcase of usability, so I took some extra care to not add to the confusion.
It may be the first post in a series, because this time, I am
just covering the case of mdadm, for which upstream
also uses Git and where I am the only maintainer, and I shall
pretend that I am importing mdadm to version control
for the first time, so there won’t be any history juggling. Future
posts could well include tracking Subversion repositories with
git-svn,
and
importing packages previously tracked therewith.
I realise that git-buildpackage exists, but it imposes a rather strict branch layout and tagging scheme, which I don’t want to adhere to. And gitpkg (Romain blogged about it recently) deserves another look since, according to its author, it does not impose anything on its user. But in any case, before using such tools (and possibly extending them to allow for other layouts), I’d really rather have done it by hand a couple of times to get the hang of it and find out where the pitfalls lie.
Now, enough of the talking, just one last thing: I expect this blog post to change quite a bit as I get feedback. Changes shall be highlighted in bold typeface.
Setting up the infrastructure
First, we prepare a shared repository on git.debian.org for later use (using
collab-maint for illustration purposes), download the
Debian source package we want to import (version
2.6.3+200709292116+4450e59-3 at time of writing, but I
pretend it’s -2 because we shall create
-3 further down…), set up a local repository, and link
it to the remote repository. Note that there are
other ways to set up the infrastructure, but this happens to be
the one I prefer, even though it’s slightly more complicated:
$ ssh alioth
$ cd /git/collab-maint
$ ./setup-repository pkg-mdadm mdadm Debian packaging
$ exit
$ apt-get source --download-only mdadm
$ mkdir mdadm && cd mdadm
$ git init
$ git remote add origin ssh://git.debian.org/git/collab-maint/pkg-mdadm
$ git config branch.master.remote origin
$ git config branch.master.merge refs/heads/master
Now we can use git-pull and git-push,
except the remote repository is empty and we can’t pull from there
yet. We’ll save that for later.
Instead, we tell the repository about upstream’s Git repository.
I am giving you the git.debian.org URL though, simply
because I don’t want upstream’s repository (which lives on an ADSL
line) hammered in response to this blog post:
$ git remote add upstream-repo git://git.debian.org/git/pkg-mdadm/mdadm
Since we’re using the upstream branch of the
pkg-mdadm repository as source (and don’t want all the
other mess I created in that repository), we’ll first limit the set
of branches to be fetched (I could have used the -t
option in the above git-remote command, but I prefer
to make it explicit that we’re doing things slightly differently to
protect upstream’s ADSL line).
$ git config remote.upstream-repo.fetch \
+refs/heads/upstream:refs/remotes/upstream-repo/upstream
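To convince yourself that the restricted refspec really keeps the other branches out, you can try it against a local stand-in for the remote. In this sketch the repository paths, the file and the branch name `mess` are all made up.

```shell
set -eu
work=$(mktemp -d)

# A stand-in for the remote repository, with two branches.
git init -q "$work/remote"
cd "$work/remote"
git symbolic-ref HEAD refs/heads/upstream
git config user.name 'Example Maintainer'
git config user.email 'maintainer@example.org'
echo 'upstream code' > mdadm.c
git add mdadm.c && git commit -q -m 'upstream code'
git branch mess     # a branch we do not want to fetch

# The local repository fetches only the upstream branch.
git init -q "$work/local"
cd "$work/local"
git remote add upstream-repo "$work/remote"
git config remote.upstream-repo.fetch \
    +refs/heads/upstream:refs/remotes/upstream-repo/upstream
git fetch -q upstream-repo
git branch -r
```

Setting the option replaces the wildcard refspec that git-remote wrote, so upstream-repo/mess never appears among the remote branches.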
And now we can pull down upstream’s history and create a local
branch off it. The “no common commits” warning can be safely
ignored since we don’t have any commits at all at that point (so
there can’t be any in common between the local and remote
repository), but we know what we’re doing, even to the point that
we can forcefully give birth to a branch, which is because we do
not have a HEAD commit yet (our repository is still
empty):
$ git fetch upstream-repo
warning: no common commits
[…]
# in the real world, we'd be branching off upstream-repo/master
$ git checkout -b upstream upstream-repo/upstream
warning: You appear to be on a branch yet to be born.
warning: Forcing checkout of upstream-repo/upstream.
Branch upstream set up to track remote branch
refs/remotes/upstream-repo/upstream.
$ git branch
* upstream
$ ls | wc -l
77
Importing the Debian package
Now it’s time to import Debian’s diff.gz — remember
how I pretend to use version control for package maintenance for
the first time. Oh, and sorry about the messy file names, but I
decided it’s best to stick with real data in case you are playing
along.
Since we’re applying the diff against version
2.6.3+200709292116+4450e59, we ought to make sure to
have the repository at the same state. Upstream never “released”
that version, but I encoded the commit ID of the tip when I
snapshotted it: 4450e59, so we branch off there. Since
we are actually tracking the git.debian.org
pkg-mdadm repository instead of upstream, you can use
the tag I made. Otherwise you could consider tagging yourself:
$ #git tag -s mdadm-2.6.3+200709292116+4450e59 4450e59
$ git checkout -b master mdadm-2.6.3+200709292116+4450e59
$ zcat ../mdadm_2.6.3+200709292116+4450e59-2.diff.gz | git apply
The local tree is now “debianised”, but Git does not know about
the new and changed files, which you can verify with
git-status. We will split the changes made by Debian’s
diff.gz across several branches.
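The fact that git-apply leaves the index alone is what makes the later splitting possible, so here is a small sketch that shows it. The repository, the file contents and the fabricated `foo_1.0-1.diff.gz` are all stand-ins, and `gzip -dc` is used as the portable equivalent of zcat.

```shell
set -eu
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.name 'Example Maintainer'
git config user.email 'maintainer@example.org'
echo 'all:' > Makefile
git add Makefile && git commit -q -m 'upstream snapshot'

# Fabricate a tiny stand-in for a Debian diff.gz: one changed file, one new file.
echo 'all: debian' > Makefile
mkdir debian && echo 'foo (1.0-1)' > debian/changelog
git add -A
git diff --cached | gzip > "$work/foo_1.0-1.diff.gz"
git reset -q --hard

# Apply it the way it is done above.
gzip -dc "$work/foo_1.0-1.diff.gz" | git apply
git status --short
```

The status output should list Makefile as modified but unstaged and debian/ as untracked, which means nothing has been registered for the next commit yet.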
The idea of feature branches
We could just create a debian branch, commit all
changes made by the diff.gz there, and be done with
it. However, we might want to keep certain aspects of Debianisation
separate, and the way to do that is with feature branches (also
known as “topic” branches). For the sake of this demonstration,
let’s create the following four branches in addition to the
master branch, which holds the standard Debian files,
such as debian/changelog, debian/control,
and debian/rules:
- upstream-patches will include patches against the upstream code, which I submit for upstream inclusion.
- deb/conffile-location makes /etc/mdadm/mdadm.conf the default over /etc/mdadm.conf and is Debian-specific (thus the deb/ prefix).
- deb/initramfs includes the initramfs hook and script, which I want to treat separately but not submit upstream.
- deb/docs similarly includes Debian-only documentation I add to the package as a service to Debian users.
If you’re importing a Debian package using dpatch,
you might want to convert every dpatch into a single branch, or at
least collect logical units into separate branches. Up to you. For
now, our simple example suffices. Keep in mind that it’s easy to
merge two branches and less trivial to split one into two.
Why? Well, good question. As you will see further down, the
separation between master and
deb/initramfs actually makes things more complicated
when you are working on an issue spanning across both. However,
feature branches also bring a whole lot of flexibility. For
instance, with the above separation, I could easily create
mdadm packages without initramfs
integration (see #434934), a
disk-space-conscious distribution like grml might prefer to leave out the extra
documentation, and maybe another derivative doesn’t like the fact
that the configuration file is in a different place from upstream.
With feature branches, all these issues could be easily addressed
by leaving out unwanted branches from the merge into the
integration/build branch (see further down).
Whether you use feature branches, and how many, or whether you’d like to only separate upstream and Debian stuff is entirely up to you. For the purpose of demonstration, I’ll go the more complicated way.
Setting up feature branches
So let’s commit the individual files to the branches. The output
of the git-checkout command shows modified files that
have not been committed yet (which I trim after the first example);
Git keeps these across checkouts/branch changes. Note that the
./debian/ directory does not show up as Git does not
know about it yet (git-status will tell you that it’s
untracked, or rather: contains untracked files since Git does not
track directories at all):
$ git checkout -b upstream-patches mdadm-2.6.3+200709292116+4450e59
M Makefile
M ReadMe.c
M mdadm.8
M mdadm.conf.5
M mdassemble.8
M super1.c
$ git add super1.c #444682
$ git commit -s
# i now branch off master, but that's the same as 4450e59 actually
# i just do it so i can make this point…
$ git checkout -b deb/conffile-location master
$ git add Makefile ReadMe.c mdadm.8 mdadm.conf.5 mdassemble.8
$ git commit -s
$ git checkout -b deb/initramfs master
$ git add debian/initramfs/*
$ git commit -s
$ git checkout -b deb/docs master
$ git add RAID5_versus_RAID10.txt md.txt rootraiddoc.97.html
$ git commit -s
# and finally, the ./debian/ directory:
$ git checkout master
$ chmod +x debian/rules
$ git add debian
$ git commit -s
$ git branch
  deb/conffile-location
  deb/docs
  deb/initramfs
* master
  upstream
  upstream-patches
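If it is unclear how uncommitted changes travel from branch to branch, this reduced sketch may help. It uses two invented one-line files in place of the real mdadm sources and only two of the branches.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name 'Example Maintainer'
git config user.email 'maintainer@example.org'
echo 'a' > Makefile
echo 'b' > super1.c
git add . && git commit -q -m 'upstream snapshot'

# Two uncommitted edits, standing in for what the diff.gz changed.
echo 'a debianised' > Makefile
echo 'b fixed' > super1.c

git checkout -q -b upstream-patches master
git add super1.c && git commit -q -m 'fix for upstream'

# The Makefile edit is still uncommitted and travels to the next branch.
git checkout -q -b deb/conffile-location master
git add Makefile && git commit -q -m 'move conffile'
```

Each branch ends up with exactly one of the two changes on top of the same starting point, and the working tree is clean at the end.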
At this time, we push our work so it won’t get lost if, at this
moment, aliens land on the house, or any other completely plausible
event of apocalypse descends upon you. We’ll push our work to
git.debian.org (the origin, which is the
default destination and thus needs not be specified) by using
git-push --all, which conveniently pushes all local
branches, thus including the upstream code; you may not want to
push the upstream code, but I prefer it since it makes it easier to
work with the repository, and since most of the objects are needed
for the other branches anyway — after all, we branched off the
upstream branch.
Specifying --tags instead of --all
pushes tags instead of heads (branches); you couldn’t have guessed
that! See this
thread if you (rightfully) think that one should be able to do
this in a single command (which is not git push refs/heads/*
refs/tags/*)…
$ git push --all
$ git push --tags
Done. Well, almost…
Building the package (theory)
Let’s build the package. There seem to be two (sensible) ways we could do this, considering that we have to integrate (merge) the branches we just created, before we fire off the building scripts:
- by using a temporary (or “throw-away”) branch off upstream, where we integrate all the branches we have just created, build the package, tag our master branch (it contains debian/changelog), and remove the temporary branch. When a new package needs to be built, we repeat the process.
- by using a long-living integration branch off upstream, into which we merge all our branches, tag the branch, and build the package off the tag. When a new package comes around, we re-merge our branches, tag, and build.
Both approaches have a certain appeal to me, but I settled for the second, for two reasons, the first of which leads to the second:
- When I upload a package to the Debian archive, I want to create a tag which captures the exact state of the tree from which the package was built, for posterity (I will return to this point later). Since the throw-away branches are not designed to persist and are not uploaded to the archive, tagging the merging commit makes no sense. Thus, the only way to properly identify a source tree across all involved branches would be to run git-tag $branch/$tagname $branch for each branch, which is purely semantic and will get messy sooner or later.
- As a result of the above: when Debian makes a new stable release, I would like to create a branch corresponding to the package in the stable archive at the time, for security and other proposed updates. I could rename my throw-away branch, if it still existed, or I could create a new branch and merge all other branches, using the (semantic) tags, but that seems rather unfavourable.
So instead, I use a long-living integration branch, consistently tag the merge commits which produced the tree from which I built the package I uploaded, and when a certain version ends up in a stable Debian release, I create a maintenance branch off the one, single tag which corresponds to the very version of the package distributed as part of the Debian release.
So much for the theory. Let’s build, already!
Building the package (practice)
So we need a long-living integration branch, and that’s easier done than said:
$ git checkout -b build mdadm-2.6.3+200709292116+4450e59
Now we’re ready to build, and the following procedure should
really be automated. I thus write it like a script, called
poor-mans-gitbuild, which takes as optional argument
the name of the (upstream) tag to use, defaulting to
upstream (the tip):
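#!/bin/sh
# poor-mans-gitbuild: merge all packaging branches into build, tag, and build
set -eu
# read the package version from debian/changelog on master
git checkout master
debver=$(dpkg-parsechangelog | sed -ne 's,Version: ,,p')
# merge upstream (or the tag given as $1), then the packaging branches
git checkout build
git merge "${1:-upstream}"
git merge upstream-patches
git merge master
# the pattern is quoted so the shell cannot expand it against local files
for b in $(git for-each-ref --format='%(refname)' 'refs/heads/deb/*'); do
    git merge -- "$b"
done
# tag the merge result, then build while keeping .git out of the diff.gz
git tag -s "debian/$debver"
debuild -i.git
git checkout master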
Kumar Appaiah spotted that -i.git
is actually needed in the debuild call to make it
exclude the .git directory from the generated
diff.gz.
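The version extraction at the top of the script can be tried without a Debian source tree. In this sketch the variable `sample` is a hand-written stand-in for the output of dpkg-parsechangelog.

```shell
# Sample of what dpkg-parsechangelog prints; only the Version line matters here.
sample='Source: mdadm
Version: 2.6.3+200709292116+4450e59-3
Distribution: unstable'

# Print only the line on which the "Version: " prefix could be stripped.
debver=$(printf '%s\n' "$sample" | sed -ne 's,Version: ,,p')
echo "git tag -s debian/$debver"
```

The pattern is not anchored, so a stricter variant would be `s,^Version: ,,p`; for the fields in this sample it makes no difference.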
Note how we are merging each branch in turn, instead of using the octopus merge strategy (which would create a commit with more than two parents) for reasons outlined in this post. An octopus-merge would actually work in our situation, but it will not always work, so better safe than sorry (although you could still achieve the same result).
If you discover during the build that you forgot something, or the build script failed to run, just remove the tag, undo the merges, checkout the branch to which you need to commit to fix the issue, and then repeat the above build process:
$ git tag -d debian/$debver
$ git checkout build
$ git reset --hard upstream
$ git checkout master
$ editor debian/rules # or whatever
$ git add debian/rules
$ git commit -s
$ poor-mans-gitbuild
Before you upload, it’s a good idea to invoke gitk
--all and verify that all goes according to plan.
When you’re done and the package has been uploaded, push your
work to git.debian.org, as before. Instead of using
--all and --tags, I now specify exactly
which refs to push. This is probably a good habit to get into to
prevent publishing unwanted refs:
$ git push origin build tag debian/2.6.3+200709292116+4450e59-3
Now take your dog for a walk, or play outside, or do something else not involving a computer or entertainment device.
Uploading a new Debian version
If you are as lucky as I am, the package you uploaded still has a bug in the upstream code, and someone else fixes it before upstream releases a new version. You are then in the position to release a new Debian version. Or maybe you just need to make some Debian-specific changes against the same upstream version. I’ll let the commands speak for themselves:
$ git checkout upstream-patches
$ git-apply --index < patch-from-lunar.diff #444682 again
$ git commit --author 'Jérémy Bobbio <lunar@debian.org>' -s
# this should also be automated, see below
$ git checkout master
$ dch -i
$ dpkg-parsechangelog | sed -ne 's,Version: ,,p'
2.6.3+200709292116+4450e59-3
$ git commit -s debian/changelog
$ poor-mans-gitbuild
$ git push
$ git push origin tag debian/2.6.3+200709292116+4450e59-3
That first git-push may require a short
explanation: without any arguments, git-push updates
only the intersection of local and remote branches, so it would
never push a new local branch (such as build above),
but it updates all existing ones; thus, you cannot inadvertently
publish a local branch. Tags still need to be published
explicitly.
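The behaviour described here is the "matching" mode, which was the default when this was written; since Git 2.0 the default value of push.default is "simple". Naming the refs explicitly works the same way under either setting. A minimal sketch with a throwaway bare repository standing in for the remote (all names and paths below are made up):

```shell
set -eu
work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git init -q "$work/local"
cd "$work/local"
git config user.name "Example Maintainer"      # hypothetical identity
git config user.email maintainer@example.org
echo a > a
git add a
git commit -q -m "initial"
main=$(git symbolic-ref --short HEAD)
git remote add origin "$work/remote.git"

# a private branch that must not be published
git checkout -q -b tmp/private
echo b > b
git add b
git commit -q -m "private work"

# public work plus a (lightweight, for the demo) release tag
git checkout -q "$main"
echo c > c
git add c
git commit -q -m "public work"
git tag debian/1.0-1

# push exactly one branch and one tag, nothing else
git push -q origin "$main" tag debian/1.0-1

heads=$(git ls-remote --heads origin | wc -l)
echo "branches on the remote: $heads"
```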
Hacking on the software
Imagine: on a rainy Saturday afternoon you get bored and decide
to implement a better way to tell mdadm when to start which array.
Since you’re a genius, it’ll take you only a day, but you do make
mistakes here and there, so what could be better than to use
version control? However, rather than having a branch that will
live forever, you are just creating a local branch, which you will
not publish. When you are done, you’ll feed your work back into the
existing branches.
Git makes branching really easy and as you may have spotted, the
poor-mans-gitbuild script reserves an entire branch
namespace for people like you:
$ git checkout -b tmp/start-arrays-rework master
Unfortunately (or fortunately), fixing this issue will require
work on two branches, since the initramfs script and
hook are maintained in a separate branch. There are (again) two
ways in which we can (sensibly) approach this:
- create two separate, temporary branches, and switch between them as you work.
- merge both into the temporary branch and later cherry-pick the commits into the appropriate branches.
I am undecided on this, but maybe the best would be a combination: merge both into a temporary branch and later cherry-pick the commits into two additional, temporary branches until you get it right, and then fast-forward the official branches to their tips:
$ git merge master deb/initramfs
$ editor debian/mdadm-raid # …
$ git commit -s debian/mdadm-raid
$ editor debian/initramfs/script.local-top # …
$ git commit -s debian/initramfs/script.local-top
[many hours of iteration pass…]
[… until you are done]
$ git checkout -b tmp/start-arrays-rework-init master
# for each commit $c in tmp/start-arrays-rework
# applicable to the master branch:
$ git cherry-pick $c
$ git checkout -b tmp/start-arrays-rework-initramfs deb/initramfs
# for each commit $c in tmp/start-arrays-rework
# applicable to the deb/initramfs branch:
$ git cherry-pick $c
This is assuming that all your commits are logical units. If you find several commits which would better be bundled together into a single commit, this is the time to do it:
$ git cherry-pick --no-commit <commit7>
$ git cherry-pick --no-commit <commit4>
$ git cherry-pick --no-commit <commit5>
$ git commit -s
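Bundling commits this way can be tried safely in a scratch repository. The sketch below uses made-up branch names (tmp/work, tmp/clean) and folds three small commits into one.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Hacker"          # hypothetical identity
git config user.email hacker@example.org
echo base > base
git add base
git commit -q -m "base"
main=$(git symbolic-ref --short HEAD)

# three small work-in-progress commits on a scratch branch
git checkout -q -b tmp/work
for n in 1 2 3; do
  echo "$n" > "file$n"
  git add "file$n"
  git commit -q -m "step $n"
done

# replay them onto a clean branch without committing in between ...
git checkout -q -b tmp/clean "$main"
for c in $(git rev-list --reverse "$main"..tmp/work); do
  git cherry-pick --no-commit "$c"
done
# ... and record the accumulated changes as one logical commit
git commit -q -m "all three steps as one logical change"

squashed=$(git rev-list --count "$main"..tmp/clean)
echo "commits on tmp/clean: $squashed"
```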
Before we now merge this into the official branches, let me briefly intervene and introduce the concept of a fast-forward. Git will “fast-forward” a branch to a new tip if it decides that no merge is needed. In the above example, we branched a temporary branch (T) off the tip of an official branch (O) and then worked on the temporary one. If we now merge the temporary one into the official one, Git determines that it can actually squash the ancestry into a single line and move the official branch tip to the same commit as the temporary branch tip. In cheap (poor man’s) ASCII notation:
- - - O          >> merge T >>   - - - = - - OT
       ` - - T   >>  into O >>
This works because no new commits have been made on top of O (if there were any, we might be able to rebase, but let’s not go there quite yet; rebasing is how you shoot yourself in the foot with Git). Thus we can simply do the following:
$ git checkout deb/initramfs
$ git merge tmp/start-arrays-rework-initramfs
$ git checkout master
$ git merge tmp/start-arrays-rework-init
and test/build/push the result. Or well, since you are not an
mdadm maintainer (We^W I have open job positions!
Applications welcome!), you’ll want to submit your work as patches
via email:
$ git format-patch -s -M origin/master
This will create a number of files in the current directory, one
for each commit you made since
origin/master. Assuming each commit is a logical unit,
you can now submit these to an email address. The
--compose option lets you write an introductory
message, which is optional:
$ git send-email --compose --to your@email.address <file1> <file2> <…>
Once you’ve verified that everything is alright, swap your email address for the bug number (or the pkg-mdadm-devel list address).
Thanks (in advance) for your contribution!
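If you would like to see what format-patch produces before mailing anything, the following sketch runs it in a scratch repository. The identity, file names and output directory are made up, and the base commit stands in for origin/master.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Hacker"          # hypothetical identity
git config user.email hacker@example.org
echo base > README
git add README
git commit -q -m "base"
base=$(git rev-parse HEAD)                     # stands in for origin/master

# two commits on top of the base
for n in 1 2; do
  echo "$n" >> README
  git commit -q -a -m "change $n"
done

# one patch file per commit since $base, written to a separate directory;
# -s appends a Signed-off-by line, -M detects renames
out=$(mktemp -d)
git format-patch -s -M -o "$out" "$base" >/dev/null
count=$(ls "$out" | wc -l)
echo "patch files: $count"
```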
Of course, you may also be working on a feature that you want to
go upstream, in which case you’d probably branch off
upstream-patches (if it depends on a patch not yet in
upstream’s repository), or upstream (if it does
not):
$ git checkout -b tmp/cool-feature upstream
[…]
… when a new upstream version comes around
After a while, upstream may have integrated your patches, in
addition to various other changes, to give birth to
mdadm-2.6.4. We thus first fetch all the new refs and
merge them into our upstream branch:
$ git fetch upstream-repo
$ git checkout upstream
$ git merge upstream-repo/master
We could just as well have executed
git-pull, which with the default configuration would
have done the same; however, I prefer to separate the process into
fetching and merging.
Now comes the point when many Git people think about rebasing.
And in fact, rebasing is exactly what you should be doing, iff
you’re still working on an unpublished branch, such as the
previous tmp/cool-feature off upstream.
By rebasing your branch onto the updated upstream
branch, you are making sure that your patch will apply cleanly when
upstream tries it, because potential merge conflicts would be
handled by you as part of the rebase, rather than by upstream:
$ git checkout tmp/cool-feature
$ git rebase upstream
What rebasing does is quite simple actually: it takes every commit you made since you branched off the parent branch and records the diff and commit message. Then, for each diff/commit_message pair, it creates a new commit on top of the new parent branch tip, thus rewrites history, and orphans all your original commits. Thus, you should only do this if your branch has never been published or else you would leave people who cloned from your published branch with orphans.
If this still does not make sense, try it out: create a (source) repository, make a commit (with a meaningful commit message), branch B off the tip, make a commit on top of B (with a meaningful message), clone that repository and return to the source repository. There, checkout the master, make a commit (with a …), checkout B, rebase it onto the tip of master, make a commit (with a …), and now git-pull from the clone; use gitk to figure out what’s going on.
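A cut-down version of that experiment, without the clone, already shows the history rewrite. This is only a sketch in a throwaway repository; the branch name B follows the text, everything else is made up.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Hacker"          # hypothetical identity
git config user.email hacker@example.org
echo 1 > a
git add a
git commit -q -m "first commit on the mainline"
main=$(git symbolic-ref --short HEAD)

# branch B off the tip and commit there
git checkout -q -b B
echo 1 > b
git add b
git commit -q -m "first commit on B"
before=$(git rev-parse B)

# meanwhile the mainline moves on
git checkout -q "$main"
echo 2 > a
git commit -q -a -m "second commit on the mainline"

# rebase B onto the new mainline tip
git checkout -q B
git rebase -q "$main"
after=$(git rev-parse B)

# the commit on B got a new ID; the old one still exists as an object,
# but no branch points to it any more
echo "before: $before"
echo "after:  $after"
```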
So you should almost never rebase a published branch, and since
all your branches outside of the tmp/* namespace are
published on git.debian.org, you should not rebase
those.
But then again, Pierre actually
rebases a published branch in his workflow, and he does so with
reason: his patches branch is just a collection of
branches to go upstream, from which upstream cherry-picks or which
upstream merges, but which no one tracks (or should be
tracking).
But we can’t (or at least will not at this point) do this for
our feature branches (though we could treat
upstream-patches that way), so we have to merge. At
first, it suffices to merge the new upstream into the
long-living build branch, and to call
poor-mans-gitbuild, but if you run into merge
conflicts or find that upstream’s changes affect the functionality
contained in your feature branches, you need to actually fix
those.
For instance, let’s say that upstream started providing
md.txt (which I previously provided in the
deb/docs branch), then I need to fix that branch:
$ git checkout deb/docs
$ git rm md.txt
$ git commit -s
That was easy, since I could evade the conflict. But what if
upstream made a change to Makefile, which got in the
way with my configuration file location change? Then I’d have to
merge upstream into
deb/conffile-location, resolve the conflicts, and
commit the change:
$ git checkout deb/conffile-location
$ git merge upstream
CONFLICT!
$ git-mergetool
$ git commit -s
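The conflict case can be rehearsed in a scratch repository. In the sketch below, the Makefile contents are invented and the repository's initial branch plays the role of upstream; the conflict is resolved by writing the merged file by hand instead of running git-mergetool.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Maintainer"      # hypothetical identity
git config user.email maintainer@example.org
echo 'CONFFILE = /etc/mdadm.conf' > Makefile   # invented content
git add Makefile
git commit -q -m "initial Makefile"
upstream=$(git symbolic-ref --short HEAD)      # plays the role of upstream

# our feature branch changes the line ...
git checkout -q -b deb/conffile-location
echo 'CONFFILE = /etc/mdadm/mdadm.conf' > Makefile
git commit -q -a -m "move the configuration file"

# ... and so does upstream
git checkout -q "$upstream"
echo 'CONFFILE = /etc/mdadm.conf # upstream tweak' > Makefile
git commit -q -a -m "upstream change"

# merging upstream into the feature branch now conflicts
git checkout -q deb/conffile-location
if git merge -q "$upstream" >/dev/null 2>&1; then
  status=clean
else
  status=conflict
  # resolve by hand: keep our location, take upstream's tweak
  echo 'CONFFILE = /etc/mdadm/mdadm.conf # upstream tweak' > Makefile
  git add Makefile
  git commit -q -m "merge upstream, resolve Makefile conflict"
fi
echo "merge was: $status"
```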
When all conflicts have been resolved, I can prepare a new release, as before:
$ git checkout master
$ dch -i
$ dpkg-parsechangelog | sed -ne 's,Version: ,,p'
2.6.3+200709292116+4450e59-3
$ git commit -s debian/changelog
$ poor-mans-gitbuild
$ git push
$ git push origin tag debian/2.6.3+200709292116+4450e59-3
Note that Git often appears smart about commits that percolated
upstream: since upstream included the two commits in
upstream-patches in his 2.6.4 release, my
upstream-patches branch got effectively annihilated,
and Git was smart enough to figure that out without a
conflict. But before you rejoice, let it be told that this does not
always work.
Creating and using a maintenance branch
Let’s say Debian “lenny” is released with mdadm
2.7.6-1, then:
$ git checkout -b maint/lenny debian/2.7.6-1
You might do this to celebrate the release, or you may wait until the need arises. We’ve already left the domain of reality (“lenny” is not yet released), so the following is just theory.
Now, assume that a security bug is found in mdadm
2.7.6 after “lenny” was released. If upstream is already
on mdadm 2.7.8 and commits
deadbeef and c0ffee fix the security
issue, you’d cherry-pick them into the
maint/lenny branch:
$ git checkout upstream
$ git pull
$ git checkout maint/lenny
$ git cherry-pick deadbeef
$ git cherry-pick c0ffee
If there are no merge conflicts (which you’d resolve with
git-mergetool), we can just go ahead to prepare the
new package:
$ dch -i
$ dpkg-parsechangelog | sed -ne 's,Version: ,,p'
2.7.6-1lenny1
$ git commit -s debian/changelog
$ poor-mans-gitbuild
$ git push origin maint/lenny
$ git push origin tag debian/2.7.6-1lenny1
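Since this part is theory anyway, here is a sketch of the same idea in a scratch repository: branch off a release tag and cherry-pick only the fix, leaving newer features behind. The tag, branch and file names are made up.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Maintainer"      # hypothetical identity
git config user.email maintainer@example.org
echo v1 > code
git add code
git commit -q -m "release"
git tag debian/1.0-1                           # the released version

# development continues: a new feature, then a security fix
echo feature > feature
git add feature
git commit -q -m "new feature"
echo fix > fix
git add fix
git commit -q -m "security fix"
fix=$(git rev-parse HEAD)

# the maintenance branch starts at the release tag and takes only the fix
git checkout -q -b maint/stable debian/1.0-1
git cherry-pick "$fix" >/dev/null

picked=$(git rev-list --count debian/1.0-1..maint/stable)
echo "commits on top of the release: $picked"
```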
Future directions
It should be trivial to create the Debian source package directly from the repository, and in fact, in response to a recent blog post of mine on the dispensability of pristine upstream tarballs, two people showed me their scripts to do it.
My post also caused Joey Hess to clarify his position on pristine tarballs, before he went out to implement dpkg-source v3. This looks very promising.
Yet, as Romain argues, there are benefits with simple patch management systems. Exciting times ahead!
In addition to creating source packages from version control, a couple of other ideas have been around for a while:
- create debian/changelog from commit log summaries when you merge into the build branch. Guido’s git-dch might be a lead.
- integrate version control with the BTS, bidirectionally:
  - given a bug report, create a temporary branch and apply any patches found in the bug report.
  - upon merging the temporary branch back into the feature branch it modifies, generate a patch, send it to the BTS and tag the bug report + pending patch.
And I am sure there are more. If you have any, I’d be interested to hear about them!
Wrapping up
I hope this post was useful. Thank you for reading to the end; this was probably my longest blog post ever.
I want to thank Pierre Habouzit, Johannes Schindelin, and all
the others on the #git/freenode IRC channel for their
tutelage. Thanks also to Manoj Srivastava, whose pioneering work on
packaging with GNU arch got me started on most of the concepts
I use in the above. And of course, the members of the vcs-pkg mailing
list for the various discussions on this subject, especially
those who participated in
the thread leading up to this post. Finally, thanks to Linus
and Junio for Git and the
continuously outstanding high level of support they give.
If you are interested in the topic of using version control for
distro packaging, I invite you to join the vcs-pkg mailing
list and/or the #vcs-pkg/irc.oftc.net IRC
channel.
NP: Aphex Twin: Selected Ambient Works, Volume 2 (at least when I started writing…)
Posted Wed 10 Oct 2007 19:46:22 UTC
debcheckout: some new bits
Some new bits about debcheckout (talk is cheap, code here):
- authenticated mode. Consider svn (similar arguments stand for other VCS). When checking out alioth repositories using the svn:// prefix, the resulting local copy can’t be committed to, since it would require (assuming you have an alioth account and the needed permissions) a svn+ssh:// access. “authenticated mode” is precisely for that: when checking out well-known repositories (only alioth’s ATM) you can specify an extra “-a” argument, with an optional “-u” to specify your user name, and debcheckout will rewrite the repository URL so that the resulting local copy can be committed to. ATM authenticated mode works for svn, hg, bzr, git.
- destination dir. It is now possible to specify where you want to check out a package repository (so that we avoid ending up with tons of anonymous “trunk” directories). The syntax is the common “debcheckout PKG DESTDIR” idiom and DESTDIR, if not provided, defaults to the package name. (Thanks to JoeyH for the idea and the initial patch.)
- sorry about arch, but ATM it’s almost non-functioning, and I’m not willing to lose time on it, since among all the VCSs supported by debcheckout it’s the only one I’ve never used. If you want support for it, please provide code!
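To give an idea of the kind of rewriting authenticated mode performs, here is a sketch in shell. It is not debcheckout's code: it only swaps the scheme and adds a user name, whereas the real tool may also have to adjust the path, and the user name below is made up.

```shell
# Illustrative only; not debcheckout's actual rewrite rules.
url='svn://svn.debian.org/devscripts/trunk'
user='jdoe'                                    # hypothetical alioth user name

case $url in
  # anonymous svn URL: switch to svn+ssh and add the user name
  svn://*) auth_url="svn+ssh://$user@${url#svn://}" ;;
  # anything else is left alone in this sketch
  *)       auth_url=$url ;;
esac

echo "$auth_url"
```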
(Perl) Tip of the day: Switch.pm
With “use Switch;” you will gain a switch statement for your Perl programs, which can be used as follows:
switch ($repo_type) {
case "cvs" { my $module = pop @cmd; push @cmd, ("-d", $destdir, $module); }
case /^(bzr|darcs|git|hg|svn)$/
{ push @cmd, $destdir; }
else { die "sorry, don't know how to set the destination directory for $repo_type repositories (patches welcome!)\n"; }
}
This is far better than a chain of if/elsif statements and even has sane semantics (e.g. no need of explicit breaks, possibility to have higher-case branches, …). Unfortunately, it is not possible (using a simple syntax) to match a scalar value against an array case branch. Therefore the only way to factor out branches is (when possible) to rely on regexp alternative branches, as is done in the code snippet above.
Posted Fri 17 Aug 2007 08:13:04 UTC
Introducing debcheckout
Cute little tiny teeny new addition to devscripts: debcheckout (not yet uploaded though; in the meantime you can get it from here). It checks out the version control repository used to maintain a given package.
Sample usage:
$ debcheckout devscripts
declared svn repository at svn://svn.debian.org/devscripts/trunk
svn co svn://svn.debian.org/devscripts/trunk ...
A trunk/debian
A trunk/debian/control
A trunk/debian/links
A trunk/debian/dirs
A trunk/debian/compat
<snip>
U trunk
Checked out revision 749.
$
The information about where to find a repository is extracted by parsing (in a rather dumb way actually, but I really can’t stand libapt-pkg API!) the Vcs-XXX fields.
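A similarly dumb extraction can be sketched in a few lines of shell. This is an illustration rather than debcheckout's parser; the sample stanza is hand-written and the sketch assumes one field per line.

```shell
# Hand-written sample; real records have many more fields.
control='Source: devscripts
Vcs-Svn: svn://svn.debian.org/devscripts/trunk
Vcs-Browser: http://svn.debian.org/wsvn/devscripts/trunk/'

# first Vcs-* field that is not the browser URL
line=$(printf '%s\n' "$control" | grep -i '^Vcs-' | grep -vi '^Vcs-Browser:' | head -n 1)

# "Vcs-Svn: URL" gives type "svn" and the URL
repo_type=$(printf '%s\n' "$line" | sed -e 's/^[Vv]cs-\([^:]*\):.*/\1/' | tr 'A-Z' 'a-z')
repo_url=$(printf '%s\n' "$line" | sed -e 's/^[^:]*: *//')

echo "declared $repo_type repository at $repo_url"
```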
Intended usages:
- NMU scenarios: when you’re NMUing, please commit your patches (if possible of course: directly to the repository if it has already adhered to the “open your VCS” campaign I’m sponsoring, or somewhere else if you’re using a distributed VCS); with debcheckout, the first step is easy
- ease the creation of patches: isn’t it better to checkout a repository, fiddle around, and then just invoke svn (or whatever) diff instead of remembering (I always forget that!) to first create a .orig copy of the debianized source tree?
- retrieving the bleeding edge version of a package which includes the patch for a pending bug you have been waiting for ages
RFC
Let me know what you think of debcheckout, feature requests, whatever. In particular let me know if I did something wrong using some VCS, since I’m not proficient in all VCSs supported by debcheckout; for example: I’m quite sure the Arch part is not working … help is appreciated!
Vcs-Cvs proposed convention
debcheckout can also lay the groundwork for standardizing the VCS-specific meanings of the various Vcs-XXX fields. In writing it I’ve noticed that almost all VCSs have de facto standards about what to put in the field, with the notable exception of CVS. That’s probably because all modern VCSs rely on some URL-like identifier for a repository location, while CVS does not: it needs a URL/module pair to check out.
The format I’m currently supporting in debcheckout is a pair of space-separated values “CVSROOT MODULENAME”; later on I’m using those values as in cvs -d CVSROOT checkout MODULENAME. Also please note that if you do not put the leading “:pserver:” string in the CVSROOT, users won’t be able to check out the repository without providing a password.
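Splitting such a field value is straightforward. The sketch below uses a made-up CVSROOT and module name, and only prints the resulting command instead of running cvs.

```shell
# Hypothetical field value in the proposed "CVSROOT MODULENAME" format.
vcs_cvs=':pserver:anonymous@cvs.example.org:/cvsroot/foo foo-module'

cvsroot=${vcs_cvs%% *}   # everything before the first space
module=${vcs_cvs#* }     # everything after the first space

# the command that would be run
printf 'cvs -d %s checkout %s\n' "$cvsroot" "$module"
```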
Tip of the day: Pod::Usage
Perl’s Pod::Usage module is cool. Finally I can write the usage string only once instead of duplicating it in the manpage and in the string to be printed upon --help. Ruby had something similar, but the output on console was so horrible that I preferred duplicating stuff for Ruby scripts.
Update: I’ve changed the link to debcheckout.pl so that it points to the “live” version in the devscripts repository, since some patches are already flowing in …
Posted Wed 15 Aug 2007 14:14:05 UTC
Wow, after 2:15 hours of continuous IRC hacking, my brain is fried and at least 3-4 people followed the introduction (thanks to bignose for the editing) to Manoj’s packaging art I gave in #pkg-zope. I think it was successful, but there are lessons to be learnt:
- Prepare a simple example. I planned to do so, but today was just too bad a day and I did not get to it.
- Do your own hacking somewhere where people can see it, e.g. on alioth (if it isn’t down), in a publicly readable directory. Consider typescript.
- Have two people, or more. Manoj helped out a lot, but he wasn’t prepared, so I felt sorry for putting him on the spot. Two people are needed when a problem arises: then one can fix it while the other fields peripheral questions.
- Have the log appear live on the web somewhere, so late-joiners can catch up.
- Grow an extra 30 fingers. Learn Dvorak. Man, my fingers ache.
What is good though is that as the demonstrator, you have to type everything twice — into the shell and into the IRC window. That gives the people following the demo twice as much time to try things themselves.
I think we should have more demos of this kind in our community.
Update: I had to give up the wiki on my server and the Debian admins have not yet had the time to incorporate the pages into the Debian wiki proper.
Anyway, I suggest against the use of arch, which is a bit too cumbersome. Have a look at some of the other VCS to do what you want. For instance, I just published a typescript from a recent presentation on using modern VCS for Debian packaging, in which I use git for the same workflow.
Posted Thu 11 Aug 2005 21:36:28 UTC
Entries are updated every 48 */3 * * * (yes, this
is cron).
