Don't tell my employer, but I'm an undercover git user. Officially,
my company uses Clearcase UCM - but I do almost all my coding outside
Clearcase. I use git behind the official version control system. I'll
tell you why I risk a scolding from my boss[1] and the IT department
for using an unsanctioned tool. But to do that, I'll have to tell you
a bit more about both Clearcase[2] and git.
Clearcase Basics
Clearcase is a centrally managed, client-server type system -- a
big server hosts the repository which tracks all the project's info:
individual files, the directory tree, versions, branches, users,
permissions, Social Security numbers, DNA sequences, etc. The client
(that's you) gets a working copy of the project files and tools to
interact with the server. Checking out, checking in and viewing
history all require connecting to the server. You can't do a thing
without being tethered to the central repository.
In the Clearcase UCM development model, programmers create new
"streams" ("branches" in generic terms), each having an "activity"
(which is Clearcase talk for "changeset"). When you check in a file,
it's recorded against an activity. Therefore, an activity is a set of
checked-in files (and directory changes). Other developers on the
project won't see your changes until you "deliver" (merge) your
activity (changeset) to the parent stream (the main branch). End of
vocabulary lesson.
Where Clearcase Falls Down
My biggest issue with the Clearcase model is that it inhibits both
experimental and iterative programming. I assert the following:
- Experimental coding requires branching. Let's hope this isn't a
controversial statement. The only arguments people make against
branching are practical - their VCSs don't support it well. Or they
don't support merging well. Clearly, if you had a VCS that could
branch and merge well, you'd use it all the time.
- Clearcase discourages experimental branching as streams are
centralized, limited and public. On my project, creating "unnecessary"
experimental streams is discouraged as each stream incurs the
maintenance penalty of using more resources on the server.
- Merging in Clearcase is also less than perfect. Instead of a
patch-based approach, Clearcase only lets you merge between streams
that share a common baseline. And guess who has to create, maintain
and recommend
these baselines? You, the user. Want to merge between streams that
have diverged - that don't share an exact baseline? Forget it: you
have to go outside Clearcase to do it.
- Finally, Clearcase activities don't allow logical commits within
an activity - a checkin is limited to just one file. You cannot
check in four files as a group representing a logical change. This makes
it hard to both organize your work and revert multi-file changes if
they don't work out.
Let's see if git could help. (Ok, it can - or else why would I
continue writing?)
Git Comes (Quietly) to the Rescue
Git is the distributed VCS used by Linus to maintain the Linux
kernel. Where Clearcase requires a centralized server, git requires
none -- the repository is stored in a single hidden directory at the
top of your project's file tree. You can create a git repository from
an existing project very easily:
cp -R /my/clearcase/project /my/git/project
cd /my/git/project
git init
git add .
git commit
A git repository is local and under your complete control; you
don't need anybody's permission to create one. And obviously you
don't need network connectivity to a centralized server. You won't
need to file a single TPS report to set up a git repository. And, if
needed, you can keep your repo a secret.
What was I complaining about? Oh yes, branching and logical
commits.
Branching is simple and lightweight in git. Let's say you're
working in your 'bug_fix' branch, and you decide that the
fix could be simplified by moving some methods from class Foo to
class Bar. Here's how you'd create a new branch called
'foo_refactoring', make some changes in it, and then merge those
changes back into the bug_fix branch:
cd /my/git/project
git checkout bug_fix
git checkout -b foo_refactoring
# make some changes to Foo and Bar, compile, and test
git add Foo.java Bar.java
git commit -m 'Refactored Foo'
git checkout bug_fix
git merge foo_refactoring
If you decided the refactoring was unnecessary, you could have
skipped the merge -- or even permanently removed the experimental
branch. The branch was created quickly -- just to try out some
ideas -- and it can be ignored or removed. Branching is up to you,
not the sysadmin.
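Here's a minimal sketch of the "never mind" path, where the experiment is thrown away instead of merged. It runs in a throwaway directory, and the identity and file contents are placeholders I made up for illustration:

```shell
#!/bin/sh
set -e

# Throwaway repository so the sketch never touches a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # use the branch name from this post
git config user.name  "Example User"       # placeholder identity
git config user.email "user@example.com"

echo "class Foo {}" > Foo.java
git add Foo.java
git commit -q -m 'Initial commit'
git checkout -q -b bug_fix

# Try the experiment on its own branch.
git checkout -q -b foo_refactoring
echo "class Foo { /* experiment */ }" > Foo.java
git commit -q -a -m 'Refactored Foo'

# Decide against it: go back and delete the branch without merging.
git checkout -q bug_fix
git branch -q -D foo_refactoring

remaining=$(git branch --list foo_refactoring)   # empty: the branch is gone
foo_contents=$(cat Foo.java)                     # bug_fix never saw the experiment
```

Note the capital -D: git refuses to delete an unmerged branch with plain -d, which is a nice safety net when you didn't mean to throw the work away.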
Finally, git lets you build logical changesets. As you can see in
the foo_refactoring example, a commit can contain any number of file
changes. You can build a new feature piece-by-piece, committing chunks
of related work together. This is good for both you and your
reviewers!
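To make the "logical changeset" point concrete, here's a small sketch (again in a throwaway directory, with made-up file contents and commit messages) where one commit touches two files and a single git revert backs both of them out:

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name  "Example User"       # placeholder identity
git config user.email "user@example.com"

printf 'foo v1\n' > Foo.java
printf 'bar v1\n' > Bar.java
git add Foo.java Bar.java
git commit -q -m 'Initial commit'

# One logical change that spans two files, recorded as one commit.
printf 'foo v2\n' > Foo.java
printf 'bar v2\n' > Bar.java
git add Foo.java Bar.java
git commit -q -m 'Move helper methods from Foo to Bar'

# Count the files recorded in that single commit.
files_in_commit=$(git show --name-only --pretty=format: HEAD | grep -c .)

# Didn't work out? Undo the whole change, both files at once.
git revert --no-edit HEAD > /dev/null
after_revert="$(cat Foo.java) / $(cat Bar.java)"
```

The revert is itself a new commit, so the history still shows that the change was tried and then backed out.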
So, how are you going to use git behind your VCS?
Getting it to Git
Getting your Clearcase (or CVS, Subversion, Perforce, etc) code into
git is easy: copy your working dir to some local drive space and do
the "git init; git add .; git commit" sequence.
Dealing with rebases is easy too. Other users have probably made
changes to the codebase (in Clearcase) and you'll need to merge your
work with theirs before merging to the main stream. You can handle
this by rsyncing the upstream code into a git branch, then using
git rebase to bring that code into your development branch - fixing
any conflicts as necessary. Git rebase basically pops your current
commits off your branch, moves the branch to the tip of the requested
branch, then re-applies your commits on top. It helps keep the history
of your changes simple. I recommend doing all development work in a
sub-branch off master (master is git's default branch) and keeping
master for rebases. For example:
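cd /my/clearcase/project
cleartool rebase -recommended   # or 'cvs up' or whatever
cd /my/git/project
git checkout master
# trailing slashes copy the directory's contents, not the directory itself
rsync -r /my/clearcase/project/ /my/git/project/
git add -A                      # also picks up files that are new upstream
git commit -m 'rebased from clearcase'
git checkout dev_branch
git rebase master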
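If you'd like to watch what rebase does before trusting it with real work, here's a self-contained sketch. It fakes the upstream update with a plain file edit instead of cleartool and rsync, and every file name in it is invented for the demo:

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # use the branch name from this post
git config user.name  "Example User"       # placeholder identity
git config user.email "user@example.com"

# master mirrors the upstream (Clearcase) code.
echo "upstream v1" > upstream.txt
git add upstream.txt
git commit -q -m 'imported from clearcase'

# All real work happens on a branch off master.
git checkout -q -b dev_branch
echo "my work" > mine.txt
git add mine.txt
git commit -q -m 'my change'

# Upstream moves on; record that on master (stands in for the rsync step).
git checkout -q master
echo "upstream v2" > upstream.txt
git commit -q -a -m 'rebased from clearcase'

# Replay my work on top of the new upstream state.
git checkout -q dev_branch
git rebase -q master

top=$(git log -1 --pretty=format:%s HEAD)       # my commit, now on top
below=$(git log -1 --pretty=format:%s HEAD~1)   # the upstream update beneath it
merges=$(git rev-list --merges --count HEAD)    # no merge commits in the history
```

After the rebase my commit sits directly on top of the upstream update, with no merge commit in between. That linear history is what makes generating a clean patch later so easy.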
Getting it back to Clearcase
Git has made my daily coding much nicer - but if I want my code
built into my product, I still have to get it back to Clearcase.
I find the simplest and safest way to do this is by applying a
series of patches to the Clearcase controlled working dir. You can ask
git to generate a patch for a single commit using git diff:
# diff the commit against its parent to get just that commit's changes
git diff 345983^ 345983 > my-change.patch
But if you give git diff a branch name instead, it will generate one
patch covering everything that differs between that branch and your
current checkout. Assuming your master branch represents the rebased
Clearcase branch and your dev branch has been 'git rebased' to master,
this is exactly what you need!
cd /my/git/project
git checkout bug_fix
git diff master > bug_fix.patch
cd /my/clearcase/project
# check out any files if necessary
# -p1 strips the a/ and b/ prefixes that git diff adds to paths
patch -p1 < /my/git/project/bug_fix.patch
Build, test and submit in Clearcase - you're done!
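Here's the whole round trip as a sketch you can run without Clearcase. A plain directory stands in for the Clearcase working dir, and all paths and file names are made up. In this layout the git repository root lines up with the root of the other tree, so -p1 is the right strip level; if your directories line up differently, the -p value will differ too.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
mkdir -p "$work/vcs/src" "$work/git"       # vcs = stand-in for the Clearcase dir
echo "original" > "$work/vcs/src/Foo.java"

# Import the code into git, as in "Getting it to Git".
cp -R "$work/vcs/." "$work/git/"
cd "$work/git"
git init -q
git symbolic-ref HEAD refs/heads/master    # use the branch name from this post
git config user.name  "Example User"       # placeholder identity
git config user.email "user@example.com"
git add .
git commit -q -m 'imported'

# Do the work on a branch.
git checkout -q -b bug_fix
echo "fixed" > src/Foo.java
git commit -q -a -m 'Fix bug'

# One patch with everything bug_fix changed relative to master.
git diff master > "$work/bug_fix.patch"

# Apply it to the other tree; -p1 drops git's a/ and b/ path prefixes.
cd "$work/vcs"
patch -s -p1 < "$work/bug_fix.patch"
patched=$(cat src/Foo.java)
```

If patch complains that it can't find a file, the strip level is the first thing I'd check.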
Final Thoughts
Having a full understanding of your VCS's data model is essential to using it correctly. Perhaps the main reason I prefer git over almost any other system is its simple conceptual model (Git for Computer Scientists does a nice job explaining the data model). Clearcase is typical enterprise software -- its feature sheet is very long and highlights words that CIOs love like "reliable", "maintainable", and "support contract", but the documentation is thrifty when discussing the system's internals. Version control is too essential and too difficult to trust to a system you don't understand. So I use git - behind the scenes if necessary.
[1] Hi Boss! What I'm discussing here is not really any less safe than having un-committed work in any working dir (which everybody does). But the threat of a knuckle slapping adds some drama to this post - don't you think? ;-)
[2] Almost everything discussed in this post is true of any centralized version control system (CVS, Subversion, etc), not just Clearcase.