Category: work

Git Backstage, Under the Covers and on the Down Low

Don't tell my employer, but I'm an undercover git user. Officially, my company uses Clearcase UCM - but I do almost all my coding outside Clearcase. I use git behind the official version control system. I'll tell you why I risk a scolding from my boss1 and the IT department for using an unsanctioned tool. But to do that, I'll have to tell you a bit more about both Clearcase2 and git.

Clearcase Basics

Clearcase is a centrally managed, client-server type system -- a big server hosts the repository which tracks all the project's info: individual files, the directory tree, versions, branches, users, permissions, Social Security numbers, DNA sequences, etc. The client (that's you) gets a working copy of the project files and tools to interact with the server. Checking out, checking in and viewing history all require connecting to the server. You can't do a thing without being tethered to the central repository.

In the Clearcase UCM development model, programmers create new "streams" ("branches" in generic terms), each having an "activity" (which is Clearcase talk for "changeset"). When you checkin a file, it's recorded against an activity. Therefore, an activity is a set of checked-in files (and directory changes). Other developers on the project won't see your changes until you "deliver" (merge) your activity (changeset) to the parent stream (the main branch). End of vocabulary lesson.

Where Clearcase Falls Down

My biggest issue with the Clearcase model is that it inhibits both experimental and iterative programming. I assert the following:

  1. Experimental coding requires branching. Let's hope this isn't a controversial statement. The only arguments people make against branching are practical - their VCSs don't support it well. Or they don't support merging well. Clearly, if you had a VCS that could branch and merge well, you'd use it all the time.
  2. Clearcase discourages experimental branching as streams are centralized, limited and public. On my project, creating "unnecessary" experimental streams is discouraged as each stream incurs the maintenance penalty of using more resources on the server.
  3. Merging in Clearcase is also less than perfect. Instead of a patch-based approach, Clearcase restricts merges to streams that have a common baseline. And guess who has to create, maintain and recommend these baselines? You, the user. Want to merge between streams that have diverged - that don't share an exact baseline? Forget it: you have to go outside Clearcase to do it.
  4. Finally, Clearcase activities don't allow logical commits within an activity - a checkin is limited to just one file. You cannot checkin 4 files as a group representing a logical change. This makes it hard to both organize your work and revert multi-file changes if they don't work out.

Let's see if git could help. (Ok, it can - or else why would I continue writing?)

Git Comes (Quietly) to the Rescue

Git is the distributed VCS used by Linus to maintain the Linux kernel. Where Clearcase requires a centralized server, git requires none -- the repository is stored in a single hidden directory at the top of your project's file tree. You can create a git repository from an existing project very easily:

cp -R /my/clearcase/project /my/git/project
cd /my/git/project
git init
git add .
git commit

A git repository is local and under your complete control, you don't need any anybody's permission to create one. And obviously you don't need network connectivity to a centralized server. You won't need to file a single TPS report to setup a git repository. And, if needed, you can keep your repo a secret.

What was I complaining about? Oh yes, branching and logical commits.

Branching is simple and lightweight in git. Let's say you're working in your 'bug_fix' branch, and you decide that the fix could be simplified by moving some methods from class Foo to class Bar. Here's how you'd create a new branch called 'foo_refactoring', make some changes in it, and then merge those changes back to the master branch:

cd /my/git/project
git checkout bug_fix
git checkout -b foo_refactoring
// make some changes to Foo and Bar, compile, and test
git add Foo.java Bar.java
git commit -m 'Refactored Foo'
git checkout bug_fix
git merge foo_refactoring

If you decided the refactoring was unnecessary, you could have skipped the merge -- or even permanently removed the experimental branch. The branch was created quickly -- just to try out some ideas -- and it can be ignored or removed. Branching is up to you, not the sysadmin.

Finally, git lets you build logical changesets. As you can see in the foo_refactoring example, a commit can contain any number of file changes. You can build a new feature piece-by-piece, committing chunks of related work together. This is good for both you and your reviewers!

So, how are you going to use git behind your VCS?

Getting it to Git

Getting your Clearcase (or CVS, Subversion, Perforce, etc) code into git is easy: copy your working dir to some local drive space and do the "git init; git add .; git commit" sequence.

Dealing with rebases is easy too. Other users have probably made changes to the codebase (in Clearcase) and you'll need to merge your work with theirs before merging to the main stream. You can handle this by rsyncing the upstream code into a git branch, then using git rebase to merge that code into your development branch - fixing any conflicts as necessary. Git rebase basically pops your current commits off your branch, merges with the requested branch, then re-applies your commits. It helps keep the history of your changes simple. I recommending doing all development work in a sub-branch off master (master is git's default branch) and keeping master for rebases. For example:

cd /my/clearcase/project
cleartool rebase -recommended // or 'cvs up' or whatever
cd /my/git/project
git checkout master
rsync -r /my/clearcase/project /my/git/project
git commit -a -m 'rebased from clearcase'
git checkout dev_branch
git rebase master

Getting it back to Clearcase

Git has made my daily coding much nicer - but if I want my code built into my product, I still have to get it back to Clearcase.

I find the simplest and safest way to do this is by applying a series of patches to the Clearcase controlled working dir. You can ask git to generate a patch for a single commit using git diff:

git diff 345983 > my-change.patch

But if you give git-diff a branch name instead of a commit, it will generate patch files for each diverging commit between the two branches. Assuming your master branch represents the rebased Clearcase branch and your dev branch has been 'git rebased' to master, this is exactly what you need!

cd /my/project/git
git checkout bug_fix
git diff master > bug_fix.patch
cd /my/clearcase/project
// checkout any files if necessary
patch -p2 < /my/git/project/bug_fix.patch

Build, test and submit in Clearcase - you're done!

Final Thoughts

Having a full understanding of your VCS's data model is essential to using a it correctly. Perhaps the root cause of why I prefer git over almost any other system is its simple conceptual model (Git for Computer Scientists does a nice job explaining the data model). Clearcase is typical enterprise software -- its feature sheet is very long and highlights words that CIOs love like "reliable", "maintainable", and "support contract", but the documentation is thrifty when discussing the systems internals. Version control is too essential and too difficult to trust to a system you don't understand. So I use git - behind the scenes if necessary.

Update (2008/8/5): As several commentators have noticed, if you can use Clearcase snapshot views instead of dynamic views, then you can directly init a git repo in the view storage directory. Then you can use git directly out of that directory or clone it. By creating the git repo directly in your snapshot view, you can substitute the rsync step with a "git pull" and the patch step with a "git push". This is a big win. Alas, my employer requires dynamic views due to tool limitations, so I didn't discuss this method in the original entry.


1 Hi Boss! What I'm discussing here is not really any less safe than having un-committed work in any working dir (which everybody does). But the threat of a knuckle slapping adds some drama to this post - don't you think? ;-)

2 Almost everything discussed in this post is true of any centralized version control system (CVS, Subversion, etc), not just Clearcase.

Technorati tags for this post:

My Intranet Sucks

Yes, that's a very frank title. But I'm annoyed to distraction, and I need to get this off my chest. Here's the story of my company's intranet.

I write software for a multi-national telecom company. I'm going to avoid naming the company as I haven't researched their blog policy yet. We're a company that takes tremendous pride in our technological prowess - we've developed a number of truly innovative products, ancient and modern. We have an entire Class-A block of IP addresses, that's how long we've been involved with IP networking. We work with internet protocols day in and day out. But I'm really starting to wonder if we, as a company, understand the value of the internet, specifically the World Wide Web (when was the last time you saw that spelled out?). A company with this much technical know-how shouldn't have an intranet as terrible as ours.

Six years ago, our company had many, many internal web servers scattered about. While the servers themselves where managed by professional IT staffers, the content of these servers was "managed" by amateurs throughout the company. When I joined the company, I took one look at my team's site and volunteered to clean it up. Our site was typical of the content I saw on our intranet at that time: an unorganized and unmaintained mess of non-validating HTML pages, 25% of the links on these pages being dead. It took a month to mop up the mess and build something usable. Our site was tailored for the needs of two dozen people and wasn't particularly well-connected to related sites or corporate sites.

Multiply our messy site by approximately 1,000 and you have an idea of the scope and organization of the company's intranet. There was one shining beacon of hope however - there existed a half-decent search engine. If the content you where looking for existed, the search engine would almost certainly find it. Our intranet of that day wasn't pretty, but it had content, and you could find that content. In short, it worked.

Quite rightly, our IT department did not like managing several hundred web servers, each running different software. An announcement was made that we would begin a server consolidation program. At the same time, the powers-that-be chose a new Content Management System that would become the default storage system for all static media. They choose Livelink, from OpenText.com, for this task. Livelink has a long list of CMS features, but it's primary purpose is to store documents (HTML, Word, Excel, PDF, etc) in a hierarchical folder layout.

Let's start the rant, shall we?

Repeat after me: The web is NOT a folder of Microsoft office documents!

The web is a highly interlinked set of HTML documents that are accessible via HTTP and can be viewed in a web browser. Obviously, the web includes content types other than text/html, but the web page is what makes the web interesting and usable. You know now the web works: users click from page to page, using the contextual information on the page to decide what link to click next. Our corporate IT wizards have replaced a real web-like intranet with a web-accessible directory tree of documents, the majority of which aren't even in HTML.

Let's discuss how our new CMS ruined our intranet.

[Caveat: I suspect many of the criticisms I will heap upon the CMS are due to a misuse of Livelink as our intranet replacement. Perhaps Livelink is a perfectly fine product for some other use.]

The Hierarchical Folder Structure

At first glance it doesn't seem so bad to structure your intranet in a hierarchical structure. For example, one of my documents lives under a directory structure similar to this (names have been changed to protect the innocent):

RegionZ/
  Engineering&Technical/
    Research&Development/
      MajorLineOfBusiness/
        SolutionX/
          ProjectY/
            DocumentZ

That seems logical enough. However, there is a big difference between structuring a small set of content into folders and forcing something as large and diverse as a entire corporation into one. I have several problems with pushing the entire intranet into folders:

  1. The corporate structure changes constantly - and the intranet's folder structure doesn't keep up. This makes it very difficult to navigate up and down the folder tree. One day, you move "up" from your project to discover that the enclosing folder no longer has any relation to your project. In theory, the folder structure could be kept up to date, but this risks breaking everybody's bookmarks.
  2. Frankly, no one cares about the folder structure, no one uses it to find content. Most people are interested in a handful of projects, each project having one or more Livelink folders. They keep bookmarks to these folders. In effect, each real "project" (which is often one level up from the leaf nodes in the tree), is considered a destination in and of itself. The layers above "projects" have little content and are mostly irrelevant. The individual projects should be autonomous "websites" instead of buried in a pointless directory structure.

Finally, and most damningly, projects need far more than a folder to shove documents into. The majority of the content in Livelink exists as files referenced in simple directory listings - just a list of files in a directory. No context or explanation for the files is provided - beyond the file names themselves, of course. Livelink does have the capability to do "index.html"-like things, but no one uses it (perhaps because of the URL problem described below). How different this is from a real site! Ordinarily, you'd see a portal-like page explaining the purpose of this collection of information. The HTML pages would provide clues about what content is available and why. All this is missing from a typical Livelink project folder. Unless of course you can guess that "ProjDoc14.doc" is the starting point.

Terrible URLs

Livelink URLs look like this:

http://our.intranet.com/livelink/livelink.exe?func=ll&objId=9557908&objAction=browse&sort=name

Try keeping that URL in your head! Try to guess what sort of document that link addresses. Try to guess where in the hierarchy of the site that URL belongs. URLs like these violate every guideline for URL design I've ever seen:

  • They are loooooong
  • They are un-guessable and completely opaque
  • They do not follow the site structure
  • They do not allow upward navigation by chopping off the end of the URL

It would be a major improvement to have the URLs follow the directory structure, even though that would still give long URLs. For example:

http://our.intranet.com/livelink/eng_tech/r_d/line_of_business/solnX/projY/docZ.html

But even better would be to provide meaningful, project-oriented URLs. Perhaps something like: http://our.intranet.com/livelink/matts_big_project/.

On the plus side, I believe the Livelink URLs are persistent. The document or folder referenced by a particular object id is always accessible using the URL format above. Then again, I don't know of a way to redirect a URL to the location of a moved document, so maybe they're not that persistent.

No More Websites?

It is tremendously hard to do a traditional "website" hosted in Livelink. This is really a shame. How do we publish information to a wide audience of people in the 21st century? There's only one real answer: we build a website. What's a website anyway? I'd say it has the following characteristics:

  • It is addressable via HTTP at a well known, descriptive URL.
  • It has a default entry point, the "home page", that explains the site's purpose.
  • It is composed primarily of interlinked HTML pages.
  • It usually follows well-understood navigational conventions. Each page let's you know:
    • What site you are in
    • Where you are in the site
    • How to navigate to the main areas of the site

Obviously, a Livelink directory full of files provides none of those features. The CMS lacks a unity of style and purpose, and it has removed all context.

Why have very few people attempted to build a site within Livelink? Perhaps they are deterred by the URL problem. But I'm guessing the biggest reason is that the structure of Livelink, its accepted format as a directory tree of mixed file types, discourages them from even trying. From looking at randomly selected folders, you'd never know that building a site is possible.

No Dynamic Content

Livelink doesn't support user-written dynamic content. No CGI programs, no forms on web pages, not even server-side processing of documents (i.e., Apache's server-side-includes). Understandably, the IT department isn't fond of uncontrolled CGI scripts - they can easily introduce security holes. But there are real business needs for dynamic content and if you don't have hosting apart from Livelink, you're out of luck.

Permissions Problems

Livelink has access control lists enabled for every single folder and document. One has to login to Livelink to view any document, regardless of its ACL. ACL's are occasionally necessary for business reasons, but I believe most content could be safely shared with the entire corporation. I believe that, by default, a Livelink object should be viewable by all.

Setting the permissions for a folder can also be a tremendous time waster. Recently I put a document on Livelink and sent out the URL for review. Within minutes, I had several people call me complaining they where not authorized to view the document. According to the ACL list, the entire corporation could view the document. After some debugging, I discovered that "the entire corporation" doesn't include specific groups of overseas contractors. It took over an hour of trial and error to fix the permissions.

Finally, the ownership model imposed by Livelink is quite restrictive. If you upload a document, only you will be able to modify the permissions on a document. I don't know of a way to enable "group" ownership of a document.

Where's the HTML?

As mentioned several times by this point, Livelink encourages use of directories packed full of documents. Many of the documents uploaded are in formats not natively understood by the browser: Microsoft Office documents especially. There are obviously good reasons for sharing Excel spreadsheets, but Livelink does nothing to discourage posting Word documents - or any other proprietary textual document. What's the harm you ask?

  • Posting proprietary documents excludes them from search engines. Livelink has only limited capabilities to index Microsoft documents.
  • Proprietary documents have to be viewed in their proprietary applications. This doesn't cause much trouble for business people who all have Microsoft Office, but many of our engineers have Linux or HP-UX on their desktops.
  • Posting a document that could have been written in HTML incurs the opportunity cost of not helping build a more usable intranet.

There are over a dozen Regional Livelink servers; the root of every Livelink tree starts with a Region name. Searching across servers is not supported! If you don't know where a document is located, you may get to enjoy doing 12+ separate searches. Need I say more?

Returning to a HTML-based intranet would enable the search methodologies that work so well on the internet. Google offers a search appliance for the enterprise that I'd love to try.

Ugly is a Productivity Waste

I have one more criticism, and it may sound trivial, but I beg to disagree. Livelink is not a particularly attractive application. It is cluttered and generic looking. So what? An application that thousands of users, each spending thousands of hours in must be made attractive. Otherwise each user spends 1 or more seconds of their day thinking "Gosh, this is ugly". One second compounded a million times adds up to real productivity losses.

If They Made Me CTO...

In my opinion, a more useful intranet would feature content based primarily around people and projects, as opposed to teams and hierarchies.

People

Each employee would have a place on the web, with an obvious URL like http://our.intranet.com/people/mkeller, and resources for:

  • Hosting a blog or a simple homepage.
  • File storage for sharing documents of any sort.
  • A wiki for personal use.
  • A sharable web-based calendar (iCal).

Business are built around people, and people want to know who they are working with. I'm imagining a company where each employee keeps a little home page, containing (at the minimum) contact details, a list of their current projects, and a photo (we're a social species and really appreciate knowing faces!).

Some users would love a chance to blog, keep work notes in a wiki or publish a calendar. The IT department benefits by picking standard blogging, wiki and calendaring packages for everyone to use.

Projects

Perhaps I've become infected with David Allen -like thinking, but I see projects everywhere now. Sometimes a project is tackled by a single person, most often by a team. And what's the hardest part of any project? Keeping everybody in sync. I think everybody agrees that email is no longer getting it done. Groupware to the rescue? I think the following feature list is necessary in any online groupware solution:

  • A message board with comments.
  • A milestone tracking mechanism.
  • A project calendar that team members can subscribe to.
  • To-do lists that support assignment.
  • A wiki for project documents and notes.
  • A mechanism to show "recent changes" - RSS works great.

You can cobble together many of the elements above from individual sources, but the key component, "recent changes", is hard to achieve without an integrated system. The recent changes page, or RSS feed, let's everyone know what's been going on in the project: if there are new conversations on the message board, if wiki pages have been edited, if milestones have been added or moved. I've had good personal experiences with the groupware tool from 37Signals called Basecamp.

I dream of URLs like http://our.intranet.com/project/matts_big_project. I dream of an intranet where any employee can quickly create a new project site, give it a name, add some colleagues to the project and be off and running.

I Love HTML! Down With HTML!

I'm advocating giving much more control of our intranet content back to the employees. I'm sure there are some readers who might wonder if we'll just end up with the confused intranet that led to the adoption of a CMS in the first place. Thankfully, very few end users have to write HTML directly these days, very few have to design a site. Instead, we use ready-made tools like wikis and blogs which allow the user to enter simple text that gets converted to HTML and then published to nicely designed sites.

In my vision of the intranet, the IT team picks standard webapps that any employee can use to create content. The obvious applications are wikis, blogs, online calendars and a groupware package. These tools run on centrally located and managed servers, a real boon to the IT department. I think with just those applications, our employees would be much better served. If the choices of applications are wise, very few users will feel any need to have their own sites. Finally, the IT department gets a tremendous savings in maintenance costs when it has only to support a limited set of webapps as compared to a diverse number of custom-built sites.

---

That's it. I'm out of both criticism and pragmatism. Thanks for listening. I feel better now.

Technorati tags for this post:

< Future 10 | Past 10 >