Category: Tech

Who Needs a CIO?

The Long Tail, in "Who Needs a CIO?", reinforces some of the points I made in my recent post, My Intranet Sucks.

You might have expected, as I had, that most Chief Information Officers wanted to know about the latest trends in technology so they could keep ahead of the curve. Nothing of the sort. CIOs, it turns out, are mostly business people who have been given the thankless job of keeping the lights on, IT wise. And the best way to ensure that they stay on is to change as little as possible.

No wonder that the prevailing discussion among the university CIOs was about "relevance". Users no longer value what they do. It doesn't require a "C" title to keep fat pipes to the wide-open Internet open. A zillion free hosted services on the web have replaced the functionality of the IT departments service by service, just as minerals replace the cells in dinosaur bones. Talk of extinction was in the air, and rightly so.

Somehow I doubt that my IT department will "get out of the way" and let the employees, the real innovators in the company, use internet-based tools for "confidential" corporate business. Sigh.

My Intranet Sucks

Yes, that's a very frank title. But I'm annoyed to distraction, and I need to get this off my chest. Here's the story of my company's intranet.

I write software for a multi-national telecom company. I'm going to avoid naming the company, as I haven't researched their blog policy yet. We're a company that takes tremendous pride in our technological prowess - we've developed a number of truly innovative products, ancient and modern. We have an entire Class-A block of IP addresses; that's how long we've been involved with IP networking. We work with internet protocols day in and day out. But I'm really starting to wonder if we, as a company, understand the value of the internet, specifically the World Wide Web (when was the last time you saw that spelled out?). A company with this much technical know-how shouldn't have an intranet as terrible as ours.

Six years ago, our company had many, many internal web servers scattered about. While the servers themselves were managed by professional IT staffers, the content of these servers was "managed" by amateurs throughout the company. When I joined the company, I took one look at my team's site and volunteered to clean it up. Our site was typical of the content I saw on our intranet at that time: an unorganized and unmaintained mess of non-validating HTML pages, with 25% of the links dead. It took a month to mop up the mess and build something usable. Our site was tailored for the needs of two dozen people and wasn't particularly well-connected to related sites or corporate sites.

Multiply our messy site by approximately 1,000 and you have an idea of the scope and organization of the company's intranet. There was one shining beacon of hope, however - a half-decent search engine. If the content you were looking for existed, the search engine would almost certainly find it. Our intranet of that day wasn't pretty, but it had content, and you could find that content. In short, it worked.

Quite rightly, our IT department did not like managing several hundred web servers, each running different software. An announcement was made that we would begin a server consolidation program. At the same time, the powers-that-be chose a new Content Management System that would become the default storage system for all static media. They chose Livelink, from OpenText.com, for this task. Livelink has a long list of CMS features, but its primary purpose is to store documents (HTML, Word, Excel, PDF, etc.) in a hierarchical folder layout.

Let's start the rant, shall we?

Repeat after me: The web is NOT a folder of Microsoft Office documents!

The web is a highly interlinked set of HTML documents that are accessible via HTTP and can be viewed in a web browser. Obviously, the web includes content types other than text/html, but the web page is what makes the web interesting and usable. You know how the web works: users click from page to page, using the contextual information on the page to decide what link to click next. Our corporate IT wizards have replaced a real web-like intranet with a web-accessible directory tree of documents, the majority of which aren't even in HTML.

Let's discuss how our new CMS ruined our intranet.

[Caveat: I suspect many of the criticisms I will heap upon the CMS are due to a misuse of Livelink as our intranet replacement. Perhaps Livelink is a perfectly fine product for some other use.]

The Hierarchical Folder Structure

At first glance it doesn't seem so bad to organize your intranet as a hierarchy. For example, one of my documents lives under a directory structure similar to this (names have been changed to protect the innocent):

RegionZ/
  Engineering&Technical/
    Research&Development/
      MajorLineOfBusiness/
        SolutionX/
          ProjectY/
            DocumentZ

That seems logical enough. However, there is a big difference between structuring a small set of content into folders and forcing something as large and diverse as an entire corporation into one. I have several problems with pushing the entire intranet into folders:

  1. The corporate structure changes constantly - and the intranet's folder structure doesn't keep up. This makes it very difficult to navigate up and down the folder tree. One day, you move "up" from your project to discover that the enclosing folder no longer has any relation to your project. In theory, the folder structure could be kept up to date, but this risks breaking everybody's bookmarks.
  2. Frankly, no one cares about the folder structure; no one uses it to find content. Most people are interested in a handful of projects, each project having one or more Livelink folders. They keep bookmarks to these folders. In effect, each real "project" (which is often one level up from the leaf nodes in the tree) is considered a destination in and of itself. The layers above "projects" have little content and are mostly irrelevant. The individual projects should be autonomous "websites" instead of buried in a pointless directory structure.

Finally, and most damningly, projects need far more than a folder to shove documents into. The majority of the content in Livelink exists as files referenced in simple directory listings - just a list of files in a directory. No context or explanation for the files is provided - beyond the file names themselves, of course. Livelink does have the capability to do "index.html"-like things, but no one uses it (perhaps because of the URL problem described below). How different this is from a real site! Ordinarily, you'd see a portal-like page explaining the purpose of this collection of information. The HTML pages would provide clues about what content is available and why. All this is missing from a typical Livelink project folder. Unless of course you can guess that "ProjDoc14.doc" is the starting point.

Terrible URLs

Livelink URLs look like this:

http://our.intranet.com/livelink/livelink.exe?func=ll&objId=9557908&objAction=browse&sort=name

Try keeping that URL in your head! Try to guess what sort of document that link addresses. Try to guess where in the hierarchy of the site that URL belongs. URLs like these violate every guideline for URL design I've ever seen:

  • They are loooooong
  • They are un-guessable and completely opaque
  • They do not follow the site structure
  • They do not allow upward navigation by chopping off the end of the URL

It would be a major improvement to have the URLs follow the directory structure, even though that would still give long URLs. For example:

http://our.intranet.com/livelink/eng_tech/r_d/line_of_business/solnX/projY/docZ.html

But even better would be to provide meaningful, project-oriented URLs. Perhaps something like: http://our.intranet.com/livelink/matts_big_project/.

On the plus side, I believe the Livelink URLs are persistent. The document or folder referenced by a particular object id is always accessible using the URL format above. Then again, I don't know of a way to redirect a URL to the location of a moved document, so maybe they're not that persistent.

No More Websites?

It is tremendously hard to host a traditional "website" in Livelink. This is really a shame. How do we publish information to a wide audience of people in the 21st century? There's only one real answer: we build a website. What's a website anyway? I'd say it has the following characteristics:

  • It is addressable via HTTP at a well known, descriptive URL.
  • It has a default entry point, the "home page", that explains the site's purpose.
  • It is composed primarily of interlinked HTML pages.
  • It usually follows well-understood navigational conventions. Each page lets you know:
    • What site you are in
    • Where you are in the site
    • How to navigate to the main areas of the site

Obviously, a Livelink directory full of files provides none of those features. The CMS lacks a unity of style and purpose, and it has removed all context.

Why have very few people attempted to build a site within Livelink? Perhaps they are deterred by the URL problem. But I'm guessing the biggest reason is that the structure of Livelink, its accepted format as a directory tree of mixed file types, discourages them from even trying. From looking at randomly selected folders, you'd never know that building a site is possible.

No Dynamic Content

Livelink doesn't support user-written dynamic content. No CGI programs, no forms on web pages, not even server-side processing of documents (e.g., Apache's server-side includes). Understandably, the IT department isn't fond of uncontrolled CGI scripts - they can easily introduce security holes. But there are real business needs for dynamic content, and if you don't have hosting apart from Livelink, you're out of luck.
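
For anyone who hasn't used them, server-side includes are just directives that the web server expands before delivering a page. A minimal sketch (the file names and paths are invented for illustration - nothing like this exists on our intranet):

<html>
  <head><title>Project Y Status</title></head>
  <body>
    <!--#include virtual="/projY/header.html" -->
    <p>Page last updated: <!--#echo var="LAST_MODIFIED" --></p>
    <!--#include virtual="/projY/recent-changes.html" -->
  </body>
</html>

A couple of directives like these would give every project a shared header and an automatically dated front page - exactly the sort of small win that's impossible inside Livelink.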

Permissions Problems

Livelink has access control lists enabled for every single folder and document. One has to log in to Livelink to view any document, regardless of its ACL. ACLs are occasionally necessary for business reasons, but I believe most content could be safely shared with the entire corporation. I believe that, by default, a Livelink object should be viewable by all.

Setting the permissions for a folder can also be a tremendous time waster. Recently I put a document on Livelink and sent out the URL for review. Within minutes, I had several people call me complaining they were not authorized to view the document. According to the ACL, the entire corporation could view the document. After some debugging, I discovered that "the entire corporation" doesn't include specific groups of overseas contractors. It took over an hour of trial and error to fix the permissions.

Finally, the ownership model imposed by Livelink is quite restrictive. If you upload a document, only you will be able to modify the permissions on that document. I don't know of a way to enable "group" ownership of a document.

Where's the HTML?

As mentioned several times by this point, Livelink encourages the use of directories packed full of documents. Many of the documents uploaded are in formats not natively understood by the browser: Microsoft Office documents especially. There are obviously good reasons for sharing Excel spreadsheets, but Livelink does nothing to discourage posting Word documents - or any other proprietary textual document. What's the harm, you ask?

  • Posting proprietary documents excludes them from search engines. Livelink has only limited capabilities to index Microsoft documents.
  • Proprietary documents have to be viewed in their proprietary applications. This doesn't cause much trouble for business people who all have Microsoft Office, but many of our engineers have Linux or HP-UX on their desktops.
  • Posting a document that could have been written in HTML incurs the opportunity cost of not helping build a more usable intranet.

There are over a dozen Regional Livelink servers; the root of every Livelink tree starts with a Region name. Searching across servers is not supported! If you don't know where a document is located, you may get to enjoy doing 12+ separate searches. Need I say more?

Returning to an HTML-based intranet would enable the search methodologies that work so well on the internet. Google offers a search appliance for the enterprise that I'd love to try.

Ugly is a Productivity Waste

I have one more criticism. It may sound trivial, but hear me out. Livelink is not a particularly attractive application; it is cluttered and generic looking. So what? An application in which thousands of users each spend thousands of hours must be made attractive. Otherwise each user spends a second or more of their day thinking "Gosh, this is ugly". One second, compounded a million times, is nearly 280 hours - and that adds up to real productivity losses.

If They Made Me CTO...

In my opinion, a more useful intranet would feature content based primarily around people and projects, as opposed to teams and hierarchies.

People

Each employee would have a place on the web, with an obvious URL like http://our.intranet.com/people/mkeller, and resources for:

  • Hosting a blog or a simple homepage.
  • File storage for sharing documents of any sort.
  • A wiki for personal use.
  • A sharable web-based calendar (iCal).

Businesses are built around people, and people want to know who they are working with. I'm imagining a company where each employee keeps a little home page, containing (at the minimum) contact details, a list of their current projects, and a photo (we're a social species and really appreciate knowing faces!).

Some users would love a chance to blog, keep work notes in a wiki or publish a calendar. The IT department benefits by picking standard blogging, wiki and calendaring packages for everyone to use.

Projects

Perhaps I've become infected with David Allen-like thinking, but I see projects everywhere now. Sometimes a project is tackled by a single person, but most often by a team. And what's the hardest part of any project? Keeping everybody in sync. I think everybody agrees that email is no longer getting it done. Groupware to the rescue? I think the following feature list is necessary in any online groupware solution:

  • A message board with comments.
  • A milestone tracking mechanism.
  • A project calendar that team members can subscribe to.
  • To-do lists that support assignment.
  • A wiki for project documents and notes.
  • A mechanism to show "recent changes" - RSS works great.

You can cobble together many of the elements above from individual sources, but the key component, "recent changes", is hard to achieve without an integrated system. The recent changes page, or RSS feed, lets everyone know what's been going on in the project: if there are new conversations on the message board, if wiki pages have been edited, if milestones have been added or moved. I've had good personal experiences with the groupware tool from 37Signals called Basecamp.
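
To make that concrete, a single entry in such a project feed might look roughly like this (the project, page and person are invented for illustration):

<item>
  <title>Wiki: "Deployment Checklist" edited by jsmith</title>
  <link>http://our.intranet.com/project/matts_big_project/wiki/DeploymentChecklist</link>
  <pubDate>Mon, 02 Apr 2007 09:30:00 GMT</pubDate>
  <description>Added rollback steps for the RegionZ servers.</description>
</item>

Subscribe to that in your aggregator and you never have to ask "what changed this week?" again.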

I dream of URLs like http://our.intranet.com/project/matts_big_project. I dream of an intranet where any employee can quickly create a new project site, give it a name, add some colleagues to the project and be off and running.

I Love HTML! Down With HTML!

I'm advocating giving much more control of our intranet content back to the employees. I'm sure some readers will wonder whether we'll just end up with the confused intranet that led to the adoption of a CMS in the first place. Thankfully, very few end users have to write HTML directly these days, and even fewer have to design a site. Instead, we use ready-made tools like wikis and blogs, which let the user enter simple text that gets converted to HTML and published to nicely designed sites.

In my vision of the intranet, the IT team picks standard webapps that any employee can use to create content. The obvious applications are wikis, blogs, online calendars and a groupware package. These tools run on centrally located and managed servers, a real boon to the IT department. I think with just those applications, our employees would be much better served. If the choices of applications are wise, very few users will feel any need to have their own sites. Finally, the IT department realizes tremendous savings in maintenance costs when it only has to support a limited set of webapps rather than a diverse collection of custom-built sites.

---

That's it. I'm out of both criticism and pragmatism. Thanks for listening. I feel better now.

Dreaming in Code

I recently picked up Dreaming in Code which chronicles the Chandler project while investigating the general difficulties of building software on time and under budget. I'll give the book an enthusiastic two thumbs up, with the caveat that the intended audience is the lay public, not those of us who write software daily.

My favorite chapter, Engineers and Artists, opens thusly:

From a panel of experts assembled at a conference on software engineering, here are some comments:

"We undoubtedly produce software by backward technologies."

"Particularly alarming is the seemingly unavoidable fallibility of large software, since a malfunction in an advanced hardware-software system can be a matter of life and death."

"We build systems like the Wright brothers built airplanes -- build the whole thing, push it off a cliff, let it crash, and start over again."

"Production of large software has become a scare item for management. By reputation, it is often an unprofitable morass, costly and unending."

"The problems of scale would not be so frightening if we could at least place limits beforehand on the effort and cost required to complete a software task... There is no theory which enables us to calculate limits on the size, performance, or complexity of software. There is, in many instances, no way event to specify in a logically tight way what the software product is supposed to do or how it is supposed to do it."

...

"Some people opine that any software system that cannot be completed by some four or five people within a year can never be completed."

I nodded along in agreement with each quotation. The author goes on to explain that the conference that produced the words above took place in... wait for it... 1968. The event was organized by NATO to address, in their words, "The Software Crisis".

Depending upon how your coding went today, you'll either be heartened by that, or depressed at the state of the field.

I take back my caveat: the book will entertain and provide solace to professional software people. And it should be required reading for anybody thinking about getting into software.

In case you didn't know it already, Rosenberg reminds us, software is hard. Yes it is, yes it is.

Creative Commons Metadata

This site is licensed under a Creative Commons Attribution License, and I wanted to mark each page to somehow indicate that fact. CreativeCommons.org recommends that we mark our pages using RDF data embedded in our HTML (commented out). However, embedding the RDF in each page has a few downsides:

  1. It increases the size of each page by several hundred bytes.
  2. In XHTML pages, it potentially hides the copyright information from XML parsers.
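
For reference, the commented-out RDF block looks roughly like this - I've trimmed it to the essentials, so grab the full version from the creativecommons.org license chooser rather than copying this sketch:

<!--
<rdf:RDF xmlns="http://web.resource.org/cc/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Work rdf:about="">
    <license rdf:resource="http://creativecommons.org/licenses/by/2.5/" />
  </Work>
</rdf:RDF>
-->
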
As an alternative (which I didn't invent, I'm just describing), I link to the copyright information with metadata attributes to indicate its role. The pages on this site use 2 methods of specifying the copyright. In the <head> section, I included a <link> to the copyright URL with a rel="copyright" attribute. The copyright link type is defined in HTML4. For example:

<link rel="copyright" href="http://creativecommons.org/licenses/by/2.5/" />

The second method can be used within the body of the document. My footer section includes a link to the CC license. I added a rel="license" attribute to that link. The license relation is a common microformat.
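
For example, the footer link looks something like this:

<a rel="license" href="http://creativecommons.org/licenses/by/2.5/">Creative Commons Attribution 2.5 License</a>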

To guarantee that search engines understand the licensing of my site, I ran a few of my URLs through the Creative Commons License Validator tool. Everything checks out!

Finally, I also wanted my RSS feed to be covered by the CC license. There is a Creative Commons RSS Module defined for this purpose: you insert a


<cc:license>http://creativecommons.org/licenses/by/2.5/</cc:license>

line into the <channel> section of your feed. I use XML::RSS (this is a mod_perl/HTML::Mason site) to generate my feed, so I added code like this (the new lines are the add_module call and the cc entry in the channel):

my $rss = new XML::RSS (version => '1.0'); 

# add the creative commons namespace
$rss->add_module(prefix=>'cc', uri=>'http://web.resource.org/cc/');

$rss->channel(
       title        => "littleredbat/mk: blog",
       link         => "http://www.littleredbat.net/mk/blog/",
       description  => "Matt Keller's Blog",
       dc           => { language => 'en-us', },
       cc           => { license => 'http://creativecommons.org/licenses/by/2.5/', },
     );

There you have it - my site has been CreativeCommonsIfied.

Redesign!

It's looking rather fresh around here, isn't it? Perhaps it was the spring weather that brought it on, but I've reworked the XHTML and CSS underlying this site.

Here's what's new:

  • Fresh new colors: blue, green and red! The header graphic is always going to be in black and white to allow maximum flexibility in color choice.
  • New header graphics: black and white images that tickle me.
  • Site menu: Where are you? You're either Home, in the Blog, in the Articles, in the Photos, or at the About page. The site menu will let you know.
  • Variable sidebar: the sidebar's contents now change with the section.
  • Search w/ Google: as much as I like coding, it's hard to beat Google at the text indexing game.
  • A fluid layout: go ahead and resize your browser window or increase the text size! The site should be pretty accommodating.
  • Better source ordering: the "contents" of a page now show up before the sidebar in the HTML source. This is nice for text-based browsers or folks who don't have CSS (a rough sketch of the idea follows this list).
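
Roughly, the idea is this (a simplified sketch, not this site's actual markup) - the content block comes first in the source, and the stylesheet floats or positions the sidebar so graphical browsers still show the two side by side:

<body>
  <div id="header">littleredbat/mk</div>
  <div id="content">
    <!-- the post or article text comes first in the source -->
  </div>
  <div id="sidebar">
    <!-- section-specific links and the search box -->
  </div>
</body>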

The design was partially inspired by Tim Bray's site. I liked his clean look.

Enjoy!

How to fix Subversion errors after upgrading your Berkeley DB library

After a routine "apt-get upgrade" of Debian testing, I found myself unable to use my Subversion repository. I got an error message when trying to commit a file:

svn: Berkeley DB error while opening environment for filesystem db:
DB_VERSION_MISMATCH: Database environment version mismatch
svn: bdb: Program version 4.3 doesn't match environment version

A note from the Subversion FAQ had this to say:

After upgrading to Berkeley DB 4.3, I'm seeing repository errors.

Normally one can simply run svnadmin recover to upgrade a Berkeley DB repository in-place. However, due to a bug in the way this command invokes the db_recover() API, this won't work correctly when upgrading from BDB 4.0/4.1/4.2 to BDB 4.3.

Use this procedure to upgrade your repository in-place to BDB 4.3:

  • Make sure no process is accessing the repository (stop Apache, svnserve, restrict access via file://, svnlook, svnadmin, etc.)
  • Using an older svnadmin binary (that is, linked to an older BerkeleyDB):
    1. Recover the repository: 'svnadmin recover /path/to/repository'
    2. Make a backup of the repository.
    3. Delete all unused log files. You can see them by running 'svnadmin list-unused-dblogs /path/to/repository'
    4. Delete the shared-memory files. These are files in the repository's db/ directory, of the form __db.00*

The repository is now usable by Berkeley DB 4.3.

As the instructions note, you need a copy of Subversion linked with a pre-4.3 version of the Berkeley DB library. Subversion gets at Berkeley DB through the APR utility library (apr-util). So we need to install appropriate versions of Berkeley DB, APR and Subversion.

My notes are below. Note that I installed the APR and Subversion software into a local directory (/home/mk/proj/svn_db/local in my case). Also, my Subversion repository is in /data/svnroot.

# export LD_LIBRARY_PATH=/home/mk/proj/svn_db/local/lib:/usr/local/BerkeleyDB.4.2/lib

# wget 'http://downloads.sleepycat.com/db-4.2.52.tar.gz'
# tar -xvzf db-4.2.52.tar.gz
# cd db-4.2.52
# cd build_unix
# ../dist/configure
# make
# make install

# wget 'http://archive.apache.org/dist/apr/apr-0.9.5.tar.gz'
# tar -xvzf apr-0.9.5.tar.gz
# cd apr-0.9.5
# ./configure --prefix=/home/mk/proj/svn_db/local
# make
# make install

# wget 'http://archive.apache.org/dist/apr/apr-util-0.9.5.tar.gz'
# tar -xvzf apr-util-0.9.5.tar.gz
# cd apr-util-0.9.5
# ./configure --prefix=/home/mk/proj/svn_db/local --with-apr=/home/mk/proj/svn_db/local --with-berkeley-db=/usr/local/BerkeleyDB.4.2/

# make
# make install

# wget 'http://subversion.tigris.org/downloads/subversion-1.2.3.tar.bz2'
# tar -xvjf subversion-1.2.3.tar.bz2
# cd subversion-1.2.3
# ./configure --prefix=/home/mk/proj/svn_db/local --with-apr=/home/mk/proj/svn_db/local --with-berkeley-db=/usr/local/BerkeleyDB.4.2/
# make
# make install

# su
# /home/mk/proj/svn_db/local/bin/svnadmin recover /data/svnroot
# tar -cvf ~/svnroot_backup.tar /data/svnroot

Then I executed steps 3 and 4 from the FAQ. At this point, I was able to commit files to my repository again.

Link to story

Vim: using views automagically

I use Vim Outliner for note-taking. It is an extremely useful tool. However, my files can get long. Vim's folding features are a great way to deal with that: I simply fold up all the irrelevant sections.

What I really wanted was a way to preserve my folds over editing sessions. I found this gem in the vim help documents:

autocmd BufWinLeave *.otl mkview
autocmd BufWinEnter *.otl silent loadview

This automatically executes mkview on leaving a buffer and loadview on entering a buffer -- but only for outline (*.otl) files.

Ah, the joys of an industrial-strength text editor!

strlcpy

Why doesn't GNU libc have strlcpy and strlcat?

Link to story

Tools Roundup

Tools, tools and more tools for the codemonkey! Here's a little summary of some of the tools I've been using lately.

SCM and Web Tools:

  • ViewCVS: We all love doing "cvs log file.C | more" and then "cvs diff -r1.12 -r1.13 file.C", but I admit that it's really nice to be able to browse the history of your project online. ViewCVS works with CVS and Subversion and is used by no less an authority than sourceforge.net.
  • Enscript: I discovered this as it is used by ViewCVS to syntax-highlight and colorize code for online viewing. I tend to use it like "enscript --language=html --highlight --color=1 -t 'Logger.C' -p Logger.html Logger.C".
  • eSVN: a really nice looking GUI frontend for subversion.
  • GraphViz defines a little language for describing graphs (DAGs and the like). There are utilities for converting the descriptions to beautiful images. See the gallery for examples. GraphViz is so pretty I'll be on the lookout for interesting datasets just so I can graph them.

Code Auditing Tools:

  • Flawfinder and Splint are static program checkers that flag uses of "unsafe" functions (like strcpy). I wish the use of tools like this would become common during development.
  • http://www.daemonkitty.net/lurene/papers/Audit.pdf: OpenBSD continuously audits its codebase, fixing bugs. When a new bug, or class of bugs, is found, the entire codebase gets re-audited looking for other instances of the bug. There's wisdom in that! We all know about buffer overflows and format string bugs, but what else have the OpenBSD team been fixing? Certainly you could review the security patches they've issued, or watch the changes to the HEAD of their CVS repo. The paper above is a shortcut: it lists the major flaws found in OpenBSD software.

Lessig Strikes Again

Larry Lessig is revising his phenomenal book Code and Other Laws of Cyberspace -- but he's not doing it alone. Instead, he's posted the book to a wiki where anyone can contribute!

From the site:

Lawrence Lessig first published Code and Other Laws of Cyberspace in 1999. After five years in print and five years of changes in law, technology, and the context in which they reside, Code needs an update. But rather than do this alone, Professor Lessig is using this wiki to open the editing process to all, to draw upon the creativity and knowledge of the community. This is an online, collaborative book update; a first of its kind.

Once the project nears completion, Professor Lessig will take the contents of this wiki and ready it for publication. The resulting book, Code v.2, will be published in late 2005 by Basic Books. All royalties, including the book advance, will be donated to Creative Commons.

Way cool, Larry!

Link to story
