To read the previous entry in this blog series, click here.

If, a week before moving out of my old house, someone had asked me to list from memory every item that would be loaded onto the truck, and then asked me to reproduce that list a week after we moved in, my guess is that the two lists would have differed from each other by 50%, and from reality by even more. Fortunately, my moving company painstakingly compiled page upon page of inventory, categorized by box where necessary, and assigned each item a unique identifier. The move-in team then used this same inventory to validate that all of our belongings had made it to the other end. Without such an inventory, I would have had to decide whether to sign off on delivery having only eyeballed an overwhelming volume and variety of stuff, nearly all of it still in boxes.

In a content migration initiative, the typical source system presents an even greater challenge with respect to the quantity and nature of content that it contains. How many pages does the current site have? How many unique content items are displayed on those pages? How many documents does the site house or link to? How many cross-links will need to be updated if we change platforms? A roundabout trip through these questions will lead quickly to exasperation; an attempt to plan a content migration based on off-the-cuff assumptions will lead just as quickly to something worse. Order must be brought to the chaos via a methodically developed content inventory.

By the time the content inventory is actually built, high-level planning should already have produced some understanding of the flavors of content that are in scope. The most critical distinction among these flavors is web content vs. document content. In some cases, a content migration effort encompasses extensive reuse of existing web content (whether in HTML form or in a form that will be published via a content management system) as well as documents accessed by way of the source system. Sometimes the task is simpler and involves web content or document content, but not both. Either way, a comprehensive list of all content items in the source system must be compiled so that each item can be individually marked as in or out of scope for migration, and so that a holistic metadata migration strategy can be developed. To that end, each content item's file name should be accompanied by all metadata that serves either of two purposes: helping the migration team identify the content, or helping end users find and recognize the content once it lives in the target system. Consequently, a content inventory typically ends up as a table with many columns and even more rows; the attached image represents a sample Content Inventory table.

So where does a content inventory come from? The answer: some combination of source system capabilities, external tool capabilities, and elbow grease. Let's take a look at each of these at a high level:

  1. Source System Capabilities: Many content repositories offer the ability to export or otherwise produce an inventory… or at least the starting point for an inventory. Explore whether or not the source system can generate a list of content items, ideally with columns for relevant metadata (both intrinsic metadata such as file size and business-oriented metadata such as "Brand" or "Client"). Even if a crude list in a CSV is the best your source system can produce, automated generation of that list will greatly expedite the inventory process. The inventory can then be manually expanded and evolved to contain other relevant details of the content that the source system wasn't able to associate with the list of content items.
  2. External Tool Capabilities: Content inventories can sometimes be fast-tracked with an external tool that can crawl a content repository and produce an inventory (or the beginning of one). For example, Metalogix offers products that can crawl a SharePoint portal, a file share, or other source systems to identify all discrete content items and extract relevant metadata. Open source tools can also be used to jump-start a content inventory.
  3. Elbow Grease: Manual examination, sometimes aided by custom scripts (a developer's version of "manual"), is a final and often essential way to build a content inventory. Painful as it may be, page-by-page or folder-by-folder scrutiny of a legacy system is sometimes necessary. Manual examination also yields a significant side benefit: the people who perform it become experts on the content in question, and the more content experts a project has, the shorter its path to quality.
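The custom-script route can be quite small. The following is a minimal sketch in Python, assuming the source is a file share reachable as a local or mounted path; the function name `build_inventory`, the column names, and the `ITEM-00001` identifier format are all hypothetical. It walks the share, records a few intrinsic metadata fields per file, assigns each item a unique ID (much like the movers' labels), and leaves blank `in_scope` and `notes` columns for the manual pass. Business-oriented metadata such as "Brand" or "Client" would still have to come from the source system or from manual review.

```python
import csv
import os
from datetime import datetime, timezone
from pathlib import Path


def build_inventory(root, out_csv):
    """Walk `root` and write one CSV row per file; return the number of rows.

    Hypothetical sketch: column names and the ITEM-nnnnn ID format are
    illustrative assumptions, not a standard.
    """
    root = Path(root)
    fieldnames = ["item_id", "path", "file_name", "extension",
                  "size_bytes", "modified_utc", "in_scope", "notes"]
    count = 0
    with open(out_csv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # sort in place so os.walk descends in a stable order
            for name in sorted(filenames):
                path = Path(dirpath) / name
                stat = path.stat()
                count += 1
                modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
                writer.writerow({
                    "item_id": f"ITEM-{count:05d}",  # unique ID per content item
                    "path": path.relative_to(root).as_posix(),
                    "file_name": name,
                    "extension": path.suffix.lower(),
                    "size_bytes": stat.st_size,
                    "modified_utc": modified.isoformat(timespec="seconds"),
                    "in_scope": "",  # left blank for the manual scoping pass
                    "notes": "",     # left blank for reviewer comments
                })
    return count
```

Write the CSV outside the crawled folder, otherwise the inventory may list itself. Because IDs follow sorted traversal order, they stay stable between reruns only while nothing is added or removed; a project that needs permanent IDs would store them rather than regenerate them.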

It's worth noting that some level of manual examination of content will almost certainly be required to validate the success of any automated process executed in the development of a content inventory. Limitations of automated approaches are not always apparent up front, and "nested" content can complicate the effort to accurately develop an inventory via a crawl or export. On the other hand, the labor hours that can be saved with an automated approach make such a strategy worthy of significant consideration, even if some manual examination is necessary to ensure quality.
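One way to make that manual validation more targeted is to reconcile an automated inventory against a second, independent listing, such as a manual folder-by-folder tally or an export from a different tool. The Python sketch below is a hypothetical helper, not part of any product; it assumes both files are CSVs sharing a `path` column and that paths compare case-insensitively (as on a Windows file share). Items that appear in only one list are the ones worth examining by hand, since they often point to nested or otherwise hidden content that one of the processes handled differently.

```python
import csv


def load_keys(csv_path, key="path"):
    """Return the set of normalized (trimmed, lowercased) values in column `key`."""
    with open(csv_path, newline="", encoding="utf-8") as fh:
        return {row[key].strip().lower()
                for row in csv.DictReader(fh) if row.get(key)}


def reconcile(automated_csv, reference_csv, key="path"):
    """Compare an automated inventory against a reference listing.

    Hypothetical sketch: assumes case-insensitive paths in a shared column.
    """
    auto = load_keys(automated_csv, key)
    ref = load_keys(reference_csv, key)
    return {
        "missing_from_automated": sorted(ref - auto),  # the crawl/export missed these
        "only_in_automated": sorted(auto - ref),       # the reference list missed these
        "matched": len(auto & ref),
    }
```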

The right approach to developing a content inventory depends on a variety of project variables. However it is developed, the completed content inventory should immediately feed the next step in the migration process: content cleanup, also known as rationalization. My next post will address that important topic.