I recently completed a long-term effort that involved moving a client from a legacy publishing intranet portal to SharePoint 2010. The legacy platform was a highly customized, nearly unsupportable version of Vignette that had very little documentation, and subject matter knowledge was limited to a small group of content authors. As with any re-platforming effort, migration was a tricky component of the project, one that required selecting the proper automated tool and conducting a great deal of content analysis. In our case, this was especially important because OpenText had acquired the legacy platform in 2009!

This post is the first of a four-part series that delves into each phase of our migration effort.

Content Analysis

Content migration is an arduous process that is often overlooked when upgrading a CMS. Cutting corners, improper planning, blind in-place upgrades (e.g., a folder here becomes a folder there), or a lack of dedicated resources can derail the new system before end users ever adopt it. A thorough content inventory is the proper starting point for a migration strategy: content hosts are "spidered" and the output is cataloged. Alternatively, screen-scraping mechanisms can crawl the front-end presentation without touching the content database sources. It is important to note that not all content should be considered in play for migration. A proper cleansing process should be run against the inventory to prevent duplicative content, identify stale content, and expose low-fidelity metadata constructs. Finally, each content source should be understood from a technical perspective so that source-specific migration strategies can be crafted.
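
To make the screen-scraping approach concrete, here is a minimal PowerShell sketch that pulls just the body region out of a single rendered page. The URL and the content-body marker are hypothetical stand-ins for whatever your legacy templates actually emit:

    # Fetch one rendered page from the (hypothetical) legacy portal.
    $client = New-Object System.Net.WebClient
    $html = $client.DownloadString("http://legacy-portal/news/article-123.html")

    # Extract only the content region between assumed template markers,
    # leaving the surrounding navigation and chrome behind.
    if ($html -match '(?s)<div id="content-body">(.*?)</div>') {
        $body = $matches[1]
        Write-Output $body
    }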

Content Inventory

We cataloged all content used or surfaced by the source CMS. This included content databases, file shares, and custom databases external to the environment. We captured this inventory through the use of "spiders," exporting the results to a spreadsheet with as many metadata attributes as possible. This aided the sorting and filtering of large result sets in later phases.
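
For readers who want a feel for the mechanics, the following is a simplified PowerShell sketch of the kind of spider involved. The start URL is hypothetical, and a production crawl would add politeness delays, authentication, and richer metadata capture:

    # Breadth-first crawl of the legacy host, cataloging URL, title, and Last-Modified.
    $queue   = New-Object System.Collections.Queue
    $seen    = @{}
    $results = @()
    $queue.Enqueue("http://legacy-portal/")

    while ($queue.Count -gt 0 -and $seen.Count -lt 500) {   # cap keeps the sketch bounded
        $url = $queue.Dequeue()
        if ($seen.ContainsKey($url)) { continue }
        $seen[$url] = $true

        try {
            $response = [System.Net.WebRequest]::Create($url).GetResponse()
            $reader   = New-Object System.IO.StreamReader($response.GetResponseStream())
            $html     = $reader.ReadToEnd()
            $reader.Close()

            # Record whatever metadata the page exposes.
            $title = if ($html -match '<title>(.*?)</title>') { $matches[1] } else { "" }
            $results += New-Object PSObject -Property @{
                Url          = $url
                Title        = $title
                LastModified = $response.Headers["Last-Modified"]
            }
            $response.Close()

            # Queue same-host links for later visits.
            foreach ($m in [regex]::Matches($html, 'href="(http://legacy-portal/[^"]+)"')) {
                $queue.Enqueue($m.Groups[1].Value)
            }
        } catch { Write-Warning "Failed to fetch $url" }
    }

    # The spreadsheet that feeds every later phase.
    $results | Export-Csv -Path inventory.csv -NoTypeInformation

The resulting inventory.csv plays the role of the spreadsheet described above and becomes the input to the audit step that follows.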

Audit Content Inventory, Cleanse, Repeat

From the content inventory, a group of power users highlighted stale, duplicative, or orphaned content. We ran multiple cleansing passes over the content prior to migration to avoid drastic effects on the information architecture, which in turn improved the user experience and discoverability.

Time and focus were dedicated to identifying and de-scoping all content that did not meet well-defined criteria for inclusion in the migration to the target system. This cleanup step reduced unnecessary effort during tagging, migration, and validation. Furthermore, "de-cluttering" content before it reached the target system ultimately benefited the new CMS by reducing content storage requirements, and it helped us define optimized retention/archiving policies in the target system based on an improved business understanding of real needs.

To realize these benefits, our audit focused on the following parameters (the sketch after this list shows how the date-based criteria can be applied):

  • Legal/regulatory requirements
  • Strategic relevance
  • When the content was created
  • When the content was last modified
  • When the content was last viewed/downloaded
  • The frequency with which the content has been viewed/downloaded
  • The format of the content and that format's interoperability with the target system
  • Metadata, location, corpus volume, and groupings of the legacy system data
  • Relationships among content in the source system and how those components align with the target system
  • Information architecture modeling of the target system structures that will house and augment the source system content
  • Taxonomy structure, managed metadata, workflows, and retention policies (Enterprise Content Management)
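
As a worked example of the date-driven criteria above, here is a sketch that flags stale items and likely duplicates in the inventory CSV. The three-year staleness threshold is an illustrative assumption, not a universal rule, and the real cutoff should come from the business criteria your audit defines:

    # Load the inventory produced during the crawl.
    $inventory = Import-Csv inventory.csv
    $cutoff    = (Get-Date).AddYears(-3)   # assumed staleness threshold

    # Stale candidates: not modified within the threshold window.
    $inventory |
        Where-Object { $_.LastModified -and [DateTime]$_.LastModified -lt $cutoff } |
        Export-Csv stale-candidates.csv -NoTypeInformation

    # Duplicate candidates: multiple items sharing one title, queued for power-user review.
    $inventory | Group-Object Title |
        Where-Object { $_.Count -gt 1 } |
        ForEach-Object { $_.Group } |
        Export-Csv duplicate-candidates.csv -NoTypeInformation

Note that these outputs are candidates only; the power users made the final keep/drop call on each item.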

With the in-scope content and corpus volume established, we turned to the important decision of selecting a migration tool.

Choosing the Correct Tool

The migration tool analysis and selection process included our client at every turn. It is important that your client be on board early and often to discuss the pros and cons of each possible tool. When we selected a tool, these factors played a critical role:

  1. Cost and pricing models (subscription vs. annual licensing, etc.)
  2. Impact on the migration plan
  3. Flexibility of the tool across the different types of content discovered in the Analysis phase
  4. The level of support offered by the tool's vendor

We initially leaned on our past experience when selecting a migration tool. However, it quickly became apparent that for this particular effort, our old stand-by copy-and-paste tools would not suffice. We needed something more robust than the standard two-paned, source-and-target drag-and-drop tools typically used in SharePoint upgrades. After comparing a bevy of tools, Kapow Katalyst was the clear winner. This tool provides the true Extract, Transform, and Load (ETL) process necessary to run database queries and transactions for additional metadata extraction, apply screen-scraping techniques to pull sections of content from content types, and automate the migration from an IDE canvas. It also offered pay-as-you-go licensing: if we migrated for three months and then took two off for other development initiatives, we paid only for the months in use, which meant cost savings for our client.
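
Kapow's transformations are built visually inside its IDE, so I can't reproduce them here, but the "extract" step it performs against a content database looks conceptually like the following PowerShell sketch. The server, database, table, and column names are all hypothetical placeholders, not Vignette's actual schema:

    # Extract supplemental metadata from the (hypothetical) legacy content database.
    $connStr = "Server=LEGACYDB;Database=ContentStore;Integrated Security=True"
    $conn = New-Object System.Data.SqlClient.SqlConnection($connStr)
    $conn.Open()

    $cmd = $conn.CreateCommand()
    $cmd.CommandText = "SELECT ContentId, Author, PublishDate, Channel FROM ContentItems"
    $reader = $cmd.ExecuteReader()

    $metadata = @()
    while ($reader.Read()) {
        $metadata += New-Object PSObject -Property @{
            ContentId   = $reader["ContentId"]
            Author      = $reader["Author"]
            PublishDate = $reader["PublishDate"]
            Channel     = $reader["Channel"]
        }
    }
    $reader.Close()
    $conn.Close()

    # Join this onto the crawled inventory before the transform and load phases.
    $metadata | Export-Csv db-metadata.csv -NoTypeInformation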

Of course, it did not hurt that the tool's name, when brought up during client discussions, would elicit the occasional "Kapow!" "Boom!" "Bam!" Batman response! Despite the cool name, the proof would be in the execution, particularly the extraction component. But before we can get into content extraction, we need to first discuss staging the data.

Finishing Thoughts & Next Steps

We have almost completed the necessary stages of content analysis. Upon completion of the content inventory, we began working on the next three stages of analysis. There are more? Yes, and they are equally important. The first was defining the new metadata structure. We looked into what SharePoint 2010 provided in terms of better information architecture and discoverability; our intent was to turn information roadblocks into information assets. During this process, we defined the target template structure. In SharePoint's case, that meant content types, term sets, page layouts, site columns, and inheritance, and the first semblance of a taxonomy began to take shape. Finally, we defined the high-level migration plan. This served as a living document that we later refined into a micro-detailed, minute-by-minute plan for production execution, timed to coincide with our content freeze.
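
To make the target-structure work concrete, here is a minimal SharePoint 2010 PowerShell sketch that provisions a site column and a content type. The site URL, field name, and content type name are hypothetical, and term sets would be provisioned similarly through the Managed Metadata (taxonomy) API:

    # Run from the SharePoint 2010 Management Shell, or load the snap-in first.
    Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

    $web = Get-SPWeb "http://intranet/sites/publishing"   # hypothetical target site

    # A site column to carry a legacy metadata value forward.
    $fieldName = $web.Fields.Add("LegacyChannel", [Microsoft.SharePoint.SPFieldType]::Text, $false)

    # A content type inheriting from the built-in publishing Page content type.
    $parent = $web.AvailableContentTypes["Page"]
    $ct = New-Object Microsoft.SharePoint.SPContentType($parent, $web.ContentTypes, "Migrated Article")
    $ct = $web.ContentTypes.Add($ct)

    # Bind the new site column to the content type.
    $link = New-Object Microsoft.SharePoint.SPFieldLink($web.Fields[$fieldName])
    $ct.FieldLinks.Add($link)
    $ct.Update()

    $web.Dispose()

Defining these structures in script rather than by hand made them repeatable across environments as the taxonomy evolved.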

So what are the next steps in this series? We start getting our hands dirty with PowerShell, SQL Server 2008 R2, and even our good friend Excel! Click here for Part II of this series.