After being involved with recent Alfresco-based enterprise portals, I've begun reflecting on best practices for integrating with the popular open-source CMS. This article describes some lessons learned integrating Alfresco's Document Management (DM) and Web Content Management (WCM) repositories with an enterprise portal. Refer to Jeff Potts's blog post for a high-level comparison between WCM and DM.

We started out by simply using DM to store HTML snippets that were rendered in a simple portlet. As we went further, it became obvious that DM did not address basic web content concepts that most portals rely on, such as structured content authoring and workflow. Our engineer contacts at Alfresco encouraged us to consider their WCM product. To satisfy security requirements we typically use DM for documents; however, we have leveraged WCM for web content because of the following features (in order of appeal):

  • Web Forms: Forms that capture content from authors and store it in XML. The XML can be used to create one or more renditions per content item using XSLT or FreeMarker.
  • Sandboxes: Each content author receives an isolated view of the repository allowing them to make changes without having to worry about disrupting the live content.
  • Workflow: WCM has an out-of-the-box "submit for approval" workflow that suited our needs.
  • Snapshots: The ability to roll back your content to any previous state.

Our clients typically want a unified portal and content authoring experience and do not want the authors to open the Alfresco Web Client (UI). We have successfully integrated both Alfresco's DM and WCM capabilities into the portal user experience. Below are my high-level observations for using WCM instead of DM for web content in a portal platform.

Authoring content

Web Forms are a very powerful feature of WCM. Unfortunately, there is no elegant way to bring this existing authoring functionality into a portal. The XForms technology that Alfresco uses for authoring forms (frameworks such as Chiba and Orbeon) proved difficult to incorporate in a portal because those frameworks are developed primarily for servlet-based applications. We found a way to load a frameless (no navigation) authoring form from the Web Client and display it in an IFRAME, but 1) it did not mesh with the portal's theme and 2) the user could still be directed to the full Alfresco UI with navigation. Even if we had managed to integrate Web Forms natively, the content editor does not support linking to documents residing in DM or to portal pages. We needed this capability in both WYSIWYG and plain string fields. We therefore built our own Web Form functionality in the portal. We used custom URL patterns to denote links to portal pages and documents, and provided authors with a linking interface in the content editor. One downside is that authors must use the portal to create these links, unless they know the custom link syntax and type it in manually.
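
To make the custom link idea concrete, here is a minimal sketch in Python (chosen for brevity, not the language of the portal). The `portal-page:` and `dm-doc:` prefixes, the base URLs, and the function name are all invented for this illustration; they are not the actual link syntax used in the project.

```python
import re
from urllib.parse import quote

# Hypothetical link syntax, invented for this sketch:
#   portal-page:<page-id>   -> a portal page
#   dm-doc:<noderef>        -> a document stored in DM
LINK_PATTERN = re.compile(r'(?P<scheme>portal-page|dm-doc):(?P<target>[^"\s<>]+)')


def resolve_links(html, page_base="/portal/page/", doc_base="/portal/document?ref="):
    """Replace custom link tokens in authored content with real URLs."""

    def to_url(match):
        # Escape the target in both cases; noderefs contain ':' and '/'.
        target = quote(match.group("target"), safe="")
        if match.group("scheme") == "portal-page":
            return page_base + target
        return doc_base + target

    return LINK_PATTERN.sub(to_url, html)


authored = (
    '<a href="portal-page:benefits">Benefits</a> '
    '<a href="dm-doc:workspace://SpacesStore/1234">Policy</a>'
)
resolved = resolve_links(authored)
```

Because the tokens are resolved at render time, the stored content stays independent of the portal's URL scheme, and the same function works on WYSIWYG HTML and plain string fields alike.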

Rendering content

Another important concept with Web Forms is that each web content item has an XML file in which the authored content is stored. When this content is saved, one or more renditions (HTML files, etc.) are generated. The pre-generated renditions are advantageous because the content transformation is taken care of automatically. On the downside, dealing with several files can complicate your portlet logic; for example, both the XML file and its renditions must be submitted to workflow. Even when using the XMLMetadataExtractor, you still need to make separate queries for metadata and rendered content, which can hurt performance. For example, you might want to show the content title as the portlet title and the rendered HTML as the body. Normally, this requires two service calls per portlet; however, we placed select metadata in custom HTTP response headers to minimize round trips. Because content changes infrequently, we also leveraged caching and ETags to improve performance.
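
The combination of metadata headers and ETags can be sketched roughly as follows. This is a simplified Python illustration, not the portlet code itself: the `X-Content-Title` header name, the `ContentCache` class, and the injected `fetch` callable are assumptions made for the example. Only the `ETag` / `If-None-Match` / `304 Not Modified` exchange is standard HTTP.

```python
from dataclasses import dataclass


@dataclass
class CachedContent:
    etag: str
    title: str  # metadata carried in a (hypothetical) custom response header
    body: str   # the rendered HTML rendition


class ContentCache:
    """Caches rendered content and revalidates it with conditional requests."""

    def __init__(self, fetch):
        # fetch(url, request_headers) -> (status, response_headers, body).
        # Injected so the sketch stays independent of any HTTP library.
        self._fetch = fetch
        self._entries = {}

    def get(self, url):
        cached = self._entries.get(url)
        request_headers = {"If-None-Match": cached.etag} if cached else {}
        status, headers, body = self._fetch(url, request_headers)
        if status == 304 and cached:
            # Content unchanged: reuse the cached title and body.
            return cached
        entry = CachedContent(
            etag=headers.get("ETag", ""),
            title=headers.get("X-Content-Title", ""),
            body=body,
        )
        if entry.etag:
            self._entries[url] = entry
        return entry
```

A single response then supplies both the portlet title (from the header) and the portlet body, and an unchanged item costs only a small 304 response on later requests.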

Linking to content

When storing pointers to content, we used its noderef, which acts as the ID of that content. This approach was intended to prevent broken links if content items moved to a different folder. One major difference between WCM and DM is that DM generates a unique ID, whereas WCM uses the path. This means that if WCM content items are moved, they receive new noderefs; the old noderefs become invalid, and any portlet that loads this content by noderef will break. DM noderefs are not vulnerable to relocation, but they still cause problems when the content is deleted. Noderefs also contain characters that must be escaped when passed in a query string. In general, WCM noderefs require more careful handling than DM noderefs.
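
The following Python sketch illustrates both points: why a path-based noderef changes when content moves, and how a noderef can be escaped for a query string. The noderef layouts shown (`workspace://SpacesStore/<id>` for DM and `avm://<store>/<version>;<path segments>` for WCM) are given as illustrative assumptions, and the helper names, store name, and IDs are made up for the example.

```python
from urllib.parse import quote, unquote


def avm_style_noderef(store, path, version=-1):
    """Build a path-based noderef (assumed layout, for illustration only)."""
    return f"avm://{store}/{version};" + path.strip("/").replace("/", ";")


def noderef_to_query_value(noderef):
    """Percent-encode every reserved character, including ':', '/' and ';'."""
    return quote(noderef, safe="")


# A DM-style noderef is an opaque ID, so it does not depend on the folder.
dm_ref = "workspace://SpacesStore/4f2a"

# A WCM-style noderef embeds the path, so moving the file yields a new value.
before_move = avm_style_noderef("site", "/www/avm_webapps/ROOT/home.xml")
after_move = avm_style_noderef("site", "/www/avm_webapps/ROOT/news/home.xml")

escaped_dm = noderef_to_query_value(dm_ref)
escaped_wcm = noderef_to_query_value(before_move)
```

Any portlet preference that stored `before_move` would stop resolving after the move, which is the breakage described above.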

Publishing content

Initially we thought that WCM sandboxes would be beneficial from a multi-user perspective because authors would be able to edit content in an isolated area. We had assumed that sandboxes could handle multiple authors editing the same content items and resolve conflicts, much like Subversion. Instead, we found that sandboxes behave similarly to the DM repository: while content is being drafted it becomes locked, preventing other authors from editing it. Sandboxes therefore still follow a "check-out, cancel check-out, check-in" model. On the other hand, sandboxes made it easy to show the portal in "published" and "draft" states. Authors can even grant other users access to view and comment on their draft outside of the rigid approval process. This was accomplished by parameterizing our Web Scripts to handle sandboxes. While I think the same can be accomplished in DM, it was seamless in WCM.
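
One way to picture the sandbox parameter is the Python sketch below, which builds the request URL for either the published or the draft view. The Web Script URL, the `store` and `path` parameter names, and the `<project>--<author>` naming of author sandboxes are assumptions for illustration; check the store names in your own installation before relying on them.

```python
from urllib.parse import urlencode


def sandbox_store(web_project, author=None):
    """Pick the store to read from.

    No author means the staging ("published") store; otherwise the author's
    sandbox ("draft"). The '--' naming scheme is an assumption of this sketch.
    """
    return web_project if author is None else f"{web_project}--{author}"


def content_request_url(base_url, web_project, path, author=None):
    """Build a request URL for a hypothetical content Web Script."""
    query = urlencode({"store": sandbox_store(web_project, author), "path": path})
    return f"{base_url}?{query}"


published_url = content_request_url(
    "/alfresco/service/sample/content", "intranet", "/ROOT/home.xml"
)
draft_url = content_request_url(
    "/alfresco/service/sample/content", "intranet", "/ROOT/home.xml", author="jdoe"
)
```

Since the sandbox is just another request parameter, the same portlet can render either state, and letting a reviewer preview a draft amounts to passing the author's sandbox instead of staging.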

Reverting content

One consideration about sandbox snapshots is that a user working in the Web Client typically submits several changes at once. Depending on your portal implementation, submissions may contain only portlet-specific changes: approximately two files (XML and renditions). This means that smaller, more frequent snapshots are created, which can result in more clutter in Alfresco's snapshot view. Should the need to revert content arise, it can be difficult to pinpoint the desired snapshot. Snapshots are still beneficial to have, and equivalent functionality is not available in DM without custom development.

General observations

We found that many of the functions that are part of the WCM repository do not work from the Web Script/JavaScript API. For example, calling "save" on an AVM (WCM) node does not trigger rendition generation or XML metadata extraction. Also, the current submission APIs bypass the workflow engine and submit the modified items directly into the staging sandbox without approver intervention. To get around these obstacles we leveraged the internal Alfresco Web Client code. Read Ron DiFrango's article on the JSF trick to hook into the Web Client code. Lastly, as of version 3.1 SP1, commonly used Web Scripts that return child nodes and metadata were about 66% faster in DM than in WCM. When using WCM you must be very conscious of performance.

Conclusion

So which repository type is best for you? Between WCM and DM, there is no one size that fits all. Generally, I would recommend using WCM when:

  • Your authors are comfortable using the Alfresco Web Client and Web Forms
  • Minimal time should be spent creating a robust presentation tier
  • You need the ability to roll back change sets
  • You only need to link to WCM content and basic URLs
  • You need a staging environment where content is reviewed before it is published
  • Simple "submit for approval" workflow is sufficient

The diagram below shows a possible physical architecture for a WCM solution.

I would consider using DM when:

  • Authoring and workflow functionality must be exposed via the portal
  • Developers have experience with XML/XSLT, WYSIWYG editors, and jBPM
  • You can spend time up front designing a robust presentation tier on top of Alfresco
  • A custom solution is desired and the overall solution should be kept simple
  • You need flexibility in order to address future requirements

In conclusion, there is no content management system that cleanly plugs into an enterprise portal. Alfresco provides several approaches for portal integration, but in any case you will need to provide at least some glue code. As far as choosing between WCM and DM, I would primarily base the decision on whether or not you want the authoring and publishing experience to be integrated with your presentation tier. As Alfresco continues to evolve its products, it will be interesting to see how it makes its systems a better constituent of the portal.