Sometimes it feels like the most neglected aspect of implementing a content management system is . . . the content. Content development and migration pose a variety of challenges, and we’ve written a series of articles discussing these issues. This critical look at preparing content for the web will help you create a plan for implementing a CMS on time and on budget.
This first article defines content types that need to be accommodated by your CMS and analyzes common problems we’ve faced when it comes to corralling content. The second article in the series unfolds the roadmap to a successful CMS implementation: the Content Inventory. Our final piece looks at content migration and the triage process often needed to migrate content in a timely manner.
What Is Content?
Content development and migration comprise a major phase of our development process. While some might argue that Information Architecture is the cornerstone of a successful CMS implementation, you cannot begin to develop a site’s IA until you’ve identified all of your content. For the purpose of this series, we’re defining content as:
- Articles: These are the text-based documents that will constitute the bulk of content in your CMS. As you analyze your content, you’ll find a number of discrete article types (news releases, agenda items, job listings, etc.) which may require special treatment. Many CMSs support a variety of content classes, and it’s important to identify article types early in the process in order to develop a workable strategy for dealing with them.
- Forms: This type of content usually involves the collection of information from your website’s users. Forms trigger an action that usually involves some sort of workflow. If you plan on converting paper forms to a web-based process, you will almost certainly need to do some level of business process analysis. When dealing with forms during the content development phase, it’s not uncommon to discover that many forms (or form fields) are out-of-date or no longer relevant. There’s nothing worse than wasting time developing a web-based process for a paper-based system that serves no purpose. In general, you should do your best to curb your client’s urge to automate bureaucracy.
You’ll notice we didn’t include external links in our definition of content types. For the purposes of populating a content management system, links are generally treated in the same manner as any other article.
Common Content Formats
Though businesses embraced PCs in earnest in the late 1980s, most organizations have a surprisingly difficult time producing usable digital content. As you begin inventorying your content, you’ll discover that it exists in many formats (or, in all too many cases, it doesn’t exist at all). Each format has special challenges to be considered as you migrate it to the web.
Hard Copy — Far more often than you’d imagine, the only available format for certain information is in hard copy. If you’re lucky, there will be a pristine master copy used to make duplicates as necessary. Chances are, the master copy will actually be a hundred generations old, last updated on a manual typewriter, mimeographed, and unreadable.
This type of physical content must be manually gathered (we use an expandable plastic interoffice-style envelope for this; we also use Post-Its to make additional notes about the document when appropriate). Surprisingly, locating actual copies of important physical documents can be a challenge. Your client’s staff will dig through desk drawers, scour filing cabinets, and, in desperation, pull brochures off racks and say, “Use this.” By “use”, they mean “use your magical web conversion machine to turn this into an attractive web page”. What this really means is that you’ll need to rekey or OCR the document to obtain usable text. How you should make this determination is covered in Part Three of this series.
We’ve also learned that documents existing only in hard copy are often outdated, if not downright obsolete. We strongly recommend reviewing hard copy documents with your client to determine relevance and accuracy before you spend three days scanning mountains of paper. We’ve discovered instances where businesses hand their customers documents signed by employees who left the organization a decade ago. The web project is often the first time many standard business documents are reviewed in depth.
Electronic Files — Electronic files are created by programs such as Word, Excel, PowerPoint, Adobe Acrobat, PageMaker, or one of a zillion other applications. It is not unheard of for a client to produce a file requiring an application that hasn’t been commercially available for over a decade (yes, some people still have WordStar 2000 files). That you do not have, nor can you obtain, a copy of the application will come as a surprise to your client. Regardless of what format these electronic documents exist in, you will have to provide guidance in formatting them for migration to the web.
Though we create guidelines and instructions for collecting and providing content, we’ve found that clients usually organize the content they give us in ways that make the most sense to them. Most people believe their method for aggregating information is logical and helpful. Because your content inventory will likely identify content by owner, we suggest instructing the client to put content into folders named after content owners before turning the content over to you (this is usually done by file transfer or by providing CDs/DVDs full of documents).
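As a rough illustration of why owner-named folders help, here is a minimal Python sketch that walks a delivered directory and lists the files found under each top-level folder, so the delivery can be checked against the inventory owner by owner. It assumes each top-level folder name is a content owner; the function name `group_files_by_owner` and the "(unassigned)" label are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from pathlib import Path


def group_files_by_owner(delivery_root):
    """Map each top-level folder (assumed to be a content owner) to its files.

    Paths in the result are relative to delivery_root and use forward slashes.
    """
    root = Path(delivery_root)
    files_by_owner = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(root)
        # A file sitting loose at the top level has no owner folder,
        # so it is collected under a placeholder label for follow-up.
        if len(relative.parts) > 1:
            owner = relative.parts[0]
        else:
            owner = "(unassigned)"
        files_by_owner[owner].append(relative.as_posix())
    return dict(files_by_owner)
```

Anything that lands under the placeholder label is a prompt to go back to the client and ask who owns it.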
But wait . . . once you have the file, how do you identify it? Sure, your client can name each file in accordance with the file name indicated on the content inventory, but what fun is that? Frankly, with some clients, actually receiving content is a challenge. Expecting usable, logical file names is too much to hope for, and this is where good titles on your inventory come in handy. Remember: your client has been using naming conventions that are relevant to their business, and, especially when it comes to large projects, renaming the files is not practical.
Though it should go without saying, the electronic files you receive are not likely to be well-formatted. People have been improperly using word processors for nearly 20 years, and it’s not likely they will suddenly start using proper document styles anytime soon. This poses a variety of new challenges, including the fact that content cannot be copied and pasted from Word without significant clean-up. We address this challenge in other articles (see links below) and in Part Three of this series.
Our caveat about relevance and accuracy applies equally to electronic files.
Ghost Content — It is inevitable that your client will identify content that should exist but doesn’t. Or they know it exists somewhere, but they just can’t find it. Or maybe they just haven’t gotten around to actually creating it yet. In these cases, someone will have to actually write the content. If you have writing skills, you can include content creation in your budget. If you don’t, keep your fingers crossed and hope that the ghost content isn’t destined for the home page.
Most organizations are excited about the prospect of a new website and eager to help you help them. However, most organizations don’t have a good grasp of the work involved in creating that website, especially when it comes to implementing a CMS. Your job is to keep the project on track. Understanding the core issues of content development and migration will allow you to keep things moving.
Next Up: The Content Inventory: Roadmap to a Successful CMS Implementation