The sad truth about many CMS implementations is that not nearly enough time is spent working with content. All too often, organizations get wrapped up in issues related to technology and design, forgetting what the system is supposed to be managing.
It’s easy to avoid focusing on content when you have no idea what it is or where it is. CMS vendors focus on the user-friendly aspects of their software; rarely do they address the complexities of content migration. Since the system you’re implementing is supposed to make content management a breeze, there’s a temptation to avoid thinking about content until the very last minute. This flawed assumption has doomed all too many CMS projects.
Our first article defined content in all of its many forms. This article focuses on the nuts and bolts of identifying content and corralling it in such a way that you have what you need when it comes time to populate your CMS. The key to achieving this goal is a process called the Content Inventory.
What Is Included In A Content Inventory?
Content Inventories almost always take longer than anticipated. We try to start this phase of the project as early as possible, knowing we can iterate the inventory as new information about the project is revealed. Your inventory serves as the starting point for developing your IA and as the roadmap for ensuring all content is migrated to the new site. While many documents are created in the course of a web development project, chances are your content inventory will become your primary tool and reference point.
This inventory includes:
- Content on the current website. Every item. Without exception.
- Content to be migrated to the new website. Every item. It’s easier to scale back than to add on.
- Web-based applications or transactional systems to be integrated with the new website.
Web-based applications or third-party systems are not necessarily content, but should be noted on the inventory for informational purposes. These applications often require placeholder content or links to external systems. If these systems are not included on the roadmap, they might be overlooked in the final rush to get everything done before the specified launch date.
For each item on your content inventory, you need to capture (at a bare minimum) the following information:
- Description: The title of the content item.
- Content Owner: This is generally the department or individual responsible for the content item.
- Content Type: Content types include articles, forms, and systems.
- Format: Formats include hard copy, electronic file (Word/Excel/PDF, etc.), and links.
- Location: This is where the content item currently lives. Locations include URLs, the hard drive of an individual, a shared directory, or a file drawer in the basement of the building.
- Update Frequency: You might want to include “Frequent”, “Sometimes”, or “Rarely” flags in your inventory. Frequently updated content generally requires a higher profile in the navigation schema.
- Status: The status of content can be current, obsolete (generally content on the current site that will not be migrated), or to be created.
- General Notes: If the content requires special treatment or you have additional information to capture, note it in this section of the content inventory.
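As a rough illustration, the fields above could be modeled as a single record per content item. This is only a sketch: the class name, field names, and the sample values are our own assumptions, not part of any particular CMS or of the worksheet template.

```python
from dataclasses import dataclass, asdict
from enum import Enum


class Status(Enum):
    CURRENT = "current"
    OBSOLETE = "obsolete"            # on the current site, will not be migrated
    TO_BE_CREATED = "to be created"


@dataclass
class InventoryItem:
    description: str                 # title of the content item
    owner: str                       # department or individual responsible
    content_type: str                # e.g. "article", "form", "system"
    fmt: str                         # e.g. "hard copy", "electronic file", "link"
    location: str                    # URL, shared directory, file drawer, ...
    update_frequency: str = "Rarely" # "Frequent", "Sometimes", or "Rarely"
    status: Status = Status.CURRENT
    notes: str = ""


# Hypothetical example row; the values are made up for illustration.
item = InventoryItem(
    description="Business License Application",
    owner="Finance Department",
    content_type="form",
    fmt="electronic file (PDF)",
    location="https://example.org/forms/business-license.pdf",
    update_frequency="Sometimes",
)
row = asdict(item)  # a plain dict, ready to write out as a spreadsheet row
```

Keeping the status as a fixed set of values makes it harder for "obsolete" and "Obsolete " to end up as two different filters in the spreadsheet later.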
Creating The Content Inventory
While we expect our clients to shoulder their share of the burden when it comes to the content inventory, we typically take a first pass by inventorying all content on the existing website; we do this by methodically following every link on every page, in order. This approach allows us to spend quality time with the client’s existing content and leads to a deeper understanding of the client’s business.
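We do this first pass by hand, but the same "every link on every page, in order" traversal can be sketched in code. The example below is an assumption-laden illustration, not our actual process: `pages` is a hypothetical in-memory stand-in for the live site, and `LinkCollector` and `crawl_in_order` are names invented here.

```python
from collections import deque
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl_in_order(pages, start):
    """Visit every reachable page breadth-first, following links in order.

    `pages` is a hypothetical stand-in for the site: a dict mapping a
    path to that page's HTML.
    """
    seen, order, queue = {start}, [], deque([start])
    while queue:
        path = queue.popleft()
        order.append(path)
        parser = LinkCollector()
        parser.feed(pages.get(path, ""))
        for link in parser.links:
            # Skip links that leave the site or that were already queued.
            if link in pages and link not in seen:
                seen.add(link)
                queue.append(link)
    return order


site = {
    "/": '<a href="/about">About</a> <a href="/forms">Forms</a>',
    "/about": '<a href="/">Home</a> <a href="/staff">Staff</a>',
    "/forms": '<a href="/about">About</a>',
    "/staff": "",
}
visited = crawl_in_order(site, "/")
```

A traversal like this only finds pages that are linked from somewhere; orphaned pages and offline documents still have to come from the departments themselves.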
Once we’ve seeded the inventory with the existing web content, we provide the inventory worksheets to each department and allow the client’s staff to add any additional content they want to include on their new website. This requires that the client be as specific as possible. You need to know if you’re migrating two agendas or 25. “Various documents” is not a content item — the content migration is not the time to work out the details of what was meant by “various”.
Once the client has completed their work, we analyze the entire content inventory and follow up on the inevitable questions that arise. This follow-up is done via email, phone, or small meetings with key staff. At this point, we’ll iterate on the inventory with the client until we’re sure we’ve captured everything that needs to be included. This process includes tasks like identifying gaps in the content and coming up with appropriate titles for articles.
Content inventories, especially for information-rich sites, can get quite large. A spreadsheet allows you to sort and filter the data into multiple views as needed. Once we’ve completed the initial capture of information, we expand the inventory to include keywords, target customers (especially helpful for developing a user-centric navigation system), and other information relevant to the migration. This content-related data is a crucial part of the raw material used to develop the site’s information architecture.
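The sort-and-filter views are normally built in the spreadsheet itself, but the idea can be shown with a small sketch. The column names and rows below are hypothetical sample data, assuming the inventory has been exported as CSV.

```python
import csv
import io

# Hypothetical inventory export; column names follow the fields listed earlier.
raw = """Description,Content Owner,Content Type,Update Frequency,Status
Council Agenda,Clerk,article,Frequent,current
Business License Form,Finance,form,Sometimes,current
1998 Parade Photos,Parks,article,Rarely,obsolete
Zoning FAQ,Planning,article,Sometimes,to be created
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# View 1: everything that will be migrated, sorted by owner, then title.
to_migrate = sorted(
    (r for r in rows if r["Status"] != "obsolete"),
    key=lambda r: (r["Content Owner"], r["Description"]),
)

# View 2: items a department still has to write.
to_create = [r["Description"] for r in rows if r["Status"] == "to be created"]
```

Each view is derived from the same rows, so the master inventory stays the single source of truth.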
We said in our first article that the first step to developing an Information Architecture is identifying a website’s content. Once you have an understanding of the type of content that will be included on the new website, you can begin to figure out how all of the pieces will fit together.
For example, if you’re building a site for local government, you’ll notice that various departments have resources targeted toward businesses. In a physical City Hall, business owners may be required to visit multiple departments in order to perform various tasks. As you develop your IA, you can target this customer group by aggregating business-focused content into a common area and organizing that area around the various tasks that business users would want to accomplish. The result of this process is a customer-oriented site structure that improves the overall user experience.
Once the IA is complete, we add one more column to our inventory: location on the new website. Knowing where things go in advance allows the entire migration process to proceed rapidly. Our worksheet allows us to easily sort by new location and build each section of the site in an orderly manner.
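To illustrate how the new-location column drives the build order, here is a minimal sketch that groups items by the top-level section of their new location. The paths, titles, and the `top_section` helper are assumptions made up for this example.

```python
from itertools import groupby

# Hypothetical rows: (location on the new website, description).
inventory = [
    ("/business/licenses", "Business License Form"),
    ("/government/council", "Council Agenda"),
    ("/business/zoning", "Zoning FAQ"),
    ("/government/council", "Council Minutes"),
]


def top_section(location):
    """Return the first path segment, e.g. '/business/zoning' -> 'business'."""
    return location.strip("/").split("/")[0]


# groupby only groups adjacent rows, so sort by new location first.
ordered = sorted(inventory)
build_plan = {
    section: [title for _, title in items]
    for section, items in groupby(ordered, key=lambda row: top_section(row[0]))
}
```

The resulting `build_plan` lists, section by section, what has to be in place before that part of the site can be considered done.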
There are many approaches to the content inventory process, and you will ultimately find a system that works best for you. The first step, of course, is creating a usable spreadsheet. Here’s an Excel worksheet that you can use as a starting point. Remember, a content inventory template is just that; you’ll note that ours includes columns for language and audience. This is information we need to capture for the majority of our clients. Different projects have different needs, and you may need to modify this template on a case-by-case basis.
Analyzing The Content Inventory
While individual departments can create their initial inventories, one person should be responsible for analyzing and understanding every item on the completed inventory, from its purpose to its relationship to other content. Employing too many analysts leads to big picture items being missed. For example, some sites will have the same content item posted in multiple locations. A good CMS streamlines the process of managing one piece of information in multiple places; however, the content analyst needs to be aware that this duplicate content exists (we usually indicate this in the Notes section). Otherwise, you will negate the benefits of a CMS. A single content item is easy to maintain; the same item in multiple locations means that someone needs to remember to update the item in every location, every time the document is updated.
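Spotting duplicates is largely a matter of reading the content, but exact or near-exact copies can be flagged mechanically. The sketch below is one possible approach, not the method described above: it hashes each item's text after normalizing whitespace and case, and the locations and body text are invented sample data.

```python
import hashlib
from collections import defaultdict


def fingerprint(text):
    """Hash the text after collapsing whitespace and lowercasing."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Hypothetical locations and body text.
pages = {
    "/clerk/holiday-hours": "City Hall is closed on public holidays.",
    "/about/hours": "City Hall is closed on  public holidays.\n",
    "/parks/events": "Summer concerts start in June.",
}

by_hash = defaultdict(list)
for location, body in pages.items():
    by_hash[fingerprint(body)].append(location)

# Any fingerprint shared by more than one location is a duplicate to note.
duplicates = [sorted(locs) for locs in by_hash.values() if len(locs) > 1]
```

This only catches copies that differ in spacing or capitalization. Items that were reworded slightly in one location still need the analyst's eye.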
This analysis of the content inventory also aids in identifying and understanding various types of content (content classes) that need to be accommodated by the CMS. Articles can take on many forms: news releases, public notices, agendas, FAQs, or even photo galleries. Good content management systems handle a variety of content classes, and identifying the article classes early in the project aids the design process, information architecture, and CMS implementation.
Adaptive Path’s Jeffrey Veen has called content inventories “mind-numbing” in his article “Doing a Content Inventory (Or, A Mind-Numbingly Detailed Odyssey Through Your Web Site)”.
There are benefits to this tedium. By the time you’re finished, you will have a better understanding of your client’s business and, likely, a lot of ideas about how to better organize content. You will discover relationships between content that can be maximized. And you will walk into the Information Architecture phase of your project with confidence.
Next Up: Analyzing and Preparing Content for Migration to Your CMS
How practical is it to inventory a site with more than 36,000 pages?
Michael, the question is: how practical is it to not do the inventory? If you’re doing a CMS implementation and migrating all that content, I believe you have to do an inventory. For a site that large, some of the work can be automated, but you’ll still have to find out what you have. How much of what is currently on the site is relevant, how much is duplicated, how much could be better organized?
If you try to migrate 36,000 pieces of content without a plan, chances are that you’ll end up with a mess (I speak from experience here!).
Kassia Krozser, a very fair assessment.
We’re currently in the early stages of implementing a CMS here and content migration is going to be a big issue. With more than 36,000 pages, this manually managed site has become unwieldy and I know we’re going to find the content migration difficult.
I think I’ll have to present a copy of this article to management and try and push the point across to them, because they seem far too wrapped up in the technical issues of the implementation.
This is why we start the content inventory as early as possible. Quite often, it reveals things that need to be considered in the technical implementation. Nobody likes doing this stuff (okay, I like doing this but that’s how my mind works), but it really does make a difference. I hope you can talk to your management soon — there’s nothing less fun than a CMS implementation gone wrong; fixing problems is always more costly.
We’ll be posting an article on what I like to call content migration triage — assessing content and making practical determinations on how to handle it in the new system — soon. That might help as well.
If you have the resources to inventory all 36,000 pages, hey, that’s great. If you don’t, I think you’re better off using access logs to figure out the top 10-20% most visited pages, and focusing your inventory efforts on these. Dump everything else into a searchable archive. When the new site goes live, keep an eye on your search and access logs to see if any of the archived content needs to be moved back into the site proper. Eventually decommission the archived content.
Not the best way of doing things, but if you don’t have the necessary resources, what can you do?
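The log-based triage suggested above could be sketched roughly as follows. This assumes the request paths have already been extracted from the access log; the `requests` list, the `top_pages` helper, and the 20% cutoff are all hypothetical choices for illustration.

```python
from collections import Counter

# Hypothetical request paths, as they might be extracted from an access log.
requests = [
    "/forms/license", "/council/agenda", "/forms/license", "/parks",
    "/forms/license", "/council/agenda", "/history/1998", "/contact",
    "/forms/license", "/council/agenda",
]


def top_pages(paths, fraction=0.2):
    """Return the most-requested pages, limited to `fraction` of distinct pages."""
    counts = Counter(paths)
    keep = max(1, round(len(counts) * fraction))
    return [path for path, _ in counts.most_common(keep)]


priority = top_pages(requests, fraction=0.2)
```

Pages outside `priority` would be the candidates for the searchable archive, subject to the caveat that hit counts show what users found rather than what they were looking for.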
Anytime I hear someone talking about 36,000 content items I begin to wonder if maybe they might actually need a document management system as opposed to a web content management system. While there are systems that try to pass themselves off as both, they’re really two entirely different beasts. A big mistake we frequently see is clients wanting to use a CMS to solve document management problems.
I agree with Jesse in theory. You may find that quite a large majority of your 36,000 documents are not accessed on a regular basis. However, I wouldn’t necessarily depend on my log files to tell me which documents are the most in demand by my site’s users. There’s no guarantee that the content they’re finding is actually the content they’re looking for. Logs can be misleading that way. They only tell you what users have found. They don’t really tell you what the users actually had in mind when they came to your site.
I’m in the middle of implementing a CMS migration for a client that really needs a DMS, so I feel your pain, Michael. The wonderful thing for me is that I just have to specify how new content goes in, which is to say, the static files. So I have very few content items to ID and my client handles the migration to the new channel.
But Michael, you really should be ID’ing only a few content types which have an attribute of a static file, which is the uploaded document. You can inventory all the individual pages for channel distribution, which is part of your migration plan, but that shouldn’t be part of your initial content type creation.
What is a DMS?
Keith, a DMS is a document management system. While others will have better, more technical definitions, I think of them as large electronic libraries. They are geared toward managing large collections of items that would have formerly been kept in paper files. Some convert items to a universal format like PDF, some have access restrictions, some have limited workflow. While they are generally designed to be used internally within an organization, aspects of the system can also be pushed to a website.
Working with Excel is difficult for the ID column. I enter each item like so… 2.0.0, 2.10, 2.1.1.0, 2.1.1.1., 2.1.1.2, etc…
However, if I later want to add an item between 2.1.1.1 and 2.1.1.2, I then have to manually renumber every row from 2.1.1.2 onward.
Is there any way for the x.x.x.x numbering scheme to be automatic in Excel, and for all items to renumber upon an insert or removal of a row?
I’ve found it really hard to go back and renumber things using Excel. It, for some reason, doesn’t have a smart outline mode (at least that I’ve been able to discern — maybe someone else has figured this one out).
The best solution I’ve come up with is to leave numbering until last, when possible, and to use the autofill feature for renumbering specific sections.
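Outside of Excel, one way to get automatic renumbering is to store only each row's nesting depth and generate the dotted IDs from it. The sketch below is an assumption-based illustration of that idea, not an Excel feature; `outline_ids` is a hypothetical helper, and it produces IDs like 1.1.2 rather than the zero-padded 2.1.1.0 style used in the question.

```python
def outline_ids(depths):
    """Generate dotted outline IDs (1, 1.1, 1.2, 2, ...) from row depths.

    `depths` lists each row's nesting level in sheet order, starting at 1.
    """
    counters = []
    ids = []
    for depth in depths:
        if depth < 1 or depth > len(counters) + 1:
            raise ValueError("depth must be >= 1 and cannot skip a level")
        # Drop counters for deeper levels, then bump or start this level.
        del counters[depth:]
        if len(counters) == depth:
            counters[-1] += 1
        else:
            counters.append(1)
        ids.append(".".join(str(n) for n in counters))
    return ids


before = outline_ids([1, 2, 3, 3, 2])
# Insert a new level-3 row after the existing level-3 rows and regenerate.
after = outline_ids([1, 2, 3, 3, 3, 2])
```

Because the IDs are derived rather than typed in, inserting or removing a row only means regenerating the column.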
Isn’t it possible to use Word and the index option, then use tabs to separate columns?
What happens if you then convert this text into a table?
Tomas, Word really isn’t an option in my experience. I’ve tried it once, and will never do it again. Word documents are difficult to handle as they get larger (and content inventories do get large).
If I feel I’m out of control and elements may pop up, I leave numbers in between like this: 2.00.00; 2.10.00; 2.20.00; 2.20.10; etc. That way I can insert 2.15.00 if something pops up. It’s the next best solution.
Can you explain why we need to identify every content item? What significance does that have, especially for a CMS? I had always thought that in a CMS we need to understand only the types of content and their general schema, and that the number of individual items based on the schema is typically populated by content users. So how will it affect the implementation?
Uma, if you really know your content very well, then you may be able to get away with only documenting the content types. However, it’s been my experience that those implementing CMS systems are typically not in touch with all of the various content types in any organization. Conversely, the users who really know the content (domain experts) usually aren’t involved in the CMS implementation. The content inventory is a chance for the implementation team to collaborate with the domain experts in order to uncover all of the various content types, as well as exceptions and variations. Essentially, the content inventory is a form of content type discovery.