The SpoolCast with Jared Spool

The SpoolCast has been bringing UX learning to designers’ ears around the world since 2005. Dozens and dozens of hours of Jared Spool interviewing some of the greatest minds in design are available for you to explore.

Episode #235 Stephanie Lemieux - Using Taxonomy to Manage Content Sprawl

February 19, 2014  ·  30 minutes

Ultimately, your content is the reason users visit your site. Taxonomy can build a structure underneath that content, making it much more dynamic. By employing a layer of taxonomy, your CMS can better understand the relationships between the content. This allows you to easily surface related content, dynamically display bits of information, and improve your users’ experience.

Show Notes

Stephanie Lemieux discusses her approach to taxonomy in her virtual seminar, Managing Content Sprawl. In it she shows how to bake taxonomy into your content model and information architecture. During the live seminar, the audience had a bunch of great questions for Stephanie. She joins Adam Churchill for this podcast to answer some of those questions.

  • When you start a taxonomy, how do you deal with legacy content?
  • Are there tools available to make this process easier?
  • Is there an open source CMS that supports hierarchical taxonomies?
  • How do you do dynamic pages in SharePoint?
  • Are there drawbacks or limitations to using a taxonomy?

Full Transcript

Adam Churchill: Welcome everyone to another edition of the SpoolCast.

Last month Stephanie Lemieux joined us to present her virtual seminar, "Managing Content Sprawl." Stephanie's seminar, along with over 120 others that teach the tools and techniques that you need to create great design, is now part of the UIE User Experience Training Library, something we're calling UIE's All You Can Learn.

In this seminar, Stephanie explains that information architecture is more than just navigation or structure. It's how your users find you. It's how they understand you and continue interacting with your organization over time.

She makes the case that if flexibility in content publishing is a key goal for your team, then it's time to try taxonomy-driven design. Hey, Stephanie. Thanks for making some more time for us.
Stephanie Lemieux: You're quite welcome, Adam.
Adam: For those that weren't with us for your seminar that day, can you give us an overview?
Stephanie: I feel really bad for them, because it was hella awesome. [laughs]
Adam: It was awesome. [chuckles]
Stephanie: Basically, the crux of the presentation was that using taxonomy to create structure underneath your content lets your content have a whole different life that is way more dynamic and lets you do all sorts of cool stuff in terms of user experience.

Essentially, what you're doing by adding a layer of taxonomy into the back-end information architecture, you're almost creating a logical information architecture rather than a physical one. You're creating all these semantic hooks on your content that help your CMS and other parts of your back-end infrastructure understand the meaning of the content.

You can use those hooks in order to dynamically display content, to have faceted navigation, to have automated related content -- all sorts of really cool sort of UX things.

We covered a lot of those different types of use cases. We definitely talked about navigation extensively and talked about how pure, physical navigation is completely different from taxonomy. I taught you how you can use taxonomy, however, to improve your navigation and to have dynamic bits and pieces in your navigation. The classic example of that is full-on faceted navigation which you see a lot in e-commerce websites.

We also talked about how you can use taxonomy to do related content, and this is used heavily in the publishing world. This is instead of doing things like behavior-based related content -- if you bought this thing, you might be interested in this other thing.

You can also do that based on tags or taxonomy, so we went through a few examples of those. There are some fairly complex ways of doing that as well that some big companies like the BBC do, using things called ontologies, which are really just really mega-taxonomies.

We also talked about using taxonomy to create dynamic topic pages. Topic pages are not a new thing. People have been trying to do topic pages for a long time, and they tend to be more or less successful, but you can create some really nice, targeted topic pages using a taxonomy by tagging your content with taxonomy and then creating custom views of your content on a page to say, "Go get all of the content tagged with this topic and show the latest three in this part of the page, and then go get the three latest news items on this topic and show it in this part of the page."

You can basically build a whole page without having humans have to go in there and manually update them. All of this gets, again, driven by that logical layer of taxonomy architecture in the background.
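
As a rough illustration of the topic-page idea described above, here is a minimal Python sketch. It assumes an in-memory list of content items; the `ContentItem` class, the field names, and the sample data are hypothetical and do not come from any particular CMS.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ContentItem:
    title: str
    content_type: str   # hypothetical types: "article" or "news"
    tags: frozenset     # taxonomy terms applied to the item
    published: date


def latest_tagged(items, topic, content_type, limit=3):
    """Return the newest `limit` items of one type that carry the topic tag."""
    matches = [i for i in items if topic in i.tags and i.content_type == content_type]
    return sorted(matches, key=lambda i: i.published, reverse=True)[:limit]


def build_topic_page(items, topic):
    """Fill each page region from tags alone, with no manual curation."""
    return {
        "latest_articles": latest_tagged(items, topic, "article"),
        "latest_news": latest_tagged(items, topic, "news"),
    }


# Hypothetical sample content.
ITEMS = [
    ContentItem("Cat care basics", "article", frozenset({"cats"}), date(2014, 1, 5)),
    ContentItem("Dog training", "article", frozenset({"dogs"}), date(2014, 1, 6)),
    ContentItem("Cat show announced", "news", frozenset({"cats"}), date(2014, 2, 1)),
    ContentItem("Choosing a kitten", "article", frozenset({"cats"}), date(2014, 2, 10)),
]

page = build_topic_page(ITEMS, "cats")
```

Publishing a new item tagged "cats" changes the page the next time it is built; nobody has to edit the page itself.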

We talked a little bit about how some content management systems either make it harder for you to do this or make it easier for you to do this, even some open-source ones, like Drupal, and we talked a little bit about how SharePoint has gotten their head out of the sand and started making some of this cool taxonomy stuff quite possible as well.

Then, last, we talked about using taxonomy to do audience targeting. Much like you can do those dynamic topic pages, you can also do audience-focused pages using the same technique, basically, by creating an audience taxonomy or letting people self-identify which topics they are interested in and then creating a personalized either newsletter or landing page.

The example we gave in the session was Forrester's. If you go to their site, you see this cool ability to enter and see all the research based on your role, whether you're a CIO or a CMO or so on and so forth.

Then we wrapped up the session by talking about how user experience professionals, such as interface designers and content architects and information architects, can understand more about taxonomy and help bring it to that table, where you have both your front-end design people and your back-end design people starting to speak a common language and integrating that into the various deliverables.

We talked a little bit about how to build a taxonomy framework to support those kinds of functionalities, and we talked about how to bake taxonomy into your content model and information architecture.

I think that was pretty much enough. [laughs]
Adam: I'm sure you remember, we had a very engaged audience, and there were lots of great questions, but let's get to some of the ones that were left over. One just certainly wanted you to say a bit more about how you get started. How do you get started developing your taxonomy, and what do you do about all the legacy content?
Stephanie: I'm glad you didn't ask me that question during the original seminar, because that's basically a whole seminar in and of itself. Developing a taxonomy is something that people do as a job [laughs] -- hint-hint, myself -- and it's something that takes sometimes a fair amount of time and expertise.

However, that's not to say that people who are not specifically schooled in taxonomy design cannot take a stab at it and do quite well at it. We definitely encourage all information architects and user-experience designers and whoever wants to take a crack at it, go ahead and get educated on it.

One of the things that you can do to get started is, first of all, to talk to the people in your organization. A lot of the time, there'll actually be some existing vocabularies going on that you might not be aware of that can be used as a starting point.

Maybe the marketing department has a list of topics that they write news articles about, or maybe someone else has some little list of something or other that could be a nice source of gems that you can start building from.

The other thing that you can do and I recommend a lot is to do a content audit. Really have a look at what kinds of content your organization is creating, or putting on the Web or whatever it is you're trying to do with this, and understand how people are using and looking for this kind of content. Once you have a better idea of the type of stuff that you need to be organizing with the taxonomy, then you'll have a better sense of how the taxonomy needs to be structured.

The other part, of course, is understanding, "Well, great. I'm going to spend all this time building a taxonomy, but what do I ultimately want to be able to do with it?" You have to spend some time doing some requirements gathering.

What do you want to do? Do you want to be able to filter content? Do you want to do some of those dynamic-topic-page things that we talked about earlier? Do you want to do faceted search? Really, the answer to that question will dictate what kind of structure you give the taxonomy.

Then you want to start thinking about all that content that you audited and start identifying what some of the important perspectives or aspects of that content are. Essentially, what you're doing here is you're identifying facets or branches of the taxonomy, and that lets you build a framework.

For example, if I'm working for a shoe company, then what's interesting about shoes? Well, the color of the shoe. Or maybe it's the style of shoe, is it a sneaker, or is it a lady's high heel? The size of the shoe. Identifying the elements that might make up the basic structure of the taxonomy.
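
The shoe facets mentioned above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: each product is a plain dictionary, each facet holds a single value, and the product names and helper functions are hypothetical.

```python
# Hypothetical catalog: each key other than "name" is a facet.
SHOES = [
    {"name": "Trail Runner", "color": "blue", "style": "sneaker", "size": 9},
    {"name": "City Sneaker", "color": "red", "style": "sneaker", "size": 8},
    {"name": "Evening Heel", "color": "red", "style": "high heel", "size": 8},
]


def filter_by_facets(products, **selected):
    """Keep only the products that match every selected facet value."""
    return [p for p in products if all(p.get(f) == v for f, v in selected.items())]


def facet_counts(products, facet):
    """Count products per facet value, as a faceted sidebar might display."""
    counts = {}
    for p in products:
        counts[p[facet]] = counts.get(p[facet], 0) + 1
    return counts


red_sneakers = filter_by_facets(SHOES, color="red", style="sneaker")
style_counts = facet_counts(SHOES, "style")
```

Because color, style, and size are separate branches, a visitor can combine them in any order, which a single fixed hierarchy would not allow.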

There's some other ways that you can get at specific vocabulary, such as card sorting. You can actually ask people, put a bunch of words in front of them and ask them to start grouping them, and that gives you a really good sense of how people think about your content and how people group things or organize things.

Another great way is by looking at your search logs. There's two kinds of search logs. There's your internal search logs from your own website or system internally, these are the words that people are typing into your search engine. Or you can actually use the Google keyword tool and find out what kind of search words people are using within a certain domain by doing a little bit of keyword research.

If you're really into it and you want to get even techie about it, you can actually use a crawling software. You can crawl a collection of documents that you have to pull out keywords, and it'll give you a count or a bunch of statistics about how often certain words occur within your content.
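
A very small version of that keyword count can be written with the Python standard library. The sketch below assumes the documents are already available as plain strings; the stopword list and sample documents are hypothetical, and a real crawler would also have to fetch and parse files.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "is", "on", "with"}


def term_counts(documents):
    """Count how often each non-stopword occurs across all documents."""
    counts = Counter()
    for text in documents:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts


DOCS = [
    "Sneaker care: how to clean a sneaker",
    "The history of the high heel",
    "Sneaker sizing guide",
]

counts = term_counts(DOCS)
top_terms = counts.most_common(1)
```

Frequent words are candidate terms only; someone still has to decide which ones belong in the taxonomy and where.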

Now, you glommed on a second side question there. Let's say that you go through all of this, and you have this great, fantastic taxonomy. What do you do with the thousands, and perhaps millions, of documents that you already have that don't have any of this taxonomy applied to it?

That is a huge question, and it really depends on whether or not that content is worth tagging or not. You have to really assess whether or not the content is really active, whether people could potentially want to use it or search it, or whether you might do a go-forward approach rather than a full retrospective tagging.

In this scenario, you tag anything that's new that gets added to the system going forward, and then, if someone goes retrospectively and pulls up an old document, then they're responsible for tagging that document before putting it back or before concluding with it. That way you're only really tagging active stuff.

However, if you do decide that you want to go ahead and tag all of your old legacy content, couple different ways you can do that. You can do a bulk auto-tagging, based on some sort of existing metadata that that content might have. Or you might need to do some coarse classification by putting them into folders ahead of time and then writing a migration script as you're moving your content from one place to another that will auto-classify the content based on whatever those folders are, or you can get a full-on auto-classification tool.
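
The folder-based migration approach could look something like the following Python sketch. The folder names, taxonomy terms, and file paths are all hypothetical, and the sketch only computes tags; it does not move any files.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from coarse folder names to taxonomy terms.
FOLDER_TO_TERM = {
    "press-releases": "News",
    "manuals": "Product documentation",
    "hr": "Human resources",
}


def classify_by_folder(path, folder_to_term=FOLDER_TO_TERM):
    """Return the taxonomy terms implied by the folders in a document's path."""
    folders = PurePosixPath(path).parts[:-1]  # drop the file name itself
    return sorted({folder_to_term[f] for f in folders if f in folder_to_term})


def migrate(paths):
    """Split documents into auto-tagged ones and ones left for manual review."""
    tagged, untagged = {}, []
    for path in paths:
        terms = classify_by_folder(path)
        if terms:
            tagged[path] = terms
        else:
            untagged.append(path)
    return tagged, untagged


tagged, needs_review = migrate([
    "legacy/press-releases/2012/launch.docx",
    "legacy/manuals/widget.pdf",
    "legacy/misc/notes.txt",
])
```

Anything that lands in the review list falls back to one of the other options, such as manual tagging.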

Then, of course, there's the classic brute-force way, where you hire a bunch of people and pay them to add the content and tags for you.

Those are basically the main options.
Adam: Let's talk about that a little bit, because you mentioned some of the tools that you can use. We live in a world of tools and resources, right? Our audience is always after the things that will allow them to do their jobs better. What type of technology do you recommend that makes this whole process possible or easier or allows you to do it better?
Stephanie: That's a great question. Talking about taxonomy and technology, you're actually talking about a few different things. I'm going to focus a little bit on the context that we talked about in the seminar. Let's say that you want to use taxonomy in order to create dynamic content on your website or in your enterprise content management system or Intranet. There's three different buckets of technology that can support you in this endeavor.

The first one is your content management system itself. You really need to understand how it handles taxonomy, because it will either open a ton of doors for you or really tie your hands behind your back and restrict you.

When you think about your content management system, you have to think about, well, how does it let me add taxonomy to content? Is it easy for people to see the taxonomy, to browse it, and to add a tag to a piece of content? Or is it like pulling teeth? Do I have these giant drop-downs? Do I have to fill in three drop-downs in order to fill in one piece of metadata?

The other thing that you need to check into is whether or not it lets you group or filter or display content using those taxonomy tags on the front end. This is a big deal in something like SharePoint. SharePoint recently created this ability to create, on a page, a dynamic area that works by taxonomy search.

You can actually go in and pull content, based on a taxonomy tag, and display it on a page and style it in a particular way. You really want to understand what kinds of functions or mechanisms your CMS gives you to group, filter, and display content using the taxonomy that you've built.

Another thing is you want to verify whether your CMS actually understands what on Earth taxonomy is. I've worked with a lot of content management systems that say they do taxonomy, but really, what they do is flat lists of tags. They don't understand taxonomy structure, they don't understand taxonomy relationships, and so you're really limited in what you can do. Basically, all it lets you do is a flat list of blog categories.

I even had a case where I had a CMS that said, "Yes, we do hierarchical relationships, but only three levels. Anything more than three levels and that's it. You can't go any further than that." You really need to understand what the limitations, in terms of structure, your CMS might be imposing on you.

Then the last piece that I'll say, and this is getting pretty nerdy about it, is there are a few tools out there -- and I'll talk about one a little bit more in a minute, especially Drupal -- that treat the taxonomy as an object itself.

Each term in the taxonomy becomes an object of content that can be managed, which means that you can actually put additional information around your taxonomy object, so additional metadata to say, "This taxonomy object is active, and here's a scope note about how you should be applying it," or any other kind of information that you would want to know about that taxonomy.
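
To make the "term as an object" idea concrete, here is a minimal Python sketch. It is not how Drupal or any other CMS models terms internally; the `Term` class and its fields (`active`, `scope_note`, `synonyms`) are hypothetical names chosen to mirror the metadata mentioned above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Term:
    """A taxonomy term that carries its own metadata."""
    name: str
    parent: Optional["Term"] = None
    active: bool = True
    scope_note: str = ""  # guidance on when to apply the term
    synonyms: list = field(default_factory=list)

    def path(self):
        """Return the term's position in the hierarchy, root first."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return " > ".join(reversed(names))


footwear = Term("Footwear")
sneakers = Term(
    "Sneakers",
    parent=footwear,
    scope_note="Use for athletic and casual lace-up shoes.",
    synonyms=["trainers", "running shoes"],
)
retired = Term("Galoshes", parent=footwear, active=False)
```

A flat list of tag strings has nowhere to keep a scope note or an active flag, which is the practical difference being described.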

The second bucket of technology is auto-classification. Being able to automatically apply the taxonomy to your content based on either the structure or the content of that content itself. The thing is, there's a lot of hype about auto-classification, and it's really just not an obvious, easy win.

People really underestimate the amount of investment required in setting up and training an auto-classifier. Out of the box, it gives you about 60 percent or 65 percent quality. In order to get it up from that, you have to either have a really good, rich taxonomy, filled with lots of synonyms and lots of variant words that help the crawler identify that subject within the document.

Then, depending on the type of auto-classifier you get, whether it's statistical or rules-based, you either have to do a training set, which means that for every term in your taxonomy, you have to feed the auto-classifier 20 to 50 documents that you consider representative of that term. Or you have to go in and tweak the rules around that term until you have the rules behaving properly. You have a fair amount of overhead and effort required in order to set this stuff up.
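
A toy rules-based classifier shows why synonyms and variant words matter. This Python sketch is far simpler than any commercial auto-classifier; the rule table, the `classify` function, and the `min_hits` threshold are hypothetical.

```python
import re

# Hypothetical rules: each taxonomy term lists the variant words that signal it.
RULES = {
    "Cats": ["cat", "cats", "kitten", "feline"],
    "Dogs": ["dog", "dogs", "puppy", "canine"],
}


def classify(text, rules=RULES, min_hits=1):
    """Return every term whose variant words appear at least `min_hits` times."""
    words = re.findall(r"[a-z]+", text.lower())
    terms = []
    for term, variants in rules.items():
        hits = sum(1 for w in words if w in variants)
        if hits >= min_hits:
            terms.append(term)
    return terms


tags = classify("Adopting a kitten: what every new feline owner should know")
```

The sample sentence never uses the word "cat", so without the variants "kitten" and "feline" it would go untagged. Tuning lists and thresholds like these for every term is the overhead described above.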

Then the other consideration that you have is whether you want to spend a bunch of money buying a fancy standalone auto-classifier, or if maybe you're in SharePoint or something like that, there are a few smaller, add-on-type products that you can get, kind of like a plug-in, that are much less pricey but perhaps have less quality.

The third bucket of technology around taxonomy is actually a taxonomy management tool. Now, a lot of people ask me questions about taxonomy management tools and which one's the best one and is it worth it. The answer to that is always "it depends." Taxonomy management tools are certainly not cheap, so you want to focus on those things if you're having a large taxonomy or you want to use that taxonomy in more than one place.

In the virtual seminar, we focused on using taxonomy in your Web content management system to do Web content servicing. However, if you're using taxonomy in your CMS and in your ordering system and in your HR systems and your CRM, and really the taxonomy is doing a lot of different things, then you want to think about a taxonomy management tool to centralize that taxonomy and feed it out to all those other systems.

The other reason you might want to do a taxonomy management tool is, if you think back to what I said a few moments ago about content management systems and not understanding taxonomy and not being able to store them or treat them in the way that they're supposed to be, sometimes you want to outsource the taxonomy part and feed it into the CMS from a third-party tool.

The other cool thing is some taxonomy management tools actually have auto-classification built in or as an add-on. You don't have to go and get a whole separate, standalone tool for auto-classification. You can actually get the whole taxonomy management, auto-classification package, all together.
Adam: Very cool. Russell was wondering if you knew about open-source CMS systems that support hierarchical taxonomies out of the box. The example he was asking about specifically is WordPress.
Stephanie: Yeah, definitely. It's funny, because there's all these really fancy, million-dollar content management systems, and a lot of them really suck at taxonomy. It's kind of embarrassing. Whereas, in the open-source world, there's been a lot more advance around taxonomies.

WordPress definitely supports a single hierarchical category-type taxonomy and, of course, flat tagging systems beside that, out of the box. You can also create additional, custom vocabularies around that. The limitation there of that one single hierarchical category view is that you have to have your entire taxonomy in that one hierarchy. You can't do separate facets, unless you glom them all together in that one vocabulary.

However, you can do this custom vocabulary, as I mentioned, but there's no admin UI for it, so you have to go into the code to do it, so you have to be fairly hardy and not afraid to get your hands dirty in the back end of WordPress.

There's also Joomla, which is another quite-popular open-source content management system. They're very similar in that one single category hierarchy-type approach. There are some extensions, there's one called K2, that allows you to do a little bit fancier taxonomy stuff, but again, it's still pretty limited.

However, if you want the McDaddy of all content management, open-source, taxonomy-friendly systems, you got to go Drupal. Drupal has the most powerful taxonomy function, totally out of the box. As I mentioned earlier, this notion of treating taxonomy as an object -- Drupal absolutely does that. You can add metadata to your taxonomy terms, create all sorts of relationships, and do all of that awesome, dynamic content functionality that we talked about in the virtual seminar using views.

I actually did a really interesting proof of concept with a client, not too long ago, who had a content management system that they spent a lot of money on and wanted to be able to do dynamic topic pages based on taxonomy, but also, they wanted the system to understand the hierarchy of the taxonomy.

For example, if they had different articles on "Star Wars" the game, "Star Wars" the movie, "Star Wars Episode Four," and "Star Wars Episode Five," they wanted to be able to have a topic page that reunited all "Star Wars" content based on the relationship in the taxonomy of "Star Wars" to all of the "Star Wars" children movies, without having to go explicitly re-tag all of that content.

Basically, they needed the system to understand that "Star Wars Episode Four" was a child of the franchise "Star Wars." Then, if they created a dynamic page built around "Star Wars" the parent, it would pull in anything tagged with any of those children.
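
The parent-and-children behavior described here can be sketched in Python as follows. The term names follow the "Star Wars" example from the conversation, while the article titles, the `PARENT` table, and the function names are hypothetical; this is not the Drupal proof of concept itself.

```python
# Hypothetical child -> parent relationships in the taxonomy.
PARENT = {
    "Star Wars Episode Four": "Star Wars",
    "Star Wars Episode Five": "Star Wars",
    "Star Wars (game)": "Star Wars",
}


def descendants(term, parent_of=PARENT):
    """Return the term itself plus every term beneath it in the hierarchy."""
    found = {term}
    changed = True
    while changed:
        changed = False
        for child, parent in parent_of.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


# Hypothetical articles, each tagged only with its most specific term.
ARTICLES = {
    "Episode Four review": {"Star Wars Episode Four"},
    "Episode Five retrospective": {"Star Wars Episode Five"},
    "Franchise timeline": {"Star Wars"},
    "Unrelated convention report": {"Star Trek"},
}


def topic_page(term, articles=ARTICLES):
    """List articles tagged with the term or with any of its descendants."""
    wanted = descendants(term)
    return sorted(title for title, tags in articles.items() if tags & wanted)


star_wars_page = topic_page("Star Wars")
```

No article had to be re-tagged with the parent term; the roll-up comes entirely from the relationships stored in the taxonomy.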

Their fancy, expensive CMS didn't do that. Their fancy, expensive CMS support people could not understand why on Earth they would want to be able to do that. They hired us, and we said, "We will stand up a Drupal proof of concept for you in 20 hours, and not only that, but we're going to create a hookup with the open-source version of Wikipedia, called DBpedia, and go in and pull in information dynamically about 'Star Wars,' about what year it was created, who created it, who the stars were, and then you'll be able to put that into your dynamic page at the same time."

Needless to say, we did our little song-and-dance routine in front of the vendor, and the vendor was shamed [laughs] into improving their taxonomy management and leveraging functionality within their own CMS. It was pretty fun.
Adam: Our friends at Agriculture Canada were wondering if you could speak a bit about doing these dynamic pages in SharePoint.
Stephanie: Absolutely. SharePoint has come a long way in their most recent version. In the old days, by that, I mean pre-2013, there was this thing called the Content Query Web Part. What that would do is that would allow you to roll up content from different parts of a particular site collection based on whatever parameters you had set up, essentially. For example, "Go get the latest three articles with metadata or topic equals cats, and display it here on this page."

The wonky thing about it back then was that it was restricted to a particular site collection, and it didn't use the search engine to do any of it, so it was kind of kludgy and very limited.

Now they've totally revamped the way that it works, and there's a new Content Search Web Part, which uses the search engine to basically query anywhere across site collections, and you can build some really cool queries to go get all sorts of parameters, like the author, any kind of metadata that you put on a document, the content type, basically anything.

You go in, you crawl, you grab, via the search, a bunch of documents that have certain things in common, and then you can show them up on any page, styled however you want, no matter where it came from.

This is really a game-changer. It allows people who have SharePoint and are using it for their Intranets, or even for their websites, to do all sorts of really cool, dynamic content.
Adam: The purpose of the virtual seminar was to advocate and encourage people to consider taxonomy-driven design. What are the negatives? Are there any known drawbacks or limitations for using a taxonomy?
Stephanie: Absolutely not. Everything with taxonomy is super-awesome all the time. [laughs]

I struggled with this question, and I almost didn't pick it. Then I picked it because I wanted to talk about the underbelly of taxonomy. Yes, there's a lot of really cool stuff that you can do with taxonomy, but it's good to be aware of the trade-offs.

You notice this person asked about drawbacks and limitations. One of the drawbacks is, anytime you're creating automatic content, whether it's taxonomy-driven or not, you're trading off between the flexibility and manual control versus the maintenance gain that you're getting by automating something.

Say I create a topic page that is automatically built to show me all the content. I'm going to use the cats example again. I have a cat landing page, and I have the latest news on cats and the latest articles that have been written about cats. We have all sorts of dynamic things that are showing up.

If a marketing person comes in and says, "Yes, but I want to show this dog article on this page," well, that page wasn't architected for that. Unless you game the system, you're stuck within the confines of how you've structured the page.

Anytime you're building architecture to use dynamic things based around taxonomy, you have to make sure that your taxonomy is built with a little bit of flexibility into it and that you leave some room on the structure of your page to let people go in and put some manual stuff.
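
One way to leave that room for manual control is to let editors pin items into a region and fill the rest automatically. The Python sketch below assumes items are simple strings; `build_region` and its parameters are hypothetical names.

```python
def build_region(auto_items, pinned=(), limit=3):
    """Show editor-pinned items first, then fill the rest from the automatic feed."""
    result = list(pinned)[:limit]
    for item in auto_items:
        if len(result) >= limit:
            break
        if item not in result:  # avoid showing a pinned item twice
            result.append(item)
    return result


# Hypothetical automatic feed for the cat page, newest first.
auto = ["Cat news 3", "Cat news 2", "Cat news 1"]

region = build_region(auto, pinned=["Dog article"])
```

With nothing pinned, the region is fully automatic; with a pinned item, the marketing request is met without abandoning the taxonomy-driven feed.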

Another drawback is, of course, the level of effort required to tag stuff. Even as a taxonomist, I hate tagging. It takes time. I have to think about, what is this thing really about that I'm uploading? It takes me an extra 10, 20, 30 seconds to upload a document to one of my systems.

You really have to balance that with doing some training and user education to understand the impacts of tagging, because you're going to be asking people to do extra work. You have to explain to them, "If you don't tag this article with 'cat,' it will not show up on the cat page."

There has to be a very clear cause-and-effect relationship for taxonomy tagging. If you're just asking me to tag something and I don't see anything tangible happen to my document or I don't get a distinct benefit out of it, I probably won't do it.

The other thing I wanted to mention here is that, especially when you're using taxonomy on the front end of anything customer-facing or public-facing, you sometimes have to balance what's good for the user and what's good for the machine. By this, I mean SEO.

We do a lot of faceted-navigation-type stuff for e-commerce, and we have to sometimes change the name of stuff in order to make it better for SEO, but sometimes that makes it look kind of goofy in an interface.

For example, I'm working with a general merchandise retailer right now. We're working on this baby section, and boy, I would really love to not have to say "baby" in front of everything -- baby clothing, baby cribs, baby toys, baby strollers, baby this, baby that. However, doing SEO research shows me that having the word "baby" in front of some things actually makes it better for search and more resonant for people.

In some cases, I have to decide, do I want to be cute and have it look nice and be well-designed from an aesthetic perspective, or do I want to fulfill the requirements of getting seen on Google and being good in terms of search results? Again, you have to perhaps have a trade-off between design aesthetics and machine happiness or SEO.
Adam: Stephanie, this was great. Thanks for making some more time for us.
Stephanie: You're quite welcome.
Adam: To our audience, thanks for listening in and for your support of the UIE virtual seminar program.