Episode #157 Noah Iliinsky - The Power of Data Visualizations
A common trap in designing data visualizations is focusing on all the different ways to represent the data, rather than the questions that the data should answer. The presentation of a data set is pointless if it’s not useful or usable, or if people can’t understand it. With so much data to choose from, how do you keep the goal of the visualization in mind? How can you be sure you’re telling the right story?
We turn to Noah Iliinsky when it comes to data visualization. He is the co-author of Designing Data Visualizations and co-editor of Beautiful Visualization. Drawing from cognitive psychology, Noah explains that there is both an art and science to designing data visualizations. Aspects of shape, color, and placement all play into our brain’s ability to process the data being presented.
With the idea of placement in mind, it helps to think of the constraints and boundaries of your visualization. Careful consideration of its landscape prevents you from ending up with a “hairball” of data. Putting meaning behind placement helps the layout of the data but also conveys greater knowledge about it.
Jared Spool: Welcome, everyone. On today’s SpoolCast, we have with us the fabulous Noah Iliinsky, who is doing a virtual seminar for us here at UIE on February 2, called “Telling the Right Story with Data Visualization.” He is also the recent author of “Designing Data Visualizations,” his second book with his co-author Julie Steele. And today he’s going to talk to us about how you get into projects where you’ve got massive visualizations.
Noah Iliinsky: Hi, Jared. Thanks for having me.
Jared: I am so happy to be talking to you again. It’s so much fun. So, you and I were talking before we got on the air here about this project you have, connecting all the dots between the musicians of Seattle. Tell me a little bit about this project.
Noah: This is a website—people can go check it out right now—called SeattleBandMap.com. This is the “before” state. We’ll be releasing the “after” in a couple of weeks. And, as like so many projects, it started without a real clear plan or design. It was some people in Seattle starting to draw on the kitchen table—well, probably on a napkin—starting to draw the links between the various bands in Seattle, bands that had musicians that had played in one band and then went onto the other band and bands that had recorded albums together and that sort of thing. Seattle’s got a pretty hopping music scene, and the map got pretty big. At one point, they did a poster-size version of it, and they had a large, banner-size version printed.
But the map continues to grow; new bands, obviously, are created all the time. And so they’ve been growing this map. And of course, now there’s an online version, at SeattleBandMap.com. And what it is right now is just a collection of about, I think we’ve got 3,700 or 3,800 bands on this map, and a little hairline link between each band that has shared a member or played on an album together, that sort of thing.
Jared: Yeah, I’m looking at it right now. It looks like a case of bad acne or a Lichtenstein picture, something that’d be hanging up in MoMA.
Noah: Yeah. This is an example of what we in the industry refer to as a “hairball.”
Jared: Yeah, it looks like it. What makes it a hairball?
Noah: Well, this happens a lot of the time, by the way. This is not unique to this project. This is sort of a classic result when people start a visualization with some data, and their goal is “Let’s visualize the data.” Which, it turns out, isn’t actually a goal. It’s a process. So they have created a visualization, naively, and this is not a bad thing, but they didn’t have very specific goals in mind for what information they wanted to reveal. What I’m doing with this project is I’ve come in to help them redesign, specifically, the look and behavior of this network visualization to make it more constructive, more useful, easier to get information from.
One of the difficulties that they’re having right now is that they don’t really have a lot of information represented, and so this is a little bit paradoxical. But if we represent a little more information, it’ll add some more constraints to this visualization.
So right now, all they have is bands and connections between bands. And I guess there’s sort of a third encoding, where the dots are a little bit bigger if the bands have more connections. But there’s very little else represented here in terms of any number of the other things that you can think of that might pertain to the meta information about a band. There’s nothing here about total number of members. There’s nothing here about how many albums the band recorded or how many shows they played, the overall lifespan of the band, genre, is this a band that started in Seattle or it started somewhere else and moved to Seattle.
Because there’s not a scope on this particular data set, it has crept to bands that were never Seattle bands. So the Beatles are on here and Johnny Cash is on here, because at some point somebody from Pearl Jam or something played on a group album with somebody else. And so the network has sort of crept over this initial concept. And none of these are tragic. None of these are fatal flaws. It’s just a reflection of what happens when you don’t have a more well defined goal in mind.
Jared: Right, right. And I’m guessing there’s a bunch of misinformation here, too. Like, these lines have different lengths, but does line length actually mean anything?
Noah: It really doesn’t, as far as I can tell, in this generation. I don’t actually know what the placement algorithm is for this. I think it’s relatively arbitrary. There may be a little force-directed thing going on here, where the clumps get clumped a little tighter. But the point is, even if there is an algorithm there, it’s not relevant if the humans who are meant to be learning from it can’t understand what those meanings are, so it may as well not exist.
Jared: Right, right. And I’m also seeing, there are places where there are multiple lines. There seem to be lines that go through objects and through points, bands, I guess, and then lines that actually terminate at a band, and it’s not clear whether, in fact, there’s always a connection to the band it goes through or if it’s just an accident that the line just happened to intersect with the dot.
Noah: Yeah, yeah. There’s a lot of ambiguity here.
Jared: Yeah, because it doesn’t go through the center. I’m looking at the band Memes, and there’s lines that go through on the edges and lines that go through on the center, and it’s crazy. Someone’s going to get hurt. [laughs]
Noah: Yeah. This is a dangerous network here.
Jared: It is. It is. OK. You’re not just critiquing this. You’re actually involved in the next generation, right?
Noah: Yeah. I’m designing what this next-generation diagram is going to look like, this network diagram. And I’m also, then, going to create it. I’m going to build it in code. So I have the accountability there of not just being able to wave my hands and say, “Here’s how it should be,” but then it’s up to me to make that actually happen.
Jared: I’m really intrigued now, right? Because I wouldn’t even begin to know how you get started on a project like this. Well, first, give me a little history. How’d they suck you into it? You didn’t just bump in on the street and say, “Oh my gosh, you have a visualization problem. Let me help you.”
Noah: No, not like that. No. It was sort of the other way around. It was a woman who’s a UX designer, who is friends with the people who run the band map, works for a professional acquaintance of mine. And I don’t know how it came up with them, but I got an email saying, “Friends of mine need some help with a visualization, a network visualization in Seattle. Is this something you might be interested in helping out with?” And so the introduction was made. And of course, it’s a fascinating project and it’s a fun project, so I absolutely was excited to work on it. And so I said, “Sure, I can do that. Piece of cake.” And here we are.
Jared: This is obviously very rich, and there are all these connections and all these bands. How does the data look on the back end? Have you looked at that yet?
Noah: I haven’t looked at the raw data. We have another friend of mine who’s working on the database angle of things, and so she’s exported samples and versions of the data set for me that I need to do the design with.
I haven’t seen the complete, unadulterated, raw data set. It’s mostly been user-submitted and user-validated. So I think they believe that the quality is very good, but the completeness of it may not be as complete as they would like. And they fully intend to allow this to be a site that people can add data to, whether they’re musicians or fans or whoever, and certainly allow bands to come in and, for example, put in links to their Wikipedia page, put in links to their MySpace page or the band’s home site. So you find a band on here that’s maybe connected to other bands you like, and you can click through and see when they’re playing or download some tracks or something.
Jared: That’s interesting that you sort of jumped right into the use cases. That’s really critical in terms of understanding how to visualize the data. I’m guessing you really have to start with “How will someone want to use this?” Right?
Noah: Yeah, absolutely. And that’s a little bit of a flaw. This is evidenced by the fact that they just started with “Let’s show some data.” They didn’t say, “Let’s show a particular kind of data,” or “Let’s show data to a particular audience who has a particular interest.” It’s just “Let’s show some data.” And the problem with that approach is that it leaves you a little unfocused. You are less well guided towards particular solutions, and it’s hard to tell when you get there.
So, something that we’ve been discussing in these conversations around this website is what are the sorts of information that people who come to this website are going to be interested in? So, for example, I listed off earlier things like which instrument does each musician play and how many albums did each band release. And a lot of this information is not, probably, going to be represented on this website, because there’s other ways to get it. You can go to the band’s website and look at all their albums, or you can go to their Wikipedia page, or you can go to AllMusic, or there’s any number of ways to get that information from the world, and so that doesn’t need to be a strength that we need to duplicate.
Instead, the goal here is to focus on things that are not well-represented by these other resources, which is to say, show me the network of which musicians have played together in which bands and how those bands are then linked. And that’s a very different perspective that you can’t easily get from any of the other resources that are out there now, so that’s the real strength that this offering has and that we’re trying to focus on.
So that changes, of course, things like the data that we’re going to choose and how we’re going to choose to visually represent the data that we include, because we’re telling a different story than “Here’s all the Seattle bass players for the last 50 years and who they’ve played with” or “Here’s just a timeline of the punk scene in Seattle.” Those are different, more-focused questions. And instead, we’re looking at this greater sort of network, specifically, and less about some of the details that we could.
Jared: That’s interesting to me. It feels like a trap. And this makes perfect sense to me. Tell me if I got this right. There’s a trap that teams fall into, which is they are so neck-deep in all the data they have that if they say, “Oh, we’re going to come up with an interesting way to visualize all this data,” they just start thinking about, “What are all the ways I could represent the data?” But they’re not asking the question, “What are all the questions that our audience wants answered?” to prioritize that data in a way that gets them there, and so they end up, like anything else, building out a lot of functionality that is neat but not useful.
Noah: Yeah, it is exactly that trap, and it’s the trap that UX professionals typically are familiar with, because they’ve seen it happen and are then hired to solve or hired to keep from happening in the first place. And it’s something that I bring to data visualization that I think is a relatively uncommon perspective. Not to say that nobody does it, and clearly there’s a lot of capable and smart data vis practitioners who think deeply about what the goal of their visualization is. But when you look at the whole world of stuff that’s been visualized, a lot of it is, “We had some data, so we graphed it.”
Jared: Yeah. [laughs]
Noah: Or, “We had a lot of data, and check it out: we got it all on the page, all at once.” And that’s really exciting, and it’s kind of fun, but at the end of the day, it doesn’t necessarily solve anybody’s problem or answer anybody’s questions. I find design constraints kind of useful and interesting, because they cause you to think about the problems in ways that you wouldn’t have when you have total freedom of expression. And for me, that sort of requirement that constrains what’s possible actually makes me think in more creative ways about what we can do with it.
So, for example, looking at this hairball, I’m a big proponent of axes, because thinking about the landscape of your visualization, the boundaries of your map, axes kind of define the whole world. And if you don’t have them, you kind of get a hairball. There’s nothing that says, “This band should be over here and that band should be over there.” And so it’s difficult to extract meaning from the placement; in fact, there is no meaning in the placement here. And so, if you can make placement meaningful, you’ve now conveyed a lot more knowledge about these bands, and you don’t have to label each band.
So one thing I was thinking of, in terms of what would be some interesting data that would also, for example, help with this layout problem a little bit, and I thought of the time line. So, each band here is a dot, but what if you had a horizontal time line of the last 30 or 40 or 50 years of bands in Seattle, and each band, instead of being a dot, was more of a lozenge, right?
Jared: Oh, OK.
Noah: A band that was around from 1997 to 2002 would have a little length of about five years, and that’s a useful thing and certainly tells you some information about the band. But it also, in the grand scheme of things, gives an enormous coherence to the layout, where now you can look at the bands that were around in the ’60s and the ’70s and the ’80s and the ’90s and see how that evolved. You can say, “I just want to see the bands that were active between 1989 and 1992,” if you’re looking for the birth of the grunge scene.
And it gives you a lot of information. It makes the information you have on the screen more accessible. It organizes it more. So it’s a paradoxical example of how adding more data to the screen can make it easier to find the data that you’re looking for. Now, maybe I’ve created some use cases that didn’t necessarily exist, but that’s OK, in the sense that we are creating an interface that facilitates more use cases that are possible with this particular interface.
And so, if we just added a little date stamp next to each band name in this map, it would become harder to see everything but wouldn’t actually add a lot of value. It wouldn’t be any easier to extract the information. But when you use the addition of more data (in this case, time frame) as a constraint, you are actually molding the data into a shape that’s easier to understand.
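The timeline idea Noah describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Seattle Band Map code: the `Band` record, the sample bands, the axis start year, and the pixels-per-year scale are all hypothetical. It shows the two things the lozenge buys you: a position and width derived from a band's lifespan, and a simple way to ask which bands were active in a given window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Band:
    name: str
    start_year: int  # first active year
    end_year: int    # last active year


def active_between(bands, first_year, last_year):
    """Return bands whose active span overlaps [first_year, last_year]."""
    return [
        b for b in bands
        if b.start_year <= last_year and b.end_year >= first_year
    ]


def lozenge(band, axis_start, pixels_per_year):
    """Map a band's lifespan to (x offset, width) on a horizontal timeline."""
    x = (band.start_year - axis_start) * pixels_per_year
    width = (band.end_year - band.start_year) * pixels_per_year
    return x, width


# Hypothetical sample data, made up for this sketch.
bands = [
    Band("Band A", 1985, 1990),
    Band("Band B", 1997, 2002),
    Band("Band C", 1991, 2005),
    Band("Band D", 1970, 1979),
]

# "I just want to see the bands that were active between 1989 and 1992."
window = active_between(bands, 1989, 1992)
print([b.name for b in window])  # ['Band A', 'Band C']

# A band active 1997 to 2002 becomes a lozenge five years wide.
print(lozenge(bands[1], axis_start=1960, pixels_per_year=10))  # (370, 50)
```

Because the horizontal position now carries meaning, the same lifespan data drives both the layout and the filter, which is the sense in which adding a dimension acts as a constraint.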
Jared: That makes perfect sense to me. What I like about it is that, for me, it’d be really helpful if there were a couple bands that I really liked and I had a sense of their time. I’d be able to see when they happened and who might have influenced whom and what connections they had in terms of the players between them.
And it also helps because, last New Year’s, not this past one but the previous one, I went to a film festival, and one of the films they showed was of the Boston bar scene. And there were all these bands from the ’70s that I’d forgotten about that were in this documentary that was put together. And I could see how long each of those bands lasted and how much they have influenced bands that I like today from the local scene, and even possibly from the national scene. And that information would be really interesting to me, because I hadn’t thought about those bands in years, and I could see, if I had that explorer, I would have these moments and go, “Oh! I remember loving those dudes. What happened to them?”
Noah: Mm-hmm. Mm-hmm. Yeah. And also, being able to trace the lineage of, “Oh, there’s a particular musician,” and “Oh, I didn’t realize that they were in these other bands, and that’s why they kind of sound the same, or that’s why I like…” It gives a whole context, in a way that these isolated little dots on the screen don’t reveal in the same way.
Jared: So it feels to me like there’s this iterative process where, like everything else, you sort of give yourself this constraint—in this case, it’s the timeline thing. And then you say, “OK, what use cases could we design for?” And then you start to ask, “Are those important use cases? Are they not important use cases?” And then you turn back and say, “Well, OK, if they’re important use cases, what might that design look like? What might other constraints that lend themselves to those use cases be?”
Noah: Yes, exactly. And this sort of iteration, it almost doesn’t matter where in that loop you begin, in terms of, “Are we starting with a use case? Are we starting with a design constraint?” It almost doesn’t matter which of those you start with, as long as you do iterate through and you end up with a coherent set that includes some use cases that are hopefully based in reality that are actually going to be useful to your customers. And also includes the right data being revealed to satisfy those use cases, and then eventually involves a design that can be constructed with that data and, again, continues to satisfy those use cases.
Certainly, there are situations where you don’t really know enough about your customer but you’ve got a good sense of the data and you can kind of think, “What are the interesting relationships in the data?” even if I don’t exactly know what my customer is looking for. And there are some times when you have the luxury of saying, “We know exactly what information we’re looking for. I’m going to go to my infinite data reserve and pull that data down.”
So there are situations where, if you want to graph all the census data, for example, versus employment or income, the data is out there in the world, and if you want to cross-reference those, you can probably go find it. So you can sort of assume that the data’s available in some situations and really focus on what are your use cases, or you can say, “We have some of the constraints. Let’s go from there.” But yeah, at the end of the day, you get a set of data, design, use case that kind of go together and hopefully produce something of value at the end.
Jared: Given this, it feels to me like this is actually very similar to designing anything else. There’s nothing special here. You know, here at UIE, we’ve divided up how people make design decisions into different categories, and one of the categories is self-design. So, if I needed this data myself, I could design this for me and I could look at the use cases that I would need, and as long as the rest of my audience has the same sort of needs that I have, that would turn out to be a pretty useful design.
But there’s another type of design, which we call activity-focused design, which would be how you would go out and actually research what those use cases would be. But the methods that we use to research those use cases probably aren’t any different when we’re designing for data visualizations than when we’re designing any other application, right?
Noah: Yeah, I totally agree. In fact, I consider the work that I do, the data-visualization work that I do to be a subset of user-experience work. I’m still designing experiences. I’m still designing interfaces. They just happen to be particularly focused around visualization and the visual conveyance of knowledge rather than forms and drop-downs and scrollbars and panes. And of course, I wrap those things around the data visualizations sometimes. But this does feel like, absolutely, a similar related sub-discipline that just happens to have a product that’s a little more focused.
That’s all for the first portion. And the second portion of designing a data visualization is actually taking the different dimensions of data that you have and choosing, “What do we represent with that axis? What do we represent with color? What do we represent with shape? What do we represent with size?” And that whole second half of the process we haven’t even touched on yet in this conversation, and that’s a whole specialty, another art and science into itself. And there’s definitely both art and science aspects to that phase of the design.
Jared: Yeah. But again, that feels very familiar to me with other parts of UX, right? Because if I’m laying out a form or I’m coming up with a workflow for my users in an application, I still have that sort of mix of art and science. Some of it is just based on my experience and things that I know have worked well in the past; I can draw from that. Based on inspirations I get from other people’s designs, I can draw from that. Based on experimentations that I do and prototypes I build and say, “Oh, that didn’t work so well. Let’s try that again,” I can get inspired or get data from that.
So it’s the same, right? It’s the same sort of thing, except you’re just working with a different toolbox, as it were.
Noah: Yeah, yeah. I think so. I will say, in one aspect, we have pretty good science behind a lot of data visualization in that there’s been a lot of research, in the field of cognitive psychology, there’s been a lot of research in how do people perceive different colors, how do people perceive the meaning of shapes, how do people perceive the meaning of placement? And so there are some well-established, measured, scientifically valid reasons to say, “Use color for this; don’t use color for that,” “Shape is good for these things; shape is not good for those things.”
And so it is treated a lot like an art, but you can burrow underneath that art and you can go back and read the research that explains why so many people use color for categories, for example. It’s great for categories, and we perceive it really excellently in a categorical fashion. And color’s actually not very good for showing quantities. You can use brightness or intensity for quantities, but cycling through the rainbow is actually a poor choice for showing quantities. We can show the studies that measure that as well and talk about why the brain just is never going to be very good at that. It’s not because we’re stupid or we’re from a different culture; it’s because that’s just not what we’re wired for.
And so there really are solid scientific foundations behind all this, which really can make or break a visualization, because there are ways to take certain kinds of data and encode it with encodings that are not very compatible with the shape of that data.
So, if I’m trying to show really fine-grain differences in numbers, trying to represent those with colors is very difficult. When you’re trying to differentiate between a couple of shades of light blue and decide which one is how much darker than the others, that’s a very challenging task that our brains are just not very good at. Whereas if you want to do that with position, you can tell the difference between 34 and 37 on a 100-point scale. If you’ve got a bar graph, you can see, “Well, this one’s 34 and that one’s 37, and look, a 37 one’s clearly longer.” Our brains are very well suited to seeing that difference and quantifying it and understanding it.
And so there is a science underneath all of it, where you can make well-informed choices that will lead you to a design that is easier for people, easier for your customers to understand and get good knowledge from.
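The 34-versus-37 example can be made concrete with a little arithmetic. The sketch below is an assumption-laden illustration, not a perceptual model: the 400-pixel chart width and the 8-bit gray ramp are hypothetical choices, and the code only shows how far apart the two values land under each encoding. The claim that the position difference is easier to read than the shade difference comes from the research Noah refers to, not from this code.

```python
def bar_length_px(value, scale_max=100, chart_width_px=400):
    """Position/length encoding: bar length in pixels for a value."""
    return value * chart_width_px / scale_max


def gray_level(value, scale_max=100):
    """Lightness encoding: 8-bit gray level, where larger values are darker."""
    return round(255 * (scale_max - value) / scale_max)


a, b = 34, 37

# As bars on an assumed 400-pixel axis, the two values end 12 pixels apart.
print(bar_length_px(a), bar_length_px(b))  # 136.0 148.0

# On an assumed 8-bit gray ramp, they are two neighboring shades of gray.
print(gray_level(a), gray_level(b))  # 168 161
```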
Jared: You and Julie do a fabulous job of walking through that stuff in the “Designing Data Visualizations” book. That’s what you’re going to be talking about at the virtual seminar, too, right?
Noah: Yeah. The virtual seminar is actually going to be not quite a literal page-by-page walk through that book. But we’re going to follow the process in that book, starting with a data set, and we’re going to talk through and demonstrate each of those phases; the deciding what to visualize; picking out data that supports that particular story that we’ve decided is relevant. And then, once we’ve selected the data to tell that story with, going through the process of applying visual properties—placement, shape, color, size, all these things—applying these to the different data dimensions, so that what we get is a visualization that actually tells a story and reveals the knowledge that we want to reveal.
Jared: The process that you went through with the Seattle Band Map stuff, that’s a very typical process that a lot of folks will go through, right? In terms of, “We have all this data, we have to think about the use cases, and then we’re going to apply what we know about good data visualization to pick colors and shapes and all that stuff.” Like any other UX thing, once you realize what the tools you have to work with are, it’s not an overwhelming, “Oh my gosh, this is crazy” thing. It’s, “No, I can get my head around this,” type activity.
Noah: Yeah, absolutely. Absolutely. And that’s exactly the goal of the book, and ideally the goal of the seminar, is to give people a handle on the process, give them enough of a framework and sort of a step-by-step process that they can approach these problems and understand that success is possible. And in fact, it’s a fairly deterministic thing. If you go through these steps, you’re not guaranteed of a beautiful visualization, but your likelihood of creating something that is incredible and successful goes way up, above and beyond most of what you see on the Internet, a lot of which is just sort of, you know, shots in the dark.
Jared: Yeah. I’m really excited to see what you’re going to do with this Seattle Band Map thing, because it has a lot of potential and it would be really cool, but I completely see how it’s, at this point, in that stage of, “We have a lot of data. Let’s plot it in two dimensions with different-colored dots and then connect lines to them.”
Noah: Yeah. And to be fair to them, and anybody else who’s working with data, this whole process that we’ve been talking about, in some ways, has to come after you already have explored the data a little bit. And you’ve already spent some time doing messy things with the data and you’ve spent some time understanding, “This data set would look very different if most bands had 10 connections versus a data set where most bands have two connections.” And so, understanding the density of the data, and what are the time frames we’re dealing with, and how many bands are we talking about, how many musicians are we talking about, how many connections are we talking about.
You do kind of have to muck around with it. Maybe in a private way, maybe not out in public, but you do kind of have to muck around with it, to get a sense of what it is that you’re dealing with. Because what happens—and I’ve certainly had this experience myself—is somebody says, “Well, we have some data.” And your first thought is, “Oh, well, here’s a great way to visualize it.” And then it turns out that the data is incomplete. Or it’s too big to do effectively with that visualization. Or the patterns you hoped were going to be revealed aren’t really there, or it turns out that 90 percent or 95 percent of your data all looks the same and there’s only a few on the edge that are kind of interesting.
You can only do so much design in the abstract before you start to look at the reality of the data, and you’ve got to just kind of muck around and prototype a little bit. As you do with other kinds of interface design, you’ve got to prototype a little bit and see if your understanding or your conception of what you think you have is actually supported by the reality of what you really have and what you have to deal with.
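One quick way to "muck around" with a network data set before designing is to count connections per band and look at how those counts are distributed. The following is a minimal sketch, assuming the data is available as a list of band-to-band links; the `links` sample and the `connection_counts` helper are hypothetical and stand in for whatever export the real project uses.

```python
from collections import Counter
from statistics import median

# Hypothetical sample: each pair is one link between two bands.
links = [
    ("Band A", "Band B"),
    ("Band A", "Band C"),
    ("Band A", "Band D"),
    ("Band B", "Band C"),
    ("Band E", "Band A"),
]


def connection_counts(links):
    """Count how many links touch each band."""
    counts = Counter()
    for first, second in links:
        counts[first] += 1
        counts[second] += 1
    return counts


degrees = connection_counts(links)
print(degrees["Band A"])  # 4

# How many bands have 1 connection, 2 connections, and so on.
histogram = Counter(degrees.values())
print(sorted(histogram.items()))  # [(1, 2), (2, 2), (4, 1)]

# Is the typical band closer to two connections or to ten?
print(median(degrees.values()))  # 2
```

A data set where the median is around two calls for a different layout than one where it is around ten, which is the kind of thing this exploration is meant to surface early.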
Jared: You have to really build into your process a chance to do some really fast iterations with the data, so you can just get a feel for what it could be.
Noah: Yeah, get a feel for what it could be. And I’ve certainly found that, even when I had a pretty good sense of what I wanted to build, as soon as I got the data into a tool that allowed me to manipulate it a little bit, I started having some more ideas about what was possible, or I started running into some constraints that I didn’t know existed.
And so, yeah, you’re definitely going to want to leave room in your schedule and your budget, certainly, but also leave room in your headspace for “This initial design that I have in mind isn’t the be-all and end-all or the end answer. It’s a starting place. And then we’re going to let the data and the technology and other constraints that we haven’t even thought of yet drive our understanding and drive our process. So that, at the end of the day, we will end up with something that is closer to the reality than we started out with when we had ideas but weren’t as intimately connected to the reality of what we’re working with.”
Jared: I’m going to bet, once you get it in front of users who aren’t you, other things come up, like they say, “Oh! I wonder if I could do X.” And then suddenly you’re thinking, “Well, we could, but we just didn’t build that.”
Noah: Yeah. Right. “Come back for the next version.”
Noah: For sure, for sure. Actually, one thing I really like about collaborating with different people and different teams on data visualizations is they are the domain experts. And so I’ll come in and say, “How about a visualization like this?” And they say, “Well, that’s not going to show this other thing that we’re interested in looking at,” and they start to describe something else. And so, then if I sketch that or bring that to them, they say, “Oh, we didn’t even think of that aspect.”
And so you get this sort of back-and-forth that’s really well supported by having different points of view and different experiences with the data. You get this back-and-forth where people with different levels of exposure are going to have different ideas, and as they bring those ideas, the commonality of those ideas are going to surprise and inspire other people on the team. It’s a really nice thing to do collaboratively because you get more insight and more ideas than any individual’s ever going to get by themselves.
Jared: So I’m comforted by this thought, that designing great data visualizations is not too different from designing other types of great user experiences. The big difference is that you’ve got this different set of tools because you’ve got this space and these graphic elements of it, and so you have to understand how size and color and distance and connectivity and all those axes, all those different elements play together. But once you’ve mastered those things and you really get a handle on them, this is pretty much familiar territory.
Noah: Yeah, that’s right. I think the goal for a lot of design processes is to boil down the fundamental process so that you’re not tripping over yourself just getting the process right. And it allows you to, instead, spend that brainpower and that effort creating the interesting aspects for this experience, for this data set, that make it really unique and really compelling, rather than struggling just to get the fundamentals in place.
Jared: That sounds excellent. Well, I’m really looking forward to the virtual seminar, where I’m going to get a chance to learn what those things are and to get a chance to give the book a thorough reading. Thanks so much, Noah, for joining us today and talking about all this.
Noah: Thanks very much for having me, Jared. I’m looking forward to doing the seminar.
Jared: For those of you who want to attend the seminar, you can find out about it at uie.com. It’s “Telling the Right Story with Data Visualization.” Just come and click on the link that says “Virtual Seminars.” It’ll take you right to it. That’ll be on February 2, with Noah Iliinsky. And the book, “Designing Data Visualizations,” published by O’Reilly, you can get it at all your favorite book-buying spots. It’s a nice, wonderful book with Noah and Julie Steele.
Noah, thank you again for spending the time with us.
Noah: Thanks, Jared.