There are several ways to categorize and think about different kinds of visualizations. Here are four of the most useful. The first two are unrelated to the others; the last two are related to each other.
One way to classify a data visualization is by counting how many different data dimensions it represents. By this we mean the number of discrete types of information that are visually encoded in a diagram. For example, a simple line graph may show the price of a company’s stock on different days: that’s two data dimensions. If multiple companies are shown (and therefore compared), there are now three dimensions; if trading volume per day is added to the graph, there are four (Figure 1-1).
Figure 1-1. Four data dimensions are shown in this graph. Adding more points within any of these dimensions won’t change the graph’s complexity.
This count of the number of data dimensions can be described as the level of complexity of the visualization. As visualizations become more complex, they are more challenging to design well, and can be more difficult to learn from. For that reason, visualizations with no more than three or four dimensions of data are the most common—though visualizations with six, seven, or more dimensions can be found.
Adding more volume or data points of the same data dimension doesn’t increase complexity. Showing 100 years of stock data for one stock isn’t more complex than showing one week of data; it’s just more voluminous. Showing 50 companies instead of two might make the display more crowded or complicated, but fundamentally it’s just more data points in the company dimension, and therefore isn’t making the graph more complex.
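To make the distinction between dimensions and data points concrete, here is a minimal Python sketch. The records, the field names, and the helper functions count_dimensions and count_points are our own illustrative assumptions, not part of any charting library. Each field in a record stands for one data dimension, and each record is one data point.

```python
# Hypothetical stock records; the field names and values are illustrative only.
records = [
    {"day": "Mon", "company": "A", "price": 10.0, "volume": 1200},
    {"day": "Tue", "company": "A", "price": 10.5, "volume": 900},
    {"day": "Mon", "company": "B", "price": 20.0, "volume": 300},
]


def count_dimensions(rows):
    """Return the number of distinct fields (data dimensions) across all rows."""
    fields = set()
    for row in rows:
        fields.update(row.keys())
    return len(fields)


def count_points(rows):
    """Return the number of data points (records)."""
    return len(rows)


# More days and more observations: more data points, same dimensions.
more_records = records + [
    {"day": "Tue", "company": "B", "price": 19.5, "volume": 450},
    {"day": "Wed", "company": "A", "price": 11.0, "volume": 1000},
]

print(count_dimensions(records), count_points(records))
print(count_dimensions(more_records), count_points(more_records))
```

Under these assumptions, adding records raises the point count from three to five while the dimension count stays at four, which is the sense in which more volume does not add complexity.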
There are two main challenges to designing more complex visualizations. The first is that the more dimensions you need to encode visually, the more individual visual properties you need to use. Selecting properties is easy to do for the first few dimensions when most visual properties haven’t been used. However, as more dimensions are added, finding appropriate, unused visual properties becomes more difficult. (Bear in mind that a visualization shows not just types of information but also the relationships between and among those information types.) As this difficulty in design increases, intentionality in the decision-making process becomes ever more necessary.
The way to succeed in the face of this challenge is to be intentional about which property to use for each dimension, and iterate or change encodings as the design evolves. This is the subject of Part II.
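One way to keep encoding choices intentional is to write them down and check them as the design evolves. The following Python sketch is only an illustration under our own assumptions: the dimension names, the property names, and the check_encoding helper are hypothetical, and the rule it enforces (every dimension gets a property, and no property is reused) is a simplification of the design judgment described above.

```python
def check_encoding(dimensions, encoding):
    """Return a list of problems with a hand-chosen dimension -> property map."""
    problems = []
    # Every data dimension needs some visual property.
    for dim in dimensions:
        if dim not in encoding:
            problems.append(f"no visual property chosen for '{dim}'")
    # A visual property should not carry two different dimensions.
    seen = {}
    for dim, prop in encoding.items():
        if prop in seen:
            problems.append(f"'{prop}' is used for both '{seen[prop]}' and '{dim}'")
        else:
            seen[prop] = dim
    return problems


# Hypothetical dimensions for a stock graph like the one discussed earlier.
dimensions = ["day", "price", "company", "volume"]

first_draft = {"day": "x position", "price": "y position", "company": "color"}
revised = {
    "day": "x position",
    "price": "y position",
    "company": "color",
    "volume": "size",
}

print(check_encoding(dimensions, first_draft))
print(check_encoding(dimensions, revised))
```

The first draft is flagged because volume has no property yet; the revised map passes. A check like this does not say whether a property is appropriate for its dimension, only whether the bookkeeping is complete.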
The second challenge for designing more complex visualizations is that there are relatively few well-known conventions, metaphors, defaults, and best practices to rely on. Because the safety net of convention may not exist, there is more of a burden on the designer to make good choices that can be easily understood by the reader.
You may have heard the terms infographics and data visualization used in different ways, or interchangeably in different contexts, or even casually by the same person in a single sentence. You may also have heard these terms used politically—that is, with positive or negative connotations attached. Some people use infographic to refer to representations of information perceived as casual, funny, or frivolous, and visualization to refer to designs perceived to be more serious, rigorous, or academic.
The truth is, even though the art of representing statistical information visually is hundreds of years old, the vocabulary of the field is still evolving and settling. Among the general public, there is still confusion over what these two terms mean, but within the information design community, definitions for these terms are solidifying.
In short: The distinction between infographics and data visualizations (or information visualizations) is based on both form and origin (see Figure 1-2).
We suggest that the term infographics is useful for referring to any visual representation of data that is:
manually drawn (and therefore a custom treatment of the information);
specific to the data at hand (and therefore nontrivial to recreate with different data);
aesthetically rich (strong visual content meant to draw the eye and hold interest); and
relatively data-poor (because each piece of information must be manually encoded).
Put another way, infographics are illustrations where the data representation is manually laid out or sketched, probably with drawing software such as Adobe Illustrator. Because they are drawn by hand, infographics have the option of being aesthetically rich (see Figure 1-3). Another consequence of their manual origins is that they tend to be limited in the amount of data they can convey, simply due to the practical limitations of manipulating many data points by hand. Similarly, it is difficult to change or update the data in an infographic, as any changes must be implemented manually.
This is not a complete, universal, or absolute definition, but may be a helpful way to think about and identify the category.
By contrast, we suggest that the terms data visualization and information visualization (casually, data viz and info viz) are useful for referring to any visual representation of data that is:
algorithmically drawn (may have custom touches but is largely rendered with the help of computerized methods);
easy to regenerate with different data (the same form may be repurposed to represent different datasets with similar dimensions or characteristics);
often aesthetically barren (data is not decorated); and
relatively data-rich (large volumes of data are welcome and viable, in contrast to infographics).
Data visualizations are initially designed by a human, but are then drawn algorithmically with graphing, charting, or diagramming software. The advantage of this approach is that it is relatively simple to update or regenerate the visualization with more or new data. While they may show great volumes of data, information visualizations are often less aesthetically rich than infographics.
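The idea of a form that is designed once and then redrawn algorithmically can be sketched in a few lines of Python. This is a toy text renderer of our own invention, not any particular charting tool: the render_bars function, its width parameter, and the sample datasets are all hypothetical, and it assumes a non-empty set of positive values.

```python
def render_bars(values, width=20):
    """Draw one text bar per labeled value, scaled to the largest value.

    Assumes `values` is a non-empty dict of label -> positive number.
    """
    largest = max(values.values())
    lines = []
    for label, value in values.items():
        bar = "#" * round(width * value / largest)
        lines.append(f"{label:<8} {bar} {value}")
    return "\n".join(lines)


# The same form, regenerated with two different (made-up) datasets.
week_one = {"Mon": 10, "Tue": 5}
week_two = {"Mon": 4, "Tue": 8, "Wed": 6}

print(render_bars(week_one, width=10))
print()
print(render_bars(week_two, width=10))
```

The design decisions (one bar per label, length proportional to value) are made once by a person; after that, new data only requires calling the function again.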
As you will have inferred from the title of this book, it is this latter category of data visualizations with which we are primarily concerned here. However, the principles we present are relevant to the design of both infographics and data visualizations.
Generally speaking, there are two categories of data visualization: exploration and explanation. The two serve different purposes, and so there are tools and approaches that may be appropriate only for one and not the other. For this reason, it is important to understand the distinction, so that you can be sure you are using tools and approaches appropriate to the task at hand.
Exploratory data visualizations are appropriate when you have a whole bunch of data and you’re not sure what’s in it. When you need to get a sense of what’s inside your data set, translating it into a visual medium can help you quickly identify its features, including interesting curves, lines, trends, or anomalous outliers.
Exploration is generally best done at a high level of granularity. There may be a whole lot of noise in your data, but if you oversimplify or strip out too much information, you could end up missing something important. This type of visualization is typically part of the data analysis phase, and is used to find the story the data has to tell you.
By contrast, explanatory data visualization is appropriate when you already know what the data has to say, and you are trying to tell that story to somebody else. It could be the head of your department, a grant committee, or the general public.
Whoever your audience is, the story you are trying to tell (or the answer you are trying to share) is known to you at the outset, and therefore you can design to specifically accommodate and highlight that story. In other words, you’ll need to make certain editorial decisions about which information stays in, and which is distracting or irrelevant and should come out. This is a process of selecting focused data that will support the story you are trying to tell.
If exploratory data visualization is part of the data analysis phase, then explanatory data visualization is part of the presentation phase. Such a visualization may stand on its own, or may be part of a larger presentation, such as a speech, a newspaper article, or a report. In these scenarios, there is some supporting narrative—written or verbal—that further explains things.
It’s worth noting that there is also a kind of hybrid category, which involves a curated dataset that is nonetheless presented with the intention to allow some exploration on the reader’s part. These visualizations are usually interactive via some kind of graphical interface that lets the reader choose and constrain certain parameters, thereby discovering for herself whatever insights the dataset may have to offer. These might even be insights the creator of the visualization hasn’t come across yet.
So in these hybrid designs there is a certain freedom-of-discovery aspect to the information presented, but it is usually not totally raw; it has been distilled and facilitated to some extent. See http://www.juiceanalytics.com/nfl-visualization/ for an example.
We posit that there are three main categories of explanatory visualizations based on the relationships between the three necessary players: the designer, the reader, and the data.
This section refers to explanatory (or hybrid) visualizations exclusively, because it discusses designing visualizations of data with known parameters and stories. If you don’t yet know the message you intend to convey, then you’re still in an exploration phase, and probably aren’t designing for the same styles of consumption as this section describes.
It is useful to think of an effective explanatory data visualization as being supported by a three-legged stool consisting of the designer, the reader, and the data. Each of these “legs” exerts a force, or contributes a separate perspective, that must be taken into consideration for a visualization to be stable and successful. Chapter 2 will address the considerations of each of the three in much more detail, but we find it helpful to introduce the concept here.
Each of the three legs of the stool has a unique relationship to the other two. While it is necessary to account for the needs and perspective of all three in each visualization project, the dominant relationship will ultimately determine which category of visualization is needed (see Figure 1-4).
An informative visualization primarily serves the relationship between the reader and the data. It aims for a neutral presentation of the facts in a way that will educate the reader (though not necessarily persuade him). Informative visualizations are often associated with broad data sets, and seek to distill the content into a manageable, consumable form. Ideally, they form the bulk of visualizations that the average person encounters on a day-to-day basis, whether that’s at work, in the newspaper, or on a service-provider’s website. The Burning Man Infographic (Figure 1-2) is an example of informative visualization.
A persuasive visualization primarily serves the relationship between the designer and the reader. It is useful when the designer wishes to change the reader’s mind about something. It represents a very specific point of view, and advocates a change of opinion or action on the part of the reader. In this category of visualization, the data represented is specifically chosen for the purpose of supporting the designer’s point of view, and is presented carefully so as to convince the reader of same. See also: propaganda.
While an informative visualization may not have an intentional point of view in the manner that a persuasive visualization does, all visualizations are going to be biased to some degree, based on the fact that designers are human and have to make choices.
A good example of persuasive visualization is the Joint Economic Committee minority’s rendition of the proposed Democratic health care plan in 2010, shown in Figure 4-14.
The third category, visual art, primarily serves the relationship between the designer and the data. Visual art is unlike the previous two categories in that it often entails unidirectional encoding of information, meaning that the reader may not be able to decode the visual presentation to understand the underlying information.
Whereas both informative and persuasive visualizations are meant to be easily decodable—bidirectional in their encoding—visual art merely translates the data into a visual form. The designer may intend only to condense it, translate it into a new medium, or make it beautiful; she may not intend for the reader to be able to extract anything from it other than enjoyment.
This category of visualization is sometimes more easily recognized than others. For example, Nora Ligorano and Marshall Reese designed a project that converts Twitter streams into a woven fiber-optic tapestry (Figure 1-5; http://ligoranoreese.net/fiber-optic-tapestry). A project like this is abstract enough that most people intuitively recognize it as art: something to be appreciated rather than explicitly decoded.
But a project like the Planetary app from Bloom Studios (http://planetary.bloom.io/) is less easily categorized. Ostensibly, one may decode the information represented visually by noting the number of stars (representing artists), planets (representing albums), and moons (representing tracks) in a constellation or galaxy on the screen. But properties such as track length, encoded as the speed at which each moon orbits its album-planet, are encoded too subtly for the average user to decode, at which point the display just becomes something pretty to look at. A worthy pursuit in its own right, perhaps, but better clearly labeled as visual art, and not confused with informative visualization.