Preface

Visualization is a vital tool for understanding and sharing insights around data. The right visualization can help express a core idea or open a space to examination; it can get the world talking about a dataset or sharing an insight (Figure P-1).

Three different types of visualization
Figure P-1. Visualizations can take many forms, from views that support exploratory analysis (top left), to those that provide quick overviews in a dashboard (bottom), to an infographic about popular topics (top right).

Visualizations provide a direct and tangible representation of data. They allow people to confirm hypotheses and gain insights. When incorporated into the data analysis process early and often, visualizations can even fundamentally alter the questions that someone is asking.

Creating effective visualizations is hard. The difficulty is not that a dataset requires an exotic, bespoke visual representation; for many problems, standard statistical charts will suffice. Nor is it that creating a visualization requires coding expertise in an unfamiliar programming language; off-the-shelf tools like Excel, Tableau, and R are usually enough.

Rather, creating effective visualizations is difficult because the problems that are best addressed by visualization are often complex and ill-formed. The task of figuring out what attributes of a dataset are important is often conflated with figuring out what type of visualization to use. Picking a chart type to represent specific attributes in a dataset is comparatively easy. Deciding on which data attributes will help answer a question, however, is a complex, poorly defined, and user-driven process that can require several rounds of visualization and exploration to resolve. In this book, we focus on the process of going from high-level questions to well-defined data analysis tasks, and on how to incorporate visualizations along the way to clarify understanding and gain insights.

Who Is This Book For?

This book is for people who have access to data and, perhaps, a suite of computational tools but who are less than sure how to turn that data into visual insights. We find that many data science books assume that you can figure out how to visualize the data once collected, and visualization books assume that you already have a well-defined question, ready to be visualized. If, like us, you would like to address these assumptions, then this book is for you.

This book does not cover how to clean and manage data in detail or how to write visualization code. There are already great books on these topics (and, when relevant, we point to some of them). Rather, this book speaks to why those processes are important. Similarly, this book does not address how to choose a beautiful colormap or select a typeface. Instead, we lay out a framework for how to think about data given the possibilities and constraints of visual exploration. Our goal is to show how to effectively use visualizations to make sense of data.

Who Are We?

The authors of this book have a combined three decades of experience in making sense of data through designing and using visualizations. We have worked with data from a broad range of fields: biology and urban transportation, business intelligence and scientific visualization, debugging code and building maps. We have worked with analysts from a variety of organizations, from small, academic science labs to teams of data analysts embedded in large companies. Some of the projects we have worked on have resulted in sophisticated, bespoke visualization systems designed collaboratively with domain specialists, and at other times we have pointed people to off-the-shelf visualization tools after a few conversations. We have taught university classes in visualization and have given lectures and tutorials. All in all, we have visualized hundreds of datasets.

We have found that our knowledge about visualization techniques, solutions, and systems shapes the way that we think and reason about data. Visualization is fundamentally about presenting data in a way that elicits human reasoning, makes room for individual interpretations, and supports exploration. We help our collaborators make their questions and data reflect these values. The process we lay out in this book describes our method for doing this.

Overview of Chapters

Chapter 1 illustrates the process of making sense with visualizations through a quick example, exposing the role that a visual representation can play in data discovery.

Chapter 2 starts to get into details. It discusses a mechanism to help narrow a question from a broad task into something that can be addressed with an iterative visualization process. For example, the broad question “Who are the best movie directors?” does not necessarily suggest a specific visualization—but “Find movie directors who directed top-grossing movies using an IMDB dataset” can lead more directly to an answer by way of a visualization or two. This process creates an operationalized question, one that consists of particular tasks that can be directly addressed with data.
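As a rough illustration of what an operationalized question might look like once it reaches the data, here is a minimal sketch. It is not the book's method: the records, the field names (`title`, `director`, `gross`), and the reading of "top-grossing" as "the N movies with the highest gross" are all assumptions made for this example, and the data is invented rather than drawn from any real IMDB dataset.

```python
from collections import defaultdict

# Hypothetical, made-up records standing in for a movie dataset. The field
# names and values are illustrative only, not taken from any real IMDB export.
movies = [
    {"title": "Movie 1", "director": "Director A", "gross": 300.0},
    {"title": "Movie 2", "director": "Director B", "gross": 120.0},
    {"title": "Movie 3", "director": "Director A", "gross": 80.0},
    {"title": "Movie 4", "director": "Director C", "gross": 250.0},
    {"title": "Movie 5", "director": "Director B", "gross": 40.0},
]


def directors_of_top_grossing(records, top_n):
    """Return {director: [titles]} for the top_n movies ranked by gross.

    Assumption: "top-grossing" means the top_n individual movies by gross.
    """
    # Rank movies from highest to lowest gross.
    ranked = sorted(records, key=lambda m: m["gross"], reverse=True)
    # Group the titles of the top_n movies under their directors.
    result = defaultdict(list)
    for movie in ranked[:top_n]:
        result[movie["director"]].append(movie["title"])
    return dict(result)


top_directors = directors_of_top_grossing(movies, top_n=3)
for director, titles in top_directors.items():
    print(director, titles)
```

Even this small sketch forces decisions that the broad question left open: whether "best" means gross, and whether to rank individual movies or total gross per director. Making those choices explicit is the kind of narrowing that the operationalization process is meant to support.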

This process of narrowing a question down to actionable tasks requires input from multiple stakeholders. Chapter 3 lays out an iterative set of steps for getting to the operationalization, which we call data counseling. These steps include finding the right people to talk to, asking effective questions, and rapidly exploring the data through increasingly sophisticated prototypes.

The numerical nitty-gritty of the book follows. Chapter 4 discusses types and relations of data, and defines terms like dimensions, measures, categorical, and quantitative. Chapter 5 then organizes common visualization types by the tasks they fulfill and the data they use. Then, Chapter 6 explores powerful visualization techniques that use multiple views and interaction to support analysis of large, complex datasets. These three chapters are meant to provide an overview of some of the most effective and commonly used ideas for supporting sensemaking with visualizations, and are framed using the operationalization and data counseling process to help guide decision-making about which visualizations to choose.

With this understanding of getting to insight—from questions to data to visualizations—the remainder of the book illustrates two examples of carrying out these steps. The case study in Chapter 7 describes the creation of a business intelligence dashboard in collaboration with a team of developers and analysts at Microsoft. The one in Chapter 8 draws from science, presenting an example with a team of scientists who work with biological data. These case studies illustrate the flexibility of the process laid out in this book, as well as the diverse types of outcomes that are possible.

This book is accompanied by a companion website. From this site you can download the code and interactive versions of the visualizations presented in Chapters 5 and 6, as well as other code and supplementary material.

Acknowledgments

Danyel and Miriah would like to thank Danyel’s colleagues at Microsoft, including Steven Drucker, Mary Czerwinski, and Sue Dumais, for their enthusiasm and encouragement. We also thank Miriah’s research group, the Visualization Design Lab at the University of Utah, including Alex Lex, for helping the project to evolve and providing feedback on ideas. We are both deeply appreciative of our work organizations for supporting the time and energy required by projects like this one, and for seeing the value in communicating our research broadly. At O’Reilly Media, we thank Mike Loukides for encouraging us to start this work, and Shannon Cutt and Rachel Roumeliotis for guiding it from start to finish.

Portions of this work were presented at the IPAM Workshop on Culture Analytics of 2016, Microsoft Data Insights Summit of 2016, University of Illinois Urbana-Champaign HCI Seminar Series of 2016, University of British Columbia HCI Seminar Series of 2016, Women in Data Science Conference at Stanford University in 2017, and O’Reilly Velocity Conference in 2017. Our thanks to the organizers of those events, and to participants who gave us critical feedback and helped clarify our thoughts.

Early feedback on the operationalization process came from Christian Canton of Microsoft. Michael Twidale and Andrea Thomer, both of UIUC, helped inform the discussion of data counseling with their insights on how reference librarians do their work.

We are grateful to Dominik Moritz and Kanit “Ham” Wongsuphasawat for putting together the examples used in Chapters 5 and 6. Their work, as well as that of the rest of the Vega-Lite team, is helping shape the future of data visualization. We also thank Alex Bigelow for supplying the skateboarding visualization figure in the Preface.

We thank Jacqueline Richards for her review and discussion of the case study in Chapter 7. Similarly, the collaboration with Angela DePace and her group at Harvard Medical School for the case study in Chapter 8 provided valuable and rich insights into the process of designing visualizations for domain experts. The projects described in both of these chapters were deeply influential in our work practices.

Our technical reviewers, Michael Freeman, Jeff Heer, and Jerry Overton, helped clarify and strengthen the arguments we make.

Finally, Miriah thanks Brian Price for his endless support and encouragement, without which she could never do the things she does.

O’Reilly Safari

Safari (formerly Safari Books Online) is a membership-based training and reference platform for enterprise, government, educators, and individuals.

Members have access to thousands of books, training videos, Learning Paths, interactive tutorials, and curated playlists from over 250 publishers, including O’Reilly Media, Harvard Business Review, Prentice Hall Professional, Addison-Wesley Professional, Microsoft Press, Sams, Que, Peachpit Press, Adobe, Focal Press, Cisco Press, John Wiley & Sons, Syngress, Morgan Kaufmann, IBM Redbooks, Packt, Adobe Press, FT Press, Apress, Manning, New Riders, McGraw-Hill, Jones & Bartlett, and Course Technology, among others.

For more information, please visit http://oreilly.com/safari.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

  • O’Reilly Media, Inc.
  • 1005 Gravenstein Highway North
  • Sebastopol, CA 95472
  • 800-998-9938 (in the United States or Canada)
  • 707-829-0515 (international or local)
  • 707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at http://bit.ly/making-data-visual.

To comment or ask technical questions about this book, send email to .

For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com.

Find us on Facebook: http://facebook.com/oreilly

Follow us on Twitter: http://twitter.com/oreillymedia

Watch us on YouTube: http://www.youtube.com/oreillymedia
