Preface

“A picture is worth a thousand words,” says the proverb. Sometimes, a picture is worth a lot of numbers, too! Complex relationships are often more easily grasped by looking at a picture or a graph than they might be if one tried to absorb the nuances in a verbal description or discern the relationships in columns of numbers. This book is about using graphical methods to understand complex data by highlighting important relationships and trends, reducing the data to simpler forms, and making it possible to take in a lot of numbers at a glance.

Who Is This Book For?

Just about anyone who needs to visualize and analyze data will find something useful here. My primary aim, however, is to make graphical data analysis accessible to a wide range of people—especially those who do not have much (or any) previous experience with R but who need or want to create various types of graphs to help them understand data important to them. This will likely include people working in business, media, graphic arts, social sciences, and health sciences who have real needs for data analysis but might not have backgrounds in advanced mathematics and computer programming. Although this book is designed for self-study, it might also find a place as a supplemental text for courses in elementary and intermediate statistics or research methods.

The vehicle for this book is R, but this is not a comprehensive course on R. Many computer classes and computer books attempt to show you every possible thing one can do with a language or tool. For many of us who have attempted to learn this way, it gets to be quite confusing and boring. This book will focus on understanding the elements of graphics for data analysis and how to use R to produce the kinds of graphs discussed here; it will show you how to use some of R’s built-in resources for finding help, and leave most other topics for you to pursue elsewhere. You should have access to a computer and feel comfortable using it for everyday tasks, such as sending email, browsing the Internet, or perhaps using applications such as a word processor or spreadsheet. Familiarity with basic statistics will be helpful for some of the topics covered here, but it is not necessary for most of them.

Why R?

It is possible to make useful graphs of small datasets by hand. It is much more efficient, however, to take advantage of computer technology to produce accurate and appealing visual data analyses. For large datasets, hand work is effectively impossible. Computer software, conversely, makes producing complex graphs of even very large datasets practical.

This technology is now readily available through open source software to virtually anyone who has access to a computer. “Open source” refers to programs for which the source code is made available to all—to examine, to use, or to make one’s own modifications or additions.

Open source software products are offered as free downloads to anyone who wants them. Perhaps you suspect that stuff given away for free cannot be of high quality. Let me assure you that some of this free software conforms to the highest professional standards.

The particular software chosen for this book, R, is a programming language and collection of statistical, mathematical, and graphing programs used by literally millions of people around the world, including many leading professionals in science, business, and media. You have likely seen graphics produced by R on websites, in major newspapers, and in other publications. You will be able to produce this kind of professional data visualization, too, because R works on computers running Windows, Macintosh, or Linux operating systems. This covers just about all the desktop and laptop computers out there today!

How to Use This Book

The way to get the most out of this book is to make a lot of graphs yourself. To this end, read the book while seated in front of your computer and reproduce all of the commands given here. Further, many sections have exercises that challenge you to go a step beyond the illustrations in the text, either by refining the example commands or by making another graph of a different dataset. It would be best to do this before going on to the next topic.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic

Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width

Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold

Shows commands or other text that should be typed literally by the user.

Constant width italic

Shows text that should be replaced with user-supplied values or by values determined by context.

Note

This element signifies a general note.

Using Code Examples

This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Graphing Data with R by John Jay Hilfiger (O’Reilly). Copyright 2016 John Jay Hilfiger, 978-1-491-92261-3.”

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at .

Safari® Books Online

Safari Books Online is an on-demand digital library that delivers expert content in both book and video form from the world’s leading authors in technology and business.

Technology professionals, software developers, web designers, and business and creative professionals use Safari Books Online as their primary resource for research, problem solving, learning, and certification training.

Safari Books Online offers a range of plans and pricing for enterprise, government, education, and individuals.

Members have access to thousands of books, training videos, and prepublication manuscripts in one fully searchable database from publishers like O’Reilly Media, Prentice Hall Professional, Addison-Wesley Professional, Microsoft Press, Sams, Que, Peachpit Press, Focal Press, Cisco Press, John Wiley & Sons, Syngress, Morgan Kaufmann, IBM Redbooks, Packt, Adobe Press, FT Press, Apress, Manning, New Riders, McGraw-Hill, Jones & Bartlett, Course Technology, and hundreds more. For more information about Safari Books Online, please visit us online.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

  • O’Reilly Media, Inc.
  • 1005 Gravenstein Highway North
  • Sebastopol, CA 95472
  • 800-998-9938 (in the United States or Canada)
  • 707-829-0515 (international or local)
  • 707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at http://www.oreilly.com/catalog/0636920038382.do.

To comment or ask technical questions about this book, send email to .

For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com.

Find us on Facebook: http://facebook.com/oreilly

Follow us on Twitter: http://twitter.com/oreillymedia

Watch us on YouTube: http://www.youtube.com/oreillymedia

Acknowledgments

A number of people helped to make this book come into being. First, I thank my wife, Karen, whose patience, understanding, and encouragement throughout the process were essential to my completing the task. Our son Eric and daughter Kristen read the first chapter and offered brutally frank assessments, which was humbling but very helpful. The technical reviewers, Drs. Raymond Bajorski, Sarah Boslaugh, and Philipp K. Janert, were invaluable for their insights, corrections, and suggestions. My editor, Shannon Cutt, was extraordinarily capable and positive. She helped me navigate not only the writing but all the technical and practical details of preparing a manuscript for publication. I had no idea there was so much to do! Finally, I thank the O’Reilly Media team, who do all the things you don’t see and do see that are absolutely essential to producing the quality library of books for which they are so respected. Thank you, all.
