Preface

We’ve seen an explosion of interest in data ethics. Why now? Well, we know: fake news, data, and all that. But concern about data ethics started well before the 2016 election. It started well before Google’s automatic photo tagging misidentified some black people as gorillas. Concern for data ethics has been growing ever since we first started talking about data science, and possibly before.

Why indeed? Because data has been integrated into every aspect of our lives: the friends and business connections we’re asked to make, the shopping circulars we receive in the mail, the news we see, and the songs we play. Data is collected from us at every turn: every trace of our online presence, and sometimes even traces of our physical presence. We’ve gained some advantages from data, but we’ve also seen the damage that the misuse of data has caused. Many of these concerns were highlighted in multiple White House reports on data and AI, including a call for all training programs in data science and technology in the United States to include ethics and security.

It’s been great to see people gathering to discuss ethics at events like D4G and FAT*. It’s been great to watch the lively discussions of ethical principles on the Data For Democracy Slack. And it’s been great to read the many bloggers and commentators writing about ethics.

But what we’re still missing is an understanding of how to put ethics into practice, both in data work and in the overall product development process. Ethics really isn’t about agreeing to a set of principles. It’s about changing the way you act. To take one very simple example: it’s one thing to say that you should get permission from users before using their data in an experiment. It’s quite another thing to get permission at web scale. And it’s yet another thing to get permission in a way that explains clearly how the data will be used, and what the expected consequences are. That’s what we need to explore.

It’s also important to realize that ethics isn’t about a fixed list of do’s and don’ts. It’s primarily about having a discussion about how what you’re doing will affect other people, and whether those effects are acceptable.

That’s what this book is all about: putting ethics into practice. That means making room for discussion, making room for dissent, making sure that you think through the consequences at every stage of a project, and much more. What does ethics mean for hiring? How do you teach ethics in an academic setting? These are all big questions, and not questions that can be answered in a short book like this, but they’re questions that we need to talk about.

Data science is a team sport, and we need you on the team. Given the pace of technology and the evolution of thinking that we expect on data, we consider this work an iterative project. Just as technology moves from a 0.1 release to a 1.0 release to a 3.1 release, we hope that others will contribute new sections and that existing sections will be revised. To enable that, we’re making this book free to download and also available on GitHub so it can be a community effort.

Thanks to our reviewers: Ed Felten, Natalie Evans-Harris, François Chollet, and Casey Lynn Fiesler.
