About the Authors

(Guilty parties are listed in order of appearance.)

Kevin Fink is an experienced biztech executive with a passion for turning data into business value. He has helped take two companies public (as CTO of N2H2 in 1999 and SVP Engineering at Demand Media in 2011), in addition to helping grow others (including as CTO of WhitePages.com for four years). On the side, he and his wife run Traumhof, a dressage training and boarding stable on their property east of Seattle. In his copious free time, he enjoys hiking, riding his tandem bicycle with his son, and geocaching.

Paul Murrell is a senior lecturer in the Department of Statistics at the University of Auckland, New Zealand. His research area is Statistical Computing and Graphics and he is a member of the core development team for the R project. He is the author of two books, R Graphics and Introduction to Data Technologies, and is a Fellow of the American Statistical Association.

Josh Levy is a data scientist in Austin, Texas. He works on content recommendation and text mining systems. He earned his doctorate at the University of North Carolina where he researched statistical shape models for medical image segmentation. His favorite foosball shot is banked from the backfield.

Adam Laiacano has a BS in Electrical Engineering from Northeastern University and spent several years designing signal detection systems for atomic clocks before joining a prominent NYC-based startup.

Jacob Perkins is the CTO of Weotta, an NLTK contributor, and the author of Python Text Processing with NLTK Cookbook. He also created the NLTK demo and API site text-processing.com, and periodically blogs at streamhacker.com. In a previous life, he invented the refrigerator.

Spencer Burns is a data scientist/engineer living in San Francisco. He has spent the past 15 years extracting information from messy data in fields ranging from intelligence to quantitative finance to social media.

Richard Cotton is a data scientist with a background in chemical health and safety, and has worked extensively on tools to give non-technical users access to statistical models. He is the author of the R packages “assertive” for checking the state of your variables and “sig” to make sure your functions have a sensible API. He runs The Damned Liars statistics consultancy.

Philipp K. Janert was born and raised in Germany. He obtained a Ph.D. in Theoretical Physics from the University of Washington in 1997 and has been working in the tech industry since, including four years at Amazon.com, where he initiated and led several projects to improve Amazon’s order fulfillment process. He is the author of two books on data analysis, including the best-selling Data Analysis with Open Source Tools (O’Reilly, 2010), and his writings have appeared on Perl.com, IBM developerWorks, IEEE Software, and in Linux Magazine. He has also contributed to CPAN and other open-source projects. He lives in the Pacific Northwest.

Jonathan Schwabish is an economist at the Congressional Budget Office. He has conducted research on inequality, immigration, retirement security, data measurement, food stamps, and other aspects of public policy in the United States. His work has been published in the Journal of Human Resources, the National Tax Journal, and elsewhere. He is also a data visualization creator and has made designs on a variety of topics that range from food stamps to health care to education. His visualization work has been featured on the visualizing.org and visual.ly websites. He has also spoken at numerous government agencies and policy institutions about data visualization strategies and best practices. He earned his Ph.D. in economics from Syracuse University and his undergraduate degree in economics from the University of Wisconsin at Madison.

Brett Goldstein is the Commissioner of the Department of Innovation and Technology for the City of Chicago. He has been in that role since June of 2012. Brett was previously the city’s Chief Data Officer. In that role, he led the city’s approach to using data to help improve the way the government works for its residents. Before coming to City Hall as Chief Data Officer, he founded and commanded the Chicago Police Department’s Predictive Analytics Group, which aims to predict when and where crime will happen. Prior to entering the public sector, he was an early employee with OpenTable and helped build the company for seven years. He earned his BA from Connecticut College, his MS in criminal justice at Suffolk University, and his MS in computer science at the University of Chicago. Brett is pursuing his PhD in Criminology, Law, and Justice at the University of Illinois-Chicago. He resides in Chicago with his wife and three children.

Bobby Norton is the co-founder of Tested Minds, a startup focused on products for social learning and rapid feedback. He has built software for over 10 years at firms such as Lockheed Martin, NASA, GE Global Research, ThoughtWorks, DRW Trading Group, and Aurelius. His data science tools of choice include Java, Clojure, Ruby, Bash, and R. Bobby holds an MS in Computer Science from FSU.

Steve Francia is the Chief Evangelist at 10gen, where he is responsible for the MongoDB user experience. Prior to 10gen, he held executive engineering roles at OpenSky, Portero, Takkle, and Supernerd. He is a popular speaker on a broad set of topics including cloud computing, big data, e-commerce, development, and databases. He is a published author and syndicated blogger (spf13.com), and frequently contributes to industry publications. Steve’s work has been featured by the New York Times, Guardian UK, Mashable, ReadWriteWeb, and more. Steve is a longtime contributor to open source. He enjoys coding in Vim and maintains a popular Vim distribution. Steve lives with his wife and four children in Connecticut.

Tim McNamara is a New Zealander with a laptop and a desire to do good. He is an active participant in both local and global open data communities, jumping between organising local meetups and assisting with the global CrisisCommons movement. His skills as a programmer began while assisting with the development of the Sahana Disaster Management System and were refined while helping Sugar Labs, which develops the software that runs the One Laptop Per Child XO. Tim has recently moved into the escience field, where he works to support the research community’s uptake of technology.

Marck Vaisman is a data scientist and claims he’s been one before the term was en vogue. He is also a consultant, entrepreneur, master munger, and hacker. Marck is the principal data scientist at DataXtract, LLC where he helps clients ranging from startups to Fortune 500 firms with all kinds of data science projects. His professional experience spans the management consulting, telecommunications, Internet, and technology industries. He is the co-founder of Data Community DC, an organization focused on building the Washington DC area data community and promoting data and statistical sciences by running Meetup events (including Data Science DC and R Users DC) and other initiatives. He has an MBA from Vanderbilt University and a BS in Mechanical Engineering from Boston University. When he’s not doing something data related, you can find him geeking out with his family and friends, swimming laps, scouting new and interesting restaurants, or enjoying good beer.

Pete Warden is an ex-Apple software engineer who wrote the Big Data Glossary and the Data Source Handbook for O’Reilly, created the open-source projects Data Science Toolkit and OpenHeatMap, and broke the story about Apple’s iPhone location tracking file. He’s the CTO and founder of Jetpac, a data-driven social photo iPad app, with over a billion pictures analyzed from 3 million people so far.

Jud Valeski is co-founder and CEO of Gnip, the leading provider of social media data for enterprise applications. From client-side consumer-facing products to large-scale backend infrastructure projects, he has enjoyed working with technology for over twenty years. He’s been a part of engineering, product, and M&A teams at IBM, Netscape, onebox.com, AOL, and me.dium. He has played a central role in the release of a wide range of products used by tens of millions of people worldwide.

Reid Draper is a functional programmer interested in distributed systems, programming languages, and coffee. He’s currently working for Basho on their distributed database: Riak.

Ken Gleason’s technology career experience spans more than twenty years, including real-time trading system software architecture and development and retail financial services application design. He has spent the last ten years in the data-driven field of electronic trading, where he has managed product development and high-frequency trading strategies. Ken holds an MBA from the University of Chicago Booth School of Business and a BS from Northwestern University.

Q. Ethan McCallum works as a professional-services consultant. His technical interests range from data analysis, to software, to infrastructure. His professional focus is helping businesses improve their standing—in terms of reduced risk, increased profit, and smarter decisions—through practical applications of technology. His written work has appeared online and in print, including Parallel R: Data Analysis in the Distributed World (O’Reilly, 2011).
