Practical Data Science Cookbook

Book Description

89 hands-on recipes to help you complete real-world data science projects in R and Python

In Detail

As increasing amounts of data are generated each year, the need to analyze and operationalize that data is greater than ever. Companies that know what to do with their data will have a competitive advantage over companies that don't, and this will drive higher demand for knowledgeable and competent data professionals.

Starting with the basics, this book will cover how to set up your numerical programming environment, introduce you to the data science pipeline (an iterative process by which data science projects are completed), and guide you through several data projects in a step-by-step format. By sequentially working through the steps in each chapter, you will quickly familiarize yourself with the process and learn how to apply it to a variety of situations with examples in the two most popular programming languages for data analysis—R and Python.

What You Will Learn

  • Structure a data science project by using the data science pipeline
  • Acquire and ingest data from files, data stores, and directly from the Web
  • Clean, munge, and manipulate data into shape so that it is ready for analysis
  • Draw insights from the data and conduct analyses that will deliver those insights
  • Determine and apply the most appropriate model to your data
  • Interpret the results of your analysis and modeling
  • Communicate your results through a visualization, report, or application

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

Table of Contents

    1. Practical Data Science Cookbook
      1. Table of Contents
      2. Practical Data Science Cookbook
      3. Credits
      4. About the Authors
      5. About the Reviewers
      6. www.PacktPub.com
        1. Support files, eBooks, discount offers, and more
          1. Why Subscribe?
          2. Free Access for Packt account holders
      7. Preface
        1. What this book covers
        2. What you need for this book
        3. Who this book is for
        4. Conventions
        5. Reader feedback
        6. Customer support
          1. Downloading the example code
          2. Downloading the color images of this book
          3. Errata
          4. Piracy
          5. Questions
      8. 1. Preparing Your Data Science Environment
        1. Introduction
        2. Understanding the data science pipeline
          1. How to do it...
          2. How it works...
        3. Installing R on Windows, Mac OS X, and Linux
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        4. Installing libraries in R and RStudio
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
        5. Installing Python on Linux and Mac OS X
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
        6. Installing Python on Windows
          1. How to do it...
          2. How it works...
          3. See also
        7. Installing the Python data stack on Mac OS X and Linux
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
        8. Installing extra Python packages
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
        9. Installing and using virtualenv
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
      9. 2. Driving Visual Analysis with Automobile Data (R)
        1. Introduction
        2. Acquiring automobile fuel efficiency data
          1. Getting ready
          2. How to do it...
          3. How it works…
        3. Preparing R for your first project
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        4. Importing automobile fuel efficiency data into R
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
        5. Exploring and describing fuel efficiency data
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        6. Analyzing automobile fuel efficiency over time
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        7. Investigating the makes and models of automobiles
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
      10. 3. Simulating American Football Data (R)
        1. Introduction
          1. Requirements
        2. Acquiring and cleaning football data
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        3. Analyzing and understanding football data
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        4. Constructing indexes to measure offensive and defensive strength
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        5. Simulating a single game with outcomes decided by calculations
          1. Getting ready
          2. How to do it…
          3. How it works…
        6. Simulating multiple games with outcomes decided by calculations
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
      11. 4. Modeling Stock Market Data (R)
        1. Introduction
          1. Requirements
        2. Acquiring stock market data
          1. How to do it...
        3. Summarizing the data
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        4. Cleaning and exploring the data
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        5. Generating relative valuations
          1. Getting ready
          2. How to do it...
          3. How it works...
        6. Screening stocks and analyzing historical prices
          1. Getting ready
          2. How to do it...
          3. How it works...
      12. 5. Visually Exploring Employment Data (R)
        1. Introduction
        2. Preparing for analysis
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        3. Importing employment data into R
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        4. Exploring the employment data
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        5. Obtaining and merging additional data
          1. Getting ready
          2. How to do it…
          3. How it works…
        6. Adding geographical information
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        7. Extracting state- and county-level wage and employment information
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        8. Visualizing geographical distributions of pay
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        9. Exploring where the jobs are, by industry
          1. How to do it…
          2. How it works…
          3. There's more…
          4. See also
        10. Animating maps for a geospatial time series
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        11. Benchmarking performance for some common tasks
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
      13. 6. Creating Application-oriented Analyses Using Tax Data (Python)
        1. Introduction
          1. An introduction to application-oriented approaches
        2. Preparing for the analysis of top incomes
          1. Getting ready
          2. How to do it...
          3. How it works...
        3. Importing and exploring the world's top incomes dataset
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
        4. Analyzing and visualizing the top income data of the US
          1. Getting ready
          2. How to do it...
          3. How it works...
        5. Furthering the analysis of the top income groups of the US
          1. Getting ready
          2. How to do it...
          3. How it works...
        6. Reporting with Jinja2
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
      14. 7. Driving Visual Analyses with Automobile Data (Python)
        1. Introduction
        2. Getting started with IPython
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        3. Exploring IPython Notebook
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        4. Preparing to analyze automobile fuel efficiencies
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        5. Exploring and describing fuel efficiency data with Python
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more...
          5. See also
        6. Analyzing automobile fuel efficiency over time with Python
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        7. Investigating the makes and models of automobiles with Python
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
      15. 8. Working with Social Graphs (Python)
        1. Introduction
          1. Understanding graphs and networks
        2. Preparing to work with social networks in Python
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        3. Importing networks
          1. Getting ready
          2. How to do it...
          3. How it works...
        4. Exploring subgraphs within a heroic network
          1. Getting ready
          2. How to do it…
          3. How it works...
          4. There's more...
        5. Finding strong ties
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        6. Finding key players
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more…
            1. The betweenness centrality
            2. The closeness centrality
            3. The eigenvector centrality
            4. Deciding on centrality algorithm
        7. Exploring the characteristics of entire networks
          1. Getting ready
          2. How to do it...
          3. How it works...
        8. Clustering and community detection in social networks
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        9. Visualizing graphs
          1. Getting ready
          2. How to do it...
          3. How it works...
      16. 9. Recommending Movies at Scale (Python)
        1. Introduction
        2. Modeling preference expressions
          1. How to do it…
          2. How it works…
        3. Understanding the data
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        4. Ingesting the movie review data
          1. Getting ready
          2. How to do it…
          3. How it works…
        5. Finding the highest-scoring movies
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        6. Improving the movie-rating system
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        7. Measuring the distance between users in the preference space
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        8. Computing the correlation between users
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        9. Finding the best critic for a user
          1. Getting ready
          2. How to do it…
          3. How it works…
        10. Predicting movie ratings for users
          1. Getting ready
          2. How to do it…
          3. How it works…
        11. Collaboratively filtering item by item
          1. Getting ready
          2. How to do it…
          3. How it works…
        12. Building a nonnegative matrix factorization model
          1. How to do it…
          2. How it works…
          3. See also
        13. Loading the entire dataset into the memory
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        14. Dumping the SVD-based model to the disk
          1. How to do it…
          2. How it works…
        15. Training the SVD-based model
          1. How to do it…
          2. How it works…
          3. There's more…
        16. Testing the SVD-based model
          1. How to do it…
          2. How it works…
          3. There's more…
      17. 10. Harvesting and Geolocating Twitter Data (Python)
        1. Introduction
        2. Creating a Twitter application
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        3. Understanding the Twitter API v1.1
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
        4. Determining your Twitter followers and friends
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
        5. Pulling Twitter user profiles
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
        6. Making requests without running afoul of Twitter's rate limits
          1. Getting ready
          2. How to do it...
          3. How it works...
        7. Storing JSON data to the disk
          1. Getting ready
          2. How to do it...
          3. How it works...
        8. Setting up MongoDB for storing Twitter data
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
        9. Storing user profiles in MongoDB using PyMongo
          1. Getting ready
          2. How to do it...
          3. How it works...
        10. Exploring the geographic information available in profiles
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
        11. Plotting geospatial data in Python
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
      18. 11. Optimizing Numerical Code with NumPy and SciPy (Python)
        1. Introduction
        2. Understanding the optimization process
          1. How to do it…
          2. How it works…
          3. There's more…
        3. Identifying common performance bottlenecks in code
          1. How to do it…
          2. How it works…
        4. Reading through the code
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        5. Profiling Python code with the Unix time function
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        6. Profiling Python code using built-in Python functions
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        7. Profiling Python code using IPython's %timeit function
          1. How to do it…
          2. How it works…
        8. Profiling Python code using line_profiler
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        9. Plucking the low-hanging (optimization) fruit
          1. Getting ready
          2. How to do it…
          3. How it works…
        10. Testing the performance benefits of NumPy
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        11. Rewriting simple functions with NumPy
          1. Getting ready
          2. How to do it…
          3. How it works…
        12. Optimizing the innermost loop with NumPy
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
      19. Index