Data Science For Dummies, 2nd Edition

Book Description

Your ticket to breaking into the field of data science!

Jobs in data science are projected to outnumber the people who have data science skills, making those with the knowledge to fill a data science position a hot commodity in the coming years. Data Science For Dummies is the perfect starting point for IT professionals and students interested in making sense of an organization's massive data sets and applying their findings to real-world business scenarios.

From uncovering rich data sources and managing large amounts of data within hardware and software limitations to ensuring consistency in reporting and merging various data sources, you'll develop the know-how you need to interpret data effectively and tell a story that anyone in your organization can understand.

  • Provides a background in data science fundamentals and preparing your data for analysis
  • Details different data visualization techniques that can be used to showcase and summarize your data
  • Explains both supervised and unsupervised machine learning, including regression, model validation, and clustering techniques
  • Includes coverage of big data processing tools like MapReduce, Hadoop, Dremel, Storm, and Spark

It's a big, big data world out there—let Data Science For Dummies help you harness its power and gain a competitive edge for your organization.

Table of Contents

    1. Cover
    2. Introduction
      1. About This Book
      2. Foolish Assumptions
      3. Icons Used in This Book
      4. Beyond the Book
      5. Where to Go from Here
    3. Foreword
    4. Part 1: Getting Started with Data Science
      1. Chapter 1: Wrapping Your Head around Data Science
        1. Seeing Who Can Make Use of Data Science
        2. Analyzing the Pieces of the Data Science Puzzle
        3. Exploring the Data Science Solution Alternatives
        4. Letting Data Science Make You More Marketable
      2. Chapter 2: Exploring Data Engineering Pipelines and Infrastructure
        1. Defining Big Data by the Three Vs
        2. Identifying Big Data Sources
        3. Grasping the Difference between Data Science and Data Engineering
        4. Making Sense of Data in Hadoop
        5. Identifying Alternative Big Data Solutions
        6. Data Engineering in Action: A Case Study
      3. Chapter 3: Applying Data-Driven Insights to Business and Industry
        1. Benefiting from Business-Centric Data Science
        2. Converting Raw Data into Actionable Insights with Data Analytics
        3. Taking Action on Business Insights
        4. Distinguishing between Business Intelligence and Data Science
        5. Defining Business-Centric Data Science
        6. Differentiating between Business Intelligence and Business-Centric Data Science
        7. Knowing Whom to Call to Get the Job Done Right
        8. Exploring Data Science in Business: A Data-Driven Business Success Story
    5. Part 2: Using Data Science to Extract Meaning from Your Data
      1. Chapter 4: Machine Learning: Learning from Data with Your Machine
        1. Defining Machine Learning and Its Processes
        2. Considering Learning Styles
        3. Seeing What You Can Do
      2. Chapter 5: Math, Probability, and Statistical Modeling
        1. Exploring Probability and Inferential Statistics
        2. Quantifying Correlation
        3. Reducing Data Dimensionality with Linear Algebra
        4. Modeling Decisions with Multi-Criteria Decision Making
        5. Introducing Regression Methods
        6. Detecting Outliers
        7. Introducing Time Series Analysis
      3. Chapter 6: Using Clustering to Subdivide Data
        1. Introducing Clustering Basics
        2. Identifying Clusters in Your Data
        3. Categorizing Data with Decision Tree and Random Forest Algorithms
      4. Chapter 7: Modeling with Instances
        1. Recognizing the Difference between Clustering and Classification
        2. Making Sense of Data with Nearest Neighbor Analysis
        3. Classifying Data with Average Nearest Neighbor Algorithms
        4. Classifying with K-Nearest Neighbor Algorithms
        5. Solving Real-World Problems with Nearest Neighbor Algorithms
      5. Chapter 8: Building Models That Operate Internet-of-Things Devices
        1. Overviewing the Vocabulary and Technologies
        2. Digging into the Data Science Approaches
        3. Advancing Artificial Intelligence Innovation
    6. Part 3: Creating Data Visualizations That Clearly Communicate Meaning
      1. Chapter 9: Following the Principles of Data Visualization Design
        1. Data Visualizations: The Big Three
        2. Designing to Meet the Needs of Your Target Audience
        3. Picking the Most Appropriate Design Style
        4. Choosing How to Add Context
        5. Selecting the Appropriate Data Graphic Type
        6. Choosing a Data Graphic
      2. Chapter 10: Using D3.js for Data Visualization
        1. Introducing the D3.js Library
        2. Knowing When to Use D3.js (and When Not To)
        3. Getting Started in D3.js
        4. Implementing More Advanced Concepts and Practices in D3.js
      3. Chapter 11: Web-Based Applications for Visualization Design
        1. Designing Data Visualizations for Collaboration
        2. Visualizing Spatial Data with Online Geographic Tools
        3. Visualizing with Open Source: Web-Based Data Visualization Platforms
        4. Knowing When to Stick with Infographics
      4. Chapter 12: Exploring Best Practices in Dashboard Design
        1. Focusing on the Audience
        2. Starting with the Big Picture
        3. Getting the Details Right
        4. Testing Your Design
      5. Chapter 13: Making Maps from Spatial Data
        1. Getting into the Basics of GIS
        2. Analyzing Spatial Data
        3. Getting Started with Open-Source QGIS
    7. Part 4: Computing for Data Science
      1. Chapter 14: Using Python for Data Science
        1. Sorting Out the Python Data Types
        2. Putting Loops to Good Use in Python
        3. Having Fun with Functions
        4. Keeping Cool with Classes
        5. Checking Out Some Useful Python Libraries
        6. Analyzing Data with Python — an Exercise
      2. Chapter 15: Using Open Source R for Data Science
        1. R’s Basic Vocabulary
        2. Delving into Functions and Operators
        3. Iterating in R
        4. Observing How Objects Work
        5. Sorting Out Popular Statistical Analysis Packages
        6. Examining Packages for Visualizing, Mapping, and Graphing in R
      3. Chapter 16: Using SQL in Data Science
        1. Getting a Handle on Relational Databases and SQL
        2. Investing Some Effort into Database Design
        3. Integrating SQL, R, Python, and Excel into Your Data Science Strategy
        4. Narrowing the Focus with SQL Functions
      4. Chapter 17: Doing Data Science with Excel and KNIME
        1. Making Life Easier with Excel
        2. Using KNIME for Advanced Data Analytics
    8. Part 5: Applying Domain Expertise to Solve Real-World Problems Using Data Science
      1. Chapter 18: Data Science in Journalism: Nailing Down the Five Ws (and an H)
        1. Who Is the Audience?
        2. What: Getting Directly to the Point
        3. Bringing Data Journalism to Life: The Black Budget
        4. When Did It Happen?
        5. Where Does the Story Matter?
        6. Why the Story Matters
        7. How to Develop, Tell, and Present the Story
        8. Collecting Data for Your Story
        9. Finding and Telling Your Data’s Story
      2. Chapter 19: Delving into Environmental Data Science
        1. Modeling Environmental-Human Interactions with Environmental Intelligence
        2. Modeling Natural Resources in the Raw
        3. Using Spatial Statistics to Predict for Environmental Variation across Space
      3. Chapter 20: Data Science for Driving Growth in E-Commerce
        1. Making Sense of Data for E-Commerce Growth
        2. Optimizing E-Commerce Business Systems
      4. Chapter 21: Using Data Science to Describe and Predict Criminal Activity
        1. Temporal Analysis for Crime Prevention and Monitoring
        2. Spatial Crime Prediction and Monitoring
        3. Probing the Problems with Data Science for Crime Analysis
    9. Part 6: The Part of Tens
      1. Chapter 22: Ten Phenomenal Resources for Open Data
        1. Digging through data.gov
        2. Checking Out Canada Open Data
        3. Diving into data.gov.uk
        4. Checking Out U.S. Census Bureau Data
        5. Knowing NASA Data
        6. Wrangling World Bank Data
        7. Getting to Know Knoema Data
        8. Queuing Up with Quandl Data
        9. Exploring Exversion Data
        10. Mapping OpenStreetMap Spatial Data
      2. Chapter 23: Ten Free Data Science Tools and Applications
        1. Making Custom Web-Based Data Visualizations with Free R Packages
        2. Examining Scraping, Collecting, and Handling Tools
        3. Looking into Data Exploration Tools
        4. Evaluating Web-Based Visualization Tools
    10. About the Author
    11. Connect with Dummies
    12. End User License Agreement