Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data

Book Description

Data Science and Big Data Analytics is about harnessing the power of data for new insights. The book covers the breadth of activities, methods, and tools that data scientists use. The content focuses on concepts, principles, and practical applications that apply to any industry and technology environment, and the learning is supported with examples that you can replicate using open-source software.

This book will help you:

  • Become a contributor on a data science team

  • Deploy a structured lifecycle approach to data analytics problems

  • Apply appropriate analytic techniques and tools to analyze big data

  • Learn how to tell a compelling story with data to drive business action

  • Prepare for EMC Proven Professional Data Science Certification

Corresponding data sets are available at www.wiley.com/go/9781118876138.

    Get started discovering, analyzing, visualizing, and presenting data in a meaningful way today!

    Table of Contents

    1. Cover Page
    2. Title Page
    3. Copyright
    4. Credits
    5. About the Key Contributors
    6. Acknowledgments
    7. Contents
    8. Foreword
    9. Introduction
      1. EMC Academic Alliance
      2. EMC Proven Professional Certification
    10. 1: Introduction to Big Data Analytics
      1. 1.1 Big Data Overview
      2. 1.2 State of the Practice in Analytics
      3. 1.3 Key Roles for the New Big Data Ecosystem
      4. 1.4 Examples of Big Data Analytics
      5. Summary
      6. Exercises
      7. Bibliography
    11. 2: Data Analytics Lifecycle
      1. 2.1 Data Analytics Lifecycle Overview
      2. 2.2 Phase 1: Discovery
      3. 2.3 Phase 2: Data Preparation
      4. 2.4 Phase 3: Model Planning
      5. 2.5 Phase 4: Model Building
      6. 2.6 Phase 5: Communicate Results
      7. 2.7 Phase 6: Operationalize
      8. 2.8 Case Study: Global Innovation Network and Analysis (GINA)
      9. Summary
      10. Exercises
      11. Bibliography
    12. 3: Review of Basic Data Analytic Methods Using R
      1. 3.1 Introduction to R
      2. 3.2 Exploratory Data Analysis
      3. 3.3 Statistical Methods for Evaluation
      4. Summary
      5. Exercises
      6. Bibliography
    13. 4: Advanced Analytical Theory and Methods: Clustering
      1. 4.1 Overview of Clustering
      2. 4.2 K-means
      3. 4.3 Additional Algorithms
      4. Summary
      5. Exercises
      6. Bibliography
    14. 5: Advanced Analytical Theory and Methods: Association Rules
      1. 5.1 Overview
      2. 5.2 Apriori Algorithm
      3. 5.3 Evaluation of Candidate Rules
      4. 5.4 Applications of Association Rules
      5. 5.5 An Example: Transactions in a Grocery Store
      6. 5.6 Validation and Testing
      7. 5.7 Diagnostics
      8. Summary
      9. Exercises
      10. Bibliography
    15. 6: Advanced Analytical Theory and Methods: Regression
      1. 6.1 Linear Regression
      2. 6.2 Logistic Regression
      3. 6.3 Reasons to Choose and Cautions
      4. 6.4 Additional Regression Models
      5. Summary
      6. Exercises
    16. 7: Advanced Analytical Theory and Methods: Classification
      1. 7.1 Decision Trees
      2. 7.2 Naïve Bayes
      3. 7.3 Diagnostics of Classifiers
      4. 7.4 Additional Classification Methods
      5. Summary
      6. Exercises
      7. Bibliography
    17. 8: Advanced Analytical Theory and Methods: Time Series Analysis
      1. 8.1 Overview of Time Series Analysis
      2. 8.2 ARIMA Model
      3. 8.3 Additional Methods
      4. Summary
      5. Exercises
    18. 9: Advanced Analytical Theory and Methods: Text Analysis
      1. 9.1 Text Analysis Steps
      2. 9.2 A Text Analysis Example
      3. 9.3 Collecting Raw Text
      4. 9.4 Representing Text
      5. 9.5 Term Frequency—Inverse Document Frequency (TFIDF)
      6. 9.6 Categorizing Documents by Topics
      7. 9.7 Determining Sentiments
      8. 9.8 Gaining Insights
      9. Summary
      10. Exercises
      11. Bibliography
    19. 10: Advanced Analytics—Technology and Tools: MapReduce and Hadoop
      1. 10.1 Analytics for Unstructured Data
      2. 10.2 The Hadoop Ecosystem
      3. 10.3 NoSQL
      4. Summary
      5. Exercises
      6. Bibliography
    20. 11: Advanced Analytics—Technology and Tools: In-Database Analytics
      1. 11.1 SQL Essentials
      2. 11.2 In-Database Text Analysis
      3. 11.3 Advanced SQL
      4. Summary
      5. Exercises
      6. Bibliography
    21. 12: The Endgame, or Putting It All Together
      1. 12.1 Communicating and Operationalizing an Analytics Project
      2. 12.2 Creating the Final Deliverables
      3. 12.3 Data Visualization Basics
      4. Summary
      5. Exercises
      6. References and Further Reading
      7. Bibliography
    22. Index