Big Data Fundamentals: Concepts, Drivers & Techniques

Book Description

“This text should be required reading for everyone in contemporary business.”
--Peter Woodhull, CEO, Modus21

“The one book that clearly describes and links Big Data concepts to business utility.”
--Dr. Christopher Starr, PhD

“Simply, this is the best Big Data book on the market!”
--Sam Rostam, Cascadian IT Group

“One of the most contemporary approaches I’ve seen to Big Data fundamentals...”
--Joshua M. Davis, PhD

The Definitive Plain-English Guide to Big Data for Business and Technology Professionals

Big Data Fundamentals provides a pragmatic, no-nonsense introduction to Big Data. Best-selling IT author Thomas Erl and his team clearly explain key Big Data concepts, theory and terminology, as well as fundamental technologies and techniques. All coverage is supported with case study examples and numerous simple diagrams.

The authors begin by explaining how Big Data can propel an organization forward by solving a spectrum of previously intractable business problems. Next, they demystify key analysis techniques and technologies and show how a Big Data solution environment can be built and integrated to offer competitive advantages.

Coverage includes:

  • Discovering Big Data’s fundamental concepts and what makes it different from previous forms of data analysis and data science

  • Understanding the business motivations and drivers behind Big Data adoption, from operational improvements through innovation

  • Planning strategic, business-driven Big Data initiatives

  • Addressing considerations such as data management, governance, and security

  • Recognizing the 5 “V” characteristics of datasets in Big Data environments: volume, velocity, variety, veracity, and value

  • Clarifying Big Data’s relationships with OLTP, OLAP, ETL, data warehouses, and data marts

  • Working with Big Data in structured, unstructured, semi-structured, and metadata formats

  • Increasing value by integrating Big Data resources with corporate performance monitoring

  • Understanding how Big Data leverages distributed and parallel processing

  • Using NoSQL and other technologies to meet Big Data’s distinct data processing requirements

  • Leveraging statistical approaches of quantitative and qualitative analysis

  • Applying computational analysis methods, including machine learning

Table of Contents

    1. About This E-Book
    2. Title Page
    3. Copyright Page
    4. Dedication Page
    5. Contents at a Glance
    6. Contents
    7. Acknowledgments
    8. Reader Services
    9. Part I: The Fundamentals of Big Data
      1. Chapter 1. Understanding Big Data
        1. Concepts and Terminology
          1. Datasets
          2. Data Analysis
          3. Data Analytics
          4. Business Intelligence (BI)
          5. Key Performance Indicators (KPI)
        2. Big Data Characteristics
          1. Volume
          2. Velocity
          3. Variety
          4. Veracity
          5. Value
        3. Different Types of Data
          1. Structured Data
          2. Unstructured Data
          3. Semi-structured Data
          4. Metadata
        4. Case Study Background
          1. History
          2. Technical Infrastructure and Automation Environment
          3. Business Goals and Obstacles
      2. Chapter 2. Business Motivations and Drivers for Big Data Adoption
        1. Marketplace Dynamics
        2. Business Architecture
        3. Business Process Management
        4. Information and Communications Technology
          1. Data Analytics and Data Science
          2. Digitization
          3. Affordable Technology and Commodity Hardware
          4. Social Media
          5. Hyper-Connected Communities and Devices
          6. Cloud Computing
        5. Internet of Everything (IoE)
      3. Chapter 3. Big Data Adoption and Planning Considerations
        1. Organization Prerequisites
        2. Data Procurement
        3. Privacy
        4. Security
        5. Provenance
        6. Limited Realtime Support
        7. Distinct Performance Challenges
        8. Distinct Governance Requirements
        9. Distinct Methodology
        10. Clouds
        11. Big Data Analytics Lifecycle
          1. Business Case Evaluation
          2. Data Identification
          3. Data Acquisition and Filtering
          4. Data Extraction
          5. Data Validation and Cleansing
          6. Data Aggregation and Representation
          7. Data Analysis
          8. Data Visualization
          9. Utilization of Analysis Results
      4. Chapter 4. Enterprise Technologies and Big Data Business Intelligence
        1. Online Transaction Processing (OLTP)
        2. Online Analytical Processing (OLAP)
        3. Extract Transform Load (ETL)
        4. Data Warehouses
        5. Data Marts
        6. Traditional BI
          1. Ad-hoc Reports
          2. Dashboards
        7. Big Data BI
          1. Traditional Data Visualization
          2. Data Visualization for Big Data
    10. Part II: Storing and Analyzing Big Data
      1. Chapter 5. Big Data Storage Concepts
        1. Clusters
        2. File Systems and Distributed File Systems
        3. NoSQL
        4. Sharding
        5. Replication
          1. Master-Slave
          2. Peer-to-Peer
        6. Sharding and Replication
          1. Combining Sharding and Master-Slave Replication
          2. Combining Sharding and Peer-to-Peer Replication
        7. CAP Theorem
        8. ACID
        9. BASE
      2. Chapter 6. Big Data Processing Concepts
        1. Parallel Data Processing
        2. Distributed Data Processing
        3. Hadoop
        4. Processing Workloads
          1. Batch
          2. Transactional
        5. Cluster
        6. Processing in Batch Mode
          1. Batch Processing with MapReduce
          2. Map and Reduce Tasks
          3. A Simple MapReduce Example
          4. Understanding MapReduce Algorithms
        7. Processing in Realtime Mode
          1. Speed Consistency Volume (SCV)
          2. Event Stream Processing
          3. Complex Event Processing
          4. Realtime Big Data Processing and SCV
          5. Realtime Big Data Processing and MapReduce
      3. Chapter 7. Big Data Storage Technology
        1. On-Disk Storage Devices
          1. Distributed File Systems
          2. RDBMS Databases
          3. NoSQL Databases
          4. NewSQL Databases
        2. In-Memory Storage Devices
          1. In-Memory Data Grids
          2. In-Memory Databases
      4. Chapter 8. Big Data Analysis Techniques
        1. Quantitative Analysis
        2. Qualitative Analysis
        3. Data Mining
        4. Statistical Analysis
          1. A/B Testing
          2. Correlation
          3. Regression
        5. Machine Learning
          1. Classification (Supervised Machine Learning)
          2. Clustering (Unsupervised Machine Learning)
          3. Outlier Detection
          4. Filtering
        6. Semantic Analysis
          1. Natural Language Processing
          2. Text Analytics
          3. Sentiment Analysis
        7. Visual Analysis
          1. Heat Maps
          2. Time Series Plots
          3. Network Graphs
          4. Spatial Data Mapping
    11. Appendix A. Case Study Conclusion
    12. About the Authors
      1. Thomas Erl
      2. Wajid Khattak
      3. Paul Buhler
    13. Index