Oracle Big Data Handbook

Book Description

Transform Big Data into Insight

“In this book, some of Oracle’s best engineers and architects explain how you can make use of big data. They’ll tell you how you can integrate your existing Oracle solutions with big data systems, using each where appropriate and moving data between them as needed.” -- Doug Cutting, co-creator of Apache Hadoop

Cowritten by members of Oracle’s big data team, Oracle Big Data Handbook provides complete coverage of Oracle’s comprehensive, integrated set of products for acquiring, organizing, analyzing, and leveraging unstructured data. The book discusses the strategies and technologies essential for a successful big data implementation, including Apache Hadoop, Oracle Big Data Appliance, Oracle Big Data Connectors, Oracle NoSQL Database, Oracle Endeca, Oracle Advanced Analytics, and Oracle’s open source R offerings. Best practices for migrating from legacy systems and integrating existing data warehousing and analytics solutions into an enterprise big data infrastructure are also included in this Oracle Press guide.

  • Understand the value of a comprehensive big data strategy

  • Maximize the distributed processing power of the Apache Hadoop platform

  • Discover the advantages of using Oracle Big Data Appliance as an engineered system for Hadoop and Oracle NoSQL Database

  • Configure, deploy, and monitor Hadoop and Oracle NoSQL Database using Oracle Big Data Appliance

  • Integrate your existing data warehousing and analytics infrastructure into a big data architecture

  • Share data among Hadoop and relational databases using Oracle Big Data Connectors

  • Understand how Oracle NoSQL Database integrates into the Oracle Big Data architecture

  • Deliver faster time to value using in-database analytics

  • Analyze data with Oracle Advanced Analytics (Oracle R Enterprise and Oracle Data Mining), Oracle R Distribution, ROracle, and Oracle R Connector for Hadoop

  • Analyze disparate data with Oracle Endeca Information Discovery

  • Plan and implement a big data governance strategy and develop an architecture and roadmap

Table of Contents

  1. Cover 
  2. Title Page
  3. Copyright Page
  4. About the Authors
  5. Contents at a Glance
  6. Contents 
  7. Acknowledgments
  8. Introduction
  9. Part I: Introduction
    1. Chapter 1: Introduction to Big Data
      1. Big Data
      2. Google’s MapReduce Algorithm and Apache Hadoop
      3. Oracle’s Big Data Platform
      4. Summary
    2. Chapter 2: The Value of Big Data
      1. Am I Big Data, or Is Big Data Me?
      2. Big Data, Little Data—It’s Still Me
        1. What Happened?
        2. Now What?
      3. Reality, Check Please!
      4. What Do You Make of It?
        1. Information Chain Reaction (ICR)
      5. Big Data, Big Numbers, Big Business?
        1. Twitter
        2. Facebook
        3. Internal Source
        4. ICR: Connect
        5. ICR: Change
      6. Wanted: Big Data Value
        1. Big Data Example 1: Clinical Trial Research Within the Healthcare Industry
        2. Example 2: Improvements in Car Design for Driver Safety Within the Automotive Industry
      7. Summary
  10. Part II: Big Data Platform
    1. Chapter 3: The Apache Hadoop Platform
      1. Software vs. Hardware
      2. The Hadoop Software Platform
        1. Hadoop Distributions and Versions
        2. The Hadoop Distributed File System (HDFS)
        3. Scheduling, Compute, and Processing
      3. Operating System Choices
        1. I/O and the Linux Kernel
      4. The Hadoop Hardware Platform
        1. CPU and Memory
        2. Network
        3. Disk
      5. Putting It All Together
    2. Chapter 4: Why an Appliance?
      1. Why Would Oracle Create a Big Data Appliance?
      2. What Is an Appliance?
      3. What Are the Goals of Oracle Big Data Appliance?
      4. Optimizing an Appliance
      5. Oracle Big Data Appliance Version 2 Software
      6. Oracle Big Data Appliance X3-2 Hardware
      7. Where Did Oracle Get Hadoop Expertise?
      8. Configuring a Hadoop Cluster
        1. Choosing the Core Cluster Components
        2. Assembling the Cluster
      9. What About a Do-It-Yourself Cluster?
      10. Total Costs of a Cluster
      11. Time to Value
      12. How to Build Out Larger Clusters
      13. Can I Add Other Software to Oracle Big Data Appliance?
      14. Drawbacks of an Appliance
    3. Chapter 5: BDA Configurations, Deployment Architectures, and Monitoring
      1. Introduction
        1. Big Data Appliance X3-2 Full Rack (Eighteen Nodes)
        2. Big Data Appliance X3-2 Starter Rack (Six Nodes)
        3. Big Data Appliance X3-2 In-Rack Expansion (Six Nodes)
        4. Hardware Modifications to BDA
        5. Software Supported on Big Data Appliance X3-2
      2. BDA Install and Configuration Process
      3. Critical and Noncritical Nodes
      4. Automatic Failover of the NameNode
      5. BDA Disk Storage Layout
      6. Adding Storage to a Hadoop Cluster
      7. Hadoop-Only Config and Hadoop+NoSQL DB
        1. Hadoop-Only Appliance
        2. Hadoop and NoSQL DB
      8. Memory Options
      9. Deployment Architectures
        1. Multitenancy and Hadoop in the Cloud
        2. Scalability
        3. Multirack BDA Considerations
      10. Installing Other Software on the BDA
      11. BDA in the Data Center
        1. Administrative Network
        2. Client Access Network
        3. InfiniBand Private Network
        4. Network Requirements
        5. Connecting to Data Center LAN
        6. Example Connectivity Architecture
      12. Oracle Big Data Appliance Restrictions on Use
      13. BDA Management and Monitoring
        1. Enterprise Manager
        2. Cloudera Manager
        3. Hadoop Monitoring Utilities: Web GUI
        4. Oracle ILOM
        5. Hue
        6. DCLI Utility
    4. Chapter 6: Integrating the Data Warehouse and Analytics Infrastructure to Big Data
      1. The Data Warehouse as a Historic Database of Record
        1. The Oracle Database as a Data Warehouse
        2. Why the Data Warehouse and Hadoop Are Deployed Together
      2. Completing the Footprint: Business Analyst Tools
      3. Building Out the Infrastructure
    5. Chapter 7: BDA Connectors
      1. Oracle Big Data Connectors
      2. Oracle Loader for Hadoop
        1. Online Mode
        2. Oracle OCI Direct Path Output
        3. JDBC Output
        4. Offline Mode
        5. Oracle Data Pump Output
        6. Delimited Text Output
      3. Installation of Oracle Loader for Hadoop
      4. Invoking Oracle Loader for Hadoop
      5. Input Formats
        1. DelimitedTextInputFormat
        2. RegexInputFormat
        3. AvroInputFormat
        4. HiveToAvroInputFormat
        5. KVAvroInputFormat
        6. Custom Input Formats
      6. Oracle Loader for Hadoop Configuration Files
        1. Loader Maps
        2. Additional Optimizations
        3. Leveraging InfiniBand
        4. Comparison to Apache Sqoop
      7. Oracle SQL Connector for HDFS
      8. Installation of Oracle SQL Connector for HDFS
      9. HIVE Installation
      10. Creating External Tables Using Oracle SQL Connector for HDFS
        1. ExternalTable Configuration Tool
        2. Data Source Types
        3. Configuration Tool Syntax
        4. Required Properties
        5. Optional Properties
        6. ExternalTable Tool for Delimited Text Files
        7. Testing DDL with --noexecute
        8. Adding a New HDFS File to the Location File
        9. Manual External Table Configuration
      11. Hive Sources
        1. ExternalTable Example
      12. Oracle Data Pump Sources
      13. Configuration Files
      14. Querying with Oracle SQL Connector for HDFS
      15. Oracle R Connector for Hadoop
      16. Oracle Data Integrator Application Adapter for Hadoop
    6. Chapter 8: Oracle NoSQL Database
      1. What Is a NoSQL Database System?
        1. NoSQL Applications
      2. Oracle NoSQL Database
        1. A Sample Use Case
      3. Architecture
        1. Client Driver
        2. Key-Value Pairs
        3. Storage Nodes
        4. Replication
        5. Smart Topology
        6. Online Elasticity
        7. No Single Point of Failure
      4. Data Management
        1. APIs
        2. CRUD Operations
        3. Multiple Update Operations
        4. Lookup Operations
        5. Transactions
        6. Predictable Performance
      5. Integration
      6. Installation and Administration
        1. Simple Installation
        2. Administration
      7. How Oracle NoSQL Database Stacks Up
      8. Useful Links
  11. Part III: Analyzing Information and Making Decisions
    1. Chapter 9: In-Database Analytics: Delivering Faster Time to Value
      1. Introduction
        1. Oracle’s In-Database Analytics
        2. Why Running In-Database Is So Important
      2. Introduction to Oracle Data Mining and Statistical Analysis
        1. Oracle’s In-Database Advanced Analytics
        2. Oracle Data Mining
        3. Introduction to R
        4. Text Mining
      3. In-Database Statistical Functions
        1. Making BI Tools Smarter
      4. Spatial Analytics
        1. Understanding the Spatial Data Model
        2. Querying the Spatial Data Model
        3. Using Spatial Analytics
        4. Making BI Tools Smarter
      5. Graph-Based Analytics
        1. Graph Data Model
        2. Querying Graph Data
      6. Multidimensional Analytics
        1. Making BI Tools Smarter and Faster
      7. In-Database Analytics: Bringing It All Together
        1. Integrating Analytics into Extract-Load-Transform Processing
        2. Delivering Guided Exploration
        3. Delivering Analytical Mash-ups
      8. Conclusion
    2. Chapter 10: Analyzing Data with R
      1. Introduction to Open Source R
        1. CRAN, Packages, and Task Views
        2. GUIs and IDEs
      2. Traditional R and Database Interaction vs. Oracle R Enterprise
      3. Oracle’s Strategic R Offerings
        1. Oracle R Enterprise
        2. Oracle R Distribution
        3. ROracle
        4. Oracle R Connector for Hadoop
      4. Oracle R Enterprise: Next-Level View
      5. Oracle R Enterprise Installation and Configuration
      6. Using Oracle R Enterprise
        1. Transparency Layer
        2. Embedded R Execution
        3. Predictive Analytics
      7. Oracle R Connector for Hadoop
        1. Invoking MapReduce Jobs
        2. Testing ORCH R Scripts Without the Hadoop Cluster
        3. Interacting with HDFS from R
        4. HDFS Metadata Discovery
        5. Working with Hadoop Using the ORCH Framework
        6. ORCH Predictive Analytics on Hadoop
        7. ORCHhive
        8. Oracle R Connector for Hadoop and Oracle R Enterprise Interaction
      8. Summary
    3. Chapter 11: Endeca Information Discovery
      1. Why Did Oracle Select Endeca?
        1. Product Suites Overview
      2. Endeca Information Discovery Platform
        1. Major Functional Areas
        2. Key Features
      3. Endeca Information Discovery and Business Intelligence
        1. Difference in Roles and Functions
        2. BI Development Process vs. Information Discovery Approach
        3. Complementary But Not Exclusive
      4. Architecture
        1. Oracle Endeca Server
        2. Oracle Endeca Studio
        3. Oracle Endeca Integration Suite
        4. Endeca on Exalytics
        5. Scalability and Load Balancing
      5. Unifying Diverse Content Sets
        1. Endeca Differentiator
        2. Industry Use Cases
      6. Hands-On with Endeca
        1. Installation and Configuration
        2. Developing an Endeca Application
    4. Chapter 12: Big Data Governance
      1. Key Elements of Enterprise Data Governance
        1. Business Outcome
        2. Information Lifecycle Management
        3. Regulatory Compliance and Risk Management
        4. Metadata Management
        5. Data Quality Management
        6. Master and Reference Data Management
        7. Data Security and Privacy Management
        8. Business Process Alignment
      2. How Does Big Data Impact Enterprise Data Governance?
        1. Modeled Data vs. Raw Data
        2. Types of Big Data
        3. Applying Data Governance to Big Data
        4. Leveraging Big Data Governance
      3. Industry-Specific Use Cases
        1. Utilities
        2. Healthcare
        3. Financial Services
        4. Retail
        5. Consumer Packaged Goods (CPG)
        6. Telecommunications
        7. Oil and Gas
      4. How Does Big Data Impact Data Governance Roles?
        1. Governance Roles and Organization
      5. An Approach to Implementing Big Data Governance
    5. Chapter 13: Developing Architecture and Roadmap for Big Data
      1. Architecture Capabilities for Big Data
        1. New Characteristics of Big Data
        2. Conceptual Architecture Capabilities of Big Data
        3. Product Capabilities and Tools
        4. Making Big Data Architecture Decisions
      2. Architecture Development Process for Realizing Incremental Values
        1. Overview of Oracle Information Architecture Framework
        2. Overview of Applied OADP for Information Architecture
        3. Big Data Architecture Development Process
      3. Impact on Data Management and BI Processes
        1. Traditional BI Development Process
        2. Big Data and Analytics Development Process
      4. Big Data Governance
        1. Traditional Data Governance Focus
        2. New Focus for Governance in Big Data
      5. Developing Skills and Talent
        1. Data Scientist
        2. Big Data Developer
        3. Big Data Administrator
      6. Big Data Best Practices
        1. Align Big Data Initiative with Specific Business Goals
        2. Ensure a Centralized IT Strategy for Standards and Governance
        3. Use a Center of Excellence to Minimize Training and Risk
        4. Correlate Big Data with Structured Data
        5. Provide High-Performance and Scalable Analytical Sandboxes
        6. Reshape the IT Operating Model
  12. Index