Data Infrastructure for Next-Gen Finance

Book Description

Stricter regulations and changing technology have forced financial services organizations to make major changes in the way they handle sensitive data. With a focus on engineering and infrastructure, this O’Reilly report examines the tools and best practices that leading financial firms are using to migrate data to the cloud, build customer event hubs, and adhere to new rules for governance and security.

Based on talks given at recent Strata + Hadoop World events, this detailed report explains how Capital One, MasterCard Advisors, and the Financial Industry Regulatory Authority (FINRA) tackled major data projects with help from technology leaders such as Cloudera and Intel.

  • Learn how FINRA migrated its portfolio from a data warehouse to the Hadoop ecosystem in the cloud
  • Understand what’s required to support data governance in finance, and learn about the infrastructure Capital One implemented
  • Delve into Hadoop’s security maturity model, compliance-ready security controls, and enterprise data hub for preventing breaches
  • Examine the architecture of a Customer Event Hub, a tool that’s pushing the boundaries of how organizations interact with customers

Table of Contents

  1. Preface
  2. 1. Cloud Migration: From Data Center to Hadoop in the Cloud
    1. The Balancing Act of FINRA’s Legacy Architecture
    2. Legacy Architecture Pain Points: Silos, High Costs, Lack of Elasticity
    3. The Hadoop Ecosystem in the Cloud
      1. SQL and Hive
      2. Amazon EMR
      3. Amazon S3
      4. Capabilities of a Cloud-Based Architecture
    4. Lessons Learned and Best Practices
    5. Benefits Reaped
  3. 2. Preventing a Big Data Security Breach: The Hadoop Security Maturity Model
    1. Hadoop Security Gaps and Challenges
    2. The Hadoop Security Maturity Model
      1. Stage 1: Proof of Concept (High Vulnerability)
      2. Stage 2: Live Data with Real Users (Ensuring Basic Security Controls)
      3. Stage 3: Multiple Workloads (Data Is Managed, Secure, and Protected)
      4. Stage 4: In Production at Scale (Fully Compliance Ready)
    3. Compliance-Ready Security Controls
      1. Cloudera Manager (Authentication)
      2. Apache Sentry (Access Permissions)
      3. Cloudera Navigator (Visibility)
      4. HDFS Encryption (Protection)
      5. Cloudera RecordService (Synchronization)
    4. MasterCard’s Journey
      1. Looking for Lineage
      2. Segregation of Duties
      3. Documentation
      4. Awareness Training
      5. Strong Authentication
      6. Security Logging and Alerts
      7. Continuous Penetration Testing
      8. Native Data Encryption
      9. Embedding Security in Metadata
      10. Key Management
      11. Keep a Separate Lake of Anonymized Data
  4. 3. Big Data Governance: Practicalities and Realities
    1. The Importance of Big Data Governance
    2. What Is Driving Big Data Governance?
    3. Lineage: Tools, People, and Metadata
    4. ROI and the Business Case for Big Data Governance
    5. Ownership, Stewardship, and Curation
    6. The Future of Data Governance
      1. Ethics
      2. Machine Learning
      3. Data Quality Management
      4. Data Access
  5. 4. The Goal and Architecture of a Customer Event Hub
    1. What Is a Customer Event Hub?
      1. 360-Degree Customer View versus Customer Event Hub
      2. A Customer Event Hub in Action
      3. Key Advantages for Your Business
    2. Architecture of a CEH
      1. Capturing and Integrating Events
      2. Sanitizing and Standardizing Events
      3. Delivering Data for Consumption
    3. Drift: The Key Challenge in Implementing a High-Level Architecture
    4. Ingestion Infrastructures to Combat Drift