Hadoop for Finance Essentials

Book Description

Harness big data to provide meaningful insights, analytics, and business intelligence for your financial institution

In Detail

As data volumes grow exponentially and enterprises process more data every day, Hadoop has become a popular data platform. Financial businesses want to minimize risk and maximize opportunity, and Hadoop, which largely dominates the big data market, plays a major role.

This book will get you started with the fundamentals of big data and Hadoop, helping you get to grips with solutions to many of the top financial big data use cases, including regulatory projects and fraud detection. It is packed with industry references and code templates, and is designed to walk you through a wide range of Hadoop components.

By the end of the book, you'll understand a few industry-leading architecture patterns, along with big data governance, tips, best practices, and standards for successfully developing your own Hadoop-based solution.

What You Will Learn

  • Learn about big data and Hadoop fundamentals including practical finance use cases

  • Walk through Hadoop-based finance projects with explanations of solutions, big data governance, and how to sustain Hadoop momentum

  • Develop a range of solutions for small to large-scale data projects on the Hadoop platform

  • Learn how to process big data in the cloud

  • Present practical business cases to management to scale up existing platforms at enterprise level

Downloading the Example Code

You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

    Table of Contents

    1. Hadoop for Finance Essentials
      1. Table of Contents
      2. Hadoop for Finance Essentials
      3. Credits
      4. About the Author
      5. About the Reviewers
      6. www.PacktPub.com
        1. Support files, eBooks, discount offers, and more
          1. Why subscribe?
          2. Free access for Packt account holders
      7. Preface
        1. What this book covers
        2. What you need for this book
        3. Who this book is for
        4. Conventions
        5. Reader feedback
        6. Customer support
          1. Errata
          2. Piracy
          3. Questions
      8. 1. Big Data Overview
        1. What is big data?
          1. Data volume
          2. Data velocity
          3. Data variety
        2. Big data technology evolution
          1. History
          2. Current
          3. Future
        3. The big data landscape
          1. Storage
          2. NoSQL
          3. NoSQL database types
          4. Resource management
          5. Data governance
          6. Batch computing
          7. Real-time computing
          8. Data integration tools
          9. Machine learning
          10. Business intelligence and virtualization
          11. Careers in big data
        4. Hadoop architecture
          1. HDFS cluster
          2. MapReduce V1
          3. MapReduce V2 – YARN
        5. The Hadoop jungle explained
          1. Big data tamed
          2. Hadoop – the hero
          3. HDFS – Hadoop Distributed Filesystem
            1. MapReduce
            2. HBase
            3. Hive
            4. Pig
            5. Zookeeper
            6. Oozie
            7. Flume
            8. Sqoop
        6. Hadoop distributions
          1. Distribution – on premise
          2. Distribution – cloud
        7. Summary
      9. 2. Big Data in Financial Services
        1. Big data use cases across industry sectors
          1. Healthcare
          2. Human science
          3. Telecom
          4. Online retailer
        2. Why big data in the financial sector?
        3. Big data use cases in the financial sector
          1. Data archival on HDFS
          2. Regulatory
          3. Fraud detection
          4. Tick data
          5. Risk management
          6. Customer behavior prediction
          7. Sentiment analysis – unstructured
          8. Other use cases
        4. Big data evolution in finance
        5. Big data tools – what to learn
          1. Getting your data into HDFS
          2. Querying data from HDFS
          3. SQL on Hadoop
          4. Real time
          5. Data governance and operations
          6. ETL tools
          7. Data analytics and business intelligence
        6. Big data implementations in finance
          1. The key challenges
          2. Overcoming the challenges
            1. Generate interest – play area
            2. Pilot with a low-cost project
            3. Hadoop is live – now scale it up
        7. Summary
      10. 3. Hadoop in the Cloud
        1. The big data cloud story
          1. The why
          2. The when
          3. What's the catch?
        2. Project details – risk simulations in the cloud
          1. Solution
          2. The current world
          3. The target world
            1. Data collection
            2. Configuring the Hadoop cluster
            3. Data upload
          4. Data transformation
          5. Data analysis
        3. Summary
      11. 4. Data Migration Using Hadoop
        1. Project details – archive your transaction data
          1. Solution
          2. Project Phase 1 – split trade data into DW and Hadoop
            1. The current world
            2. The target world
            3. Data collection
            4. How to do it
            5. Data analysis
              1. HDFS shell
              2. Hive queries
              3. Pig
          3. Project Phase 2 – migrate data from relational DW into Hadoop
            1. The current world
            2. The target world
            3. Data collection
              1. Check the connection to the relational database
              2. Import into Hadoop
                1. Initial data migration
                2. Periodic incremental data migration
              3. Import into Hive
            4. Data analysis
        2. Summary
      12. 5. Getting Started
        1. Project details – risk and regulatory reporting
          1. Solution
          2. The current world
          3. The target world
          4. Data collection
            1. Option 1 – Apache Oozie
            2. Option 2 – ETL tool ingestion
          5. Data transformation
            1. Hive or Pig?
            2. Hive
              1. Step 1 – Staging
              2. Step 2 – Output results
            3. Pig
              1. Step 1 – Staging
              2. Step 2 – Output results
              3. Other small use case to calculate risk – IR01
            4. Java MapReduce
          6. Data analysis
            1. BI tools
        2. Summary
      13. 6. Getting Experienced
        1. Real-time big data
        2. Project details – identifying fraudulent transactions
          1. Solution
          2. The current world
          3. The target world
          4. The Markov Chain Model execution – batch mode
            1. The Storm architecture
            2. The Spark architecture
          5. Data collection
            1. Using Storm
            2. Using Spark
          6. Data transformation
            1. Using Storm
            2. Using Spark
        3. Summary
      14. 7. Scale It Up
        1. Scale it up – actually horizontally
        2. A few more big data use cases
          1. Use case – fraud again
          2. Solution
          3. Use case – customer complaints
          4. Solution
          5. Use case – algorithm trading
          6. Solution
          7. Use case – forex trading
          8. Solution
          9. Use case – social media based trading
          10. Solution
          11. Use case – no big data
          12. Solution
        3. The data lake
        4. The lambda architecture
        5. Big data governance
          1. The Apache Falcon overview
        6. Security
        7. Summary
      15. 8. Sustain the Momentum
        1. The Hadoop distribution upgrade cycle
        2. Best practices and standards
          1. Environments
          2. Integration with the BI and ETL tools
          3. Tips
            1. Business
            2. Infrastructure
            3. Coding
        3. New trends
        4. Summary
      16. Index