Big Data Analytics: Turning Big Data into Big Money

By Frank J. Ohlhorst. Published by John Wiley & Sons.
  1. Cover
  2. Contents
  3. Title
  4. Copyright
  5. Preface
  6. Acknowledgments
  7. Chapter 1: What is Big Data?
    1. The Arrival of Analytics
    2. Where is the Value?
    3. More to Big Data Than Meets the Eye
    4. Dealing with the Nuances of Big Data
    5. An Open Source Brings Forth Tools
    6. Caution: Obstacles Ahead
  8. Chapter 2: Why Big Data Matters
    1. Big Data Reaches Deep
    2. Obstacles Remain
    3. Data Continue to Evolve
    4. Data and Data Analysis are Getting More Complex
    5. The Future is Now
  9. Chapter 3: Big Data and the Business Case
    1. Realizing Value
    2. The Case for Big Data
    3. The Rise of Big Data Options
    4. Beyond Hadoop
    5. With Choice Come Decisions
  10. Chapter 4: Building the Big Data Team
    1. The Data Scientist
    2. The Team Challenge
    3. Different Teams, Different Goals
    4. Don’t Forget the Data
    5. Challenges Remain
    6. Teams versus Culture
    7. Gauging Success
  11. Chapter 5: Big Data Sources
    1. Hunting for Data
    2. Setting the Goal
    3. Big Data Sources Growing
    4. Diving Deeper into Big Data Sources
    5. A Wealth of Public Information
    6. Getting Started with Big Data Acquisition
    7. Ongoing Growth, No End in Sight
  12. Chapter 6: The Nuts and Bolts of Big Data
    1. The Storage Dilemma
    2. Building a Platform
    3. Bringing Structure to Unstructured Data
    4. Processing Power
    5. Choosing among In-house, Outsourced, or Hybrid Approaches
  13. Chapter 7: Security, Compliance, Auditing, and Protection
    1. Pragmatic Steps to Securing Big Data
    2. Classifying Data
    3. Protecting Big Data Analytics
    4. Big Data and Compliance
    5. The Intellectual Property Challenge
  14. Chapter 8: The Evolution of Big Data
    1. Big Data: The Modern Era
    2. Today, Tomorrow, and the Next Day
    3. Changing Algorithms
  15. Chapter 9: Best Practices for Big Data Analytics
    1. Start Small with Big Data
    2. Thinking Big
    3. Avoiding Worst Practices
    4. Baby Steps
    5. The Value of Anomalies
    6. Expediency versus Accuracy
    7. In-Memory Processing
  16. Chapter 10: Bringing it All Together
    1. The Path to Big Data
    2. The Realities of Thinking Big Data
    3. Hands-on Big Data
    4. The Big Data Pipeline in Depth
    5. Big Data Visualization
    6. Big Data Privacy
  17. Appendix: Supporting Data
    1. “The MapR Distribution for Apache Hadoop”
    2. “High Availability: No Single Points of Failure”
  18. About the Author
  19. Index

Chapter 5

Big Data Sources

One of the biggest challenges for most organizations is finding data sources to use as part of their analytics processes. As the name implies, Big Data is large, but size is not the only concern. There are several other considerations when deciding how to locate and parse Big Data sets.

The first step is to identify usable data. While that may be obvious, it is anything but simple. Locating the appropriate data to push through an analytics platform can be complex and frustrating. Each source must be vetted to determine whether its data set is appropriate for use, which often amounts to detective work or investigative reporting.

Considerations should include the following:

  • Structure of the data (structured, unstructured, semistructured, table based, proprietary)
  • Source of the data (internal, external, private, public)
  • Value of the data (generic, unique, specialized)
  • Quality of the data (verified, static, streaming)
  • Storage of the data (remotely accessed, shared, dedicated platforms, portable)
  • Relationship of the data (superset, subset, correlated)
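One way to make the checklist above concrete is to record each candidate source's profile as a small data structure, so that sources can be compared and screened consistently. The sketch below is hypothetical: the class names and the `needs_scrubbing` rule are illustrative assumptions, not the author's method. The enumeration values simply mirror the categories in the list.

```python
from dataclasses import dataclass
from enum import Enum


# Each enum mirrors one consideration from the checklist.
class Structure(Enum):
    STRUCTURED = "structured"
    UNSTRUCTURED = "unstructured"
    SEMISTRUCTURED = "semistructured"
    TABLE_BASED = "table based"
    PROPRIETARY = "proprietary"


class Origin(Enum):
    INTERNAL = "internal"
    EXTERNAL = "external"
    PRIVATE = "private"
    PUBLIC = "public"


class Value(Enum):
    GENERIC = "generic"
    UNIQUE = "unique"
    SPECIALIZED = "specialized"


class Quality(Enum):
    VERIFIED = "verified"
    STATIC = "static"
    STREAMING = "streaming"


class Storage(Enum):
    REMOTE = "remotely accessed"
    SHARED = "shared"
    DEDICATED = "dedicated platform"
    PORTABLE = "portable"


class Relationship(Enum):
    SUPERSET = "superset"
    SUBSET = "subset"
    CORRELATED = "correlated"


@dataclass(frozen=True)
class SourceProfile:
    """Hypothetical record describing one candidate data source."""
    name: str
    structure: Structure
    origin: Origin
    value: Value
    quality: Quality
    storage: Storage
    relationship: Relationship

    def needs_scrubbing(self) -> bool:
        # Illustrative rule (an assumption, not from the book): flag any
        # source that is not already structured or not verified.
        return (self.structure is not Structure.STRUCTURED
                or self.quality is not Quality.VERIFIED)


logs = SourceProfile("web server logs", Structure.SEMISTRUCTURED,
                     Origin.INTERNAL, Value.GENERIC, Quality.STREAMING,
                     Storage.DEDICATED, Relationship.CORRELATED)
print(logs.needs_scrubbing())  # True
```

In practice the screening rule would encode whatever criteria an organization actually uses; the point of the structure is that every source is described along the same dimensions.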

All of those elements, and many others, can influence the selection process and can dramatically affect how the raw data are prepared (“scrubbed”) before the analytics process takes place.
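The text does not prescribe a particular scrubbing routine. As a minimal sketch, assuming raw records arrive as Python dictionaries, the following shows a few typical preparation steps: trimming whitespace, treating blank strings as missing, dropping records that lack required fields, and removing duplicates by a key. The field names `customer_id` and `email` are hypothetical.

```python
from collections.abc import Iterable


def scrub(records: Iterable[dict], required: tuple[str, ...], key: str) -> list[dict]:
    """Return cleaned copies of the records (the inputs are not modified)."""
    seen = set()
    cleaned = []
    for raw in records:
        row = {}
        for field, value in raw.items():
            if isinstance(value, str):
                value = value.strip()          # trim stray whitespace
                if value == "":
                    value = None               # treat blanks as missing
            row[field] = value
        # Drop rows missing any required field or the deduplication key.
        if any(row.get(f) is None for f in (*required, key)):
            continue
        # Keep only the first occurrence of each key value.
        if row[key] in seen:
            continue
        seen.add(row[key])
        cleaned.append(row)
    return cleaned


raw = [
    {"customer_id": " C001 ", "email": "a@example.com"},
    {"customer_id": "C001", "email": "a@example.com"},   # duplicate after trimming
    {"customer_id": "C002", "email": "  "},              # blank email, dropped
    {"customer_id": "C003", "email": "c@example.com"},
]
print(scrub(raw, required=("email",), key="customer_id"))
```

Real scrubbing pipelines usually add source-specific steps such as type conversion, format normalization, and validation against reference data; which steps apply depends on the considerations listed above.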

In the IT realm, once a data source is located, the next step is to import the data into an appropriate platform. That process may be as simple as copying data onto a Hadoop cluster or as complicated as scrubbing, indexing, ...
