Big Data Now: Current Perspectives from O'Reilly Radar

Book Description

This collection represents the full spectrum of data-related content we’ve published on O’Reilly Radar over the last year. Mike Loukides kicked things off in June 2010 with “What is data science?” and from there we’ve pursued the various threads and themes that naturally emerged. Now, roughly a year later, we can look back over all we’ve covered and identify a number of core data areas:

Data issues – The opportunities and ambiguities of the data space are evident in discussions around privacy, the implications of data-centric industries, and the debate about the phrase “data science” itself.

The application of data: products and processes – A “data product” can emerge from virtually any domain, including everything from data startups to established enterprises to media/journalism to education and research.

Data science and data tools – The tools and technologies that drive data science are of course essential to this space, but the varied techniques being applied are also key to understanding the big data arena.

The business of data – Take a closer look at the actions connected to data: the finding, organizing, and analyzing that provide organizations of all sizes with the information they need to compete.

Table of Contents

  1. Big Data Now
  2. Foreword
  3. 1. Data Science and Data Tools
    1. What is data science?
      1. What is data science?
      2. Where data comes from
      3. Working with data at scale
      4. Making data tell its story
      5. Data scientists
    2. The SMAQ stack for big data
      1. MapReduce
        1. Hadoop MapReduce
        2. Other implementations
      2. Storage
        1. Hadoop Distributed File System
        2. HBase, the Hadoop Database
        3. Hive
        4. Cassandra and Hypertable
        5. NoSQL database implementations of MapReduce
        6. Integration with SQL databases
        7. Integration with streaming data sources
        8. Commercial SMAQ solutions
      3. Query
        1. Pig
        2. Hive
        3. Cascading, the API Approach
        4. Search with Solr
      4. Conclusion
    3. Scraping, cleaning, and selling big data
    4. Data hand tools
    5. Hadoop: What it is, how it works, and what it can do
    6. Four free data tools for journalists (and snoops)
      1. WHOIS
      2. Blekko
      3. bit.ly
      4. Compete
    7. The quiet rise of machine learning
    8. Where the semantic web stumbled, linked data will succeed
    9. Social data is an oracle waiting for a question
    10. The challenges of streaming real-time data
  4. 2. Data Issues
    1. Why the term “data science” is flawed but useful
      1. It’s not a real science
      2. It’s an unnecessary label
      3. The name doesn’t even make sense
      4. There’s no definition
      5. Time for the community to rally
    2. Why you can’t really anonymize your data
      1. Keep the anonymization
      2. Acknowledge there’s a risk of de-anonymization
      3. Limit the detail
      4. Learn from the experts
    3. Big data and the semantic web
      1. Google and the semantic web
      2. Metadata is hard: big data can help
    4. Big data: Global good or zero-sum arms race?
    5. The truth about data: Once it’s out there, it’s hard to control
  5. 3. The Application of Data: Products and Processes
    1. How the Library of Congress is building the Twitter archive
    2. Data journalism, data tools, and the newsroom stack
      1. Data journalism and data tools
      2. The newsroom stack
      3. Bridging the data divide
    3. The data analysis path is built on curiosity, followed by action
    4. How data and analytics can improve education
    5. Data science is a pipeline between academic disciplines
    6. Big data and open source unlock genetic secrets
    7. Visualization deconstructed: Mapping Facebook’s friendships
      1. Mapping Facebook’s friendships
      2. Static requires storytelling
    8. Data science democratized
  6. 4. The Business of Data
    1. There’s no such thing as big data
      1. Big data and the innovator’s dilemma
    2. Building data startups: Fast, big, and focused
      1. Setting the stage: The attack of the exponentials
      2. Leveraging the big data stack
      3. Fast data
      4. Big analytics
      5. Focused services
      6. Democratizing big data
    3. Data markets aren’t coming: They’re already here
    4. An iTunes model for data
    5. Data is a currency
    6. Big data: An opportunity in search of a metaphor
    7. Data and the human-machine connection
  7. Copyright