Strata + Hadoop World New York 2015: Video Compilation

Video Description

The future belongs to those who know how to use data

Whether you want to build a company that will prosper well into the future, or simply do your job better, you’ll want to dive into this complete video compilation of Strata + Hadoop World 2015 in New York, presented by O’Reilly and Cloudera. You’ll have every keynote, tutorial, and workshop from big data’s most influential business decision makers, strategists, architects, developers, and analysts right at your fingertips.

With these videos, you’ll learn proven best practices, new analytic approaches, and the exceptional technical skills you need to be successful with data.

You’ll explore a treasure trove of presentations on topics including:

  • Data-driven business
  • Data science & advanced analytics
  • Data innovations
  • Hadoop use cases
  • Hadoop internals & development
  • Law, ethics & open data
  • IoT & real-time
  • Organizational changes
  • Security & governance
  • Spark & beyond
  • Design, user experience & visualization
  • Production ready Hadoop

Table of Contents

  1. Keynotes
    1. The next generation - Mike Olson (Cloudera) 00:15:35
    2. Playing with, and for, data - AnnMarie Thomas (School of Engineering and Schulze School of Entrepreneurship, University of St. Thomas) 00:08:24
    3. What 0-50 million users in 7 days can teach us about big data - Joseph Sirosh (Microsoft) 00:10:48
    4. Improving Medical Decision Making with Predictive Analytics on Big Data - Ron Kasabian (Intel) and Michael Draugelis (Penn Medicine) 00:04:53
    5. The race to modernize BI: What it is and why so urgent? - Tim Howes (ClearStory Data) 00:05:02
    6. Unleashing the power of big data today - Jim McHugh (Cisco) 00:05:34
    7. A Transition to Interactive Music Consumption + Data - Joy Johnson (AudioCommon) 00:08:46
    8. Data vs creativity: The last battleground? - David Boyle (BBC Worldwide) 00:08:36
    9. On reflection: What the White House needs from you - DJ Patil (White House Office of Science and Technology Policy) 00:13:39
    10. Improving decisions - Katherine Milkman 00:15:04
    11. O'Reilly Announcements - Ben Lorica (O'Reilly Media) 00:01:28
    12. Context Computing - Jeff Jonas (IBM) 00:15:00
    13. Data science for mission - Doug Wolfe (CIA) 00:10:31
    14. The rise of the citizen data scientist - Ben Werther (Platfora) 00:05:13
    15. Patterns from the future - Paul Kent (SAS) 00:05:47
    16. Doing it Wrong: 10 Problems with Qualitative Data - Farrah Bostic (The Difference Engine) 00:14:53
    17. IBM sponsored keynote - Shivakumar Vaithyanathan (IBM) 00:04:57
    18. What does it take to apply data science for social good? - Jake Porway (DataKind) 00:11:59
    19. Haunted by data - Maciej Ceglowski (Pinboard.in) 00:20:05
  2. Business & Innovation conference sessions
    1. Data 101 - Paco Nathan (O'Reilly Media) 00:15:38
    2. Distributed systems in one lesson - Tim Berglund (DataStax) 00:32:41
    3. The business case for Spark, Kafka, and friends - Edd Dumbill (Silicon Valley Data Science) 00:36:39
    4. How to build your data team: Lessons from unicorn hunting in the wild - Katie Kent (Galvanize) 00:31:50
    5. How to use your data science team: Becoming a data-driven organization - Yael Garten (LinkedIn) 00:28:24
    6. Great debate: Big data will live in the cloud - Alistair Croll (Solve For Interesting), Joseph Adler (Confluent), Margaret Dawson (Red Hat), Joseph Sirosh (Microsoft), Evan Prodromou (Fuzzy.io) 00:34:57
  3. Data Innovations conference sessions
    1. Many streams lead to Kafka - An event data workshop, Part 1 - Jesse Anderson (Confluent), Ewen Cheslack-Postava (Confluent) 00:45:43
    2. Many streams lead to Kafka - An event data workshop, Part 2 - Jesse Anderson (Confluent), Ewen Cheslack-Postava (Confluent) 00:42:15
    3. Many streams lead to Kafka - An event data workshop, Part 3 - Jesse Anderson (Confluent), Ewen Cheslack-Postava (Confluent) 00:59:04
    4. Big data at a crossroads: Time to go meta (on use) - Joe Hellerstein (UC Berkeley) 00:40:45
    5. How companies are using Tachyon, a memory-centric distributed storage - Haoyuan Li (Tachyon Nexus, Inc.) 00:32:17
    6. Data liberation and data integration with Kafka - Martin Kleppmann (Independent) 00:41:33
    7. Real-time analytics with Solr - Yonik Seeley (Cloudera) 00:38:02
    8. Big data at Netflix: Faster and easier - Kurt Brown (Netflix) 00:42:42
    9. Copycat: Fault tolerant streaming data ingestion powered by Apache Kafka - Neha Narkhede (Confluent) 00:39:24
    10. Calculating high-resolution, global-scale geospatial analytics with MapReduce Geospatial - Ryan Smith (DigitalGlobe) 00:25:08
    11. Considerations for building a cognitive application - Venky Ganti (Alation) 00:31:46
  4. Data Science & Advanced Analytics conference sessions
    1. Data science for Wall Street, Part 1 - Sean Owen (Cloudera), Juliet Hougland (Cloudera), Sandy Ryza (Cloudera) 00:50:05
    2. Data science for Wall Street, Part 2 - Sean Owen (Cloudera), Juliet Hougland (Cloudera), Sandy Ryza (Cloudera) 00:41:56
    3. Data science for Wall Street, Part 3 - Sean Owen (Cloudera), Juliet Hougland (Cloudera), Sandy Ryza (Cloudera) 00:50:12
    4. Machine Learning 101, Part 1 - Alice Zheng (Dato), Chris DuBois (Dato), Piotr Teterwak (Dato), Srikrishna Sridhar (Dato) 00:51:22
    5. Machine Learning 101, Part 2 - Alice Zheng (Dato), Chris DuBois (Dato), Piotr Teterwak (Dato), Srikrishna Sridhar (Dato) 00:34:45
    6. Machine Learning 101, Part 3 - Alice Zheng (Dato), Chris DuBois (Dato), Piotr Teterwak (Dato), Srikrishna Sridhar (Dato) 00:51:00
    7. Machine Learning 101, Part 4 - Alice Zheng (Dato), Chris DuBois (Dato), Piotr Teterwak (Dato), Srikrishna Sridhar (Dato) 00:32:03
    8. Machine Learning 101, Part 5 - Alice Zheng (Dato), Chris DuBois (Dato), Piotr Teterwak (Dato), Srikrishna Sridhar (Dato) 00:43:56
    9. Machine Learning 101, Part 6 - Alice Zheng (Dato), Chris DuBois (Dato), Piotr Teterwak (Dato), Srikrishna Sridhar (Dato) 00:39:28
    10. Machine Learning 101, Part 7 - Alice Zheng (Dato), Chris DuBois (Dato), Piotr Teterwak (Dato), Srikrishna Sridhar (Dato) 00:35:03
    11. Scaling Python Analytics on Impala - Wes McKinney (Cloudera) 00:45:31
    12. Mapping Big Data: A Data Driven Market Report - Russell Jurney (Relato) 00:22:58
    13. Queering Quant: How Having All the Data Isn’t Enough to Represent a Complex Social Phenomena - Lauralea Banks Edwards (Washington State University) 00:21:08
    14. Data Science in the Wall Street Journal - Juan Huerta (Dow Jones) 00:54:06
    15. Data modeling for data science: Simplify your workload with complex types - Marcel Kornacker (Cloudera, Inc.), Josh Wills (Cloudera), Alexander Behm (Cloudera) 00:47:28
    16. Running experiments with logged-out users: Solving the mixed group problem - Raphael Lee (Airbnb), Victor Vazquez (Airbnb) 00:36:40
    17. How data science helps prevent churn at Avira, a 100-million user company - Iulia Pasov (Avira), Calin-Andrei Burloiu (Avira) 00:19:35
    18. Probabilistic programming in data science - Thomas Wiecki (Quantopian) 00:22:17
    19. Tackling machine learning complexity for data curation - Ihab Ilyas (Tamr, Inc.) 00:15:36
    20. Learning to love Bayesian statistics - Allen Downey (Olin College of Engineering) 00:18:35
  5. Design, User Experience, & Visualization conference sessions
    1. Introduction to visualizations using D3, Part 1 - Brian Suda ((optional.is)) 00:46:07
    2. Introduction to visualizations using D3, Part 2 - Brian Suda ((optional.is)) 00:53:28
    3. Introduction to visualizations using D3, Part 3 - Brian Suda ((optional.is)) 00:38:26
    4. Introduction to visualizations using D3, Part 4 - Brian Suda ((optional.is)) 00:30:55
    5. Value in the details - understanding data through visual exploration - Richard Brath (Uncharted Software), Rob Harper (Uncharted) 00:37:12
    6. Data inclusion for all - Alex Kelly (General Motors), Kim Le (General Motors) 00:28:51
    7. Visualising Music Services - Alan Hannaway (7digital) 00:38:09
    8. From profiling to analysis: Designing visualization tools for purpose - Jeffrey Heer (Trifacta | University of Washington), Jock Mackinlay (Tableau) 00:38:18
    9. What have you done!? How to visualize methods and models for decision makers - Michael Freeman (University of Washington) 00:37:21
    10. LIVE from New York: An introduction to Linked Immersive Visualization Environments - Margit Zwemer (LiquidLandscape) 00:33:46
    11. Data, Design, and Organizations: Design thinking and prototyping approaches to data challenges in orgs - Peter Olson (IDEO), David Boardman (IDEO) 00:38:58
    12. Designing happiness with data - Pamela Pavliscak (Change Sciences) 00:33:12
  6. Hadoop Internals & Development conference sessions
    1. Hadoop application architectures: Fraud detection, Part 1 - Gwen Shapira (Confluent), Ted Malaska (Cloudera), Mark Grover (Cloudera), Jonathan Seidman (Cloudera) 00:48:04
    2. Hadoop application architectures: Fraud detection, Part 2 - Gwen Shapira (Confluent), Ted Malaska (Cloudera), Mark Grover (Cloudera), Jonathan Seidman (Cloudera) 00:41:33
    3. Hadoop application architectures: Fraud detection, Part 3 - Gwen Shapira (Confluent), Ted Malaska (Cloudera), Mark Grover (Cloudera), Jonathan Seidman (Cloudera) 00:38:13
    4. Hadoop application architectures: Fraud detection, Part 4 - Gwen Shapira (Confluent), Ted Malaska (Cloudera), Mark Grover (Cloudera), Jonathan Seidman (Cloudera) 00:49:54
    5. Simplifying Hadoop: RecordService, a secure and unified data access path for compute frameworks - Lenni Kuff (Cloudera), Nong Li (Cloudera), Stephen Romanoff (Capital One) 00:39:39
    6. Hadoop's storage gap: Resolving transactional access/analytic performance trade-offs with Kudu - Todd Lipcon (Cloudera) and Binglin Chang (Xiaomi) 00:39:32
    7. Native erasure coding support inside HDFS - Zhe Zhang (Cloudera), Weihua Jiang (Intel) 00:40:27
    8. Transaction processing with Apache Hive, HBase, and Phoenix - Alan Gates (Hortonworks) 00:40:59
    9. OLTP on Hadoop: Reviewing the first Hadoop-based TPC-C benchmarks - Monte Zweben (Splice Machine Inc.), John Leach (Splice Machine) 01:02:30
    10. What does it mean to virtualize the Hadoop distributed file system? - Thomas Phelan (BlueData) 00:41:57
    11. HDFS operations made easy: Guide to the improved, full service HDFS File Browser - Ravi Prakash (Altiscale) 00:22:39
  7. IoT & Real-time conference sessions
    1. Process, store, and analyze like a boss with Team Apache: Kafka, Spark, and Cassandra, Part 1 - Patrick McFadin (DataStax) 00:46:38
    2. Process, store, and analyze like a boss with Team Apache: Kafka, Spark, and Cassandra, Part 2 - Patrick McFadin (DataStax) 00:35:38
    3. Process, store, and analyze like a boss with Team Apache: Kafka, Spark, and Cassandra, Part 3 - Patrick McFadin (DataStax) 00:32:48
    4. Process, store, and analyze like a boss with Team Apache: Kafka, Spark, and Cassandra, Part 4 - Patrick McFadin (DataStax) 00:27:44
    5. When it absolutely, positively, has to be there: Reliability guarantees in Kafka - Gwen Shapira (Confluent), Jeff Holoman (Cloudera) 00:42:25
    6. What does your smart device know about you? - Charles Givre (Booz | Allen | Hamilton) 00:41:55
    7. Twitter Heron: Stream processing at scale - Karthik Ramasamy (Twitter) 00:43:08
    8. Streaming in the extreme - Jim Scott (MapR Technologies, Inc.) 00:39:01
    9. IoT with Spark Streaming: Practical lessons from real-world use cases - Hari Shreedharan (Cloudera), Anand Iyer (Cloudera) 00:40:22
    10. An open source approach to gathering and analyzing device-sourced health data - Ian Eslick (VitalLabs) 00:38:16
    11. Elastic stream processing without tears - Michael Hausenblas (Mesosphere) 00:29:33
    12. Modeling predictive maintenance applications in the IoT Era - Yan Zhang (Microsoft) 00:36:38
    13. Building a real-time analytics stack with Kafka, Samza, and Druid - Fangjin Yang (Imply), Gian Merlino (Stealth) 00:42:01
    14. Oulu Smart City pilot - Susanna Pirttikangas (University of Oulu) 00:36:16
  8. Production Ready Hadoop conference sessions
    1. Apache Hadoop operations for production systems, Part 1 - Kathleen Ting (Cloudera), Miklos Christine (Databricks), Philip Zeyliger (Cloudera), Philip Langdale (Cloudera, Inc.) 00:50:42
    2. Apache Hadoop operations for production systems, Part 2 - Kathleen Ting (Cloudera), Miklos Christine (Databricks), Philip Zeyliger (Cloudera), Philip Langdale (Cloudera, Inc.) 00:34:31
    3. Apache Hadoop operations for production systems, Part 3 - Kathleen Ting (Cloudera), Miklos Christine (Databricks), Philip Zeyliger (Cloudera), Philip Langdale (Cloudera, Inc.) 00:46:02
    4. Apache Hadoop operations for production systems, Part 4 - Kathleen Ting (Cloudera), Miklos Christine (Databricks), Philip Zeyliger (Cloudera), Philip Langdale (Cloudera, Inc.) 00:42:46
    5. Apache Hadoop operations for production systems, Part 5 - Kathleen Ting (Cloudera), Miklos Christine (Databricks), Philip Zeyliger (Cloudera), Philip Langdale (Cloudera, Inc.) 00:55:11
    6. Apache Hadoop operations for production systems, Part 6 - Kathleen Ting (Cloudera), Miklos Christine (Databricks), Philip Zeyliger (Cloudera), Philip Langdale (Cloudera, Inc.) 00:26:47
    7. Apache Hadoop operations for production systems, Part 7 - Kathleen Ting (Cloudera), Miklos Christine (Databricks), Philip Zeyliger (Cloudera), Philip Langdale (Cloudera, Inc.) 00:49:31
    8. Apache Hadoop operations for production systems, Part 8 - Kathleen Ting (Cloudera), Miklos Christine (Databricks), Philip Zeyliger (Cloudera), Philip Langdale (Cloudera, Inc.) 00:33:12
    9. Building a Hadoop data application, Part 1 - Tom White (Cloudera), Ryan Blue (Cloudera) 00:54:26
    10. Building a Hadoop data application, Part 2 - Tom White (Cloudera), Ryan Blue (Cloudera) 00:34:31
    11. Building a Hadoop data application, Part 3 - Tom White (Cloudera), Ryan Blue (Cloudera) 00:46:41
    12. Hadoop in the cloud: An architectural how-to - Jairam Ranganathan (Cloudera) 00:38:10
    13. Multi-tenant, multi-cluster, and multi-container Apache HBase deployment - Jonathan Hsieh (Cloudera, Inc.), Dima Spivak (Cloudera) 00:39:28
    14. The glue: Building the connectors and tools to manage big data warehouses - Siwei Zhu (Scribd), Kevin Perko (Scribd) 00:36:58
    15. Failing fast and falling often is no way to run a cluster! - Michael Segel (Segel & Associates) 00:36:50
    16. Real-world NoSQL schema design - Ted Dunning (MapR Technologies) 00:42:44
    17. Hadoop's storage gap: Resolving transactional access/analytic performance trade-offs with Kudu - Todd Lipcon (Cloudera, Inc.) 00:39:32
  9. Spark & Beyond conference sessions
    1. Apache Drill bootcamp, Part 1 - Tomer Shiran (Dremio), Jacques Nadeau (Dremio) 00:42:37
    2. Apache Drill bootcamp, Part 2 - Tomer Shiran (Dremio), Jacques Nadeau (Dremio) 00:30:58
    3. Apache Drill bootcamp, Part 3 - Tomer Shiran (Dremio), Jacques Nadeau (Dremio) 00:45:40
    4. Apache Drill bootcamp, Part 4 - Tomer Shiran (Dremio), Jacques Nadeau (Dremio) 00:41:23
    5. Architecting a data platform, Part 1 - Stephen OSullivan (Silicon Valley Data Science), John Akred (Silicon Valley Data Science), Gary Dusbabek (Silicon Valley Data Science) 00:46:28
    6. Architecting a data platform, Part 2 - Stephen OSullivan (Silicon Valley Data Science), John Akred (Silicon Valley Data Science), Gary Dusbabek (Silicon Valley Data Science) 00:46:43
    7. Architecting a data platform, Part 3 - Stephen OSullivan (Silicon Valley Data Science), John Akred (Silicon Valley Data Science), Gary Dusbabek (Silicon Valley Data Science) 00:44:53
    8. Architecting a data platform, Part 4 - Stephen OSullivan (Silicon Valley Data Science), John Akred (Silicon Valley Data Science), Gary Dusbabek (Silicon Valley Data Science) 00:36:31
    9. What's coming for the Spark community - Patrick Wendell (Databricks) 00:49:26
    10. Supercharging R with Spark for end-to-end data science - Hossein Falaki (Databricks Inc.) 00:40:08
    11. Next-generation genomics analysis with Apache Spark - Timothy Danford (Tamr, Inc.) 00:38:50
    12. Lifelogging for insights - Håkan Jonsson (Sony Mobile Communications) 00:39:26
    13. Effective testing of Spark programs and jobs - Holden Karau (IBM) 00:33:13
    14. Estimating financial risk with Apache Spark - Sandy Ryza (Cloudera) 00:35:48
    15. Netflix: Integrating Spark at petabyte scale - Daniel Weeks (Netflix) 00:38:07
    16. First-ever scalable, distributed deep learning architecture using Spark and Tachyon - Christopher Nguyen (Adatao, Inc.), Vu Pham (Adatao, Inc.), Michael Bui (Adatao, Inc.) 00:37:42
    17. Spark on Mesos - Dean Wampler (Typesafe) 00:37:29
    18. How Spark is working out at Comcast scale - Sridhar Alla (Comcast), Jan Neumann (Comcast) 00:44:58
  10. Financial Services conference sessions
    1. Big data governance - Steven Totman (Cloudera), Mark Donsky (Cloudera), Kristi Cunningham (Capital One), Ben Harden (CapTech Consulting) 00:42:12
    2. Continuous curation of event data for a customer event hub - Arvind Prabhakar (StreamSets) 00:40:27
    3. Ethical big data - what's legal and what's right - Steven Totman (Cloudera), Sam Heywood (Cloudera), Nick Curcuru (MasterCard Advisors) 00:40:46
    4. Hadoop and self-service analytics: Allstate Insurance's journey - Kristi Marotta (Allstate) 00:12:41
    5. Hive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloud, a real-world case study - Jaipaul Agonus (FINRA) 00:42:52
    6. How women are conquering the S&P 500 - Karen Rubin (Quantopian) 00:32:37
    7. Preventing a big data security breach - Sam Heywood (Cloudera), Nick Curcuru (MasterCard Advisors), Ritu Kama (Intel) 00:39:23
  11. Data-driven Business conference sessions
    1. Welcome to data-driven business day - Alistair Croll (Solve For Interesting) 00:14:42
    2. Hacking the bias - Farrah Bostic (The Difference Engine) 00:28:46
    3. Beer, diapers, and correlation: A tale of ambiguity - Mark Madsen (Third Nature) 00:31:19
    4. How big data is creating a new breed of CFOs - Krish Venkataraman (Syncsort) 00:19:44
    5. Bringing the human dimension to data: A case study on transforming research at O’Reilly Media - Tricia Wang (Constellate Data), Matt LeMay (Constellate Data) 00:18:23
    6. Human-in-the-loop-computing-as-a-service - Adam Devine (WorkFusion) 00:20:21
    7. Farming in the 21st century and beyond - Gary Short (Duncodin Limited) 00:20:31
    8. One trillion streams and counting - Alexander White (Next Big Sound) 00:17:24
    9. Building an insight machine - Matthew Granade (Domino Data Lab) 00:15:08
    10. When AI joins the team: Onboarding the next generation of employees - Jana Eggers (Nara Logics) 00:17:34
    11. Developing a modern enterprise data strategy, Part 1 - Scott Kurth (Silicon Valley Data Science), Edd Dumbill (Silicon Valley Data Science) 00:47:28
    12. Developing a modern enterprise data strategy, Part 2 - Scott Kurth (Silicon Valley Data Science), Edd Dumbill (Silicon Valley Data Science) 00:59:07
    13. Developing a modern enterprise data strategy, Part 3 - Scott Kurth (Silicon Valley Data Science), Edd Dumbill (Silicon Valley Data Science) 00:56:34
    14. Big Data is Not Enough - Rahel Jhirad (Hearst) 00:17:54
    15. Death of the click: How big data is killing your favorite metrics - Claudia Perlich (Dstillery) 00:35:27
    16. How a global entertainment company successfully built a data lake for continued digital dominance - Joe Caserta (Caserta Concepts), Elliott Cordo (Caserta Concepts, LLC) 00:31:54
    17. How Hadoop is powering Walmart’s data-driven business - Jeremy King (Walmart Global eCommerce) 00:39:03
    18. What can Big Pharma teach us about Wall Street? What can Wall Street teach us about Big Pharma? - Joe Klobusicky (Geisinger Health System), Ali Habib (Northwestern Feinberg School of Medicine), Ekaterina Volkova (Cornell University) 00:35:27
    19. Your data is screaming at you. Learn to listen through customer choice modeling - Vivek Farias (Celect) 00:41:04
    20. Science fiction to product: Data-driven development - Micha Gorelick (Fast Forward Labs) 00:34:27
  12. PyData at Strata
    1. How to Build a Company on Open Source - Travis Oliphant (Continuum Analytics, Inc.), Peter Wang (Continuum Analytics, Inc.) 00:45:42
    2. How to Build Publishing & On-Demand Learning Environments with IPython - Kyle Kelley (Rackspace), Andrew Odewahn (O'Reilly Media) 00:35:58
    3. How to Use Pandas for Data Analysis, Part 1 - Jeff Reback (Continuum Analytics, Inc.) 00:34:04
    4. How to Use Pandas for Data Analysis, Part 2 - Jeff Reback (Continuum Analytics, Inc.) 00:36:45
    5. How to Create Beautiful Visualizations with Bokeh, Part 1 - Sarah Bird (Aptivate), Bryan Van de Ven (Continuum Analytics) 00:50:18
    6. How to Create Beautiful Visualizations with Bokeh, Part 2 - Bryan Van de Ven (Continuum Analytics), Sarah Bird (Aptivate) 00:24:02
    7. How to Build Big Data Workflows - Andy Terrel (Fashion Metric), Ben Zaitlen (Continuum Analytics, Inc.) 00:37:09
    8. How to Solve Problems in Geophysics with Python - Paige Bailey (Chevron) 00:44:22
    9. How to Leverage the Blaze Ecosystem - Jim Crist (Continuum Analytics) & Phil Cloud (Continuum Analytics) 00:43:37
    10. How to Think About Python - James Powell (NumFOCUS) 00:35:06
    11. Introduction to Publication Quality Plotting with Matplotlib - Damon McDougall (UT Austin), Michael Droettboom (Space Telescope Science Institute) 00:41:40
    12. Interactive Computing in the Jupyter Notebook – Present and Future - Jason Grout (Bloomberg L.P.), Chris Colbert (Continuum Analytics) 00:41:55
  13. R Day at Strata
    1. R Quickstart: Wrangle, transform, and visualize data, Part 1 - Garrett Grolemund (RStudio) 00:43:13
    2. R Quickstart: Wrangle, transform, and visualize data, Part 2 - Garrett Grolemund (RStudio) 00:32:17
    3. Work with Big Data in R - Nathan Stephens (RStudio, Inc.) 00:57:22
    4. Reproducible Reports with Big Data, Part 1 - Yihui Xie (RStudio, Inc.) 00:35:13
    5. Reproducible Reports with Big Data, Part 2 - Yihui Xie (RStudio, Inc.) 00:27:45
    6. Interactive Shiny Applications built on Big Data, Part 1 - Garrett Grolemund (RStudio) 00:42:18
    7. Interactive Shiny Applications built on Big Data, Part 2 - Garrett Grolemund (RStudio) 00:28:14
  14. Hardcore Data Science conference sessions
    1. GPU/CPU acceleration for matrix computations and neural networks on Spark - Reza Zadeh (Stanford University) 00:26:31
    2. Tensor methods: A new paradigm for training probabilistic models and for feature learning - Anima Anandkumar (UC Irvine) 00:30:09
    3. Once upon a graph: Getting from now to then in massive networks - Jennifer Chayes (Microsoft Research) 00:29:11
    4. How ubiquitous computing is transforming the treatment of mental health disorders - Tanzeem Choudhury (Cornell and HealthRhythms) 00:31:39
    5. Crowdsourcing your data - Jenn Wortman Vaughan (Microsoft Research) 00:28:06
    6. Minds and machines: Humans where they're best, robots for the rest - Adam Marcus (Unlimited Labs) 00:23:08
    7. Submodularity in Machine Learning - Stefanie Jegelka (M.I.T.) 00:31:24
    8. Learning with Counts: Extreme-scale featurization made easy - Mikhail Bilenko (Microsoft) 00:27:30
    9. Sketching big data with Spark: Randomized algorithms for large-scale data analytics - Reynold Xin (Databricks) 00:27:46
  15. I+G conference sessions
    1. The importance of UI design for data - Ari Gesher (Palantir Technologies) 00:27:15
    2. Hacking to make a safer enterprise - Cack Wilhelm (Scale Venture Partners), Alex Rice (HackerOne) 00:25:20
    3. Team collaboration and document storage (in the cloud) using searchable encryption - Matthew Tamayo-Rios (Kryptnostic) 00:16:21
    4. Context will define 21st century logistics - Chris Wake (Spire Global, Inc.) 00:15:15
    5. Talking to your digital customers - Ann Johnson (Interana) 00:26:39
    6. First name Amy, last name @x.ai - Dennis Mortensen (x.ai) 00:20:50
    7. Unconventional AI, unconventional results - Peter Brodsky (HyperScience) 00:11:08
    8. Smart machines - and what they can still learn from people - Gary Marcus (Geometric Intelligence) 00:26:48
    9. Why machine learning matters, mining data for performance: Lessons from Formula 1 - Shivon Zilis (Bloomberg Beta), Jacomo Corbo (QuantumBlack) 00:29:52
    10. How global crowd sourcing is flattening finance - Eva Ho (Susa Ventures), Jessica Stauth (Quantopian) 00:28:06
    11. Data points and emerging commerce - Harper Reed (Modest) 00:23:16
  16. Hadoop Use Cases conference sessions
    1. The Jedi Masters Guide to Wrangling JSON - Greg Rahn (Snowflake Computing) 00:37:59
    2. Transitioning from reactive to proactive: Etsy's data platform team - Melissa Santos (Etsy) 00:36:31
    3. The data-driven future of biotechnology - Aaron Kimball (Zymergen, Inc.) 00:35:22
    4. Migrating workloads from data warehouses to Hadoop - Alan Choi (Cloudera) 00:41:38
    5. Leverage data analytics to reduce human space mission risks - Haden Land (Lockheed Martin IS&GS), Jason Loveland (Lockheed Martin) 00:34:54
    6. Data and music: How India’s music streaming service uses big data to address a 1 billion-user market - Sriranjan Manjunath (Saavn Inc), Rahul Saxena (Saavn) 00:43:15
    7. Big data, small internet: How to circumnavigate your information - Raymond Collins (TE Connectivity), Scott Sokoloff (Orderup) 00:16:15
    8. Use case examples of building applications on Hadoop with CDAP - Jonathan Gray (Cask) 00:43:51
    9. Re-engineering legacy analytics solutions with big data - Rosaria Silipo (KNIME) 00:33:11
  17. Sponsored conference sessions
    1. Putting Modern BI to Work: Innovative Use Cases - Ali Tore (ClearStory Data) 00:42:04
    2. Big data analytics in the cloud - Matt Winkler (Microsoft) 00:43:08
    3. Where Do You Go From Here? Lessons and Landmarks from Real-World Cisco UCS - Robert Novak (Cisco) 00:41:28
    4. Expand your mind to fit the big data Data Center: the scale and cost of information management architectures - Robert Eve (Cisco), Robert Novak (Cisco), Nenshad Bardoliwalla (Paxata, Inc.) 00:39:11
    5. Delivering trusted data for analyst autonomy and operational agility with a unified big data fabric - Vishal Bamba (Transamerica), Murthy Mathiprakasam (Informatica) 00:41:23
    6. End User Panel on Real-Time Data Analytics - Eric Frenkiel (MemSQL), Ian Hanson (Digital Ocean), Noah Zucker (Novus Partners), Michael DePrizio (Akamai Technologies) 00:36:21
    7. How Riot Games uses Platfora to improve League of Legends' performance - Peter Schlampp (Platfora), Chris Kudelka (Riot Games) 00:46:35
    8. Hydrate a data lake in days with CDAP - Jonathan Gray (Cask) 00:55:01
    9. Machine learning in big data – look forward or be left behind - Bill Porto (RedPoint Global) 00:43:40
    10. Design patterns for real-time data analytics - Sheetal Dolas (Hortonworks) 00:40:56
    11. How Pepsi Wrangles the Diverse Data of Consumer Packaged Goods - Matthew Derda (Pepsi), Doug Stradley (Trifacta) 00:37:58
    12. Catalog, secure, and govern your Hadoop data lake - Alex Gorelik (Waterline Data), Jim Kaskade (CSC), David Tabacco (Merck & Co., Inc.), David Paige (Cox Automotive) 00:45:07
    13. Enter the snake pit for fast and easy Spark and Cassandra - Jon Haddad (DataStax) 00:24:22
    14. Combining open source software and cloud-native data processing services on Google Cloud Platform - Eric Brewer (Google) 00:33:21
    15. Think like a data scientist: Build your big data blueprint - Bill Schmarzo (EMC Consulting) 00:45:58
    16. Fast fish eat slow fish: How to move faster - Samuel Cozannet (Canonical) 00:14:28
    17. Requirements for Secure, Multi-Tenant Hadoop: It’s Much More than YARN - Anant Chintamaneni (BlueData) 00:43:15
    18. Patterns from the future - Paul Kent (SAS) 00:46:33
    19. Pentaho featuring Forrester: Delivering governed data for analytics at scale - Michele Goetz (Forrester Research), Chuck Yarbrough (Pentaho) 00:31:37
    20. Do you know where your data is? - Nidhi Aggarwal (Tamr, Inc.) 00:36:21
    21. Case study: How YP.com addresses real-world analytical challenges for SQL on Hadoop - William Theisinger (YP), Ignacio Hwang (HP) 00:27:56
    22. How Autodesk is using Tableau to visualize its Kafka-Splunk-Hadoop pipeline - Charlie Crocker (Autodesk) 00:42:33
    23. Eventual consistent systems a.k.a mostly inconsistent systems vs. strongly consistent systems in big data - Jagane Sundar (WANdisco) 00:42:26
    24. Faster time to insight using Spark, Tachyon, and Zeppelin - Nirmal Ranganathan (Rackspace Hosting) 00:36:04
    25. Big data modeling and analytic patterns – beyond schema on read - Ron Bodkin (Think Big, a Teradata Company) 00:38:43
    26. Commercializing IOT: What do you need to know? - Ashish Verma (Deloitte) 00:46:03
    27. Enable secure data sharing and analytics in Hadoop with 5 key steps - Reiner Kappenberger (HP Security Voltage) 00:45:23
    28. Apache Spark as a code-free data science workbench - Michal Iwanowski (DeepSense.io), Piotr Niedzwiedz (DeepSense.io) 00:35:06
    29. SAP HANA Vora to query Big Data with greater ease - Balaji Krishna (SAP) 00:54:19
  18. Law, Ethics, & Open Data conference sessions
    1. Personal information out of context: Building a consumer subject review board - Evan Selinger (Rochester Institute of Technology), Jules Polonetsky (Future of Privacy Forum) 00:38:04
    2. Protecting the humanity in data II: Personalized crisis counseling/messiness of interpretation - Jake Porway (DataKind), Bob Filbin (Crisis Text Line), Danah Boyd (Microsoft Research) 00:40:39
    3. How we amplify privilege with supervised machine learning - Michael Williams (Fast Forward Labs) 00:40:43
    4. Fixing Chicago’s crime data - Jay Margalus (MapR), Mike Emerick (MapR) 00:41:44
  19. Ask Me Anything conference sessions
    1. Ask me anything: Hadoop application architectures - Gwen Shapira (Confluent), Jonathan Seidman (Cloudera), Ted Malaska (Cloudera), Mark Grover (Cloudera) 00:38:35
    2. Ask me anything: Apache Spark - Patrick Wendell (Databricks), Reynold Xin (Databricks) 00:36:27
    3. Ask me anything: Hadoop operations for production systems - Miklos Christine (Databricks), Kathleen Ting (Cloudera), Philip Zeyliger (Cloudera), Philip Langdale (Cloudera, Inc.) 00:39:14
    4. Ask me anything: Developing a modern enterprise data strategy - John Akred (Silicon Valley Data Science), Julie Steele (Silicon Valley Data Science), Scott Kurth (Silicon Valley Data Science) 00:39:19
  20. Security & Governance conference sessions
    1. Leveraging asset reputation systems to detect and prevent fraud and abuse at LinkedIn - Jenelle Bray (LinkedIn) 00:33:14
    2. Data democratization versus data governance - Peter Guerra (Booz Allen Hamilton) 00:38:57
    3. Transparent encryption in HDFS - Charles Lamb (Cloudera), Andrew Wang (Cloudera) 00:38:01
  21. Solutions Showcase Theater
    1. Perspectives of a customer journey into Big Data - Chris Harrold (EMC) 00:08:50
    2. It's All About That Data: Turning Security Information Into Real-Time Intelligence - Tom Bain (Countertack) 00:11:34
    3. Managing Distributed Hardware with Redfish DMTF - Yagnesh Ashara (Super Micro Computer Inc.) 00:11:51
    4. Graph Based Smart Data Lakes: A Unique View on Break Through Analytics - Ben Szekely (Cambridge Semantics) 00:10:21
    5. Dr. Syslog Or: How I Learned to Stop Worrying and Love Big Data - Joey Echeverria (Rocana) 00:10:12
    6. The Importance of A Managed Data Lake - Craig Lukasik (Zaloni) 00:10:56
    7. Code Free Blending of Massive Datasets Using Cloudera and Alteryx - JC Raveneau (Alteryx) 00:09:19
    8. Using Spark: 5 Quick and Easy Ways to Get More out of Hadoop - Ashley Stirrup (Talend) 00:07:35
    9. Forget the Needle in the Haystack, Focus on the Hay - Clint Green (L-3 Data Tactics) 00:10:17
    10. Jet Ski on your Data Lake: Natural Language Query on Hadoop - Mike Finley (AnswerRocket) 00:10:33
    11. Best practices to build a governed data lake - Oliver Claude (Waterline Data Science) 00:09:10
    12. Mongo+Spark - Bryan Reinero (MongoDB) 00:11:40
    13. The Rise of Insight-Driven Business: 7 Guiding Principles - Nilotpal Roy (Capgemini) 00:07:03
    14. How 3rd platform applications are changing the enterprise Data Center - Gus Horn (NetApp) 00:11:47
    15. Taking the Complexity Out of Big Data Visualization - Priyank Patel (Arcadia Data) 00:13:01
    16. Big Data Governance - Ben Harden (CapTech Consulting) 00:16:34
    17. Bringing Together the Full Customer Journey at eBay Enterprise Marketing Solutions - Paul Mazak (eBay Enterprise Attribution) 00:13:29
    18. Deep Learning for Large Scale Biodiversity Monitoring - David Klein (Conservation Metrics) 00:11:16
    19. Web-Based Visualization and Prediction of Urban Energy Use from Building Benchmarking Data - Constantine Kontokosta (NYU), Christopher Tull (NYU) 00:10:42
    20. Identifying Earmarks in Congressional Bills - Elena Enova (The University of Chicago Center for Data Science and Public Policy) 00:09:18
    21. Open Data Liaisons in Government: A Means to Participatory Democracy - Ben Wellington (Pratt Institute) 00:10:31
    22. Social Capital Deserts: Obesity Surveillance using a Location-Based Social Network - Hongyang Bai (University of Illinois at Urbana-Champaign) 00:07:08
    23. Big Data in Financial Services - Josh West (Red Hat) 00:07:33
    24. MarketShare & Alation: Increasing Analyst Productivity & Innovation on Hadoop - Robert Stratton (Alation) 00:09:18
    25. Renovating Data Platform for Real-Time Big Data Processing - Steven Oh (DataStreams) 00:08:02
    26. 5 Tips for Enhancing Your Data Architecture with Hadoop - Rodan Zadeh (Attunity) 00:09:44
    27. Enterprise Archiving with Hadoop - John Ottman (Solix Technologies) 00:11:39
    28. Hadoop in Action: Building a 360-degree view of the supply chain - Kathleen deValk (Siemens Cloud Services) 00:11:58
    29. Advanced Security & Data Abstraction - Eddie Satterly (Computer Sciences Corporation) 00:08:55
    30. Building hybrid jobs for creating targeted and innovative campaigns in realtime - Uday Sagi (Apervi) 00:09:17
    31. Combinatorial Complexity and the Future of Analytics - Mukund Ramachandran (Ayasdi) 00:09:34
    32. Business on Hadoop: Speed, Scale and Security - Dave Mariani (AtScale) 00:17:41
    33. Reinventing Big Data Infrastructure with Containers, Flash, and Software Defined Storage - Sujatha Kashyap (Robin Systems) 00:10:49
    34. Spark ROI: Unlock the Millions From Your Big Data! - Ruban Phukan (Data RPM), Dave Shuman (Cloudera) 00:09:19
    35. Why You Need Big Data Governance - Felix Van de Maele (Collibra) 00:11:32
    36. Using Spark: 5 Quick and Easy Ways to Get More out of Hadoop - Ashley Stirrup (Talend) 00:10:27
    37. Customer Insights at Allstate - Mark Slusar (Allstate) 00:09:38
    38. Combat cyber-attacks and insider threats with big data and machine learning - Anurag Gurtu (Splunk) 00:10:55
    39. Mainframe big data projects today: What's working? - Mike Combs (Veristorm) 00:11:16
    40. Big Data Scaled with NETbuilder - Tony Lysak (NETbuilder) 00:09:42
    41. Data Warehousing in the Cloud - Steve Herskovitz (Snowflake Computing) 00:11:55
    42. Visualize, Detect, Predict and Take Action on Big Data - Mike Shumpert (Software AG) 00:11:02
    43. HCUBE, smart and simple solution for data transfer to Hadoop - Devi Kondapi (MSRCosmos LLC) 00:10:10