Strata + Hadoop World 2016 - London, United Kingdom: Video Compilation

Video Description

The sold-out Strata + Hadoop World London 2016 is a tour through the giant city of data, led by expert guides who know just where to go. This video compilation shows you every bit of it: 211 speakers, 108 sessions, 20 keynotes, and 14 tutorials. Start your trip with a long-form tutorial exploring data territory, such as an 8-hour deep dive into all phases of managing Hadoop clusters; an 8-hour excursion through the hardcore data science world of data management, machine learning, natural language processing, crowd-sourcing, and algorithm design; an 8-hour Spark camp on all things Apache; or 3½-hour tours of D3 data visualizations, artificial intelligence, optimizing workflow in R, and more. Want something shorter? Try a mind-blowing conference session (30-40 minutes each) on topics ranging from H2O and TensorFlow to e-commerce A/B testing, predictive analysis, and natural language processing. Looking for something different? How about streaming analytics at 300 billion events per day with Kafka, Samza, and Druid, or using Spark and Hadoop in high-speed trading environments? It’s a travelogue of data wonders with something for everyone.

  • Gain front row access to all 211 speakers, 108 sessions, 20 keynotes, and 14 tutorials
  • Download the videos or view them through O'Reilly's HD player
  • Hear from big data experts at Intel, deepsense.io, IBM, Google, Teradata, and more
  • Watch Cloudera’s Doug Cutting and Tom White predict the future of Apache Hadoop
  • Learn about Spark, Kafka Streams, Kudu, Kappa, Drill, Heron, Flink, Eagle, and NiFi
  • Be inspired by data innovations in cancer research, epilepsy monitoring, and minefield clearing
  • Explore Scotland's Data Lab, the Danish Agency for Digitisation, and the ethics of data processing
  • Hear about big data use at LinkedIn, Intuit, Uber, Etsy, HPE, Docker, Facebook, and Microsoft

Table of Contents

  1. Keynotes
    1. Modern data strategy and CERN - Mike Olson (Cloudera) and Manuel Martin Marquez (CERN) 00:15:08
    2. The Internet of Things: It’s the (sensor) data, stupid - Martin Willcox (Teradata International) 00:11:11
    3. Data relativism and the rise of context services - Joe Hellerstein (UC Berkeley) 00:15:09
    4. Saving whales with deep learning - Piotr Niedzwiedz (deepsense.io) 00:05:15
    5. Data wants to be shareable - Mona Vernon (Thomson Reuters Labs) 00:13:21
    6. Analytics innovation in cancer research - Gilad Olswang (Intel) 00:05:53
    7. The future of (artificial) intelligence - Stuart Russell (UC Berkeley) 00:20:19
    8. The curious case of the data scientist - David Selby (IBM) 00:11:05
    9. Drawing insights from imperfection: A year of Dear Data - Stefanie Posavec (NA) 00:14:41
    10. Big data at Google: Solving problems at scale - Jordan Tigani (Google) 00:05:03
    11. The other half of big data - Tricia Wang (Constellate Data) 00:17:13
    12. Bringing big data and design to policy making - Cat Drew (UK Policy Lab and Government Data Science Partnership) 00:13:45
    13. Machine learning for human rights advocacy: Big benefits, serious consequences - Megan Price (Human Rights Data Analysis Group) 00:15:13
  2. Data innovations
    1. A hands-on introduction to Apache Kafka - Ian Wrigley (Confluent) - Part 1 1:26:03
    2. A hands-on introduction to Apache Kafka - Ian Wrigley (Confluent) - Part 2 1:33:54
    3. AI for business: A hands-on introduction to what machine learning can do - Marc Warner (ASI) - Part 1 1:23:07
    4. AI for business: A hands-on introduction to what machine learning can do - Marc Warner (ASI) - Part 2 1:08:30
    5. Experiments in The Data Lab: Creating a national hub for data science in Scotland - Brian Hills (The Data Lab) 00:36:29
    6. The innards of H2O - Cliff Click (0xdata) 00:40:55
    7. TensorFlow: Machine learning for everyone - Sherry Moore (Google) 00:38:57
    8. The future of column-oriented data processing with Arrow and Parquet - Julien Le Dem (Dremio) 00:43:41
    9. 90% of the world's trade is transported by sea, but what data do we have about ship activity worldwide? - Tal Guttman (Windward) 00:39:50
    10. The evolution of massive-scale data processing - Tyler Akidau (Google) 00:41:36
    11. Streaming analytics at 300 billion events per day with Kafka, Samza, and Druid - Xavier Léauté (Metamarkets) 00:43:50
    12. Triggers in Apache Beam (incubating): User-controlled balance of completeness, latency, and cost in streaming big data pipelines - Kenneth Knowles (Google) 00:44:32
    13. Introducing Kafka Streams, Apache Kafka's new stream processing library - Neha Narkhede (Confluent) 00:47:05
  3. Data science & advanced analytics
    1. R and reproducible reporting for big data - Aimee Gott (Mango Solutions), Mark Sellors (Mango Solutions), and Richard Pugh (Mango Solutions) - Part 1 1:19:29
    2. R and reproducible reporting for big data - Aimee Gott (Mango Solutions), Mark Sellors (Mango Solutions), and Richard Pugh (Mango Solutions) - Part 2 1:17:56
    3. Deep learning and natural language processing with Spark - Andy Petrella (Data Fellas) and Melanie Warrick (Skymind) 00:40:29
    4. Semantic natural language understanding with Spark Streaming, UIMA, and machine-learned ontologies - David Talby (Atigeo) and Claudiu Branzan (Atigeo) 00:45:30
    5. Sightseeing, venues, and friends: Predictive analytics with Spark ML and Cassandra - Natalino Busa (Teradata) 00:38:44
    6. Introduction to generalized low-rank models and missing values - Jo-fai Chow (H2O.ai) 00:29:12
    7. Petascale genomics - Tom White (Cloudera) 00:40:13
    8. Panel: The future of intelligence - Marc Warner (ASI), Stuart Russell (UC Berkeley), and Jaan Tallinn (CSER) 00:39:20
    9. The polyglot data scientist - Jeroen Janssens (Tilburg University) 00:25:11
    10. Beyond guide dogs: How advances in deep learning can empower the blind community - Anirudh Koul (Microsoft) and Saqib Shaikh (Microsoft) 00:37:52
    11. Predicting out-of-sample performance of a large cohort of trading algorithms with machine learning - Thomas Wiecki (Quantopian) 00:38:30
    12. Scala: The unpredicted lingua franca for data science - Andy Petrella (Data Fellas) and Dean Wampler (Lightbend) 00:42:56
    13. Land mine or Coke can: Machine learning from GPR data - Dirk Gorissen (Skycap | World Bank) 00:33:39
    14. Data modeling for data science: Simplify your workload with complex types - Marcel Kornacker (Cloudera) 00:40:28
    15. Applications of natural language understanding: Tools and technologies - Alyona Medelyan (Entopix) 00:39:31
  4. Data-driven business
    1. Developing a modern enterprise data strategy - Scott Kurth (Silicon Valley Data Science) and John Akred (Silicon Valley Data Science) - Part 1 1:28:28
    2. Developing a modern enterprise data strategy - Scott Kurth (Silicon Valley Data Science) and John Akred (Silicon Valley Data Science) - Part 2 1:24:42
    3. The Bag of Little Bootstraps: A/B experimenting with big data made small - Emily Sommer (Etsy) 00:38:23
    4. Beyond the hunch: Communicating uncertainty for effective data-driven business - Abigail Lebrecht (uSwitch) 00:40:31
    5. What’s next for music services? The answer is in the data - Paul Shannon (7digital Group Plc) and Alan Hannaway (7digital) 00:42:48
    6. Intuit, Uber, and Etsy: Scaling innovation with A/B testing - Lucian Lita (Intuit), Mita Mahadevan (Intuit Inc.), Shalin Mantri (Uber), and Gabrielle Gianelli (Etsy) 00:43:48
    7. How AI revolutionizes business strategy - Kenneth Cukier (The Economist) 00:43:32
    8. The best university in the world - Duncan Ross (TES Global) and Francine Bennett (Mastodon C) 00:44:50
    9. 20 percent blissful, 80 percent ignorance - Phil Harvey (DataShaka) 00:24:40
    10. Data gravity and complex systems - Dave McCrory (Basho Technologies) 00:28:21
    11. Analytics: A first-class architectural concern in a SaaS platform - Calum Murray (Intuit) 00:35:08
    12. Situational awareness: On the importance of mapping - Simon Wardley (Leading Edge Forum (CSC)) 00:42:31
    13. Data-driven businesses: Disrupting business models with big data - Carme Artigas (Synergic Partners) 00:24:35
    14. Building better cross-team communication - Ellen Friedman (Independent) 00:23:46
    15. What Esperanto can teach us about collaboration in the big data environment - Anne Sophie Roessler (Dataiku) 00:19:53
    16. What should I eat: The road map to better food and smarter nutrition science - Taryn Fixel (ingredient1) 00:22:35
    17. Your TOS is not informed consent: Ethical experimentation for the Web - Rachel Shadoan (Akashic Labs) 00:22:19
    18. How to ask good questions - Farrah Bostic (The Difference Engine) 00:30:27
    19. Every business is a data business - Mona Vernon (Thomson Reuters Labs) 00:27:36
    20. Data scientists everywhere - Kim Nilsson (Pivigo) 00:21:09
    21. Harnessing big data to transform the energy sector - Erik Nygard (Limejump Ltd) 00:13:57
    22. Data science as catalyst of Autodesk's business model transformation - Laurent Gaubert (Autodesk) 00:19:24
    23. My AlgorithmicMe knows me better than Google or my mum - Majken Sander (BusinessAnalyst.dk) 00:22:49
    24. Otto’s little army of real-time bots: How online retailers can defend shopping carts and retarget customers in real time - Rupert Steffner (Otto GmbH & Co. KG) 00:21:42
    25. My AlgorithmicMe: The "Who is. . .?" of the future - Majken Sander (BusinessAnalyst.dk) and Joerg Blumtritt (Datarella) 00:38:28
    26. Demonstrating the art of the possible with Spark and Hadoop - Joy Spohn (IBM) and Adrian Houselander (IBM) 00:34:48
  5. Enterprise adoption
    1. Apache Hadoop operations for production systems - Jayesh Seshadri (Cloudera), Justin Hancock (Cloudera), Mark Samson (Cloudera), and Wellington Chevreuil (Cloudera) - Part 1 1:25:46
    2. Apache Hadoop operations for production systems - Jayesh Seshadri (Cloudera), Justin Hancock (Cloudera), Mark Samson (Cloudera), and Wellington Chevreuil (Cloudera) - Part 2 1:34:04
    3. Apache Hadoop operations for production systems - Jayesh Seshadri (Cloudera), Justin Hancock (Cloudera), Mark Samson (Cloudera), and Wellington Chevreuil (Cloudera) - Part 3 1:23:20
    4. Apache Hadoop operations for production systems - Jayesh Seshadri (Cloudera), Justin Hancock (Cloudera), Mark Samson (Cloudera), and Wellington Chevreuil (Cloudera) - Part 4 1:19:21
    5. Architecting a data platform - John Akred (Silicon Valley Data Science) and Stephen O'Sullivan (Silicon Valley Data Science) - Part 1 1:24:41
    6. Architecting a data platform - John Akred (Silicon Valley Data Science) and Stephen O'Sullivan (Silicon Valley Data Science) - Part 2 1:36:03
    7. Big SQL: The future of in-cluster analytics and enterprise adoption - Moderated by: Surya Mukherjee (Ovum) - Panelists: Lloyd Tabb (Looker Data Science), Nick Amabile (FullStack Analytics), Rex Gibson (Knewton), dp Suresh (Yahoo!) 00:39:16
    8. BI on Hadoop: What are your options? - Tomer Shiran (Dremio) 00:40:44
  6. Hadoop internals & development
    1. Hadoop application architectures: Fraud detection - Jonathan Seidman (Cloudera), Mark Grover (Cloudera), Gwen Shapira (Confluent), and Ted Malaska (Cloudera) - Part 1 1:29:48
    2. Hadoop application architectures: Fraud detection - Jonathan Seidman (Cloudera), Mark Grover (Cloudera), Gwen Shapira (Confluent), and Ted Malaska (Cloudera) - Part 2 1:23:17
    3. The next 10 years of Apache Hadoop - Doug Cutting (Cloudera), Tom White (Cloudera), and Ben Lorica (O'Reilly Media) 00:39:56
    4. Hadoop's storage gap: Resolving transactional access/analytic performance trade-offs with Apache Kudu (incubating) - Todd Lipcon (Cloudera, Inc.) 00:41:54
    5. Building real-time BI systems with HDFS and Kudu - Ruhollah Farchtchi (Zoomdata) 00:35:37
    6. Why is my Hadoop job slow? - Bikas Saha (Hortonworks Inc) 00:39:04
    7. Scaling out to 10 clusters, 1,000 users, and 10,000 flows: The Dali experience at LinkedIn - Carl Steinbach (LinkedIn) 00:35:43
    8. Floating elephants: Developing data wrangling systems on Docker - Chad Metcalf (Docker) and Seshadri Mahalingam (Trifacta) 00:29:07
  7. Data 101
    1. Developing data scientists: Breaking the skills gap - Yuelin Li (ASI) 00:28:42
    2. The business case for Spark, Kafka, and friends - John Akred (Silicon Valley Data Science) 00:31:12
    3. What is AI? - Melanie Warrick (Skymind) 00:28:01
  8. Hardcore data science
    1. Mobile advertising: The preclick experience - Mounia Lalmas (Yahoo) 00:26:40
    2. Analytics for large-scale time series and event data - Ira Cohen (Anodot) 00:29:31
    3. Recent trends in recommender systems - Danny Bickson (1972) 00:28:50
    4. Visual data analysis for intelligent machines - Francesca Odone (University of Genova) 00:33:09
    5. Deep learning for web-scale text - Piotr Mirowski (Google DeepMind) 00:27:54
    6. Detecting anomalies in the real world - Alessandra Staglianò (The ASI) 00:31:05
    7. Recent advances in deep learning research - Olivier Grisel (Inria & scikit-learn) 00:31:46
    8. Hardcore data science in practice - Mikio Braun (Zalando SE) 00:29:16
    9. Data science++: Improving data science by adding domain understanding - Matthew Smith (Microsoft Research) 00:28:31
    10. A methodology for taxonomy generation and maintenance from large collections of textual data - Roxana Danger (reed.co.uk) 00:27:58
    11. A functional data integration pipeline using Scala - Johannes Bauer (Cambridge Analytica) 00:40:11
  9. IoT & real-time
    1. An Introduction to time series with Team Apache - Patrick McFadin (DataStax) - Part 1 1:12:50
    2. An Introduction to time series with Team Apache - Patrick McFadin (DataStax) - Part 2 1:31:21
    3. What does your smart car know about you? - Charles Givre (Booz | Allen | Hamilton) 00:42:25
    4. When it absolutely, positively has to be there: Reliability guarantees in Kafka - Gwen Shapira (Confluent) and Jeff Holoman (Cloudera) 00:43:27
    5. Real-time epilepsy monitoring with smart clothing: A case study in time series, open source technology, and connected devices - Eric Kramer (Dataiku) 00:37:58
    6. Industrial big data and sensor time series data: Different but not difficult - Gopal GopalKrishnan (OSIsoft, LLC.) and Hoa Tram (OSIsoft) 00:50:56
    7. High-performance data flow with a GUI—and guts - Simon Elliston Ball (Hortonworks) 00:41:47
    8. Watermarks: Time and progress in streaming dataflow and beyond - Slava Chernyak (Google Inc.) 00:35:01
    9. Putting Kafka into overdrive - Gwen Shapira (Confluent) and Todd Palino (LinkedIn) 00:39:39
    10. Stream analytics in the enterprise: A look at Intel’s internal IoT implementation - Moty Fania (Intel) 00:39:42
    11. Legacy or Kafka? What an ideal messaging system should bring to Hadoop - Jim Scott (MapR Technologies, Inc.) 00:38:51
    12. Making sense of exactly-once semantics - Flavio Junqueira (Confluent) 00:39:45
    13. Processing billions of events in real time with Heron - Karthik Ramasamy (Twitter) 00:48:05
    14. Data privacy in the age of the Internet of Things - Alasdair Allan (Babilim Light Industries) 00:35:03
    15. Kappa architecture in the telecom industry - Ignacio Manuel Mulas Viela (Ericsson) and Nicolas Seyvet (Ericsson AB) 00:33:51
  10. Spark & beyond
    1. Building machine-learning apps with Spark: MLlib, ML Pipelines, and GraphX - Jayant Shekhar (Sparkflows Inc.), Vartika Singh (Cloudera), and Krishna Sankar (Volvo Cars) - Part 1 1:30:58
    2. Building machine-learning apps with Spark: MLlib, ML Pipelines, and GraphX - Jayant Shekhar (Sparkflows Inc.), Vartika Singh (Cloudera), and Krishna Sankar (Volvo Cars) - Part 2 1:28:39
    3. Spark 2.0: What’s next? - Tathagata Das (Databricks) 00:41:39
    4. Anomaly detection in telecom with Spark - Ted Dunning (MapR Technologies) 00:44:47
    5. Beyond shuffling: Tips and tricks for scaling Spark jobs - Holden Karau (IBM) 00:41:25
    6. Securing Apache Spark on production Hadoop clusters - Kostas Sakellis (Cloudera) 00:40:19
    7. The future of streaming in Spark: Structured streaming - Tathagata Das (Databricks) 00:41:57
    8. Introduction to Apache Spark for Java and Scala developers - Ted Malaska (Cloudera) 00:39:53
    9. Breaking Spark: Top five mistakes to avoid when using Apache Spark in production - Neelesh Srinivas Salian (Cloudera) 00:27:42
  11. Visualization & user experience
    1. Introduction to visualizations using D3 - Brian Suda ((optional.is)) - Part 1 1:19:09
    2. Introduction to visualizations using D3 - Brian Suda ((optional.is)) - Part 2 1:20:15
    3. Good city life - Daniele Quercia (Bell Labs) 00:39:31
    4. Pixels and place: What online experiences can borrow from offline spaces and vice versa - Kate O'Neill (KO Insights) 00:42:14
    5. Opportunities for hardware acceleration in big data analytics - Kanu Gulati (Zetta Venture Partners) 00:27:42
    6. The rise of the GPU: GPUs will change how you look at big data - Todd Mostak (MapD) 00:45:36
  12. Sponsored
    1. Which whale is it anyway? Face recognition for right whales using deep learning - Robert Bogucki (deepsense.io) and Maciej Klimek (deepsense.io) 00:33:28
    2. Realizing the value of combining the IoT and big data analytics - Frank Saeuberlich (Teradata) and Eliano Marques (Think Big Analytics) 00:42:01
    3. Federated analytics innovation in cancer research - Gilad Olswang (Intel) 00:43:22
    4. Best practices to extract value from Hadoop with predictive analytics - Zoltan Prekopcsak (RapidMiner) 00:33:13
    5. Building a modern data architecture - Ben Sharma (Zaloni) 00:36:20
    6. High-frequency decisioning, from big data to fast data - Tugdual Grall (MapR Technologies) 00:40:38
    7. Avoid big data becoming a big problem - Raghunath Nambiar (Cisco) 00:43:59
    8. Operating batch in the data-driven enterprise - Joe Goldberg (BMC Software Inc.) 00:40:11
    9. Developing a successful big data strategy - Seb Darrington (EMC) 00:38:36
    10. Business transformation and outcomes through big data - Louise Matthews (Hortonworks) 00:34:36
    11. The business bottom line of data lakes: Real-life experiences - Franz Aman (Informatica) 00:41:22
  13. Security
    1. Simplifying Hadoop with RecordService, a secure and unified data access path for compute frameworks - Alex Leblang (Cloudera) 00:33:26
    2. Best practices and solutions to manage and govern a multinational big data platform - Clara Fletcher (Accenture) 00:38:30
    3. HopsWorks: Multitenant Hadoop as a service - Jim Dowling (Swedish ICT - SICS) 00:39:12
  14. Hadoop use cases
    1. Improving the customer experience with big data wrangling on Hadoop - Dan Jermyn (Royal Bank of Scotland) and Connor Carreras (Trifacta) 00:35:55
    2. Simple, fast, and flexible risk aggregation in Hadoop - Deenar Toraskar (Think Reactive) 00:29:26
    3. Risk data aggregation and risk reporting for financial services - Ben Sharma (Zaloni) 00:33:20
    4. The future is now: Leveraging Hadoop for real-time, predictive insights - Steven Noels (NGDATA) 00:43:03
    5. Year 2025: Big data as enabler of fully automated vehicles - Dr. Thomas Beer (Continental) and Felix Werkmeister (Continental) 00:40:59
    6. Analyzing dynamic JSON with Apache Drill - Tomer Shiran (Dremio) 00:40:56
  15. Law, ethics, governance
    1. Denmark is data driven - Mads Hjorth (Danish Agency for Digitisation) 00:39:47
    2. Using data for evil IV: The journey home - Duncan Ross (TES Global) and Francine Bennett (Mastodon C) 00:39:53
    3. Protecting individual privacy in a data-driven world - Jason McFall (Privitar) 00:39:37
    4. Don't build a data swamp: Hadoop governance case studies for financial services - Mark Donsky (Cloudera) and Chang She (Cloudera) 00:39:02