Strata Data Conference 2017 - London, United Kingdom

Video Description

Strata is the largest data conference series in the world, the place where cutting-edge science and new business fundamentals intersect and merge. This video compilation gives you total access to each of the 112 sessions, 19 tutorials, and 16 keynotes delivered at Strata London 2017. Broad and deep as the Thames itself, the compilation includes presentations on stream processing and analytics, deep learning, big data and the Cloud, data-driven business management, and cybersecurity. You'll enjoy overviews of technologies and platforms like Microsoft's Azure HDInsight, O'Reilly's Oriole, BMC's Control-M, Google's TensorFlow, AWS Lambda, Twitter's Heron, and Intel's BigDL, as well as talks on almost every Apache ecosystem tool available today, including Arrow, Beam, Cassandra, Druid, Flink, Flume, HAWQ, Hive, Impala, Kafka, Kinetica, Kudu, MXNet, Presto, Spark, Spark ML, Spark Streaming, and Spark SQL.

But Strata is more than topics and technologies; it's the people who speak: big data visionaries like Luke Han (CEO, Kyligence) on Apache Kylin use cases in China; Reynold Xin (Chief Architect, Databricks) on Spark's API and engine evolutions; Eric Tilenius (CEO, BlueTalon) on the EU GDPR's potential for better data security; Duncan Ross (Director, TES Global) on doing good with big data; M. C. Srivas (Chief Data Architect, Uber) on the real-time intelligence that drives Uber; and Anthony Goldbloom (CEO, Kaggle) on lessons learned from the almost one million data scientists who participate in Kaggle's machine learning competitions. This compilation gives you a front-row view of each of these speakers and of all 211 data leaders who spoke at Strata London 2017. Highlights include:

  • Strata Business Summit: a set of 36 sessions specifically curated for the senior-level business executive and strategist, the Summit is the missing MBA for data-driven business. It includes clear-eyed guidance from Tim O'Reilly (Founder, O'Reilly Media) on the future of AI as a jobs generator; every Executive Briefing; five tutorials (an intro to data architecture's core principles, a how-to on developing a modern enterprise data strategy, and so on); sessions on data lakes, interactive visualizations, and natural language generation; overviews of Spark, Docker, containers, and notebooks; and a wealth of big data and reg tech case studies from Cox Automotive, easyJet, Barclays, Santander, Transport for London, and more.
  • Executive Briefings: six high-level overviews on data as the driver of business value, including hard-nosed evaluations of cloud strategies by Manuel Sevilla (CTO, Capgemini); Carme Artigas (Synergic Partners/CEOE Innovation Board) on proven tactics for accelerating big data adoption; Nicolaus Henke (Senior Partner, McKinsey) on what CEOs really think about AI; and an insider's look at the EU General Data Protection Regulation's privacy obligations by Aurélie Pols (Co-Chair, IEEE Data Privacy Process).
  • Hardcore Data Science Day: eight hours on advanced techniques in deep learning, NLP, and algorithm design with sessions on Microsoft's LightGBM, Google's CausalImpact, Intel's BigDL, Toupee, and more from data science leaders like David Barber (UCL) and Angie Ma (ASI Data Science).
  • 19 tutorials: long-form sessions including Spark Camp, the all-day intro to Apache Spark's core concepts and machine learning library; Anima Anandkumar (AWS) on distributed deep learning with Apache MXNet; Tim Berglund (Confluent) on building real-time streaming pipelines with Kafka Connect and Kafka Streams; Bargava Subramanian (Red Hat) on interactive data visualizations using Visdown; Aimee Gott (Mango Solutions) on scaling R data analysis with Spark and sparklyr; and four senior engineers from Cloudera on deploying and managing Hive, Spark, and Impala in the public cloud.
  • 39 Data Science & Advanced Analytics Sessions: covering topics like the state of TensorFlow in 2017 by Sherry Moore (Google Brain Team); deep learning in day-to-day practice by Mikio Braun (Zalando SE); and what "50 Years of Data Science" leaves out, a practical, balanced view of what building data science capability means today, by Sean Owen (Cloudera London).
  • 23 Data Engineering and Architecture Sessions: illuminating solutions from data pros like Jacques Nadeau (Dremio/Apache Drill PMC) on creating virtual data lakes with Apache Arrow; John Akred and Stephen O'Sullivan (Silicon Valley Data Science) on architecting a data platform; and Tyler Akidau (Google) on realizing the promise of portability with Apache Beam.

Ready to take in Strata London 2017? Get it from O'Reilly Media's Safari and explore it at your own pace.

Table of Contents

  1. Keynotes
    1. The science of visual interactions - Miriam Redi (Bell Labs Cambridge, UK) 00:13:40
    2. Machine learning is a moonshot for us all (sponsored by Google) - Darren Strange (Google) 00:05:43
    3. What Kaggle has learned from almost a million data scientists - Anthony Goldbloom (Kaggle) 00:15:34
    4. Another one bytes the dust (sponsored by Dell EMC) - Paul Brook (Dell EMC) 00:05:27
    5. The data subject first? - Aurélie Pols (Mind Your Group by Mind Your Privacy) 00:09:25
    6. Real-time intelligence gives Uber the edge - M. C. Srivas (Uber) 00:13:27
    7. Lessons from piloting the London Office of Data Analytics - Eddie Copeland (Nesta) 00:14:01
    8. Accelerate analytics and AI innovations with Intel (sponsored by Intel) - Ziya Ma (Intel Corp) 00:11:06
    9. Enabling data science in the enterprise - Mike Olson (Cloudera), Tom Smith (Office of National Statistics) 00:10:37
    10. Is finance ready for AI? - Aida Mehonic (ASI Data Science) 00:27:51
    11. Peeking into the black box: Lessons from the front lines of machine-learning product launches - Grace Huang (Pinterest) 00:12:35
    12. Using AI to create new jobs - Tim O'Reilly (O'Reilly Media) 00:28:41
  2. FinData
    1. Crossing the river by feeling the stones - Simon Wardley (Leading Edge Forum) 00:34:14
  3. Sponsored
    1. Deep Learning: Assessing Analytics Project Feasibility and Its Computational Requirements - Adam Grzywaczewski (NVIDIA LTD) 00:37:11
    2. Architecting the future: Insights learned from Google’s journey in data - Darren Strange (Google) 00:41:42
    3. The added value of data science - Jan Willem Gehrels (IBM Corporation) 00:29:28
    4. Build big data enterprise solutions faster on Azure HDInsight - Pranav Rastogi (Microsoft) 00:41:05
    5. Architecture best practices for big data deployments - Cory Minton (EMC) 00:39:52
    6. Migrating petabyte-scale Hadoop clusters with zero downtime - Alon Elishkov (Outbrain) 00:39:32
    7. Ingest, process, analyze: Automation and integration through the big data journey - Neil Cullum (BMC Software), Alon Lebenthal (BMC Software) 00:34:58
    8. The digital twin: Real and gaining ground - Shree Dandekar (Honeywell) 00:38:42
    9. Empowering data analytics: Real-life use cases - Martin Oberhuber (Think Big, a Teradata company) 00:41:45
  4. Data Case Studies
    1. Making the future happen sooner - Alistair Croll (Solve For Interesting) 00:31:20
    2. The mystery of the vanishing pins: Building a sustainable content ecosystem at Pinterest - Grace Huang (Pinterest) 00:37:37
    3. TensorFlow in the wild; Or, the democratization of machine intelligence - Kazunori Sato (Google) 00:39:42
  5. Data-driven business management
    1. The five dysfunctions of a data engineering team - Jesse Anderson (Big Data Institute) 00:44:17
    2. Principles of data science management - David Martinez Rego (DataSpartan) 00:40:30
  6. Data science and advanced analytics
    1. AI within O'Reilly Media - Paco Nathan (O'Reilly Media) 00:46:23
    2. Machine learning with partial and biased feedback - Damien Lefortier (Facebook) 00:37:58
    3. Enterprise artificial intelligence - Laura Frolich (Think Big, A Teradata Company) 00:33:18
    4. Reducing neural-network training time through hyperparameter optimization - Amitai Armon (Intel), Yahav Shadmi (Intel) 00:27:03
    5. Distributed deep learning on AWS using Apache MXNet - Anima Anandkumar (UC Irvine) 00:40:33
    6. TensorFlow and deep learning (without a PhD) - Martin Görner (Google) 00:40:26
    7. Deep learning in practice - Mikio Braun (Zalando SE) 00:42:59
    8. The state of TensorFlow and where it is going in 2017 - Sherry Moore (Google) 00:37:37
    9. Tensor abuse in the workplace - Ted Dunning (MapR Technologies) 00:33:50
    10. What does your postcode say about you? A technique to understand rare events based on demographics - Gary Willis (ASI) 00:33:43
    11. Relevancer: Finding and labeling relevant information in tweet collections - Ali Hürriyetoglu (Statistics Netherlands), Nelleke Oostdijk (Radboud University) 00:27:50
    12. Deep learning with Microsoft Cognitive Toolkit - Barbara Fusinska (Microsoft) 00:41:36
    13. Machine learning to automate localization with Apache Spark and other open source tools - Michelle Casbon (Qordoba) 00:38:46
    14. Conversation AI: From theory to the great promise - Yishay Carmiel (Spoken Communications) 00:40:20
    15. When models go rogue: Hard-earned lessons about using machine learning in production - David Talby (Atigeo) 00:40:53
    16. Efficient R programming - Colin Gillespie (Jumping Rivers | Newcastle University) 00:36:24
    17. What "50 Years of Data Science" leaves out - Sean Owen (Cloudera) 00:30:32
    18. Faster deep learning solutions from training to inference - Nir Lotan (Intel), Barak Rozenwax (Intel) 00:35:56
    19. Fighting bad guys with data science - Jonathon Morgan (New Knowledge) 00:44:10
  7. Visualization & user experience
    1. Create interactive maps in seconds with R and Leaflet - Jeroen Janssens (Data Science Workshops) 00:42:08
    2. Visualizing the health of the internet with Measurement Lab - Irene Ros (Bocoup) 00:36:45
  8. Spark & beyond
    1. A behind-the-scenes look into Spark's API and engine evolutions - Reynold Xin (Databricks) 00:41:30
    2. Debugging Apache Spark - Holden Karau (IBM) 00:44:09
    3. Spark machine-learning pipelines: The good, the bad, and the ugly - Vincent Van Steenbergen (w00t data) 00:32:18
    4. How to secure Apache Spark? - Neelesh Srinivas Salian (Stitch Fix) 00:28:09
  9. Hardcore Data Science
    1. Learning the relationships between time series metrics at scale; or, Why you can never find a taxi in the rain - Ira Cohen (Anodot) 00:32:08
    2. Inferring the effect of an event using CausalImpact - Kay Brodersen (Google) 00:29:48
    3. Reliable prediction: Handling uncertainty - Robin Senge (inovex GmbH) 00:30:44
  10. Hadoop platform and applications
    1. Apache Kylin use cases in China - Luke Han (Kyligence) 00:39:55
    2. Tuning Impala: The top five performance optimizations for the best BI and SQL analytics on Hadoop - Marcel Kornacker (Cloudera), Mostafa Mokhtar (Cloudera) 00:33:09
    3. Creating real-time, data-centric applications with Impala and Kudu - Marcel Kornacker (Cloudera) 00:40:15
  11. Data engineering and architecture
    1. Building a modern data architecture for scale - Ben Sharma (Zaloni) 00:33:44
    2. Automated data exploration: Building efficient analysis pipelines with dask - Victor Zabalza (ASI Data Science) 00:40:33
    3. Creating a virtual data lake with Apache Arrow - Tomer Shiran (Dremio), Jacques Nadeau (Dremio) 00:41:39
    4. Performance and security: A tale of two cities - Rekha Joshi (Intuit) 00:43:06
  12. Big data and the Cloud
    1. How to optimally run Cloudera batch data engineering workflows in AWS - Andrei Savu (Cloudera), Philip Langdale (Cloudera) 00:41:42
    2. Building containerized Spark on a solid foundation with Quobyte and Kubernetes - Daniel Bäurer (inovex GmbH), Sascha Askani (inovex GmbH) 00:39:08
    3. Journey to AWS: Straddling two worlds - Calum Murray (Intuit) 00:38:36
  13. Stream processing and analytics
    1. Speeding up Twitter Heron streaming by 5x - Sanjeev Kulkarni (Streamlio), Maosong Fu (Twitter) 00:39:06
    2. Unified stateful big data processing in Apache Beam (incubating) - Aljoscha Krettek (data Artisans) 00:40:15
    3. Elastic streams: Dynamic data redistribution in Apache Kafka - Ben Stopford (Confluent), Ismael Juma (Confluent) 00:41:25
    4. Stream all the things! - Dean Wampler (Lightbend) 00:31:24
    5. Stream analytics with SQL on Apache Flink - Fabian Hueske (data Artisans) 00:38:05
  14. Law, ethics, governance
    1. Data citizenship: The next stage of data governance - Antonio Alvarez (Santander Group), Lidia Crespo (Santander UK) 00:42:13
    2. GDPR, data privacy, anonymization, minimization. . .oh my! - Steve Touw (Immuta) 00:42:36
  15. Data 101
    1. Cloudy with a chance of on-prem - Jim Scott (MapR Technologies, Inc.) 00:28:55
  16. Platform Security and Cybersecurity
    1. Safeguarding electronic stock trading: Challenges and key lessons in network security - Graham Ahearne (Corvil), Fergal Toomey (Corvil) 00:43:06
    2. Machine learning to "spot" cybersecurity incidents at scale - Eddie Garcia (Cloudera) 00:40:59
    3. Speed up big data encryption in Apache Hadoop and Spark - Haifeng Chen (Intel) 00:29:59
  17. Enterprise adoption
    1. Data science governance: What and how - Andy Petrella (Kensu) 00:39:55
  18. Emerging Technologies
    1. Algorithmic regulation - Daniele Quercia (Bell Labs), Giovanni Quattrone (UCL) 00:39:51
  19. Tutorials
    1. Fast and effective training for deep learning - David Barber (Department of Computer Science, UCL) 00:27:09
    2. Challenges in commercializing deep learning - Eduard Vazquez (Cortexica Vision Systems) 00:25:12
    3. Ensembles in deep learning with Toupee - Alan Mosca (Sendence | Birkbeck, University of London) 00:27:12
    4. Deep learning in commodities markets - Aida Mehonic (ASI Data Science) 00:26:16
    5. Machine-learning algorithms: What they do and when to use them - Darren Cook (QQ Trend Ltd.) 00:29:15
    6. Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML - Stephane Rion (Big Data Partnership) - Part 1 01:14:22
    7. Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML - Stephane Rion (Big Data Partnership) - Part 2 01:01:14
    8. Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML - Stephane Rion (Big Data Partnership) - Part 3 01:35:49
    9. Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML - Stephane Rion (Big Data Partnership) - Part 4 00:52:28
    10. Distributed deep learning on AWS using Apache MXNet - Anima Anandkumar (UC Irvine) - Part 1 01:30:41
    11. Distributed deep learning on AWS using Apache MXNet - Anima Anandkumar (UC Irvine) - Part 2 01:29:45
    12. Practical machine learning with Python - Charlotte Werger (ASI Data Science) - Part 1 00:57:21
    13. Practical machine learning with Python - Charlotte Werger (ASI Data Science) - Part 2 00:46:06
    14. Discover the business value of open data - Majken Sander (TimeXtender) 00:31:21
    15. 10 ways your data project is going to fail and how to prevent it - Martin Goodson (Evolution AI) 00:28:32
    16. Growing a data-driven organization at easyJet - Alberto Rey (easyJet PLC) 00:33:45
    17. Big data at Cox Automotive: Delivering actionable insights to transform the way the world buys, sells, and owns vehicles - Allison Nau (Cox Automotive UK) 00:33:30
    18. Interactive data visualizations using Visdown - Amit Kapoor (narrativeVIZ Consulting), Bargava Subramanian (Red Hat) - Part 1 01:22:38
    19. Interactive data visualizations using Visdown - Amit Kapoor (narrativeVIZ Consulting), Bargava Subramanian (Red Hat) - Part 2 01:28:01
    20. Architecting and building enterprise-class Spark and Hadoop in cloud environments - John Mikula (Google Cloud) 01:28:22
    21. Architecting a next-generation data platform - Jonathan Seidman (Cloudera), Mark Grover (Cloudera), Ted Malaska (Blizzard) - Part 1 01:25:15
    22. Architecting a next-generation data platform - Jonathan Seidman (Cloudera), Mark Grover (Cloudera), Ted Malaska (Blizzard) - Part 2 01:30:57
    23. Just enough Scala for Spark - Dean Wampler (Lightbend) - Part 1 01:29:58
    24. Just enough Scala for Spark - Dean Wampler (Lightbend) - Part 2 01:20:40
    25. Spark and R with sparklyr - Douglas Ashton (Mango Solutions), Aimee Gott (Mango Solutions), Mark Sellors (Mango Solutions) - Part 1 01:04:30
    26. Spark and R with sparklyr - Douglas Ashton (Mango Solutions), Aimee Gott (Mango Solutions), Mark Sellors (Mango Solutions) - Part 2 01:21:19
    27. How Apache Spark and AWS Lambda empower researchers to identify disease-causing mutations and engineer healthier genomes - Denis C. Bauer (Commonwealth Scientific and Industrial Research Organisation) 00:31:16
    28. Unraveling data with Spark using machine learning - Jeffrey Shmain (Cloudera), Jayant Shekhar (Sparkflows Inc.), Vartika Singh (Cloudera) - Part 1 01:31:38
    29. Unraveling data with Spark using machine learning - Jeffrey Shmain (Cloudera), Jayant Shekhar (Sparkflows Inc.), Vartika Singh (Cloudera) - Part 2 01:18:41
    30. Big data science, the IoT, and the transportation sector - Wael Elrifai (Pentaho) 00:31:11
  20. Multiple Topics
    1. Building deep learning-powered big data - Radhika Rangarajan (Intel) 00:33:49
    2. Meta-data science: When all the world's data scientists are just not enough - Leah McGuire (Salesforce) 00:35:37
    3. Executive Briefing: Advanced analytics in the cloud - Jerry Overton (DXC) 00:39:34
    4. Executive Briefing: Cloud strategy - Manuel Sevilla (Capgemini) 00:40:27
    5. Executive Briefing: Data governance and evolving privacy legislation: Daring to move beyond compliance - Aurélie Pols (Mind Your Group by Mind Your Privacy) 00:47:18
    6. Distributed deep learning at scale on Apache Spark with BigDL - Ding Ding (Intel) 00:24:40
    7. A deep dive into Spark SQL's Catalyst optimizer - Herman van Hövell tot Westerflier (Databricks) 00:34:15
    8. Organizing the data lake - Mark Madsen (Third Nature) 00:43:22
    9. Executive Briefing: Dealing with device data - Mark Madsen (Third Nature) 00:43:24
    10. Introduction to Alluxio (formerly Tachyon) and how it brings up to 300x performance improvement to Qunar’s streaming processing - Xueyan Li (Qunar), Yupeng Fu (Alluxio) 00:36:03
    11. Realizing the promise of portability with Apache Beam - Tyler Akidau (Google) 00:41:45
    12. Dask: Flexible analytic computing for Python - Matthew Rocklin (Continuum) 00:37:42
    13. Is finance ready for AI? - Aida Mehonic (ASI Data Science) 00:27:51
    14. Artificial intelligence in the enterprise - Martin Goodson (Evolution AI), Andrew Crisp (Dun & Bradstreet) 00:36:24
    15. Driving the next wave of data lineage with automation, visualization, and interaction - Sean Kandel (Trifacta) 00:43:52
    16. Computable content: Notebooks, containers, and data-centric organizational learning - Paco Nathan (O'Reilly Media) 00:43:32
    17. Continuous analytics: Integrating the data hub in a DevOps pipeline - Arturo Bayo (Synergic Partners), Alvaro Fernandez Velando (Santander Spain) 00:37:23
    18. Executive Briefing: Analytics centers of excellence as a way to accelerate big data adoption by business - Carme Artigas (Synergic Partners) 00:33:28
    19. Rethinking stream processing with Apache Kafka: Applications versus clusters and streams versus databases - Michael Noll (Confluent) 00:40:22
    20. How knowledge graphs can help dramatically improve recommendations - Aurélien Géron (Kiwisoft) 00:42:45
    21. Near-real-time ingest with Apache Flume and Apache Kafka at 1 million-events-per-second scale - Tristan Stevens (Cloudera) 00:40:53
    22. How do you help charities do data? - Duncan Ross (TES Global), Emma Prest (DataKind) 00:43:52
    23. Hadoop as a service: How to build and operate an enterprise data lake supporting operational and streaming analytics - Phillip Radley (BT) 00:46:29
    24. Building a scalable recommendation engine with Spark and Elasticsearch - Seth Hendrickson (Cloudera) 00:40:14
    25. Real-time machine learning with Redis, Apache Spark, TensorFlow, and more - Kamran Yousaf (Redis Labs) 00:33:06
    26. EU GDPR as an opportunity to address both big data security and compliance - Eric Tilenius (BlueTalon) 00:35:37
    27. Identifying and exploiting the keys to digital transformation - Jack Norris (MapR Technologies) 00:40:25
    28. Speeding up machine-learning applications with the LightGBM library in real-time domains - Mathew Salvaris (Microsoft), Miguel Gonzalez-Fierro (Microsoft) 00:24:47
    29. Making recommendations using graphs and Spark - Harry Powell (Barclays), Raffael Strassnig (Barclays) 00:45:21
    30. Conversation interfaces for data science models - Galiya Warrier (Microsoft) 00:37:16
    31. How to prevent future accidents in autonomous driving - Dr.-Ing. Michael Nolting (Volkswagen Commercial Vehicles) 00:48:10
    32. Lessons learned working with Spark and Cassandra - Matthias Niehoff (codecentric AG) 00:33:35
    33. The future of natural language generation, 2016–2026 - Adam Smith (Automated Insights) 00:40:32
    34. Fast data at ING: Utilizing Kafka, Spark, Flink, and Cassandra for data science and streaming analytics - Bas Geerdink (ING) 00:38:28
    35. Spark Structured Streaming for machine learning - Holden Karau (IBM), Seth Hendrickson (Cloudera) 00:39:44
    36. From data dinosaurs to data stars in five weeks: Lessons from completing 80 data science projects - Kim Nilsson (Pivigo) 00:39:09
    37. Deploy Spark ML TensorFlow AI models from notebooks to hybrid clouds (including GPUs) - Chris Fregly (PipelineAI) 00:48:06
    38. Mastering computer vision problems with state-of-the art deep learning architectures, MXNet, and GPU virtual machines - Miguel Gonzalez-Fierro (Microsoft) 00:38:05
    39. Big data computations: Comparing Apache HAWQ, Druid, and GPU databases - Dr. Vijay Srinivas Agneeswaran (SapientNitro) 00:45:10
    40. The business case for deep learning, Spark, and friends - Sanjay Mathur (Silicon Valley Data Science) 00:29:02
    41. "Smartifying" the game - Iñaki Puigdollers (Social Point) 00:29:06
    42. A wealth of information leads to a poverty of attention: Why adopting the cloud can help you stay focused on the right things - Yuval Dvir (Google) 00:39:50