Video description
The sold-out Strata + Hadoop World London 2016 is a tour through the giant city of data, led by expert guides who know just where to go. There is a lot to see in this video compilation, which captures every bit of it: 211 speakers, 108 sessions, 20 keynotes, and 14 tutorials. Start your trip with a long-form tutorial exploring data territory such as an 8-hour deep dive into all phases of managing Hadoop clusters; an 8-hour excursion through the hardcore data science world of data management, machine learning, natural language processing, crowdsourcing, and algorithm design; an 8-hour Spark camp on all things Apache; or 3½-hour tours of D3 data visualization, artificial intelligence, optimizing workflow in R, and more. Want something shorter? Try a mind-blowing conference session (30 to 40 minutes each) on topics ranging from H2O and TensorFlow to e-commerce A/B testing, predictive analytics, and natural language processing. Not interested? How about streaming analytics at 300 billion events per day with Kafka, Samza, and Druid, or using Spark and Hadoop in high-speed trading environments? It’s a travelogue of data wonders with something for everyone.
- Gain front-row access to all 211 speakers, 108 sessions, 20 keynotes, and 14 tutorials
- Download the videos or view them through O'Reilly's HD player
- Hear from big data experts at Intel, deepsense.io, IBM, Google, Teradata, and more
- Watch Cloudera’s Doug Cutting and Tom White predict the future of Apache Hadoop
- Learn about Spark, Kafka Streams, Kudu, Kappa, Drill, Heron, Flink, Eagle, and NiFi
- Be inspired by data innovations in cancer research, epilepsy monitoring, and minefield clearing
- Explore Scotland's Data Lab, the Danish Agency for Digitisation, and the ethics of data processing
- Hear about big data use at LinkedIn, Intuit, Uber, Etsy, HPE, Docker, Facebook, and Microsoft
Table of contents
Keynotes
- Modern data strategy and CERN - Mike Olson (Cloudera) and Manuel Martin Marquez (CERN)
- The Internet of Things: It’s the (sensor) data, stupid - Martin Willcox (Teradata International)
- Data relativism and the rise of context services - Joe Hellerstein (UC Berkeley)
- Saving whales with deep learning - Piotr Niedzwiedz (deepsense.io)
- Data wants to be shareable - Mona Vernon (Thomson Reuters Labs)
- Analytics innovation in cancer research - Gilad Olswang (Intel)
- The future of (artificial) intelligence - Stuart Russell (UC Berkeley)
- The curious case of the data scientist - David Selby (IBM)
- Drawing insights from imperfection: A year of Dear Data - Stefanie Posavec (NA)
- Big data at Google: Solving problems at scale - Jordan Tigani (Google)
- The other half of big data - Tricia Wang (Constellate Data)
- Bringing big data and design to policy making - Cat Drew (UK Policy Lab and Government Data Science Partnership)
- Machine learning for human rights advocacy: Big benefits, serious consequences - Megan Price (Human Rights Data Analysis Group)
Data innovations
- A hands-on introduction to Apache Kafka - Ian Wrigley (Confluent) - Part 1
- A hands-on introduction to Apache Kafka - Ian Wrigley (Confluent) - Part 2
- AI for business: A hands-on introduction to what machine learning can do - Marc Warner (ASI) - Part 1
- AI for business: A hands-on introduction to what machine learning can do - Marc Warner (ASI) - Part 2
- Experiments in The Data Lab: Creating a national hub for data science in Scotland - Brian Hills (The Data Lab)
- The innards of H2O - Cliff Click (0xdata)
- TensorFlow: Machine learning for everyone - Sherry Moore (Google)
- The future of column-oriented data processing with Arrow and Parquet - Julien Le Dem (Dremio)
- 90% of the world's trade is transported by sea, but what data do we have about ship activity worldwide? - Tal Guttman (Windward)
- The evolution of massive-scale data processing - Tyler Akidau (Google)
- Streaming analytics at 300 billion events per day with Kafka, Samza, and Druid - Xavier Léauté (Metamarkets)
- Triggers in Apache Beam (incubating): User-controlled balance of completeness, latency, and cost in streaming big data pipelines - Kenneth Knowles (Google)
- Introducing Kafka Streams, Apache Kafka's new stream processing library - Neha Narkhede (Confluent)
Data science & advanced analytics
- R and reproducible reporting for big data - Aimee Gott (Mango Solutions), Mark Sellors (Mango Solutions), and Richard Pugh (Mango Solutions) - Part 1
- R and reproducible reporting for big data - Aimee Gott (Mango Solutions), Mark Sellors (Mango Solutions), and Richard Pugh (Mango Solutions) - Part 2
- Deep learning and natural language processing with Spark - Andy Petrella (Data Fellas) and Melanie Warrick (Skymind)
- Semantic natural language understanding with Spark Streaming, UIMA, and machine-learned ontologies - David Talby (Atigeo) and Claudiu Branzan (Atigeo)
- Sightseeing, venues, and friends: Predictive analytics with Spark ML and Cassandra - Natalino Busa (Teradata)
- Introduction to generalized low-rank models and missing values - Jo-fai Chow (H2O.ai)
- Petascale genomics - Tom White (Cloudera)
- Panel: The future of intelligence - Marc Warner (ASI), Stuart Russell (UC Berkeley), and Jaan Tallinn (CSER)
- The polyglot data scientist - Jeroen Janssens (Tilburg University)
- Beyond guide dogs: How advances in deep learning can empower the blind community - Anirudh Koul (Microsoft) and Saqib Shaikh (Microsoft)
- Predicting out-of-sample performance of a large cohort of trading algorithms with machine learning - Thomas Wiecki (Quantopian)
- Scala: The unpredicted lingua franca for data science - Andy Petrella (Data Fellas) and Dean Wampler (Lightbend)
- Land mine or Coke can: Machine learning from GPR data - Dirk Gorissen (Skycap | World Bank)
- Data modeling for data science: Simplify your workload with complex types - Marcel Kornacker (Cloudera)
- Applications of natural language understanding: Tools and technologies - Alyona Medelyan (Entopix)
Data-driven business
- Developing a modern enterprise data strategy - Scott Kurth (Silicon Valley Data Science) and John Akred (Silicon Valley Data Science) - Part 1
- Developing a modern enterprise data strategy - Scott Kurth (Silicon Valley Data Science) and John Akred (Silicon Valley Data Science) - Part 2
- The Bag of Little Bootstraps: A/B experimenting with big data made small - Emily Sommer (Etsy)
- Beyond the hunch: Communicating uncertainty for effective data-driven business - Abigail Lebrecht (uSwitch)
- What’s next for music services? The answer is in the data - Paul Shannon (7digital Group Plc) and Alan Hannaway (7digital)
- Intuit, Uber, and Etsy: Scaling innovation with A/B testing - Lucian Lita (Intuit), Mita Mahadevan (Intuit Inc.), Shalin Mantri (Uber), and Gabrielle Gianelli (Etsy)
- How AI revolutionizes business strategy - Kenneth Cukier (The Economist)
- The best university in the world - Duncan Ross (TES Global) and Francine Bennett (Mastodon C)
- 20 percent blissful, 80 percent ignorance - Phil Harvey (DataShaka)
- Data gravity and complex systems - Dave McCrory (Basho Technologies)
- Analytics: A first-class architectural concern in a SaaS platform - Calum Murray (Intuit)
- Situational awareness: On the importance of mapping - Simon Wardley (Leading Edge Forum (CSC))
- Data-driven businesses: Disrupting business models with big data - Carme Artigas (Synergic Partners)
- Building better cross-team communication - Ellen Friedman (Independent)
- What Esperanto can teach us about collaboration in the big data environment - Anne Sophie Roessler (Dataiku)
- What should I eat: The road map to better food and smarter nutrition science - Taryn Fixel (ingredient1)
- Your TOS is not informed consent: Ethical experimentation for the Web - Rachel Shadoan (Akashic Labs)
- How to ask good questions - Farrah Bostic (The Difference Engine)
- Every business is a data business - Mona Vernon (Thomson Reuters Labs)
- Data scientists everywhere - Kim Nilsson (Pivigo)
- Harnessing big data to transform the energy sector - Erik Nygard (Limejump Ltd)
- Data science as catalyst of Autodesk's business model transformation - Laurent Gaubert (Autodesk)
- My AlgorithmicMe knows me better than Google or my mum - Majken Sander (BusinessAnalyst.dk)
- Otto’s little army of real-time bots: How online retailers can defend shopping carts and retarget customers in real time - Rupert Steffner (Otto GmbH & Co. KG)
- My AlgorithmicMe: The "Who is. . .?" of the future - Majken Sander (BusinessAnalyst.dk) and Joerg Blumtritt (Datarella)
- Demonstrating the art of the possible with Spark and Hadoop - Joy Spohn (IBM) and Adrian Houselander (IBM)
Enterprise adoption
- Apache Hadoop operations for production systems - Jayesh Seshadri (Cloudera), Justin Hancock (Cloudera), Mark Samson (Cloudera), and Wellington Chevreuil (Cloudera) - Part 1
- Apache Hadoop operations for production systems - Jayesh Seshadri (Cloudera), Justin Hancock (Cloudera), Mark Samson (Cloudera), and Wellington Chevreuil (Cloudera) - Part 2
- Apache Hadoop operations for production systems - Jayesh Seshadri (Cloudera), Justin Hancock (Cloudera), Mark Samson (Cloudera), and Wellington Chevreuil (Cloudera) - Part 3
- Apache Hadoop operations for production systems - Jayesh Seshadri (Cloudera), Justin Hancock (Cloudera), Mark Samson (Cloudera), and Wellington Chevreuil (Cloudera) - Part 4
- Architecting a data platform - John Akred (Silicon Valley Data Science) and Stephen O'Sullivan (Silicon Valley Data Science) - Part 1
- Architecting a data platform - John Akred (Silicon Valley Data Science) and Stephen O'Sullivan (Silicon Valley Data Science) - Part 2
- Big SQL: The future of in-cluster analytics and enterprise adoption - Moderated by: Surya Mukherjee (Ovum) - Panelists: Lloyd Tabb (Looker Data Science), Nick Amabile (FullStack Analytics), Rex Gibson (Knewton), dp Suresh (Yahoo!)
- BI on Hadoop: What are your options? - Tomer Shiran (Dremio)
Hadoop internals & development
- Hadoop application architectures: Fraud detection - Jonathan Seidman (Cloudera), Mark Grover (Cloudera), Gwen Shapira (Confluent), and Ted Malaska (Cloudera) - Part 1
- Hadoop application architectures: Fraud detection - Jonathan Seidman (Cloudera), Mark Grover (Cloudera), Gwen Shapira (Confluent), and Ted Malaska (Cloudera) - Part 2
- The next 10 years of Apache Hadoop - Doug Cutting (Cloudera), Tom White (Cloudera), and Ben Lorica (O'Reilly Media)
- Hadoop's storage gap: Resolving transactional access/analytic performance trade-offs with Apache Kudu (incubating) - Todd Lipcon (Cloudera, Inc.)
- Building real-time BI systems with HDFS and Kudu - Ruhollah Farchtchi (Zoomdata)
- Why is my Hadoop job slow? - Bikas Saha (Hortonworks Inc)
- Scaling out to 10 clusters, 1,000 users, and 10,000 flows: The Dali experience at LinkedIn - Carl Steinbach (LinkedIn)
- Floating elephants: Developing data wrangling systems on Docker - Chad Metcalf (Docker) and Seshadri Mahalingam (Trifacta)
- Data 101
Hardcore data science
- Mobile advertising: The preclick experience - Mounia Lalmas (Yahoo)
- Analytics for large-scale time series and event data - Ira Cohen (Anodot)
- Recent trends in recommender systems - Danny Bickson (1972)
- Visual data analysis for intelligent machines - Francesca Odone (University of Genova)
- Deep learning for web-scale text - Piotr Mirowski (Google DeepMind)
- Detecting anomalies in the real world - Alessandra Staglianò (The ASI)
- Recent advances in deep learning research - Olivier Grisel (Inria scikit-learn)
- Hardcore data science in practice - Mikio Braun (Zalando SE)
- Data science++: Improving data science by adding domain understanding - Matthew Smith (Microsoft Research)
- A methodology for taxonomy generation and maintenance from large collections of textual data - Roxana Danger (reed.co.uk)
- A functional data integration pipeline using Scala - Johannes Bauer (Cambridge Analytica)
IoT & real-time
- An introduction to time series with Team Apache - Patrick McFadin (DataStax) - Part 1
- An introduction to time series with Team Apache - Patrick McFadin (DataStax) - Part 2
- What does your smart car know about you? - Charles Givre (Booz | Allen | Hamilton)
- When it absolutely, positively has to be there: Reliability guarantees in Kafka - Gwen Shapira (Confluent) and Jeff Holoman (Cloudera)
- Real-time epilepsy monitoring with smart clothing: A case study in time series, open source technology, and connected devices - Eric Kramer (Dataiku)
- Industrial big data and sensor time series data: Different but not difficult - Gopal GopalKrishnan (OSIsoft, LLC.) and Hoa Tram (OSIsoft)
- High-performance data flow with a GUI—and guts - Simon Elliston Ball (Hortonworks)
- Watermarks: Time and progress in streaming dataflow and beyond - Slava Chernyak (Google Inc.)
- Putting Kafka into overdrive - Gwen Shapira (Confluent) and Todd Palino (LinkedIn)
- Stream analytics in the enterprise: A look at Intel’s internal IoT implementation - Moty Fania (Intel)
- Legacy or Kafka? What an ideal messaging system should bring to Hadoop - Jim Scott (MapR Technologies, Inc.)
- Making sense of exactly-once semantics - Flavio Junqueira (Confluent)
- Processing billions of events in real time with Heron - Karthik Ramasamy (Twitter)
- Data privacy in the age of the Internet of Things - Alasdair Allan (Babilim Light Industries)
- Kappa architecture in the telecom industry - Ignacio Manuel Mulas Viela (Ericsson) and Nicolas Seyvet (Ericsson AB)
Spark & beyond
- Building machine-learning apps with Spark: MLlib, ML Pipelines, and GraphX - Jayant Shekhar (Sparkflows Inc.), Vartika Singh (Cloudera), and Krishna Sankar (Volvo Cars) - Part 1
- Building machine-learning apps with Spark: MLlib, ML Pipelines, and GraphX - Jayant Shekhar (Sparkflows Inc.), Vartika Singh (Cloudera), and Krishna Sankar (Volvo Cars) - Part 2
- Spark 2.0: What’s next? - Tathagata Das (Databricks)
- Anomaly detection in telecom with Spark - Ted Dunning (MapR Technologies)
- Beyond shuffling: Tips and tricks for scaling Spark jobs - Holden Karau (IBM)
- Securing Apache Spark on production Hadoop clusters - Kostas Sakellis (Cloudera)
- The future of streaming in Spark: Structured streaming - Tathagata Das (Databricks)
- Introduction to Apache Spark for Java and Scala developers - Ted Malaska (Cloudera)
- Breaking Spark: Top five mistakes to avoid when using Apache Spark in production - Neelesh Srinivas Salian (Cloudera)
Visualization & user experience
- Introduction to visualizations using D3 - Brian Suda (optional.is) - Part 1
- Introduction to visualizations using D3 - Brian Suda (optional.is) - Part 2
- Good city life - Daniele Quercia (Bell Labs)
- Pixels and place: What online experiences can borrow from offline spaces and vice versa - Kate O'Neill (KO Insights)
- Opportunities for hardware acceleration in big data analytics - Kanu Gulati (Zetta Venture Partners)
- The rise of the GPU: GPUs will change how you look at big data - Todd Mostak (MapD)
Sponsored
- Which whale is it anyway? Face recognition for right whales using deep learning - Robert Bogucki (deepsense.io) and Maciej Klimek (deepsense.io)
- Realizing the value of combining the IoT and big data analytics - Frank Saeuberlich (Teradata) and Eliano Marques (Think Big Analytics)
- Federated analytics innovation in cancer research - Gilad Olswang (Intel)
- Best practices to extract value from Hadoop with predictive analytics - Zoltan Prekopcsak (RapidMiner)
- Building a modern data architecture - Ben Sharma (Zaloni)
- High-frequency decisioning, from big data to fast data - Tugdual Grall (MapR Technologies)
- Avoid big data becoming a big problem - Raghunath Nambiar (Cisco)
- Operating batch in the data-driven enterprise - Joe Goldberg (BMC Software Inc.)
- Developing a successful big data strategy - Seb Darrington (EMC)
- Business transformation and outcomes through big data - Louise Matthews (Hortonworks)
- The business bottom line of data lakes: Real-life experiences - Franz Aman (Informatica)
Security
- Simplifying Hadoop with RecordService, a secure and unified data access path for compute frameworks - Alex Leblang (Cloudera)
- Best practices and solutions to manage and govern a multinational big data platform - Clara Fletcher (Accenture)
- HopsWorks: Multitenant Hadoop as a service - Jim Dowling (Swedish ICT - SICS)
Hadoop use cases
- Improving the customer experience with big data wrangling on Hadoop - Dan Jermyn (Royal Bank of Scotland) and Connor Carreras (Trifacta)
- Simple, fast, and flexible risk aggregation in Hadoop - Deenar Toraskar (Think Reactive)
- Risk data aggregation and risk reporting for financial services - Ben Sharma (Zaloni)
- The future is now: Leveraging Hadoop for real-time, predictive insights - Steven Noels (NGDATA)
- Year 2025: Big data as enabler of fully automated vehicles - Dr. Thomas Beer (Continental) and Felix Werkmeister (Continental)
- Analyzing dynamic JSON with Apache Drill - Tomer Shiran (Dremio)
Law, ethics, governance
- Denmark is data driven - Mads Hjorth (Danish Agency for Digitisation)
- Using data for evil IV: The journey home - Duncan Ross (TES Global) and Francine Bennett (Mastodon C)
- Protecting individual privacy in a data-driven world - Jason McFall (Privitar)
- Don't build a data swamp: Hadoop governance case studies for financial services - Mark Donsky (Cloudera) and Chang She (Cloudera)
Product information
- Title: Strata + Hadoop World 2016 - London, United Kingdom: Video Compilation
- Author(s):
- Release date: June 2016
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781491944639