Strata + Hadoop World 2016 - New York, New York

Video Description

Big data's most influential business decision makers, strategists, architects, developers, and analysts gathered at the Strata + Hadoop NY 2016 conference to discuss and explore the cutting-edge issues around rapidly acquiring, processing, preparing, storing, and analyzing data on a massive scale. This video compilation gives you complete access to all of the conference’s tutorials, sessions, and keynotes: more than 200 events in total.

A short list of the technologies you’ll learn about includes Apache Hadoop, Spark, Hive, Beam, Impala, Solr, Kudu, Mesos, Zeppelin, Akka, Cassandra, Myriad, Kafka, Apex, Druid, Python, R, graph databases, data lakes, the Trusted Analytics platform, YARN, Spark Streaming, JupyterLab, holographic data visualizations, and event-based microservices applications.

Who were the speakers? Just a few of the more than 290 experts who presented include luminaries such as Google’s Tyler Akidau, Confluent’s Neha Narkhede, June Andrews from Pinterest, Kurt Brown from Netflix, Jill Lepore from Harvard, and IBM’s Raj Krishnamurthy.

As at every Strata + Hadoop event, the core of the conference is the multi-hour tutorials. Thought-provoking and intense, the tutorials include: a half-day practical guide to securing Hadoop clusters led by principals at Cloudera; Data 101, an overview of the core principles of data architecture from experts at Silicon Valley Data Science, EMC, and Metis; Spark Camp, eight hours of instruction on DataFrames, Spark SQL, GraphX, and Spark Streaming; and eight hours on Big Data issues in the financial industry with case studies from Morgan Stanley, Goldman Sachs, Credit Suisse, and Sand Hill Econometrics.

With sessions for the Big Data beginner, the intermediate, and the pro, the business executive from the C-suite and the high-level academic, Strata + Hadoop NY 2016 is the proverbial big tent with something smart for everyone.

  • Gain total access to every tutorial, session, and keynote at Strata + Hadoop NY 2016
  • See each of the conference’s 22 tutorials, 157 sessions, and 22 keynotes
  • Hear from the experts at Cloudera, MapR, Microsoft, AWS, Intel, Confluent, and hundreds more
  • Enjoy 28 sessions on real-time stream processing and 27 sessions on architecting data pipelines
  • See 26 data science sessions on topics like machine learning with scikit-learn, Python, and TensorFlow
  • Watch 16 sessions on data innovations like Apache Solr, JupyterLab, Parquet, and Druid
  • Survey 13 different how-to sessions on deploying enterprise workloads to the public cloud
  • Take in 8 sessions on Hadoop use cases, 5 sessions on data pipeline security, and 17 on the IoT
  • See Big Data case studies in retail, health care, finance, ad/media, and telecom
  • Learn about Google’s BigQuery, eBay’s Pulsar, Amazon’s Kinesis, and IBM’s Immersive 3D

Table of Contents

  1. Strata + Hadoop World Keynotes
    1. The new dynamics of big data - Mike Olson (Cloudera) 00:16:18
    2. Decision 2016: What is your data platform? - Jack Norris (MapR Technologies) 00:10:50
    3. US venture: Risk, values, founder outcomes - Susan Woodward (Sand Hill Econometrics) 00:16:31
    4. Driving open source adoption within the enterprise - Ron Bodkin (Think Big Analytics) 00:06:41
    5. Modern analytics with Dell EMC - Patricia Florissi (Dell EMC) 00:06:49
    6. Transforming healthcare through precision data science - Sriram Vishwanath (Accordion Health Inc. | University of Texas, Austin) 00:09:09
    7. Hadoop in the cloud: A Nielsen use case - Tom Reilly (Cloudera) and James Powell (Nielsen) 00:09:24
    8. Inbox is the Trojan horse of AI - Alistair Croll (Solve For Interesting) 00:07:04
    9. The tech behind the biggest journalism leak in history - Mar Cabra (International Consortium of Investigative Journalists) 00:09:07
    10. Business insights driven by speed - Todd Brannon (Cisco) 00:05:38
    11. Google BigQuery for enterprise - Chad W. Jennings (Google) 00:05:38
    12. From big data to human-level artificial intelligence - Gary Marcus (Geometric Intelligence) 00:12:41
  2. Data 101
    1. The business case for Spark, Kafka, and friends - Edd Wilder-James (Silicon Valley Data Science) 00:29:39
    2. Cloud computing and big data - Ben Sharma (Zaloni) 00:25:43
    3. Data science from idea to pilot to production: Challenges and lessons learned - Amihai Savir (EMC) 00:26:49
    4. How to build (and execute) a real data strategy - Jerry Overton (CSC) 00:28:46
    5. Statistics and the art of deception - Deborah Berebichez (Metis) 00:26:36
    6. Encoding new data visualizations - Julie Rodriguez (Sapient Global Markets) 00:25:11
  3. Data Science & Advanced Analytics
    1. Machine learning in Python - Andreas Mueller (NYU) - Part 1 00:33:50
    2. Machine learning in Python - Andreas Mueller (NYU) - Part 2 00:31:14
    3. Machine learning in Python - Andreas Mueller (NYU) - Part 3 00:42:39
    4. Machine learning in Python - Andreas Mueller (NYU) - Part 4 00:34:27
    5. Guerrilla guide to Python and Apache Hadoop - Juliet Hougland (Cloudera) and Sean Owen (Cloudera) - Part 1 00:49:42
    6. Guerrilla guide to Python and Apache Hadoop - Juliet Hougland (Cloudera) and Sean Owen (Cloudera) - Part 2 00:47:29
    7. Guerrilla guide to Python and Apache Hadoop - Juliet Hougland (Cloudera) and Sean Owen (Cloudera) - Part 3 00:27:51
    8. Interactive data applications in Python - Bryan Van de Ven (Continuum Analytics) and Sarah Bird (Continuum Analytics) - Part 1 00:29:41
    9. Interactive data applications in Python - Bryan Van de Ven (Continuum Analytics) and Sarah Bird (Continuum Analytics) - Part 2 00:34:43
    10. Interactive data applications in Python - Bryan Van de Ven (Continuum Analytics) and Sarah Bird (Continuum Analytics) - Part 3 00:44:24
    11. Interactive data applications in Python - Bryan Van de Ven (Continuum Analytics) and Sarah Bird (Continuum Analytics) - Part 4 00:38:39
    12. Why should I trust you? Explaining the predictions of machine-learning models - Carlos Guestrin (University of Washington & Apple) 00:35:45
    13. Data science at eHarmony: A generalized framework for personalization - Jonathan Morra (eHarmony) 00:41:43
    14. Iterative supervised clustering: A dance between data scientists and machine learning - June Andrews (Pinterest) 00:29:52
    15. How the Washington Post uses machine learning to predict article popularity - Eui-Hong Han (The Washington Post) and Shuguang Wang (The Washington Post) 00:35:37
    16. Using parallel graph-processing libraries for cancer genomics - Crystal Valentine (MapR Technologies) 00:43:07
    17. Unlocking unstructured text data with summarization - Michael Williams (Fast Forward Labs) 00:50:28
    18. Removing complexity from scalable machine learning - Martin Wicke (Google) 00:45:38
    19. Tackling machine-learning complexity for data curation - Ihab Ilyas (University of Waterloo | Tamr, Inc.) 00:44:33
    20. Recent advances in applications of deep learning for text and speech - Yishay Carmiel (Spoken Communications) 00:37:09
    21. Data science and the Internet of Things: It's just the beginning - Mike Stringer (Datascope Analytics) 00:42:19
    22. Semantic natural language understanding with Spark Streaming, UIMA, and machine-learned ontologies - David Talby (Atigeo) and Claudiu Branzan (G2 Web Services) 00:37:11
    23. Fast deep learning at your fingertips - Amitai Armon (Intel) and Nir Lotan (Intel) 00:28:31
    24. Model visualization - Amit Kapoor (narrativeVIZ Consulting) 00:39:07
    25. A data-driven approach to the US presidential election - Khaled Ammar (Thomson Reuters) 00:20:18
    26. Machine-learning techniques for class imbalances and adversaries - Brendan Herger (Capital One) 00:24:13
    27. Machine intelligence at Google scale - Kazunori Sato (Google) 00:35:29
    28. Evaluating models for a needle in a haystack: Applications in predictive maintenance - Danielle Dean (Microsoft) and Shaheen Gauher (Microsoft) 00:39:34
    29. Predicting patent litigation - Josh Lemaitre (Thomson Reuters) 00:39:38
  4. Data-driven Business
    1. Data science that works: Best practices for designing data-driven improvements, making them real, and driving change in your enterprise - Jerry Overton (CSC) - Part 1 00:38:08
    2. Data science that works: Best practices for designing data-driven improvements, making them real, and driving change in your enterprise - Jerry Overton (CSC) - Part 2 00:27:45
    3. Data science that works: Best practices for designing data-driven improvements, making them real, and driving change in your enterprise - Jerry Overton (CSC) - Part 3 00:39:09
    4. Data science that works: Best practices for designing data-driven improvements, making them real, and driving change in your enterprise - Jerry Overton (CSC) - Part 4 00:52:47
    5. Developing a modern enterprise data strategy - Edd Wilder-James (Silicon Valley Data Science) and Scott Kurt (Silicon Valley Data Science) - Part 1 00:38:10
    6. Developing a modern enterprise data strategy - Edd Wilder-James (Silicon Valley Data Science) and Scott Kurt (Silicon Valley Data Science) - Part 2 00:43:29
    7. Developing a modern enterprise data strategy - Edd Wilder-James (Silicon Valley Data Science) and Scott Kurt (Silicon Valley Data Science) - Part 3 00:44:21
    8. Developing a modern enterprise data strategy - Edd Wilder-James (Silicon Valley Data Science) and Scott Kurt (Silicon Valley Data Science) - Part 4 00:45:40
    9. Making on-demand grocery delivery profitable with data science - Jeremy Stanley (Instacart) 00:42:18
    10. Creating and evaluating a distance measure - Melissa Santos (Big Cartel) 00:38:43
    11. A data-first approach to drive real-time applications - Jack Norris (MapR Technologies) 00:34:06
    12. Architecting for change: LinkedIn's new data ecosystem - Shirshanka Das (LinkedIn) and Yael Garten (LinkedIn) 00:41:58
    13. Winning with data: How ThredUp, Twilio, and Warby Parker use data to build advantage - Daniel Mintz (Looker) 00:37:52
    14. What Crimean War gunboats teach us about the need for schema registries - Alexander Dean (Snowplow Analytics Ltd) 00:30:17
    15. AI-fueled customer experience: How online retailers are moving toward real-time perception, reasoning, and learning - Rupert Steffner (Otto GmbH & Co. KG) 00:30:04
    16. Breeding data scientists: A four-year study - Danielle Dean (Microsoft) and Amy O'Connor (Cloudera) 00:39:49
    17. Corporate strategy: Artificial intelligence or bust - Stephen Pratt (Noodle.ai) 00:41:28
    18. Using the explosion of data in the utility industry to prevent explosions in utility infrastructure - Kim Montgomery (GridCure) 00:36:00
    19. Helping computers help us see - Susan Etlinger (Altimeter Group) 00:32:47
    20. Women in Big Data Forum meetup 00:39:03
  5. Enterprise Adoption
    1. Big data in healthcare - Sabrina Dahlgren (Kaiser Permanente), Taposh Roy (Kaiser Permanente), and Rajiv Synghal (Kaiser Permanente) 00:32:20
    2. Building data lakes in the cloud - Alex Bordei (Bigstep) 00:39:48
    3. Swipe, dip, and hover: Managing card payment data at Visa - Nandu Jayakumar (Visa Inc.) 00:38:20
    4. A unified ecosystem for market data visualization - Janaki Parameswaran (FINRA) and Kishore Ramachandran (FINRA) 00:38:34
    5. BI and SQL analytics with Hadoop in the cloud - Henry Robinson (Cloudera) and Devadutta Ghat (Cloudera) 00:45:09
    6. Machine intelligence in the wild: How AI will reshape global industries - David Beyer (Amplify Partners) 00:44:46
  6. Hadoop Use Cases
    1. Hadoop application architectures: Architecting a next-generation data platform for real-time ETL, data analytics, and data warehousing - Jonathan Seidman (Cloudera), Mark Grover (Cloudera), and Ted Malaska (Cloudera) - Part 1 00:38:08
    2. Hadoop application architectures: Architecting a next-generation data platform for real-time ETL, data analytics, and data warehousing - Jonathan Seidman (Cloudera), Mark Grover (Cloudera), and Ted Malaska (Cloudera) - Part 2 00:52:39
    3. Hadoop application architectures: Architecting a next-generation data platform for real-time ETL, data analytics, and data warehousing - Jonathan Seidman (Cloudera), Mark Grover (Cloudera), and Ted Malaska (Cloudera) - Part 3 00:35:37
    4. Hadoop application architectures: Architecting a next-generation data platform for real-time ETL, data analytics, and data warehousing - Jonathan Seidman (Cloudera), Mark Grover (Cloudera), and Ted Malaska (Cloudera) - Part 4 00:36:33
    5. How the largest US healthcare dataset in Hadoop enables patient-level analytics in near real time - Navdeep Alam (IMS Health) 00:44:22
    6. Creating real-time, data-centric applications with Impala and Kudu - Marcel Kornacker (Cloudera) and Todd Lipcon (Cloudera) 00:40:21
    7. Big data processing with Hadoop and Spark, the Uber way - Praveen Murugesan (Uber Technologies Inc) 00:42:23
    8. How a Spark-based feature store can accelerate big data adoption in financial services - Kaushik Deka (Novantas) and Phil Jarymiszyn (Novantas) 00:39:31
    9. Zillow: Transforming real estate through big data and data science - Jasjeet Thind (Zillow) 00:42:19
    10. Hadoop and Spark at ING: An overview of the architecture, security, and business cases at a large international bank - Bas Geerdink (ING) 00:51:35
  7. IoT & Real-time
    1. Learn stream processing with Apache Beam - Tyler Akidau (Google) and Jesse Anderson (Smoking Hand) - Part 1 00:44:05
    2. Learn stream processing with Apache Beam - Tyler Akidau (Google) and Jesse Anderson (Smoking Hand) - Part 2 00:26:50
    3. Learn stream processing with Apache Beam - Tyler Akidau (Google) and Jesse Anderson (Smoking Hand) - Part 3 00:55:48
    4. Powering real-time analytics on Xfinity using Kudu - Sridhar Alla (Comcast) and Kiran Muglurmath (Comcast) 00:41:04
    5. Apache Kafka: The rise of real-time data and stream processing - Neha Narkhede (Confluent) 00:40:44
    6. Watermarks: Time and progress in Apache Beam (incubating) and beyond - Slava Chernyak (Google) 00:33:21
    7. Triggers in Apache Beam (incubating) - Kenneth Knowles (Google) 00:41:15
    8. Analytics for large-scale time series and event data - Ira Cohen (Anodot) 00:42:37
    9. Pulsar: Real-time analytics at scale leveraging Kafka, Kylin, and Druid - Tony Ng (eBay, Inc.) 00:40:40
    10. Implementing extreme scaling and streaming in finance - Jim Scott (MapR Technologies, Inc.) 00:39:18
    11. When one data center is not enough: Building large-scale stream infrastructures across multiple data centers with Apache Kafka - Ewen Cheslack-Postava (Confluent) 00:34:51
    12. How to achieve zero-latency IoT and FSI data processing with Spark - Yaron Haviv (iguaz.io) 00:33:47
    13. Stream analytics in the enterprise: A look at Intel’s internal IoT implementation - Moty Fania (Intel) 00:37:58
  8. Security
    1. A practitioner’s guide to securing your Hadoop cluster - Michael Yoder (Cloudera), Ben Spivey (Cloudera), Mark Donsky (Cloudera), and Mubashir Kazia (Cloudera) - Part 1 00:58:31
    2. A practitioner’s guide to securing your Hadoop cluster - Michael Yoder (Cloudera), Ben Spivey (Cloudera), Mark Donsky (Cloudera), and Mubashir Kazia (Cloudera) - Part 2 00:38:34
    3. A practitioner’s guide to securing your Hadoop cluster - Michael Yoder (Cloudera), Ben Spivey (Cloudera), Mark Donsky (Cloudera), and Mubashir Kazia (Cloudera) - Part 3 00:56:46
    4. A practitioner’s guide to securing your Hadoop cluster - Michael Yoder (Cloudera), Ben Spivey (Cloudera), Mark Donsky (Cloudera), and Mubashir Kazia (Cloudera) - Part 4 00:33:48
    5. Account takeovers are taking over: How big data can stop them - Fang Yu (DataVisor Inc.) 00:32:40
    6. Streaming cybersecurity into Graph: Accelerating data into Datastax Graph and Blazegraph - Keith Kraus (Accenture Labs), Joshua Patterson (Accenture Labs), and Michael Wendt (Accenture Labs) 00:39:13
    7. Securing Apache Kafka - Jun Rao (Confluent) 00:43:14
    8. Authorization in the cloud: Enforcing access control across compute engines - Li Li (Cloudera) and Hao Hao (Cloudera) 00:33:31
  9. Spark & Beyond
    1. Building machine-learning apps with Spark: MLlib, ML Pipelines, and GraphX - Vartika Singh (Cloudera) and Jayant Shekhar (Sparkflows Inc.) - Part 1 00:41:23
    2. Building machine-learning apps with Spark: MLlib, ML Pipelines, and GraphX - Vartika Singh (Cloudera) and Jayant Shekhar (Sparkflows Inc.) - Part 2 00:44:01
    3. Building machine-learning apps with Spark: MLlib, ML Pipelines, and GraphX - Vartika Singh (Cloudera) and Jayant Shekhar (Sparkflows Inc.) - Part 3 00:43:36
    4. Building machine-learning apps with Spark: MLlib, ML Pipelines, and GraphX - Vartika Singh (Cloudera) and Jayant Shekhar (Sparkflows Inc.) - Part 4 00:52:38
    5. Spark camp: Exploring Wikipedia with Spark - Zoltan Toth (datapao.com) - Part 1 00:37:48
    6. Spark camp: Exploring Wikipedia with Spark - Zoltan Toth (datapao.com) - Part 2 00:39:16
    7. Spark camp: Exploring Wikipedia with Spark - Zoltan Toth (datapao.com) - Part 3 00:41:47
    8. Spark camp: Exploring Wikipedia with Spark - Zoltan Toth (datapao.com) - Part 4 00:29:58
    9. Spark camp: Exploring Wikipedia with Spark - Zoltan Toth (datapao.com) - Part 5 00:47:12
    10. Spark camp: Exploring Wikipedia with Spark - Zoltan Toth (datapao.com) - Part 6 00:45:47
    11. Spark camp: Exploring Wikipedia with Spark - Zoltan Toth (datapao.com) - Part 7 00:37:12
    12. Spark camp: Exploring Wikipedia with Spark - Zoltan Toth (datapao.com) - Part 8 00:57:35
    13. Just enough Scala for Spark - Dean Wampler (Lightbend) - Part 1 00:42:56
    14. Just enough Scala for Spark - Dean Wampler (Lightbend) - Part 2 00:46:04
    15. Just enough Scala for Spark - Dean Wampler (Lightbend) - Part 3 00:34:43
    16. Just enough Scala for Spark - Dean Wampler (Lightbend) - Part 4 00:41:14
    17. Architecting a data platform - John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science), and Mauricio Vacas (Silicon Valley Data Science) - Part 1 00:37:07
    18. Architecting a data platform - John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science), and Mauricio Vacas (Silicon Valley Data Science) - Part 2 00:46:50
    19. Architecting a data platform - John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science), and Mauricio Vacas (Silicon Valley Data Science) - Part 3 00:35:52
    20. Architecting a data platform - John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science), and Mauricio Vacas (Silicon Valley Data Science) - Part 4 00:56:21
    21. The state of Spark and what's next after Spark 2.0 - Ram Sriharsha (Databricks) 00:33:44
    22. Top five mistakes when writing Spark applications - Ted Malaska (Cloudera) and Mark Grover (Cloudera) 00:38:56
    23. Tuning Spark machine-learning workloads - Raj Krishnamurthy (IBM) 00:39:59
    24. Delivering near real-time mobility insights at Swisscom - François Garillot (Swisscom) 00:39:31
    25. Breaking Spark: The top five mistakes to avoid when using Apache Spark in production - Neelesh Srinivas Salian (Cloudera) 00:34:13
    26. A deep dive into Structured Streaming in Spark - Reynold Xin (Databricks) 00:32:28
    27. Apache Spark in fintech: Building fraud detection applications with distributed machine learning at Intel - Yuhao Yang (Intel) 00:41:10
    28. Spark Structured Streaming for machine learning - Holden Karau (IBM) and Seth Hendrickson (IBM) 00:40:31
    29. Choice Hotels' journey to better understand its customers through self-service analytics - Avinash Ramineni (Clairvoyant), Narasimhan Sampath (Choice Hotels International) 00:38:56
    30. Spark and Java: Yes, they work together - Jesse Anderson (Smoking Hand) 00:39:04
  10. Visualization & User Experience
    1. Introduction to visualizations using D3 - Brian Suda ((optional.is)) - Part 1 00:46:49
    2. Introduction to visualizations using D3 - Brian Suda ((optional.is)) - Part 2 00:42:56
    3. Introduction to visualizations using D3 - Brian Suda ((optional.is)) - Part 3 00:44:23
    4. Introduction to visualizations using D3 - Brian Suda ((optional.is)) - Part 4 00:38:36
    5. Holographic data visualizations: Welcome to the real world - Brad Sarsfield (Microsoft HoloLens) 00:38:35
    6. The devil is in the details: Interactive, multiscale visualization of data lineage - Sean Kandel (Trifacta) 00:38:50
    7. What ties to what? Visualizing large-scale customer text data with bipartite graphs - Mark Turner (Teradata) 00:40:34
    8. Investigating event graphs at scale: Going from theory to practice - Leo Meyerovich (Graphistry) 00:39:21
    9. Caravel: An open source data exploration and visualization platform - Maxime Beauchemin (Airbnb) 00:44:14
    10. Data risk intelligence in a regulated world - Uma Raghavan (Integris Software) 00:39:24
    11. Five-senses data: Using your senses to improve data signal and value - Cameron Turner (The Data Guild), Evan Macmillan (Gridspace), Hanna Kang-Brown (R/GA), and Brad Sarsfield (Microsoft HoloLens) 00:38:59
  11. Sponsored
    1. Big data meets the IoT - Cheryl Wiebe (Think Big, A Teradata Company) and Dave Shuman (Cloudera) 00:39:34
    2. Create advanced analytic models with open source - Kyle Ambert (Intel), Tipton Loo (ProKarma), and Michael Hood (ProKarma) 00:29:43
    3. Achieve richer insights and business outcomes with Dell EMC big data and analytics - Carey James (EMC) 00:31:19
    4. The keys to an event-based microservices application - Crystal Valentine (MapR Technologies) 00:48:29
    5. Getting it right exactly once: Principles for streaming architectures - Darryl Smith (Dell) 00:39:01
    6. Accelerating time to analytical value in the enterprise with data lake management - Viral Shah (Asurion Services), Krishna Sarma (TD Ameritrade), and Murthy Mathiprakasam (Informatica) 00:39:42
    7. The flux capacitor of machine learning: Turn data garbage into 1.21 gigawatt-powered acceleration - Ingo Mierswa (RapidMiner) 00:43:31
    8. A new “Sparkitecture” for modernizing your data warehouse - Jack Gudenkauf (Hewlett Packard Enterprise) and Deepak Majeti (Hewlett Packard Enterprise) 00:37:54
    9. Trusted IoT and big data ecosystems - Reiner Kappenberger (HPE Security – Data Security) 00:33:08
    10. Citi, Standard Chartered Bank, and Polaris - Nenshad Bardoliwalla (Paxata), Mark Nelson (Standard Chartered Bank), Veronica Liwak (Polaris), and Kapil Khurana (Citibank) 00:38:45
    11. Building a modern data architecture - Ben Sharma (Zaloni) 00:38:50
    12. Top data wrangling use cases in enterprise analytics - Connor Carreras (Trifacta), Doug Stradley (Trifacta), Rajiv Synghal (Kaiser Permanente), and Austin Leahy (eBay) 00:42:15
    13. Turning petabytes of data into millions in cost savings for the world’s biggest retailers - Jonathon Whitton (PRGX USA Inc) and Ashley Stirrup (Talend) 00:36:59
    14. From data to insights using analytics - Johan Bjerke (Splunk Inc) 00:43:41
    15. Enhancing the customer experience when driving Hadoop adoption - Anthony Dina (Dell) and Nick Curcuru (Mastercard) 00:53:36
    16. Unified integration for data lakes and modern data applications - Jonathan Gray (Cask) 00:35:28
    17. Gaining extreme agility and performance using a Spark-free approach to data management - Jake Dolezal (McKnight Consulting Group Global Services) 00:22:42
    18. Open source operations: Building on Apache Spark with InsightEdge, TensorFlow, Apache Zeppelin, and your own project - Antonio Rosales (Canonical) 00:37:34
    19. Virtualizing big data: Effective approaches derived from real-world deployments - Martin Yip (VMware) and Dave Jaffee (VMware) 00:40:01
    20. Why is this disruption different from all other disruptions? - Matt Turck (FirstMark Capital), Einat Burshtine (Credit Suisse), Shui Cheung Yip (Pershing LLC (Bank of New York Mellon)), and Alasdair Anderson (Nordea) 00:44:03
    21. From lake to reservoir: Harnessing big data’s power for the enterprise - Thomas Place (First Data) 00:36:33
    22. Making real-time analytics on the data lake a reality - Amit Vij (Kinetica) and Mark Brooks (Kinetica) 00:39:27
    23. Changing the landscape with deep learning and accelerated analytics - Jim McHugh (NVIDIA), Eric Kontargyris (MapD), Mike Perez (Kinetica), and Mike Wendt (Accenture) 00:33:36
    24. Data warehouse augmentation and modernization using Hadoop - Amar Arsikere (infoworks.io) 00:39:14
    25. VoltDB and the Jepsen test: What we learned about data accuracy and consistency - John Hugg (VoltDB) 00:39:20
    26. 5 cloud AI innovations - Rimma Nehme (Microsoft), Ankur Teredesai (KenSci), and Lukas Biewald (CrowdFlower) 00:39:03
    27. Big data and analytics with Cisco UCS: Lessons learned and platform considerations - Rajesh Shroff (Cisco) 00:21:51
    28. Governance and metadata management of Cigna's enterprise data lake - Sherri Adame (Cigna) 00:42:31
    29. Accelerate EDW modernization with the Hadoop ecosystem - Joe Goldberg (BMC Software Inc.) 00:36:35
    30. BigQuery for data warehousing - Chad W. Jennings (Google) and Felipe Hoffa (Google) 00:43:38
    31. Big data governance: Making big data an enterprise-class citizen - Michael Eacrett (SAP) 00:38:50
    32. Path-to-purchase analytics using a data lake and Spark - Joe Caserta (Caserta Concepts) 00:43:03
    33. Sensitive data sharing for analytics - Steve Touw (Immuta) 00:38:22
    34. Big data journeys from the real world - John Morrell (Datameer) 00:42:38
  12. Law, Ethics, Governance
    1. Big data, big decisions: Key legal considerations for the collection and use of big data - Kristi Wolff (Kelley Drye & Warren LLP) and Crystal Skelton (Kelley Drye & Warren LLP) 00:36:41
    2. The personalization spectrum - Sara Watson (Tow Center for Digital Journalism) 00:39:50
  13. Hadoop Internals & Development
    1. Debunking HDFS erasure coding performance myths - Uma Maheswara Rao G (Intel), Rui Li (Intel), and Zhe Zhang (LinkedIn) 00:43:25
    2. Apache Kudu: 1.0 and beyond - Todd Lipcon (Cloudera) 00:41:41
    3. Rethinking operational data stores on Hadoop (non-sponsored) - Vinayak Borkar (X15 Software) 00:36:25
  14. Data Innovations
    1. Parallel SQL and analytics with Solr - Yonik Seeley (Cloudera) 00:40:53
    2. File format benchmark: Avro, JSON, ORC, and Parquet - Owen O'Malley (Hortonworks) 00:43:30
    3. Designing a location intelligence platform for everyone by integrating data, analysis, and cartography - Stuart Lynn (Carto) 00:42:36
    4. The future of column-oriented data processing with Arrow and Parquet - Julien Le Dem (Dremio) and Jacques Nadeau (Dremio) 00:44:36
    5. Beyond Hadoop at Yahoo: Interactive analytics with Druid - Himanshu Gupta (Yahoo) 00:39:00
    6. The Netflix data platform: Now and in the future - Kurt Brown (Netflix) 01:05:19
    7. Parquet performance tuning: The missing guide - Ryan Blue (Netflix) - Part 1 00:50:06
    8. Parquet performance tuning: The missing guide - Ryan Blue (Netflix) - Part 2 00:30:04
    9. The evolution of massive-scale data processing - Tyler Akidau (Google) 00:46:37
    10. Lessons learned running Hadoop and Spark in Docker - Thomas Phelan (BlueData) 00:34:54
    11. Streaming analytics at 300 billion events per day with Kafka, Samza, and Druid - Xavier Léauté (Metamarkets) 00:40:25
    12. Alluxio (formerly Tachyon): The journey thus far and the road ahead (non-sponsored) - Haoyuan Li (Alluxio) 00:44:59
    13. Smart data for smarter firefighters - Bart van Leeuwen (Netage) 00:43:29
    14. Data modeling for microservices with Cassandra and Spark - Jeffrey Carpenter (Choice Hotels International) 00:45:20
  15. DCS sessions
    1. Kentucky Transportation Cabinet: Monitoring road activities with a real-time snow and ice information management system using Spark and Hadoop - Vineet Kumar (Dept of Transportation) 00:34:44
    2. Self-service data integration in an IT-managed environment - Madhuri Kollu (Sabre) and James Altendorf (Sabre) 00:26:20
    3. Of market makers and middlemen: How data is transforming global trade - Renee DiResta (Haven) 00:37:41
    4. Catchy content: What makes TV content work? - David Boyle (BBC Worldwide) 00:32:14
    5. Web analytics in the platform era: The Gawker Media experience - Joshua Laurito (Gizmodo Media) 00:30:59
    6. A collaboration in civic tech: Improving traffic safety nationwide - Erin Akred (DataKind) and Michael Dowd (DataKind) 00:25:32
    7. Cold chain analytics: Using Revolution R and the Hadoop ecosystem - Nitin Kaul (Merck & Co., Inc.) and Richard Baumgartner (Merck) 00:29:09
    8. Driving field service profitability with advanced analytics - Jolene Jeffries (GE Oil & Gas) and Tara Prakriya (MAANA) 00:22:25
  16. FinData sessions
    1. Driving change: Intelligent systems in wealth management - Jeff McMillan (Morgan Stanley) 00:28:34
    2. The future of fintech - Anand Sanwal (CB Insights) 00:28:48
    3. Using big data for small business financing - Diane Chang (Intuit) 00:29:19
    4. Connecting the dots through leaked and public data - Giannina Segnini (Journalism School, Columbia University) and Mar Cabra (International Consortium of Investigative Journalists) 00:34:51
    5. Open the black box: An executive guide to making unstructured data work in finance - Michelle Bonat (Data Simply) 00:23:29
    6. Upcoming challenges and opportunities for data technologies in consumer finance - Juan Huerta (Goldman Sachs Consumer Lending Group) 00:32:45
  17. Ask Me Anything conference sessions
    1. Ask me anything: Getting into (and out of) data science consulting (non-sponsored) - Max Shron (Warby Parker) 00:51:51
    2. Ask me anything: Apache Kafka - Jun Rao (Confluent) and Ewen Cheslack-Postava (Confluent) 00:42:54
  18. Solutions Showcase Theater
    1. Powering the Predictive Enterprise - Brian Weissler (Aginity) 00:09:11
    2. Anki Cars, IoT and Big Data - Stuart Coggins and John Graves (Oracle) 00:09:26
    3. When it Comes to Customer Experience, Context is Everything - Joey Echeverria (Rocana) 00:07:53
    4. Big Data Enabled Connected Care with Patient 360 Analytics - Rohit Balasubramanian (Deloitte) 00:10:33
    5. Discover the role Hadoop plays in MasterCard’s customer success story - Nick Curcuru (MasterCard) 00:13:00
    6. Fast, Scalable Analysis of Streaming Trade Data - Dale Kim (MapR) 00:08:41
    7. Enabling Operational Intelligence in Managing Hadoop with Attunity Visibility - Jordan Martz (Attunity) 00:09:00
    8. At the intersection of Big Data, AI, and Security: JASK and the Open Data Model - Grant Babb (JASK) 00:09:30
    9. Industrialize your transition to the modern data landscape with Capgemini’s Leap Data Transformation Framework - Anne-Laure Thieullent (Capgemini) 00:09:01
    10. Smart Data lakes - Marty Loughlin (Cambridge Semantics) 00:07:54
    11. Scale and Automation of Big Data Platforms - Paula Patel (Cisco) 00:09:34
    12. Behind the Scenes: Driving $125 Billion in Real Estate Transactions with BI - Ani Manian (SiSense) 00:12:27
    13. To the Cloud and Back: A look at Hybrid Analytics - Keith Manthey (Dell EMC) 00:06:24
    14. Connecting Customers with a Retail ValueMart - Steve Thompson (RCG Global Services) 00:10:02
    15. It Doesn't Need To Be This Hard - Rob Mustarde (Galactic Exchange) 00:08:18
    16. Driving margin by effective access to large scale event-based data - Ami Gal (Sqream Technologies) 00:10:02
    17. Creating a Consumer Profile for the Business - Scott Nichols (Novetta) 00:08:12
    18. CARTO Builder: The Location Intelligence tool for rapid spatial analysis and decision-making - Andrew Thompson (CARTO) 00:10:10
    19. Shortening the Spark / Hadoop On Ramp: How a Multinational Telecommunications Company accelerated Big Data success - Jesus Puente (RapidMiner) 00:10:27
    20. Teradata UDA in Action - Kiran Kamreddy (Teradata) 00:10:54
    21. Understanding Why Change is Actually Good for Your Business - Jeff Veis (HPE) 00:11:35
    22. A multi-platform approach to leveraging analytics on Hadoop - Mike Upchurch (Fuzzy Logix) and Munir Bondre (Fuzzy Logix) 00:10:01
    23. Performance Benchmark for BI-on-Hadoop: SQL Engine Wars Continue and Everybody Wins! - Josh Klahr (AtScale) 00:12:01
    24. Enterprise Best Practices for Data Lake Management - Murthy Mathiprakasam (Informatica) 00:09:08
    25. Maximize Big Data Application Performance and ROI - Kunal Agarwal (Unravel) 00:07:29
    26. Real-Time Analytics - Gary Orenstein (MemSQL) 00:10:01
    27. USA Cycling uses IBM Analytics to instantly deliver eye-opening performance metrics for real-time feedback to athletes - Uday Tekumalia (IBM) 00:10:20
    28. Efficient Data Ingest at Scale with Attunity CDC, Kafka and More - Itamar Ankorion (Attunity) 00:09:29
    29. Insiders: Your Biggest Threat to Security? - Donna DeCapite (SAS) 00:11:29
    30. Healthcare Solutions with OpenSource - Brandon Draeger (Intel) 00:09:39
    31. A Real-time Digital Marketing Platform: With Red Hat JBoss Data Grid and Red Hat JBoss BRMS - Divya Mehra (Red Hat) 00:08:45
    32. Conversations With Your Unstructured Big Data Universe - Ashoke Dutt (Semantify) 00:10:41
    33. Simplifying Streaming Analytics with GPU-Acceleration - Mike Perez (Kinetica) 00:09:28
    34. Geo-Analytics with Apache Spark and In-Memory Data Grids - Ali Hodroj (Gigaspaces) 00:10:16
    35. Learn how a major credit card provider successfully connected thousands of Tableau users to large scale Hadoop based data - Roger Gaskell (Kognitio) 00:08:14
    36. Hydrograph…Open Source ETL tool for Hadoop - Shahab Kamal and Ankur Gupta (Bitwise) 00:10:11
    37. Transcending Space and Time: Streaming Analytics at the Intersection of Enterprise, Cloud and IoT - Steve Wilkes (Striim) 00:10:01
    38. How a Data Catalog Drives Greater Business Value - Randy Duran (Waterline Data) 00:10:10
    39. Technical Strategies for Creating a Consumer Profile - Scott Nichols (Novetta) 00:09:15
    40. Data-Centric Security that Spans Hadoop, Spark, and NoSQL - Eric Tilenius (Bluetalon) 00:08:48
    41. Analyzing 25 billion stock market events in under an hour with Google Cloud Platform - Misha Brukman (Google) 00:09:44