Strata + Hadoop World 2016 - Singapore

Video Description

The 2016 Strata + Hadoop World conference in Singapore was an amazing rojak experience in the best sense of the term. It provided a wide-ranging collection of the globe’s most insightful information about big data and machine learning and how these technologies are reshaping the world’s businesses, institutions, and governments. This video compilation gives you a complete recording of each of the conference’s 73 sessions, 8 tutorials, and 10 keynotes covering topics like big data in retail, finance, and telecommunications; real-time IoT analytics; recommendation algorithms; high-efficiency AI and ML distributed systems; ubiquitous computing; collaboration; peer analytics; Apache Beam; Apache Flink; structured streaming; and much, much more. Get this video and you’ll enjoy the opportunity to learn from 122 of the world's best data engineers and data scientists working in Asia and elsewhere at data-centric companies such as Singtel, ShopBack, IHI, Lazada, Mediacorp, Qunar, Cloudera, MapR, Microsoft, Cisco, Teradata, BT Group, Google, IBM, Qubole, StarHub, SKY TV, Capgemini, and NTT Data. So Singapore, so rojak!

  • Gain total access to all 73 sessions, 8 tutorials, and 10 keynotes: almost 80 hours of material
  • John Akred (Silicon Valley Data Science) on how to develop a modern enterprise data strategy
  • Qiaoliang Xiang (ShopBack) on handling 25M e-commerce products with Hadoop-related tools
  • Dean Wampler (Lightbend) on the core features of Scala necessary to write Spark code
  • Rebecca Tien Yu Lin (is-land Systems) on big data solutions in the semiconductor industry
  • Haoyuan Li (Alluxio Co-Creator) on Alluxio use cases at Alibaba, Baidu, and elsewhere
  • Jennifer Marsman (IBM) on bots, chat, machine learning, and artificial intelligence
  • Sean Owen (Cloudera) on doing full Python development on the Hadoop stack at Hadoop scale
  • Vivian Peng (Médecins Sans Frontières) on designing human emotion into data visualizations
  • Get 24 sessions on becoming a data science company, data science technology, and data analytics
  • Get 12 sessions related to Apache Spark, including the highly popular 8-hour Spark Camp tutorial
  • Get 10 sessions on IoT and intelligent real-time applications; and 9 sessions on ML and AI
  • Get multiple sessions on Hadoop use cases; VR and visualization; and security and data law

Table of Contents

  1. Keynotes
    1. A smarter ecosystem through big data analytics - Wei Keong Ng (Fusionex) 00:12:38
    2. Big data, big value for smart banking at DBS - Mike Olson (Cloudera) and David Gledhill (DBS Bank) 00:15:15
    3. Disruption in insurance: Seven predictions - Zia Zaman (MetLife) 00:21:09
    4. Image intelligence: Making visual content predictive - Susan Etlinger (Altimeter Group) 00:15:47
    5. Information at the speed of thought - Prakash Nanduri (Paxata) 00:03:45
    6. Real-time intelligence gives Uber the edge - M. C. Srivas (Uber) 00:17:53
    7. Taking personalization personally - Sara Watson (Tow Center for Digital Journalism) 00:24:00
    8. The ACID revolution - Vijay Narayanan (Microsoft) 00:12:11
    9. The new dynamics of Big Data - Amr Awadallah (Cloudera, Inc.) 00:08:51
    10. What if data had personality? - Julie Rodriguez (Sapient Global Markets) 00:11:34
  2. Becoming a data-centric company
    1. Developing a modern enterprise data strategy - John Akred (Silicon Valley Data Science) and Scott Kurth (Silicon Valley Data Science) - Part 1 00:44:35
    2. Developing a modern enterprise data strategy - John Akred (Silicon Valley Data Science) and Scott Kurth (Silicon Valley Data Science) - Part 2 00:45:50
    3. Developing a modern enterprise data strategy - John Akred (Silicon Valley Data Science) and Scott Kurth (Silicon Valley Data Science) - Part 3 00:45:08
    4. Developing a modern enterprise data strategy - John Akred (Silicon Valley Data Science) and Scott Kurth (Silicon Valley Data Science) - Part 4 00:47:11
    5. Dealing with device data - Mark Madsen (Third Nature) 00:42:09
    6. Act on insight with the IoT - Devin Deen (Enterprise IT) and Dnyanesh Prabhu (SKY TV NZ) 00:39:59
    7. Evolving from RDBMS to NoSQL + SQL - Jim Scott (MapR Technologies, Inc.) 00:32:08
    8. First mover or fast follower? Santander UK's big data journey - Antonio Alvarez (Santander Group) 00:43:45
    9. Computable content: Notebooks, containers, and data-centric organizational learning - Paco Nathan (O'Reilly Media) 00:39:08
    10. What's your data worth? - John Akred (Silicon Valley Data Science) 00:44:13
    11. Case studies of business transformation through big data - John Kreisa (Hortonworks) 00:40:09
    12. The fallacy of the subject-matter expert - Chris Neumann (The Engineer & The Designer) 00:27:00
    13. How to use a marketing data lake for data-driven marketing - Franz Aman (Informatica) 00:44:38
  3. Chat, machine learning, & AI
    1. Experience in adopting deep learning into existing software development practices - Verdi March (Deep Labs) 00:36:50
    2. Context-aware recommendations using reinforcement learning in the item-similarity space - Arun Veettil (Starbucks) 00:40:23
    3. Applications of natural language understanding: Tools and technologies - Alyona Medelyan (Thematic) 00:31:05
    4. High-efficiency systems for distributed AI and machine learning at scale - Qirong Ho (Petuum, Inc.) 00:37:32
    5. Machine learning in practice with Spark MLlib: An intelligent data analyzer - Flavio Clesio (Movile) and Eiti Kimura (Movile) 00:40:07
    6. Deep reinforcement learning on Spark - Adam Gibson (Skymind) 00:44:28
    7. Cloud AI innovations - Graham Williams (Microsoft), Hong Ooi (Microsoft), and Ted Minkinow (Fullerton Health) 00:43:36
    8. Transfer learning and fine-tuning deep neural network models across different domains - Anusua Trivedi (Microsoft) 00:42:04
  4. Data science & advanced analytics
    1. Fast deep learning at your fingertips - Nir Lotan (Intel) 00:32:22
    2. How Lazada ranks products to improve customer experience and increase conversion - Eugene Yan (Lazada) 00:45:09
    3. Creating real-time, data-centric applications with Impala and Kudu - Marcel Kornacker (Cloudera) and Todd Lipcon (Cloudera) 00:52:19
    4. Web-scale machine learning on Apache Spark - Jason (Jinquan) Dai (Intel) 00:32:21
    5. Machine learning: The power of ensembles - Bargava Subramanian (Cisco Systems) and Amit Kapoor (narrativeVIZ Consulting) 00:41:14
    6. Understanding the voice of members via text mining: How LinkedIn built a text analytics engine at scale - Chi-Yi Kuan (LinkedIn), Weidong Zhang (LinkedIn), and Yongzheng Zhang (LinkedIn) 00:42:54
    7. A survey of time series analysis techniques for sensor data - Rajesh Sampathkumar (The Data Team) 00:41:46
    8. Deep learning at scale - Mateusz Dymczyk (H2O.ai) 00:41:32
    9. Concepts before machinery: Harnessing the power of domain expertise for machine-learning-based solutions - Ofer Ron (LivePerson) 00:35:29
    10. Deep learning for natural language processing - Bargava Subramanian (Cisco Systems) and Amit Kapoor (narrativeVIZ Consulting) 00:44:18
  5. Hadoop use cases
    1. Using big data technology to solve data connectivity in a disconnected world - Imron Zuhri (Dattabot) 00:38:09
    2. Encoding new data visualizations - Piotr Kaczmarek (Sapient Global Markets) and Julie Rodriguez (Sapient Global Markets) 00:39:13
    3. Hadoop as a service at BT: How to build a successful enterprise data hub - Phillip Radley (BT) 00:48:15
    4. Crawling and tracking millions of ecommerce products at scale - Qiaoliang Xiang (ShopBack) 00:45:25
    5. How to apply big data solutions in the semiconductor industry - Rebecca Tien Yu Lin (is-land Systems Inc.) 00:38:00
  6. IoT & intelligent real-time applications
    1. Learn stream processing with Apache Beam - Tyler Akidau (Google), Slava Chernyak (Google), Dan Halperin (Google), Sandeep Deshmukh (DataTorrent), and Aljoscha Krettek (Data Artisans) - Part 1 00:56:13
    2. Learn stream processing with Apache Beam - Tyler Akidau (Google), Slava Chernyak (Google), Dan Halperin (Google), Sandeep Deshmukh (DataTorrent), and Aljoscha Krettek (Data Artisans) - Part 2 00:57:13
    3. IoT and Spark MLlib applications for improving products, services, and manufacturing technologies - Masaru Dobashi (NTT DATA Corporation) and Yoshitaka Suzuki (IHI Corporation) 00:42:19
    4. Twitter's real-time stack: Processing billions of events with Heron and DistributedLog - Maosong Fu (Twitter) 00:40:11
    5. Architecting a hybrid cloud application using a global publish-subscribe streaming message system - Mathieu Dumoulin (MapR Technologies) 00:37:19
    6. Watermarks and triggers: Time and progress in Apache Beam (incubating) and beyond - Slava Chernyak (Google) 00:36:21
    7. Making sense of the sensors: Connecting the IoT and analytics - Frank Saeuberlich (Teradata) and Karthik Bharadwaj Thirumalai (Teradata) 00:39:27
    8. Industrial big data and sensor time series data: Different but not difficult—Part II - Gopal GopalKrishnan (OSIsoft, LLC.) and Chris Soyza (BEARS) 00:38:04
    9. A simplified enterprise architecture for real-time stream processing - Mathieu Dumoulin (MapR Technologies) 00:42:33
    10. Robust stream processing with Apache Flink - Aljoscha Krettek (data Artisans) 00:42:44
  7. Spark & beyond
    1. Just enough Scala for Spark - Dean Wampler (Lightbend) - Part 1 00:33:28
    2. Just enough Scala for Spark - Dean Wampler (Lightbend) - Part 2 00:52:24
    3. Just enough Scala for Spark - Dean Wampler (Lightbend) - Part 3 00:36:18
    4. Just enough Scala for Spark - Dean Wampler (Lightbend) - Part 4 00:37:51
    5. Building and tuning machine-learning apps using Spark ML and GraphX Libraries - Jayant Shekhar (Sparkflows Inc.) and Vartika Singh (Cloudera) - Part 1 00:50:29
    6. Building and tuning machine-learning apps using Spark ML and GraphX Libraries - Jayant Shekhar (Sparkflows Inc.) and Vartika Singh (Cloudera) - Part 2 00:43:05
    7. Building and tuning machine-learning apps using Spark ML and GraphX Libraries - Jayant Shekhar (Sparkflows Inc.) and Vartika Singh (Cloudera) - Part 3 00:49:51
    8. Building and tuning machine-learning apps using Spark ML and GraphX Libraries - Jayant Shekhar (Sparkflows Inc.) and Vartika Singh (Cloudera) - Part 4 00:37:26
    9. How Alluxio (formerly Tachyon) brings a 300x performance improvement to Qunar’s streaming processing - Xueyan Li (Qunar) 00:29:59
    10. The business case for Spark, Kafka, and friends - John Akred (Silicon Valley Data Science) 00:42:19
    11. Scala and the JVM as a big data platform: Lessons from Apache Spark - Dean Wampler (Lightbend) 00:44:07
    12. Alluxio (formerly Tachyon): An open source memory-speed virtual distributed storage system - Jiri Simsa (Alluxio) 00:40:43
    13. From telco data to spatial-temporal intelligence APIs: Architecting through microservices - Chandras Sekhar Saripaka (DataSpark) 00:45:32
    14. Spark Structured Streaming for machine learning - Holden Karau (IBM) and Seth Hendrickson (IBM) 00:39:49
    15. How Mediacorp has leveraged Apache Spark and Microsoft Cloud to analyze patterns of user behavior for actionable insights - Andrea Gagliardi La Gala (Microsoft) 00:45:40
    16. Apache Spark: Enterprise security for production deployments - Vinay Shukla (Hortonworks) 00:41:16
  8. Design, visualization, & VR
    1. Writing reusable visualization software with D3.js: Part I - Michael Freeman (University of Washington) 00:50:00
    2. Writing reusable visualization software with D3.js: Part II - Michael Freeman (University of Washington) 00:36:55
    3. Data distillation: Applying design principles to reporting, KPI, and dashboards - Patrick Nord (Archetype SC) 00:35:56
    4. The feels: How to design data visualizations that evoke an emotion from your users - Vivian Peng (Doctors Without Borders) 00:34:53
    5. Algorithmic art and data creativity - Joerg Blumtritt (Datarella) and Heather Dewey-Hagborg (School of the Art Institute of Chicago) 00:41:22
  9. Production-ready Hadoop
    1. Apache Beam: A unified model for batch and streaming data processing - Dan Halperin (Google) 00:42:24
    2. Support digital applications with a resilient, highly available, and NRT Hadoop backend - Nicolette Bullivant (Isban UK, Santander Group) and Jorge Pablo Fernandez (Isban UK, Santander Group) 00:38:29
    3. BI and SQL analytics with Hadoop in the cloud - Alex Gutow (Cloudera) and Henry Robinson (Cloudera) 00:42:40
    4. Organizing the data lake - Mark Madsen (Third Nature) 00:44:50
  10. Security & governance
    1. Securing big data on YARN, Hive, and Spark clusters - Nitin Khandelwal (Qubole) and Abhishek Modi (Qubole) 00:32:38
    2. Authorization in the cloud: Enforcing access control across compute engines - Hao Hao (Cloudera) and Alex Leblang (Cloudera) 00:36:03
    3. Next-generation data governance - Clara Fletcher (Accenture) 00:26:33
  11. Smart cities & urban automation
    1. Mobility as a vital sign of people and the economy - Shao Wei Ying (DataSpark) 00:40:54
    2. Modern telecom analytics with streaming data - Ted Dunning (MapR Technologies) 00:38:38
    3. From telco data to real-world data analytics products at SmartHub - Boon Siew Seah (SmartHub, StarHub Ltd.) 00:34:05
  12. Sponsored
    1. Evolution of big data analytics - KC Wong (Fusionex) 00:28:53
    2. A stream-first approach to drive real-time applications - Ted Dunning (MapR Technologies) 00:38:46
    3. Accelerating time to value at petascale with Cisco UCS - Raghunath Nambiar (Cisco) 00:38:11
    4. Stopping your data lake from becoming a swamp - Steve Jones (Capgemini) 00:42:22
  13. Data innovations
    1. Evolving beyond the data lake - Jim Scott (MapR Technologies, Inc.) 00:42:43
  14. Law, ethics, open data
    1. DataKind SG: Dispatches from the front line of data-driven social development - Raymond Chan (DataKind SG) 00:37:28
    2. Data ethics - Joerg Blumtritt (Datarella) and Heather Dewey-Hagborg (School of the Art Institute of Chicago) 00:48:07
    3. Data science and critical thinking - Alistair Croll (Solve For Interesting) 00:45:38
  15. Data Case Studies
    1. From application to platform: The strategic shift in approach toward data analytics - Sarang Anajwala (Autodesk) 00:26:37
    2. OpenStreetMap for urban resilience - Yantisa Akhadi (Humanitarian OpenStreetMap Team) 00:30:27
    3. Government open data: Tales from a deep dive into CKAN - Audrey Lobo-Pulo (Phoensight) 00:29:20
    4. Big data solutions for analyzing chip DNA in semiconductor manufacturing - Jingwen Ouyang (SanDisk, Western Digital Brand) and Amit Rustagi (SanDisk, Western Digital Brand) 00:36:02
    5. Fast cars, big data: The Internet of Formula 1 Things - Asit Parija (MapR Technologies) 00:30:26