Strata + Hadoop World Singapore 2015: Video Compilation

Video Description

Smart nations run on data

Singapore has set its sights on becoming the world’s first true Smart Nation, "where people live meaningful and fulfilled lives, enabled seamlessly by technology, offering exciting opportunities for all." Whether you want to transform the future of a nation, build a company that will prosper well into the future, or simply do your job better, this complete video compilation provides the practical data case studies, proven best practices, new analytic approaches, and exceptional technical skills you need.

This compilation features the hottest topics in data: Data Science and Advanced Analytics, Data-driven Business, Design & UX, Hadoop, IoT & Real-time, and Security and Governance.

Highlights include:

  • Monitoring traffic using telco data, presented by Thomas Holleczek from Singtel
  • Building and deploying real time big data prediction models, featuring Deepak Agrawal from 24[7]
  • How Uber is using data science to make better financial decisions, presented by Prakhar Mehrotra from Uber

And that’s just a sample. With this compilation, you’ll have every keynote, tutorial, and workshop from big data’s most influential business decision makers, strategists, architects, developers, and analysts right at your fingertips.

Table of Contents

  1. Keynotes
    1. How To Stop Worrying and Learn to Love Qualitative Data - Farrah Bostic (The Difference Engine) 00:15:23
    2. Challenges for the Data Ecosystem - Doug Cutting (Cloudera) 00:10:21
    3. When AI joins the team: Onboarding the next generation of employees - Jana Eggers 00:10:08
    4. Data ‘daddying’ vs. data empowerment - Tara Hirebet (R/GA) 00:18:01
    5. Taxi Uncle, where are you?: Using machine learning to predict taxi availability - Kevin Lee (GrabTaxi) 00:11:25
    6. Toward Big Data driven network, sponsored by Huawei - Sanqi Li (Huawei) 00:12:18
    7. Drive value faster: New optimizations for Big Data and analytics - Ziya Ma (Intel Corp) 00:05:41
    8. Music science: Applying streaming data to map a billion behaviors - Rishi Malhotra (Saavn) 00:11:45
    9. The Next Generation of Analytics - Mike Olson (Cloudera) 00:17:49
    10. Road to real-time digital business - Rod Smith (IBM Emerging Internet Technologies) 00:10:05
    11. Deep Learning - Melanie Warrick (Skymind) 00:11:04
    12. State of Spark, and where it is going - Reynold Xin (Databricks) 00:10:53
  2. Data Science and Advanced Analytics
    1. Data science for Telecom - Part 1 - Juliet Hougland (Cloudera), Sandy Ryza (Cloudera) 00:37:19
    2. Data science for Telecom - Part 2 - Juliet Hougland (Cloudera), Sandy Ryza (Cloudera) 00:52:07
    3. Data science for Telecom - Part 3 - Juliet Hougland (Cloudera), Sandy Ryza (Cloudera) 00:56:10
    4. Machine learning In Python with scikit-learn - Part 1 - Andreas Mueller (NYU, scikit-learn) 00:28:27
    5. Machine learning In Python with scikit-learn - Part 2 - Andreas Mueller (NYU, scikit-learn) 00:30:10
    6. Machine learning In Python with scikit-learn - Part 3 - Andreas Mueller (NYU, scikit-learn) 00:21:09
    7. Machine learning In Python with scikit-learn - Part 4 - Andreas Mueller (NYU, scikit-learn) 00:26:47
    8. Interactive data visualization with Lightning: Using d3, Seaborn, and R - Part 1 - Matthew Conlen (FiveThirtyEight) 00:46:53
    9. Interactive data visualization with Lightning: Using d3, Seaborn, and R - Part 2 - Matthew Conlen (FiveThirtyEight) 00:56:04
    10. Interactive data visualization with Lightning: Using d3, Seaborn, and R - Part 3 - Matthew Conlen (FiveThirtyEight) 00:55:22
    11. Deploying models with Azure Machine Learning - Part 1 - Danielle Dean (Microsoft), Wee Hyong Tok (Microsoft) 01:01:44
    12. Deploying models with Azure Machine Learning - Part 2 - Danielle Dean (Microsoft), Wee Hyong Tok (Microsoft) 00:43:19
    13. Deploying models with Azure Machine Learning - Part 3 - Danielle Dean (Microsoft), Wee Hyong Tok (Microsoft) 00:24:11
    14. Building South East Asia's largest E-commerce Recommender - Kai Xin Thia (Lazada) 00:39:58
    15. Apache SINGA: A flexible and scalable deep learning platform for big data analytics - Ju Fan (National University of Singapore), Wei Wang (National University of Singapore) 00:32:57
    16. Data modeling for data science: Simplify your workload with complex types - Marcel Kornacker (Cloudera, Inc.), Skye Wanderman-Milne (Cloudera) 00:41:58
    17. Technology solutions for data analytics with privacy and data control - Stephen Hardy (National ICT Australia) 00:39:23
    18. Using EEG and machine learning for lie detection - Jennifer Marsman (Microsoft) 00:46:19
    19. Building and deploying real time big data prediction models - Deepak Agrawal (24[7] Inc.) 00:47:43
    20. Scaling the Python data experience - Wes McKinney (Cloudera) 00:39:22
    21. The revolution of location: Geospatial applications in marketing research - Whye Loon Tung (Nielsen) 00:41:59
    22. Petascale genomics - Uri Laserson (Cloudera) 00:41:18
    23. How to run Neural Nets on GPUs - Melanie Warrick (Skymind) 00:36:08
    24. Enterprise Deep Learning Workflows with DL4J - Josh Patterson (Patterson Consulting) 00:37:06
  3. Data-driven Business
    1. Developing a modern enterprise data strategy - Part 1 - Edd Dumbill (Silicon Valley Data Science), John Akred (Silicon Valley Data Science) 00:35:01
    2. Developing a modern enterprise data strategy - Part 2 - Edd Dumbill (Silicon Valley Data Science), John Akred (Silicon Valley Data Science) 00:57:52
    3. Developing a modern enterprise data strategy - Part 3 - Edd Dumbill (Silicon Valley Data Science), John Akred (Silicon Valley Data Science) 00:46:48
    4. Developing a modern enterprise data strategy - Part 4 - Edd Dumbill (Silicon Valley Data Science), John Akred (Silicon Valley Data Science) 00:42:22
    5. Democratizing big data: Riding the curve from descriptive to prescriptive intelligence - Tushar Shanbhag (Adatao, Inc) 00:37:18
    6. Why you need a data strategy - Edd Dumbill (Silicon Valley Data Science) 00:42:17
    7. Building a self-serve real-time reporting platform at LinkedIn - Shirshanka Das (LinkedIn) 00:45:48
    8. Data-savvy leaders of the future: Designing an applied analytics course for MBAs - Hallie Benjamin (Accenture) 00:50:28
    9. The 3 key barriers keeping companies from acting upon the possibilities that big data has to offer - Pauline Brown (Dataiku) 00:31:53
    10. Don't believe everything you see on CSI: Beyond predictive policing - Hong Eng Koh (Oracle), Vladimir Videnovic (Oracle) 00:42:19
    11. How to tell compelling data stories: Why stories are still important in a data-driven world - Selene Chew (Adatao) 00:39:40
    12. Leveraging data analytics for high performance design - Rakesh Menon (McLaren Applied Technology) 00:36:40
  4. Hadoop Platform
    1. Hadoop Application Architectures: Fraud Detection - Part 1 - Gwen Shapira (Confluent), Ted Malaska (Cloudera), Mark Grover (Cloudera), Jonathan Seidman (Cloudera) 00:52:31
    2. Hadoop Application Architectures: Fraud Detection - Part 2 - Gwen Shapira (Confluent), Ted Malaska (Cloudera), Mark Grover (Cloudera), Jonathan Seidman (Cloudera) 00:52:04
    3. Hadoop Application Architectures: Fraud Detection - Part 3 - Gwen Shapira (Confluent), Ted Malaska (Cloudera), Mark Grover (Cloudera), Jonathan Seidman (Cloudera) 01:00:35
    4. Hadoop's storage gap: Resolving transactional access/analytic performance trade-offs with Kudu - Todd Lipcon (Cloudera) 00:52:10
    5. Designing an SQL-on-Hadoop cluster using Impala simulator: A use case for the banking and financial services sector - Jun Liu (Intel), Zhaojuan Bian (Intel) 00:33:36
    6. Hadoop in the cloud: An architectural how-to - Jairam Ranganathan (Cloudera) 00:40:08
    7. From Oracle to Hadoop: Unlocking Hadoop for your RDBMS with Apache Sqoop and other tools - Guy Harrison (Dell Software) 00:40:45
    8. Evolving from RDBMS to NoSQL + SQL - Jim Scott (MapR Technologies, Inc.) 00:38:07
  5. IoT and Real-time
    1. An Introduction to time series with Team Apache - Part 1 - Patrick McFadin (DataStax) 00:37:27
    2. An Introduction to time series with Team Apache - Part 2 - Patrick McFadin (DataStax) 00:36:29
    3. An Introduction to time series with Team Apache - Part 3 - Patrick McFadin (DataStax) 00:43:12
    4. An Introduction to time series with Team Apache - Part 4 - Patrick McFadin (DataStax) 00:54:54
    5. Invigorating the Telco landscape: How telcos can use data assets to create new applications - Amy Shi-Nash (Singtel) 00:46:45
    6. Monitoring traffic in Singapore using telco data - Thomas Holleczek (Singtel) 00:34:23
    7. GearPump: Real time DAG processing at scale - Sean Zhong (Intel) 00:36:36
    8. Application of Spark on analyzing massive GIS data for a large number of mobile objects - Masaru Dobashi (NTT Data Corp.), Yoshitaka Suzuki (IHI Corporation) 00:40:15
    9. Modeling machine failure in the IoT era - Danielle Dean (Microsoft) 00:36:26
    10. Using machine learning to identify fraud on Telecom networks - Arshak Navruzyan (Argyle Data) 00:42:22
    11. Sketching big data with Spark: Randomized algorithms for large-scale data analytics - Reynold Xin (Databricks) 00:37:40
    12. Next-generation platforms for IoT-driven contextual awareness - Markus Kirchberg (OA Labs Pte. Ltd.) 00:37:11
    13. Modeling the smart and connected city of the future with Kafka and Spark - Eric Frenkiel (MemSQL) 00:41:07
    14. Druid: Power Applications to Analyze Sensor Data - Fangjin Yang (Imply) 00:40:40
    15. How to improve mobile radio network planning based on a new big data structure analysis - Vianney Martinez Alcantara (Datameer) 00:43:02
    16. How are your morals? Ethics in algorithms and IoT - Majken Sander (BusinessAnalyst.dk), Joerg Blumtritt (Datarella™) 00:45:45
  6. Production-ready Hadoop
    1. Apache Hadoop operations for production systems - Part 1 - Kathleen Ting (Cloudera), Jonathan Hsieh (Cloudera, Inc), Philip Langdale (Cloudera, Inc.), Kostas Sakellis (Cloudera) 00:45:29
    2. Apache Hadoop operations for production systems - Part 2 - Kathleen Ting (Cloudera), Jonathan Hsieh (Cloudera, Inc), Philip Langdale (Cloudera, Inc.), Kostas Sakellis (Cloudera) 00:45:03
    3. Apache Hadoop operations for production systems - Part 3 - Kathleen Ting (Cloudera), Jonathan Hsieh (Cloudera, Inc), Philip Langdale (Cloudera, Inc.), Kostas Sakellis (Cloudera) 00:44:03
    4. Apache Hadoop operations for production systems - Part 4 - Kathleen Ting (Cloudera), Jonathan Hsieh (Cloudera, Inc), Philip Langdale (Cloudera, Inc.), Kostas Sakellis (Cloudera) 00:57:55
  7. Hadoop & Beyond
    1. Reliable data propagation between SQL and NoSQL databases using Aesop - Regunath Balasubramanian (Flipkart Internet) 00:37:05
    2. When it absolutely, positively, has to be there: Reliability guarantees in Kafka - Gwen Shapira (Confluent) 00:37:49
    3. Architectural patterns for streaming applications - Ted Malaska (Cloudera), Mark Grover (Cloudera) 00:39:17
    4. GDELT + BigQuery: Understanding global society through SQL - Felipe Hoffa (Google), Kalev Leetaru (GDELT Project (http://gdeltproject.org/)) 00:40:25
    5. Breakthrough OLAP performance on Cassandra and Spark - Evan Chan (Tuplejump) 00:39:48
    6. Customer record deduplication using Spark and Reifier - Dave Chan (UBM Asia), Sonal Goyal (Nube) 00:41:40
    7. Estimating financial risk with Spark - Sandy Ryza (Cloudera) 00:36:07
  8. Design, User Experience, Visualization
    1. Visualising multi-dimensional data - Amit Kapoor (Narrativeviz Consulting) 00:39:41
  9. Sponsored
    1. Patterns from the future - Deepak Ramanathan (SAS Asia Pacific) 00:06:11
    2. The journey to value using advanced analytics - Thomas Beaujard (Accenture Digital), Tom Ridsdill-Smith (Woodside) 00:42:40
    3. Analytics in action – The analytics lifecycle from data discovery to deployment - Deepak Ramanathan (SAS Asia Pacific) 00:37:51
    4. Road to real-time digital business - Rod Smith (IBM Emerging Internet Technologies) 00:37:19
    5. Hadoop data replication: Guaranteeing consistency across distributions, versions and datacenters while active - Paul Scott-Murphy (WANdisco) 00:45:22
    6. Demystifying analytics: Idea to insight in 7 minutes - Christopher Harrold (EMC) 00:40:26
    7. On the edge of everything – from edge security to edge analytics: Emerging technologies define the way progressive organizations will interact with data - Joanna Schloss (Dell Software) 00:35:02
    8. Avoiding big data becoming a big problem - Raghunath Nambiar (Cisco Systems) 00:43:26
    9. Scaling document data up (way up) while scaling complexity down - Ted Dunning (MapR) 00:39:32
    10. Big solution in manufacturing industry: Ask Hadoop. Hadoop answers. - SeongHwa Ahn (SK telecom), Jisung Kim (SK telecom) 00:28:48
    11. Faster time to insight using Spark, Tachyon, and Zeppelin - Nirmal Ranganathan (Rackspace Hosting) 00:39:37
    12. SAP HANA Vora: Delivering contextual awareness in the digital enterprise - Paul Marriott and Mark Teehan (SAP Asia Pacific Japan) 00:50:42
    13. Advanced analytics with large scale distributed machine learning on Apache Spark - Shengsheng Huang (Intel) 00:38:17
    14. Achieving business transformation with Open Enterprise Hadoop - Jeff Markham (Hortonworks) 00:37:39
    15. Hadoop everywhere: Geo-distributed storage for big data - Nikhil Joshi (EMC, Advanced Software Division), Priya Lakshminarayanan (EMC Corporation) 00:38:39
  10. Security & Governance
    1. 12 steps to cloud security - Vishnu Vettrivel (Atigeo) 00:41:57
    2. How to avoid building a "data swamp": Case studies in data management and governance - Mark Donsky (Cloudera), Naren Koneru (Cloudera) 00:42:55