Data Just Right LiveLessons (Video Training)

Video Description

Data Just Right LiveLessons provides a practical introduction to solving common data challenges, such as managing massive datasets, visualizing data, building data pipelines and dashboards, and choosing tools for statistical analysis. You will learn how to use many of today's leading data analysis tools, including Hadoop, Hive, Shark, R, Apache Pig, Mahout, and Google BigQuery.

Table of Contents

  1. Introduction to Data Just Right LiveLessons 00:09:15
  2. Learning objectives 00:00:33
  3. 1.1 Why is big data such a hot concept now? 00:02:52
  4. 1.2 Four strategies for tackling big data problems 00:04:13
  5. 1.3 Anatomy of a data pipeline 00:03:58
  6. 1.4 What the ideal database would look like 00:02:23
  7. Learning objectives 00:00:35
  8. 2.1 Challenges of hosting and sharing large amounts of data 00:02:48
  9. 2.2 Choosing the right data format 00:07:33
  10. 2.3 Best practices for physically storing and sharing large amounts of data 00:04:43
  11. 2.4 Understanding data serialization formats 00:03:56
  12. Learning objectives 00:00:50
  13. 3.1 History and use of relational databases 00:03:59
  14. 3.2 Databases and the Internet: Understanding the CAP theorem 00:05:07
  15. 3.3 Non-relational databases: Document and key-value stores 00:04:20
  16. 3.4 Introduction to Redis 00:06:31
  17. 3.5 Sharding Redis across a cluster of machines 00:07:16
  18. 3.6 Future trends in database technology 00:04:03
  19. Learning objectives 00:00:35
  20. 4.1 History and meaning of business intelligence 00:06:27
  21. 4.2 Data warehousing and Hadoop 00:03:00
  22. 4.3 Data silos can be good 00:03:59
  23. 4.4 Convergence and the future of the business intelligence concept 00:02:25
  24. Learning objectives 00:00:42
  25. 5.1 Introduction to Apache Hive 00:02:53
  26. 5.2 Loading data into Hive 00:07:07
  27. 5.3 Querying data with Hive 00:06:03
  28. 5.4 Introduction to AMPLab's Shark 00:02:18
  29. 5.5 Data warehousing in the cloud 00:02:01
  30. Learning objectives 00:00:36
  31. 6.1 Introduction to analytical databases 00:02:47
  32. 6.2 Google's Dremel and BigQuery 00:02:15
  33. 6.3 Running a BigQuery query and retrieving the result 00:06:35
  34. 6.4 Visualizing BigQuery query results 00:06:12
  35. 6.5 The future of analytical query engines 00:02:27
  36. Learning objectives 00:00:47
  37. 7.1 History and goals of data visualization 00:02:59
  38. 7.2 Strategies for dealing with visualization of very large datasets 00:02:19
  39. 7.3 Building interactive visualizations with R and ggplot() 00:08:09
  40. 7.4 Building 2D plots with Python and matplotlib 00:05:49
  41. 7.5 Building interactive visualizations for the Web with D3.js 00:07:26
  42. Learning objectives 00:00:40
  43. 8.1 Writing a simple data pipeline script 00:04:53
  44. 8.2 Introduction to the Hadoop MapReduce framework 00:03:36
  45. 8.3 Writing a Hadoop streaming MapReduce job in Python 00:06:05
  46. 8.4 Writing a multistep MapReduce job using the mrjob Python library 00:06:30
  47. 8.5 Running mrjob scripts on Amazon Elastic MapReduce 00:04:26
  48. Learning objectives 00:00:41
  49. 9.1 Challenges of building complex data workflows 00:02:00
  50. 9.2 Writing a MapReduce workflow script with Apache Pig 00:05:52
  51. 9.3 Creating a MapReduce workflow application with Cascading 00:04:23
  52. 9.4 When to use Pig versus Cascading 00:02:19
  53. Learning objectives 00:00:40
  54. 10.1 Use cases and limitations of machine learning 00:02:49
  55. 10.2 Bayesian classification, clustering, and recommendation engines 00:05:17
  56. 10.3 Using Apache Mahout for Bayesian classification 00:07:35
  57. 10.4 Introduction to MLbase 00:02:35
  58. Learning objectives 00:00:53
  59. 11.1 Understanding memory usage with R 00:05:49
  60. 11.2 Working with large matrices using bigmemory and biganalytics 00:05:17
  61. 11.3 Manipulating large data frames with ff 00:04:47
  62. 11.4 Running a linear regression over large datasets using biglm 00:05:23
  63. 11.5 Interfacing with Hadoop using R and RHadoop 00:03:25
  64. Learning objectives 00:00:42
  65. 12.1 Choosing a programming language for analytics 00:02:35
  66. 12.2 Working with NumPy and SciPy 00:06:01
  67. 12.3 Using the Pandas library for analysing time series data 00:09:08
  68. 12.4 Using the IPython notebook 00:06:08
  69. Learning objectives 00:00:48
  70. 13.1 Understanding your data problem 00:03:40
  71. 13.2 A playbook for the build versus buy problem 00:03:14
  72. 13.3 Investing in a data center: Public versus private 00:03:30
  73. 13.4 Understanding the costs of open-source software 00:04:05
  74. 13.5 Using analytics as a service technologies 00:03:55
  75. Learning objectives 00:00:40
  76. 14.1 Trends driving innovation in data analytics technology 00:03:04
  77. 14.2 Hadoop: The disruptor and the disrupted 00:03:11
  78. 14.3 Analytics move toward the cloud 00:03:15
  79. 14.4 The evolving definition of “data scientist” 00:04:18
  80. 14.5 Converging technologies 00:03:01
  81. Summary of Data Just Right LiveLessons 00:00:55