
Learning Path: Data Science with R

Video Description

Learn R and get comfortable with data science

In Detail

Excited by the endless possibilities offered by the fields of data science and data analysis? Let R set you on your way!

Data scientists, statisticians, and analysts use R for statistical analysis, data visualization, and predictive modeling. R gives aspiring analysts and data scientists the ability to represent complex datasets clearly and compellingly.

Get comfortable with R and dive deep into data science with this Learning Path.

Prerequisites: No programming knowledge is required; the basics of R are covered too!

Resources: Code downloads and errata:

  • Introduction to R Programming

  • Getting Started with R for Data Science

  • Learning Data Mining with R

  • Learning R for Data Visualization

  • R for Data Science Solutions

Path Products

This path covers the following products, in sequential order:

  • Introduction to R Programming (3h 46m)

  • Getting Started with R for Data Science (1h 39m)

  • Learning Data Mining with R (2h 17m)

  • Learning R for Data Visualization (1h 59m)

  • R for Data Science Solutions (5h 32m)

Table of Contents

    1. Chapter 1: Introduction to R Programming
      1. The Course Overview 00:04:54
      2. Installing R 00:03:46
      3. Installing RStudio 00:04:36
      4. Installing Packages 00:04:50
      5. Data Types and Data Structures 00:03:05
      6. Vectors 00:05:44
      7. Random Numbers, Rounding, and Binning 00:04:00
      8. Missing Values 00:02:47
      9. The which() Operator 00:03:11
      10. Lists 00:04:35
      11. Set Operations 00:02:09
      12. Sampling and Sorting 00:02:52
      13. Check Conditions 00:02:17
      14. For Loops 00:02:34
      15. Dataframes 00:08:30
      16. Importing and Exporting Data 00:06:30
      17. Matrices and Frequency Tables 00:03:41
      18. Merging Dataframes 00:02:26
      19. Aggregation 00:02:48
      20. Melting and Cross Tabulations with dcast() 00:03:58
      21. Dates 00:05:35
      22. String Manipulation 00:05:14
      23. Functions 00:05:34
      24. Debugging and Error Handling 00:04:30
      25. Fast Loops with apply() 00:04:27
      26. Fast Loops with sapply(), lapply() and vapply() 00:02:00
      27. Creating and Customizing an R Plot 00:07:03
      28. Drawing Plots with 2 Y Axes 00:02:23
      29. Multiplots and Custom Layouts 00:03:08
      30. Creating Basic Graph Types 00:04:47
      31. Univariate Analysis 00:06:16
      32. Normal Distribution, Central Limit Theorem, and Confidence Intervals 00:05:32
      33. Correlation and Covariance 00:03:03
      34. Chi-sq Statistic 00:04:42
      35. ANOVA 00:04:54
      36. Statistical Tests 00:05:14
      37. Project 1 – Data Munging and Summarizing 00:11:31
      38. Project 2 – Visualization with Base Graphics 00:05:42
      39. Project 3 – Statistical Inference 00:03:50
      40. Pipes with Magrittr 00:05:21
      41. The 7 Data Manipulation Verbs 00:05:19
      42. Aggregation and Special Functions 00:03:36
      43. Two Table Verbs 00:02:43
      44. Working With Databases 00:05:30
      45. Understanding Basics, Filter, and Select 00:07:34
      46. Understanding Syntax, Creating and Updating Columns 00:04:06
      47. Aggregating Data, .N, and .I 00:04:21
      48. data.table 00:04:17
      49. Fast Loops with set(), Keys, and Joins 00:09:13
    2. Chapter 2: Getting Started with R for Data Science
      1. The Course Overview 00:04:15
      2. What is R? 00:02:34
      3. The Structure of the Language 00:03:52
      4. Data Structures within R 00:05:57
      5. Writing a Simple Program in R 00:04:33
      6. The Structure of a DataFrame 00:05:35
      7. Creating a DataFrame from a CSV File 00:02:41
      8. Creating a DataFrame from a Zip File 00:03:03
      9. Creating a DataFrame from a Database 00:06:55
      10. The Tools Available for Cleaning Data 00:06:50
      11. Dealing with Null Values 00:04:03
      12. Standardizing Date Formats 00:03:13
      13. Blending Multiple DataFrames 00:04:22
      14. What Is a Codebook and Why Create One? 00:03:50
      15. Creating the Codebook Using Standard R API Functionality 00:02:28
      16. Manually Creating a Custom Codebook 00:03:32
      17. Introduction to Data Mining and Analysis 00:03:49
      18. The Tools and Techniques for Creating the Story 00:03:32
      19. Regression Analysis with R 00:02:24
      20. Clustering Data with R 00:03:20
      21. Classifying Data with R 00:04:01
      22. Data Visualization Tools 00:03:09
      23. Creating Static Visualization Plots 00:03:47
      24. Creating Interactive Plots 00:02:01
      25. Publishing the Graphics 00:02:06
      26. What's Next? 00:03:12
    3. Chapter 3: Learning Data Mining with R
      1. The Course Overview 00:03:31
      2. Getting Started with R 00:05:06
      3. Data Preparation and Data Cleansing 00:04:10
      4. The Basic Concepts of R 00:05:46
      5. Data Frames and Data Manipulation 00:05:29
      6. Data Points and Distances in a Multidimensional Vector Space 00:03:59
      7. An Algorithmic Approach to Find Hidden Patterns in Data 00:06:24
      8. A Real-world Life Science Example 00:04:29
      9. Example – Using a Single Line of Code in R 00:04:00
      10. R Data Types 00:05:44
      11. R Functions and Indexing 00:04:15
      12. S3 Versus S4 – Object-oriented Programming in R 00:04:45
      13. Market Basket Analysis 00:03:01
      14. Introduction to Graphs 00:02:09
      15. Different Association Types 00:05:27
      16. The Apriori Algorithm 00:06:38
      17. The Eclat Algorithm 00:03:54
      18. The FP-Growth Algorithm 00:03:48
      19. Mathematical Foundations 00:06:01
      20. The Naive Bayes Classifier 00:03:50
      21. Spam Classification with Naïve Bayes 00:03:33
      22. Support Vector Machines 00:04:29
      23. K-nearest Neighbors 00:03:21
      24. Hierarchical Clustering 00:05:45
      25. Distribution-based Clustering 00:06:55
      26. Density-based Clustering 00:03:12
      27. Using DBSCAN to Cluster Flowers Based on Spatial Properties 00:02:25
      28. Introduction to Neural Networks and Deep Learning 00:06:09
      29. Using the H2O Deep Learning Framework 00:02:28
      30. Real-time Cloud Based IoT Sensor Data Analysis 00:06:17
    4. Chapter 4: Learning R for Data Visualization
      1. The Course Overview 00:05:32
      2. Preview of R Plotting Functionalities 00:03:16
      3. Introducing the Dataset 00:03:21
      4. Loading Tables and CSV Files 00:04:41
      5. Loading Excel Files 00:03:33
      6. Exporting Data 00:04:19
      7. Creating Histograms 00:05:01
      8. The Importance of Box Plots 00:03:44
      9. Plotting Bar Charts 00:02:43
      10. Plotting Multiple Variables – Scatterplots 00:03:07
      11. Dealing with Time – Time-series Plots 00:02:38
      12. Handling Uncertainty 00:04:15
      13. Changing Theme 00:03:07
      14. Changing Colors 00:03:20
      15. Modifying Axis and Labels 00:02:40
      16. Adding Supplementary Elements 00:04:08
      17. Adding Text Inside and Outside of the Plot 00:05:02
      18. Multi-plots 00:03:59
      19. Exporting Plots as Images 00:03:24
      20. Adjusting the Page Size 00:02:33
      21. Getting Started with Interactive Plotting 00:02:44
      22. Creating Interactive Histograms and Box Plots 00:04:55
      23. Plotting Interactive Bar Charts 00:03:12
      24. Creating Interactive Scatterplots 00:02:58
      25. Developing Interactive Time-series Plots 00:03:47
      26. Getting Started with Shiny 00:04:09
      27. Creating a Simple Website 00:04:52
      28. File Input 00:03:09
      29. Conditional Panels – UI 00:03:45
      30. Conditional Panels – Servers 00:05:31
      31. Deploying the Site 00:05:38
    5. Chapter 5: R for Data Science Solutions
      1. R Functions and Arguments 00:06:25
      2. Understanding Environments 00:02:59
      3. Working with Lexical Scoping 00:02:49
      4. Understanding Closure 00:02:17
      5. Performing Lazy Evaluation 00:01:56
      6. Creating Infix Operators 00:02:51
      7. Using the Replacement Function 00:02:17
      8. Handling Errors in a Function 00:04:31
      9. The Debugging Function 00:04:05
      10. Downloading Open Data 00:02:15
      11. Reading and Writing CSV Files 00:01:13
      12. Scanning Text Files 00:02:21
      13. Working with Excel Files 00:01:56
      14. Reading Data from Databases 00:04:04
      15. Scraping Web Data 00:05:17
      16. Renaming the Data Variable 00:02:27
      17. Converting Data Types 00:04:03
      18. Working with Date Format 00:02:36
      19. Adding New Records 00:02:55
      20. Filtering Data 00:02:09
      21. Dropping Data 00:03:29
      22. Merging and Sorting Data 00:01:42
      23. Reshaping Data 00:04:00
      24. Detecting Missing Data 00:02:42
      25. Imputing Missing Data 00:03:15
      26. Enhancing a data.frame with a data.table 00:04:50
      27. Managing Data with data.table 00:01:40
      28. Performing Fast Aggregation with data.table 00:01:14
      29. Merging Large Datasets with a data.table 00:01:54
      30. Subsetting and Slicing Data with dplyr 00:02:11
      31. Sampling Data with dplyr 00:04:14
      32. Selecting Columns with dplyr 00:02:10
      33. Chaining Operations in dplyr 00:02:41
      34. Arranging Rows with dplyr 00:02:09
      35. Eliminating Duplicated Rows with dplyr 00:01:26
      36. Adding New Columns with dplyr 00:02:40
      37. Summarizing Data with dplyr 00:02:10
      38. Merging Data with dplyr 00:01:22
      39. Creating Basic Plots with ggplot2 00:04:15
      40. Changing Aesthetics Mapping 00:03:09
      41. Introducing Geometric Objects 00:03:13
      42. Performing Transformations 00:03:27
      43. Adjusting Scales 00:02:16
      44. Faceting 00:02:07
      45. Adjusting Themes 00:01:33
      46. Combining Plots 00:02:04
      47. Creating Maps 00:04:39
      48. Creating R Markdown Reports 00:02:47
      49. Learning the Markdown Syntax 00:03:14
      50. Embedding R Code Chunks 00:02:19
      51. Creating Interactive Graphics with ggvis 00:02:39
      52. Understanding Basic Syntax and Grammar 00:01:57
      53. Controlling Axes and Legends and Using Scales 00:02:55
      54. Adding Interactivity to a ggvis Plot 00:03:41
      55. Creating an R Shiny Document 00:02:16
      56. Publishing an R Shiny Report 00:02:29
      57. Generating Random Samples 00:02:52
      58. Understanding Uniform Distributions 00:01:39
      59. Generating Binomial Random Variates 00:02:30
      60. Generating Poisson Random Variates 00:02:06
      61. Sampling from a Normal Distribution 00:04:08
      62. Sampling from a Chi-Squared Distribution 00:02:00
      63. Understanding Student's t-Distribution 00:02:11
      64. Sampling from a Dataset 00:01:52
      65. Simulating the Stochastic Process 00:02:29
      66. Getting Confidence Intervals 00:05:54
      67. Performing Z-tests 00:03:12
      68. Performing Student's t-Tests 00:02:15
      69. Conducting Exact Binomial Tests 00:02:09
      70. Performing Kolmogorov-Smirnov Tests 00:02:17
      71. Working with Pearson's Chi-Squared Tests 00:01:40
      72. Understanding the Wilcoxon Rank Sum and Signed Rank Tests 00:01:48
      73. Conducting One-way ANOVA 00:02:39
      74. Performing Two-way ANOVA 00:03:02
      75. Transforming Data into Transactions 00:05:12
      76. Displaying Transactions and Associations 00:03:03
      77. Mining Associations with the Apriori Rule 00:04:19
      78. Pruning Redundant Rules 00:02:15
      79. Visualizing Association Rules 00:02:36
      80. Mining Frequent Itemsets with Eclat 00:03:08
      81. Creating Transactions with Temporal Information 00:02:53
      82. Mining Frequent Sequential Patterns with cSPADE 00:02:42
      83. Creating Time Series Data 00:05:12
      84. Plotting a Time Series Object 00:02:26
      85. Decomposing Time Series 00:02:11
      86. Smoothing Time Series 00:05:21
      87. Forecasting Time Series 00:02:31
      88. Selecting an ARIMA Model 00:03:19
      89. Creating an ARIMA Model 00:02:20
      90. Forecasting with an ARIMA Model 00:02:11
      91. Predicting Stock Prices with an ARIMA Model 00:04:24
      92. Fitting a Linear Regression Model with lm 00:05:35
      93. Summarizing Linear Model Fits 00:02:14
      94. Using Linear Regression to Predict Unknown Values 00:01:38
      95. Measuring the Performance of the Regression Model 00:03:46
      96. Performing a Multiple Regression Analysis 00:02:54
      97. Selecting the Best-Fitted Regression Model with Stepwise Regression 00:03:57
      98. Applying the Gaussian Model for Generalized Linear Regression 00:03:23
      99. Performing a Logistic Regression Analysis 00:04:17
      100. Building a Classification Model with Recursive Partitioning Trees 00:02:42
      101. Visualizing a Recursive Partitioning Tree 00:02:19
      102. Measuring Model Performance with a Confusion Matrix 00:04:31
      103. Measuring Prediction Performance Using ROCR 00:03:59
      104. Clustering Data with Hierarchical Clustering 00:06:10
      105. Cutting Trees into Clusters 00:01:51
      106. Clustering Data with the k-means Method 00:01:20
      107. Clustering Data with the Density-Based Method 00:02:54
      108. Extracting Silhouette Information from Clustering 00:01:45
      109. Comparing Clustering Methods 00:02:09
      110. Recognizing Digits Using the Density-Based Clustering Method 00:03:12
      111. Grouping Similar Text Documents with the k-means Clustering Method 00:01:50
      112. Performing Dimension Reduction with Principal Component Analysis (PCA) 00:02:12
      113. Determining the Number of Principal Components Using a Scree Plot 00:01:52
      114. Determining the Number of Principal Components Using the Kaiser Method 00:02:15
      115. Visualizing Multivariate Data Using a Biplot 00:02:51