R for Data Science

Book Description

Learn and explore the fundamentals of data science with R

In Detail

R is a powerful, open source, functional programming language. It can be used for a wide range of programming tasks and is best suited to producing data and visual analytics through customizable scripts and commands.

This book explores the core topics that data scientists are interested in. It draws on a wide variety of data sources and evaluates the data using existing, publicly available R functions and packages. In many cases, the results can be displayed in a graphical form that is more intuitively understood. You will also learn the analysis techniques most frequently used in industry.

By the end of the book, you will know how to go about adopting a range of data science techniques with R.

What You Will Learn

  • Develop, execute, and modify R scripts
  • Find, install, and use third-party R packages
  • Organize your data to get the best results
  • Produce graphical displays of your results, including 3D visualizations
  • Perform commonly used statistical analyses
  • Understand the trade-offs between different approaches to problems
  • Be comfortable with trying features to fine-tune your results
  • Adopt and learn data science with R in a practical tutorial format
  • Explore concepts such as data mining, data analysis, data visualization, and machine learning using R

Downloading the example code for this book

You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

Table of Contents

    1. R for Data Science
      1. Table of Contents
      2. R for Data Science
      3. Credits
      4. About the Author
      5. About the Reviewers
      6. www.PacktPub.com
        1. Support files, eBooks, discount offers, and more
          1. Why subscribe?
          2. Free access for Packt account holders
      7. Preface
        1. What this book covers
        2. What you need for this book
        3. Who this book is for
        4. Conventions
        5. Reader feedback
        6. Customer support
          1. Downloading the example code
          2. Downloading the color images of this book
          3. Errata
          4. Piracy
          5. Questions
      8. 1. Data Mining Patterns
        1. Cluster analysis
          1. K-means clustering
            1. Usage
            2. Example
          2. K-medoids clustering
            1. Usage
            2. Example
          3. Hierarchical clustering
            1. Usage
            2. Example
          4. Expectation-maximization
            1. Usage
            2. List of model names
            3. Example
          5. Density estimation
            1. Usage
            2. Example
        2. Anomaly detection
          1. Show outliers
            1. Example
            2. Example
            3. Another anomaly detection example
          2. Calculating anomalies
            1. Usage
            2. Example 1
            3. Example 2
        3. Association rules
          1. Mine for associations
            1. Usage
            2. Example
        4. Questions
        5. Summary
      9. 2. Data Mining Sequences
        1. Patterns
          1. Eclat
            1. Usage
            2. Using eclat to find similarities in adult behavior
            3. Finding frequent items in a dataset
            4. An example focusing on highest frequency
          2. arulesNBMiner
            1. Usage
            2. Mining the Agrawal data for frequent sets
          3. Apriori
            1. Usage
            2. Evaluating associations in a shopping basket
          4. Determining sequences using TraMineR
            1. Usage
            2. Determining sequences in training and careers
          5. Similarities in the sequence
            1. Sequence metrics
            2. Usage
            3. Example
        2. Questions
        3. Summary
      10. 3. Text Mining
        1. Packages
          1. Text processing
            1. Example
            2. Creating a corpus
              1. Converting text to lowercase
              2. Removing punctuation
              3. Removing numbers
              4. Removing words
              5. Removing whitespaces
              6. Word stems
              7. Document term matrix
              8. Using VectorSource
          2. Text clusters
            1. Word graphics
            2. Analyzing the XML text
        2. Questions
        3. Summary
      11. 4. Data Analysis – Regression Analysis
        1. Packages
          1. Simple regression
          2. Multiple regression
          3. Multivariate regression analysis
          4. Robust regression
        2. Questions
        3. Summary
      12. 5. Data Analysis – Correlation
        1. Packages
          1. Correlation
            1. Example
          2. Visualizing correlations
          3. Covariance
          4. Pearson correlation
          5. Polychoric correlation
          6. Tetrachoric correlation
          7. A heterogeneous correlation matrix
          8. Partial correlation
        2. Questions
        3. Summary
      13. 6. Data Analysis – Clustering
        1. Packages
        2. K-means clustering
          1. Example
            1. Optimal number of clusters
          2. Medoids clusters
          3. The cascadeKM function
          4. Selecting clusters based on Bayesian information
          5. Affinity propagation clustering
          6. Gap statistic to estimate the number of clusters
          7. Hierarchical clustering
        3. Questions
        4. Summary
      14. 7. Data Visualization – R Graphics
        1. Packages
          1. Interactive graphics
          2. The latticist package
            1. Bivariate binning display
            2. Mapping
            3. Plotting points on a map
            4. Plotting points on a world map
            5. Google Maps
          3. The ggplot2 package
        2. Questions
        3. Summary
      15. 8. Data Visualization – Plotting
        1. Packages
        2. Scatter plots
          1. Regression line
          2. A lowess line
          3. scatterplot
          4. Scatterplot matrices
            1. splom – display matrix data
            2. cpairs – plot matrix data
          5. Density scatter plots
        3. Bar charts and plots
          1. Bar plot
            1. Usage
          2. Bar chart
          3. ggplot2
          4. Word cloud
        4. Questions
        5. Summary
      16. 9. Data Visualization – 3D
        1. Packages
        2. Generating 3D graphics
          1. Lattice Cloud – 3D scatterplot
          2. scatterplot3d
          3. scatter3d
          4. cloud3d
          5. RgoogleMaps
          6. vrmlgenbar3D
          7. Big Data
            1. pbdR
              1. Common global values
              2. Distribute data across nodes
              3. Distribute a matrix across nodes
            2. bigmemory
              1. pbdMPI
              2. snow
              3. More Big Data
          8. Research areas
            1. Rcpp
            2. parallel
            3. microbenchmark
            4. pqR
            5. SAP integration
            6. roxygen2
            7. bioconductor
            8. swirl
            9. pipes
        3. Questions
        4. Summary
      17. 10. Machine Learning in Action
        1. Packages
        2. Dataset
          1. Data partitioning
          2. Model
            1. Linear model
            2. Prediction
            3. Logistic regression
            4. Residuals
            5. Least squares regression
            6. Relative importance
            7. Stepwise regression
            8. The k-nearest neighbor classification
            9. Naïve Bayes
          3. The train Method
            1. predict
            2. Support vector machines
            3. K-means clustering
            4. Decision trees
            5. AdaBoost
            6. Neural network
            7. Random forests
        3. Questions
        4. Summary
      18. 11. Predicting Events with Machine Learning
        1. Automatic forecasting packages
          1. Time series
          2. The SMA function
          3. The decompose function
          4. Exponential smoothing
          5. Forecast
            1. Correlogram
            2. Box test
          6. Holt exponential smoothing
            1. Automated forecasting
            2. ARIMA
            3. Automated ARIMA forecasting
        2. Questions
        3. Summary
      19. 12. Supervised and Unsupervised Learning
        1. Packages
          1. Supervised learning
            1. Decision tree
            2. Regression
            3. Neural network
            4. Instance-based learning
            5. Ensemble learning
            6. Support vector machines
            7. Bayesian learning
            8. Random forests
          2. Unsupervised learning
            1. Cluster analysis
            2. Density estimation
            3. Expectation-maximization
            4. Hidden Markov models
            5. Blind signal separation
        2. Questions
        3. Summary
      20. Index