R Data Mining Blueprints

Book Description

Learn about data mining with real-world datasets

About This Book

  • Diverse real-world datasets to teach data mining techniques

  • Practical and focused on real-world data mining cases, this book covers concepts such as spatial data mining, text mining, social media mining, and web mining

  • Real-world case studies illustrate various data mining techniques, taking you from novice to intermediate

Who This Book Is For

Data analysts at the beginner to intermediate level who need step-by-step help developing complex data mining projects are the ideal audience for this book. Readers should have prior knowledge of basic statistics and a little programming experience with any tool or platform.

What You Will Learn

  • Use statistics and programming to learn data mining concepts and their applications

  • Use R programming to apply statistical models to data

  • Create predictive models for classification, prediction, and recommendation

  • Use various data mining libraries available on CRAN (the Comprehensive R Archive Network)

  • Apply data management steps to handle large datasets

  • Learn about the data visualization libraries available in R

  • Implement various dimensionality reduction techniques to handle large datasets

  • Understand neural networks, a concept drawn from computer science, and their applications in data mining

In Detail

The R language is a powerful, open-source, functional programming language. At its core, R is a statistical programming language that provides impressive tools for data mining and analysis. It lets you create high-level graphics and offers interfaces to other languages. Unlike typical statistical tools, which give users tick boxes and drop-down menus, R lets you produce data and visual analytics through custom scripts and commands.

This book explores data mining techniques and shows you how to apply different mining concepts to various statistical and data applications in a wide range of fields. We will teach you about R and its application to data mining, and give you practical information you can use to develop and improve your applications. The book will help you complete complex data mining projects and guide you through issues you might encounter along the way.

Style and Approach

This fast-paced guide will help you solve predictive modeling problems using the most popular data mining algorithms through simple, practical cases.

Table of Contents

    1. R Data Mining Blueprints
      1. R Data Mining Blueprints
      2. Credits
      3. About the Author
      4. About the Reviewer
        1. eBooks, discount offers, and more
          1. Why subscribe?
      6. Preface
        1. What this book covers
        2. What you need for this book
        3. Who this book is for
        4. Conventions
        5. Reader feedback
        6. Customer support
          1. Downloading the example code
          2. Downloading the color images of this book
          3. Errata
          4. Piracy
          5. Questions
      7. Chapter 1: Data Manipulation Using In-built R Data
        1. What is data mining?
          1. How is it related to data science, analytics, and statistical modeling?
        2. Introduction to the R programming language
          1. Getting started with R
          2. Data types, vectors, arrays, and matrices
          3. List management, factors, and sequences
          4. Import and export of data types
        3. Data type conversion
        4. Sorting and merging dataframes
        5. Indexing or subsetting dataframes
        6. Date and time formatting
        7. Creating new functions
          1. User-defined functions
          2. Built-in functions
        8. Loop concepts - the for loop
        9. Loop concepts - the repeat loop
        10. Loop concepts - while conditions
        11. Apply concepts
        12. String manipulation
        13. NA and missing value management
        14. Missing value imputation techniques
        15. Summary
      8. Chapter 2: Exploratory Data Analysis with Automobile Data
        1. Univariate data analysis
        2. Bivariate analysis
        3. Multivariate analysis
        4. Understanding distributions and transformation
          1. Normal probability distribution
          2. Binomial probability distribution
          3. Poisson probability distribution
        5. Interpreting distributions
          1. Interpreting continuous data
        6. Variable binning or discretizing continuous data
        7. Contingency tables, bivariate statistics, and checking for data normality
        8. Hypothesis testing
          1. Test of the population mean
            1. One tail test of mean with known variance
            2. One tail and two tail test of proportions
          2. Two sample variance test
        9. Non-parametric methods
          1. Wilcoxon signed-rank test
          2. Mann-Whitney-Wilcoxon test
          3. Kruskal-Wallis test
        10. Summary
      9. Chapter 3: Visualize Diamond Dataset
        1. Data visualization using ggplot2
          1. Bar chart
          2. Boxplot
          3. Bubble chart
          4. Donut chart
          5. Geo mapping
          6. Histogram
          7. Line chart
          8. Pie chart
          9. Scatterplot
          10. Stacked bar chart
          11. Stem and leaf plot
          12. Word cloud
          13. Coxcomb plot
        2. Using plotly
          1. Bubble plot
          2. Bar charts using plotly
          3. Scatterplot using plotly
          4. Boxplots using plotly
          5. Polar charts using plotly
          6. Polar scatterplot using plotly
          7. Polar area chart
        3. Creating geo mapping
        4. Summary
      10. Chapter 4: Regression with Automobile Data
        1. Regression introduction
          1. Formulation of regression problem
          2. Case study
        2. Linear regression
        3. Stepwise regression method for variable selection
        4. Logistic regression
        5. Cubic regression
        6. Penalized regression
        7. Summary
      11. Chapter 5: Market Basket Analysis with Groceries Data
        1. Introduction to Market Basket Analysis
          1. What is MBA?
          2. Where to apply MBA?
          3. Data requirement
          4. Assumptions/prerequisites
          5. Modeling techniques
          6. Limitations
        2. Practical project
          1. Apriori algorithm
          2. Eclat algorithm
          3. Visualizing association rules
          4. Implementation of arules
        3. Summary
      12. Chapter 6: Clustering with E-commerce Data
        1. Understanding customer segmentation
          1. Why understanding customer segmentation is important
          2. How to perform customer segmentation?
        2. Various clustering methods available
          1. K-means clustering
          2. Hierarchical clustering
          3. Model-based clustering
          4. Other cluster algorithms
          5. Comparing clustering methods
        3. References
        4. Summary
      13. Chapter 7: Building a Retail Recommendation Engine
        1. What is recommendation?
          1. Types of product recommendation
          2. Techniques to perform recommendation
        2. Assumptions
        3. What method to apply when
        4. Limitations of collaborative filtering
        5. Practical project
        6. Summary
      14. Chapter 8: Dimensionality Reduction
        1. Why dimensionality reduction?
          1. Techniques available for dimensionality reduction
            1. Which technique to apply where?
              1. Principal component analysis
        2. Practical project around dimensionality reduction
          1. Attribute description
        3. Parametric approach to dimension reduction
        4. References
        5. Summary
      15. Chapter 9: Applying Neural Network to Healthcare Data
        1. Introduction to neural networks
        2. Understanding the math behind the neural network
        3. Neural network implementation in R
        4. Neural networks for prediction
        5. Neural networks for classification
        6. Neural networks for forecasting
        7. Merits and demerits of neural networks
        8. References
        9. Summary