Exploring Data with RapidMiner

Book Description

RapidMiner is a highly versatile tool that can make data work harder for you. This book will show you how to import, parse, and structure your data with remarkable speed and efficiency. It’s data mining made accessible.

  • See how to import, parse, and structure your data quickly and effectively

  • Understand the visualization possibilities and be inspired to use these with your own data

  • Structured in a modular way to adhere to standard industry processes

In Detail

Data is everywhere, and its volume is growing so fast that the gap between what people can understand and what is available widens relentlessly. There is huge value in data, but much of it remains untapped. Roughly 80% of data mining is about understanding data: exploring it, cleaning it, and structuring it so that it can be mined. RapidMiner is an environment for machine learning, data mining, text mining, predictive analytics, and business analytics. It is used for research, education, training, rapid prototyping, application development, and industrial applications.

Exploring Data with RapidMiner is packed with practical examples to help practitioners get to grips with their own data. The chapters follow an overall framework but can also be consulted individually as needed. The book provides simple-to-intermediate examples of modeling, visualization, and more using RapidMiner.

Exploring Data with RapidMiner is a helpful guide that presents the important steps in a logical order. It starts with importing data and then leads you through cleaning data, handling missing values, visualizing data, and extracting additional information, as well as understanding the time constraints that real data places on getting a result. The book uses real examples to help you understand how to set up processes quickly.

This book will give you a solid understanding of the possibilities RapidMiner offers for exploring data, and you will be inspired to use it in your own work.

Table of Contents

    1. Exploring Data with RapidMiner
      1. Table of Contents
      2. Exploring Data with RapidMiner
      3. Credits
      4. About the Author
      5. About the Reviewer
      6. www.PacktPub.com
        1. Support files, eBooks, discount offers and more
          1. Why Subscribe?
          2. Free Access for Packt account holders
      7. Preface
        1. What this book covers
        2. What you need for this book
        3. Who this book is for
        4. Conventions
        5. Reader feedback
        6. Customer support
          1. Downloading the example code
          2. Downloading the color images of this book
          3. Errata
          4. Piracy
          5. Questions
      8. Chapter 1: Setting the Scene
        1. A process framework
        2. Data volume and velocity
        3. Data variety, formats, and meanings
        4. Missing data
        5. Cleaning data
        6. Visualizing data
        7. Resource constraints
        8. Terminology
        9. Accompanying material
        10. Summary
      9. Chapter 2: Loading Data
        1. Reading files
          1. Alternative delimiters
          2. Reading complete lines
          3. Reading large numbers of attributes
          4. Splitting files into smaller pieces
        2. Databases
          1. The Read Database operator
          2. Large datasets
        3. Using macros
        4. Summary
      10. Chapter 3: Visualizing Data
        1. Getting started
        2. Statistical summaries
        3. Relationships between attributes
          1. Scatter plots
          2. Scatter 3D color
          3. Parallel and deviation
          4. Quartile color
        4. Time series data
          1. Plotting series
          2. Using the survey plotter
        5. Relations between examples
          1. Using histograms
          2. Using block plots
        6. Summary
      11. Chapter 4: Parsing and Converting Attributes
        1. Generating attributes
          1. Date functions
          2. Regular expression functions
          3. Generating extracts
          4. Regular expressions
          5. XPath
        2. Renaming attributes
          1. Searching and replacing attribute values
          2. Using the Map operator
          3. Using the Replace operator
          4. Using Replace (Dictionary)
        3. Summary
      12. Chapter 5: Outliers
        1. Manual inspection
          1. Increasing the data volume
          2. Rules for handling outliers
        2. Automated detection of example outliers
          1. Detect Outlier (Distances)
          2. Detect Outlier (Densities)
          3. Detect Outlier (LOF)
          4. Detect Outlier (COF)
        3. Summary
      13. Chapter 6: Missing Values
        1. Missing or empty?
        2. Types of missing data
          1. Missing completely at random
          2. Missing at random
          3. Not missing at random
        3. Categorizing missing data
          1. Finding MCAR data
          2. Finding MAR data
          3. Finding NMAR data
          4. A cautionary note
        4. Effect of missing data
        5. Options for handling missing data
          1. Returning to the root cause
          2. Ignore it
          3. Manual editing
          4. Deletion of examples
          5. Deletion of attributes
          6. Imputation with single values
          7. Modeling
        6. Summary
      14. Chapter 7: Transforming Data
        1. Creating new attributes
        2. Aggregation
        3. Using pivoting
        4. Using de-pivoting
        5. Windowing data
        6. Summary
      15. Chapter 8: Reducing Data Size
        1. Removing examples using sampling
        2. Removing attributes
          1. Removing useless attributes
          2. Weighting attributes
          3. Selecting attributes using models
        3. Summary
      16. Chapter 9: Resource Constraints
        1. Measuring and estimating performance
          1. Measuring performance
        2. Adding memory
        3. Parallel processing
        4. Restructuring processes
        5. Summary
      17. Chapter 10: Debugging
        1. Breakpoints in RapidMiner Studio
        2. Logging data in RapidMiner Studio
        3. RapidMiner Studio console printing
        4. Groovy scripts
          1. Outputting macros example
          2. Console logging with Groovy
        5. Regex tools
        6. Using XPath effectively
        7. Summary
      18. Chapter 11: Taking Stock
        1. Exploring new techniques
          1. Time series
          2. Web mining
          3. Using R
          4. Java or Groovy
          5. Third-party components
          6. RapidMiner Server
        2. Where to go next
      19. Index