RapidMiner

Book Description

Powerful, Flexible Tools for a Data-Driven World
As the data deluge continues in today’s world, the need to master data mining, predictive analytics, and business analytics has never been greater. These techniques and tools provide unprecedented insights into data, enabling better decision making and forecasting, and ultimately the solution of increasingly complex problems.

Learn from the Creators of the RapidMiner Software
Written by leaders in the data mining community, including the developers of the RapidMiner software, RapidMiner: Data Mining Use Cases and Business Analytics Applications provides an in-depth introduction to the application of data mining and business analytics techniques and tools in scientific research, medicine, industry, commerce, and diverse other sectors. It presents the most powerful and flexible open source software solutions: RapidMiner and RapidAnalytics. The software and their extensions can be freely downloaded at www.RapidMiner.com.

Understand Each Stage of the Data Mining Process
The book and software tools cover all relevant steps of the data mining process, from data loading, transformation, integration, aggregation, and visualization to automated feature selection, automated parameter and process optimization, and integration with other tools, such as R packages or your IT infrastructure via web services. The book also discusses extensively, and the software supports, the analysis of unstructured data, including text and image mining.

Easily Implement Analytics Approaches Using RapidMiner and RapidAnalytics
Each chapter describes an application, how to approach it with data mining methods, and how to implement it with RapidMiner and RapidAnalytics. These application-oriented chapters give you not only the necessary analytics to solve problems and tasks, but also reproducible, step-by-step descriptions of using RapidMiner and RapidAnalytics. The case studies serve as blueprints for your own data mining applications, enabling you to effectively solve similar problems.

Table of Contents

  1. Preliminaries
  2. Series
  3. Dedication
  4. Foreword
    1. Case Studies Are for Communication and Collaboration
    2. RapidMiner
  5. Preface
    1. What Is Data Mining? What Is It Good for, What Are Its Applications, and What Does It Enable Me to Do?
    2. Why Should I Read This Book? Why Case Studies? What Will I Learn? What Will I Be Able to Achieve?
    3. What Are the Advantages of the Open Source Solutions RapidMiner and RapidAnalytics Used in This Book?
    4. What Is the Structure of This Book and Which Chapters Should I Read?
  6. About the Editors
    1. Markus Hofmann
    2. Ralf Klinkenberg
  7. List of Contributors
    1. Editors
    2. Chapter Authors
  8. Acknowledgments
  9. Part I Introduction to Data Mining and RapidMiner
    1. Chapter 1 What This Book is About and What It is Not
      1. 1.1 Introduction
      2. 1.2 Coincidence or Not?
      3. 1.3 Applications of Data Mining
        1. 1.3.1 Financial Services
        2. 1.3.2 Retail and Consumer Products
        3. 1.3.3 Telecommunications and Media
        4. 1.3.4 Manufacturing, Construction, and Electronics
      4. 1.4 Fundamental Terms
        1. 1.4.1 Attributes and Target Attributes
        2. 1.4.2 Concepts and Examples
        3. 1.4.3 Attribute Roles
        4. 1.4.4 Value Types
        5. 1.4.5 Data and Meta Data
        6. 1.4.6 Modeling
        1. Table 1.1
    2. Chapter 2 Getting Used to RapidMiner
      1. 2.1 Introduction
      2. 2.2 First Start
      3. 2.3 Design Perspective
      4. 2.4 Building a First Process
        1. 2.4.1 Loading Data
        2. 2.4.2 Creating a Predictive Model
        3. 2.4.3 Executing a Process
        4. 2.4.4 Looking at Results
        1. Figure 2.1
        2. Figure 2.2
        3. Figure 2.3
        4. Figure 2.4
        5. Figure 2.5
        6. Figure 2.6
        7. Figure 2.7
        8. Figure 2.8
        9. Figure 2.9
        10. Figure 2.10
  10. Part II Basic Classification Use Cases for Credit Approval and in Education
    1. Chapter 3 k-Nearest Neighbor Classification I
      1. 3.1 Introduction
      2. 3.2 Algorithm
      3. 3.3 The k-NN Operator in RapidMiner
      4. 3.4 Dataset
        1. 3.4.1 Teacher Assistant Evaluation Dataset
        2. 3.4.2 Basic Information
        3. 3.4.3 Examples
        4. 3.4.4 Attributes
      5. 3.5 Operators in This Use Case
        1. 3.5.1 Read URL Operator
        2. 3.5.2 Rename Operator
        3. 3.5.3 Numerical to Binominal Operator
        4. 3.5.4 Numerical to Polynominal Operator
        5. 3.5.5 Set Role Operator
        6. 3.5.6 Split Validation Operator
        7. 3.5.7 Apply Model Operator
        8. 3.5.8 Performance Operator
      6. 3.6 Use Case
        1. 3.6.1 Data Import
        2. 3.6.2 Pre-processing
        3. 3.6.3 Renaming Attributes
        4. 3.6.4 Changing the Type of Attributes
        5. 3.6.5 Changing the Role of Attributes
        6. 3.6.6 Model Training, Testing, and Performance Evaluation
        1. Figure 3.1
        2. Figure 3.2
        3. Figure 3.3
        4. Figure 3.4
        5. Figure 3.5
        6. Figure 3.6
        7. Figure 3.7
    2. Chapter 4 k-Nearest Neighbor Classification II
      1. 4.1 Introduction
      2. 4.2 Dataset
      3. 4.3 Operators Used in This Use Case
        1. 4.3.1 Read CSV Operator
        2. 4.3.2 Principal Component Analysis Operator
        3. 4.3.3 Split Data Operator
        4. 4.3.4 Performance (Classification) Operator
      4. 4.4 Data Import
      5. 4.5 Pre-processing
        1. 4.5.1 Principal Component Analysis
      6. 4.6 Model Training, Testing, and Performance Evaluation
        1. 4.6.1 Training the Model
        2. 4.6.2 Testing the Model
        3. 4.6.3 Performance Evaluation
        1. Figure 4.1
        2. Figure 4.2
        3. Figure 4.3
    3. Chapter 5 Naïve Bayes Classification I
      1. 5.1 Introduction
      2. 5.2 Dataset
        1. 5.2.1 Credit Approval Dataset
        2. 5.2.2 Examples
        3. 5.2.3 Attributes
      3. 5.3 Operators in This Use Case
        1. 5.3.1 Rename by Replacing Operator
        2. 5.3.2 Filter Examples Operator
        3. 5.3.3 Discretize by Binning Operator
        4. 5.3.4 X-Validation Operator
        5. 5.3.5 Performance (Binominal Classification) Operator
      4. 5.4 Use Case
        1. 5.4.1 Data Import
        2. 5.4.2 Pre-processing
        3. 5.4.3 Model Training, Testing, and Performance Evaluation
        1. Figure 5.1
        2. Figure 5.2
        3. Figure 5.3
        4. Figure 5.4
        5. Figure 5.5
        6. Figure 5.6
        7. Figure 5.7
        8. Figure 5.8
        9. Figure 5.9
    4. Chapter 6 Naïve Bayes Classification II
      1. 6.1 Dataset
        1. 6.1.1 Nursery Dataset
        2. 6.1.2 Basic Information
        3. 6.1.3 Examples
        4. 6.1.4 Attributes
      2. 6.2 Operators in this Use Case
        1. 6.2.1 Read Excel Operator
        2. 6.2.2 Select Attributes Operator
      3. 6.3 Use Case
        1. 6.3.1 Data Import
        2. 6.3.2 Pre-processing
        3. 6.3.3 Model Training, Testing, and Performance Evaluation
        4. 6.3.4 A Deeper Look into the Naïve Bayes Algorithm
        1. Figure 6.1
        2. Figure 6.2
        3. Figure 6.3
        4. Figure 6.4
        5. Figure 6.5
        6. Figure 6.6
        7. Figure 6.7
        8. Figure 6.8
        1. Table 6.1
        2. Table 6.2
  11. Part III Marketing, Cross-Selling, and Recommender System Use Cases
    1. Chapter 7 Who Wants My Product? Affinity-Based Marketing
      1. Acronyms
      2. 7.1 Introduction
      3. 7.2 Business Understanding
      4. 7.3 Data Understanding
      5. 7.4 Data Preparation
        1. 7.4.1 Assembling the Data
        2. 7.4.2 Preparing for Data Mining
      6. 7.5 Modelling and Evaluation
        1. 7.5.1 Continuous Evaluation and Cross Validation
        2. 7.5.2 Class Imbalance
        3. 7.5.3 Simple Model Evaluation
        4. 7.5.4 Confidence Values, ROC, and Lift Charts
        5. 7.5.5 Trying Different Models
      7. 7.6 Deployment
      8. 7.7 Conclusions
      9. Glossary
      10. Bibliography
        1. Figure 7.1
        2. Figure 7.2
        3. Figure 7.3
        4. Figure 7.4
        5. Figure 7.5
        6. Figure 7.6
        7. Figure 7.7
        8. Figure 7.8
        9. Figure 7.9
        10. Figure 7.10
        11. Figure 7.11
        12. Figure 7.12
        13. Figure 7.13
    2. Chapter 8 Basic Association Rule Mining in RapidMiner
      1. 8.1 Data Mining Case Study
        1. Figure 8.1
        2. Figure 8.2
        3. Figure 8.3
        4. Figure 8.4
        5. Figure 8.5
        6. Figure 8.6
        7. Figure 8.7
        8. Figure 8.8
        9. Figure 8.9
        10. Figure 8.10
        11. Figure 8.11
        12. Figure 8.12
        13. Figure 8.13
        14. Figure 8.14
        15. Figure 8.15
        16. Figure 8.16
        17. Figure 8.17
        18. Figure 8.18
        19. Figure 8.19
        20. Figure 8.20
        21. Figure 8.21
        22. Figure 8.22
        23. Figure 8.23
        24. Figure 8.24
        25. Figure 8.25
        26. Figure 8.26
        27. Figure 8.27
        28. Figure 8.28
        29. Figure 8.29
        30. Figure 8.30
        31. Figure 8.31
        32. Figure 8.32
        33. Figure 8.33
        34. Figure 8.34
        35. Figure 8.35
    3. Chapter 9 Constructing Recommender Systems in RapidMiner
      1. Acronyms
      2. 9.1 Introduction
      3. 9.2 The Recommender Extension
        1. 9.2.1 Recommendation Operators
        2. 9.2.2 Data Format
        3. 9.2.3 Performance Measures
      4. 9.3 The VideoLectures.net Dataset
      5. 9.4 Collaborative-based Systems
        1. 9.4.1 Neighbourhood-based Recommender Systems
        2. 9.4.2 Factorization-based Recommender Systems
        3. 9.4.3 Collaborative Recommender Workflows
        4. 9.4.4 Iterative Online Updates
      6. 9.5 Content-based Recommendation
        1. 9.5.1 Attribute-based Content Recommendation
        2. 9.5.2 Similarity-based Content Recommendation
      7. 9.6 Hybrid Recommender Systems
      8. 9.7 Providing RapidMiner Recommender System Workflows as Web Services Using RapidAnalytics
        1. 9.7.1 Simple Recommender System Web Service
        2. 9.7.2 Guidelines for Optimizing Workflows for Service Usage
      9. 9.8 Summary
      10. Glossary
      11. Bibliography
        1. Figure 9.1
        2. Figure 9.2
        3. Figure 9.3
        4. Figure 9.4
        5. Figure 9.5
        6. Figure 9.6
        7. Figure 9.7
        8. Figure 9.8
        9. Figure 9.9
        10. Figure 9.10
        1. Table 9.1
        2. Table 9.2
        3. Table 9.3
        4. Table 9.4
        5. Table 9.5
    4. Chapter 10 Recommender System for Selection of the Right Study Program for Higher Education Students
      1. Abstract
      2. 10.1 Introduction
      3. 10.2 Literature Review
      4. 10.3 Automatic Classification of Students using RapidMiner
        1. 10.3.1 Data
        2. 10.3.2 Processes
          1. 10.3.2.1 Simple Evaluation Process
          2. 10.3.2.2 Complex Process (with Feature Selection)
      5. 10.4 Results
      6. 10.5 Conclusion
      7. Bibliography
        1. Figure 10.1
        2. Figure 10.2
        3. Figure 10.3
        4. Figure 10.4
        5. Figure 10.5
        6. Figure 10.6
        7. Figure 10.7
        8. Figure 10.8
        9. Figure 10.9
        10. Figure 10.10
        11. Figure 10.11
        12. Figure 10.12
        1. Table 10.1
        2. Table 10.2
        3. Table 10.3
  12. Part IV Clustering in Medical and Educational Domains
    1. Chapter 11 Visualising Clustering Validity Measures
      1. Acronyms
      2. 11.1 Overview
      3. 11.2 Clustering
        1. 11.2.1 A Brief Explanation of k-Means
      4. 11.3 Cluster Validity Measures
        1. 11.3.1 Internal Validity Measures
        2. 11.3.2 External Validity Measures
        3. 11.3.3 Relative Validity Measures
      5. 11.4 The Data
        1. 11.4.1 Artificial Data
        2. 11.4.2 E-coli Data
      6. 11.5 Setup
        1. 11.5.1 Download and Install R Extension
        2. 11.5.2 Processes and Data
      7. 11.6 The Process in Detail
        1. 11.6.1 Import Data (A)
        2. 11.6.2 Generate Clusters (B)
        3. 11.6.3 Generate Ground Truth Validity Measures (C)
        4. 11.6.4 Generate External Validity Measures (D)
        5. 11.6.5 Generate Internal Validity Measures (E)
        6. 11.6.6 Output Results (F)
      8. 11.7 Running the Process and Displaying Results
      9. 11.8 Results and Interpretation
        1. 11.8.1 Artificial Data Ground Truth
        2. 11.8.2 E-coli Data
      10. 11.9 Conclusion
      11. Bibliography
        1. Figure 11.1
        2. Figure 11.2
        3. Figure 11.3
        4. Figure 11.4
        5. Figure 11.5
        6. Figure 11.6
        7. Figure 11.7
        8. Figure 11.8
        9. Figure 11.9
        10. Figure 11.10
        11. Figure 11.11
        12. Figure 11.12
        13. Figure 11.13
        14. Figure 11.14
        15. Figure 11.15
        16. Figure 11.16
        17. Figure 11.17
        1. Table 11.1
        2. Table 11.2
    2. Chapter 12 Grouping Higher Education Students with RapidMiner
      1. Overview
      2. 12.1 Introduction
      3. 12.2 Related Work
      4. 12.3 Using RapidMiner for Clustering Higher Education Students
        1. 12.3.1 Data
        2. 12.3.2 Process for Automatic Evaluation of Clustering Algorithms
        3. 12.3.3 Results and Discussion
      5. 12.4 Conclusion
      6. Bibliography
        1. Figure 12.1
        2. Figure 12.2
        3. Figure 12.3
        4. Figure 12.4
        5. Figure 12.5
        6. Figure 12.6
        7. Figure 12.7
        8. Figure 12.8
        9. Figure 12.9
        10. Figure 12.10
        1. Table 12.1
  13. Part V Text Mining: Spam Detection, Language Detection, and Customer Feedback Analysis
    1. Chapter 13 Detecting Text Message Spam
      1. Acronyms
      2. 13.1 Overview
      3. 13.2 Applying This Technique in Other Domains
      4. 13.3 Installing the Text Processing Extension
      5. 13.4 Getting the Data
      6. 13.5 Loading the Text
        1. 13.5.1 Data Import Wizard Step 1
        2. 13.5.2 Data Import Wizard Step 2
        3. 13.5.3 Data Import Wizard Step 3
        4. 13.5.4 Data Import Wizard Step 4
        5. 13.5.5 Step 5
      7. 13.6 Examining the Text
        1. 13.6.1 Tokenizing the Document
        2. 13.6.2 Creating the Word List and Word Vector
        3. 13.6.3 Examining the Word Vector
      8. 13.7 Processing the Text for Classification
        1. 13.7.1 Text Processing Concepts
      9. 13.8 The Naïve Bayes Algorithm
        1. 13.8.1 How It Works
      10. 13.9 Classifying the Data as Spam or Ham
      11. 13.10 Validating the Model
      12. 13.11 Applying the Model to New Data
        1. 13.11.1 Running the Model on New Data
      13. 13.12 Improvements
      14. 13.13 Summary
        1. Figure 13.1
        2. Figure 13.2
        3. Figure 13.3
        4. Figure 13.4
        5. Figure 13.5
        6. Figure 13.6
        7. Figure 13.7
        1. Table 13.1
    2. Chapter 14 Robust Language Identification with RapidMiner: A Text Mining Use Case
      1. Acronyms
      2. 14.1 Introduction
      3. 14.2 The Problem of Language Identification
      4. 14.3 Text Representation
        1. 14.3.1 Encoding
        2. 14.3.2 Token-based Representation
        3. 14.3.3 Character-Based Representation
        4. 14.3.4 Bag-of-Words Representation
      5. 14.4 Classification Models
      6. 14.5 Implementation in RapidMiner
        1. 14.5.1 Datasets
        2. 14.5.2 Importing Data
        3. 14.5.3 Frequent Words Model
        4. 14.5.4 Character n-Grams Model
        5. 14.5.5 Similarity-based Approach
      7. 14.6 Application
        1. 14.6.1 RapidAnalytics
        2. 14.6.2 Web Page Language Identification
      8. 14.7 Summary
      9. Acknowledgment
      10. Glossary
      11. Bibliography
        1. Figure 14.1
        2. Figure 14.2
        3. Figure 14.3
        4. Figure 14.4
        5. Figure 14.5
        6. Figure 14.6
        7. Figure 14.7
        8. Figure 14.8
        9. Figure 14.9
        10. Figure 14.10
        11. Figure 14.11
        12. Figure 14.12
        13. Figure 14.13
        14. Figure 14.14
        1. Table 14.1
    3. Chapter 15 Text Mining with RapidMiner
      1. 15.1 Introduction
        1. 15.1.1 Text Mining
        2. 15.1.2 Data Description
        3. 15.1.3 Running RapidMiner
        4. 15.1.4 RapidMiner Text Processing Extension Package
        5. 15.1.5 Installing Text Mining Extensions
      2. 15.2 Association Mining of Text Document Collection (Process01)
        1. 15.2.1 Importing Process01
        2. 15.2.2 Operators in Process01
        3. 15.2.3 Saving Process01
      3. 15.3 Clustering Text Documents (Process02)
        1. 15.3.1 Importing Process02
        2. 15.3.2 Operators in Process02
        3. 15.3.3 Saving Process02
      4. 15.4 Running Process01 and Analyzing the Results
        1. 15.4.1 Running Process01
        2. 15.4.2 Empty Results for Process01
        3. 15.4.3 Specifying the Source Data for Process01
        4. 15.4.4 Re-Running Process01
        5. 15.4.5 Process01 Results
        6. 15.4.6 Saving Process01 Results
      5. 15.5 Running Process02 and Analyzing the Results
        1. 15.5.1 Running Process02
        2. 15.5.2 Specifying the Source Data for Process02
        3. 15.5.3 Process02 Results
      6. 15.6 Conclusions
      7. Acknowledgment
        1. Figure 15.1
        2. Figure 15.2
        3. Figure 15.3
        4. Figure 15.4
        5. Figure 15.5
        6. Figure 15.6
        7. Figure 15.7
        8. Figure 15.8
        9. Figure 15.9
        10. Figure 15.10
        11. Figure 15.11
        12. Figure 15.12
        13. Figure 15.13
        14. Figure 15.14
        15. Figure 15.15
        16. Figure 15.16
        17. Figure 15.17
        18. Figure 15.18
        19. Figure 15.19
        20. Figure 15.20
        21. Figure 15.21
        22. Figure 15.22
        23. Figure 15.23
  14. Part VI Feature Selection and Classification in Astroparticle Physics and in Medical Domains
    1. Chapter 16 Application of RapidMiner in Neutrino Astronomy
      1. 16.1 Protons, Photons, and Neutrinos
      2. 16.2 Neutrino Astronomy
      3. 16.3 Feature Selection
        1. 16.3.1 Installation of the Feature Selection Extension for RapidMiner
        2. 16.3.2 Feature Selection Setup
        3. 16.3.3 Inner Process of the Loop Parameters Operator
        4. 16.3.4 Inner Operators of the Wrapper X-Validation
        5. 16.3.5 Settings of the Loop Parameters Operator
        6. 16.3.6 Feature Selection Stability
      4. 16.4 Event Selection Using a Random Forest
        1. 16.4.1 The Training Setup
        2. 16.4.2 The Random Forest in Greater Detail
        3. 16.4.3 The Random Forest Settings
        4. 16.4.4 The Testing Setup
      5. 16.5 Summary and Outlook
      6. Bibliography
        1. Figure 16.1
        2. Figure 16.2
        3. Figure 16.3
        4. Figure 16.4
        5. Figure 16.5
        6. Figure 16.6
        7. Figure 16.7
        8. Figure 16.8
        9. Figure 16.9
        10. Figure 16.10
        11. Figure 16.11
        12. Figure 16.12
        13. Figure 16.13
        14. Figure 16.14
        15. Figure 16.15
        16. Figure 16.16
        17. Figure 16.17
        18. Figure 16.18
        19. Figure 16.19
        20. Figure 16.20
        21. Figure 16.21
        22. Figure 16.22
    2. Chapter 17 Medical Data Mining
      1. 17.1 Background
      2. 17.2 Description of Problem Domain: Two Medical Examples
        1. 17.2.1 Carpal Tunnel Syndrome
        2. 17.2.2 Diabetes
      3. 17.3 Data Mining Algorithms in Medicine
        1. 17.3.1 Predictive Data Mining
        2. 17.3.2 Descriptive Data Mining
        3. 17.3.3 Data Mining and Statistics: Hypothesis Testing
      4. 17.4 Knowledge Discovery Process in RapidMiner: Carpal Tunnel Syndrome
        1. 17.4.1 Defining the Problem, Setting the Goals
        2. 17.4.2 Dataset Representation
        3. 17.4.3 Data Preparation
        4. 17.4.4 Modeling
        5. 17.4.5 Selecting Appropriate Methods for Classification
        6. 17.4.6 Results and Data Visualisation
        7. 17.4.7 Interpretation of the Results
        8. 17.4.8 Hypothesis Testing and Statistical Analysis
        9. 17.4.9 Results and Visualisation
      5. 17.5 Knowledge Discovery Process in RapidMiner: Diabetes
        1. 17.5.1 Problem Definition, Setting the Goals
        2. 17.5.2 Data Preparation
        3. 17.5.3 Modeling
        4. 17.5.4 Results and Data Visualization
        5. 17.5.5 Hypothesis Testing
      6. 17.6 Specifics in Medical Data Mining
      7. 17.7 Summary
      8. Bibliography
        1. Figure 17.1
        2. Figure 17.2
        3. Figure 17.3
        4. Figure 17.4
        5. Figure 17.5
        6. Figure 17.6
        7. Figure 17.7
        8. Figure 17.8
        9. Figure 17.9
        10. Figure 17.10
        11. Figure 17.11
        12. Figure 17.12
        13. Figure 17.13
        14. Figure 17.14
        15. Figure 17.15
        16. Figure 17.16
        17. Figure 17.17
        18. Figure 17.18
        19. Figure 17.19
        20. Figure 17.20
        21. Figure 17.21
        22. Figure 17.22
        23. Figure 17.23
        24. Figure 17.24
        1. Table 17.1
        2. Table 17.2
        3. Table 17.3
        4. Table 17.4
  15. Part VII Molecular Structure- and Property-Activity Relationship Modeling in Biochemistry and Medicine
    1. Chapter 18 Using PaDEL to Calculate Molecular Properties and Chemoinformatic Models
      1. 18.1 Introduction
      2. 18.2 Molecular Structure Formats for Chemoinformatics
      3. 18.3 Installation of the PaDEL Extension for RapidMiner
      4. 18.4 Applications and Capabilities of the PaDEL Extension
      5. 18.5 Examples of Computer-aided Predictions
      6. 18.6 Calculation of Molecular Properties
      7. 18.7 Generation of a Linear Regression Model
      8. 18.8 Example Workflow
      9. 18.9 Summary
      10. Acknowledgment
      11. Bibliography
        1. Figure 18.1
        2. Figure 18.2
        3. Figure 18.3
        4. Figure 18.4
        1. Table 18.1
    2. Chapter 19 Chemoinformatics: Structure- and Property-activity Relationship Development
      1. 19.1 Introduction
      2. 19.2 Example Workflow
      3. 19.3 Importing the Example Set
      4. 19.4 Preprocessing of the Data
      5. 19.5 Feature Selection
      6. 19.6 Model Generation
      7. 19.7 Validation
      8. 19.8 Y-Randomization
      9. 19.9 Results
      10. 19.10 Conclusion/Summary
      11. Acknowledgment
      12. Bibliography
        1. Figure 19.1
        2. Figure 19.2
        3. Figure 19.3
        4. Figure 19.4
        5. Figure 19.5
        6. Figure 19.6
        1. Table 19.1
  16. Part VIII Image Mining: Feature Extraction, Segmentation, and Classification
    1. Chapter 20 Image Mining Extension for RapidMiner (Introductory)
      1. Acronyms
      2. 20.1 Introduction
      3. 20.2 Image Reading/Writing
      4. 20.3 Conversion between Colour and Grayscale Images
      5. 20.4 Feature Extraction
        1. 20.4.1 Local Level Feature Extraction
        2. 20.4.2 Segment-Level Feature Extraction
        3. 20.4.3 Global-Level Feature Extraction
      6. 20.5 Summary
      7. Exercises
      8. Glossary
      9. Bibliography
        1. Figure 20.1
        2. Figure 20.2
        3. Figure 20.3
        4. Figure 20.4
        5. Figure 20.5
        6. Figure 20.6
        7. Figure 20.7
        8. Figure 20.8
        9. Figure 20.9
        10. Figure 20.10
        11. Figure 20.11
        12. Figure 20.12
        13. Figure 20.13
    2. Chapter 21 Image Mining Extension for RapidMiner (Advanced)
      1. Acronyms
      2. 21.1 Introduction
      3. 21.2 Image Classification
        1. 21.2.1 Load Images and Assign Labels
        2. 21.2.2 Global Feature Extraction
      4. 21.3 Pattern Detection
        1. 21.3.1 Process Creation
      5. 21.4 Image Segmentation and Feature Extraction
      6. 21.5 Summary
      7. Bibliography
        1. Figure 21.1
        2. Figure 21.2
        3. Figure 21.3
        4. Figure 21.4
        5. Figure 21.5
        6. Figure 21.6
        7. Figure 21.7
        8. Figure 21.8
        9. Figure 21.9
        10. Figure 21.10
        11. Figure 21.11
        12. Figure 21.12
        13. Figure 21.13
        14. Figure 21.14
        15. Figure 21.15
  17. Part IX Anomaly Detection, Instance Selection, and Prototype Construction
    1. Chapter 22 Instance Selection in RapidMiner
      1. Acronyms
      2. 22.1 Introduction
      3. 22.2 Instance Selection and Prototype-Based Rule Extension
      4. 22.3 Instance Selection
        1. 22.3.1 Description of the Implemented Algorithms
        2. 22.3.2 Accelerating 1-NN Classification
        3. 22.3.3 Outlier Elimination and Noise Reduction
        4. 22.3.4 Advances in Instance Selection
      5. 22.4 Prototype Construction Methods
      6. 22.5 Mining Large Datasets
      7. 22.6 Summary
      8. Bibliography
        1. Figure 22.1
        2. Figure 22.2
        3. Figure 22.3
        4. Figure 22.4
        5. Figure 22.5
        6. Figure 22.6
        7. Figure 22.7
        8. Figure 22.8
        9. Figure 22.9
        10. Figure 22.10
        11. Figure 22.11
        12. Figure 22.12
        13. Figure 22.13
        14. Figure 22.14
        15. Figure 22.15
        16. Figure 22.16
        17. Figure 22.17
        18. Figure 22.18
        19. Figure 22.19
        20. Figure 22.20
        21. Figure 22.21
        1. Table 22.1
        2. Table 22.2
        3. Table 22.3
    2. Chapter 23 Anomaly Detection
      1. Acronyms
      2. 23.1 Introduction
      3. 23.2 Categorizing an Anomaly Detection Problem
        1. 23.2.1 Type of Anomaly Detection Problem (Pre-processing)
        2. 23.2.2 Local versus Global Problems
        3. 23.2.3 Availability of Labels
      4. 23.3 A Simple Artificial Unsupervised Anomaly Detection Example
      5. 23.4 Unsupervised Anomaly Detection Algorithms
        1. 23.4.1 k-NN Global Anomaly Score
        2. 23.4.2 Local Outlier Factor (LOF)
        3. 23.4.3 Connectivity-Based Outlier Factor (COF)
        4. 23.4.4 Influenced Outlierness (INFLO)
        5. 23.4.5 Local Outlier Probability (LoOP)
        6. 23.4.6 Local Correlation Integral (LOCI) and aLOCI
        7. 23.4.7 Cluster-Based Local Outlier Factor (CBLOF)
        8. 23.4.8 Local Density Cluster-Based Outlier Factor (LDCOF)
      6. 23.5 An Advanced Unsupervised Anomaly Detection Example
      7. 23.6 Semi-supervised Anomaly Detection
        1. 23.6.1 Using a One-Class Support Vector Machine (SVM)
        2. 23.6.2 Clustering and Distance Computations for Detecting Anomalies
      8. 23.7 Summary
      9. Glossary
      10. Bibliography
        1. Figure 23.1
        2. Figure 23.2
        3. Figure 23.3
        4. Figure 23.4
        5. Figure 23.5
        6. Figure 23.6
        7. Figure 23.7
        8. Figure 23.8
        9. Figure 23.9
        10. Figure 23.10
        11. Figure 23.11
        12. Figure 23.12
        13. Figure 23.13
        14. Figure 23.14
        15. Figure 23.15
        16. Figure 23.16
        17. Figure 23.17
        18. Figure 23.18
        19. Figure 23.19
        20. Figure 23.20
        21. Figure 23.21
        1. Table 23.1
        2. Table 23.2
        3. Table 23.3
  18. Part X Meta-Learning, Automated Learner Selection, Feature Selection, and Parameter Optimization
    1. Chapter 24 Using RapidMiner for Research: Experimental Evaluation of Learners
      1. 24.1 Introduction
      2. 24.2 Research of Learning Algorithms
        1. 24.2.1 Sources of Variation and Control
        2. 24.2.2 Example of an Experimental Setup
      3. 24.3 Experimental Evaluation in RapidMiner
        1. 24.3.1 Setting Up the Evaluation Scheme
        2. 24.3.2 Looping Through a Collection of Datasets
        3. 24.3.3 Looping Through a Collection of Learning Algorithms
        4. 24.3.4 Logging and Visualizing the Results
        5. 24.3.5 Statistical Analysis of the Results
        6. 24.3.6 Exception Handling and Parallelization
        7. 24.3.7 Setup for Meta-Learning
      4. 24.4 Conclusions
      5. Bibliography
        1. Figure 24.1
        2. Figure 24.2
        3. Figure 24.3
        4. Figure 24.4
        5. Figure 24.5
        6. Figure 24.6
        7. Figure 24.7
        8. Figure 24.8
        9. Figure 24.9
        10. Figure 24.10
        11. Figure 24.11
        12. Figure 24.12
        13. Figure 24.13
        14. Figure 24.14