Intelligent Data Analysis: Developing New Methodologies Through Pattern Discovery and Recovery

Book Description

Intelligent Data Analysis: Developing New Methodologies Through Pattern Discovery and Recovery covers a variety of issues in relation to intelligent data analysis and brings together current research, results, problems, and applications from both theoretical and practical approaches.

Table of Contents

  1. Copyright
  2. Editorial Advisory Board
  3. Foreword
  4. Preface
  5. Acknowledgment
  6. I. Introduction
    1. I. Automatic Intelligent Data Analysis
      1. ABSTRACT
      2. INTRODUCTION
      3. AUTOMATING DATA ANALYSIS
      4. SPIDA
      5. FUZZY MATCHING OF REQUIREMENTS AND PROPERTIES
      6. SPIDA AS AN ANALYTICAL SERVICE
      7. CASE STUDIES
      8. CONCLUSION
    2. REFERENCES
    3. II. Random Fuzzy Sets
      1. ABSTRACT
      2. FROM MULTIVARIATE STATISTICAL ANALYSIS TO RANDOM SETS
      3. THE LAWSON TOPOLOGY OF CLOSED SETS
        1. Proof
      4. METRICS ON CLOSED SETS
      5. RANDOM CLOSED SETS: FINAL DEFINITION AND CHOQUET THEOREM
      6. FROM RANDOM SETS TO RANDOM FUZZY SETS
      7. THE CONTINUOUS LATTICE OF UPPER SEMICONTINUOUS FUNCTIONS
        1. Remarks
      8. METRICS AND CHOQUET THEOREM FOR RANDOM FUZZY SETS
      9. TOWARDS PRACTICAL APPLICATIONS
      10. CASE STUDY: A BIOINFORMATICS PROBLEM
      11. ACKNOWLEDGMENT
    4. REFERENCES
    5. III. Pattern Discovery in Gene Expression Data
      1. ABSTRACT
      2. INTRODUCTION
      3. BACKGROUND
        1. The Gene Expression Dataset
        2. Characteristics of the Gene Expression Dataset
        3. Methods of Identifying Groups of Related Genes
          1. Similarity-Based
          2. Density-Based
          3. Model-Based
      4. CLUSTER ANALYSIS
        1. Current Methods
          1. Hierarchical Methods
          2. Partitive Methods
          3. Fuzzy Methods
          4. Artificial Neural Networks
          5. Search Based Methods
          6. Graph Theoretic Methods
        2. Cluster Evaluation and Comparison
          1. Determining the Correct Number of Clusters
          2. Comparing Results from Clustering Algorithms
          3. Comparison with Metadata
      5. FUTURE TRENDS
      6. CONCLUSION
    6. REFERENCES
      1. ENDNOTES
    7. IV. Using "Blackbox" Algorithms Such as TreeNet and Random Forests for Data-Mining and for Finding Meaningful Patterns, Relationships, and Outliers in Complex Ecological Data: An Overview, an Example Using Golden Eagle Satellite Data and an Outlook for a Promising Future
      1. ABSTRACT
      2. INTRODUCTION
      3. METHODS AND MATERIALS
      4. RESULTS
      5. DISCUSSION
      6. ACKNOWLEDGMENT
    8. REFERENCES
    9. V. A New Approach to Classification of Imbalanced Classes via Atanassov's Intuitionistic Fuzzy Sets
      1. ABSTRACT
      2. INTRODUCTION
        1. A Brief Introduction to Atanassov's Intuitionistic Fuzzy Sets
        2. Converting Relative Frequency Distributions Into A-IFSs
        3. Definition 1 (Mass Assignment)
          1. Example 1 (Baldwin et al., 1995b)
          2. Definition 2 (Least Prejudiced Distribution) (Baldwin et al., 1995)
          3. Theorem 1 (Baldwin et al., 1998)
          4. Example 2 (Baldwin et al., 1998)
        4. The Algorithm of Assigning the Parameters of A-IFSs
          1. Example 3 (Szmidt & Baldwin, 2006)
        5. The Models of a Classifier Error
          1. Example 4
        6. Confusion Matrix
        7. A Simple Classification Problem
        8. Classification via Fuzzy Sets
        9. Classification via Intuitionistic Fuzzy Sets
        10. Results Obtained for a Benchmark Classification Problem
      3. CONCLUSION
    10. REFERENCES
  7. II. Pattern Discovery from Huge Data Set: Methodologies
    1. VI. Fuzzy Neural Networks for Knowledge Discovery
      1. ABSTRACT
      2. INTRODUCTION
      3. BACKGROUND
        1. Data Selection, Pre-Processing, and Transformation
        2. Data Mining
        3. Knowledge Discovery and Rule Extraction Techniques
      4. METHODOLOGY
        1. Fuzzy Inference System
        2. Fuzzy-Neural Network Models with Supervised Learning
        3. Fuzzy Neural Network Models for Clustering
        4. Rule Generation and Optimization
      5. CASE STUDIES
      6. SUMMARY
      7. ACKNOWLEDGMENT
    2. REFERENCES
    3. VII. Genetic Learning: Initialization and Representation Issues
      1. ABSTRACT
      2. INTRODUCTION
      3. BACKGROUND
      4. MAIN THRUST OF THE CHAPTER: RULE-INDUCING GENETIC LEARNER
        1. Genetic-Based Learner
        2. (M) Michigan Approach
        3. (P) Pittsburgh Approach
        4. Discretization/Fuzzification of Numerical Attributes
        5. Initial Population
        6. Representation of Decision Rules/Sets
          1. (M) Michigan Approach
          2. (P) Pittsburgh Approach
      5. CONCLUSION AND FUTURE TRENDS
    4. REFERENCES
    5. VIII. Evolutionary Computing
      1. ABSTRACT
      2. INTRODUCTION
      3. EVOLUTIONARY ALGORITHMS
      4. SWARM INTELLIGENCE
        1. Flocking
        2. Ant Colony Optimization
        3. Particle Swarm Optimization
        4. PSO versus Evolutionary Computing
      5. APPLICATIONS OF EA AND SI
        1. Adaptive Sampling using an Evolutionary Algorithm
        2. Distributed Flocking Algorithm for Information Stream Clustering Analysis
      6. FUTURE TRENDS & CONCLUSION
    6. REFERENCES
    7. IX. Particle Identification Using Light Scattering: A Global Optimization Problem
      1. ABSTRACT
      2. INTRODUCTION
      3. PARTICLE IDENTIFICATION USING LEAST-SQUARES
      4. GLOBAL OPTIMIZATION APPLIED TO PARTICLE IDENTIFICATION
        1. Particle Identification with Noisy Data
      5. PARTICLE IDENTIFICATION FROM EXPERIMENTAL DATA
        1. Global Optimization Applied to the Experimental Data Sets
        2. More Detailed Results for Dataset py12log
          1. More Detailed Results for Dataset lp29log
          2. More Detailed Results for Dataset n1log
          3. More Detailed Results for Dataset p1log
        3. Identification Based on other Error Norms
      6. IDENTIFICATION USING A COMPOSITE ERROR FUNCTION
        1. Using the Composite Error Function with Dataset py12log
        2. Using the Composite Error Function with Dataset lp29log
        3. Using the Composite Error Function with Dataset n1log
        4. Using the Composite Error Function with Dataset p1log
        5. The Balance Between E2 and E3
      7. IDENTIFICATION USING CONSTRAINTS ON PEAK-MATCHING
      8. DISCUSSION AND CONCLUSION
    8. REFERENCES
    9. X. Exact Markov Chain Monte Carlo Algorithms and their Applications in Probabilistic Data Analysis and Inference
      1. ABSTRACT
      2. INTRODUCTION
      3. MARKOV CHAIN MONTE CARLO
        1. Metropolis-Hastings Algorithm
        2. Single-Component Metropolis-Hastings Sampler
        3. Gibbs Sampler
        4. Slice Sampler
      4. EXACT MCMC (PERFECT SAMPLING)
        1. Coupling from the Past
        2. Coupling from the Past
        3. Coupling from the Past (Set Version)
        4. Monotone Coupling from the Past
        5. Monotone CFTP
        6. Extensions of CFTP to General State Space
      5. IMH CFTP
        1. Multi-Gamma Coupler
        2. Multi-Gamma Sampler
        3. Dominated CFTP
        4. Other Exact MCMC Algorithms
        5. Fill's Algorithm
      6. APPLICATIONS OF EXACT MCMC
        1. Markov Random Fields
        2. Linear Models
        3. Mixture Models
        4. Other Applications
      7. CONCLUSION
    10. REFERENCES
      1. APPENDIX: GLOSSARY ON MARKOV CHAINS FOR MCMC
        1. Continuous State Space
        2. Discrete State Space
        3. Rates of Convergence to the Stationary Distribution
  8. III. Pattern Discovery from Huge Data Set: Applications
    1. XI. Design of Knowledge Bases for Forward and Reverse Mappings of TIG Welding Process
      1. ABSTRACT
      2. INTRODUCTION
      3. BACKGROUND
        1. Theoretical Studies
        2. Conventional Statistical Regression Analysis
        3. Soft Computing-Based Approaches
      4. MAIN THRUST
        1. Statement of the Problem
        2. Multiple Least-Square Regression Analysis Using Full-Factorial Design of Experiments (Montgomery, 1997)
        3. Forward and Reverse Mappings Using FL-Based Approaches
          1. Design of FLC for Forward Mapping
          2. Approach 1: GA-Based Tuning of the DB (Symmetric) and RB of the Manually Constructed FLC
          3. Approach 2: GA-Based Tuning of RB and DB (Asymmetric) of the Manually Constructed FLC
          4. Approach 3: Automatic Design of FLC (Having Symmetrical DB) Using a GA
          5. Approach 4: Automatic Design of FLC (Having Asymmetric DB) Using GA
        4. Design of FLC for Reverse Mapping
          1. Approach 1: Automatic Design of FLC (Symmetric Membership Function Distribution)
          2. Approach 2: Automatic Design of FLC (Asymmetric Membership Function Distribution)
      5. FUTURE TRENDS
      6. CONCLUSION
    2. REFERENCES
    3. XII. A Fuzzy Decision Tree Analysis of Traffic Fatalities in the U.S.
      1. ABSTRACT
      2. INTRODUCTION
      3. BACKGROUND
        1. Basic Concepts of Fuzzy Set Theory
        2. Description of Fuzzy Decision Trees
        3. Example Fuzzy Decision Tree Analysis
      4. FUZZY DECISION TREE ANALYSIS OF U.S. TRAFFIC FATALITIES
      5. TRAFFIC FATALITIES IN THE U.S. AND FUZZIFICATION OF DATA
        1. Fuzzy Decision Tree Analysis of Fuzzy Traffic Data Set
      6. FUTURE TRENDS
      7. CONCLUSION
    4. REFERENCES
    5. XIII. New Churn Prediction Strategies in Telecom Industry
      1. ABSTRACT
      2. INTRODUCTION
      3. CUSTOMER LIFECYCLE
      4. CHURN MANAGEMENT ON DIFFERENT DATA LAYERS
        1. Prior Churn Rate
        2. Customer Lifetime
        3. HMM-Based Binary Events Prediction
        4. Temporal Classification on Events Data
        5. Complete Events Sequence Prediction
      5. EXPERIMENTS
      6. CONCLUSION
    6. REFERENCES
    7. XIV. Intelligent Classification and Ranking Analyses Using CaRBS Bank Rating Application
      1. ABSTRACT
      2. INTRODUCTION
      3. BACKGROUND
        1. The General CaRBS Technique
        2. The Use of CaRBS for Object Classification
        3. The Use of CaRBS for Object Ranking
        4. Further Associated Measures Included with CaRBS
      4. CaRBS CLASSIFICATION AND RANKING ANALYSIS OF BANK DATA SET
        1. Bank Data Set
        2. Binary Classification of Bank Data Set using CaRBS
        3. First Ranking Analysis of Bank Data Set
        4. Second Ranking Analysis of Bank Data Set
      5. FUTURE TRENDS
      6. CONCLUSION
    8. REFERENCES
    9. XV. Analysis of Individual Risk Attitude for Risk Management Based on Cumulative Prospect Theory
      1. ABSTRACT
      2. INTRODUCTION
      3. BASIC CONCEPTS
        1. The Proposed Individual Risk Management Process (IRM)
          1. Risk Analysis
        2. Individual's Risk Levels and Risk Zones
        3. Regression Model for the Possibility of Each Risk Zone
        4. Analysis of the Slope Interval of Each Risk Zone
        5. Risk Response
        6. Response Strategy
        7. Response Evaluation Model
        8. Summary
        9. Illustrative Example
        10. Problem Description and Analysis
        11. Risk Analysis
          1. Decision 1
          2. Decision 2
        12. Risk Response
        13. Discussion and Conclusion
      4. CONCLUSION
    10. REFERENCES
  9. IV. Pattern Recovery from Small Data Set: Methodologies and Applications
    1. XVI. Neural Networks and Bootstrap Methods for Regression Models with Dependent Errors
      1. ABSTRACT
      2. INTRODUCTION
      3. NEURAL NETWORKS IN REGRESSION MODELS
      4. RESAMPLING SCHEMES FOR NEURAL NETWORK REGRESSION MODELS
      5. SOME MONTE CARLO RESULTS
      6. AN APPLICATION TO REAL DATA
      7. CONCLUDING REMARKS
    2. REFERENCES
    3. XVII. Financial Crisis Modeling and Prediction with a Hilbert-EMD-Based SVM Approach
      1. ABSTRACT
      2. INTRODUCTION
      3. HILBERT-EMD-BASED SVM APPROACH TO FINANCIAL CRISIS FORECASTING
        1. Overview of Hilbert-EMD Technique
        2. Overview of SVM
        3. Hilbert-EMD-Based SVM Approach to Financial Crisis Forecasting
      4. EXPERIMENT STUDY
        1. Data Description and Experiment Design
        2. Experimental Results
          1. A. Methodology Implementation Process
          2. B. Classification Results for the Training Samples
          3. C. Classification Results for Testing Sample
          4. D. Results of Comparison with Other Methods
      5. CONCLUSION
      6. ACKNOWLEDGMENT
    4. REFERENCES
    5. XVIII. Virtual Sampling with Data Construction Method
      1. ABSTRACT
      2. INTRODUCTION
        1. The Procedure of Virtual Sample Generation
        2. Definition 2.1 Small Samples (Huang, 2002)
        3. The Intervalized Kernel Density Estimation (IKDE)
        4. The Validation of IKDE
        5. Definition 2.2 Possibility Function (Spott, 1999; Steuer, 1986)
        6. Definition 2.3 The Density Estimation Function
        7. Theorem 2.1 The Extended Decomposition Theorem
          1. Proof
        8. The Proposed Method, Data Construction Method (DCM)
        9. Comparative Study
        10. Conclusions and Discussion
      3. ACKNOWLEDGMENT
    6. REFERENCES
  10. Compilation of References
  11. About the Contributors