Data Mining Techniques in Grid Computing Environments

Book Description

Based around eleven international real-life case studies, and including contributions from leading experts in the field, this groundbreaking book explores the need to grid-enable data mining applications and provides a comprehensive study of the technology, techniques and management skills necessary to create them. It serves simultaneously as a design blueprint, a user guide and a research agenda for current and future developments. It will appeal to a broad audience, from developers and users of data mining and grid technology to advanced undergraduate and postgraduate students interested in the field.

Table of Contents

  1. Cover Page
  2. Title Page
  3. Copyright
  4. Contents
  5. Preface
    1. Acknowledgments
  6. List of Contributors
  7. 1: Data mining meets grid computing: Time to dance?
    1. 1.1 Introduction
    2. 1.2 Data mining
    3. 1.3 Grid computing
    4. 1.4 Data mining grid – mining grid data
    5. 1.5 Conclusions
    6. 1.6 Summary of chapters in this volume
    7. References
  8. 2: Data analysis services in the Knowledge Grid
    1. 2.1 Introduction
    2. 2.2 Approach
    3. 2.3 Knowledge Grid services
    4. 2.4 Data analysis services
    5. 2.5 Design of Knowledge Grid applications
    6. 2.6 Conclusions
    7. References
  9. 3: GridMiner: An advanced support for e-science analytics
    1. 3.1 Introduction
    2. 3.2 Rationale behind the design and development of GridMiner
    3. 3.3 Use case
    4. 3.4 Knowledge discovery process and its support by the GridMiner
    5. 3.5 Graphical user interface
    6. 3.6 Future developments
    7. 3.7 Conclusions
    8. References
  10. 4: ADaM services: Scientific data mining in the service-oriented architecture paradigm
    1. 4.1 Introduction
    2. 4.2 ADaM system overview
    3. 4.3 ADaM toolkit overview
    4. 4.4 Mining in a service-oriented architecture
    5. 4.5 Mining web services
    6. 4.6 Mining grid services
    7. 4.7 Summary
    8. Acknowledgements
    9. References
  11. 5: Mining for misconfigured machines in grid systems
    1. 5.1 Introduction
    2. 5.2 Preliminaries and related work
    3. 5.3 Acquiring, pre-processing and storing data
    4. 5.4 Data analysis
    5. 5.5 The GMS
    6. 5.6 Evaluation
    7. 5.7 Conclusions and future work
    8. References
  12. 6: FAEHIM: Federated Analysis Environment for Heterogeneous Intelligent Mining
    1. 6.1 Introduction
    2. 6.2 Requirements of a distributed knowledge discovery framework
    3. 6.3 Workflow-based knowledge discovery
    4. 6.4 Data mining toolkit
    5. 6.5 Data mining service framework
    6. 6.6 Distributed data mining services
    7. 6.7 Data manipulation tools
    8. 6.8 Availability
    9. 6.9 Empirical experiments
    10. 6.10 Conclusions
    11. References
  13. 7: Scalable and privacy preserving distributed data analysis over a service-oriented platform
    1. 7.1 Introduction
    2. 7.2 A service-oriented solution
    3. 7.3 Background
    4. 7.4 Model-based scalable, privacy preserving, distributed data analysis
    5. 7.5 Modelling distributed data mining and workflow processes
    6. 7.6 Lessons learned
    7. 7.7 Further research directions
    8. 7.8 Conclusions
    9. Acknowledgements
    10. References
  14. 8: Building and using analytical workflows in Discovery Net
    1. 8.1 Introduction
    2. 8.2 Discovery Net system
    3. 8.3 Architecture for Discovery Net
    4. 8.4 Data management
    5. 8.5 Example of a workflow study
    6. 8.6 Future directions
    7. Acknowledgements
    8. References
  15. 9: Building workflows that traverse the bioinformatics data landscape
    1. 9.1 Introduction
    2. 9.2 The bioinformatics data landscape
    3. 9.3 The bioinformatics experiment landscape
    4. 9.4 Taverna for bioinformatics experiments
    5. 9.5 Building workflows in Taverna
    6. 9.6 Workflow case study
    7. 9.7 Discussion
    8. Acknowledgements
    9. References
  16. 10: Specification of distributed data mining workflows with DataMiningGrid
    1. 10.1 Introduction
    2. 10.2 DataMiningGrid environment
    3. 10.3 Operations for workflow construction
    4. 10.4 Extensibility
    5. 10.5 Case studies
    6. 10.6 Discussion and related work
    7. 10.7 Open issues
    8. 10.8 Conclusions
    9. References
  17. 11: Anteater: Service-oriented data mining
    1. 11.1 Introduction
    2. 11.2 The architecture
    3. 11.3 Runtime framework
    4. 11.4 Parallel algorithms for data mining
    5. 11.5 Visual metaphors
    6. 11.6 Case studies
    7. 11.7 Future developments
    8. 11.8 Conclusions and future work
    9. References
  18. 12: DMGA: A generic brokering-based Data Mining Grid Architecture
    1. 12.1 Introduction
    2. 12.2 DMGA overview
    3. 12.3 Horizontal composition
    4. 12.4 Vertical composition
    5. 12.5 The need for brokering
    6. 12.6 Brokering-based data mining grid architecture
    7. 12.7 Use cases: Apriori, ID3 and J4.8 algorithms
    8. 12.8 Related work
    9. 12.9 Conclusions
    10. References
  19. 13: Grid-based data mining with the Environmental Scenario Search Engine (ESSE)
    1. 13.1 Environmental data source: NCEP/NCAR reanalysis data set
    2. 13.2 Fuzzy search engine
    3. 13.3 Software architecture
    4. 13.4 Applications
    5. 13.5 Conclusions
    6. References
  20. 14: Data pre-processing using OGSA-DAI
    1. 14.1 Introduction
    2. 14.2 Data pre-processing for grid-enabled data mining
    3. 14.3 Using OGSA-DAI to support data mining applications
    4. 14.4 Data pre-processing scenarios in data mining applications
    5. 14.5 State-of-the-art solutions for grid data management
    6. 14.6 Discussion
    7. 14.7 Open Issues
    8. 14.8 Conclusions
    9. References
  21. Index