Book description
This IBM® Redbooks® publication provides information about aspects of performing infrastructure health checks, such as checking the configuration and verifying the functionality of the common subsystems (nodes or servers, switch fabric, parallel file system, job management, problem areas, and so on).
This IBM Redbooks publication documents how to monitor the overall health of the cluster infrastructure in order to deliver cost-effective, highly scalable, and robust solutions to technical computing clients.
This IBM Redbooks publication is targeted toward technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) responsible for delivering cost-effective Technical Computing and IBM High Performance Computing (HPC) solutions to optimize business results, product development, and scientific discoveries. This book provides a broad understanding of a new architecture.
Table of contents
- Front cover
- Notices
- Preface
- Chapter 1. Introduction
- 1.1 Overview of the IBM HPC solution
- 1.2 Why we need a methodical approach for cluster consistency checking
- 1.3 Tools and interpreting their results for HW and SW states
- 1.4 Tools and interpreting their results for identifying performance inconsistencies
- 1.5 Template of diagnostics steps that can be used (checklists)
- Chapter 2. Key concepts and interdependencies
- Chapter 3. The health lifecycle methodology
- Chapter 4. Cluster components reference model
- 4.1 Overview of installed cluster systems
- 4.2 ClusterA nodes hardware description
- 4.3 ClusterA software description
- 4.4 ClusterB nodes hardware description
- 4.5 ClusterB software description
- 4.6 ClusterC nodes hardware description
- 4.7 ClusterC software description
- 4.8 Interconnect infrastructure
- 4.9 GPFS cluster
- Chapter 5. Toolkits for verifying health (individual diagnostics)
- Appendix A. Commonly used tools
- Appendix B. Tools and commands outside of the toolkit
- Related publications
- Back cover
Product information
- Title: IBM High Performance Computing Cluster Health Check
- Author(s):
- Release date: February 2014
- Publisher(s): IBM Redbooks
- ISBN: None