## Book Description

'Big data' poses challenges that require both classical multivariate methods and contemporary techniques from machine learning and engineering. This modern text equips you for the new world, integrating the old and the new, fusing theory and practice, and bridging the gap to statistical learning. The theoretical framework includes formal statements that clearly set out the guaranteed 'safe operating zone' for the methods and allow you to assess whether data are in the zone, or near enough. Extensive examples showcase the strengths and limitations of different methods with small classical data sets; data from medicine, biology, marketing and finance; high-dimensional data from bioinformatics; functional data from proteomics; and simulated data. High-dimension, low-sample-size data receive special attention. Several data sets are revisited repeatedly to allow comparison of methods. Generous use of colour, algorithms, MATLAB code and problem sets completes the package. Suitable for master's and graduate students in statistics and for researchers in data-rich disciplines.

1. Cover
2. Title Page
4. Dedication
5. Contents
6. List of Algorithms
7. Notation
8. Preface
9. I CLASSICAL METHODS
1. 1 Multidimensional Data
1. 1.1 Multivariate and High-Dimensional Problems
2. 1.2 Visualisation
3. 1.3 Multivariate Random Vectors and Data
4. 1.4 Gaussian Random Vectors
5. 1.5 Similarity, Spectral and Singular Value Decomposition
2. 2 Principal Component Analysis
1. 2.1 Introduction
2. 2.2 Population Principal Components
3. 2.3 Sample Principal Components
4. 2.4 Visualising Principal Components
5. 2.5 Properties of Principal Components
6. 2.6 Standardised Data and High-Dimensional Data
7. 2.7 Asymptotic Results
8. 2.8 Principal Component Analysis, the Number of Components and Regression
3. 3 Canonical Correlation Analysis
1. 3.1 Introduction
2. 3.2 Population Canonical Correlations
3. 3.3 Sample Canonical Correlations
4. 3.4 Properties of Canonical Correlations
5. 3.5 Canonical Correlations and Transformed Data
6. 3.6 Asymptotic Considerations and Tests for Correlation
7. 3.7 Canonical Correlations and Regression
4. 4 Discriminant Analysis
1. 4.1 Introduction
2. 4.2 Classes, Labels, Rules and Decision Functions
3. 4.3 Linear Discriminant Rules
4. 4.4 Evaluation of Rules and Probability of Misclassification
5. 4.5 Discrimination under Gaussian Assumptions
6. 4.6 Bayesian Discrimination
7. 4.7 Non-Linear, Non-Parametric and Regularised Rules
8. 4.8 Principal Component Analysis, Discrimination and Regression
5. Problems for Part I
10. II FACTORS AND GROUPINGS
1. 5 Norms, Proximities, Features and Dualities
1. 5.1 Introduction
2. 5.2 Vector and Matrix Norms
3. 5.3 Measures of Proximity
4. 5.4 Features and Feature Maps
5. 5.5 Dualities for X and Xᵀ
2. 6 Cluster Analysis
1. 6.1 Introduction
2. 6.2 Hierarchical Agglomerative Clustering
3. 6.3 k-Means Clustering
4. 6.4 Second-Order Polynomial Histogram Estimators
5. 6.5 Principal Components and Cluster Analysis
6. 6.6 Number of Clusters
3. 7 Factor Analysis
1. 7.1 Introduction
2. 7.2 Population k-Factor Model
3. 7.3 Sample k-Factor Model
5. 7.5 Asymptotic Results and the Number of Factors
6. 7.6 Factor Scores and Regression
7. 7.7 Principal Components, Factor Analysis and Beyond
4. 8 Multidimensional Scaling
1. 8.1 Introduction
2. 8.2 Classical Scaling
3. 8.3 Metric Scaling
4. 8.4 Non-Metric Scaling
5. 8.5 Data and Their Configurations
6. 8.6 Scaling for Grouped and Count Data
5. Problems for Part II
11. III NON-GAUSSIAN ANALYSIS
1. 9 Towards Non-Gaussianity
1. 9.1 Introduction
2. 9.2 Gaussianity and Independence
3. 9.3 Skewness, Kurtosis and Cumulants
4. 9.4 Entropy and Mutual Information
5. 9.5 Training, Testing and Cross-Validation
2. 10 Independent Component Analysis
1. 10.1 Introduction
2. 10.2 Sources and Signals
3. 10.3 Identification of the Sources
4. 10.4 Mutual Information and Gaussianity
5. 10.5 Estimation of the Mixing Matrix
6. 10.6 Non-Gaussianity and Independence in Practice
7. 10.7 Low-Dimensional Projections of High-Dimensional Data
8. 10.8 Dimension Selection with Independent Components
3. 11 Projection Pursuit
1. 11.1 Introduction
2. 11.2 One-Dimensional Projections and Their Indices
3. 11.3 Projection Pursuit with Two- and Three-Dimensional Projections
4. 11.4 Projection Pursuit in Practice
5. 11.5 Theoretical Developments
6. 11.6 Projection Pursuit Density Estimation and Regression
4. 12 Kernel and More Independent Component Methods
1. 12.1 Introduction
2. 12.2 Kernel Component Analysis
3. 12.3 Kernel Independent Component Analysis
4. 12.4 Independent Components from Scatter Matrices (aka Invariant Coordinate Selection)
5. 12.5 Non-Parametric Estimation of Independence Criteria
5. 13 Feature Selection and Principal Component Analysis Revisited
1. 13.1 Introduction
2. 13.2 Independent Components and Feature Selection
3. 13.3 Variable Ranking and Statistical Learning
4. 13.4 Sparse Principal Component Analysis
5. 13.5 (In)Consistency of Principal Components as the Dimension Grows
6. Problems for Part III
12. Bibliography
13. Author Index
14. Subject Index
15. Data Index