Data Science with Java

Book Description

A good data scientist knows how to do something really well, but a great data scientist can do "something of everything." From raw data all the way to shining in front of C-level executives, a great data scientist has the skills to architect data systems, build applications, perform modeling and machine learning, and wrap up the results in a clear (and quickly iterable) manner. From data models to ETL, databases, distributed algorithms, and learning, this book has you covered.

Table of Contents

  1. Data IO
    1. What is data anyway?
    2. Data Models
      1. Row-based arrays
      2. Column-based arrays
      3. Data Objects
    3. Dealing with Real Data
      1. Nulls
      2. Blank Spaces
      3. Parse Errors
      4. Outliers
    4. Managing Data Files
      1. Understanding the File Structure
      2. Reading from a Text File
      3. Reading a Remote Text File
      4. Parsing Each Line
      5. Writing to a File
    5. Mastering Database Operations
      1. Command Line Clients
      2. Structured Query Language (SQL)
      3. Unstructured Data (NoSQL)
      4. Java Database Connectivity (JDBC)
    6. Visualizing Data with Basic Plots
      1. Creating Simple Plots
      2. Plotting Multiple Series
      3. Customizing a Plot
      4. Plotting Mixed Chart Types
      5. Saving a Plot to a File
  2. Linear Algebra
    1. Building Vectors and Matrices
      1. Real Vectors and Matrices
      2. Block Matrices
      3. Sparse Vectors and Matrices
      4. Accessing Vector and Matrix Elements
      5. Working with Sub-Matrices
    2. Operating on Vectors and Matrices
      1. Scaling
      2. Transposing
      3. Addition and Subtraction
      4. Length
      5. Distances
      6. Multiplication
      7. Inner Product
      8. Outer Product
      9. Entrywise Product
      10. Compound Operations
      11. Mapping a Function
    3. Matrix Decomposition
      1. Cholesky Decomposition
      2. LU Decomposition
      3. QR Decomposition
      4. Singular Value Decomposition (SVD)
      5. Eigen Decomposition
      6. Determinant
      7. Inverse
    4. Solving Linear Systems
      1. Simultaneous Equations
      2. Fitting Univariate Data
      3. Fitting Multivariate Data
      4. Fitting Multivariate-Multiresponse Data
      5. Determining Parameter Error and Goodness of Fit
  3. Statistics
    1. The Probabilistic Origins of Data
      1. Probability Density
      2. Cumulative Probability
      3. Statistical Moments
      4. Entropy
      5. Continuous Distributions
      6. Discrete Distributions
    2. Characterizing Datasets
      1. Calculating Moments
      2. Descriptive Statistics
      3. Multivariate Statistics
      4. Covariance and Correlation
      5. Regression
      6. Distribution Testing
    3. Working with Big Data
      1. Accumulating Statistics
      2. Merging Statistics
      3. Regression
    4. Calculating Statistics with Database Functions
  4. Data Operations
    1. Assessing Data Quality
      1. A String Counter
      2. A Numeric Data Validator
      3. A String Data Validator
      4. A DateTime Data Validator
    2. Transforming Text Data
      1. Extracting Tokens from a Document
      2. Utilizing Dictionaries
      3. Vectorizing a Document
    3. Transforming Image Data
    4. Scaling and Regularizing Numeric Data
      1. Scaling Columns
      2. Scaling Rows
      3. Matrix Scaling Operator
    5. Reducing Data to Principal Components
      1. Covariance Method
      2. SVD Method
    6. Creating Training, Validation and Test Sets
      1. Index-Based Resampling
      2. List-Based Resampling
    7. Encoding Labels
      1. A Generic Encoder
      2. One Hot Encoding