Machine Learning in Java

Book Description

Design, build, and deploy your own machine learning applications by leveraging key Java machine learning libraries

About This Book

  • Develop a sound strategy to solve predictive modelling problems using the most popular machine learning Java libraries

  • Explore a broad variety of data processing, machine learning, and natural language processing through diagrams, source code, and real-world applications

  • Packed with practical advice and tips to help you get to grips with applied machine learning

Who This Book Is For

If you want to learn how to use Java's machine learning libraries to gain insight from your data, this book is for you. It will get you up and running quickly and provide you with the skills you need to successfully create, customize, and deploy machine learning applications in real life. You should be familiar with Java programming and data mining concepts to make the most of this book, but no prior experience with data mining packages is necessary.

What You Will Learn

  • Understand the basic steps of applied machine learning and how to differentiate among various machine learning approaches

  • Discover key Java machine learning libraries, what each library brings to the table, and what kinds of problems each is able to solve

  • Learn how to implement classification, regression, and clustering

  • Develop a sustainable strategy for customer retention by predicting likely churn candidates

  • Build a scalable recommendation engine with Apache Mahout

  • Apply machine learning to fraud, anomaly, and outlier detection

  • Experiment with deep learning concepts, algorithms, and the toolbox for deep learning

  • Write your own activity recognition model for eHealth applications using mobile sensors

In Detail

As the amount of data continues to grow at an almost incomprehensible rate, being able to understand and process data is becoming a key differentiator for competitive organizations. Machine learning applications are everywhere, from self-driving cars, spam detection, document search, and trading strategies to speech recognition. This makes machine learning well-suited to the present-day era of Big Data and Data Science. The main challenge is how to transform data into actionable knowledge.

Machine Learning in Java will provide you with the techniques and tools you need to quickly gain insight from complex data. You will start by learning how to apply machine learning methods to a variety of common tasks including classification, prediction, forecasting, market basket analysis, and clustering.

Moving on, you will discover how to detect anomalies and fraud, and ways to perform activity recognition, image recognition, and text analysis. By the end of the book, you will explore related web resources and technologies that will help you take your learning to the next level.

By applying the most effective machine learning methods to real-world problems, you will gain hands-on experience that will transform the way you think about data.

Style and Approach

This is a practical tutorial that uses hands-on examples to step through some real-world applications of machine learning. Without shying away from the technical details, you will explore machine learning with Java libraries using clear and practical examples. You will explore how to prepare data for analysis, choose a machine learning method, and measure the success of the process.

Downloading the example code for this book: You can download the example code files for all Packt books you have purchased from your Packt account. If you purchased this book elsewhere, you can register with Packt to receive the code files.

Table of Contents

    1. Machine Learning in Java
      1. Table of Contents
      2. Machine Learning in Java
      3. Credits
      4. About the Author
      5. About the Reviewers
        1. eBooks, discount offers, and more
          1. Why subscribe?
      7. Preface
        1. What this book covers
        2. What you need for this book
        3. Who this book is for
        4. Supporting materials
        5. Conventions
        6. Reader feedback
        7. Customer support
          1. Downloading the example code
          2. Errata
          3. Piracy
          4. Questions
      8. 1. Applied Machine Learning Quick Start
        1. Machine learning and data science
          1. What kind of problems can machine learning solve?
          2. Applied machine learning workflow
        2. Data and problem definition
          1. Measurement scales
        3. Data collection
          1. Find or observe data
          2. Generate data
          3. Sampling traps
        4. Data pre-processing
          1. Data cleaning
          2. Fill missing values
          3. Remove outliers
          4. Data transformation
          5. Data reduction
        5. Unsupervised learning
          1. Find similar items
            1. Euclidean distances
            2. Non-Euclidean distances
            3. The curse of dimensionality
          2. Clustering
        6. Supervised learning
          1. Classification
            1. Decision tree learning
            2. Probabilistic classifiers
            3. Kernel methods
            4. Artificial neural networks
            5. Ensemble learning
            6. Evaluating classification
              1. Precision and recall
              2. ROC curves
          2. Regression
            1. Linear regression
            2. Evaluating regression
              1. Mean squared error
              2. Mean absolute error
              3. Correlation coefficient
        7. Generalization and evaluation
          1. Underfitting and overfitting
            1. Train and test sets
            2. Cross-validation
            3. Leave-one-out validation
            4. Stratification
        8. Summary
      9. 2. Java Libraries and Platforms for Machine Learning
        1. The need for Java
        2. Machine learning libraries
          1. Weka
          2. Java machine learning
          3. Apache Mahout
          4. Apache Spark
          5. Deeplearning4j
          6. MALLET
          7. Comparing libraries
        3. Building a machine learning application
          1. Traditional machine learning architecture
          2. Dealing with big data
            1. Big data application architecture
        4. Summary
      10. 3. Basic Algorithms – Classification, Regression, and Clustering
        1. Before you start
        2. Classification
          1. Data
          2. Loading data
          3. Feature selection
          4. Learning algorithms
          5. Classify new data
          6. Evaluation and prediction error metrics
          7. Confusion matrix
          8. Choosing a classification algorithm
        3. Regression
          1. Loading the data
          2. Analyzing attributes
          3. Building and evaluating regression model
            1. Linear regression
            2. Regression trees
          4. Tips to avoid common regression problems
        4. Clustering
          1. Clustering algorithms
          2. Evaluation
        5. Summary
      11. 4. Customer Relationship Prediction with Ensembles
        1. Customer relationship database
          1. Challenge
          2. Dataset
          3. Evaluation
        2. Basic naive Bayes classifier baseline
          1. Getting the data
          2. Loading the data
        3. Basic modeling
          1. Evaluating models
          2. Implementing naive Bayes baseline
        4. Advanced modeling with ensembles
          1. Before we start
          2. Data pre-processing
          3. Attribute selection
          4. Model selection
          5. Performance evaluation
        5. Summary
      12. 5. Affinity Analysis
        1. Market basket analysis
          1. Affinity analysis
        2. Association rule learning
          1. Basic concepts
            1. Database of transactions
            2. Itemset and rule
            3. Support
            4. Confidence
          2. Apriori algorithm
          3. FP-growth algorithm
        3. The supermarket dataset
        4. Discover patterns
          1. Apriori
          2. FP-growth
        5. Other applications in various areas
          1. Medical diagnosis
          2. Protein sequences
          3. Census data
          4. Customer relationship management
          5. IT Operations Analytics
        6. Summary
      13. 6. Recommendation Engine with Apache Mahout
        1. Basic concepts
          1. Key concepts
          2. User-based and item-based analysis
          3. Approaches to calculate similarity
            1. Collaborative filtering
            2. Content-based filtering
            3. Hybrid approach
          4. Exploitation versus exploration
        2. Getting Apache Mahout
          1. Configuring Mahout in Eclipse with the Maven plugin
        3. Building a recommendation engine
          1. Book ratings dataset
          2. Loading the data
            1. Loading data from file
            2. Loading data from database
            3. In-memory database
          3. Collaborative filtering
            1. User-based filtering
            2. Item-based filtering
            3. Adding custom rules to recommendations
            4. Evaluation
            5. Online learning engine
        4. Content-based filtering
        5. Summary
      14. 7. Fraud and Anomaly Detection
        1. Suspicious and anomalous behavior detection
          1. Unknown-unknowns
        2. Suspicious pattern detection
        3. Anomalous pattern detection
          1. Analysis types
            1. Pattern analysis
          2. Transaction analysis
          3. Plan recognition
        4. Fraud detection of insurance claims
          1. Dataset
          2. Modeling suspicious patterns
            1. Vanilla approach
            2. Dataset rebalancing
        5. Anomaly detection in website traffic
          1. Dataset
          2. Anomaly detection in time series data
            1. Histogram-based anomaly detection
            2. Loading the data
            3. Creating histograms
            4. Density-based k-nearest neighbors
        6. Summary
      15. 8. Image Recognition with Deeplearning4j
        1. Introducing image recognition
          1. Neural networks
            1. Perceptron
            2. Feedforward neural networks
            3. Autoencoder
            4. Restricted Boltzmann machine
            5. Deep convolutional networks
        2. Image classification
          1. Deeplearning4j
            1. Getting DL4J
          2. MNIST dataset
          3. Loading the data
          4. Building models
            1. Building a single-layer regression model
            2. Building a deep belief network
            3. Building a multilayer convolutional network
        3. Summary
      16. 9. Activity Recognition with Mobile Phone Sensors
        1. Introducing activity recognition
          1. Mobile phone sensors
          2. Activity recognition pipeline
          3. The plan
        2. Collecting data from a mobile phone
          1. Installing Android Studio
          2. Loading the data collector
            1. Feature extraction
          3. Collecting training data
        3. Building a classifier
          1. Reducing spurious transitions
          2. Plugging the classifier into a mobile app
        4. Summary
      17. 10. Text Mining with Mallet – Topic Modeling and Spam Detection
        1. Introducing text mining
          1. Topic modeling
          2. Text classification
        2. Installing Mallet
        3. Working with text data
          1. Importing data
            1. Importing from directory
            2. Importing from file
          2. Pre-processing text data
        4. Topic modeling for BBC news
          1. BBC dataset
          2. Modeling
          3. Evaluating a model
          4. Reusing a model
            1. Saving a model
            2. Restoring a model
        5. E-mail spam detection
          1. E-mail spam dataset
          2. Feature generation
          3. Training and testing
            1. Model performance
        6. Summary
      18. 11. What is Next?
        1. Machine learning in real life
          1. Noisy data
          2. Class unbalance
          3. Feature selection is hard
          4. Model chaining
          5. Importance of evaluation
          6. Getting models into production
          7. Model maintenance
        2. Standards and markup languages
          1. CRISP-DM
          2. SEMMA methodology
          3. Predictive Model Markup Language
        3. Machine learning in the cloud
          1. Machine learning as a service
        4. Web resources and competitions
          1. Datasets
          2. Online courses
          3. Competitions
          4. Websites and blogs
          5. Venues and conferences
        5. Summary
      19. A. References
      20. Index