Artificial Intelligence for Big Data

Book Description

Build next-generation Artificial Intelligence systems with Java

About This Book
  • Implement AI techniques to build smart applications using Deeplearning4j
  • Perform big data analytics to derive quality insights using Spark MLlib
  • Create self-learning systems using neural networks, NLP, and reinforcement learning
Who This Book Is For

This book is for you if you are a data scientist, big data professional, or novice who has basic knowledge of big data and wishes to gain proficiency in Artificial Intelligence techniques for big data. Some competence in mathematics, particularly elementary linear algebra and calculus, is an added advantage.

What You Will Learn
  • Apply Artificial Intelligence techniques to big data with Java
  • Build smart systems to analyze data for enhanced customer experience
  • Learn to use Artificial Intelligence frameworks for big data
  • Understand complex problems with algorithms and Neuro-Fuzzy systems
  • Design strategies to leverage data using the Machine Learning process
  • Apply Deep Learning techniques to prepare data for modeling
  • Construct models that learn from data using open source tools
  • Analyze big data problems using scalable Machine Learning algorithms
In Detail

In this age of big data, companies have more consumer data than ever before, far more than current technologies can hope to keep up with. However, Artificial Intelligence closes the gap by moving past human limitations to analyze data.

With the help of Artificial Intelligence for Big Data, you will learn to use Machine Learning algorithms such as k-means, SVM, RBF, and regression to perform advanced data analysis. You will understand the current state of Machine Learning and Deep Learning techniques and work with Genetic and Neuro-Fuzzy algorithms. In addition, you will explore how to develop Artificial Intelligence algorithms that learn from data, why they are necessary, and how they can help solve real-world problems.

By the end of this book, you'll have learned how to implement various Artificial Intelligence algorithms for your big data systems, such as reinforcement learning, natural language processing, image recognition, genetic algorithms, and fuzzy logic systems, and how to integrate them into your product offerings.

Style and approach

An easy-to-follow, step-by-step guide to help you get to grips with real-world applications of Artificial Intelligence for big data using Java

Downloading the example code for this book

You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

Table of Contents

  1. Title Page
  2. Copyright and Credits
    1. Artificial Intelligence for Big Data
  3. Packt Upsell
    1. Why subscribe?
    2. PacktPub.com
  4. Contributors
    1. About the authors
    2. About the reviewers
    3. Packt is searching for authors like you
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Download the color images
      3. Conventions used
    4. Get in touch
      1. Reviews
  6. Big Data and Artificial Intelligence Systems
    1. Results pyramid
    2. What the human brain does best
      1. Sensory input
      2. Storage
      3. Processing power
      4. Low energy consumption
    3. What the electronic brain does best
      1. Speed information storage
      2. Processing by brute force
    4. Best of both worlds
      1. Big Data
      2. Evolution from dumb to intelligent machines
      3. Intelligence
        1. Types of intelligence
        2. Intelligence tasks classification
      4. Big data frameworks
        1. Batch processing
        2. Real-time processing
      5. Intelligent applications with Big Data
        1. Areas of AI
      6. Frequently asked questions
    5. Summary
  7. Ontology for Big Data
    1. Human brain and Ontology
    2. Ontology of information science
      1. Ontology properties
      2. Advantages of Ontologies
      3. Components of Ontologies
      4. The role Ontology plays in Big Data
      5. Ontology alignment
      6. Goals of Ontology in big data
      7. Challenges with Ontology in Big Data
      8. RDF—the universal data format
        1. RDF containers
        2. RDF classes
        3. RDF properties
        4. RDF attributes
      9. Using OWL, the Web Ontology Language
      10. SPARQL query language
        1. Generic structure of a SPARQL query
        2. Additional SPARQL features
      11. Building intelligent machines with Ontologies
      12. Ontology learning
        1. Ontology learning process
      13. Frequently asked questions
    3. Summary
  8. Learning from Big Data
    1. Supervised and unsupervised machine learning
    2. The Spark programming model
    3. The Spark MLlib library
      1. The transformer function
      2. The estimator algorithm
      3. Pipeline
    4. Regression analysis
      1. Linear regression
        1. Least square method
      2. Generalized linear model
      3. Logistic regression classification technique
        1. Logistic regression with Spark
      4. Polynomial regression
      5. Stepwise regression
        1. Forward selection
        2. Backward elimination
      6. Ridge regression
      7. LASSO regression
    5. Data clustering
    6. The K-means algorithm
      1. K-means implementation with Spark ML
    7. Data dimensionality reduction
    8. Singular value decomposition
      1. Matrix theory and linear algebra overview
      2. The important properties of singular value decomposition
      3. SVD with Spark ML
    9. The principal component analysis method
      1. The PCA algorithm using SVD
      2. Implementing SVD with Spark ML
    10. Content-based recommendation systems
    11. Frequently asked questions
    12. Summary
  9. Neural Network for Big Data
    1. Fundamentals of neural networks and artificial neural networks
    2. Perceptron and linear models
      1. Component notations of the neural network
      2. Mathematical representation of the simple perceptron model
        1. Activation functions
          1. Sigmoid function
          2. Tanh function
          3. ReLU
    3. Nonlinearities model
    4. Feed-forward neural networks
    5. Gradient descent and backpropagation
      1. Gradient descent pseudocode
      2. Backpropagation model 
    6. Overfitting
    7. Recurrent neural networks
      1. The need for RNNs
      2. Structure of an RNN
      3. Training an RNN
    8. Frequently asked questions
    9. Summary
  10. Deep Big Data Analytics
    1. Deep learning basics and the building blocks
      1. Gradient-based learning
      2. Backpropagation
      3. Non-linearities
      4. Dropout
    2. Building data preparation pipelines
    3. Practical approach to implementing neural net architectures
    4. Hyperparameter tuning
      1. Learning rate
      2. Number of training iterations
      3. Number of hidden units
      4. Number of epochs
      5. Experimenting with hyperparameters with Deeplearning4j
    5. Distributed computing
    6. Distributed deep learning
      1. DL4J and Spark
        1. API overview
      2. TensorFlow
      3. Keras
    7. Frequently asked questions
    8. Summary
  11. Natural Language Processing
    1. Natural language processing basics
    2. Text preprocessing
      1. Removing stop words
      2. Stemming
        1. Porter stemming
        2. Snowball stemming
        3. Lancaster stemming
        4. Lovins stemming
        5. Dawson stemming
      3. Lemmatization
      4. N-grams
    3. Feature extraction
      1. One hot encoding
      2. TF-IDF
      3. CountVectorizer
      4. Word2Vec
        1. CBOW
        2. Skip-Gram model
    4. Applying NLP techniques
      1. Text classification
        1. Introduction to Naive Bayes' algorithm
        2. Random Forest
        3. Naive Bayes' text classification code example
    5. Implementing sentiment analysis
    6. Frequently asked questions
    7. Summary
  12. Fuzzy Systems
    1. Fuzzy logic fundamentals
      1. Fuzzy sets and membership functions
      2. Attributes and notations of crisp sets
        1. Operations on crisp sets
        2. Properties of crisp sets
      3. Fuzzification
      4. Defuzzification
        1. Defuzzification methods
      5. Fuzzy inference 
    2. ANFIS network
      1. Adaptive network
      2. ANFIS architecture and hybrid learning algorithm
    3. Fuzzy C-means clustering
    4. NEFCLASS
    5. Frequently asked questions
    6. Summary
  13. Genetic Programming
    1. Genetic algorithms structure
    2. KEEL framework
    3. Encog machine learning framework
      1. Encog development environment setup
      2. Encog API structure
    4. Introduction to the Weka framework
      1. Weka Explorer features
        1. Preprocess
        2. Classify
    5. Attribute search with genetic algorithms in Weka
    6. Frequently asked questions
    7. Summary
  14. Swarm Intelligence
    1. Swarm intelligence 
      1. Self-organization
      2. Stigmergy
      3. Division of labor
      4. Advantages of collective intelligent systems
      5. Design principles for developing SI systems
    2. The particle swarm optimization model
      1. PSO implementation considerations 
    3. Ant colony optimization model
    4. MASON Library
      1. MASON Layered Architecture
    5. Opt4J library
    6. Applications in big data analytics
    7. Handling dynamical data
    8. Multi-objective optimization
    9. Frequently asked questions
    10. Summary
  15. Reinforcement Learning
    1. Reinforcement learning algorithms concept
    2. Reinforcement learning techniques
      1. Markov decision processes
      2. Dynamic programming and reinforcement learning
        1. Learning in a deterministic environment with policy iteration
      3. Q-Learning
      4. SARSA learning
    3. Deep reinforcement learning
    4. Frequently asked questions
    5. Summary
  16. Cyber Security
    1. Big Data for critical infrastructure protection
      1. Data collection and analysis
      2. Anomaly detection 
      3. Corrective and preventive actions 
      4. Conceptual Data Flow
        1. Components overview
          1. Hadoop Distributed File System
          2. NoSQL databases
          3. MapReduce
          4. Apache Pig
          5. Hive
    2. Understanding stream processing
      1. Stream processing semantics
      2. Spark Streaming
      3. Kafka
    3. Cyber security attack types
      1. Phishing
      2. Lateral movement
      3. Injection attacks
      4. AI-based defense 
    4. Understanding SIEM
      1. Visualization attributes and features
    5. Splunk
      1. Splunk Enterprise Security
      2. Splunk Light
    6. ArcSight ESM
    7. Frequently asked questions
    8. Summary
  17. Cognitive Computing
    1. Cognitive science
    2. Cognitive Systems
      1. A brief history of Cognitive Systems
      2. Goals of Cognitive Systems
      3. Cognitive Systems enablers
    3. Application in Big Data analytics
    4. Cognitive intelligence as a service
      1. IBM cognitive toolkit based on Watson
        1. Watson-based cognitive apps
        2. Developing with Watson
          1. Setting up the prerequisites
          2. Developing a language translator application in Java
    5. Frequently asked questions
    6. Summary
  18. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think