Building Machine Learning Systems with Python

Book Description

Expand your Python knowledge and learn all about machine-learning libraries in this user-friendly manual. ML is the next big breakthrough in technology, and this book will give you the head start you need.

  • Master Machine Learning using a broad set of Python libraries and start building your own Python-based ML systems

  • Covers classification, regression, feature engineering, and much more guided by practical examples

  • A scenario-based tutorial that gets you into the right mindset of a machine learner (data exploration) and helps you apply it successfully in your new or existing projects

In Detail

    Machine learning, the field of building systems that learn from data, is exploding on the Web and elsewhere. Python is a wonderful language in which to develop machine learning applications. As a dynamic language, it allows for fast exploration and experimentation, and an increasing number of machine learning libraries are being developed for it.

    Building Machine Learning Systems with Python shows you exactly how to find patterns in raw data. The book starts by brushing up on your Python ML knowledge and introducing libraries, and then moves on to more serious projects on datasets, modeling, and recommendations, improving those recommendations through examples and working through sound and image processing in detail.

    Using open-source tools and libraries, you will learn how to apply machine learning methods to text, images, and sounds. You will also learn how to evaluate, compare, and choose machine learning techniques.

    Written for Python programmers, Building Machine Learning Systems with Python teaches you how to use open-source libraries to solve real problems with machine learning. The book is based on real-world examples that the reader can build on.

    Readers will learn how to write programs that classify the quality of Stack Overflow answers or determine whether a music file is jazz or metal. They will learn regression, demonstrated by recommending movies to users. Advanced topics such as topic modeling (finding a text’s most important topics), basket analysis, and cloud computing are covered, along with many other interesting aspects.

    Building Machine Learning Systems with Python will give you the tools and understanding required to build your own systems, tailored to solve your problems.

    Table of Contents

    1. Building Machine Learning Systems with Python
      1. Table of Contents
      2. Building Machine Learning Systems with Python
      3. Credits
      4. About the Authors
      5. About the Reviewers
      6. www.PacktPub.com
        1. Support files, eBooks, discount offers and more
          1. Why Subscribe?
          2. Free Access for Packt account holders
      7. Preface
        1. What this book covers
        2. What you need for this book
        3. Who this book is for
        4. Conventions
        5. Reader feedback
        6. Customer support
          1. Downloading the example code
          2. Errata
          3. Piracy
          4. Questions
      8. 1. Getting Started with Python Machine Learning
        1. Machine learning and Python – the dream team
        2. What the book will teach you (and what it will not)
        3. What to do when you are stuck
        4. Getting started
          1. Introduction to NumPy, SciPy, and Matplotlib
          2. Installing Python
          3. Chewing data efficiently with NumPy and intelligently with SciPy
          4. Learning NumPy
            1. Indexing
            2. Handling non-existing values
            3. Comparing runtime behaviors
          5. Learning SciPy
        5. Our first (tiny) machine learning application
          1. Reading in the data
          2. Preprocessing and cleaning the data
          3. Choosing the right model and learning algorithm
            1. Before building our first model
            2. Starting with a simple straight line
            3. Towards some advanced stuff
            4. Stepping back to go forward – another look at our data
            5. Training and testing
            6. Answering our initial question
        6. Summary
      9. 2. Learning How to Classify with Real-world Examples
        1. The Iris dataset
          1. The first step is visualization
          2. Building our first classification model
            1. Evaluation – holding out data and cross-validation
        2. Building more complex classifiers
        3. A more complex dataset and a more complex classifier
          1. Learning about the Seeds dataset
          2. Features and feature engineering
          3. Nearest neighbor classification
        4. Binary and multiclass classification
        5. Summary
      10. 3. Clustering – Finding Related Posts
        1. Measuring the relatedness of posts
          1. How not to do it
          2. How to do it
        2. Preprocessing – similarity measured as similar number of common words
          1. Converting raw text into a bag-of-words
          2. Counting words
          3. Normalizing the word count vectors
          4. Removing less important words
          5. Stemming
            1. Installing and using NLTK
            2. Extending the vectorizer with NLTK's stemmer
          6. Stop words on steroids
          7. Our achievements and goals
        3. Clustering
          1. KMeans
          2. Getting test data to evaluate our ideas on
          3. Clustering posts
        4. Solving our initial challenge
          1. Another look at noise
        5. Tweaking the parameters
        6. Summary
      11. 4. Topic Modeling
        1. Latent Dirichlet allocation (LDA)
          1. Building a topic model
        2. Comparing similarity in topic space
          1. Modeling the whole of Wikipedia
        3. Choosing the number of topics
        4. Summary
      12. 5. Classification – Detecting Poor Answers
        1. Sketching our roadmap
        2. Learning to classify classy answers
          1. Tuning the instance
          2. Tuning the classifier
        3. Fetching the data
          1. Slimming the data down to chewable chunks
          2. Preselection and processing of attributes
          3. Defining what is a good answer
        4. Creating our first classifier
          1. Starting with the k-nearest neighbor (kNN) algorithm
          2. Engineering the features
          3. Training the classifier
          4. Measuring the classifier's performance
          5. Designing more features
        5. Deciding how to improve
          1. Bias-variance and its trade-off
          2. Fixing high bias
          3. Fixing high variance
          4. High bias or low bias
        6. Using logistic regression
          1. A bit of math with a small example
          2. Applying logistic regression to our postclassification problem
        7. Looking behind accuracy – precision and recall
        8. Slimming the classifier
        9. Ship it!
        10. Summary
      13. 6. Classification II – Sentiment Analysis
        1. Sketching our roadmap
        2. Fetching the Twitter data
        3. Introducing the Naive Bayes classifier
          1. Getting to know the Bayes theorem
          2. Being naive
          3. Using Naive Bayes to classify
          4. Accounting for unseen words and other oddities
          5. Accounting for arithmetic underflows
        4. Creating our first classifier and tuning it
          1. Solving an easy problem first
          2. Using all the classes
          3. Tuning the classifier's parameters
        5. Cleaning tweets
        6. Taking the word types into account
          1. Determining the word types
          2. Successfully cheating using SentiWordNet
          3. Our first estimator
          4. Putting everything together
        7. Summary
      14. 7. Regression – Recommendations
        1. Predicting house prices with regression
          1. Multidimensional regression
          2. Cross-validation for regression
        2. Penalized regression
          1. L1 and L2 penalties
          2. Using Lasso or Elastic nets in scikit-learn
        3. P greater than N scenarios
          1. An example based on text
          2. Setting hyperparameters in a smart way
          3. Rating prediction and recommendations
        4. Summary
      15. 8. Regression – Recommendations Improved
        1. Improved recommendations
          1. Using the binary matrix of recommendations
          2. Looking at the movie neighbors
          3. Combining multiple methods
        2. Basket analysis
          1. Obtaining useful predictions
          2. Analyzing supermarket shopping baskets
          3. Association rule mining
          4. More advanced basket analysis
        3. Summary
      16. 9. Classification III – Music Genre Classification
        1. Sketching our roadmap
        2. Fetching the music data
          1. Converting into a wave format
        3. Looking at music
          1. Decomposing music into sine wave components
        4. Using FFT to build our first classifier
          1. Increasing experimentation agility
          2. Training the classifier
          3. Using the confusion matrix to measure accuracy in multiclass problems
          4. An alternate way to measure classifier performance using receiver operator characteristic (ROC)
        5. Improving classification performance with Mel Frequency Cepstral Coefficients
        6. Summary
      17. 10. Computer Vision – Pattern Recognition
        1. Introducing image processing
        2. Loading and displaying images
          1. Basic image processing
            1. Thresholding
            2. Gaussian blurring
            3. Filtering for different effects
          2. Adding salt and pepper noise
            1. Putting the center in focus
          3. Pattern recognition
          4. Computing features from images
          5. Writing your own features
        3. Classifying a harder dataset
        4. Local feature representations
        5. Summary
      18. 11. Dimensionality Reduction
        1. Sketching our roadmap
        2. Selecting features
          1. Detecting redundant features using filters
            1. Correlation
            2. Mutual information
          2. Asking the model about the features using wrappers
        3. Other feature selection methods
        4. Feature extraction
          1. About principal component analysis (PCA)
            1. Sketching PCA
            2. Applying PCA
          2. Limitations of PCA and how LDA can help
        5. Multidimensional scaling (MDS)
        6. Summary
      19. 12. Big(ger) Data
        1. Learning about big data
        2. Using jug to break up your pipeline into tasks
          1. About tasks
          2. Reusing partial results
          3. Looking under the hood
          4. Using jug for data analysis
        3. Using Amazon Web Services (AWS)
          1. Creating your first machines
            1. Installing Python packages on Amazon Linux
            2. Running jug on our cloud machine
          2. Automating the generation of clusters with starcluster
        4. Summary
      20. A. Where to Learn More about Machine Learning
        1. Online courses
        2. Books
          1. Q&A sites
          2. Blogs
          3. Data sources
          4. Getting competitive
        3. What was left out
        4. Summary
      21. Index