Python Data Science Cookbook

Book Description

Over 60 practical recipes to help you explore Python and its robust data science capabilities

About This Book

  • The book is packed with simple and concise Python code examples that demonstrate advanced concepts in action

  • Explore concepts such as programming, data mining, data analysis, data visualization, and machine learning using Python

  • Get up to speed on machine learning algorithms with the help of easy-to-follow, insightful recipes

Who This Book Is For

This book is intended for Data Science professionals at all levels, students and practitioners alike, from novices to experts. Novices can spend their time on the first five chapters to get acquainted with Data Science. Experts can turn to Chapter 6 onward to see how advanced techniques are implemented in Python. Readers from non-Python backgrounds can also use this book effectively, although some prior basic programming experience will help.

What You Will Learn

  • Explore the complete range of Data Science algorithms

  • Get to know the tricks used by industry engineers to create the most accurate data science models

  • Manage and use Python libraries such as NumPy, SciPy, scikit-learn, and matplotlib effectively

  • Create meaningful features to solve real-world problems

  • Take a look at advanced regression methods for model building and variable selection

  • Get a thorough understanding of the underlying concepts and implementation of ensemble methods

  • Solve real-world problems using a variety of numerical and text datasets

  • Get accustomed to modern state-of-the-art algorithms such as Gradient Boosting, Random Forest, Rotation Forest, and so on

In Detail

Python is increasingly becoming the language of choice for data science. It is overtaking R in adoption, it is widely known among developers, and it has a strong set of libraries, such as NumPy, pandas, scikit-learn, matplotlib, IPython, and SciPy, that support its use in this field. Data Science is an emerging field that combines disciplines including statistics, machine learning, and computer science. It is a disruptive technology that is changing the face of business today and significantly altering the economics of many verticals, including retail, manufacturing, online ventures, and hospitality.

This book walks you through the algorithms in the Data Science arsenal, from the simplest to the most complex, so that you can mine data effectively and derive intelligence from it. At every step, we provide simple and efficient Python recipes that not only show you how to implement these algorithms but also explain the underlying concepts thoroughly.

The book begins by introducing Python for Data Science, followed by working with Python environments. You will then learn how to analyze your data with Python. Next, the book teaches the concepts of data mining, followed by extensive coverage of machine learning methods. Along the way, it introduces a number of Python libraries that help implement machine learning and data mining routines effectively. It also covers the principles of shrinkage, ensemble methods, random forests, rotation forests, and extremely randomized trees, which every successful Data Science professional should know.

Style and approach

This book takes a step-by-step, recipe-based approach to Data Science algorithms and introduces the mathematical ideas behind them.

Downloading the example code for this book

You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to obtain the code files.

Table of Contents

    1. Python Data Science Cookbook
      1. Table of Contents
      2. Python Data Science Cookbook
      3. Credits
      4. About the Author
      5. About the Reviewer
      6. www.PacktPub.com
        1. Support files, eBooks, discount offers, and more
          1. Why Subscribe?
          2. Free Access for Packt account holders
      7. Preface
        1. What this book covers
        2. What you need for this book
        3. Who this book is for
        4. Sections
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        5. Conventions
        6. Reader feedback
        7. Customer support
          1. Downloading the example code
          2. Downloading the color images of this book
          3. Errata
          4. Piracy
          5. Questions
      8. 1. Python for Data Science
        1. Introduction
        2. Using dictionary objects
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        3. Working with a dictionary of dictionaries
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        4. Working with tuples
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        5. Using sets
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        6. Writing a list
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        7. Creating a list from another list - list comprehension
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        8. Using iterators
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        9. Generating an iterator and a generator
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        10. Using iterables
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        11. Passing a function as a variable
          1. Getting ready
          2. How to do it…
          3. How it works…
        12. Embedding functions in another function
          1. Getting ready
          2. How to do it…
          3. How it works…
        13. Passing a function as a parameter
          1. Getting ready
          2. How to do it…
          3. How it works…
        14. Returning a function
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        15. Altering the function behavior with decorators
          1. Getting ready
          2. How to do it…
          3. How it works…
        16. Creating anonymous functions with lambda
          1. Getting ready
          2. How to do it…
          3. How it works…
        17. Using the map function
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        18. Working with filters
          1. Getting ready
          2. How to do it…
          3. How it works…
        19. Using zip and izip
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        20. Processing arrays from the tabular data
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        21. Preprocessing the columns
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        22. Sorting lists
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        23. Sorting with a key
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        24. Working with itertools
          1. Getting ready
          2. How to do it…
          3. How it works…
      9. 2. Python Environments
        1. Introduction
        2. Using NumPy libraries
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        3. Plotting with matplotlib
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        4. Machine learning with scikit-learn
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
      10. 3. Data Analysis – Explore and Wrangle
        1. Introduction
        2. Analyzing univariate data graphically
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        3. Grouping the data and using dot plots
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        4. Using scatter plots for multivariate data
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        5. Using heat maps
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more...
          5. See also
        6. Performing summary statistics and plots
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        7. Using a box-and-whisker plot
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        8. Imputing the data
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        9. Performing random sampling
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
            1. Stratified sampling
            2. Progressive sampling
        10. Scaling the data
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        11. Standardizing the data
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        12. Performing tokenization
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        13. Removing stop words
          1. How to do it…
          2. How it works…
          3. There's more…
          4. See also
        14. Stemming the words
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        15. Performing word lemmatization
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        16. Representing the text as a bag of words
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        17. Calculating term frequencies and inverse document frequencies
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
      11. 4. Data Analysis – Deep Dive
        1. Introduction
          1. Matrix Decomposition:
        2. Extracting the principal components
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        3. Using Kernel PCA
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        4. Extracting features using singular value decomposition
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        5. Reducing the data dimension with random projection
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        6. Decomposing the feature matrices using non-negative matrix factorization
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
      12. 5. Data Mining – Needle in a Haystack
        1. Introduction
        2. Working with distance measures
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more...
          5. See also
        3. Learning and using kernel methods
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more...
          5. See also
        4. Clustering data using the k-means method
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more...
          5. See also
        5. Learning vector quantization
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more...
          5. See also
        6. Finding outliers in univariate data
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        7. Discovering outliers using the local outlier factor method
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
      13. 6. Machine Learning 1
        1. Introduction
        2. Preparing data for model building
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more...
        3. Finding the nearest neighbors
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        4. Classifying documents using Naïve Bayes
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        5. Building decision trees to solve multiclass problems
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
      14. 7. Machine Learning 2
        1. Introduction
        2. Predicting real-valued numbers using regression
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more...
          5. See also
        3. Learning regression with L2 shrinkage – ridge
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        4. Learning regression with L1 shrinkage – LASSO
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        5. Using cross-validation iterators with L1 and L2 shrinkage
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
      15. 8. Ensemble Methods
        1. Introduction
        2. Understanding Ensemble – Bagging Method
          1. Getting ready…
          2. How to do it
          3. How it works…
          4. There's more…
          5. See also
        3. Understanding Ensemble – Boosting Method
          1. Getting Started…
          2. How to do it
          3. How it works…
          4. There's more…
          5. See also
        4. Understanding Ensemble – Gradient Boosting
          1. Getting Started…
          2. How to do it
          3. How it works…
          4. There's more…
          5. See also
      16. 9. Growing Trees
        1. Introduction
        2. Going from trees to Forest – Random Forest
          1. Getting ready
          2. How to do it...
          3. How it works…
          4. There's more…
          5. See also
        3. Growing Extremely Randomized Trees
          1. Getting ready…
          2. How to do it...
          3. How it works…
          4. There's more…
          5. See also
        4. Growing Rotational Forest
          1. Getting ready…
          2. How to do it...
          3. How it works…
          4. There's more…
          5. See also
      17. 10. Large-Scale Machine Learning – Online Learning
        1. Introduction
        2. Using perceptron as an online learning algorithm
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        3. Using stochastic gradient descent for regression
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        4. Using stochastic gradient descent for classification
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
      18. Index