Python Data Analysis

Book Description

Learn how to apply powerful data analysis techniques with popular open source Python modules

In Detail

Python is a multi-paradigm programming language well suited to both object-oriented application development and functional design patterns. It has become the language of choice for data scientists for data analysis, visualization, and machine learning. It lets you develop quickly and promotes high productivity.

This book teaches novices data analysis with Python in the broadest sense, covering everything from data retrieval, cleaning, manipulation, visualization, and storage to complex analysis and modeling. It focuses on open source Python modules such as NumPy, SciPy, matplotlib, pandas, IPython, Cython, scikit-learn, and NLTK. Later chapters cover data visualization, signal processing and time-series analysis, databases, predictive analytics, and machine learning. This book will turn you into an ace data analyst in no time.

What You Will Learn

  • Install open source Python modules on various platforms
  • Learn the fundamentals of NumPy, including arrays
  • Manipulate data with pandas
  • Retrieve, process, store, and visualize data
  • Understand signal processing and time-series data analysis
  • Work with relational and NoSQL databases
  • Discover more about data modeling and machine learning
  • Get to grips with interoperability and cloud computing

Downloading the example code for this book: You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

Table of Contents

    1. Python Data Analysis
      1. Table of Contents
      2. Python Data Analysis
      3. Credits
      4. About the Author
      5. About the Reviewers
      6. www.PacktPub.com
        1. Support files, eBooks, discount offers, and more
          1. Why subscribe?
          2. Free access for Packt account holders
      7. Preface
        1. What this book covers
        2. What you need for this book
        3. Who this book is for
        4. Conventions
        5. Reader feedback
        6. Customer support
          1. Downloading the example code
          2. Errata
          3. Piracy
          4. Questions
      8. 1. Getting Started with Python Libraries
        1. Software used in this book
          1. Installing software and setup
          2. On Windows
          3. On Linux
          4. On Mac OS X
        2. Building NumPy, SciPy, matplotlib, and IPython from source
        3. Installing with setuptools
        4. NumPy arrays
        5. A simple application
        6. Using IPython as a shell
        7. Reading manual pages
        8. IPython notebooks
        9. Where to find help and references
        10. Summary
      9. 2. NumPy Arrays
        1. The NumPy array object
          1. The advantages of NumPy arrays
        2. Creating a multidimensional array
        3. Selecting NumPy array elements
        4. NumPy numerical types
          1. Data type objects
          2. Character codes
          3. The dtype constructors
          4. The dtype attributes
        5. One-dimensional slicing and indexing
        6. Manipulating array shapes
          1. Stacking arrays
          2. Splitting NumPy arrays
          3. NumPy array attributes
          4. Converting arrays
        7. Creating array views and copies
        8. Fancy indexing
        9. Indexing with a list of locations
        10. Indexing NumPy arrays with Booleans
        11. Broadcasting NumPy arrays
        12. Summary
      10. 3. Statistics and Linear Algebra
        1. NumPy and SciPy modules
        2. Basic descriptive statistics with NumPy
        3. Linear algebra with NumPy
          1. Inverting matrices with NumPy
          2. Solving linear systems with NumPy
        4. Finding eigenvalues and eigenvectors with NumPy
        5. NumPy random numbers
          1. Gambling with the binomial distribution
          2. Sampling the normal distribution
          3. Performing a normality test with SciPy
        6. Creating a NumPy-masked array
          1. Disregarding negative and extreme values
        7. Summary
      11. 4. pandas Primer
        1. Installing and exploring pandas
        2. pandas DataFrames
        3. pandas Series
        4. Querying data in pandas
        5. Statistics with pandas DataFrames
        6. Data aggregation with pandas DataFrames
        7. Concatenating and appending DataFrames
        8. Joining DataFrames
        9. Handling missing values
        10. Dealing with dates
        11. Pivot tables
        12. Remote data access
        13. Summary
      12. 5. Retrieving, Processing, and Storing Data
        1. Writing CSV files with NumPy and pandas
        2. Comparing the NumPy .npy binary format and pickling pandas DataFrames
        3. Storing data with PyTables
        4. Reading and writing pandas DataFrames to HDF5 stores
        5. Reading and writing to Excel with pandas
        6. Using REST web services and JSON
        7. Reading and writing JSON with pandas
        8. Parsing RSS and Atom feeds
        9. Parsing HTML with Beautiful Soup
        10. Summary
      13. 6. Data Visualization
        1. matplotlib subpackages
        2. Basic matplotlib plots
        3. Logarithmic plots
        4. Scatter plots
        5. Legends and annotations
        6. Three-dimensional plots
        7. Plotting in pandas
        8. Lag plots
        9. Autocorrelation plots
        10. Plot.ly
        11. Summary
      14. 7. Signal Processing and Time Series
        1. statsmodels subpackages
        2. Moving averages
        3. Window functions
        4. Defining cointegration
        5. Autocorrelation
        6. Autoregressive models
        7. ARMA models
        8. Generating periodic signals
        9. Fourier analysis
        10. Spectral analysis
        11. Filtering
        12. Summary
      15. 8. Working with Databases
        1. Lightweight access with sqlite3
        2. Accessing databases from pandas
        3. SQLAlchemy
          1. Installing and setting up SQLAlchemy
          2. Populating a database with SQLAlchemy
          3. Querying the database with SQLAlchemy
        4. Pony ORM
        5. Dataset – databases for lazy people
        6. PyMongo and MongoDB
        7. Storing data in Redis
        8. Apache Cassandra
        9. Summary
      16. 9. Analyzing Textual Data and Social Media
        1. Installing NLTK
        2. Filtering out stopwords, names, and numbers
        3. The bag-of-words model
        4. Analyzing word frequencies
        5. Naive Bayes classification
        6. Sentiment analysis
        7. Creating word clouds
        8. Social network analysis
        9. Summary
      17. 10. Predictive Analytics and Machine Learning
        1. A tour of scikit-learn
        2. Preprocessing
        3. Classification with logistic regression
        4. Classification with support vector machines
        5. Regression with ElasticNetCV
        6. Support vector regression
        7. Clustering with affinity propagation
        8. Mean Shift
        9. Genetic algorithms
        10. Neural networks
        11. Decision trees
        12. Summary
      18. 11. Environments Outside the Python Ecosystem and Cloud Computing
        1. Exchanging information with MATLAB/Octave
        2. Installing rpy2
        3. Interfacing with R
        4. Sending NumPy arrays to Java
        5. Integrating SWIG and NumPy
        6. Integrating Boost and Python
        7. Using Fortran code through f2py
        8. Setting up Google App Engine
        9. Running programs on PythonAnywhere
        10. Working with Wakari
        11. Summary
      19. 12. Performance Tuning, Profiling, and Concurrency
        1. Profiling the code
        2. Installing Cython
        3. Calling C code
        4. Creating a process pool with multiprocessing
        5. Speeding up embarrassingly parallel for loops with Joblib
        6. Comparing Bottleneck to NumPy functions
        7. Performing MapReduce with Jug
        8. Installing MPI for Python
        9. IPython Parallel
        10. Summary
      20. A. Key Concepts
      21. B. Useful Functions
        1. matplotlib
        2. NumPy
        3. pandas
        4. Scikit-learn
        5. SciPy
          1. scipy.fftpack
          2. scipy.signal
          3. scipy.stats
      22. C. Online Resources
      23. Index