Python Data Analytics: Data Analysis and Science Using Pandas, matplotlib, and the Python Programming Language

Book Description

Python Data Analytics will help you tackle data acquisition and analysis using the power of the Python language. At the heart of this book is pandas, an open source, BSD-licensed library that provides high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

Author Fabio Nelli shows the strength of the Python programming language when applied to processing, managing, and retrieving information. Inside, you will see how intuitive and flexible it is to discover and communicate meaningful patterns in data using Python scripts, reporting systems, and data export. This book examines how to obtain, process, store, manage, and analyze data using the Python programming language.

You will use Python and other open source tools to wrangle data and tease out interesting and important trends that help you predict future patterns. Whether you are dealing with sales data, investment data (stocks, bonds, etc.), medical data, web page usage, or any other type of data set, Python can be used to interpret, analyze, and glean information from a pile of numbers and statistics.

This book is an invaluable reference, with examples of storing and accessing data in a database. It walks you through the process of report generation and provides three real-world case studies that you can apply to your everyday analysis needs.

Table of Contents

  1. Cover
  2. Title
  3. Copyright
  4. Contents at a Glance
  5. Contents
  6. About the Author
  7. About the Technical Reviewer
  8. Acknowledgments
  9. Chapter 1: An Introduction to Data Analysis
    1. Data Analysis
    2. Knowledge Domains of the Data Analyst
      1. Computer Science
      2. Mathematics and Statistics
      3. Machine Learning and Artificial Intelligence
      4. Professional Fields of Application
    3. Understanding the Nature of the Data
      1. When the Data Become Information
      2. When the Information Becomes Knowledge
      3. Types of Data
    4. The Data Analysis Process
      1. Problem Definition
      2. Data Extraction
      3. Data Preparation
      4. Data Exploration/Visualization
      5. Predictive Modeling
      6. Model Validation
      7. Deployment
    5. Quantitative and Qualitative Data Analysis
    6. Open Data
    7. Python and Data Analysis
    8. Conclusions
  10. Chapter 2: Introduction to the Python’s World
    1. Python—The Programming Language
    2. Python—The Interpreter
      1. Cython
      2. Jython
      3. PyPy
    3. Python 2 and Python 3
    4. Installing Python
    5. Python Distributions
      1. Anaconda
      2. Enthought Canopy
      3. Python(x,y)
    6. Using Python
      1. Python Shell
      2. Run an Entire Program Code
      3. Implement the Code Using an IDE
      4. Interact with Python
    7. Writing Python Code
      1. Make Calculations
      2. Import New Libraries and Functions
      3. Functional Programming (Only for Python 3.4)
      4. Indentation
    8. IPython
      1. IPython Shell
      2. IPython Qt-Console
    9. PyPI—The Python Package Index
    10. The IDEs for Python
      1. IDLE (Integrated DeveLopment Environment)
      2. Spyder
      3. Eclipse (pyDev)
      4. Sublime
      5. Liclipse
      6. NinjaIDE
      7. Komodo IDE
    11. SciPy
      1. NumPy
      2. Pandas
      3. matplotlib
    12. Conclusions
  11. Chapter 3: The NumPy Library
    1. NumPy: A Little History
    2. The NumPy Installation
    3. Ndarray: The Heart of the Library
      1. Create an Array
      2. Types of Data
      3. The dtype Option
      4. Intrinsic Creation of an Array
    4. Basic Operations
      1. Arithmetic Operators
      2. The Matrix Product
      3. Increment and Decrement Operators
      4. Universal Functions (ufunc)
      5. Aggregate Functions
    5. Indexing, Slicing, and Iterating
      1. Indexing
      2. Slicing
      3. Iterating an Array
    6. Conditions and Boolean Arrays
    7. Shape Manipulation
    8. Array Manipulation
      1. Joining Arrays
      2. Splitting Arrays
    9. General Concepts
      1. Copies or Views of Objects
      2. Vectorization
      3. Broadcasting
    10. Structured Arrays
    11. Reading and Writing Array Data on Files
      1. Loading and Saving Data in Binary Files
      2. Reading File with Tabular Data
    12. Conclusions
  12. Chapter 4: The pandas Library—An Introduction
    1. pandas: The Python Data Analysis Library
    2. Installation
      1. Installation from Anaconda
      2. Installation from PyPI
      3. Installation on Linux
      4. Installation from Source
      5. A Module Repository for Windows
    3. Test Your pandas Installation
    4. Getting Started with pandas
    5. Introduction to pandas Data Structures
      1. The Series
      2. The DataFrame
      3. The Index Objects
    6. Other Functionalities on Indexes
      1. Reindexing
      2. Dropping
      3. Arithmetic and Data Alignment
    7. Operations between Data Structures
      1. Flexible Arithmetic Methods
      2. Operations between DataFrame and Series
    8. Function Application and Mapping
      1. Functions by Element
      2. Functions by Row or Column
      3. Statistics Functions
    9. Sorting and Ranking
    10. Correlation and Covariance
    11. “Not a Number” Data
      1. Assigning a NaN Value
      2. Filtering Out NaN Values
      3. Filling in NaN Occurrences
    12. Hierarchical Indexing and Leveling
      1. Reordering and Sorting Levels
      2. Summary Statistic by Level
    13. Conclusions
  13. Chapter 5: pandas: Reading and Writing Data
    1. I/O API Tools
    2. CSV and Textual Files
    3. Reading Data in CSV or Text Files
      1. Using RegExp for Parsing TXT Files
      2. Reading TXT Files into Parts or Partially
      3. Writing Data in CSV
    4. Reading and Writing HTML Files
      1. Writing Data in HTML
      2. Reading Data from an HTML File
    5. Reading Data from XML
    6. Reading and Writing Data on Microsoft Excel Files
    7. JSON Data
    8. The Format HDF5
    9. Pickle—Python Object Serialization
      1. Serialize a Python Object with cPickle
      2. Pickling with pandas
    10. Interacting with Databases
      1. Loading and Writing Data with SQLite3
      2. Loading and Writing Data with PostgreSQL
    11. Reading and Writing Data with a NoSQL Database: MongoDB
    12. Conclusions
  14. Chapter 6: pandas in Depth: Data Manipulation
    1. Data Preparation
      1. Merging
    2. Concatenating
      1. Combining
      2. Pivoting
      3. Removing
    3. Data Transformation
      1. Removing Duplicates
      2. Mapping
    4. Discretization and Binning
      1. Detecting and Filtering Outliers
    5. Permutation
    6. String Manipulation
      1. Built-in Methods for Manipulation of Strings
      2. Regular Expressions
    7. Data Aggregation
      1. GroupBy
      2. A Practical Example
      3. Hierarchical Grouping
    8. Group Iteration
      1. Chain of Transformations
      2. Functions on Groups
    9. Advanced Data Aggregation
    10. Conclusions
  15. Chapter 7: Data Visualization with matplotlib
    1. The matplotlib Library
    2. Installation
    3. IPython and IPython QtConsole
    4. matplotlib Architecture
      1. Backend Layer
      2. Artist Layer
      3. Scripting Layer (pyplot)
      4. pylab and pyplot
    5. pyplot
      1. A Simple Interactive Chart
      2. Set the Properties of the Plot
      3. matplotlib and NumPy
    6. Using the kwargs
      1. Working with Multiple Figures and Axes
    7. Adding Further Elements to the Chart
      1. Adding Text
      2. Adding a Grid
      3. Adding a Legend
    8. Saving Your Charts
      1. Saving the Code
      2. Converting Your Session as an HTML File
      3. Saving Your Chart Directly as an Image
    9. Handling Date Values
    10. Chart Typology
    11. Line Chart
      1. Line Charts with pandas
    12. Histogram
    13. Bar Chart
      1. Horizontal Bar Chart
      2. Multiserial Bar Chart
      3. Multiseries Bar Chart with pandas DataFrame
      4. Multiseries Stacked Bar Charts
      5. Stacked Bar Charts with pandas DataFrame
      6. Other Bar Chart Representations
    14. Pie Charts
      1. Pie Charts with pandas DataFrame
    15. Advanced Charts
      1. Contour Plot
      2. Polar Chart
    16. mplot3d
      1. 3D Surfaces
      2. Scatter Plot in 3D
      3. Bar Chart 3D
    17. Multi-Panel Plots
      1. Display Subplots within Other Subplots
      2. Grids of Subplots
    18. Conclusions
  16. Chapter 8: Machine Learning with scikit-learn
    1. The scikit-learn Library
    2. Machine Learning
      1. Supervised and Unsupervised Learning
      2. Training Set and Testing Set
    3. Supervised Learning with scikit-learn
    4. The Iris Flower Dataset
      1. The PCA Decomposition
    5. K-Nearest Neighbors Classifier
    6. Diabetes Dataset
    7. Linear Regression: The Least Square Regression
    8. Support Vector Machines (SVMs)
      1. Support Vector Classification (SVC)
      2. Nonlinear SVC
      3. Plotting Different SVM Classifiers Using the Iris Dataset
      4. Support Vector Regression (SVR)
    9. Conclusions
  17. Chapter 9: An Example—Meteorological Data
    1. A Hypothesis to Be Tested: The Influence of the Proximity of the Sea
      1. The System in the Study: The Adriatic Sea and the Po Valley
    2. Data Source
    3. Data Analysis on IPython Notebook
    4. The RoseWind
      1. Calculating the Distribution of the Wind Speed Means
    5. Conclusions
  18. Chapter 10: Embedding the JavaScript D3 Library in IPython Notebook
    1. The Open Data Source for Demographics
    2. The JavaScript D3 Library
    3. Drawing a Clustered Bar Chart
    4. The Choropleth Maps
    5. The Choropleth Map of the US Population in 2014
    6. Conclusions
  19. Chapter 11: Recognizing Handwritten Digits
    1. Handwriting Recognition
    2. Recognizing Handwritten Digits with scikit-learn
    3. The Digits Dataset
    4. Learning and Predicting
    5. Conclusions
  20. Appendix A: Writing Mathematical Expressions with LaTeX
    1. With matplotlib
    2. With IPython Notebook in a Markdown Cell
    3. With IPython Notebook in a Python 2 Cell
    4. Subscripts and Superscripts
    5. Fractions, Binomials, and Stacked Numbers
    6. Radicals
    7. Fonts
    8. Accents
  21. Appendix B: Open Data Sources
    1. Political and Government Data
    2. Health Data
    3. Social Data
    4. Miscellaneous and Public Data Sets
    5. Financial Data
    6. Climatic Data
    7. Sports Data
    8. Publications, Newspapers, and Books
    9. Musical Data
  22. Index