Mastering Python Data Visualization

Book Description

Generate effective results in a variety of visually appealing charts using the plotting packages in Python

About This Book

  • Explore various tools and their strengths while building meaningful representations that can make it easier to understand data
  • Packed with computational methods and algorithms in diverse fields of science
  • Written in an easy-to-follow categorical style, this book discusses some niche techniques that will make your code easier to work with and reuse

Who This Book Is For

If you are a Python developer who performs data visualization and you want to build on your existing Python knowledge to produce analytical results and compelling visual displays, then this book is for you. A basic knowledge of Python and its libraries is assumed.

What You Will Learn

  • Gather, cleanse, access, and map data to a visual framework
  • Recognize which visualization method is applicable and learn best practices for data visualization
  • Get acquainted with reader-driven and author-driven narratives and with the principles of perception
  • Understand why Python, much like MATLAB, is an effective tool for numerical computation, and explore some interesting data structures that come with it
  • Explore, through various visualization choices, how useful Python can be for computation in finance and statistics
  • Get to know why Python is the second choice after Java and why it is used so frequently in machine learning
  • Compare Python with other visualization approaches using Julia and a JavaScript-based framework such as D3.js
  • Discover how Python can be used in conjunction with Hadoop-based tools such as Hive to produce results efficiently in a distributed environment

In Detail

Python has a handful of open source libraries for numerical computation (optimization, linear algebra, integration, interpolation, and special functions on array objects), machine learning, data mining, and plotting. The pandas library provides a productive environment for data analysis. Each of these libraries has a specific purpose and plays an important role in research across diverse domains, including economics, finance, biological sciences, social science, and health care. The variety of tools and approaches available within the Python community is stunning, and they can greatly strengthen visual storytelling.

This book offers practical guidance to help you on the journey to effective data visualization. It begins with a chapter on the data framework, which explains how data is transformed into information and eventually into knowledge, and then covers the complete visualization process using the most popular Python libraries, with working examples. You will learn to use NumPy, SciPy, IPython, matplotlib, pandas, Patsy, and scikit-learn, with a focus on generating results that can be visualized in many different ways. Later chapters not only demonstrate advanced techniques such as interactive plotting, linear and non-linear regression (both numerical and graphical), and clustering and classification, but also help you understand the aesthetics and best practices of data visualization. The book concludes with examples such as social networks, real-life directed graphs, the data structures appropriate for these problems, and network analysis.

By the end of this book, you will be able to effectively solve a broad set of data analysis problems.

Style and Approach

The approach of this book is not step by step, but rather categorical. The categories are based on fields such as bioinformatics, statistical and machine learning, financial computation, and linear algebra. This approach benefits readers working in many different fields and also helps you learn how one approach can make sense across many fields.

Downloading the example code for this book: You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to obtain the code files.

Table of Contents

  1. Mastering Python Data Visualization
    1. Table of Contents
    2. Mastering Python Data Visualization
    3. Credits
    4. About the Author
    5. About the Reviewers
    6. www.PacktPub.com
      1. Support files, eBooks, discount offers, and more
        1. Why subscribe?
        2. Free access for Packt account holders
    7. Preface
      1. What this book covers
      2. What you need for this book
      3. Who this book is for
      4. Conventions
      5. Reader feedback
      6. Customer support
        1. Downloading the example code
        2. Downloading the color images of this book
        3. Errata
        4. Piracy
        5. Questions
    8. 1. A Conceptual Framework for Data Visualization
      1. Data, information, knowledge, and insight
        1. Data
        2. Information
        3. Knowledge
        4. Data analysis and insight
      2. The transformation of data
        1. Transforming data into information
          1. Data collection
          2. Data preprocessing
          3. Data processing
          4. Organizing data
          5. Getting datasets
        2. Transforming information into knowledge
        3. Transforming knowledge into insight
      3. Data visualization history
        1. Visualization before computers
          1. Minard's Russian campaign (1812)
          2. The Cholera epidemics in London (1831-1855)
          3. Statistical graphics (1850-1915)
          4. Later developments in data visualization
      4. How does visualization help decision-making?
        1. Where does visualization fit in?
        2. Data visualization today
          1. What is a good visualization?
      5. Visualization plots
        1. Bar graphs and pie charts
          1. Bar graphs
          2. Pie charts
        2. Box plots
        3. Scatter plots and bubble charts
          1. Scatter plots
          2. Bubble charts
        4. KDE plots
      6. Summary
    9. 2. Data Analysis and Visualization
      1. Why does visualization require planning?
      2. The Ebola example
      3. A sports example
        1. Visually representing the results
      4. Creating interesting stories with data
        1. Why are stories so important?
        2. Reader-driven narratives
          1. Gapminder
          2. The State of the Union address
          3. Mortality rate in the USA
          4. A few other example narratives
        3. Author-driven narratives
      5. Perception and presentation methods
        1. The Gestalt principles of perception
      6. Some best practices for visualization
        1. Comparison and ranking
        2. Correlation
        3. Distribution
        4. Location-specific or geodata
        5. Part-to-whole relationships
        6. Trends over time
      7. Visualization tools in Python
        1. Development tools
          1. Canopy from Enthought
          2. Anaconda from Continuum Analytics
      8. Interactive visualization
        1. Event listeners
        2. Layouts
          1. Circular layout
          2. Radial layout
          3. Balloon layout
      9. Summary
    10. 3. Getting Started with the Python IDE
      1. The IDE tools in Python
        1. Python 3.x versus Python 2.7
        2. Types of interactive tools
          1. IPython
          2. Plotly
        3. Types of Python IDE
          1. PyCharm
          2. PyDev
          3. Interactive Editor for Python (IEP)
          4. Canopy from Enthought
          5. Anaconda from Continuum Analytics
            1. An overview of Spyder
            2. An overview of conda
      2. Visualization plots with Anaconda
        1. The surface-3D plot
        2. The square map plot
      3. Interactive visualization packages
        1. Bokeh
        2. VisPy
      4. Summary
    11. 4. Numerical Computing and Interactive Plotting
      1. NumPy, SciPy, and MKL functions
        1. NumPy
          1. NumPy universal functions
          2. Shape and reshape manipulation
          3. An example of interpolation
          4. Vectorizing functions
          5. Summary of NumPy linear algebra
        2. SciPy
          1. An example of linear equations
          2. The vectorized numerical derivative
        3. MKL functions
        4. The performance of Python
      2. Scalar selection
      3. Slicing
        1. Slice using flat
      4. Array indexing
        1. Numerical indexing
        2. Logical indexing
      5. Other data structures
        1. Stacks
        2. Tuples
        3. Sets
        4. Queues
        5. Dictionaries
        6. Dictionaries for matrix representation
          1. Sparse matrices
            1. Visualizing sparseness
          2. Dictionaries for memoization
        7. Tries
      6. Visualization using matplotlib
        1. Word clouds
        2. Installing word clouds
        3. Input for word clouds
          1. Web feeds
          2. The Twitter text
        4. Plotting the stock price chart
          1. Obtaining data
      7. The visualization example in sports
      8. Summary
    12. 5. Financial and Statistical Models
      1. The deterministic model
        1. Gross returns
      2. The stochastic model
        1. Monte Carlo simulation
          1. What exactly is Monte Carlo simulation?
          2. An inventory problem in Monte Carlo simulation
          3. Monte Carlo simulation in basketball
          4. The volatility plot
          5. Implied volatilities
        2. The portfolio valuation
        3. The simulation model
        4. Geometric Brownian simulation
        5. The diffusion-based simulation
      3. The threshold model
        1. Schelling's Segregation Model
      4. An overview of statistical and machine learning
        1. K-nearest neighbors
        2. Generalized linear models
          1. Bayesian linear regression
      5. Creating animated and interactive plots
      6. Summary
    13. 6. Statistical and Machine Learning
      1. Classification methods
      2. Understanding linear regression
      3. Linear regression
      4. Decision tree
        1. An example
      5. The Bayes theorem
      6. The Naïve Bayes classifier
      7. The Naïve Bayes classifier using TextBlob
        1. Installing TextBlob
        2. Downloading corpora
        3. The Naïve Bayes classifier using TextBlob
      8. Viewing positive sentiments using word clouds
      9. k-nearest neighbors
      10. Logistic regression
      11. Support vector machines
      12. Principal component analysis
        1. Installing scikit-learn
      13. k-means clustering
      14. Summary
    14. 7. Bioinformatics, Genetics, and Network Models
      1. Directed graphs and multigraphs
        1. Storing graph data
        2. Displaying graphs
          1. igraph
          2. NetworkX
          3. Graph-tool
            1. PageRank
      2. The clustering coefficient of graphs
      3. Analysis of social networks
      4. The planar graph test
      5. The directed acyclic graph test
      6. Maximum flow and minimum cut
      7. A genetic programming example
      8. Stochastic block models
      9. Summary
    15. 8. Advanced Visualization
      1. Computer simulation
        1. Python's random package
        2. SciPy's random functions
        3. Simulation examples
        4. Signal processing
        5. Animation
        6. Visualization methods using HTML5
        7. How is Julia different from Python?
        8. D3.js for visualization
        9. Dashboards
      2. Summary
    16. A. Go Forth and Explore Visualization
      1. An overview of conda
      2. Packages installed with Anaconda
      3. Packages websites
      4. About matplotlib
    17. Index