Learning Path: Python: Effective Data Analysis Using Python

Video Description

Master data analysis and visualization

In Detail

Data analysis is the process of taking source data, refining it to extract useful information, and then making useful predictions from it.

Python features numerous numerical and mathematical toolkits, such as NumPy, SciPy, scikit-learn, and SciKit, all used for data analysis and machine learning. With their help, Python has become the language of choice for data scientists working on data analysis, visualization, and machine learning.

We will take a general look at data analysis and then discuss web scraping tools and techniques in detail. We will show a rich collection of recipes that will come in handy when you are scraping a website using Python, addressing both usual and unusual scraping problems by diving deep into the capabilities of Python’s web scraping tools such as Selenium, BeautifulSoup, and urllib2.

We will then discuss visualization best practices. Effective visualization helps you get better insights from your data and make better, more informed business decisions.

After completing this Learning Path, you will be well equipped to extract data even from dynamic and complex websites using Python web scraping tools. You will also have a better understanding of data visualization concepts, how to apply them, and how to overcome challenges while implementing them.

Prerequisites: Requires prior knowledge of Python.

Resources: Code downloads and errata:

  • Learning Python Data Analysis

  • Getting Started with Python Web Scraping

  • Python Data Visualization Solutions

Path Products

This path navigates across the following products (in sequential order):

  • Learning Python Data Analysis (5h 55m)

  • Getting Started with Python Web Scraping (1h 36m)

  • Python Data Visualization Solutions (3h 27m)

Table of Contents

    1. Chapter 1 : Learning Python Data Analysis
      1. The Course Overview 00:03:55
      2. Getting started with Python 00:26:23
      3. Getting Data using the Twitter API 00:20:47
      4. Collecting and Storing Tweets 00:09:27
      5. Database Design 00:10:31
      6. Pandas and Databases 00:05:56
      7. Pandas Series, DataFrames, and Columnar Operations 00:21:21
      8. Grouping Operations and Working with Date Columns 00:17:01
      9. Merging Operations and Exporting data to JSON/CSV 00:14:54
      10. Array Features, Bucketing Arrays and Histogram Functions 00:21:02
      11. Simple Aggregations 00:21:23
      12. Linear Algebra 00:04:29
      13. Introducing PyQt and Matplotlib 00:31:47
      14. Creating Charts 00:07:36
      15. Simple XY Plots with Axis Scales 00:04:47
      16. Introduction to the NLTK Package 00:19:00
      17. Bag of Words 00:21:33
      18. Classification of Words 00:09:27
      19. Stemming 00:11:53
      20. Simple Sentiment Analysis 00:05:43
      21. Grouping By Dimensions and Classification of Data Types 00:25:08
      22. Trend Analysis and Deriving New Metrics 00:20:29
      23. Correlation Analysis 00:17:28
      24. Course Summary 00:03:42
    2. Chapter 2 : Getting Started with Python Web Scraping
      1. The Course Overview 00:02:44
      2. When to Web Scrape 00:02:57
      3. What Makes up a Website 00:09:50
      4. How to Interact with a Website 00:08:32
      5. Using the Selenium Module 00:12:12
      6. Ethical Web Scraping 00:04:39
      7. Requesting HTML 00:09:14
      8. Using the BeautifulSoup Module 00:13:18
      9. Example: Parsing Wikipedia 00:11:22
      10. Bypassing the Browser 00:04:25
      11. Introduction to APIs 00:04:59
      12. Working with APIs 00:11:52
    3. Chapter 3 : Python Data Visualization Solutions
      1. The Course Overview 00:03:38
      2. Importing Data from CSV 00:04:33
      3. Importing Data from Microsoft Excel Files 00:04:46
      4. Importing Data from Fixed-Width Files 00:03:06
      5. Importing Data from Tab Delimited Files 00:02:23
      6. Importing Data from a JSON Resource 00:05:17
      7. Importing Data from a Database 00:05:09
      8. Cleaning Up Data from Outliers 00:05:54
      9. Importing Image Data into NumPy Arrays 00:06:01
      10. Generating Controlled Random Datasets 00:06:36
      11. Smoothing Noise in Real-World Data 00:04:45
      12. Defining Plot Types and Drawing Sine and Cosine Plots 00:07:53
      13. Defining Axis Lengths and Limits 00:05:16
      14. Defining Plot Line Styles, Properties, and Format Strings 00:01:59
      15. Setting Ticks, Labels, and Grids 00:02:43
      16. Adding Legends and Annotations 00:02:33
      17. Moving Spines to Center 00:01:22
      18. Making Histograms 00:03:59
      19. Making Bar Charts with Error Bars 00:03:23
      20. Making Pie Charts Count 00:01:59
      21. Plotting with Filled Areas 00:01:56
      22. Drawing Scatter Plots with Colored Markers 00:02:13
      23. Adding a Shadow to the Chart Line 00:03:56
      24. Adding a Data Table to the Figure 00:02:26
      25. Using Subplots 00:03:57
      26. Customizing Grids 00:03:05
      27. Creating Contour Plots 00:03:24
      28. Filling an Under-Plot Area 00:02:01
      29. Drawing Polar Plots 00:02:56
      30. Visualizing the Filesystem Tree Using a Polar Bar 00:03:03
      31. Creating 3D Bars 00:05:33
      32. Creating 3D Histograms 00:03:13
      33. Animating with OpenGL 00:06:02
      34. Plotting with Images 00:06:18
      35. Displaying Images with Other Plots in the Figure 00:03:52
      36. Plotting Data on a Map Using Basemap 00:05:23
      37. Generating CAPTCHA 00:06:36
      38. Understanding Logarithmic Plots 00:05:19
      39. Creating a Stem Plot 00:04:18
      40. Drawing Streamlines of Vector Flow 00:03:28
      41. Using Colormaps 00:05:17
      42. Using Scatter Plots and Histograms 00:04:29
      43. Plotting the Cross Correlation Between Two Variables 00:03:27
      44. The Importance of Autocorrelation 00:04:11
      45. Drawing Barbs 00:06:24
      46. Making a Box-and-Whisker Plot 00:03:37
      47. Making Gantt Charts 00:03:50
      48. Making Error Bars 00:04:40
      49. Making Use of Text and Font Properties 00:04:00
      50. Understanding the Difference between pyplot and OO API 00:05:13