Chapter 5. Web Mining, Databases, and Big Data

On the menu for this chapter are the following recipes:

  • Simulating web browsing
  • Scraping the Web
  • Dealing with non-ASCII text and HTML entities
  • Implementing association tables
  • Setting up database migration scripts
  • Adding a table column to an existing table
  • Adding indices after table creation
  • Setting up a test web server
  • Implementing a star schema with fact and dimension tables
  • Using HDFS
  • Setting up Spark
  • Clustering data with Spark

Introduction

This chapter is light on math and focuses instead on technical topics. Technology has a lot to offer data analysts. Databases have been around for a while; the relational databases that most people are familiar with can be traced back to the 1970s. Edgar Codd came ...
