Live online training

Geo-Located Data: Extracting Patterns from Mobile Data Using Scikit-Learn and Cassandra

Learn how to extract patterns and detect anomalies within geo-located data, using machine learning clustering algorithms

Natalino Busa

Join Natalino Busa for an introduction to extracting patterns from geo-located data and building geo-located microservices.

In this online course, you’ll prototype a venue recommender and a geo-fencing alerting engine, using geo-located data and machine learning clustering algorithms, practicing the skills you need to build your own geo-located data applications.

You’ll see how geographical analyses enable a wide range of services, from location-based recommenders to advanced security systems, and you’ll learn how to package data-driven applications based on geographical data and expose their insights as (micro)services.

What you'll learn and how you can apply it

By the end of this live, hands-on, online course, you’ll understand:

  • The K-Means and DBSCAN machine learning clustering techniques
  • How to cluster geo-located data
  • How to detect patterns and anomalies
  • How to use Cassandra as a datastore for events and models

And you’ll be able to:

  • Use Scikit-Learn for preparing and clustering geo-located data
  • Prototype a basic venue recommender
  • Prototype a basic geo-fencing alerting engine
  • Load and extract geo-located data and models in Cassandra
  • Build a data-driven microservice in Python

This training course is for you because...

  • You are a developer or a data engineer/scientist who wants to learn how to design data-driven predictive APIs and microservices
  • You are a technical lead who wants to understand modern data processing pipelines using NoSQL and machine learning technologies
  • You are a leader or member of a design or business team with an interest in geo-located data and social network data such as check-ins, events, and venues
  • You work with Python and machine learning
  • You want to become a data scientist and are interested in concrete use cases such as recommenders and anomaly detection engines for geo-located data

Prerequisites

  • Intermediate knowledge of Python
  • Introductory level knowledge of machine learning
  • Some affinity for (computational) geometry

Materials or downloads needed in advance:

  • GitHub repo: contains the set-up instructions
  • Docker
  • Docker image containing all necessary tools and example files, to be provided to participants by the instructor in advance

Recommended Preparation:

  • Intermediate Python Programming
  • Introduction to Machine Learning with Python
  • Algorithms in a Nutshell

About your instructor

  • Natalino Busa is head of Data Science at Teradata, where he provides consultancy services and delivers big/fast data solutions for data-driven applications such as predictive analytics, personalized marketing, man-machine interaction, fraud and cyber security.

Schedule

The time frames are only estimates and may vary depending on how the class progresses.

Day One: High-level overview & venue recommender

Theory (1 hour)

  • Welcome and overview of the day's schedule
  • Introduction to the dataset and data modeling with Cassandra
  • Introduction to K-Means clustering: strengths and weaknesses
  • Clustering geo-located data and Voronoi tessellation
  • Range of potential use cases and interesting domains
  • Prototyping microservices with Python and Flask

Practice (1 hour)

  • Write a Python notebook for clustering geo-located data
  • Access the data points from Cassandra
  • Collect basic statistics about the dataset
  • Apply K-Means to geo-located data
  • Visualize the clusters using matplotlib
  • Overlay data, maps, and clusters using Voronoi tessellation and OpenStreetMap
  • Build a simple web API service using Flask
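
As a taste of the K-Means step in the practice session above, here is a minimal sketch using Scikit-Learn. It is not the course material itself: the coordinates are synthetic, the two hot-spot centers and the variable names are made up for illustration, and treating latitude/longitude as plain Euclidean coordinates is an approximation that is only reasonable over small areas.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic check-ins as (latitude, longitude) pairs around two
# made-up hot spots; a real exercise would load these from a datastore.
rng = np.random.default_rng(42)
centre_a = np.array([52.37, 4.90])  # hypothetical hot spot A
centre_b = np.array([52.09, 5.12])  # hypothetical hot spot B
points = np.vstack([
    centre_a + rng.normal(scale=0.01, size=(50, 2)),
    centre_b + rng.normal(scale=0.01, size=(50, 2)),
])

# Fit K-Means with two clusters. Distances are Euclidean in degrees,
# which is an approximation (assumption: small geographic area).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)

# Assigning a new location to its nearest centroid is equivalent to
# asking which Voronoi cell of the centroids the location falls in.
new_checkin = np.array([[52.36, 4.91]])
cluster_id = int(kmeans.predict(new_checkin)[0])

print("centroids:", kmeans.cluster_centers_)
print("new check-in belongs to cluster", cluster_id)
```

A venue recommender could then, for example, suggest popular venues from the cluster a user's current location falls into; that mapping from cluster to venues is left out of this sketch.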

Day Two: Anomaly detection & geo-fence alerts

Theory (1 hour)

  • Introduction to DBSCAN clustering: strengths and weaknesses
  • Spatial and temporal clustering
  • Summarizing DBSCAN clusters: convex hulls and Delaunay triangulation
  • Data engineering: Lambda Architecture
  • Introduction to streaming analytics

Practice (1 hour)

  • Write a Python notebook for clustering geo-located data
  • Apply DBSCAN to geo-located data
  • Visualize the clusters using matplotlib
  • Geo-fencing service: build a simple web API service using Flask
  • Some ideas about scaling Python microservices with uWSGI
  • Q&A
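
To illustrate the DBSCAN step in the practice session above, the following is a minimal sketch of density-based clustering with Scikit-Learn, in which points that DBSCAN labels as noise are treated as candidate anomalies. The data is synthetic, and the 1 km neighborhood radius, the `min_samples` value, and all variable names are assumptions chosen for illustration rather than values taken from the course.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used to convert km to radians

# Synthetic data: a dense group of check-ins plus one far-away point.
rng = np.random.default_rng(0)
dense = np.array([52.37, 4.90]) + rng.normal(scale=0.002, size=(40, 2))
far_away = np.array([[48.86, 2.35]])  # hypothetical out-of-pattern location
coords_deg = np.vstack([dense, far_away])

# The haversine metric expects [latitude, longitude] in radians and
# measures eps in radians, so a radius in km is divided by the Earth radius.
eps_km = 1.0  # assumed neighborhood radius
db = DBSCAN(
    eps=eps_km / EARTH_RADIUS_KM,
    min_samples=5,          # assumed minimum neighborhood size
    metric="haversine",
    algorithm="ball_tree",
)
labels = db.fit_predict(np.radians(coords_deg))

# DBSCAN marks points that belong to no dense region with the label -1.
anomalies = coords_deg[labels == -1]
print("cluster labels found:", sorted(set(labels.tolist())))
print("candidate anomalies:", anomalies)
```

Unlike K-Means, DBSCAN does not need the number of clusters up front, but its results depend strongly on `eps` and `min_samples`, so both would need tuning on real data.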