O'Reilly Live Online Training

Geo-Located Data: Extracting Patterns from Mobile Data Using Scikit-Learn and Cassandra

Learn how to extract patterns and detect anomalies within geo-located data, using machine learning clustering algorithms

Natalino Busa

Join Natalino Busa for an introduction to extracting patterns from geo-located data and building geo-located microservices.

In this online course, you’ll prototype a venue recommender and a geo-fencing alerting engine, using geo-located data and machine learning clustering algorithms, practicing the skills you need to build your own geo-located data applications.

You’ll see how geographical analyses enable a wide range of services, from location-based recommenders to advanced security systems, and you’ll learn how to package data-driven applications based on geographical data and expose these insights as microservices.

What you'll learn and how you can apply it

By the end of this live, hands-on, online course, you’ll understand:

  • Machine learning “K-Means” and “DBSCAN” clustering techniques
  • How to cluster geo-located data
  • How to detect patterns and anomalies
  • How to use Cassandra as a datastore for events and models

And you’ll be able to:

  • Use Scikit-Learn for preparing and clustering geo-located data
  • Prototype a basic venue recommender
  • Prototype a basic geo-fencing alerting engine
  • Load and extract geo-located data and models in Cassandra
  • Build a data-driven microservice in Python

This training course is for you because...

  • You are a developer or a data engineer/scientist who wants to learn how to design data-driven predictive APIs and microservices
  • You are a technical lead who wants to understand modern data processing pipelines using NoSQL and machine learning technologies
  • You are a leader or member of a design or business team with an interest in geo-located data and social network data such as check-ins, events, and venues
  • You work with Python and machine learning
  • You want to become a data scientist and are interested in concrete use cases such as recommenders and anomaly detection engines for geo-located data

Prerequisites

  • Intermediate knowledge of Python
  • Introductory level knowledge of machine learning
  • Some affinity for (computational) geometry

Materials or downloads needed in advance:

  • GitHub repository containing the setup instructions
  • Docker
  • Docker image containing all necessary tools and example files, to be provided to participants by the instructor in advance

Recommended Preparation:

Intermediate Python Programming

Introduction to Machine Learning with Python

Algorithms in a Nutshell

About your instructor

  • Natalino Busa (@natbusa) is a passionate scientist and engineer on a daily diet of data science, analytics, math, and algorithms. In his roles as Architect, CDO, and CTO he has coached and bootstrapped many R&D teams and delivered AI- and data-driven applications for the banking, retail, and infotainment domains. He previously worked as lead engineer and scientist for Philips and ING Bank in the Netherlands and for DBS Bank in Singapore. He is currently Chief Data Scientist at Teko in Vietnam.

Schedule

The timeframes are only estimates and may vary depending on how the class progresses.

Day One: High-level overview & venue recommender

Theory (1 hour)

  • Welcome and overview of the day's schedule
  • Introducing the dataset and modeling with Cassandra
  • Introduction to K-Means clustering: strengths and weaknesses
  • Clustering geo-located data, Voronoi tessellation
  • Range of potential use cases, interesting domains
  • Prototyping microservices with Python Flask
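K-Means on geo-located data and its link to Voronoi tessellation can be illustrated with a small, dependency-free sketch. It runs Lloyd's algorithm on (latitude, longitude) pairs treated as planar coordinates. That treatment is an assumption that only holds approximately over a small area such as one city.

The names `kmeans`, `nearest_centroid`, and `farthest_point_init` are hypothetical helpers, not the course's code. The check-in coordinates are made up as well. In practice scikit-learn's `sklearn.cluster.KMeans` would normally do this work.

Assigning a location to its nearest centroid is the same as asking which Voronoi cell of the centroids contains it.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees


def squared_distance(a: Point, b: Point) -> float:
    """Planar squared distance; only a fair approximation over a small area."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def nearest_centroid(p: Point, centroids: List[Point]) -> int:
    """Index of the closest centroid, i.e. the Voronoi cell that contains p."""
    return min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))


def farthest_point_init(points: List[Point], k: int) -> List[Point]:
    """Deterministic seeding (hypothetical choice): start from the first point,
    then repeatedly add the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(squared_distance(p, c) for c in centroids)))
    return centroids


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: every point joins its nearest centroid's cluster.
        labels = [nearest_centroid(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                updated.append((sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members)))
            else:
                updated.append(centroids[i])  # keep an empty cluster's centroid
        if updated == centroids:
            break  # no centroid moved: converged
        centroids = updated
    return centroids, labels


# Hypothetical check-ins forming two groups roughly 2 km apart.
CHECKINS: List[Point] = [
    (52.370, 4.890), (52.372, 4.892), (52.368, 4.888), (52.371, 4.887),
    (52.360, 4.920), (52.362, 4.922), (52.358, 4.918), (52.361, 4.921),
]

if __name__ == "__main__":
    centroids, labels = kmeans(CHECKINS, k=2)
    print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1]
    print(nearest_centroid((52.369, 4.889), centroids))  # 0: inside the first Voronoi cell
```

The planar distance keeps the centroid update a simple mean. With a great-circle distance, the mean of the members would no longer be the point that minimizes the within-cluster distance.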

Practice (1 hour)

  • Write a Python notebook for clustering geo-located data
  • Access the data points from Cassandra
  • Collect basic statistics about the dataset
  • Apply K-Means to geo-located data
  • Visualize the clusters using matplotlib
  • Overlay data, maps, and clusters using Voronoi tessellation and OpenStreetMap
  • Build a simple web API service using Flask
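This sketch shows one possible design for the venue recommender that Day One builds toward. The design is an assumption, not the course's reference solution:

- Venues are grouped by the K-Means cluster (Voronoi cell) they fall in.
- A user is recommended the most checked-in venues in their own cell.

The centroids, venue names, and check-in counts are hypothetical. In the course outline, the data points are read from Cassandra and the result is exposed through a Flask web API. Both steps are left out here.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees

# Hypothetical cluster centroids, e.g. produced by a K-Means run over check-ins.
CENTROIDS: List[Point] = [(52.370, 4.889), (52.360, 4.920)]

# Hypothetical venues: name -> (latitude, longitude, number of check-ins).
VENUES: Dict[str, Tuple[float, float, int]] = {
    "cafe-1": (52.371, 4.888, 120),
    "museum-1": (52.369, 4.891, 340),
    "bar-1": (52.372, 4.890, 45),
    "park-1": (52.361, 4.921, 500),
    "cafe-2": (52.359, 4.919, 80),
}


def nearest_centroid(p: Point, centroids: List[Point]) -> int:
    """Index of the closest centroid (planar approximation, small areas only)."""
    return min(range(len(centroids)),
               key=lambda i: (p[0] - centroids[i][0]) ** 2 + (p[1] - centroids[i][1]) ** 2)


def recommend(user: Point, top_n: int = 2) -> List[str]:
    """Return the most checked-in venues in the same cluster as the user."""
    cell = nearest_centroid(user, CENTROIDS)
    same_cell = [(count, name) for name, (lat, lon, count) in VENUES.items()
                 if nearest_centroid((lat, lon), CENTROIDS) == cell]
    # Highest check-in count first; ties broken by name for a stable ordering.
    same_cell.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in same_cell[:top_n]]


if __name__ == "__main__":
    print(recommend((52.370, 4.890)))  # ['museum-1', 'cafe-1']
    print(recommend((52.360, 4.920)))  # ['park-1', 'cafe-2']
```

Ranking by raw check-in count is the simplest popularity signal. A real recommender would probably also weigh distance, time of day, or the user's own history.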

Day Two: Anomaly detection & geo-fence alerts

Theory (1 hour)

  • Introduction to DBSCAN clustering: strengths and weaknesses
  • Spatial and temporal clustering
  • Summarize DBSCAN clusters: convex hulls and Delaunay triangulation
  • Data engineering: Lambda Architecture
  • Introduction to streaming analytics
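DBSCAN suits anomaly detection because it labels as noise (-1) any point that is not density-reachable from a core point. A check-in far from all of a user's usual places ends up labelled that way.

The plain-Python sketch below uses the haversine great-circle distance, so the neighbourhood radius is expressed in metres. The following are illustrative assumptions:

- the sample coordinates;
- the parameters `eps_m=50` and `min_samples=3`;
- the function names.

The neighbour search is a naive O(n²) scan. As in scikit-learn, `min_samples` counts the point itself. In practice, `sklearn.cluster.DBSCAN` accepts `metric="haversine"` with [latitude, longitude] given in radians.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres
NOISE = -1
UNVISITED = -2


def haversine_m(a: Point, b: Point) -> float:
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def dbscan(points: List[Point], eps_m: float, min_samples: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [UNVISITED] * len(points)

    def neighbours(i: int) -> List[int]:
        # Includes i itself, matching scikit-learn's min_samples convention.
        return [j for j in range(len(points)) if haversine_m(points[i], points[j]) <= eps_m]

    cluster = 0
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # not a core point; may later become a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not expanded
                continue
            if labels[j] != UNVISITED:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
        cluster += 1
    return labels


# Hypothetical check-ins: two dense groups plus one distant outlier.
SAMPLE: List[Point] = [
    (52.3700, 4.8900), (52.3702, 4.8901), (52.3701, 4.8903), (52.3699, 4.8902),
    (52.3600, 4.9200), (52.3601, 4.9201), (52.3602, 4.9200), (52.3600, 4.9202),
    (52.3800, 4.8500),
]

if __name__ == "__main__":
    print(dbscan(SAMPLE, eps_m=50, min_samples=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Unlike K-Means, DBSCAN needs no cluster count up front, and it finds clusters of arbitrary shape. The trade-off is sensitivity to `eps` and `min_samples`.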

Practice (1 hour)

  • Write a Python notebook for clustering geo-located data
  • Apply DBSCAN to geo-located data
  • Visualize the clusters using matplotlib
  • Geo-fencing service: build a simple web API service using Flask
  • Some ideas about scaling Python microservices with uWSGI
  • Q&A
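A geo-fencing check can combine the Day Two topics. One cluster of a user's usual locations (for example, from DBSCAN) is summarized by its convex hull, and a new location outside that hull raises an alert. The sketch below is one possible way to do this, not the course's reference implementation.

- The hull is built with Andrew's monotone chain algorithm. SciPy's `scipy.spatial.ConvexHull` offers a library equivalent.
- (latitude, longitude) is treated as planar. Over a small area this approximation amounts to an axis scaling, which does not change convexity or point-in-polygon results.

The coordinates and function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees, treated as planar


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns the hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


def inside_fence(p: Point, hull: List[Point]) -> bool:
    """True if p is inside (or on the border of) a counter-clockwise convex hull."""
    if len(hull) < 3:
        return False  # assumption: fewer than three vertices define no area
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))


def geofence_alert(p: Point, hull: List[Point]) -> bool:
    """Alert (True) when a new location falls outside the fenced area."""
    return not inside_fence(p, hull)


# Hypothetical cluster of a user's usual locations (e.g. one DBSCAN cluster).
USUAL_AREA: List[Point] = [
    (52.370, 4.890), (52.370, 4.894), (52.374, 4.894), (52.374, 4.890), (52.372, 4.892),
]

if __name__ == "__main__":
    fence = convex_hull(USUAL_AREA)
    print(len(fence))                              # 4: the interior point is dropped
    print(geofence_alert((52.372, 4.892), fence))  # False: inside the fence
    print(geofence_alert((52.380, 4.900), fence))  # True: outside, raise an alert
```

A convex hull may include empty space inside a cluster that bends around an area. Where that matters, a tighter outline such as one built from a Delaunay triangulation is an alternative.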