Live Online training

Streaming data analysis with Go

Leveraging Go’s core primitives to analyze events in real time

Daniel Whitenack

Go provides an excellent platform for real-time analysis. In this training, we will stream text data (tweets) and analyze the sentiment (negative or positive) of those tweets in real time. The principles from this example also apply to streaming analysis of log lines, user interactions, financial transactions, and much more.

What you'll learn and how you can apply it

After taking this course, you will be able to:
- Write handlers for streaming data input with Go
- Parse textual data in event messages
- Analyze the sentiment of portions of text

Participants will understand…
- Basic Go syntax and program structure
- Asynchronous handling of events
- High-level sentiment analysis techniques

Participants will be able to…
- Utilize a pattern for real-time data analysis with Go
- Parse text data with Go
- Generate predictions of sentiment for portions of text

This training course is for you because...

  • You are a developer with some experience writing Go, and you need to process data in
    real time
  • You are a data analyst working with streaming data, and you need to implement
    streaming analyses in a language made to handle concurrency

Prerequisites

  • Programming experience in some language (Go, Python, R, etc.)
  • Some experience working on the command line (file I/O, etc.)

Recommended Preparation:
- Complete the Go tour: https://tour.golang.org/welcome/1
- Read through the Twitter API docs: https://dev.twitter.com/streaming/overview

Downloads required for the course:

This training uses JupyterHub. A link will be provided at the start of the course for the needed notebooks and materials. If you prefer to install locally, please follow the instructions in the gophernotes README so you can run Go in a Jupyter notebook.

To test that you will be able to run our Jupyter notebooks in your upcoming training, please follow the steps below:

  1. Navigate here: https://notebook.oreilly-jupyterhub.com
  2. Sign in with your Safari credentials
  3. Click start my server
  4. Click on notebook.ipynb
  5. Run each of the code cells: click the cell and then either press Shift+Return or click the triangle in the top menu.
  6. There may be a delay of a few seconds, but you should eventually see the three graphs. If you do not, your firewall is probably blocking JupyterHub’s websockets. Please turn off your company VPN or ask your system administrator to allow the websocket connections.

About your instructor

  • Daniel (@dwhitena) is a Ph.D.-trained data scientist working with Pachyderm (@pachydermIO). Daniel develops innovative, distributed data pipelines that include predictive models, data visualizations, statistical analyses, and more. He has spoken at conferences around the world (ODSC, Spark Summit, R Conference NYC, PyCon, GopherCon, JuliaCon, and more), teaches data science/engineering with Ardan Labs (@ardanlabs), maintains the Go kernel for Jupyter, and is actively helping to organize contributions to various open source data science projects.

Schedule

The timeframes are only estimates and may vary according to how the class is progressing

Day 1:

Go standard library components
- Utilizing structs to aggregate data
- Understanding synchronization primitives including mutual exclusion locks
- Utilizing the “sync” package
- When not to use channels and goroutines
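As a rough illustration of the topics above (not taken from the course materials), the sketch below aggregates counts in a struct guarded by a mutual exclusion lock from the "sync" package. The type SentimentCounts and its methods Record and Snapshot are hypothetical names chosen for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// SentimentCounts is a hypothetical aggregate used only for illustration.
// The mutex protects both counters from concurrent access.
type SentimentCounts struct {
	mu       sync.Mutex
	positive int
	negative int
}

// Record adds one observation while holding the lock.
func (s *SentimentCounts) Record(positive bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if positive {
		s.positive++
	} else {
		s.negative++
	}
}

// Snapshot returns a consistent copy of both counters.
func (s *SentimentCounts) Snapshot() (positive, negative int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.positive, s.negative
}

func main() {
	var counts SentimentCounts
	var wg sync.WaitGroup

	// Simulate 100 events arriving concurrently; even-numbered events
	// are treated as positive, odd-numbered ones as negative.
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			counts.Record(i%2 == 0)
		}(i)
	}
	wg.Wait()

	pos, neg := counts.Snapshot()
	fmt.Println(pos, neg) // prints: 50 50
}
```

A plain struct plus a mutex is often enough when the shared state is a handful of counters, which is one case where channels and goroutines add little.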

Interacting with a streaming data source
- Connecting to Twitter
- Handling tweets

Day 2:

Leverage more of the standard library
- When to utilize channels and goroutines
- Buffering tweets
- Making sure we keep up with the streaming data source
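One possible way to buffer incoming messages without stalling the reader is a buffered channel combined with a non-blocking send. The sketch below is an assumption about how that could look, not the course's implementation; the helper offer and the buffer size are hypothetical, and dropping messages when the buffer is full is just one of several policies.

```go
package main

import "fmt"

// offer tries to enqueue msg without blocking. It returns false when the
// buffer is full, so the caller can decide to drop or count the message.
func offer(buf chan<- string, msg string) bool {
	select {
	case buf <- msg:
		return true
	default:
		return false
	}
}

func main() {
	buf := make(chan string, 2) // small buffer, chosen only for the example
	done := make(chan int)

	// Consumer goroutine: drains the buffer and reports how many
	// messages it processed once the channel is closed.
	go func() {
		n := 0
		for range buf {
			n++
		}
		done <- n
	}()

	// Producer: stands in for the streaming source. It never blocks,
	// so it keeps up with the source even if the consumer is slow.
	dropped := 0
	for i := 0; i < 10; i++ {
		if !offer(buf, fmt.Sprintf("tweet %d", i)) {
			dropped++
		}
	}
	close(buf)

	processed := <-done
	fmt.Println(processed+dropped == 10) // prints: true
}
```

How many messages are processed versus dropped depends on scheduling, which is why the example only checks that every message is accounted for.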

Add a bit of intelligence to our analysis
- Understanding sentiment analysis
- Aggregating the sentiment of a Twitter feed
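To make the idea of aggregating sentiment concrete, here is a deliberately simple word-list sketch. It is a toy stand-in, not the technique taught in the course: the lexicon, the score function, and the aggregate function are all hypothetical, and real sentiment analysis typically relies on a trained model.

```go
package main

import (
	"fmt"
	"strings"
)

// lexicon is a tiny, made-up word list: +1 for positive, -1 for negative.
var lexicon = map[string]int{
	"good": 1, "great": 1, "love": 1,
	"bad": -1, "awful": -1, "hate": -1,
}

// score sums the lexicon values of the words in text. Words that are
// not in the lexicon contribute zero.
func score(text string) int {
	total := 0
	for _, w := range strings.Fields(strings.ToLower(text)) {
		total += lexicon[strings.Trim(w, ".,!?:;\"'")]
	}
	return total
}

// aggregate counts how many texts score positive, negative, or neutral.
func aggregate(texts []string) (positive, negative, neutral int) {
	for _, t := range texts {
		switch s := score(t); {
		case s > 0:
			positive++
		case s < 0:
			negative++
		default:
			neutral++
		}
	}
	return positive, negative, neutral
}

func main() {
	tweets := []string{
		"I love Go, it is great!",
		"This bug is awful.",
		"Compiling now",
	}
	pos, neg, neu := aggregate(tweets)
	fmt.Println(pos, neg, neu) // prints: 1 1 1
}
```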