Streaming data analysis with Go
Leveraging Go’s core primitives to analyze events in real time
Go provides an excellent platform for real-time analysis. In this training, we will stream text data (tweets) and analyze the sentiment (negative or positive) of those tweets in real time. The same principles apply to streaming analysis of log lines, user interactions, financial transactions, and much more.
What you'll learn and how you can apply it
After taking this course, you will be able to:
- Write handlers for streaming data input with Go
- Parse textual data in event messages
- Analyze the sentiment of portions of text
Participants will understand…
- Basic Go syntax and program structure
- Asynchronous handling of events
- High-level sentiment analysis techniques
Participants will be able to…
- Utilize a pattern for real-time data analysis with Go
- Parse text data with Go
- Generate predictions of sentiment for portions of text
This training course is for you because...
- You are a developer with some experience writing Go, and you need to process data in real time
- You are a data analyst working with streaming data, and you need to implement streaming analyses in a language made to handle concurrency
Prerequisites:
- Programming experience in some language (Go, Python, R, etc.)
- Some experience working on the command line (file I/O, etc.)
Recommended Preparation:
- Complete the Go tour: https://tour.golang.org/welcome/1
- Read through the Twitter API docs: https://dev.twitter.com/streaming/overview
Downloads required for the course:
This training uses JupyterHub. A link will be provided at the start of the course for the needed notebooks and materials. If you prefer to install locally, please follow the instructions in the gophernotes README so you can run Go in a Jupyter notebook.
To test that you will be able to run our Jupyter notebooks in your upcoming training, please follow the steps below:
- Navigate here: https://notebook.oreilly-jupyterhub.com
- Sign in with your Safari credentials
- Click on “start my server”
- Run each of the code cells: click the cell and then either press Shift+Return or click the triangle in the top menu.
- There may be a delay of a few seconds, but you should eventually see the 3 graphs. If you do not, your firewall is probably blocking JupyterHub’s websockets. Please turn off your company VPN or ask your system administrator to allow them.
About your instructor
Daniel (@dwhitena) is a Ph.D.-trained data scientist working with Pachyderm (@pachydermIO). Daniel develops innovative, distributed data pipelines which include predictive models, data visualizations, statistical analyses, and more. He has spoken at conferences around the world (ODSC, Spark Summit, R Conference NYC, PyCon, GopherCon, JuliaCon, and more), teaches data science/engineering with Ardan Labs (@ardanlabs), maintains the Go kernel for Jupyter, and is actively helping to organize contributions to various open source data science projects.
The timeframes are only estimates and may vary according to how the class is progressing
Go standard library components
- Utilizing structs to aggregate data
- Understanding synchronization primitives including mutual exclusion locks
- Utilizing the “sync” package
- When not to use channels and goroutines
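As a rough illustration of this segment's topics, the sketch below aggregates running totals in a struct guarded by a mutual exclusion lock from the “sync” package. The `Stats` type, its fields, and its methods are hypothetical names chosen for this example and are not taken from the course materials.

```go
package main

import (
	"fmt"
	"sync"
)

// Stats aggregates running totals for a stream of messages.
// The type and its fields are hypothetical, for illustration only.
type Stats struct {
	mu    sync.Mutex // guards count and size
	count int        // number of messages seen
	size  int        // total message length in bytes
}

// Add records one message while holding the lock.
func (s *Stats) Add(text string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.count++
	s.size += len(text)
}

// Snapshot returns a consistent copy of the totals.
func (s *Stats) Snapshot() (count, size int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.count, s.size
}

func main() {
	var s Stats
	var wg sync.WaitGroup

	// Simulate 100 concurrent handlers updating the same aggregate.
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.Add("hello")
		}()
	}
	wg.Wait()

	count, size := s.Snapshot()
	fmt.Println(count, size) // prints "100 500"
}
```

A mutex can be a simpler fit than channels when several goroutines only need to update one small piece of shared state, as they do here.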
Interacting with a streaming data source
- Connecting to Twitter
- Handling tweets
Leverage more of the standard library
- When to utilize channels and goroutines
- Buffering tweets
- Making sure we keep up with the streaming data source
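One possible way to buffer tweets and keep up with a fast source is sketched below: a buffered channel feeds a consumer goroutine, and the producer drops messages when the buffer is full so that it never blocks. Dropping is only one policy among several, and the `Offer` helper and the buffer size of 8 are assumptions made for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// Offer tries to enqueue msg without blocking. It reports false
// when the buffer is full, so the caller can count or log the drop.
func Offer(buf chan<- string, msg string) bool {
	select {
	case buf <- msg:
		return true
	default:
		return false
	}
}

func main() {
	buf := make(chan string, 8) // buffer size chosen arbitrarily

	var wg sync.WaitGroup
	processed := 0

	// Consumer: drains the buffer until the channel is closed.
	wg.Add(1)
	go func() {
		defer wg.Done()
		for range buf {
			processed++
		}
	}()

	// Producer: stands in for the streaming source.
	dropped := 0
	for i := 0; i < 1000; i++ {
		if !Offer(buf, fmt.Sprintf("tweet %d", i)) {
			dropped++
		}
	}
	close(buf)
	wg.Wait()

	// The split between processed and dropped varies from run to run.
	fmt.Println("processed:", processed, "dropped:", dropped)
}
```

Tracking the number of dropped messages gives a simple signal for whether the consumer is keeping up with the source.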
Add a bit of intelligence to our analysis
- Understanding sentiment analysis
- Aggregating the sentiment of a Twitter feed
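To make the idea of aggregating sentiment concrete, here is a small sketch using a word-list (lexicon) approach: each tweet is scored by summing the values of known positive and negative words, and the feed is summarized as counts of positive, negative, and neutral tweets. The tiny lexicon and the `Score` and `Aggregate` functions are invented for this example. The course may use a different technique, such as a trained model.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// lexicon maps words to +1 (positive) or -1 (negative).
// It is a toy word list invented for this sketch.
var lexicon = map[string]int{
	"good": 1, "great": 1, "love": 1, "happy": 1,
	"bad": -1, "slow": -1, "hate": -1, "sad": -1,
}

// Score sums the lexicon values of the words in text.
// Words that are not in the lexicon contribute zero.
func Score(text string) int {
	words := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r)
	})
	total := 0
	for _, w := range words {
		total += lexicon[w]
	}
	return total
}

// Aggregate counts how many tweets score positive, negative, or neutral.
func Aggregate(tweets []string) (positive, negative, neutral int) {
	for _, t := range tweets {
		switch s := Score(t); {
		case s > 0:
			positive++
		case s < 0:
			negative++
		default:
			neutral++
		}
	}
	return positive, negative, neutral
}

func main() {
	feed := []string{
		"I love Go, it is great!",
		"This build is slow and bad",
		"Reading the docs",
	}
	pos, neg, neu := Aggregate(feed)
	fmt.Println(pos, neg, neu) // prints "1 1 1"
}
```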