Filed under analytics, infrastructure, IT.

Log files, those reams of text that programs spit out, are for the most part useless to humans. We as operators ignore them until we need to find something specific (“ZOMG! That thing broke, what is happening?!”), or until they start to consume so much disk space that we end up just deleting most of them.

It’s not the logs’ fault that they are useless to us, nor the fault of the programmer who coded all that log output. The logs are just data, and it is up to us to make them useful.

In this post I will show you how you can aggregate your log data across your enterprise, make it easy to search and correlate events across all your machines, and even display some pretty graphs.

So let’s start with some pretty graphs.

[Kibana screenshot 1]

Here you see the aggregate of the last twelve hours across many logs displayed as a graph. Included in that stream are VMware, EqualLogic, Authentication, Chef, and other application log data at the rate of around 200 log entries per second. Can you imagine trying to tail all those log files at that rate in your terminal, and trying to determine if there are any trends? By aggregating that data and turning it into a graph, we can more easily see peaks and valleys.

[Kibana screenshot 2]

Here I have created a dashboard of the file /var/log/auth.log across thirty servers for the last twenty-four hours — hey, what’s that spike over there around 14:20?

“Enhance.”
[Kibana screenshot 3]

Enhance, and perform host name analysis.

[Kibana screenshot 4]

Huh, most of those log entries are from three hosts. In about four clicks we went from a stream of about 20,000 log messages per second to three hosts that may warrant further investigation. Pretty awesome! Because the data was presented in a format our brains could understand (a graph), we were easily able to spot the anomaly and start drilling down into the data. The same data spread across thirty log files would have been impossible to scan for a pattern. That’s the real power of these tools: they make useless masses of data relevant and useful.

Now that you are convinced, let’s build something.

The software in this space is growing and moving fast: there are for-pay services and open source stacks you can build yourself. I will focus on showing you how to build something using open source software. The primary elements in this stack are Kibana, Elasticsearch, and Heka.

INSTALL ELASTICSEARCH

http://www.elasticsearch.org/download/
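If you just want a quick sandbox, one way to get it running is to grab the tarball and start it in the foreground (this assumes a Java runtime is already installed; the version number and URL below are only examples, so use whatever the download page currently lists):

    wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.3.2.tar.gz
    tar -xzf elasticsearch-1.3.2.tar.gz
    cd elasticsearch-1.3.2
    ./bin/elasticsearch    # runs in the foreground, listening on port 9200 by default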

INSTALL HEKA

https://github.com/mozilla-services/heka/releases
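The releases page includes pre-built packages. On a Debian/Ubuntu host the install might look roughly like this (the version and file name here are placeholders; grab whatever release matches your platform):

    # Placeholder version/file name -- substitute the release you actually downloaded
    wget https://github.com/mozilla-services/heka/releases/download/v0.10.0/heka_0.10.0_amd64.deb
    sudo dpkg -i heka_0.10.0_amd64.deb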

Here’s a sample configuration file, saved as hekad.toml for this example, that will read the file /var/log/auth.log, decode it a bit (making it easier for Elasticsearch to index some of the attributes), and output the payload to both STDOUT and Elasticsearch:
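A minimal sketch of such a hekad.toml might look like the following. It assumes Heka’s LogstreamerInput, the bundled rsyslog Lua decoder, ElasticSearchOutput, and LogOutput plugins; option names and defaults vary between Heka releases, so check the documentation for your version:

    # Follow /var/log/auth.log and hand each line to the decoder below
    [LogstreamerInput]
    log_directory = "/var/log"
    file_match = 'auth\.log'
    decoder = "RsyslogDecoder"

    # Parse the syslog format so fields like hostname and program are indexed separately
    [RsyslogDecoder]
    type = "SandboxDecoder"
    filename = "lua_decoders/rsyslog.lua"

    [RsyslogDecoder.config]
    template = '%TIMESTAMP% %HOSTNAME% %syslogtag%%msg:::sp-if-no-1st-sp%%msg:::drop-last-lf%\n'

    # Ship every message to Elasticsearch...
    [ESJsonEncoder]
    es_index_from_timestamp = true

    [ElasticSearchOutput]
    message_matcher = "TRUE"
    server = "http://localhost:9200"
    encoder = "ESJsonEncoder"
    flush_interval = 5000

    # ...and also echo the payload to STDOUT so you can watch it working
    [LogOutput]
    message_matcher = "TRUE"
    encoder = "PayloadEncoder"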

Execute hekad:
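Assuming the package put hekad on your PATH and the config above was saved in the current directory, that’s just:

    hekad -config=hekad.toml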

INSTALL KIBANA

http://www.elasticsearch.org/overview/kibana/installation/
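Kibana 3 is just static HTML and JavaScript, so one simple approach is to unpack it into an Nginx docroot. Something like the following (the version, URL, and paths are only examples):

    wget https://download.elasticsearch.org/kibana/kibana/kibana-3.1.0.tar.gz
    tar -xzf kibana-3.1.0.tar.gz
    sudo cp -r kibana-3.1.0/* /usr/share/nginx/html/
    # If Elasticsearch is not reachable on port 9200 of the same host, edit config.js
    # and change the "elasticsearch" setting to point at your cluster.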

Point your browser at the host where you installed Nginx and Kibana, and you should be able to choose a “Sample Dashboard” that will display your log data.

NEXT STEPS

The above setup is a very simplistic install and is useful as a starting point, but does not represent the totality of what you could do. Some other things you could do to make this even cooler:

  • Place an AMQP broker like RabbitMQ between your servers and your log host, and have hekad ship log data to the broker and then pull it back out on the logging host (see the sketch after this list).
  • Use Heka’s multiple output plug-ins to also write the log files out to disk for longer retention (text compresses really nicely, and takes up a lot less space than the Elasticsearch data).
  • Play with decoding other application data, and write your own decoders for those formats.
  • Split the components up and build clusters for each piece: an Elasticsearch cluster, a RabbitMQ cluster, an rsyslog host for long file retention, and a Heka cluster with nodes doing different types of message processing; think nodes just for processing Chef logs and sending Nagios alerts.
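As a rough sketch of the first idea, Heka ships with AMQP plugins: each server gets an AMQPOutput pointed at RabbitMQ, and the logging host runs an AMQPInput that feeds its local pipeline. The broker URL and exchange name below are made up, and the option names should be checked against the Heka docs for your release:

    # On every server: publish Heka messages to RabbitMQ in Heka's native format
    [AMQPOutput]
    url = "amqp://guest:guest@rabbit.example.com/"
    exchange = "heka-logs"
    exchange_type = "fanout"
    message_matcher = "TRUE"
    encoder = "ProtobufEncoder"

    # On the logging host: consume from the same exchange and process as usual
    [AMQPInput]
    url = "amqp://guest:guest@rabbit.example.com/"
    exchange = "heka-logs"
    exchange_type = "fanout"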

 

Tags: AWS, elasticsearch, heka, kibana, logging, syslog, visualization

4 Responses to “Getting started with log analysis”

  1. Richard

    Thanks for this Augie!

    Do you have any suggestions for approaches to conducting more involved analyses (e.g. linear algebra, regression, etc.) on the sort of massive data sets typically held in ES? I’m used to working in R and Python, but it seems cumbersome to export massive JSON indices just to import them back into these memory-limited tools.

    Thanks for any suggestions.

    Richard