
A guest post by James Turnbull, author of six technical books about open source software and a long-time member of the open source community. James authored the first (and second!) books about Puppet and works for Venmo as VP of Engineering.

Logging can be both painful and critical to managing your applications and infrastructure. It’s also hard to build a viable log management system that actually provides real insight into your environment. Enter LogStash. LogStash is easy to set up and deploy. It provides a simple and extensible framework for gathering, parsing, filtering and outputting log events. We’re going to go through the basic steps of installing and configuring LogStash.

LogStash’s principal prerequisite is Java, and LogStash itself runs in a Java Virtual Machine or JVM. So let’s start by installing Java. The fastest way to do this is via our distribution’s packaging system. On Red Hat:
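    # The OpenJDK package name is illustrative and varies by release;
    # on a recent Red Hat or CentOS system, for example:
    sudo yum install java-1.7.0-openjdk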

Or on Debian and Ubuntu:
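    # On older releases substitute openjdk-6-jdk for openjdk-7-jdk
    sudo apt-get install openjdk-7-jdk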

Once we have Java installed we can grab the LogStash package. Although LogStash is written in JRuby, it is released as a standalone JAR file containing all of the required dependencies. This means we don’t need to install JRuby or any other packages.

Next, download the JAR file. On our host we’re going to save it as logstash.jar:
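    # Substitute the download URL of the current release from the LogStash site
    wget -O logstash.jar <logstash-release-url>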

Once we have the JAR file we can launch it with the java binary and a simple configuration file. First, let’s create the configuration file. We’re going to call ours sample.conf and you can see it here:
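    # sample.conf - the log path, type name and plugin options shown here are
    # examples, and option names differ a little between LogStash releases.
    input {
      file {
        type => "apache"
        path => ["/var/log/apache2/access.log"]
      }
    }

    filter {
      grok {
        # Parse events of type "apache" using the combined Apache log pattern
        type => "apache"
        pattern => "%{COMBINEDAPACHELOG}"
      }
    }

    output {
      # Print each parsed event to STDOUT
      stdout {
        debug => true
      }
    }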

Our sample.conf file contains three configuration blocks: input, filter and output. Each block configures a different portion of the LogStash agent:

  • inputs – How events get into LogStash
  • filters – How you can manipulate events in LogStash
  • outputs – How you can output events from LogStash

In the LogStash world, events enter via inputs, and they are manipulated, mutated or changed in filters. They then exit LogStash via outputs.

Inside each component’s block you can specify and configure plugins. For example, in the input block above, we’ve defined the file plugin, which reads events from a file – in this case the Apache access log. In the filter block, we’ve defined the grok plugin, which parses log output into a more structured format. In the output block, we’ve configured the stdout plugin, which outputs events to STDOUT.

Combined, this configuration will read Apache access log events, parse them into a usable format and then output them to STDOUT. In the real world, we’d usually output them to ElasticSearch to allow LogStash to make them searchable.
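For example, replacing the stdout block with something like the following (the option names here are indicative and differ slightly between LogStash releases) would send events to an ElasticSearch server instead of the console:

    output {
      elasticsearch {
        # Hostname of the ElasticSearch server that will index our events
        host => "localhost"
      }
    }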

Now that we have a configuration file, let’s run LogStash for ourselves:
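    java -jar logstash.jar agent -v -f sample.conf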

We’ve used the java binary and specified our downloaded JAR file using the -jar option. We’ve also specified three command line flags: agent, which tells LogStash to run as the basic agent; -v, which turns on verbose logging; and -f, which specifies the configuration file LogStash should start with.

Now that LogStash is running, let’s generate some HTTP traffic to get some events. When we do, we should see some output on STDOUT that looks like:
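    {
      "@source"      : "file://web.example.com/var/log/apache2/access.log",
      "@tags"        : [],
      "@fields"      : {
        "clientip"   : "127.0.0.1",
        "verb"       : "GET",
        "request"    : "/index.html",
        "response"   : "200",
        "bytes"      : "1024"
      },
      "@timestamp"   : "2013-09-10T12:15:00.000Z",
      "@source_host" : "web.example.com",
      "@source_path" : "/var/log/apache2/access.log",
      "@message"     : "127.0.0.1 - - [10/Sep/2013:12:15:00 +0000] \"GET /index.html HTTP/1.1\" 200 1024 \"-\" \"curl/7.29.0\"",
      "@type"        : "apache"
    }

The hostname, path and field values above are invented for illustration; your own events will reflect your host and requests, but the structure will be the same.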

We can see our event has been printed as a hash. Indeed it’s represented internally in LogStash as a JSON hash. The format is made up of a number of elements:

  • @source: The source of the event, which includes the plugin that generated it and the hostname that produced it.
  • @tags: An array of potential tags on the event.
  • @fields: A set of fields, for example "verb": "GET" for the event.
  • @timestamp: An ISO8601 timestamp for the event.
  • @source_host: The source host of the event.
  • @source_path: The path, if any, of a source, for example /var/log/apache2/access.log.
  • @message: The event’s message.
  • @type: The type of log event, user-specified.

You can see that LogStash has taken an unstructured Apache access log event and turned it into a structured event that breaks out the information you need to know to understand it: the byte count, the verb used, the request path, and so on. We can then take this event and do a number of useful things with it:

  • Store it in ElasticSearch and make it searchable.
  • Generate metrics and graphs, for example using Graphite to produce graphs showing the total for each HTTP verb type.
  • Alert or track specific elements of an event, for example 4xx and 5xx errors.


From this, it should be immediately clear how powerful LogStash can be. In addition to Apache, numerous other data sources can be ingested, parsed, structured and then stored by LogStash. LogStash lets you take control of the previously unmanaged data in your logs, add context to it, and make operational and troubleshooting use of that data. If you’re interested in learning more about LogStash, check out the LogStash site or my new book on LogStash.

Safari Books Online has the content you need

Resilience and Reliability on AWS shows you how to prepare for potentially devastating interruptions by building your own resilient and reliable applications in the public cloud. Read Chapter 9: LogStash in Resilience and Reliability on AWS to learn more about LogStash.
JRuby Cookbook offers practical solutions for using the Java implementation of the Ruby language, with targeted recipes for deploying Rails web applications on Java servers, integrating JRuby code with Java technologies, developing JRuby desktop applications with Java toolkits, and more.
JavaScript: The Complete Reference, Third Edition is completely revised to cover the newest changes to JavaScript up to version 1.9, the latest browser-specific features for Internet Explorer, Firefox, and Chrome, as well as popular JavaScript frameworks like jQuery. Read the JSON section in JavaScript: The Complete Reference, Third Edition to learn more about JSON.

About the author

James Turnbull is the author of six technical books about open source software and a long-time member of the open source community. James authored the first (and second!) books about Puppet and works for Venmo as VP of Engineering. He was previously at Puppet Labs running Operations and Professional Services.

