Managing logs is difficult, and it only gets harder as your infrastructure grows. Making that infrastructure dynamic (instances changing all the time) doesn't help at all.
There are commercial services like Splunk and Loggly, but they can get very expensive, very quickly. We would rather run it ourselves, provided the following are true:
Figure 9-1. Logstash distributed logging
The base setup should be able to handle several hundred events per second. The shippers have a small footprint. The reader (and interface) runs on a high-CPU medium instance. For Elasticsearch (which powers the interface) we'll use two high-memory medium instances.
Logstash works with input, filter, and output definitions. Most of the time shippers read from a file (input), do not filter very much, and write out to a middleware message bus (output). Logstash comes with many plug-ins.
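As a sketch of that shipper pattern (a file input, no filters, an SQS output), a minimal configuration might look like the following. The log path, queue name, and region are placeholder assumptions, not values from this chapter:

```
# Hypothetical shipper config: tail an application log file and
# push each event onto an SQS queue; no filter stage is defined.
input {
  file {
    path => "/var/log/app/*.log"   # placeholder path
    type => "applog"
  }
}
output {
  sqs {
    queue  => "logstash-events"    # placeholder queue name
    region => "us-east-1"          # placeholder region
  }
}
```

Keeping the shipper this thin is what gives it a small footprint: it only tails files and hands events to the bus, leaving all parsing to the reader.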
The default message bus is RabbitMQ; you can also use Redis, but we want to use SQS, of course.
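On the reader side, the same bus is consumed with an SQS input and the events are forwarded to Elasticsearch. A rough sketch, again with illustrative queue and host names rather than anything prescribed here:

```
# Hypothetical reader config: drain the SQS queue and index
# the events into the Elasticsearch cluster behind the interface.
input {
  sqs {
    queue  => "logstash-events"    # placeholder queue name
    region => "us-east-1"          # placeholder region
  }
}
output {
  elasticsearch {
    host => "es-1.internal"        # placeholder Elasticsearch node
  }
}
```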
The latest logstash.jar comes with the AWS SDK. This means ...