ELK in Production

ELK is a software stack for processing and searching logs, consisting of Elasticsearch, Logstash, and Kibana. Logstash is a log processing tool that supports a variety of inputs, processing filters, and outputs; one of its outputs is typically Elasticsearch. Elasticsearch is a full-text search engine built on Lucene that makes your logs easily searchable, and Kibana is a web interface for searching and building dashboards from the data stored in Elasticsearch.

Logstash is written in Ruby and targets the JRuby interpreter. In practice this doesnʼt matter much, other than that the configuration file uses a Ruby-like DSL. As previously mentioned, Logstash supports a large number of inputs, such as files, Heroku, Kafka, and many more. There is probably an input for however you are currently logging. At Signal Vine, we are currently using the lumberjack, RabbitMQ, and syslog inputs.

The easiest way to get started with collecting logs is the syslog input. Many components of your current infrastructure probably already send logs over syslog, and itʼs easy to configure your chosen syslog daemon to forward logs over TCP. Itʼs probably just as easy to use UDP, but that isnʼt a good idea, since logs will then be dropped even under normal operating conditions. You should be careful to keep the Logstash server up and running when collecting syslog messages over TCP; Iʼve seen some strange behavior when the Logstash server is unavailable, with some processes using a large amount of CPU while not doing anything. One downside to using the syslog input is that the timestamps arenʼt very precise: they only have second precision. This is only a problem because Elasticsearch doesnʼt preserve insert order, which means that log messages from within the same second likely wonʼt appear in the order they occurred. Most of the time this isnʼt an issue, but it is something to keep in mind.
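
To make this concrete, hereʼs a minimal sketch of the syslog input; the port is a placeholder, and the rsyslog directive in the comment is one common way to do the TCP forwarding.

# On each host, rsyslog can forward everything over TCP with a line like:
#   *.* @@logstash.example.com:5514
input {
  syslog {
    port => 5514
    type => "syslog"
  }
}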

Lumberjack is a bit harder to explain. Lumberjack is a custom binary protocol with encryption built in. It was designed for use with lumberjack, a Go service that reads log files on a server and ships them to a Logstash server; the project has since been brought under Elastic and renamed logstash-forwarder. While you could run the Logstash service on every server to read log files and ship them to a central instance for processing, the Logstash service has a decent amount of overhead: it runs on the JVM and uses a sizable chunk of memory. Logstash-forwarder is very efficient, and you shouldnʼt notice its overhead even on the smallest of servers. It also has encryption built in. Unfortunately, that does make it harder to set up. It took me a few tries to get it right, but it has been incredibly stable since then. In fact, since I set it up, I havenʼt had to touch it at all. It simply works, and it has even handled resuming after downtime on the Logstash server without any intervention.
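
For reference, the lumberjack input needs the same certificate and key that you distribute to your logstash-forwarder instances; the port and paths below are placeholders, not our exact setup.

input {
  lumberjack {
    port => 5043
    # the same certificate and key configured on the forwarders
    ssl_certificate => "/etc/pki/tls/certs/logstash-forwarder.crt"
    ssl_key => "/etc/pki/tls/private/logstash-forwarder.key"
  }
}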

At Signal Vine, we use RabbitMQ as the internal communication medium between components of our application, and weʼve also chosen to ship our application-level logs over RabbitMQ. Logstash has a built-in input for consuming RabbitMQ messages. This works out really well if you send JSON to Logstash: Logstash can take the log messages from RabbitMQ and insert them directly into Elasticsearch. This lets you add any custom fields that you want to query on, which becomes a very effective way to query your own logs. It helps if you have a few standard fields that every log message must contain, since that allows you to query across all of your application logs using very similar queries.
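
A sketch of what that input can look like, assuming a queue dedicated to logs and JSON-encoded messages; the host and queue name are placeholders.

input {
  rabbitmq {
    host => "rabbitmq.example.com"
    queue => "application-logs"
    durable => true
    # decode each message body as JSON so fields land directly on the event
    codec => "json"
  }
}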

Once Logstash has received the log messages, it can process them with various types of filters. These include adding geolocation data to web server logs, parsing a date field included in the log message, processing custom log formats, and even running Ruby code against the message. These filters are useful for coercing whatever sort of log message you receive into a queryable format for later consumption, and the available functions make the process very easy. Because the configuration language supports conditionals, youʼre able to selectively apply these filters based on any scheme you desire: it might be useful to apply one filter to all of the logs that come from a specific input, or you can target filters based on the contents of the log message itself.
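
To make that concrete, hereʼs a sketch of a filter block that applies geolocation and date parsing only to web server logs; the type and field names are assumptions about how your events are tagged.

filter {
  if [type] == "nginx" {
    # look up location data for the client address
    geoip {
      source => "clientip"
    }
    # use the timestamp from the log line (extracted earlier, e.g. by grok)
    # rather than the time of ingestion
    date {
      match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
    }
  }
}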

The notable exception to this ease of processing is grok. Grok allows you to process custom log formats using regular expressions, which can be a difficult process. Luckily, Logstash ships with a number of complete and partial patterns that you can reuse for your logging format, and there is also a great tool for building your grok filter from an example message. Itʼs important to add a unique tag when parsing fails, and you should keep the default _grokparsefailure tag as well: the combination of the two lets you easily query for log messages that failed to parse and then locate the specific filter that failed.
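
For example, you can keep the default failure tag and add a unique one per filter with tag_on_failure; the pattern below is just an illustration, not one of our real filters.

filter {
  grok {
    match => ["message", "%{IPORHOST:clientip} %{WORD:verb} %{URIPATHPARAM:request}"]
    # keep the default tag and add one that identifies this particular filter
    tag_on_failure => ["_grokparsefailure", "_grokparsefailure_nginx"]
  }
}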

Once the messages have been fully processed, Logstash can send them to a variety of systems for further processing or storage. You can send the logs to S3 for permanent storage, send an email about a particularly troubling log message, forward the data to another Logstash instance, or store them in Elasticsearch. I think Elasticsearch is nearly universally used; it is a very effective way to query logging data. At Signal Vine, we also use the Riemann output to process logs that we may want to be alerted about. This works really well, as Iʼm able to use all of Riemannʼs powerful stream processing tools to filter the notifications that the engineering team receives. I would like to forward the full log message to Riemann, but there happens to be a bug with a pending resolution preventing this.
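
The output section follows the same conditional style as the filters. A rough sketch, where the severity field and the single-host addresses are assumptions rather than our exact setup:

output {
  # everything goes to Elasticsearch for searching
  elasticsearch {
    host => "127.0.0.1"
  }
  # only forward events that might warrant an alert
  if [severity] == "error" {
    riemann {
      host => "127.0.0.1"
    }
  }
}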

After all of that information about Logstash, the information about Elasticsearch is rather dull. It serves as an easily queryable data store: it supports full-text search and exposes Luceneʼs powerful query language to make short work of finding relevant log messages. Elasticsearch allows you to scale horizontally based on the number of shards that you choose when configuring an index, which is very useful if the amount of logs you want to keep exceeds the capacity of a single server or your write volume is more than a single server can handle. Elasticsearch also allows you to specify the number of replicas to keep, which lets you lose some servers without losing any of your data. Just keep in mind that Elasticsearch has some catastrophic failure modes. Even with the ability to shard the data across many servers, it most likely wonʼt be long before you run out of storage for your logs. That is where Curator comes in: it allows you to specify how much logging data to keep, either by how long to retain it or by the total amount of disk space to use.
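
For example, an age-based policy is just another Curator invocation run from cron; the 90-day cutoff below is arbitrary, and the exact flags vary between Curator versions, so check the one you have installed.

# delete logstash-* indices older than roughly 90 days
/usr/local/bin/curator delete --older-than 90 --prefix logstash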

Once your logs are stored in Elasticsearch, you need a way to easily query them. This is where the final piece of the ELK stack comes in: Kibana. Iʼm currently using Kibana 3, as I havenʼt yet put in the time to understand Kibana 4. Iʼve tried it out, but I wasnʼt a fan: it seems very unfamiliar, and I found the changes dizzying. Iʼm sure there are quite a number of benefits, but as I havenʼt yet researched them, I donʼt know what they are. With that said, Kibana is a great way to look at your log data. It exposes all of the powerful querying functionality of Elasticsearch for ad-hoc queries against your logs, as well as for creating great dashboards from the data they contain. The documentation has several great examples of this.

The ELK stack is not without its flaws, but it gives you a powerful set of tools for processing log data. Centralized log storage is an invaluable tool for any IT team. Having a single place to research issues happening on your infrastructure can greatly shorten the time between detection and fix, and it might even help you find errors that you would not have noticed otherwise. The ELK stack is the best tooling in this area that Iʼve seen, and you should think about using it as well.

Using Multiple Elasticsearch Indices in Logstash

Logstash is a great way to unlock the wealth of information available in logs. Specifically, Logstash, Elasticsearch, and Kibana combine to make searching and making sense of the data in your logs easy. Due to the ease of collection and the uncertainty of what you may need in the future, itʼs likely that you are collecting everything. I know that we are, but this has its drawbacks.

The main one is that there is a limited amount of data we can store, due to the size of the drives attached to the Elasticsearch servers. For us, that means we can only hold the last three months of logs. For most uses this is sufficient, but what if some logs need to be retained for longer? Unfortunately, elasticsearch-curator is very coarse-grained: you can only drop whole indices, not the results of queries. Of course, you could always make use of another one of Logstashʼs output options, but there is an easier way to handle this situation: send the important logs to a different index.

While this is relatively easy to do, it does take some configuration. For the sake of simplicity, Iʼm going to assume that Elasticsearch is running on the same node as the Logstash server; if not, fill in the values that you need.

output {
  # route logs from the ELK stack itself to a dedicated elk-* index
  if ([program] == "logstash" or [program] == "elasticsearch" or [program] == "nginx") and [environment] == "production" {
    elasticsearch {
      host => "127.0.0.1"
      index => "elk-%{+YYYY.MM.dd}"
    }
  } else {
    # everything else keeps going to the default logstash-* index
    elasticsearch {
      host => "127.0.0.1"
    }
  }
}

From that, you probably gathered the basic form. In this specific case, Iʼve chosen to send the logs from the ELK stack itself to the elk index. Thatʼs probably not the most useful choice, but if you swap the program name conditions for something identifying the more important logs from your app, this is all you need to get it set up.

There are a couple of issues, though. First off, this doesnʼt actually solve the problem we set out to solve: sure, all of the logs are going to a new index, but elasticsearch-curator is still going to remove them once they exceed the configured size or age. To remedy this, youʼll need to change your curator options.

# Cap the default logstash-* indices at roughly 110 GB of disk
/usr/local/bin/curator delete --disk-space 110 --prefix logstash
# Cap the new elk-* indices at roughly 30 GB of disk
/usr/local/bin/curator delete --disk-space 30 --prefix elk

That solves the original problem, but it creates a new one: how exactly do you search both indices? Kibana has this ability built in; at least 3.1.0 does, as I havenʼt yet had a chance to use Kibana 4. Just go to the dashboard settings cog and modify the index setting to point to both indices.

All you have to do is list the indices in a comma-separated pattern like [logstash-]YYYY.MM.DD,[elk-]YYYY.MM.DD, and then youʼll be searching both indices whenever you run a query. As far as I can tell, youʼll need to modify this setting for each dashboard.

Youʼve now fixed the original problem, but itʼs likely that you have data in the old indices that you donʼt want to lose on the old expiration schedule. There is a relatively easy way to migrate the data you want into the new index. The easiest approach is to wait for the first of the logs to be written to the new index. Youʼll also need to have elasticdump installed; if you already have Node.js and npm, all you need to do is run npm install -g elasticdump. Once you have it installed, youʼll need to dump the data that you wish to move. elasticdump supports moving data from the old index to the new one directly, but I ran into issues doing that, so I suggest that you first dump it to a file and then import it. Something along these lines should work:

elasticdump --input="http://127.0.0.1:9200/logstash*" --searchBody '{
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "(program: \"logstash\" OR program: elasticsearch OR program: nginx) AND environment: production"
        }
      }
    }
  }
}' --output=elk.logs

Youʼll need to heavily customize that for whatever you are trying to move. To figure out the query, try it out in Kibana; you can replace the "query" value with the exact query you use in Kibana, but youʼll need to escape any quotes as Iʼve done above. Once the export has completed, youʼll need to purge the old logs. This does mean that youʼll lose a couple of logs during the transition, but I think saving the history is far more important. To delete all of the logs matching your query, simply run curl -XDELETE "http://127.0.0.1:9200/_all/_query" -d '{}', where {} is the same query you ran to export the logs. This will generate an error message, but you can ignore it. It will delete all of the logs matching that query, though it may take a little while. After a sufficient amount of time for the delete to complete, itʼs time to import all of the exported data. To do that, simply run:

elasticdump --input=elk.logs --bulk --bulk-use-output-index-name \
  --output=http://127.0.0.1:9200/elk-2015.03.15

Where elk.logs is the file that you exported to in the previous step and elk-2015.03.15 is the full name of the new index. There are a variety of ways to find this, but I usually just check the disk; on Ubuntu the indices are at /var/lib/elasticsearch/elasticsearch/nodes/0/indices/ (you may need to change the 0 to whichever node you are connected to). Once that completes, youʼll have moved all of the data from the old indices to the new one. In my experience, the import takes considerably less time than the export.