
Using Multiple Elasticsearch Indices in Logstash

Logstash is a great way to unlock the wealth of information available in logs. Specifically, logstash, elasticsearch, and kibana combine to make searching and making sense of the data in logs easy. Due to the ease of collection and the uncertainty of what you may need in the future, itʼs likely that you are collecting everything. I know that we are, but this has its drawbacks.

The main one is that there is a limited amount of data that we can store due to the size of the drives attached to the elasticsearch servers. For us, that means we can only hold the last 3 months of logs. For most uses this is sufficient, but what if there are some logs that need to be retained for longer? Unfortunately, elasticsearch-curator is very coarse-grained: you can only drop whole indices, not the results of queries. Of course, you could always make use of another one of Logstashʼs output options, but there is an easy way to handle this situation: sending important logs to a different index.

While this is relatively easy to do, it does take some configuration. For the sake of simplicity, Iʼm going to assume that elasticsearch is running on the same node as the logstash server. If not, fill in the values that you need.

{% highlight ruby %}
output {
  # Send production logs from the ELK stack itself to a separate index
  if ([program] == "logstash" or [program] == "elasticsearch" or [program] == "nginx") and [environment] == "production" {
    elasticsearch {
      host  => "127.0.0.1"
      index => "elk-%{+YYYY.MM.dd}"
    }
  } else {
    # Everything else goes to the default logstash-YYYY.MM.dd indices
    elasticsearch {
      host => "127.0.0.1"
    }
  }
}
{% endhighlight %}

So from that, you probably gathered the basic form. In this specific case, Iʼve chosen to send the logs from the ELK stack to the elk index. Probably not that useful on its own, but if you change out the program name conditions for something identifying your appʼs more important logs, this is all you need to get it set up, as the sketch below shows.
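
For instance, if your filters already tag important events, you could route on the tag instead of the program name. This is just a hypothetical sketch: the "important" tag and the important-* index name are stand-ins for whatever your own pipeline uses.

{% highlight ruby %}
output {
  # Hypothetical: route anything a filter has tagged "important"
  # to its own longer-lived index
  if "important" in [tags] {
    elasticsearch {
      host  => "127.0.0.1"
      index => "important-%{+YYYY.MM.dd}"
    }
  } else {
    elasticsearch {
      host => "127.0.0.1"
    }
  }
}
{% endhighlight %}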

There are a couple of issues though. First off, this doesnʼt actually solve the problem that we set out to solve. Sure, all of the important logs are going to a new index, but elasticsearch-curator is still going to remove them once the configured size or age is reached. To remedy this, youʼll need to change your curator options.

{% highlight bash %}
# Change the settings for the default indices
/usr/local/bin/curator delete --disk-space 110 --prefix logstash

# Change the settings for the new indices
/usr/local/bin/curator delete --disk-space 30 --prefix elk
{% endhighlight %}
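
Curator only prunes when it runs, so these commands need to be scheduled somewhere. A minimal crontab sketch, assuming curator lives at /usr/local/bin/curator and that running once a day is frequent enough for you:

{% highlight bash %}
# Run curator nightly; prune the default and elk indices separately
0 1 * * * /usr/local/bin/curator delete --disk-space 110 --prefix logstash
5 1 * * * /usr/local/bin/curator delete --disk-space 30 --prefix elk
{% endhighlight %}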

Now that solves the original problem, but itʼs created a new one: how exactly can you search both of the indices? Kibana has the ability built in. At least 3.1.0 does; I havenʼt gotten a chance to use kibana 4 yet. Just go to the settings cog and modify the index setting to point to both of the indices.

All you have to do is add all of the indices in a comma separated list like this: [logstash-]YYYY.MM.DD,[elk-]YYYY.MM.DD. Now youʼll be searching both indices whenever you run a query. As far as I can tell, youʼll need to modify the setting for each dashboard.

Youʼve now fixed the original problem, but itʼs likely that you have data in the old indices that you donʼt want to lose on the old expiration schedule. There is a relatively easy way to migrate the data that you want onto the new index. The easiest way to make this happen is to wait for the first of the logs to get written to the new index. Youʼll also need to have elasticdump installed; if you already have node.js and npm installed, all you need to do is run npm install -g elasticdump. Once you have it installed, youʼll need to dump the data that you wish to move. elasticdump supports moving the data from the old index to the new one directly, but I ran into issues doing that, so I suggest that you first dump it to a file and then import it. Something along these lines should work:

{% highlight bash %}
elasticdump --input=http://127.0.0.1:9200/logstash* --searchBody '{
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "(program: \"logstash\" OR program: elasticsearch OR program: nginx) AND environment: production"
        }
      }
    }
  }
}' --output=elk.logs
{% endhighlight %}
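
Before purging anything, itʼs worth sanity-checking the dump. As far as I know, elasticdump writes its file output as line-delimited JSON, one document per line, so a quick look at the file tells you whether the query matched what you expected:

{% highlight bash %}
# How many documents were exported?
wc -l elk.logs

# Eyeball the first one to confirm the query matched the right logs
head -n 1 elk.logs
{% endhighlight %}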

Youʼll need to heavily customize the export command for what you are trying to move. To figure out the query, try it out in kibana: you can replace the "query" value with the exact query you use in kibana, but youʼll need to escape any quotes as Iʼve done above. Once the export has completed, youʼll need to purge the old logs. This does mean that youʼll lose a couple of logs during the transition, but I think saving the history is far more important. To delete all of the logs matching your query, simply run curl -XDELETE "http://127.0.0.1:9200/_all/_query" -d '{}' where {} is the same query you ran to export the logs. This will generate an error message, but you can ignore it. It will delete all of the logs matching that query, though it may take a little while.
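
Spelled out with the same query as the export above, the delete looks like this:

{% highlight bash %}
# Delete-by-query across all indices; reuse the exact body from the export
curl -XDELETE "http://127.0.0.1:9200/_all/_query" -d '{
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "(program: \"logstash\" OR program: elasticsearch OR program: nginx) AND environment: production"
        }
      }
    }
  }
}'
{% endhighlight %}

After a sufficient amount of time for the delete to complete, itʼs time to import all of the exported data. To do that, simply run: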

{% highlight bash %}
elasticdump --input=elk.logs --bulk --bulk-use-output-index-name \
  --output=http://127.0.0.1:9200/elk-2015.03.15
{% endhighlight %}

Where elk.logs is the file that you exported to in the previous step and elk-2015.03.15 is the full name of the new index. There are a variety of ways to find this, but I usually just check the disk; on Ubuntu the indices are at /var/lib/elasticsearch/elasticsearch/nodes/0/indices/ (you may need to change the 0 to whatever node you are connected to). Once that completes, youʼll have moved all of the data from the old indices to the new one. In my experience the import will take considerably less time than the export.
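
If you would rather not poke around on disk, the cat API will list the index names too, along with document counts you can use to confirm the import worked:

{% highlight bash %}
# List all indices with their document counts and sizes
curl "http://127.0.0.1:9200/_cat/indices?v"
{% endhighlight %}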