
Parsing Nginx logs with logstash

A quick search turns up plenty of write-ups on how to do this, but every one I found left me wanting. Depending on the system's use, we run either Nginx or HAProxy. HAProxy already has a number of great grok templates, but nothing similar exists for Nginx. Since we use both, it was important to parse their logs the same way so that the information can be queried with the same syntax. It turns out to be surprisingly easy to do.

The first thing that you will need to do is add a custom Nginx log format and set your access log to use the custom format:

http {
  ...

  log_format logstash '$http_host $remote_addr [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" $request_time $upstream_response_time';
  access_log /var/log/nginx/access.log logstash;

  ...
}

Only the log_format directive has to live in the http{...} block. You can instead set access_log in each server{...} block, but placing it in http{...} as shown captures access logs for all of the virtual hosts that do not override it.

Next, you'll need to modify your logstash configuration so that it can parse the new format:

filter {
  ...

  grok {
    type => "nginx-access"
    match => [
      "message", "%{IPORHOST:http_host} %{IPORHOST:client_ip} \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:http_verb} %{NOTSPACE:http_request}(?: HTTP/%{NUMBER:http_version})?|%{DATA:raw_http_request})\" %{NUMBER:http_status_code} (?:%{NUMBER:bytes_read}|-) %{QS:referrer} %{QS:agent} %{NUMBER:time_duration:float} %{NUMBER:time_backend_response:float}",
      "message", "%{IPORHOST:http_host} %{IPORHOST:client_ip} \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:http_verb} %{NOTSPACE:http_request}(?: HTTP/%{NUMBER:http_version})?|%{DATA:raw_http_request})\" %{NUMBER:http_status_code} (?:%{NUMBER:bytes_read}|-) %{QS:referrer} %{QS:agent} %{NUMBER:time_duration:float}"
    ]
  }

  ...
}
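If you want to sanity-check which fields the pattern pulls out before touching a live logstash instance, a rough Python approximation can help. This is only a sketch: the regular expressions below are my simplified stand-ins for the grok patterns (for example, \S+ in place of IPORHOST), not the real grok definitions. The sample log lines and the parse_line helper are made up for illustration. The two grok patterns are folded into one expression by making the trailing upstream time optional.

```python
import re

# Simplified stand-ins for the grok patterns used above (assumptions,
# not the real grok definitions).
NUMBER = r"[+-]?\d+(?:\.\d+)?"
HTTPDATE = r"\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4}"
QS = r'"(?:[^"\\]|\\.)*"'  # quoted string, quotes included (as with grok's QS)

NGINX_ACCESS = re.compile(
    r"(?P<http_host>\S+) (?P<client_ip>\S+) "
    r"\[(?P<timestamp>" + HTTPDATE + r")\] "
    r'"(?:(?P<http_verb>\w+) (?P<http_request>\S+)'
    r"(?: HTTP/(?P<http_version>" + NUMBER + r"))?"
    r'|(?P<raw_http_request>.*?))" '
    r"(?P<http_status_code>" + NUMBER + r") "
    r"(?:(?P<bytes_read>" + NUMBER + r")|-) "
    r"(?P<referrer>" + QS + r") (?P<agent>" + QS + r") "
    r"(?P<time_duration>" + NUMBER + r")"
    # Optional, so one expression covers both grok patterns.
    r"(?: (?P<time_backend_response>" + NUMBER + r"))?"
)


def parse_line(line):
    """Return a dict of the captured fields, or None if the line does not match."""
    m = NGINX_ACCESS.match(line)
    if m is None:
        return None
    # Drop groups that did not take part in the match.
    fields = {k: v for k, v in m.groupdict().items() if v is not None}
    # Mirror the ":float" conversions in the grok patterns.
    for key in ("time_duration", "time_backend_response"):
        if key in fields:
            fields[key] = float(fields[key])
    return fields


# Hypothetical log lines in the custom "logstash" format.
SAMPLE = (
    'example.com 203.0.113.7 [10/Oct/2013:13:55:36 +0000] '
    '"GET /index.html HTTP/1.1" 200 612 "-" "curl/7.30.0" 0.004 0.003'
)
SAMPLE_NO_UPSTREAM = (
    'example.com 203.0.113.7 [10/Oct/2013:13:55:37 +0000] '
    '"GET /logo.png HTTP/1.1" 304 0 "-" "curl/7.30.0" 0.001 -'
)

if __name__ == "__main__":
    for line in (SAMPLE, SAMPLE_NO_UPSTREAM):
        print(parse_line(line))
```

The second sample stands in for a request that never reached an upstream, where the last field is logged as "-" rather than a number. That case is why the grok filter carries a second, shorter pattern, and why the last group in this sketch is optional.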

After modifying your logstash configuration, you'll need to restart logstash. Then all that remains is to ship your Nginx access logs to your logstash server using the type nginx-access; I recommend logstash-forwarder for this. Once the logs start arriving, you'll see that they are parsed into the same field names as the HAProxy grok templates, so you can query across both your HAProxy and your Nginx logs using the same queries.