Elasticsearch and Kibana

Elasticsearch and Kibana are open source tools developed by Elastic. At its simplest, Elasticsearch is a document store and search engine, and Kibana is the user interface for Elasticsearch.

Elasticsearch

Elasticsearch is a search engine built on the Apache Lucene search library. As its name suggests, it is designed to be scalable and flexible. Elasticsearch stores documents centrally and makes them searchable in near real time. It supports advanced queries for detailed analysis: you can perform and combine multiple types of search, including structured, unstructured, geographic and metric searches. Operations such as paging, sorting, filtering, scripting and aggregation are also available.

Elasticsearch offers a comprehensive REST API for interacting with your Elasticsearch cluster. In addition to running searches, you can use the API to check cluster, node and index health, status and statistics; administer cluster, node and index data and metadata; and perform CRUD* (Create, Read, Update and Delete) operations against your indexes.
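
The REST API calls described above can be sketched in Python using only the standard library. This is a minimal sketch, assuming a cluster reachable at http://localhost:9200 (a placeholder; a hosted instance will have its own URL and authentication) and a hypothetical index named my-app-index. It builds requests for the _cluster/health, _cat/indices and _search endpoints without sending them; send() performs the actual call.

```python
import json
import urllib.request

# Assumption: placeholder cluster URL; replace with your own endpoint.
ES_URL = "http://localhost:9200"


def build_request(method, path, body=None):
    """Build (but do not send) a Request for an Elasticsearch REST endpoint."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(ES_URL + path, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def send(req):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Health and statistics endpoints
health = build_request("GET", "/_cluster/health")
indices = build_request("GET", "/_cat/indices?format=json")

# A search against one index (the index name is hypothetical)
search = build_request("POST", "/my-app-index/_search",
                       {"query": {"match_all": {}}, "size": 10})
```

Against a live cluster, send(health)["status"] returns the cluster status string: "green", "yellow" or "red".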

According to the DB-Engines ranking, Elasticsearch is the most popular enterprise search engine and one of the ten most popular database management systems. Many websites that store large amounts of data use Elasticsearch for search because of its speed; Wikipedia, eBay, Yelp and Netflix are all users.

*NB: You can’t use the CRUD operations with the Elasticsearch index provided by section.io.

Elastic

Elastic, the company behind Elasticsearch, has always had ambitions beyond search technology. Its Elastic Stack, in which Elasticsearch remains the key component, also includes Logstash for data collection and Kibana for analytics and data visualization. These three products are designed to be used together as an integrated log management tool, previously known as the ELK Stack. The company later added a fourth product, Beats (a platform for single-purpose data shippers), and continues to expand its portfolio, largely by acquiring startups in different fields, most recently Swiftype. All its products are open source. The Elastic Stack can be deployed on premises or consumed as Software as a Service (SaaS).

Kibana

Kibana is Elastic’s data visualization plug-in for Elasticsearch. It provides visualization tools on top of the content indexed in an Elasticsearch cluster. Kibana’s charts can visualize the results of complex queries executed through Elasticsearch, as well as geospatial data and timelines that show how different services perform over time. Custom graphs that fit the needs of specific applications can be created and saved. By visualizing the information stored in Elasticsearch, Kibana gives developers rapid insight into the stored documents and into how their systems are operating.

Lucene Search Query Syntax in Kibana

There are several ways to search in Kibana using Lucene query syntax, for instance:

  • Enter a text string to perform a free text search, e.g. enter ‘chrome’ to search all fields for the term ‘chrome’
  • To search for a value in a specific field, prefix the value with the field name and a colon, e.g. enter ‘status:100’ to find every entry that contains the value ‘100’ in the ‘status’ field
  • To search for a range of values, use Lucene’s bracketed range syntax, [START_VALUE TO END_VALUE], where TO must be uppercase, e.g. to find entries that have 2xx status codes, enter ‘status:[200 TO 299]’
  • Use the Boolean operators AND, OR and NOT (also uppercase) to build more complex search criteria, e.g. to find entries that have 2xx status codes and an extension of ‘php’, enter ‘status:[200 TO 299] AND extension:php’
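
Kibana’s Lucene query bar corresponds roughly to Elasticsearch’s query_string query, so the same searches can also be sent directly to the REST API. The sketch below builds the last example above as a search request body; the helper names are hypothetical.

```python
import json


def lucene_range(field, start, end):
    """Inclusive Lucene range clause; TO must be uppercase."""
    return f"{field}:[{start} TO {end}]"


def query_string_body(lucene_query, size=20):
    """Wrap a Lucene query string in an Elasticsearch query_string search body."""
    return {"size": size, "query": {"query_string": {"query": lucene_query}}}


# 2xx status codes with a php extension, as in the example above
q = lucene_range("status", 200, 299) + " AND extension:php"
body = query_string_body(q)
print(json.dumps(body))
```

The resulting body can be POSTed to an index’s _search endpoint.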

How section.io Works with Elasticsearch and Kibana

At section.io we provide a hosted Elasticsearch and Kibana instance for each customer, giving developers transparency and visibility. We use Elasticsearch and Kibana to gain visibility into our customers’ HTTP traffic: we take the logs from each proxy in your stack and store them in Elasticsearch so that we can track requests, identify problematic patterns and fix any issues.

Indexes, Types, Documents, Proxies, Fields

Each section.io application has its own Elasticsearch index, which acts as an identifier or tag. This index is used to track the customer application through Elasticsearch and/or Kibana.

Inside each index, every document is further categorized by a _type field, which specifies which proxy log it came from. Every application stack at section.io has an edge proxy and a last proxy, plus the proxies for each module in that particular stack, e.g. a Varnish proxy, an OpenResty proxy, etc. Each _type has its own set of fields, for example geo info, path, URL, user agent, etc. Some of these fields are shared across the logs for different proxies; others are specific to certain proxies.
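
To see how the documents in an index break down by proxy log, one option is a terms aggregation on the _type field. This sketch assumes an Elasticsearch version that still exposes _type (mapping types were phased out in later Elasticsearch releases); the aggregation name by_type and the helper functions are hypothetical.

```python
def docs_per_type_body():
    """Count documents per _type; size 0 returns only the aggregation."""
    return {
        "size": 0,
        "aggs": {"by_type": {"terms": {"field": "_type"}}},
    }


def type_counts(response):
    """Convert aggregation buckets from a search response into {type: count}."""
    buckets = response["aggregations"]["by_type"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}
```

Sending docs_per_type_body() to the application index’s _search endpoint and passing the JSON response to type_counts() gives a count per log type, such as edge-access-log or last_proxy-access-log.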

Edge Proxy / _type: edge-access-log

Every application proxy stack within section.io starts with the edge proxy. When a request comes in, it always hits the edge proxy first, then passes through the proxies for the other modules until it reaches the last proxy. The edge proxy acts as the endpoint to which the user’s web browser connects. It performs several critical actions:

  • Performs the TLS handshake for HTTPS connections
  • Routes requests to the correct application proxy stack
  • Implements the HTTP/2 protocol
  • Request correlation

These functions ensure that your experience on the section.io platform is consistent irrespective of the other proxies in your application’s specific stack, or the order in which they may appear.

IP Geolocation

As each request comes in, the edge proxy attempts to resolve the connecting IP address to a specific geography. The results are then shared, as HTTP request headers, with the other proxies in the application’s proxy stack and with the origin web server. They include the country code (e.g. US for the United States), the country name, the region, the city (where available), the latitude, longitude and postal code and, for U.S. IPs only, the Nielsen Designated Market Area ID as used by DoubleClick. These geolocation fields are always logged in the edge-access-log.

Client IP Detection

The edge proxy adds a True-Client-IP request header to every incoming request. This header contains the IP address of the client that connected to us, which is the same address used for IP geolocation. It can be used for fraud detection, IP whitelisting/blacklisting, rate limiting and logging client usage. In the edge log, this value is recorded as “remote_addr”.
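
On the origin server, the True-Client-IP header can be read with a fallback like the following. This is a minimal Python sketch, assuming request headers arrive as a plain dict; the function names are hypothetical, and the allowlist check uses Python’s ipaddress module to illustrate the whitelisting use case. Only trust this header when the request actually came through the section.io proxy stack, since a client connecting directly to the origin could set it to any value.

```python
import ipaddress


def client_ip(headers, remote_addr):
    """Return the original client IP, preferring True-Client-IP.

    Header names are compared case-insensitively, as HTTP requires. Falls
    back to the socket address if the header is missing or empty.
    """
    for name, value in headers.items():
        if name.lower() == "true-client-ip" and value.strip():
            return value.strip()
    return remote_addr


def is_allowed(ip, allowed_networks):
    """Check an IP against an allowlist of CIDR networks, e.g. "203.0.113.0/24"."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(net) for net in allowed_networks)
```

For example, client_ip({"True-Client-IP": "203.0.113.7"}, "10.0.0.1") returns "203.0.113.7" rather than the proxy’s address.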

Request Correlation

As part of handling each incoming request, the edge proxy generates a unique identifier and adds it to the request as a section-io-id HTTP request header. NB: The format can change without notice, so it should always be treated as an opaque string. This request header travels through each proxy in your stack and on to your application’s origin web server. When the response passes back through the proxy stack to the user agent, the edge proxy inserts the same identifier as a section-io-id HTTP response header. This makes it easy to correlate log entries across all the proxies in the application stack. The section-io-id request header can also be logged on your origin server to help with diagnostics. Every log entry, from the edge proxy through your custom stack of proxies to the last proxy, contains a section-io-id for traceability.
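
Logging the section-io-id on the origin server might look like this WSGI middleware. It is a sketch assuming a Python WSGI application; the class name, logger name and log format are hypothetical. WSGI exposes an incoming section-io-id header as environ["HTTP_SECTION_IO_ID"], and the value is logged verbatim rather than parsed, because its format can change without notice.

```python
import logging

logger = logging.getLogger("origin")


class SectionIoIdLogger:
    """WSGI middleware that logs each request's section-io-id header."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # Treat the ID as an opaque string; "-" marks requests without one.
        request_id = environ.get("HTTP_SECTION_IO_ID", "-")
        logger.info("%s %s section-io-id=%s",
                    environ.get("REQUEST_METHOD"), environ.get("PATH_INFO"),
                    request_id)
        return self.app(environ, start_response)
```

Wrapping an existing application with app = SectionIoIdLogger(app) lets an origin log line be matched against the same section-io-id in Kibana.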

For more information on the Edge Proxy, see our related guide.  

Last Proxy

The last proxy communicates with the customer’s origin server, and is the final proxy in the application stack sequence of proxies.

We capture two types of logs from the last proxy:

_type: last_proxy-access-log

The last proxy access log contains fields related to the upstream response we receive from the origin server. This makes it a good place to look when determining whether an issue originates from section.io or from the origin server itself.

Here are examples of the upstream fields we log:

  • upstream_bytes_received
  • upstream_http_cache_control
  • upstream_http_content_type
  • upstream_request_host
  • upstream_response_time_seconds
  • upstream_status
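
When checking whether errors are coming from the origin, a query against the last proxy access log for 5xx upstream responses is a natural starting point. This sketch assumes upstream_status and upstream_response_time_seconds are mapped as numeric fields and that _type is queryable in your Elasticsearch version; the helper name is hypothetical.

```python
def upstream_errors_body(size=50):
    """Search body for last proxy access log entries with a 5xx upstream status."""
    return {
        "size": size,
        "query": {
            "bool": {
                "filter": [
                    # Only documents from the last proxy access log
                    {"term": {"_type": "last_proxy-access-log"}},
                    # Origin responded with a server error (500-599 inclusive)
                    {"range": {"upstream_status": {"gte": 500, "lte": 599}}},
                ]
            }
        },
        # Slowest origin responses first
        "sort": [{"upstream_response_time_seconds": {"order": "desc"}}],
    }
```

Using filter clauses rather than must clauses skips relevance scoring, which is not meaningful for exact matches like these.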

_type: last_proxy-error-log

The last proxy error log contains any errors generated by nginx when communicating with the origin server; for example, it would log an entry if our last proxy was unable to establish a connection with the origin server.

Use Cases for Elasticsearch and Kibana

Debugging & Visibility

The primary use case for Elasticsearch and Kibana at section.io is debugging; the next most common is visibility.

A 500 error means that something has gone wrong on the server side and needs further investigation. If a customer asks for our help working out the exact cause of a 500 error and supplies the section-io-id from the response headers, we can easily see where the error originated and when it occurred. Searching for the section-io-id in Kibana returns every related document, regardless of its _type. From these documents we can determine which part of the stack generated the error: whether the problem originates from the customer’s server or the error is being served by one of the proxies in the customer’s section.io proxy stack.
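
The same section-io-id lookup can also be done outside Kibana. This sketch runs a free-text query_string search for the ID, quoted as a phrase so that any punctuation in the opaque ID is not parsed as Lucene syntax, and groups the returned documents by _type. It assumes the index’s default search fields include the field holding the ID; the helper names are hypothetical.

```python
from collections import defaultdict


def correlation_body(section_io_id, size=100):
    """Free-text search for one request ID across every document type."""
    # Escape backslashes and double quotes, then wrap the ID as a Lucene phrase.
    escaped = section_io_id.replace("\\", "\\\\").replace('"', '\\"')
    return {"size": size, "query": {"query_string": {"query": f'"{escaped}"'}}}


def group_by_type(response):
    """Group the _source of each hit by its _type, preserving hit order."""
    groups = defaultdict(list)
    for hit in response["hits"]["hits"]:
        groups[hit.get("_type", "unknown")].append(hit["_source"])
    return dict(groups)
```

Seeing which log types contain the ID, for example an error in the last proxy logs alongside a 5xx upstream_status, helps show whether the error came from the origin or from a proxy.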

If we identify a pattern, the logs stored in Elasticsearch help us identify when the problem started, how the server responded and what needs to be done to fix it.

We will always help our customers diagnose any problem they report; however, when we reply, we like to include the logs we’ve been looking at to encourage customers to use Elasticsearch and Kibana themselves, with the ultimate goal of self-reliance.
