This post will describe the Elastic Stack, formerly known as the ELK Stack, and its individual components. In a follow-up post, I’ll demonstrate how to get the ELK Stack up and running.
What is the ELK Stack?
The ELK Stack consists of Elasticsearch, Logstash and Kibana, all developed by Elastic. The company later released Beats and other components, and as the capabilities continued to expand, the ELK Stack was rebranded as the Elastic Stack.
The components in the stack are complementary to each other. This means Logstash can be used to pipe in and process data from multiple sources into Elasticsearch, where it can be stored and indexed. Data in Elasticsearch can then be analyzed and visualized using Kibana. Beats, a family of lightweight data shippers, transports data into either Logstash or Elasticsearch.
What is Elasticsearch?
Elasticsearch is, plain and simple, a search and analytics engine. It excels at storing and indexing data.
Elasticsearch works with all types of data, including textual, numeric, geospatial, structured and unstructured. This makes it highly versatile for many projects, analytics, and applications.
One of its standout specialties is indexing data and providing excellent search capability across your stored data. It exposes a RESTful, JSON-based API. It can also be deployed in a distributed architecture (local, on-premises or cloud), which promotes speed and scalability.
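To make the RESTful, JSON-based interaction concrete, here is a minimal sketch of what a search request looks like over HTTP, using only the Python standard library. The index name `crimes` and the field `description` are hypothetical, and `localhost:9200` is simply Elasticsearch’s default address; actually sending the request requires a running cluster, so that line is left commented out.

```python
import json
from urllib import request

# A search against Elasticsearch is just an HTTP POST to a RESTful
# endpoint, with a JSON body describing the query.
# "crimes" is a hypothetical index name; 9200 is the default port.
url = "http://localhost:9200/crimes/_search"
query = {
    "query": {
        "match": {"description": "burglary"}  # full-text match on one field
    },
    "size": 10,  # return at most 10 hits
}
body = json.dumps(query).encode("utf-8")

req = request.Request(
    url, data=body, headers={"Content-Type": "application/json"}, method="POST"
)

# Sending it requires a live cluster, so the call stays commented out here:
# with request.urlopen(req) as resp:
#     hits = json.load(resp)["hits"]["hits"]

print(req.get_method(), req.full_url)
```

Any HTTP client in any language can speak to Elasticsearch this way, which is a big part of why so many official client libraries exist (see the list near the end of this post).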
Elasticsearch stores its indexes and data in JSON documents, defined by keys and their corresponding values. Values can be strings, numbers, booleans, dates, arrays of values, geospatial coordinates and more. See the official documentation for more details.
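As a quick sketch of what such a document looks like, here is a Python dictionary covering each of the value types mentioned above. The field names are purely illustrative, not a required schema.

```python
import json

# An Elasticsearch-style JSON document: each field is a key whose value
# can be a string, number, boolean, date, array, or geospatial point.
# All field names here are made up for illustration.
doc = {
    "title": "Vehicle theft report",            # string
    "case_number": 20471,                       # number
    "closed": False,                            # boolean
    "reported_at": "2021-03-14T09:26:53",       # date, as an ISO 8601 string
    "tags": ["theft", "vehicle"],               # array of values
    "location": {"lat": 41.88, "lon": -87.63},  # geospatial coordinates
}

# Documents travel to and from Elasticsearch as serialized JSON:
payload = json.dumps(doc)
print(payload)
```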
It can consume data from a variety of sources, though data usually needs processing from its raw, original form first. This means any parsing, normalization and enrichment should be factored into your pipeline. This is where tools like Logstash can help.
When Elasticsearch is populated, the contents are indexed. Its indexing methodology facilitates high-performance queries against all the available data. It was engineered to employ an inverted index data structure, which is designed to provide fast full-text searches. You can also do aggregations to create summaries of your data. This makes it somewhat comparable to a data warehouse or data mart.
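To give a feel for why an inverted index makes full-text search fast, here is a toy sketch of the idea in plain Python. Rather than scanning every document for a term at query time, the index maps each term to the set of documents containing it, so a lookup is a single dictionary access. This is only a conceptual illustration, not how Elasticsearch is implemented internally.

```python
from collections import defaultdict

# Three tiny "documents" keyed by ID.
docs = {
    1: "stolen vehicle recovered downtown",
    2: "vehicle break-in reported",
    3: "downtown burglary investigation",
}

# Build the inverted index: term -> set of document IDs containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

# A full-text lookup is now a dictionary access instead of a full scan:
print(sorted(index["vehicle"]))   # -> [1, 2]
print(sorted(index["downtown"]))  # -> [1, 3]
```

Real search engines layer tokenization, stemming, relevance scoring and compression on top of this basic structure, but the term-to-documents mapping is the core trick.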
For more information on Elasticsearch, please refer to the Elastic site: https://www.elastic.co/elasticsearch/. You can find a full list of features here: https://www.elastic.co/elasticsearch/features.
Is Elasticsearch a Database?
Elasticsearch is not explicitly branded as a database or data warehouse, but it certainly can be used as one. The intended use case is Elasticsearch as an indexing back-end to some other database system (think PostgreSQL, Redis, Cassandra and MongoDB). Given its indexing features and its search capability, it makes sense to use it as a data warehouse or data mart, since updating records is less of a concern in those scenarios. When used in conjunction with Kibana, you get a powerful business intelligence and data visualization combination.
If you are working on a local project for school or a small work project, it may make sense to leverage Elasticsearch as your database. Because it is open source and free, it can save you money. This should appeal to college students and data hobbyists, and it can help your professional development as well. You can also learn quite a bit, thanks to the wealth of available resources and documentation.
What is Logstash?
Logstash is a data ingestion pipeline tool that operates server-side. It can process data from multiple data sources and transform it before directing its output to some storage medium; Elasticsearch is the most natural destination.
Logstash accepts a multitude of data formats and structures as part of its dynamic ingestion capabilities. Using grok, Logstash can impose structure on unstructured data, derive geolocation information from IP addresses, and sanitize or exclude sensitive data. You can ingest logs, metrics, web application data, databases, and various AWS services.
Data can be streamed continuously regardless of source, and Logstash can handle multiple sources simultaneously.
You can define what and how your data gets filtered, parsed and transformed. This ETL process and functionality of Logstash facilitates working with structured, semi-structured and unstructured data in many different formats.
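To illustrate the parse-and-transform step described above, here is a rough Python analogue of what a grok filter does: a pattern with named groups turns an unstructured log line into structured fields, and a follow-up step coerces types. The log format and field names are illustrative; in a real pipeline you would express this in a Logstash filter block rather than Python.

```python
import re

# One unstructured line from a common web-server access-log format.
line = ('203.0.113.7 - - [14/Mar/2021:09:26:53 +0000] '
        '"GET /index.html HTTP/1.1" 200 1043')

# Named groups play the role of grok's %{PATTERN:field} captures.
pattern = re.compile(
    r'(?P<client_ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d+) (?P<bytes>\d+)'
)

match = pattern.match(line)
event = match.groupdict()

# A small transform step: coerce numeric fields from strings to ints.
event["status"] = int(event["status"])
event["bytes"] = int(event["bytes"])
print(event)
```

The resulting `event` dictionary is exactly the kind of structured record you would ship onward to Elasticsearch for indexing.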
While Elastic would suggest sending your data into Elasticsearch, you have the flexibility to route it to a number of different storage destinations or services. They provide a list of available output plugins here.
Elastic also indicates that it’s relatively easy to extend Logstash if you are working on something custom or unique that isn’t available…yet. You can monitor and visualize the activity and performance of your Logstash operations in Kibana.
What is Kibana?
Kibana is a real-time interactive data visualization tool that allows users to create visualizations and dashboards. It also provides the front-end interface to navigate and query your data.
The visualizations are easy to create and the tools are intuitive. Drag and drop fields where you think they should be. Modify your chart to see it in a different format. There is even a tool to suggest chart types based on your data.
You can filter your query through selections, as well as see where data corresponds among different charts.
Build dashboards from multiple indexed sources for content-rich graphics. You can export your visualizations to PDF and PNG, as well as the underlying data to CSV. You can also embed them or share via link.
Unlike Tableau or some other BI visualization tool, you aren’t going to use Kibana without your data first being in Elasticsearch. All in all, it’s a nice ecosystem that requires little programming to get up and running on your local machine to tinker around or use for a data project.
How do I use and implement the ELK stack?
First, I will demonstrate getting started with the ELK stack in a separate post. Then, I’ll further demonstrate using the stack to ingest, process, store and visualize crime data I gathered in my Criminal Analysis project. In future projects, I’ll also try to demonstrate more capabilities as I continue to explore and apply the tools available.
Supported Programming Languages
Here is a list of currently supported languages for interacting with the ELK Stack:
- Java
- JavaScript (Node.js)
- Go
- .NET (C#)
- PHP
- Perl
- Python
- Ruby
Although not stated on their page, you can find resources for interacting with Elasticsearch from R. Here are links to those packages:
- https://github.com/ropensci/elastic
- https://github.com/AlexIoannides/elasticsearchr
Official References
For more information, please visit the Elastic site:
- https://www.elastic.co/
- Elasticsearch
- https://www.elastic.co/elasticsearch/
- https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
- Logstash
- https://www.elastic.co/logstash/
- https://www.elastic.co/guide/en/logstash/current/index.html
- Kibana
- https://www.elastic.co/kibana/
- https://www.elastic.co/guide/en/kibana/current/index.html
- Beats
- https://www.elastic.co/beats/