
Never run out of cold brewed coffee again with Elasticsearch alerts

Using Elasticsearch alerts to tell us when we’re “almost empty”

At ObjectRocket, we fuel our office with cold brew coffee on tap. It’s hot here in Austin, TX in August. We need our caffeine cold.

Since we only keep a single keg at a time, scheduling our orders without running out can be tricky, so when someone in the ObjectRocket office mentioned that they “wished we had better data on our cold brew usage” I was all over it. I’m always looking for some new way to show what the Elastic Stack can do, so I set out to create a semi-autonomous monitoring and alerting system so we would never run out of cold brew again.

What we ended up with is a combination of a Raspberry Pi, Elasticsearch, Kibana, and Sentinl/ElastAlert to fire alerts to Slack when something important happens to the keg. Sentinl and ElastAlert were the stars of the show so we’ll dig into those later.

In the first of this two-part series, we’ll explore how I created an alert system with the data captured from our kegerator solution. My hope is that you walk away with some ideas and some knowledge about the open source alerting options for Elasticsearch. I’ll save the details on how I physically built the system for the next post.

Data (not just cold brew) from the Keg

As I noted, we’ll discuss the kegerator build-out in the next blog post, including why we chose a scale-based keg-weight solution instead of a flow-rate monitoring system.

All you need to know for this blog is that I set up a Raspberry Pi-powered scale that sends a regular stream of documents to Elasticsearch in a format like below.


{
  "_index": "filebeat-6.2.4-2018.05.15",
  "_type": "doc",
  "_source": {
    "@timestamp": "2018-05-15T16:50:49.000Z",
    "beat": {
      "hostname": "raspberrypi",
      "name": "raspberrypi",
      "version": "6.2.4"
    },
    "weight": 58.4,
    "message": "2018-05-15T16:50:49+0000 - -0.4"
  }
}

Filebeat grabs the data and includes the two main fields we’re interested in: the @timestamp of the reading and the weight itself.

Alerting Options

First things first: to take action when something is up with the keg, I needed a way to send notifications to key persons. That’s where an alerting package comes into play.

Luckily, there are a couple of Apache 2.0-licensed options out there for adding alerting to an Elasticsearch cluster: ElastAlert and Sentinl.

ElastAlert
ElastAlert is a flexible alerting framework for Elasticsearch created by Yelp. It runs separately from Elasticsearch and is configured mainly through plain config files: a main config file holds global parameters shared by all alerts, and each rule gets its own YAML file that configures the rule and the resulting alerts. Each rule pairs an Elasticsearch query with a rule type and one or more alerts.
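For a sense of what the global side looks like, here’s a minimal sketch of a config.yaml; the host, port, and folder values are placeholders rather than anything from our environment:

# Minimal ElastAlert config.yaml sketch (values are placeholders)
rules_folder: rules            # directory containing one YAML file per rule
run_every:                     # how often ElastAlert queries Elasticsearch
  minutes: 1
buffer_time:                   # default window of data to query (rules can override)
  minutes: 15
es_host: elasticsearch.example.com
es_port: 9200
writeback_index: elastalert_status   # where ElastAlert stores its own state and alert log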

Sentinl
Sentinl is a bit newer than ElastAlert. ObjectRocket stumbled across Sentinl (created by Siren Solutions) while looking for new alerting options for customers on the ObjectRocket Service. Sentinl is a Kibana plugin that offers reporting in addition to alerting for Elasticsearch. It calls each job a “watcher” (not to be confused with an older plugin of the same name), and each watcher includes a schedule, an input query, a condition, and one or more actions.

 

Which Alerting Option is Right for You?

Both are really great, so it just depends on what types of alerts you’re looking for. Here are my findings:

ElastAlert

Pros
  • Quicker to ramp up on, thanks to predefined rule types
  • Larger number of baked-in alerting options
  • Helpful testing tools included

Cons
  • Less flexible out of the box, since it uses predefined rules
  • Runs separately (not a plugin)
  • No official UI (though an awesome plugin from bitsensor exists)

Sentinl

Pros
  • Super flexible: if you can query it in Elasticsearch, you can alert on it
  • Slick UI that integrates with Kibana
  • Also includes reporting

Cons
  • Writing rules requires more “code”
  • Fewer alerting options
  • Since it’s a plugin, you inherit Kibana version baggage

 

Creating the Alerts

Now I needed to define what types of alerts to create. What I ultimately wanted to know was when we should order more cold brew, so I needed alerts that told us:

Keg Empty / Nearly empty
Pretty simple: report if the weight has fallen below a certain threshold.

Keg has been replaced
Aha, there’s fresh coffee: detects when there is a sharp increase in weight.

Monitoring is broken
No data, no alerts: we need to know when the scale has stopped sending data.

Keg Empty / Nearly Empty

In this scenario, I fire an alert if the weight falls below the threshold we’ve designated as “empty”, and a warning if the weight indicates there’s only about 20% of the keg remaining.

ElastAlert
ElastAlert makes this pretty easy with the metric aggregation rule type: create an aggregation from a metric and then determine whether it’s above or below a specific threshold.


# (Required)
# Rule name, must be unique
name: Empty Alarm

# (Required)
# Type of alert.
type: metric_aggregation

# (Required)
# Index to search, wildcard supported
index: filebeat-*

# How much data should we use
buffer_time:
  hours: 1

# How often can we send this alert?
realert:
  hours: 24

# Type of elasticsearch document to use
doc_type: doc

metric_agg_key: weight
metric_agg_type: max
min_threshold: 41

# (Required)
# The alert to use when a match is found
alert:
- "slack"
alert_subject: The cold brew keg is empty
alert_text_type: alert_text_only
alert_text: "The cold brew keg is empty. Panic."
slack_webhook_url: "https://hooks.slack.com/services/foo/bar"
slack_msg_color: danger
slack_emoji_override: ":torch-and-pitchfork:"

I set it up to look over the past hour (buffer_time) for a max weight below 41 (an empty keg), and to send the alert at most once every 24 hours (realert). From there, I can configure the Slack alert: the type of data to send, and even specifics like the message color and emoji.

It gets a little trickier for the warning alert. I only want to know when the weight is within the warning band: above 41 (empty) but below 65 (the roughly 25% mark where we should start warning that we’re almost out of cold brew). Since ElastAlert only gives a single threshold, I use its filters to handle this.


metric_agg_key: weight
metric_agg_type: avg
min_threshold: 65

filter:
- range:
    weight:
      gte: 41

I filter to only evaluate weights greater than or equal to 41: if every reading were below 41, I don’t want this alert to trigger (the “empty” alert will fire instead). If at least some readings are 41 or greater, the “empty” alert won’t trigger, and this rule can check whether their average meets the below-65 requirement.

Sentinl
Sentinl provides an easy-to-use GUI for configuring the watcher.

On the first screen, just name your watcher and set the schedule.

The Input is an Elasticsearch query that grabs the data you want: in this case, any docs that include a weight field, with an average aggregation on weight.

On the Condition screen, you determine what triggers an alert: at least some hits must be returned, and the average weight must be below 41.

Here’s a console alert (which also stores a document in a special index) that fires off a message when the condition is true.
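Put together, the input and condition for the empty-keg watcher end up looking roughly like the sketch below; the aggregation name, time window, and exact thresholds are illustrative assumptions rather than a copy of our configuration (the schedule and action are set on the other screens):

{
  "input": {
    "search": {
      "request": {
        "index": [
          "filebeat-*"
        ],
        "body": {
          "query": {
            "bool": {
              "filter": [
                { "exists": { "field": "weight" } },
                { "range": { "@timestamp": { "gt": "now-15m/m" } } }
              ]
            }
          },
          "aggs": {
            "avg_weight": {
              "avg": { "field": "weight" }
            }
          }
        }
      }
    }
  },
  "condition": {
    "script": {
      "script": "payload.hits.total > 0"
    },
    "compare": {
      "payload.aggregations.avg_weight.value": {
        "lte": 41
      }
    }
  }
}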

Sentinl comes with a UI, but you end up writing (slightly) more query code. It’s mostly just Elasticsearch queries, though, so chances are you’re already comfortable with them.

Keg has been refilled

Now I need a way to send a general announcement to the office that the keg has been refilled. Since we sometimes receive different sized kegs, rather than just looking for a level above a certain threshold, I wanted to look for a sharp increase in weight.

ElastAlert
ElastAlert includes a built-in rule type called “spike” that can detect a sharp upward change in the weight of the keg:


name: Refill Detector
type: spike
index: filebeat-*

# Use the value of the weight field rather than the number of matching documents
field_value: weight

# Fire when the current window is more than 2x the reference window
spike_height: 2
spike_type: 'up'

# Size of each comparison window
timeframe:
  minutes: 10

# Each window must contain at least 100 samples before the rule is evaluated
threshold_ref: 100
threshold_cur: 100

# Don't fire this alert more than once every 6 hours
realert:
  hours: 6

This rule looks at the values in the weight field and buckets them into 10-minute windows. If the mean weight for any window is more than 2x (spike_height) the mean of the previous 10-minute window, the alert fires. Also note that for a window to be “valid” it must contain at least 100 samples (the threshold_ref and threshold_cur settings), and that the alert can’t fire more often than every 6 hours.

Sentinl
On the Sentinl side it’s a little more involved: we used the serial difference aggregation to check whether any bucket shows a large increase over the preceding buckets:


{
  "input": {
    "search": {
      "request": {
        "index": [
          "filebeat-*"
        ],
        "body": {
          "query": {
            "bool": {
              "filter": [
                {
                  "range": {
                    "@timestamp": {
                      "gt": "now-5m/m"
                    }
                  }
                }
              ]
            }
          },
          "aggs": {
            "30s_buckets": {
              "date_histogram": {
                "field": "@timestamp",
                "interval": "30s"
              },
              "aggs": {
                "weight_avg": {
                  "avg": {
                    "field": "weight"
                  }
                },
                "weight_diff": {
                  "serial_diff": {
                    "buckets_path": "weight_avg",
                    "lag": 3
                  }
                }
              }
            },
            "max_weight_diff": {
              "max_bucket": {
                "buckets_path": "30s_buckets>weight_diff"
              }
            }
          }
        }
      }
    }
  },
  "condition": {
    "script": {
      "script": "payload.hits.total > 20"
    },
    "compare": {
      "payload.aggregations.max_weight_diff.value": {
        "gte": 40
      }
    }
  }
}

This “watcher” scans through the last 5 minutes, separates the readings into 30-second buckets, then checks whether the difference across 3 buckets (the serial_diff lag) is an increase of at least 40. Once again, it’s a little more complicated than the ElastAlert equivalent, but it works pretty well.

I haven’t shown the third alert here, the one that watches for a broken data feed, but in both tools it’s very simple. ElastAlert’s flatline rule type handles it nicely, and in Sentinl it’s as simple as grabbing all weight readings in a window and counting the hits. If anyone wants to see the code, let us know.
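For the curious, a minimal sketch of the ElastAlert side might look something like this; the rule name, threshold, and window are assumptions rather than our exact configuration:

name: Scale Stopped Reporting
type: flatline
index: filebeat-*

# Fire if fewer than this many documents arrive within the timeframe
threshold: 1
timeframe:
  minutes: 30

# Don't re-send more than once every 6 hours
realert:
  hours: 6

alert:
- "slack"
alert_subject: The keg scale has stopped sending data
alert_text_type: alert_text_only
alert_text: "No weight readings in the last 30 minutes. Check the Raspberry Pi."
slack_webhook_url: "https://hooks.slack.com/services/foo/bar"
slack_msg_color: warning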

Using the Alert Data

A bonus of both alerting systems is that the alerts themselves are stored in Elasticsearch, so you can query the alert log and build visualizations on top of it.

Refills

I use the refill alerts to plot when we last saw a refill on our Kibana dashboard. Rather than digging through the raw weight data to work out the last refill time, I can just look at the most recent refill alert. Sentinl, for example, stores all alerts in local date-based indexes named “watcher_alerts-*”.

To create a visualization of the last refill, filter the events by the “watcher” field so you only get refill events, then select the max date.
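In query form, that boils down to something like the sketch below, run against the watcher_alerts-* indexes; the watcher name is hypothetical, and depending on your mapping you may need to target a keyword sub-field instead:

{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "watcher": "refill_detector" } }
      ]
    }
  },
  "aggs": {
    "last_refill": {
      "max": { "field": "@timestamp" }
    }
  }
}

A Kibana metric visualization with the same filter and a Max aggregation on @timestamp gives you the same answer without writing the query by hand.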

Daily Reporting

Another great option is to use a regularly scheduled alert to create roll-up-like behavior that you can report on later.

For example: if I want to see the amount of daily consumption, I look at decreases in weight from day to day, filter out large spikes (due to refills or other events), and then convert weight to ounces.

This can be a little tricky with a single Kibana visualization. (However, Vega looks promising.)

So instead, I created a daily alert that runs every night just before midnight and records the sum of that day’s consumption. Then a simple visualization reads those daily sums from the “watcher_alerts” indexes.
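Assuming the nightly watcher writes its result into a numeric field on the alert document, say daily_ounces (that field name and the watcher name are purely illustrative), the visualization’s query is just a date histogram over the alert indexes:

{
  "size": 0,
  "query": {
    "term": { "watcher": "daily_consumption" }
  },
  "aggs": {
    "per_day": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "1d"
      },
      "aggs": {
        "ounces_consumed": {
          "max": { "field": "daily_ounces" }
        }
      }
    }
  }
}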

The Final Results

With everything in place, Slack now tells us when to order more cold brew kegs. And the alerts don’t just go to IT, admins, or some other select few: the entire office knows when a new keg arrives.

No more cold sweats for cold brew here: staff happiness and productivity are at an all-time high.

Elasticsearch + Sentinl/ElastAlert alerting FTW!
