How To Monitor AWS Elasticsearch

Elasticsearch provides quick search experiences for your applications, websites, and data lake catalogs. Learn key metrics to monitor and more!

June 22, 2020

What Is Elasticsearch?

Elasticsearch offers a way to provide quick search experiences for your applications, websites, and data lake catalogs. It can also be used to collect and monitor logs from your infrastructure and applications. Amazon Elasticsearch Service is AWS’s managed service based on Elastic’s open-source Elasticsearch, which is described as an “open-source, distributed, RESTful search engine.” It is designed to provide fast, relevant, and scalable searching for your data.

LogicMonitor Banking Services Map dashboard

AWS CloudWatch Metrics

Elasticsearch publishes data points to Amazon CloudWatch for your Elasticsearch instances. CloudWatch enables you to retrieve statistics about those data points as an ordered set of time-series data, known as metrics. For the Elasticsearch service, Amazon lists a few basic metrics along with recommended CloudWatch alarms.
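As a rough sketch of how these data points might be requested, the snippet below builds the parameters for CloudWatch's GetMetricStatistics call. It assumes the `AWS/ES` namespace with the `DomainName` and `ClientId` dimensions. The helper name `build_metric_request`, the domain name, and the account ID are placeholders for illustration, not values from this article.

```python
from datetime import datetime, timedelta, timezone


def build_metric_request(metric_name, domain_name, client_id,
                         statistic="Maximum", period_seconds=300,
                         lookback_minutes=60, now=None):
    """Build keyword arguments for a CloudWatch GetMetricStatistics request."""
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(minutes=lookback_minutes)
    return {
        "Namespace": "AWS/ES",  # assumed namespace for the Elasticsearch service
        "MetricName": metric_name,
        "Dimensions": [
            {"Name": "DomainName", "Value": domain_name},
            {"Name": "ClientId", "Value": client_id},  # AWS account ID
        ],
        "StartTime": start,
        "EndTime": end,
        "Period": period_seconds,
        "Statistics": [statistic],
    }


# Placeholder domain and account ID.
request = build_metric_request("CPUUtilization", "my-domain", "123456789012")
print(request["Namespace"], request["MetricName"])
```

With boto3 installed and credentials configured, the dictionary could be passed along as `boto3.client("cloudwatch").get_metric_statistics(**request)`.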

Why Should I Add My Own Metrics?

LogicMonitor introduced Complex Datapoints to make metrics easier to understand, whether as values over time or as percentages. Tracking available capacity using raw values such as ClusterUsedSpace is difficult. A complex datapoint that calculates the percentage is much more meaningful and easier to alert on.

ClusterUsedSpacePercent = (ClusterUsedSpace/(ClusterUsedSpace + FreeStorageSpace))*100
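To make the arithmetic concrete, here is a minimal sketch of the same calculation in Python. The function name and the sample figures are hypothetical, and returning 0 when no storage is reported is an assumption rather than documented behavior.

```python
def cluster_used_space_percent(cluster_used_space, free_storage_space):
    """Percentage of total cluster storage in use; both inputs share one unit."""
    total = cluster_used_space + free_storage_space
    if total <= 0:
        return 0.0  # assumption: report 0 when no storage data is available
    return cluster_used_space * 100 / total


# Hypothetical sample: 170,000 units used and 30,000 units free.
print(cluster_used_space_percent(170_000, 30_000))  # 85.0
```

The raw value 170,000 says little on its own, while 85% maps directly onto an alert threshold.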

CloudWatch enables you to retrieve HTTP error codes 4xx and 5xx as raw values. Consider this question: is a value of 2 for the 5xx metric good or bad? And how good (or bad) is it? Usually, when I ask this question, the answer is, “It depends on how many requests there were.” In this case, too, a complex datapoint that calculates the percentage is much more meaningful and easier to alert on.

RequestCount5xxPercent = (5xx / (5xx + 4xx + 3xx + 2xx))*100 if (5xx + 4xx + 3xx + 2xx) > 0 else 0

RequestCount4xxPercent = (4xx / (5xx + 4xx + 3xx + 2xx))*100 if (5xx + 4xx + 3xx + 2xx) > 0 else 0
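The "it depends how many requests" point can be illustrated with a short sketch. The function name and the request counts are made up for the example, and the result is scaled by 100 so that it reads as a percentage.

```python
def request_percent(target, counts):
    """Share of all requests (2xx to 5xx) in the `target` class, as a percentage."""
    total = sum(counts.get(k, 0) for k in ("2xx", "3xx", "4xx", "5xx"))
    if total <= 0:
        return 0.0  # no requests in the interval, so there is nothing to divide by
    return counts.get(target, 0) * 100 / total


# Two hypothetical intervals, both with exactly two 5xx responses.
quiet = {"2xx": 8, "3xx": 0, "4xx": 0, "5xx": 2}
busy = {"2xx": 19_990, "3xx": 0, "4xx": 8, "5xx": 2}
print(request_percent("5xx", quiet))  # 20.0
print(request_percent("5xx", busy))   # 0.01
```

The raw 5xx count is 2 in both intervals, yet one represents a fifth of all traffic and the other is barely noticeable.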

What Are the Key Metrics?

Elasticsearch cluster monitoring dashboard in LogicMonitor

ClusterUsedSpacePercent (Complex Datapoint)

A complex datapoint that calculates the percentage of the cluster space that is used. By default, LogicMonitor recommends a warning at 85% used and an error at 95% used.

ClusterUsedSpacePercent = (ClusterUsedSpace/(ClusterUsedSpace + FreeStorageSpace))*100

CPUUtilization

The average maximum percentage of CPU resources used for data nodes in the cluster. LogicMonitor recommends a warning at 85% utilization and an error at 95% utilization.

ClusterStatusRed

This indicates that the primary and replica shards of at least one index are not allocated to nodes in a cluster. LogicMonitor recommends an error if this value is not 0.

ClusterStatusYellow

This indicates that the primary shards for all indices are allocated to nodes in the cluster, but the replica shards of at least one index are not. LogicMonitor recommends an error if this value is not 0. Please note that if you do not use replication for your Elasticsearch instances, you will want to edit this metric so that it does not alert.

JVMMemoryPressure

This metric shows the maximum percentage of the Java heap used for all data nodes in the cluster. LogicMonitor recommends a warning if this value is above 80%.
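The recommendations above can be collected into a small lookup table. The sketch below is only an illustration of static threshold evaluation, not how LogicMonitor implements alerting. The names `THRESHOLDS` and `evaluate` are hypothetical, and treating every limit as inclusive is an assumption.

```python
# (warning, error) limits from the recommendations above; None means not set.
THRESHOLDS = {
    "ClusterUsedSpacePercent": (85, 95),
    "CPUUtilization": (85, 95),
    "JVMMemoryPressure": (80, None),
}
# Status metrics raise an error on any non-zero value.
STATUS_METRICS = ("ClusterStatusRed", "ClusterStatusYellow")


def evaluate(metric, value):
    """Return 'ok', 'warning', or 'error' for one metric sample."""
    if metric in STATUS_METRICS:
        return "error" if value != 0 else "ok"
    warning, error = THRESHOLDS[metric]
    if error is not None and value >= error:  # assumption: limits are inclusive
        return "error"
    if warning is not None and value >= warning:
        return "warning"
    return "ok"


# Hypothetical samples.
samples = {"ClusterUsedSpacePercent": 88.0, "CPUUtilization": 97.0,
           "ClusterStatusRed": 0, "JVMMemoryPressure": 72.5}
for name, value in samples.items():
    print(name, evaluate(name, value))
```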

What Should I Enable Anomaly Detection Dynamic Threshold On?

Dynamic and static thresholds are expected for the key metrics, but anomaly detection has other use cases as well.

Example 1

Enabling Anomaly Detection for Query and Fetch Latency is a way to detect potential degradation of a service and to get advance warning when there is a shift from expected operation.

Detecting potential degradation in LogicMonitor using anomaly detection.

Example 2

Enabling Anomaly Detection for Query and Fetch count is a way to detect potentially abnormal load on the system (or the onboarding of new customers).

Users can compare current load with last month to spot additional anomalies in LogicMonitor.
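Both examples come down to comparing a new sample against a range learned from recent history. The sketch below uses a simple band of the mean plus or minus a few standard deviations. This is a generic stand-in chosen for illustration and is not LogicMonitor's anomaly detection algorithm. The function names and the latency samples are hypothetical.

```python
from statistics import mean, stdev


def expected_band(history, k=3.0):
    """Return (lower, upper) as mean +/- k sample standard deviations of history."""
    center = mean(history)
    spread = stdev(history)  # requires at least two samples
    return center - k * spread, center + k * spread


def is_anomalous(value, history, k=3.0):
    """True when value falls outside the band derived from history."""
    lower, upper = expected_band(history, k)
    return value < lower or value > upper


# Hypothetical latency samples in milliseconds.
latency_history = [20, 22, 19, 21, 23, 20, 22, 21]
print(is_anomalous(21, latency_history))  # False
print(is_anomalous(60, latency_history))  # True
```

Unlike a static threshold, the band moves with the data, so the same check applies to latency and to request counts without choosing a fixed limit for each.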

Disclaimer: The views expressed on this blog are those of the author and do not necessarily reflect the views of LogicMonitor or its affiliates.
