No one wants to get paged in the middle of the night for an issue or failure within their infrastructure. When this does happen, IT operations engineers need to be able to quickly and confidently identify where the fire is and how to put it out to minimize negative impact. The root cause analysis (RCA) feature within LogicMonitor’s new AIOps Early Warning System makes this easier than ever. RCA intelligently identifies the root cause when an issue occurs, enabling IT operations engineers to focus on solving the issue quickly instead of searching for it. Because LogicMonitor can monitor pretty much anything (cloud, containers, network, servers, storage, virtualization, and more), RCA can help reduce downtime even for complex hybrid infrastructures.
RCA uses automatically discovered topology relationships between monitored resources to establish dependencies between those resources. When a monitored resource becomes unreachable, those dependencies are used to identify the root cause and the impacted dependent resources. Alert notifications routed to IT operations engineers are limited to the root cause alert and include information about impacted dependent resources, which prevents the alert storm these engineers would normally receive in such a scenario. By preventing alert storms that obscure the root cause, RCA helps shorten time to resolution and minimize downtime.
Normally, such an outage would result in dozens of alerts, one for each unreachable device, but with RCA the originating cause alert is identified, dependent alerts are grouped, and alert notifications for dependents are disabled.
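To make the idea concrete, here is a minimal sketch of dependency-based alert grouping. It is not LogicMonitor's implementation: the function names, the resource names, and the simplification that each resource depends on at most one upstream resource are all assumptions made for illustration.

```python
# Illustrative sketch only; not LogicMonitor's actual algorithm.
# Assumption: each resource depends on at most one upstream resource.

def root_cause_of(resource, depends_on, unreachable):
    """Walk upstream while the parent is also unreachable."""
    seen = {resource}
    current = resource
    while True:
        parent = depends_on.get(current)
        # Stop at the topmost unreachable resource (or on a dependency cycle).
        if parent is None or parent not in unreachable or parent in seen:
            return current
        seen.add(parent)
        current = parent


def group_alerts(depends_on, unreachable):
    """Map each root cause to the dependents whose notifications are suppressed."""
    groups = {}
    for resource in sorted(unreachable):
        root = root_cause_of(resource, depends_on, unreachable)
        dependents = groups.setdefault(root, [])
        if root != resource:
            dependents.append(resource)
    return groups


# Hypothetical topology: child -> the resource it depends on.
topology = {
    "switch-a": "core-router",
    "switch-b": "core-router",
    "server-1": "switch-a",
    "server-2": "switch-a",
    "server-3": "switch-b",
}

# switch-a fails, so everything behind it becomes unreachable too.
down = {"switch-a", "server-1", "server-2"}

groups = group_alerts(topology, down)
for root, dependents in groups.items():
    print(f"Notify: {root} is the root cause; suppressed: {dependents}")
```

In this example only one notification would go out, for switch-a, with server-1 and server-2 listed as impacted dependents instead of raising alerts of their own.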
While the initial release relies on the reachability of monitored resources, future enhancements will allow resource dependencies to be configured more granularly, beyond the LogicMonitor-provided defaults.
So what does RCA have to do with automated remediation? Identifying the root cause is only the beginning; we want to provide the ability to automate actions that remediate the root cause issue. This will close the loop from intelligently identifying and predicting issues to automatically fixing and preventing them. It will not only further reduce downtime but also save IT operations teams valuable time, enabling them to spend more time innovating and less time reacting to problems. RCA is just one step in this direction, but with this first step, we aim to cut through the noise and give you more nights of uninterrupted sleep. Reach out to the LogicMonitor team for more information or to get started on a free trial!