Is your Collector Status healthy? 6 steps to resolve issues fast

Learn how the LogicMonitor Collector works, and quick tips on how to bring downed Collectors back up with six troubleshooting steps.

Help! My Collector is Down: Troubleshoot in 6 Steps

At the heart of LogicMonitor’s monitoring solution is the LogicMonitor Collector, a crucial application that gathers device data and sends it to the LogicMonitor platform. Collector Status, a real-time monitoring feature, tracks the health and performance of Collectors and helps ensure continuous data collection by alerting on potential issues before they escalate. When issues arise, understanding the Collector Status is key to resolving them quickly.

This guide walks through steps for troubleshooting issues related to the Collector Status, ensuring that the monitoring setup remains reliable and effective.

Key takeaways

  • Ensuring the LogicMonitor Collector and Watchdog services are running is critical for data flow.
  • Correct credentials and permissions, along with connectivity to the LogicMonitor servers, are essential for the designated Collector to communicate with devices and the platform.
  • Antivirus software can interfere with the Collector, so ensure the necessary exclusions are set.
  • Setting up resilient monitoring, such as backup Collectors or Auto-Balanced Collector Groups, helps prevent disruptions so that real-time insights into Collector health and performance keep flowing.

What is Collector Status?

Collector Status provides real-time insights into the health and performance of LogicMonitor Collectors. It tracks essential metrics such as CPU load, memory usage, and network connectivity, and notifies users about potential issues before they escalate into major problems. Regularly monitoring the Collector Status helps prevent downtime, optimize performance, and keep data collection continuous, and it makes it possible to tailor solutions to the environment.

Collector Status is the first line of defense in identifying and solving monitoring issues.

Step 1: Check the Collector and Watchdog services

The first step in troubleshooting is to validate that the LogicMonitor Collector and Watchdog services are running properly on the host machine. These services are essential for maintaining communication between devices and the LogicMonitor platform. If either service is down, the status of the Collector will reflect this, and gaps in monitoring data may become apparent.

  • Action: Verify that both services are active by checking the status on the host machine. If they are not running, attempt to restart them. If the services fail to start, investigate further by checking operating system logs or updating the services.
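
The service check in this step can be scripted. The following is a minimal sketch, not an official LogicMonitor tool. It assumes the services are registered as logicmonitor-agent and logicmonitor-watchdog (confirm the names on your own host), and that a Linux host uses systemd. On Windows it reads the STATE line printed by the built-in `sc query` command.

```python
import platform
import re
import subprocess

# Assumed service names; confirm them on your own Collector host.
SERVICES = ("logicmonitor-agent", "logicmonitor-watchdog")


def parse_sc_state(sc_output: str) -> str:
    """Extract the state word (e.g. RUNNING, STOPPED) from `sc query` output."""
    match = re.search(r"STATE\s*:\s*\d+\s+(\w+)", sc_output)
    return match.group(1) if match else "UNKNOWN"


def service_is_running(name: str) -> bool:
    """Return True if the named service reports as running on this host."""
    try:
        if platform.system() == "Windows":
            result = subprocess.run(
                ["sc", "query", name], capture_output=True, text=True
            )
            return parse_sc_state(result.stdout) == "RUNNING"
        # On systemd-based Linux hosts, `systemctl is-active` prints "active".
        result = subprocess.run(
            ["systemctl", "is-active", name], capture_output=True, text=True
        )
        return result.stdout.strip() == "active"
    except OSError:
        # The service manager command itself is unavailable on this host.
        return False


if __name__ == "__main__":
    for service in SERVICES:
        state = "running" if service_is_running(service) else "NOT running"
        print(f"{service}: {state}")
```

If either service reports as not running, restart it with the host's usual service tooling and check the operating system logs as described above.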

Learn more about troubleshooting and managing Collector services.

Step 2: Verify credentials and permissions

Incorrect credentials or insufficient permissions can cause the Collector to fail to communicate with your monitored devices, which will be reflected in the Collector Status. This is a common issue, particularly in Windows environments.

  • Action: Ensure that the credentials the Collector and Watchdog services use have the correct permissions. The Collector service account should have the “Log on as a service” right under Local Policies > User Rights Assignment in the host OS. If not using an account on the same domain, ensure the local administrator credentials are correct by verifying the wmi.user and wmi.password properties in LogicMonitor. This will help maintain a healthy Collector Status.
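
Before checking user rights, it helps to confirm which account the Collector service actually runs as. The sketch below is one way to do that on Windows by reading the SERVICE_START_NAME field printed by the built-in `sc qc` command. The service name logicmonitor-agent is an assumption; substitute the name registered on your host.

```python
import platform
import re
import subprocess
from typing import Optional

# Assumed service name; confirm it on your own Collector host.
COLLECTOR_SERVICE = "logicmonitor-agent"


def parse_service_account(sc_qc_output: str) -> Optional[str]:
    """Return the logon account from `sc qc` output, or None if absent."""
    match = re.search(r"SERVICE_START_NAME\s*:\s*(.+)", sc_qc_output)
    return match.group(1).strip() if match else None


if __name__ == "__main__" and platform.system() == "Windows":
    try:
        result = subprocess.run(
            ["sc", "qc", COLLECTOR_SERVICE], capture_output=True, text=True
        )
        account = parse_service_account(result.stdout)
        print(f"{COLLECTOR_SERVICE} runs as: {account or 'unknown'}")
    except OSError as exc:
        print(f"Could not query the service: {exc}")
```

The account it prints is the one to look for under “Log on as a service” in the host's user rights settings.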

Step 3: Check the Collector connection to LogicMonitor servers

A common reason for a degraded Collector Status is connectivity issues. The LogicMonitor Collector needs to connect to LogicMonitor’s cloud servers over port 443 using HTTPS/TLS. If this connection is interrupted, the Collector cannot send data, and monitoring will be disrupted.

  • Action: Test the connectivity from the Collector host to LogicMonitor’s cloud servers. Do this by accessing the LogicMonitor portal from a web browser on the Collector host. Ensure that firewall rules and whitelists (if using IP address whitelisting instead of DNS) are up to date to allow traffic over port 443.

Get detailed instructions on monitoring Collector connectivity and health.
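
On a Collector host without a web browser, the same connectivity test can be approximated from a script. This is a minimal sketch using only the Python standard library: it opens a TCP connection on port 443 and completes a TLS handshake with certificate verification. The hostname is passed on the command line; which hostname to test (typically your portal address) is left to you and is not something this sketch can determine.

```python
import socket
import ssl
import sys
from typing import Tuple


def check_tls(host: str, port: int = 443, timeout: float = 5.0) -> Tuple[bool, str]:
    """Try a TCP connection plus TLS handshake; return (ok, detail)."""
    context = ssl.create_default_context()  # verifies the certificate and hostname
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw_sock:
            with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
                return True, f"connected using {tls_sock.version()}"
    except OSError as exc:
        # Covers DNS failures, refused connections, timeouts, and TLS errors.
        return False, f"{type(exc).__name__}: {exc}"


if __name__ == "__main__" and len(sys.argv) > 1:
    ok, detail = check_tls(sys.argv[1])
    print(("OK: " if ok else "FAILED: ") + detail)
```

A DNS or timeout failure usually points at firewall or proxy rules, while a certificate error may indicate that something on the network is intercepting TLS traffic.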

Understanding how the Collector communicates with LogicMonitor is key to resolving downtime quickly.

Step 4: Review antivirus software settings

Antivirus software can sometimes interfere with the Collector’s operation by blocking necessary files or processes. This can lead to a poor Collector Status as the Collector may not be able to perform its functions correctly.

  • Action: Check antivirus software settings and ensure the LogicMonitor directory is added as an exclusion (C:\Program Files (x86)\LogicMonitor\ by default). This will prevent the antivirus from blocking the Collector’s operations, helping to maintain a positive Collector Status.

Step 5: Monitor Collector health with Collector Status

The Collector Status in LogicMonitor is the primary tool for monitoring the health and performance of Collectors. Regularly reviewing the Collector Status can help to identify potential issues, such as high CPU load, memory overuse, or connectivity problems, before they lead to downtime.

  • Action: Regularly check the Collector Status in the LogicMonitor portal. Look for any warning or error messages related to load, memory, or failed polls, and address them promptly to keep monitoring infrastructure running smoothly.
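
When several items are flagged at once, it can help to work through them in order of severity. The sketch below only illustrates that triage order. The StatusItem structure and the sample entries are hypothetical: they stand in for values read off the Collector Status page by hand and are not a LogicMonitor API or export format.

```python
from typing import List, NamedTuple


class StatusItem(NamedTuple):
    """Hypothetical record for one line of the Collector Status page."""
    metric: str
    status: str  # "OK", "Warning", or "Error"
    message: str


# Lower number means higher priority; anything else is treated as healthy.
SEVERITY = {"Error": 0, "Warning": 1}


def items_needing_attention(items: List[StatusItem]) -> List[StatusItem]:
    """Return Warning and Error items, with errors listed first."""
    flagged = [item for item in items if item.status in SEVERITY]
    return sorted(flagged, key=lambda item: SEVERITY[item.status])


if __name__ == "__main__":
    # Made-up sample values for illustration only.
    observed = [
        StatusItem("CPU load", "OK", ""),
        StatusItem("Memory usage", "Warning", "memory use is high"),
        StatusItem("Failed polls", "Error", "many polls are failing"),
    ]
    for item in items_needing_attention(observed):
        print(f"[{item.status}] {item.metric}: {item.message}")
```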

Explore LogicMonitor’s guide to best practices for optimizing Collector performance.

The Collector Status option, available when managing a Collector, can help troubleshoot Collector issues.

Collector Status is a great place to check on Collector health. It can indicate potentially problematic load issues and LogicModules with abnormally high numbers of failed polls. 

The top of the Collector Status gives a quick overview of the status of the various metrics that make it up. Warning and Error status items should be investigated further.
The metrics that make up Collector Status can flag load-related issues before they become problems. They change color to indicate potential problems and include helpful messages.

Collector Status is not intended to provide a complete view of Collector performance but is an excellent tool for quickly identifying the source of issues. It offers several features that help IT teams quickly pinpoint problems and get an overview of a Collector’s overall health:

  • Highlighted issues: Instantly find issues that point to an area of concern that may impact the Collector’s health.
  • Configuration check: Find potential issues with Collector configuration that may impact performance.

The Collector also tracks restarts and errors reported by Watchdog, which is very useful when looking for patterns that indicate problems.

Collector Events for a healthy Collector showing its daily restart and credential rotation.

Step 6: Set up resilient monitoring

To further protect the monitoring setup, consider implementing resilient monitoring strategies. This includes setting up a backup Collector or using an Auto-Balanced Collector Group to distribute the monitoring load across multiple Collectors. This helps maintain a healthy Collector Status and ensures that monitoring continues without interruption, even if one Collector goes down.

  • Action: Evaluate the current monitoring setup and determine if adding a backup Collector or implementing Auto-Balanced Collector Groups would benefit the environment. These steps can significantly reduce the risk of downtime and improve your overall monitoring resilience.

LogicMonitor’s article, Collector Capacity, offers a broader understanding of how Collectors handle workloads.

Maintain a healthy Collector Status

Understanding and regularly checking the Collector Status ensures that LogicMonitor Collectors are performing optimally and providing continuous and reliable monitoring for IT infrastructures. Implementing the steps outlined in this troubleshooting guide can help resolve issues that arise and guide the setup of a resilient monitoring system that protects against future problems.
