Health Checks Reference

Comprehensive guide to all 22 health checks performed by ElasticDoctor

Overview

ElasticDoctor performs 22 health checks across four categories. Each check is designed to identify potential issues before they affect your cluster's performance or availability.

  • Cluster Health: 5 checks
  • Performance: 6 checks
  • Security: 5 checks
  • Operations: 6 checks

Severity Levels

Each check is classified by severity: Critical issues require immediate attention, Warnings should be addressed for optimal performance, and Informational items provide insights.

Health Check Categories

Cluster Health (5 checks)

Cluster Status

Monitors overall cluster health (green/yellow/red)

Severity: Critical | Impact: High

Node Availability

Checks if all nodes are online and reachable

Severity: Critical | Impact: High

Shard Allocation

Verifies proper shard distribution across nodes

Severity: Warning | Impact: Medium

Unassigned Shards

Identifies shards that cannot be allocated

Severity: Critical | Impact: High

Cluster Settings

Reviews cluster-level configuration settings

Severity: Info | Impact: Low
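As a rough illustration of how the first few checks in this category could be evaluated, the sketch below inspects a parsed response from Elasticsearch's GET _cluster/health API. The function name, the expected_nodes parameter, and the rule that a yellow status maps to a Warning are all assumptions made for this example; they are not taken from ElasticDoctor's implementation.

```python
# Sketch only: the rules below are illustrative assumptions,
# not ElasticDoctor's actual logic.

def evaluate_cluster_health(health, expected_nodes):
    """Return (check, severity, message) tuples for a _cluster/health response.

    health         -- parsed JSON body of GET _cluster/health
    expected_nodes -- hypothetical parameter: how many nodes should be online
    """
    findings = []

    status = health.get("status")
    if status == "red":
        findings.append(("Cluster Status", "Critical", "cluster status is red"))
    elif status == "yellow":
        # Assumption: yellow is reported as a Warning rather than Critical.
        findings.append(("Cluster Status", "Warning", "cluster status is yellow"))

    nodes = health.get("number_of_nodes", 0)
    if nodes < expected_nodes:
        findings.append(
            ("Node Availability", "Critical",
             f"{nodes} of {expected_nodes} nodes are online")
        )

    unassigned = health.get("unassigned_shards", 0)
    if unassigned > 0:
        findings.append(
            ("Unassigned Shards", "Critical",
             f"{unassigned} shards are unassigned")
        )

    return findings


# Hand-written sample response, trimmed to the fields used above.
sample = {"status": "yellow", "number_of_nodes": 3, "unassigned_shards": 2}
for finding in evaluate_cluster_health(sample, expected_nodes=3):
    print(finding)
```

Working on an already-parsed response keeps the sketch independent of any particular HTTP client.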

Performance (6 checks)

Query Performance

Analyzes query response times and throughput

Severity: Warning | Impact: Medium

Indexing Performance

Monitors indexing speed and efficiency

Severity: Warning | Impact: Medium

Memory Usage

Checks heap memory utilization across nodes

Severity: Warning | Impact: High

Disk Usage

Monitors disk space usage and growth trends

Severity: Critical | Impact: High

CPU Utilization

Tracks CPU usage patterns and bottlenecks

Severity: Warning | Impact: Medium

JVM Performance

Analyzes JVM metrics and garbage collection

Severity: Warning | Impact: Medium
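The Memory Usage and Disk Usage checks can be pictured as simple threshold tests against a parsed response from Elasticsearch's GET _nodes/stats API. The sketch below is one possible shape for such a test; the function name and both threshold values are hypothetical and are not documented ElasticDoctor defaults.

```python
# Sketch only: thresholds are hypothetical, not ElasticDoctor defaults.
HEAP_WARN_PERCENT = 85
DISK_CRIT_PERCENT = 90


def evaluate_node_resources(nodes_stats):
    """Return (check, severity, message) tuples for a _nodes/stats response."""
    findings = []
    for node in nodes_stats.get("nodes", {}).values():
        name = node.get("name", "unknown")

        # Heap utilization is reported per node as a whole-number percentage.
        heap = node["jvm"]["mem"]["heap_used_percent"]
        if heap >= HEAP_WARN_PERCENT:
            findings.append(("Memory Usage", "Warning", f"{name}: heap at {heap}%"))

        # Disk usage is derived from total and available bytes.
        fs_total = node["fs"]["total"]
        total = fs_total["total_in_bytes"]
        if total > 0:
            used_pct = 100 * (total - fs_total["available_in_bytes"]) / total
            if used_pct >= DISK_CRIT_PERCENT:
                findings.append(
                    ("Disk Usage", "Critical", f"{name}: disk at {used_pct:.0f}%")
                )
    return findings


# Hand-written sample, trimmed to the fields used above.
sample = {
    "nodes": {
        "abc123": {
            "name": "node-1",
            "jvm": {"mem": {"heap_used_percent": 91}},
            "fs": {"total": {"total_in_bytes": 1000, "available_in_bytes": 50}},
        }
    }
}
for finding in evaluate_node_resources(sample):
    print(finding)
```

A single snapshot like this only covers current utilization; growth trends would need several samples taken over time.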

Security (5 checks)

Authentication

Verifies authentication mechanisms are enabled

Severity: Critical | Impact: High

Authorization

Checks role-based access control configuration

Severity: Critical | Impact: High

SSL/TLS Configuration

Validates encryption in transit settings

Severity: Critical | Impact: High

Audit Logging

Ensures security events are being logged

Severity: Warning | Impact: Medium

Network Security

Reviews network binding and firewall settings

Severity: Warning | Impact: Medium
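Several of the security checks come down to inspecting node settings. The sketch below evaluates a flat dictionary of elasticsearch.yml-style keys. It assumes the settings have already been resolved to explicit values and treats a missing key as disabled, which is a simplification because real defaults vary by Elasticsearch version. The function name and the rules are illustrative, not ElasticDoctor's own.

```python
# Sketch only: treats a missing key as "disabled", which is a simplification.

def evaluate_security_settings(settings):
    """Return (check, severity, message) tuples for a flat settings dict."""
    findings = []

    if not settings.get("xpack.security.enabled", False):
        findings.append(
            ("Authentication", "Critical", "xpack.security.enabled is not true")
        )

    # Encryption in transit covers both node-to-node and client traffic.
    for layer in ("transport", "http"):
        key = f"xpack.security.{layer}.ssl.enabled"
        if not settings.get(key, False):
            findings.append(("SSL/TLS Configuration", "Critical", f"{key} is not true"))

    if not settings.get("xpack.security.audit.enabled", False):
        findings.append(
            ("Audit Logging", "Warning", "xpack.security.audit.enabled is not true")
        )

    # Assumption: binding to every interface is worth flagging for review.
    if settings.get("network.host") == "0.0.0.0":
        findings.append(
            ("Network Security", "Warning", "network.host binds to all interfaces")
        )

    return findings


sample = {
    "xpack.security.enabled": True,
    "xpack.security.transport.ssl.enabled": True,
    "xpack.security.http.ssl.enabled": False,
    "network.host": "0.0.0.0",
}
for finding in evaluate_security_settings(sample):
    print(finding)
```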

Operations (6 checks)

Backup Configuration

Validates snapshot and backup settings

Severity: Critical | Impact: High

Index Management

Reviews index lifecycle policies and settings

Severity: Warning | Impact: Medium

Monitoring Setup

Checks if proper monitoring is configured

Severity: Info | Impact: Low

Log Configuration

Verifies logging levels and output settings

Severity: Info | Impact: Low

Version Compatibility

Identifies version-specific issues and recommendations

Severity: Warning | Impact: Medium

Plugin Management

Reviews installed plugins and their configurations

Severity: Info | Impact: Low
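One simple signal for the Backup Configuration check is whether any snapshot repository is registered at all. The sketch below assumes the parsed response of Elasticsearch's GET _snapshot API, which maps repository names to their definitions. The function name and the rule itself are illustrative assumptions; the real check may validate far more than this.

```python
# Sketch only: a single illustrative rule, not ElasticDoctor's actual check.

def evaluate_backup_configuration(repositories):
    """Return (check, severity, message) tuples for a GET _snapshot response."""
    if not repositories:
        return [
            ("Backup Configuration", "Critical",
             "no snapshot repository is registered")
        ]
    # Report what is registered so the configuration can be reviewed by hand.
    names = ", ".join(sorted(repositories))
    return [("Backup Configuration", "Info", f"registered repositories: {names}")]


print(evaluate_backup_configuration({}))
print(evaluate_backup_configuration({"nightly": {"type": "fs", "settings": {}}}))
```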

Understanding Severity Levels

Critical

Issues that require immediate attention to prevent data loss, service disruption, or security breaches.

Examples:

  • Red cluster status
  • Unassigned shards
  • Disabled authentication
  • Missing backups

Warning

Issues that should be addressed to maintain optimal performance and prevent future problems.

Examples:

  • High memory usage
  • Slow query performance
  • Suboptimal settings
  • Version compatibility

Informational

General insights and recommendations for best practices and optimization opportunities.

Examples:

  • Configuration recommendations
  • Best practice suggestions
  • Optimization opportunities
  • Version upgrade paths

How to Interpret Results

1. Review Critical Issues First

Start with critical issues as they pose the highest risk to your cluster's stability and data integrity.

2. Prioritize by Impact

Focus on issues with high impact that affect multiple nodes or the entire cluster.

3. Follow Remediation Steps

Each check includes specific remediation steps and best practices for resolution.

4. Monitor Progress

Re-run diagnostics after making changes to verify improvements and track your cluster's health over time.
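The first two steps amount to a two-level sort: by severity first, then by impact. The sketch below shows that ordering on a few checks from this reference. The dictionary layout and the function name are assumptions made for illustration and do not describe ElasticDoctor's report format.

```python
# Sketch only: the finding structure here is hypothetical.
SEVERITY_ORDER = {"Critical": 0, "Warning": 1, "Info": 2}
IMPACT_ORDER = {"High": 0, "Medium": 1, "Low": 2}


def prioritize(findings):
    """Order findings by severity, then by impact, most urgent first."""
    return sorted(
        findings,
        key=lambda f: (SEVERITY_ORDER[f["severity"]], IMPACT_ORDER[f["impact"]]),
    )


findings = [
    {"check": "Query Performance", "severity": "Warning", "impact": "Medium"},
    {"check": "Cluster Settings", "severity": "Info", "impact": "Low"},
    {"check": "Memory Usage", "severity": "Warning", "impact": "High"},
    {"check": "Disk Usage", "severity": "Critical", "impact": "High"},
]

for finding in prioritize(findings):
    print(finding["check"], finding["severity"], finding["impact"])
```

Because Python's sort is stable, findings with the same severity and impact keep the order in which they were reported.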

Next Steps

Need Help Understanding Results?

Our team can help you interpret health check results and provide guidance on remediation strategies.