Health Checks - Infrastructure

Hot Threads Check: Performance Bottleneck Detection and CPU Analysis

Identify performance bottlenecks through thread analysis, resolve high CPU usage issues, and optimize Elasticsearch operations for maximum efficiency.

December 5, 2024
17 min read
ElasticDoctor Team

CPU Detective Work

When your cluster is running hot and CPU usage is high, hot threads analysis shows which threads, and therefore which operations, are consuming CPU. It is one of the most direct tools for performance debugging.

High CPU usage can be mysterious: you know something is wrong, but not what is consuming resources. The hot threads check samples each node's threads over a short interval and reports the busiest ones with their stack traces, revealing the specific operations, queries, or background processes that are causing performance issues.
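
The check is built on the nodes hot threads API (`GET /_nodes/hot_threads`). As a minimal sketch, the helper below builds that request with the sampling parameters spelled out. The base URL, the helper names, and the absence of authentication are assumptions for illustration; a secured cluster would also need credentials and TLS settings. The default values are meant to mirror the API's documented defaults.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def hot_threads_url(base_url, threads=3, interval="500ms", snapshots=10,
                    thread_type="cpu"):
    """Build the URL for a hot threads request (hypothetical helper)."""
    params = urlencode({
        "threads": threads,      # how many busy threads to report per node
        "interval": interval,    # sampling window used to measure usage
        "snapshots": snapshots,  # stack trace samples taken per thread
        "type": thread_type,     # "cpu", "wait", or "block"
    })
    return f"{base_url.rstrip('/')}/_nodes/hot_threads?{params}"


def fetch_hot_threads(base_url, **options):
    """Return the plain-text report. Assumes an unsecured, reachable cluster."""
    with urlopen(hot_threads_url(base_url, **options), timeout=30) as response:
        return response.read().decode("utf-8")
```

Against a local test cluster, `fetch_hot_threads("http://localhost:9200", threads=5)` would return the text report for every node.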

What You'll Learn

Performance Analysis

  • How to read hot threads output
  • Identifying CPU-intensive operations
  • Understanding thread states and stack traces
  • Correlating threads with cluster operations

Troubleshooting Skills

  • Debugging search performance issues
  • Resolving indexing bottlenecks
  • Optimizing cluster operations
  • Preventing CPU exhaustion

Hot Threads Analysis Best Practices

✅ Effective Analysis

  • Capture hot threads while the performance issue is happening
  • Use multiple snapshots for accurate sampling
  • Correlate with application and query logs
  • Prioritize threads with >80% CPU usage
  • Look for patterns across multiple nodes
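
The last two practices can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a complete parser: it assumes thread header lines of the form `NN.N% ... cpu usage by thread 'name'` and node sections that start with `::: {node-name}`, and the exact layout varies between Elasticsearch versions. `SAMPLE` is a hand-written, abbreviated stand-in with stack frames omitted, not output captured from a real cluster.

```python
import re
from typing import NamedTuple

# Node sections are assumed to start with a line such as "::: {node-1}{...}".
NODE_RE = re.compile(r"^::: \{([^}]*)\}")
# Thread headers are assumed to look like "87.3% (...) cpu usage by thread 'name'".
THREAD_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)%.*?cpu usage by thread '([^']+)'")
# Thread names are assumed to follow "elasticsearch[node][pool][T#n]".
POOL_RE = re.compile(r"^elasticsearch\[[^\]]*\]\[([^\]]+)\]")


class HotThread(NamedTuple):
    node: str
    cpu_percent: float
    thread: str


def parse_hot_threads(text):
    """Extract (node, cpu %, thread name) from a hot threads report."""
    node, found = "unknown", []
    for line in text.splitlines():
        node_match = NODE_RE.match(line)
        if node_match:
            node = node_match.group(1)
            continue
        thread_match = THREAD_RE.match(line)
        if thread_match:
            found.append(HotThread(node, float(thread_match.group(1)),
                                   thread_match.group(2)))
    return found


def busy_threads(text, threshold=80.0):
    """Keep only threads above the CPU threshold."""
    return [t for t in parse_hot_threads(text) if t.cpu_percent > threshold]


def pool_of(thread_name):
    """Return the thread pool part of a thread name, or "other"."""
    match = POOL_RE.match(thread_name)
    return match.group(1) if match else "other"


def pools_on_multiple_nodes(threads):
    """Return pools whose threads show up on more than one node."""
    nodes_by_pool = {}
    for t in threads:
        nodes_by_pool.setdefault(pool_of(t.thread), set()).add(t.node)
    return sorted(p for p, nodes in nodes_by_pool.items() if len(nodes) > 1)


# Hand-written sample in the assumed format (stack frames omitted).
SAMPLE = """\
::: {node-1}{hypothetical-id}
   87.3% (436.5ms out of 500ms) cpu usage by thread 'elasticsearch[node-1][search][T#3]'
   12.0% (60ms out of 500ms) cpu usage by thread 'elasticsearch[node-1][write][T#1]'
::: {node-2}{hypothetical-id}
   91.0% (455ms out of 500ms) cpu usage by thread 'elasticsearch[node-2][search][T#7]'
"""
```

With this sample, `busy_threads(SAMPLE)` keeps the two search threads, and `pools_on_multiple_nodes(busy_threads(SAMPLE))` returns `["search"]`, the kind of cross-node pattern that points at query load rather than a single unhealthy node.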

💡 Investigation Tips

  • Compare hot threads before and during issues
  • Check thread pool queue sizes and rejections
  • Monitor GC activity during high CPU periods
  • Use profiling tools for deeper analysis
  • Document findings for pattern recognition
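
Thread pool queues and rejections can be read from the cat thread pool API, for example `GET /_cat/thread_pool?format=json&h=node_name,name,active,queue,rejected`. The sketch below flags pools that are queuing heavily or have rejected work. The function name, the `queue_limit` of 100, and the sample rows are assumptions for illustration. Note that `rejected` is a cumulative counter since node start, so comparing two successive readings says more than a single value.

```python
import json


def pools_under_pressure(cat_json, queue_limit=100):
    """Return (node, pool, queue, rejected) for pools that queue or reject."""
    flagged = []
    for row in json.loads(cat_json):
        # int() handles both numeric and string-encoded column values.
        queue = int(row["queue"])
        rejected = int(row["rejected"])
        if rejected > 0 or queue >= queue_limit:
            flagged.append((row["node_name"], row["name"], queue, rejected))
    # Worst offenders first: most rejections, then deepest queue.
    return sorted(flagged, key=lambda r: (r[3], r[2]), reverse=True)


# Hand-written sample rows, not output from a real cluster.
SAMPLE_ROWS = json.dumps([
    {"node_name": "node-1", "name": "search", "active": "13",
     "queue": "250", "rejected": "42"},
    {"node_name": "node-1", "name": "write", "active": "2",
     "queue": "0", "rejected": "0"},
    {"node_name": "node-2", "name": "search", "active": "13",
     "queue": "120", "rejected": "0"},
])
```

Here `pools_under_pressure(SAMPLE_ROWS)` lists the `search` pool on both nodes and leaves out the idle `write` pool.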

❌ Analysis Mistakes

  • Only checking hot threads after problems occur
  • Using too few snapshots for accuracy
  • Ignoring stack trace context
  • Not correlating with other metrics
  • Focusing only on highest CPU threads

⚠️ When to Investigate

  • CPU usage consistently >80%
  • Query latencies increasing
  • Thread pool rejections occurring
  • Cluster response times degrading
  • User-reported performance issues
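
Two of these triggers are easy to automate. The sketch below reads "consistently >80%" as several consecutive CPU samples above the threshold. That reading, the default of three samples, and the function names are assumptions, not a rule stated by Elasticsearch. The samples could come from any source, for example the `os.cpu.percent` field of the node stats API.

```python
def longest_run_above(samples, threshold=80.0):
    """Length of the longest streak of consecutive samples above threshold."""
    longest = current = 0
    for value in samples:
        current = current + 1 if value > threshold else 0
        longest = max(longest, current)
    return longest


def should_investigate(cpu_samples, new_rejections=0, threshold=80.0,
                       min_consecutive=3):
    """True when CPU stays high or thread pools started rejecting work."""
    sustained = longest_run_above(cpu_samples, threshold) >= min_consecutive
    return sustained or new_rejections > 0
```

A brief spike such as `[85, 40, 90, 30, 95]` does not trigger an investigation, while `[50, 85, 90, 95, 60]` does, because three samples in a row exceed the threshold.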

Performance Debugging Mastery

Diagnostic Power

  • Precise Identification: Pinpoint exact CPU-intensive operations
  • Real-time Insight: Understand what's happening right now
  • Actionable Intelligence: Connect findings to specific optimizations
  • Pattern Recognition: Identify recurring performance issues

Action Plan

  • Set up automated hot threads monitoring
  • Create performance investigation procedures
  • Establish correlation with application metrics
  • Build performance optimization playbooks
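
As a starting point for the first item, automated monitoring can be as simple as a polling loop that saves a hot threads capture whenever CPU crosses a threshold. The sketch below is one possible shape, not a production monitor: `read_cpu`, `capture`, and `save` are hypothetical callables that you supply (for example a node stats reader, a hot threads request, and a file writer), and the threshold and timing values are assumptions.

```python
import time


def monitor(read_cpu, capture, save, threshold=80.0, checks=10,
            pause_seconds=30.0, sleep=time.sleep):
    """Poll CPU and save a hot threads capture whenever it is too high.

    Returns the number of captures saved. `sleep` is injectable so the
    loop can be exercised without waiting.
    """
    saved = 0
    for i in range(checks):
        cpu = read_cpu()
        if cpu > threshold:
            # Capture immediately, while the problem is still happening.
            save(cpu, capture())
            saved += 1
        if i < checks - 1:
            sleep(pause_seconds)
    return saved
```

Keeping the three callables separate makes it possible to dry-run the loop with fake readings before pointing it at a real cluster.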