ElasticDoctor - Elasticsearch Health Diagnostics

Automate Your Data Lifecycle

Index Lifecycle Management (ILM) automatically manages your indices through their lifecycle, from creation to deletion. Well-tuned ILM policies can reduce storage costs by as much as 60-80% while keeping performance appropriate for each data access pattern.

ILM policies define how your indices transition through different phases based on age, size, or document count. This check analyzes your current ILM configuration, identifies optimization opportunities, and ensures your data management strategy aligns with business requirements and cost objectives.

ILM API Endpoints

ILM Management APIs (ES 6.6+, X-Pack)

GET /_ilm/policy - List all ILM policies

GET /_ilm/policy/policy_name - Get specific policy

GET /index_name/_ilm/explain - Index lifecycle status

GET /_ilm/status - ILM operation status
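As a minimal sketch of how the output of these endpoints might be consumed, the Python helper below lists the phases each policy defines. It assumes the response of GET /_ilm/policy has already been fetched and parsed into a dict keyed by policy name; the function name and the sample data are hypothetical, not real cluster output.

```python
# Lifecycle order used by ILM; phases missing from a policy are simply skipped.
PHASE_ORDER = ["hot", "warm", "cold", "frozen", "delete"]


def summarize_policies(policy_response):
    """Map each policy name to the phases it configures, in lifecycle order."""
    summary = {}
    for name, entry in policy_response.items():
        phases = entry.get("policy", {}).get("phases", {})
        summary[name] = [phase for phase in PHASE_ORDER if phase in phases]
    return summary


# Hand-written sample shaped like a GET /_ilm/policy response (illustrative only).
sample = {
    "logs-policy": {
        "version": 1,
        "policy": {
            "phases": {
                "delete": {"min_age": "730d", "actions": {"delete": {}}},
                "hot": {"actions": {"rollover": {"max_age": "7d"}}},
            }
        },
    }
}

print(summarize_policies(sample))
```

A policy that shows only a hot phase here is a quick hint that data is never tiered or deleted.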

✅ What This Check Analyzes

• Policy configuration and effectiveness
• Phase transitions and timing
• Storage tier utilization
• Rollover and deletion policies
• Index template integration
• Cost optimization opportunities

🔧 ILM Phases

• Hot: Active indexing and querying
• Warm: Querying only, no indexing
• Cold: Infrequent access, compressed
• Frozen: Archived data, searchable snapshots
• Delete: Automated cleanup

ILM Phase Configuration

1. Hot Phase Configuration

{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "7d",
            "max_docs": 100000000
          },
          "set_priority": {
            "priority": 100
          }
        }
      }
    }
  }
}

Rollover Triggers

• Size-based: Index reaches size limit
• Age-based: Time since index creation
• Document count: Number of documents
• Combined triggers: Rollover fires as soon as any one condition is met
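The "any condition met" rule can be sketched in a few lines of Python. This is an illustration of the logic only, not how Elasticsearch implements it; the stats keys (primary_size_bytes, age_seconds, doc_count) are hypothetical names for values that would come from index stats, and the limits mirror the hot phase example above converted to bytes and seconds.

```python
GIB = 1024 ** 3          # Elasticsearch size units such as "gb" are binary
DAY = 24 * 60 * 60


def met_rollover_conditions(stats, conditions):
    """Return the names of the configured conditions that are currently met."""
    observed = {
        "max_size": stats["primary_size_bytes"],  # size of all primary shards
        "max_age": stats["age_seconds"],          # time since index creation
        "max_docs": stats["doc_count"],
    }
    return [name for name, limit in conditions.items() if observed[name] >= limit]


# Limits from the hot phase example: 50gb, 7d, 100,000,000 documents.
conditions = {"max_size": 50 * GIB, "max_age": 7 * DAY, "max_docs": 100_000_000}

# Hypothetical index: small and sparse, but already eight days old.
stats = {"primary_size_bytes": 12 * GIB, "age_seconds": 8 * DAY, "doc_count": 40_000_000}

met = met_rollover_conditions(stats, conditions)
print(met, "-> roll over" if met else "-> keep writing")
```

Because a single met condition is enough, a generous max_size does not prevent many small indices if max_age is short.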

Hot Phase Actions

• High priority for recovery
• Force merge optimization
• Shard allocation to fast storage
• Automatic rollover management

2. Warm Phase Optimization

{
  "warm": {
    "min_age": "30d",
    "actions": {
      "allocate": {
        "number_of_replicas": 1,
        "require": {
          "box_type": "warm"
        }
      },
      "forcemerge": {
        "max_num_segments": 1
      },
      "set_priority": {
        "priority": 50
      }
    }
  }
}

Warm Phase Benefits

• Reduced storage costs
• Optimized for read-only access
• Force merge for better compression
• Move to slower, cheaper storage

Allocation Strategy

• Reduce replica count
• Allocate to warm nodes
• Lower recovery priority
• Optimize for query performance

3. Cold and Frozen Phases

{
  "cold": {
    "min_age": "90d",
    "actions": {
      "allocate": {
        "number_of_replicas": 0,
        "require": {
          "box_type": "cold"
        }
      },
      "searchable_snapshot": {
        "snapshot_repository": "cold-snapshots"
      }
    }
  },
  "frozen": {
    "min_age": "365d",
    "actions": {
      "searchable_snapshot": {
        "snapshot_repository": "cold-snapshots"
      }
    }
  }
}

Cold Phase (90+ days)

• Zero replicas for maximum savings (the snapshot acts as the backup copy)
• Searchable snapshots for low cost
• Reduced search performance
• Suitable for compliance/archival

Frozen Phase (1+ year)

• Lowest cost storage option
• Snapshot-based access only
• Slow query performance
• Long-term retention
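One detail worth checking before submitting a policy with both cold and frozen phases: Elasticsearch expects every searchable_snapshot action in a single policy to use the same snapshot repository, and rejects policies that mix repositories. The sketch below is a local pre-check under that assumption; the function names and the repository names repo-a and repo-b are hypothetical.

```python
def searchable_snapshot_repositories(policy):
    """Return {phase: repository} for every searchable_snapshot action."""
    repositories = {}
    for phase, body in policy["policy"]["phases"].items():
        action = body.get("actions", {}).get("searchable_snapshot")
        if action:
            repositories[phase] = action["snapshot_repository"]
    return repositories


def uses_single_repository(policy):
    """True when at most one distinct repository is referenced."""
    return len(set(searchable_snapshot_repositories(policy).values())) <= 1


# Illustrative policy that mixes two repositories.
mixed = {
    "policy": {
        "phases": {
            "cold": {"actions": {"searchable_snapshot": {"snapshot_repository": "repo-a"}}},
            "frozen": {"actions": {"searchable_snapshot": {"snapshot_repository": "repo-b"}}},
        }
    }
}

print(searchable_snapshot_repositories(mixed))
print(uses_single_repository(mixed))
```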

Common ILM Configuration Issues

🚨 Critical: No ILM Policies Configured

Time-series indices are growing indefinitely without lifecycle management, leading to storage bloat and performance degradation.

Implementation Steps:

1. Create ILM policies for each data type
2. Configure rollover triggers based on data patterns
3. Set up data tier allocation
4. Update index templates to use ILM policies
5. Monitor policy execution and adjust as needed
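To find out which indices still lack a policy, the response of GET /index_name/_ilm/explain can be scanned for entries that are not managed. The Python sketch below assumes that response has already been parsed into a dict; the helper name and the sample index names are hypothetical.

```python
def unmanaged_indices(explain_response):
    """Return the sorted names of indices that ILM does not manage."""
    indices = explain_response.get("indices", {})
    return sorted(
        name for name, info in indices.items() if not info.get("managed", False)
    )


# Hand-written sample shaped like an ILM explain response (illustrative only).
sample_explain = {
    "indices": {
        "logs-000001": {"index": "logs-000001", "managed": True, "policy": "logs-policy"},
        "metrics-2023": {"index": "metrics-2023", "managed": False},
        "audit-old": {"index": "audit-old", "managed": False},
    }
}

print(unmanaged_indices(sample_explain))
```

Indices reported here are candidates for steps 1 through 4 above.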

⚠️ Warning: Inefficient Phase Transitions

Indices are staying in the hot phase too long, which raises costs, or transitioning to cold storage too quickly, which hurts query performance.

Optimization Actions:

• Analyze access patterns to optimize transition timing
• Adjust rollover thresholds based on actual usage
• Review warm phase duration for query performance
• Consider searchable snapshots for cold data
• Monitor storage costs vs. query performance trade-offs

ℹ️ Info: Policy Execution Delays

ILM policies are experiencing delays in execution, potentially due to cluster resource constraints or configuration issues.

Investigation Steps:

• Check cluster resources during ILM operations
• Review ILM poll interval settings
• Verify node allocation attributes
• Monitor force merge and snapshot operations
• Consider staggering policy execution times
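When investigating delays, it helps to separate indices that are merely waiting from indices whose current step has failed. In the ILM explain response, a failed index reports the step ERROR together with failed_step and step_info. The sketch below extracts those entries from an already-parsed response; the helper name and sample data are hypothetical.

```python
def failed_ilm_steps(explain_response):
    """List managed indices whose current ILM step is in the ERROR state."""
    failures = []
    for name, info in sorted(explain_response.get("indices", {}).items()):
        if info.get("managed") and info.get("step") == "ERROR":
            failures.append(
                {
                    "index": name,
                    "phase": info.get("phase"),
                    "failed_step": info.get("failed_step"),
                    "reason": info.get("step_info", {}).get("reason"),
                }
            )
    return failures


# Hand-written sample shaped like an ILM explain response (illustrative only).
sample_explain = {
    "indices": {
        "logs-000001": {"managed": True, "phase": "hot", "step": "check-rollover-ready"},
        "logs-000002": {
            "managed": True,
            "phase": "warm",
            "step": "ERROR",
            "failed_step": "forcemerge",
            "step_info": {"type": "exception", "reason": "example failure reason"},
        },
    }
}

for failure in failed_ilm_steps(sample_explain):
    print(failure)
```

Once the underlying cause is fixed, a failed step can be retried with POST /index_name/_ilm/retry. Indices that are not in ERROR but lag behind are more likely limited by the poll interval (indices.lifecycle.poll_interval, 10 minutes by default) or by cluster resources.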

ILM Best Practices

✅ Configuration Best Practices

• Size rollover triggers: target 10-50GB per shard (max_size counts all primary shards combined)
• Age-based transitions (example): enter warm at 30d, cold at 90d, frozen at 365d
• Use searchable snapshots for cold data
• Force merge in warm phase for compression
• Reduce replicas in cold phase
• Set appropriate recovery priorities
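A simple consistency check for age-based transitions is that min_age never decreases from one phase to the next. The Python sketch below is a local pre-check under simplifying assumptions: it only understands the d, h, m and s units and treats a missing min_age as zero. All names are hypothetical.

```python
import re

PHASE_ORDER = ["hot", "warm", "cold", "frozen", "delete"]
UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def min_age_seconds(value):
    """Convert a min_age string such as "30d" to seconds."""
    match = re.fullmatch(r"(\d+)([dhms])", value)
    if not match:
        raise ValueError(f"unsupported min_age: {value!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def out_of_order_phases(phases):
    """Return (earlier, later) pairs where the later phase has a smaller min_age."""
    problems = []
    previous = None  # (phase name, min_age in seconds) of the last phase seen
    for name in PHASE_ORDER:
        if name not in phases:
            continue
        age = min_age_seconds(phases[name].get("min_age", "0s"))
        if previous is not None and age < previous[1]:
            problems.append((previous[0], name))
        previous = (name, age)
    return problems


# Illustrative phases: cold is configured to start before warm.
phases = {
    "hot": {},
    "warm": {"min_age": "30d"},
    "cold": {"min_age": "7d"},
    "delete": {"min_age": "90d"},
}

print(out_of_order_phases(phases))
```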

💡 Optimization Tips

• Monitor actual data access patterns
• Adjust policies based on business needs
• Use index templates for consistent application
• Test policies on non-production data first
• Consider custom allocation attributes

❌ Common Mistakes

• Overly aggressive rollover triggers
• Not considering query patterns
• Ignoring storage tier capabilities
• Forgetting to update index templates
• Not monitoring policy execution
• Inadequate testing before production

⚠️ Monitoring Points

• Policy execution success and timing
• Storage costs across tiers
• Query performance after transitions
• Index size and document count trends
• Resource usage during ILM operations

Complete ILM Policy Examples

Log Data Policy

PUT /_ilm/policy/logs-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "7d"
          },
          "set_priority": {
            "priority": 100
          }
        }
      },
      "warm": {
        "min_age": "30d",
        "actions": {
          "allocate": {
            "number_of_replicas": 1,
            "require": {
              "box_type": "warm"
            }
          },
          "forcemerge": {
            "max_num_segments": 1
          },
          "set_priority": {
            "priority": 50
          }
        }
      },
      "cold": {
        "min_age": "90d",
        "actions": {
          "allocate": {
            "number_of_replicas": 0,
            "require": {
              "box_type": "cold"
            }
          },
          "searchable_snapshot": {
            "snapshot_repository": "cold-snapshots"
          }
        }
      },
      "delete": {
        "min_age": "730d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

Metrics Data Policy

PUT /_ilm/policy/metrics-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "30gb",
            "max_age": "1d"
          },
          "set_priority": {
            "priority": 100
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "allocate": {
            "number_of_replicas": 0,
            "require": {
              "box_type": "warm"
            }
          },
          "forcemerge": {
            "max_num_segments": 1
          },
          "set_priority": {
            "priority": 50
          }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "searchable_snapshot": {
            "snapshot_repository": "cold-snapshots"
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

Apply Policy to Index Template

PUT /_index_template/logs-template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 1,
      "index.lifecycle.name": "logs-policy",
      "index.lifecycle.rollover_alias": "logs"
    }
  }
}

# Create initial index with alias
PUT /logs-000001
{
  "aliases": {
    "logs": {
      "is_write_index": true
    }
  }
}
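The initial index name matters because rollover derives the next name from it: the name has to end in a hyphen followed by a number, and the new index gets that number incremented and zero-padded to six digits. The Python sketch below mimics that naming rule for planning purposes only; it does not handle date-math index names, and the function name is hypothetical.

```python
import re


def next_rollover_name(index_name):
    """Predict the index name a rollover would create from the current one."""
    match = re.fullmatch(r"(.*-)(\d+)", index_name)
    if not match:
        raise ValueError(f"{index_name!r} does not end in '-<number>'")
    prefix, counter = match.groups()
    # The counter is always written with six digits, whatever the old width was.
    return f"{prefix}{int(counter) + 1:06d}"


print(next_rollover_name("logs-000001"))
```

A bootstrap index named simply logs would fail this check, which is why the example above starts at logs-000001.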

Cost Optimization with ILM

📊 Storage Cost Reduction

Typical Savings

• Hot to Warm: 40-50% cost reduction
• Warm to Cold: 60-70% cost reduction
• Cold to Frozen: 80-90% cost reduction
• Overall: 60-80% total storage savings
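How such percentages arise can be shown with a small worked calculation. Every number below is hypothetical: the per-GB prices, the data distribution and the replica counts are placeholders to be replaced with figures from your own environment, so the result illustrates the arithmetic rather than confirming the ranges above.

```python
# Hypothetical prices in cents per GB per month; substitute real figures.
PRICE_CENTS_PER_GB = {"hot": 10, "warm": 5, "cold": 2}


def monthly_cost_cents(layout):
    """layout maps tier -> (primary data in GB, number of copies incl. the primary)."""
    return sum(
        gb * copies * PRICE_CENTS_PER_GB[tier] for tier, (gb, copies) in layout.items()
    )


# Without ILM: 1000 GB kept on hot nodes with one replica (two copies).
all_hot = {"hot": (1000, 2)}

# With ILM: the same data spread across tiers, no replica in the cold tier.
tiered = {"hot": (200, 2), "warm": (300, 2), "cold": (500, 1)}

baseline = monthly_cost_cents(all_hot)
with_ilm = monthly_cost_cents(tiered)
saving_percent = 100 * (baseline - with_ilm) // baseline

print(baseline, with_ilm, saving_percent)
```

With these placeholder inputs the tiered layout costs 8000 cents against 20000 cents, a 60% reduction. Most of that comes from the cold tier holding half the data at one copy, which shows how sensitive the overall figure is to how quickly data leaves the hot tier.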

Optimization Strategies

• Aggressive compression in cold phase
• Searchable snapshots for archival
• Replica reduction in older phases
• Automated deletion of old data

💰 Cost Monitoring

Storage Metrics

• Track storage usage by phase
• Monitor compression ratios
• Calculate cost per GB by tier
• Measure data growth trends

Performance Impact

• Query latency by phase
• Searchable snapshot performance
• Index recovery times
• Resource utilization patterns

Mastering Index Lifecycle Management

Key Benefits

• Automated index management reduces operational overhead
• Significant cost reduction through data tiering
• Improved query performance with optimized allocation
• Compliance with data retention requirements

Implementation Steps

• Analyze current data patterns and access requirements
• Design ILM policies based on business needs
• Test policies thoroughly before production deployment
• Monitor and optimize based on actual usage


ILM Policies Check: Index Lifecycle Management Optimization