Smart Storage Management
Data tiers allow you to optimize storage costs and performance by automatically moving data through different storage classes based on access patterns. Proper tier configuration can reduce storage costs by 60-80% while maintaining query performance for active data.
The data tiers check evaluates your storage tier configuration, node allocation, and data distribution patterns. It identifies cost optimization opportunities and performance bottlenecks, and verifies that your data placement strategy aligns with access patterns and business requirements.
Data Tiers API
Data Tier APIs (ES 7.10+)
- GET /_nodes/stats: Node tier information
- GET /_cat/allocation?v: Shard allocation by tier
- GET /_ilm/policy: ILM tier transitions
- GET /_cluster/settings: Tier routing settings
✅ What This Check Monitors
- • Node tier configuration and roles
- • Data distribution across tiers
- • ILM policy effectiveness
- • Storage cost optimization
- • Performance by tier
- • Capacity utilization
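As a rough illustration of the capacity utilization item, the sketch below flags nodes whose disk usage is at or above a threshold. It assumes rows shaped like the JSON form of `GET /_cat/allocation` (where values arrive as strings and the unassigned-shards row has no disk figures). The function name, the sample rows, and the choice of 85 as the default (Elasticsearch's default low disk watermark) are assumptions made for this example, not part of the check itself.

```python
from typing import Dict, List, Optional

# Assumption: 85 mirrors Elasticsearch's default low disk watermark.
DEFAULT_THRESHOLD_PERCENT = 85


def nodes_over_threshold(rows: List[Dict[str, Optional[str]]],
                         threshold: int = DEFAULT_THRESHOLD_PERCENT) -> List[str]:
    """Return the names of nodes whose disk usage is at or above the threshold."""
    flagged = []
    for row in rows:
        percent = row.get("disk.percent")
        # The row for unassigned shards carries no disk figures; skip it.
        if percent is None:
            continue
        if int(percent) >= threshold:
            flagged.append(row["node"])
    return flagged


# Made-up rows shaped like `GET /_cat/allocation?format=json` output.
sample_rows = [
    {"node": "hot-1", "shards": "120", "disk.percent": "91"},
    {"node": "warm-1", "shards": "340", "disk.percent": "62"},
    {"node": "UNASSIGNED", "shards": "2", "disk.percent": None},
]

print(nodes_over_threshold(sample_rows))  # ['hot-1']
```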
🏗️ Tier Architecture
- • Hot: Active data, fast SSD storage
- • Warm: Occasionally accessed data
- • Cold: Rarely accessed, searchable
- • Frozen: Archive, searchable snapshots
- • Content: General purpose data
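To see how many nodes serve each tier, one option is to count node roles from a `GET /_nodes/stats` response, which lists a `roles` array per node. The following is a minimal sketch under that assumption; the function name and the sample response are hypothetical, and nodes with the generic `data` role are counted under their own key rather than attributed to a specific tier.

```python
from typing import Any, Dict

# Tier-specific roles, plus the generic "data" role (not tied to a single tier).
DATA_ROLES = ("data_hot", "data_warm", "data_cold", "data_frozen",
              "data_content", "data")


def count_nodes_per_tier(nodes_response: Dict[str, Any]) -> Dict[str, int]:
    """Count how many nodes carry each data role."""
    counts = {role: 0 for role in DATA_ROLES}
    for node in nodes_response.get("nodes", {}).values():
        roles = set(node.get("roles", []))
        for role in DATA_ROLES:
            if role in roles:
                counts[role] += 1
    return counts


# Made-up response fragment; real responses contain many more fields.
sample_response = {
    "nodes": {
        "n1": {"name": "hot-1", "roles": ["data_hot", "master"]},
        "n2": {"name": "hot-2", "roles": ["data_hot", "data_content"]},
        "n3": {"name": "warm-1", "roles": ["data_warm"]},
        "n4": {"name": "frozen-1", "roles": ["data_frozen"]},
        "n5": {"name": "master-1", "roles": ["master"]},
    }
}

print(count_nodes_per_tier(sample_response))
```

A tier with a count of zero (here `data_cold`) is worth noticing: an ILM phase that targets a tier with no nodes cannot place its shards there.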
Tier Configuration and Management
1. Hot Tier (data_hot)
    # Hot tier node configuration (elasticsearch.yml)
    node.roles: ["data_hot", "master"]

    # Index template for hot tier
    {
      "index_patterns": ["logs-*"],
      "template": {
        "settings": {
          "index.routing.allocation.include._tier_preference": "data_hot",
          "index.number_of_shards": 1,
          "index.number_of_replicas": 1,
          "index.refresh_interval": "1s"
        }
      }
    }
Characteristics
- • Fastest storage (NVMe SSD)
- • High CPU and memory
- • Real-time indexing and search
- • Most expensive per GB
Use Cases
- • Recent logs and metrics
- • Active application data
- • Real-time analytics
- • Frequently searched data
2. Warm Tier (data_warm)
    # Warm tier transition policy
    # ILM's implicit migrate action moves the index to data_warm in this phase;
    # allocate is used here only to lower the replica count.
    {
      "policy": {
        "phases": {
          "hot": {
            "actions": {
              "rollover": { "max_size": "50GB", "max_age": "7d" }
            }
          },
          "warm": {
            "min_age": "7d",
            "actions": {
              "allocate": { "number_of_replicas": 0 },
              "forcemerge": { "max_num_segments": 1 }
            }
          }
        }
      }
    }
Characteristics
- • Standard SSD storage
- • Reduced replica count
- • Force-merged segments
- • Roughly 50% lower storage cost than the hot tier
Optimizations
- • Reduce replicas to 0 or 1
- • Force merge to 1 segment
- • Longer refresh intervals
- • Compress source data
3. Cold Tier (data_cold)
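    # Cold tier configuration (phase fragment of an ILM policy)
    # ILM's implicit migrate action moves the index to data_cold in this phase.
    # The forcemerge action is only allowed in the hot and warm phases, so the
    # merge to one segment is done in warm and carries over.
    {
      "cold": {
        "min_age": "30d",
        "actions": {
          "allocate": { "number_of_replicas": 0 },
          "readonly": {}
        }
      }
    }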
Characteristics
- • Cheaper storage (HDD/slower SSD)
- • Read-only access
- • No replicas typically
- • Roughly 80% lower storage cost than the hot tier
Optimizations
- • Mark indices as read-only
- • Single segment per shard
- • Maximum compression
- • Minimal compute resources
4. Frozen Tier (data_frozen)
    # Frozen tier with searchable snapshots (phase fragment of an ILM policy)
    {
      "frozen": {
        "min_age": "365d",
        "actions": {
          "searchable_snapshot": {
            "snapshot_repository": "my_repository",
            "force_merge_index": true
          }
        }
      }
    }
Characteristics
- • Object storage (S3, GCS, Azure)
- • Searchable snapshots
- • Very slow query performance
- • Roughly 95% lower storage cost than the hot tier
Best Practices
- • Use for compliance/archive data
- • Configure cache settings
- • Optimize snapshot repository
- • Plan for query latency
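The four phase fragments above can be combined into one policy body. The sketch below assembles such a body as a Python dictionary and prints it as JSON, ready to paste into a `PUT _ilm/policy/<name>` request. It is one possible arrangement, not a definitive policy: the function name `build_tiered_policy` and its parameters are hypothetical, the ages and repository name are the example values used in this article, and tier moves are left to ILM's implicit migrate action rather than explicit allocation rules. Force merge is placed in the warm phase only, since ILM does not accept that action in cold.

```python
import json
from typing import Any, Dict


def build_tiered_policy(warm_after: str = "7d",
                        cold_after: str = "30d",
                        frozen_after: str = "365d",
                        repository: str = "my_repository") -> Dict[str, Any]:
    """Assemble a hot/warm/cold/frozen ILM policy body (illustrative values)."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {"max_size": "50GB", "max_age": "7d"}
                    }
                },
                "warm": {
                    "min_age": warm_after,
                    "actions": {
                        "allocate": {"number_of_replicas": 0},
                        "forcemerge": {"max_num_segments": 1},
                    },
                },
                "cold": {
                    "min_age": cold_after,
                    "actions": {
                        "allocate": {"number_of_replicas": 0},
                        "readonly": {},
                    },
                },
                "frozen": {
                    "min_age": frozen_after,
                    "actions": {
                        "searchable_snapshot": {
                            "snapshot_repository": repository,
                            "force_merge_index": True,
                        }
                    },
                },
            }
        }
    }


policy = build_tiered_policy()
print(json.dumps(policy, indent=2))
```

Running with zero replicas in warm and cold trades redundancy for cost; whether that is acceptable depends on how quickly the data could be restored from a snapshot.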
Tier Performance Analysis
📊 Performance Characteristics by Tier
- • Hot Tier: ~1ms query latency
- • Warm Tier: ~10ms query latency
- • Cold Tier: ~100ms query latency
- • Frozen Tier: ~1-10s query latency
🎯 ElasticDoctor Tier Analysis
Distribution Analysis
- • Evaluates data distribution across tiers
- • Identifies misallocated indices
- • Calculates cost optimization opportunities
- • Monitors tier utilization patterns
Performance Optimization
- • Analyzes query patterns by tier
- • Recommends tier transition timing
- • Identifies hot tier bottlenecks
- • Optimizes frozen tier cache usage
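To make "identifies misallocated indices" concrete, here is a hypothetical sketch of one way such a check could work. It is not ElasticDoctor's actual implementation. It assumes each index is described by its age in days and by the value of `index.routing.allocation.include._tier_preference`, and it reuses the 7/30/365-day ages from the example ILM phases in this article as the expected boundaries.

```python
from typing import Any, Dict, List, Tuple

# Assumed age boundaries in days, taken from the example ILM phases above.
AGE_THRESHOLDS: List[Tuple[int, str]] = [
    (365, "data_frozen"),
    (30, "data_cold"),
    (7, "data_warm"),
    (0, "data_hot"),
]


def expected_tier(age_days: int) -> str:
    """Return the tier an index of this age would be expected to live on."""
    for min_age, tier in AGE_THRESHOLDS:
        if age_days >= min_age:
            return tier
    return "data_hot"


def find_misallocated(indices: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Report indices whose preferred tier differs from the age-based expectation."""
    findings = []
    for index in indices:
        # The first entry of a comma-separated _tier_preference is the preferred tier.
        actual = index["tier_preference"].split(",")[0].strip()
        wanted = expected_tier(index["age_days"])
        if actual != wanted:
            findings.append(
                {"index": index["name"], "actual": actual, "expected": wanted}
            )
    return findings


# Made-up inventory for illustration.
sample_indices = [
    {"name": "logs-000001", "age_days": 45, "tier_preference": "data_hot"},
    {"name": "logs-000003", "age_days": 10, "tier_preference": "data_warm,data_hot"},
    {"name": "logs-000007", "age_days": 3, "tier_preference": "data_hot"},
]

print(find_misallocated(sample_indices))
```

In this sample only `logs-000001` is reported: it is 45 days old but still prefers the hot tier, which usually points to an index that is not managed by ILM or whose policy has stalled.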
Data Tier Best Practices
✅ Design Principles
- • Plan tier transitions based on access patterns
- • Configure appropriate hardware per tier
- • Use ILM policies for automatic transitions
- • Monitor storage costs and utilization
- • Test query performance across tiers
- • Plan for disaster recovery scenarios
💡 Optimization Tips
- • Size hot tier for active data only
- • Use force merge in warm/cold tiers
- • Reduce replicas as data ages
- • Implement searchable snapshots for archives
- • Monitor and adjust ILM timing
❌ Common Mistakes
- • Keeping all data in hot tier
- • Poor ILM policy timing
- • Not optimizing storage per tier
- • Ignoring query performance differences
- • Inadequate capacity planning
- • Not monitoring tier utilization
⚠️ Performance Considerations
- • Plan for query latency increases
- • Monitor cold/frozen tier performance
- • Optimize cache settings for frozen data
- • Consider cross-tier query patterns
- • Plan maintenance windows for transitions
Cost Analysis and Optimization
💰 Storage Cost Comparison
- • Hot Tier: $1.00 per GB/month
- • Warm Tier: $0.50 per GB/month (50% savings)
- • Cold Tier: $0.20 per GB/month (80% savings)
- • Frozen Tier: $0.05 per GB/month (95% savings)
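To see what these per-GB prices mean for a whole cluster, the following sketch computes the blended monthly cost of a tiered layout and compares it with keeping everything on hot. The prices are the illustrative figures listed above, stored as integer cents to avoid floating-point rounding; the function names and the 10 TB sample distribution are assumptions made for this example.

```python
from typing import Dict

# Illustrative prices from the comparison above, in cents per GB per month.
PRICE_CENTS_PER_GB = {"hot": 100, "warm": 50, "cold": 20, "frozen": 5}


def monthly_cost_cents(gb_per_tier: Dict[str, int]) -> int:
    """Total monthly storage cost in cents for the given GB-per-tier layout."""
    return sum(PRICE_CENTS_PER_GB[tier] * gb for tier, gb in gb_per_tier.items())


def savings_percent(gb_per_tier: Dict[str, int]) -> float:
    """Savings relative to storing the same volume entirely on the hot tier."""
    total_gb = sum(gb_per_tier.values())
    if total_gb == 0:
        return 0.0
    all_hot = total_gb * PRICE_CENTS_PER_GB["hot"]
    tiered = monthly_cost_cents(gb_per_tier)
    return (all_hot - tiered) * 100 / all_hot


# Hypothetical 10 TB cluster: most data is old and sits on cheaper tiers.
layout = {"hot": 1000, "warm": 2000, "cold": 3000, "frozen": 4000}

print(f"Tiered cost: ${monthly_cost_cents(layout) / 100:,.2f} per month")  # $2,800.00
print(f"Savings vs. all-hot: {savings_percent(layout):.0f}%")              # 72%
```

With this distribution the tiered layout costs $2,800 per month against $10,000 for all-hot storage. The result depends heavily on how much data can actually leave the hot tier, so it is worth rerunning with your own volumes and prices.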
📊 Cost Optimization Strategies
Data Lifecycle Management
- • Implement aggressive ILM policies
- • Use searchable snapshots for frozen data
- • Optimize retention policies by use case
- • Monitor data access patterns
Hardware Optimization
- • Use appropriate storage types per tier
- • Scale nodes based on tier requirements
- • Implement compression strategies
- • Optimize replica counts by tier
Optimizing Data Storage Strategy
Key Benefits
- • Significant cost reduction through intelligent tiering
- • Improved query performance for active data
- • Efficient resource utilization across tiers
- • Scalable architecture for long-term growth
Implementation Strategy
- • Analyze current data access patterns
- • Configure appropriate node roles and hardware
- • Implement ILM policies for automatic transitions
- • Monitor and optimize tier utilization