Why Index Settings Matter
Index settings control how your data is stored, indexed, and searched. Proper configuration can dramatically improve performance, reduce storage costs, and enhance search relevance. Poor settings can lead to slow queries, wasted disk space, and operational issues.
Because they determine how data is processed, stored, and retrieved, index settings are the foundation of Elasticsearch performance optimization. ElasticDoctor's index settings check compares your configuration against best practices and identifies optimization opportunities, which can improve performance by as much as 40-60% depending on the workload.
API Endpoint and Usage
GET /index_name/_settings
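As a rough sketch of how the response from this endpoint can be consumed, the Python snippet below flattens the nested settings object into dotted keys such as index.refresh_interval. The sample response and the index name my-index are made up for illustration, and flatten_settings is a hypothetical helper, not part of ElasticDoctor or any Elasticsearch client.

```python
def flatten_settings(node, prefix=""):
    """Flatten nested settings into dotted keys, e.g. index.number_of_shards."""
    flat = {}
    for key, value in node.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_settings(value, dotted))
        else:
            flat[dotted] = value
    return flat


# Hypothetical response body of GET /my-index/_settings.
# Elasticsearch returns setting values as strings.
sample_response = {
    "my-index": {
        "settings": {
            "index": {
                "number_of_shards": "5",
                "number_of_replicas": "1",
                "refresh_interval": "30s",
            }
        }
    }
}

for index_name, body in sample_response.items():
    flat = flatten_settings(body["settings"])
    print(index_name, flat)
```

Dotted keys make it easier to compare a live index against a list of expected values, since each setting becomes a single dictionary lookup.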
✅ What This Check Analyzes
- Number of shards and replicas
- Refresh interval settings
- Mapping and field limits
- Allocation and routing settings
- Codec and compression settings
- Merge and translog configuration
🔧 Key Settings Categories
- Performance: Refresh, merge, translog
- Capacity: Shards, replicas, routing
- Storage: Compression, codec
- Limits: Fields, mapping size
Critical Settings Analysis
1. Shard Configuration
"settings": { "index": { "number_of_shards": "5", "number_of_replicas": "1", "shard": { "check_on_startup": "false" } } }
Optimal Shard Sizing
- Target size: 10-50GB per shard
- Document count: 10M-1B documents
- Too many shards: Per-shard overhead and a large, slow-to-update cluster state
- Too few shards: Poor distribution across nodes and oversized shards that are slow to recover or relocate
Replica Strategy
- Production: At least 1 replica
- High availability: 2+ replicas
- Read-heavy: More replicas for scaling
- Write-heavy: Fewer replicas for speed
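The sizing guidance above reduces to simple arithmetic. The following Python sketch estimates a primary shard count from an expected index size. The 30GB default target is an assumption picked from the middle of the 10-50GB range, and both function names are hypothetical.

```python
import math


def suggest_primary_shards(expected_index_gb, target_shard_gb=30.0):
    """Return a primary shard count that keeps each shard near target_shard_gb."""
    if expected_index_gb <= 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    return max(1, math.ceil(expected_index_gb / target_shard_gb))


def total_shard_copies(primaries, replicas):
    """Every replica adds one more full copy of each primary shard."""
    return primaries * (1 + replicas)


primaries = suggest_primary_shards(200)  # 200GB at about 30GB per shard
print(primaries)
print(total_shard_copies(primaries, replicas=1))
```

Treat the result as a starting point only: growth rate, retention, and query patterns should all influence the final shard count.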
2. Refresh Interval Optimization
"settings": { "index": { "refresh_interval": "30s" } }
Real-time (1s)
Use for dashboards, monitoring, or when immediate visibility is critical.
Balanced (30s)
Good for most use cases. Balances performance with reasonable freshness.
Bulk Loading (-1)
Disable refresh during bulk operations, then enable afterward.
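One way to picture the bulk loading pattern is as two calls to the update index settings endpoint, one before the load and one after. The Python sketch below only builds the requests and does not send them. The helper name, the index name bulk-data, and the 30s restore value are assumptions for illustration.

```python
import json


def bulk_load_refresh_steps(index_name, restore_interval="30s"):
    """Return (method, path, body) tuples: disable refresh, then restore it."""
    path = f"/{index_name}/_settings"
    disable = {"index": {"refresh_interval": "-1"}}
    restore = {"index": {"refresh_interval": restore_interval}}
    return [("PUT", path, disable), ("PUT", path, restore)]


# Run the first request before the bulk load and the second one after it.
for method, path, body in bulk_load_refresh_steps("bulk-data"):
    print(method, path, json.dumps(body))
```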
3. Compression and Storage
"settings": { "index": { "codec": "best_compression", "store": { "preload": ["doc", "tim"] } } }
Compression Options
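- default: LZ4 compression; fastest indexing and stored-field reads, larger size on disk
- best_compression: Higher compression ratio at the cost of slower stored-field reads and somewhat slower indexing; savings depend on the data (up to roughly 50% in good cases)
- Weigh storage costs against indexing and read speed
Store Preloading
- Loads index files with the listed extensions (for example, doc for postings lists) into the filesystem cache when the index is opened
- Improves search performance, especially right after a restart
- Requires enough free memory for the operating system's filesystem cache, not JVM heap; preloading more than fits in memory can slow indexing and search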
Common Configuration Issues
🚨 Critical: Too Many Shards
The index has an excessive number of shards relative to its data size, causing cluster state bloat and performance degradation.
Solutions:
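1. Use the shrink API to reduce the shard count (the source index must first be blocked for writes, and the target shard count must be a factor of the source count):
POST /source/_shrink/target
2. Reindex into a new index created with an appropriate shard count
3. Use index templates so that future indices get the right shard count automatically
4. Follow the shard sizing guidelines (10-50GB per shard)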
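Because a shrink can only target a shard count that is a factor of the source index's shard count, it can help to list the candidates before planning the operation. This is a minimal Python sketch, and valid_shrink_targets is a hypothetical helper name.

```python
def valid_shrink_targets(source_shards):
    """List shard counts a shrink can target: divisors smaller than the source count."""
    if source_shards < 1:
        raise ValueError("source_shards must be at least 1")
    return [n for n in range(1, source_shards) if source_shards % n == 0]


print(valid_shrink_targets(8))
print(valid_shrink_targets(5))
```

A prime shard count such as 5 can only be shrunk to a single shard, which is one reason to pick shard counts with several factors when an index may need shrinking later.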
⚠️ Warning: Aggressive Refresh Interval
The refresh interval is too short, causing unnecessary resource consumption and reduced indexing performance.
Optimization:
- Increase the refresh interval to 30s or higher for bulk operations
- Call the _refresh API sparingly, only when a write must be visible immediately
- Consider application-level caching for frequently accessed data
- Monitor segment merge activity and I/O patterns
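A check like this one can be approximated by converting the refresh_interval string to seconds and comparing it with a threshold. The Python sketch below is illustrative only: the 5 second threshold is an assumption, not ElasticDoctor's actual rule, and only the ms, s, m, h, and d units are handled.

```python
# Milliseconds per supported time unit.
UNIT_MS = {"ms": 1, "s": 1000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}


def refresh_interval_seconds(value):
    """Convert values such as '500ms', '30s' or -1 to seconds (None if disabled)."""
    text = str(value).strip()
    if text == "-1":
        return None
    # Check 'ms' before 's' and 'm' so '500ms' is not misread.
    for unit in ("ms", "s", "m", "h", "d"):
        if text.endswith(unit):
            return float(text[: -len(unit)]) * UNIT_MS[unit] / 1000
    raise ValueError(f"unsupported refresh_interval: {value!r}")


def is_aggressive(value, threshold_seconds=5.0):
    """True when refresh is enabled and more frequent than the threshold."""
    seconds = refresh_interval_seconds(value)
    return seconds is not None and seconds < threshold_seconds


for candidate in ("1s", "500ms", "30s", "-1"):
    print(candidate, is_aggressive(candidate))
```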
ℹ️ Info: Single Replica in Production
The index has only one replica, which may be insufficient for high availability or read scaling requirements.
Considerations:
- Increase replicas for better availability and read performance
- Balance replica count with cluster capacity
- Consider allocation awareness for rack/zone distribution
- Monitor query load distribution across replicas
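To make the availability trade-off concrete, the Python sketch below summarizes what a replica count implies for a given number of data nodes. It is a simplified model that assumes Elasticsearch never places two copies of the same shard on one node and that ignores allocation awareness. The function name is hypothetical.

```python
def replica_summary(replicas, data_nodes):
    """Summarize shard copies and tolerated node losses for one index."""
    if replicas < 0 or data_nodes < 1:
        raise ValueError("need replicas >= 0 and at least one data node")
    copies = replicas + 1  # one primary plus its replicas
    return {
        "copies_per_shard": copies,
        # Each copy needs its own node, so extra replicas stay unassigned otherwise.
        "all_copies_assignable": data_nodes >= copies,
        # Data survives as long as one assigned copy of each shard remains.
        "node_losses_survivable": min(replicas, data_nodes - 1),
    }


print(replica_summary(replicas=1, data_nodes=3))
print(replica_summary(replicas=2, data_nodes=2))
```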
Index Settings Best Practices
✅ Performance Optimization
- Size shards to 10-50GB for optimal performance
- Use appropriate refresh intervals (30s+ for bulk data)
- Enable best_compression for archival data
- Configure merge policy for your write patterns
- Choose routing values that spread documents evenly across shards
💡 Operational Tips
- Use index templates for consistent configuration
- Monitor shard sizes and rebalance when needed
- Implement allocation rules for hardware tiers
- Set up rollover policies for time-based indices
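As an illustration of the index template tip, the Python sketch below builds the body of a composable index template (the _index_template endpoint, available in Elasticsearch 7.8 and later). The template name, the app-logs-* pattern, and the setting values are hypothetical. The snippet prints the request and does not send it.

```python
import json


def build_index_template(pattern, shards, replicas, refresh_interval):
    """Build a composable index template body with common index settings."""
    return {
        "index_patterns": [pattern],
        "template": {
            "settings": {
                "index": {
                    "number_of_shards": shards,
                    "number_of_replicas": replicas,
                    "refresh_interval": refresh_interval,
                }
            }
        },
    }


template = build_index_template("app-logs-*", shards=3, replicas=1, refresh_interval="30s")
print("PUT /_index_template/app-logs-template")
print(json.dumps(template, indent=2))
```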
❌ Common Mistakes
- Creating too many small shards
- Using 1s refresh for non-realtime data
- Ignoring replica count in production
- Not considering compression for cold data
- Forgetting to update settings after migration
⚠️ Monitoring Points
- Track shard size growth over time
- Monitor refresh and merge activity
- Watch for allocation failures
- Check storage utilization patterns
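Shard size tracking can be sketched against the output of GET /_cat/shards?format=json&bytes=b, which reports each shard's store size in bytes. The rows below are invented sample data, the helper name is hypothetical, and the thresholds simply reuse the 10-50GB guideline. Sizes are computed with 1GB = 1024^3 bytes.

```python
GB = 1024 ** 3


def flag_shard_sizes(cat_shards_rows, min_gb=10, max_gb=50):
    """Return (index, shard, size_gb, status) for primaries outside the size range."""
    findings = []
    for row in cat_shards_rows:
        # Skip replicas and unassigned shards, which report no store size.
        if row.get("prirep") != "p" or row.get("store") is None:
            continue
        size_gb = int(row["store"]) / GB
        if size_gb > max_gb:
            status = "oversized"
        elif size_gb < min_gb:
            status = "undersized"
        else:
            continue
        findings.append((row["index"], row["shard"], round(size_gb, 1), status))
    return findings


# Hypothetical rows in the shape returned by _cat/shards with format=json&bytes=b.
sample_rows = [
    {"index": "app-logs-2024.01", "shard": "0", "prirep": "p", "store": str(60 * GB)},
    {"index": "app-logs-2024.01", "shard": "0", "prirep": "r", "store": str(60 * GB)},
    {"index": "app-logs-2024.02", "shard": "0", "prirep": "p", "store": str(2 * GB)},
    {"index": "app-logs-2024.03", "shard": "0", "prirep": "p", "store": str(30 * GB)},
    {"index": "app-logs-2024.04", "shard": "0", "prirep": "p", "store": None},
]

for finding in flag_shard_sizes(sample_rows):
    print(finding)
```

Small shards are not always a problem, for example in a young index that is still filling up, so findings like these are prompts for review, not errors.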
Configuration Examples
High-Performance Real-time Index
PUT /realtime-logs
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 2,
      "refresh_interval": "1s",
      "translog": {
        "flush_threshold_size": "1gb",
        "sync_interval": "30s"
      },
      "merge": {
        "policy": {
          "max_merged_segment": "5gb"
        }
      }
    }
  }
}
Bulk Loading Optimized Index
PUT /bulk-data
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0,
      "refresh_interval": -1,
      "translog": {
        "flush_threshold_size": "2gb",
        "durability": "async"
      },
      "merge": {
        "policy": {
          "max_merge_at_once": 30,
          "segments_per_tier": 30
        }
      }
    }
  }
}
Archive/Cold Storage Index
PUT /archive-data
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 1,
      "refresh_interval": "30s",
      "codec": "best_compression",
      "routing": {
        "allocation": {
          "require": {
            "box_type": "cold"
          }
        }
      },
      "store": {
        "preload": ["doc"]
      }
    }
  }
}
Optimizing Your Index Settings
Key Takeaways
- Proper shard sizing is critical for performance
- Refresh intervals should match your use case
- Compression settings can significantly reduce storage
- Replica count affects both availability and performance
Action Items
- Review current shard sizes and distribution
- Adjust refresh intervals based on requirements
- Implement index templates for consistency
- Monitor performance after configuration changes