Bucket Aggregations in Elasticsearch
Bucket aggregations group documents into buckets, where each bucket is associated with a key and a document count. Think of them as GROUP BY operations in SQL, with the added ability to nest buckets inside one another and attach metrics to each bucket.
Overview
Bucket aggregations don't calculate metrics over fields like metrics aggregations do. Instead, they create buckets of documents. Each bucket is associated with a criterion, which depends on the aggregation type, that determines whether a document falls into it.
Terms Aggregation
The most common bucket aggregation. It groups documents by unique field values.
Basic Terms Aggregation
GET /products/_search
{
"size": 0,
"aggs": {
"product_categories": {
"terms": {
"field": "category.keyword"
}
}
}
}
Response:
{
"aggregations": {
"product_categories": {
"buckets": [
{
"key": "Electronics",
"doc_count": 150
},
{
"key": "Clothing",
"doc_count": 120
},
{
"key": "Books",
"doc_count": 80
}
]
}
}
}
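On the client side, the buckets array is often reshaped into a lookup table. A minimal Python sketch, assuming the response above has already been parsed into a dict (the helper name bucket_counts is hypothetical, not part of any Elasticsearch client):

```python
def bucket_counts(response, agg_name):
    """Map each bucket key to its doc_count for one bucket aggregation."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# The sample response shown above, as a Python dict.
response = {
    "aggregations": {
        "product_categories": {
            "buckets": [
                {"key": "Electronics", "doc_count": 150},
                {"key": "Clothing", "doc_count": 120},
                {"key": "Books", "doc_count": 80},
            ]
        }
    }
}

counts = bucket_counts(response, "product_categories")
print(counts["Electronics"])  # 150
```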
Terms with Size and Order
GET /products/_search
{
"size": 0,
"aggs": {
"top_brands": {
"terms": {
"field": "brand.keyword",
"size": 20,
"order": { "_count": "desc" }
}
}
}
}
Terms with a Metric Sub-aggregation
GET /sales/_search
{
"size": 0,
"aggs": {
"sales_by_region": {
"terms": {
"field": "region.keyword"
},
"aggs": {
"total_sales": {
"sum": {
"field": "amount"
}
}
}
}
}
}
Date Histogram Aggregation
Groups documents into time-based buckets.
Basic Date Histogram
GET /logs/_search
{
"size": 0,
"aggs": {
"logs_over_time": {
"date_histogram": {
"field": "timestamp",
"calendar_interval": "day"
}
}
}
}
Date Histogram with Format and Extended Bounds
GET /sales/_search
{
"size": 0,
"aggs": {
"monthly_sales": {
"date_histogram": {
"field": "date",
"calendar_interval": "month",
"format": "yyyy-MM",
"min_doc_count": 0,
"extended_bounds": {
"min": "2024-01-01",
"max": "2024-12-31"
}
},
"aggs": {
"revenue": {
"sum": {
"field": "amount"
}
}
}
}
}
}
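With min_doc_count set to 0 and extended_bounds covering 2024, the response should contain one bucket per month even for months without sales. The sketch below only enumerates the key_as_string values a client might expect under the "yyyy-MM" format, assuming no sale falls outside 2024 (the helper name month_keys is hypothetical):

```python
from datetime import date


def month_keys(start, end):
    """List "yyyy-MM" keys for every calendar month from start to end, inclusive."""
    keys = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        keys.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return keys


expected = month_keys(date(2024, 1, 1), date(2024, 12, 31))
print(len(expected))  # 12
```

Comparing this list against the keys in the response is a quick way to confirm that empty months were not dropped.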
Fixed Interval
GET /metrics/_search
{
"size": 0,
"aggs": {
"metrics_per_30m": {
"date_histogram": {
"field": "timestamp",
"fixed_interval": "30m",
"time_zone": "America/New_York"
}
}
}
}
Range Aggregation
Creates arbitrary ranges of numeric or date values.
Numeric Range
GET /products/_search
{
"size": 0,
"aggs": {
"price_ranges": {
"range": {
"field": "price",
"ranges": [
{ "to": 50, "key": "Cheap" },
{ "from": 50, "to": 200, "key": "Moderate" },
{ "from": 200, "key": "Expensive" }
]
}
}
}
}
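In a range aggregation each range includes its from value and excludes its to value, so a price of exactly 50 lands in "Moderate", not "Cheap". A small Python sketch of that rule for the three ranges above (the function name price_bucket is hypothetical; it returns the first match, which is enough here because the ranges do not overlap):

```python
def price_bucket(price):
    """Return the key of the range a price falls into: from inclusive, to exclusive."""
    ranges = [
        {"to": 50, "key": "Cheap"},
        {"from": 50, "to": 200, "key": "Moderate"},
        {"from": 200, "key": "Expensive"},
    ]
    for r in ranges:
        lower_ok = "from" not in r or price >= r["from"]
        upper_ok = "to" not in r or price < r["to"]
        if lower_ok and upper_ok:
            return r["key"]
    return None


print(price_bucket(50))   # Moderate
print(price_bucket(200))  # Expensive
```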
Date Range Aggregation
GET /users/_search
{
"size": 0,
"aggs": {
"user_age_groups": {
"date_range": {
"field": "birth_date",
"format": "yyyy",
"ranges": [
{ "key": "Gen Z", "from": "2000", "to": "2010" },
{ "key": "Millennials", "from": "1980", "to": "2000" },
{ "key": "Gen X", "from": "1965", "to": "1980" }
]
}
}
}
}
Histogram Aggregation
Creates fixed-size buckets over numeric values.
GET /products/_search
{
"size": 0,
"aggs": {
"price_histogram": {
"histogram": {
"field": "price",
"interval": 50,
"min_doc_count": 0,
"extended_bounds": {
"min": 0,
"max": 500
}
}
}
}
}
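The Elasticsearch documentation describes the bucket key as the value rounded down to the nearest multiple of the interval (shifted by an optional offset). A Python sketch of that formula, plus the keys one would expect for the request above, assuming no product is priced outside 0 to 500 (both helper names are hypothetical):

```python
import math


def histogram_key(value, interval, offset=0.0):
    """Key of the histogram bucket that a value falls into."""
    return math.floor((value - offset) / interval) * interval + offset


def bounded_keys(interval, lower, upper):
    """Keys expected when extended_bounds spans lower..upper and min_doc_count is 0."""
    key = histogram_key(lower, interval)
    last = histogram_key(upper, interval)
    keys = []
    while key <= last:
        keys.append(key)
        key += interval
    return keys


print(histogram_key(120, 50))         # 100.0
print(len(bounded_keys(50, 0, 500)))  # 11
```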
Filter Aggregation
Creates a single bucket containing the documents that match a query.
Single Filter
GET /products/_search
{
"size": 0,
"aggs": {
"electronics_only": {
"filter": {
"term": { "category.keyword": "Electronics" }
},
"aggs": {
"avg_price": {
"avg": { "field": "price" }
}
}
}
}
}
Filters Aggregation (Multiple Filters)
GET /logs/_search
{
"size": 0,
"aggs": {
"log_levels": {
"filters": {
"filters": {
"errors": { "term": { "level": "ERROR" } },
"warnings": { "term": { "level": "WARN" } },
"info": { "term": { "level": "INFO" } }
}
},
"aggs": {
"top_messages": {
"terms": {
"field": "message.keyword",
"size": 5
}
}
}
}
}
}
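Because the filters above are named, the response returns buckets as an object keyed by filter name rather than as an array. A Python sketch of reading that shape, assuming a parsed response dict; the sample data and the helper name top_messages_by_level are made up for illustration:

```python
def top_messages_by_level(response):
    """Map each named filter bucket to the keys of its top_messages sub-buckets."""
    buckets = response["aggregations"]["log_levels"]["buckets"]
    return {
        level: [msg["key"] for msg in bucket["top_messages"]["buckets"]]
        for level, bucket in buckets.items()
    }


# Hypothetical response fragment; real messages and counts will differ.
sample = {
    "aggregations": {
        "log_levels": {
            "buckets": {
                "errors": {
                    "doc_count": 3,
                    "top_messages": {"buckets": [{"key": "disk full", "doc_count": 2},
                                                 {"key": "timeout", "doc_count": 1}]},
                },
                "warnings": {
                    "doc_count": 1,
                    "top_messages": {"buckets": [{"key": "slow query", "doc_count": 1}]},
                },
                "info": {"doc_count": 0, "top_messages": {"buckets": []}},
            }
        }
    }
}

summary = top_messages_by_level(sample)
print(summary["errors"])  # ['disk full', 'timeout']
```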
Composite Aggregation
Useful for paginating through large sets of buckets.
GET /sales/_search
{
"size": 0,
"aggs": {
"sales_composite": {
"composite": {
"size": 10,
"sources": [
{
"product": {
"terms": {
"field": "product_id"
}
}
},
{
"date": {
"date_histogram": {
"field": "timestamp",
"calendar_interval": "day"
}
}
}
]
},
"aggs": {
"total_sales": {
"sum": {
"field": "amount"
}
}
}
}
}
}
Pagination with After Key
GET /sales/_search
{
"size": 0,
"aggs": {
"sales_composite": {
"composite": {
"size": 10,
"sources": [
{ "product": { "terms": { "field": "product_id" } } }
],
"after": {
"product": "PROD-123"
}
}
}
}
}
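Each composite response carries an after_key that can be passed back as "after" to fetch the next page, and an empty buckets array signals the end. The loop below is a sketch of that pattern in Python. The search argument is a hypothetical callable that sends a request body to the index and returns the parsed response as a dict; it stands in for whatever client is in use:

```python
def iterate_composite(search, agg_name, sources, page_size=10):
    """Yield every bucket of a composite aggregation, one page at a time."""
    after = None
    while True:
        composite = {"size": page_size, "sources": sources}
        if after is not None:
            composite["after"] = after  # resume after the last key of the previous page
        body = {"size": 0, "aggs": {agg_name: {"composite": composite}}}
        result = search(body)["aggregations"][agg_name]
        buckets = result["buckets"]
        if not buckets:
            break  # an empty page means all buckets have been returned
        yield from buckets
        after = result.get("after_key")
        if after is None:
            break
```

A caller might use it as `for bucket in iterate_composite(search, "sales_composite", sources): ...`, where sources is the same list used in the request above.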
Nested Aggregation
For aggregating on nested documents.
GET /products/_search
{
"size": 0,
"aggs": {
"reviews": {
"nested": {
"path": "reviews"
},
"aggs": {
"avg_rating": {
"avg": {
"field": "reviews.rating"
}
},
"rating_histogram": {
"histogram": {
"field": "reviews.rating",
"interval": 1
}
}
}
}
}
}
Global Aggregation
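Creates a single bucket containing every document in the searched indices, ignoring the search query. It can only be used as a top-level aggregation.
GET /sales/_search
{
"query": {
"term": { "category.keyword": "Electronics" }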
},
"size": 0,
"aggs": {
"electronics_avg": {
"avg": { "field": "price" }
},
"all_products": {
"global": {},
"aggs": {
"all_avg": {
"avg": { "field": "price" }
}
}
}
}
}
Significant Terms Aggregation
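Finds terms that occur noticeably more often in the documents matching the query (the foreground set) than in the index as a whole (the background set).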
GET /articles/_search
{
"query": {
"match": { "category": "technology" }
},
"size": 0,
"aggs": {
"significant_words": {
"significant_terms": {
"field": "content.keyword",
"size": 10
}
}
}
}
Combining Multiple Bucket Aggregations
Nested Buckets
GET /sales/_search
{
"size": 0,
"aggs": {
"by_category": {
"terms": {
"field": "category.keyword"
},
"aggs": {
"by_brand": {
"terms": {
"field": "brand.keyword"
},
"aggs": {
"price_stats": {
"stats": {
"field": "price"
}
}
}
}
}
}
}
}
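In the response, each category bucket contains its own by_brand buckets, and each of those contains price_stats. A Python sketch that flattens this nesting into one row per category and brand, assuming a parsed response dict; the sample values and the helper name flatten_category_brand are made up for illustration:

```python
def flatten_category_brand(response):
    """Turn nested category > brand buckets into a flat list of rows."""
    rows = []
    for category in response["aggregations"]["by_category"]["buckets"]:
        for brand in category["by_brand"]["buckets"]:
            rows.append({
                "category": category["key"],
                "brand": brand["key"],
                "doc_count": brand["doc_count"],
                "avg_price": brand["price_stats"]["avg"],
            })
    return rows


# Hypothetical response fragment; real keys and numbers will differ.
sample = {
    "aggregations": {
        "by_category": {
            "buckets": [
                {
                    "key": "Electronics",
                    "doc_count": 3,
                    "by_brand": {
                        "buckets": [
                            {"key": "Acme", "doc_count": 2,
                             "price_stats": {"count": 2, "min": 10.0, "max": 30.0,
                                             "avg": 20.0, "sum": 40.0}},
                            {"key": "Globex", "doc_count": 1,
                             "price_stats": {"count": 1, "min": 99.0, "max": 99.0,
                                             "avg": 99.0, "sum": 99.0}},
                        ]
                    },
                }
            ]
        }
    }
}

rows = flatten_category_brand(sample)
print(rows[0]["brand"], rows[0]["avg_price"])  # Acme 20.0
```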
Time-based Analysis
GET /orders/_search
{
"size": 0,
"aggs": {
"sales_per_day": {
"date_histogram": {
"field": "order_date",
"calendar_interval": "day"
},
"aggs": {
"hourly_distribution": {
"histogram": {
"field": "hour_of_day",
"interval": 1,
"min_doc_count": 0
}
},
"revenue": {
"sum": {
"field": "total_amount"
}
}
}
}
}
}
Best Practices
1. Use keyword Fields for Terms Aggregations
// Good
"terms": { "field": "category.keyword" }
// Bad (fails on text fields unless fielddata is enabled)
"terms": { "field": "category" }
2. Control Memory Usage
{
"terms": {
"field": "user_id",
"size": 100, // Limit bucket count
"shard_size": 200 // Control shard-level precision
}
}
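When shard_size is omitted, the Elasticsearch documentation gives the default as size * 1.5 + 10. A small Python sketch of that arithmetic (the helper name is hypothetical), useful for checking whether an explicit value such as the 200 above actually raises the number of candidates each shard returns:

```python
def default_shard_size(size):
    """Default per-shard bucket count documented for the terms aggregation."""
    return int(size * 1.5 + 10)


print(default_shard_size(100))  # 160
```

For size 100 the default is 160, so setting shard_size to 200 asks each shard for 40 more candidate buckets, trading memory and network cost for more accurate counts.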
3. Handle Missing Values
{
"terms": {
"field": "category.keyword",
"missing": "Unknown" // Include docs with missing values
}
}
4. Optimize Date Histograms
{
"date_histogram": {
"field": "timestamp",
"calendar_interval": "day",
"min_doc_count": 0, // Include empty buckets
"offset": "+6h" // Adjust bucket boundaries
}
}
Common Use Cases
1. E-commerce Analytics
GET /orders/_search
{
"size": 0,
"aggs": {
"daily_orders": {
"date_histogram": {
"field": "order_date",
"calendar_interval": "day"
},
"aggs": {
"unique_customers": {
"cardinality": {
"field": "customer_id"
}
},
"revenue": {
"sum": {
"field": "total"
}
}
}
}
}
}
2. Log Analysis
GET /logs/_search
{
"size": 0,
"aggs": {
"error_timeline": {
"filter": {
"term": { "level": "ERROR" }
},
"aggs": {
"errors_per_hour": {
"date_histogram": {
"field": "@timestamp",
"fixed_interval": "1h"
}
}
}
}
}
}
3. User Behavior Analysis
GET /events/_search
{
"size": 0,
"aggs": {
"user_sessions": {
"terms": {
"field": "session_id.keyword",
"size": 1000
},
"aggs": {
"session_start": {
"min": { "field": "timestamp" }
},
"session_end": {
"max": { "field": "timestamp" }
},
"events_count": {
"value_count": { "field": "event_type" }
}
}
}
}
}
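A max on timestamp alone gives the time of the last event in a session; a duration also needs the first event. The Python sketch below assumes each session bucket carries a min and a max on timestamp under the names session_start and session_end (names chosen here for illustration), and that timestamp is a date field, so both values come back as epoch milliseconds in "value":

```python
def session_durations(response):
    """Return {session_id: duration in seconds} from min/max timestamp sub-aggregations."""
    durations = {}
    for bucket in response["aggregations"]["user_sessions"]["buckets"]:
        start_ms = bucket["session_start"]["value"]
        end_ms = bucket["session_end"]["value"]
        durations[bucket["key"]] = (end_ms - start_ms) / 1000.0
    return durations


# Hypothetical response fragment; real session ids and timestamps will differ.
sample = {
    "aggregations": {
        "user_sessions": {
            "buckets": [
                {"key": "s-1", "doc_count": 4,
                 "session_start": {"value": 1704067200000.0},
                 "session_end": {"value": 1704067290000.0},
                 "events_count": {"value": 4}},
                {"key": "s-2", "doc_count": 1,
                 "session_start": {"value": 1704070800000.0},
                 "session_end": {"value": 1704070800000.0},
                 "events_count": {"value": 1}},
            ]
        }
    }
}

durations = session_durations(sample)
print(durations["s-1"])  # 90.0
```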
Performance Tips
- Limit bucket count: Use the size parameter wisely
- Use doc_count ordering: Default and most efficient
- Filter first: Apply queries to reduce document set
- Use composite aggregation: For large result sets requiring pagination
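- Cache aggregations: Requests with "size": 0 are eligible for the shard request cache, so frequently repeated aggregations on unchanged data can be served without recomputation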
Next Steps
- Combine bucket and metrics aggregations
- Learn about pipeline aggregations
- Explore sub-aggregations patterns
- Study aggregation performance optimization