Inserting Data in Elasticsearch
Learn how to insert documents into Elasticsearch using various methods and options. This tutorial covers single document insertion, bulk operations, and important concepts for data indexing.
Prerequisites
Before inserting data, ensure you have:
- Elasticsearch running (check with curl localhost:9200)
- An index created (or rely on automatic index creation)
Basic Document Insertion
Create Document with ID
Add a document to an index with a specific ID:
PUT /products/_doc/1
{
"name": "Laptop Pro",
"category": "Electronics",
"price": 1299.99,
"in_stock": true,
"specs": {
"cpu": "Intel i7",
"ram": "16GB",
"storage": "512GB SSD"
}
}
Response:
{
"_index": "products",
"_id": "1",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 0,
"_primary_term": 1
}
Understanding the Response
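- _index: The index where the document was stored
- _id: The document's unique identifier
- _version: Version number (increments with each update)
- result: Operation result (created or updated)
- _shards: How many shard copies the write targeted (total) and how many succeeded or failed. Here total is 2 (the primary plus one replica) but successful is 1, which usually means the replica is unassigned, as on a single-node cluster
- _seq_no: Sequence number of the operation, used together with _primary_term for optimistic concurrency control
- _primary_term: The primary term of the shard that handled the write (it increases whenever a new primary is assigned)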
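The same request can be sent from a script, and the response fields above are what that script would check. A minimal sketch using only Python's standard library, assuming a cluster at http://localhost:9200 with security disabled (no authentication or TLS). build_index_request, send, and summarize_index_response are hypothetical helpers, not part of any Elasticsearch client, and treating a failed shard copy as an error is a design choice of this sketch:

```python
import json
import urllib.parse
import urllib.request


def build_index_request(base_url, index, doc_id, document):
    """Build (but do not send) a PUT /<index>/_doc/<id> request."""
    # Percent-encode the ID so values such as "a/b" stay in one path segment
    quoted_id = urllib.parse.quote(str(doc_id), safe="")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/{index}/_doc/{quoted_id}",
        data=json.dumps(document).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def send(request):
    """Send a request and decode the JSON response (HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarize_index_response(response):
    """Check an index API response and keep the fields worth storing."""
    if response.get("result") not in ("created", "updated"):
        raise ValueError(f"unexpected result: {response.get('result')!r}")
    shards = response["_shards"]
    if shards["failed"] > 0:
        raise RuntimeError(f"write failed on {shards['failed']} shard copies")
    return {
        "id": response["_id"],
        "version": response["_version"],
        # Can be sent back as if_seq_no / if_primary_term to make the next
        # write conditional on the document not having changed since
        "seq_no": response["_seq_no"],
        "primary_term": response["_primary_term"],
        # False when some copies (typically unassigned replicas) were not written
        "fully_replicated": shards["successful"] == shards["total"],
    }


laptop = {
    "name": "Laptop Pro",
    "category": "Electronics",
    "price": 1299.99,
    "in_stock": True,
    "specs": {"cpu": "Intel i7", "ram": "16GB", "storage": "512GB SSD"},
}
request = build_index_request("http://localhost:9200", "products", "1", laptop)
print(request.get_method(), request.full_url)  # PUT http://localhost:9200/products/_doc/1
# response = send(request)  # uncomment when a cluster is running

# The sample response shown above, summarized without a cluster
sample = json.loads("""{"_index": "products", "_id": "1", "_version": 1,
  "result": "created", "_shards": {"total": 2, "successful": 1, "failed": 0},
  "_seq_no": 0, "_primary_term": 1}""")
print(summarize_index_response(sample))
```

Building the request separately from sending it keeps the construction logic testable without a running cluster.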
Document Creation Methods
1. PUT with ID (Create or Update)
PUT /users/_doc/100
{
"username": "john_doe",
"email": "john@example.com",
"registered_date": "2024-01-15"
}
This creates a new document, or replaces the entire existing document if the ID already exists (result is "updated" and _version increments).
2. Create Only (Fail if Exists)
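Using the op_type parameter, the request fails with a version conflict (HTTP 409) if a document with that ID already exists: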
PUT /users/_doc/100?op_type=create
{
"username": "jane_doe",
"email": "jane@example.com"
}
Or use the _create endpoint, which behaves the same way:
PUT /users/_create/100
{
"username": "jane_doe",
"email": "jane@example.com"
}
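To make the difference between the two op types concrete, here is a toy in-memory model in Python. ToyIndex and VersionConflict are hypothetical names; the class only mimics the documented outcomes (result, _version, and the conflict raised by create) and stores nothing in Elasticsearch:

```python
class VersionConflict(Exception):
    """Stands in for Elasticsearch's 409 version_conflict_engine_exception."""


class ToyIndex:
    """In-memory model of index vs create semantics. Not Elasticsearch code."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def put(self, doc_id, source, op_type="index"):
        if doc_id in self._docs:
            if op_type == "create":
                raise VersionConflict(f"[{doc_id}]: version conflict, document already exists")
            version, result = self._docs[doc_id][0] + 1, "updated"
        else:
            version, result = 1, "created"
        # The stored source is replaced wholesale, never merged with the old one
        self._docs[doc_id] = (version, dict(source))
        return {"_id": doc_id, "_version": version, "result": result}

    def get(self, doc_id):
        return self._docs[doc_id][1]


users = ToyIndex()
print(users.put("100", {"username": "john_doe"}))  # result: created, _version 1
print(users.put("100", {"username": "jane_doe"}))  # result: updated, _version 2
try:
    users.put("100", {"username": "jane_doe"}, op_type="create")
except VersionConflict as err:
    print("conflict:", err)
```

Note that the second put drops any field the new source does not contain, which is why PUT is a replacement rather than a partial update.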
3. Automatic ID Generation
Let Elasticsearch generate a unique ID:
POST /logs/_doc
{
"timestamp": "2024-01-15T10:30:00",
"level": "INFO",
"message": "Application started successfully",
"service": "auth-service"
}
The response includes the generated ID in _id (other response fields omitted):
{
"_index": "logs",
"_id": "dXuSt4sBX_Z_kb8rP3qY",
"_version": 1,
"result": "created"
}
Bulk Operations
To insert many documents in one request, use the Bulk API, which avoids a separate HTTP round trip per document:
POST /_bulk
{ "index": { "_index": "products", "_id": "2" } }
{ "name": "Smartphone", "price": 699.99, "category": "Electronics" }
{ "index": { "_index": "products", "_id": "3" } }
{ "name": "Tablet", "price": 499.99, "category": "Electronics" }
{ "index": { "_index": "products" } }
{ "name": "Headphones", "price": 199.99, "category": "Audio" }
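Note: The body is newline-delimited JSON (NDJSON). Each action and each document must sit on a single line of its own (no pretty-printing), and the body must end with a newline. The last action omits _id, so Elasticsearch generates one.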
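Writing NDJSON by hand is error-prone, so a short generator helps. A minimal sketch; build_bulk_body is a hypothetical helper that emits index actions only, taking (doc_id, source) pairs where None means "let Elasticsearch pick the ID":

```python
import json


def build_bulk_body(index, docs):
    """Build an NDJSON body for POST /_bulk from (doc_id, source) pairs.

    Pass doc_id=None to leave _id out so that Elasticsearch generates one.
    """
    lines = []
    for doc_id, source in docs:
        action = {"_index": index}
        if doc_id is not None:
            action["_id"] = doc_id
        lines.append(json.dumps({"index": action}))
        # json.dumps without indent= always produces a single line
        lines.append(json.dumps(source))
    # The Bulk API requires a newline after the last line as well
    return "\n".join(lines) + "\n"


body = build_bulk_body("products", [
    ("2", {"name": "Smartphone", "price": 699.99, "category": "Electronics"}),
    ("3", {"name": "Tablet", "price": 499.99, "category": "Electronics"}),
    (None, {"name": "Headphones", "price": 199.99, "category": "Audio"}),
])
print(body, end="")
```

The resulting string can be written to a file for a curl upload or sent directly as the request body.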
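Bulk Insert from File
Save the NDJSON body to a file and send it with --data-binary, which, unlike -d, preserves the newlines: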
curl -X POST "localhost:9200/_bulk" \
-H "Content-Type: application/x-ndjson" \
--data-binary @products.json
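Before sending a file, a quick structural check catches the most common mistakes: pretty-printed JSON, a missing source line, or a missing final newline. A sketch under the assumption that the file contains only index, create, update, and delete actions (delete is the only one without a source line); validate_bulk_ndjson is a hypothetical helper and does not check field values:

```python
import json

ACTIONS_WITH_SOURCE = {"index", "create", "update"}


def _parse(line, lineno):
    """Parse one NDJSON line, reporting the line number on failure."""
    try:
        value = json.loads(line)
    except json.JSONDecodeError as err:
        raise ValueError(f"line {lineno}: not a one-line JSON object ({err.msg})") from err
    if not isinstance(value, dict):
        raise ValueError(f"line {lineno}: expected a JSON object")
    return value


def validate_bulk_ndjson(text):
    """Check the line structure of a _bulk body and return the action count."""
    if not text.endswith("\n"):
        raise ValueError("bulk body must end with a newline")
    lines = text.split("\n")[:-1]  # drop the empty piece after the final newline
    count, i = 0, 0
    while i < len(lines):
        action = _parse(lines[i], i + 1)
        if len(action) != 1 or next(iter(action)) not in ACTIONS_WITH_SOURCE | {"delete"}:
            raise ValueError(f"line {i + 1}: expected one of index/create/update/delete")
        op = next(iter(action))
        i += 1
        if op in ACTIONS_WITH_SOURCE:
            if i >= len(lines):
                raise ValueError(f"line {i}: {op} action has no source line")
            _parse(lines[i], i + 1)  # the source line must also be one JSON object
            i += 1
        count += 1
    return count
```

To check products.json, read it with open("products.json", encoding="utf-8") and pass the contents to validate_bulk_ndjson; it returns the number of actions or raises ValueError naming the offending line.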
Advanced Insertion Options
1. With Routing
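Route a document by a custom value instead of its _id so that related documents (here, all orders for user123) land on the same shard. Requests that get, update, or delete the document must pass the same routing value: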
PUT /orders/_doc/1001?routing=user123
{
"order_id": "1001",
"user_id": "user123",
"total": 299.99,
"items": ["item1", "item2"]
}
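Conceptually, the routing value (the _id unless you pass routing) is hashed and reduced modulo the number of primary shards. A sketch of that idea in Python; shard_for is hypothetical and uses MD5 as a stand-in, whereas Elasticsearch uses Murmur3 and also factors in index.number_of_routing_shards, so the shard numbers will not match a real cluster:

```python
import hashlib


def shard_for(doc_id, num_primary_shards, routing=None):
    """Pick a shard in outline: hash(routing value) % number of primaries.

    The routing value defaults to the document ID. MD5 is only a stand-in
    for the hash Elasticsearch really uses.
    """
    value = routing if routing is not None else doc_id
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_primary_shards


# With routing=user123, all of that user's orders share one shard
print({oid: shard_for(oid, 5, routing="user123") for oid in ["1001", "1002", "1003"]})
# Without routing, the ID decides, so orders typically spread across shards
print({oid: shard_for(oid, 5) for oid in ["1001", "1002", "1003"]})
```

This is also why a GET without the routing value can miss the document: it hashes the _id and looks on a different shard.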
2. With Refresh
Make the document visible to search as soon as the request returns:
PUT /realtime/_doc/1?refresh=true
{
"message": "This will be immediately searchable"
}
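Refresh options:
- true: Refresh the affected shards immediately (costly; avoid for high-volume indexing)
- wait_for: Wait until the next periodic refresh makes the document visible before responding
- false: Don't wait for a refresh (default); the document becomes searchable after the next periodic refresh (index.refresh_interval, 1s by default)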
3. With Pipeline
Run the document through an existing ingest pipeline before it is indexed:
PUT /logs/_doc/1?pipeline=add-timestamp
{
"message": "Log entry",
"level": "INFO"
}
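The add-timestamp pipeline has to exist before documents can reference it. A minimal sketch of one possible definition, built with Python's standard library; the body uses the set processor and the _ingest.timestamp metadata field, but the target field name ingested_at and the helper build_pipeline_request are assumptions:

```python
import json
import urllib.parse
import urllib.request

# Hypothetical definition of the "add-timestamp" pipeline referenced above.
# The target field name "ingested_at" is an assumption; pick your own.
ADD_TIMESTAMP = {
    "description": "Record when each document was ingested",
    "processors": [
        # The set processor copies the ingest-time timestamp into the field
        {"set": {"field": "ingested_at", "value": "{{{_ingest.timestamp}}}"}}
    ],
}


def build_pipeline_request(base_url, pipeline_id, definition):
    """Build (but do not send) PUT /_ingest/pipeline/<pipeline_id>."""
    url = f"{base_url.rstrip('/')}/_ingest/pipeline/{urllib.parse.quote(pipeline_id, safe='')}"
    return urllib.request.Request(
        url,
        data=json.dumps(definition).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


req = build_pipeline_request("http://localhost:9200", "add-timestamp", ADD_TIMESTAMP)
print(req.get_method(), req.full_url)
```

Sending it with urllib.request.urlopen(req) against a running cluster registers the pipeline; documents indexed with ?pipeline=add-timestamp then get an ingested_at field.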
4. With Timeout
Set how long the request waits for prerequisites such as an available primary shard (default 1m):
PUT /products/_doc/1?timeout=5m
{
"name": "Special Product",
"processing_required": true
}
Working with Different Data Types
Nested Objects
POST /employees/_doc
{
"name": "Alice Johnson",
"department": "Engineering",
"contact": {
"email": "alice@example.com",
"phone": "+1-555-0123"
},
"projects": [
{
"name": "Project A",
"status": "active"
},
{
"name": "Project B",
"status": "completed"
}
]
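}
By default, contact and projects are mapped as object fields, and arrays of objects are flattened: Elasticsearch keeps the list of project names and the list of statuses, but not which status belongs to which name. Map projects with the nested type if you need to query each project's fields together.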
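A minimal sketch of how an array of objects is flattened when it is mapped as object rather than nested. flatten_object_fields is a hypothetical helper that only models which values end up under each field path; it is not an Elasticsearch API:

```python
def flatten_object_fields(source):
    """Flatten a document into dotted field paths -> list of leaf values.

    Models fields mapped as object (not nested): values from every element
    of an array of objects are merged under the same path.
    """
    fields = {}

    def walk(value, path):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            for item in value:
                walk(item, path)  # array elements share the parent's path
        elif value is not None:  # null values are not indexed, so skip them
            fields.setdefault(path, []).append(value)

    walk(source, "")
    return fields


employee = {
    "name": "Alice Johnson",
    "contact": {"email": "alice@example.com", "phone": "+1-555-0123"},
    "projects": [
        {"name": "Project A", "status": "active"},
        {"name": "Project B", "status": "completed"},
    ],
}
for path, values in flatten_object_fields(employee).items():
    print(path, values)
```

Because projects.name holds both names and projects.status holds both statuses, a query requiring name "Project A" and status "completed" matches this document even though no single project has both.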
Arrays
POST /articles/_doc
{
"title": "Elasticsearch Tutorial",
"tags": ["elasticsearch", "search", "database"],
"authors": [
"John Smith",
"Jane Doe"
],
"ratings": [4.5, 4.8, 4.2]
}
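Elasticsearch has no separate array type: any field can hold several values, but they must all be of the same type. A client-side check of that rule might look like this; array_kind is a hypothetical helper, and it is stricter than Elasticsearch, which can coerce some mixed values (for example the string "5" into a numeric field):

```python
def _json_kind(value):
    """Classify a non-null JSON scalar or object."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")


def array_kind(values):
    """Return the single kind shared by an array's values, or None if empty.

    Nested arrays are flattened and nulls skipped, as Elasticsearch does.
    """
    kinds = set()
    pending = list(values)
    while pending:
        value = pending.pop()
        if value is None:
            continue
        if isinstance(value, list):
            pending.extend(value)
            continue
        kinds.add(_json_kind(value))
    if len(kinds) > 1:
        raise ValueError(f"array mixes {sorted(kinds)}")
    return kinds.pop() if kinds else None


print(array_kind(["elasticsearch", "search", "database"]))  # string
print(array_kind([4.5, 4.8, 4.2]))                           # number
```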
Date Formats
POST /events/_doc
{
"event_name": "Conference",
"start_date": "2024-03-15",
"timestamp": "2024-03-15T09:00:00Z",
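"epoch_millis": 1710493200000
}
With dynamic mapping, start_date and timestamp are detected as date fields, but epoch_millis is mapped as long. To store it as a date, map the field explicitly as date; the default date format (strict_date_optional_time||epoch_millis) accepts epoch milliseconds.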
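To produce consistent epoch-millisecond values from ISO 8601 strings (or to check existing ones), a small conversion helps. A sketch using only the standard library; to_epoch_millis and from_epoch_millis are hypothetical helpers, and treating strings without an offset as UTC mirrors how Elasticsearch interprets them:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(text):
    """Convert an ISO 8601 date or date-time string to epoch milliseconds."""
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    dt = datetime.fromisoformat(text)
    if dt.tzinfo is None:
        # Elasticsearch treats date strings without an offset as UTC
        dt = dt.replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids float rounding
    return (dt - EPOCH) // timedelta(milliseconds=1)


def from_epoch_millis(millis):
    """Convert epoch milliseconds back to an ISO 8601 UTC string."""
    return (EPOCH + timedelta(milliseconds=millis)).isoformat()


print(to_epoch_millis("2024-03-15T09:00:00Z"))  # 1710493200000
print(to_epoch_millis("2024-03-15"))            # 1710460800000 (midnight UTC)
print(from_epoch_millis(1710493200000))         # 2024-03-15T09:00:00+00:00
```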
Index Templates and Dynamic Mapping
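Create with Dynamic Fields
When a document contains fields that are not in the mapping, Elasticsearch adds them automatically, inferring each field's type from its JSON value: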
POST /dynamic_index/_doc
{
"static_field": "known value",
"dynamic_string": "Elasticsearch will detect this as text",
"dynamic_number": 42,
"dynamic_boolean": true,
"dynamic_date": "2024-01-15"
}
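The rules that default dynamic mapping applies can be approximated in a few lines, which helps predict the resulting mapping before indexing. A simplified sketch; dynamic_type is hypothetical, its date pattern only approximates the default dynamic_date_formats (it ignores the yyyy/MM/dd variants), and it assumes date_detection is on and numeric_detection is off, which are the defaults:

```python
import re

# Simplified stand-in for the default dynamic_date_formats: only matches
# yyyy-MM-dd with an optional time part (THH:mm[:ss[.fraction]] and Z/offset).
_DATE_LIKE = re.compile(
    r"^\d{4}-\d{2}-\d{2}"
    r"(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:?\d{2})?)?$"
)


def dynamic_type(value):
    """Approximate the field type default dynamic mapping picks for a JSON value."""
    if value is None:
        return None  # null adds no field mapping
    if isinstance(value, bool):  # before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # The first non-null element decides the array's type
        for item in value:
            item_type = dynamic_type(item)
            if item_type is not None:
                return item_type
        return None
    if isinstance(value, str):
        # numeric_detection is off by default, so "42" stays a string
        if _DATE_LIKE.match(value):
            return "date"
        return "text"  # mapped with a .keyword sub-field
    raise TypeError(f"not a JSON value: {value!r}")


doc = {
    "static_field": "known value",
    "dynamic_string": "Elasticsearch will detect this as text",
    "dynamic_number": 42,
    "dynamic_boolean": True,
    "dynamic_date": "2024-01-15",
}
print({field: dynamic_type(v) for field, v in doc.items()})
```

To see what Elasticsearch actually chose, compare against GET /dynamic_index/_mapping.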
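Explicit Mapping Before Insert
Define field types yourself when inferred types are not what you want. Create the index with its mapping before indexing any documents; this request fails with resource_already_exists_exception if the index already exists (for example, created by the earlier examples):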
PUT /products
{
"mappings": {
"properties": {
"name": { "type": "text" },
"price": { "type": "float" },
"in_stock": { "type": "boolean" },
"created_at": { "type": "date" }
}
}
}
Best Practices
1. Use Bulk API for Multiple Documents
Instead of:
PUT /index/_doc/1 {...}
PUT /index/_doc/2 {...}
PUT /index/_doc/3 {...}
Use:
POST /_bulk
{"index": {"_index": "index", "_id": "1"}}
{...}
{"index": {"_index": "index", "_id": "2"}}
{...}
2. Choose Appropriate ID Strategy
- User-provided IDs: When you need predictable, meaningful IDs
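- Auto-generated IDs: For logs, events, or whenever the ID doesn't matter; indexing is also faster because Elasticsearch can skip checking whether a document with that ID already exists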
3. Consider Document Size
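- A single request, and therefore a single document, cannot exceed http.max_content_length (100MB by default)
- As a rough guideline, aim for documents of about 1KB to 100KB
- Split very large documents into smaller ones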
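4. Handle Versioning
To keep versions from an external system (such as a primary database), pass version_type=external with your own version number. The write succeeds only if that number is greater than the stored document's version (or the document doesn't exist yet); otherwise it fails with a version conflict. The "version" field in the body below is ordinary document data; Elasticsearch checks only the version query parameter.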
PUT /products/_doc/1?version=2&version_type=external
{
"name": "Updated Product",
"version": 2
}
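The acceptance rule for external versions is simple enough to state as code. A model of the documented check, not Elasticsearch code; external_write_allowed is a hypothetical name:

```python
def external_write_allowed(stored_version, new_version, version_type="external"):
    """Model the version check for version_type=external / external_gte.

    stored_version is None when no document with that ID exists yet.
    """
    if stored_version is None:
        return True  # a missing document is simply created
    if version_type in ("external", "external_gt"):
        return new_version > stored_version  # strictly greater
    if version_type == "external_gte":
        return new_version >= stored_version
    raise ValueError(f"not an external version type: {version_type!r}")


print(external_write_allowed(1, 2))                  # True
print(external_write_allowed(2, 2))                  # False -> version conflict
print(external_write_allowed(2, 2, "external_gte"))  # True
```

A rejected write comes back as a 409 version_conflict_engine_exception, the same error type shown under Document Already Exists.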
Common Errors and Solutions
1. Index Not Found
{
"error": {
"type": "index_not_found_exception",
"reason": "no such index [products]"
}
}
Solution: Allow automatic index creation (the action.auto_create_index cluster setting, enabled by default), or create the index explicitly:
PUT /products
2. Document Already Exists
When using _create:
{
"error": {
"type": "version_conflict_engine_exception",
"reason": "[1]: version conflict, document already exists"
}
}
Solution: Send a plain PUT /<index>/_doc/<id> request (without op_type=create or _create) to overwrite the document, or modify it with the _update API.
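3. Mapping Conflict
Occurs when a value cannot be parsed as the field's mapped type, for example a non-numeric string such as "free" sent to a numeric price field: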
{
"error": {
"type": "mapper_parsing_exception",
"reason": "failed to parse field [price] of type [long]"
}
}
Solution: Ensure data types match the mapping.
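Checking documents against the mapping before sending them turns many mapping conflicts into clear client-side errors. A sketch using the explicit products mapping from earlier; mapping_errors is a hypothetical helper that only checks top-level fields and a few common types, and it is stricter than Elasticsearch, which by default coerces numeric strings such as "9.99" into numeric fields:

```python
# The explicit mapping from "Explicit Mapping Before Insert", as a Python dict
PRODUCTS_MAPPING = {
    "mappings": {
        "properties": {
            "name": {"type": "text"},
            "price": {"type": "float"},
            "in_stock": {"type": "boolean"},
            "created_at": {"type": "date"},
        }
    }
}

NUMERIC_TYPES = {"long", "integer", "short", "byte", "double", "float"}


def _fits(value, field_type):
    """Strict JSON-type check for a handful of common field types."""
    if field_type in NUMERIC_TYPES:
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if field_type == "boolean":
        return isinstance(value, bool)
    if field_type in ("text", "keyword"):
        return isinstance(value, str)
    if field_type == "date":
        # a string to be parsed with the field's format, or epoch milliseconds
        return isinstance(value, (str, int)) and not isinstance(value, bool)
    return True  # types this sketch does not model are not checked


def mapping_errors(document, mapping):
    """List top-level fields whose values don't fit the mapped type."""
    properties = mapping["mappings"]["properties"]
    errors = []
    for field, value in document.items():
        field_type = properties.get(field, {}).get("type")
        if field_type is None or value is None:
            continue  # unmapped fields go through dynamic mapping; nulls aren't indexed
        values = value if isinstance(value, list) else [value]
        for v in values:
            if v is not None and not _fits(v, field_type):
                errors.append(f"{field}: {v!r} does not fit type {field_type}")
    return errors


print(mapping_errors({"name": "Laptop Pro", "price": "free"}, PRODUCTS_MAPPING))
```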
Performance Tips
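- Bulk Size: Start with bulk requests of roughly 5-15 MB and benchmark to find the best size for your data
- Refresh Interval: Increase index.refresh_interval (or set it to -1) during heavy indexing, then restore it
- Replicas: Set number_of_replicas to 0 during the initial bulk load, then restore it
- Sharding: Plan the primary shard count from expected data volume; changing it later requires reindexing or the split/shrink APIs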
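Keeping each bulk request within a size budget can be done by accumulating action/source pairs until the next pair would exceed it. A sketch; bulk_chunks is a hypothetical helper that uses index actions with auto-generated IDs, and the 10 MiB default is simply a value inside the 5-15 MB range above, not a tuned recommendation:

```python
import json


def bulk_chunks(index, docs, max_bytes=10 * 1024 * 1024):
    """Yield NDJSON bulk bodies (bytes) of at most max_bytes each.

    Each document becomes an index action without _id, so IDs are
    auto-generated. A single pair larger than max_bytes is yielded alone.
    """
    action = (json.dumps({"index": {"_index": index}}) + "\n").encode("utf-8")
    chunk, size = [], 0
    for source in docs:
        pair = action + (json.dumps(source) + "\n").encode("utf-8")
        # Flush the current chunk if adding this pair would exceed the budget
        if chunk and size + len(pair) > max_bytes:
            yield b"".join(chunk)
            chunk, size = [], 0
        chunk.append(pair)
        size += len(pair)
    if chunk:
        yield b"".join(chunk)


docs = ({"name": f"Product {i}", "price": i * 1.5} for i in range(10_000))
chunks = list(bulk_chunks("products", docs, max_bytes=100_000))
print(len(chunks), "requests, largest", max(len(c) for c in chunks), "bytes")
```

Because docs can be a generator, this streams through large inputs without holding every document in memory at once; each chunk is sent as its own POST /_bulk body.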
Next Steps
After mastering data insertion:
- Learn about reading and searching data
- Understand update operations
- Explore bulk processing patterns
- Study index optimization techniques