Key Concepts in Elasticsearch
Understanding these fundamental concepts is essential for working effectively with Elasticsearch. Let's explore each concept with examples to make them clear for beginners.
Cluster
A cluster is a collection of one or more nodes (servers) that work together to store your data and provide search capabilities across all nodes.
Key Points:
- Each cluster has a unique name (default: "elasticsearch")
- Nodes join a cluster by using the cluster name
- A cluster can have just one node (single-node cluster)
- Provides automatic load balancing and failover
Example:
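A cluster reports its name, node count, and overall state through the cluster health API (`GET /_cluster/health`). The Python sketch below summarizes an already-parsed response; the sample values are illustrative and were not captured from a real cluster.

```python
# Meaning of the three cluster health colors.
STATUS_MEANING = {
    "green": "all primary and replica shards are allocated",
    "yellow": "all primaries are allocated, but some replicas are not",
    "red": "at least one primary shard is not allocated",
}


def summarize_health(health: dict) -> str:
    """Turn a parsed `GET /_cluster/health` response into one line of text."""
    status = health["status"]
    return (
        f"cluster '{health['cluster_name']}' has {health['number_of_nodes']} node(s), "
        f"status {status}: {STATUS_MEANING[status]}"
    )


# Illustrative response for a single-node cluster with the default name.
sample_health = {
    "cluster_name": "elasticsearch",
    "status": "yellow",
    "number_of_nodes": 1,
    "unassigned_shards": 1,
}

print(summarize_health(sample_health))
```

A single-node cluster is typically yellow once an index has replicas configured, because there is no second node to hold them.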
Node
A node is a single server that is part of your cluster, stores data, and participates in the cluster's indexing and search capabilities.
Types of Nodes:
- Master Node: Manages cluster-wide operations
- Data Node: Stores data and executes data-related operations
- Ingest Node: Preprocesses documents before indexing
- Coordinating Node: Routes requests and aggregates results
Example:
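In recent Elasticsearch versions, a node's type is set with the `node.roles` setting in `elasticsearch.yml`. The Python sketch below maps each node type listed above to the value a dedicated node of that type would use; the helper name `roles_line` is only for illustration.

```python
# Value of the `node.roles` setting for each node type described above.
NODE_ROLES = {
    "master": ["master"],
    "data": ["data"],
    "ingest": ["ingest"],
    "coordinating": [],  # an empty list makes a coordinating-only node
}


def roles_line(node_type: str) -> str:
    """Render the elasticsearch.yml line for a dedicated node of this type."""
    roles = ", ".join(NODE_ROLES[node_type])
    return f"node.roles: [ {roles} ]" if roles else "node.roles: [ ]"


for node_type in NODE_ROLES:
    print(f"{node_type:<13}{roles_line(node_type)}")
```

A node can hold several roles at once; small clusters usually leave the setting at its default so every node does everything.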
Index
An index is a collection of documents that have similar characteristics. It's similar to a database in the relational world.
Key Points:
- Index names must be lowercase
- Older versions allowed multiple document types per index; mapping types were deprecated in 7.x and removed in 8.0
- Each index has its own settings and mappings
Example:
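The sketch below shows a partial check of the index naming rules and a request body for creating an index with its own settings. The index name `products` is hypothetical, and the validator covers only the most common rules, not every restriction Elasticsearch enforces.

```python
import json

# Characters that Elasticsearch rejects in index names.
FORBIDDEN_CHARS = set('\\/*?"<>| ,#')


def is_valid_index_name(name: str) -> bool:
    """Partial check of the index naming rules (not exhaustive)."""
    if not name or name in (".", ".."):
        return False
    if name != name.lower():       # must be lowercase
        return False
    if name[0] in "-_+":           # must not start with -, _ or +
        return False
    return not (set(name) & FORBIDDEN_CHARS)


# Body for `PUT /products`: settings that belong to this index only.
create_index_body = {
    "settings": {"number_of_shards": 1, "number_of_replicas": 1},
}

print(is_valid_index_name("products"))  # True
print(is_valid_index_name("Products"))  # False
print(json.dumps(create_index_body, indent=2))
```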
Document
A document is a basic unit of information that can be indexed. It's expressed in JSON format and stored within an index.
Key Points:
- Similar to a row in a relational database
- Each document has an ID that is unique within its index
- Stored documents are immutable: an update writes a new version of the whole document
- Can contain nested objects and arrays
Example:
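Below is a hypothetical product document written as a Python dictionary and serialized to JSON, the form in which it would be sent to Elasticsearch. The second half models an update as a new copy with a higher version number; it is a toy illustration of immutability, not how Elasticsearch stores versions internally.

```python
import json

# A hypothetical product document with an inner object and an array.
document = {
    "name": "Wireless Mouse",
    "price": 24.99,
    "tags": ["electronics", "accessories"],              # array field
    "manufacturer": {"name": "Acme", "country": "DE"},   # inner object
}

# Documents are sent as JSON, e.g. `PUT /products/_doc/1`,
# where `1` is the document ID.
body = json.dumps(document)
print(body)

# An update is modeled as a new copy with a higher version number,
# leaving the previously stored copy untouched.
stored = {"_id": "1", "_version": 1, "_source": document}
updated = {
    **stored,
    "_version": stored["_version"] + 1,
    "_source": {**document, "price": 19.99},
}
print(updated["_version"], stored["_source"]["price"], updated["_source"]["price"])
```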
Shard
A shard is a single Lucene instance and a fundamental unit of storage in Elasticsearch. Each index is divided into shards for scalability.
Types of Shards:
- Primary Shards: Original shards that hold the data
- Replica Shards: Copies of primary shards for redundancy
Key Points:
- Number of primary shards is fixed at index creation
- Each shard is a fully functional Lucene index in its own right
- Shards allow horizontal scaling
- Default: 1 primary shard per index (5 before version 7.0)
Example:
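A document is assigned to a primary shard by hashing its routing value (the document ID by default) and taking the remainder against the number of primary shards. The sketch below uses a simplified form of that formula. Elasticsearch hashes with Murmur3; `zlib.crc32` is an assumption used here only so the example runs with the standard library.

```python
import zlib


def pick_shard(routing: str, num_primary_shards: int) -> int:
    """Simplified routing: shard = hash(routing) % num_primary_shards.

    CRC32 is a stand-in for the Murmur3 hash that Elasticsearch uses.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


doc_ids = ["1", "2", "3", "4", "5", "6"]

# The same ID always maps to the same shard for a given shard count.
placement_3 = {doc_id: pick_shard(doc_id, 3) for doc_id in doc_ids}

# With a different shard count the remainders change, so existing documents
# would no longer be found where they were written.
placement_4 = {doc_id: pick_shard(doc_id, 4) for doc_id in doc_ids}

print(placement_3)
print(placement_4)
```

This dependence on the shard count is the reason the number of primary shards cannot simply be changed after the index is created.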
Replicas
Replicas are copies of primary shards that provide redundancy and improve search performance.
Benefits:
- High Availability: If a node fails, a replica on another node is promoted to primary, so the data stays available
- Increased Performance: Search queries can be executed on replicas
- Load Balancing: Distributes query load across multiple copies
Example:
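The arithmetic behind replicas can be sketched as follows. The function names are hypothetical; the settings key `number_of_replicas` is the real index setting, and the index name in the comment is illustrative.

```python
def shard_copies(primaries: int, replicas: int) -> int:
    """Total shards in the index: each primary plus its replicas."""
    return primaries * (1 + replicas)


def min_nodes_for_full_allocation(replicas: int) -> int:
    """A replica is never placed on the same node as its primary,
    so every copy of a shard needs its own node."""
    return replicas + 1


# Body for `PUT /products/_settings`. Unlike the primary shard count,
# the replica count can be changed on a live index.
replica_settings = {"index": {"number_of_replicas": 2}}

print(shard_copies(primaries=3, replicas=2))      # 9
print(min_nodes_for_full_allocation(replicas=2))  # 3
```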
Near Real-Time (NRT)
Elasticsearch is a near real-time search platform, meaning there's a slight delay between indexing a document and when it becomes searchable.
Key Points:
- Default refresh interval: 1 second
- Documents are searchable within ~1 second of indexing
- Can be configured per index
- Trade-off: a shorter interval makes documents searchable sooner but reduces indexing throughput
Example:
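The delay can be pictured with a toy model in which newly indexed documents wait in a buffer until a refresh makes them searchable. `ToyIndex` is a hypothetical class written for this illustration and greatly simplifies what Elasticsearch does internally; `index.refresh_interval` is the real per-index setting.

```python
class ToyIndex:
    """Toy model of near real-time search: new documents sit in a buffer
    and only become searchable after a refresh."""

    def __init__(self):
        self.buffer = []      # indexed but not yet searchable
        self.searchable = []  # visible to search

    def index(self, doc: str) -> None:
        self.buffer.append(doc)

    def refresh(self) -> None:
        # Elasticsearch does this automatically every `index.refresh_interval`.
        self.searchable.extend(self.buffer)
        self.buffer.clear()

    def search(self, word: str) -> list:
        return [doc for doc in self.searchable if word in doc]


idx = ToyIndex()
idx.index("quick brown fox")
print(idx.search("fox"))  # [] because no refresh has happened yet
idx.refresh()
print(idx.search("fox"))  # ['quick brown fox']

# Body for an index `_settings` update that refreshes every 30 seconds.
refresh_settings = {"index": {"refresh_interval": "30s"}}
```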
Additional Important Concepts
Mapping
A mapping defines how documents and their fields are stored and indexed, including the data type of each field.
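As a rough illustration, a mapping body might look like the dictionary below. The field names and the `products` index are hypothetical; `text`, `keyword`, `float`, and `date` are standard field types.

```python
import json

# Body for `PUT /products/_mapping` (index and field names are hypothetical).
mapping = {
    "properties": {
        "name": {"type": "text"},        # analyzed for full-text search
        "sku": {"type": "keyword"},      # exact-match value, not analyzed
        "price": {"type": "float"},
        "created_at": {"type": "date"},
    }
}

print(json.dumps(mapping, indent=2))
```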
Inverted Index
The core data structure that makes searching fast:
- Maps terms to the documents containing them
- Similar to an index in a book
- Enables full-text search capabilities
Example:
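A minimal inverted index can be built in a few lines of Python. This sketch splits on whitespace and lowercases terms, which is far simpler than the analysis Elasticsearch applies, but the resulting term-to-documents structure is the same idea.

```python
from collections import defaultdict


def build_inverted_index(docs: dict) -> dict:
    """Map each lowercased term to the sorted IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


docs = {
    1: "Elasticsearch is fast",
    2: "Search is fun",
    3: "Elasticsearch search",
}

inverted = build_inverted_index(docs)
print(inverted["elasticsearch"])  # [1, 3]
print(inverted["search"])         # [2, 3]
print(inverted["is"])             # [1, 2]
```

Looking up a term is a single dictionary access, which is why a search does not need to scan every document.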
How These Concepts Work Together
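- Data Storage Flow: A document is written to an index, routed to one of its primary shards, and stored on the data node that holds that shard
- Search Flow: A coordinating node routes the request to the relevant shards and aggregates their results
- Redundancy: Each primary shard has replica copies on other nodes, so the data remains available if a node fails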
Best Practices
Cluster Planning:
- Use an odd number of master-eligible nodes (3, 5, 7)
- Separate master and data nodes for large clusters
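The preference for odd numbers follows from majority arithmetic. The sketch below is plain quorum math under the assumption that a master election needs a majority of master-eligible nodes; it does not reproduce Elasticsearch's actual voting-configuration logic.

```python
def quorum(master_eligible: int) -> int:
    """Smallest majority of master-eligible nodes."""
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """Nodes that can fail while a majority is still reachable."""
    return master_eligible - quorum(master_eligible)


# Columns: node count, quorum size, failures tolerated.
for n in (3, 4, 5, 6, 7):
    print(n, quorum(n), tolerated_failures(n))
```

Four nodes tolerate one failure, the same as three, so the extra even-numbered node adds cost without adding resilience.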
Index Design:
- Keep indices focused on specific data types
- Use meaningful, lowercase names
- Plan shard count based on data volume
Shard Sizing:
- Aim for 20-40GB per shard
- Avoid too many small shards
- Consider future growth
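A rough way to turn these guidelines into a shard count is to divide the expected data volume by a target shard size. The function name and the 30 GB default below are assumptions chosen from the middle of the range above, not an official formula.

```python
import math


def estimate_primary_shards(total_gb: float, target_shard_gb: float = 30.0) -> int:
    """Primary shards needed to keep each shard near the target size."""
    return max(1, math.ceil(total_gb / target_shard_gb))


print(estimate_primary_shards(10))         # 1
print(estimate_primary_shards(600))        # 20
# Planning for growth: size for the expected volume, not today's.
print(estimate_primary_shards(600 * 1.5))  # 30
```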
Replica Strategy:
- At least 1 replica for production
- More replicas for read-heavy workloads
- Ensure enough nodes to distribute replicas
Common Beginner Mistakes
- Too Many Shards: Creating hundreds of small shards impacts performance
- No Replicas: Running without replicas risks data loss
- Wrong Refresh Settings: Setting a very short refresh interval to chase real-time results, at the cost of indexing performance
- Ignoring Cluster Health: Not monitoring yellow or red cluster states
Next Steps
Now that you understand the key concepts, let's proceed to install Elasticsearch and start working with these concepts hands-on.