Cassandra Architecture
Understanding Cassandra's architecture is crucial for designing efficient applications and troubleshooting issues. This tutorial covers the key architectural components and concepts.
Overview
Cassandra is a distributed database designed to handle large amounts of data across many servers with no single point of failure. Its architecture is based on two key technologies:
- Amazon Dynamo: distributed systems techniques (consistent hashing, replication, gossip, hinted handoff)
- Google Bigtable: data model and log-structured storage engine (commit log, memtables, SSTables)
Key Architectural Principles
Peer-to-Peer Architecture
Unlike master-slave architectures, Cassandra uses a peer-to-peer distributed system where:
- All nodes are equal (no master/slave)
- Any node can handle any request
- Nodes communicate via gossip protocol
- No single point of failure
Data Distribution
Consistent Hashing
Cassandra uses consistent hashing to distribute data:
-- The partition key determines which node stores the data
CREATE TABLE users (
user_id UUID PRIMARY KEY, -- Partition key
username TEXT,
email TEXT
);
-- Data is distributed based on hash of partition key
-- hash(user_id) -> token -> node
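A toy sketch of that lookup in Python (hashlib stands in for Cassandra's Murmur3 hash; the ring, tokens, and node names are invented for illustration):
import bisect
import hashlib
import uuid

# Toy stand-in for Murmur3: map a partition key to a signed 64-bit token.
def token_for(partition_key: bytes) -> int:
    return int.from_bytes(hashlib.md5(partition_key).digest()[:8], "big", signed=True)

# Toy 4-node ring: each node owns the range ending at its token.
ring = sorted([
    (-4611686018427387904, "node-a"),
    (0, "node-b"),
    (4611686018427387904, "node-c"),
    (9223372036854775807, "node-d"),
])

def owner(partition_key: bytes) -> str:
    tokens = [tok for tok, _ in ring]
    i = bisect.bisect_left(tokens, token_for(partition_key))  # first token >= hash
    return ring[i % len(ring)][1]                             # wrap around the ring

print(owner(uuid.uuid4().bytes))  # e.g. 'node-c'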
Token Ring
- Each node is assigned token ranges
- Data is placed on nodes based on partition key hash
- Virtual nodes (vnodes) give each node many small token ranges, which evens out distribution and rebalancing
Replication
Replication Strategy
-- SimpleStrategy for a single datacenter (development/testing; NetworkTopologyStrategy is preferred in production)
CREATE KEYSPACE single_dc
WITH REPLICATION = {
'class': 'SimpleStrategy',
'replication_factor': 3
};
-- NetworkTopologyStrategy for multiple datacenters
CREATE KEYSPACE multi_dc
WITH REPLICATION = {
'class': 'NetworkTopologyStrategy',
'dc1': 3,
'dc2': 2
};
Replication Process
- Coordinator node receives write request
- Calculates which nodes should store replicas (see the placement sketch after this list)
- Sends write to all replica nodes in parallel
- Waits for acknowledgments based on consistency level
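Step 2 can be pictured with a simplified, SimpleStrategy-style sketch in Python: start at the node that owns the token and walk the ring clockwise until RF distinct nodes are collected (illustrative only; real placement also considers racks and datacenters):
import bisect

# Toy 4-node ring of (token, node) pairs.
ring = sorted([(-4611686018427387904, "node-a"), (0, "node-b"),
               (4611686018427387904, "node-c"), (9223372036854775807, "node-d")])

def replicas_for(token: int, rf: int):
    # Walk clockwise from the token's owner, collecting rf distinct nodes.
    tokens = [tok for tok, _ in ring]
    i = bisect.bisect_left(tokens, token)
    picked = []
    while len(picked) < rf:
        node = ring[i % len(ring)][1]
        if node not in picked:
            picked.append(node)
        i += 1
    return picked

print(replicas_for(42, rf=3))  # ['node-c', 'node-d', 'node-a']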
Core Components
Node
A single Cassandra instance running on a server:
Node Components:
├── Commit Log (durability)
├── Memtables (in-memory data)
├── SSTables (on-disk data)
├── Cache (row and key cache)
├── Compaction (SSTable maintenance)
└── Gossip (cluster communication)
Cluster
Collection of nodes that contain your data:
Cluster
├── Datacenter 1
│ ├── Rack 1
│ │ ├── Node 1
│ │ └── Node 2
│ └── Rack 2
│ ├── Node 3
│ └── Node 4
└── Datacenter 2
└── ...
Datacenter
Logical grouping of nodes, typically by:
- Geographic location
- Cloud availability zone
- Workload separation
Rack
Logical grouping within a datacenter:
- Represents a failure domain
- NetworkTopologyStrategy spreads replicas across racks where possible
- Can map to physical racks or cloud AZs
Write Path
Write Process
- Commit Log: Write is first recorded for durability
- Memtable: Data written to in-memory structure
- Acknowledgment: Based on consistency level
- Flush: Memtable flushed to SSTable when full
Client Write Request
↓
Coordinator Node
↓
Commit Log (durability)
↓
Memtable (memory)
↓
[Eventually]
↓
SSTable (disk)
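The same flow in a deliberately simplified Python model (not real Cassandra code): append to the commit log, apply to the memtable, and flush to an immutable, sorted SSTable once a threshold is reached:
# Toy model of the write path: commit log -> memtable -> SSTable flush.
commit_log = []      # durable append-only log (here just a list)
memtable = {}        # in-memory map: partition key -> row
sstables = []        # immutable, sorted "files" (here lists of (key, row) tuples)
FLUSH_THRESHOLD = 3  # flush after this many rows (real flushes are size-based)

def write(key, row):
    commit_log.append((key, row))  # 1. record for durability first
    memtable[key] = row            # 2. apply to the in-memory memtable
    if len(memtable) >= FLUSH_THRESHOLD:
        flush()

def flush():
    # 3. write an immutable SSTable sorted by key, then start a fresh memtable
    sstables.append(sorted(memtable.items()))
    memtable.clear()

for i in range(5):
    write(f"user-{i}", {"username": f"u{i}"})
print(len(sstables), "sstable(s) on disk,", len(memtable), "rows still in the memtable")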
Consistency Levels for Writes
-- Write with QUORUM consistency
-- (the consistency level is set by the client/driver, not in the CQL statement; in cqlsh:)
CONSISTENCY QUORUM;
INSERT INTO users (user_id, username)
VALUES (uuid(), 'john_doe');
Common write consistency levels (see the acknowledgment sketch after this list):
- ANY: At least one node (including hinted handoff)
- ONE: At least one replica node
- QUORUM: Majority of replicas (floor(RF/2) + 1)
- ALL: All replica nodes
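How many replica acknowledgments the coordinator waits for follows directly from the replication factor; a small illustrative helper:
# Replica acknowledgments the coordinator waits for, per write consistency level.
def required_acks(level: str, rf: int) -> int:
    if level in ("ANY", "ONE"):
        return 1               # ANY may be satisfied by a hint on the coordinator
    if level == "QUORUM":
        return rf // 2 + 1     # majority of replicas
    if level == "ALL":
        return rf
    raise ValueError(f"unknown level: {level}")

for level in ("ANY", "ONE", "QUORUM", "ALL"):
    print(level, required_acks(level, rf=3))  # ANY 1, ONE 1, QUORUM 2, ALL 3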
Read Path
Read Process
- Coordinator: Receives read request
- Replica Selection: Chooses the fastest replicas (guided by the dynamic snitch)
- Data Retrieval:
- Check row cache
- Check memtables
- Check SSTables (with bloom filters)
- Read Repair: Repairs inconsistencies detected between replicas
- Return Result: To client
Client Read Request
↓
Coordinator Node
↓
[Check Multiple Sources]
├── Row Cache (if enabled)
├── Memtables
└── SSTables
    ├── Bloom Filter (does this SSTable possibly contain the partition?)
    ├── Key Cache (SSTable offset)
    ├── Partition Summary
    ├── Partition Index
    └── Data File
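Bloom filters are what let a read skip SSTables that definitely do not contain the requested partition. A toy Python version (real filters are sized from the expected key count and a target false-positive rate):
import hashlib

# Toy Bloom filter: can say "maybe present", never misses a key that was added.
class BloomFilter:
    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, bytearray(size)

    def _positions(self, key: str):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(key))

bf = BloomFilter()
bf.add("user-123")
print(bf.might_contain("user-123"))  # True
print(bf.might_contain("user-999"))  # almost certainly False -> skip this SSTable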
Read Consistency Levels
-- Read with LOCAL_QUORUM consistency (set by the client; in cqlsh:)
CONSISTENCY LOCAL_QUORUM;
SELECT * FROM users
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
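In application code the consistency level is normally set per statement through the driver. A sketch using the DataStax Python driver (the contact point, keyspace name, and UUID value are assumptions for illustration):
import uuid
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("my_keyspace")  # hypothetical keyspace holding the users table
user_id = uuid.UUID("123e4567-e89b-12d3-a456-426614174000")

# Read at LOCAL_QUORUM
read = SimpleStatement(
    "SELECT * FROM users WHERE user_id = %s",
    consistency_level=ConsistencyLevel.LOCAL_QUORUM,
)
rows = session.execute(read, [user_id])

# Write at QUORUM
write = SimpleStatement(
    "INSERT INTO users (user_id, username) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.QUORUM,
)
session.execute(write, [user_id, "john_doe"])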
Storage Engine
SSTable (Sorted String Table)
Immutable on-disk files containing data:
SSTable Components:
├── Data.db (actual data)
├── Index.db (index of partitions)
├── Filter.db (bloom filter)
├── Summary.db (partition summary)
├── Statistics.db (metadata)
└── CompressionInfo.db (compression metadata)
Compaction
Process of merging SSTables:
-- Compaction strategies
ALTER TABLE users
WITH compaction = {
'class': 'SizeTieredCompactionStrategy',
'min_threshold': 4,
'max_threshold': 32
};
-- Other strategies:
-- LeveledCompactionStrategy
-- TimeWindowCompactionStrategy
-- DateTieredCompactionStrategy (deprecated)
Memtables
In-memory structures for recent writes:
- One memtable per table
- Flushed to SSTable when full
- Configurable size thresholds
Consistency and Availability
CAP Theorem
Cassandra is an AP system:
- Available: the cluster keeps serving requests even when nodes fail
- Partition Tolerant: the cluster keeps operating through network splits
- Consistency is traded off: data is eventually (and tunably) consistent rather than immediately consistent
Tunable Consistency
Balance between consistency and availability:
-- Strong consistency (R + W > RF)
-- RF=3, W=QUORUM(2), R=QUORUM(2)
-- 2 + 2 > 3 ✓
-- Eventual consistency
-- RF=3, W=ONE, R=ONE
-- 1 + 1 < 3 ✗
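That rule is easy to encode; a purely illustrative check in Python:
# Strong consistency when read and write replica sets must overlap: R + W > RF.
def is_strongly_consistent(r: int, w: int, rf: int) -> bool:
    return r + w > rf

quorum = lambda rf: rf // 2 + 1
print(is_strongly_consistent(quorum(3), quorum(3), rf=3))  # True:  2 + 2 > 3
print(is_strongly_consistent(1, 1, rf=3))                  # False: 1 + 1 <= 3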
Hinted Handoff
Temporarily stores writes destined for unavailable nodes (a toy sketch follows the list):
- Coordinator stores hints for down nodes
- Replays writes when node returns
- Configurable hint window (default 3 hours)
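A toy model of the mechanism (illustrative only): writes aimed at a down replica are parked as hints on the coordinator and replayed once the replica comes back:
# Toy hinted handoff: park writes for a down replica, replay on recovery.
hints = {}                                      # down node -> list of (key, row)
node_up = {"node-a": True, "node-b": False}
replica_data = {"node-a": {}, "node-b": {}}

def write_to_replica(node, key, row):
    if node_up[node]:
        replica_data[node][key] = row
    else:
        hints.setdefault(node, []).append((key, row))  # store a hint instead

def node_recovered(node):
    node_up[node] = True
    for key, row in hints.pop(node, []):               # replay hinted writes
        replica_data[node][key] = row

write_to_replica("node-b", "user-1", {"username": "john"})
node_recovered("node-b")
print(replica_data["node-b"])  # {'user-1': {'username': 'john'}}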
Gossip Protocol
Peer-to-peer communication protocol:
Gossip Information:
├── Node State (UP/DOWN)
├── Load Information
├── Schema Version
├── Token Ownership
└── Datacenter/Rack Info
Gossip characteristics (a toy simulation follows the list):
- Runs every second
- Exchanges state with up to 3 nodes
- Efficient epidemic protocol
- Enables automatic discovery
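A toy simulation of the epidemic idea (each informed node shares state with a few random peers per round) shows why information spreads across a cluster in only a handful of rounds; this is an illustration, not the real protocol:
import random

# Toy epidemic spread: node 0 learns something new; each round every informed
# node gossips with up to 3 random peers, like Cassandra's once-per-second rounds.
nodes = 100
informed = {0}
rounds = 0
while len(informed) < nodes:
    rounds += 1
    for node in list(informed):
        for peer in random.sample(range(nodes), k=3):
            informed.add(peer)
print(f"all {nodes} nodes informed after {rounds} rounds")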
Coordinator Node
Any node can be a coordinator:
Coordinator Responsibilities:
├── Route requests to replicas
├── Handle consistency requirements
├── Manage read repair
├── Store hinted handoffs
└── Return results to client
Partitioner
Determines data distribution:
-- Default: Murmur3Partitioner (recommended)
-- hash(partition_key) → token → node
-- Example token ranges
-- Node A: -9223372036854775808 to -4611686018427387904
-- Node B: -4611686018427387903 to 0
-- Node C: 1 to 4611686018427387903
-- Node D: 4611686018427387904 to 9223372036854775807
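The partitioner's output can be inspected directly with CQL's token() function; a short sketch using the Python driver (the contact point and keyspace name are assumptions):
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("my_keyspace")  # hypothetical keyspace holding the users table

# token() returns the Murmur3 token the partitioner computed for each partition key.
for row in session.execute("SELECT user_id, token(user_id) FROM users LIMIT 5"):
    print(row)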
Caching
Row Cache
Caches entire rows:
ALTER TABLE users WITH caching = {
'keys': 'ALL',
'rows_per_partition': '100'
};
Key Cache
Caches partition key locations in SSTables:
- Enabled by default
- Reduces disk seeks
- Configurable size
Best Practices
- Design for your queries: Model data based on access patterns
- Use appropriate consistency levels: Balance consistency vs performance
- Monitor cluster health: Track metrics and performance
- Plan capacity: Account for replication and growth
- Configure properly: Tune based on workload
- Understand failure modes: Plan for node/rack/DC failures
Common Architecture Patterns
Multi-Datacenter Deployment
CREATE KEYSPACE global_app
WITH REPLICATION = {
'class': 'NetworkTopologyStrategy',
'us_east': 3,
'us_west': 3,
'eu_west': 3
};
-- Use LOCAL_QUORUM for datacenter-aware consistency (set via cqlsh or the driver)
CONSISTENCY LOCAL_QUORUM;
SELECT * FROM users WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
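At the driver level the same idea is expressed with a datacenter-aware load-balancing policy plus LOCAL_QUORUM, so requests never block on remote datacenters; a sketch with the Python driver (the addresses and local_dc value are assumptions matching the keyspace above):
import uuid
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

# Prefer replicas in the local datacenter and only wait for a local quorum,
# keeping cross-DC latency off the request path.
profile = ExecutionProfile(
    load_balancing_policy=TokenAwarePolicy(DCAwareRoundRobinPolicy(local_dc="us_east")),
    consistency_level=ConsistencyLevel.LOCAL_QUORUM,
)
cluster = Cluster(["10.0.0.1"], execution_profiles={EXEC_PROFILE_DEFAULT: profile})
session = cluster.connect("global_app")

user_id = uuid.uuid4()  # hypothetical key
session.execute("SELECT * FROM users WHERE user_id = %s", [user_id])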
Workload Separation
- Use separate datacenters for:
- Analytics workloads
- Real-time serving
- Backup/disaster recovery
Geographic Distribution
- Place datacenters near users
- Use consistency levels that respect locality
- Consider network latency in design