Cassandra Architecture

Understanding Cassandra's architecture is crucial for designing efficient applications and troubleshooting issues. This tutorial covers the key architectural components and concepts.

Overview

Cassandra is a distributed database designed to handle large amounts of data across many servers with no single point of failure. Its architecture is based on two key technologies:

  • Amazon Dynamo: Distribution techniques (partitioning, replication, gossip)
  • Google BigTable: Data model and log-structured storage engine

Key Architectural Principles

Peer-to-Peer Architecture

Unlike master-slave architectures, Cassandra uses a peer-to-peer distributed system where:

  • All nodes are equal (no master/slave)
  • Any node can handle any request
  • Nodes communicate via gossip protocol
  • No single point of failure

Data Distribution

Consistent Hashing

Cassandra uses consistent hashing to distribute data:

-- The partition key determines which node stores the data
CREATE TABLE users (
    user_id UUID PRIMARY KEY,  -- Partition key
    username TEXT,
    email TEXT
);

-- Data is distributed based on hash of partition key
-- hash(user_id) -> token -> node

Token Ring

  • Each node is assigned token ranges
  • Data is placed on nodes based on partition key hash
  • Virtual nodes (vnodes) improve distribution
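
To make the mapping concrete, here is a minimal Python sketch of a token-ring lookup, assuming a toy hash function, a tiny token space, and made-up node names (Cassandra itself uses Murmur3 over a signed 64-bit token range):

# Toy token-ring lookup: hash(partition_key) -> token -> owning node.
# Node names and Python's built-in hash are stand-ins for illustration only.
from bisect import bisect_left

# Sorted (token, node) assignments; each node owns the range up to its token.
ring = [(-6_000_000, "node_a"), (-1_000_000, "node_b"),
        (2_000_000, "node_c"), (7_000_000, "node_d")]
tokens = [t for t, _ in ring]

def owning_node(partition_key, token_space=10_000_000):
    token = hash(partition_key) % (2 * token_space) - token_space  # toy token
    idx = bisect_left(tokens, token)
    if idx == len(tokens):      # wrap around the ring
        idx = 0
    return ring[idx][1]

print(owning_node("user-42"))   # e.g. 'node_c'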

Replication

Replication Strategy

-- SimpleStrategy for single datacenter
CREATE KEYSPACE single_dc
WITH REPLICATION = {
    'class': 'SimpleStrategy',
    'replication_factor': 3
};

-- NetworkTopologyStrategy for multiple datacenters
CREATE KEYSPACE multi_dc
WITH REPLICATION = {
    'class': 'NetworkTopologyStrategy',
    'dc1': 3,
    'dc2': 2
};

Replication Process

  1. Coordinator node receives write request
  2. Calculates which nodes should store replicas
  3. Sends write to all replica nodes in parallel
  4. Waits for acknowledgments based on consistency level
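
As a rough illustration of step 2, the sketch below picks replicas SimpleStrategy-style: the node that owns the token, plus the next distinct nodes walking clockwise around the ring (ring layout and node names are made up):

# SimpleStrategy-style replica selection (sketch): the primary replica is the
# node owning the token; additional replicas are the next distinct nodes
# encountered walking clockwise around the ring.
from bisect import bisect_left

ring = [(-6_000_000, "node_a"), (-1_000_000, "node_b"),
        (2_000_000, "node_c"), (7_000_000, "node_d")]
tokens = [t for t, _ in ring]

def replicas_for(token, rf=3):
    start = bisect_left(tokens, token) % len(ring)
    picked = []
    for i in range(len(ring)):
        node = ring[(start + i) % len(ring)][1]
        if node not in picked:
            picked.append(node)
        if len(picked) == rf:
            break
    return picked

print(replicas_for(1_500_000, rf=3))  # ['node_c', 'node_d', 'node_a']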

Core Components

Node

A single Cassandra instance running on a server:

Node Components:
├── Commit Log (durability)
├── Memtables (in-memory data)
├── SSTables (on-disk data)
├── Cache (row and key cache)
├── Compaction (SSTable maintenance)
└── Gossip (cluster communication)

Cluster

Collection of nodes that contain your data:

Cluster
├── Datacenter 1
│   ├── Rack 1
│   │   ├── Node 1
│   │   └── Node 2
│   └── Rack 2
│       ├── Node 3
│       └── Node 4
└── Datacenter 2
    └── ...

Datacenter

Logical grouping of nodes, typically by:

  • Geographic location
  • Cloud availability zone
  • Workload separation

Rack

Logical grouping within a datacenter:

  • Represents failure domain
  • With NetworkTopologyStrategy, Cassandra tries to place replicas on distinct racks
  • Can map to physical racks or cloud AZs

Write Path

Write Process

  1. Commit Log: Write is first recorded for durability
  2. Memtable: Data written to in-memory structure
  3. Acknowledgment: Based on consistency level
  4. Flush: Memtable flushed to SSTable when full

Client Write Request
    ↓
Coordinator Node
    ↓
Commit Log (durability)
    ↓
Memtable (memory)
    ↓
[Eventually]
    ↓
SSTable (disk)
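
The following simplified Python sketch mirrors this flow on a single node, using in-memory stand-ins for the commit log, memtable, and SSTables (real Cassandra appends to an on-disk commit log and writes immutable SSTable files):

# Simplified single-node write path: commit log append -> memtable insert ->
# flush to an (immutable) SSTable once the memtable reaches a size threshold.
commit_log = []          # sequential, durable log in real Cassandra
memtable = {}            # sorted in-memory structure per table
sstables = []            # immutable, sorted on-disk files in real Cassandra
MEMTABLE_FLUSH_THRESHOLD = 3

def write(partition_key, column, value):
    commit_log.append((partition_key, column, value))        # 1. durability
    memtable.setdefault(partition_key, {})[column] = value   # 2. in-memory write
    if len(memtable) >= MEMTABLE_FLUSH_THRESHOLD:             # 4. flush when full
        sstables.append(dict(sorted(memtable.items())))       #    new immutable SSTable
        memtable.clear()
        commit_log.clear()   # commit log segments are recycled after the flush

for i in range(4):
    write(f"user-{i}", "username", f"name-{i}")

print(len(sstables), memtable)   # 1 SSTable flushed, 1 partition still in memory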

Consistency Levels for Writes

-- Consistency is set per request by the client driver or in cqlsh,
-- not in the CQL statement itself:
-- cqlsh> CONSISTENCY QUORUM;
INSERT INTO users (user_id, username)
VALUES (uuid(), 'john_doe');

Common write consistency levels:

  • ANY: At least one node (including hinted handoff)
  • ONE: At least one replica node
  • QUORUM: Majority of replicas (floor(RF/2) + 1)
  • ALL: All replica nodes
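
A quick sketch of that quorum calculation:

# QUORUM = floor(RF / 2) + 1 replicas must acknowledge the request.
def quorum(replication_factor):
    return replication_factor // 2 + 1

for rf in (1, 2, 3, 5):
    print(f"RF={rf} -> QUORUM={quorum(rf)}")
# RF=1 -> QUORUM=1, RF=2 -> QUORUM=2, RF=3 -> QUORUM=2, RF=5 -> QUORUM=3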

Read Path

Read Process

  1. Coordinator: Receives read request
  2. Replica Selection: Picks the closest/fastest replicas (guided by the snitch)
  3. Data Retrieval:
    • Check row cache
    • Check memtables
    • Check SSTables (with bloom filters)
  4. Read Repair: Fix inconsistencies
  5. Return Result: To client

Client Read Request
    ↓
Coordinator Node
    ↓
[Check Multiple Sources]
├── Row Cache (if enabled)
├── Memtables
└── SSTables
    ├── Bloom Filter (probably has data?)
    ├── Key Cache (SSTable offset)
    ├── Partition Summary
    ├── Partition Index
    └── Data File
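
A simplified sketch of the merge step: values from the memtable and SSTables are combined and the newest cell (highest timestamp) wins. The data layout here is an illustrative stand-in; real reads also use bloom filters, the key cache, and partition indexes to skip SSTables cheaply:

# Simplified read merge: consult the memtable and every SSTable that may hold
# the partition, then keep the newest value per column (highest timestamp wins).
def read(partition_key, memtable, sstables):
    merged = {}
    sources = sstables + [memtable]           # oldest SSTables first, memtable last
    for source in sources:
        for column, (value, timestamp) in source.get(partition_key, {}).items():
            if column not in merged or timestamp > merged[column][1]:
                merged[column] = (value, timestamp)
    return {column: value for column, (value, timestamp) in merged.items()}

memtable = {"user-1": {"email": ("new@example.com", 200)}}
sstables = [{"user-1": {"email": ("old@example.com", 100), "username": ("jo", 100)}}]
print(read("user-1", memtable, sstables))
# {'email': 'new@example.com', 'username': 'jo'}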

Read Consistency Levels

-- Consistency for reads is also set by the driver or in cqlsh:
-- cqlsh> CONSISTENCY LOCAL_QUORUM;
SELECT * FROM users
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
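
In practice, per-request consistency is set through the client driver (or cqlsh's CONSISTENCY command). A sketch with the DataStax Python driver, assuming a reachable cluster, an illustrative keyspace name, and the users table from earlier:

# Setting read/write consistency per statement with the DataStax Python driver.
import uuid

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["127.0.0.1"])            # contact point is illustrative
session = cluster.connect("my_keyspace")    # keyspace name is illustrative

# Read at LOCAL_QUORUM
read_stmt = SimpleStatement(
    "SELECT * FROM users WHERE user_id = %s",
    consistency_level=ConsistencyLevel.LOCAL_QUORUM,
)
user_id = uuid.UUID("123e4567-e89b-12d3-a456-426614174000")
rows = session.execute(read_stmt, [user_id])

# Write at QUORUM
write_stmt = SimpleStatement(
    "INSERT INTO users (user_id, username) VALUES (uuid(), %s)",
    consistency_level=ConsistencyLevel.QUORUM,
)
session.execute(write_stmt, ["john_doe"])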

Storage Engine

SSTable (Sorted String Table)

Immutable on-disk files containing data:

SSTable Components:
├── Data.db (actual data)
├── Index.db (index of partitions)
├── Filter.db (bloom filter)
├── Summary.db (partition summary)
├── Statistics.db (metadata)
└── CompressionInfo.db (compression metadata)

Compaction

Process of merging SSTables:

-- Compaction strategies
ALTER TABLE users 
WITH compaction = {
    'class': 'SizeTieredCompactionStrategy',
    'min_threshold': 4,
    'max_threshold': 32
};

-- Other strategies:
-- LeveledCompactionStrategy
-- TimeWindowCompactionStrategy
-- DateTieredCompactionStrategy (deprecated)
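
A rough sketch of the size-tiered idea: group SSTables of similar size into buckets and compact any bucket that reaches min_threshold (the bucketing rule here is simplified relative to the real strategy):

# Size-tiered compaction (simplified): group SSTables whose sizes are within
# a factor of two of the bucket average, then compact any bucket that reaches
# min_threshold by merging its members into one larger SSTable.
MIN_THRESHOLD = 4

def bucket_by_size(sstable_sizes_mb):
    buckets = []
    for size in sorted(sstable_sizes_mb):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if 0.5 * avg <= size <= 2 * avg:   # "similar size" (simplified rule)
                bucket.append(size)
                break
        else:
            buckets.append([size])
    return buckets

sizes = [10, 11, 12, 13, 160, 170, 900]
for bucket in bucket_by_size(sizes):
    if len(bucket) >= MIN_THRESHOLD:
        print(f"compact {bucket} -> one SSTable of ~{sum(bucket)} MB")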

Memtables

In-memory structures for recent writes:

  • One memtable per table
  • Flushed to SSTable when full
  • Configurable size thresholds

Consistency and Availability

CAP Theorem

Cassandra is an AP system: it favors availability and partition tolerance over strict consistency:

  • Available: The cluster keeps serving requests even when nodes fail
  • Partition Tolerant: Operation continues through network splits
  • Eventually Consistent: Replicas converge over time, and consistency is tunable per request

Tunable Consistency

Balance between consistency and availability:

-- Strong consistency (R + W > RF)
-- RF=3, W=QUORUM(2), R=QUORUM(2)
-- 2 + 2 > 3 ✓

-- Eventual consistency
-- RF=3, W=ONE, R=ONE
-- 1 + 1 < 3 ✗
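
The R + W > RF rule can be checked directly; a minimal sketch:

# Strong consistency is guaranteed when read and write replica sets overlap:
# R + W > RF means every read touches at least one replica that saw the write.
def is_strongly_consistent(rf, write_replicas, read_replicas):
    return read_replicas + write_replicas > rf

quorum = lambda rf: rf // 2 + 1

print(is_strongly_consistent(3, quorum(3), quorum(3)))  # True  (2 + 2 > 3)
print(is_strongly_consistent(3, 1, 1))                  # False (1 + 1 < 3)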

Hinted Handoff

Temporary storage for unavailable nodes:

  • Coordinator stores hints for down nodes
  • Replays writes when node returns
  • Configurable hint window (default 3 hours)
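
A sketch of the idea (data structures and names are illustrative): the coordinator buffers a hint per down replica and replays it when the node returns, discarding hints older than the hint window:

# Hinted handoff (sketch): buffer writes destined for a down replica, replay
# them when the node returns, and drop hints older than the hint window.
import time

HINT_WINDOW_SECONDS = 3 * 60 * 60   # default max hint window is 3 hours
hints = {}                          # down_node -> list of (timestamp, mutation)

def store_hint(down_node, mutation):
    hints.setdefault(down_node, []).append((time.time(), mutation))

def replay_hints(node, apply_mutation):
    now = time.time()
    for created_at, mutation in hints.pop(node, []):
        if now - created_at <= HINT_WINDOW_SECONDS:   # expired hints are dropped
            apply_mutation(mutation)

store_hint("node_b", ("user-7", "email", "x@example.com"))
replay_hints("node_b", lambda m: print("replaying", m))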

Gossip Protocol

Peer-to-peer communication protocol:

Gossip Information:
├── Node State (UP/DOWN)
├── Load Information
├── Schema Version
├── Token Ownership
└── Datacenter/Rack Info

Gossip characteristics:

  • Runs every second
  • Exchanges state with up to 3 nodes
  • Efficient epidemic protocol
  • Enables automatic discovery
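
A sketch of a single gossip round, assuming simple per-node state maps with version numbers (the real protocol exchanges digests and tracks generation/version pairs per endpoint):

# One simplified gossip round: pick up to 3 random peers and merge state maps,
# keeping the entry with the higher version number for each key.
import random

def merge_state(local, remote):
    for key, (value, version) in remote.items():
        if key not in local or version > local[key][1]:
            local[key] = (value, version)

def gossip_round(node, peers, cluster_state):
    for peer in random.sample(peers, k=min(3, len(peers))):
        merge_state(cluster_state[node], cluster_state[peer])   # pull peer's view
        merge_state(cluster_state[peer], cluster_state[node])   # push ours back

cluster_state = {
    "node_a": {"node_a": ("UP", 5)},
    "node_b": {"node_a": ("UP", 3), "node_b": ("UP", 7)},
}
gossip_round("node_a", ["node_b"], cluster_state)
print(cluster_state["node_a"])   # now includes node_b's state at version 7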

Coordinator Node

Any node can be a coordinator:

Coordinator Responsibilities:
├── Route requests to replicas
├── Handle consistency requirements
├── Manage read repair
├── Store hinted handoffs
└── Return results to client

Partitioner

Determines data distribution:

-- Default: Murmur3Partitioner (recommended)
-- hash(partition_key) → token → node

-- Example token ranges
-- Node A: -9223372036854775808 to -4611686018427387904
-- Node B: -4611686018427387903 to 0
-- Node C: 1 to 4611686018427387903
-- Node D: 4611686018427387904 to 9223372036854775807

Caching

Row Cache

Caches entire rows:

ALTER TABLE users WITH caching = {
    'keys': 'ALL',
    'rows_per_partition': '100'
};

Key Cache

Caches partition key locations in SSTables:

  • Enabled by default
  • Reduces disk seeks
  • Configurable size

Best Practices

  1. Design for your queries: Model data based on access patterns
  2. Use appropriate consistency levels: Balance consistency vs performance
  3. Monitor cluster health: Track metrics and performance
  4. Plan capacity: Account for replication and growth
  5. Configure properly: Tune based on workload
  6. Understand failure modes: Plan for node/rack/DC failures

Common Architecture Patterns

Multi-Datacenter Deployment

CREATE KEYSPACE global_app
WITH REPLICATION = {
    'class': 'NetworkTopologyStrategy',
    'us_east': 3,
    'us_west': 3,
    'eu_west': 3
};

-- Use LOCAL_QUORUM for datacenter-aware consistency (set via driver or cqlsh):
-- cqlsh> CONSISTENCY LOCAL_QUORUM;
SELECT * FROM users;
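
At the driver level, datacenter-aware routing is usually paired with LOCAL_QUORUM. A sketch with the DataStax Python driver, assuming the us_east datacenter from the keyspace above and an illustrative contact point:

# DC-aware routing with LOCAL_QUORUM using the DataStax Python driver.
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

profile = ExecutionProfile(
    load_balancing_policy=TokenAwarePolicy(DCAwareRoundRobinPolicy(local_dc="us_east")),
    consistency_level=ConsistencyLevel.LOCAL_QUORUM,
)
cluster = Cluster(["10.0.0.1"], execution_profiles={EXEC_PROFILE_DEFAULT: profile})
session = cluster.connect("global_app")

rows = session.execute("SELECT * FROM users")   # routed to local-DC replicas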

Workload Separation

  • Use separate datacenters for:
    • Analytics workloads
    • Real-time serving
    • Backup/disaster recovery

Geographic Distribution

  • Place datacenters near users
  • Use consistency levels that respect locality
  • Consider network latency in design