Table Operations
Table Operations
This tutorial covers all table-related operations in Cassandra, including creating, altering, and managing tables.
Creating Tables
Basic Syntax
CREATE TABLE [IF NOT EXISTS] [keyspace_name.]table_name (
column_definition,
column_definition,
...
PRIMARY KEY (partition_key, [clustering_columns])
) [WITH table_options];
Simple Table
CREATE TABLE users (
user_id UUID PRIMARY KEY,
username TEXT,
email TEXT,
created_at TIMESTAMP
);
Compound Primary Key
CREATE TABLE user_activities (
user_id UUID,
activity_timestamp TIMESTAMP,
activity_type TEXT,
details TEXT,
PRIMARY KEY (user_id, activity_timestamp)
) WITH CLUSTERING ORDER BY (activity_timestamp DESC);
Composite Partition Key
CREATE TABLE sensor_data (
sensor_type TEXT,
sensor_id UUID,
reading_time TIMESTAMP,
temperature DOUBLE,
humidity DOUBLE,
PRIMARY KEY ((sensor_type, sensor_id), reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC);
Primary Keys
Partition Key
Determines data distribution:
-- Single partition key
PRIMARY KEY (user_id)
-- Composite partition key
PRIMARY KEY ((year, month))
Clustering Columns
Sort data within partitions:
CREATE TABLE time_series (
device_id UUID,
timestamp TIMESTAMP,
value DOUBLE,
PRIMARY KEY (device_id, timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);
Table Options
Clustering Order
CREATE TABLE events (
user_id UUID,
event_time TIMESTAMP,
event_type TEXT,
PRIMARY KEY (user_id, event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);
Compaction Strategy
CREATE TABLE large_table (
id UUID PRIMARY KEY,
data TEXT
) WITH compaction = {
'class': 'LeveledCompactionStrategy',
'sstable_size_in_mb': 160
};
Compression
CREATE TABLE compressed_data (
id UUID PRIMARY KEY,
payload TEXT
) WITH compression = {
'class': 'LZ4Compressor',
'chunk_length_in_kb': 64
};
Time To Live (TTL)
CREATE TABLE sessions (
session_id UUID PRIMARY KEY,
user_id UUID,
data TEXT
) WITH default_time_to_live = 3600; -- 1 hour TTL
Caching Options
CREATE TABLE cached_data (
id UUID PRIMARY KEY,
data TEXT
) WITH caching = {
'keys': 'ALL',
'rows_per_partition': 'NONE'
};
Altering Tables
Add Columns
ALTER TABLE users ADD phone_number TEXT;
ALTER TABLE users ADD address frozen<map<text, text>>;
Drop Columns
ALTER TABLE users DROP phone_number;
Rename Columns
-- Note: Only works for non-primary key columns
ALTER TABLE users RENAME email TO email_address;
Change Table Options
-- Change compaction strategy
ALTER TABLE users
WITH compaction = {
'class': 'SizeTieredCompactionStrategy'
};
-- Change caching
ALTER TABLE users
WITH caching = {
'keys': 'ALL',
'rows_per_partition': '100'
};
-- Set TTL
ALTER TABLE sessions
WITH default_time_to_live = 7200;
Dropping Tables
-- Drop table
DROP TABLE users;
-- Drop table if exists
DROP TABLE IF EXISTS users;
Truncating Tables
Remove all data but keep schema:
TRUNCATE TABLE users;
-- or
TRUNCATE users;
Table Metadata
Describe Table
-- Show table structure
DESCRIBE TABLE users;
-- Show create statement
SHOW CREATE TABLE users;
List Tables
-- List all tables in current keyspace
DESCRIBE TABLES;
-- List tables in specific keyspace
DESCRIBE TABLES IN my_keyspace;
Static Columns
Shared across all rows in a partition:
CREATE TABLE user_stats (
user_id UUID,
stat_date DATE,
total_activities INT STATIC, -- Same for all dates per user
daily_activities INT,
PRIMARY KEY (user_id, stat_date)
);
-- Update static column
UPDATE user_stats SET total_activities = 100 WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
Secondary Indexes
Create Index
-- Create index on regular column
CREATE INDEX ON users (email);
-- Create named index
CREATE INDEX users_email_idx ON users (email);
-- Create index on collection values
CREATE INDEX ON users (tags); -- For SET/LIST
CREATE INDEX ON users (preferences) VALUES; -- For MAP values
Drop Index
DROP INDEX users_email_idx;
DROP INDEX IF EXISTS users_email_idx;
Materialized Views
Create denormalized views for different query patterns:
-- Base table
CREATE TABLE users (
user_id UUID,
email TEXT,
username TEXT,
created_at TIMESTAMP,
PRIMARY KEY (user_id)
);
-- Materialized view by email
CREATE MATERIALIZED VIEW users_by_email AS
SELECT user_id, email, username, created_at
FROM users
WHERE email IS NOT NULL AND user_id IS NOT NULL
PRIMARY KEY (email, user_id);
Table Properties Reference
All Table Options
CREATE TABLE comprehensive_example (
id UUID PRIMARY KEY,
data TEXT
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = 'Example table with all options'
AND compaction = {'class': 'SizeTieredCompactionStrategy'}
AND compression = {'class': 'LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
Best Practices
Primary Key Design
- Choose partition key carefully: Ensures even distribution
- Avoid large partitions: Keep under 100MB
- Use clustering columns: For time-series data
- Consider query patterns: Design for your queries
Table Design
- Denormalize data: Duplicate data for query efficiency
- One table per query: Design tables for specific access patterns
- Avoid secondary indexes: In production, use materialized views
- Use appropriate data types: Don't use TEXT for everything
Performance Tips
-
Use proper compaction strategy:
- SizeTiered: Write-heavy workloads
- Leveled: Read-heavy workloads
- TimeWindow: Time-series data
-
Configure compression: Reduces storage and I/O
-
Set appropriate TTL: For temporary data
-
Monitor partition sizes: Use nodetool to check
Common Mistakes
Mistake 1: Hot Partitions
-- Bad: All data goes to same partition
CREATE TABLE logs (
log_level TEXT,
timestamp TIMESTAMP,
message TEXT,
PRIMARY KEY (log_level, timestamp)
);
-- Good: Distribute across multiple partitions
CREATE TABLE logs (
log_level TEXT,
date DATE,
timestamp TIMESTAMP,
message TEXT,
PRIMARY KEY ((log_level, date), timestamp)
);
Mistake 2: Large Collections
-- Bad: Unbounded list growth
CREATE TABLE user_events (
user_id UUID PRIMARY KEY,
all_events LIST<TEXT> -- Can grow without limit
);
-- Good: Use clustering columns
CREATE TABLE user_events (
user_id UUID,
event_time TIMESTAMP,
event_data TEXT,
PRIMARY KEY (user_id, event_time)
);
Mistake 3: Too Many Secondary Indexes
-- Bad: Multiple secondary indexes
CREATE INDEX ON users (email);
CREATE INDEX ON users (phone);
CREATE INDEX ON users (city);
-- Good: Use materialized views for different access patterns