Creating Table
Creating Tables
Tables in Cassandra store data in a structured format. Understanding how to create tables with proper primary keys and options is essential for building efficient Cassandra applications.
Basic Syntax
CREATE TABLE [IF NOT EXISTS] [keyspace.]table_name (
column1_name data_type,
column2_name data_type,
...
PRIMARY KEY (partition_key, [clustering_columns])
) [WITH table_options];
Simple Table Example
CREATE TABLE student (
id INT PRIMARY KEY,
name TEXT,
city TEXT,
phone_number TEXT,
enrollment_date DATE
);
Primary Key Types
Single Primary Key
CREATE TABLE users (
user_id UUID PRIMARY KEY,
username TEXT,
email TEXT
);
Compound Primary Key
CREATE TABLE user_posts (
user_id UUID,
post_id TIMEUUID,
title TEXT,
content TEXT,
PRIMARY KEY (user_id, post_id)
) WITH CLUSTERING ORDER BY (post_id DESC);
In this example:
user_id
is the partition keypost_id
is the clustering column- Data is sorted by
post_id
in descending order within each partition
Composite Partition Key
CREATE TABLE sensor_readings (
location TEXT,
sensor_id INT,
reading_time TIMESTAMP,
temperature DOUBLE,
humidity DOUBLE,
PRIMARY KEY ((location, sensor_id), reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC);
Here (location, sensor_id)
together form the partition key.
Column Data Types
Common Data Types
CREATE TABLE data_types_example (
-- Numeric types
tiny_num TINYINT,
small_num SMALLINT,
normal_int INT,
big_num BIGINT,
decimal_num DECIMAL,
float_num FLOAT,
double_num DOUBLE,
-- Text types
ascii_text ASCII,
normal_text TEXT,
varchar_text VARCHAR,
-- Time types
created_date DATE,
created_time TIME,
created_timestamp TIMESTAMP,
-- Other types
id_field UUID,
time_id TIMEUUID,
ip_addr INET,
is_active BOOLEAN,
data_blob BLOB,
PRIMARY KEY (id_field)
);
Collection Types
CREATE TABLE collections_example (
user_id UUID PRIMARY KEY,
emails SET<TEXT>,
phone_numbers LIST<TEXT>,
properties MAP<TEXT, TEXT>
);
Table Options
Clustering Order
CREATE TABLE time_series_data (
device_id UUID,
timestamp TIMESTAMP,
value DOUBLE,
PRIMARY KEY (device_id, timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);
Compaction Strategy
CREATE TABLE events (
id UUID PRIMARY KEY,
data TEXT
) WITH compaction = {
'class': 'SizeTieredCompactionStrategy',
'min_threshold': 4,
'max_threshold': 32
};
Compression
CREATE TABLE large_data (
id UUID PRIMARY KEY,
content TEXT
) WITH compression = {
'class': 'LZ4Compressor',
'chunk_length_in_kb': 64
};
Time To Live (TTL)
CREATE TABLE sessions (
session_id UUID PRIMARY KEY,
user_id UUID,
data TEXT
) WITH default_time_to_live = 3600; -- Data expires after 1 hour
IF NOT EXISTS
Prevent errors if table already exists:
CREATE TABLE IF NOT EXISTS users (
user_id UUID PRIMARY KEY,
username TEXT,
email TEXT
);
Static Columns
Static columns are shared among all rows in a partition:
CREATE TABLE user_stats (
user_id UUID,
month TEXT,
total_posts INT STATIC, -- Same value for all months of a user
monthly_posts INT,
PRIMARY KEY (user_id, month)
);
Complete Example
Here's a comprehensive example showing various features:
CREATE TABLE IF NOT EXISTS user_activities (
user_id UUID,
activity_id TIMEUUID,
activity_type TEXT,
activity_data MAP<TEXT, TEXT>,
tags SET<TEXT>,
created_at TIMESTAMP,
updated_at TIMESTAMP,
is_public BOOLEAN,
view_count COUNTER,
PRIMARY KEY (user_id, activity_id)
) WITH CLUSTERING ORDER BY (activity_id DESC)
AND compaction = {'class': 'LeveledCompactionStrategy'}
AND compression = {'class': 'LZ4Compressor'}
AND caching = {'keys': 'ALL', 'rows_per_partition': '100'}
AND comment = 'Stores user activity data'
AND default_time_to_live = 2592000; -- 30 days
Best Practices
- Choose the right partition key: Ensure even data distribution
- Avoid large partitions: Keep partition size under 100MB
- Use appropriate data types: Don't use TEXT for everything
- Design for your queries: Create tables based on how you'll query the data
- Consider using TTL: For temporary or time-bound data
- Use IF NOT EXISTS: To make scripts idempotent
Common Mistakes to Avoid
- Hot partitions: Using timestamps or incremental IDs as partition keys
- Too many columns: Cassandra has a limit of ~2 billion columns per partition
- Large collections: Keep collections small (< 100 elements)
- Missing clustering columns: When you need to store multiple rows per partition
Related Operations
After creating a table, you might want to:
- Insert data into the table
- Create indexes for additional query patterns
- Alter the table structure
- Create materialized views for different access patterns