Creating Table
Creating Tables
Tables in Cassandra store data in a structured format. Understanding how to create tables with proper primary keys and options is essential for building efficient Cassandra applications.
Basic Syntax
CREATE TABLE [IF NOT EXISTS] [keyspace.]table_name (
    column1_name data_type,
    column2_name data_type,
    ...
    PRIMARY KEY (partition_key, [clustering_columns])
) [WITH table_options];
Simple Table Example
CREATE TABLE student (
    id INT PRIMARY KEY,
    name TEXT,
    city TEXT,
    phone_number TEXT,
    enrollment_date DATE
);
Primary Key Types
Single Primary Key
CREATE TABLE users (
    user_id UUID PRIMARY KEY,
    username TEXT,
    email TEXT
);
Compound Primary Key
CREATE TABLE user_posts (
    user_id UUID,
    post_id TIMEUUID,
    title TEXT,
    content TEXT,
    PRIMARY KEY (user_id, post_id)
) WITH CLUSTERING ORDER BY (post_id DESC);
In this example:
- user_idis the partition key
- post_idis the clustering column
- Data is sorted by post_idin descending order within each partition
Composite Partition Key
CREATE TABLE sensor_readings (
    location TEXT,
    sensor_id INT,
    reading_time TIMESTAMP,
    temperature DOUBLE,
    humidity DOUBLE,
    PRIMARY KEY ((location, sensor_id), reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC);
Here (location, sensor_id) together form the partition key.
Column Data Types
Common Data Types
CREATE TABLE data_types_example (
    -- Numeric types
    tiny_num TINYINT,
    small_num SMALLINT,
    normal_int INT,
    big_num BIGINT,
    decimal_num DECIMAL,
    float_num FLOAT,
    double_num DOUBLE,
    
    -- Text types
    ascii_text ASCII,
    normal_text TEXT,
    varchar_text VARCHAR,
    
    -- Time types
    created_date DATE,
    created_time TIME,
    created_timestamp TIMESTAMP,
    
    -- Other types
    id_field UUID,
    time_id TIMEUUID,
    ip_addr INET,
    is_active BOOLEAN,
    data_blob BLOB,
    
    PRIMARY KEY (id_field)
);
Collection Types
CREATE TABLE collections_example (
    user_id UUID PRIMARY KEY,
    emails SET<TEXT>,
    phone_numbers LIST<TEXT>,
    properties MAP<TEXT, TEXT>
);
Table Options
Clustering Order
CREATE TABLE time_series_data (
    device_id UUID,
    timestamp TIMESTAMP,
    value DOUBLE,
    PRIMARY KEY (device_id, timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);
Compaction Strategy
CREATE TABLE events (
    id UUID PRIMARY KEY,
    data TEXT
) WITH compaction = {
    'class': 'SizeTieredCompactionStrategy',
    'min_threshold': 4,
    'max_threshold': 32
};
Compression
CREATE TABLE large_data (
    id UUID PRIMARY KEY,
    content TEXT
) WITH compression = {
    'class': 'LZ4Compressor',
    'chunk_length_in_kb': 64
};
Time To Live (TTL)
CREATE TABLE sessions (
    session_id UUID PRIMARY KEY,
    user_id UUID,
    data TEXT
) WITH default_time_to_live = 3600;  -- Data expires after 1 hour
IF NOT EXISTS
Prevent errors if table already exists:
CREATE TABLE IF NOT EXISTS users (
    user_id UUID PRIMARY KEY,
    username TEXT,
    email TEXT
);
Static Columns
Static columns are shared among all rows in a partition:
CREATE TABLE user_stats (
    user_id UUID,
    month TEXT,
    total_posts INT STATIC,  -- Same value for all months of a user
    monthly_posts INT,
    PRIMARY KEY (user_id, month)
);
Complete Example
Here's a comprehensive example showing various features:
CREATE TABLE IF NOT EXISTS user_activities (
    user_id UUID,
    activity_id TIMEUUID,
    activity_type TEXT,
    activity_data MAP<TEXT, TEXT>,
    tags SET<TEXT>,
    created_at TIMESTAMP,
    updated_at TIMESTAMP,
    is_public BOOLEAN,
    view_count COUNTER,
    PRIMARY KEY (user_id, activity_id)
) WITH CLUSTERING ORDER BY (activity_id DESC)
    AND compaction = {'class': 'LeveledCompactionStrategy'}
    AND compression = {'class': 'LZ4Compressor'}
    AND caching = {'keys': 'ALL', 'rows_per_partition': '100'}
    AND comment = 'Stores user activity data'
    AND default_time_to_live = 2592000;  -- 30 days
Best Practices
- Choose the right partition key: Ensure even data distribution
- Avoid large partitions: Keep partition size under 100MB
- Use appropriate data types: Don't use TEXT for everything
- Design for your queries: Create tables based on how you'll query the data
- Consider using TTL: For temporary or time-bound data
- Use IF NOT EXISTS: To make scripts idempotent
Common Mistakes to Avoid
- Hot partitions: Using timestamps or incremental IDs as partition keys
- Too many columns: Cassandra has a limit of ~2 billion columns per partition
- Large collections: Keep collections small (< 100 elements)
- Missing clustering columns: When you need to store multiple rows per partition
Related Operations
After creating a table, you might want to:
- Insert data into the table
- Create indexes for additional query patterns
- Alter the table structure
- Create materialized views for different access patterns