Reading Data

Reading Data

The SELECT statement is used to retrieve data from Cassandra tables. Understanding how to query data effectively in Cassandra is crucial, as it has different limitations compared to traditional SQL databases due to its distributed nature.

Basic SELECT Syntax

SELECT column_list
FROM [keyspace_name.]table_name
[WHERE condition]
[ORDER BY column_name [ASC|DESC]]
[LIMIT n]
[ALLOW FILTERING];

Reading All Data

Select All Columns

SELECT * FROM student;

Select with Keyspace

SELECT * FROM tutorial_db.student;

Reading Specific Columns

SELECT id, name, city FROM student;

Filtering Data with WHERE

Primary Key Queries

Cassandra is optimized for queries using the primary key:

-- Query by partition key
SELECT * FROM student WHERE id = 2;

-- Query by partition key and clustering column
SELECT * FROM orders WHERE user_id = 123 AND order_date = '2024-01-15';

Using IN Clause

SELECT * FROM student WHERE id IN (1, 2, 3);

Range Queries on Clustering Columns

SELECT * FROM time_series_data 
WHERE sensor_id = 'sensor-001' 
AND timestamp >= '2024-01-01' 
AND timestamp < '2024-02-01';

Ordering Results

ORDER BY can only be used with clustering columns:

SELECT * FROM events 
WHERE user_id = 123 
ORDER BY event_time DESC;

Limiting Results

-- Get first 10 rows
SELECT * FROM student LIMIT 10;

-- Combine with WHERE
SELECT * FROM logs 
WHERE app_id = 'myapp' 
LIMIT 100;

ALLOW FILTERING

Use with caution as it can be expensive:

-- Warning: This performs a full table scan
SELECT * FROM student 
WHERE city = 'New York' 
ALLOW FILTERING;

Working with Collections

Querying Lists

-- Check if list contains a value
SELECT * FROM user_interests 
WHERE user_id = 1 
AND interests CONTAINS 'coding';

Querying Sets

-- Check if set contains a value
SELECT * FROM user_tags 
WHERE user_id = 1 
AND tags CONTAINS 'developer';

Querying Maps

-- Query by map key
SELECT * FROM user_settings 
WHERE user_id = 1 
AND preferences['theme'] = 'dark';

-- Check if map contains a key
SELECT * FROM user_settings 
WHERE user_id = 1 
AND preferences CONTAINS KEY 'theme';

JSON Output

Return results as JSON:

SELECT JSON * FROM student;

-- Specific columns as JSON
SELECT JSON id, name, city FROM student WHERE id = 1;

Aggregate Functions

COUNT

-- Count all rows
SELECT COUNT(*) FROM student;

-- Count with condition
SELECT COUNT(*) FROM orders WHERE user_id = 123;

MIN, MAX, SUM, AVG

SELECT MAX(salary), MIN(salary), AVG(salary) 
FROM employees 
WHERE department = 'Engineering';

Token Function

Used for pagination in large datasets:

-- Get partition token
SELECT token(id), id, name FROM student;

-- Pagination using token
SELECT * FROM student WHERE token(id) > token(100) LIMIT 10;

System Functions

TTL and WRITETIME

-- Get TTL of a column
SELECT TTL(session_data) FROM sessions WHERE session_id = 'abc123';

-- Get write timestamp
SELECT WRITETIME(last_login) FROM users WHERE user_id = 1;

UUID Functions

-- Convert timeuuid to timestamp
SELECT dateOf(event_id), event_type FROM events LIMIT 5;

-- Get UUID from timeuuid
SELECT unixTimestampOf(created_at) FROM time_based_table;

Performance Considerations

Efficient Queries

-- Good: Uses partition key
SELECT * FROM users WHERE user_id = 123;

-- Good: Uses partition key + clustering key
SELECT * FROM posts WHERE user_id = 123 AND post_date = '2024-01-15';

-- Good: Range on clustering key
SELECT * FROM posts 
WHERE user_id = 123 
AND post_date >= '2024-01-01' 
AND post_date < '2024-02-01';

Inefficient Queries

-- Bad: No partition key (requires ALLOW FILTERING)
SELECT * FROM users WHERE email = '[email protected]' ALLOW FILTERING;

-- Bad: IN clause on partition key (multiple partitions)
SELECT * FROM users WHERE user_id IN (1,2,3,4,5,6,7,8,9,10);

Best Practices

  1. Always include the partition key in WHERE clause
  2. Avoid ALLOW FILTERING in production
  3. Design tables based on query patterns
  4. Use LIMIT to control result size
  5. Be careful with IN clauses - they can hit multiple partitions
  6. Consider using materialized views for different query patterns

Common Errors

Missing Partition Key

-- Error: Cannot execute this query as it might involve data filtering
SELECT * FROM student WHERE name = 'John';

-- Fix: Add ALLOW FILTERING (not recommended) or redesign table

Invalid ORDER BY

-- Error: ORDER BY is only supported on clustering columns
SELECT * FROM student ORDER BY name;

-- Fix: ORDER BY can only be used with clustering columns