Create Spark Dataset from Cassandra Table
Sample Program:
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkSession sparkSession = SparkSession.builder()
        .appName("cassandratable")
        .master("local[2]")
        .getOrCreate();

// Cassandra connection details; replace the placeholders with your values.
Map<String, String> cassandraMap = new HashMap<>();
cassandraMap.put("spark.cassandra.connection.host", "<cassandraHostName>"); // e.g. localhost
cassandraMap.put("spark.cassandra.connection.port", "<cassandraPort>");     // e.g. 9042
cassandraMap.put("keyspace", "<cassandraKeySpace>");
cassandraMap.put("table", "<cassandraTable>");

// Read the Cassandra table through the Spark Cassandra Connector data source.
Dataset<Row> cassandraTableData = sparkSession.read()
        .format("org.apache.spark.sql.cassandra")
        .options(cassandraMap)
        .load();

System.out.println("Column Names : " + Arrays.asList(cassandraTableData.columns()));
System.out.println("Table Schema : " + cassandraTableData.schema());
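Note that this program assumes the DataStax Spark Cassandra Connector is on the classpath; it is not part of Spark itself. A minimal Maven dependency sketch is below — the Scala suffix (_2.11/_2.12) and version shown are assumptions and must match your Spark build:

```xml
<!-- DataStax Spark Cassandra Connector: provides the
     'org.apache.spark.sql.cassandra' data source used above.
     Pick the Scala suffix and version matching your Spark distribution. -->
<dependency>
    <groupId>com.datastax.spark</groupId>
    <artifactId>spark-cassandra-connector_2.11</artifactId>
    <version>2.4.3</version>
</dependency>
```

Alternatively, for spark-shell or spark-submit, the same artifact can be pulled in with the --packages option instead of a build file.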
Bullet Points:
- In the above code, I first created a SparkSession to run the Spark job in local (standalone) mode with two worker threads.
- In the second step, we provide the Cassandra table details: host name, port number, keyspace, and table name.
- In the third step, we read the Cassandra table using the 'org.apache.spark.sql.cassandra' format and load it into the cassandraTableData Dataset with the load() method.
- We can list the table's column names using the columns() method.
- We can inspect the table's schema using the schema() method.