Create Spark Dataset from MySQL table


Sample Program:

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Create a SparkSession that runs the job locally with two threads
SparkSession sparkSession = SparkSession.builder().appName("mysqltable").master("local[2]").getOrCreate();

// JDBC connection options for the MySQL table
// hostname, portNo, and databasename are placeholders for your MySQL server details
Map<String, String> jdbcOptionsMap = new HashMap<String, String>();
String jdbcUrl = "jdbc:mysql://" + hostname + ":" + portNo + "/" + databasename;

jdbcOptionsMap.put("url", jdbcUrl);
jdbcOptionsMap.put("driver", "<driver>");       // e.g. com.mysql.cj.jdbc.Driver for Connector/J 8.x
jdbcOptionsMap.put("dbtable", "<tableName>");
jdbcOptionsMap.put("user", "<username>");
jdbcOptionsMap.put("password", "<password>");

System.out.println("Reading From MySQL");

// Read the table through the JDBC data source and load it into a Dataset<Row>
Dataset<Row> mysqlData = sparkSession.read().format("jdbc").options(jdbcOptionsMap).load();
System.out.println("Column Names : " + Arrays.asList(mysqlData.columns()));
System.out.println("Table Schema : " + mysqlData.schema());

Bullet Points:

  1. In the above code, I first created a SparkSession; master("local[2]") runs the Spark job locally with two threads.
  2. In the second step, we provide the MySQL connection details: the JDBC URL, driver class, table name, username, and password.
  3. In the third step, we read the MySQL table through the JDBC data source and load it into the mysqlData Dataset using the load() method (an equivalent one-line form is sketched after this list).
  4. The columns() method returns the table's column names.
  5. The schema() method returns the table schema.
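
As mentioned in point 3, the same read can also be expressed without the options map by using the read().jdbc() overload, which takes the JDBC URL, the table name, and a java.util.Properties object. The sketch below assumes the same placeholder connection values used earlier:

import java.util.Properties;

Properties connectionProperties = new Properties();
connectionProperties.put("user", "<username>");
connectionProperties.put("password", "<password>");
connectionProperties.put("driver", "<driver>");   // e.g. com.mysql.cj.jdbc.Driver

// Equivalent read using the jdbc() convenience method
Dataset<Row> mysqlTable = sparkSession.read().jdbc(jdbcUrl, "<tableName>", connectionProperties);
mysqlTable.show();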