Writing A Spark DataFrame To MySQL: Tips And Tricks
Writing a Spark DataFrame to MySQL is something you might want to do for a number of reasons. Writing to a new table on the fly is fairly trivial; you can do it like this:
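A minimal sketch of what that might look like, assuming a DataFrame named df and placeholder connection details (the host, database, user, and password here are all hypothetical):

```scala
import java.util.Properties

// Placeholder credentials -- substitute your own MySQL connection details.
val connectionProperties = new Properties()
connectionProperties.put("user", "my_user")
connectionProperties.put("password", "my_password")
connectionProperties.put("driver", "com.mysql.jdbc.Driver")

// Writes the DataFrame out over JDBC, creating the table if it doesn't exist.
df.write
  .jdbc("jdbc:mysql://localhost:3306/my_database", "my_new_table", connectionProperties)
```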
This will create a new table called my_new_table and write the data there, inferring the schema and column order from the DataFrame. Often, though, you won't want to write to a new table; writing to an existing one is more common in an analytics or developer workflow. Programmatically, it's just a small change: set the save mode to “append” in the write step, like so:
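The same sketch as above, reusing connectionProperties, with the save mode set to append (the existing table name is again a placeholder):

```scala
// Appends rows to an existing table instead of creating a new one.
df.write
  .mode("append")
  .jdbc("jdbc:mysql://localhost:3306/my_database", "my_existing_table", connectionProperties)
```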
However, this small change opens up a world of problems. I've documented some that I ran into and how to overcome them:
Timestamps
Spark SQL has some nice timestamp functions that are a bit handier than the defaults in Scala. For example, current_timestamp() will give you a nicely formatted timestamp.
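For instance, a hypothetical way to tag each row with a load time before writing it out:

```scala
import org.apache.spark.sql.functions.current_timestamp

// Adds a loaded_at column (the column name is illustrative) holding the time
// the job ran, ready to land in a MySQL TIMESTAMP or DATETIME column.
val stamped = df.withColumn("loaded_at", current_timestamp())
```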
Upserting
If you would like to upsert into MySQL, you won't be able to do it through the DataFrameWriter API. You have to use JDBC prepared statements directly. Here is an example:
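Below is a minimal sketch of that approach, assuming a target table my_table with a primary key on id and a value column; the table, column names, and connection details are all placeholders:

```scala
import java.sql.DriverManager
import org.apache.spark.sql.Row

// Sketch only: opens one connection per partition and issues an
// INSERT ... ON DUPLICATE KEY UPDATE per row. The upsert relies on a
// primary or unique key on id so the ON DUPLICATE KEY clause can fire.
df.foreachPartition { (rows: Iterator[Row]) =>
  val conn = DriverManager.getConnection(
    "jdbc:mysql://localhost:3306/my_database", "my_user", "my_password")
  val stmt = conn.prepareStatement(
    "INSERT INTO my_table (id, value) VALUES (?, ?) " +
    "ON DUPLICATE KEY UPDATE value = VALUES(value)")
  try {
    rows.foreach { row =>
      stmt.setInt(1, row.getAs[Int]("id"))
      stmt.setString(2, row.getAs[String]("value"))
      stmt.executeUpdate()
    }
  } finally {
    stmt.close()
    conn.close()
  }
}
```

For anything beyond small volumes, you would typically batch the writes with addBatch/executeBatch rather than calling executeUpdate once per row.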
That’s all folks.
UPDATE: 10/24/2016. I removed some tips that were no longer true and added a tip for upserting.