JDBC Connector

Reads from and writes to any JDBC-compatible database: PostgreSQL, MySQL, Oracle, SQL Server, Redshift, Snowflake (via its JDBC driver), and more.


Read

CONFIG:
  inputs:
    customers:
      format: jdbc
      url: "jdbc:postgresql://db-host:5432/mydb"
      table: public.customers          # or use sql:
      user: "{{ env.DB_USER }}"
      password: "{{ env.DB_PASS }}"
      options:
        fetchsize: "10000"

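Parallelised read (requires a numeric, date, or timestamp partition column; partitionColumn, lowerBound, upperBound, and numPartitions must be set together):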

    large_table:
      format: jdbc
      url: "jdbc:postgresql://db-host:5432/mydb"
      table: public.transactions
      user: "{{ env.DB_USER }}"
      password: "{{ env.DB_PASS }}"
      options:
        partitionColumn: id
        lowerBound: "1"
        upperBound: "10000000"
        numPartitions: "16"
        fetchsize: "50000"
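
How the partition options interact may be easier to see in code. The sketch below approximates how a range-partitioned JDBC read is split into one WHERE clause per partition. It is a simplified illustration under assumed integer bounds, not Spark's exact implementation (Spark rounds the stride differently), and `partition_predicates` is a hypothetical helper name.

```python
def partition_predicates(column, lower, upper, num_partitions):
    """Approximate the per-partition WHERE clauses of a range-partitioned read.

    Simplified sketch: real engines may compute the stride slightly differently.
    """
    if num_partitions <= 1:
        return ["1=1"]  # a single partition reads everything

    # Width of each partition's slice of the [lower, upper] range.
    stride = max((upper - lower) // num_partitions, 1)

    predicates = []
    bound = lower
    for i in range(num_partitions):
        # The first partition has no lower limit, the last has no upper limit.
        lo = f"{column} >= {bound}" if i > 0 else None
        bound += stride
        hi = f"{column} < {bound}" if i < num_partitions - 1 else None

        if lo is None:
            # NULL keys are assigned to the first partition.
            predicates.append(f"{hi} OR {column} IS NULL")
        elif hi is None:
            predicates.append(lo)
        else:
            predicates.append(f"{lo} AND {hi}")
    return predicates


if __name__ == "__main__":
    for clause in partition_predicates("id", 1, 10_000_000, 16):
        print(clause)
```

The first and last clauses are open-ended, so rows with id below lowerBound or above upperBound are still read. The bounds shape the partitions; they do not filter the table. Bounds that are far from the real range of the column therefore produce skewed partitions rather than missing rows.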

Custom SQL:

    recent_orders:
      format: jdbc
      url: "jdbc:postgresql://db-host:5432/mydb"
      sql: "SELECT * FROM orders WHERE created_at >= '{{ dt }}'"
      user: "{{ env.DB_USER }}"
      password: "{{ env.DB_PASS }}"

Requirement

url is required. Either table or sql is required.
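
The rule above can be expressed as a small pre-flight check. This is a minimal sketch for validating a config dict before submitting a job, not the connector's actual validation logic; `validate_jdbc_input` is a hypothetical helper name.

```python
def validate_jdbc_input(cfg):
    """Return a list of problems found in one JDBC input block (empty if none)."""
    problems = []
    if not cfg.get("url"):
        problems.append("url is required")
    # At least one of table / sql must name what to read.
    if not cfg.get("table") and not cfg.get("sql"):
        problems.append("either table or sql is required")
    return problems


if __name__ == "__main__":
    ok = {
        "format": "jdbc",
        "url": "jdbc:postgresql://db-host:5432/mydb",
        "table": "public.customers",
    }
    print(validate_jdbc_input(ok))                  # no problems
    print(validate_jdbc_input({"format": "jdbc"}))  # both rules violated
```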


Write

CONFIG:
  outputs:
    predictions:
      format: jdbc
      url: "jdbc:postgresql://db-host:5432/mydb"
      table: ml.predictions
      user: "{{ env.DB_USER }}"
      password: "{{ env.DB_PASS }}"
      mode: append
      options:
        batchsize: "5000"
        truncate: "true"        # Spark applies this only when mode is overwrite
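
The batchsize option sets how many rows are sent to the database per round trip. A rough illustration of that grouping follows, in plain Python rather than the JDBC driver's own batching code; `batches` is a hypothetical helper name.

```python
from itertools import islice


def batches(rows, batchsize):
    """Yield successive lists of at most `batchsize` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, batchsize))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    # 12,000 rows with batchsize 5000 means three round trips.
    for chunk in batches(range(12_000), 5000):
        print(len(chunk))
```

In Spark the batching happens separately within each partition being written, so the number of round trips also depends on how the data is partitioned.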

Fields

Field     Type                Required     Description
format    "jdbc"              Yes          Selects this connector
url       string              Yes          JDBC connection URL
table     string              Conditional  Fully qualified table name (required unless sql is set)
sql       string              Conditional  SQL query for reads (alternative to table)
user      string              No           Database username
password  string              No           Database password; use {{ env.VAR }}
mode      overwrite | append  No           Write mode (outputs only)
options   dict                No           Spark JDBC options

JDBC URLs by database

Database     URL pattern
PostgreSQL   jdbc:postgresql://host:5432/db
MySQL        jdbc:mysql://host:3306/db
Oracle       jdbc:oracle:thin:@//host:1521/service_name  (or jdbc:oracle:thin:@host:1521:SID)
SQL Server   jdbc:sqlserver://host:1433;databaseName=db
Redshift     jdbc:redshift://cluster.id.region.redshift.amazonaws.com:5439/db
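
When the host and database name come from separate settings, the patterns above can be filled in programmatically. The following is a minimal sketch covering three of the listed databases; `jdbc_url`, `URL_PATTERNS`, and `DEFAULT_PORTS` are hypothetical names for illustration and are not part of the connector.

```python
# URL templates taken from the table above; ports are the defaults shown there.
URL_PATTERNS = {
    "postgresql": "jdbc:postgresql://{host}:{port}/{db}",
    "mysql": "jdbc:mysql://{host}:{port}/{db}",
    "sqlserver": "jdbc:sqlserver://{host}:{port};databaseName={db}",
}
DEFAULT_PORTS = {"postgresql": 5432, "mysql": 3306, "sqlserver": 1433}


def jdbc_url(kind, host, db, port=None):
    """Build a JDBC URL for a supported database kind."""
    if kind not in URL_PATTERNS:
        raise ValueError(f"no URL pattern for {kind!r}")
    return URL_PATTERNS[kind].format(
        host=host, port=port or DEFAULT_PORTS[kind], db=db
    )


if __name__ == "__main__":
    print(jdbc_url("postgresql", "db-host", "mydb"))
    print(jdbc_url("sqlserver", "db-host", "mydb", port=14330))
```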

JDBC driver setup

The JDBC driver JAR must be on the Spark classpath:

ENGINE:
  spark_conf:
    spark.jars: "/opt/spark/jars/postgresql-42.7.0.jar"
    # or for multiple JARs:
    # spark.jars: "/opt/spark/jars/pg.jar,/opt/spark/jars/mysql.jar"

On Databricks, install the driver via the cluster Libraries UI.