JDBC Connector¶
Reads and writes any JDBC-compatible database: PostgreSQL, MySQL, Oracle, SQL Server, Redshift, Snowflake (via JDBC driver), and more.
Read¶
CONFIG:
inputs:
customers:
format: jdbc
url: "jdbc:postgresql://db-host:5432/mydb"
table: public.customers # or use sql:
user: "{{ env.DB_USER }}"
password: "{{ env.DB_PASS }}"
options:
fetchsize: "10000"
Parallelised read (requires a numeric partition column):
large_table:
format: jdbc
url: "jdbc:postgresql://db-host:5432/mydb"
table: public.transactions
user: "{{ env.DB_USER }}"
password: "{{ env.DB_PASS }}"
options:
partitionColumn: id
lowerBound: "1"
upperBound: "10000000"
numPartitions: "16"
fetchsize: "50000"
Custom SQL:
recent_orders:
format: jdbc
url: "jdbc:postgresql://db-host:5432/mydb"
sql: "SELECT * FROM orders WHERE created_at >= '{{ dt }}'"
user: "{{ env.DB_USER }}"
password: "{{ env.DB_PASS }}"
Requirement
url is required. Either table or sql is required.
Write¶
CONFIG:
outputs:
predictions:
format: jdbc
url: "jdbc:postgresql://db-host:5432/mydb"
table: ml.predictions
user: "{{ env.DB_USER }}"
password: "{{ env.DB_PASS }}"
mode: append
options:
batchsize: "5000"
truncate: "true"
Fields¶
| Field | Type | Required | Description |
|---|---|---|---|
format |
"jdbc" |
Yes | Selects this connector |
url |
string | Yes | JDBC connection URL |
table |
string | Conditional | Fully qualified table name (required unless sql set) |
sql |
string | Conditional | SQL query for reads (alternative to table) |
user |
string | No | Database username |
password |
string | No | Database password — use {{ env.VAR }} |
mode |
overwrite | append |
No | Write mode (outputs only) |
options |
dict | No | Spark JDBC options |
JDBC URLs by database¶
| Database | URL pattern |
|---|---|
| PostgreSQL | jdbc:postgresql://host:5432/db |
| MySQL | jdbc:mysql://host:3306/db |
| Oracle | jdbc:oracle:thin:@host:1521/SID |
| SQL Server | jdbc:sqlserver://host:1433;databaseName=db |
| Redshift | jdbc:redshift://cluster.id.region.redshift.amazonaws.com:5439/db |
JDBC driver setup¶
The JDBC driver JAR must be on the Spark classpath:
ENGINE:
spark_conf:
spark.jars: "/opt/spark/jars/postgresql-42.7.0.jar"
# or for multiple JARs:
# spark.jars: "/opt/spark/jars/pg.jar,/opt/spark/jars/mysql.jar"
On Databricks, install the driver via the cluster Libraries UI.