Deploy to Databricks

Deploy a pipeline task to Databricks as a scheduled job — from config.yaml to a running job in one command.

Prerequisites

Install the engine with the Databricks extra:

```shell
pip install "ubunye-engine[databricks]"   # quotes stop shells such as zsh from globbing the brackets
```

Set your Databricks credentials:

```shell
export DATABRICKS_HOST="https://adb-123456789.azuredatabricks.net"
export DATABRICKS_TOKEN="dapi..."
```

Create `targets.yaml`

Define your deploy targets in a `targets.yaml` file at the usecase level. Every pipeline and task in that usecase shares this file:

```text
pipelines/
    fraud_detection/
        targets.yaml              ← usecase-level targets
        ingestion/
            claim_etl/
                config.yaml
                transformations.py
```

```yaml
# fraud_detection/targets.yaml
targets:
  dev:
    host: "https://adb-dev-123.azuredatabricks.net"
    token_env: DATABRICKS_TOKEN        # env var holding the PAT
    workspace_path: /Workspace/ubunye   # where files are uploaded
    spark_version: "13.3.x-scala2.12"
    node_type_id: "i3.xlarge"
    num_workers: 0                      # 0 = single-node cluster

  prod:
    host: "https://adb-prod-456.azuredatabricks.net"
    token_env: DATABRICKS_TOKEN_PROD
    workspace_path: /Workspace/ubunye
    node_type_id: "i3.2xlarge"
    num_workers: 4
```

The `i3.*` values are AWS instance types. For an Azure workspace (an `azuredatabricks.net` host, as in this example), set `node_type_id` to an Azure VM type such as `Standard_DS3_v2` instead.
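Each target stores the *name* of the environment variable that holds its token, not the token itself. This lets `dev` and `prod` use different credentials. The following is a minimal sketch of that lookup. It assumes the target has already been parsed into a dict, and `resolve_token` is a hypothetical helper, not part of Ubunye's API:

```python
import os
from typing import Mapping


def resolve_token(target: Mapping[str, object], environ: Mapping[str, str] = os.environ) -> str:
    """Return the PAT for a target by reading the env var named in token_env.

    Hypothetical helper that mirrors the documented token_env behaviour;
    it is not Ubunye's implementation.
    """
    # DATABRICKS_TOKEN is the documented default for token_env.
    var_name = str(target.get("token_env", "DATABRICKS_TOKEN"))
    token = environ.get(var_name)
    if not token:
        raise KeyError(f"environment variable {var_name!r} is not set")
    return token


dev = {"host": "https://adb-dev-123.azuredatabricks.net", "token_env": "DATABRICKS_TOKEN"}
prod = {"host": "https://adb-prod-456.azuredatabricks.net", "token_env": "DATABRICKS_TOKEN_PROD"}

# Made-up values standing in for the real environment.
fake_env = {"DATABRICKS_TOKEN": "dapi-dev", "DATABRICKS_TOKEN_PROD": "dapi-prod"}
print(resolve_token(dev, fake_env))   # dapi-dev
print(resolve_token(prod, fake_env))  # dapi-prod
```

Because only variable names appear in `targets.yaml`, you can commit the file without exposing secrets.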

Task-level overrides

A task can override specific fields by placing its own `targets.yaml` in the task directory:

```yaml
# fraud_detection/ingestion/claim_etl/targets.yaml
targets:
  dev:
    num_workers: 2                     # this task needs more compute
    spark_conf:
      spark.sql.shuffle.partitions: "200"
```

The task-level file is deep-merged on top of the usecase-level one. Fields you don't override are inherited.
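The following minimal Python sketch makes the merge concrete. It applies a recursive deep merge to the two `dev` blocks above. The `deep_merge` function is illustrative. Ubunye's own merge may differ in details such as list handling; this sketch replaces lists wholesale.

```python
import copy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict with `override` merged recursively on top of `base`."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, or keys missing from base: the task-level value wins.
            merged[key] = copy.deepcopy(value)
    return merged


usecase_dev = {
    "host": "https://adb-dev-123.azuredatabricks.net",
    "token_env": "DATABRICKS_TOKEN",
    "workspace_path": "/Workspace/ubunye",
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 0,
}
task_dev = {
    "num_workers": 2,
    "spark_conf": {"spark.sql.shuffle.partitions": "200"},
}

effective = deep_merge(usecase_dev, task_dev)
print(effective["num_workers"])   # 2
print(effective["node_type_id"])  # i3.xlarge
print(effective["spark_conf"])    # {'spark.sql.shuffle.partitions': '200'}
```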

Preview with `--dry-run`

```shell
ubunye deploy databricks \
    -d pipelines -u fraud_detection -p ingestion -t claim_etl \
    --target dev --dry-run
```

This prints the job spec as JSON without touching Databricks, so you can review it before deploying.
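The exact JSON depends on the Ubunye version. As a rough sketch, suppose the spec follows the Databricks Jobs API 2.1 `jobs/create` payload. The target fields would then land in a `new_cluster` block, as shown below. `build_job_spec`, the `main` task key, and the notebook path are all hypothetical.

```python
import json
from typing import Any, Dict


def build_job_spec(target: Dict[str, Any], job_name: str, notebook_path: str) -> Dict[str, Any]:
    """Map a resolved target onto a Jobs API 2.1 create payload.

    Sketch only: this field mapping is an assumption, not Ubunye's exact output.
    """
    new_cluster = {
        "spark_version": target["spark_version"],
        "node_type_id": target["node_type_id"],
        "num_workers": target["num_workers"],
    }
    # Optional blocks are included only when the target sets them.
    for key in ("spark_conf", "aws_attributes"):
        if target.get(key):
            new_cluster[key] = target[key]
    return {
        "name": job_name,
        "tags": {"ubunye_managed": "true"},
        "tasks": [
            {
                "task_key": "main",  # hypothetical task key
                "new_cluster": new_cluster,
                "notebook_task": {"notebook_path": notebook_path},
            }
        ],
    }


dev_target = {
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 2,
    "spark_conf": {"spark.sql.shuffle.partitions": "200"},
}
spec = build_job_spec(
    dev_target,
    job_name="ubunye-fraud_detection-ingestion-claim_etl-dev",
    # Hypothetical wrapper-notebook path; the real one may differ.
    notebook_path="/Workspace/ubunye/fraud_detection/ingestion/claim_etl/run_task",
)
print(json.dumps(spec, indent=2))
```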

Deploy

```shell
ubunye deploy databricks \
    -d pipelines -u fraud_detection -p ingestion -t claim_etl \
    --target dev
```

This command:

1. Validates the task's `config.yaml`.
2. Authenticates to the Databricks workspace.
3. Uploads `config.yaml`, `transformations.py`, and helpers to `/Workspace/ubunye/fraud_detection/ingestion/claim_etl/`.
4. Generates a wrapper notebook that calls `ubunye.run_task()`.
5. Creates (or updates) a Databricks job named `ubunye-fraud_detection-ingestion-claim_etl-dev`.
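You can read the pattern behind the upload path and job name off this example. The path is the target's `workspace_path` followed by usecase, package, and task. The job name is `ubunye-` followed by usecase, package, task, and target joined with hyphens. The small sketch below reproduces both. It is inferred from this single example rather than from Ubunye's source, and `deploy_names` is hypothetical.

```python
from typing import Tuple


def deploy_names(workspace_path: str, usecase: str, package: str, task: str,
                 target: str) -> Tuple[str, str]:
    """Return (upload_dir, job_name) following the pattern in the example."""
    upload_dir = f"{workspace_path.rstrip('/')}/{usecase}/{package}/{task}/"
    job_name = f"ubunye-{usecase}-{package}-{task}-{target}"
    return upload_dir, job_name


upload_dir, job_name = deploy_names(
    "/Workspace/ubunye", "fraud_detection", "ingestion", "claim_etl", "dev"
)
print(upload_dir)  # /Workspace/ubunye/fraud_detection/ingestion/claim_etl/
print(job_name)    # ubunye-fraud_detection-ingestion-claim_etl-dev
```

Under this pattern, the target name is part of the job name. Deploying the same task to `dev` and `prod` therefore yields two distinct jobs, not one overwritten job.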

If a job with the same name already exists, it is updated in place. Every Ubunye-managed job is tagged with `ubunye_managed: "true"` so it can be identified later.
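The sketch below shows how a client could find an existing job by name and list the Ubunye-managed jobs. It assumes responses shaped like the Databricks Jobs API 2.1 `GET /api/2.1/jobs/list` output: a `jobs` array whose entries carry `job_id`, `settings.name`, and `settings.tags`. Both helpers are hypothetical, and the sample response is made-up data. Fetching real responses and paging through them are left out.

```python
from typing import Any, Dict, List, Optional


def managed_jobs(list_response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the jobs whose settings carry the ubunye_managed="true" tag."""
    return [
        job for job in list_response.get("jobs", [])
        if job.get("settings", {}).get("tags", {}).get("ubunye_managed") == "true"
    ]


def find_job_id(list_response: Dict[str, Any], name: str) -> Optional[int]:
    """Return the job_id of the job with exactly this name, or None if absent."""
    for job in list_response.get("jobs", []):
        if job.get("settings", {}).get("name") == name:
            return job["job_id"]
    return None


# Made-up response for illustration only.
sample = {
    "jobs": [
        {"job_id": 101, "settings": {"name": "ubunye-fraud_detection-ingestion-claim_etl-dev",
                                     "tags": {"ubunye_managed": "true"}}},
        {"job_id": 202, "settings": {"name": "nightly-report"}},
    ]
}

existing = find_job_id(sample, "ubunye-fraud_detection-ingestion-claim_etl-dev")
action = "update" if existing is not None else "create"
print(action, existing)                              # update 101
print([job["job_id"] for job in managed_jobs(sample)])  # [101]
```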

Ad-hoc deploy (no targets.yaml)

For quick one-off deploys, pass the host directly:

```shell
ubunye deploy databricks \
    -d pipelines -u fraud_detection -p ingestion -t claim_etl \
    --host "https://adb-123.azuredatabricks.net" \
    --dry-run
```

Configuration reference

`DatabricksTargetConfig` fields

| Field | Type | Default | Description |
|---|---|---|---|
| `host` | str | required | Databricks workspace URL |
| `token_env` | str | `DATABRICKS_TOKEN` | Environment variable holding the PAT |
| `workspace_path` | str | `/Workspace/ubunye` | Root path for uploaded files |
| `spark_version` | str | `13.3.x-scala2.12` | Databricks Runtime version |
| `node_type_id` | str | `i3.xlarge` | Instance type for the job cluster |
| `num_workers` | int | `0` | Number of workers (0 = single-node) |
| `spark_conf` | dict | `{}` | Spark configuration overrides |
| `aws_attributes` | dict | `{}` | AWS-specific cluster attributes |
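Fields left out of a target fall back to the defaults in this table. For example, the `prod` target in the usecase-level file sets no `spark_version`, so it gets `13.3.x-scala2.12`. The following minimal sketch implements that fallback as a plain dict merge. Ubunye itself models a target as `DatabricksTargetConfig`, whose implementation is not shown here, and `with_defaults` is hypothetical.

```python
from typing import Any, Dict

# Defaults copied from the DatabricksTargetConfig table.
DEFAULTS: Dict[str, Any] = {
    "token_env": "DATABRICKS_TOKEN",
    "workspace_path": "/Workspace/ubunye",
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 0,
    "spark_conf": {},
    "aws_attributes": {},
}


def with_defaults(target: Dict[str, Any]) -> Dict[str, Any]:
    """Fill unset fields from DEFAULTS; `host` has no default and is required."""
    if "host" not in target:
        raise ValueError("'host' is required")
    # Copy dict-valued defaults so callers cannot mutate the shared DEFAULTS.
    filled = {k: (dict(v) if isinstance(v, dict) else v) for k, v in DEFAULTS.items()}
    filled.update(target)
    return filled


prod = with_defaults({
    "host": "https://adb-prod-456.azuredatabricks.net",
    "token_env": "DATABRICKS_TOKEN_PROD",
    "workspace_path": "/Workspace/ubunye",
    "node_type_id": "i3.2xlarge",
    "num_workers": 4,
})
print(prod["spark_version"])  # 13.3.x-scala2.12
```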

CLI flags

```shell
ubunye deploy databricks --help
```

| Flag | Short | Default | Description |
|---|---|---|---|
| `--usecase-dir` | `-d` | required | Root directory of pipelines |
| `--usecase` | `-u` | required | Usecase name |
| `--package` | `-p` | required | Pipeline/package name |
| `--task` | `-t` | required | Task name |
| `--target` | | `dev` | Deploy target name |
| `--mode` | `-m` | `PROD` | Config profile for the job |
| `--data-timestamp` | `-dt` | | Data timestamp passed to the task |
| `--dry-run` | | `false` | Print job spec without deploying |
| `--host` | | | Ad-hoc workspace URL (skips `targets.yaml`) |
| `--token` | | | Env var name for the token (ad-hoc) |