Config Reference — Overview

Every Ubunye task is driven by a single config.yaml file. This page explains the top-level structure. Each section links to a dedicated reference page.


Top-level keys

MODEL: etl          # required — job type: etl | ml
VERSION: "1.0.0"    # required — semver string

ENGINE: ...         # optional — Spark settings and per-profile overrides
CONFIG: ...         # required — inputs, transform, outputs
ORCHESTRATION: ...  # optional — Airflow / Databricks / Prefect / Dagster metadata
Key            Type                 Required  Description
MODEL          etl | ml             Yes       Declares the job type
VERSION        semver string        Yes       Pipeline version (MAJOR.MINOR.PATCH)
ENGINE         EngineConfig         No        Spark conf + per-profile overrides
CONFIG         TaskConfig           Yes       Inputs, transform, outputs
ORCHESTRATION  OrchestrationConfig  No        Export metadata for orchestrators
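
Putting the keys together, a minimal ETL config might look like the sketch below. The hive connector options (format, db_name, tbl_name) follow the Jinja example in the next section; the transform and outputs shapes here are illustrative assumptions — see the dedicated reference pages for the real schemas.

```yaml
MODEL: etl
VERSION: "1.0.0"

CONFIG:
  inputs:
    events:
      format: hive
      db_name: raw
      tbl_name: events
  transform:
    type: noop          # transform types: noop | task | model | custom
  outputs:
    curated_events:     # output options assumed to mirror the input connector
      format: hive
      db_name: curated
      tbl_name: events
```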

Jinja templating

Config values are rendered through Jinja2 before Pydantic validation. You can use environment variables, CLI-injected variables, and filters anywhere in the YAML:

CONFIG:
  inputs:
    events:
      format: hive
      db_name: "{{ env.HIVE_DB | default('raw') }}"
      tbl_name: events_{{ dt | default('2024-01-01') | replace('-', '_') }}
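The render step above can be sketched with plain Jinja2. The render context Ubunye builds is internal, so the env dict and CLI-variable names here are stand-ins that simply mirror the snippet:

```python
from jinja2 import Environment

# Sketch of the pre-validation render step using plain Jinja2.
env_vars = {"HIVE_DB": "raw_prod"}   # stand-in for os.environ
cli_vars = {"dt": "2024-06-15"}      # stand-in for CLI-injected variables

template = Environment().from_string(
    "db_name: {{ env.HIVE_DB | default('raw') }}\n"
    "tbl_name: events_{{ dt | default('2024-01-01') | replace('-', '_') }}"
)
print(template.render(env=env_vars, **cli_vars))
# db_name: raw_prod
# tbl_name: events_2024_06_15
```

If a variable is missing, the default filter kicks in — rendering with an empty context yields db_name: raw and tbl_name: events_2024_01_01.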

See Jinja Templating for all supported syntax.


Validation

Ubunye validates the rendered YAML against strict Pydantic v2 models. Run validation before deploying:

ubunye validate -d pipelines -u fraud -p etl -t claims
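
The "strict Pydantic v2 model" idea can be sketched as follows. This is not Ubunye's actual model — the real TaskConfig, EngineConfig, and OrchestrationConfig are richer, so the dict placeholders are assumptions — but it shows how strict validation rejects unknown keys and bad values:

```python
from typing import Literal, Optional
from pydantic import BaseModel, ConfigDict, ValidationError

# A sketch of a strict Pydantic v2 root model; field names follow the
# top-level keys table, the nested types are simplified to dicts.
class RootConfig(BaseModel):
    model_config = ConfigDict(extra="forbid")  # strict: reject unknown keys
    MODEL: Literal["etl", "ml"]
    VERSION: str
    CONFIG: dict
    ENGINE: Optional[dict] = None
    ORCHESTRATION: Optional[dict] = None

# A valid payload passes:
RootConfig.model_validate({"MODEL": "etl", "VERSION": "1.0.0", "CONFIG": {}})

# An unsupported MODEL value is rejected:
try:
    RootConfig.model_validate({"MODEL": "batch", "VERSION": "1.0.0", "CONFIG": {}})
except ValidationError as exc:
    print(f"rejected: {exc.error_count()} error(s)")
```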

Reference pages

Section            Description
Inputs & Outputs   CONFIG.inputs and CONFIG.outputs — connector format and options
Engine & Profiles  ENGINE — Spark conf and dev/staging/prod profiles
Transform          CONFIG.transform — noop, task, model, and custom types
Orchestration      ORCHESTRATION — schedule, retries, tags, platform-specific settings
Jinja Templating   Variable interpolation, filters, and best practices