Config Reference — Overview

Every Ubunye task is driven by a single config.yaml file. This page explains the top-level structure. Each section links to a dedicated reference page.


Top-level keys

MODEL: etl          # required — job type: etl | ml
VERSION: "1.0.0"    # required — semver string

ENGINE: ...         # optional — Spark settings and per-profile overrides
CONFIG: ...         # required — inputs, transform, outputs
ORCHESTRATION: ...  # optional — Airflow / Databricks / Prefect / Dagster metadata
Key            Type                 Required  Description
MODEL          etl | ml             Yes       Declares the job type
VERSION        semver string        Yes       Pipeline version (MAJOR.MINOR.PATCH)
ENGINE         EngineConfig         No        Spark conf + per-profile overrides
CONFIG         TaskConfig           Yes       Inputs, transform, outputs
ORCHESTRATION  OrchestrationConfig  No        Export metadata for orchestrators
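
Putting the keys together, a minimal ETL config might look like the sketch below. The hive connector options (format, db_name, tbl_name) follow the Jinja example in the next section; the transform and outputs shapes here are illustrative assumptions — see the dedicated reference pages for the real schemas.

```yaml
MODEL: etl
VERSION: "1.0.0"

CONFIG:
  inputs:
    events:
      format: hive
      db_name: raw
      tbl_name: events
  transform:
    type: noop          # transform types: noop | task | model | custom
  outputs:
    curated_events:     # output options assumed to mirror the input connector
      format: hive
      db_name: curated
      tbl_name: events
```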

Jinja templating

Config values are rendered through Jinja2 before Pydantic validation. You can use environment variables, CLI-injected variables, and filters anywhere in the YAML:

CONFIG:
  inputs:
    events:
      format: hive
      db_name: "{{ env.HIVE_DB | default('raw') }}"
      tbl_name: events_{{ dt | default('2024-01-01') | replace('-', '_') }}
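The render step above can be sketched with plain Jinja2. The render context Ubunye builds is internal, so the env dict and CLI-variable names here are stand-ins that simply mirror the snippet:

```python
from jinja2 import Environment

# Sketch of the pre-validation render step using plain Jinja2.
env_vars = {"HIVE_DB": "raw_prod"}   # stand-in for os.environ
cli_vars = {"dt": "2024-06-15"}      # stand-in for CLI-injected variables

template = Environment().from_string(
    "db_name: {{ env.HIVE_DB | default('raw') }}\n"
    "tbl_name: events_{{ dt | default('2024-01-01') | replace('-', '_') }}"
)
print(template.render(env=env_vars, **cli_vars))
# db_name: raw_prod
# tbl_name: events_2024_06_15
```

If a variable is missing, the default filter kicks in — rendering with an empty context yields db_name: raw and tbl_name: events_2024_01_01.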

See Jinja Templating for all supported syntax.


Validation

Ubunye validates the rendered YAML against strict Pydantic v2 models. Run validation before deploying:

ubunye validate -d pipelines -u fraud -p etl -t claims
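
The "strict Pydantic v2 model" idea can be sketched as follows. This is not Ubunye's actual model — the real TaskConfig, EngineConfig, and OrchestrationConfig are richer, so the dict placeholders are assumptions — but it shows how strict validation rejects unknown keys and bad values:

```python
from typing import Literal, Optional
from pydantic import BaseModel, ConfigDict, ValidationError

# A sketch of a strict Pydantic v2 root model; field names follow the
# top-level keys table, the nested types are simplified to dicts.
class RootConfig(BaseModel):
    model_config = ConfigDict(extra="forbid")  # strict: reject unknown keys
    MODEL: Literal["etl", "ml"]
    VERSION: str
    CONFIG: dict
    ENGINE: Optional[dict] = None
    ORCHESTRATION: Optional[dict] = None

# A valid payload passes:
RootConfig.model_validate({"MODEL": "etl", "VERSION": "1.0.0", "CONFIG": {}})

# An unsupported MODEL value is rejected:
try:
    RootConfig.model_validate({"MODEL": "batch", "VERSION": "1.0.0", "CONFIG": {}})
except ValidationError as exc:
    print(f"rejected: {exc.error_count()} error(s)")
```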

Reference pages

Section            Description
Inputs & Outputs   CONFIG.inputs and CONFIG.outputs — connector format and options
Engine & Profiles  ENGINE — Spark conf and dev/staging/prod profiles
Transform          CONFIG.transform — noop, task, model, and custom types
Orchestration      ORCHESTRATION — schedule, retries, tags, platform-specific settings
Jinja Templating   Variable interpolation, filters, and best practices