Quickstart

Build and run your first Ubunye pipeline in under 5 minutes.

1. Install

pip install ubunye-engine

2. Scaffold a task

ubunye init -d pipelines -u demo -p etl -t hello_world

This creates:

pipelines/demo/etl/hello_world/
    config.yaml              ← I/O and compute config
    transformations.py       ← your Python transform
    notebooks/
        hello_world_dev.ipynb  ← interactive dev notebook
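
To confirm the scaffold landed where expected, a small standard-library check can compare the task directory against the layout above. This is a sketch, not part of Ubunye: the helper names are hypothetical, and the notebook name pattern (`<task>_dev.ipynb`) is inferred from this single example.

```python
from pathlib import Path


def expected_scaffold_files(task_name: str) -> list[str]:
    """Files from the layout above, relative to the task directory."""
    return [
        "config.yaml",
        "transformations.py",
        # Assumption: the notebook is always named <task>_dev.ipynb.
        f"notebooks/{task_name}_dev.ipynb",
    ]


def missing_scaffold_files(task_dir: str) -> list[str]:
    """Return the expected files that are absent from task_dir."""
    root = Path(task_dir)
    return [
        rel
        for rel in expected_scaffold_files(root.name)
        if not (root / rel).is_file()
    ]


# An empty list means every file in the layout is present.
print(missing_scaffold_files("pipelines/demo/etl/hello_world"))
```

An empty result means the three files are in place; anything else lists what `ubunye init` did not create or what has since been moved.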

3. Edit the config

Open pipelines/demo/etl/hello_world/config.yaml:

MODEL: etl
VERSION: "1.0.0"

CONFIG:
  inputs:
    source:
      format: hive
      db_name: default
      tbl_name: sample_data

  transform: {}

  outputs:
    sink:
      format: delta
      path: /tmp/ubunye_demo/output
      mode: overwrite

No Spark handy?

Swap the connectors for REST API or JDBC to run without a Hive metastore. See the Connectors overview.

4. (Optional) Add a transform

Edit transformations.py:

from ubunye.core.interfaces import Task

class HelloWorldTask(Task):
    def transform(self, sources: dict) -> dict:
        df = sources["source"]
        return {"sink": df.filter("value IS NOT NULL")}

Then reference it in config.yaml:

  transform:
    type: task          # loads transformations.py automatically
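
The transform above takes a dict keyed by input name and returns a dict keyed by output name; in this example the keys line up with `source` and `sink` in config.yaml. To see that data flow without Spark or Ubunye installed, here is a stand-in sketch that uses plain lists of row dicts in place of DataFrames. The class name and the `Rows` alias are hypothetical and exist only for illustration.

```python
# Stand-in for a DataFrame: a list of row dicts. Not a Spark object.
Rows = list[dict]


class HelloWorldSketch:
    """Mirrors the shape of HelloWorldTask.transform without Spark or ubunye."""

    def transform(self, sources: dict[str, Rows]) -> dict[str, Rows]:
        rows = sources["source"]  # same key as CONFIG.inputs.source
        # Plain-Python equivalent of df.filter("value IS NOT NULL").
        kept = [row for row in rows if row.get("value") is not None]
        return {"sink": kept}  # same key as CONFIG.outputs.sink


result = HelloWorldSketch().transform(
    {"source": [{"value": 1}, {"value": None}, {"value": 3}]}
)
print(result)  # {'sink': [{'value': 1}, {'value': 3}]}
```

The same shape makes the real task easy to unit test: build a small input, call `transform`, and assert on the returned dict.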

5. Validate the config

ubunye validate -d pipelines -u demo -p etl -t hello_world

Expected output:

[OK] Config is valid.

6. Preview the execution plan

ubunye plan -d pipelines -u demo -p etl -t hello_world

Prints a DAG: inputs → transform → outputs. Nothing is executed.
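
As a rough illustration of how an inputs → transform → outputs ordering follows from the config, the sketch below derives the steps from the quickstart config written as a plain dict. The `plan_steps` helper and its output wording are hypothetical; the actual format printed by `ubunye plan` may differ.

```python
# Quickstart config mirrored as a plain dict (no YAML parser needed).
CONFIG = {
    "inputs": {
        "source": {"format": "hive", "db_name": "default", "tbl_name": "sample_data"},
    },
    "transform": {},
    "outputs": {
        "sink": {"format": "delta", "path": "/tmp/ubunye_demo/output", "mode": "overwrite"},
    },
}


def plan_steps(config: dict) -> list[str]:
    """List the steps in order: every input, the transform, every output."""
    steps = [f"read {name} ({spec['format']})" for name, spec in config["inputs"].items()]
    steps.append("transform")
    steps += [f"write {name} ({spec['format']})" for name, spec in config["outputs"].items()]
    return steps


# Only describes the steps; nothing is read or written.
print(" → ".join(plan_steps(CONFIG)))
# read source (hive) → transform → write sink (delta)
```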

7. Run

ubunye run -d pipelines -u demo -p etl -t hello_world --profile dev

Optionally capture lineage:

ubunye run -d pipelines -u demo -p etl -t hello_world --profile dev --lineage

View recorded runs:

ubunye lineage list
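
For scripted use (for example in CI), the validate, plan, and run commands shown above can be chained from Python. The sketch below only uses flags that appear on this page. The parameter names are guesses at what the short flags stand for, and it assumes the CLI exits with a non-zero status on failure, which this page does not state.

```python
import subprocess


def ubunye_argv(command, base_dir, use_case, pipeline, task,
                profile=None, lineage=False):
    """Build the argument list for one of the ubunye commands shown above."""
    argv = ["ubunye", command,
            "-d", base_dir, "-u", use_case, "-p", pipeline, "-t", task]
    if profile is not None:
        argv += ["--profile", profile]
    if lineage:
        argv.append("--lineage")
    return argv


def validate_plan_run(base_dir, use_case, pipeline, task, profile="dev"):
    """Run validate, plan, then run; stop at the first failing command."""
    for command in ("validate", "plan"):
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(ubunye_argv(command, base_dir, use_case, pipeline, task),
                       check=True)
    subprocess.run(
        ubunye_argv("run", base_dir, use_case, pipeline, task, profile=profile),
        check=True,
    )


# Show the command line that would be executed, without running it.
print(" ".join(ubunye_argv("run", "pipelines", "demo", "etl", "hello_world",
                           profile="dev", lineage=True)))
```

Passing the arguments as a list avoids shell quoting issues with task or directory names.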

8. (Optional) Run from Python

On Databricks or in a notebook, use the Python API instead of the CLI:

import ubunye

outputs = ubunye.run_task(
    task_dir="pipelines/demo/etl/hello_world",
    mode="DEV",
)

The Python API auto-detects an active SparkSession (Databricks) and reuses it.

Common errors

The engine validates configs strictly and raises structured errors. Each error includes the relevant context and a hint or suggested fix.

Unknown field (typo)

Unknown fields in pipelines/fraud/ingestion/claim_etl/config.yaml:

  (top level):
    Unknown field 'ENGNE'
    Did you mean 'ENGINE'?
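
A "Did you mean" suggestion like the one above can be produced with fuzzy string matching. The following is a sketch using Python's standard `difflib`, not the engine's actual implementation. The list of known keys contains only the top-level fields that appear on this page; the real schema may define more.

```python
import difflib

# Top-level keys seen on this page; the engine's real schema may define more.
KNOWN_TOP_LEVEL = ["MODEL", "VERSION", "ENGINE", "CONFIG"]


def suggest(unknown: str, known: list[str] = KNOWN_TOP_LEVEL):
    """Return the closest known field name, or None if nothing is similar."""
    matches = difflib.get_close_matches(unknown, known, n=1, cutoff=0.6)
    return matches[0] if matches else None


print(suggest("ENGNE"))  # ENGINE
```

The `cutoff` value controls how similar a candidate must be before it is suggested; 0.6 is the `difflib` default.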

Undefined template variable

Template resolution failed for config.yaml:
  Undefined variable 'ds' in config value 's3://bucket/{{ ds }}/'.
  Available variables: ['env']. Use '| default(...)' for optional values.

Fix: pass the variable via CLI (--var ds=2025-01-01) or add a default ({{ ds | default('1970-01-01') }}).
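
To make the two fixes concrete, here is a standard-library sketch of placeholder resolution. It is not the engine's resolver and handles only the two forms shown above: `{{ name }}` and `{{ name | default('value') }}`. The `resolve` function and its error wording are hypothetical.

```python
import re

# Matches {{ name }} and {{ name | default('value') }} only.
_PLACEHOLDER = re.compile(
    r"\{\{\s*(?P<name>\w+)\s*"
    r"(?:\|\s*default\(\s*'(?P<default>[^']*)'\s*\)\s*)?"
    r"\}\}"
)


def resolve(value: str, variables: dict) -> str:
    """Substitute placeholders; fall back to a default or raise KeyError."""

    def substitute(match):
        name = match.group("name")
        if name in variables:
            return str(variables[name])
        default = match.group("default")
        if default is not None:
            return default
        raise KeyError(
            f"Undefined variable {name!r} in config value {value!r}. "
            f"Available variables: {sorted(variables)}"
        )

    return _PLACEHOLDER.sub(substitute, value)


# Fix 1: supply the variable.
print(resolve("s3://bucket/{{ ds }}/", {"env": "dev", "ds": "2025-01-01"}))
# s3://bucket/2025-01-01/

# Fix 2: give the placeholder a default.
print(resolve("s3://bucket/{{ ds | default('1970-01-01') }}/", {"env": "dev"}))
# s3://bucket/1970-01-01/
```

A supplied variable takes precedence over the default, so the default only applies when nothing is passed in.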

Missing transformations.py

TaskNotFoundError: Missing transformations.py at .../hello_world/transformations.py.

  Task dir:      pipelines/demo/etl/hello_world
  Expected file: pipelines/demo/etl/hello_world/transformations.py
  Hint: Run 'ubunye init' to scaffold a new task, or check the directory path.

Unknown reader plugin

ReaderNotFoundError: Reader plugin 'hve' not found.

  Format:    hve
  Input:     source
  Installed: ['hive', 'jdbc', 'rest_api', 's3', 'unity']
  Hint: Check the 'format' field in CONFIG.inputs.source.

Invalid profile

ConfigProfileError: Profile 'staging' not found.

  Profile:   staging
  Available: ['dev', 'prod']
  Hint: Valid profiles: dev, prod

See the full Error Reference for the complete hierarchy and catching patterns.

9. (Optional) Deploy to Databricks

pip install "ubunye-engine[databricks]"
ubunye deploy databricks -d pipelines -u demo -p etl -t hello_world --target dev --dry-run

See the full Databricks Deployment Guide for setting up targets.yaml and deploying for real.

What's next?

Topic	Link
Full YAML schema	Config Reference
All built-in connectors	Connectors
Python API reference	API Reference
Deploying to Databricks	Deployment
Training and versioning ML models	Model Contract
CLI flags and sub-commands	CLI Reference
Writing custom plugins	Plugin Guide