Interfaces (Protocol APIs)

Ubunye Engine defines four formal typing.Protocol interfaces for its pluggable seams. Any class that implements the required methods satisfies the protocol — no inheritance needed.
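
As a rough illustration of what "no inheritance needed" means, the sketch below uses a hypothetical HasName protocol as a stand-in for the engine's interfaces. HasName, MyAdapter, and describe are illustrative names, not part of Ubunye.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class HasName(Protocol):
    """Hypothetical stand-in for one of the engine's protocols."""

    @property
    def name(self) -> str: ...


class MyAdapter:
    """Does not inherit from HasName, yet satisfies it structurally."""

    @property
    def name(self) -> str:
        return "my_adapter"


def describe(adapter: HasName) -> str:
    # A type checker accepts MyAdapter here because its shape matches.
    return f"adapter:{adapter.name}"


adapter = MyAdapter()
label = describe(adapter)
matches = isinstance(adapter, HasName)  # runtime check enabled by @runtime_checkable
```

The runtime isinstance check only confirms that the member exists; a static type checker is what verifies signatures.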

Implementations are discovered at runtime via entry-point groups and environment-based auto-detection.


Design principles

Three production-policy decisions shape how every interface behaves:

Decision 1 — Non-blocking metadata writes

record(), register(), and promote() dispatch their metadata updates to a background worker thread with a bounded queue (default 1 000, configurable via UBUNYE_METADATA_QUEUE_SIZE).

  • Queue overflow: drop oldest pending write, log a warning.
  • Shutdown: flush() drains with a configurable timeout (default 30 s, UBUNYE_METADATA_FLUSH_TIMEOUT).
  • Artifact writes (model files) remain synchronous — the method returns only after the model is safely stored.

The engine uses a worker-thread pattern rather than async/await, because an async event loop conflicts with Spark's threading model in Python.
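
The policy above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the engine's implementation: MetadataWriter, submit, and the sink callable are hypothetical names, and only the drop-oldest overflow rule and the flush timeout are taken from the text.

```python
import logging
import queue
import threading

log = logging.getLogger("metadata_writer_sketch")


class MetadataWriter:
    """Hypothetical background writer with a bounded, drop-oldest queue."""

    def __init__(self, sink, maxsize=1000):
        self._sink = sink  # callable that performs the real write
        self._queue = queue.Queue(maxsize=maxsize)
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, item):
        """Enqueue without blocking; on overflow drop the oldest pending item."""
        while True:
            try:
                self._queue.put_nowait(item)
                return
            except queue.Full:
                try:
                    dropped = self._queue.get_nowait()
                    self._queue.task_done()  # keep join() accounting correct
                    log.warning("metadata queue full, dropped oldest: %r", dropped)
                except queue.Empty:
                    pass  # the worker drained the queue meanwhile; retry

    def _run(self):
        while True:
            item = self._queue.get()
            try:
                self._sink(item)
            except Exception:
                log.exception("metadata write failed")
            finally:
                self._queue.task_done()

    def flush(self, timeout=30.0):
        """Wait up to `timeout` seconds; return True if the queue drained."""
        drained = threading.Event()

        def wait_for_queue():
            self._queue.join()
            drained.set()

        threading.Thread(target=wait_for_queue, daemon=True).start()
        return drained.wait(timeout)


written = []
writer = MetadataWriter(written.append, maxsize=10)
for i in range(5):
    writer.submit({"seq": i})
drained = writer.flush(timeout=5.0)
```

Dropping the oldest entry keeps submit() from ever blocking the pipeline thread, at the cost of losing a pending write under sustained overload.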

Decision 2 — Graceful degradation

When a metadata write fails, the record is appended to a local fallback manifest at ~/.ubunye/fallback/{run_id}/{kind}.jsonl. Pipeline execution continues. ubunye sync lineage (or ubunye sync registry) replays the manifest with idempotent deduplication (key: run_id + task + recorded_at).

Auth is excluded — authentication failures always propagate immediately. A pipeline must not execute with invalid credentials.
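
A minimal sketch of this degradation rule follows, assuming records are plain dicts that carry a run_id. The helper write_with_fallback and the AuthError class are hypothetical stand-ins; only the manifest path layout and the _fallback_at field come from this page.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


class AuthError(Exception):
    """Hypothetical stand-in for the engine's authentication errors."""


def write_with_fallback(write, record, kind, fallback_root):
    """Try the real write; on non-auth failure append to a JSONL manifest."""
    try:
        write(record)
        return True
    except AuthError:
        raise  # authentication failures always propagate
    except Exception:
        manifest = Path(fallback_root) / record["run_id"] / f"{kind}.jsonl"
        manifest.parent.mkdir(parents=True, exist_ok=True)
        line = dict(record, _fallback_at=datetime.now(timezone.utc).isoformat())
        with manifest.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(line) + "\n")
        return False


def failing_write(_record):
    raise ConnectionError("backend unavailable")


root = tempfile.mkdtemp()  # stands in for ~/.ubunye/fallback
record = {"run_id": "abc123", "task": "train",
          "recorded_at": "2024-05-01T10:00:00+00:00"}
ok = write_with_fallback(failing_write, record, "lineage", root)
lines = (Path(root) / "abc123" / "lineage.jsonl").read_text().splitlines()
```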

Decision 3 — Schema evolution

Every cross-boundary dataclass separates strict core fields from a flexible metadata: Dict[str, str]:

  • Core fields never change without a major engine version bump.
  • New data goes into metadata — old clients ignore unknown keys, new clients write new keys without touching the schema.
  • Every record stamps engine_version so downstream consumers can filter by writer version.
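
The core-plus-metadata split might look like the sketch below. ExampleRecord and its core field names are hypothetical; the real dataclasses (DeployResult, ModelVersionInfo, LineageRecord, Credentials) define their own core fields.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class ExampleRecord:
    # Core fields (hypothetical names): stable across minor versions.
    run_id: str
    task: str
    engine_version: str
    # Flexible extras: new keys can appear without a schema change.
    metadata: Dict[str, str] = field(default_factory=dict)


def read_owner(record: ExampleRecord) -> str:
    """A reader that tolerates records written before 'owner' existed."""
    return record.metadata.get("owner", "unknown")


old = ExampleRecord(run_id="r1", task="train", engine_version="1.0.0")
new = ExampleRecord(run_id="r2", task="train", engine_version="1.1.0",
                    metadata={"owner": "ml-team", "gpu": "a100"})
```

Readers use metadata.get() with a default, so a record from an older writer and one from a newer writer pass through the same code path.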

Protocols

DeployAdapter

Deploys a task's config and artifacts to a remote environment.

from ubunye.interfaces import DeployAdapter, DeployContext, DeployResult

Bases: Protocol

Structural interface for deployment backends.

Each adapter is responsible for:

  1. Validating that the target dict contains the fields it needs.
  2. Preparing artifacts (notebooks, config bundles) for the target.
  3. Uploading artifacts and creating/updating the remote job.
  4. Returning a DeployResult with job metadata.

name property

name

Short identifier for this adapter (e.g. "databricks").

deploy

deploy(ctx, target, *, dry_run=False)

Deploy a task to the target environment.

Parameters

  • ctx: Task metadata, config dict, and filesystem paths.
  • target: Adapter-specific target configuration (validated by the adapter via validate_target).
  • dry_run: If True, build and return the job spec without making any remote API calls.

Returns

DeployResult: Job metadata including name, id, spec, and workspace path.

get_job_url

get_job_url(job_id, target)

Return a browser URL to the deployed job.

Parameters

  • job_id: The job identifier returned by deploy.
  • target: The same target dict, which typically contains the host URL.

validate_target

validate_target(target)

Validate adapter-specific target config.

Raises

ubunye.core.errors.TargetNotFoundError: When required keys are missing or values are invalid.

ubunye.interfaces.deploy.DeployContext dataclass

Structured metadata for a deploy operation.

Callers construct this from CLI flags and the parsed config; the adapter never derives task identity from filesystem paths.

ubunye.interfaces.deploy.DeployResult dataclass

Returned by DeployAdapter.deploy.

metadata carries adapter-specific extras (notebook source, cluster id, etc.) without polluting the core fields.

Built-in: DatabricksDeployAdapter — entry point ubunye.deploy_adapters:databricks.
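
To make the four responsibilities concrete, here is a hedged sketch of an adapter that supports only dry runs. SketchContext, SketchResult, their field names, the "host" target key, and the URL shape are all assumptions made for illustration; a real adapter would use DeployContext and DeployResult from ubunye.interfaces and the actual error class.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


class TargetNotFoundError(Exception):
    """Local stand-in for ubunye.core.errors.TargetNotFoundError."""


@dataclass
class SketchContext:  # hypothetical substitute for DeployContext
    task: str
    config: Dict[str, Any]


@dataclass
class SketchResult:  # hypothetical substitute for DeployResult
    job_name: str
    job_id: Optional[str]
    spec: Dict[str, Any]
    metadata: Dict[str, str] = field(default_factory=dict)


class EchoDeployAdapter:
    """Builds a job spec locally; remote calls are left out of the sketch."""

    @property
    def name(self) -> str:
        return "echo"

    def validate_target(self, target: Dict[str, Any]) -> None:
        if not target.get("host"):
            raise TargetNotFoundError("target is missing required key 'host'")

    def deploy(self, ctx, target, *, dry_run=False):
        self.validate_target(target)
        spec = {"name": f"ubunye-{ctx.task}", "config": ctx.config}
        if dry_run:
            # No remote API calls: return the spec that would be submitted.
            return SketchResult(job_name=spec["name"], job_id=None, spec=spec)
        raise NotImplementedError("remote calls are omitted from this sketch")

    def get_job_url(self, job_id: str, target: Dict[str, Any]) -> str:
        return f"{target['host'].rstrip('/')}/jobs/{job_id}"


adapter = EchoDeployAdapter()
result = adapter.deploy(SketchContext("train", {"epochs": 3}),
                        {"host": "https://example.invalid"}, dry_run=True)
url = adapter.get_job_url("42", {"host": "https://example.invalid/"})
```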


RegistryBackend

Model lifecycle management: register, promote, demote, delete.

from ubunye.interfaces import RegistryBackend, ModelVersionInfo

Bases: Protocol

Structural interface for model registries.

Concrete implementations: FilesystemRegistryBackend (built-in), MLflowRegistryBackend (Phase 2).

delete

delete(*, use_case, model_name, version)

Remove a version and its artifacts.

demote

demote(*, use_case, model_name, version, to_stage)

Demote a version to a lower stage.

get

get(*, use_case, model_name, version=None, stage=None)

Retrieve a model version by exact version or active stage.

Returns

tuple of (artifact_path, ModelVersionInfo): artifact_path is an opaque string; it may be a local path, an MLflow artifact URI, or a cloud storage URL. Callers pass it to ModelClass.load(artifact_path).

Raises

ubunye.core.errors.VersionNotFoundError: When no matching version exists.

list_versions

list_versions(*, use_case, model_name)

List all registered versions, newest first.

promote

promote(*, use_case, model_name, version, to_stage, gates=None, promoted_by=None)

Promote a version to a higher lifecycle stage.

If gates are provided, all checks must pass before the transition. On gate failure, the method raises ubunye.core.errors.PromotionBlockedError, which is NOT caught by the graceful degradation layer (auth and gate failures must propagate).
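
This page does not specify the type of a gate. The sketch below assumes, purely for illustration, that a gate is a callable taking the metrics dict and returning a bool; check_gates, min_auc, and max_latency are hypothetical, and PromotionBlockedError is a local stand-in for the engine's error class.

```python
from typing import Callable, Dict, Iterable, Optional


class PromotionBlockedError(Exception):
    """Local stand-in for ubunye.core.errors.PromotionBlockedError."""


# Assumed gate shape: metrics in, pass/fail out.
Gate = Callable[[Dict[str, float]], bool]


def check_gates(metrics: Dict[str, float],
                gates: Optional[Iterable[Gate]]) -> None:
    """Raise if any gate fails; otherwise return and let promotion proceed."""
    failed = [getattr(g, "__name__", repr(g))
              for g in (gates or []) if not g(metrics)]
    if failed:
        raise PromotionBlockedError(f"gates failed: {', '.join(failed)}")


def min_auc(metrics):
    return metrics.get("auc", 0.0) >= 0.8


def max_latency(metrics):
    return metrics.get("latency_ms", float("inf")) <= 50


# Passing case: no exception is raised.
check_gates({"auc": 0.9, "latency_ms": 20}, [min_auc, max_latency])

# Failing case: the error propagates to the caller.
try:
    check_gates({"auc": 0.7, "latency_ms": 20}, [min_auc, max_latency])
    blocked_message = ""
except PromotionBlockedError as exc:
    blocked_message = str(exc)
```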

Metadata update is non-blocking per Decision 1.

register

register(*, use_case, model_name, version, model, metrics, metadata=None, lineage_run_id=None)

Register a trained model as a new version.

Artifact writes are synchronous — the method returns only after the model is safely persisted. Index/metadata updates are non-blocking per Decision 1.

Parameters

  • use_case: Logical namespace (e.g. "fraud_detection").
  • model_name: Model identifier within the use case.
  • version: Explicit semver string, or None for auto-increment.
  • model: A trained model instance whose save()/metadata() methods will be called by the backend.
  • metrics: Training metrics dict.
  • metadata: Optional flexible key-value pairs (Decision 3).
  • lineage_run_id: Optional lineage run ID for cross-referencing.

ubunye.interfaces.registry.ModelVersionInfo dataclass

Cross-boundary model version record (Decision 3 compliant).

Core fields are strict; everything else goes into metadata. Every record stamps engine_version so downstream consumers can filter by writer version.

Built-in:

  • FilesystemRegistryBackend — entry point ubunye.registry_backends:filesystem.
  • MLflowRegistryBackend — entry point ubunye.registry_backends:mlflow. Layers MLflow experiment/run logging on top of filesystem storage. MLflow logging is best-effort — never blocks the pipeline.

LineageBackend

Run provenance: record, search, compact.

from ubunye.interfaces import LineageBackend, LineageRecord

Bases: Protocol

Structural interface for lineage storage backends.

Concrete implementations: FilesystemLineageBackend (built-in), DeltaLineageBackend (Phase 2).

compact

compact()

Optimize the underlying storage.

For filesystem backends this is a no-op. For Delta backends this triggers OPTIMIZE on the lineage table.

get_run

get_run(run_id)

Load a single run record by ID.

Raises

ubunye.core.errors.LineageRecordNotFoundError: When no record exists for run_id.

record

record(record)

Persist a lineage record (non-blocking per Decision 1).

Creates or overwrites the record for the given run_id. Typically called twice per task: once at start (status="running") and once at end (status="success" or "error").

The actual write is dispatched to a background worker thread. On failure, the record is written to a fallback manifest (Decision 2).

Parameters

record: The lineage record to persist. Callers construct this with the current engine version and task metadata.

search

search(*, task=None, usecase=None, pipeline=None, status=None, since=None, limit=100)

Search across recorded runs with optional filters.

Parameters

  • task: Filter by task name.
  • usecase: Filter by use case.
  • pipeline: Filter by pipeline.
  • status: Filter by status ("success", "error", "running").
  • since: ISO-8601 datetime; only return runs recorded at or after this time.
  • limit: Maximum number of records to return.
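
The record and search semantics can be illustrated with an in-memory stand-in. This is a sketch under assumptions: SketchRecord is a hypothetical reduction of LineageRecord with only four fields, the write is synchronous, and the result order is whatever the dict yields, since this page does not state an ordering for search.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class SketchRecord:  # hypothetical reduction of LineageRecord
    run_id: str
    task: str
    status: str
    recorded_at: str  # ISO-8601 with UTC offset


class InMemoryLineage:
    def __init__(self) -> None:
        self._runs: Dict[str, SketchRecord] = {}

    def record(self, record: SketchRecord) -> None:
        # Creates or overwrites the entry for this run_id.
        self._runs[record.run_id] = record

    def get_run(self, run_id: str) -> SketchRecord:
        if run_id not in self._runs:
            raise LookupError(f"no lineage record for run_id {run_id!r}")
        return self._runs[run_id]

    def search(self, *, task: Optional[str] = None,
               status: Optional[str] = None,
               since: Optional[str] = None,
               limit: int = 100) -> List[SketchRecord]:
        cutoff = datetime.fromisoformat(since) if since else None
        hits = []
        for rec in self._runs.values():
            if task is not None and rec.task != task:
                continue
            if status is not None and rec.status != status:
                continue
            if cutoff and datetime.fromisoformat(rec.recorded_at) < cutoff:
                continue
            hits.append(rec)
        return hits[:limit]


store = InMemoryLineage()
# Two calls for the same run: start, then end. The second overwrites the first.
store.record(SketchRecord("r1", "train", "running", "2024-05-01T10:00:00+00:00"))
store.record(SketchRecord("r1", "train", "success", "2024-05-01T10:05:00+00:00"))
store.record(SketchRecord("r2", "score", "error", "2024-05-02T09:00:00+00:00"))
recent = store.search(since="2024-05-02T00:00:00+00:00")
```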

ubunye.interfaces.lineage.LineageRecord dataclass

Cross-boundary lineage record (Decision 3 compliant).

Core columns are strict; everything else goes into metadata. Every record stamps engine_version so analysts can filter by writer version.

Built-in:

  • FileSystemLineageStore — entry point ubunye.lineage_backends:filesystem. Uses a run_id→Path index for O(1) lookups and an in-memory RunContext cache to avoid repeated JSON parsing on search.
  • DeltaLineageBackend — entry point ubunye.lineage_backends:delta. Writes to a Delta table via Spark SQL on Databricks; falls back to JSONL files locally.

AuthBackend

Credential resolution and validation.

from ubunye.interfaces import AuthBackend, Credentials

Bases: Protocol

Structural interface for authentication backends.

Concrete implementations: TokenAuthBackend (built-in), ServicePrincipalAuthBackend (Phase 2).

resolve

resolve(workspace_url, **kwargs)

Resolve credentials for the given workspace.

Parameters

  • workspace_url: The Databricks (or other) workspace host URL.
  • **kwargs: Backend-specific hints (e.g. token_env="MY_TOKEN").

Returns

Credentials: Populated credentials ready for client construction.

Raises

ubunye.core.errors.AuthNotFoundError: When credentials cannot be located (missing env var, etc.).

validate

validate(credentials)

Validate that credentials are accepted by the workspace.

Returns True on success.

Raises

ubunye.core.errors.AuthInvalidError: When the workspace rejects the credentials.

ubunye.interfaces.auth.Credentials dataclass

Resolved authentication credentials.

The auth_type field tells callers which fields are populated. metadata carries backend-specific extras (e.g. token expiry, scope, tenant ID) without polluting the core fields.

Built-in:

  • TokenAuthBackend — entry point ubunye.auth_backends:token.
  • ServicePrincipalAuthBackend — entry point ubunye.auth_backends:service_principal. Uses DATABRICKS_CLIENT_ID + DATABRICKS_CLIENT_SECRET for OAuth M2M auth.
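
As a sketch of the resolve contract only, the class below reads a token from an environment variable named by the token_env hint. SketchCredentials, its host and token fields, and EnvTokenAuth are hypothetical; the default variable name DATABRICKS_TOKEN is taken from the auto-detection rules, and validate is omitted because it needs a live workspace.

```python
import os
from dataclasses import dataclass, field
from typing import Dict, Optional


class AuthNotFoundError(Exception):
    """Local stand-in for ubunye.core.errors.AuthNotFoundError."""


@dataclass
class SketchCredentials:  # hypothetical substitute for Credentials
    auth_type: str
    host: str
    token: Optional[str] = None
    metadata: Dict[str, str] = field(default_factory=dict)


class EnvTokenAuth:
    def resolve(self, workspace_url: str, **kwargs) -> SketchCredentials:
        env_name = kwargs.get("token_env", "DATABRICKS_TOKEN")
        token = os.environ.get(env_name)
        if not token:
            # Fail fast: a pipeline must not run without credentials.
            raise AuthNotFoundError(f"environment variable {env_name} is not set")
        return SketchCredentials(auth_type="token", host=workspace_url, token=token)


os.environ["MY_TOKEN"] = "example-token"
creds = EnvTokenAuth().resolve("https://example.invalid", token_env="MY_TOKEN")

os.environ.pop("UBUNYE_SKETCH_UNSET", None)
try:
    EnvTokenAuth().resolve("https://example.invalid", token_env="UBUNYE_SKETCH_UNSET")
    missing = False
except AuthNotFoundError:
    missing = True
```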

Discovery and auto-detection

Entry-point groups

Group                     Built-in name      Class
ubunye.deploy_adapters    databricks         DatabricksDeployAdapter
ubunye.registry_backends  filesystem         FilesystemRegistryBackend
ubunye.registry_backends  mlflow             MLflowRegistryBackend
ubunye.lineage_backends   filesystem         FileSystemLineageStore
ubunye.lineage_backends   delta              DeltaLineageBackend
ubunye.auth_backends      token              TokenAuthBackend
ubunye.auth_backends      service_principal  ServicePrincipalAuthBackend

Third-party packages register backends by adding entry points under these groups in their pyproject.toml.

Auto-detection

When no backend is explicitly configured, the engine probes the environment:

Seam      Probe                                            Result
Registry  DATABRICKS_RUNTIME_VERSION + mlflow importable   "mlflow"
Registry  Otherwise                                        "filesystem"
Lineage   UBUNYE_LINEAGE_TABLE env var set                 "delta"
Lineage   Otherwise                                        "filesystem"
Auth      DATABRICKS_CLIENT_ID + DATABRICKS_CLIENT_SECRET  "service_principal"
Auth      DATABRICKS_TOKEN                                 "token"
Auth      None                                             Raises AuthNotFoundError

Service principal takes priority over token when both are set.
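
The probe rules can be expressed as three small functions. This is a sketch of the decision table, not the engine's code: the function names are hypothetical, an empty variable is treated as unset (an assumption), and AuthNotFoundError is a local stand-in.

```python
import importlib.util
import os
from typing import Mapping, Optional


class AuthNotFoundError(Exception):
    """Local stand-in for ubunye.core.errors.AuthNotFoundError."""


def detect_registry(env: Mapping[str, str],
                    mlflow_available: Optional[bool] = None) -> str:
    if mlflow_available is None:
        # find_spec checks importability without importing the package.
        mlflow_available = importlib.util.find_spec("mlflow") is not None
    if env.get("DATABRICKS_RUNTIME_VERSION") and mlflow_available:
        return "mlflow"
    return "filesystem"


def detect_lineage(env: Mapping[str, str]) -> str:
    return "delta" if env.get("UBUNYE_LINEAGE_TABLE") else "filesystem"


def detect_auth(env: Mapping[str, str]) -> str:
    # Service principal is checked first, so it wins when both are set.
    if env.get("DATABRICKS_CLIENT_ID") and env.get("DATABRICKS_CLIENT_SECRET"):
        return "service_principal"
    if env.get("DATABRICKS_TOKEN"):
        return "token"
    raise AuthNotFoundError("no Databricks credentials found in the environment")


both = {"DATABRICKS_CLIENT_ID": "id", "DATABRICKS_CLIENT_SECRET": "secret",
        "DATABRICKS_TOKEN": "tok"}
auth_choice = detect_auth(both)
registry_choice = detect_registry({"DATABRICKS_RUNTIME_VERSION": "14.3"},
                                  mlflow_available=False)
lineage_choice = detect_lineage({})
live_lineage_choice = detect_lineage(os.environ)  # probing the real environment
```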


Fallback manifests and ubunye sync

When a metadata write fails (Decision 2), the record is appended to:

~/.ubunye/fallback/{run_id}/{kind}.jsonl

Each line is a JSON object with all record fields plus a _fallback_at timestamp.

Replay with:

ubunye sync lineage                    # replay all lineage manifests
ubunye sync lineage --run-id abc123    # replay a specific run
ubunye sync registry                   # replay all registry manifests

Deduplication key: run_id + task + recorded_at. Replayed manifests are archived to ~/.ubunye/fallback/synced/.
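
Idempotent replay with this key might be sketched as follows. The function replay_manifest and its write callable are hypothetical; the sketch only demonstrates how the (run_id, task, recorded_at) key makes a second replay of the same line a no-op.

```python
import json
from typing import Callable, Iterable, Optional, Set, Tuple

Key = Tuple[str, str, str]


def replay_manifest(lines: Iterable[str],
                    write: Callable[[dict], None],
                    seen: Optional[Set[Key]] = None) -> int:
    """Write each distinct record once; return how many were written."""
    seen = set() if seen is None else seen
    replayed = 0
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines in the manifest
        rec = json.loads(line)
        key = (rec["run_id"], rec["task"], rec["recorded_at"])
        if key in seen:
            continue  # already synced: skip the duplicate
        write(rec)
        seen.add(key)
        replayed += 1
    return replayed


first = json.dumps({"run_id": "abc123", "task": "train",
                    "recorded_at": "2024-05-01T10:00:00+00:00",
                    "_fallback_at": "2024-05-01T10:00:01+00:00"})
second = json.dumps({"run_id": "abc123", "task": "train",
                     "recorded_at": "2024-05-01T10:05:00+00:00",
                     "_fallback_at": "2024-05-01T10:05:01+00:00"})

synced = []
keys = set()
count_first_pass = replay_manifest([first, first, second], synced.append, keys)
count_second_pass = replay_manifest([first, second], synced.append, keys)
```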


Writing a custom backend

  1. Implement the protocol methods (no base class needed).
  2. Register the entry point in your pyproject.toml:

    [project.entry-points."ubunye.lineage_backends"]
    my_backend = "my_package.lineage:MyLineageBackend"
    
  3. Install the package (pip install -e .).

  4. The engine discovers it automatically:

    from ubunye._internal.discovery import get_lineage_backend
    
    cls = get_lineage_backend("my_backend")
    store = cls()