Interfaces (Protocol APIs)¶
Ubunye Engine defines four formal typing.Protocol interfaces for its
pluggable seams. Any class that implements the required methods satisfies
the protocol — no inheritance needed.
Implementations are discovered at runtime via entry-point groups and environment-based auto-detection.
Design principles¶
Three production-policy decisions shape how every interface behaves:
Decision 1 — Non-blocking metadata writes¶
record(), register(), and promote() dispatch their metadata updates
to a background worker thread with a bounded queue (default 1 000,
configurable via UBUNYE_METADATA_QUEUE_SIZE).
- Queue overflow: drop oldest pending write, log a warning.
- Shutdown:
flush()drains with a configurable timeout (default 30 s,UBUNYE_METADATA_FLUSH_TIMEOUT). - Artifact writes (model files) remain synchronous — the method returns only after the model is safely stored.
Uses a worker-thread pattern, not async/await — async fights Spark's
threading model in Python.
Decision 2 — Graceful degradation¶
When a metadata write fails, the record is appended to a local fallback
manifest at ~/.ubunye/fallback/{run_id}/{kind}.jsonl. Pipeline
execution continues. ubunye sync lineage (or ubunye sync registry)
replays the manifest with idempotent deduplication (key:
run_id + task + recorded_at).
Auth is excluded — authentication failures always propagate immediately. A pipeline must not execute with invalid credentials.
Decision 3 — Schema evolution¶
Every cross-boundary dataclass separates strict core fields from a
flexible metadata: Dict[str, str]:
- Core fields never change without a major engine version bump.
- New data goes into
metadata— old clients ignore unknown keys, new clients write new keys without touching the schema. - Every record stamps
engine_versionso downstream consumers can filter by writer version.
Protocols¶
DeployAdapter¶
Deploys a task's config and artifacts to a remote environment.
Bases: Protocol
Structural interface for deployment backends.
Each adapter is responsible for:
- Validating that the target dict contains the fields it needs.
- Preparing artifacts (notebooks, config bundles) for the target.
- Uploading artifacts and creating/updating the remote job.
- Returning a :class:
DeployResultwith job metadata.
deploy ¶
Deploy a task to the target environment.
Parameters¶
ctx:
Task metadata, config dict, and filesystem paths.
target:
Adapter-specific target configuration (validated by the
adapter via :meth:validate_target).
dry_run:
If True, build and return the job spec without making
any remote API calls.
Returns¶
DeployResult Job metadata including name, id, spec, and workspace path.
ubunye.interfaces.deploy.DeployContext
dataclass
¶
Structured metadata for a deploy operation.
Callers construct this from CLI flags and the parsed config; the adapter never derives task identity from filesystem paths.
ubunye.interfaces.deploy.DeployResult
dataclass
¶
Returned by :meth:DeployAdapter.deploy.
metadata carries adapter-specific extras (notebook source, cluster
id, etc.) without polluting the core fields.
Built-in: DatabricksDeployAdapter — entry point ubunye.deploy_adapters:databricks.
RegistryBackend¶
Model lifecycle management: register, promote, demote, delete.
Bases: Protocol
Structural interface for model registries.
Concrete implementations: FilesystemRegistryBackend (built-in),
MLflowRegistryBackend (Phase 2).
get ¶
Retrieve a model version by exact version or active stage.
Returns¶
tuple of (artifact_path, ModelVersionInfo)
artifact_path is an opaque string — it may be a local
path, an MLflow artifact URI, or a cloud storage URL.
Callers pass it to ModelClass.load(artifact_path).
Raises¶
ubunye.core.errors.VersionNotFoundError When no matching version exists.
promote ¶
Promote a version to a higher lifecycle stage.
If gates are provided, all checks must pass before the
transition. On gate failure, raises
:class:~ubunye.core.errors.PromotionBlockedError — this is
NOT caught by the graceful degradation layer (auth and gate
failures must propagate).
Metadata update is non-blocking per Decision 1.
register ¶
Register a trained model as a new version.
Artifact writes are synchronous — the method returns only after the model is safely persisted. Index/metadata updates are non-blocking per Decision 1.
Parameters¶
use_case:
Logical namespace (e.g. "fraud_detection").
model_name:
Model identifier within the use case.
version:
Explicit semver string, or None for auto-increment.
model:
A trained model instance whose save()/metadata()
methods will be called by the backend.
metrics:
Training metrics dict.
metadata:
Optional flexible key-value pairs (Decision 3).
lineage_run_id:
Optional lineage run ID for cross-referencing.
ubunye.interfaces.registry.ModelVersionInfo
dataclass
¶
Cross-boundary model version record (Decision 3 compliant).
Core fields are strict; everything else goes into metadata.
Every record stamps engine_version so downstream consumers can
filter by writer version.
Built-in:
FilesystemRegistryBackend— entry pointubunye.registry_backends:filesystem.MLflowRegistryBackend— entry pointubunye.registry_backends:mlflow. Layers MLflow experiment/run logging on top of filesystem storage. MLflow logging is best-effort — never blocks the pipeline.
LineageBackend¶
Run provenance: record, search, compact.
Bases: Protocol
Structural interface for lineage storage backends.
Concrete implementations: FilesystemLineageBackend (built-in),
DeltaLineageBackend (Phase 2).
compact ¶
Optimize the underlying storage.
For filesystem backends this is a no-op. For Delta backends
this triggers OPTIMIZE on the lineage table.
get_run ¶
Load a single run record by ID.
Raises¶
ubunye.core.errors.LineageRecordNotFoundError
When no record exists for run_id.
record ¶
Persist a lineage record (non-blocking per Decision 1).
Creates or overwrites the record for the given run_id.
Typically called twice per task: once at start
(status="running") and once at end (status="success"
or "error").
The actual write is dispatched to a background worker thread. On failure, the record is written to a fallback manifest (Decision 2).
Parameters¶
record: The lineage record to persist. Callers construct this with the current engine version and task metadata.
search ¶
Search across recorded runs with optional filters.
Parameters¶
task:
Filter by task name.
usecase:
Filter by use case.
pipeline:
Filter by pipeline.
status:
Filter by status ("success", "error", "running").
since:
ISO-8601 datetime; only return runs recorded at or after.
limit:
Maximum number of records to return.
ubunye.interfaces.lineage.LineageRecord
dataclass
¶
Cross-boundary lineage record (Decision 3 compliant).
Core columns are strict; everything else goes into metadata.
Every record stamps engine_version so analysts can filter by
writer version.
Built-in:
FileSystemLineageStore— entry pointubunye.lineage_backends:filesystem. Uses arun_id→Pathindex for O(1) lookups and an in-memoryRunContextcache to avoid repeated JSON parsing on search.DeltaLineageBackend— entry pointubunye.lineage_backends:delta. Writes to a Delta table via Spark SQL on Databricks; falls back to JSONL files locally.
AuthBackend¶
Credential resolution and validation.
Bases: Protocol
Structural interface for authentication backends.
Concrete implementations: TokenAuthBackend (built-in),
ServicePrincipalAuthBackend (Phase 2).
resolve ¶
Resolve credentials for the given workspace.
Parameters¶
workspace_url:
The Databricks (or other) workspace host URL.
**kwargs:
Backend-specific hints (e.g. token_env="MY_TOKEN").
Returns¶
Credentials Populated credentials ready for client construction.
Raises¶
ubunye.core.errors.AuthNotFoundError When credentials cannot be located (missing env var, etc.).
ubunye.interfaces.auth.Credentials
dataclass
¶
Resolved authentication credentials.
The auth_type field tells callers which fields are populated.
metadata carries backend-specific extras (e.g. token expiry,
scope, tenant ID) without polluting the core fields.
Built-in:
TokenAuthBackend— entry pointubunye.auth_backends:token.ServicePrincipalAuthBackend— entry pointubunye.auth_backends:service_principal. UsesDATABRICKS_CLIENT_ID+DATABRICKS_CLIENT_SECRETfor OAuth M2M auth.
Discovery and auto-detection¶
Entry-point groups¶
| Group | Built-in name | Class |
|---|---|---|
ubunye.deploy_adapters |
databricks |
DatabricksDeployAdapter |
ubunye.registry_backends |
filesystem |
FilesystemRegistryBackend |
ubunye.registry_backends |
mlflow |
MLflowRegistryBackend |
ubunye.lineage_backends |
filesystem |
FileSystemLineageStore |
ubunye.lineage_backends |
delta |
DeltaLineageBackend |
ubunye.auth_backends |
token |
TokenAuthBackend |
ubunye.auth_backends |
service_principal |
ServicePrincipalAuthBackend |
Third-party packages register backends by adding entry points under these
groups in their pyproject.toml.
Auto-detection¶
When no backend is explicitly configured, the engine probes the environment:
| Seam | Probe | Result |
|---|---|---|
| Registry | DATABRICKS_RUNTIME_VERSION + mlflow importable |
"mlflow" |
| Registry | Otherwise | "filesystem" |
| Lineage | UBUNYE_LINEAGE_TABLE env var set |
"delta" |
| Lineage | Otherwise | "filesystem" |
| Auth | DATABRICKS_CLIENT_ID + DATABRICKS_CLIENT_SECRET |
"service_principal" |
| Auth | DATABRICKS_TOKEN |
"token" |
| Auth | None | Raises AuthNotFoundError |
Service principal takes priority over token when both are set.
Fallback manifests and ubunye sync¶
When a metadata write fails (Decision 2), the record is appended to:
Each line is a JSON object with all record fields plus a _fallback_at
timestamp.
Replay with:
ubunye sync lineage # replay all lineage manifests
ubunye sync lineage --run-id abc123 # replay a specific run
ubunye sync registry # replay all registry manifests
Deduplication key: run_id + task + recorded_at. Replayed manifests are
archived to ~/.ubunye/fallback/synced/.
Writing a custom backend¶
- Implement the protocol methods (no base class needed).
-
Register the entry point in your
pyproject.toml: -
Install the package (
pip install -e .). -
The engine discovers it automatically: