Skip to content

Changelog

All notable changes to Ubunye Engine will be documented here.

The format is based on Keep a Changelog and this project adheres to Semantic Versioning.


[Unreleased]

Added

  • Python API (ubunye/api.py) — run_task() and run_pipeline() for running Ubunye tasks from Python code (Databricks notebooks, scripts, tests) without the CLI. Auto-detects and reuses active SparkSessions. Exported from ubunye.__init__.

  • DatabricksBackend (ubunye/backends/databricks_backend.py) — backend that wraps an existing SparkSession instead of creating one. stop() is a no-op since we don't own the session.

  • Dev notebook scaffoldingubunye init now generates notebooks/<task>_dev.ipynb alongside config.yaml and transformations.py. The notebook uses DatabricksBackend, dbutils.widgets, and display(). The Load step is commented out by default.

  • Deployment docsdocs/deployment.md covering Databricks Asset Bundles pattern, GitHub Actions CI/CD, and Python API on Databricks. DABs belong in the usecase repo, not the engine.

  • Deploy workflow.github/workflows/deploy.yml validates configs on PR and runs unit tests. Bundle deployment is handled in the usecase repo.

  • ubunye test run CLI sub-command — runs tasks with a test profile and reports PASS/FAIL.

  • Model Registry (ubunye/models/) — library-independent ML lifecycle management.

  • UbunyeModel abstract contract: train, predict, save, load, metadata, validate.
  • ModelRegistry — filesystem-backed versioning with stages: development → staging → production → archived.
  • PromotionGate — configurable metric thresholds (min_*, max_*, require_drift_check).
  • load_model_class() — dynamic model file importer; mirrors the task-dir import pattern.
  • ModelTransform plugin (type: model) — train and predict from config YAML.
  • ubunye models CLI sub-commands: list, info, promote, demote, rollback, archive, compare.
  • RegistryConfig and ModelTransformParams Pydantic schema additions.

  • Lineage tracking (ubunye/lineage/) — automatic run provenance.

  • RunContext, LineageRecorder, FileSystemLineageStore, hash_dataframe.
  • ubunye lineage CLI sub-commands: show, list, compare, search, trace.
  • --lineage flag on ubunye run.

  • REST API connector — paginated HTTP reader and writer.

  • Pagination strategies: offset, cursor, next_link.
  • Auth: bearer, api_key (header or query param), basic.
  • Rate limiting with configurable requests_per_second.
  • Exponential backoff retry on configurable status codes.
  • Optional explicit schema declaration.

  • Config validationubunye validate command with full Pydantic v2 schema.

  • Format-specific field validation in IOConfig.
  • Jinja2 rendering before Pydantic validation.
  • Semver validation on VERSION field.

  • Test infrastructure — 288 unit tests, all Spark-free in tests/unit/.

Changed

  • ubunye/config/schema.py — added RegistryConfig, ModelTransformParams, FormatType.REST_API.
  • ubunye/__init__.py — exports run_task and run_pipeline.
  • ubunye/cli/main.py — mounted models_app, lineage_app, test_app Typer sub-apps; added notebook scaffolding to init.
  • pyproject.toml — added model entry point under ubunye.transforms.

Fixed

  • N/A

[0.1.0] — 2025-09-11

Added

  • First alpha release of Ubunye Engine.
  • Config-first ETL framework built on Apache Spark.
  • Plugin system for Readers, Writers, and Transforms via Python entry points.
  • Built-in connectors: Hive, JDBC, Delta, Unity Catalog, S3, binary.
  • CLI commands: init, run, plan, config, plugins, version.
  • Orchestration exporters: Airflow DAG Python file, Databricks Jobs API JSON.
  • Internal ML wrappers: SklearnModel, SparkMLModel, BatchPredictMixin, MLflowLoggingMixin.
  • Telemetry modules: JSON event log, Prometheus, OpenTelemetry.
  • Example tasks: fraud_detection/claims/claim_etl, rest_api/customer_sync.
  • SparkBackend with context manager and safe multiple-start support.