# Changelog

All notable changes to Ubunye Engine will be documented here.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

## [Unreleased]

### Added
- Python API (`ubunye/api.py`) — `run_task()` and `run_pipeline()` for running Ubunye tasks from Python code (Databricks notebooks, scripts, tests) without the CLI. Auto-detects and reuses active SparkSessions. Exported from `ubunye.__init__`.
- `DatabricksBackend` (`ubunye/backends/databricks_backend.py`) — backend that wraps an existing SparkSession instead of creating one. `stop()` is a no-op since we don't own the session.
- Dev notebook scaffolding — `ubunye init` now generates `notebooks/<task>_dev.ipynb` alongside `config.yaml` and `transformations.py`. The notebook uses `DatabricksBackend`, `dbutils.widgets`, and `display()`. The Load step is commented out by default.
- Deployment docs — `docs/deployment.md` covering the Databricks Asset Bundles pattern, GitHub Actions CI/CD, and the Python API on Databricks. DABs belong in the usecase repo, not the engine.
- Deploy workflow — `.github/workflows/deploy.yml` validates configs on PR and runs unit tests. Bundle deployment is handled in the usecase repo.
- `ubunye test run` CLI sub-command — runs tasks with a test profile and reports PASS/FAIL.
- Model Registry (`ubunye/models/`) — library-independent ML lifecycle management.
    - `UbunyeModel` abstract contract: `train`, `predict`, `save`, `load`, `metadata`, `validate`.
    - `ModelRegistry` — filesystem-backed versioning with stages: development → staging → production → archived.
    - `PromotionGate` — configurable metric thresholds (`min_*`, `max_*`, `require_drift_check`).
    - `load_model_class()` — dynamic model file importer; mirrors the task-dir import pattern.
    - `ModelTransform` plugin (`type: model`) — train and predict from config YAML.
    - `ubunye models` CLI sub-commands: `list`, `info`, `promote`, `demote`, `rollback`, `archive`, `compare`.
- `RegistryConfig` and `ModelTransformParams` Pydantic schema additions.
- Lineage tracking (`ubunye/lineage/`) — automatic run provenance.
    - `RunContext`, `LineageRecorder`, `FileSystemLineageStore`, `hash_dataframe`.
    - `ubunye lineage` CLI sub-commands: `show`, `list`, `compare`, `search`, `trace`.
    - `--lineage` flag on `ubunye run`.
- REST API connector — paginated HTTP reader and writer.
    - Pagination strategies: offset, cursor, next_link.
    - Auth: bearer, api_key (header or query param), basic.
    - Rate limiting with configurable `requests_per_second`.
    - Exponential backoff retry on configurable status codes.
    - Optional explicit schema declaration.
- Config validation — `ubunye validate` command with full Pydantic v2 schema.
    - Format-specific field validation in `IOConfig`.
    - Jinja2 rendering before Pydantic validation.
    - Semver validation on the `VERSION` field.
- Test infrastructure — 288 unit tests, all Spark-free in `tests/unit/`.
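A reader config for the REST API connector might look something like the sketch below, covering the listed features. The field names and nesting are assumptions for illustration; consult the actual Pydantic schema for the real shape:

```yaml
# Hypothetical rest_api reader config; field names are assumptions,
# not Ubunye's actual schema.
input:
  format: rest_api
  url: https://api.example.com/v1/customers
  pagination:
    strategy: cursor          # offset | cursor | next_link
  auth:
    type: api_key             # bearer | api_key | basic
    location: header          # header or query param
  requests_per_second: 5      # rate limiting
  retry:
    status_codes: [429, 503]  # exponential backoff on these
  schema:                     # optional explicit schema declaration
    id: string
    name: string
```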
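The promotion-gate semantics above (metric thresholds keyed by `min_*` / `max_*` prefixes) can be sketched roughly as follows. This is an illustrative standalone sketch of the described behavior, not Ubunye's actual `PromotionGate` implementation; the function name and return convention are assumptions:

```python
# Illustrative sketch of prefix-keyed metric thresholds (min_* / max_*).
# Not Ubunye's actual PromotionGate code; names are assumptions.

def check_gate(thresholds: dict, metrics: dict) -> list[str]:
    """Return a list of threshold violations; an empty list means PASS."""
    failures = []
    for key, bound in thresholds.items():
        if key.startswith("min_"):
            metric = key[len("min_"):]
            value = metrics.get(metric, float("-inf"))
            if value < bound:
                failures.append(f"{metric}={value} below min {bound}")
        elif key.startswith("max_"):
            metric = key[len("max_"):]
            value = metrics.get(metric, float("inf"))
            if value > bound:
                failures.append(f"{metric}={value} above max {bound}")
    return failures

# Example: require AUC >= 0.8 and drift score <= 0.1 before promotion.
gate = {"min_auc": 0.8, "max_drift": 0.1}
print(check_gate(gate, {"auc": 0.85, "drift": 0.05}))  # []
print(check_gate(gate, {"auc": 0.70, "drift": 0.05}))  # one "auc" failure
```

A missing metric is treated as a failure here, which matches the conservative reading of a promotion gate: promote only when every required metric is present and within bounds.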
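The "dynamic model file importer" pattern that `load_model_class()` is described as mirroring can be sketched with the standard `importlib` machinery. The helper name and demo file below are illustrative assumptions, not Ubunye's code:

```python
# Sketch of importing a class straight from a file path, the pattern
# described for load_model_class(). Names here are hypothetical.
import importlib.util
import tempfile
from pathlib import Path

def load_class_from_file(path: Path, class_name: str):
    """Import a module directly from a file path and return one class."""
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, class_name)

# Demo: write a tiny model file, then import its class without sys.path tricks.
with tempfile.TemporaryDirectory() as d:
    model_file = Path(d) / "model.py"
    model_file.write_text(
        "class MyModel:\n"
        "    def predict(self, x):\n"
        "        return x * 2\n"
    )
    MyModel = load_class_from_file(model_file, "MyModel")
    print(MyModel().predict(21))  # 42
```

The appeal of this pattern is that model files stay plain Python modules in a task directory, loadable by path without being installed as a package.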
### Changed

- `ubunye/config/schema.py` — added `RegistryConfig`, `ModelTransformParams`, `FormatType.REST_API`.
- `ubunye/__init__.py` — exports `run_task` and `run_pipeline`.
- `ubunye/cli/main.py` — mounted the `models_app`, `lineage_app`, and `test_app` Typer sub-apps; added notebook scaffolding to `init`.
- `pyproject.toml` — added the `model` entry point under `ubunye.transforms`.
### Fixed

- N/A
## [0.1.0] — 2025-09-11

### Added
- First alpha release of Ubunye Engine.
- Config-first ETL framework built on Apache Spark.
- Plugin system for Readers, Writers, and Transforms via Python entry points.
- Built-in connectors: Hive, JDBC, Delta, Unity Catalog, S3, binary.
- CLI commands: `init`, `run`, `plan`, `config`, `plugins`, `version`.
- Orchestration exporters: Airflow DAG Python file, Databricks Jobs API JSON.
- Internal ML wrappers: `SklearnModel`, `SparkMLModel`, `BatchPredictMixin`, `MLflowLoggingMixin`.
- Telemetry modules: JSON event log, Prometheus, OpenTelemetry.
- Example tasks: `fraud_detection/claims/claim_etl`, `rest_api/customer_sync`.
- `SparkBackend` with context manager and safe multiple-start support.