Engine & Profiles¶
The ENGINE section controls Spark configuration and lets you define per-environment
overrides without duplicating your config.
Structure¶
ENGINE:
spark_conf: # base Spark settings (all profiles)
spark.sql.shuffle.partitions: "200"
spark.executor.memory: "4g"
profiles:
dev: # overrides applied with --profile dev
spark_conf:
spark.sql.shuffle.partitions: "4"
spark.executor.memory: "512m"
staging:
spark_conf:
spark.executor.memory: "8g"
prod:
spark_conf:
spark.executor.memory: "32g"
spark.executor.cores: "8"
Fields¶
EngineConfig¶
| Field | Type | Default | Description |
|---|---|---|---|
spark_conf |
Dict[str, str] |
{} |
Base Spark configuration keys applied in all profiles |
profiles |
Dict[str, EngineProfile] |
{} |
Named profiles that override spark_conf |
EngineProfile¶
| Field | Type | Default | Description |
|---|---|---|---|
spark_conf |
Dict[str, str] |
{} |
Spark conf overrides for this profile |
Profile merge rules¶
When --profile <name> is passed, the engine calls merged_spark_conf(profile):
- Start with the base
ENGINE.spark_confdict. - Update with the named profile's
spark_conf(profile values win on conflict).
Profile-only keys are additive; base keys not overridden by the profile are kept.
Using profiles at runtime¶
# Development — small cluster, few shuffle partitions
ubunye run -d pipelines -u fraud -p etl -t claims --profile dev
# Production — full cluster
ubunye run -d pipelines -u fraud -p etl -t claims --profile prod
Common Spark settings¶
ENGINE:
spark_conf:
# Shuffle tuning
spark.sql.shuffle.partitions: "200"
spark.sql.adaptive.enabled: "true"
spark.sql.adaptive.coalescePartitions.enabled: "true"
# Memory
spark.executor.memory: "8g"
spark.driver.memory: "4g"
spark.executor.memoryOverhead: "1g"
# Delta
spark.databricks.delta.optimizeWrite.enabled: "true"
spark.databricks.delta.autoCompact.enabled: "true"
# Hive
spark.sql.catalogImplementation: "hive"
spark.sql.warehouse.dir: "/user/hive/warehouse"
Note
All values must be strings — Spark's configuration API accepts only strings.
Use "true" not true, "200" not 200.
ENGINE is optional¶
If you omit ENGINE entirely the engine uses Spark defaults: