This project guides you through a Process Mining workflow on Databricks. It provides a Flask-based UI and orchestrates background jobs that transform raw source data into an OCEL-inspired data model: Event (what happened and when), Object (stateful entities with SCD Type 2 history), and EventObject (which objects participate in each event and their role). Pipelines are generated automatically from a user-defined mapping and run as Spark Declarative Pipelines.
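The three-table model above can be sketched as plain records. This is a minimal illustration only; the field names are assumptions for clarity, not the project's actual schema:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Illustrative sketch of the OCEL-inspired model; field names are assumptions.

@dataclass
class Event:
    event_id: str
    activity: str          # what happened
    occurred_at: datetime  # when it happened

@dataclass
class Object:
    object_id: str
    object_type: str
    # SCD Type 2 validity window: one row per historical version of the entity
    valid_from: datetime
    valid_to: Optional[datetime]  # None marks the current version

@dataclass
class EventObject:
    event_id: str   # which event
    object_id: str  # which object participated in it
    role: str       # the object's role in that event

# One event touching one object in the "order" role:
e = Event("e1", "Create Order", datetime(2024, 1, 1))
o = Object("o1", "order", datetime(2024, 1, 1), None)
link = EventObject("e1", "o1", "order")
```

The EventObject table is what makes the model object-centric: a single event can link to any number of objects, each with its own role.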
Event mapping options: You can build events from an event log (one source row = one event, mapping id/activity/timestamp columns) or from a snapshot Delta table (events are generated when configured columns change between consecutive snapshots, for example stage transitions). Object mapping options: Objects support CDC via Change Data Feed (CDF) or full snapshots; the object type can come from a source column or a literal. The config schema (dminer/pipeline/config_schema.py) defines the full mapping structure.
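The mapping options above might look roughly like the following. The actual schema is defined in dminer/pipeline/config_schema.py; the key names here are illustrative assumptions, not the real config keys:

```python
# Hypothetical mapping configs; the real keys live in
# dminer/pipeline/config_schema.py and may differ.

# Event log mapping: one source row becomes one event.
event_log_mapping = {
    "kind": "event_log",
    "source_table": "raw.orders_log",
    "id_column": "order_id",
    "activity_column": "action",
    "timestamp_column": "ts",
}

# Snapshot mapping: an event is emitted when a tracked column
# changes between consecutive snapshots (e.g. a stage transition).
snapshot_event_mapping = {
    "kind": "snapshot",
    "source_table": "raw.orders_snapshot",
    "id_column": "order_id",
    "tracked_columns": ["stage"],
    "timestamp_column": "snapshot_ts",
}

# Object mapping: object type from a literal (or a source column),
# with changes captured via CDF or full snapshots.
object_mapping = {
    "source_table": "raw.orders",
    "id_column": "order_id",
    "object_type": {"literal": "order"},   # or {"column": "entity_type"}
    "change_capture": "cdf",               # or "snapshot"
}
```

From a user-defined mapping like this, the pipelines that populate Event, Object, and EventObject are generated automatically.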
Deploy with databricks bundle deploy and run the integration test with databricks bundle run snapshot-integration-test. Targets (dev / prod) and catalog/schema are configured in databricks.yml.
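A databricks.yml along these lines would provide the dev/prod targets and catalog/schema settings. This is a sketch following the standard Databricks Asset Bundles layout; the bundle name, variable names, and values are assumptions, not the project's actual file:

```yaml
# Illustrative databricks.yml sketch; names and values are assumptions.
bundle:
  name: process-mining

variables:
  catalog:
    default: dev_catalog
  schema:
    default: process_mining

targets:
  dev:
    mode: development
    default: true
  prod:
    mode: production
    variables:
      catalog:
        default: prod_catalog
```

With targets defined this way, `databricks bundle deploy -t prod` deploys against the prod settings while the default target stays dev.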