joshua-db/dminer

dminer

This project guides you through a Process Mining workflow on Databricks. It provides a Flask-based UI and orchestrates background jobs that transform raw source data into an OCEL-inspired data model: Event (what happened and when), Object (stateful entities with SCD Type 2 history), and EventObject (which objects participate in each event and their role). Pipelines are generated automatically from a user-defined mapping and run as Spark Declarative Pipelines.
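To make the three entities concrete, here is a minimal Python sketch of the data model. The class and field names (`ObjectVersion`, `valid_from`, `valid_to`, `version_at`, and so on) are illustrative assumptions, not the project's actual schema; the real tables are produced by the generated pipelines.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Event:
    """What happened and when."""
    event_id: str
    activity: str
    timestamp: datetime


@dataclass
class ObjectVersion:
    """One SCD Type 2 row: the state of an object over a validity interval."""
    object_id: str
    object_type: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None marks the current version (assumed convention)
    attributes: dict = field(default_factory=dict)


@dataclass
class EventObject:
    """Links an event to a participating object and records its role."""
    event_id: str
    object_id: str
    role: str


def version_at(versions, object_id, ts):
    """Return the version of `object_id` that was valid at time `ts`, or None."""
    for v in versions:
        if v.object_id != object_id:
            continue
        # Half-open interval: valid_from <= ts < valid_to
        if v.valid_from <= ts and (v.valid_to is None or ts < v.valid_to):
            return v
    return None
```

Keeping a validity interval on every object version is what lets an event be joined to the state its objects had at the moment the event occurred, rather than to their latest state.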

Event mapping options: You can build events from an event log (one source row = one event, with the id, activity, and timestamp columns mapped) or from a snapshot Delta table (events are generated when configured columns change between consecutive snapshots, e.g. stage transitions).

Object mapping options: Objects support CDC via Change Data Feed (CDF) or full snapshots; the object type can come from a source column or a literal. The config schema (dminer/pipeline/config_schema.py) defines the full mapping structure.
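The snapshot-based event mapping can be illustrated with a small plain-Python sketch. It is not the project's Spark implementation: the function name `events_from_snapshots`, the input shape, and the activity label format are all hypothetical. It also assumes that the first sighting of an object produces no event, which the real pipeline may handle differently.

```python
from datetime import datetime


def events_from_snapshots(snapshots, key, tracked_columns):
    """Derive change events from consecutive snapshots.

    snapshots: list of (snapshot_time, rows) sorted by time, where rows is a
               list of dicts (hypothetical input shape).
    key: name of the column that identifies an object.
    tracked_columns: columns whose value changes should produce an event.
    """
    events = []
    last_seen = {}  # most recent known row per object id
    for snapshot_time, rows in snapshots:
        current = {row[key]: row for row in rows}
        for object_id, row in current.items():
            before = last_seen.get(object_id)
            if before is None:
                continue  # first sighting: nothing to compare against (assumption)
            for column in tracked_columns:
                if before[column] != row[column]:
                    events.append({
                        "object_id": object_id,
                        "activity": f"{column}: {before[column]} -> {row[column]}",
                        "timestamp": snapshot_time,
                    })
        last_seen.update(current)
    return events


snapshots = [
    (datetime(2024, 1, 1), [{"id": "a", "stage": "new"}]),
    (datetime(2024, 1, 2), [{"id": "a", "stage": "won"}, {"id": "b", "stage": "new"}]),
    (datetime(2024, 1, 3), [{"id": "a", "stage": "won"}, {"id": "b", "stage": "lost"}]),
]
events = events_from_snapshots(snapshots, key="id", tracked_columns=["stage"])
```

In this example, object `a` yields one event at the second snapshot and object `b` yields one at the third; unchanged rows yield nothing. An event log source needs no such comparison, because each source row already is an event.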

Deploy with databricks bundle deploy and run the integration test with databricks bundle run snapshot-integration-test. Targets (dev / prod) and catalog/schema are configured in databricks.yml.
