The purpose of this project is to provide an automated import mechanism for emergency medical service data into an i2b2 data warehouse.
This script acts as the bridge between an external, relational data format (CSV) and the i2b2 clinical data model. Its primary function is to Extract, Transform, and Load (ETL) this specific dataset.
This script is not intended to be run as a standalone file by end-users. Instead, it is designed to be integrated into a larger Data Warehouse (DWH) or data management platform.
The intended workflow is as follows:
- User Action: In DWH Admin, a user navigates to the "Art der Verarbeitung" (processing type) selection under Daten-Import (data import).
- Script Selection: The user selects this script (e.g., "Rettungsdienst Importscript").
- File Upload: The platform prompts the user to upload a single `.zip` file containing the emergency service data.
- Execution: Upon receiving the file, the DWH backend executes this Python script, passing the path to the uploaded `.zip` file as an argument.
- Transformation: The script handles the entire ETL process (see the example below):
  - It opens the `.zip` file.
  - It reads `einsatzdaten.csv` from within the zip.
  - It transforms the relational (flat) CSV data into the vertical Entity-Attribute-Value (EAV) model used by i2b2's `OBSERVATION_FACT` table.
  - It loads the transformed data into the i2b2 database.
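To make the EAV step concrete, here is a purely illustrative example: one flat CSV row fans out into one `OBSERVATION_FACT` row per mapped attribute. The column `einsatzart`, the timestamps, and the mapping of `einsatznummer` to `encounter_num` are invented for illustration; only `einsatznummer` and the concept code `AS:TYPE:A01` appear in this document.

```text
einsatzdaten.csv (flat, one row per incident):

  einsatznummer;einsatzart;...
  1234567890;A01;...

OBSERVATION_FACT (vertical EAV, several rows per incident):

  encounter_num | concept_cd  | start_date          | tval_char
  --------------+-------------+---------------------+----------
  1234567890    | AS:TYPE:A01 | 2024-01-15 08:30:00 |
  1234567890    | ...         | 2024-01-15 08:30:00 | ...
```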
For a developer maintaining or improving this script, here is the step-by-step technical breakdown:
- All transformation logic, file-level validation, and i2b2 concept mappings are defined in the `CONFIG` dictionary at the top of the script; a sketch of its shape follows this list.
- To add a new field: add a new transformation rule to the `CONFIG["i2b2_transforms"]` list.
- To change validation: modify the regex patterns in `CONFIG["files"]["einsatzdaten"]["regex_patterns"]`.
- Unzip: The `extract_zip` function securely extracts the contents of the provided `.zip` file into a temporary directory.
- Read CSV: The `extract` function reads the required `einsatzdaten.csv` file from the temporary directory into a pandas DataFrame. It expects a semicolon (`;`) delimiter.
- Mandatory Columns: `preprocess` checks that all `mandatory_columns` (e.g., `einsatznummer`) are present.
- Regex Validation: The `validate_dataframe` function iterates through the regex patterns defined in `CONFIG` and checks every value in the corresponding columns (see the sketch after this list). If an invalid, non-empty value is found, the script raises an error and stops.
- Timestamp Check: It ensures that no row is missing all possible clock/time columns, as a valid `start_date` is required for i2b2.
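A minimal sketch of that regex pass: `validate_dataframe` is the script's own name, but the signature, the fullmatch semantics, and the error message here are assumptions.

```python
import re

import pandas as pd


def validate_dataframe(df: pd.DataFrame, regex_patterns: dict) -> None:
    """Raise on the first non-empty value that fails its column's pattern."""
    for column, pattern in regex_patterns.items():
        if column not in df.columns:
            continue
        compiled = re.compile(pattern)
        for idx, value in df[column].items():
            # empty cells are allowed; only non-empty values must match
            if pd.isna(value) or str(value).strip() == "":
                continue
            if not compiled.fullmatch(str(value)):
                raise ValueError(
                    f"Invalid value {value!r} in column {column!r} (row {idx})"
                )
```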
This is the core logic, handled by `transform_dataframe` and `dataframe_to_i2b2`.
Important: Facts are processed in ascending time order. This ensures that newer data automatically updates or overrides older information.
- Find Earliest Timestamp: It scans all `clock_columns` for each row to find the earliest valid timestamp. This becomes the `start_date` for the i2b2 `observation_fact` and is stored in `_metadata_start_date`.
- Assign Instance Number: It assigns an `instance_num` to ensure uniqueness for facts that share the same encounter and start time.
- Apply Transforms: The script iterates over every row of the DataFrame and applies every rule in the `CONFIG["i2b2_transforms"]` list. A dispatcher (`TRANSFORM_DISPATCHER`) calls the correct function (`tval_transform`, `code_transform`, `cd_transform`) based on the rule's `transform_type` (see the sketch after this list):
  - `tval_transform`: Creates a simple fact with the CSV value in `tval_char`.
  - `code_transform`: Creates a fact where the CSV value is appended to the `concept_cd_base` (e.g., `AS:TYPE:A01`).
  - `cd_transform`: Creates a "modifier fact" that links a value (`tval_char`) to a `concept_cd` using a `modifier_cd`.
- Add Metadata: The script enriches the i2b2 data with:
  - `import_date` and `update_date`.
  - A `sourcesystem_cd` generated by hashing the input `.zip` file, which allows tracing all facts back to their source file.
  - Script ID and version (from the environment variables `uuid` and `script_version` provided by the DWH).
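The dispatcher pattern reads roughly like the sketch below. The transform function names and `TRANSFORM_DISPATCHER` come from the script itself; the rule key `column`, the exact signatures, and the driver function `apply_transforms` are assumptions for illustration (the real code lives in `transform_dataframe` / `dataframe_to_i2b2`).

```python
def tval_transform(rule: dict, value: str) -> dict:
    # simple fact: the raw CSV value is stored in tval_char
    return {"concept_cd": rule["concept_cd"], "tval_char": value}


def code_transform(rule: dict, value: str) -> dict:
    # coded fact: the CSV value is appended to the concept code base,
    # e.g. "AS:TYPE:" + "A01" -> "AS:TYPE:A01"
    return {"concept_cd": rule["concept_cd_base"] + value}


def cd_transform(rule: dict, value: str) -> dict:
    # modifier fact: links the value to a concept via a modifier code
    return {
        "concept_cd": rule["concept_cd"],
        "modifier_cd": rule["modifier_cd"],
        "tval_char": value,
    }


TRANSFORM_DISPATCHER = {
    "tval_transform": tval_transform,
    "code_transform": code_transform,
    "cd_transform": cd_transform,
}


def apply_transforms(df, transforms):
    """Fan each flat row out into one fact dict per applicable rule."""
    facts = []
    for _, row in df.iterrows():
        for rule in transforms:
            value = row.get(rule["column"])
            if value is None or str(value).strip() == "":
                continue
            fact = TRANSFORM_DISPATCHER[rule["transform_type"]](rule, str(value))
            facts.append(fact)
    return facts
```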
- Connect: The `load` function reads database credentials (`username`, `password`, `connection-url`) from environment variables.
- Delete Old Data: To ensure idempotency, the `delete_from_db` function first deletes any existing facts from the `observation_fact` table that match the `encounter_num`, `start_date`, and `concept_cd` of the data about to be loaded (see the sketch below).
- Insert New Data: The `upload_into_db` function inserts the new, transformed DataFrame into the `observation_fact` table in batches.
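A sketch of the delete-then-insert idempotency pattern: `delete_from_db` and the three key columns are taken from the description above, but the SQL text, the qmark (`?`) paramstyle, and the `executemany` batching are assumptions; adjust to your driver's paramstyle.

```python
import pandas as pd

DELETE_SQL = (
    "DELETE FROM observation_fact "
    "WHERE encounter_num = ? AND start_date = ? AND concept_cd = ?"
)


def delete_from_db(conn, facts: pd.DataFrame) -> None:
    """Remove facts that would collide with the incoming batch (idempotency)."""
    keys = facts[["encounter_num", "start_date", "concept_cd"]].drop_duplicates()
    cursor = conn.cursor()
    cursor.executemany(DELETE_SQL, list(keys.itertuples(index=False, name=None)))
    conn.commit()
```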
To run the script locally, you must provide environment variables for the database connection. The `load_env()` helper function (used for `__main__` execution) tries to load a `.env` file from the parent directory of the script.
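A plausible shape for that helper, assuming the python-dotenv package (the actual implementation may differ):

```python
from pathlib import Path

from dotenv import load_dotenv  # provided by the python-dotenv package


def load_env() -> None:
    # look for a .env file in the script's parent directory
    env_path = Path(__file__).resolve().parent / ".env"
    load_dotenv(dotenv_path=env_path)
```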
Create a .env file with the following keys:
```
username=YOUR_DB_USER
password=YOUR_DB_PASSWORD
connection-url=jdbc:postgresql://your-host:5432/your-db?searchPath=your_i2b2_schema
uuid=test-script-uuid
script_version=1.0-test
```

Once your `.env` file is set up and dependencies are installed, you can run the script from your terminal:
```bash
# Install dependencies
pip install -r requirements.txt

# Run the import
python rd_import.py /path/to/your/RettungsdienstData.zip
```