jon-tk-chan/SyntheticEDVolumesData

ED Synthetic Data Generator

You can try out the application here: ED Synthetic Data Generator on Replit

Purpose

This application generates realistic synthetic visit records that mimic the data collected in a hospital Emergency Department (ED). The generated datasets can be used for:

  • Healthcare analytics training and education
  • Testing and developing new healthcare reporting systems
  • Planning and simulation of ED operations
  • Data visualization projects that require realistic hospital data
  • Research projects where real patient data cannot be used due to privacy concerns
  • Teaching healthcare professionals about data-driven decision-making

Features

  • Generate realistic Emergency Department visit data (up to 100,000 rows)
  • Customize column types, data distributions, and value ranges
  • Logically grouped data columns (demographics, timing, acuity, medical, disposition)
  • Derived columns with realistic interrelationships
  • Visualization of ED visit trends
  • Flexible data export in CSV or Excel formats
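As an illustration of what "derived columns with realistic interrelationships" could mean, here is a minimal sketch in which the chance of admission depends on the triage level. It uses only the Python standard library rather than the app's pandas/NumPy stack. The weights, the probabilities, the "Admitted"/"Discharged" labels, and the function names are all assumptions made for this example; they are not taken from the application.

```python
import random

# Hypothetical distributions: none of these numbers come from the app.
CTAS_WEIGHTS = {1: 0.02, 2: 0.18, 3: 0.45, 4: 0.28, 5: 0.07}
ADMIT_PROBABILITY = {1: 0.80, 2: 0.45, 3: 0.20, 4: 0.05, 5: 0.02}


def sample_visit(rng):
    """Draw one visit whose discharge_status depends on its ctas_level."""
    levels = list(CTAS_WEIGHTS)
    weights = [CTAS_WEIGHTS[level] for level in levels]
    ctas_level = rng.choices(levels, weights=weights)[0]
    # Sicker patients (lower CTAS number) are admitted more often.
    admitted = rng.random() < ADMIT_PROBABILITY[ctas_level]
    return {
        "ctas_level": ctas_level,
        "discharge_status": "Admitted" if admitted else "Discharged",
    }


def admission_rate(visits, ctas_level):
    """Share of visits at the given CTAS level that ended in admission."""
    matching = [v for v in visits if v["ctas_level"] == ctas_level]
    admitted = sum(v["discharge_status"] == "Admitted" for v in matching)
    return admitted / len(matching)


rng = random.Random(42)  # fixed seed so the sample is reproducible
visits = [sample_visit(rng) for _ in range(10_000)]
```

Comparing `admission_rate(visits, 2)` with `admission_rate(visits, 4)` should show the dependency: the two columns are correlated rather than sampled independently.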

Data Columns

The application generates the following data columns:

Column Name            Description

Patient Demographics
patient_id             Unique identifier for each patient visit (alphanumeric).
age                    Patient age in years (numeric integer).
gender                 Patient gender (categorical).
age_group              Age category derived from age (categorical).

Visit Timing
visit_start_time       Date and time when the patient arrived (datetime).
visit_end_time         Date and time when the patient left the ED (datetime).
length_of_stay_hours   Duration of ED visit in hours (numeric decimal).
day_of_week            Day of the week derived from visit_start_time (categorical).
hour_of_day            Hour of day derived from visit_start_time (integer 0-23).

Acuity and Triage
ctas_level             Canadian Triage Acuity Scale level (integer 1-5).
acuity_level           Text description derived from ctas_level (categorical).
arrival_mode           How the patient arrived at the ED (categorical).

Medical Information
presenting_complaint   Primary reason for the ED visit (categorical).
consultant_seen        Whether a specialist consultant was involved (boolean).

Disposition
discharge_status       Final outcome of the ED visit (categorical).
hospital_site          Hospital location where care was provided (categorical).
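Several of the columns above are described as derived from others. The following sketch shows one way those derivations could be computed for a single visit, using only the standard library. The age bins and the helper names are hypothetical. The acuity labels follow the standard CTAS level names, but whether the app uses exactly these strings is an assumption.

```python
from datetime import datetime

# Standard CTAS level names; the app's exact label text is an assumption.
CTAS_LABELS = {
    1: "Resuscitation",
    2: "Emergent",
    3: "Urgent",
    4: "Less Urgent",
    5: "Non-Urgent",
}
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def age_group(age):
    """Map an age in years to a category (hypothetical bins)."""
    if age < 18:
        return "0-17"
    if age < 65:
        return "18-64"
    return "65+"


def add_derived_columns(visit):
    """Return a copy of the visit with the derived columns filled in."""
    start = visit["visit_start_time"]
    end = visit["visit_end_time"]
    derived = dict(visit)
    derived["age_group"] = age_group(visit["age"])
    derived["length_of_stay_hours"] = round((end - start).total_seconds() / 3600, 2)
    derived["day_of_week"] = DAY_NAMES[start.weekday()]  # weekday(): Monday is 0
    derived["hour_of_day"] = start.hour
    derived["acuity_level"] = CTAS_LABELS[visit["ctas_level"]]
    return derived


visit = {
    "age": 72,
    "visit_start_time": datetime(2024, 1, 15, 22, 30),
    "visit_end_time": datetime(2024, 1, 16, 3, 0),
    "ctas_level": 2,
}
row = add_derived_columns(visit)
```

For this example visit, the stay crosses midnight, so `length_of_stay_hours` is computed from the full datetime difference rather than from the hour values alone.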

Installation

# Clone this repository
git clone https://github.com/YOUR_USERNAME/YOUR_REPO_NAME.git
cd YOUR_REPO_NAME

# Install dependencies
pip install -r dependencies.txt

# Run the application
streamlit run app.py

Requirements

  • Python 3.6+
  • Streamlit
  • Pandas
  • NumPy
  • Faker
  • XlsxWriter

Usage

  1. Configure the desired columns and their distributions
  2. Select the number of rows to generate
  3. Preview the generated data with statistics and visualizations
  4. Export the data to CSV or Excel format
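The four steps above happen in the Streamlit interface, but the same flow can be sketched in plain Python for readers who want to see the shape of it. This is an illustration under stated assumptions, not the app's code: the `config` keys, the `generate_rows` and `to_csv` helpers, and the `P000000` ID format are all hypothetical, and only three columns are generated.

```python
import csv
import io
import random
import statistics


def generate_rows(n_rows, config, seed=0):
    """Steps 1-2: draw n_rows visits from the configured distributions."""
    rng = random.Random(seed)
    genders = list(config["gender_weights"])
    weights = list(config["gender_weights"].values())
    rows = []
    for i in range(n_rows):
        # Normal draw for age, clamped to the configured value range.
        age = int(rng.gauss(config["age_mean"], config["age_sd"]))
        age = max(0, min(config["age_max"], age))
        gender = rng.choices(genders, weights=weights)[0]
        rows.append({"patient_id": f"P{i:06d}", "age": age, "gender": gender})
    return rows


def to_csv(rows):
    """Step 4: serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical configuration; the app exposes its own settings in the UI.
config = {
    "age_mean": 45,
    "age_sd": 22,
    "age_max": 105,
    "gender_weights": {"Female": 0.51, "Male": 0.48, "Other": 0.01},
}

rows = generate_rows(1000, config)
mean_age = statistics.mean(r["age"] for r in rows)  # Step 3: quick preview statistic
csv_text = to_csv(rows)
```

Passing a fixed `seed` makes repeated runs produce the same dataset, which is convenient when generated data is used to test reporting scripts.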

Disclaimer

This application was created with the assistance of Replit AI. While care has been taken to ensure accuracy, it may contain technical errors or misinterpretations. The synthetic data generated is intended for educational and development purposes only and should not be used to make clinical or operational decisions. The distributions and relationships between variables are approximations and may not perfectly reflect real-world emergency department data. Users should verify the suitability of generated datasets for their specific use cases.

License

MIT

Future Improvements

  • Refine column order and default distribution selection
  • Refine default distributions for numerical columns
  • Add example data folder in repo
  • Add patient address and postal code columns (based on Canadian province: ON, BC)

About

Python Streamlit app for generating synthetic ED visit data. Use cases: data portfolio projects, testing and validation of healthcare analytics scripts, and personal learning.
