You can try out the application here: ED Synthetic Data Generator on Replit
This application creates realistic synthetic data that mimics what you would find in a hospital Emergency Department. The generated datasets can be used for:
- Healthcare analytics training and education
- Testing and developing new healthcare reporting systems
- Planning and simulation of ED operations
- Data visualization projects that require realistic hospital data
- Research projects where real patient data cannot be used due to privacy concerns
- Teaching healthcare professionals about data-driven decision-making

The application offers the following features:
- Generate realistic Emergency Department visit data (up to 100,000 rows)
- Customize column types, data distributions, and value ranges
- Logically grouped data columns (demographics, timing, acuity, medical, disposition)
- Derived columns with realistic interrelationships
- Visualization of ED visit trends
- Flexible data export in CSV or Excel formats
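
As a rough illustration of what generating rows from configurable distributions can look like, here is a minimal sketch using NumPy and pandas. It is not the application's actual code: the function name `generate_visits`, the CTAS probabilities, the age range, and the log-normal length-of-stay parameters are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd


def generate_visits(n_rows, seed=0):
    """Generate a small synthetic ED visit table (illustrative only)."""
    rng = np.random.default_rng(seed)

    # Assumed categorical distribution over CTAS levels 1-5 (sums to 1.0).
    ctas_levels = [1, 2, 3, 4, 5]
    ctas_probs = [0.02, 0.18, 0.45, 0.28, 0.07]
    ctas = rng.choice(ctas_levels, size=n_rows, p=ctas_probs)

    # Assumed uniform ages from 0 to 100 inclusive.
    age = rng.integers(0, 101, size=n_rows)

    # Assumed right-skewed length of stay, clipped to 0.25-48 hours.
    los_hours = np.clip(rng.lognormal(mean=1.2, sigma=0.6, size=n_rows), 0.25, 48.0)
    los_hours = los_hours.round(2)

    # Random arrival minute within an assumed 365-day window.
    offsets = rng.integers(0, 365 * 24 * 60, size=n_rows)
    start = pd.Timestamp("2024-01-01") + pd.to_timedelta(offsets, unit="min")
    end = start + pd.to_timedelta(los_hours, unit="h")

    return pd.DataFrame(
        {
            "patient_id": [f"ED{i:06d}" for i in range(n_rows)],
            "age": age,
            "ctas_level": ctas,
            "visit_start_time": start,
            "visit_end_time": end,
            "length_of_stay_hours": los_hours,
        }
    )


visits = generate_visits(1000, seed=1)
```

Passing a fixed `seed` makes the output reproducible, which is convenient when the same dataset needs to be regenerated for a class or a test suite.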
The application generates the following data columns:
| Column Name | Description |
|---|---|
| Patient Demographics | |
| patient_id | Unique identifier for each patient visit (alphanumeric). |
| age | Patient age in years (numeric integer). |
| gender | Patient gender (categorical). |
| age_group | Age category derived from age (categorical). |
| Visit Timing | |
| visit_start_time | Date and time when the patient arrived (datetime). |
| visit_end_time | Date and time when the patient left the ED (datetime). |
| length_of_stay_hours | Duration of ED visit in hours (numeric decimal). |
| day_of_week | Day of the week derived from visit_start_time (categorical). |
| hour_of_day | Hour of day derived from visit_start_time (integer 0-23). |
| Acuity and Triage | |
| ctas_level | Canadian Triage and Acuity Scale (CTAS) level (integer 1-5). |
| acuity_level | Text description derived from ctas_level (categorical). |
| arrival_mode | How the patient arrived at the ED (categorical). |
| Medical Information | |
| presenting_complaint | Primary reason for the ED visit (categorical). |
| consultant_seen | Whether a specialist consultant was involved (boolean). |
| Disposition | |
| discharge_status | Final outcome of the ED visit (categorical). |
| hospital_site | Hospital location where care was provided (categorical). |
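
Several columns in the table are derived from others. The sketch below shows one way such derivations could be computed with pandas. The helper name `add_derived_columns` and the age-group bins are assumptions for illustration, and the acuity labels follow the standard CTAS level names, which may differ from the labels the application actually emits.

```python
import pandas as pd

# Standard CTAS level names; the app's own labels may differ (assumption).
ACUITY_LABELS = {
    1: "Resuscitation",
    2: "Emergent",
    3: "Urgent",
    4: "Less Urgent",
    5: "Non-Urgent",
}


def add_derived_columns(df):
    """Return a copy of df with the derived columns filled in."""
    out = df.copy()
    # Hypothetical age bins; intervals are closed on the left, e.g. [18, 40).
    out["age_group"] = pd.cut(
        out["age"],
        bins=[0, 18, 40, 65, 200],
        right=False,
        labels=["0-17", "18-39", "40-64", "65+"],
    )
    stay = out["visit_end_time"] - out["visit_start_time"]
    out["length_of_stay_hours"] = (stay.dt.total_seconds() / 3600).round(2)
    out["day_of_week"] = out["visit_start_time"].dt.day_name()
    out["hour_of_day"] = out["visit_start_time"].dt.hour
    out["acuity_level"] = out["ctas_level"].map(ACUITY_LABELS)
    return out


base = pd.DataFrame(
    {
        "age": [7, 72],
        "ctas_level": [4, 2],
        "visit_start_time": pd.to_datetime(["2024-03-04 08:30", "2024-03-09 23:15"]),
        "visit_end_time": pd.to_datetime(["2024-03-04 11:00", "2024-03-10 04:45"]),
    }
)
derived = add_derived_columns(base)
```

Deriving these columns from the base columns, rather than sampling them independently, keeps the dataset internally consistent: a visit that starts at 08:30 always has `hour_of_day` equal to 8.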

    # Clone this repository
    git clone https://github.com/YOUR_USERNAME/YOUR_REPO_NAME.git
    cd YOUR_REPO_NAME

    # Install dependencies
    pip install -r dependencies.txt

    # Run the application
    streamlit run app.py

Requirements:
- Python 3.6+
- Streamlit
- Pandas
- NumPy
- Faker
- XlsxWriter

To use the application:
- Configure the desired columns and their distributions
- Select the number of rows to generate
- Preview the generated data with statistics and visualizations
- Export the data to CSV or Excel format
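
The export step can be approximated with pandas alone. The following is a minimal sketch, not the application's own implementation: the helper names `to_csv_bytes` and `to_excel_bytes` and the sheet name are hypothetical, and the Excel helper assumes the XlsxWriter package is installed.

```python
import io

import pandas as pd


def to_csv_bytes(df):
    """Serialize a DataFrame to UTF-8 encoded CSV bytes without the index."""
    return df.to_csv(index=False).encode("utf-8")


def to_excel_bytes(df, sheet_name="ed_visits"):
    """Serialize a DataFrame to an in-memory .xlsx file via XlsxWriter."""
    buffer = io.BytesIO()
    with pd.ExcelWriter(buffer, engine="xlsxwriter") as writer:
        df.to_excel(writer, index=False, sheet_name=sheet_name)
    return buffer.getvalue()


sample = pd.DataFrame({"patient_id": ["ED000001", "ED000002"], "ctas_level": [3, 2]})
csv_payload = to_csv_bytes(sample)
```

Returning bytes rather than writing to disk means the same helpers could feed a download widget such as Streamlit's `st.download_button`, or be written to a file with a plain `open(path, "wb")`.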
This application was created with the assistance of Replit AI. While care has been taken to ensure accuracy, it may contain technical errors or misinterpretations. The synthetic data generated is intended for educational and development purposes only and should not be used to make clinical or operational decisions. The distributions and relationships between variables are approximations and may not perfectly reflect real-world emergency department data. Users should verify the suitability of generated datasets for their specific use cases.
License: MIT

Planned improvements:
- Refine column order and default distribution selection
- Refine default distributions for numerical columns
- Add example data folder in repo
- Add patient address and postal code columns (based on Canadian province: ON, BC)