A modern web application that uses large language models (LLMs) to assist with data engineering tasks such as column type annotation, entity matching, and error detection.
- Docker and Docker Compose (for the Docker start option)
- Or Python 3.13+ and Node.js 18+ (for the local start option)
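For the local start option, the prerequisites can be checked with a short script before installing anything. This is a minimal sketch, not part of the project: the helper names `python_ok` and `missing_tools` are made up for illustration, and the script only checks that `node` and `npm` are on the PATH, not that Node.js is version 18 or newer.

```python
import shutil
import sys

# Minimum Python version taken from the prerequisites list.
MIN_PYTHON = (3, 13)


def python_ok(version=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version[:2]) >= minimum


def missing_tools(tools=("node", "npm")):
    """Return the names of required executables that are not on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    print("Python 3.13+:", "ok" if python_ok() else "too old")
    print("Missing tools:", ", ".join(missing_tools()) or "none")
```

Run `node --version` separately to confirm the Node.js version.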
# Clone the repository
git clone <repository-url>
cd lab25-gui4de
# Build and start
docker-compose up -d --build

# Clone and setup
git clone <repository-url>
cd lab25-gui4de
# Backend setup
python3 -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
export PYTHONPATH=${PYTHONPATH}:./ # Windows: $env:PYTHONPATH = "$env:PYTHONPATH;."
# Frontend setup
cd client
npm install
# Start backend (Terminal 1, from the repository root with the virtual environment activated)
python gui4de/server/api.py
# Start frontend (Terminal 2)
cd client
npm run dev

- Frontend: http://localhost:5173
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
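Once the application has started, a quick way to confirm that these addresses respond is a small standard-library check. This is a hedged sketch: the `is_up` helper and the `SERVICES` mapping are illustrative names, not part of GUI4DE, and the check only tests that something answers over HTTP at each URL.

```python
import urllib.error
import urllib.request

# URLs taken from the list above; adjust them if you changed the ports.
SERVICES = {
    "frontend": "http://localhost:5173",
    "backend": "http://localhost:8000",
    "api docs": "http://localhost:8000/docs",
}


def is_up(url, timeout=3.0):
    """Return True if the URL answers with any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return False


if __name__ == "__main__":
    for name, url in SERVICES.items():
        print(f"{name:10} {url:32} {'up' if is_up(url) else 'down'}")
```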
- Column Type Annotation - Automatically detect and annotate column types according to an ontology
- Entity Matching - Find and link related records across datasets
- Error Detection - Identify data quality issues and anomalies
- Missing Value Imputation - Fill in missing data intelligently
- Schema Matching - Align schemas between different data sources
- Table Relationalization - Normalize and structure relational data
- Advisor Mode - Get intelligent suggestions for your dataset
- Start the Application using one of the methods above
- Navigate to http://localhost:5173 in your browser
- Authenticate with a valid OpenAI API key
- Select a Task Type from the available options
- Upload your CSV file or paste data
- Configure Parameters for the selected task
- Execute and watch real-time progress
- Review Results and download processed data
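Before uploading a CSV file, it can help to confirm that every record has the same number of fields as the header. The sketch below is independent of GUI4DE and says nothing about how the application itself validates uploads; `csv_shape_problems` is a hypothetical helper name.

```python
import csv
import io


def csv_shape_problems(text):
    """Return a list of human-readable problems found in CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    width = len(rows[0])  # the header row defines the expected width
    problems = []
    # Record numbers start at 1 for the header, so data starts at 2.
    for record_no, row in enumerate(rows[1:], start=2):
        if len(row) != width:
            problems.append(
                f"record {record_no}: expected {width} fields, found {len(row)}"
            )
    return problems


if __name__ == "__main__":
    sample = "name,city\nAda,London\nGrace\n"
    for problem in csv_shape_problems(sample):
        print(problem)
```

To check a file on disk, read it with `open(path, newline="", encoding="utf-8").read()` and pass the text to the function.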
To use only the GUI4DE task functions in your own Python projects (without the web interface), install the package directly:
# Install from main branch (stable)
pip install git+https://github.com/DataManagementLab/lab25-gui4de.git

After installation, you can use GUI4DE tasks programmatically:
import gui4de
# Example: Column type annotation
results, cost = gui4de.column_type_annotation_task(
    csv_file="your_data.csv",
    ontology_type="DBPedia",
    budget=1
)
# Example: Entity matching
results, cost = gui4de.entity_matching_task(
    first_csv_file="table1.csv",
    second_csv_file="table2.csv",
    budget=0.4
)

For detailed usage instructions and examples, see the Package Usage Guide and explore the scripts folder for ready-to-use scripts and practical examples.
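Building on the calls above, the same task can be run over several files while keeping track of the total reported cost. This is a minimal sketch under stated assumptions: `annotate_many` is a hypothetical helper, not part of the package; it assumes the task function accepts the keyword arguments shown above and returns a `(results, cost)` pair, and that `cost` is a number that can be summed. The Package Usage Guide remains the authoritative description of the API.

```python
from typing import Callable, Iterable


def annotate_many(
    task: Callable[..., tuple],
    csv_files: Iterable[str],
    budget_per_file: float,
    ontology_type: str = "DBPedia",
) -> tuple[dict, float]:
    """Run `task` once per CSV file and sum the reported costs.

    `task` is expected to behave like gui4de.column_type_annotation_task
    as shown above: keyword arguments in, (results, cost) out.
    Assumption: `cost` is numeric.
    """
    results_by_file = {}
    total_cost = 0.0
    for path in csv_files:
        results, cost = task(
            csv_file=path,
            ontology_type=ontology_type,
            budget=budget_per_file,
        )
        results_by_file[path] = results
        total_cost += cost
    return results_by_file, total_cost


# Usage (requires the package and a configured OpenAI API key):
#   import gui4de
#   results, total = annotate_many(
#       gui4de.column_type_annotation_task,
#       ["customers.csv", "orders.csv"],
#       budget_per_file=0.5,
#   )
```

Passing the task function as an argument keeps the helper easy to try with a stub before spending any budget on real calls.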
- Developer Guide – Technical setup and development workflows
- Package Usage Guide – Instructions for using the package in your own projects
See LICENSE.md for details.