
Co-DETECT

The repository for the EMNLP 2025 (Demo) paper Co-DETECT: Collaborative Discovery of Edge cases in TExt ClassificaTion.


Environment Requirements

  • Node.js: v22.11.0
  • Python: 3.11.10

Choose Your Testing Mode

πŸš€ Demo Mode (Fastest - 2 minutes)

  • No backend setup required
  • Pre-loaded demonstration data
  • Perfect for quick system overview

βš™οΈ Backend Mode (Complete - 5 minutes)

  • Full backend functionality (requires valid API keys)
  • Test actual data processing pipeline
  • Recommended for thorough evaluation

How to Use

Optional: Create a New Conda Environment

For ease of use, you can create and activate a new conda environment:

conda create -n co_detect python=3.11.10
conda activate co_detect

0. Environment Variable Configuration

To use live API calls in Backend Mode, create a .env file in /annotation_fastapi/ with the following variables:

# Required for all Azure OpenAI requests
AZURE_API_KEY=your_azure_api_key_here

# Each model uses its own connection string (endpoint, deployment, API version)
ANNOTATION_MODEL_CONNECTION_STRING="https://your-azure-endpoint.openai.azure.com/openai/deployments/gpt-4o?api-version=2023-12-01-preview"
REASONING_MODEL_CONNECTION_STRING="https://your-azure-endpoint.openai.azure.com/openai/deployments/gpt-5?api-version=2023-12-01-preview"
EMBEDDING_MODEL_CONNECTION_STRING="https://your-azure-endpoint.openai.azure.com/openai/deployments/text-embedding-ada-002?api-version=2023-12-01-preview"

Note: Each model (annotation, reasoning, embedding) must have its own Azure connection string. See annotation_fastapi/utils.py for details.

Backend Mode always makes real Azure OpenAI API calls, so ensure your API key and connection strings are set.
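
Before starting the server, a quick way to confirm that all four variables are present in your .env file (a minimal sanity check using the variable names from the example above):

# Run from annotation_fastapi/; prints each expected key that is set
grep -E '^(AZURE_API_KEY|ANNOTATION_MODEL_CONNECTION_STRING|REASONING_MODEL_CONNECTION_STRING|EMBEDDING_MODEL_CONNECTION_STRING)=' .env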

1. Launch Backend (Only for Backend Mode)

cd annotation_fastapi
pip install -r requirements.txt
uvicorn main:app

Please note: once you run uvicorn main:app, the terminal will print Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit). There is no need to open that URL from your terminal; leave the server running and continue with step 2 below.
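
If you want to confirm the backend is reachable without opening a browser, you can query it from a second terminal. This sketch assumes FastAPI's auto-generated docs page is enabled, which is the framework default:

# Should print 200 while uvicorn is running
curl -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1:8000/docs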

2. Run Analysis Using Our Frontend

After launching the backend locally, you can run the analysis using our Deployed Frontend.

2a. (Alternative) Launch Frontend on Your Own

a. Install nvm

If you haven't installed nvm yet, run:

curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash

Then restart your terminal or run:

source ~/.nvm/nvm.sh

b. Install and Start Frontend

cd frontend
nvm install      # Reads version from .nvmrc (v22.11.0)
nvm use          # Switches to the project's Node.js version
npm install
npm run dev

Then open the local URL printed by the dev server in your browser.

3. Interface Overview

Home Page - Setup Your Annotation Task

  1. Task Description: Describe your annotation task
  2. Labels: Add the categories you want to classify (minimum 2 labels)
  3. Text Input: Paste text directly or upload a CSV file
  4. Submit: Click "Submit" to start analysis (or "Load Demo Data" for quick demo)

Dashboard Page - Analyze Results

  • Left Panel:
    • Previous Guidelines: Review earlier guideline versions
    • Current Guidelines: Edit the task description and labels; download the guidelines as a .txt file
    • Edge Case Handling: View your saved improvement rules
  • Center Panel: Dual Scatter Plots
    • Upper plot: all annotated examples ↔ All Examples in the right panel
    • Lower plot: edge cases needing attention ↔ Suggested Edge Cases in the right panel
    • Download Annotation Data: Export the complete annotation results as a .json file
  • Right Panel:
    • All Examples: Click points or examples to see details
    • Suggested Edge Cases: Click + to save edge case handling suggestions, then iterate (Iterate button at the top right corner)
    • Annotate New Examples: Click Annotate New to annotate additional samples with the current guidelines from the left panel; re-annotate existing samples with πŸ”

4. Two Testing Modes

Mode 1: Demo Mode (Recommended)

πŸš€ No setup required - instant demo

  1. Click "Load Demo Data" on home page
  2. Explore the dashboard immediately
  3. Test iteration with pre-loaded demo data

Mode 2: Backend Mode (Complete Experience)

βš™οΈ Full backend pipeline

  • Option A: Upload the CSV file annotation_fastapi/example/ghc_rnd.csv
  • Option B: Click "Load Sample Examples", then submit
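
If you prefer to prepare your own CSV, the bundled sample shows the expected format; inspecting its first few lines from the repository root is the quickest way to match it:

# Print the header row and first few examples of the sample input
head -n 5 annotation_fastapi/example/ghc_rnd.csv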

5. Key Features to Test

Interactive Analysis

  • Upper Plot ↔ All Examples: Click points in the upper plot or examples in the right panel; each highlights the other
  • Lower Plot ↔ Suggested Edge Cases: Click points in the lower plot or suggestions in the right panel; each highlights the other
  • Cross-Plot Connection: Clicking points in the lower plot also highlights the corresponding points in the upper plot

Improvement Workflow

  1. Save Suggestions: Click the + button next to useful suggestions in "Suggested Edge Cases"
  2. View Saved Rules: Check the "Edge Case Handling" panel for your saved rules
  3. Iterate: Click the "Iterate" button to re-annotate using the improved guidelines
  4. Compare Results: View before/after annotation changes

Citation

If you find our work helpful, please consider citing us:

@article{xiong2025co,
  title={Co-DETECT: Collaborative Discovery of Edge Cases in Text Classification},
  author={Xiong, Chenfei and Ni, Jingwei and Fan, Yu and Zouhar, Vil{\'e}m and Rooein, Donya and Calvo-Bartolom{\'e}, Lorena and Hoyle, Alexander and Jin, Zhijing and Sachan, Mrinmaya and Leippold, Markus and others},
  journal={arXiv preprint arXiv:2507.05010},
  year={2025}
}
