Consolidate data synthesis preprocessing into unified CLI module #8

Open
vincentha766 wants to merge 1 commit into main from refactor/data-synthesis

@vincentha766 (Collaborator)

  • Replace 18 dataset-specific scripts (ct_rate, amos_mm, abdomen_atlas) with a single data_synthesis.py providing four subcommands (vqa, report, vqa_translation, report_translation).
  • Remove the deprecated rewrite/extract_qa/qa_fewshot/report_choice_quetions logic.
  • Extract LLM call infrastructure into llm_utils.py and prompt templates into separate .prompt files.
  • Add DATA_SYNTHESIS_SPEC.md documenting I/O formats and CLI arguments.
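
As a rough illustration of the single-entry-point layout described above, a CLI with the four subcommands could be wired up with `argparse` along these lines. This is only a sketch, not the code in this PR: the `--input` and `--output` flags and the `build_parser` helper are hypothetical, and the actual arguments are the ones documented in DATA_SYNTHESIS_SPEC.md.

```python
import argparse

# Subcommand names taken from the PR description.
SUBCOMMANDS = ("vqa", "report", "vqa_translation", "report_translation")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subparser per synthesis task (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="data_synthesis.py")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name in SUBCOMMANDS:
        sub = subparsers.add_parser(name)
        # Hypothetical flags for illustration only; see the spec for the real ones.
        sub.add_argument("--input", required=True, help="path to the source data")
        sub.add_argument("--output", required=True, help="path for the synthesized data")
    return parser


def parse_cli(argv=None) -> argparse.Namespace:
    """Parse argv (defaults to sys.argv[1:]) and return the selected subcommand and options."""
    return build_parser().parse_args(argv)
```

Under these assumptions, `parse_cli(["vqa", "--input", "in.jsonl", "--output", "out.jsonl"])` returns a namespace whose `command` attribute is `"vqa"`, which a dispatcher could then map to the matching handler.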

Replace 18 dataset-specific scripts (ct_rate, amos_mm, abdomen_atlas)
with a single data_synthesis.py providing four subcommands (vqa,
report, vqa_translation, report_translation). Extract LLM call
infrastructure into llm_utils.py and prompt templates into separate
.prompt files. Add DATA_SYNTHESIS_SPEC.md documenting I/O formats and
CLI arguments.
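
To show what keeping prompt templates in separate .prompt files might look like in practice, here is a minimal sketch that loads a template from disk and fills in its fields. The function names, the `$placeholder` syntax, and the field names are all assumptions made for illustration; the helpers actually exposed by llm_utils.py may differ.

```python
from pathlib import Path
from string import Template


def load_prompt(path) -> Template:
    """Read a .prompt file and wrap its text as a string.Template (hypothetical helper)."""
    return Template(Path(path).read_text(encoding="utf-8"))


def render_prompt(template: Template, **fields) -> str:
    """Fill $placeholders in the template; raises KeyError if a field is missing."""
    return template.substitute(**fields)
```

Keeping templates outside the Python source means a prompt can be edited or reviewed without touching the calling code, and `substitute` fails loudly when a placeholder is left unfilled.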

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>