From 1b13fc2f26995d9257bdaf9188c1823b52036dc0 Mon Sep 17 00:00:00 2001
From: fderuiter <127706008+fderuiter@users.noreply.github.com>
Date: Tue, 24 Feb 2026 17:44:33 +0000
Subject: [PATCH 1/2] Add TODO.md for Spec Sheet Generator

Added a comprehensive TODO list outlining the roadmap for the specification sheet generator project, including data ingestion, transformation, and CLI integration steps.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
---
 TODO.md | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)
 create mode 100644 TODO.md

diff --git a/TODO.md b/TODO.md
new file mode 100644
index 00000000..409c765e
--- /dev/null
+++ b/TODO.md
@@ -0,0 +1,42 @@
+# Project Roadmap: Specification Sheet Generator
+
+This document outlines the tasks required to automate the creation of a comprehensive specification sheet from iMednet Data Dictionary exports and aCRF PDFs.
+
+## Phase 1: Preparation & Analysis
+- [ ] **Locate Input Data**: Verify the existence and location of `acrfs/` (PDFs), `data-dictionary/` (CSVs), and `SRS/` (Sample Excel). *Currently missing in repo root.*
+- [ ] **Analyze Data Dictionary**: Inspect the full CSV export to understand all available fields (Forms, Fields, Variables, Codelists).
+- [ ] **Analyze aCRF PDFs**: Determine if text/metadata extraction from PDFs is feasible or if manual mapping is required.
+  - [ ] Research Python PDF libraries (`pypdf`, `pdfplumber`) for extracting form names/page numbers.
+- [ ] **Analyze Sample Specification Sheet**:
+  - [ ] Open the sample Excel in `SRS/`.
+  - [ ] Document the exact column structure, formatting rules (colors, fonts), and sheet organization.
+
+## Phase 2: Implementation - Data Ingestion
+- [ ] **Create Data Dictionary Parser**:
+  - [ ] Implement a class to read `FORMS.csv`, `QUESTIONS.csv`, `CHOICES.csv`, etc.
+  - [ ] Link variables to forms and choices to variables.
+  - [ ] Handle edge cases (e.g., calculated fields, logic).
+- [ ] **Create aCRF Processor**:
+  - [ ] Implement a prototype to extract relevant info (e.g., Form Name, Page Number) from PDFs.
+  - [ ] Map extracted info to the Data Dictionary forms.
+
+## Phase 3: Implementation - Excel Generation
+- [ ] **Setup Excel Writer**:
+  - [ ] Initialize a new Excel workbook using `openpyxl` or `pandas` (with `xlsxwriter`).
+- [ ] **Implement Formatting Logic**:
+  - [ ] Create style definitions (headers, borders, alternating rows) matching the `SRS` sample.
+- [ ] **Develop Sheet Generator**:
+  - [ ] Write logic to create a separate sheet for each Form/aCRF.
+  - [ ] Populate sheets with combined data from Data Dictionary and aCRF.
+
+## Phase 4: Integration & CLI
+- [ ] **Create CLI Command**:
+  - [ ] Add a new command (e.g., `imednet spec-gen`) to the `imednet` CLI.
+  - [ ] Arguments: `--dd-path`, `--acrf-path`, `--output`.
+- [ ] **Orchestrate Workflow**:
+  - [ ] Connect Ingestion -> Transformation -> Generation steps.
+
+## Phase 5: Verification
+- [ ] **Run on Sample Data**: Generate a spec sheet using the provided samples.
+- [ ] **Compare with SRS**: Manually verify that the output matches the `SRS` sample in structure and content.
+- [ ] **Refine**: Adjust formatting and logic as needed.

From 1d362c3097afa9172618beaa52a4b276483a04c1 Mon Sep 17 00:00:00 2001
From: Frederick de Ruiter <127706008+fderuiter@users.noreply.github.com>
Date: Tue, 24 Feb 2026 11:45:41 -0600
Subject: [PATCH 2/2] Fix typo in TODO for SRS directory

---
 TODO.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/TODO.md b/TODO.md
index 409c765e..76729c5e 100644
--- a/TODO.md
+++ b/TODO.md
@@ -3,7 +3,7 @@
 This document outlines the tasks required to automate the creation of a comprehensive specification sheet from iMednet Data Dictionary exports and aCRF PDFs.
 
 ## Phase 1: Preparation & Analysis
-- [ ] **Locate Input Data**: Verify the existence and location of `acrfs/` (PDFs), `data-dictionary/` (CSVs), and `SRS/` (Sample Excel). *Currently missing in repo root.*
+- [ ] **Locate Input Data**: Verify the existence and location of `acrfs/` (PDFs), `data-dictionary/` (CSVs), and `srs/` (Sample Excel).
 - [ ] **Analyze Data Dictionary**: Inspect the full CSV export to understand all available fields (Forms, Fields, Variables, Codelists).
 - [ ] **Analyze aCRF PDFs**: Determine if text/metadata extraction from PDFs is feasible or if manual mapping is required.
   - [ ] Research Python PDF libraries (`pypdf`, `pdfplumber`) for extracting form names/page numbers.