From 1b13fc2f26995d9257bdaf9188c1823b52036dc0 Mon Sep 17 00:00:00 2001
From: fderuiter <127706008+fderuiter@users.noreply.github.com>
Date: Tue, 24 Feb 2026 17:44:33 +0000
Subject: [PATCH 1/2] Add TODO.md for Spec Sheet Generator

Add a TODO list outlining the roadmap for the specification sheet generator: input analysis, data ingestion, Excel generation, CLI integration, and verification.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
---
 TODO.md | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)
 create mode 100644 TODO.md

diff --git a/TODO.md b/TODO.md
new file mode 100644
index 00000000..409c765e
--- /dev/null
+++ b/TODO.md
@@ -0,0 +1,42 @@
+# Project Roadmap: Specification Sheet Generator
+
+This document outlines the tasks required to automate the creation of a comprehensive specification sheet from iMednet Data Dictionary exports and aCRF PDFs.
+
+## Phase 1: Preparation & Analysis
+- [ ] **Locate Input Data**: Verify the existence and location of `acrfs/` (PDFs), `data-dictionary/` (CSVs), and `SRS/` (Sample Excel). *Currently missing in repo root.*
+- [ ] **Analyze Data Dictionary**: Inspect the full CSV export to understand all available fields (Forms, Fields, Variables, Codelists).
+- [ ] **Analyze aCRF PDFs**: Determine if text/metadata extraction from PDFs is feasible or if manual mapping is required.
+    - [ ] Research Python PDF libraries (`pypdf`, `pdfplumber`) for extracting form names/page numbers.
+- [ ] **Analyze Sample Specification Sheet**:
+    - [ ] Open the sample Excel in `SRS/`.
+    - [ ] Document the exact column structure, formatting rules (colors, fonts), and sheet organization.
+
+## Phase 2: Implementation - Data Ingestion
+- [ ] **Create Data Dictionary Parser**:
+    - [ ] Implement a class to read `FORMS.csv`, `QUESTIONS.csv`, `CHOICES.csv`, etc.
+    - [ ] Link variables to forms and choices to variables.
+    - [ ] Handle edge cases (e.g., calculated fields, logic).
+- [ ] **Create aCRF Processor**:
+    - [ ] Implement a prototype to extract relevant info (e.g., Form Name, Page Number) from PDFs.
+    - [ ] Map extracted info to the Data Dictionary forms.
+
+## Phase 3: Implementation - Excel Generation
+- [ ] **Setup Excel Writer**:
+    - [ ] Initialize a new Excel workbook using `openpyxl` or `pandas` (with `xlsxwriter`).
+- [ ] **Implement Formatting Logic**:
+    - [ ] Create style definitions (headers, borders, alternating rows) matching the `SRS` sample.
+- [ ] **Develop Sheet Generator**:
+    - [ ] Write logic to create a separate sheet for each Form/aCRF.
+    - [ ] Populate sheets with combined data from Data Dictionary and aCRF.
+
+## Phase 4: Integration & CLI
+- [ ] **Create CLI Command**:
+    - [ ] Add a new command (e.g., `imednet spec-gen`) to the `imednet` CLI.
+    - [ ] Arguments: `--dd-path`, `--acrf-path`, `--output`.
+- [ ] **Orchestrate Workflow**:
+    - [ ] Connect Ingestion -> Transformation -> Generation steps.
+
+## Phase 5: Verification
+- [ ] **Run on Sample Data**: Generate a spec sheet using the provided samples.
+- [ ] **Compare with SRS**: Manually verify that the output matches the `SRS` sample in structure and content.
+- [ ] **Refine**: Adjust formatting and logic as needed.

From 1d362c3097afa9172618beaa52a4b276483a04c1 Mon Sep 17 00:00:00 2001
From: Frederick de Ruiter <127706008+fderuiter@users.noreply.github.com>
Date: Tue, 24 Feb 2026 11:45:41 -0600
Subject: [PATCH 2/2] Fix SRS directory casing in TODO and drop missing-data note

---
 TODO.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/TODO.md b/TODO.md
index 409c765e..76729c5e 100644
--- a/TODO.md
+++ b/TODO.md
@@ -3,7 +3,7 @@
 This document outlines the tasks required to automate the creation of a comprehensive specification sheet from iMednet Data Dictionary exports and aCRF PDFs.
 
 ## Phase 1: Preparation & Analysis
-- [ ] **Locate Input Data**: Verify the existence and location of `acrfs/` (PDFs), `data-dictionary/` (CSVs), and `SRS/` (Sample Excel). *Currently missing in repo root.*
+- [ ] **Locate Input Data**: Verify the existence and location of `acrfs/` (PDFs), `data-dictionary/` (CSVs), and `srs/` (Sample Excel).
 - [ ] **Analyze Data Dictionary**: Inspect the full CSV export to understand all available fields (Forms, Fields, Variables, Codelists).
 - [ ] **Analyze aCRF PDFs**: Determine if text/metadata extraction from PDFs is feasible or if manual mapping is required.
     - [ ] Research Python PDF libraries (`pypdf`, `pdfplumber`) for extracting form names/page numbers.