The aspect ratio of rendered A4 pages differs from the standard specification (#95)

## Summary

When converting an A4 PDF, the rendered image ends up with an aspect ratio of ~1.403 instead of the expected 1.4142 (√2), because `scale_to_fit` quantizes both dimensions to a pixel grid.

## Root Cause

**Step 1** (`input.py: load_pdf_images`) renders A4 correctly at 192 DPI:
- 595.28 × 841.89 pt → 1587 × 2245 px → ratio **1.4146** ✓
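The pixel sizes in Step 1 follow from PDF user-space units being 1/72 inch, so 192 DPI means 192 / 72 ≈ 2.667 px per point. A minimal sketch of that conversion, assuming the renderer rounds each scaled dimension to the nearest integer; `points_to_pixels` is a hypothetical helper, not part of chandra-ocr:

```python
# Hypothetical helper: convert a PDF page size in points (1/72 in) to pixels.
# Assumes round-to-nearest; the real renderer may round differently.
def points_to_pixels(width_pt: float, height_pt: float, dpi: int = 192) -> tuple[int, int]:
    scale = dpi / 72  # pixels per PDF point
    return round(width_pt * scale), round(height_pt * scale)


A4_PT = (595.28, 841.89)  # A4 page size in PDF points

width_px, height_px = points_to_pixels(*A4_PT)
print(width_px, height_px)             # 1587 2245
print(round(height_px / width_px, 4))  # 1.4146
```

At this stage the ratio is still within about 0.03% of √2, so the rendering step is not the source of the distortion.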

**Step 2** (`model/util.py: scale_to_fit`) snaps both dimensions to multiples of `grid_size=28`:
```python
w_blocks = round(1587 / 28)  # round(56.68) = 57
new_width = w_blocks * 28    # 1596
h_blocks = round(2245 / 28)  # round(80.18) = 80
new_height = h_blocks * 28   # 2240
```
Width rounds up (+9 px) and height rounds down (−5 px), so both errors lower the height-to-width ratio:

2240 / 1596 = **1.4035** ✗
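The snapping arithmetic can be reproduced end to end. This sketch assumes `scale_to_fit` rounds each dimension independently to the nearest multiple of `grid_size`, as the Step 2 calculation implies; `snap_to_grid` is a hypothetical stand-in that ignores any other resizing `scale_to_fit` may do:

```python
# Hypothetical stand-in for the snapping step of scale_to_fit: each dimension
# is rounded independently to the nearest multiple of grid_size.
def snap_to_grid(width: int, height: int, grid_size: int = 28) -> tuple[int, int]:
    return (round(width / grid_size) * grid_size,
            round(height / grid_size) * grid_size)


src_w, src_h = 1587, 2245  # Step 1 output for A4 at 192 DPI
new_w, new_h = snap_to_grid(src_w, src_h)

print(new_w, new_h)                  # 1596 2240
print(new_w - src_w, new_h - src_h)  # 9 -5

src_ratio = src_h / src_w
new_ratio = new_h / new_w
print(round(new_ratio, 4))           # 1.4035

# Relative aspect-ratio error introduced by snapping
print(f"{(src_ratio - new_ratio) / src_ratio:.2%}")  # 0.79%
```

Because the two axes are rounded independently, the size of the error depends on how close each dimension happens to fall to a multiple of 28, so other page sizes or DPIs will be distorted by different amounts.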

## Impact

The ~0.8% distortion is negligible for OCR text accuracy, but it introduces a systematic error when bounding-box coordinates are mapped back to the original PDF coordinate space.
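To illustrate why the error is systematic rather than random: if a model-space coordinate is mapped back with only the DPI factor (72 / 192 pt per px) and the grid snap is ignored, the offset grows linearly with distance from the origin. The sketch below compares that naive mapping with one that undoes the per-axis resize first; both functions are hypothetical and not taken from chandra-ocr:

```python
DPI = 192
PT_PER_PX = 72 / DPI             # PDF points per rendered pixel

RENDER_W, RENDER_H = 1587, 2245  # rendered A4 page (Step 1)
MODEL_W, MODEL_H = 1596, 2240    # after grid snapping (Step 2)


def naive_to_pdf(x: float, y: float) -> tuple[float, float]:
    # Hypothetical: treats model-space pixels as if they were rendered pixels.
    return x * PT_PER_PX, y * PT_PER_PX


def corrected_to_pdf(x: float, y: float) -> tuple[float, float]:
    # Hypothetical: undo the per-axis snap before converting pixels to points.
    return (x * RENDER_W / MODEL_W * PT_PER_PX,
            y * RENDER_H / MODEL_H * PT_PER_PX)


corner = (MODEL_W, MODEL_H)      # bottom-right corner of the model input
print(naive_to_pdf(*corner))      # (598.5, 840.0)
print(corrected_to_pdf(*corner))  # (595.125, 841.875)
```

At this corner the naive mapping is off by +3.375 pt in x and −1.875 pt in y; halfway across the page the offset is half as large.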

## Environment

- chandra-ocr v0.2.0
- `IMAGE_DPI = 192`, `MIN_PDF_IMAGE_DIM = 1024`, `grid_size = 28`

