
The aspect ratio of A4 paper is different from standard specifications. #95

@asylee02


Summary

When converting an A4 PDF, the rendered image has an aspect ratio of ~1.403
instead of the expected 1.4142, due to grid quantization in scale_to_fit.

Root Cause

Step 1 (input.py): load_pdf_images renders A4 correctly at 192 DPI:

  • 595.28 × 841.89 pt → 1587 × 2245 px → ratio 1.4146
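The Step 1 numbers can be reproduced with a few lines of arithmetic. This is only a sketch of the point-to-pixel conversion, not the actual body of load_pdf_images; the helper name points_to_pixels is hypothetical, and rounding to the nearest pixel is an assumption (truncation gives the same result for A4 at 192 DPI).

```python
A4_POINTS = (595.28, 841.89)  # A4 page size in PDF points (1 pt = 1/72 inch)
POINTS_PER_INCH = 72


def points_to_pixels(size_pt, dpi=192):
    """Convert a (width, height) size in points to whole pixels at the given DPI.

    Hypothetical helper for illustration; assumes round-to-nearest.
    """
    scale = dpi / POINTS_PER_INCH
    return tuple(round(v * scale) for v in size_pt)


width_px, height_px = points_to_pixels(A4_POINTS)
print(width_px, height_px, round(height_px / width_px, 4))
```

At this stage the pixel ratio is still within about 0.03% of √2 ≈ 1.4142, so the rendering step is not the source of the distortion.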

Step 2 (model/util.py): scale_to_fit snaps to multiples of grid_size=28:

w_blocks = round(1587 / 28) = round(56.68) = 57  →  new_width  = 57 × 28 = 1596
h_blocks = round(2245 / 28) = round(80.18) = 80  →  new_height = 80 × 28 = 2240
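The snapping arithmetic above can be sketched as follows. This is a minimal reconstruction of the grid step only, assuming each axis is rounded independently as described in this issue; snap_to_grid is a hypothetical name, and the real scale_to_fit may apply additional size limits before snapping.

```python
def snap_to_grid(width, height, grid_size=28):
    """Round each dimension independently to the nearest multiple of grid_size.

    Hypothetical helper mirroring the behaviour described in this issue.
    """
    w_blocks = max(1, round(width / grid_size))
    h_blocks = max(1, round(height / grid_size))
    return w_blocks * grid_size, h_blocks * grid_size


new_width, new_height = snap_to_grid(1587, 2245)
print(new_width, new_height, round(new_height / new_width, 4))
```

Because the two axes are rounded separately, nothing constrains the rounding errors to cancel, which is what allows the ratio to drift.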
Width rounds up (+9 px) and height rounds down (−5 px), so both errors push
the aspect ratio in the same direction:

2240 / 1596 = 1.4035

Impact

The distortion is ~0.8%, which is negligible for OCR text accuracy, but causes
systematic error when mapping bounding box coordinates back to the original PDF
coordinate space.
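To illustrate the mapping error, the sketch below converts the bottom-right corner of the 1596 × 2240 model image back to PDF points in two ways: with a single uniform factor derived from IMAGE_DPI, and with separate per-axis factors derived from the actual image and page sizes. Both function names are hypothetical, and this is not taken from the chandra-ocr code; it only assumes that box coordinates are expressed in model-image pixels.

```python
PAGE_PT = (595.28, 841.89)      # A4 page size in PDF points
MODEL_IMAGE_PX = (1596, 2240)   # image size after grid snapping
IMAGE_DPI = 192


def to_points_uniform(x, y, dpi=IMAGE_DPI):
    """Map pixels to points with one DPI-based factor (ignores the snap)."""
    s = 72 / dpi
    return x * s, y * s


def to_points_per_axis(x, y, image_px=MODEL_IMAGE_PX, page_pt=PAGE_PT):
    """Map pixels to points with independent x and y factors."""
    return x * page_pt[0] / image_px[0], y * page_pt[1] / image_px[1]


corner = MODEL_IMAGE_PX  # bottom-right corner of the model image
uniform = to_points_uniform(*corner)
per_axis = to_points_per_axis(*corner)

# Signed error of the uniform mapping on each axis, in points.
error = (uniform[0] - PAGE_PT[0], uniform[1] - PAGE_PT[1])
print(uniform, per_axis, error)
```

Under these assumptions the uniform mapping overshoots the page width by roughly 3 pt and undershoots the height by roughly 2 pt at the page corner, while the per-axis mapping lands on the page boundary. The error grows linearly with distance from the origin, which is why it shows up as a systematic offset rather than noise.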

Environment

- chandra-ocr v0.2.0
- IMAGE_DPI = 192, MIN_PDF_IMAGE_DIM = 1024, grid_size = 28
