Add directory validation, empty dataset check and move metrics to module level in utils.py #95
Open
agentksimha wants to merge 11 commits into humanai-foundation:main
Changes made to RenAIssance_Transformer_OCR_Utsav_Rai/code/utils.py:
Added image_dir and text_dir validation in SpanishDocumentsDataset.__init__
Neither directory was validated before use. An invalid path causes os.listdir to raise a generic FileNotFoundError that does not say which argument was wrong. Added explicit FileNotFoundError checks for both directories, each with a descriptive message.
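A minimal sketch of the kind of check described above, not the exact code in this PR. The helper name `validate_dir` and the message wording are assumptions made for illustration; the point is only that the error names the offending argument.

```python
import os


def validate_dir(path, arg_name):
    """Return `path` if it is an existing directory, else raise.

    `validate_dir` is a hypothetical helper, not a function from utils.py.
    `arg_name` is included in the message so the caller can see which
    constructor argument was wrong.
    """
    if not os.path.isdir(path):
        raise FileNotFoundError(
            f"{arg_name} does not exist or is not a directory: {path!r}"
        )
    return path
```

Inside the constructor this would be called once per argument, for example `validate_dir(image_dir, "image_dir")` followed by `validate_dir(text_dir, "text_dir")`, before any call to os.listdir.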
Added empty dataset check in SpanishDocumentsDataset.__init__
If image_dir exists but contains no .jpg files, self.filenames is empty and the DataLoader silently produces nothing during training. Added a ValueError check after scanning for .jpg files.
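The empty dataset check could look roughly like the following. This is a sketch under assumptions: `list_jpg_filenames` is a hypothetical helper, and the case-sensitive `.jpg` match and the sorting are illustrative choices rather than a description of how utils.py scans the directory.

```python
import os
import tempfile


def list_jpg_filenames(image_dir):
    """Collect .jpg filenames and fail loudly if there are none.

    Hypothetical helper for illustration; the real dataset class may
    filter or order files differently.
    """
    filenames = sorted(f for f in os.listdir(image_dir) if f.endswith(".jpg"))
    if not filenames:
        raise ValueError(f"No .jpg files found in image_dir: {image_dir!r}")
    return filenames


# Usage: an existing but empty directory now raises instead of
# yielding a dataset of length zero.
with tempfile.TemporaryDirectory() as empty_dir:
    try:
        list_jpg_filenames(empty_dir)
    except ValueError as err:
        print(err)
```

Raising in the constructor moves the failure to dataset creation, before a DataLoader is ever built on top of it.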
Added image_path validation in generate_text_from_image_segment
The function had no path existence check before Image.open. A missing file would be caught by the generic except Exception block, producing a vague error message. Added an explicit FileNotFoundError check before the try block so the failure is immediately actionable.
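To illustrate why the check sits before the try block, here is a simplified stand-in. The function name `read_image_bytes` is hypothetical, and plain file reading replaces Image.open and the model call so the sketch runs without extra dependencies; only the control flow is meant to mirror the change.

```python
import os


def read_image_bytes(image_path):
    """Simplified stand-in for generate_text_from_image_segment.

    The existence check runs before the try block, so a missing file
    raises a specific FileNotFoundError instead of being absorbed by
    the broad `except Exception` handler below.
    """
    if not os.path.isfile(image_path):
        raise FileNotFoundError(f"image_path does not exist: {image_path!r}")

    try:
        # Placeholder for Image.open(...) and the OCR model call.
        with open(image_path, "rb") as fh:
            return fh.read()
    except Exception as exc:
        # Broad handler kept to mirror the structure described above.
        print(f"Error processing {image_path}: {exc}")
        return None
```

With this ordering, the caller sees the missing path directly, while other failures still go through the existing generic handler.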
Moved load_metric calls to module level
load_metric was called three times inside compute_metrics, meaning cer, wer and bleu metrics were loaded from disk on every evaluation step during training. Moved all three to module-level constants so they are loaded once at import time and reused across all calls.
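The load-once pattern can be sketched as below. To keep the example runnable without the metric library, `load_metric_stub` is a hypothetical stand-in that only counts how often it is called, and the `compute_metrics` signature here is simplified and may not match the one in utils.py.

```python
load_calls = 0


def load_metric_stub(name):
    """Stand-in for an expensive metric loader; counts invocations."""
    global load_calls
    load_calls += 1
    # Return a dummy scorer; a real metric would compute an actual value.
    return lambda predictions, references: {name: 0.0}


# Module-level constants: loaded once, when the module is imported.
CER_METRIC = load_metric_stub("cer")
WER_METRIC = load_metric_stub("wer")
BLEU_METRIC = load_metric_stub("bleu")


def compute_metrics(predictions, references):
    """Reuse the preloaded metrics instead of reloading them per call."""
    results = {}
    for metric in (CER_METRIC, WER_METRIC, BLEU_METRIC):
        results.update(metric(predictions, references))
    return results
```

However many times `compute_metrics` runs, the loader is invoked three times in total. One trade-off of this design is that any loading failure now surfaces when the module is imported rather than at the first evaluation.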