fix: Unhandled encoding errors in text extraction #28

Open
mrwind-up-bird wants to merge 1 commit into main from autofix/dd2853cd/unhandled-encoding-errors-in-t

Conversation

@mrwind-up-bird
Collaborator

AutoFix: Unhandled encoding errors in text extraction

Category: error-handling
Severity: medium

Issue

Text file decoding uses utf-8 with no error handling, so it raises UnicodeDecodeError for any file whose bytes are not valid UTF-8 (for example, files saved in another encoding). An uncaught exception at this point could crash the ingestion pipeline and leave sources in an inconsistent state.
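The failure mode can be reproduced in a few lines. This is a minimal sketch only: `decode_strict` is a hypothetical stand-in for the extraction code, which is not shown in this PR description.

```python
def decode_strict(raw: bytes) -> str:
    # Hypothetical stand-in for the pre-fix behavior: strict UTF-8 decoding.
    return raw.decode("utf-8")


# "café" saved as latin-1 ends in the byte 0xE9, which is not valid UTF-8 here.
sample = "café".encode("latin-1")

try:
    decode_strict(sample)
    failed = False
except UnicodeDecodeError:
    # Without a handler like this one, the exception would propagate upward.
    failed = True
```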

Fix

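Added error handling for UTF-8 decoding failures by wrapping the decode operation in a try/except block. When UnicodeDecodeError occurs, the code falls back to latin-1 decoding with error replacement. Because latin-1 maps every possible byte value to a character, the fallback can decode any byte sequence, so encoding issues no longer crash the ingestion pipeline. The trade-off is that text in some other encoding may come out as incorrect characters rather than raising an error.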
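The fallback described above could look roughly like the following. This is a sketch under assumptions, not the code from the commit: the function name `decode_text` is hypothetical, and the actual change may be structured differently.

```python
def decode_text(raw: bytes) -> str:
    """Decode bytes as UTF-8, falling back to latin-1 if that fails."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # latin-1 assigns a character to every byte value (0x00-0xFF),
        # so this decode cannot raise; errors="replace" is kept only to
        # mirror the description and is never actually triggered.
        return raw.decode("latin-1", errors="replace")
```

Valid UTF-8 input is returned unchanged, so the fallback only affects files that previously raised. A caller that needs to know when the fallback was used might log it or return the chosen encoding alongside the text.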


Generated by nyxCore AutoFix
