Feature: Use document extractors for grep and centralize file text extraction by pfurovYnP · Pull Request #114 · open-webui/open-terminal

pfurovYnP · 2026-04-21T08:28:54Z

(Codex)

#113

Motivation

Ensure the same document text extraction logic used by read_file is also applied when searching files with grep_search so searchable content from PDFs/Office files is included.
Reduce duplication by centralizing the extraction logic into helper functions.
Improve error handling for binary/unsupported files in the grep flow to avoid reading raw bytes as text.

Description

Introduced _extract_text_with_supported_document_extractors(file_path, mime) to encapsulate lookup and invocation of EXTRACTORS from open_terminal.utils.documents and return extracted text or None.
Added _read_file_as_text_representation_for_grep(file_path) which mirrors read_file behavior: attempt UTF-8 read first, fall back to the document extractors, and raise UnicodeDecodeError for unsupported binary files.
Refactored read_file to call the new extractor helper and simplified the previous inline extractor loop.
Updated grep_search to use _read_file_as_text_representation_for_grep so document-extracted text is searched line by line, preserving the existing match and truncation behavior.
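For reviewers who want the flow at a glance, here is a minimal sketch of the read path described above. It is not the code in this PR: the function names `extract_text`, `read_text_for_grep`, and `grep_file`, the `max_results` parameter, the MIME lookup via `mimetypes`, and the shape of the `EXTRACTORS` mapping are all assumptions made for illustration. The real `EXTRACTORS` lives in open_terminal.utils.documents and may be structured differently.

```python
import mimetypes
from pathlib import Path
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for EXTRACTORS from open_terminal.utils.documents.
# Assumed shape: MIME type -> callable returning the extracted text.
EXTRACTORS: Dict[str, Callable[[Path], str]] = {
    # Fake PDF extractor so the sketch runs without a PDF library.
    "application/pdf": lambda path: "first line\nneedle in pdf",
}


def extract_text(file_path: Path, mime: Optional[str]) -> Optional[str]:
    """Return extracted text, or None when no extractor handles this MIME type."""
    extractor = EXTRACTORS.get(mime or "")
    if extractor is None:
        return None
    return extractor(file_path)


def read_text_for_grep(file_path: Path) -> str:
    """Try UTF-8 first, then the extractors; re-raise for unsupported binaries."""
    try:
        return file_path.read_text(encoding="utf-8")
    except UnicodeDecodeError:
        # Assumption: the MIME type is guessed from the file name.
        mime, _ = mimetypes.guess_type(str(file_path))
        text = extract_text(file_path, mime)
        if text is None:
            raise  # unsupported binary: surface the original decode error
        return text


def grep_file(file_path: Path, needle: str, max_results: int = 50) -> List[Tuple[int, str]]:
    """Search the text representation line by line, truncating at max_results."""
    matches: List[Tuple[int, str]] = []
    for number, line in enumerate(read_text_for_grep(file_path).splitlines(), start=1):
        if needle in line:
            matches.append((number, line))
            if len(matches) >= max_results:
                break
    return matches
```

In this sketch a caller that wants to skip binary files can catch UnicodeDecodeError around `grep_file`, which matches the description's statement that unsupported binaries raise rather than being searched as raw bytes.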

Testing

Ran the automated test suite with pytest, and all tests completed successfully.
Exercised read_file and grep_search behavior manually against text files and PDF/Office files to verify extracted text is returned and searchable as expected.
Verified that binary files with allowed MIME prefixes are still returned raw, and that unsupported binaries are rejected during search.

pfurovYnP added 2 commits April 21, 2026 11:21

Refactor shared document extraction logic for read and grep

c6f860b

Merge pull request #1 from pfurovYnP/codex/update-grep-to-use-text-re…

bcb7593

…presentations Use document extractors for grep and centralize file text extraction

pfurovYnP changed the title ~~Use document extractors for grep and centralize file text extraction~~ Feature: Use document extractors for grep and centralize file text extraction Apr 21, 2026

update doc for grep

c1b3ffa

pfurovYnP wants to merge 3 commits into open-webui:main from pfurovYnP:main
