feat: add processing events and callbacks system #217

Open
Jah-yee wants to merge 2 commits into HKUDS:main from Jah-yee:feat-processing-callbacks

Jah-yee commented Mar 2, 2026

Summary

This PR adds a lightweight event and callbacks system for document processing, so users can observe progress, collect metrics, and hook into key stages of the pipeline without changing core logic.

Motivation

For large batches, it is currently hard to see what RAG-Anything is doing: there are no structured events for logging/metrics and no safe extension points for custom side effects. A small callbacks layer gives logging, metrics, and custom side effects a defined place to attach.

Changes

  • raganything/callbacks.py
    • ProcessingEvent dataclass for immutable, structured events (e.g. parse_start, parse_complete, text_insert, multimodal phases, query/batch stages).
    • ProcessingCallback base class with multiple overridable hooks, all accepting ProcessingEvent plus **kwargs for forward compatibility.
    • MetricsCallback that tracks document/block counts, errors, and durations, and exposes a read-only metrics snapshot.
    • CallbackManager to register/unregister callbacks, dispatch events, optionally keep a bounded event log, and isolate callback exceptions so a failing callback does not affect others.
  • tests/test_callbacks.py
    • Tests for event dispatch across multiple callbacks, hook invocation, metrics accumulation, exception isolation, and optional event log behavior.
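
For readers without the diff at hand, the pieces listed above could fit together roughly as follows. This is a minimal sketch, not the code in raganything/callbacks.py: the field names (event_type, file_path, data), the hook names, the max_log_size parameter, and the dispatch(hook, event) signature are all assumptions made for illustration.

```python
import logging
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

logger = logging.getLogger("callbacks_sketch")


@dataclass(frozen=True)
class ProcessingEvent:
    """One pipeline event. Frozen: attributes cannot be reassigned."""
    event_type: str                      # e.g. "parse_start", "parse_complete"
    file_path: Optional[str] = None      # assumed field name
    timestamp: float = field(default_factory=time.time)
    data: Dict[str, Any] = field(default_factory=dict)


class ProcessingCallback:
    """Base class: every hook is a no-op until a subclass overrides it."""

    def on_parse_start(self, event: ProcessingEvent, **kwargs: Any) -> None:
        pass

    def on_parse_complete(self, event: ProcessingEvent, **kwargs: Any) -> None:
        pass

    def on_error(self, event: ProcessingEvent, **kwargs: Any) -> None:
        pass


class MetricsCallback(ProcessingCallback):
    """Counts completed documents and errors."""

    def __init__(self) -> None:
        self._documents = 0
        self._errors = 0

    def on_parse_complete(self, event: ProcessingEvent, **kwargs: Any) -> None:
        self._documents += 1

    def on_error(self, event: ProcessingEvent, **kwargs: Any) -> None:
        self._errors += 1

    @property
    def metrics(self) -> Dict[str, int]:
        # A fresh dict each time, so callers cannot mutate the counters.
        return {"documents": self._documents, "errors": self._errors}


class CallbackManager:
    """Registers callbacks and dispatches events to them."""

    def __init__(self, max_log_size: int = 0) -> None:
        self._callbacks: List[ProcessingCallback] = []
        # A deque with maxlen drops the oldest event once the log is full.
        self._event_log: Optional[deque] = (
            deque(maxlen=max_log_size) if max_log_size > 0 else None
        )

    def register(self, callback: ProcessingCallback) -> None:
        self._callbacks.append(callback)

    def unregister(self, callback: ProcessingCallback) -> None:
        if callback in self._callbacks:
            self._callbacks.remove(callback)

    def dispatch(self, hook: str, event: ProcessingEvent, **kwargs: Any) -> None:
        if self._event_log is not None:
            self._event_log.append(event)
        for callback in list(self._callbacks):
            handler = getattr(callback, hook, None)
            if handler is None:
                continue
            try:
                handler(event, **kwargs)
            except Exception:
                # Log and continue so one failing callback cannot block the rest.
                logger.exception("callback %r failed in %s", callback, hook)

    @property
    def event_log(self) -> List[ProcessingEvent]:
        return list(self._event_log) if self._event_log is not None else []
```

Under these assumptions, a caller would register a MetricsCallback, call manager.dispatch("on_parse_complete", ProcessingEvent("parse_complete", file_path="a.pdf")) at the relevant pipeline stage, and read metrics afterwards. Note that this sketch is not thread-safe.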

Testing

  • Ran pytest locally including tests/test_callbacks.py; all tests passed.
  • Verified that a failing callback does not block other callbacks and that metrics update correctly in a simulated multi-document flow.

Thanks for your work on RAG-Anything. If you'd prefer different hook names or a different event structure, I'm glad to adjust this design.

LarFii (Collaborator) commented Mar 4, 2026

  1. The callback system is not integrated into the real RAGAnything execution paths yet.
    callbacks.py introduces CallbackManager and the usage example shows rag.callback_manager.register(...), but RAGAnything does not define callback_manager and does not dispatch callback events in parse/insert/multimodal/query flows. In practice, this makes the feature unusable and the example can fail with AttributeError.
    Refs: raganything/callbacks.py:20, raganything/callbacks.py:301, raganything/raganything.py:50

  2. The thread-safety statement appears inconsistent with implementation.
    The docstring says registration is thread-safe, but register/unregister and event-log writes are unsynchronized list mutations. Under concurrent use, race conditions are still possible.
    Ref: raganything/callbacks.py:256

  3. Tests are unit-only for the callback module, with no integration coverage.
    tests/test_callbacks.py validates CallbackManager behavior in isolation, but there are no tests asserting that callbacks are emitted from actual RAGAnything entry points (process_document_*, query, batch). So the missing integration above is not detected by tests.
    Ref: tests/test_callbacks.py:61
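
The three points above can be illustrated with a small sketch. It is not a patch against raganything/raganything.py: ToyPipeline is a hypothetical stand-in for RAGAnything, callbacks are plain callables rather than ProcessingCallback subclasses, and the dispatch signature is assumed. It shows one possible way to guard registration with a lock, have the pipeline own a callback_manager and emit events itself, and assert on those events through the pipeline entry point rather than through the manager alone.

```python
import threading
from typing import Any, Callable, Dict, List

# Hypothetical callback shape: (event_type, payload) -> None
Callback = Callable[[str, Dict[str, Any]], None]


class LockedCallbackManager:
    """Registration and dispatch share one lock around the callback list."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._callbacks: List[Callback] = []

    def register(self, callback: Callback) -> None:
        with self._lock:
            self._callbacks.append(callback)

    def unregister(self, callback: Callback) -> None:
        with self._lock:
            if callback in self._callbacks:
                self._callbacks.remove(callback)

    def count(self) -> int:
        with self._lock:
            return len(self._callbacks)

    def dispatch(self, event_type: str, **payload: Any) -> None:
        # Copy under the lock, then call outside it, so a callback that
        # registers or unregisters during dispatch cannot deadlock.
        with self._lock:
            snapshot = tuple(self._callbacks)
        for callback in snapshot:
            try:
                callback(event_type, payload)
            except Exception:
                pass  # isolate failures; real code would log here


class ToyPipeline:
    """Hypothetical stand-in for RAGAnything that owns and uses a manager."""

    def __init__(self) -> None:
        self.callback_manager = LockedCallbackManager()

    def process_document(self, path: str) -> List[str]:
        self.callback_manager.dispatch("parse_start", file_path=path)
        blocks = [path]  # placeholder for real parsing
        self.callback_manager.dispatch(
            "parse_complete", file_path=path, block_count=len(blocks)
        )
        return blocks


def test_pipeline_emits_parse_events() -> None:
    # Integration-style check: events are observed via the entry point.
    seen: List[str] = []
    pipeline = ToyPipeline()
    pipeline.callback_manager.register(lambda etype, payload: seen.append(etype))
    pipeline.process_document("a.pdf")
    assert seen == ["parse_start", "parse_complete"]
```

Taking a snapshot before invoking callbacks keeps the lock hold time short. A bounded event log, if kept, would need the same lock around its writes.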
