docs: Update supported file types, provider management, connector management #1192

Open
aimurphy wants to merge 7 commits into main from doc-18-mar-26

@aimurphy (Collaborator) commented Mar 19, 2026

  • Update supported file types
  • Add cloud connector sync
  • Update cloud connector management
  • Inspect failed tasks
  • Expand model provider management and add more warnings about changing and removing models/providers
  • Delete `openrag-documentation.pdf` because OpenRAG now ingests its documentation from docs.openr.ag

Closes #1128

@aimurphy self-assigned this Mar 19, 2026
@aimurphy added the documentation 📘 (Improvements or additions to documentation) label Mar 19, 2026

If you use different embedding models for different documents, you can create [filters](/knowledge-filters) to separate documents that were embedded with different models.

If you use multiple embedding models, be aware that similarity search (in **Chat**) can take longer because the agent searches each model's embeddings separately.
Collaborator Author

I don't think this is true. Please correct me if I'm wrong.

Collaborator

I think that statement is true.
`src/services/search_service.py` detects all indexed `embedding_model` values, generates a query embedding for each one, and then builds multiple KNN clauses (one per model field) and runs them together in a `dis_max` query. So each additional embedding model adds retrieval work (another embedding call and a larger query), which can slow down Chat retrieval, depending on provider latency and OpenSearch load.
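A minimal sketch of the query shape described in the comment above. The per-model field naming (`chunk_embedding_<model>`), the `embed(model, text)` callback, and the helper names are hypothetical and not taken from `search_service.py`; only the overall structure (one query embedding and one k-NN clause per indexed model, combined under `dis_max`) follows the comment.

```python
import re
from typing import Callable, Dict, List

# Hypothetical callback: embed(model_name, text) -> query vector.
EmbedFn = Callable[[str, str], List[float]]


def field_for_model(model: str) -> str:
    """Hypothetical naming scheme for a per-model vector field."""
    return "chunk_embedding_" + re.sub(r"[^0-9a-zA-Z]", "_", model)


def build_multi_model_query(
    indexed_models: List[str], embed: EmbedFn, query_text: str, k: int = 10
) -> Dict:
    """Build one OpenSearch k-NN clause per embedding model, wrapped in dis_max."""
    clauses = []
    for model in indexed_models:
        # One embedding call per model: this is where provider latency adds up.
        vector = embed(model, query_text)
        clauses.append({"knn": {field_for_model(model): {"vector": vector, "k": k}}})
    # dis_max scores each document by its best-matching clause
    # (with the default tie_breaker of 0).
    return {"query": {"dis_max": {"queries": clauses}}}


# Usage with a fake embedder that records which models were called.
calls: List[str] = []


def fake_embed(model: str, text: str) -> List[float]:
    calls.append(model)
    return [0.1, 0.2, 0.3]


query = build_multi_model_query(
    ["text-embedding-3-small", "nomic-embed-text"], fake_embed, "What is OpenRAG?"
)
print(len(query["query"]["dis_max"]["queries"]))  # 2
print(len(calls))  # 2
```

Because every indexed model adds both an embedding call and a clause, the provider round-trips and the size of the OpenSearch query grow with the number of models.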

Collaborator Author

OK, thanks! I'll add it back.

Caveat: If you remove a provider, Chat can't generate a query embedding with the matching embedding model. What happens in that case? Does Chat ignore documents embedded by the missing provider? Or does it search with a "fallback" embedding model (potentially returning incorrect or missing results because the embedding dimensions and structure don't match)?

Collaborator

This is my understanding, but it's always good to double-check with @edwinjosechittilappilly.
If a provider is removed, OpenRAG still discovers embedding models from the indexed documents and attempts to generate a query embedding for each one. If one of those models is no longer available, retrieval can fail, and Chat may return an error or no useful answer. There is no safe cross-model fallback for those documents because embedding spaces and dimensions differ, and the system does not consistently fall back to searching only the documents tied to still-available models.
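A hypothetical sketch of the failure mode described above, assuming the query path embeds the query once per indexed model. The names (`embed_query_per_model`, `ModelUnavailableError`, `available_models`) are illustrative and not OpenRAG's actual code. It raises instead of substituting another model's vector, because vectors from different models are not comparable.

```python
from typing import Callable, Dict, List, Set

# Hypothetical callback: embed(model_name, text) -> query vector.
EmbedFn = Callable[[str, str], List[float]]


class ModelUnavailableError(RuntimeError):
    """Raised when an indexed embedding model has no configured provider."""


def embed_query_per_model(
    indexed_models: List[str],
    available_models: Set[str],
    embed: EmbedFn,
    query_text: str,
) -> Dict[str, List[float]]:
    """Return one query vector per indexed model, or fail if a model is missing."""
    vectors: Dict[str, List[float]] = {}
    for model in indexed_models:
        if model not in available_models:
            # No safe fallback: another model's vector lives in a different
            # embedding space (often with a different dimension), so it can't
            # be compared against the vectors stored for this model.
            raise ModelUnavailableError(f"no provider configured for {model!r}")
        vectors[model] = embed(model, query_text)
    return vectors


def fake_embed(model: str, text: str) -> List[float]:
    return [0.0] * 3


try:
    embed_query_per_model(["model-a", "model-b"], {"model-a"}, fake_embed, "hello")
except ModelUnavailableError as err:
    print(err)  # no provider configured for 'model-b'
```

Skipping unavailable models instead of raising would let Chat search only the documents tied to still-available models, which, per the comment above, the system does not do consistently.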

@aimurphy changed the title from "Docs: Update supported file types, provider management, connector management" to "docs: Update supported file types, provider management, connector management" Mar 19, 2026
github-actions bot added and removed the documentation 📘 label Mar 19, 2026
Collaborator Author

Is it OK to delete this?

@aimurphy requested review from @Wallgau and @mendonk March 19, 2026 00:18
github-actions bot (Contributor) commented Mar 19, 2026

Build successful! ✅
Deploying docs draft.
Deploy successful! View draft

@mendonk (Collaborator) left a comment

One typo; otherwise approved.

:::tip
**Fetch latest docs** _only_ gets the latest OpenRAG documentation.

To update docs ingested from cloud storage connectors], see [Configure connectors](/knowledge-connectors).
Collaborator

Suggested change
To update docs ingested from cloud storage connectors], see [Configure connectors](/knowledge-connectors).
To update docs ingested from cloud storage connectors, see [Configure connectors](/knowledge-connectors).

github-actions bot added the lgtm label Mar 19, 2026
@Wallgau (Collaborator) left a comment

Hey @aimurphy! I just synced with the team: @edwinjosechittilappilly is running tests to check which file types are supported. Let's hold this PR until we can give you a solid list.
cc @edwinjosechittilappilly @prasanthcaibmcom

Labels

documentation 📘 (Improvements or additions to documentation), lgtm

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: RTF file ingestion failed

3 participants