@sampaccoud
Member

@sampaccoud sampaccoud commented Aug 7, 2025

Purpose

We want to add full-text search (and semantic search in a second phase) to Docs.

The goal is to enable efficient and scalable search across document content by pushing relevant data to a dedicated search backend, such as OpenSearch. The backend should be pluggable.

Proposal

  • Add indexing logic in a search indexer that can be declared as a backend
  • Implement indexing for the Find backend. See corresponding PR in Find
  • Implement search views as a proxy
  • Implement triggers to update the search index when a document or its accesses change. Synchronization should be done asynchronously, as changing a document or its accesses affects all its descendants...
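
A minimal sketch of the pluggable backend idea in the proposal above, assuming the backend is selected by a dotted class path read from configuration. `load_indexer`, `InMemoryIndexer` and the method signatures are hypothetical illustrations and may differ from the code in this PR.

```python
from abc import ABC, abstractmethod
from importlib import import_module


class BaseDocumentIndexer(ABC):
    """Assumed interface that every search backend would implement."""

    @abstractmethod
    def index(self, documents):
        """Push a batch of documents and return how many were indexed."""

    @abstractmethod
    def search(self, text):
        """Return the ids of the documents matching the text."""


class InMemoryIndexer(BaseDocumentIndexer):
    """Toy backend used for illustration only (not the Find backend)."""

    def __init__(self):
        self._docs = {}

    def index(self, documents):
        for doc in documents:
            self._docs[doc["id"]] = doc
        return len(documents)

    def search(self, text):
        needle = text.lower()
        return [
            doc_id
            for doc_id, doc in self._docs.items()
            if needle in doc["title"].lower() or needle in doc["content"].lower()
        ]


def load_indexer(dotted_path):
    """Resolve 'package.module.ClassName' to an instance, or None when unset."""
    if not dotted_path:
        return None  # search indexing is disabled
    module_path, _, class_name = dotted_path.rpartition(".")
    return getattr(import_module(module_path), class_name)()
```

Returning `None` when no class is configured lets callers skip indexing entirely without raising an error.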

Fixes #322

@sampaccoud sampaccoud requested a review from joehybird August 7, 2025 16:40
@sampaccoud sampaccoud added the "feature" (add a new feature) and "backend" labels Aug 7, 2025
@joehybird joehybird force-pushed the index-to-search branch 3 times, most recently from 10bfd94 to 5bd6b18 Compare September 8, 2025 12:38
@gitguardian

gitguardian bot commented Sep 8, 2025

️✅ There are no secrets present in this pull request anymore.

Member

@qbey qbey left a comment

First review, I know work is still ongoing and I did not read all the tests... :)

@joehybird joehybird force-pushed the index-to-search branch 16 times, most recently from 7cfa907 to 7255ec2 Compare September 15, 2025 13:01
@joehybird joehybird force-pushed the index-to-search branch 2 times, most recently from 64b77bc to 7521e24 Compare November 4, 2025 16:28
@joehybird joehybird force-pushed the index-to-search branch 4 times, most recently from 652c868 to e9fdc43 Compare November 13, 2025 09:06
@joehybird joehybird requested a review from qbey November 13, 2025 09:47
sampaccoud and others added 20 commits November 13, 2025 14:55
Search in Docs relies on an external project like "La Suite Find".
We need to declare a common external network in order to connect to
the search app and index our documents.
We need content in our demo documents so that we can test
indexing.
Add an indexer that loops across documents in the database, formats them
as JSON objects and indexes them in the remote "Find" micro-service.
On document content or permission changes, start a Celery job that calls the
indexing API of the "Find" app.

Signed-off-by: Fabre Florian <ffabre@hybird.org>
Signed-off-by: Fabre Florian <ffabre@hybird.org>
Signed-off-by: Fabre Florian <ffabre@hybird.org>
New API view that calls the indexed documents search view
(resource server) of app "Find".

Signed-off-by: Fabre Florian <ffabre@hybird.org>
New SEARCH_INDEXER_CLASS setting to define the indexer service class.
Raise ImproperlyConfigured errors instead of RuntimeError in the index service.

Signed-off-by: Fabre Florian <ffabre@hybird.org>
Signed-off-by: Fabre Florian <ffabre@hybird.org>
Filter deleted documents from visited ones.
Set the default ordering of the Find API search call to -updated_at.
BaseDocumentIndexer.search now returns a list of document ids instead of models.
Do not call the indexer in signals when SEARCH_INDEXER_CLASS is not defined
or properly configured.

Signed-off-by: Fabre Florian <ffabre@hybird.org>
Only documents without title and content are ignored by indexer.
Add SEARCH_INDEXER_COUNTDOWN as configurable setting.
Make the search backend creation simpler (only 'get_document_indexer' now).
Allow indexation of deleted documents.

Signed-off-by: Fabre Florian <ffabre@hybird.org>
Add bin/fernetkey that generates a key for the OIDC_STORE_REFRESH_TOKEN_KEY
setting.

Signed-off-by: Fabre Florian <ffabre@hybird.org>
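
For context on the key format mentioned in the commit above: a Fernet key is 32 random bytes encoded as URL-safe base64. The sketch below produces a key in that format with the standard library only. It is an illustration and an assumption about what such a script needs to output; the actual bin/fernetkey script is not shown here, and `generate_fernet_key` is a hypothetical name.

```python
import base64
import os


def generate_fernet_key() -> str:
    """Return 32 random bytes as URL-safe base64, the format of a Fernet key."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


if __name__ == "__main__":
    # The printed value could be used for the OIDC_STORE_REFRESH_TOKEN_KEY setting.
    print(generate_fernet_key())
```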
Add nginx with 'nginx' alias to the 'lasuite-net' network (keycloak calls)
Add celery-dev to the 'lasuite-net' network (Find API calls in jobs)
Set app-dev alias as 'impress' in the 'lasuite-net' network
Add indexer configuration in common settings

Signed-off-by: Fabre Florian <ffabre@hybird.org>
Rename FindDocumentIndexer as SearchIndexer
Rename FindDocumentSerializer as SearchDocumentSerializer
Rename package core.tasks.find as core.tasks.search
Remove logs on http errors in SearchIndexer
Factorise some code in search API view.

Signed-off-by: Fabre Florian <ffabre@hybird.org>
Replace the indexer_debounce_lock|release functions with indexer_throttle_acquire().
Instead of a mutex-like mechanism, simply set a flag in the cache for a period of
time that prevents any other task from being created.

Signed-off-by: Fabre Florian <ffabre@hybird.org>
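
The throttling described in the commit above can be sketched as follows. This is a hedged illustration, not the PR's code: the signature given to `indexer_throttle_acquire`, the key format and the `TTLCache` stand-in are assumptions. In a Django project the same effect would typically rely on the cache's `add()` call, which only sets a key when it is absent.

```python
import time


class TTLCache:
    """Tiny in-memory stand-in for a shared cache offering an add() operation."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._expiry = {}

    def add(self, key, timeout):
        """Set the flag only if it is absent or expired; return True when set."""
        now = self._clock()
        expires = self._expiry.get(key)
        if expires is not None and expires > now:
            return False  # flag still active: the caller is throttled
        self._expiry[key] = now + timeout
        return True


def indexer_throttle_acquire(cache, key, timeout=1.0):
    """Return True if the caller may schedule an indexing task (hypothetical signature)."""
    return cache.add(f"indexer-throttle-{key}", timeout)
```

Unlike a lock, nothing has to be released: the flag simply expires, so a crashed task cannot leave indexing blocked.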
Keep the ordering by score from the Find API on search/ results; the
fallback search still uses "-updated_at" ordering by default.

Refactor pagination to work with a list instead of a queryset

Signed-off-by: Fabre Florian <ffabre@hybird.org>
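
A rough sketch of why the commit above paginates a list rather than a queryset: once the search service returns ids ranked by score, the database rows have to be re-ordered to follow that ranking, which yields a plain list. The helper names and the dictionary shape are hypothetical and only illustrate the idea.

```python
def order_by_search_rank(ranked_ids, documents):
    """Order documents to follow ranked_ids, dropping ids with no matching row."""
    by_id = {doc["id"]: doc for doc in documents}
    return [by_id[doc_id] for doc_id in ranked_ids if doc_id in by_id]


def paginate(items, page, page_size):
    """Return one page of a list together with the total item count."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    start = (page - 1) * page_size
    return {"count": len(items), "results": items[start:start + page_size]}
```

Ids returned by the search service but missing from the database (for example, documents the user can no longer see) are silently skipped in this sketch.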
Set SEARCH_INDEXER_CLASS=None as default configuration for dev.
Rename docker network 'lasuite-net' as 'lasuite' to match with Drive
configuration.

Signed-off-by: Fabre Florian <ffabre@hybird.org>
Add documentation for env & Find+Docs configuration in dev mode

Signed-off-by: Fabre Florian <ffabre@hybird.org>
Reduce the number of Find API calls by grouping all the latest changes
for indexing: send all the documents updated or deleted since the
task was triggered.

Signed-off-by: Fabre Florian <ffabre@hybird.org>
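
The grouping described in the commit above can be pictured as a single selection run when the task executes. The sketch below is an assumption about the approach, with hypothetical field names (`updated_at`, `deleted_at`) and a hypothetical helper name; the PR's actual query is not shown here.

```python
from datetime import datetime, timedelta, timezone


def documents_to_reindex(documents, since):
    """Select documents updated or deleted at or after the given timestamp."""
    selected = []
    for doc in documents:
        deleted_at = doc.get("deleted_at")
        if doc["updated_at"] >= since or (deleted_at is not None and deleted_at >= since):
            selected.append(doc)
    return selected
```

Sending this whole batch in one call means several quick edits cost one request instead of one request per change.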

Labels

backend, feature (add a new feature)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Full-Blown search feature

5 participants