19 commits
272d396
🔧(compose) configure external network for communication with search
sampaccoud Jul 23, 2025
f3b2907
✨(backend) add dummy content to demo documents
sampaccoud Aug 6, 2025
bc48ec9
✨(backend) add document search indexer
sampaccoud Jul 24, 2025
7728b56
✨(backend) add async triggers to enable document indexation with find
sampaccoud Aug 6, 2025
04cd06a
🔧(compose) Add some ignore for docker-compose local overrides
joehybird Aug 13, 2025
da2f360
✨(backend) add unit test for the 'index' command
joehybird Aug 13, 2025
409918e
✨(backend) add document search view
joehybird Aug 13, 2025
43f3e50
✨(backend) improve search indexer service configuration
joehybird Sep 11, 2025
9554640
✨(backend) refactor indexation signals and fix circular import issues
joehybird Sep 12, 2025
8abe74f
✨(backend) add fallback search & default ordering
joehybird Sep 17, 2025
65d4d75
✨(backend) Index partially empty documents
joehybird Sep 22, 2025
2af7937
✨(backend) Index deleted documents
joehybird Sep 24, 2025
e1fdace
🔧(backend) force a valid key for token storage in development mode
joehybird Oct 1, 2025
f44a7d0
🔧(backend) setup Docs app dockers to work with Find
joehybird Oct 6, 2025
c71c0d7
🔧(backend) force a valid key for token storage in development mode
joehybird Oct 7, 2025
7aa725a
✨(backend) some refactor of indexer classes & modules
joehybird Oct 7, 2025
5617372
✨(backend) throttle indexation tasks instead of debounce (simplier)
joehybird Oct 14, 2025
7fd532d
✨(backend) keep ordering from fulltext search in results
joehybird Oct 31, 2025
b1475df
WIP 💩(front) hack to use the fulltext search api
joehybird Oct 8, 2025
5 changes: 5 additions & 0 deletions .gitguardian.yaml
@@ -0,0 +1,5 @@
secret:
  ignored_matches:
The ignored value contains a 44‐character key; storing static secrets in configuration is risky and may be better done via environment variables.

    - name:
      match: "na1hhus-OLhq9mb9SO3R-8E4dONuMnqpZSY_SX8xcFk="
version: 2
4 changes: 4 additions & 0 deletions .gitignore
@@ -43,6 +43,10 @@ venv.bak/
env.d/development/*.local
env.d/terraform

# Docker
compose.override.yml
Adding "docker/auth/*.local" could inadvertently ignore local authentication files for all Docker builds; confirm this is intended.

docker/auth/*.local

# npm
node_modules

3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -74,6 +74,9 @@ and this project adheres to
- ♿ update labels and shared document icon accessibility #1442
- 🍱(frontend) Fonts GDPR compliants #1453
- ♻️(service-worker) improve SW registration and update handling #1473
- ✨(backend) add async indexation of documents on save (or access save) #1276
- ✨(backend) add debounce mechanism to limit indexation jobs #1276
The same ticket number suitenumerique#1276 appears twice in the added feature list; duplicates can confuse readers and should be consolidated.

- ✨(api) add API route to search for indexed documents in Find #1276

### Fixed

4 changes: 4 additions & 0 deletions Makefile
@@ -247,6 +247,10 @@ demo: ## flush db then create a demo for load testing purpose
	@$(MANAGE) create_demo
.PHONY: demo

index: ## index all documents to remote search
	@$(MANAGE) index
The new index target references $(MANAGE) index but no phony dependency is added for 'demo', potentially causing unintended dependency ordering.

.PHONY: index

# Nota bene: Black should come after isort just in case they don't agree...
lint: ## lint back-end python sources
lint: \
6 changes: 6 additions & 0 deletions bin/fernetkey
@@ -0,0 +1,6 @@
#!/usr/bin/env bash

The generated key is output with an extra newline; while harmless, it may cause downstream parsing errors if the consumer expects a plain key.

# shellcheck source=bin/_config.sh
source "$(dirname "${BASH_SOURCE[0]}")/_config.sh"

_dc_run app-dev python -c 'from cryptography.fernet import Fernet;import sys; sys.stdout.write("\n" + Fernet.generate_key().decode() + "\n");'
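The `env.d/development/common` change in this PR requires `OIDC_STORE_REFRESH_TOKEN_KEY` to be "a valid Fernet key (32 url-safe base64-encoded bytes)". As a rough sketch of what that constraint means, the standard library alone can produce and check a value of that shape. The helper names `make_key` and `looks_like_fernet_key` are hypothetical and not part of this PR; the script above relies on `Fernet.generate_key()` from the `cryptography` package instead.

```python
import base64
import binascii
import os


def make_key() -> str:
    """Return 32 random bytes encoded as url-safe base64 (44 characters)."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode()


def looks_like_fernet_key(value: str) -> bool:
    """Check that `value` decodes from url-safe base64 to exactly 32 bytes."""
    try:
        raw = base64.urlsafe_b64decode(value.encode())
    except (binascii.Error, ValueError):
        # Wrong padding or otherwise undecodable input.
        return False
    return len(raw) == 32


if __name__ == "__main__":
    key = make_key()
    print(key, looks_like_fernet_key(key))
```

This only checks the shape of the key. It says nothing about whether the value is secret, so a key committed to a repository should still be treated as a development-only placeholder.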
18 changes: 18 additions & 0 deletions compose.yml
@@ -72,6 +72,11 @@ services:
      - env.d/development/postgresql.local
    ports:
      - "8071:8000"
    networks:
      default: {}
Network alias "impress" defined for the backend service may conflict with existing aliases; verify that no other service uses the same alias to avoid name clashes.

      lasuite-net:
        aliases:
          - impress
    volumes:
      - ./src/backend:/app
      - ./data/static:/data/static
@@ -92,6 +97,9 @@ services:
    command: ["celery", "-A", "impress.celery_app", "worker", "-l", "DEBUG"]
    environment:
      - DJANGO_CONFIGURATION=Development
    networks:
      - default
      - lasuite-net
    env_file:
      - env.d/development/common
      - env.d/development/common.local
@@ -107,6 +115,11 @@ services:
    image: nginx:1.25
    ports:
      - "8083:8083"
    networks:
      default: {}
      lasuite-net:
        aliases:
          - nginx
    volumes:
      - ./docker/files/etc/nginx/conf.d:/etc/nginx/conf.d:ro
    depends_on:
@@ -217,3 +230,8 @@ services:
      kc_postgresql:
        condition: service_healthy
        restart: true

networks:
  lasuite-net:
    name: lasuite-net
    driver: bridge
16 changes: 15 additions & 1 deletion env.d/development/common
@@ -49,6 +49,14 @@ LOGOUT_REDIRECT_URL=http://localhost:3000
OIDC_REDIRECT_ALLOWED_HOSTS=["http://localhost:8083", "http://localhost:3000"]
OIDC_AUTH_REQUEST_EXTRA_PARAMS={"acr_values": "eidas1"}

# Store OIDC tokens in the session
OIDC_STORE_ACCESS_TOKEN = True
The placeholder value for OIDC_STORE_REFRESH_TOKEN_KEY looks like a toy 44‐character string; replace it with a real Fernet key before deployment.

OIDC_STORE_REFRESH_TOKEN = True # Store the encrypted refresh token in the session.

# Must be a valid Fernet key (32 url-safe base64-encoded bytes)
# To create one, use the bin/fernetkey command.
OIDC_STORE_REFRESH_TOKEN_KEY = "na1hhus-OLhq9mb9SO3R-8E4dONuMnqpZSY_SX8xcFk="

# AI
AI_FEATURE_ENABLED=true
AI_BASE_URL=https://openaiendpoint.com
@@ -68,4 +76,10 @@ Y_PROVIDER_API_BASE_URL=http://y-provider-development:4444/api/
Y_PROVIDER_API_KEY=yprovider-api-key

# Theme customization
THEME_CUSTOMIZATION_CACHE_TIMEOUT=15

# Indexer
SEARCH_INDEXER_CLASS="core.services.search_indexers.SearchIndexer"
SEARCH_INDEXER_SECRET=find-api-key-for-docs-with-exactly-50-chars-length # Key generated by create_demo in Find app.
SEARCH_INDEXER_URL="http://find:8000/api/v1.0/documents/index/"
SEARCH_INDEXER_QUERY_URL="http://find:8000/api/v1.0/documents/search/"
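`SEARCH_INDEXER_CLASS` holds a dotted path, and both the search view and the `index` management command in this PR treat a falsy result from `get_document_indexer()` as "indexer disabled". The body of `get_document_indexer` is not shown in this diff, so the following is only a sketch of how such a factory might resolve the setting with `importlib`. The function name `load_indexer`, its `settings` dict argument, and the choice to return `None` on a bad path are all assumptions.

```python
import importlib


def load_indexer(settings: dict):
    """Return an instance of the class named by SEARCH_INDEXER_CLASS, or None."""
    dotted_path = settings.get("SEARCH_INDEXER_CLASS")
    if not dotted_path:
        # Setting absent or empty: callers fall back to the title search.
        return None

    module_path, _, class_name = dotted_path.rpartition(".")
    try:
        module = importlib.import_module(module_path)
        indexer_class = getattr(module, class_name)
    except (ImportError, AttributeError, ValueError):
        # Misconfigured path: treated as "indexer disabled" in this sketch.
        return None

    return indexer_class()


if __name__ == "__main__":
    # A standard-library class stands in for a real indexer class here.
    print(type(load_indexer({"SEARCH_INDEXER_CLASS": "collections.OrderedDict"})))
    print(load_indexer({}))
```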
10 changes: 10 additions & 0 deletions src/backend/core/api/serializers.py
@@ -889,3 +889,13 @@ class MoveDocumentSerializer(serializers.Serializer):
        choices=enums.MoveNodePositionChoices.choices,
        default=enums.MoveNodePositionChoices.LAST_CHILD,
    )


The string literal for the page field ends with an unmatched single quote, causing a syntax error that will prevent the module from loading.

class SearchDocumentSerializer(serializers.Serializer):
    """Serializer for fulltext search requests through Find application"""

    q = serializers.CharField(required=True, allow_blank=False, trim_whitespace=True)
    page_size = serializers.IntegerField(
        required=False, min_value=1, max_value=50, default=20
    )
    page = serializers.IntegerField(required=False, min_value=1, default=1)
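To make the constraints declared on `SearchDocumentSerializer` concrete, here is a plain-Python stand-in that applies the same rules to a dict of query parameters. It is a sketch, not the DRF implementation: `validate_search_params` is a hypothetical name, and it raises `ValueError` where DRF would raise a `ValidationError` and answer with HTTP 400.

```python
def validate_search_params(query: dict) -> dict:
    """Apply the same limits as SearchDocumentSerializer to raw query params."""
    q = str(query.get("q", "")).strip()  # trim_whitespace=True
    if not q:  # required=True, allow_blank=False
        raise ValueError("q: this field is required and may not be blank")

    page_size = int(query.get("page_size", 20))  # default=20
    if not 1 <= page_size <= 50:  # min_value=1, max_value=50
        raise ValueError("page_size: must be between 1 and 50")

    page = int(query.get("page", 1))  # default=1
    if page < 1:  # min_value=1
        raise ValueError("page: must be at least 1")

    return {"q": q, "page_size": page_size, "page": page}


if __name__ == "__main__":
    print(validate_search_params({"q": "  budget report ", "page": "2"}))
```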
117 changes: 117 additions & 0 deletions src/backend/core/api/viewsets.py
@@ -14,13 +14,15 @@
from django.core.cache import cache
from django.core.exceptions import ValidationError
from django.core.files.storage import default_storage
from django.core.paginator import InvalidPage, Paginator
from django.core.validators import URLValidator
search_serializer_class is set to ListDocumentSerializer, but the new SearchDocumentSerializer was added; this mismatch will make search responses use the wrong serializer.

from django.db import connection, transaction
from django.db import models as db
from django.db.models.expressions import RawSQL
from django.db.models.functions import Left, Length
from django.http import Http404, StreamingHttpResponse
from django.urls import reverse
from django.utils.decorators import method_decorator
from django.utils.functional import cached_property
from django.utils.text import capfirst, slugify
from django.utils.translation import gettext_lazy as _
@@ -31,9 +33,11 @@
from csp.constants import NONE
from csp.decorators import csp_update
from lasuite.malware_detection import malware_detection
from lasuite.oidc_login.decorators import refresh_oidc_access_token
from rest_framework import filters, status, viewsets
from rest_framework import response as drf_response
from rest_framework.permissions import AllowAny
from rest_framework.utils.urls import replace_query_param as drf_replace_query_param

from core import authentication, choices, enums, models
from core.services.ai_services import AIService
@@ -47,6 +51,10 @@
from core.services.converter_services import (
    YdocConverter,
)
from core.services.search_indexers import (
    get_document_indexer,
    get_visited_document_ids_of,
)
from core.tasks.mail import send_ask_for_access_mail
from core.utils import extract_attachments, filter_descendants

@@ -373,6 +381,7 @@ class DocumentViewSet(
    list_serializer_class = serializers.ListDocumentSerializer
    trashbin_serializer_class = serializers.ListDocumentSerializer
    tree_serializer_class = serializers.ListDocumentSerializer
    search_serializer_class = serializers.ListDocumentSerializer

    def get_queryset(self):
        """Get queryset performing all annotation and filtering on the document tree structure."""
@@ -1064,6 +1073,114 @@ def duplicate(self, request, *args, **kwargs):
            {"id": str(duplicated_document.id)}, status=status.HTTP_201_CREATED
        )

    def _search_simple(self, request, text):
        """
        Returns a queryset filtered by the content of the document title
        """
        # As the 'list' view we get a prefiltered queryset (deleted docs are excluded)
        queryset = self.get_queryset()
        filterset = DocumentFilter({"title": text}, queryset=queryset)

        if not filterset.is_valid():
            raise drf.exceptions.ValidationError(filterset.errors)

        queryset = filterset.filter_queryset(queryset)

        return self.get_response_for_queryset(
            queryset.order_by("-updated_at"),
            context={
                "request": request,
            },
        )

    def _search_fulltext(self, indexer, request, params):
        """
        Returns a queryset from the results the fulltext search of Find
        """
        access_token = request.session.get("oidc_access_token")
        user = request.user
        text = params.validated_data["q"]
        page_size = params.validated_data.get("page_size", 20)
        page_number = params.validated_data.get("page", 1)
        queryset = models.Document.objects.all()

        # Retrieve the documents ids from Find.
        results = indexer.search(
            text=text,
            token=access_token,
            visited=get_visited_document_ids_of(queryset, user),
            page=1,
            page_size=min(200, (page_size * page_number) + 1),
        )

        docs_by_uuid = {str(d.pk): d for d in queryset.filter(pk__in=results)}
        ordered_docs = [docs_by_uuid[id] for id in results]

        paginator = Paginator(
            ordered_docs, per_page=page_size, allow_empty_first_page=True
        )

        try:
            page = paginator.page(page_number)
        except InvalidPage as e:
            raise drf.exceptions.NotFound(_("Invalid page.")) from e

        serializer = self.get_serializer(
            page.object_list,
            many=True,
            context={
                "request": request,
            },
        )
        next_url, prev_url = None, None

        if page.has_next():
            next_url = request.build_absolute_uri()
            next_url = drf_replace_query_param(
                next_url, "page", page.next_page_number()
            )

        if page.has_previous():
            prev_url = request.build_absolute_uri()
            prev_url = drf_replace_query_param(
                prev_url, "page", page.previous_page_number()
            )

        output = {
            "count": paginator.count,
            "next": next_url,
            "previous": prev_url,
            "results": serializer.data,
        }

        return drf.response.Response(output)
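The fetch-then-paginate arithmetic in `_search_fulltext` can be followed more easily outside Django. The sketch below reproduces two steps from the method: asking the search service for `page_size * page_number + 1` hits (capped at 200) so that the extra hit reveals whether a next page exists, and reordering database rows to match the rank returned by the search. `fetch_size` and `page_of` are hypothetical helper names. Skipping ids that are missing from the database is a deliberate difference from the view, which indexes `docs_by_uuid[id]` directly.

```python
def fetch_size(page_size: int, page_number: int, cap: int = 200) -> int:
    """Number of hits requested from the search service, as computed in the view."""
    return min(cap, page_size * page_number + 1)


def page_of(ranked_ids: list, rows_by_id: dict, page_size: int, page_number: int):
    """Reorder database rows by search rank, then slice out one page."""
    # A database query returns rows in arbitrary order, so the ranked id list
    # drives the final ordering. Unknown ids are skipped (sketch-only choice).
    ordered = [rows_by_id[i] for i in ranked_ids if i in rows_by_id]
    start = (page_number - 1) * page_size
    items = ordered[start : start + page_size]
    has_next = len(ordered) > start + page_size
    return items, has_next


if __name__ == "__main__":
    rows = {"a": "Doc A", "b": "Doc B", "c": "Doc C"}
    print(fetch_size(20, 3))
    print(page_of(["c", "a", "b"], rows, page_size=2, page_number=1))
```

Two consequences follow from the code as shown in this diff. The `count` field reports the number of hits fetched for this request rather than the total number of matches in the index. Once `page_size * page_number` reaches the cap of 200, no extra hit is fetched and no further page is advertised.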

    @drf.decorators.action(detail=False, methods=["get"], url_path="search")
    @method_decorator(refresh_oidc_access_token)
    def search(self, request, *args, **kwargs):
        """
        Returns a DRF response containing the filtered, annotated and ordered document list.

        Applies filtering based on request parameter 'q' from `SearchDocumentSerializer`.
        Depending of the configuration it can be:
          - A fulltext search through the opensearch indexation app "find" if the backend is
          enabled (see SEARCH_INDEXER_CLASS)
          - A filtering by the model field 'title'.

        The ordering is always by the most recent first.
        """
        params = serializers.SearchDocumentSerializer(data=request.query_params)
        params.is_valid(raise_exception=True)

        indexer = get_document_indexer()

        if indexer:
            return self._search_fulltext(indexer, request, params=params)

        # The indexer is not configured, we fallback on a simple icontains filter by the
        # model field 'title'.
        return self._search_simple(request, text=params.validated_data["q"])
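The dispatch in `search` comes down to "use the fulltext indexer if one is configured, otherwise filter titles". A minimal sketch over plain dicts illustrates the fallback path. It assumes, following the code comment above, that the title filter behaves like a case-insensitive substring match, and that results are ordered by `updated_at` descending as in `_search_simple`. The names `search_titles` and `search_documents`, and the use of a plain callable as the indexer, are hypothetical.

```python
def search_titles(docs: list, text: str) -> list:
    """Fallback: case-insensitive title match, most recently updated first."""
    needle = text.lower()
    hits = [d for d in docs if needle in d["title"].lower()]
    return sorted(hits, key=lambda d: d["updated_at"], reverse=True)


def search_documents(docs: list, text: str, indexer=None) -> list:
    """Use the fulltext indexer when one is configured, else the title filter."""
    if indexer:
        return indexer(text)
    return search_titles(docs, text)


if __name__ == "__main__":
    docs = [
        {"title": "Docs roadmap", "updated_at": 1},
        {"title": "Meeting notes", "updated_at": 3},
        {"title": "More DOCS ideas", "updated_at": 2},
    ]
    print([d["title"] for d in search_documents(docs, "docs")])
```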

    @drf.decorators.action(detail=True, methods=["get"], url_path="versions")
    def versions_list(self, request, *args, **kwargs):
        """
22 changes: 15 additions & 7 deletions src/backend/core/apps.py
@@ -1,11 +1,19 @@
"""Impress Core application"""
# from django.apps import AppConfig
# from django.utils.translation import gettext_lazy as _

from django.apps import AppConfig
from django.utils.translation import gettext_lazy as _

# class CoreConfig(AppConfig):
# """Configuration class for the impress core app."""

# name = "core"
# app_label = "core"
# verbose_name = _("impress core application")
class CoreConfig(AppConfig):
    """Configuration class for the impress core app."""

    name = "core"
    app_label = "core"
    verbose_name = _("Impress core application")

    def ready(self):
        """
        Import signals when the app is ready.
        """
        # pylint: disable=import-outside-toplevel, unused-import
        from . import signals  # noqa: PLC0415
40 changes: 40 additions & 0 deletions src/backend/core/management/commands/index.py
@@ -0,0 +1,40 @@
"""
Handle search setup that needs to be done at bootstrap time.
"""

import logging
import time

from django.core.management.base import BaseCommand, CommandError

from core.services.search_indexers import get_document_indexer

logger = logging.getLogger("docs.search.bootstrap_search")


class Command(BaseCommand):
    """Index all documents to remote search service"""

    help = __doc__

    def handle(self, *args, **options):
        """Launch and log search index generation."""
        indexer = get_document_indexer()

        if not indexer:
            raise CommandError("The indexer is not enabled or properly configured.")

        logger.info("Starting to regenerate Find index...")
        start = time.perf_counter()

        try:
            count = indexer.index()
        except Exception as err:
            raise CommandError("Unable to regenerate index") from err

        duration = time.perf_counter() - start
        logger.info(
            "Search index regenerated from %d document(s) in %.2f seconds.",
            count,
            duration,
        )
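The control flow of the command (guard on a missing indexer, time the call, wrap any failure) can be exercised without Django. In this sketch `IndexingError` stands in for `CommandError`, and `FakeIndexer` is a hypothetical indexer whose `index()` returns the number of documents sent. The real indexer class is not shown in this part of the diff.

```python
import time


class IndexingError(Exception):
    """Stand-in for django.core.management.base.CommandError."""


class FakeIndexer:
    """Hypothetical indexer: index() returns how many documents were sent."""

    def __init__(self, documents):
        self.documents = documents

    def index(self):
        return len(self.documents)


def run_index(indexer):
    """Mirror the command's flow: guard, time the call, wrap failures."""
    if not indexer:
        raise IndexingError("The indexer is not enabled or properly configured.")

    start = time.perf_counter()
    try:
        count = indexer.index()
    except Exception as err:
        # Chain the original error so the traceback keeps the root cause.
        raise IndexingError("Unable to regenerate index") from err

    return count, time.perf_counter() - start


if __name__ == "__main__":
    count, duration = run_index(FakeIndexer(["doc-1", "doc-2"]))
    print(f"Indexed {count} document(s) in {duration:.2f} seconds.")
```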