22 commits
fd49f28
🔧(compose) configure external network for communication with search
sampaccoud Jul 23, 2025
254a541
✨(backend) add dummy content to demo documents
sampaccoud Aug 6, 2025
cb77f68
✨(backend) add document search indexer
sampaccoud Jul 24, 2025
1a9011f
✨(backend) add async triggers to enable document indexation with find
sampaccoud Aug 6, 2025
0145a05
🔧(compose) Add some ignore for docker-compose local overrides
joehybird Aug 13, 2025
47fd1df
✨(backend) add unit test for the 'index' command
joehybird Aug 13, 2025
1b36f6b
✨(backend) add document search view
joehybird Aug 13, 2025
ff08b3e
✨(backend) improve search indexer service configuration
joehybird Sep 11, 2025
f49902a
✨(backend) refactor indexation signals and fix circular import issues
joehybird Sep 12, 2025
9525b97
✨(backend) add fallback search & default ordering
joehybird Sep 17, 2025
57a6f73
✨(backend) Index partially empty documents
joehybird Sep 22, 2025
ad92bf6
✨(backend) Index deleted documents
joehybird Sep 24, 2025
f9c0a2b
🔧(backend) tool for valid fernet key used in OIDC token storage
joehybird Oct 1, 2025
bc89551
🔧(backend) setup Docs app dockers to work with Find
joehybird Oct 6, 2025
33f1555
✨(backend) some refactor of indexer classes & modules
joehybird Oct 7, 2025
337129c
✨(backend) throttle indexation tasks instead of debounce (simplier)
joehybird Oct 14, 2025
4974b5c
✨(backend) keep ordering from fulltext search in results
joehybird Oct 31, 2025
78e2bc6
🔧(compose) disable indexer in default configuration
joehybird Nov 3, 2025
a83f140
📝(backend) add fulltext search documentation
joehybird Oct 3, 2025
5427f18
✨(backend) use batches in indexing task
joehybird Oct 31, 2025
3652732
🩹(backend) fix empty indexation batch
joehybird Nov 14, 2025
f2106dd
✨(backend) adapt to Find new search pagination
joehybird Nov 14, 2025
4 changes: 4 additions & 0 deletions .gitignore
@@ -43,6 +43,10 @@ venv.bak/
env.d/development/*.local
env.d/terraform

# Docker
compose.override.yml
docker/auth/*.local

# npm
node_modules

3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -97,6 +97,9 @@ and this project adheres to
- ♿ update labels and shared document icon accessibility #1442
- 🍱(frontend) Fonts GDPR compliants #1453
- ♻️(service-worker) improve SW registration and update handling #1473
- ✨(backend) add async indexation of documents on save (or access save) #1276
- ✨(backend) add debounce mechanism to limit indexation jobs #1276
- ✨(api) add API route to search for indexed documents in Find #1276

### Fixed

4 changes: 4 additions & 0 deletions Makefile
@@ -247,6 +247,10 @@ demo: ## flush db then create a demo for load testing purpose
	@$(MANAGE) create_demo
.PHONY: demo

index: ## index all documents to remote search
	@$(MANAGE) index
.PHONY: index

# Nota bene: Black should come after isort just in case they don't agree...
lint: ## lint back-end python sources
lint: \
6 changes: 6 additions & 0 deletions bin/fernetkey
@@ -0,0 +1,6 @@
#!/usr/bin/env bash

# shellcheck source=bin/_config.sh
source "$(dirname "${BASH_SOURCE[0]}")/_config.sh"

_dc_run app-dev python -c 'from cryptography.fernet import Fernet;import sys; sys.stdout.write("\n" + Fernet.generate_key().decode() + "\n");'
18 changes: 18 additions & 0 deletions compose.yml
@@ -72,6 +72,11 @@ services:
      - env.d/development/postgresql.local
    ports:
      - "8071:8000"
    networks:
      default: {}
      lasuite:
        aliases:
          - impress
    volumes:
      - ./src/backend:/app
      - ./data/static:/data/static
@@ -92,6 +97,9 @@ services:
    command: ["celery", "-A", "impress.celery_app", "worker", "-l", "DEBUG"]
    environment:
      - DJANGO_CONFIGURATION=Development
    networks:
      - default
      - lasuite
    env_file:
      - env.d/development/common
      - env.d/development/common.local
@@ -107,6 +115,11 @@ services:
    image: nginx:1.25
    ports:
      - "8083:8083"
    networks:
      default: {}
      lasuite:
        aliases:
          - nginx
    volumes:
      - ./docker/files/etc/nginx/conf.d:/etc/nginx/conf.d:ro
    depends_on:
@@ -217,3 +230,8 @@ services:
      kc_postgresql:
        condition: service_healthy
        restart: true

networks:
  lasuite:
    name: lasuite-network
    driver: bridge
1 change: 1 addition & 0 deletions docs/architecture.md
@@ -12,6 +12,7 @@ flowchart TD
Back --> DB("Database (PostgreSQL)")
Back <--> Celery --> DB
Back ----> S3("Minio (S3)")
Back -- REST API --> Find
```

### Architecture decision records
7 changes: 7 additions & 0 deletions docs/env.md
@@ -93,6 +93,13 @@ These are the environment variables you can set for the `impress-backend` contai
| OIDC_USERINFO_SHORTNAME_FIELD | OIDC token claims to create shortname | first_name |
| POSTHOG_KEY | Posthog key for analytics | |
| REDIS_URL | Cache url | redis://redis:6379/1 |
| SEARCH_INDEXER_CLASS | Class of the backend for document indexation & search | |
| SEARCH_INDEXER_BATCH_SIZE | Size of each batch for indexation of all documents | 100000 |
| SEARCH_INDEXER_COUNTDOWN | Minimum debounce delay of indexation jobs (in seconds) | 1 |
| SEARCH_INDEXER_URL | Find application endpoint for indexation | |
| SEARCH_INDEXER_SECRET | Token for indexation queries | |
| SEARCH_INDEXER_QUERY_URL | Find application endpoint for search | |
| SEARCH_INDEXER_QUERY_LIMIT | Maximum number of results expected from search endpoint | 50 |
| SENTRY_DSN | Sentry host | |
| SESSION_COOKIE_AGE | duration of the cookie session | 60*60*12 |
| SPECTACULAR_SETTINGS_ENABLE_DJANGO_DEPLOY_CHECK | | false |
41 changes: 41 additions & 0 deletions docs/search.md
@@ -0,0 +1,41 @@
# Set up Find search for Impress

This configuration enables the fulltext search feature for Docs:
- Each save on **core.Document** or **core.DocumentAccess** triggers the indexer.
- The `api/v1.0/documents/search/` endpoint acts as a proxy to the Find API for fulltext search.
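
As a rough illustration of how a client might call this route, the sketch below builds a search URL. `build_search_url` is a hypothetical helper, and the host `http://localhost:8071` is an assumption based on the development port mapping; the `q`, `page` and `page_size` parameters mirror the fields of `SearchDocumentSerializer`.

```python
from urllib.parse import urlencode


def build_search_url(base_url, query, page=1, page_size=20):
    """Build the URL of the Docs search route (hypothetical helper)."""
    params = urlencode({"q": query, "page": page, "page_size": page_size})
    return f"{base_url.rstrip('/')}/api/v1.0/documents/search/?{params}"


# Assumed development host; adjust it to your deployment.
print(build_search_url("http://localhost:8071", "meeting notes"))
```

The request still has to carry the user's session, since the route relies on the OIDC access token stored there.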

## Create an index service for Docs

Configure a **Service** for the Docs application with these settings:

- **Name**: `docs`<br>_request.auth.name of the Docs application._
- **Client id**: `impress`<br>_Name of the token audience or client_id of the Docs application._

See [how-to-use-indexer.md](how-to-use-indexer.md) for details.

## Configure settings of Docs

Add these Django settings to the Docs application to enable the feature.

```shell
SEARCH_INDEXER_CLASS="core.services.search_indexers.FindDocumentIndexer"
SEARCH_INDEXER_COUNTDOWN=10 # Debounce delay in seconds for the indexer calls.

# The token from service "docs" of Find application (development).
SEARCH_INDEXER_SECRET="find-api-key-for-docs-with-exactly-50-chars-length"
SEARCH_INDEXER_URL="http://find:8000/api/v1.0/documents/index/"

# Search endpoint. Uses the OIDC token for authentication
SEARCH_INDEXER_QUERY_URL="http://find:8000/api/v1.0/documents/search/"
# Maximum number of results expected from the search endpoint
SEARCH_INDEXER_QUERY_LIMIT=50
```
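
The commit history mentions throttling indexation tasks rather than debouncing them. The following is a minimal in-memory sketch of that idea under stated assumptions, not the implementation from this change: `IndexationThrottle` and `should_schedule` are hypothetical names, and a real deployment would more likely keep the pending flag in a shared cache than in a dict.

```python
import time


class IndexationThrottle:
    """Allow at most one scheduled indexation job per document per window."""

    def __init__(self, countdown, clock=time.monotonic):
        self.countdown = countdown  # seconds, in the spirit of SEARCH_INDEXER_COUNTDOWN
        self.clock = clock  # injectable so the behaviour can be tested
        self._pending_until = {}  # document id -> end of the current window

    def should_schedule(self, document_id):
        """Return True if a job must be scheduled, False if one is already pending."""
        now = self.clock()
        if self._pending_until.get(document_id, float("-inf")) > now:
            # A job is already pending and will index the latest state.
            return False
        self._pending_until[document_id] = now + self.countdown
        return True


# Usage with a fake clock (hypothetical document id).
fake_now = [0.0]
throttle = IndexationThrottle(countdown=10, clock=lambda: fake_now[0])
print(throttle.should_schedule("doc-1"))  # True: first save schedules a job
fake_now[0] = 5.0
print(throttle.should_schedule("doc-1"))  # False: still inside the window
```

Unlike a debounce, the window is not extended by later saves, so a document that is edited continuously is still indexed once per window.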

Also enable **OIDC token** refresh; otherwise authentication against Find fails as soon as the access token expires.

```shell
# Store OIDC tokens in the session
OIDC_STORE_ACCESS_TOKEN = True # Store the access token in the session
OIDC_STORE_REFRESH_TOKEN = True # Store the encrypted refresh token in the session
OIDC_STORE_REFRESH_TOKEN_KEY = "your-32-byte-encryption-key==" # Must be a valid Fernet key (32 url-safe base64-encoded bytes)
```
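
A Fernet key is 32 random bytes encoded as URL-safe base64. The sketch below generates one and performs a basic format check using only the standard library; `generate_fernet_key` and `is_valid_fernet_key` are hypothetical helper names, and in practice `Fernet.generate_key()` from the `cryptography` package (as used by `bin/fernetkey`) is the simpler route.

```python
import base64
import binascii
import os


def generate_fernet_key():
    """Return 32 random bytes encoded as URL-safe base64."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode()


def is_valid_fernet_key(key):
    """Basic check: the key must decode to exactly 32 bytes."""
    try:
        return len(base64.urlsafe_b64decode(key.encode())) == 32
    except (binascii.Error, ValueError):
        return False


key = generate_fernet_key()
print(len(key), is_valid_fernet_key(key))
```

The placeholder value shown in the settings above does not pass this check and has to be replaced with a generated key.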
11 changes: 11 additions & 0 deletions docs/system-requirements.md
@@ -97,6 +97,17 @@ Production deployments differ significantly from development environments. The t
| 5433 | PostgreSQL (Keycloak) |
| 1081 | MailCatcher |

**With fulltext search service**

| Port | Service |
| --------- | --------------------- |
| 8081 | Find (Django) |
| 9200 | Opensearch |
| 9600 | Opensearch admin |
| 5601 | Opensearch dashboard |
| 25432 | PostgreSQL (Find) |


## 6. Sizing Guidelines

**RAM** – start at 8 GB dev / 16 GB staging / 32 GB prod. Postgres and Keycloak are the first to OOM; scale them first.
17 changes: 16 additions & 1 deletion env.d/development/common
@@ -36,6 +36,7 @@ OIDC_OP_JWKS_ENDPOINT=http://nginx:8083/realms/impress/protocol/openid-connect/c
OIDC_OP_AUTHORIZATION_ENDPOINT=http://localhost:8083/realms/impress/protocol/openid-connect/auth
OIDC_OP_TOKEN_ENDPOINT=http://nginx:8083/realms/impress/protocol/openid-connect/token
OIDC_OP_USER_ENDPOINT=http://nginx:8083/realms/impress/protocol/openid-connect/userinfo
OIDC_OP_INTROSPECTION_ENDPOINT=http://nginx:8083/realms/impress/protocol/openid-connect/token/introspect

OIDC_RP_CLIENT_ID=impress
OIDC_RP_CLIENT_SECRET=ThisIsAnExampleKeyForDevPurposeOnly
@@ -49,6 +50,14 @@ LOGOUT_REDIRECT_URL=http://localhost:3000
OIDC_REDIRECT_ALLOWED_HOSTS=["http://localhost:8083", "http://localhost:3000"]
OIDC_AUTH_REQUEST_EXTRA_PARAMS={"acr_values": "eidas1"}

# Store OIDC tokens in the session
OIDC_STORE_ACCESS_TOKEN = True
OIDC_STORE_REFRESH_TOKEN = True # Store the encrypted refresh token in the session.

# Must be a valid Fernet key (32 url-safe base64-encoded bytes)
# To create one, use the bin/fernetkey command.
# OIDC_STORE_REFRESH_TOKEN_KEY="your-32-byte-encryption-key=="

# AI
AI_FEATURE_ENABLED=true
AI_BASE_URL=https://openaiendpoint.com
@@ -68,4 +77,10 @@ Y_PROVIDER_API_BASE_URL=http://y-provider-development:4444/api/
Y_PROVIDER_API_KEY=yprovider-api-key

# Theme customization
THEME_CUSTOMIZATION_CACHE_TIMEOUT=15
THEME_CUSTOMIZATION_CACHE_TIMEOUT=15

# Indexer (disabled)
# SEARCH_INDEXER_CLASS="core.services.search_indexers.SearchIndexer"
SEARCH_INDEXER_SECRET=find-api-key-for-docs-with-exactly-50-chars-length # Key generated by create_demo in Find app.
SEARCH_INDEXER_URL="http://find:8000/api/v1.0/documents/index/"
SEARCH_INDEXER_QUERY_URL="http://find:8000/api/v1.0/documents/search/"
10 changes: 10 additions & 0 deletions src/backend/core/api/serializers.py
@@ -889,3 +889,13 @@ class MoveDocumentSerializer(serializers.Serializer):
        choices=enums.MoveNodePositionChoices.choices,
        default=enums.MoveNodePositionChoices.LAST_CHILD,
    )


class SearchDocumentSerializer(serializers.Serializer):
    """Serializer for fulltext search requests through Find application"""

    q = serializers.CharField(required=True, allow_blank=False, trim_whitespace=True)
    page_size = serializers.IntegerField(
        required=False, min_value=1, max_value=50, default=20
    )
    page = serializers.IntegerField(required=False, min_value=1, default=1)
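
The constraints declared by `SearchDocumentSerializer` can be restated in plain Python to make the accepted inputs explicit. This is only an illustrative sketch without Django REST framework: `parse_search_params` is a hypothetical function, and the error messages are not those produced by the real serializer.

```python
def parse_search_params(query_params):
    """Validate search parameters following the declared field constraints."""
    q = str(query_params.get("q", "")).strip()
    if not q:
        raise ValueError("q is required and may not be blank")

    page_size = int(query_params.get("page_size", 20))
    if not 1 <= page_size <= 50:
        raise ValueError("page_size must be between 1 and 50")

    page = int(query_params.get("page", 1))
    if page < 1:
        raise ValueError("page must be at least 1")

    return {"q": q, "page_size": page_size, "page": page}


print(parse_search_params({"q": "  budget 2025 ", "page": "2"}))
```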
84 changes: 84 additions & 0 deletions src/backend/core/api/viewsets.py
@@ -21,6 +21,7 @@
from django.db.models.functions import Left, Length
from django.http import Http404, StreamingHttpResponse
from django.urls import reverse
from django.utils.decorators import method_decorator
from django.utils.functional import cached_property
from django.utils.text import capfirst, slugify
from django.utils.translation import gettext_lazy as _
@@ -31,6 +32,7 @@
from csp.constants import NONE
from csp.decorators import csp_update
from lasuite.malware_detection import malware_detection
from lasuite.oidc_login.decorators import refresh_oidc_access_token
from rest_framework import filters, status, viewsets
from rest_framework import response as drf_response
from rest_framework.permissions import AllowAny
@@ -47,6 +49,10 @@
from core.services.converter_services import (
    YdocConverter,
)
from core.services.search_indexers import (
    get_document_indexer,
    get_visited_document_ids_of,
)
from core.tasks.mail import send_ask_for_access_mail
from core.utils import extract_attachments, filter_descendants

@@ -373,6 +379,7 @@ class DocumentViewSet(
    list_serializer_class = serializers.ListDocumentSerializer
    trashbin_serializer_class = serializers.ListDocumentSerializer
    tree_serializer_class = serializers.ListDocumentSerializer
    search_serializer_class = serializers.ListDocumentSerializer

    def get_queryset(self):
        """Get queryset performing all annotation and filtering on the document tree structure."""
@@ -1064,6 +1071,83 @@ def duplicate(self, request, *args, **kwargs):
            {"id": str(duplicated_document.id)}, status=status.HTTP_201_CREATED
        )

    def _search_simple(self, request, text):
        """
        Returns a response built from a queryset filtered on the document title.
        """
        # As in the 'list' view, we get a prefiltered queryset (deleted docs are excluded)
        queryset = self.get_queryset()
        filterset = DocumentFilter({"title": text}, queryset=queryset)

        if not filterset.is_valid():
            raise drf.exceptions.ValidationError(filterset.errors)

        queryset = filterset.filter_queryset(queryset)

        return self.get_response_for_queryset(
            queryset.order_by("-updated_at"),
            context={
                "request": request,
            },
        )

    def _search_fulltext(self, indexer, request, params):
        """
        Returns a paginated response built from the results of the fulltext
        search of Find.
        """
        access_token = request.session.get("oidc_access_token")
        user = request.user
        text = params.validated_data["q"]
        queryset = models.Document.objects.all()

        # Retrieve the document ids from Find.
        results = indexer.search(
            text=text,
            token=access_token,
            visited=get_visited_document_ids_of(queryset, user),
        )

        # Keep the ranking returned by Find and skip ids that are no longer
        # present in the database (avoids a KeyError on stale index entries).
        docs_by_uuid = {str(d.pk): d for d in queryset.filter(pk__in=results)}
        ordered_docs = [
            docs_by_uuid[doc_id] for doc_id in results if doc_id in docs_by_uuid
        ]

        page = self.paginate_queryset(ordered_docs)

        serializer = self.get_serializer(
            page if page else ordered_docs,
            many=True,
            context={
                "request": request,
            },
        )

        return self.get_paginated_response(serializer.data)

    @drf.decorators.action(detail=False, methods=["get"], url_path="search")
    @method_decorator(refresh_oidc_access_token)
    def search(self, request, *args, **kwargs):
        """
        Returns a DRF response containing the filtered, annotated and ordered document list.

        Applies filtering based on request parameter 'q' from `SearchDocumentSerializer`.
        Depending on the configuration, it can be:
         - A fulltext search through the opensearch indexation app "find" if the backend is
           enabled (see SEARCH_INDEXER_CLASS)
         - A filtering by the model field 'title'.

        The fulltext search keeps the ordering returned by Find; the fallback orders
        by the most recently updated first.
        """
        params = serializers.SearchDocumentSerializer(data=request.query_params)
        params.is_valid(raise_exception=True)

        indexer = get_document_indexer()

        if indexer:
            return self._search_fulltext(indexer, request, params=params)

        # The indexer is not configured, so we fall back to a simple icontains filter
        # on the model field 'title'.
        return self._search_simple(request, text=params.validated_data["q"])
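
The ordering step of the fulltext path can be illustrated in isolation: rows fetched with `pk__in` come back in arbitrary order, so they are re-sorted to follow the ranked id list returned by the search service. This is a standalone sketch with hypothetical names (`order_by_ranking`, plain strings standing in for documents); it additionally skips ids that have no matching row.

```python
def order_by_ranking(ranked_ids, documents_by_id):
    """Return documents in the order of ranked_ids, skipping unknown ids."""
    return [
        documents_by_id[doc_id] for doc_id in ranked_ids if doc_id in documents_by_id
    ]


ranked = ["b", "c", "a"]  # ids as ranked by the search service
fetched = {"a": "Doc A", "b": "Doc B"}  # "c" is missing from the database
print(order_by_ranking(ranked, fetched))  # ['Doc B', 'Doc A']
```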

@drf.decorators.action(detail=True, methods=["get"], url_path="versions")
def versions_list(self, request, *args, **kwargs):
"""
22 changes: 15 additions & 7 deletions src/backend/core/apps.py
@@ -1,11 +1,19 @@
"""Impress Core application"""
# from django.apps import AppConfig
# from django.utils.translation import gettext_lazy as _

from django.apps import AppConfig
from django.utils.translation import gettext_lazy as _

# class CoreConfig(AppConfig):
# """Configuration class for the impress core app."""

# name = "core"
# app_label = "core"
# verbose_name = _("impress core application")
class CoreConfig(AppConfig):
    """Configuration class for the impress core app."""

    name = "core"
    app_label = "core"
    verbose_name = _("Impress core application")

    def ready(self):
        """
        Import signals when the app is ready.
        """
        # pylint: disable=import-outside-toplevel, unused-import
        from . import signals  # noqa: PLC0415