Index to search #2
base: main
Commits: 272d396, f3b2907, bc48ec9, 7728b56, 04cd06a, da2f360, 409918e, 43f3e50, 9554640, 8abe74f, 65d4d75, 2af7937, e1fdace, f44a7d0, c71c0d7, 7aa725a, 5617372, 7fd532d, b1475df
`@@ -0,0 +1,5 @@`

```yaml
secret:
  ignored_matches:
    - name:
      match: "na1hhus-OLhq9mb9SO3R-8E4dONuMnqpZSY_SX8xcFk="
version: 2
```

Review comment (Owner, Author): The ignored value contains a 44-character key; storing static secrets in configuration is risky and may be better done via environment variables.
`@@ -43,6 +43,10 @@ venv.bak/`

```
env.d/development/*.local
env.d/terraform

# Docker
compose.override.yml
docker/auth/*.local

# npm
node_modules
```

Review comment (Owner, Author): Adding "docker/auth/*.local" could inadvertently ignore local authentication files for all Docker builds; confirm this is intended.
`@@ -74,6 +74,9 @@ and this project adheres to`

```markdown
- ♿ update labels and shared document icon accessibility #1442
- 🍱(frontend) Fonts GDPR compliants #1453
- ♻️(service-worker) improve SW registration and update handling #1473
- ✨(backend) add async indexation of documents on save (or access save) #1276
- ✨(backend) add debounce mechanism to limit indexation jobs #1276
- ✨(api) add API route to search for indexed documents in Find #1276

### Fixed
```

Review comment (Owner, Author): The same ticket number suitenumerique#1276 appears twice in the added feature list; duplicates can confuse readers and should be consolidated.
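The changelog entry mentions a debounce mechanism that limits indexation jobs when a document is saved repeatedly, but its implementation is not part of this excerpt. The sketch below shows one common coalescing variant of debouncing, assuming a per-document "pending until" timestamp: the first save in a window schedules a job, and later saves inside that window are absorbed. `IndexDebouncer`, `should_schedule` and the window length are hypothetical names and values, not the PR's API.

```python
import time
from typing import Callable, Dict, Hashable


class IndexDebouncer:
    """Hypothetical debouncer: at most one indexation job per document per window."""

    def __init__(self, window: float, clock: Callable[[], float] = time.monotonic):
        self.window = window
        self.clock = clock
        # Maps a document id to the time until which a job is already pending.
        self._pending_until: Dict[Hashable, float] = {}

    def should_schedule(self, doc_id: Hashable) -> bool:
        """Return True if this save must schedule a new indexation job."""
        now = self.clock()
        if self._pending_until.get(doc_id, float("-inf")) > now:
            # A job is already queued for this document: absorb this save.
            return False
        self._pending_until[doc_id] = now + self.window
        return True
```

With this scheme the scheduled job would run at the end of the window and read the document's latest state, so saves absorbed during the window are still indexed. In a multi-worker deployment the pending map would have to live in shared storage (for example a cache entry with a timeout) rather than in process memory.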
`@@ -247,6 +247,10 @@ demo: ## flush db then create a demo for load testing purpose`

```makefile
	@$(MANAGE) create_demo
.PHONY: demo

index: ## index all documents to remote search
	@$(MANAGE) index
.PHONY: index

# Nota bene: Black should come after isort just in case they don't agree...
lint: ## lint back-end python sources
lint: \
```

Review comment (Owner, Author): The new index target references $(MANAGE) index but no phony dependency is added for 'demo', potentially causing unintended dependency ordering.
`@@ -0,0 +1,6 @@`

```bash
#!/usr/bin/env bash

# shellcheck source=bin/_config.sh
source "$(dirname "${BASH_SOURCE[0]}")/_config.sh"

_dc_run app-dev python -c 'from cryptography.fernet import Fernet;import sys; sys.stdout.write("\n" + Fernet.generate_key().decode() + "\n");'
```

Review comment (Owner, Author): The generated key is output with an extra newline; while harmless, it may cause downstream parsing errors if the consumer expects a plain key.
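The script above writes the key surrounded by newlines, which is what the review comment flags. As the environment file notes, a Fernet key is 32 random bytes encoded with URL-safe base64, which gives a 44-character string. The stdlib-only sketch below generates such a key without surrounding whitespace and checks that a candidate string has the expected shape. `generate_fernet_key` and `looks_like_fernet_key` are hypothetical helpers; the script itself relies on `cryptography.fernet.Fernet.generate_key()`, which remains the reference way to create keys.

```python
import base64
import binascii
import os


def generate_fernet_key() -> str:
    """Return 32 random bytes as URL-safe base64 text (44 chars, no newline)."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


def looks_like_fernet_key(key: str) -> bool:
    """Check that `key` is 44 chars long and decodes (URL-safe base64) to 32 bytes."""
    try:
        raw = base64.urlsafe_b64decode(key.encode("ascii"))
    except (binascii.Error, ValueError):  # bad padding or non-ASCII input
        return False
    # The length check rejects keys wrapped in newlines or other whitespace.
    return len(key) == 44 and len(raw) == 32


if __name__ == "__main__":
    # Print the bare key: no leading or trailing blank line.
    print(generate_fernet_key(), end="")
```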
`@@ -72,6 +72,11 @@ services:`

```yaml
      - env.d/development/postgresql.local
    ports:
      - "8071:8000"
    networks:
      default: {}
      lasuite-net:
        aliases:
          - impress
    volumes:
      - ./src/backend:/app
      - ./data/static:/data/static
```

Review comment (Owner, Author): Network alias "impress" defined for the backend service may conflict with existing aliases; verify that no other service uses the same alias to avoid name clashes.

`@@ -92,6 +97,9 @@ services:`

```yaml
    command: ["celery", "-A", "impress.celery_app", "worker", "-l", "DEBUG"]
    environment:
      - DJANGO_CONFIGURATION=Development
    networks:
      - default
      - lasuite-net
    env_file:
      - env.d/development/common
      - env.d/development/common.local
```

`@@ -107,6 +115,11 @@ services:`

```yaml
    image: nginx:1.25
    ports:
      - "8083:8083"
    networks:
      default: {}
      lasuite-net:
        aliases:
          - nginx
    volumes:
      - ./docker/files/etc/nginx/conf.d:/etc/nginx/conf.d:ro
    depends_on:
```

`@@ -217,3 +230,8 @@ services:`

```yaml
      kc_postgresql:
        condition: service_healthy
        restart: true

networks:
  lasuite-net:
    name: lasuite-net
    driver: bridge
```
`@@ -49,6 +49,14 @@ LOGOUT_REDIRECT_URL=http://localhost:3000`

```
OIDC_REDIRECT_ALLOWED_HOSTS=["http://localhost:8083", "http://localhost:3000"]
OIDC_AUTH_REQUEST_EXTRA_PARAMS={"acr_values": "eidas1"}

# Store OIDC tokens in the session
OIDC_STORE_ACCESS_TOKEN = True
OIDC_STORE_REFRESH_TOKEN = True # Store the encrypted refresh token in the session.

# Must be a valid Fernet key (32 url-safe base64-encoded bytes)
# To create one, use the bin/fernetkey command.
OIDC_STORE_REFRESH_TOKEN_KEY = "na1hhus-OLhq9mb9SO3R-8E4dONuMnqpZSY_SX8xcFk="

# AI
AI_FEATURE_ENABLED=true
AI_BASE_URL=https://openaiendpoint.com
```

Review comment (Owner, Author): The placeholder value for OIDC_STORE_REFRESH_TOKEN_KEY looks like a toy 44-character string; replace it with a real Fernet key before deployment.

`@@ -68,4 +76,10 @@ Y_PROVIDER_API_BASE_URL=http://y-provider-development:4444/api/`

```
Y_PROVIDER_API_KEY=yprovider-api-key

# Theme customization
THEME_CUSTOMIZATION_CACHE_TIMEOUT=15
THEME_CUSTOMIZATION_CACHE_TIMEOUT=15

# Indexer
SEARCH_INDEXER_CLASS="core.services.search_indexers.SearchIndexer"
SEARCH_INDEXER_SECRET=find-api-key-for-docs-with-exactly-50-chars-length # Key generated by create_demo in Find app.
SEARCH_INDEXER_URL="http://find:8000/api/v1.0/documents/index/"
SEARCH_INDEXER_QUERY_URL="http://find:8000/api/v1.0/documents/search/"
```
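`SEARCH_INDEXER_CLASS` holds a dotted Python path. The views call `get_document_indexer()`, whose body is not part of this diff, so the following is only a sketch of a common way to turn such a setting into an instance. Django ships `django.utils.module_loading.import_string` for this; here the same idea is written with `importlib` only. `import_from_dotted_path`, `load_indexer` and the "empty setting means no indexer" convention are assumptions, chosen to mirror how `search()` falls back to the title filter when no indexer is returned.

```python
import importlib
from typing import Any, Optional


def import_from_dotted_path(path: str) -> Any:
    """Import 'package.module.Name' and return the `Name` attribute."""
    module_path, _, attr_name = path.rpartition(".")
    if not module_path:
        raise ImportError(f"{path!r} is not a dotted path")
    module = importlib.import_module(module_path)
    try:
        return getattr(module, attr_name)
    except AttributeError as err:
        raise ImportError(f"{module_path!r} has no attribute {attr_name!r}") from err


def load_indexer(class_path: Optional[str]) -> Optional[Any]:
    """Hypothetical: instantiate the configured indexer, or return None when unset."""
    if not class_path:
        return None
    indexer_class = import_from_dotted_path(class_path)
    return indexer_class()
```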
`@@ -889,3 +889,13 @@ class MoveDocumentSerializer(serializers.Serializer):`

```python
        choices=enums.MoveNodePositionChoices.choices,
        default=enums.MoveNodePositionChoices.LAST_CHILD,
    )


class SearchDocumentSerializer(serializers.Serializer):
    """Serializer for fulltext search requests through the Find application"""

    q = serializers.CharField(required=True, allow_blank=False, trim_whitespace=True)
    page_size = serializers.IntegerField(
        required=False, min_value=1, max_value=50, default=20
    )
    page = serializers.IntegerField(required=False, min_value=1, default=1)
```

Review comment (Owner, Author): The string literal for the page field ends with an unmatched single quote, causing a syntax error that will prevent the module from loading.
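`SearchDocumentSerializer` encodes the rules for the search query string: `q` is required, trimmed and must not be blank; `page_size` defaults to 20 and must lie in 1..50; `page` defaults to 1 and must be at least 1. The plain-Python sketch below restates those rules outside DRF to make them explicit. `SearchParams` and `parse_search_params` are hypothetical names, and error messages and some edge cases (such as empty strings for the integer fields) differ from DRF's behavior.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SearchParams:
    q: str
    page_size: int = 20
    page: int = 1


def _int_param(
    raw: Mapping[str, str], name: str, default: int, lo: int, hi: Optional[int] = None
) -> int:
    """Parse an optional integer parameter and enforce its bounds."""
    value = raw.get(name)
    if value is None:
        return default
    try:
        number = int(value)
    except ValueError:
        raise ValueError(f"{name}: a valid integer is required") from None
    if number < lo or (hi is not None and number > hi):
        raise ValueError(f"{name}: out of range")
    return number


def parse_search_params(raw: Mapping[str, str]) -> SearchParams:
    """Validate query parameters with the same bounds as SearchDocumentSerializer."""
    q = (raw.get("q") or "").strip()  # trim_whitespace=True
    if not q:  # required=True, allow_blank=False
        raise ValueError("q: this field is required and may not be blank")
    return SearchParams(
        q=q,
        page_size=_int_param(raw, "page_size", 20, lo=1, hi=50),
        page=_int_param(raw, "page", 1, lo=1),
    )
```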
`@@ -14,13 +14,15 @@`

```python
from django.core.cache import cache
from django.core.exceptions import ValidationError
from django.core.files.storage import default_storage
from django.core.paginator import InvalidPage, Paginator
from django.core.validators import URLValidator
from django.db import connection, transaction
from django.db import models as db
from django.db.models.expressions import RawSQL
from django.db.models.functions import Left, Length
from django.http import Http404, StreamingHttpResponse
from django.urls import reverse
from django.utils.decorators import method_decorator
from django.utils.functional import cached_property
from django.utils.text import capfirst, slugify
from django.utils.translation import gettext_lazy as _
```

Review comment (Owner, Author): search_serializer_class is set to ListDocumentSerializer, but the new SearchDocumentSerializer was added; this mismatch will make search responses use the wrong serializer.
`@@ -31,9 +33,11 @@`

```python
from csp.constants import NONE
from csp.decorators import csp_update
from lasuite.malware_detection import malware_detection
from lasuite.oidc_login.decorators import refresh_oidc_access_token
from rest_framework import filters, status, viewsets
from rest_framework import response as drf_response
from rest_framework.permissions import AllowAny
from rest_framework.utils.urls import replace_query_param as drf_replace_query_param

from core import authentication, choices, enums, models
from core.services.ai_services import AIService
```
`@@ -47,6 +51,10 @@`

```python
from core.services.converter_services import (
    YdocConverter,
)
from core.services.search_indexers import (
    get_document_indexer,
    get_visited_document_ids_of,
)
from core.tasks.mail import send_ask_for_access_mail
from core.utils import extract_attachments, filter_descendants
```
`@@ -373,6 +381,7 @@ class DocumentViewSet(`

```python
    list_serializer_class = serializers.ListDocumentSerializer
    trashbin_serializer_class = serializers.ListDocumentSerializer
    tree_serializer_class = serializers.ListDocumentSerializer
    search_serializer_class = serializers.ListDocumentSerializer

    def get_queryset(self):
        """Get queryset performing all annotation and filtering on the document tree structure."""
```
`@@ -1064,6 +1073,114 @@ def duplicate(self, request, *args, **kwargs):`

```python
            {"id": str(duplicated_document.id)}, status=status.HTTP_201_CREATED
        )

    def _search_simple(self, request, text):
        """
        Returns a response listing the documents whose title matches `text`.
        """
        # Like the 'list' view, we start from a prefiltered queryset (deleted docs are excluded)
        queryset = self.get_queryset()
        filterset = DocumentFilter({"title": text}, queryset=queryset)

        if not filterset.is_valid():
            raise drf.exceptions.ValidationError(filterset.errors)

        queryset = filterset.filter_queryset(queryset)

        return self.get_response_for_queryset(
            queryset.order_by("-updated_at"),
            context={
                "request": request,
            },
        )

    def _search_fulltext(self, indexer, request, params):
        """
        Returns a response built from the results of the fulltext search in Find.
        """
        access_token = request.session.get("oidc_access_token")
        user = request.user
        text = params.validated_data["q"]
        page_size = params.validated_data.get("page_size", 20)
        page_number = params.validated_data.get("page", 1)
        queryset = models.Document.objects.all()

        # Retrieve the document ids from Find.
        results = indexer.search(
            text=text,
            token=access_token,
            visited=get_visited_document_ids_of(queryset, user),
            page=1,
            page_size=min(200, (page_size * page_number) + 1),
        )

        docs_by_uuid = {str(d.pk): d for d in queryset.filter(pk__in=results)}
        ordered_docs = [docs_by_uuid[id] for id in results]

        paginator = Paginator(
            ordered_docs, per_page=page_size, allow_empty_first_page=True
        )

        try:
            page = paginator.page(page_number)
        except InvalidPage as e:
            raise drf.exceptions.NotFound(_("Invalid page.")) from e

        serializer = self.get_serializer(
            page.object_list,
            many=True,
            context={
                "request": request,
            },
        )
        next_url, prev_url = None, None

        if page.has_next():
            next_url = request.build_absolute_uri()
            next_url = drf_replace_query_param(
                next_url, "page", page.next_page_number()
            )

        if page.has_previous():
            prev_url = request.build_absolute_uri()
            prev_url = drf_replace_query_param(
                prev_url, "page", page.previous_page_number()
            )

        output = {
            "count": paginator.count,
            "next": next_url,
            "previous": prev_url,
            "results": serializer.data,
        }

        return drf.response.Response(output)
```
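`_search_fulltext` asks Find for a single page large enough to cover every hit up to the requested page plus one extra (`page_size * page_number + 1`, capped at 200), loads the matching documents from the database, restores Find's ordering, and paginates locally. The plain-Python sketch below isolates that ordering and pagination step. `fetch_size`, `order_like` and `paginate` are hypothetical helpers, `LookupError` stands in for Django's `InvalidPage`, and skipping ids that are missing from the database (for example stale index entries) is an assumption: the view itself indexes `docs_by_uuid` directly.

```python
import math
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def fetch_size(page_size: int, page_number: int, cap: int = 200) -> int:
    """Number of hits to request so the asked page, plus one extra hit, is covered."""
    return min(cap, page_size * page_number + 1)


def order_like(ids: Sequence[str], found: Dict[str, T]) -> List[T]:
    """Return found objects in the order of `ids`, skipping ids that were not found."""
    return [found[doc_id] for doc_id in ids if doc_id in found]


def paginate(items: Sequence[T], page_size: int, page_number: int) -> dict:
    """Slice one 1-based page and report whether neighbouring pages exist."""
    # Like Paginator(allow_empty_first_page=True): an empty list still has page 1.
    num_pages = max(1, math.ceil(len(items) / page_size))
    if page_number < 1 or page_number > num_pages:
        raise LookupError("Invalid page.")
    start = (page_number - 1) * page_size
    return {
        "count": len(items),
        "results": list(items[start:start + page_size]),
        "has_next": page_number < num_pages,
        "has_previous": page_number > 1,
    }
```

One consequence of this design: `count` reflects the fetched window (at most 200 hits) rather than Find's total number of matches, and hits beyond that window cannot be reached through pagination.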
```python
    @drf.decorators.action(detail=False, methods=["get"], url_path="search")
    @method_decorator(refresh_oidc_access_token)
    def search(self, request, *args, **kwargs):
        """
        Returns a DRF response containing the filtered, annotated and ordered document list.

        Applies filtering based on request parameter 'q' from `SearchDocumentSerializer`.
        Depending on the configuration, this is either:
         - a fulltext search through the OpenSearch indexing app "Find", if the backend is
           enabled (see SEARCH_INDEXER_CLASS), or
         - a filter on the model field 'title'.

        The title filter orders documents by most recent update first; the fulltext search
        keeps the order returned by Find.
        """
        params = serializers.SearchDocumentSerializer(data=request.query_params)
        params.is_valid(raise_exception=True)

        indexer = get_document_indexer()

        if indexer:
            return self._search_fulltext(indexer, request, params=params)

        # The indexer is not configured: fall back to a simple icontains filter on the
        # model field 'title'.
        return self._search_simple(request, text=params.validated_data["q"])

    @drf.decorators.action(detail=True, methods=["get"], url_path="versions")
    def versions_list(self, request, *args, **kwargs):
        """
```
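In `_search_fulltext`, the `next` and `previous` links are built by taking the current absolute URL (`request.build_absolute_uri()`) and replacing its `page` query parameter with DRF's `replace_query_param`. For illustration only, here is a stdlib equivalent using `urllib.parse`; `with_page` is a hypothetical name and the host in the test is a placeholder. Like the DRF helper, it keeps the other parameters and emits them in sorted order.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def with_page(url: str, page: int) -> str:
    """Return `url` with its `page` query parameter set to `page`; other params are kept."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    params = parse_qs(query, keep_blank_values=True)
    params["page"] = [str(page)]
    # Sort for a stable output; doseq=True expands list values into repeated params.
    query = urlencode(sorted(params.items()), doseq=True)
    return urlunsplit((scheme, netloc, path, query, fragment))
```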
`@@ -1,11 +1,19 @@`

```python
"""Impress Core application"""
# from django.apps import AppConfig
# from django.utils.translation import gettext_lazy as _

from django.apps import AppConfig
from django.utils.translation import gettext_lazy as _

# class CoreConfig(AppConfig):
# """Configuration class for the impress core app."""

# name = "core"
# app_label = "core"
# verbose_name = _("impress core application")
class CoreConfig(AppConfig):
    """Configuration class for the impress core app."""

    name = "core"
    app_label = "core"
    verbose_name = _("Impress core application")

    def ready(self):
        """
        Import signals when the app is ready.
        """
        # pylint: disable=import-outside-toplevel, unused-import
        from . import signals  # noqa: PLC0415
```
`@@ -0,0 +1,40 @@`

```python
"""
Handle search setup that needs to be done at bootstrap time.
"""

import logging
import time

from django.core.management.base import BaseCommand, CommandError

from core.services.search_indexers import get_document_indexer

logger = logging.getLogger("docs.search.bootstrap_search")


class Command(BaseCommand):
    """Index all documents to remote search service"""

    help = __doc__

    def handle(self, *args, **options):
        """Launch and log search index generation."""
        indexer = get_document_indexer()

        if not indexer:
            raise CommandError("The indexer is not enabled or properly configured.")

        logger.info("Starting to regenerate Find index...")
        start = time.perf_counter()

        try:
            count = indexer.index()
        except Exception as err:
            raise CommandError("Unable to regenerate index") from err

        duration = time.perf_counter() - start
        logger.info(
            "Search index regenerated from %d document(s) in %.2f seconds.",
            count,
            duration,
        )
```