@hannaseithe
Contributor

@hannaseithe hannaseithe commented Aug 6, 2025

Short description

This is part 2 of PR #3784. It shows one way to set a constraint that guarantees a slug's uniqueness (here only done for pagetranslation). In part 1 I implemented application-level safeguards, but as mentioned there, they are not sufficient to guarantee a slug's uniqueness. Using an ExclusionConstraint is only one way, though; there are other options, which I explain below:

Proposed changes

Why is a simple DB constraint not implementable as UniqueConstraint?

A unique constraint basically creates an index (which can be partial when used with a condition). But we need the condition "where page_id differs from the other row's page_id", meaning we would have to compare two rows, which a UniqueConstraint cannot express, neither in Django nor in SQL.
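To make the limitation concrete, here is a minimal sketch that uses Python's built-in sqlite3 module instead of PostgreSQL. The table `pagetranslation` and the helper `try_insert` are simplified, hypothetical stand-ins for `cms_pagetranslation`, and the sketch assumes (as the `version` column suggests) that one page keeps several translation rows with the same slug. A plain unique index on (slug, language_id, region_id) then rejects a legitimate second version of the same page:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE pagetranslation (
        id INTEGER PRIMARY KEY,
        slug TEXT NOT NULL,
        language_id INTEGER NOT NULL,
        region_id INTEGER NOT NULL,
        page_id INTEGER NOT NULL,
        version INTEGER NOT NULL
    )
    """
)
# Naive attempt: a unique index that knows nothing about page_id.
conn.execute(
    "CREATE UNIQUE INDEX uniq_slug ON pagetranslation (slug, language_id, region_id)"
)

INSERT = (
    "INSERT INTO pagetranslation (slug, language_id, region_id, page_id, version) "
    "VALUES (?, ?, ?, ?, ?)"
)


def try_insert(slug, language_id, region_id, page_id, version):
    """Return True if the row was accepted, False if the index rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(INSERT, (slug, language_id, region_id, page_id, version))
        return True
    except sqlite3.IntegrityError:
        return False


first_version = try_insert("start", 1, 1, 1, 1)
# Same page, next version, same slug: legitimate, but the index rejects it.
second_version = try_insert("start", 1, 1, 1, 2)
# Different page, same slug: the case we actually want to forbid.
other_page = try_insert("start", 1, 1, 2, 1)
print(first_version, second_version, other_page)
```

Adding page_id to the index would accept the second version again, but it would also accept the other page, so neither variant expresses "same slug only if the page is the same".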

But there are ways to ensure the uniqueness of slugs at the database level, especially with PostgreSQL. The tools we need are:

  • Database Triggers
  • Denormalization of a column (not so much a tool but a pattern)
  • ExclusionConstraint

A database trigger alone would let us hook in before every INSERT or UPDATE on, for example, the cms_pagetranslation table and call a function that checks the slug for uniqueness against the other pagetranslation rows and raises an exception if it is not unique. I doubt that this would be a sufficient solution, though: on every insert/update the trigger would have to join against cms_page for each row it compares against, because the region lives on the foreign key page. I expect this to affect our performance noticeably.

The trigger (using the django-pgtrigger library):

        triggers = [
            pgtrigger.Trigger(
                name="check_page_translation_slug",
                when=pgtrigger.Before,
                operation=pgtrigger.Insert | pgtrigger.Update,
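                func="""
                DECLARE
                    -- Holds the region of the page being inserted/updated
                    new_region_id cms_page.region_id%TYPE;
                BEGIN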
                    -- Look up the region for the new/updated page
                    SELECT region_id INTO new_region_id
                    FROM cms_page
                    WHERE id = NEW.page_id;

                    -- Check if there's a conflict (same slug/language/region but different page)
                    IF EXISTS (
                        SELECT 1
                        FROM cms_pagetranslation t
                        JOIN cms_page p ON t.page_id = p.id
                        WHERE t.slug = NEW.slug
                        AND t.language_id = NEW.language_id
                        AND p.region_id = new_region_id
                        AND t.page_id <> NEW.page_id
                    ) THEN
                        RAISE EXCEPTION 'Slug must be unique per language and region across different pages.';
                    END IF;

                    RETURN NEW;
                END;
                """,
            ),
        ]

Triggers can either be defined directly inside a migrations.RunSQL (not the best way), or we can use the django-pgtrigger package, which allows us to define the trigger inside the Meta class of the model itself.
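The check itself can be tried out without a PostgreSQL instance. The following is only a sketch of the same idea as an SQLite trigger, run through Python's built-in sqlite3 module; it is not the pgtrigger definition, it covers only the INSERT case, and the tables `page` and `pagetranslation` as well as the helper `try_insert` are simplified, hypothetical stand-ins:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE page (id INTEGER PRIMARY KEY, region_id INTEGER NOT NULL);
    CREATE TABLE pagetranslation (
        id INTEGER PRIMARY KEY,
        slug TEXT NOT NULL,
        language_id INTEGER NOT NULL,
        page_id INTEGER NOT NULL REFERENCES page (id)
    );

    -- SQLite analogue of the BEFORE INSERT part of the check.
    CREATE TRIGGER check_page_translation_slug
    BEFORE INSERT ON pagetranslation
    BEGIN
        SELECT RAISE(ABORT, 'Slug must be unique per language and region across different pages.')
        WHERE EXISTS (
            SELECT 1
            FROM pagetranslation t
            JOIN page p ON t.page_id = p.id
            WHERE t.slug = NEW.slug
              AND t.language_id = NEW.language_id
              AND p.region_id = (SELECT region_id FROM page WHERE id = NEW.page_id)
              AND t.page_id <> NEW.page_id
        );
    END;

    INSERT INTO page (id, region_id) VALUES (1, 1), (2, 1), (3, 2);
    """
)


def try_insert(slug, language_id, page_id):
    """Return True if the row was accepted, False if the trigger aborted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO pagetranslation (slug, language_id, page_id) VALUES (?, ?, ?)",
                (slug, language_id, page_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert("start", 1, 1),  # first row
    try_insert("start", 1, 1),  # same page again (e.g. a new version): allowed
    try_insert("start", 1, 2),  # other page in the same region: rejected
    try_insert("start", 1, 3),  # other page in another region: allowed
]
print(results)
```

The sketch says nothing about performance or concurrency in PostgreSQL; it only shows which rows the condition accepts and rejects.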

Another option, instead of the elaborate trigger described above, would be the PostgreSQL-specific ExclusionConstraint, a constraint that does exactly what we want:

            ExclusionConstraint(
                name='exclude_same_slug_lang_region_diff_page',
                expressions=[
                    (F('slug'), RangeOperators.EQUAL),
                    (F('language'), RangeOperators.EQUAL),
                    (F('region'), RangeOperators.EQUAL),
                    (F('page'), RangeOperators.NOT_EQUAL)
                ],
                index_type='gist',
            ),

BUT it cannot cope with joins, so (F('page__region'), RangeOperators.EQUAL) won't work!

=> This basically forces us to consider denormalizing the region_id column into pagetranslation.
WHAT DO I MEAN BY THAT?
Create a field region on pagetranslation (or rather on AbstractContentTranslation) like this: region = models.ForeignKey("cms.Region", null=True, editable=False, on_delete=models.CASCADE)

  • make it uneditable so it won't show up in forms
  • and create two triggers that keep it in sync with page.region:
    1. when a page changes its region
    2. when a pagetranslation changes its page
    Both are rare cases, but we need to be safe.

Once region is denormalized on pagetranslation, we can choose either to implement the trigger from above without the JOIN or to use the ExclusionConstraint.
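The two sync triggers could look roughly like the following. This is a sketch in SQLite via Python's built-in sqlite3 module, not PostgreSQL or pgtrigger code; the trigger names, the simplified tables, and the helper `translation_region` are hypothetical. The sketch sets region_id by hand on insert, so filling the initial value is left out:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE page (id INTEGER PRIMARY KEY, region_id INTEGER NOT NULL);
    CREATE TABLE pagetranslation (
        id INTEGER PRIMARY KEY,
        slug TEXT NOT NULL,
        page_id INTEGER NOT NULL REFERENCES page (id),
        region_id INTEGER  -- denormalized copy of page.region_id
    );

    -- Case 1: a page moves to another region.
    CREATE TRIGGER sync_region_on_page_update
    AFTER UPDATE OF region_id ON page
    BEGIN
        UPDATE pagetranslation SET region_id = NEW.region_id WHERE page_id = NEW.id;
    END;

    -- Case 2: a translation is attached to another page.
    CREATE TRIGGER sync_region_on_translation_update
    AFTER UPDATE OF page_id ON pagetranslation
    BEGIN
        UPDATE pagetranslation
        SET region_id = (SELECT region_id FROM page WHERE id = NEW.page_id)
        WHERE id = NEW.id;
    END;

    INSERT INTO page (id, region_id) VALUES (1, 10), (2, 20);
    INSERT INTO pagetranslation (id, slug, page_id, region_id) VALUES (1, 'start', 1, 10);
    """
)


def translation_region():
    """Read the denormalized region of the single example translation."""
    return conn.execute(
        "SELECT region_id FROM pagetranslation WHERE id = 1"
    ).fetchone()[0]


# Case 1: page 1 moves from region 10 to region 30.
conn.execute("UPDATE page SET region_id = 30 WHERE id = 1")
after_page_move = translation_region()

# Case 2: the translation is moved to page 2, which lives in region 20.
conn.execute("UPDATE pagetranslation SET page_id = 2 WHERE id = 1")
after_reassignment = translation_region()

print(after_page_move, after_reassignment)
```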

How to test on the DB Level inside the shell

from django.db import connection

with connection.cursor() as cursor:
    cursor.execute("""
        INSERT INTO cms_pagetranslation (last_updated, automatic_translation, machine_translated, minor_edit, version, currently_in_translation, slug, title, status, content, language_id, region_id, page_id)
        VALUES ('2025-08-05 14:30:00+00', FALSE, FALSE, FALSE, 1000, FALSE, 'start', 'title', 'Draft', '', 1, 1,  1)
    """)
    cursor.execute("""
        INSERT INTO cms_pagetranslation (last_updated, automatic_translation, machine_translated, minor_edit, version, currently_in_translation, slug, title, status, content, language_id, region_id, page_id)
        VALUES ('2025-08-05 14:30:00+00', FALSE, FALSE, FALSE, 1000, FALSE, 'start', 'title', 'Draft', '', 1, 1, 2)
     """)

The second INSERT should throw an IntegrityError.


Pull Request Review Guidelines

Side effects

Faithfulness to issue description and design

There are no intended deviations from the issue and design.

Resolved issues

Fixes: #3060
Blocked by: #3837
Related to: #3917



@hannaseithe
Contributor Author

hannaseithe commented Aug 11, 2025

Summary of options for Solution

Option 1: Database trigger with JOIN

  • on every *-translation insert/update
  • check all rows (with a JOIN for each row) to make sure there isn't another *-translation with the same slug, the same language and the same region but a different page => otherwise throw a database exception

Option 2: Denormalize region and Database trigger

  • denormalize region from * into *-translation table (use two triggers to keep them in sync)
  • create database trigger like in Option 1 without the JOIN

Option 3: Denormalize region & Exclusion constraint (implemented in the PoC)

  • denormalize like in Option 2
  • create an Exclusion Constraint
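What the exclusion rule of Option 3 accepts and rejects can be modeled in plain Python. This is only an illustration of the rule (slug =, language =, region =, page <>), not of how PostgreSQL enforces it through a GiST index; `Row`, `conflicts` and `violating_pairs` are hypothetical names:

```python
from itertools import combinations
from typing import NamedTuple


class Row(NamedTuple):
    slug: str
    language: int
    region: int
    page: int


def conflicts(a: Row, b: Row) -> bool:
    """True if the pair matches every operator of the constraint: =, =, =, <>."""
    return (
        a.slug == b.slug
        and a.language == b.language
        and a.region == b.region
        and a.page != b.page
    )


def violating_pairs(rows):
    """Return all pairs of rows that the exclusion rule would forbid."""
    return [(a, b) for a, b in combinations(rows, 2) if conflicts(a, b)]


rows = [
    Row("start", 1, 1, 1),
    Row("start", 1, 1, 1),  # another version of the same page: fine
    Row("start", 1, 2, 3),  # same slug in another region: fine
    Row("start", 1, 1, 2),  # same slug, language and region, other page: conflict
]
pairs = violating_pairs(rows)
print(len(pairs))
```

In this example the last row conflicts with both versions of page 1, so two pairs are reported.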

Option 4: Denormalize region & Create is_current field & Unique Constraint (with partial index) only for is_current version

  • denormalize as in Option 2
  • create is_current field on *-translation
  • create a unique constraint with WHERE is_current=TRUE, i.e. condition=Q(is_current=True)
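The partial index behind Option 4 can be sketched with SQLite, which also supports partial unique indexes, through Python's built-in sqlite3 module. The simplified table, the index name `uniq_current_slug` and the helper `try_insert` are hypothetical, and is_current is stored as 0/1 here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE pagetranslation (
        id INTEGER PRIMARY KEY,
        slug TEXT NOT NULL,
        language_id INTEGER NOT NULL,
        region_id INTEGER NOT NULL,
        page_id INTEGER NOT NULL,
        is_current INTEGER NOT NULL DEFAULT 0
    );

    -- Uniqueness is only enforced among rows flagged as current.
    CREATE UNIQUE INDEX uniq_current_slug
    ON pagetranslation (slug, language_id, region_id)
    WHERE is_current = 1;
    """
)


def try_insert(slug, page_id, is_current):
    """Insert into language 1 / region 1; return False if the index rejects the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO pagetranslation "
                "(slug, language_id, region_id, page_id, is_current) "
                "VALUES (?, 1, 1, ?, ?)",
                (slug, page_id, is_current),
            )
        return True
    except sqlite3.IntegrityError:
        return False


old_version = try_insert("start", 1, 0)         # old version of page 1
current_version = try_insert("start", 1, 1)     # current version of page 1
other_page_current = try_insert("start", 2, 1)  # current version of page 2: rejected
other_page_old = try_insert("start", 2, 0)      # non-current row of page 2: accepted
print(old_version, current_version, other_page_current, other_page_old)
```

Note that in this sketch rows with is_current = 0 are not checked at all, so the guarantee only covers the current versions.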

Pseudo-Option: Materialized View (instead of column denormalization) & Database Trigger:

This is (imo) not an option, because we could not guarantee that the materialized view is always perfectly in sync.

@hannaseithe
Contributor Author

hannaseithe commented Sep 8, 2025

In Django 5.1 we get GeneratedField, which should be a better substitute for the denormalization solution. We would not need a trigger at all and could use the GeneratedField in combination with an ExclusionConstraint (a GeneratedField basically has internal triggers that keep the column up to date). Since there is a PR for upgrading to Django 5.2 (#3837), I will set a BLOCKED label and defer the implementation until we have upgraded to Django 5.2.

The generated field on the PageTranslation model:

    page_region_id = models.GeneratedField(expression=F("page__region"), stored=True)

and the ExclusionConstraint:

            ExclusionConstraint(
                name="exclude_same_slug_lang_region_diff_page",
                expressions=[
                    (F("slug"), RangeOperators.EQUAL),
                    (F("language"), RangeOperators.EQUAL),
                    (F("page_region_id"), RangeOperators.EQUAL),
                    (F("page"), RangeOperators.NOT_EQUAL),
                ],
                index_type="gist",
            ),

@michael-markl
Member

michael-markl commented Nov 3, 2025

In Django 5.1 we get GeneratedField which should be a better substitute than the denormalization solution

I believe that GeneratedField does not support this use case. In the expression for the generated field, you can only reference fields of the same model, so page__region would not be allowed, right?

Apart from that, I think the trigger solution from Option 1 might be the cleanest, because we don't risk other data inconsistencies just to solve another data inconsistency :) Also, I would assume that the join does not have a huge performance impact, since it will probably use the index on cms_page.id. For bulk operations this might be a different story, but it's probably still feasible (?).

Edit: On the other hand, I am not sure whether Option 1 would be safe with regard to concurrency (as opposed to the ExclusionConstraint). We might need to acquire table locks (e.g. LOCK TABLE cms_page, cms_pagetranslation IN EXCLUSIVE MODE;) inside the trigger, if that is possible (which I believe it is).


Labels

blocked Blocked by external dependency

Development

Successfully merging this pull request may close these issues.

Multiple events have the same path

3 participants