41 changes: 41 additions & 0 deletions changelogs/DP-45831.yml
@@ -0,0 +1,41 @@
#
# Write your changelog entry here. Every pull request must have a changelog yml file.
#
# Change types:
# #############################################################################
# You can use one of the following types:
# - Added: For new features.
# - Changed: For changes to existing functionality.
# - Deprecated: For soon-to-be removed features.
# - Removed: For removed features.
# - Fixed: For any bug fixes.
# - Security: In case of vulnerabilities.
#
# Format
# #############################################################################
# The format is crucial. Please follow the examples below. For reference, the requirements are:
# - All 3 parts are required and you must include "Type", "description" and "issue".
# - "Type" must be left aligned and followed by a colon.
# - "description" must be indented with 2 spaces followed by a colon
# - "issue" must be indented with 4 spaces followed by a colon.
# - "issue" is for the Jira ticket number only e.g. DP-1234
# - No extra spaces, indents, or blank lines are allowed.
#
# Example:
# #############################################################################
# Fixed:
#   - description: Fixes scrolling on edit pages in Safari.
#     issue: DP-13314
#
# You may add more than 1 description & issue for each type using the following format:
# Changed:
#   - description: Automating the release branch.
#     issue: DP-10166
#   - description: Second change item that needs a description.
#     issue: DP-19875
#   - description: Third change item that needs a description along with an issue.
#     issue: DP-19843
#
Changed:
  - description: Update internal links on mass.gov that are redirects.
    issue: DP-45831
1 change: 1 addition & 0 deletions conf/drupal/config/core.extension.yml
@@ -145,6 +145,7 @@ module:
  mass_microsites: 0
  mass_more_lists: 0
  mass_nav: 0
  mass_redirect_normalizer: 0
  mass_redirects: 0
  mass_scheduled_transitions: 0
  mass_schema_apply_action: 0
144 changes: 144 additions & 0 deletions docroot/modules/custom/mass_redirect_normalizer/README.md
@@ -0,0 +1,144 @@
# Redirect Link Normalizer

This module rewrites internal links that still point at **redirect source paths** so they use the **final path** instead. For rich text, when the final target is a node, it also adds `data-entity-*` attributes.

The same logic runs in two places:

- **Bulk Drush command** — scan many entities and fix stored values.
- **`hook_entity_presave()`** — when an editor saves a node or paragraph, links are normalized on that save.

---

## What gets scanned

For each **node** or **paragraph**, the code looks at:

- Text fields: `text_long`, `text_with_summary`, `string_long` (HTML `href` values inside the markup).
- **Link** fields (`link` type): the stored URI.

It does **not** change arbitrary text; it only rewrites values that the resolver identifies as internal links pointing at a redirect source (see the integration tests for examples).
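
To make the `href` rewriting concrete, here is a minimal sketch in plain PHP. It is not the module's code: the `$redirects` map and the function name `rewrite_redirect_hrefs()` are hypothetical stand-ins for the redirect lookup the resolver performs, and a regular expression is used only to keep the example short.

```php
<?php

/**
 * Rewrites href values that match a known redirect source path (sketch only).
 *
 * @param string $html
 *   Markup as stored in a text field.
 * @param array<string, string> $redirects
 *   Hypothetical map of redirect source path => final path.
 */
function rewrite_redirect_hrefs(string $html, array $redirects): string {
  $result = preg_replace_callback(
    '/href="([^"]*)"/i',
    static function (array $match) use ($redirects): string {
      $href = $match[1];
      // Leave external links and non-redirect paths untouched.
      if (!isset($redirects[$href])) {
        return $match[0];
      }
      return 'href="' . $redirects[$href] . '"';
    },
    $html
  );
  // preg_replace_callback() returns NULL on a regex error; fall back to the input.
  return $result ?? $html;
}

$html = '<p><a href="/old-page">Old</a> <a href="https://example.com/">External</a></p>';
echo rewrite_redirect_hrefs($html, ['/old-page' => '/new-page']), PHP_EOL;
```

Only the first link changes here; the external link and the surrounding text are returned exactly as they were.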

## Why there are two classes

- `RedirectLinkResolver`:
  - Only link logic.
  - It finds the final path and rewrites one text value or one link value.
  - It does **not** save entities.
- `RedirectLinkNormalizationManager`:
  - Entity workflow logic.
  - It loops over the fields of a node or paragraph, calls the resolver, handles dry-run, and saves revisions when needed.

This split makes the code easier to test and maintain.
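
The division of labor can be pictured with a small, self-contained sketch. The function names, the flat `$values` array, and the hard-coded redirect map are all assumptions made for illustration; the real classes work on Drupal entities and fields.

```php
<?php

/**
 * Hypothetical stand-in for the resolver: maps one link value to its final path.
 */
function resolve_link(string $value): string {
  // Assumed redirect map, for illustration only.
  $redirects = ['/old' => '/new'];
  return $redirects[$value] ?? $value;
}

/**
 * Hypothetical stand-in for the manager: walks the values, delegates each one
 * to the resolver, and writes back only when this is not a dry run.
 *
 * @param array<string, string> $values
 *   Field name => link value, modified in place on a real run.
 *
 * @return array<string, array{before: string, after: string}>
 *   The changes found, keyed by field name.
 */
function normalize_values(array &$values, callable $resolver, bool $simulate): array {
  $changes = [];
  foreach ($values as $field => $value) {
    $resolved = $resolver($value);
    if ($resolved === $value) {
      continue;
    }
    $changes[$field] = ['before' => $value, 'after' => $resolved];
    if (!$simulate) {
      $values[$field] = $resolved;
    }
  }
  return $changes;
}

$values = ['field_link' => '/old', 'field_other' => '/unchanged'];
$preview = normalize_values($values, 'resolve_link', TRUE);
// After a dry run, $values is untouched but $preview lists the pending change.
```

Because the resolver is a pure function of one value, it can be tested without any entity storage, which is the benefit the split is meant to provide.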

---

## Drush command

| Item | Value |
|------|--------|
| Command | `mass-redirect-normalizer:normalize-links` |
| Alias | `mnrl` |

### Options

| Option | Meaning |
|--------|---------|
| `--simulate` | Dry run: **no** database writes. Same idea as global `ddev drush --simulate ...`. |
| `--limit=N` | Max eligible entities to process **total** across node + paragraph. Command stops when it reaches `N`. `0` = no limit. |
| `--bundle=...` | Only that bundle (node type or paragraph type machine name). Still checked after load. |
| `--entity-ids=1,2,3` | Only these IDs. IDs are checked in both node and paragraph entities. Ignores `--limit`. |

By default, the bulk command processes only **published** content.

- Nodes must be published.
- Paragraphs are processed only when their parent node is published.
- If a published node has a newer unpublished draft revision, that node and its
  child paragraphs are skipped by the bulk command (so we do not touch draft work).
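
The eligibility rules above can be summarized as a pair of predicates. This is a sketch under stated assumptions: the function names and the array shape used for the parent node are hypothetical, and the real command derives these flags from entity and revision data.

```php
<?php

/**
 * Decides whether the bulk command should touch a node (sketch only).
 *
 * @param bool $isPublished
 *   Whether the node is published.
 * @param bool $hasNewerDraft
 *   Whether a newer unpublished draft revision exists.
 */
function node_is_eligible(bool $isPublished, bool $hasNewerDraft): bool {
  return $isPublished && !$hasNewerDraft;
}

/**
 * Paragraphs inherit eligibility from their parent node.
 *
 * @param array{published: bool, newer_draft: bool}|null $parent
 *   Hypothetical summary of the parent node, or NULL for an orphan paragraph.
 */
function paragraph_is_eligible(?array $parent): bool {
  if ($parent === NULL) {
    // Orphan paragraphs are never processed.
    return FALSE;
  }
  return node_is_eligible($parent['published'], $parent['newer_draft']);
}
```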

### Default table columns

| Column | Notes |
|--------|--------|
| Status | `would_update` (simulate) or `updated` (real run). |
| Entity type | `node` or `paragraph`. |
| Entity ID | Entity id. |
| Parent node ID | For **paragraphs**, the host node id from `Helper::getParentNode()`. For nodes, `-`. |
| Bundle | Bundle / type machine name. |
| URL before / URL after | The link value only, not the full HTML. For link fields, the stored path or URL. For text fields, only the `href` values that changed. If several links changed in one field, they are joined with `; `. If the value is too long, the CLI shortens it. |
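
As an illustration of how a URL cell could be assembled, the sketch below joins the changed links with `; ` and shortens long values. The function name, the 60-character cut-off, and the `...` marker are assumptions; the command's actual limit and truncation style are not documented here.

```php
<?php

/**
 * Builds the "URL before" or "URL after" cell for one field (sketch only).
 *
 * @param string[] $urls
 *   Link values that changed in the field.
 * @param int $maxLength
 *   Hypothetical cut-off length for the cell.
 */
function format_url_cell(array $urls, int $maxLength = 60): string {
  $cell = implode('; ', $urls);
  if (strlen($cell) <= $maxLength) {
    return $cell;
  }
  // Keep the start of the value and mark the cut with three dots.
  return substr($cell, 0, $maxLength - 3) . '...';
}

echo format_url_cell(['/old-a', '/old-b']), PHP_EOL;
```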

### What the command skips

- **Orphan paragraphs** — paragraphs that are not attached to real host content (`Helper::isParagraphOrphan()`). They are **not** processed and **do not** appear as rows.
- Entities with **no** redirect-based links to fix produce **no** rows (empty table is normal).
- Unpublished/trashed content is skipped.
- Published content with newer unpublished draft revisions is skipped.

### Simulate, then run, then verify (manual QA)

1. **Preview:**
`ddev drush mass-redirect-normalizer:normalize-links --simulate --limit=100`
2. **Apply:**
`ddev drush mass-redirect-normalizer:normalize-links --limit=100`
3. **Re-check:** run **simulate** again with the same filters. Items that were fixed should **not** show `would_update` anymore (unless something else changed them back).

For big runs, the command prints a progress notice every 100 processed entities. This
is expected and helps confirm it is still running.

For a narrow retest after you know specific IDs:

`ddev drush mass-redirect-normalizer:normalize-links --simulate --entity-ids=123,456`

### Important detail about saved content

On **first save**, `hook_entity_presave()` may already rewrite links in the stored field values. So if you create test content in the UI and then expect the bulk command to “see” the old redirect URL in the database, it might already be normalized. The automated tests handle that case where needed.

Document links in entity-reference-only fields:

- If the field stores only an entity reference (no URL/href string), this
command does not rewrite that stored reference value.
- If a document URL appears in supported text/link fields and points through a
redirect, it is covered by this command.

---

## Automated tests

Existing-site integration tests live here:

`docroot/modules/custom/mass_redirect_normalizer/tests/src/ExistingSite/RedirectLinkNormalizationTest.php`

Run tests:

```bash
ddev exec ./vendor/bin/phpunit docroot/modules/custom/mass_redirect_normalizer/tests/src/ExistingSite/RedirectLinkNormalizationTest.php
```

### What is covered

- Redirect chain resolution (including query and fragment support).
- Rich-text rewriting (`href`) and node metadata attributes (`data-entity-*`).
- Link field URI normalization (`internal:/...` and absolute local mass.gov URLs).
- Redirect loops and max-depth behavior (no infinite follow, expected stop point).
- External URL behavior (ignored; no rewrite).
- Alias-like non-node targets (rewrite link, but do not add node metadata).
- Presave normalization path for nodes (`hook_entity_presave()` behavior).
- Manager behavior:
  - Idempotency: running it twice gives the same result (the first run fixes links, the second run has nothing new to fix).
  - Multi-value link field handling (only redirecting values change).
  - Link item metadata preservation (`title`, `options`).
- Drush command behavior:
  - Bundle filter.
  - Targeted runs with `--entity-ids`.
  - Simulate mode row output (`would_update`) and URL before/after columns.
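
For readers new to the chain-resolution cases listed above, here is a minimal sketch of following a redirect chain with loop protection and a depth cap while keeping the query string and fragment. The `$redirects` map, the function name, and the default depth of 10 are assumptions. The sketch also treats any URL with a host as external, whereas the module additionally handles absolute local mass.gov URLs, and the stop point it picks for loops may differ from the one the tests assert.

```php
<?php

/**
 * Follows a redirect chain to its final path (sketch only).
 *
 * @param string $url
 *   A path such as "/old?x=1#top".
 * @param array<string, string> $redirects
 *   Hypothetical map of redirect source path => target path.
 * @param int $maxDepth
 *   Assumed cap on the number of hops to follow.
 */
function resolve_final_path(string $url, array $redirects, int $maxDepth = 10): string {
  $parts = parse_url($url);
  // Malformed or external URLs are returned unchanged in this sketch.
  if (!is_array($parts) || isset($parts['host'])) {
    return $url;
  }
  $path = $parts['path'] ?? '';
  $seen = [];
  $depth = 0;
  // Stop on an unknown path, a path already visited (loop), or the depth cap.
  while (isset($redirects[$path]) && !isset($seen[$path]) && $depth < $maxDepth) {
    $seen[$path] = TRUE;
    $path = $redirects[$path];
    $depth++;
  }
  // Re-attach the original query string and fragment to the final path.
  $suffix = isset($parts['query']) ? '?' . $parts['query'] : '';
  $suffix .= isset($parts['fragment']) ? '#' . $parts['fragment'] : '';
  return $path . $suffix;
}

echo resolve_final_path('/a?x=1#top', ['/a' => '/b', '/b' => '/c']), PHP_EOL;
// Prints "/c?x=1#top".
```

With a two-step loop such as `/a` to `/b` and back, this sketch stops as soon as it would revisit `/a`, so it never follows the chain indefinitely.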

---

## Periodic / bulk cleanup

Use the Drush command above for one-off or scheduled bulk runs.

---

## Post-run usage refresh

For large backfills, regenerate entity usage so usage reports stay accurate.
@@ -0,0 +1,10 @@
name: Mass Redirect Normalizer
type: module
description: Normalize internal links that point at redirects to their final targets.
core_version_requirement: ^10 || ^11
package: Custom
dependencies:
  - mass_fields:mass_fields
  - mass_content:mass_content
  - mayflower:mayflower

@@ -0,0 +1,18 @@
<?php

use Drupal\Core\Entity\EntityInterface;
use Drupal\node\NodeInterface;
use Drupal\paragraphs\ParagraphInterface;

/**
 * Implements hook_entity_presave().
 */
function mass_redirect_normalizer_entity_presave(EntityInterface $entity) {
  if (!$entity instanceof NodeInterface && !$entity instanceof ParagraphInterface) {
    return;
  }

  /** @var \Drupal\mass_redirect_normalizer\RedirectLinkNormalizationManager $manager */
  $manager = \Drupal::service('mass_redirect_normalizer.manager');
  $manager->normalizeEntity($entity, FALSE);
}
@@ -0,0 +1,12 @@
services:
  mass_redirect_normalizer.resolver:
    class: Drupal\mass_redirect_normalizer\RedirectLinkResolver
    arguments: ['@entity_type.manager', '@path_alias.manager', '@request_stack', '@router.request_context']

  mass_redirect_normalizer.manager:
    class: Drupal\mass_redirect_normalizer\RedirectLinkNormalizationManager
    arguments: ['@mass_redirect_normalizer.resolver', '@datetime.time']

  Drupal\mass_redirect_normalizer\RedirectLinkNormalizationManager:
    alias: mass_redirect_normalizer.manager
