41 changes: 41 additions & 0 deletions changelogs/DP-45831.yml
@@ -0,0 +1,41 @@
#
# Write your changelog entry here. Every pull request must have a changelog yml file.
#
# Change types:
# #############################################################################
# You can use one of the following types:
# - Added: For new features.
# - Changed: For changes to existing functionality.
# - Deprecated: For soon-to-be removed features.
# - Removed: For removed features.
# - Fixed: For any bug fixes.
# - Security: In case of vulnerabilities.
#
# Format
# #############################################################################
# The format is crucial. Please follow the examples below. For reference, the requirements are:
# - All 3 parts are required and you must include "Type", "description" and "issue".
# - "Type" must be left aligned and followed by a colon.
# - "description" must be indented with 2 spaces followed by a colon
# - "issue" must be indented with 4 spaces followed by a colon.
# - "issue" is for the Jira ticket number only e.g. DP-1234
# - No extra spaces, indents, or blank lines are allowed.
#
# Example:
# #############################################################################
# Fixed:
#   - description: Fixes scrolling on edit pages in Safari.
#     issue: DP-13314
#
# You may add more than 1 description & issue for each type using the following format:
# Changed:
#   - description: Automating the release branch.
#     issue: DP-10166
#   - description: Second change item that needs a description.
#     issue: DP-19875
#   - description: Third change item that needs a description along with an issue.
#     issue: DP-19843
#
Changed:
  - description: Update internal links on mass.gov that are redirects.
    issue: DP-45831
1 change: 1 addition & 0 deletions composer.json
@@ -260,6 +260,7 @@
"drupal/r4032login": "^2.2",
"drupal/rabbit_hole": "^1.1",
"drupal/redirect": "^1",
"drupal/redirect_audit": "^1.3",
"drupal/require_on_publish": "^2.0",
"drupal/scheduled_transitions": "^2.7",
"drupal/schema_metatag": "^3.0",
63 changes: 62 additions & 1 deletion composer.lock


1 change: 1 addition & 0 deletions conf/drupal/config/core.extension.yml
@@ -211,6 +211,7 @@ module:
  rabbit_hole: 0
  redirect: 0
  redirect_404: 0
  redirect_audit: 0
  require_on_publish: 0
  responsive_image: 0
  rest: 0
6 changes: 3 additions & 3 deletions conf/drupal/config/mass_utility.settings.yml
@@ -1,8 +1,8 @@
allowed_urls: "https://www.youtube.com/\r\nhttps://docs.digital.mass.gov\r\nhttps://public.dep.state.ma.us/\r\nhttps://calendar.google.com/\r\nhttps://dashboards.digital.mass.gov/\r\nhttps://docs.google.com/\r\nhttps://drive.google.com/\r\nhttps://fusiontables.googleusercontent.com/\r\nhttps://libraryh3lp.com/\r\nhttps://mass-eoeea.maps.arcgis.com/\r\nhttps://massgov.formstack.com/forms/sample\r\nhttps://massgov.github.io\r\nhttps://public.tableau.com/\r\nhttps://www.google.com/\r\nhttps://www.massdot.state.ma.us/\r\nhttps://www.massmarinefisheries.net/\r\nhttps://www.youtube.com/\r\nhttps://youtu.be/\r\nhttps://memamaps.maps.arcgis.com/\r\nhttps://maps.google.com/\r\nhttps://licensing.reg.state.ma.us/\r\nhttps://hwy.massdot.state.ma.us/\r\nhttps://dphanalytics.hhs.mass.gov/\r\nhttps://code.highcharts.com/\r\nhttps://eoeea.maps.arcgis.com/\r\nhttps://eeaonline.eea.state.ma.us/\r\nhttps://gis.massdot.state.ma.us/\r\nhttps://dotfeeds.state.ma.us/\r\nhttps://massgis.maps.arcgis.com/\r\nhttps://recollect.net/\r\nhttp://massdot.maps.arcgis.com/\r\nhttps://massdot.maps.arcgis.com/\r\nhttps://calculator.digital.mass.gov/\r\nhttps://api.recollect.net/\r\nhttps://www.eia.gov/beta/states/iframe\r\nhttps://mdphgis.maps.arcgis.com/\r\nhttps://app.powerbigov.us/\r\nhttps://calc.a4we.org/\r\nhttps://w.soundcloud.com/\r\nhttps://www.google.com/maps\r\nhttps://nedews.nrcc.cornell.edu/\r\nhttps://flo.uri.sh/\r\nhttps://app.smartsheet.com/\r\nhttps://experience.arcgis.com/\r\nhttps://hedfuel.azurewebsites.net/\r\nhttps://dhcd-production-public.s3.amazonaws.com/\r\nhttps://cloud.samsara.com/o/8600/fleet/viewer/\r\nhttps://hwywebqa.massdot.state.ma.us\r\nhttps://player.vimeo.com/video/\r\nhttps://massgov.formstack.com/forms/"
 forms_allowed_hostnames:
-  - '/^mass-forms\.ddev\.site$/'
-  - '/^forms\.mass\.local$/'
-  - '/^forms\.mass\.gov$/'
+  - /^mass-forms\.ddev\.site$/
+  - /^forms\.mass\.local$/
+  - /^forms\.mass\.gov$/
   - '/^[a-zA-Z0-9\-]+-mass-forms\.pantheonsite\.io$/'
   - '/^[a-zA-Z0-9\-]+\.forms\.mass\.gov$/'
header_mixed_urls: "<front>\r\n"
7 changes: 7 additions & 0 deletions conf/drupal/config/redirect_audit.settings.yml
@@ -0,0 +1,7 @@
_core:
  default_config_hash: hz8P2E_PUpHAuiZpVNwTrG3074UQ_2Q8SuYZpIp_v6U
autofix_enabled: false
scan_on_change: true
batch_size: 50
max_chain_depth: 10
items_per_page: 20
126 changes: 126 additions & 0 deletions docroot/modules/custom/mass_redirect_normalizer/README.md
@@ -0,0 +1,126 @@
# Redirect Link Normalizer

This module rewrites internal links that still point at **redirect source paths** so they use the **final path** instead. For rich text, when the final target is a node, it also adds `data-entity-*` attributes.

The same logic runs in two places:

- **Bulk Drush command** — scan many entities and fix stored values.
- **`hook_entity_presave()`** — when an editor saves a node or paragraph, links are normalized on that save.

---

## What gets scanned

For each **node** or **paragraph**, the code looks at:

- Text fields: `text_long`, `text_with_summary`, `string_long` (HTML `href` values inside the markup).
- **Link** fields (`link` type): the stored URI.

It does **not** touch arbitrary text; it only rewrites values that the resolver identifies as internal links pointing at a redirect source (see the integration tests for examples).
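
As a rough illustration of the text-field case, the sketch below rewrites only `href` values found in a redirect map and leaves all other text untouched. It is written in Python purely for illustration (the module itself is PHP). The `REDIRECTS` map, the `rewrite_hrefs` name, and the regex-based parsing are assumptions, not the resolver's actual code.

```python
import re

# Hypothetical redirect map: redirect source path -> final path.
REDIRECTS = {"/old-page": "/new-page"}

# Matches double-quoted href attributes only (a simplification).
HREF_RE = re.compile(r'href="([^"]*)"')


def rewrite_hrefs(markup: str, redirects: dict[str, str]) -> str:
    """Replace href values that are redirect sources; leave the rest alone."""

    def swap(match: re.Match) -> str:
        target = redirects.get(match.group(1))
        if target is None:
            return match.group(0)  # Not a redirect source: keep as-is.
        return f'href="{target}"'

    return HREF_RE.sub(swap, markup)
```

Plain text that merely mentions `/old-page`, and links to other hosts, pass through unchanged because only exact `href` matches are looked up.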

## More about the code

- `RedirectLinkResolver`:
  - Only link logic.
  - It finds the final path and rewrites one text value or one link value.
  - It does **not** save entities.
- `RedirectLinkNormalizationManager`:
  - Entity workflow logic.
  - It loops fields on node/paragraph, calls the resolver, handles dry-run, and saves revisions when needed.

This split keeps code easier to test and easier to maintain.
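
A minimal sketch of that split, in Python for illustration only: `resolve` plays the resolver role (pure link logic, no saving) and `normalize_entity` plays the manager role (field loop, dry-run handling, save). The `Entity` class and both function names are hypothetical stand-ins, not the module's PHP classes.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    """Hypothetical stand-in for a node or paragraph with link values."""
    id: int
    links: dict[str, str]  # field name -> stored link value
    saved: int = 0         # counts save() calls so a dry run can be verified

    def save(self) -> None:
        self.saved += 1


def resolve(value: str, redirects: dict[str, str]) -> str:
    """Resolver role: map one value to its final path; never saves."""
    return redirects.get(value, value)


def normalize_entity(entity: Entity, redirects: dict[str, str], simulate: bool):
    """Manager role: loop fields, call the resolver, save unless simulating."""
    changes = []
    for name, before in entity.links.items():
        after = resolve(before, redirects)
        if after != before:
            changes.append((name, before, after))
            if not simulate:
                entity.links[name] = after
    if changes and not simulate:
        entity.save()
    return changes
```

Because the resolver has no side effects, it can be tested with plain strings, while the manager can be tested by counting saves in simulate and real runs.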

---

## Drush command

| Item | Value |
|------|--------|
| Command | `mass-redirect-normalizer:normalize-links` |
| Alias | `mnrl` |

### Options

| Option | Meaning |
|--------|---------|
| `--simulate` | Dry run: **no** database writes. Same idea as global `ddev drush --simulate ...`. |
| `--limit=N` | Max entities **per entity type** to load from the query. **`0` = no limit.** When `--entity-type=all`, you get up to **N nodes** and up to **N paragraphs** (two separate caps). |
| `--entity-type=node\|paragraph\|all` | Default **`all`** (nodes and paragraphs). |
| `--bundle=...` | Only that bundle (node type or paragraph type machine name). Still checked after load. |
| `--entity-ids=1,2,3` | Only these IDs. **Requires** `--entity-type=node` or `paragraph` (**not** `all`). Ignores `--limit`. |
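
The way these options interact can be sketched as follows. This is an illustrative Python model of the rules in the table, not the command's PHP code; the `plan_run` name and the returned dictionary shape are assumptions.

```python
def plan_run(entity_type: str = "all", limit: int = 0, entity_ids: str = "") -> dict:
    """Turn CLI-style options into a per-entity-type query plan."""
    if entity_type not in ("node", "paragraph", "all"):
        raise ValueError("--entity-type must be node, paragraph, or all")

    ids = [int(part) for part in entity_ids.split(",") if part.strip()]
    if ids and entity_type == "all":
        raise ValueError("--entity-ids requires --entity-type=node or paragraph")

    types = ["node", "paragraph"] if entity_type == "all" else [entity_type]
    # Explicit IDs ignore --limit; a limit of 0 means "no limit" (None here).
    cap = None if ids or limit == 0 else limit
    # Each entity type gets its own cap, so "all" yields two separate caps.
    return {t: {"ids": ids, "limit": cap} for t in types}
```

For example, `plan_run(limit=100)` yields one cap of 100 for nodes and another cap of 100 for paragraphs, mirroring the "two separate caps" behavior described above.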

### Default table columns

| Column | Notes |
|--------|--------|
| Status | `would_update` (simulate) or `updated` (real run). |
| Entity type | `node` or `paragraph`. |
| Entity ID | Entity ID. |
| Parent node ID | For **paragraphs**, the host node id from `Helper::getParentNode()`. For nodes, `-`. |
| Bundle | Bundle / type machine name. |
| URL before / URL after | The link value only, not the full HTML. For link fields, the stored path or URL. For text fields, only the `href` values that changed; multiple changed links in one field are joined with `; `. Overlong values are truncated by the CLI. |
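
To make the column rules concrete, here is a small Python sketch of how one report row could be assembled. It is an illustration only: the `build_row` and `shorten` names, the dictionary keys, and the 60-character truncation cap are assumptions, not the command's actual output code.

```python
def shorten(text: str, max_len: int = 60) -> str:
    """Truncate long cell values; the 60-character cap is an assumption."""
    return text if len(text) <= max_len else text[: max_len - 3] + "..."


def build_row(entity_type, entity_id, bundle, changes, simulate, parent_id=None):
    """Build one report row from (before, after) pairs for a single entity."""
    is_paragraph = entity_type == "paragraph"
    return {
        "status": "would_update" if simulate else "updated",
        "entity_type": entity_type,
        "entity_id": entity_id,
        # Only paragraphs have a host node; nodes show "-".
        "parent_node_id": parent_id if is_paragraph and parent_id is not None else "-",
        "bundle": bundle,
        # Several changed links in one field are joined with "; ".
        "url_before": shorten("; ".join(before for before, _ in changes)),
        "url_after": shorten("; ".join(after for _, after in changes)),
    }
```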

### What the command skips

- **Orphan paragraphs** — paragraphs that are not attached to real host content (`Helper::isParagraphOrphan()`). They are **not** processed and **do not** appear as rows.
- Entities with **no** redirect-based links to fix produce **no** rows (empty table is normal).

### Simulate, then run, then verify (manual QA)

1. **Preview:**
`ddev drush mass-redirect-normalizer:normalize-links --simulate --limit=100`
2. **Apply:**
`ddev drush mass-redirect-normalizer:normalize-links --limit=100`
3. **Re-check:** run **simulate** again with the same filters. Items that were fixed should **not** show `would_update` anymore (unless something else changed them back).

For a narrow retest after you know specific IDs:

`ddev drush mass-redirect-normalizer:normalize-links --simulate --entity-type=paragraph --entity-ids=123,456`

### Important detail about saved content

On **first save**, `hook_entity_presave()` may already rewrite links in the stored field values. So if you create test content in the UI and then expect the bulk command to “see” the old redirect URL in the database, it might already be normalized. The automated tests handle that case where needed.

---

## Automated tests

Existing-site integration tests live here:

`docroot/modules/custom/mass_redirect_normalizer/tests/src/ExistingSite/RedirectLinkNormalizationTest.php`

Run tests:

```bash
ddev exec ./vendor/bin/phpunit docroot/modules/custom/mass_redirect_normalizer/tests/src/ExistingSite/RedirectLinkNormalizationTest.php
```

### What is covered

- Redirect chain resolution (including query and fragment support).
- Rich-text rewriting (`href`) and node metadata attributes (`data-entity-*`).
- Link field URI normalization (`internal:/...` and absolute local mass.gov URLs).
- Redirect loops and max-depth behavior (no infinite follow, expected stop point).
- External URL behavior (ignored; no rewrite).
- Alias-like non-node targets (rewrite link, but do not add node metadata).
- Presave normalization path for nodes (`hook_entity_presave()` behavior).
- Manager behavior:
  - Idempotency: running it twice gives the same result (the first run fixes links, the second run has nothing new to fix).
  - Multi-value link field handling (only redirecting values change).
  - Link item metadata preservation (`title`, `options`).
- Drush command behavior:
  - Entity type and bundle filters.
  - Targeted runs with `--entity-ids`.
  - Simulate mode row output (`would_update`) and URL before/after columns.
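
The chain, loop, and max-depth cases in the list above can be pictured with the following Python sketch. It is not the module's resolver: the `resolve_final_path` name, the default depth of 10, the choice to leave a looping link unchanged, and the choice to stop at the path reached when the depth runs out are all assumptions made for illustration. It also treats every absolute URL as external, whereas the module additionally handles absolute local mass.gov URLs.

```python
from urllib.parse import urlsplit, urlunsplit


def resolve_final_path(url: str, redirects: dict[str, str], max_depth: int = 10) -> str:
    """Follow redirect hops for an internal path, keeping query and fragment."""
    parts = urlsplit(url)
    if parts.scheme or parts.netloc:
        return url  # Absolute URL: treated as external here, so no rewrite.

    path = parts.path
    seen = {path}
    for _ in range(max_depth):
        next_path = redirects.get(path)
        if next_path is None:
            break  # Reached a path that is not a redirect source.
        if next_path in seen:
            return url  # Loop detected: leave the link unchanged (assumption).
        seen.add(next_path)
        path = next_path

    # Reattach the original query string and fragment to the final path.
    return urlunsplit(("", "", path, parts.query, parts.fragment))
```

With `{"/a": "/b", "/b": "/c"}`, the link `/a?x=1#top` resolves to `/c?x=1#top`, while a two-way loop such as `/x` to `/y` and back leaves `/x` as it was.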

---

## Periodic / bulk cleanup

Use the Drush command above for one-off or scheduled bulk runs.

---

## Post-run usage refresh

For large backfills, regenerate entity usage so usage reports stay accurate.
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
name: Mass Redirect Normalizer
type: module
description: Normalize internal links that point at redirects to their final targets.
core_version_requirement: ^10 || ^11
package: Custom
dependencies:
  - mass_fields:mass_fields
  - mass_content:mass_content
  - mayflower:mayflower

Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
<?php

/**
 * @file
 * Normalizes redirect-based internal links when nodes and paragraphs are saved.
 */

use Drupal\Core\Entity\EntityInterface;
use Drupal\node\NodeInterface;
use Drupal\paragraphs\ParagraphInterface;

/**
 * Implements hook_entity_presave().
 */
function mass_redirect_normalizer_entity_presave(EntityInterface $entity) {
  // Only nodes and paragraphs are normalized; skip every other entity type.
  if (!$entity instanceof NodeInterface && !$entity instanceof ParagraphInterface) {
    return;
  }

  /** @var \Drupal\mass_redirect_normalizer\RedirectLinkNormalizationManager $manager */
  $manager = \Drupal::service('mass_redirect_normalizer.manager');
  $manager->normalizeEntity($entity, FALSE);
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
services:
  mass_redirect_normalizer.resolver:
    class: Drupal\mass_redirect_normalizer\RedirectLinkResolver
    arguments: ['@entity_type.manager', '@path_alias.manager', '@request_stack', '@router.request_context']

  mass_redirect_normalizer.manager:
    class: Drupal\mass_redirect_normalizer\RedirectLinkNormalizationManager
    arguments: ['@mass_redirect_normalizer.resolver', '@datetime.time']

  Drupal\mass_redirect_normalizer\RedirectLinkNormalizationManager:
    alias: mass_redirect_normalizer.manager
