fix(indesign): exclude rich media blocks from export output#4614
fix(indesign): exclude rich media blocks from export output#4614wil-gerken wants to merge 5 commits intotrunkfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR addresses an InDesign export issue where rich media Gutenberg blocks (e.g., PDF file embeds and oEmbed content) leak raw HTML/URLs into the exported tagged-text output, causing InDesign import problems.
Changes:
- Add an exclusion list in the InDesign converter to skip
core/file,core/embed,core/video, andcore/audioblocks during block processing. - Add unit tests verifying
core/fileandcore/embedblocks do not contribute raw markup/URLs to the export output.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| includes/optional-modules/indesign-export/class-indesign-converter.php | Skips specific rich media block types during process_blocks() to prevent raw markup/URLs from entering the export. |
| tests/unit-tests/indesign-exporter/indesign-exporter.php | Adds unit tests asserting core/file and core/embed content is excluded from exported output. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
adekbadek
left a comment
There was a problem hiding this comment.
Tested in an isolated env by running InDesign_Converter::convert_post() against six fixture posts: file/embed/video/audio at top level, file inside a core/group, embed inside core/columns, and a plain control. All exclusions worked as expected and the surrounding content was preserved in every case. No PHP notices in debug.log.
The fix is correct and the recursion handles nested containers as advertised. Approving – a few non-blocking suggestions inline. The main one is that the unit tests only cover core/file/core/embed nested in core/group, but the implementation (and docblock) supports core/video, core/audio, and core/columns too. I verified those work at runtime, but it would be nice to lock them in with tests.
| 'core/file', | ||
| 'core/embed', | ||
| 'core/video', | ||
| 'core/audio', |
There was a problem hiding this comment.
Suggestion: expose this as a filter (e.g. newspack_indesign_export_excluded_blocks) so publishers with custom rich-media blocks can extend the list without patching. Also worth promoting to a private const since it's rebuilt on every process_blocks() call.
Minor related note: legacy core-embed/* block names (pre-WP 5.6 content) won't match core/embed – a filter sidesteps that, or a strpos( $blockName, 'core-embed/' ) === 0 check would catch them. Low priority, depends on how old your content is.
There was a problem hiding this comment.
Thanks! This is a great suggestion.
-
I've promoted the array to an
EXCLUDED_BLOCK_TYPESclass constant. I left it as unscopedconstrather thanprivate constso the default list remains accessible if needed, but I'm happy to change if you think it should be private. -
I added the
newspack_indesign_export_excluded_blocksfilter so publishers can extend the list for custom block types. The filter result is cast to an array and filtered to strings, so a misbehaving callback won't break the export. -
I added an
is_excluded_block()helper which handles the explicit list and legacycore-embed/*names via astrposprefix check. Legacy variants are tied tocore/embed's filter state, so if a publisher removescore/embedfrom the filter, the legacy variants are also un-blocked for consistency. -
Last, I added a null guard in that helper since
parse_blocks()returnsblockName === nullfor freeform chunks. I have a pre-push review command that caught that as a PHP 8.3 deprecation.
I don't think this fix is urgent, and the changes were extensive enough that it seems best to re-review. Can you please give it another look when you have a chance?
| if ( is_string( $chunk ) ) { | ||
| $new_inner_content[] = $chunk; | ||
| } else { | ||
| $inner_block = $block['innerBlocks'][ $inner_index++ ]; |
There was a problem hiding this comment.
Defensive nit: $block['innerBlocks'][ $inner_index++ ] assumes the parse_blocks() invariant that null placeholders in innerContent line up positionally with innerBlocks entries. That's safe in practice, but a ! isset(...) guard would harden it against malformed input. Take it or leave it.
There was a problem hiding this comment.
Thanks! Taking it / added. I like safeguards like this much better as well.
| $this->assertStringContainsString( 'Before the group.', $content ); | ||
| $this->assertStringContainsString( 'After the group.', $content ); | ||
| $this->assertStringNotContainsString( 'youtube.com', $content ); | ||
| $this->assertStringNotContainsString( 'abc123', $content ); |
There was a problem hiding this comment.
Coverage gap worth filling: the implementation also strips core/video and core/audio and the docblock calls out core/columns as a supported container, but neither is exercised here. I verified at runtime that all three work, but suggest adding cases for:
core/videoand/orcore/audioexclusion (locks in the full list).- An embed nested in
core/columns(differentinnerContentshape thancore/group). - Two excluded siblings in a row inside a container, to exercise the
$inner_indexincrement path.
There was a problem hiding this comment.
Great suggestions. I added all three:
core/videoandcore/audiotop-level exclusion- an embed nested in
core/columns - two excluded siblings in a row to exercise the
$inner_indexincrement path.
I also added a test for legacy core-embed/youtube blocks while I was in there, to lock in the prefix check from the other thread.
All 27 InDesign tests pass.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
All Submissions:
Changes proposed in this Pull Request:
When a post contains a
core/fileblock (PDF embed) orcore/embedblock (YouTube, etc.), the InDesign export was including raw HTML markup in the output — such as<object>tags, download link text, and bare embed URLs — instead of skipping those blocks entirely. This resulted in import errors and corrupted print output for publishers using the InDesign export tool.Rich media blocks have no print equivalent and should be excluded from the export. This fix adds an explicit exclusion list in
process_blocks()forcore/file,core/embed,core/video, andcore/audio.These blocks are now skipped before serialization, ensuring that only printable content is exported while leaving surrounding content intact.
Closes
NPPM-2677.How to test the changes in this Pull Request:
Reproduce the problem (without this PR):
wp newspack optional-modules enable indesign-exportcore/file), and another paragraph after it..txtfile — observe that<object>tags, the PDF filename, and "Download" text appear between the two paragraphs in the export output.Verify the fix (with this PR):
.txtfile — the two paragraphs should appear cleanly with no<object>tag, PDF filename, or "Download" text between them.core/embed) — the YouTube URL should not appear in the export.Other information: