Skip to content

Conversation

@mercykolapo
Copy link

@mercykolapo mercykolapo commented Oct 7, 2025

Issue Link: #27 (comment)

Problem
The code was creating invalid HTML by putting table elements (⁠  ⁠, ⁠  ⁠, etc.) inside ⁠  ⁠ tags, which breaks HTML rules.

Solution
Added smart detection and handling for elements that can't be wrapped in anchor tags.

What Changed
•⁠ ⁠New function: ⁠ _contains_non_transparent_elements() ⁠ - detects problematic HTML elements
•⁠ ⁠New function: ⁠ _handle_non_transparent_reference() ⁠ - handles these cases properly
•⁠ ⁠Updated: ⁠ references_reference() ⁠ - now checks for problems before creating links
•⁠ ⁠Added: List of HTML elements that can't be inside ⁠  ⁠ tags

How It Works
1.⁠ ⁠Check: Before creating a link, check if the content has problematic elements
2.⁠ ⁠Fix: Try to move the link inside the content (e.g., ⁠ text ⁠ becomes ⁠ text ⁠)
3.⁠ ⁠Fallback: If can't fix, render without link and show warning

•⁠ ⁠HTML is now valid
•⁠ ⁠No breaking changes
•⁠ ⁠Clear warnings when links can't be created
•⁠ ⁠Works with existing code

Files Changed
•⁠ ⁠⁠ src/docc/plugins/html/init.py ⁠ - Added new functions and updated reference handling

Testing
•⁠ ⁠Added detection for table elements, forms, and other problematic HTML
•⁠ ⁠Handles edge cases gracefully
•⁠ ⁠Maintains backward compatibility

Comment on lines 834 to 851
def _contains_non_transparent_elements(node: Node) -> bool:
"""
Check if a node or any of its descendants contains HTML elements
that cannot be descendants of an <a> tag.
"""
if isinstance(node, HTMLTag):
if node.tag_name.lower() in _NON_TRANSPARENT_ELEMENTS:
return True
# Check children recursively
for child in node.children:
if _contains_non_transparent_elements(child):
return True
elif hasattr(node, 'children'):
# For other node types with children, check recursively
for child in node.children:
if _contains_non_transparent_elements(child):
return True
return False
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is the Visitor class that abstracts visiting children. You should use that instead of rolling your own, unless there's a reason not to?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the feedback. I'll fix this. I am just getting familiar with the repo.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have implemented the abstract visitor class. I didn't know it was already in existence.

return None

# If inversion failed, render without the link and warn
import warnings
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's okay to import this at the top of the file.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright now.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Comment on lines 914 to 919
for j, original_child in enumerate(node.children):
if j == i:
# Replace this child with the modified version
new_node.append(insertion_point)
else:
new_node.append(original_child)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would replace_child be useful here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely. Done!
Thank you for the feedback.

new_node.append(insertion_point)
else:
new_node.append(original_child)
return new_node
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the following example, where do anchors end up? Not saying there's anything wrong with what you have; I just want to check my intuition.

<table>
    <a href="foo">
        <tr>
            <td>hello</td>
            <td>world</td>
        </tr>
    </a>
</table>

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the feedback, the current logic created a nested anchor tag. I have fixed this. The current logic now handles cases where there is already an anchor tag. I will push my changes in a moment.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, my bad! I was unclear. I guess I was asking about:

<table>
    <Reference ...>
        <tr>
            <td>hello</td>
            <td>world</td>
        </tr>
    </Reference>
</table>

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function finds the first text node ("hello") and wraps it in an anchor
It then wraps the entire in another anchor to preserve structure
This creates nested anchors - outer anchor around the row, inner anchor around the text
The HTML is valid but creates nested clickable areas
The result looks like this:

<table>
    <a href="...">
        <tr>
            <td><a href="...">hello</a></td>
            <td>world</td>
        </tr>
    </a>
</table>

I will push a fix for this because nested anchor tags was not accounted for if not present in the original HTML.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have pushed a fix for this. Thank you again for the feedback!

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, we'll have to figure out what to do with this. I guess we'll need some plumbing for pytest and GitHub Actions?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably...

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look into this. Thanks!

@SamWilsn
Copy link
Owner

I haven't forgotten about this one. I'm playing around with it locally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants