-
Notifications
You must be signed in to change notification settings - Fork 9
Fixed HTML Validation Issues with Non-Transparent Elements in References #30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
src/docc/plugins/html/__init__.py
Outdated
| def _contains_non_transparent_elements(node: Node) -> bool: | ||
| """ | ||
| Check if a node or any of its descendants contains HTML elements | ||
| that cannot be descendants of an <a> tag. | ||
| """ | ||
| if isinstance(node, HTMLTag): | ||
| if node.tag_name.lower() in _NON_TRANSPARENT_ELEMENTS: | ||
| return True | ||
| # Check children recursively | ||
| for child in node.children: | ||
| if _contains_non_transparent_elements(child): | ||
| return True | ||
| elif hasattr(node, 'children'): | ||
| # For other node types with children, check recursively | ||
| for child in node.children: | ||
| if _contains_non_transparent_elements(child): | ||
| return True | ||
| return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is the Visitor class that abstracts visiting children. You should use that instead of rolling your own, unless there's a reason not to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the feedback. I'll fix this. I am just getting familiar with the repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have implemented the abstract visitor class. I didn't know it was already in existence.
src/docc/plugins/html/__init__.py
Outdated
| return None | ||
|
|
||
| # If inversion failed, render without the link and warn | ||
| import warnings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's okay to import this at the top of the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
src/docc/plugins/html/__init__.py
Outdated
| for j, original_child in enumerate(node.children): | ||
| if j == i: | ||
| # Replace this child with the modified version | ||
| new_node.append(insertion_point) | ||
| else: | ||
| new_node.append(original_child) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would replace_child be useful here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely. Done!
Thank you for the feedback.
| new_node.append(insertion_point) | ||
| else: | ||
| new_node.append(original_child) | ||
| return new_node |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the following example, where do anchors end up? Not saying there's anything wrong with what you have; I just want to check my intuition.
<table>
<a href="foo">
<tr>
<td>hello</td>
<td>world</td>
</tr>
</a>
</table>There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the feedback, the current logic created a nested anchor tag. I have fixed this. The current logic now handles cases where there is already an anchor tag. I will push my changes in a moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, my bad! I was unclear. I guess I was asking about:
<table>
<Reference ...>
<tr>
<td>hello</td>
<td>world</td>
</tr>
</Reference>
</table>There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function finds the first text node ("hello") and wraps it in an anchor
It then wraps the entire in another anchor to preserve structure
This creates nested anchors - outer anchor around the row, inner anchor around the text
The HTML is valid but creates nested clickable areas
The result looks like this:
<table>
<a href="...">
<tr>
<td><a href="...">hello</a></td>
<td>world</td>
</tr>
</a>
</table>
I will push a fix for this because nested anchor tags was not accounted for if not present in the original HTML.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have pushed a fix for this. Thank you again for the feedback!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, we'll have to figure out what to do with this. I guess we'll need some plumbing for pytest and GitHub Actions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll look into this. Thanks!
|
I haven't forgotten about this one. I'm playing around with it locally. |
Issue Link: #27 (comment)
Problem
The code was creating invalid HTML by putting table elements ( , , etc.) inside tags, which breaks HTML rules.
Solution
Added smart detection and handling for elements that can't be wrapped in anchor tags.
What Changed
• New function: _contains_non_transparent_elements() - detects problematic HTML elements
• New function: _handle_non_transparent_reference() - handles these cases properly
• Updated: references_reference() - now checks for problems before creating links
• Added: List of HTML elements that can't be inside tags
How It Works
1. Check: Before creating a link, check if the content has problematic elements
2. Fix: Try to move the link inside the content (e.g., text becomes text )
3. Fallback: If can't fix, render without link and show warning
• HTML is now valid
• No breaking changes
• Clear warnings when links can't be created
• Works with existing code
Files Changed
• src/docc/plugins/html/init.py - Added new functions and updated reference handling
Testing
• Added detection for table elements, forms, and other problematic HTML
• Handles edge cases gracefully
• Maintains backward compatibility