@ArtyomBaranovskiy
Hello,

I'm using your tool to build a kind of web grabber, so I have to handle a great many cases.
One of them is an HTML document with a leading <?...?> tag instead of a DOCTYPE.
Every modern browser renders it without errors, so I expect the same behavior from CsQuery.
However, the default output formatter transforms the tag into a form that browsers handle incorrectly.

I suggest the following pull request to fix the issue. I'm sorry that I have not had time to dig into the root cause of why this tag is parsed as an HTML comment.

Short commit description:

1) Prevent default OutputFormatter from breaking a leading XML comment

- As some sites specify a leading XML comment tag instead of a doctype, the XML
  comment tag should not be broken when any document is parsed and rendered
- When the comment tag looks like <?...?>, simply render its NodeValue, as
  no further wrapping is required
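
The rendering rule described above can be sketched roughly as follows. This is a minimal, language-neutral illustration in Python, not the actual CsQuery formatter code: the function name `render_comment` is hypothetical, and it assumes (as the commit description suggests) that the comment node's value already contains the full <?...?> text.

```python
def render_comment(node_value: str) -> str:
    """Render the value of a comment node for HTML output.

    Hypothetical helper: assumes node_value holds the complete text
    of the comment node as stored by the parser.
    """
    text = node_value.strip()
    # A value shaped like <?...?> already carries its own delimiters,
    # so it is emitted unchanged instead of being wrapped again.
    if len(text) >= 4 and text.startswith("<?") and text.endswith("?>"):
        return node_value
    # Any other comment gets the usual HTML comment delimiters.
    return "<!--" + node_value + "-->"
```

With this rule, a value such as `<?xml version="1.0"?>` is passed through as-is, while an ordinary comment value is still wrapped in `<!--` and `-->`.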