Exception on invalid xml. #145

@rillian

Some logging output got into my TEI files, and HookTest fails with an unhandled exception rather than reporting the error:

  File "${HOME}/HookTest/HookTest/capitains_units/cts.py", line 434, in auto_rng
    xml = parse(self.path)
  File "src/lxml/etree.pyx", line 3435, in lxml.etree.parse
  File "src/lxml/parser.pxi", line 1840, in lxml.etree._parseDocument
  File "src/lxml/parser.pxi", line 1866, in lxml.etree._parseDocumentFromURL
  File "src/lxml/parser.pxi", line 1770, in lxml.etree._parseDocFromFile
  File "src/lxml/parser.pxi", line 1163, in lxml.etree._BaseParser._parseDocFromFile
  File "src/lxml/parser.pxi", line 601, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 711, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 640, in lxml.etree._raiseParseError
  File "tests/repo1/data/hafez/divan/hafez.divan.perseus-eng1.xml", line 1
lxml.etree.XMLSyntaxError: Start tag expected, '<' not found, line 1, column 1

To reproduce, prepend the string 'Garbage text\n' to the beginning of a test file, e.g. tests/repo1/data/hafez/divan/hafez.divan.perseus-eng1.xml.
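The reproduction step can be scripted. This is a minimal sketch: `write_corrupted_copy` is a hypothetical helper, not part of HookTest, and it writes to a separate target so the original test file stays intact.

```python
from pathlib import Path


def write_corrupted_copy(source, target, garbage="Garbage text\n"):
    """Copy `source` to `target` with `garbage` prepended (hypothetical helper)."""
    data = Path(source).read_bytes()
    # Anything before the first '<' makes the document ill-formed XML.
    Path(target).write_bytes(garbage.encode("utf-8") + data)
    return target
```

Copying the result over the file in the test repository and running HookTest on it should then trigger the traceback above.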

The XMLSyntaxError is hidden by the imap_unordered call through the worker pool and surfaces instead as a MaybeEncodingError, because lxml.etree can't pickle its _ListErrorLog when the exception is sent back from the worker. Flattening the parallel iterator to a serial one reveals the underlying issue.
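One possible way around the pickling problem, sketched here under assumptions, is to catch the parse error inside the worker and return only plain strings, so nothing unpicklable crosses the process boundary. `check_file` and `check_all` are hypothetical names, not HookTest functions, and the sketch assumes the pool is a `multiprocessing.Pool`. It falls back to the standard-library parser when lxml is not installed.

```python
from multiprocessing import Pool

try:
    from lxml.etree import parse  # the parser shown in the traceback
except ImportError:
    # Fallback so the sketch runs without lxml; HookTest itself uses lxml.
    from xml.etree.ElementTree import parse


def check_file(path):
    """Parse one file; return (path, ok, message) built from plain types only."""
    try:
        parse(path)
    except Exception as exc:
        # Convert the exception to text here, in the worker, so the
        # exception object itself never has to be pickled.
        return (path, False, "{}: {}".format(type(exc).__name__, exc))
    return (path, True, "")


def check_all(paths, workers=2):
    """Run check_file over `paths` in a process pool (hypothetical wrapper)."""
    with Pool(processes=workers) as pool:
        return list(pool.imap_unordered(check_file, paths))
```

With this shape, a malformed file yields a result such as `(path, False, "XMLSyntaxError: ...")` that the caller can report as a failed test instead of crashing. `check_all` should be called from under an `if __name__ == "__main__":` guard on platforms that spawn worker processes.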
