@tomdpsrd tomdpsrd commented Jul 28, 2025

Actual Behavior

Document.summary() fails on Python 3 when the Document is built from bytes content rather than str content.
The newly released version (0.8.4.1) includes an old change that defines the encoding regular expressions as str patterns instead of bytes patterns.

Linked issue: #194

Steps to Reproduce the Problem

Follow the steps in the README:

>>> import requests
>>> from readability import Document

>>> response = requests.get('http://example.com')
>>> doc = Document(response.content)
>>> doc.title()
Traceback (most recent call last):
...
    RE_CHARSET.findall(page) + RE_PRAGMA.findall(page) + RE_XML.findall(page)
    ^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: cannot use a string pattern on a bytes-like object
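
The mismatch can be reproduced with the standard library alone. The sketch below is only an illustration: `STR_CHARSET` and `BYTES_CHARSET` are hypothetical stand-ins, not the library's actual `RE_CHARSET` definition, and the sample page is made up. It shows that a str pattern raises on bytes input, while the same pattern compiled from a bytes literal matches.

```python
import re

# Hypothetical stand-ins for a charset-detection pattern (assumption:
# the real patterns in the library may differ).
STR_CHARSET = re.compile(r'<meta.*?charset=["\']*(.+?)["\'>]', flags=re.I)
BYTES_CHARSET = re.compile(br'<meta.*?charset=["\']*(.+?)["\'>]', flags=re.I)

# Made-up page content, as bytes (what response.content would return).
page = b'<html><head><meta charset="utf-8"></head><body>hi</body></html>'

# A str pattern cannot be applied to a bytes-like object.
try:
    STR_CHARSET.findall(page)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# The bytes pattern works on the same input and returns bytes matches.
declared = BYTES_CHARSET.findall(page)

print(error_message)
print(declared)
```

Compiling the patterns from bytes literals keeps the matching on the raw page content, before any decoding has happened, which is presumably why the patterns were bytes in the first place.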

@tomdpsrd tomdpsrd changed the title from "Correction bytes" to "Document - Correct when instantiated with bytes content instead of bytes" Jul 28, 2025