TheGuardian.com image galleries are not picked up #78

@thom4parisot


Hello,

I tried to apply Readability to a specific layout of The Guardian. It relies heavily on JavaScript, but most of its text is still available in the HTML source code:

http://www.theguardian.com/football/gallery/2014/sep/10/memory-lane-1980s-footballers-at-home-in-pictures

Readability returned this chunk of HTML:

    <div><div> comments
      <p>Sign in or create your Guardian account to join the discussion. </p>
      <p>This discussion is closed for comments.</p>
      <p> We’re doing some maintenance right now. You can still read comments, but please come back later to add your own. </p>
      <p> Commenting has been disabled for this account (why?) </p>
    </div></div>

Do you guys know why the main content is not extracted properly, and whether this is fixable?
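To confirm which text the page serves without JavaScript, you can parse the raw HTML and list its text nodes. The sketch below uses only Python's standard library. `TextCollector` and `static_text` are hypothetical helpers written for this check. They are not part of Readability and do not reproduce its scoring.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects non-empty text nodes from static HTML, skipping <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a <script> or <style> element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def static_text(html: str) -> list[str]:
    """Return the text present in the HTML as served, without running any JavaScript."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

To try it on the gallery, pass in the decoded body of `urllib.request.urlopen(url).read()` and search the returned list for one of the photo captions. If the captions appear there, the content exists in the static HTML and is being dropped by the extraction step itself.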
