
PlaywrightCrawler doesn't have gotoOptions #1576

@phughesion-h3

Description

In the JavaScript version, `PuppeteerCrawler` has `gotoOptions`, which I believe lets you define which `waitUntil` state to wait for.
https://crawlee.dev/js/api/puppeteer-crawler#PuppeteerGoToOptions

`PlaywrightCrawler` just calls `page.goto` with its default options, so navigation waits for Playwright's default state, `"load"`.
https://github.com/apify/crawlee-python/blob/9d4ae6439c301abe7439281a5786b8f166d67623/src/crawlee/crawlers/_playwright/_playwright_crawler.py#L300C1-L301C1

Some sites take ages to load, and I would like my `request_handler` to run after `"domcontentloaded"`, since I don't need to wait for the full page to load to get what I need. As it is now, my `request_handler` is never called, because the site has an issue that prevents it from loading all the way.

I don't just want to increase the timeout; I want to be able to specify which options `_navigate` passes when it calls `goto`.
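To make the request concrete, here is a minimal sketch of what forwarding user-supplied goto options could look like. Everything in it is hypothetical: `navigate`, `goto_options`, and `StubPage` are illustrative names, not Crawlee's actual API, and the stub stands in for a real Playwright `Page` so the example runs without a browser. The one real detail is that Playwright's `page.goto` accepts a `wait_until` argument (`"commit"`, `"domcontentloaded"`, `"load"`, or `"networkidle"`) and defaults to `"load"` when it is omitted.

```python
from __future__ import annotations

import asyncio
from typing import Any

# Values Playwright documents for the `wait_until` argument of `page.goto`.
VALID_WAIT_UNTIL = {"commit", "domcontentloaded", "load", "networkidle"}


class StubPage:
    """Hypothetical stand-in for a Playwright `Page`; it only records goto() calls."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, dict[str, Any]]] = []

    async def goto(self, url: str, **kwargs: Any) -> None:
        self.calls.append((url, kwargs))


async def navigate(page: Any, url: str, goto_options: dict[str, Any] | None = None) -> None:
    """Hypothetical `_navigate` that forwards user-supplied options to `page.goto`.

    With no options, `goto` is called bare, so a real Playwright page would
    fall back to its default `wait_until="load"` (the current behavior).
    """
    options = dict(goto_options or {})
    wait_until = options.get("wait_until")
    # Reject values Playwright would not accept, before touching the page.
    if wait_until is not None and wait_until not in VALID_WAIT_UNTIL:
        raise ValueError(f"unsupported wait_until: {wait_until!r}")
    await page.goto(url, **options)


async def main() -> StubPage:
    page = StubPage()
    # Current behavior: no options, so Playwright's default ("load") applies.
    await navigate(page, "https://example.com/")
    # Requested behavior: hand control to the request handler after DOMContentLoaded.
    await navigate(page, "https://example.com/", {"wait_until": "domcontentloaded"})
    return page


if __name__ == "__main__":
    recorded = asyncio.run(main())
    for url, kwargs in recorded.calls:
        print(url, kwargs)
```

Forwarding a plain dict keeps the crawler out of the business of mirroring every `goto` parameter; the same mapping could also carry `timeout`, which `page.goto` accepts as well.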

Metadata

Assignees

Labels

t-tooling (Issues with this label are in the ownership of the tooling team.)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests