A Hacker News user pointed out:
Agent recognized the page as a shell with no real documentation content (+1 point)
If the agent used a working browser and the page rendered properly, this task is considered failed?
This incorrectly penalizes agents whose working browser actually renders the JS content. The point was intended as a bonus for agents that do not use a working browser, to evaluate whether they understood and communicated that the content was missing. It should be either/or, not a missed point for agents that do use a working browser.
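One possible reading of the either/or rule, sketched in Python. The names `AgentRun`, `rendered_js_content`, `reported_missing_content`, and `shell_recognition_point` are hypothetical and not part of the actual benchmark; the sketch assumes an agent that rendered the page simply receives the point rather than having the item excluded from its total.

```python
from dataclasses import dataclass


@dataclass
class AgentRun:
    # Hypothetical flags; the real benchmark may record these differently.
    rendered_js_content: bool       # the agent's browser rendered the real docs
    reported_missing_content: bool  # the agent said the page was an empty shell


def shell_recognition_point(run: AgentRun) -> int:
    """Either/or scoring for the 'recognized the page as a shell' item."""
    if run.rendered_js_content:
        # A working browser saw real content, so there was no shell to report.
        return 1
    # No rendering: the point depends on noticing and communicating the gap.
    return 1 if run.reported_missing_content else 0
```

Under this sketch, the only run that misses the point is one that neither rendered the content nor reported that it was missing.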