Fixed the NIPS crawler to work with the new format on the website #30

gwenniger · 2019-07-03T14:46:21Z

Added initial fix by Danny Boxhoorn to the NIPS crawler, dealing with the changed format of the newlines.

TODO: Further fixes to fetching and parsing the reviews still seem to be necessary.

modified:   code/data_prepare/crawler/NIPS_crawl.py

emaadmanzoor · 2020-01-31T00:25:31Z

The NeurIPS crawler doesn't work on my end either (Python 2). The parsing logic (specifically, the match on <p>(.*)</p>) is incorrect. I was able to fix it by replacing that portion of the scraper with the following, which uses BeautifulSoup4:

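data = str(get_url(http, reviews_url))
...
# Parse the fetched page instead of matching it with a regex.
soup = BeautifulSoup(data, 'html.parser')
...
if year < 2016:
    ...
else:
    # From 2016 on, each review is the text of a pre-wrap styled div.
    review_divs = [div.text for div in
                   soup.find_all("div", style="white-space: pre-wrap;")]
    for review_div in review_divs:
        # Only the review text is known here; the other fields stay None.
        r = Review(None, review_div, None, None, None, None, None, None)
        reviews.append(r)
...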
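
To illustrate why a pattern such as <p>(.*)</p> can miss reviews, here is a minimal, self-contained sketch. It is not the crawler's code: the sample HTML is invented for illustration, and it assumes (as the snippet above suggests) that each review sits in a div styled "white-space: pre-wrap;" and may span several lines. To keep the example free of third-party dependencies it uses the standard library's html.parser as a stand-in for BeautifulSoup4; the class name ReviewDivParser is hypothetical.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample page, invented for illustration only.
SAMPLE_HTML = """<html><body>
<h3>Reviewer 1</h3>
<div style="white-space: pre-wrap;">The paper is clear.

Minor: fix the typo in Section 2.</div>
<h3>Reviewer 2</h3>
<div style="white-space: pre-wrap;">Interesting idea,
but the experiments are limited.</div>
</body></html>"""

# By default "." does not match a newline, so a multi-line paragraph
# is skipped unless re.DOTALL is passed.
multi_line = "<p>line one\nline two</p>"
without_dotall = re.findall(r"<p>(.*)</p>", multi_line)
with_dotall = re.findall(r"<p>(.*)</p>", multi_line, re.DOTALL)

# On the sample page the pattern finds nothing: reviews are not in <p> tags.
old_matches = re.findall(r"<p>(.*)</p>", SAMPLE_HTML)


class ReviewDivParser(HTMLParser):
    """Collects the text of every div styled 'white-space: pre-wrap;'."""

    def __init__(self):
        super().__init__()
        self.reviews = []
        self._depth = 0    # > 0 while inside a matching div
        self._chunks = []  # text pieces of the review being read

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth:
            # Nested div inside a review: track it so we close correctly.
            self._depth += 1
        elif dict(attrs).get("style") == "white-space: pre-wrap;":
            self._depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.reviews.append("".join(self._chunks))

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


parser = ReviewDivParser()
parser.feed(SAMPLE_HTML)
parser.close()
reviews = parser.reviews

for text in reviews:
    print(repr(text))
```

Selecting on the element and its style attribute, rather than on a single-line text pattern, keeps the embedded newlines of each review intact. The exact style string is an assumption carried over from the snippet above and would need to be checked against the live pages.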

Commit 4e1717d: Added initial fix by Danny Boxhoorn to the NIPS crawler, dealing with the changed format of the newlines.
