Fixed some issues by lalalaurentiu · Pull Request #684 · peviitor-ro/based_scraper_py

lalalaurentiu · 2026-03-25T13:40:24Z

This pull request removes the custom Scraper class and related infrastructure, refactoring the codebase to use direct HTTP requests and BeautifulSoup for web scraping instead. It also updates the Decathlon and ThoughtWorks site scrapers to work without the custom scraper and improves location normalization. Additionally, it removes a GitHub Actions workflow and a utility script that are no longer needed.

Major refactoring and simplification:

The scraper_peviitor.py file, which defined the custom Scraper and Rules classes, is removed entirely. All sites now use direct requests and BeautifulSoup calls for scraping, simplifying the codebase and reducing maintenance overhead.
The apiUpdateFiles.py script, which handled API update calls, is removed as it is no longer used in the workflow.
The associated GitHub Actions workflow .github/workflows/update-Api.yml is deleted, as it depended on the removed script.
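
The direct-request approach described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the function names `parse_jobs` and `fetch_jobs`, the `div.job` / `a.job-title` markup, the `job_title` / `job_link` keys, and the User-Agent header are all assumptions made for the example.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def parse_jobs(html: str, base_url: str) -> list[dict]:
    """Extract job titles and links from listing HTML (hypothetical markup)."""
    soup = BeautifulSoup(html, "html.parser")
    jobs = []
    # The "job" and "job-title" class names are assumed, not taken from a real site.
    for card in soup.find_all("div", class_="job"):
        link = card.find("a", class_="job-title")
        if link is None or not link.get("href"):
            continue  # skip cards that have no usable link
        jobs.append({
            "job_title": link.get_text(strip=True),
            "job_link": urljoin(base_url, link["href"]),
        })
    return jobs


def fetch_jobs(url: str) -> list[dict]:
    """Download a listing page and parse it; raises on HTTP errors."""
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
    response.raise_for_status()
    return parse_jobs(response.text, base_url=url)
```

Keeping the parsing step separate from the HTTP call means it can be exercised against a saved HTML snippet without network access.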

Site scraper updates:

sites/decathlon.py is rewritten to fetch and parse job listings directly using requests and BeautifulSoup, with improved location normalization and city/county mapping logic. The Decathlon logo URL is also updated.
sites/thoughtworks.py is refactored to fetch jobs via direct HTTP requests, which removes the dependency on the custom Scraper class, and it now handles missing or malformed location data more gracefully.
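
As a rough sketch of what location normalization with a city/county mapping and tolerant handling of bad input might look like (the PR text does not show the real logic): the `CITY_TO_COUNTY` table, the `ALIASES` map, and the `normalize_location` helper below are hypothetical and cover only a few cities for illustration.

```python
import unicodedata

# Hypothetical lookup tables; a real scraper would need a complete list.
CITY_TO_COUNTY = {
    "bucuresti": ("Bucuresti", "Bucuresti"),
    "cluj-napoca": ("Cluj-Napoca", "Cluj"),
    "iasi": ("Iasi", "Iasi"),
}
ALIASES = {"bucharest": "bucuresti", "cluj": "cluj-napoca"}


def strip_diacritics(text: str) -> str:
    """Drop combining marks so that e.g. 'Iași' becomes 'Iasi'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_location(raw):
    """Return (city, county), or (None, None) for missing or unknown input."""
    if not isinstance(raw, str) or not raw.strip():
        return None, None  # missing or malformed location field
    # Keep only the part before the first comma, e.g. "Iași, Romania".
    key = strip_diacritics(raw.split(",")[0]).strip().lower()
    key = ALIASES.get(key, key)
    return CITY_TO_COUNTY.get(key, (None, None))
```

Returning `(None, None)` instead of raising lets a scraper decide per job whether to skip the entry or publish it without a location.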

lalalaurentiu and others added 5 commits March 25, 2026 15:12

Remove outdated API update workflow and scraper implementation

93bced0

Merge branch 'main' of https://github.com/lalalaurentiu/based_scraper_py

9964b9d

Merge branch 'peviitor-ro:main' into main

bd14859

Refactor Decathlon and ThoughtWorks scrapers to use requests for API calls and improve city normalization logic

ba21cfb

Fix job links and improve city handling in LSEG and ThalesGroup scrapers

d7e85f9

lalalaurentiu merged commit b38ebb5 into peviitor-ro:main Mar 25, 2026
1 check passed

lalalaurentiu merged 5 commits into peviitor-ro:main from lalalaurentiu:main
