Fixed some issues #684

Merged
lalalaurentiu merged 5 commits into peviitor-ro:main from lalalaurentiu:main
Mar 25, 2026

@lalalaurentiu
Collaborator

This pull request removes the custom Scraper class and related infrastructure, refactoring the codebase to use direct HTTP requests and BeautifulSoup for web scraping instead. It also updates the Decathlon and ThoughtWorks site scrapers to work without the custom scraper and improves location normalization. Additionally, it removes a GitHub Actions workflow and a utility script that are no longer needed.

Major refactoring and simplification:

  • The scraper_peviitor.py file, which defined the custom Scraper and Rules classes, is removed entirely. All sites now use direct requests and BeautifulSoup calls for scraping, simplifying the codebase and reducing maintenance overhead.
  • The apiUpdateFiles.py script, which handled API update calls, is removed as it is no longer used in the workflow.
  • The associated GitHub Actions workflow .github/workflows/update-Api.yml is deleted, as it depended on the removed script.
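For readers unfamiliar with the pattern that replaces the custom Scraper class, the following is a minimal sketch of scraping with requests and BeautifulSoup directly. It is not the code from this PR: the `div.job-card` selector, the function names, and the output keys (`job_title`, `job_link`, `company`) are assumptions chosen for illustration.

```python
import requests
from bs4 import BeautifulSoup


def parse_jobs(html: str, company: str) -> list[dict]:
    """Extract job entries from a listing page.

    The "div.job-card" selector and the output keys are hypothetical;
    a real site needs selectors that match its own markup.
    """
    soup = BeautifulSoup(html, "html.parser")
    jobs = []
    for card in soup.select("div.job-card"):
        link = card.find("a", href=True)
        if link is None:
            continue  # skip cards that carry no job link
        jobs.append(
            {
                "job_title": link.get_text(strip=True),
                "job_link": link["href"],
                "company": company,
            }
        )
    return jobs


def fetch_jobs(url: str, company: str) -> list[dict]:
    """Download a listing page and parse it."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    return parse_jobs(response.text, company)
```

Keeping the parsing step separate from the HTTP call means the parser can be exercised against a saved HTML string without network access.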

Site scraper updates:

  • sites/decathlon.py is rewritten to fetch and parse job listings directly using requests and BeautifulSoup, with improved location normalization and city/county mapping logic. The Decathlon logo URL is also updated.
  • sites/thoughtworks.py is refactored to fetch jobs via direct HTTP requests, which removes the dependency on the custom Scraper class, and now handles missing or malformed location data more gracefully. [1] [2] [3] [4]
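The location handling mentioned in both bullets could look roughly like the sketch below. It is an assumption-laden illustration, not the logic merged here: the `CITY_TO_COUNTY` table, the function names, and the returned dictionary shape are all hypothetical.

```python
import unicodedata

# Hypothetical city -> county lookup; a real scraper would use a fuller table.
CITY_TO_COUNTY = {
    "bucuresti": "Bucuresti",
    "cluj-napoca": "Cluj",
    "iasi": "Iasi",
}


def strip_diacritics(text: str) -> str:
    """Remove combining marks so that 'București' becomes 'Bucuresti'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_location(raw) -> dict:
    """Map a raw location value to a city and county.

    Missing or malformed input (None, non-strings, blank strings) yields
    None for both fields instead of raising.
    """
    if not isinstance(raw, str) or not raw.strip():
        return {"city": None, "county": None}
    # Keep only the part before the first comma, e.g. "Iași, România" -> "Iași".
    city = strip_diacritics(raw.split(",")[0]).strip()
    if not city:
        return {"city": None, "county": None}
    return {"city": city, "county": CITY_TO_COUNTY.get(city.lower())}
```

Returning `None` fields for unusable input lets the caller decide whether to skip the job or publish it without a location.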

@lalalaurentiu lalalaurentiu merged commit b38ebb5 into peviitor-ro:main Mar 25, 2026
1 check passed