Initial backend #3
backend/scraper-cron/openaiClient.ts
Outdated
function: {
  name: "listing-extractor",
  description:
    "Tool to best single url for finding a job board or company directory from a site map",
These go to the AI, right? Probably not critical but "get" is missing from 'Tool to get best single url...' - best might be tokenized as a verb and cause issues
Yep - thx for the heads up.
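For reference, a minimal sketch of what the corrected tool entry might look like. The `FunctionTool` type, the `listingExtractorTool` name, and the `parameters` schema are assumptions for illustration (the diff excerpt does not show them), and the description wording is just one way to add the missing verb.

```typescript
// Local type that mirrors the general shape of a function tool definition.
// Declared here so the sketch does not depend on any SDK types.
type FunctionTool = {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: Record<string, unknown>;
  };
};

// Hypothetical constant name; only `name` and the gist of `description`
// come from the diff under review.
const listingExtractorTool: FunctionTool = {
  type: "function",
  function: {
    name: "listing-extractor",
    // "get" added so that "best" reads as an adjective, not a verb.
    description:
      "Tool to get the best single url for finding a job board or company directory from a site map",
    // Assumed schema: a single string field holding the chosen URL.
    parameters: {
      type: "object",
      properties: {
        url: { type: "string", description: "The selected URL" },
      },
      required: ["url"],
    },
  },
};
```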
const checkRedirectsOnAts = (response: Response) => {
  if (response.redirected) {
    if (
      response.url.includes("jobs.lever.co") ||
      response.url.includes("applytojob.com") ||
      response.url.includes("ashbyhq.com") ||
      response.url.includes("info.jazzhr.com")
    ) {
      return false;
    }
    return true;
  }
  return true;
};
Watch out for towering ;)
Can rewrite this (not critical, again) to use early returns.
if (!response.redirected) return true;
// rest
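Spelled out in full, the early-return version could look roughly like the sketch below. It is meant to keep the same behavior as the snippet above. The `ATS_HOSTS` array and the `RedirectInfo` type are names introduced here for illustration and are not in the PR. `RedirectInfo` is a structural subset of `Response`, so a real `Response` can still be passed in.

```typescript
// Host fragments copied from the snippet under review.
const ATS_HOSTS = [
  "jobs.lever.co",
  "applytojob.com",
  "ashbyhq.com",
  "info.jazzhr.com",
];

// Hypothetical helper type: just the two Response fields the check reads.
type RedirectInfo = { redirected: boolean; url: string };

const checkRedirectsOnAts = (response: RedirectInfo): boolean => {
  // Early return: a request that was not redirected always passes.
  if (!response.redirected) return true;

  // Redirected: fail only when the final URL contains a known ATS host.
  return !ATS_HOSTS.some((host) => response.url.includes(host));
};
```

Keeping the hosts in one array also means adding another ATS is a one-line change rather than another `||` branch.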
Thx for the reviews @mackenziebowes - will address tomorrow.
Adds backend scraping as an npm script. I found that Firecrawl isn't great for pagination, so I opted to use OpenAI + Playwright for going through company directories.
Firecrawl handles exposing the appropriate pages on company directories via its map feature, and it also does the actual job searches.
This is a basic implementation, over the next week I'll be adding
I haven't actually run this over more than one or two of the company directories, so there are likely a lot of bugs to work out.
Also adds a skeleton Drizzle setup - feel free to change that as necessary.