@Christopher-Stevers Christopher-Stevers commented Nov 26, 2025

Adds backend scraping as an npm script. I found that Firecrawl isn't great for pagination, so I opted to use OpenAI + Playwright for going through company directories.

Firecrawl handles exposing the appropriate pages on company directories via their map feature and doing the actual job searches.

This is a basic implementation. Over the next week I'll be working on:

  • setting this up to run on a cron
  • fault tolerance
  • saving jobs to the db
  • filtering jobs/companies being parsed - probably most important, to make sure they are actually startups.

I haven't actually run this over more than one or two of the company directories, so there are likely a lot of bugs to work out.
Also adds a skeleton Drizzle setup - feel free to change that as necessary.

function: {
name: "listing-extractor",
description:
"Tool to best single url for finding a job board or company directory from a site map",
Contributor

These go to the AI, right? Probably not critical but "get" is missing from 'Tool to get best single url...' - best might be tokenized as a verb and cause issues

Yep - thx for the heads up.
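
For reference, the corrected description could look like the sketch below. It assumes the definition follows the OpenAI chat-completions function tool shape; the `FunctionTool` interface, the `listingExtractorTool` name, and the `parameters` schema are hypothetical, since the excerpt above shows only `name` and `description`.

```typescript
// Local type mirroring the function tool shape, declared here so the sketch
// needs no SDK import. Assumption: the PR passes an object of this shape.
interface FunctionTool {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: Record<string, unknown>;
  };
}

const listingExtractorTool: FunctionTool = {
  type: "function",
  function: {
    name: "listing-extractor",
    // "get" added so that "best" reads as an adjective rather than a verb.
    description:
      "Tool to get best single url for finding a job board or company directory from a site map",
    // Hypothetical schema: the real parameters are not shown in the PR excerpt.
    parameters: {
      type: "object",
      properties: {
        url: { type: "string", description: "The single best matching url" },
      },
      required: ["url"],
    },
  },
};
```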

Comment on lines 42 to 55
const checkRedirectsOnAts = (response: Response) => {
  if (response.redirected) {
    if (
      response.url.includes("jobs.lever.co") ||
      response.url.includes("applytojob.com") ||
      response.url.includes("ashbyhq.com") ||
      response.url.includes("info.jazzhr.com")
    ) {
      return false;
    }
    return true;
  }
  return true;
};
Contributor

Watch out for towering ;)

Can rewrite this (not critical, again) to use early returns.

if (!response.redirected) return true;
// rest
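
A minimal sketch of the full early-return version, assuming the behavior of the original should stay exactly the same. The `ATS_HOSTS` constant and the `RedirectInfo` type are illustrative names that are not in the PR; a real `Response` satisfies `RedirectInfo` structurally, so existing call sites would not need to change.

```typescript
// ATS hosts copied from the snippet under review.
const ATS_HOSTS: string[] = [
  "jobs.lever.co",
  "applytojob.com",
  "ashbyhq.com",
  "info.jazzhr.com",
];

// Only the two fields the check reads (hypothetical helper type), so plain
// objects can stand in for a fetch Response in tests.
interface RedirectInfo {
  redirected: boolean;
  url: string;
}

const checkRedirectsOnAts = (response: RedirectInfo): boolean => {
  // No redirect happened: nothing further to check.
  if (!response.redirected) return true;
  // Redirected: return false if the final URL contains a known ATS host.
  return !ATS_HOSTS.some((host) => response.url.includes(host));
};
```

Keeping the hosts in one array flattens the nesting and makes adding another ATS a one-line change.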

@Christopher-Stevers
Author

Thx for the reviews @mackenziebowes - will address tomorrow.

@Christopher-Stevers Christopher-Stevers marked this pull request as draft December 1, 2025 04:20