
Expand Ashby slug pool, add slug verification utility, and expose Ashby filters in API #2

Merged
Mayank-glitch-cpu merged 6 commits into main from
codex/integrate-ashby-job-ingestion-strategy
Feb 11, 2026
Conversation

@Mayank-glitch-cpu
Owner

Motivation

  • Broaden Ashby coverage so Step 1 can scan many more public Ashby job boards by default.
  • Provide a reproducible utility to generate and optionally verify candidate Ashby slugs against jobs.ashbyhq.com.
  • Allow callers to restrict or filter Ashby ingestion per-request (companies, keywords, postedToday, publishedWithinHours) instead of only using a static built-in set.

Description

  • Replaced the small static company list with a much larger ASHBY_COMPANY_SLUGS array and updated the fetcher to use the Ashby public posting API (backend/src/services/theirstack-fetcher.ts) while adding parsing, normalization, keyword/date filters, request delays, and dynamic name generation via slugToCompanyName and resolveCompanies.
  • Added FetchAshbyOptions support and refactored fetchTheirStack to accept companySlugs, keywords, postedTodayOnly, publishedWithinHours, limit, and range and to return deduplicated, filtered RawJob objects.
  • Extended the Step 1 pipeline route to accept and parse new query parameters (companies, keywords, postedToday, publishedWithinHours, limit) and forward them into fetchTheirStack (backend/src/routes/pipeline.ts).
  • Added new env-config items in backend/src/config.ts to support ASHBY_KEYWORDS, ASHBY_PUBLISHED_WITHIN_HOURS, ASHBY_INCLUDE_COMPENSATION, and ASHBY_REQUEST_DELAY_MS.
  • Added scripts/ashby_slugs_verified.py, a small CLI that writes curated slug lists to disk and can optionally verify each slug by checking https://jobs.ashbyhq.com/<slug>/jobs.json.
  • Updated README.md with usage notes for the Ashby smart-ingestion request parameters and instructions for the slug verification utility.
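
The per-request parameters described above could be combined into one Step 1 call. The sketch below only assembles a path and query string for `/api/pipeline/step1`; the `Step1Filters` type and `buildStep1Path` helper are hypothetical names, and the value formats (comma-separated lists, `true` for `postedToday`) are assumptions drawn from the description rather than confirmed route behavior.

```typescript
// Hypothetical helper, not part of the PR: builds a Step 1 request path.
interface Step1Filters {
  companies?: string[];          // Ashby slugs, sent as a comma-separated list
  keywords?: string[];           // keyword filters, sent as a comma-separated list
  postedToday?: boolean;         // restrict to postings published today
  publishedWithinHours?: number; // lookback window in hours
  limit?: number;                // maximum number of jobs to return
}

function buildStep1Path(filters: Step1Filters): string {
  // source=theirstack selects the Ashby-backed fetcher (assumed from the review thread).
  const params: string[] = ["source=theirstack"];
  if (filters.companies && filters.companies.length > 0) {
    params.push("companies=" + encodeURIComponent(filters.companies.join(",")));
  }
  if (filters.keywords && filters.keywords.length > 0) {
    params.push("keywords=" + encodeURIComponent(filters.keywords.join(",")));
  }
  if (filters.postedToday !== undefined) {
    params.push("postedToday=" + String(filters.postedToday));
  }
  if (filters.publishedWithinHours !== undefined) {
    params.push("publishedWithinHours=" + String(filters.publishedWithinHours));
  }
  if (filters.limit !== undefined) {
    params.push("limit=" + String(filters.limit));
  }
  return "/api/pipeline/step1?" + params.join("&");
}
```

Parameters that are left out are simply omitted from the query string, so the server-side defaults from the env config would apply.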

Testing

  • Built the backend TypeScript with pnpm --filter jobslist-backend build and the build completed successfully.
  • Executed python scripts/ashby_slugs_verified.py --verify in this environment; the script ran and generated outputs, but network egress to jobs.ashbyhq.com is blocked here so verification returned 0/83 verified (no reachable endpoints).
  • No additional automated tests were added; existing build step passed and the new script and route parsing were exercised in the development environment described above.

Codex Task

Contributor

Copilot AI left a comment

Pull request overview

This PR expands Ashby ingestion by switching to Ashby’s public posting API, greatly enlarging the default Ashby company slug pool, and exposing per-request filtering controls through the Step 1 pipeline endpoint. It also adds a small Python utility to generate and optionally verify candidate Ashby slugs, plus documentation and env-config updates to support the new behavior.

Changes:

  • Refactored Ashby/TheirStack fetcher to use Ashby’s posting API, add keyword/date filtering, and support per-request options (companies/keywords/freshness/limit/range).
  • Extended /api/pipeline/step1 to accept and forward Ashby filter query params.
  • Added new Ashby-related env config, a slug verification script, and README/.env.example documentation updates.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 10 comments.

Show a summary per file
| File | Description |
| --- | --- |
| scripts/ashby_slugs_verified.py | New CLI utility to write curated slug lists and optionally verify slugs via jobs.json. |
| backend/src/services/theirstack-fetcher.ts | Switches to Ashby posting API, expands slug pool, adds filtering/normalization, delay, and option-driven fetching. |
| backend/src/routes/pipeline.ts | Adds query parsing for Ashby filters and forwards them into the fetcher. |
| backend/src/config.ts | Adds env-config for Ashby keywords, lookback window, compensation inclusion, and request delay. |
| README.md | Documents new Ashby behavior, request params, and slug script usage. |
| .env.example | Adds example values for the new Ashby env vars. |


Comment on lines +234 to +237
const companies = resolveCompanies(options.companySlugs);
const keywords = normalizeKeywords(options.keywords);
const lookbackHours = options.publishedWithinHours ?? config.ashbyPublishedWithinHours;
const postedTodayOnly = options.postedTodayOnly || false;

Copilot AI Feb 11, 2026

When companySlugs is omitted, resolveCompanies(...) returns the full ASHBY_COMPANY_SLUGS list (currently very large), so a default Step 1 run will issue hundreds of requests. With the default per-company delay, this can turn Step 1 into a multi-minute operation. Consider capping the default number of companies per run and/or adding limited concurrency with rate limiting to keep ingestion time bounded.
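
To make the cost concrete: with a sequential loop and the default 1000 ms delay, a run over 300 companies (an illustrative figure, not the actual pool size) spends about 300 seconds on delays alone. A minimal sketch of a caller-controllable cap follows; `capCompanies`, `estimateDelaySeconds`, and the default of 50 are hypothetical and not taken from the PR.

```typescript
// Hypothetical default cap; the real value would be a product decision.
const DEFAULT_MAX_COMPANIES = 50;

// Returns at most maxCompanies slugs, falling back to the default cap
// when the caller passes a non-finite or non-positive value.
function capCompanies(slugs: string[], maxCompanies: number = DEFAULT_MAX_COMPANIES): string[] {
  const cap =
    Number.isFinite(maxCompanies) && maxCompanies >= 1
      ? Math.floor(maxCompanies)
      : DEFAULT_MAX_COMPANIES;
  return slugs.slice(0, cap);
}

// Time spent only on inter-request delays in a strictly sequential run.
// Network latency is not included, so real runs take longer.
function estimateDelaySeconds(companyCount: number, delayMs: number): number {
  return (companyCount * delayMs) / 1000;
}
```

Exposing the cap as a request parameter would let the caller decide how many companies to scan, while the default keeps an unparameterized run bounded.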

Owner Author

@Mayank-glitch-cpu Mayank-glitch-cpu Feb 11, 2026

@copilot maybe we can put this under user control, so the caller chooses how many companies to fetch

Comment on lines +94 to +96
const companies = parseCommaList(request.query.companies).map((slug) => slug.toLowerCase());
const keywords = parseCommaList(request.query.keywords).map((keyword) => keyword.toLowerCase());
const postedTodayOnly = parseBoolean(request.query.postedToday);

Copilot AI Feb 11, 2026

If a caller supplies companies=..., unknown slugs are silently kept in the request but later dropped by resolveCompanies(...), which can lead to a successful-but-empty run that’s hard to debug. Consider validating companies against the supported slug pool and returning 400 (or at least logging/returning the invalid slugs) when none match.
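
One way to surface unknown slugs is to split the requested list into matched and unmatched entries before fetching. This is a sketch under assumptions: `validateCompanySlugs` and `shouldRejectRequest` are hypothetical helpers, and `supported` stands in for the backend's `ASHBY_COMPANY_SLUGS`.

```typescript
interface SlugValidation {
  valid: string[];   // requested slugs found in the supported pool
  invalid: string[]; // requested slugs that would otherwise be dropped silently
}

// Splits requested slugs into supported and unsupported ones (case-insensitive).
function validateCompanySlugs(requested: string[], supported: string[]): SlugValidation {
  const pool = new Set<string>();
  for (const s of supported) {
    pool.add(s.toLowerCase());
  }
  const result: SlugValidation = { valid: [], invalid: [] };
  for (const raw of requested) {
    const slug = raw.trim().toLowerCase();
    if (slug === "") continue; // ignore empty entries such as a trailing comma
    if (pool.has(slug)) {
      result.valid.push(slug);
    } else {
      result.invalid.push(slug);
    }
  }
  return result;
}

// True when the caller asked for specific companies but none are supported,
// which is the case where a 400 response is more useful than an empty run.
function shouldRejectRequest(validation: SlugValidation): boolean {
  return validation.valid.length === 0 && validation.invalid.length > 0;
}
```

When only some slugs are invalid, the route could still proceed and return the `invalid` list alongside the results.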

Owner Author

@copilot open a new pull request to apply changes based on this feedback

Comment on lines +92 to +93
const limit = !isNaN(limitVal) ? limitVal : 10;


Copilot AI Feb 11, 2026

limit is defaulted to 10 even when the query param isn’t provided. Since this value is always forwarded to fetchTheirStack, it overrides config.jobCount/JOB_COUNT for source=theirstack runs. Consider leaving limit as undefined unless request.query.limit is present (or defaulting it from config.jobCount specifically for TheirStack).
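
A minimal sketch of the suggested behavior, assuming the raw query value arrives as a string or is absent. `parseOptionalLimit` and `resolveLimit` are hypothetical names; `configuredJobCount` stands in for `config.jobCount`.

```typescript
// Returns a positive integer limit, or undefined when the query param
// is absent, empty, or not a usable number.
function parseOptionalLimit(raw: unknown): number | undefined {
  if (typeof raw !== "string" || raw.trim() === "") return undefined;
  const parsed = parseInt(raw, 10);
  if (Number.isNaN(parsed) || parsed <= 0) return undefined;
  return parsed;
}

// Falls back to the configured job count only when no explicit limit was given,
// so a missing query param no longer overrides the configured value.
function resolveLimit(raw: unknown, configuredJobCount: number): number {
  const explicit = parseOptionalLimit(raw);
  return explicit === undefined ? configuredJobCount : explicit;
}
```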

Owner Author

@copilot open a new pull request to apply changes based on this feedback

),
ashbyIncludeCompensation:
(process.env.ASHBY_INCLUDE_COMPENSATION || 'true').toLowerCase() === 'true',
ashbyRequestDelayMs: parseInt(process.env.ASHBY_REQUEST_DELAY_MS || '1000', 10),

Copilot AI Feb 11, 2026

ashbyRequestDelayMs is parsed with parseInt(...) but not validated. If ASHBY_REQUEST_DELAY_MS is non-numeric, this becomes NaN and the > 0 check later will evaluate false, effectively disabling request throttling unexpectedly. Consider falling back to the default (1000ms) when Number.isNaN(...) (and/or clamping to >= 0).
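
The fallback and clamp could look like the sketch below. `parseDelayMs` is a hypothetical helper; in the config it would be called with `process.env.ASHBY_REQUEST_DELAY_MS`.

```typescript
// Parses a delay in milliseconds, returning fallbackMs for missing or
// non-numeric input and clamping negative values to 0.
function parseDelayMs(raw: string | undefined, fallbackMs: number = 1000): number {
  if (raw === undefined || raw.trim() === "") return fallbackMs;
  // parseInt accepts leading digits ("12abc" parses as 12) and yields NaN otherwise.
  const parsed = parseInt(raw, 10);
  if (Number.isNaN(parsed)) return fallbackMs;
  return Math.max(0, parsed);
}
```

With this shape, a typo in the environment variable keeps throttling at the default instead of silently turning it off.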

Owner Author

@copilot open a new pull request to apply changes based on this feedback

Comment on lines +207 to +209
const result: AshbyPublicResponse = await response.json();
const postings = result.jobs ?? [];
return postings.map((posting) => ({ company: companyName, slug, posting }));

Copilot AI Feb 11, 2026

await response.json() can throw if Ashby returns non-JSON (e.g., HTML error page) even when response.ok is true, which would currently fail the entire Step 1 run. Consider guarding JSON parsing with try/catch (and logging the slug + status) so one bad response doesn’t stop ingestion.
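
One possible guard is to read the body as text and parse it inside a try/catch, so a malformed payload yields an empty result plus a loggable message. The sketch assumes the caller has already obtained the body string (for example via `await response.text()`); `parseJobsBody` and `ParsedPostings` are hypothetical names.

```typescript
interface ParsedPostings {
  postings: unknown[]; // contents of the "jobs" array, or [] when unavailable
  error?: string;      // set when the body could not be parsed as JSON
}

// Parses a response body defensively so that one bad payload
// does not abort ingestion for the remaining companies.
function parseJobsBody(body: string, slug: string, status: number): ParsedPostings {
  try {
    const parsed: unknown = JSON.parse(body);
    if (typeof parsed === "object" && parsed !== null) {
      const jobs = (parsed as { jobs?: unknown }).jobs;
      return { postings: Array.isArray(jobs) ? jobs : [] };
    }
    return { postings: [] };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return {
      postings: [],
      error: `Non-JSON response for "${slug}" (status ${status}): ${reason}`,
    };
  }
}
```

The caller can log `error` when present and continue with the next slug.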

Owner Author

@copilot open a new pull request to apply changes based on this feedback

Comment on lines +17 to +21
# Compact curated seed list; can be extended over time.
SLUGS = [
'airtable','alan','altura','away','deliveroo','duolingo','flock-safety','hackerone','notion','opendoor',
'oyster','posthog','ramp','sequoia','sony','vanta','cursor','deel','harvey','modern-treasury','openai',
'reddit','shopify','snowflake','apify','ashby','buffer','factory','hcompany','jerry.ai','lightning','linear',

Copilot AI Feb 11, 2026

The curated slug seed list is duplicated between this script (SLUGS) and the backend’s ASHBY_COMPANY_SLUGS, which can drift over time. Consider sourcing both from a single canonical file (e.g., have the script generate a file consumed by the backend, or have the script read the TS list) to avoid divergence.
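
If the canonical list lived in a shared JSON file (a hypothetical layout, not something the PR defines), the backend side could reduce to a small parser like the one below. `parseSlugList` is an assumed name, and reading the file from disk is left to the caller.

```typescript
// Parses the contents of a shared slug file (assumed to be a JSON array of
// strings) into a normalized, de-duplicated list that preserves order.
function parseSlugList(contents: string): string[] {
  const parsed: unknown = JSON.parse(contents);
  if (!Array.isArray(parsed)) {
    throw new Error("Slug file must contain a JSON array");
  }
  const seen = new Set<string>();
  const slugs: string[] = [];
  for (const entry of parsed) {
    if (typeof entry !== "string") continue; // skip non-string entries
    const slug = entry.trim().toLowerCase();
    if (slug !== "" && !seen.has(slug)) {
      seen.add(slug);
      slugs.push(slug);
    }
  }
  return slugs;
}
```

The Python script would then write the same file it verifies, so both sides read one source of truth.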

Owner Author

@copilot open a new pull request to apply changes based on this feedback

source === 'theirstack'
? {
companies,
keywords,

Copilot AI Feb 11, 2026

For source=theirstack, when no keywords query param is provided you pass keywords=[] and the fetcher falls back to config.ashbyKeywords, but the response still echoes keywords: []. This makes the API response misleading vs. the actual behavior. Consider omitting keywords from filters when not provided, or returning something like keywords: 'default' / the resolved keyword list.

Suggested change
- keywords,
+ keywords: request.query.keywords ? keywords : 'default',

Comment on lines +194 to +198
const response = await fetch(url.toString(), {
method: 'GET',
headers: {
Accept: 'application/json',
'User-Agent': 'JobSync-Service/1.0',

Copilot AI Feb 11, 2026

The fetch(...) call isn’t wrapped in a try/catch. With the expanded slug pool, a single transient network failure (DNS, timeout, connection reset) will throw and abort the entire Step 1 run instead of logging and moving on to the next company. Consider catching request errors in fetchCompanyJobs and returning [] on failure.
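
A sketch of the catch-and-continue pattern follows. To keep it runnable without network access, the HTTP call is injected as a `Fetcher` function; that type, `MinimalResponse`, `fetchPostingsSafely`, and the `log` callback are all hypothetical and only mirror the shape of `fetchCompanyJobs` loosely.

```typescript
// Minimal subset of a fetch Response that this sketch relies on.
interface MinimalResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

// Injected HTTP function; in the backend this would wrap the real fetch call.
type Fetcher = (url: string) => Promise<MinimalResponse>;

// Returns the postings for one company, or [] when the request or the
// JSON parsing fails, so a single failure does not abort the whole run.
async function fetchPostingsSafely(
  slug: string,
  url: string,
  fetcher: Fetcher,
  log: (message: string) => void = () => {}
): Promise<unknown[]> {
  try {
    const response = await fetcher(url);
    if (!response.ok) {
      log(`Skipping "${slug}": HTTP ${response.status}`);
      return [];
    }
    const body = await response.json();
    const jobs =
      typeof body === "object" && body !== null
        ? (body as { jobs?: unknown }).jobs
        : undefined;
    return Array.isArray(jobs) ? jobs : [];
  } catch (err) {
    // Covers DNS failures, timeouts, connection resets, and JSON parse errors.
    const reason = err instanceof Error ? err.message : String(err);
    log(`Skipping "${slug}": ${reason}`);
    return [];
  }
}
```

Because the function never rejects, the surrounding loop over companies can keep a simple sequential shape.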

Owner Author

@copilot open a new pull request to apply changes based on this feedback

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Contributor

Copilot AI commented Feb 11, 2026

@Mayank-glitch-cpu I've opened a new pull request, #3, to work on those changes. Once the pull request is ready, I'll request review from you.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Contributor

Copilot AI commented Feb 11, 2026

@Mayank-glitch-cpu I've opened a new pull request, #4, to work on those changes. Once the pull request is ready, I'll request review from you.

Contributor

Copilot AI commented Feb 11, 2026

@Mayank-glitch-cpu I've opened a new pull request, #5, to work on those changes. Once the pull request is ready, I'll request review from you.

Contributor

Copilot AI commented Feb 11, 2026

@Mayank-glitch-cpu I've opened a new pull request, #6, to work on those changes. Once the pull request is ready, I'll request review from you.

Contributor

Copilot AI commented Feb 11, 2026

@Mayank-glitch-cpu I've opened a new pull request, #7, to work on those changes. Once the pull request is ready, I'll request review from you.

Contributor

Copilot AI commented Feb 11, 2026

@Mayank-glitch-cpu I've opened a new pull request, #8, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 2 commits February 11, 2026 18:42
…ilures

Co-authored-by: Mayank-glitch-cpu <85422185+Mayank-glitch-cpu@users.noreply.github.com>
…work

Wrap fetch() in try/catch to handle network failures gracefully
@Mayank-glitch-cpu Mayank-glitch-cpu merged commit 8bb2885 into main Feb 11, 2026
3 participants