Expand Ashby slug pool, add slug verification utility, and expose Ashby filters in API #2
Conversation
Pull request overview
This PR expands Ashby ingestion by switching to Ashby’s public posting API, greatly enlarging the default Ashby company slug pool, and exposing per-request filtering controls through the Step 1 pipeline endpoint. It also adds a small Python utility to generate and optionally verify candidate Ashby slugs, plus documentation and env-config updates to support the new behavior.
Changes:
- Refactored Ashby/TheirStack fetcher to use Ashby’s posting API, add keyword/date filtering, and support per-request options (companies/keywords/freshness/limit/range).
- Extended `/api/pipeline/step1` to accept and forward Ashby filter query params.
- Added new Ashby-related env config, a slug verification script, and README/.env.example documentation updates.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| scripts/ashby_slugs_verified.py | New CLI utility to write curated slug lists and optionally verify slugs via jobs.json. |
| backend/src/services/theirstack-fetcher.ts | Switches to Ashby posting API, expands slug pool, adds filtering/normalization, delay, and option-driven fetching. |
| backend/src/routes/pipeline.ts | Adds query parsing for Ashby filters and forwards them into the fetcher. |
| backend/src/config.ts | Adds env-config for Ashby keywords, lookback window, compensation inclusion, and request delay. |
| README.md | Documents new Ashby behavior, request params, and slug script usage. |
| .env.example | Adds example values for the new Ashby env vars. |
```ts
const companies = resolveCompanies(options.companySlugs);
const keywords = normalizeKeywords(options.keywords);
const lookbackHours = options.publishedWithinHours ?? config.ashbyPublishedWithinHours;
const postedTodayOnly = options.postedTodayOnly || false;
```
When companySlugs is omitted, resolveCompanies(...) returns the full ASHBY_COMPANY_SLUGS list (currently very large), so a default Step 1 run will issue hundreds of requests. With the default per-company delay, this can turn Step 1 into a multi-minute operation. Consider capping the default number of companies per run and/or adding limited concurrency with rate limiting to keep ingestion time bounded.
@copilot maybe we can put this under user control, letting the caller choose how many companies to fetch.
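If the company count does become user-controlled, the limited concurrency suggested above could look like the following minimal sketch. `mapWithConcurrency` is a hypothetical helper (not part of the PR), and the cap and concurrency values are illustrative:

```typescript
// Sketch: run per-company fetches with bounded concurrency so a large slug
// pool does not turn Step 1 into a strictly sequential multi-minute run.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // shared cursor: each worker claims the next unprocessed index
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  }
  // Spawn at most `limit` workers; each drains the shared queue.
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, () => worker()),
  );
  return results;
}
```

A default cap could then be applied before mapping, e.g. `companies.slice(0, maxCompanies)` with `maxCompanies` coming from a query param or config.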
```ts
const companies = parseCommaList(request.query.companies).map((slug) => slug.toLowerCase());
const keywords = parseCommaList(request.query.keywords).map((keyword) => keyword.toLowerCase());
const postedTodayOnly = parseBoolean(request.query.postedToday);
```
If a caller supplies companies=..., unknown slugs are silently kept in the request but later dropped by resolveCompanies(...), which can lead to a successful-but-empty run that’s hard to debug. Consider validating companies against the supported slug pool and returning 400 (or at least logging/returning the invalid slugs) when none match.
@copilot open a new pull request to apply changes based on this feedback
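One way to surface invalid slugs instead of silently dropping them, sketched under assumed names (`validateCompanies` is hypothetical; the route would translate `ok: false` into a 400 response):

```typescript
// Sketch: validate requested company slugs against the supported pool.
// Fails loudly when nothing matches, and reports partial mismatches.
function validateCompanies(
  requested: string[],
  supported: Set<string>,
): { ok: boolean; unknown: string[] } {
  const unknown = requested.filter((slug) => !supported.has(slug));
  if (unknown.length > 0 && unknown.length === requested.length) {
    // No requested slug is supported: better a 400 than an empty "success".
    return { ok: false, unknown };
  }
  // Partial matches proceed; unknown slugs can be logged or echoed back.
  return { ok: true, unknown };
}
```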
```ts
const limit = !isNaN(limitVal) ? limitVal : 10;
```
limit is defaulted to 10 even when the query param isn’t provided. Since this value is always forwarded to fetchTheirStack, it overrides config.jobCount/JOB_COUNT for source=theirstack runs. Consider leaving limit as undefined unless request.query.limit is present (or defaulting it from config.jobCount specifically for TheirStack).
@copilot open a new pull request to apply changes based on this feedback
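A minimal sketch of the suggested fix, assuming a hypothetical `parseLimit` helper in the route; leaving the value `undefined` lets the fetcher keep falling back to `config.jobCount`:

```typescript
// Sketch: only produce a limit when the query param was actually supplied,
// so an absent param does not override config.jobCount / JOB_COUNT.
function parseLimit(raw: string | undefined): number | undefined {
  if (raw === undefined) return undefined; // defer to config.jobCount downstream
  const value = parseInt(raw, 10);
  return Number.isNaN(value) ? undefined : value;
}
```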
```ts
  ),
  ashbyIncludeCompensation:
    (process.env.ASHBY_INCLUDE_COMPENSATION || 'true').toLowerCase() === 'true',
  ashbyRequestDelayMs: parseInt(process.env.ASHBY_REQUEST_DELAY_MS || '1000', 10),
```
ashbyRequestDelayMs is parsed with parseInt(...) but not validated. If ASHBY_REQUEST_DELAY_MS is non-numeric, this becomes NaN and the > 0 check later will evaluate false, effectively disabling request throttling unexpectedly. Consider falling back to the default (1000ms) when Number.isNaN(...) (and/or clamping to >= 0).
@copilot open a new pull request to apply changes based on this feedback
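A sketch of the suggested fallback, using a hypothetical `parseDelayMs` helper; it falls back to the 1000 ms default on non-numeric input and clamps negatives to zero:

```typescript
// Sketch: parse ASHBY_REQUEST_DELAY_MS defensively so a typo in the env var
// cannot silently disable request throttling via NaN.
function parseDelayMs(raw: string | undefined, fallback = 1000): number {
  const value = parseInt(raw ?? '', 10);
  return Number.isNaN(value) ? fallback : Math.max(0, value);
}
```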
```ts
const result: AshbyPublicResponse = await response.json();
const postings = result.jobs ?? [];
return postings.map((posting) => ({ company: companyName, slug, posting }));
```
await response.json() can throw if Ashby returns non-JSON (e.g., HTML error page) even when response.ok is true, which would currently fail the entire Step 1 run. Consider guarding JSON parsing with try/catch (and logging the slug + status) so one bad response doesn’t stop ingestion.
@copilot open a new pull request to apply changes based on this feedback
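A possible guard, extracted into a hypothetical `parseJobs` helper (the types are assumptions based on the snippet above):

```typescript
// Sketch: guard JSON parsing so one malformed Ashby response (e.g. an HTML
// error page with a 200 status) skips that company instead of aborting Step 1.
async function parseJobs(response: Response, slug: string): Promise<unknown[]> {
  try {
    const result = (await response.json()) as { jobs?: unknown[] };
    return result.jobs ?? [];
  } catch (err) {
    // Log the slug and status so bad responses are debuggable, then move on.
    console.warn(`Skipping ${slug}: non-JSON response (status ${response.status})`, err);
    return [];
  }
}
```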
```python
# Compact curated seed list; can be extended over time.
SLUGS = [
    'airtable','alan','altura','away','deliveroo','duolingo','flock-safety','hackerone','notion','opendoor',
    'oyster','posthog','ramp','sequoia','sony','vanta','cursor','deel','harvey','modern-treasury','openai',
    'reddit','shopify','snowflake','apify','ashby','buffer','factory','hcompany','jerry.ai','lightning','linear',
```
The curated slug seed list is duplicated between this script (SLUGS) and the backend’s ASHBY_COMPANY_SLUGS, which can drift over time. Consider sourcing both from a single canonical file (e.g., have the script generate a file consumed by the backend, or have the script read the TS list) to avoid divergence.
@copilot open a new pull request to apply changes based on this feedback
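One way to establish a single source of truth, sketched in Python: the script writes one canonical JSON file that the TypeScript backend can read or import at build time. The path `shared/ashby_slugs.json` and the trimmed `SLUGS` list here are illustrative, not from the PR:

```python
# Sketch: emit one canonical slug file consumed by both the Python script
# and the backend, so the two lists cannot drift apart.
import json
from pathlib import Path

SLUGS = ['ramp', 'notion', 'airtable']  # trimmed example; real list is larger


def write_canonical(path: str) -> None:
    """Write the deduplicated, sorted slug list as JSON."""
    Path(path).write_text(json.dumps(sorted(set(SLUGS)), indent=2) + '\n')
```

On the TypeScript side, `resolveCompanies` could then load this JSON (e.g. via `import slugs from '../shared/ashby_slugs.json'` with `resolveJsonModule` enabled) instead of keeping its own copy.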
```ts
source === 'theirstack'
  ? {
      companies,
      keywords,
```
For source=theirstack, when no keywords query param is provided you pass keywords=[] and the fetcher falls back to config.ashbyKeywords, but the response still echoes keywords: []. This makes the API response misleading vs. the actual behavior. Consider omitting keywords from filters when not provided, or returning something like keywords: 'default' / the resolved keyword list.
Suggested change:

```diff
-            keywords,
+            keywords: request.query.keywords ? keywords : 'default',
```
```ts
const response = await fetch(url.toString(), {
  method: 'GET',
  headers: {
    Accept: 'application/json',
    'User-Agent': 'JobSync-Service/1.0',
```
The fetch(...) call isn’t wrapped in a try/catch. With the expanded slug pool, a single transient network failure (DNS, timeout, connection reset) will throw and abort the entire Step 1 run instead of logging and moving on to the next company. Consider catching request errors in fetchCompanyJobs and returning [] on failure.
@copilot open a new pull request to apply changes based on this feedback
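A sketch of the suggested per-company guard; `safeFetchCompanyJobs` is a hypothetical wrapper (not the PR's actual `fetchCompanyJobs`) that returns an empty list on any network or parse failure:

```typescript
// Sketch: catch transient network errors per company so a single DNS failure,
// timeout, or connection reset does not abort the whole Step 1 run.
async function safeFetchCompanyJobs(url: string, slug: string): Promise<unknown[]> {
  try {
    const response = await fetch(url, {
      headers: { Accept: 'application/json', 'User-Agent': 'JobSync-Service/1.0' },
    });
    if (!response.ok) return []; // non-2xx: skip this company
    const result = (await response.json()) as { jobs?: unknown[] };
    return result.jobs ?? [];
  } catch (err) {
    // Log and continue with the next slug rather than rethrowing.
    console.warn(`Fetch failed for ${slug}, continuing`, err);
    return [];
  }
}
```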
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

@Mayank-glitch-cpu I've opened a new pull request, #3, to work on those changes. Once the pull request is ready, I'll request review from you.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

@Mayank-glitch-cpu I've opened a new pull request, #4, to work on those changes. Once the pull request is ready, I'll request review from you.

@Mayank-glitch-cpu I've opened a new pull request, #5, to work on those changes. Once the pull request is ready, I'll request review from you.

@Mayank-glitch-cpu I've opened a new pull request, #6, to work on those changes. Once the pull request is ready, I'll request review from you.

@Mayank-glitch-cpu I've opened a new pull request, #7, to work on those changes. Once the pull request is ready, I'll request review from you.

@Mayank-glitch-cpu I've opened a new pull request, #8, to work on those changes. Once the pull request is ready, I'll request review from you.
…ilures Co-authored-by: Mayank-glitch-cpu <85422185+Mayank-glitch-cpu@users.noreply.github.com>
…work Wrap fetch() in try/catch to handle network failures gracefully
Motivation
- Ingest jobs directly from `jobs.ashbyhq.com`.
- Allow per-request filtering (`companies`, `keywords`, `postedToday`, `publishedWithinHours`) instead of only using a static built-in set.

Description

- Expanded the `ASHBY_COMPANY_SLUGS` array and updated the fetcher to use the Ashby public posting API (`backend/src/services/theirstack-fetcher.ts`), adding parsing, normalization, keyword/date filters, request delays, and dynamic name generation via `slugToCompanyName` and `resolveCompanies`.
- Added `FetchAshbyOptions` support and refactored `fetchTheirStack` to accept `companySlugs`, `keywords`, `postedTodayOnly`, `publishedWithinHours`, `limit`, and `range`, and to return deduplicated, filtered `RawJob` objects.
- Updated the Step 1 route to parse the Ashby filter query params (`companies`, `keywords`, `postedToday`, `publishedWithinHours`, `limit`) and forward them into `fetchTheirStack` (`backend/src/routes/pipeline.ts`).
- Extended `backend/src/config.ts` to support `ASHBY_KEYWORDS`, `ASHBY_PUBLISHED_WITHIN_HOURS`, `ASHBY_INCLUDE_COMPENSATION`, and `ASHBY_REQUEST_DELAY_MS`.
- Added `scripts/ashby_slugs_verified.py`, a small CLI that writes curated slug lists to disk and can optionally verify each slug by checking `https://jobs.ashbyhq.com/<slug>/jobs.json`.
- Updated `README.md` with usage notes for the Ashby smart-ingestion request parameters and instructions for the slug verification utility.

Testing

- Ran `pnpm --filter jobslist-backend build`; the build completed successfully.
- Ran `python scripts/ashby_slugs_verified.py --verify`; the script ran and generated outputs, but network egress to `jobs.ashbyhq.com` is blocked in this environment, so verification returned `0/83` verified (no reachable endpoints).