feat: authenticated portal scanning w/ LinkedIn support #379
DSnoNintendo wants to merge 9 commits into santifer:main from
Conversation
📝 Walkthrough
This PR adds a new authenticated scanning mode "scan-auth": a CLI entrypoint (
Changes
Sequence Diagram(s)
sequenceDiagram
actor User
participant CLI as scan-auth.mjs
participant Browser as Playwright<br/>Browser Context
participant Scanner as LinkedInScanner
participant FS as File System
User->>CLI: node scan-auth.mjs linkedin
CLI->>Browser: Launch persistent profile context
CLI->>Browser: Inject init script / hide navigator.webdriver
CLI->>Browser: Check session (scanner.checkSession)
Browser-->>CLI: session OK / not logged in
alt --login flow
User->>CLI: node scan-auth.mjs --login linkedin
CLI->>Browser: Navigate to loginUrl (headful)
CLI->>User: Prompt: press ENTER when logged in
User->>Browser: Complete interactive login
User->>CLI: Press ENTER
CLI->>Browser: Re-check session
end
CLI->>FS: Load data/scan-history.tsv → dedup keys
CLI->>Scanner: scan(context, config, { scanHistory })
loop per keyword/search
Scanner->>Browser: Navigate to search results page
Scanner->>Browser: Extract job cards
loop per visible card
Scanner->>Browser: Click card → capture job ID / canonical URL
Scanner->>Browser: Extract JD, apply URL, metadata
alt passes filters & new
Scanner->>CLI: add to listings
else filtered/duplicate
Scanner->>CLI: add to skipped
end
end
end
CLI->>FS: Write jds/{slug}.md (YAML frontmatter)
CLI->>FS: Append entries to data/pipeline.md
CLI->>FS: Append accepted & skipped rows to data/scan-history.tsv
CLI->>User: Print summary (searched, found, saved, skipped, errors)
CLI->>Browser: Close context
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 11
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
README.md (1)
298-304: ⚠️ Potential issue | 🟡 Minor
Update the mode count after adding scan-auth.
The tree now includes the new scan-auth mode, so `# 14 skill modes` is stale.
Proposed docs fix:
-├── modes/ # 14 skill modes
+├── modes/ # 15 skill modes
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@README.md` around lines 298 - 304, Update the header comment that lists the number of mode files: change the line containing "# 14 skill modes" to reflect the new total after adding scan-auth.md (increment the count to include scan-auth.md); ensure the comment above the modes/ tree matches the actual files (e.g., "# 15 skill modes") so the README's modes/ section and the added scan-auth.md stay in sync.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.claude/skills/career-ops/SKILL.md:
- Line 35: Update the docs to match the actual CLI behavior: in SKILL.md remove
references to unsupported flags `--dry-run`, `--search`, and `--max` from the
"scan-auth mode" description, leaving only the real supported `--login` flag, or
alternatively implement these flags in `scan-auth.mjs` (functions/arg parsing
tied to `scan-auth.mjs` and portal-specific `scan-auth/<portal>.mjs`) before
documenting them; ensure the README text and any router docs consistently
reflect whichever choice you make so agents won't rely on non-functional flags.
In `@CLAUDE.md`:
- Line 67: The docs currently list scan-auth.mjs in the main files but omit it
from the agent-facing skill-mode/command routing sections; update CLAUDE.md by
adding `scan-auth` (and a brief descriptor like "Authenticated portal scanner —
scan-auth.mjs") alongside every occurrence where `scan` is mentioned in the
skill-mode and command routing examples so agents can discover the authenticated
scanner; ensure you reference the main file name `scan-auth.mjs` and the related
folder `scan-auth/` in those routing lists and examples (same places that
mention `scan`) for consistency.
In `@modes/scan-auth.md`:
- Around line 9-12: Add blank lines before and after the fenced code block that
shows how to run the login command and set the fence language for the summary
output block; specifically, update the node login block around "node
scan-auth.mjs --login <portal>" to have an empty line before the opening ```bash
and after the closing ``` and change the summary output fence (the block that
starts with "{Portal} Scan — {YYYY-MM-DD}" in the same file) to use a language
tag such as text (i.e., ```text) instead of a bare fence; apply the same
blank-line-and-language fixes to the other similar fenced block around lines
39-53 to satisfy markdownlint.
In `@scan-auth.mjs`:
- Around line 84-87: The CLI currently exits when portalId is missing; change
this to the documented behavior by replacing the process.exit(1) path: when
portalId is falsy, iterate over supportedNames and invoke the existing scan
routine for each portal (the function in this file that handles a single portal
scan—refer to the scanner/handler used elsewhere in scan-auth.mjs) instead of
printing usage and exiting; retain the usage/exit behavior only for
invalid/unrecognized portal values and preserve the usage message when a
provided portal is not in supportedNames.
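The scan-all default described above can be sketched as a small dispatch helper. This is a sketch, not the PR's code: `resolvePortals`, `runAll`, and `runPortal` are illustrative names standing in for scan-auth.mjs's argument parsing and per-portal scan routine.

```javascript
// Resolve which portals to scan: no argument means all supported portals;
// an unrecognized portal name still fails loudly with the usage hint.
function resolvePortals(portalId, supportedNames) {
  if (!portalId) return supportedNames; // documented default: scan every portal
  if (!supportedNames.includes(portalId)) {
    throw new Error(`Unknown portal "${portalId}". Supported: ${supportedNames.join(', ')}`);
  }
  return [portalId];
}

// Run the single-portal scan routine once per resolved portal.
async function runAll(portalId, supportedNames, runPortal) {
  for (const name of resolvePortals(portalId, supportedNames)) {
    await runPortal(name);
  }
}
```

This keeps the usage/exit path only for genuinely invalid portal values, as the comment asks.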
- Around line 319-325: The summary message when new listings are detected prints
an incorrect next-step command; in the block that checks results.listings (the
code using results.listings, portalId and the console.log that prints
`/career-ops ${portalId}`), change the guidance to instruct users to run
`/career-ops pipeline` instead of `/career-ops ${portalId}` so the next step
points to the routed pipeline mode.
- Around line 195-198: The appendToPipeline function currently returns silently
if PIPELINE_PATH doesn't exist; update it to handle a missing data/ directory by
either creating the parent directory and the pipeline.md file before writing or
by returning a write count / throwing a warning/error so the caller can detect
failure; specifically modify appendToPipeline (and its use sites) to check
existsSync(PIPELINE_PATH), and if missing create the directory
(fs.mkdirSync(path.dirname(PIPELINE_PATH), { recursive: true })) and an empty
PIPELINE_PATH file, or alternatively return 0 and log a warning via the same
logger so the caller reading the write count can report failure instead of
reporting success. Ensure the change touches appendToPipeline, PIPELINE_PATH
checks, and the code path that reports success so consumers know when pipeline
insertion was impossible.
In `@scan-auth/linkedin.mjs`:
- Around line 624-626: The code currently synthesizes a LinkedIn view URL even
when jobId is missing; update the logic around applicationUrl (the variable set
from applyEl?.href?.trim() ?? `https://www.linkedin.com/jobs/view/${jobId}/`) to
only construct the `https://www.linkedin.com/jobs/view/${jobId}/` fallback when
`jobId` (or currentJobId) is truthy/non-empty; otherwise set applicationUrl to
an empty string (or null) so no fake URL is saved. Ensure you still trim
applyEl.href if present and prefer it over the view URL, but guard the template
fallback with an explicit check for jobId before building it.
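A minimal sketch of that guard — names here are illustrative; in the PR the inputs come from `applyEl?.href` and the `currentJobId` URL param:

```javascript
// Prefer the external apply link; only synthesize a /jobs/view/ URL
// when a job ID actually exists, otherwise return '' (no fake URL saved).
function resolveApplicationUrl(applyHref, jobId) {
  const external = (applyHref || '').trim();
  if (external) return external;
  if (jobId) return `https://www.linkedin.com/jobs/view/${jobId}/`;
  return '';
}
```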
- Around line 375-386: The hostname check in `#unwrapRedirect` is too permissive
(uses hostname.includes('linkedin.com')) allowing domains like notlinkedin.com;
update it to only accept exact LinkedIn hostnames (e.g., linkedin.com,
www.linkedin.com, touch.www.linkedin.com, www.linkedin.cn, or other canonical
LinkedIn hosts used by your app) before processing /safety/go and decoding the
nested url; ensure the check happens on u.hostname (or normalized hostname) and
only then call u.searchParams.get('url') and decodeURIComponent, keeping the
rest of the existing logic in `#unwrapRedirect` intact.
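The exact-hostname check might look like this sketch. The allowlist entries are assumptions taken from the comment, not an exhaustive list of LinkedIn hosts, and the decode step is simplified because `searchParams.get` already percent-decodes the nested URL:

```javascript
// Exact-hostname allowlist: notlinkedin.com no longer passes the check.
const LINKEDIN_HOSTS = new Set([
  'linkedin.com', 'www.linkedin.com', 'touch.www.linkedin.com', 'www.linkedin.cn',
]);

function unwrapRedirect(href) {
  const trimmed = (href || '').trim();
  if (!trimmed) return '';
  try {
    const u = new URL(trimmed);
    if (!LINKEDIN_HOSTS.has(u.hostname.toLowerCase())) return trimmed; // non-LinkedIn hosts pass through untouched
    if (u.pathname.startsWith('/safety/go')) {
      const inner = u.searchParams.get('url'); // already percent-decoded
      if (inner) return inner;
    }
    return trimmed;
  } catch {
    return trimmed; // not a parseable URL — leave it alone
  }
}
```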
- Around line 357-368: The current `#buildSearches` method appends human-readable
date and experience text into the keywords string (levelPrefix and dateSuffix)
which LinkedIn ignores; instead map config.date_posted to f_TPR values (e.g.,
'24' -> 'r86400', 'Week' -> 'r604800', 'Month' -> 'r2592000') and map
config.experience_level entries to LinkedIn numeric f_E codes (1..6) joined by
commas, then build URLSearchParams with keywords set to the plain kw and add
f_TPR and f_E parameters when present; update the code that constructs params
(replace the levelPrefix/dateSuffix logic and the params = new URLSearchParams({
keywords: query })) to include params.append('f_TPR', mappedValue) and
params.append('f_E', mappedCsv) so filters are sent as real LinkedIn filter
parameters.
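A hedged sketch of the suggested parameter mapping. The f_TPR/f_E values follow the review comment; the experience-level key names are assumptions and portals.yml may spell them differently:

```javascript
// Map human-readable config values to LinkedIn's real filter parameters
// instead of appending prose ("posted in the past week") to the keywords.
const F_TPR = { '24': 'r86400', Week: 'r604800', Month: 'r2592000' };
const F_E = {
  internship: '1', entry: '2', associate: '3',
  'mid-senior': '4', director: '5', executive: '6',
};

function buildSearchUrl(kw, { date_posted, experience_level = [] } = {}) {
  const params = new URLSearchParams({ keywords: kw }); // plain keywords only
  const tpr = F_TPR[date_posted];
  if (tpr) params.append('f_TPR', tpr);
  const codes = experience_level.map(l => F_E[l]).filter(Boolean).join(',');
  if (codes) params.append('f_E', codes);
  return `https://www.linkedin.com/jobs/search-results/?${params}`;
}
```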
- Around line 208-234: Reorder and protect per-card processing: after obtaining
the handle from this.#getCard(page, i) call, call this.isJobCardViewed(page,
card) before invoking this.extractJob so you don't click/open viewed listings;
if skipViewed is true and the card is viewed, use a new no-click helper (e.g.,
extractJobMetadataNoClick or similar) that reads
title/company/location/listingUrl from the card DOM without clicking, push the
skipped entry and stats, and skip extraction; wrap the whole per-card logic (the
sequence around this.#getCard, this.isJobCardViewed, this.extractJob and pushing
results) in a try/finally that always calls await card.dispose() in the finally
block so the handle is never leaked, and only call extractJob (which clicks)
when you know you will process the detail panel.
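The dispose-in-finally guarantee reduces to a small wrapper, sketched here with a generic handle — any object exposing `dispose()`, a Playwright `JSHandle` in the real code:

```javascript
// Run per-card work and always release the handle, even when fn throws.
async function withCard(card, fn) {
  try {
    return await fn(card); // viewed-check / extractJob / result pushing go here
  } finally {
    await card.dispose();  // never leaked, regardless of the outcome
  }
}
```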
- Around line 464-475: In `#clickAndExtractJobId`, replace the fixed
sleep(CARD_CLICK_DELAY_MS) after clicking the card with a wait-for-change using
page.waitForFunction: capture the prior URL or prior currentJobId before the
click, perform the click via page.evaluate as currently done, then call
page.waitForFunction to poll until new URL.searchParams.get('currentJobId') !==
priorJobId (or until page.url() changes), with a sensible timeout; after
waitForFunction resolves, read the updated page.url() and extract currentJobId
and listingUrl as before to reliably get the updated job id instead of racing
with a fixed delay.
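Stripped of the Playwright specifics, the suggestion is a poll-until-changed pattern. This standalone sketch shows the shape, with `getValue` standing in for reading `page.url()`; the real fix would use `page.waitForFunction` with the same prior-value comparison and timeout:

```javascript
// Poll getValue() until it differs from the value captured before the click,
// resolving with the new value or rejecting after timeoutMs.
function waitForChange(getValue, prior, { timeoutMs = 5000, intervalMs = 50 } = {}) {
  const deadline = Date.now() + timeoutMs;
  return new Promise((resolve, reject) => {
    const tick = () => {
      const v = getValue();
      if (v !== prior) return resolve(v);
      if (Date.now() > deadline) return reject(new Error('timed out waiting for change'));
      setTimeout(tick, intervalMs);
    };
    tick();
  });
}
```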
---
Outside diff comments:
In `@README.md`:
- Around line 298-304: Update the header comment that lists the number of mode
files: change the line containing "# 14 skill modes" to reflect the new total
after adding scan-auth.md (increment the count to include scan-auth.md); ensure
the comment above the modes/ tree matches the actual files (e.g., "# 15 skill
modes") so the README's modes/ section and the added scan-auth.md stay in sync.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 1f4f2805-1431-455a-a8d0-df31ca12de79
📒 Files selected for processing (9)
.claude/skills/career-ops/SKILL.md
CLAUDE.md
DATA_CONTRACT.md
README.md
modes/scan-auth.md
package.json
scan-auth.mjs
scan-auth/linkedin.mjs
templates/portals.example.yml
function appendToPipeline(listings) {
  if (listings.length === 0) return;
  if (!existsSync(PIPELINE_PATH)) return;
Do not silently drop pipeline entries when data/pipeline.md is missing.
Right now JDs and scan history can be written while pipeline.md is skipped, but Line 415 still reports success. Create the data/ file, or return a write count and warn/fail when pipeline insertion is impossible. As per coding guidelines, **/*.mjs scripts should handle missing data/ directories gracefully.
One possible fix direction
function appendToPipeline(listings) {
- if (listings.length === 0) return;
- if (!existsSync(PIPELINE_PATH)) return;
+ if (listings.length === 0) return 0;
+ mkdirSync(dirname(PIPELINE_PATH), { recursive: true });
+ if (!existsSync(PIPELINE_PATH)) {
+ writeFileSync(PIPELINE_PATH, '# Pipeline\n\n## Pending\n', 'utf-8');
+ }
let text = readFileSync(PIPELINE_PATH, 'utf-8');
@@
- if (firstH2 === -1) return;
+ if (firstH2 === -1) {
+ warn(`Could not find a pending section in ${PIPELINE_PATH}; pipeline not updated`);
+ return 0;
+ }
@@
writeFileSync(PIPELINE_PATH, text, 'utf-8');
+ return listings.length;
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
function appendToPipeline(listings) {
  if (listings.length === 0) return 0;
  mkdirSync(dirname(PIPELINE_PATH), { recursive: true });
  if (!existsSync(PIPELINE_PATH)) {
    writeFileSync(PIPELINE_PATH, '# Pipeline\n\n## Pending\n', 'utf-8');
  }
  let text = readFileSync(PIPELINE_PATH, 'utf-8');
  if (firstH2 === -1) {
    warn(`Could not find a pending section in ${PIPELINE_PATH}; pipeline not updated`);
    return 0;
  }
  writeFileSync(PIPELINE_PATH, text, 'utf-8');
  return listings.length;
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scan-auth.mjs` around lines 195 - 198, The appendToPipeline function
currently returns silently if PIPELINE_PATH doesn't exist; update it to handle a
missing data/ directory by either creating the parent directory and the
pipeline.md file before writing or by returning a write count / throwing a
warning/error so the caller can detect failure; specifically modify
appendToPipeline (and its use sites) to check existsSync(PIPELINE_PATH), and if
missing create the directory (fs.mkdirSync(path.dirname(PIPELINE_PATH), {
recursive: true })) and an empty PIPELINE_PATH file, or alternatively return 0
and log a warning via the same logger so the caller reading the write count can
report failure instead of reporting success. Ensure the change touches
appendToPipeline, PIPELINE_PATH checks, and the code path that reports success
so consumers know when pipeline insertion was impossible.
@DSnoNintendo, understood — the silent no-op when pipeline.md is absent is intentional and consistent with scan.mjs's existing behavior. I'll note this for future reviews.
✏️ Learnings added
Learnt from: DSnoNintendo
Repo: santifer/career-ops PR: 379
File: scan-auth.mjs:195-198
Timestamp: 2026-04-19T17:51:06.950Z
Learning: In `scan-auth.mjs` (and consistent with `scan.mjs`), `appendToPipeline` intentionally returns silently (no error/warning) when `data/pipeline.md` does not exist. This is a deliberate design choice to mirror the existing `scan.mjs` behavior — do not flag it as a missing-data-directory handling issue.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: CR
Repo: santifer/career-ops PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-19T10:52:21.261Z
Learning: Applies to data/applications.md : Run `node dedup-tracker.mjs` to remove duplicate entries in the applications.md tracker based on company+role matching
Learnt from: CR
Repo: santifer/career-ops PR: 0
File: GEMINI.md:0-0
Timestamp: 2026-04-19T10:52:39.223Z
Learning: Run 'node verify-pipeline.mjs' to check data pipeline health and integrity
Learnt from: CR
Repo: santifer/career-ops PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-19T10:52:21.261Z
Learning: Run `node verify-pipeline.mjs` to health-check the tracker and reports pipeline
Learnt from: CR
Repo: santifer/career-ops PR: 0
File: GEMINI.md:0-0
Timestamp: 2026-04-19T10:52:39.223Z
Learning: Applies to **/*.{mjs,js} : Script files like update-system.mjs, verify-pipeline.mjs, generate-pdf.mjs, merge-tracker.mjs, and gemini-eval.mjs are auto-updatable system files — NEVER put user data in these files
Learnt from: CR
Repo: santifer/career-ops PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-19T10:52:21.261Z
Learning: Applies to {*.mjs,templates/**,config/**} : All scripts must be Node.js .mjs modules; use Playwright for PDF generation and web scraping, YAML for configuration, and HTML/CSS for CV templates
Learnt from: CR
Repo: santifer/career-ops PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-19T10:52:21.261Z
Learning: GitHub Actions CI/CD runs test-all.mjs (63+ checks) on every PR, applies auto-labels based on risk (🔴 core-architecture, ⚠️ agent-behavior, 📄 docs), and enforces status checks before merge to main
const card = await this.#getCard(page, i);
const cardExists = await page.evaluate(c => c != null, card);
if (!cardExists) {
  warn(` ✗ Card ${i} not found`);
  stats.errors++;
  consecutiveFailures++;
  await card.dispose();
  continue;
}

// 1. Extract title, company, click card, extract detail
const data = await this.extractJob(page, card);

// 2. Check viewed label
if (skipViewed && await this.isJobCardViewed(page, card)) {
  log(` ✗ Viewed: skipped card ${i}`);
  stats.skipped_viewed++;
  skipped.push({
    url: data.listingUrl || '',
    title: data.title || '',
    company: data.company || '',
    status: 'skipped_viewed_linkedin',
  });
  await card.dispose();
  continue;
}
await card.dispose();
❓ Verification inconclusive
Script executed:
wc -l scan-auth/linkedin.mjs
Repository: santifer/career-ops
Exit code: 0
stdout:
636 scan-auth/linkedin.mjs
Script executed:
cat -n scan-auth/linkedin.mjs | head -250
Repository: santifer/career-ops
Exit code: 0
stdout:
1 /**
2 * LinkedIn Scanner
3 *
4 * All LinkedIn-specific logic: selectors, config parsing, session checks,
5 * pagination, card extraction, search URL construction, and the scan loop.
6 *
7 * Per-card flow (scan loop):
8 * 1. Resolve card element once via `#getCard` (evaluateHandle)
9 * 2. extractJob() reads title/company/location from the card DOM,
10 * then clicks the card via `#clickAndExtractJobId` to get the job ID
11 * from the URL's currentJobId param, then calls `#extractDetailFromPanel`
12 * to scrape the apply link and JD text from the opened detail panel
13 * 3. isJobCardViewed() checks for LinkedIn's "Viewed" label on the card
14 * 4. Blocklist, dedup (job ID + company::title), and title filter run
15 * against the job data — dedup catches both same-run duplicates
16 * and cross-portal matches (Greenhouse/Ashby/Lever via scan-history.tsv)
17 * 5. JD-level filter and apply URL validation run post-click
18 * 6. Accepted listings and skipped entries are returned to the harness
19 *
20 * Apply URL resolution:
21 * - External apply link (via xpathApplyUrl) → unwrapped from LinkedIn's
22 * /safety/go redirect
23 * - Easy Apply / no external link → falls back to the listing URL
24 * (https://www.linkedin.com/jobs/view/{JOB_ID}/)
25 *
26 * Listing URL:
27 * Always stored as https://www.linkedin.com/jobs/view/{JOB_ID}/, extracted
28 * from the currentJobId URL param after clicking the card.
29 */
30
31
32 import yaml from 'js-yaml';
33
34 // ---------------------------------------------------------------------------
35 // Selectors — grouped for easy maintenance when LinkedIn changes DOM
36 // ---------------------------------------------------------------------------
37
38 const SELECTORS = {
39   xpathListingCard: "//button[starts-with(@aria-label, 'Dismiss') and contains(@aria-label, 'job')]/ancestor::div[@role='button']",
40 /** Dismiss control on each left-rail card; aria-label is `Dismiss {job title} job` (EN UI). */
41 cardDismissButtonCss: 'button[aria-label^="Dismiss"][aria-label*=" job"]',
42 /** Company name often appears on this anchor when present. */
43 cardCompanyLinkQuery: 'a[href*="/company/"]',
44   xpathApplyUrl: "//a[@aria-label='Apply on company website']",
45 xpathMoreButton: "//span[normalize-space(text())='more']",
46 jdContent: 'span[data-testid="expandable-text-box"]',
47 loggedIn: 'a[aria-label*="My Network"]',
48   xpathCurrentPage: "//button[@aria-current='true'][starts-with(@aria-label, 'Page')]",
49   xpathPageButton: "//button[starts-with(@aria-label, 'Page')]",
50
51 viewedStatusTagQuery: 'p, span, li',
52 viewedStatusLabels: ['Viewed'],
53 /** Characters allowed between the label and the next status token in `innerText` (middle dot, bullet, pipe) */
54 viewedStatusLineSeparatorCharClass: '·•|',
55 };
56
57 const NOISE_LABELS = new Set([
58 'more', 'show more', 'see more',
59 'less', 'show less', 'see less',
60 'retry premium',
61 ]);
62 const CARD_CLICK_DELAY_MS = 1000;
63 const SESSION_CHECK_DELAY_MS = 3000;
64 const NAV_TIMEOUT_MS = 30000;
65 const DEFAULT_DELAY_PAGES_MS = [3000, 8000];
66 const DEFAULT_DELAY_SEARCHES_MS = [5000, 15000];
67
68 function randomDelay(range) {
69 const [min, max] = range;
70 return Math.floor(Math.random() * (max - min) + min);
71 }
72 function sleep(ms) { return new Promise(r => setTimeout(r, ms)); }
73
74 function log(msg) { console.log(`[linkedin] ${msg}`); }
75 function warn(msg) { console.warn(`[linkedin] ⚠ ${msg}`); }
76
77 export default class LinkedInScanner {
78 name = 'LinkedIn';
79 portalId = 'linkedin';
80 loginUrl = 'https://www.linkedin.com/login';
81 feedUrl = 'https://www.linkedin.com/feed/';
82
83 // -------------------------------------------------------------------------
84 // Config parsing — extracts linkedin_searches section from portals.yml
85 // -------------------------------------------------------------------------
86
87 parseConfig(raw) {
88 const doc = yaml.load(raw);
89 const ls = doc.linkedin_searches || {};
90 return {
91 title_filter: doc.title_filter || { positive: [], negative: [] },
92 keywords: ls.keywords || [],
93 employer_blocklist: ls.employer_blocklist || [],
94 date_posted: ls.date_posted,
95 max_results: ls.max_results_per_search,
96 delay_pages: ls.delay_between_pages_ms,
97 delay_searches: ls.delay_between_searches_ms,
98 experience_level: ls.experience_level,
99 skip_viewed: ls.skip_viewed,
100 };
101 }
102
103 // -------------------------------------------------------------------------
104 // Session management
105 // -------------------------------------------------------------------------
106
107 async isLoggedIn(page) {
108 const url = page.url();
109 if (url.includes('/login') || url.includes('/uas/') || url.includes('/checkpoint/')) {
110 return false;
111 }
112 if (await page.$(SELECTORS.loggedIn)) return true;
113 return false;
114 }
115
116 async checkSession(page) {
117 await page.goto(this.feedUrl, { waitUntil: 'domcontentloaded', timeout: NAV_TIMEOUT_MS });
118 await sleep(SESSION_CHECK_DELAY_MS);
119 return this.isLoggedIn(page);
120 }
121
122 // -------------------------------------------------------------------------
123 // Scan — the main extraction loop
124 //
125 // Handles extraction, filtering, dedup, and employer blocklist.
126 // Returns only accepted listings ready to be saved.
127 //
128 // Options: { scanHistory, skipViewed }
129 // -------------------------------------------------------------------------
130
131 async scan(context, config, options = {}) {
132 const maxPerSearch = options.maxResults || config.max_results || 25;
133 const delayPages = config.delay_pages || DEFAULT_DELAY_PAGES_MS;
134 const delaySearches = config.delay_searches || DEFAULT_DELAY_SEARCHES_MS;
135 const titleFilter = config.title_filter;
136 const employerBlocklist = config.employer_blocklist || [];
137 const scanHistory = options.scanHistory || new Set();
138 /** Omit cards LinkedIn marks as already opened. Default true when `skip_viewed` is absent in portals.yml. */
139 const skipViewed = options.skipViewed !== undefined
140 ? Boolean(options.skipViewed)
141 : config.skip_viewed !== false;
142
143 const keywords = config.keywords || [];
144 if (keywords.length === 0) {
145 log('No keywords found in portals.yml');
146 return null;
147 }
148
149 const searches = this.#buildSearches(config);
150
151 const toRun = options.searchFilter
152 ? searches.filter(s => s.name === options.searchFilter)
153 : searches;
154
155 if (toRun.length === 0) {
156 log(`No keyword matching "${options.searchFilter}"`);
157 log(`Available: ${searches.map(s => s.name).join(', ')}`);
158 return null;
159 }
160
161 const listings = [];
162 const skipped = [];
163 const errors = [];
164 const stats = {
165 searched: 0, found: 0, extracted: 0,
166 skipped_filter: 0, skipped_dedup: 0, skipped_viewed: 0, errors: 0,
167 };
168
169 // Circuit breaker: bail out after too many consecutive extraction failures
170 const MAX_CONSECUTIVE_FAILURES = 15;
171 let consecutiveFailures = 0;
172
173 // Single page for all searches — avoids Playwright stealing window focus
174 const page = await context.newPage();
175
176 for (const search of toRun) {
177 log(`\n── Search: ${search.name} ──`);
178 stats.searched++;
179 consecutiveFailures = 0; // reset circuit breaker per search
180
181 try {
182 await page.goto(search.url, { waitUntil: 'domcontentloaded', timeout: NAV_TIMEOUT_MS });
183 await sleep(randomDelay(delayPages));
184
185 let accepted = 0;
186 let hasNextPage = true;
187
188 while (hasNextPage && accepted < maxPerSearch) {
189 const currentPage = await this.#getCurrentPage(page);
190 log(`Page ${currentPage || 1}`);
191
192 const cardCount = await this.#getCardCount(page);
193 log(`Found ${cardCount} job cards`);
194 stats.found += cardCount;
195
196 for (let i = 0; i < cardCount; i++) {
197 if (accepted >= maxPerSearch) {
198 log(`Reached max results (${maxPerSearch}) for this search`);
199 break;
200 }
201
202 // Circuit breaker: stop if extraction is consistently failing
203 if (consecutiveFailures >= MAX_CONSECUTIVE_FAILURES) {
204 warn(`${MAX_CONSECUTIVE_FAILURES} consecutive extraction failures — stopping this search (likely throttled or DOM changed)`);
205 break;
206 }
207
208 const card = await this.#getCard(page, i);
209 const cardExists = await page.evaluate(c => c != null, card);
210 if (!cardExists) {
211 warn(` ✗ Card ${i} not found`);
212 stats.errors++;
213 consecutiveFailures++;
214 await card.dispose();
215 continue;
216 }
217
218 // 1. Extract title, company, click card, extract detail
219 const data = await this.extractJob(page, card);
220
221 // 2. Check viewed label
222 if (skipViewed && await this.isJobCardViewed(page, card)) {
223 log(` ✗ Viewed: skipped card ${i}`);
224 stats.skipped_viewed++;
225 skipped.push({
226 url: data.listingUrl || '',
227 title: data.title || '',
228 company: data.company || '',
229 status: 'skipped_viewed_linkedin',
230 });
231 await card.dispose();
232 continue;
233 }
234 await card.dispose();
235
236
237 if (!data.title) {
238 warn(` ✗ No title on card ${i} (data); skipping`);
239 stats.errors++;
240 consecutiveFailures++;
241 continue;
242 }
243 // 3. Blocklist, dedup, and title filter
244 if (employerBlocklist.length && data.company) {
245 const companyLower = data.company.toLowerCase();
246 if (employerBlocklist.some(b => companyLower === b.toLowerCase())) {
247 log(` ✗ Blocked employer: ${data.company}`);
248 stats.skipped_filter++;
249 continue;
250 }
Script executed:
cat -n scan-auth/linkedin.mjs | sed -n '190,280p'
Repository: santifer/career-ops
Exit code: 0
stdout:
190 log(`Page ${currentPage || 1}`);
191
192 const cardCount = await this.#getCardCount(page);
193 log(`Found ${cardCount} job cards`);
194 stats.found += cardCount;
195
196 for (let i = 0; i < cardCount; i++) {
197 if (accepted >= maxPerSearch) {
198 log(`Reached max results (${maxPerSearch}) for this search`);
199 break;
200 }
201
202 // Circuit breaker: stop if extraction is consistently failing
203 if (consecutiveFailures >= MAX_CONSECUTIVE_FAILURES) {
204 warn(`${MAX_CONSECUTIVE_FAILURES} consecutive extraction failures — stopping this search (likely throttled or DOM changed)`);
205 break;
206 }
207
208 const card = await this.#getCard(page, i);
209 const cardExists = await page.evaluate(c => c != null, card);
210 if (!cardExists) {
211 warn(` ✗ Card ${i} not found`);
212 stats.errors++;
213 consecutiveFailures++;
214 await card.dispose();
215 continue;
216 }
217
218 // 1. Extract title, company, click card, extract detail
219 const data = await this.extractJob(page, card);
220
221 // 2. Check viewed label
222 if (skipViewed && await this.isJobCardViewed(page, card)) {
223 log(` ✗ Viewed: skipped card ${i}`);
224 stats.skipped_viewed++;
225 skipped.push({
226 url: data.listingUrl || '',
227 title: data.title || '',
228 company: data.company || '',
229 status: 'skipped_viewed_linkedin',
230 });
231 await card.dispose();
232 continue;
233 }
234 await card.dispose();
235
236
237 if (!data.title) {
238 warn(` ✗ No title on card ${i} (data); skipping`);
239 stats.errors++;
240 consecutiveFailures++;
241 continue;
242 }
243 // 3. Blocklist, dedup, and title filter
244 if (employerBlocklist.length && data.company) {
245 const companyLower = data.company.toLowerCase();
246 if (employerBlocklist.some(b => companyLower === b.toLowerCase())) {
247 log(` ✗ Blocked employer: ${data.company}`);
248 stats.skipped_filter++;
249 continue;
250 }
251 }
252
253 const companyTitleKey = (data.company && data.title)
254 ? `${data.company}::${data.title}`.toLowerCase() : null;
255 const dedupKey = data.jobId || companyTitleKey;
256 if ((dedupKey && scanHistory.has(dedupKey)) || (companyTitleKey && scanHistory.has(companyTitleKey))) {
257 log(` ✗ Already seen: ${data.title} (${data.company})`);
258 stats.skipped_dedup++;
259 skipped.push({
260 url: data.listingUrl || '',
261 title: data.title || '',
262 company: data.company || '',
263 status: 'skipped_dup',
264 });
265 continue;
266 }
267
268 if (!this.#matchesFilter(data.title, '', titleFilter)) {
269 log(` ✗ Filtered: ${data.title} (${data.company})`);
270 stats.skipped_filter++;
271 continue;
272 }
273
274 stats.extracted++;
275
276 if (!data.applicationUrl) {
277 log(` ✗ No apply URL: ${data.title} (${data.company})`);
278 stats.skipped_filter++;
279 continue;
280 }
Script executed:
cat -n scan-auth/linkedin.mjs | sed -n '290,380p'
Repository: santifer/career-ops
Exit code: 0
stdout:
290 stats.errors++;
291 consecutiveFailures++;
292 continue;
293 }
294
295 consecutiveFailures = 0;
296 if (dedupKey) scanHistory.add(dedupKey);
297 if (companyTitleKey) scanHistory.add(companyTitleKey);
298 listings.push({
299 title: data.title,
300 company: data.company,
301 applicationUrl: data.applicationUrl,
302 jdText: data.jdText,
303 url: data.listingUrl,
304 });
305 accepted++;
306 log(` ✓ Accepted: ${data.title} at ${data.company}`);
307 }
308
309 if (consecutiveFailures >= MAX_CONSECUTIVE_FAILURES) {
310 hasNextPage = false;
311 } else if (accepted < maxPerSearch) {
312 hasNextPage = await this.#goToNextPage(page);
313 if (hasNextPage) {
314 log(`Navigating to next page...`);
315 await sleep(randomDelay(delayPages));
316 }
317 } else {
318 hasNextPage = false;
319 }
320 }
321 } catch (e) {
322 log(`Search "${search.name}" failed: ${e.message}`);
323 errors.push({ search: search.name, error: e.message });
324 stats.errors++;
325 }
326
327 if (toRun.indexOf(search) < toRun.length - 1) {
328 const d = randomDelay(delaySearches);
329 log(`Waiting ${(d / 1000).toFixed(1)}s before next search...`);
330 await sleep(d);
331 }
332 }
333
334 await page.close();
335 return { listings, skipped, errors, stats };
336 }
337
338 // -------------------------------------------------------------------------
339 // Private — filtering
340 // -------------------------------------------------------------------------
341
342   #matchesFilter(title, jdText, filter) {
343 if (!filter) return true;
344 const combined = `${title} ${jdText}`.toLowerCase();
345 const titleLower = title.toLowerCase();
346 const hasPositive = !filter.positive?.length ||
347 filter.positive.some(kw => combined.includes(kw.toLowerCase()));
348 const hasNegative = filter.negative?.length &&
349 filter.negative.some(kw => titleLower.includes(kw.toLowerCase()));
350 return hasPositive && !hasNegative;
351 }
352
353 // -------------------------------------------------------------------------
354 // Private — search URL construction
355 // -------------------------------------------------------------------------
356
357   #buildSearches(config) {
358 const datePostedMap = { '24': 'past 24 hours', 'Week': 'past week', 'Month': 'past month' };
359 const dateSuffix = datePostedMap[config.date_posted] || '';
360 const levels = config.experience_level || [];
361 const levelPrefix = levels.length ? levels.join(' or ') : '';
362
363 return config.keywords.map(kw => {
364 let query = levelPrefix ? `${levelPrefix} ${kw}` : kw;
365 if (dateSuffix) query += ` posted in the ${dateSuffix}`;
366 const params = new URLSearchParams({ keywords: query });
367 return { name: kw, url: `https://www.linkedin.com/jobs/search-results/?${params}` };
368 });
369 }
370
371 // -------------------------------------------------------------------------
372 // Private — URL helpers
373 // -------------------------------------------------------------------------
374
375   #unwrapRedirect(href) {
376 const trimmed = (href || '').trim();
377 if (!trimmed) return '';
378 try {
379 const u = new URL(trimmed);
380 if (!u.hostname.includes('linkedin.com')) return trimmed;
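The #matchesFilter logic in the dump above can be exercised standalone; note the asymmetry it encodes: positive keywords match title plus JD combined, while negative keywords match the title only. This is a reconstruction of the dumped method for illustration, not the module itself.

```javascript
// Reconstructed from the cat -n dump above (lines 342-351 of linkedin.mjs).
// Positive keywords are checked against title + JD; negatives against title only.
function matchesFilter(title, jdText, filter) {
  if (!filter) return true;
  const combined = `${title} ${jdText}`.toLowerCase();
  const titleLower = title.toLowerCase();
  const hasPositive = !filter.positive?.length ||
    filter.positive.some(kw => combined.includes(kw.toLowerCase()));
  const hasNegative = filter.negative?.length &&
    filter.negative.some(kw => titleLower.includes(kw.toLowerCase()));
  return hasPositive && !hasNegative;
}

// A negative keyword appearing only in the JD does not exclude the listing:
console.log(matchesFilter('Backend Engineer', 'contract work', { negative: ['contract'] }));
```

So "contract" in the description alone lets the listing through; only a title hit triggers the negative filter.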
Script executed:
grep -n "extractJob" scan-auth/linkedin.mjs | head -5
Repository: santifer/career-ops
Exit code: 0
stdout:
9: * 2. extractJob() reads title/company/location from the card DOM,
219: const data = await this.extractJob(page, card);
488: async extractJob(page, card) {
Script executed:
cat -n scan-auth/linkedin.mjs | sed -n '488,550p'
Repository: santifer/career-ops
Exit code: 0
stdout:
488 async extractJob(page, card) {
489 const data = await page.evaluate(({ card, sel, noise }) => {
490 const noiseSet = new Set((noise || []).map((s) => String(s).toLowerCase()));
491
492 let title = '';
493 const dismiss = card.querySelector(sel.cardDismissButtonCss || 'button[aria-label^="Dismiss"]');
494 if (dismiss) {
495 const al = dismiss.getAttribute('aria-label') || '';
496 const m = al.match(/^Dismiss\s+(.+?)\s+job\s*$/i);
497 if (m) title = m[1].trim();
498 }
499
500 let company = '';
501 for (const a of card.querySelectorAll(sel.cardCompanyLinkQuery || 'a[href*="/company/"]')) {
502 const t = (a.textContent ?? '').trim();
503 if (t.length >= 1 && t.length < 120 && !noiseSet.has(t.toLowerCase())) {
504 company = t;
505 break;
506 }
507 }
508
509 function looksLikeLocation(s) {
510 return /\((On-?site|Hybrid|Remote)\)/i.test(s)
511 || /,\s*[A-Z]{2}\b/.test(s)
512 || /\bRemote\b/i.test(s);
513 }
514
515 function looksLikeMetaLine(s) {
516 const lower = s.toLowerCase();
517 return lower === 'viewed'
518 || lower === '·'
519 || /school alumni work(s)? here/i.test(s)
520 || /early applicant/i.test(lower)
521 || /^\d+ benefits?$/i.test(s)
522 || /^posted on\b/i.test(s)
523 || /\b(hour|day|week|month)s?\s+ago$/i.test(s)
524 || /^[\d·|•\s]+$/.test(s);
525 }
526
527 if (!company) {
528 const paragraphs = [...card.querySelectorAll('p')]
529 .map((p) => (p.textContent ?? '').trim())
530 .filter((t) => t.length > 0 && t.length < 200);
531 for (const t of paragraphs) {
532 if (noiseSet.has(t.toLowerCase())) continue;
533 if (title && (t === title || t.includes(title))) continue;
534 if (looksLikeMetaLine(t)) continue;
535 if (looksLikeLocation(t)) continue;
536 if (t.length < 2) continue;
537 company = t;
538 break;
539 }
540 }
541
542 let location = '';
543 for (const t of [...card.querySelectorAll('p')].map((p) => (p.textContent ?? '').trim())) {
544 if (looksLikeLocation(t) && t !== title && t !== company) {
545 location = t;
546 break;
547 }
548 }
549
550 return {
Script executed:
grep -n "isJobCardViewed" scan-auth/linkedin.mjs | head -5
Repository: santifer/career-ops
Exit code: 0
stdout:
13: * 3. isJobCardViewed() checks for LinkedIn's "Viewed" label on the card
222: if (skipViewed && await this.isJobCardViewed(page, card)) {
576: async isJobCardViewed(page, card) {
Script executed:
cat -n scan-auth/linkedin.mjs | sed -n '560,620p'
Repository: santifer/career-ops
Exit code: 0
stdout:
560 data.listingUrl = listingUrl;
561 data.clicked = clicked;
562
563 // Extract detail from the now-open panel
564 if (clicked) {
565 const detail = await this.#extractDetailFromPanel(page, jobId);
566 data.applicationUrl = this.#unwrapRedirect(detail.applicationUrl);
567 data.jdText = detail.jdText;
568 }
569
570 return data;
571 }
572
573 /**
574 * Whether LinkedIn marks the listing card as already opened (see SELECTORS.viewedStatusLabels).
575 */
576 async isJobCardViewed(page, card) {
577 return page.evaluate(({ card, sel }) => {
578 const labels = sel.viewedStatusLabels || [];
579 const labelSet = new Set(labels.map((s) => String(s).toLowerCase()));
580 const tagQuery = sel.viewedStatusTagQuery || 'p, span, li';
581 for (const el of card.querySelectorAll(tagQuery)) {
582 const t = (el.textContent ?? '').trim();
583 if (t && labelSet.has(t.toLowerCase())) return true;
584 }
585
586 const sepClass = sel.viewedStatusLineSeparatorCharClass || '·•|';
587 const line = (card.innerText ?? '').replace(/\s+/g, ' ').trim();
588 for (const raw of labels) {
589 const esc = String(raw).replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
590 const re = new RegExp(`\\b${esc}\\b\\s*[${sepClass}]`, 'i');
591 if (re.test(line)) return true;
592 }
593
594 return false;
595 }, { card, sel: SELECTORS });
596 }
597
598 /**
599 * Scrape the detail panel (right side) after a card has been clicked.
600 * Clicks "more" to expand truncated JDs, then extracts the apply URL
601 * and full JD text. Falls back to the listing URL if no external apply
602 * link is found (Easy Apply jobs).
603 */
604   async #extractDetailFromPanel(page, jobId) {
605 // Try to expand the description
606 await page.evaluate(({ xpath }) => {
607 const result = document.evaluate(xpath, document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null);
608 const moreSpan = result.singleNodeValue;
609 if (moreSpan) moreSpan.click();
610 }, { xpath: SELECTORS.xpathMoreButton });
611 await sleep(250);
612
613 const detail = await page.evaluate(({ sel, jobId }) => {
614 function xpathFirst(expression) {
615 const result = document.evaluate(
616 expression, document, null,
617 XPathResult.FIRST_ORDERED_NODE_TYPE, null
618 );
619 return result.singleNodeValue;
620 }
Script executed:
cat -n scan-auth/linkedin.mjs | sed -n '550,575p'
Repository: santifer/career-ops
Exit code: 0
stdout:
550 return {
551 title: title || '',
552 company: company || '',
553 location: location || '',
554 };
555 }, { card, sel: SELECTORS, noise: [...NOISE_LABELS] });
556
557 // Click card and extract job ID from the resulting URL
558 const { clicked, jobId, listingUrl } = await this.#clickAndExtractJobId(page, card);
559 data.jobId = jobId;
560 data.listingUrl = listingUrl;
561 data.clicked = clicked;
562
563 // Extract detail from the now-open panel
564 if (clicked) {
565 const detail = await this.#extractDetailFromPanel(page, jobId);
566 data.applicationUrl = this.#unwrapRedirect(detail.applicationUrl);
567 data.jdText = detail.jdText;
568 }
569
570 return data;
571 }
572
573 /**
574 * Whether LinkedIn marks the listing card as already opened (see SELECTORS.viewedStatusLabels).
575 */
Script executed:
grep -n "clickAndExtractJobId" scan-auth/linkedin.mjs | head -3
Repository: santifer/career-ops
Exit code: 0
stdout:
10: * then clicks the card via `#clickAndExtractJobId` to get the job ID
464: async `#clickAndExtractJobId`(page, card) {
558: const { clicked, jobId, listingUrl } = await this.#clickAndExtractJobId(page, card);
Script executed:
cat -n scan-auth/linkedin.mjs | sed -n '440,485p'
Repository: santifer/career-ops
Exit code: 0
stdout:
440
441 // -------------------------------------------------------------------------
442 // Private — extraction helpers
443 // -------------------------------------------------------------------------
444
445   async #getCardCount(page) {
446 return page.evaluate((xpath) => {
447 const result = document.evaluate(xpath, document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
448 return result.snapshotLength;
449 }, SELECTORS.xpathListingCard);
450 }
451
452   async #getCard(page, index) {
453 return page.evaluateHandle(({ xpath, idx }) => {
454 const result = document.evaluate(xpath, document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
455 return result.snapshotItem(idx) || null;
456 }, { xpath: SELECTORS.xpathListingCard, idx: index });
457 }
458
459 /**
460 * Click a card and extract the job ID from the resulting URL.
461 * LinkedIn updates the URL's currentJobId query param when a card is selected.
462 * Returns { clicked, jobId, listingUrl }.
463 */
464   async #clickAndExtractJobId(page, card) {
465 const clicked = await page.evaluate((c) => {
466 if (c) { c.click(); return true; }
467 return false;
468 }, card);
469 if (!clicked) return { clicked: false, jobId: '', listingUrl: '' };
470
471 await sleep(CARD_CLICK_DELAY_MS);
472 const url = page.url();
473 try {
474 const jobId = new URL(url).searchParams.get('currentJobId') || '';
475 const listingUrl = jobId ? `https://www.linkedin.com/jobs/view/${jobId}/` : '';
476 return { clicked: true, jobId, listingUrl };
477 } catch {
478 return { clicked: true, jobId: '', listingUrl: '' };
479 }
480 }
481
482 /**
483 * Extract all data for a single job card. Reads title/company/location from
484 * the card DOM (no click), then clicks the card to get the job ID and opens
485 * the detail panel to scrape the apply URL and JD text.
Script executed:
cat -n scan-auth/linkedin.mjs | sed -n '208,295p'
Repository: santifer/career-ops
Exit code: 0
stdout:
208 const card = await this.#getCard(page, i);
209 const cardExists = await page.evaluate(c => c != null, card);
210 if (!cardExists) {
211 warn(` ✗ Card ${i} not found`);
212 stats.errors++;
213 consecutiveFailures++;
214 await card.dispose();
215 continue;
216 }
217
218 // 1. Extract title, company, click card, extract detail
219 const data = await this.extractJob(page, card);
220
221 // 2. Check viewed label
222 if (skipViewed && await this.isJobCardViewed(page, card)) {
223 log(` ✗ Viewed: skipped card ${i}`);
224 stats.skipped_viewed++;
225 skipped.push({
226 url: data.listingUrl || '',
227 title: data.title || '',
228 company: data.company || '',
229 status: 'skipped_viewed_linkedin',
230 });
231 await card.dispose();
232 continue;
233 }
234 await card.dispose();
235
236
237 if (!data.title) {
238 warn(` ✗ No title on card ${i} (data); skipping`);
239 stats.errors++;
240 consecutiveFailures++;
241 continue;
242 }
243 // 3. Blocklist, dedup, and title filter
244 if (employerBlocklist.length && data.company) {
245 const companyLower = data.company.toLowerCase();
246 if (employerBlocklist.some(b => companyLower === b.toLowerCase())) {
247 log(` ✗ Blocked employer: ${data.company}`);
248 stats.skipped_filter++;
249 continue;
250 }
251 }
252
253 const companyTitleKey = (data.company && data.title)
254 ? `${data.company}::${data.title}`.toLowerCase() : null;
255 const dedupKey = data.jobId || companyTitleKey;
256 if ((dedupKey && scanHistory.has(dedupKey)) || (companyTitleKey && scanHistory.has(companyTitleKey))) {
257 log(` ✗ Already seen: ${data.title} (${data.company})`);
258 stats.skipped_dedup++;
259 skipped.push({
260 url: data.listingUrl || '',
261 title: data.title || '',
262 company: data.company || '',
263 status: 'skipped_dup',
264 });
265 continue;
266 }
267
268 if (!this.#matchesFilter(data.title, '', titleFilter)) {
269 log(` ✗ Filtered: ${data.title} (${data.company})`);
270 stats.skipped_filter++;
271 continue;
272 }
273
274 stats.extracted++;
275
276 if (!data.applicationUrl) {
277 log(` ✗ No apply URL: ${data.title} (${data.company})`);
278 stats.skipped_filter++;
279 continue;
280 }
281
282 if (!this.#matchesFilter(data.title, data.jdText || '', titleFilter)) {
283 log(` ✗ Filtered after JD: ${data.title} (${data.company})`);
284 stats.skipped_filter++;
285 continue;
286 }
287
288 if (!data.jdText) {
289 warn(` ✗ No JD content: ${data.title}`);
290 stats.errors++;
291 consecutiveFailures++;
292 continue;
293 }
294
295 consecutiveFailures = 0;
Check viewed status before extracting, and protect card disposal with try/finally.
Line 219 calls extractJob(), which clicks the card at line 558 and opens the detail panel. Only then, at line 222, does isJobCardViewed() check whether LinkedIn marked it as viewed—meaning viewed jobs are unnecessarily opened and scraped before being skipped. Additionally, if extractJob() throws, the card handle at line 208 is never disposed, leaking browser resources.
Move isJobCardViewed() before the click (since it only reads card DOM), create a no-click helper to extract title/company/location for skipped entries, and wrap per-card work in try/finally to ensure card.dispose() always runs.
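A minimal sketch of that ordering, with hypothetical stand-ins for the Playwright handle and scanner methods (FakeCard, isViewed, and extract are illustrative, not the real implementation):

```javascript
// Stand-in for a Playwright JSHandle; only dispose() matters for this sketch.
class FakeCard {
  constructor() { this.disposed = false; }
  async dispose() { this.disposed = true; }
}

// Hypothetical per-card flow: viewed check first, click second,
// disposal guaranteed by the finally block.
async function processCard(card, { skipViewed, isViewed, extract }) {
  try {
    // The viewed check only reads the card DOM, so it runs before any click.
    if (skipViewed && await isViewed(card)) {
      return { status: 'skipped_viewed' };
    }
    // Only now click the card and scrape the detail panel.
    return { status: 'accepted', data: await extract(card) };
  } finally {
    // Runs even when extract() throws, so the handle never leaks.
    await card.dispose();
  }
}
```

Even a rejected extract() leaves card.disposed === true, which is exactly the leak path the comment flags in the current code.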
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scan-auth/linkedin.mjs` around lines 208 - 234, Reorder and protect per-card
processing: after obtaining the handle from this.#getCard(page, i) call, call
this.isJobCardViewed(page, card) before invoking this.extractJob so you don't
click/open viewed listings; if skipViewed is true and the card is viewed, use a
new no-click helper (e.g., extractJobMetadataNoClick or similar) that reads
title/company/location/listingUrl from the card DOM without clicking, push the
skipped entry and stats, and skip extraction; wrap the whole per-card logic (the
sequence around this.#getCard, this.isJobCardViewed, this.extractJob and pushing
results) in a try/finally that always calls await card.dispose() in the finally
block so the handle is never leaked, and only call extractJob (which clicks)
when you know you will process the detail panel.
#unwrapRedirect(href) {
  const trimmed = (href || '').trim();
  if (!trimmed) return '';
  try {
    const u = new URL(trimmed);
    if (!u.hostname.includes('linkedin.com')) return trimmed;
    if (!u.pathname.includes('/safety/go')) return trimmed;
    const nested = u.searchParams.get('url');
    if (!nested) return trimmed;
    const decoded = decodeURIComponent(nested);
    new URL(decoded);
    return decoded;
Tighten the trusted LinkedIn redirect host check.
hostname.includes('linkedin.com') trusts hosts like notlinkedin.com or linkedin.com.evil.tld and unwraps their /safety/go?url=... targets. Match exact LinkedIn hosts before decoding nested URLs. As per coding guidelines, **/*.mjs: Check for command injection, path traversal, and SSRF.
🛡️ Proposed host validation fix
const u = new URL(trimmed);
- if (!u.hostname.includes('linkedin.com')) return trimmed;
- if (!u.pathname.includes('/safety/go')) return trimmed;
+ const hostname = u.hostname.toLowerCase();
+ const isLinkedInHost = hostname === 'linkedin.com' || hostname.endsWith('.linkedin.com');
+ if (!isLinkedInHost) return trimmed;
+ if (u.pathname !== '/safety/go') return trimmed;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scan-auth/linkedin.mjs` around lines 375 - 386, The hostname check in
`#unwrapRedirect` is too permissive (uses hostname.includes('linkedin.com'))
allowing domains like notlinkedin.com; update it to only accept exact LinkedIn
hostnames (e.g., linkedin.com, www.linkedin.com, touch.www.linkedin.com,
www.linkedin.cn, or other canonical LinkedIn hosts used by your app) before
processing /safety/go and decoding the nested url; ensure the check happens on
u.hostname (or normalized hostname) and only then call u.searchParams.get('url')
and decodeURIComponent, keeping the rest of the existing logic in
`#unwrapRedirect` intact.
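As a quick runnable illustration of why the substring check is weaker (the hostnames here are illustrative):

```javascript
// Substring vs. exact-host trust checks for the redirect unwrapper.
const substringCheck = (host) => host.includes('linkedin.com');
const exactCheck = (host) => {
  const h = host.toLowerCase();
  return h === 'linkedin.com' || h.endsWith('.linkedin.com');
};

for (const host of ['www.linkedin.com', 'notlinkedin.com', 'linkedin.com.evil.tld']) {
  // Only www.linkedin.com should be trusted; the substring check trusts all three.
  console.log(host, { substring: substringCheck(host), exact: exactCheck(host) });
}
```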
This logic is heuristic-based and any potential issues would require malicious code being hosted on the LinkedIn job portal.
@DSnoNintendo, that's a fair point. Since #unwrapRedirect only operates on URLs already present in LinkedIn's own DOM (sourced from the Apply on company website anchor), any abuse of the permissive .includes('linkedin.com') check would presuppose an attacker already controlling content served by LinkedIn's job portal. The decoded URL is also returned as a string rather than triggering an automated outbound fetch, which removes the server-side SSRF vector. I'll defer to your assessment here.
✏️ Learnings added
Learnt from: DSnoNintendo
Repo: santifer/career-ops PR: 379
File: scan-auth/linkedin.mjs:375-386
Timestamp: 2026-04-19T17:58:15.791Z
Learning: In `scan-auth/linkedin.mjs` (career-ops repo), the `#unwrapRedirect` method's `hostname.includes('linkedin.com')` check is intentionally heuristic. The URLs it processes are sourced exclusively from LinkedIn's own DOM (the `//a[aria-label='Apply on company website']` XPath selector), so exploiting the permissive check would require an attacker to already control content on LinkedIn's portal. The decoded result is stored as a string, not used for an automated server-side HTTP request, so there is no SSRF risk. Do not flag `#unwrapRedirect` for the permissive hostname check.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: DSnoNintendo
Repo: santifer/career-ops PR: 379
File: scan-auth/linkedin.mjs:464-475
Timestamp: 2026-04-19T17:56:07.061Z
Learning: In `scan-auth/linkedin.mjs` (career-ops repo), clicking a LinkedIn job card in the search results left rail triggers an **immediate/synchronous SPA navigation** that updates the `currentJobId` URL query parameter. A fixed `sleep(CARD_CLICK_DELAY_MS)` after the click is intentional — it is a rendering buffer for the detail panel, not a guard for the URL change. Do not flag `#clickAndExtractJobId` for using a fixed delay instead of `page.waitForFunction` for the URL/jobId change.
Learnt from: DSnoNintendo
Repo: santifer/career-ops PR: 379
File: scan-auth/linkedin.mjs:357-368
Timestamp: 2026-04-19T17:52:55.390Z
Learning: In `scan-auth/linkedin.mjs` (career-ops repo), LinkedIn job search filters (date posted, experience level) are applied via search query strings embedded in the keywords parameter, not via URL parameters like `f_TPR` or `f_E`. Do not flag the `#buildSearches` method for using query string text for filters — this is the current correct approach for LinkedIn job search URLs.
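Per the learning above, filters ride inside the keywords text rather than dedicated URL parameters. A standalone reconstruction of #buildSearches (the config values are illustrative) shows the resulting URL shape:

```javascript
// Reconstructed from the #buildSearches dump; config values are made up.
const config = { keywords: ['backend engineer'], experience_level: ['Senior'], date_posted: 'Week' };

const datePostedMap = { '24': 'past 24 hours', 'Week': 'past week', 'Month': 'past month' };
const dateSuffix = datePostedMap[config.date_posted] || '';
const levelPrefix = (config.experience_level || []).join(' or ');

const searches = config.keywords.map(kw => {
  let query = levelPrefix ? `${levelPrefix} ${kw}` : kw;
  if (dateSuffix) query += ` posted in the ${dateSuffix}`;
  const params = new URLSearchParams({ keywords: query });
  return { name: kw, url: `https://www.linkedin.com/jobs/search-results/?${params}` };
});

console.log(searches[0].url);
// → https://www.linkedin.com/jobs/search-results/?keywords=Senior+backend+engineer+posted+in+the+past+week
```

URLSearchParams handles the encoding, so the filter text ends up as plus-separated words inside the single keywords parameter.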
async #clickAndExtractJobId(page, card) {
  const clicked = await page.evaluate((c) => {
    if (c) { c.click(); return true; }
    return false;
  }, card);
  if (!clicked) return { clicked: false, jobId: '', listingUrl: '' };

  await sleep(CARD_CLICK_DELAY_MS);
  const url = page.url();
  try {
    const jobId = new URL(url).searchParams.get('currentJobId') || '';
    const listingUrl = jobId ? `https://www.linkedin.com/jobs/view/${jobId}/` : '';
🧩 Analysis chain
🏁 Script executed:
head -n 480 scan-auth/linkedin.mjs | tail -n 30
Repository: santifer/career-ops
Length of output: 1182
🏁 Script executed:
grep -n "CARD_CLICK_DELAY_MS\|NAV_TIMEOUT_MS" scan-auth/linkedin.mjs | head -20
Repository: santifer/career-ops
Length of output: 492
🏁 Script executed:
grep -n "#clickAndExtractJobId" scan-auth/linkedin.mjs
Repository: santifer/career-ops
Length of output: 279
🏁 Script executed:
grep -n "page.waitForFunction\|waitForFunction" scan-auth/linkedin.mjs
Repository: santifer/career-ops
Length of output: 124
🏁 Script executed:
grep -rn "waitForFunction" scan-auth/
Repository: santifer/career-ops
Length of output: 147
🏁 Script executed:
grep -n "sleep(" scan-auth/linkedin.mjs | head -20
Repository: santifer/career-ops
Length of output: 379
🏁 Script executed:
sed -n '425,445n' scan-auth/linkedin.mjs
Repository: santifer/career-ops
Length of output: 45
🏁 Script executed:
cat -n scan-auth/linkedin.mjs | sed -n '464,480p'
Repository: santifer/career-ops
Length of output: 791
🏁 Script executed:
cat -n scan-auth/linkedin.mjs | sed -n '425,440p'
Repository: santifer/career-ops
Length of output: 863
🏁 Script executed:
grep -B 5 -A 10 "page.waitForFunction" scan-auth/linkedin.mjs
Repository: santifer/career-ops
Length of output: 819
🏁 Script executed:
cat -n scan-auth/linkedin.mjs | sed -n '1,75p'
Repository: santifer/career-ops
Length of output: 3941
🏁 Script executed:
cat -n scan-auth/linkedin.mjs | sed -n '555,565p'
Repository: santifer/career-ops
Length of output: 571
🏁 Script executed:
grep -n "import.*page\|Playwright\|version" scan-auth/linkedin.mjs | head -10
Repository: santifer/career-ops
Length of output: 144
🏁 Script executed:
cat -n scan-auth/linkedin.mjs | sed -n '464,480p' && echo "---" && cat -n scan-auth/linkedin.mjs | sed -n '460,465p'
Repository: santifer/career-ops
Length of output: 1128
🏁 Script executed:
grep -B 5 "currentJobId" scan-auth/linkedin.mjs | head -20
Repository: santifer/career-ops
Length of output: 884
Replace fixed delay with change detection to reliably capture the updated job ID.
The URL's currentJobId parameter updates asynchronously after the card click. A fixed 1-second delay can race and read an empty or stale value, causing the scanner to extract job details from the wrong listing.
Use page.waitForFunction to wait for the parameter to actually change rather than relying on a fixed timeout:
Proposed fix
async #clickAndExtractJobId(page, card) {
+ const previousJobId = await page.evaluate(() => (
+ new URL(window.location.href).searchParams.get('currentJobId') || ''
+ )).catch(() => '');
+
const clicked = await page.evaluate((c) => {
if (c) { c.click(); return true; }
return false;
}, card);
if (!clicked) return { clicked: false, jobId: '', listingUrl: '' };
- await sleep(CARD_CLICK_DELAY_MS);
+ await page.waitForFunction((prev) => {
+ const current = new URL(window.location.href).searchParams.get('currentJobId') || '';
+ return current && current !== prev;
+ }, previousJobId, { timeout: NAV_TIMEOUT_MS }).catch(() => {});
+
  const url = page.url();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scan-auth/linkedin.mjs` around lines 464 - 475, In `#clickAndExtractJobId`,
replace the fixed sleep(CARD_CLICK_DELAY_MS) after clicking the card with a
wait-for-change using page.waitForFunction: capture the prior URL or prior
currentJobId before the click, perform the click via page.evaluate as currently
done, then call page.waitForFunction to poll until new
URL.searchParams.get('currentJobId') !== priorJobId (or until page.url()
changes), with a sensible timeout; after waitForFunction resolves, read the
updated page.url() and extract currentJobId and listingUrl as before to reliably
get the updated job id instead of racing with a fixed delay.
Although a page.waitForFunction call could work here, clicking a job card instantly redirects the browser to the associated job.
@DSnoNintendo, that's a good clarification — if the card click triggers an immediate SPA navigation that synchronously updates currentJobId in the URL, the waitForFunction polling loop would be redundant (it would resolve on the very first check anyway). The existing sleep(CARD_CLICK_DELAY_MS) makes sense as a rendering buffer for the detail panel to populate after the navigation, not as a guard on the URL change itself. Happy to withdraw this concern.
✏️ Learnings added
Learnt from: DSnoNintendo
Repo: santifer/career-ops PR: 379
File: scan-auth/linkedin.mjs:464-475
Timestamp: 2026-04-19T17:56:07.061Z
Learning: In `scan-auth/linkedin.mjs` (career-ops repo), clicking a LinkedIn job card in the search results left rail triggers an **immediate/synchronous SPA navigation** that updates the `currentJobId` URL query parameter. A fixed `sleep(CARD_CLICK_DELAY_MS)` after the click is intentional — it is a rendering buffer for the detail panel, not a guard for the URL change. Do not flag `#clickAndExtractJobId` for using a fixed delay instead of `page.waitForFunction` for the URL/jobId change.
🧠 Learnings used
Learnt from: DSnoNintendo
Repo: santifer/career-ops PR: 379
File: scan-auth/linkedin.mjs:357-368
Timestamp: 2026-04-19T17:52:55.390Z
Learning: In `scan-auth/linkedin.mjs` (career-ops repo), LinkedIn job search filters (date posted, experience level) are applied via search query strings embedded in the keywords parameter, not via URL parameters like `f_TPR` or `f_E`. Do not flag the `#buildSearches` method for using query string text for filters — this is the current correct approach for LinkedIn job search URLs.
Learnt from: CR
Repo: santifer/career-ops PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-19T10:52:21.261Z
Learning: Applies to {check-liveness.mjs,liveness-core.mjs} : Verify offer liveness using Playwright by navigating to the URL and taking a snapshot; only footer/navbar without JD content indicates closed posting, while title + description + Apply indicates active
// if no apply url or easy apply button is present, use the listing url
let applicationUrl = applyEl?.href?.trim() ?? `https://www.linkedin.com/jobs/view/${jobId}/`;
Do not synthesize a fallback URL when jobId is missing.
If currentJobId extraction fails and there is no external apply link, this returns https://www.linkedin.com/jobs/view//, which passes the non-empty applicationUrl check and gets saved. Return an empty URL unless a real jobId exists.
🐛 Proposed fallback fix
- // if no apply url or easy apply button is present, use the listing url
- let applicationUrl = applyEl?.href?.trim() ?? `https://www.linkedin.com/jobs/view/${jobId}/`;
+ // If no external apply URL is present, fall back only when a real job ID exists.
+ let applicationUrl = applyEl?.href?.trim() ?? '';
+ if (!applicationUrl && jobId) {
+ applicationUrl = `https://www.linkedin.com/jobs/view/${jobId}/`;
+ }
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
// If no external apply URL is present, fall back only when a real job ID exists.
let applicationUrl = applyEl?.href?.trim() ?? '';
if (!applicationUrl && jobId) {
  applicationUrl = `https://www.linkedin.com/jobs/view/${jobId}/`;
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scan-auth/linkedin.mjs` around lines 624 - 626, The code currently
synthesizes a LinkedIn view URL even when jobId is missing; update the logic
around applicationUrl (the variable set from applyEl?.href?.trim() ??
`https://www.linkedin.com/jobs/view/${jobId}/`) to only construct the
`https://www.linkedin.com/jobs/view/${jobId}/` fallback when `jobId` (or
currentJobId) is truthy/non-empty; otherwise set applicationUrl to an empty
string (or null) so no fake URL is saved. Ensure you still trim applyEl.href if
present and prefer it over the view URL, but guard the template fallback with an
explicit check for jobId before building it.
This logic only runs if an EasyApply button is available on the job page.
@DSnoNintendo, thanks for the context — that's a good clarification. You're right that in normal operation the ?? fallback is only reached for Easy Apply jobs, where there's no //a[@aria-label='Apply on company website'] element, and by that point a card has been clicked and the detail panel is open, so a valid currentJobId should always be present in the URL.
The concern is purely defensive: #clickAndExtractJobId already returns jobId: '' as a safety catch (e.g., if a try/catch fires or LinkedIn's URL structure changes), and that '' value is forwarded directly into #extractDetailFromPanel. If both conditions occur simultaneously — applyEl is null and jobId is '' — the current code produces https://www.linkedin.com/jobs/view// and it would silently pass the non-empty URL check downstream.
The proposed guard would add no overhead in the common case, but would prevent a malformed URL from being saved in that edge scenario:
let applicationUrl = applyEl?.href?.trim() ?? '';
if (!applicationUrl && jobId) {
applicationUrl = `https://www.linkedin.com/jobs/view/${jobId}/`;
}
That said, if you're confident the calling site always supplies a non-empty jobId before reaching this function, I understand leaving it as-is. Happy to defer to your judgment here.
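The failure mode the guard covers can also be reproduced in isolation (the values model the hypothetical double-failure path described above, not observed behavior):

```javascript
// When both the apply anchor and the jobId are missing, the nullish fallback
// still produces a non-empty but malformed URL.
const applyEl = null;   // Easy Apply job: no external apply anchor
const jobId = '';       // #clickAndExtractJobId's error path returns ''

const current = applyEl?.href?.trim() ?? `https://www.linkedin.com/jobs/view/${jobId}/`;
// current is 'https://www.linkedin.com/jobs/view//', which passes a
// `if (!data.applicationUrl)` truthiness check downstream.

// The guarded version returns '' instead, so the listing is dropped:
let guarded = applyEl?.href?.trim() ?? '';
if (!guarded && jobId) guarded = `https://www.linkedin.com/jobs/view/${jobId}/`;

console.log({ current, guarded });
```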
🧠 Learnings used
Learnt from: DSnoNintendo
Repo: santifer/career-ops PR: 379
File: scan-auth/linkedin.mjs:357-368
Timestamp: 2026-04-19T17:52:55.390Z
Learning: In `scan-auth/linkedin.mjs` (career-ops repo), LinkedIn job search filters (date posted, experience level) are applied via search query strings embedded in the keywords parameter, not via URL parameters like `f_TPR` or `f_E`. Do not flag the `#buildSearches` method for using query string text for filters — this is the current correct approach for LinkedIn job search URLs.
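Illustrating that learning, a minimal sketch of a search URL where the filter intent rides inside the keywords text rather than `f_TPR`/`f_E` URL parameters (the exact path and query shape are assumptions; the real `#buildSearches` may differ):

```javascript
// Hypothetical: filter terms ("past week", seniority words) are part of the
// keywords string itself, so only one query parameter is needed.
function buildSearchUrl(keywords) {
  return `https://www.linkedin.com/jobs/search/?keywords=${encodeURIComponent(keywords)}`;
}
```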
Actionable comments posted: 4
♻️ Duplicate comments (1)
.claude/skills/career-ops/SKILL.md (1)
30-35: ⚠️ Potential issue | 🟠 Major

Keep the router docs aligned with the portal-required CLI.

The router and menu still advertise bare `/career-ops scan-auth`, but `scan-auth.mjs` exits unless a portal is provided. Either route bare `scan-auth` to an interactive/default portal flow, or document `/career-ops scan-auth linkedin` consistently.

Docs-only alignment:

```diff
-| `scan-auth`, `scan-auth <portal>` | `scan-auth` |
+| `scan-auth <portal>` | `scan-auth` |
@@
-  /career-ops scan-auth → Authenticated portal scan (LinkedIn etc.)
+  /career-ops scan-auth linkedin → Authenticated portal scan (LinkedIn)
```

Also applies to: 63-63
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.claude/skills/career-ops/SKILL.md around lines 30 - 35, The router/menu docs advertise a bare "scan-auth" command but scan-auth.mjs requires a portal argument; update routing or docs to be consistent: either change the router/menu entry for scan-auth to require a portal (e.g., show "scan-auth <portal>" and examples like "scan-auth linkedin") or implement a default/interactive flow in the scan-auth handler that routes bare "scan-auth" to prompt for or default to a portal (e.g., invoke the same logic as scan-auth.mjs with a default portal or interactive prompt). Locate references to "scan-auth", "scan-auth.mjs", and any router/menu entries and make the change so the CLI usage in the docs and router behavior match.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.claude/skills/career-ops/SKILL.md:
- Around line 83-89: MD022 violations: add a single blank line after the
headings "Standalone modes (only their mode file):" and "Modes delegated to
subagent:" so each heading is followed by an empty line before the next
paragraph. Edit the SKILL.md content near those headings and insert one blank
line after each heading (and also ensure any subsequent headings like "Read
`modes/{mode}.md`" or the list starting with "For `scan`..." remain separated),
keeping the existing text and references to `modes/{mode}.md` and `_shared.md`
unchanged.
In `@scan-auth.mjs`:
- Around line 273-293: The saveJd function currently builds filenames from only
company and title which causes collisions; update saveJd to append a stable
unique suffix (e.g., the LinkedIn job ID from detail.jobId or detail.id or
detail.listingId) to the slugified filename so each listing produces a unique
file (fallback to a deterministic fallback like a timestamp only if no ID
exists), ensure the chosen ID field is sanitized the same way as the slug, and
keep returning the same jds/<filename> path; refer to saveJd, JDS_DIR, slug,
filename, filepath and detail.* to locate where to change the filename
generation and return value.
- Around line 75-97: Current parsing ignores unknown flags and extra positional
args which allows e.g. "node scan-auth.mjs linkedin --dry-run" to proceed;
change argument validation in the portalId/FLAGS logic to explicitly parse args
into two sets (recognized flags using FLAGS and positional args), reject any
unknown flag (error + process.exit(1)) and reject more than one positional
portal (error + process.exit(1)); update usage/error messages to include
supportedNames and mention invalid tokens; ensure FLAG.login is derived from the
parsed flags set and that SCANNERS[portalId] only runs after these validations
succeed.
- Around line 319-324: The "Next step" message printed after listing new results
is missing a connector, making the sentence awkward; update the console.log that
prints the next-step instruction (the console.log after the loop that references
"/career-ops pipeline") to include a connector — e.g. change its text to "Next
step: run /career-ops pipeline to process jobs saved to the pipeline." so the
command and the rest of the sentence read correctly.
---
Duplicate comments:
In @.claude/skills/career-ops/SKILL.md:
- Around line 30-35: The router/menu docs advertise a bare "scan-auth" command
but scan-auth.mjs requires a portal argument; update routing or docs to be
consistent: either change the router/menu entry for scan-auth to require a
portal (e.g., show "scan-auth <portal>" and examples like "scan-auth linkedin")
or implement a default/interactive flow in the scan-auth handler that routes
bare "scan-auth" to prompt for or default to a portal (e.g., invoke the same
logic as scan-auth.mjs with a default portal or interactive prompt). Locate
references to "scan-auth", "scan-auth.mjs", and any router/menu entries and make
the change so the CLI usage in the docs and router behavior match.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 1899db7a-2d86-4b7c-b192-2c00ed7d642a
📒 Files selected for processing (2)
- `.claude/skills/career-ops/SKILL.md`
- `scan-auth.mjs`
```diff
 ### Standalone modes (only their mode file):
 Read `modes/{mode}.md`

 Applies to: `tracker`, `deep`, `training`, `project`, `patterns`, `followup`

 ### Modes delegated to subagent:
-For `scan`, `apply` (with Playwright), and `pipeline` (3+ URLs): launch as Agent with the content of `_shared.md` + `modes/{mode}.md` injected into the subagent prompt.
+For `scan`, `scan-auth`, `apply` (with Playwright), and `pipeline` (3+ URLs): launch as Agent with the content of `_shared.md` + `modes/{mode}.md` injected into the subagent prompt.
```
Restore markdownlint-required blank lines after headings.
MD022 expects a blank line below these headings; the current text starts immediately after each heading.
Suggested markdownlint fix:

```diff
 ### Standalone modes (only their mode file):
+
 Read `modes/{mode}.md`
@@
 ### Modes delegated to subagent:
+
 For `scan`, `scan-auth`, `apply` (with Playwright), and `pipeline` (3+ URLs): launch as Agent with the content of `_shared.md` + `modes/{mode}.md` injected into the subagent prompt.
```

🧰 Tools
🪛 markdownlint-cli2 (0.22.0)
[warning] 83-83: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 88-88: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.claude/skills/career-ops/SKILL.md around lines 83 - 89, MD022 violations:
add a single blank line after the headings "Standalone modes (only their mode
file):" and "Modes delegated to subagent:" so each heading is followed by an
empty line before the next paragraph. Edit the SKILL.md content near those
headings and insert one blank line after each heading (and also ensure any
subsequent headings like "Read `modes/{mode}.md`" or the list starting with "For
`scan`..." remain separated), keeping the existing text and references to
`modes/{mode}.md` and `_shared.md` unchanged.
```js
const FLAGS = new Set(['--login']);
const portalId = (() => {
  for (let i = 0; i < args.length; i++) {
    if (FLAGS.has(args[i])) continue;
    return args[i];
  }
  return null;
})();

if (!portalId) {
  console.error(`Usage: node scan-auth.mjs [options] <portal>\n\nSupported portals: ${supportedNames.join(', ')}`);
  process.exit(1);
}
if (!SCANNERS[portalId]) {
  console.error(`Unknown portal: "${portalId}"\nSupported portals: ${supportedNames.join(', ')}`);
  process.exit(1);
}

const scanner = SCANNERS[portalId];

const FLAG = {
  login: args.includes('--login'),
};
```
Reject unsupported flags and extra positional args before scanning.
`node scan-auth.mjs linkedin --dry-run` currently scans and writes files because unknown args after the portal are ignored. Fail closed on unknown flags and multiple portals so typos or stale flags cannot trigger a mutating run.
Suggested CLI validation:

```diff
 const FLAGS = new Set(['--login']);
+const unknownFlags = args.filter(arg => arg.startsWith('--') && !FLAGS.has(arg));
+if (unknownFlags.length > 0) {
+  console.error(`Unknown option(s): ${unknownFlags.join(', ')}\n\nSupported options: ${[...FLAGS].join(', ')}`);
+  process.exit(1);
+}
+
+const positionalArgs = args.filter(arg => !arg.startsWith('--'));
+if (positionalArgs.length > 1) {
+  console.error(`Expected one portal, got: ${positionalArgs.join(', ')}\nSupported portals: ${supportedNames.join(', ')}`);
+  process.exit(1);
+}
+
 const portalId = (() => {
   for (let i = 0; i < args.length; i++) {
     if (FLAGS.has(args[i])) continue;
     return args[i];
   }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scan-auth.mjs` around lines 75 - 97, Current parsing ignores unknown flags
and extra positional args which allows e.g. "node scan-auth.mjs linkedin
--dry-run" to proceed; change argument validation in the portalId/FLAGS logic to
explicitly parse args into two sets (recognized flags using FLAGS and positional
args), reject any unknown flag (error + process.exit(1)) and reject more than
one positional portal (error + process.exit(1)); update usage/error messages to
include supportedNames and mention invalid tokens; ensure FLAG.login is derived
from the parsed flags set and that SCANNERS[portalId] only runs after these
validations succeed.
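A self-contained sketch of the fail-closed parsing this prompt asks for (function and field names here are illustrative, not the repo's):

```javascript
// Strict CLI parsing: unknown flags and extra positionals are rejected instead
// of being silently ignored, so a typo cannot trigger a mutating run.
function parseScanArgs(args, flags = new Set(['--login'])) {
  const unknown = args.filter(a => a.startsWith('--') && !flags.has(a));
  const positionals = args.filter(a => !a.startsWith('--'));
  if (unknown.length > 0) {
    return { error: `Unknown option(s): ${unknown.join(', ')}` };
  }
  if (positionals.length !== 1) {
    return { error: `Expected exactly one portal, got: ${positionals.join(', ') || '(none)'}` };
  }
  return { portalId: positionals[0], login: args.includes('--login') };
}
```

The caller can then `console.error` and `process.exit(1)` on any `error` result before touching `SCANNERS[portalId]`.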
```js
function saveJd(detail) {
  mkdirSync(JDS_DIR, { recursive: true });
  const slug = slugify(`${detail.company}-${detail.title}`);
  const filename = `${slug}.md`;
  const filepath = join(JDS_DIR, filename);

  const content = `---
title: ${yamlEscape(detail.title)}
company: ${yamlEscape(detail.company)}
application_url: ${yamlEscape(detail.applicationUrl || '')}
scraped: "${new Date().toISOString().split('T')[0]}"
source: ${portalId}
---

# ${detail.title} — ${detail.company}

${detail.jdText}
`;

  writeFileSync(filepath, content, 'utf-8');
  return `jds/${filename}`;
}
```
Prevent JD filename collisions for same company/title listings.
`saveJd()` uses only company-title for the filename, so two accepted LinkedIn jobs with the same company and title overwrite the same `jds/*.md` file and both pipeline entries can point to the final write. Include a stable unique suffix such as the LinkedIn job ID.
Suggested collision-resistant filename:

```diff
 function saveJd(detail) {
   mkdirSync(JDS_DIR, { recursive: true });
-  const slug = slugify(`${detail.company}-${detail.title}`);
+  const jobId = extractJobIdFromUrl(detail.url);
+  const slug = slugify(`${detail.company}-${detail.title}${jobId ? `-${jobId}` : ''}`);
   const filename = `${slug}.md`;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scan-auth.mjs` around lines 273 - 293, The saveJd function currently builds
filenames from only company and title which causes collisions; update saveJd to
append a stable unique suffix (e.g., the LinkedIn job ID from detail.jobId or
detail.id or detail.listingId) to the slugified filename so each listing
produces a unique file (fallback to a deterministic fallback like a timestamp
only if no ID exists), ensure the chosen ID field is sanitized the same way as
the slug, and keep returning the same jds/<filename> path; refer to saveJd,
JDS_DIR, slug, filename, filepath and detail.* to locate where to change the
filename generation and return value.
```js
if (results.listings.length > 0) {
  console.log('\nNew listings:');
  for (const l of results.listings) {
    console.log(`  • ${l.title} — ${l.company}`);
  }
  console.log(`\nNext step: run /career-ops pipeline jobs saved to the pipeline.`);
```
Clarify the next-step message.
Line 324 is missing the connector after the command, making the instruction awkward.
Suggested wording:

```diff
-  console.log(`\nNext step: run /career-ops pipeline jobs saved to the pipeline.`);
+  console.log('\nNext step: run /career-ops pipeline to process jobs saved to the pipeline.');
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scan-auth.mjs` around lines 319 - 324, The "Next step" message printed after
listing new results is missing a connector making the sentence awkward; update
the console.log that prints the next-step instruction (the console.log after the
loop that references "/career-ops pipeline") to include a connector — e.g.
change its text to "Next step: run the /career-ops pipeline jobs saved to the
pipeline." or "Next step: run the /career-ops pipeline jobs that were saved to
the pipeline." so the command and the rest of the sentence read correctly.
Authenticated Portal Scanning with LinkedIn Support
Harness for scanning job portals that require authentication & LinkedIn job portal scraper
Summary

- Adds a `scan-auth.mjs` harness and a `scan-auth/linkedin.mjs` scanner for portals that require login (starting with LinkedIn)
- Saves JDs to `jds/`, appends entries to `data/pipeline.md` and `data/scan-history.tsv` directly — no intermediate files or post-processing needed
- Jobs already seen (including by the API scanner, `scan.mjs`) are automatically skipped
- `scan-auth.mjs` is a standalone Node.js CLI that runs end-to-end without any LLM involvement. A `modes/scan-auth.md` mode is included so agents (Claude Code, OpenCode, Gemini CLI) can orchestrate it via `/career-ops scan-auth`, but the scanner itself requires no AI to operate.

How it works
The scanner opens LinkedIn search results for each keyword configured in `portals.yml`, then processes cards one at a time:

- Unwraps the `/safety/go` redirect on apply URLs; falls back to the listing URL for Easy Apply jobs
- Applies the filters from `portals.yml`, skips cards LinkedIn marks as "Viewed", and records them in scan history

Use with career-ops agent
The agent runs `node scan-auth.mjs linkedin`, waits for it to complete, and reports the summary. The scanner handles everything — JD files, pipeline entries, scan history — so the agent doesn't need to do any post-processing. New listings appear in `data/pipeline.md` ready for `/career-ops pipeline` to evaluate them.

If the scanner reports "Not logged in", the agent instructs the user to run `node scan-auth.mjs --login linkedin` in a separate terminal (the login flow requires an interactive browser window).

Interaction with existing codebase
- `scan.mjs` (API scanner): No changes. Both scanners share `data/scan-history.tsv` for dedup. The authenticated scanner loads `company::title` keys from all portals so a job found via Greenhouse won't be re-accepted from LinkedIn.
- `portals.yml`: Adds a `linkedin_searches` section (keywords, date filter, experience level, delays, employer blocklist). Existing `title_filter`, `tracked_companies`, and `search_queries` sections are unchanged.
- `data/pipeline.md`: Accepted listings are appended as `- [ ] local:jds/{file} | {company} | {title}` under the first `##` section. Language-agnostic — works regardless of section heading language.
- `data/scan-history.tsv`: Both accepted and skipped entries are recorded (statuses: `added`, `skipped_dup`, `skipped_viewed_linkedin`).
- `jds/`: JD files are saved with YAML frontmatter (`title`, `company`, `application_url`, `scraped`, `source`).

New files
- `scan-auth.mjs`
- `scan-auth/linkedin.mjs`
- `modes/scan-auth.md` (`/career-ops scan-auth`)

Usage
Test plan
- `node scan-auth.mjs` — prints usage and exits non-zero
- `node scan-auth.mjs --login linkedin` — opens browser, saves session, exits without scanning
- `node scan-auth.mjs linkedin` — full scan, writes JDs + pipeline + scan history, prints summary
- Add a known job to `scan-history.tsv`, confirm LinkedIn scanner skips it

Related issue
Main issue: #238
Umbrella issue: #230
Type of change
Checklist
- Ran `node test-all.mjs` and all tests pass

note: browser profiles are saved to (and overwritten at) `~/.scan-auth/<portal>/profile/` by `scan-auth.mjs`

Summary by CodeRabbit
New Features

- Authenticated portal scanning (`/career-ops scan-auth`) for LinkedIn job discovery
- `scan:auth` npm script for streamlined execution

Documentation