feat: authenticated portal scanning w/ LinkedIn support #379
DSnoNintendo wants to merge 9 commits into santifer:main from
Conversation
📝 Walkthrough
This PR adds a new authenticated scanning mode "scan-auth": a CLI entrypoint (
Changes
Sequence Diagram(s)
sequenceDiagram
actor User
participant CLI as scan-auth.mjs
participant Browser as Playwright<br/>Browser Context
participant Scanner as LinkedInScanner
participant FS as File System
User->>CLI: node scan-auth.mjs linkedin
CLI->>Browser: Launch persistent profile context
CLI->>Browser: Inject init script / hide navigator.webdriver
CLI->>Browser: Check session (scanner.checkSession)
Browser-->>CLI: session OK / not logged in
alt --login flow
User->>CLI: node scan-auth.mjs --login linkedin
CLI->>Browser: Navigate to loginUrl (headful)
CLI->>User: Prompt: press ENTER when logged in
User->>Browser: Complete interactive login
User->>CLI: Press ENTER
CLI->>Browser: Re-check session
end
CLI->>FS: Load data/scan-history.tsv → dedup keys
CLI->>Scanner: scan(context, config, { scanHistory })
loop per keyword/search
Scanner->>Browser: Navigate to search results page
Scanner->>Browser: Extract job cards
loop per visible card
Scanner->>Browser: Click card → capture job ID / canonical URL
Scanner->>Browser: Extract JD, apply URL, metadata
alt passes filters & new
Scanner->>CLI: add to listings
else filtered/duplicate
Scanner->>CLI: add to skipped
end
end
end
CLI->>FS: Write jds/{slug}.md (YAML frontmatter)
CLI->>FS: Append entries to data/pipeline.md
CLI->>FS: Append accepted & skipped rows to data/scan-history.tsv
CLI->>User: Print summary (searched, found, saved, skipped, errors)
CLI->>Browser: Close context
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 11
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
README.md (1)
298-304: ⚠️ Potential issue | 🟡 Minor
Update the mode count after adding scan-auth.
The tree now includes the new scan-auth mode, so `# 14 skill modes` is stale.
Proposed docs fix:
-├── modes/ # 14 skill modes
+├── modes/ # 15 skill modes
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@README.md` around lines 298 - 304, Update the header comment that lists the number of mode files: change the line containing "# 14 skill modes" to reflect the new total after adding scan-auth.md (increment the count to include scan-auth.md); ensure the comment above the modes/ tree matches the actual files (e.g., "# 15 skill modes") so the README's modes/ section and the added scan-auth.md stay in sync.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.claude/skills/career-ops/SKILL.md:
- Line 35: Update the docs to match the actual CLI behavior: in SKILL.md remove
references to unsupported flags `--dry-run`, `--search`, and `--max` from the
"scan-auth mode" description, leaving only the real supported `--login` flag, or
alternatively implement these flags in `scan-auth.mjs` (functions/arg parsing
tied to `scan-auth.mjs` and portal-specific `scan-auth/<portal>.mjs`) before
documenting them; ensure the README text and any router docs consistently
reflect whichever choice you make so agents won't rely on non-functional flags.
In `@CLAUDE.md`:
- Line 67: The docs currently list scan-auth.mjs in the main files but omit it
from the agent-facing skill-mode/command routing sections; update CLAUDE.md by
adding `scan-auth` (and a brief descriptor like "Authenticated portal scanner —
scan-auth.mjs") alongside every occurrence where `scan` is mentioned in the
skill-mode and command routing examples so agents can discover the authenticated
scanner; ensure you reference the main file name `scan-auth.mjs` and the related
folder `scan-auth/` in those routing lists and examples (same places that
mention `scan`) for consistency.
In `@modes/scan-auth.md`:
- Around line 9-12: Add blank lines before and after the fenced code block that
shows how to run the login command and set the fence language for the summary
output block; specifically, update the node login block around "node
scan-auth.mjs --login <portal>" to have an empty line before the opening ```bash
and after the closing ``` and change the summary output fence (the block that
starts with "{Portal} Scan — {YYYY-MM-DD}" in the same file) to use a language
tag such as text (i.e., ```text) instead of a bare fence; apply the same
blank-line-and-language fixes to the other similar fenced block around lines
39-53 to satisfy markdownlint.
In `@scan-auth.mjs`:
- Around line 84-87: The CLI currently exits when portalId is missing; change
this to the documented behavior by replacing the process.exit(1) path: when
portalId is falsy, iterate over supportedNames and invoke the existing scan
routine for each portal (the function in this file that handles a single portal
scan—refer to the scanner/handler used elsewhere in scan-auth.mjs) instead of
printing usage and exiting; retain the usage/exit behavior only for
invalid/unrecognized portal values and preserve the usage message when a
provided portal is not in supportedNames.
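The scan-all default described above can be sketched as a small dispatch helper. This is a sketch, not the PR's code: `resolvePortals`, `runAll`, and `runPortal` are illustrative names standing in for scan-auth.mjs's argument parsing and per-portal scan routine.

```javascript
// Resolve which portals to scan: no argument means all supported portals;
// an unrecognized portal name still fails loudly with the usage hint.
function resolvePortals(portalId, supportedNames) {
  if (!portalId) return supportedNames; // documented default: scan every portal
  if (!supportedNames.includes(portalId)) {
    throw new Error(`Unknown portal "${portalId}". Supported: ${supportedNames.join(', ')}`);
  }
  return [portalId];
}

// Run the single-portal scan routine once per resolved portal.
async function runAll(portalId, supportedNames, runPortal) {
  for (const name of resolvePortals(portalId, supportedNames)) {
    await runPortal(name);
  }
}
```

This keeps the usage/exit path only for genuinely invalid portal values, as the comment asks.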
- Around line 319-325: The summary message when new listings are detected prints
an incorrect next-step command; in the block that checks results.listings (the
code using results.listings, portalId and the console.log that prints
`/career-ops ${portalId}`), change the guidance to instruct users to run
`/career-ops pipeline` instead of `/career-ops ${portalId}` so the next step
points to the routed pipeline mode.
- Around line 195-198: The appendToPipeline function currently returns silently
if PIPELINE_PATH doesn't exist; update it to handle a missing data/ directory by
either creating the parent directory and the pipeline.md file before writing or
by returning a write count / throwing a warning/error so the caller can detect
failure; specifically modify appendToPipeline (and its use sites) to check
existsSync(PIPELINE_PATH), and if missing create the directory
(fs.mkdirSync(path.dirname(PIPELINE_PATH), { recursive: true })) and an empty
PIPELINE_PATH file, or alternatively return 0 and log a warning via the same
logger so the caller reading the write count can report failure instead of
reporting success. Ensure the change touches appendToPipeline, PIPELINE_PATH
checks, and the code path that reports success so consumers know when pipeline
insertion was impossible.
In `@scan-auth/linkedin.mjs`:
- Around line 624-626: The code currently synthesizes a LinkedIn view URL even
when jobId is missing; update the logic around applicationUrl (the variable set
from applyEl?.href?.trim() ?? `https://www.linkedin.com/jobs/view/${jobId}/`) to
only construct the `https://www.linkedin.com/jobs/view/${jobId}/` fallback when
`jobId` (or currentJobId) is truthy/non-empty; otherwise set applicationUrl to
an empty string (or null) so no fake URL is saved. Ensure you still trim
applyEl.href if present and prefer it over the view URL, but guard the template
fallback with an explicit check for jobId before building it.
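A minimal sketch of that guard — names here are illustrative; in the PR the inputs come from `applyEl?.href` and the `currentJobId` URL param:

```javascript
// Prefer the external apply link; only synthesize a /jobs/view/ URL
// when a job ID actually exists, otherwise return '' (no fake URL saved).
function resolveApplicationUrl(applyHref, jobId) {
  const external = (applyHref || '').trim();
  if (external) return external;
  if (jobId) return `https://www.linkedin.com/jobs/view/${jobId}/`;
  return '';
}
```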
- Around line 375-386: The hostname check in `#unwrapRedirect` is too permissive
(uses hostname.includes('linkedin.com')) allowing domains like notlinkedin.com;
update it to only accept exact LinkedIn hostnames (e.g., linkedin.com,
www.linkedin.com, touch.www.linkedin.com, www.linkedin.cn, or other canonical
LinkedIn hosts used by your app) before processing /safety/go and decoding the
nested url; ensure the check happens on u.hostname (or normalized hostname) and
only then call u.searchParams.get('url') and decodeURIComponent, keeping the
rest of the existing logic in `#unwrapRedirect` intact.
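The exact-hostname check might look like this sketch. The allowlist entries are assumptions taken from the comment, not an exhaustive list of LinkedIn hosts, and the decode step is simplified because `searchParams.get` already percent-decodes the nested URL:

```javascript
// Exact-hostname allowlist: notlinkedin.com no longer passes the check.
const LINKEDIN_HOSTS = new Set([
  'linkedin.com', 'www.linkedin.com', 'touch.www.linkedin.com', 'www.linkedin.cn',
]);

function unwrapRedirect(href) {
  const trimmed = (href || '').trim();
  if (!trimmed) return '';
  try {
    const u = new URL(trimmed);
    if (!LINKEDIN_HOSTS.has(u.hostname.toLowerCase())) return trimmed; // non-LinkedIn hosts pass through untouched
    if (u.pathname.startsWith('/safety/go')) {
      const inner = u.searchParams.get('url'); // already percent-decoded
      if (inner) return inner;
    }
    return trimmed;
  } catch {
    return trimmed; // not a parseable URL — leave it alone
  }
}
```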
- Around line 357-368: The current `#buildSearches` method appends human-readable
date and experience text into the keywords string (levelPrefix and dateSuffix)
which LinkedIn ignores; instead map config.date_posted to f_TPR values (e.g.,
'24' -> 'r86400', 'Week' -> 'r604800', 'Month' -> 'r2592000') and map
config.experience_level entries to LinkedIn numeric f_E codes (1..6) joined by
commas, then build URLSearchParams with keywords set to the plain kw and add
f_TPR and f_E parameters when present; update the code that constructs params
(replace the levelPrefix/dateSuffix logic and the params = new URLSearchParams({
keywords: query })) to include params.append('f_TPR', mappedValue) and
params.append('f_E', mappedCsv) so filters are sent as real LinkedIn filter
parameters.
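A hedged sketch of the suggested parameter mapping. The f_TPR/f_E values follow the review comment; the experience-level key names are assumptions and portals.yml may spell them differently:

```javascript
// Map human-readable config values to LinkedIn's real filter parameters
// instead of appending prose ("posted in the past week") to the keywords.
const F_TPR = { '24': 'r86400', Week: 'r604800', Month: 'r2592000' };
const F_E = {
  internship: '1', entry: '2', associate: '3',
  'mid-senior': '4', director: '5', executive: '6',
};

function buildSearchUrl(kw, { date_posted, experience_level = [] } = {}) {
  const params = new URLSearchParams({ keywords: kw }); // plain keywords only
  const tpr = F_TPR[date_posted];
  if (tpr) params.append('f_TPR', tpr);
  const codes = experience_level.map(l => F_E[l]).filter(Boolean).join(',');
  if (codes) params.append('f_E', codes);
  return `https://www.linkedin.com/jobs/search-results/?${params}`;
}
```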
- Around line 208-234: Reorder and protect per-card processing: after obtaining
the handle from this.#getCard(page, i) call, call this.isJobCardViewed(page,
card) before invoking this.extractJob so you don't click/open viewed listings;
if skipViewed is true and the card is viewed, use a new no-click helper (e.g.,
extractJobMetadataNoClick or similar) that reads
title/company/location/listingUrl from the card DOM without clicking, push the
skipped entry and stats, and skip extraction; wrap the whole per-card logic (the
sequence around this.#getCard, this.isJobCardViewed, this.extractJob and pushing
results) in a try/finally that always calls await card.dispose() in the finally
block so the handle is never leaked, and only call extractJob (which clicks)
when you know you will process the detail panel.
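The dispose-in-finally guarantee reduces to a small wrapper, sketched here with a generic handle — any object exposing `dispose()`, a Playwright `JSHandle` in the real code:

```javascript
// Run per-card work and always release the handle, even when fn throws.
async function withCard(card, fn) {
  try {
    return await fn(card); // viewed-check / extractJob / result pushing go here
  } finally {
    await card.dispose();  // never leaked, regardless of the outcome
  }
}
```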
- Around line 464-475: In `#clickAndExtractJobId`, replace the fixed
sleep(CARD_CLICK_DELAY_MS) after clicking the card with a wait-for-change using
page.waitForFunction: capture the prior URL or prior currentJobId before the
click, perform the click via page.evaluate as currently done, then call
page.waitForFunction to poll until new URL.searchParams.get('currentJobId') !==
priorJobId (or until page.url() changes), with a sensible timeout; after
waitForFunction resolves, read the updated page.url() and extract currentJobId
and listingUrl as before to reliably get the updated job id instead of racing
with a fixed delay.
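Stripped of the Playwright specifics, the suggestion is a poll-until-changed pattern. This standalone sketch shows the shape, with `getValue` standing in for reading `page.url()`; the real fix would use `page.waitForFunction` with the same prior-value comparison and timeout:

```javascript
// Poll getValue() until it differs from the value captured before the click,
// resolving with the new value or rejecting after timeoutMs.
function waitForChange(getValue, prior, { timeoutMs = 5000, intervalMs = 50 } = {}) {
  const deadline = Date.now() + timeoutMs;
  return new Promise((resolve, reject) => {
    const tick = () => {
      const v = getValue();
      if (v !== prior) return resolve(v);
      if (Date.now() > deadline) return reject(new Error('timed out waiting for change'));
      setTimeout(tick, intervalMs);
    };
    tick();
  });
}
```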
---
Outside diff comments:
In `@README.md`:
- Around line 298-304: Update the header comment that lists the number of mode
files: change the line containing "# 14 skill modes" to reflect the new total
after adding scan-auth.md (increment the count to include scan-auth.md); ensure
the comment above the modes/ tree matches the actual files (e.g., "# 15 skill
modes") so the README's modes/ section and the added scan-auth.md stay in sync.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 1f4f2805-1431-455a-a8d0-df31ca12de79
📒 Files selected for processing (9)
.claude/skills/career-ops/SKILL.md
CLAUDE.md
DATA_CONTRACT.md
README.md
modes/scan-auth.md
package.json
scan-auth.mjs
scan-auth/linkedin.mjs
templates/portals.example.yml
function appendToPipeline(listings) {
  if (listings.length === 0) return;
  if (!existsSync(PIPELINE_PATH)) return;
Do not silently drop pipeline entries when data/pipeline.md is missing.
Right now JDs and scan history can be written while pipeline.md is skipped, but Line 415 still reports success. Create the data/ file, or return a write count and warn/fail when pipeline insertion is impossible. As per coding guidelines, **/*.mjs scripts should handle missing data/ directories gracefully.
One possible fix direction
function appendToPipeline(listings) {
- if (listings.length === 0) return;
- if (!existsSync(PIPELINE_PATH)) return;
+ if (listings.length === 0) return 0;
+ mkdirSync(dirname(PIPELINE_PATH), { recursive: true });
+ if (!existsSync(PIPELINE_PATH)) {
+ writeFileSync(PIPELINE_PATH, '# Pipeline\n\n## Pending\n', 'utf-8');
+ }
let text = readFileSync(PIPELINE_PATH, 'utf-8');
@@
- if (firstH2 === -1) return;
+ if (firstH2 === -1) {
+ warn(`Could not find a pending section in ${PIPELINE_PATH}; pipeline not updated`);
+ return 0;
+ }
@@
writeFileSync(PIPELINE_PATH, text, 'utf-8');
+ return listings.length;
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
function appendToPipeline(listings) {
  if (listings.length === 0) return 0;
  mkdirSync(dirname(PIPELINE_PATH), { recursive: true });
  if (!existsSync(PIPELINE_PATH)) {
    writeFileSync(PIPELINE_PATH, '# Pipeline\n\n## Pending\n', 'utf-8');
  }
  let text = readFileSync(PIPELINE_PATH, 'utf-8');
  if (firstH2 === -1) {
    warn(`Could not find a pending section in ${PIPELINE_PATH}; pipeline not updated`);
    return 0;
  }
  writeFileSync(PIPELINE_PATH, text, 'utf-8');
  return listings.length;
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scan-auth.mjs` around lines 195 - 198, The appendToPipeline function
currently returns silently if PIPELINE_PATH doesn't exist; update it to handle a
missing data/ directory by either creating the parent directory and the
pipeline.md file before writing or by returning a write count / throwing a
warning/error so the caller can detect failure; specifically modify
appendToPipeline (and its use sites) to check existsSync(PIPELINE_PATH), and if
missing create the directory (fs.mkdirSync(path.dirname(PIPELINE_PATH), {
recursive: true })) and an empty PIPELINE_PATH file, or alternatively return 0
and log a warning via the same logger so the caller reading the write count can
report failure instead of reporting success. Ensure the change touches
appendToPipeline, PIPELINE_PATH checks, and the code path that reports success
so consumers know when pipeline insertion was impossible.
@DSnoNintendo, understood — the silent no-op when pipeline.md is absent is intentional and consistent with scan.mjs's existing behavior. I'll note this for future reviews.
✏️ Learnings added
Learnt from: DSnoNintendo
Repo: santifer/career-ops PR: 379
File: scan-auth.mjs:195-198
Timestamp: 2026-04-19T17:51:06.950Z
Learning: In `scan-auth.mjs` (and consistent with `scan.mjs`), `appendToPipeline` intentionally returns silently (no error/warning) when `data/pipeline.md` does not exist. This is a deliberate design choice to mirror the existing `scan.mjs` behavior — do not flag it as a missing-data-directory handling issue.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: CR
Repo: santifer/career-ops PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-19T10:52:21.261Z
Learning: Applies to data/applications.md : Run `node dedup-tracker.mjs` to remove duplicate entries in the applications.md tracker based on company+role matching
Learnt from: CR
Repo: santifer/career-ops PR: 0
File: GEMINI.md:0-0
Timestamp: 2026-04-19T10:52:39.223Z
Learning: Run 'node verify-pipeline.mjs' to check data pipeline health and integrity
Learnt from: CR
Repo: santifer/career-ops PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-19T10:52:21.261Z
Learning: Run `node verify-pipeline.mjs` to health-check the tracker and reports pipeline
Learnt from: CR
Repo: santifer/career-ops PR: 0
File: GEMINI.md:0-0
Timestamp: 2026-04-19T10:52:39.223Z
Learning: Applies to **/*.{mjs,js} : Script files like update-system.mjs, verify-pipeline.mjs, generate-pdf.mjs, merge-tracker.mjs, and gemini-eval.mjs are auto-updatable system files — NEVER put user data in these files
Learnt from: CR
Repo: santifer/career-ops PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-19T10:52:21.261Z
Learning: Applies to {*.mjs,templates/**,config/**} : All scripts must be Node.js .mjs modules; use Playwright for PDF generation and web scraping, YAML for configuration, and HTML/CSS for CV templates
Learnt from: CR
Repo: santifer/career-ops PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-19T10:52:21.261Z
Learning: GitHub Actions CI/CD runs test-all.mjs (63+ checks) on every PR, applies auto-labels based on risk (🔴 core-architecture, ⚠️ agent-behavior, 📄 docs), and enforces status checks before merge to main
const card = await this.#getCard(page, i);
const cardExists = await page.evaluate(c => c != null, card);
if (!cardExists) {
  warn(` ✗ Card ${i} not found`);
  stats.errors++;
  consecutiveFailures++;
  await card.dispose();
  continue;
}

// 1. Extract title, company, click card, extract detail
const data = await this.extractJob(page, card);

// 2. Check viewed label
if (skipViewed && await this.isJobCardViewed(page, card)) {
  log(` ✗ Viewed: skipped card ${i}`);
  stats.skipped_viewed++;
  skipped.push({
    url: data.listingUrl || '',
    title: data.title || '',
    company: data.company || '',
    status: 'skipped_viewed_linkedin',
  });
  await card.dispose();
  continue;
}
await card.dispose();
❓ Verification inconclusive
Script executed:
wc -l scan-auth/linkedin.mjs
Repository: santifer/career-ops
Exit code: 0
stdout:
636 scan-auth/linkedin.mjs
Script executed:
cat -n scan-auth/linkedin.mjs | head -250
Repository: santifer/career-ops
Exit code: 0
stdout:
1 /**
2 * LinkedIn Scanner
3 *
4 * All LinkedIn-specific logic: selectors, config parsing, session checks,
5 * pagination, card extraction, search URL construction, and the scan loop.
6 *
7 * Per-card flow (scan loop):
8 * 1. Resolve card element once via `#getCard` (evaluateHandle)
9 * 2. extractJob() reads title/company/location from the card DOM,
10 * then clicks the card via `#clickAndExtractJobId` to get the job ID
11 * from the URL's currentJobId param, then calls `#extractDetailFromPanel`
12 * to scrape the apply link and JD text from the opened detail panel
13 * 3. isJobCardViewed() checks for LinkedIn's "Viewed" label on the card
14 * 4. Blocklist, dedup (job ID + company::title), and title filter run
15 * against the job data — dedup catches both same-run duplicates
16 * and cross-portal matches (Greenhouse/Ashby/Lever via scan-history.tsv)
17 * 5. JD-level filter and apply URL validation run post-click
18 * 6. Accepted listings and skipped entries are returned to the harness
19 *
20 * Apply URL resolution:
21 * - External apply link (via xpathApplyUrl) → unwrapped from LinkedIn's
22 * /safety/go redirect
23 * - Easy Apply / no external link → falls back to the listing URL
24 * (https://www.linkedin.com/jobs/view/{JOB_ID}/)
25 *
26 * Listing URL:
27 * Always stored as https://www.linkedin.com/jobs/view/{JOB_ID}/, extracted
28 * from the currentJobId URL param after clicking the card.
29 */
30
31
32 import yaml from 'js-yaml';
33
34 // ---------------------------------------------------------------------------
35 // Selectors — grouped for easy maintenance when LinkedIn changes DOM
36 // ---------------------------------------------------------------------------
37
38 const SELECTORS = {
39   xpathListingCard: "//button[starts-with(@aria-label, 'Dismiss') and contains(@aria-label, 'job')]/ancestor::div[@role='button']",
40 /** Dismiss control on each left-rail card; aria-label is `Dismiss {job title} job` (EN UI). */
41 cardDismissButtonCss: 'button[aria-label^="Dismiss"][aria-label*=" job"]',
42 /** Company name often appears on this anchor when present. */
43 cardCompanyLinkQuery: 'a[href*="/company/"]',
44   xpathApplyUrl: "//a[@aria-label='Apply on company website']",
45 xpathMoreButton: "//span[normalize-space(text())='more']",
46 jdContent: 'span[data-testid="expandable-text-box"]',
47 loggedIn: 'a[aria-label*="My Network"]',
48   xpathCurrentPage: "//button[@aria-current='true'][starts-with(@aria-label, 'Page')]",
49   xpathPageButton: "//button[starts-with(@aria-label, 'Page')]",
50
51 viewedStatusTagQuery: 'p, span, li',
52 viewedStatusLabels: ['Viewed'],
53 /** Characters allowed between the label and the next status token in `innerText` (middle dot, bullet, pipe) */
54 viewedStatusLineSeparatorCharClass: '·•|',
55 };
56
57 const NOISE_LABELS = new Set([
58 'more', 'show more', 'see more',
59 'less', 'show less', 'see less',
60 'retry premium',
61 ]);
62 const CARD_CLICK_DELAY_MS = 1000;
63 const SESSION_CHECK_DELAY_MS = 3000;
64 const NAV_TIMEOUT_MS = 30000;
65 const DEFAULT_DELAY_PAGES_MS = [3000, 8000];
66 const DEFAULT_DELAY_SEARCHES_MS = [5000, 15000];
67
68 function randomDelay(range) {
69 const [min, max] = range;
70 return Math.floor(Math.random() * (max - min) + min);
71 }
72 function sleep(ms) { return new Promise(r => setTimeout(r, ms)); }
73
74 function log(msg) { console.log(`[linkedin] ${msg}`); }
75 function warn(msg) { console.warn(`[linkedin] ⚠ ${msg}`); }
76
77 export default class LinkedInScanner {
78 name = 'LinkedIn';
79 portalId = 'linkedin';
80 loginUrl = 'https://www.linkedin.com/login';
81 feedUrl = 'https://www.linkedin.com/feed/';
82
83 // -------------------------------------------------------------------------
84 // Config parsing — extracts linkedin_searches section from portals.yml
85 // -------------------------------------------------------------------------
86
87 parseConfig(raw) {
88 const doc = yaml.load(raw);
89 const ls = doc.linkedin_searches || {};
90 return {
91 title_filter: doc.title_filter || { positive: [], negative: [] },
92 keywords: ls.keywords || [],
93 employer_blocklist: ls.employer_blocklist || [],
94 date_posted: ls.date_posted,
95 max_results: ls.max_results_per_search,
96 delay_pages: ls.delay_between_pages_ms,
97 delay_searches: ls.delay_between_searches_ms,
98 experience_level: ls.experience_level,
99 skip_viewed: ls.skip_viewed,
100 };
101 }
102
103 // -------------------------------------------------------------------------
104 // Session management
105 // -------------------------------------------------------------------------
106
107 async isLoggedIn(page) {
108 const url = page.url();
109 if (url.includes('/login') || url.includes('/uas/') || url.includes('/checkpoint/')) {
110 return false;
111 }
112 if (await page.$(SELECTORS.loggedIn)) return true;
113 return false;
114 }
115
116 async checkSession(page) {
117 await page.goto(this.feedUrl, { waitUntil: 'domcontentloaded', timeout: NAV_TIMEOUT_MS });
118 await sleep(SESSION_CHECK_DELAY_MS);
119 return this.isLoggedIn(page);
120 }
121
122 // -------------------------------------------------------------------------
123 // Scan — the main extraction loop
124 //
125 // Handles extraction, filtering, dedup, and employer blocklist.
126 // Returns only accepted listings ready to be saved.
127 //
128 // Options: { scanHistory, skipViewed }
129 // -------------------------------------------------------------------------
130
131 async scan(context, config, options = {}) {
132 const maxPerSearch = options.maxResults || config.max_results || 25;
133 const delayPages = config.delay_pages || DEFAULT_DELAY_PAGES_MS;
134 const delaySearches = config.delay_searches || DEFAULT_DELAY_SEARCHES_MS;
135 const titleFilter = config.title_filter;
136 const employerBlocklist = config.employer_blocklist || [];
137 const scanHistory = options.scanHistory || new Set();
138 /** Omit cards LinkedIn marks as already opened. Default true when `skip_viewed` is absent in portals.yml. */
139 const skipViewed = options.skipViewed !== undefined
140 ? Boolean(options.skipViewed)
141 : config.skip_viewed !== false;
142
143 const keywords = config.keywords || [];
144 if (keywords.length === 0) {
145 log('No keywords found in portals.yml');
146 return null;
147 }
148
149 const searches = this.#buildSearches(config);
150
151 const toRun = options.searchFilter
152 ? searches.filter(s => s.name === options.searchFilter)
153 : searches;
154
155 if (toRun.length === 0) {
156 log(`No keyword matching "${options.searchFilter}"`);
157 log(`Available: ${searches.map(s => s.name).join(', ')}`);
158 return null;
159 }
160
161 const listings = [];
162 const skipped = [];
163 const errors = [];
164 const stats = {
165 searched: 0, found: 0, extracted: 0,
166 skipped_filter: 0, skipped_dedup: 0, skipped_viewed: 0, errors: 0,
167 };
168
169 // Circuit breaker: bail out after too many consecutive extraction failures
170 const MAX_CONSECUTIVE_FAILURES = 15;
171 let consecutiveFailures = 0;
172
173 // Single page for all searches — avoids Playwright stealing window focus
174 const page = await context.newPage();
175
176 for (const search of toRun) {
177 log(`\n── Search: ${search.name} ──`);
178 stats.searched++;
179 consecutiveFailures = 0; // reset circuit breaker per search
180
181 try {
182 await page.goto(search.url, { waitUntil: 'domcontentloaded', timeout: NAV_TIMEOUT_MS });
183 await sleep(randomDelay(delayPages));
184
185 let accepted = 0;
186 let hasNextPage = true;
187
188 while (hasNextPage && accepted < maxPerSearch) {
189 const currentPage = await this.#getCurrentPage(page);
190 log(`Page ${currentPage || 1}`);
191
192 const cardCount = await this.#getCardCount(page);
193 log(`Found ${cardCount} job cards`);
194 stats.found += cardCount;
195
196 for (let i = 0; i < cardCount; i++) {
197 if (accepted >= maxPerSearch) {
198 log(`Reached max results (${maxPerSearch}) for this search`);
199 break;
200 }
201
202 // Circuit breaker: stop if extraction is consistently failing
203 if (consecutiveFailures >= MAX_CONSECUTIVE_FAILURES) {
204 warn(`${MAX_CONSECUTIVE_FAILURES} consecutive extraction failures — stopping this search (likely throttled or DOM changed)`);
205 break;
206 }
207
208 const card = await this.#getCard(page, i);
209 const cardExists = await page.evaluate(c => c != null, card);
210 if (!cardExists) {
211 warn(` ✗ Card ${i} not found`);
212 stats.errors++;
213 consecutiveFailures++;
214 await card.dispose();
215 continue;
216 }
217
218 // 1. Extract title, company, click card, extract detail
219 const data = await this.extractJob(page, card);
220
221 // 2. Check viewed label
222 if (skipViewed && await this.isJobCardViewed(page, card)) {
223 log(` ✗ Viewed: skipped card ${i}`);
224 stats.skipped_viewed++;
225 skipped.push({
226 url: data.listingUrl || '',
227 title: data.title || '',
228 company: data.company || '',
229 status: 'skipped_viewed_linkedin',
230 });
231 await card.dispose();
232 continue;
233 }
234 await card.dispose();
235
236
237 if (!data.title) {
238 warn(` ✗ No title on card ${i} (data); skipping`);
239 stats.errors++;
240 consecutiveFailures++;
241 continue;
242 }
243 // 3. Blocklist, dedup, and title filter
244 if (employerBlocklist.length && data.company) {
245 const companyLower = data.company.toLowerCase();
246 if (employerBlocklist.some(b => companyLower === b.toLowerCase())) {
247 log(` ✗ Blocked employer: ${data.company}`);
248 stats.skipped_filter++;
249 continue;
250 }
Script executed:
cat -n scan-auth/linkedin.mjs | sed -n '190,280p'
Repository: santifer/career-ops
Exit code: 0
stdout:
190 log(`Page ${currentPage || 1}`);
191
192 const cardCount = await this.#getCardCount(page);
193 log(`Found ${cardCount} job cards`);
194 stats.found += cardCount;
195
196 for (let i = 0; i < cardCount; i++) {
197 if (accepted >= maxPerSearch) {
198 log(`Reached max results (${maxPerSearch}) for this search`);
199 break;
200 }
201
202 // Circuit breaker: stop if extraction is consistently failing
203 if (consecutiveFailures >= MAX_CONSECUTIVE_FAILURES) {
204 warn(`${MAX_CONSECUTIVE_FAILURES} consecutive extraction failures — stopping this search (likely throttled or DOM changed)`);
205 break;
206 }
207
208 const card = await this.#getCard(page, i);
209 const cardExists = await page.evaluate(c => c != null, card);
210 if (!cardExists) {
211 warn(` ✗ Card ${i} not found`);
212 stats.errors++;
213 consecutiveFailures++;
214 await card.dispose();
215 continue;
216 }
217
218 // 1. Extract title, company, click card, extract detail
219 const data = await this.extractJob(page, card);
220
221 // 2. Check viewed label
222 if (skipViewed && await this.isJobCardViewed(page, card)) {
223 log(` ✗ Viewed: skipped card ${i}`);
224 stats.skipped_viewed++;
225 skipped.push({
226 url: data.listingUrl || '',
227 title: data.title || '',
228 company: data.company || '',
229 status: 'skipped_viewed_linkedin',
230 });
231 await card.dispose();
232 continue;
233 }
234 await card.dispose();
235
236
237 if (!data.title) {
238 warn(` ✗ No title on card ${i} (data); skipping`);
239 stats.errors++;
240 consecutiveFailures++;
241 continue;
242 }
243 // 3. Blocklist, dedup, and title filter
244 if (employerBlocklist.length && data.company) {
245 const companyLower = data.company.toLowerCase();
246 if (employerBlocklist.some(b => companyLower === b.toLowerCase())) {
247 log(` ✗ Blocked employer: ${data.company}`);
248 stats.skipped_filter++;
249 continue;
250 }
251 }
252
253 const companyTitleKey = (data.company && data.title)
254 ? `${data.company}::${data.title}`.toLowerCase() : null;
255 const dedupKey = data.jobId || companyTitleKey;
256 if ((dedupKey && scanHistory.has(dedupKey)) || (companyTitleKey && scanHistory.has(companyTitleKey))) {
257 log(` ✗ Already seen: ${data.title} (${data.company})`);
258 stats.skipped_dedup++;
259 skipped.push({
260 url: data.listingUrl || '',
261 title: data.title || '',
262 company: data.company || '',
263 status: 'skipped_dup',
264 });
265 continue;
266 }
267
268 if (!this.#matchesFilter(data.title, '', titleFilter)) {
269 log(` ✗ Filtered: ${data.title} (${data.company})`);
270 stats.skipped_filter++;
271 continue;
272 }
273
274 stats.extracted++;
275
276 if (!data.applicationUrl) {
277 log(` ✗ No apply URL: ${data.title} (${data.company})`);
278 stats.skipped_filter++;
279 continue;
280 }
Script executed:
cat -n scan-auth/linkedin.mjs | sed -n '290,380p'
Repository: santifer/career-ops
Exit code: 0
stdout:
290 stats.errors++;
291 consecutiveFailures++;
292 continue;
293 }
294
295 consecutiveFailures = 0;
296 if (dedupKey) scanHistory.add(dedupKey);
297 if (companyTitleKey) scanHistory.add(companyTitleKey);
298 listings.push({
299 title: data.title,
300 company: data.company,
301 applicationUrl: data.applicationUrl,
302 jdText: data.jdText,
303 url: data.listingUrl,
304 });
305 accepted++;
306 log(` ✓ Accepted: ${data.title} at ${data.company}`);
307 }
308
309 if (consecutiveFailures >= MAX_CONSECUTIVE_FAILURES) {
310 hasNextPage = false;
311 } else if (accepted < maxPerSearch) {
312 hasNextPage = await this.#goToNextPage(page);
313 if (hasNextPage) {
314 log(`Navigating to next page...`);
315 await sleep(randomDelay(delayPages));
316 }
317 } else {
318 hasNextPage = false;
319 }
320 }
321 } catch (e) {
322 log(`Search "${search.name}" failed: ${e.message}`);
323 errors.push({ search: search.name, error: e.message });
324 stats.errors++;
325 }
326
327 if (toRun.indexOf(search) < toRun.length - 1) {
328 const d = randomDelay(delaySearches);
329 log(`Waiting ${(d / 1000).toFixed(1)}s before next search...`);
330 await sleep(d);
331 }
332 }
333
334 await page.close();
335 return { listings, skipped, errors, stats };
336 }
337
338 // -------------------------------------------------------------------------
339 // Private — filtering
340 // -------------------------------------------------------------------------
341
342   #matchesFilter(title, jdText, filter) {
343 if (!filter) return true;
344 const combined = `${title} ${jdText}`.toLowerCase();
345 const titleLower = title.toLowerCase();
346 const hasPositive = !filter.positive?.length ||
347 filter.positive.some(kw => combined.includes(kw.toLowerCase()));
348 const hasNegative = filter.negative?.length &&
349 filter.negative.some(kw => titleLower.includes(kw.toLowerCase()));
350 return hasPositive && !hasNegative;
351 }
352
353 // -------------------------------------------------------------------------
354 // Private — search URL construction
355 // -------------------------------------------------------------------------
356
357   #buildSearches(config) {
358 const datePostedMap = { '24': 'past 24 hours', 'Week': 'past week', 'Month': 'past month' };
359 const dateSuffix = datePostedMap[config.date_posted] || '';
360 const levels = config.experience_level || [];
361 const levelPrefix = levels.length ? levels.join(' or ') : '';
362
363 return config.keywords.map(kw => {
364 let query = levelPrefix ? `${levelPrefix} ${kw}` : kw;
365 if (dateSuffix) query += ` posted in the ${dateSuffix}`;
366 const params = new URLSearchParams({ keywords: query });
367 return { name: kw, url: `https://www.linkedin.com/jobs/search-results/?${params}` };
368 });
369 }
370
371 // -------------------------------------------------------------------------
372 // Private — URL helpers
373 // -------------------------------------------------------------------------
374
375   #unwrapRedirect(href) {
376 const trimmed = (href || '').trim();
377 if (!trimmed) return '';
378 try {
379 const u = new URL(trimmed);
380 if (!u.hostname.includes('linkedin.com')) return trimmed;
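The #matchesFilter logic in the dump above can be exercised standalone; note the asymmetry it encodes: positive keywords match title plus JD combined, while negative keywords match the title only. This is a reconstruction of the dumped method for illustration, not the module itself.

```javascript
// Reconstructed from the cat -n dump above (lines 342-351 of linkedin.mjs).
// Positive keywords are checked against title + JD; negatives against title only.
function matchesFilter(title, jdText, filter) {
  if (!filter) return true;
  const combined = `${title} ${jdText}`.toLowerCase();
  const titleLower = title.toLowerCase();
  const hasPositive = !filter.positive?.length ||
    filter.positive.some(kw => combined.includes(kw.toLowerCase()));
  const hasNegative = filter.negative?.length &&
    filter.negative.some(kw => titleLower.includes(kw.toLowerCase()));
  return hasPositive && !hasNegative;
}

// A negative keyword appearing only in the JD does not exclude the listing:
console.log(matchesFilter('Backend Engineer', 'contract work', { negative: ['contract'] }));
```

So "contract" in the description alone lets the listing through; only a title hit triggers the negative filter.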
Script executed:
grep -n "extractJob" scan-auth/linkedin.mjs | head -5
Repository: santifer/career-ops
Exit code: 0
stdout:
9: * 2. extractJob() reads title/company/location from the card DOM,
219: const data = await this.extractJob(page, card);
488: async extractJob(page, card) {
Script executed:
cat -n scan-auth/linkedin.mjs | sed -n '488,550p'
Repository: santifer/career-ops
Exit code: 0
stdout:
488 async extractJob(page, card) {
489 const data = await page.evaluate(({ card, sel, noise }) => {
490 const noiseSet = new Set((noise || []).map((s) => String(s).toLowerCase()));
491
492 let title = '';
493 const dismiss = card.querySelector(sel.cardDismissButtonCss || 'button[aria-label^="Dismiss"]');
494 if (dismiss) {
495 const al = dismiss.getAttribute('aria-label') || '';
496 const m = al.match(/^Dismiss\s+(.+?)\s+job\s*$/i);
497 if (m) title = m[1].trim();
498 }
499
500 let company = '';
501 for (const a of card.querySelectorAll(sel.cardCompanyLinkQuery || 'a[href*="/company/"]')) {
502 const t = (a.textContent ?? '').trim();
503 if (t.length >= 1 && t.length < 120 && !noiseSet.has(t.toLowerCase())) {
504 company = t;
505 break;
506 }
507 }
508
509 function looksLikeLocation(s) {
510 return /\((On-?site|Hybrid|Remote)\)/i.test(s)
511 || /,\s*[A-Z]{2}\b/.test(s)
512 || /\bRemote\b/i.test(s);
513 }
514
515 function looksLikeMetaLine(s) {
516 const lower = s.toLowerCase();
517 return lower === 'viewed'
518 || lower === '·'
519 || /school alumni work(s)? here/i.test(s)
520 || /early applicant/i.test(lower)
521 || /^\d+ benefits?$/i.test(s)
522 || /^posted on\b/i.test(s)
523 || /\b(hour|day|week|month)s?\s+ago$/i.test(s)
524 || /^[\d·|•\s]+$/.test(s);
525 }
526
527 if (!company) {
528 const paragraphs = [...card.querySelectorAll('p')]
529 .map((p) => (p.textContent ?? '').trim())
530 .filter((t) => t.length > 0 && t.length < 200);
531 for (const t of paragraphs) {
532 if (noiseSet.has(t.toLowerCase())) continue;
533 if (title && (t === title || t.includes(title))) continue;
534 if (looksLikeMetaLine(t)) continue;
535 if (looksLikeLocation(t)) continue;
536 if (t.length < 2) continue;
537 company = t;
538 break;
539 }
540 }
541
542 let location = '';
543 for (const t of [...card.querySelectorAll('p')].map((p) => (p.textContent ?? '').trim())) {
544 if (looksLikeLocation(t) && t !== title && t !== company) {
545 location = t;
546 break;
547 }
548 }
549
550 return {
Script executed:
grep -n "isJobCardViewed" scan-auth/linkedin.mjs | head -5
Repository: santifer/career-ops
Exit code: 0
stdout:
13: * 3. isJobCardViewed() checks for LinkedIn's "Viewed" label on the card
222: if (skipViewed && await this.isJobCardViewed(page, card)) {
576: async isJobCardViewed(page, card) {
Script executed:
cat -n scan-auth/linkedin.mjs | sed -n '560,620p'
Repository: santifer/career-ops
Exit code: 0
stdout:
560 data.listingUrl = listingUrl;
561 data.clicked = clicked;
562
563 // Extract detail from the now-open panel
564 if (clicked) {
565 const detail = await this.#extractDetailFromPanel(page, jobId);
566 data.applicationUrl = this.#unwrapRedirect(detail.applicationUrl);
567 data.jdText = detail.jdText;
568 }
569
570 return data;
571 }
572
573 /**
574 * Whether LinkedIn marks the listing card as already opened (see SELECTORS.viewedStatusLabels).
575 */
576 async isJobCardViewed(page, card) {
577 return page.evaluate(({ card, sel }) => {
578 const labels = sel.viewedStatusLabels || [];
579 const labelSet = new Set(labels.map((s) => String(s).toLowerCase()));
580 const tagQuery = sel.viewedStatusTagQuery || 'p, span, li';
581 for (const el of card.querySelectorAll(tagQuery)) {
582 const t = (el.textContent ?? '').trim();
583 if (t && labelSet.has(t.toLowerCase())) return true;
584 }
585
586 const sepClass = sel.viewedStatusLineSeparatorCharClass || '·•|';
587 const line = (card.innerText ?? '').replace(/\s+/g, ' ').trim();
588 for (const raw of labels) {
589 const esc = String(raw).replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
590 const re = new RegExp(`\\b${esc}\\b\\s*[${sepClass}]`, 'i');
591 if (re.test(line)) return true;
592 }
593
594 return false;
595 }, { card, sel: SELECTORS });
596 }
597
598 /**
599 * Scrape the detail panel (right side) after a card has been clicked.
600 * Clicks "more" to expand truncated JDs, then extracts the apply URL
601 * and full JD text. Falls back to the listing URL if no external apply
602 * link is found (Easy Apply jobs).
603 */
604   async #extractDetailFromPanel(page, jobId) {
605 // Try to expand the description
606 await page.evaluate(({ xpath }) => {
607 const result = document.evaluate(xpath, document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null);
608 const moreSpan = result.singleNodeValue;
609 if (moreSpan) moreSpan.click();
610 }, { xpath: SELECTORS.xpathMoreButton });
611 await sleep(250);
612
613 const detail = await page.evaluate(({ sel, jobId }) => {
614 function xpathFirst(expression) {
615 const result = document.evaluate(
616 expression, document, null,
617 XPathResult.FIRST_ORDERED_NODE_TYPE, null
618 );
619 return result.singleNodeValue;
620 }
Script executed:
cat -n scan-auth/linkedin.mjs | sed -n '550,575p'
Repository: santifer/career-ops
Exit code: 0
stdout:
550 return {
551 title: title || '',
552 company: company || '',
553 location: location || '',
554 };
555 }, { card, sel: SELECTORS, noise: [...NOISE_LABELS] });
556
557 // Click card and extract job ID from the resulting URL
558 const { clicked, jobId, listingUrl } = await this.#clickAndExtractJobId(page, card);
559 data.jobId = jobId;
560 data.listingUrl = listingUrl;
561 data.clicked = clicked;
562
563 // Extract detail from the now-open panel
564 if (clicked) {
565 const detail = await this.#extractDetailFromPanel(page, jobId);
566 data.applicationUrl = this.#unwrapRedirect(detail.applicationUrl);
567 data.jdText = detail.jdText;
568 }
569
570 return data;
571 }
572
573 /**
574 * Whether LinkedIn marks the listing card as already opened (see SELECTORS.viewedStatusLabels).
575 */
Script executed:
grep -n "clickAndExtractJobId" scan-auth/linkedin.mjs | head -3
Repository: santifer/career-ops
Exit code: 0
stdout:
10: * then clicks the card via `#clickAndExtractJobId` to get the job ID
464: async `#clickAndExtractJobId`(page, card) {
558: const { clicked, jobId, listingUrl } = await this.#clickAndExtractJobId(page, card);
Script executed:
cat -n scan-auth/linkedin.mjs | sed -n '440,485p'
Repository: santifer/career-ops
Exit code: 0
stdout:
440
441 // -------------------------------------------------------------------------
442 // Private — extraction helpers
443 // -------------------------------------------------------------------------
444
445   async #getCardCount(page) {
446 return page.evaluate((xpath) => {
447 const result = document.evaluate(xpath, document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
448 return result.snapshotLength;
449 }, SELECTORS.xpathListingCard);
450 }
451
452   async #getCard(page, index) {
453 return page.evaluateHandle(({ xpath, idx }) => {
454 const result = document.evaluate(xpath, document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
455 return result.snapshotItem(idx) || null;
456 }, { xpath: SELECTORS.xpathListingCard, idx: index });
457 }
458
459 /**
460 * Click a card and extract the job ID from the resulting URL.
461 * LinkedIn updates the URL's currentJobId query param when a card is selected.
462 * Returns { clicked, jobId, listingUrl }.
463 */
464   async #clickAndExtractJobId(page, card) {
465 const clicked = await page.evaluate((c) => {
466 if (c) { c.click(); return true; }
467 return false;
468 }, card);
469 if (!clicked) return { clicked: false, jobId: '', listingUrl: '' };
470
471 await sleep(CARD_CLICK_DELAY_MS);
472 const url = page.url();
473 try {
474 const jobId = new URL(url).searchParams.get('currentJobId') || '';
475 const listingUrl = jobId ? `https://www.linkedin.com/jobs/view/${jobId}/` : '';
476 return { clicked: true, jobId, listingUrl };
477 } catch {
478 return { clicked: true, jobId: '', listingUrl: '' };
479 }
480 }
481
482 /**
483 * Extract all data for a single job card. Reads title/company/location from
484 * the card DOM (no click), then clicks the card to get the job ID and opens
485 * the detail panel to scrape the apply URL and JD text.
Script executed:
cat -n scan-auth/linkedin.mjs | sed -n '208,295p'
Repository: santifer/career-ops
Exit code: 0
stdout:
208 const card = await this.#getCard(page, i);
209 const cardExists = await page.evaluate(c => c != null, card);
210 if (!cardExists) {
211 warn(` ✗ Card ${i} not found`);
212 stats.errors++;
213 consecutiveFailures++;
214 await card.dispose();
215 continue;
216 }
217
218 // 1. Extract title, company, click card, extract detail
219 const data = await this.extractJob(page, card);
220
221 // 2. Check viewed label
222 if (skipViewed && await this.isJobCardViewed(page, card)) {
223 log(` ✗ Viewed: skipped card ${i}`);
224 stats.skipped_viewed++;
225 skipped.push({
226 url: data.listingUrl || '',
227 title: data.title || '',
228 company: data.company || '',
229 status: 'skipped_viewed_linkedin',
230 });
231 await card.dispose();
232 continue;
233 }
234 await card.dispose();
235
236
237 if (!data.title) {
238 warn(` ✗ No title on card ${i} (data); skipping`);
239 stats.errors++;
240 consecutiveFailures++;
241 continue;
242 }
243 // 3. Blocklist, dedup, and title filter
244 if (employerBlocklist.length && data.company) {
245 const companyLower = data.company.toLowerCase();
246 if (employerBlocklist.some(b => companyLower === b.toLowerCase())) {
247 log(` ✗ Blocked employer: ${data.company}`);
248 stats.skipped_filter++;
249 continue;
250 }
251 }
252
253 const companyTitleKey = (data.company && data.title)
254 ? `${data.company}::${data.title}`.toLowerCase() : null;
255 const dedupKey = data.jobId || companyTitleKey;
256 if ((dedupKey && scanHistory.has(dedupKey)) || (companyTitleKey && scanHistory.has(companyTitleKey))) {
257 log(` ✗ Already seen: ${data.title} (${data.company})`);
258 stats.skipped_dedup++;
259 skipped.push({
260 url: data.listingUrl || '',
261 title: data.title || '',
262 company: data.company || '',
263 status: 'skipped_dup',
264 });
265 continue;
266 }
267
268 if (!this.#matchesFilter(data.title, '', titleFilter)) {
269 log(` ✗ Filtered: ${data.title} (${data.company})`);
270 stats.skipped_filter++;
271 continue;
272 }
273
274 stats.extracted++;
275
276 if (!data.applicationUrl) {
277 log(` ✗ No apply URL: ${data.title} (${data.company})`);
278 stats.skipped_filter++;
279 continue;
280 }
281
282 if (!this.#matchesFilter(data.title, data.jdText || '', titleFilter)) {
283 log(` ✗ Filtered after JD: ${data.title} (${data.company})`);
284 stats.skipped_filter++;
285 continue;
286 }
287
288 if (!data.jdText) {
289 warn(` ✗ No JD content: ${data.title}`);
290 stats.errors++;
291 consecutiveFailures++;
292 continue;
293 }
294
295 consecutiveFailures = 0;
Check viewed status before extracting, and protect card disposal with try/finally.
Line 219 calls extractJob(), which clicks the card at line 558 and opens the detail panel. Only then, at line 222, does isJobCardViewed() check whether LinkedIn marked it as viewed—meaning viewed jobs are unnecessarily opened and scraped before being skipped. Additionally, if extractJob() throws, the card handle at line 208 is never disposed, leaking browser resources.
Move isJobCardViewed() before the click (since it only reads card DOM), create a no-click helper to extract title/company/location for skipped entries, and wrap per-card work in try/finally to ensure card.dispose() always runs.
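A minimal sketch of that ordering, with hypothetical stand-ins for the Playwright handle and scanner methods (FakeCard, isViewed, and extract are illustrative, not the real implementation):

```javascript
// Stand-in for a Playwright JSHandle; only dispose() matters for this sketch.
class FakeCard {
  constructor() { this.disposed = false; }
  async dispose() { this.disposed = true; }
}

// Hypothetical per-card flow: viewed check first, click second,
// disposal guaranteed by the finally block.
async function processCard(card, { skipViewed, isViewed, extract }) {
  try {
    // The viewed check only reads the card DOM, so it runs before any click.
    if (skipViewed && await isViewed(card)) {
      return { status: 'skipped_viewed' };
    }
    // Only now click the card and scrape the detail panel.
    return { status: 'accepted', data: await extract(card) };
  } finally {
    // Runs even when extract() throws, so the handle never leaks.
    await card.dispose();
  }
}
```

Even a rejected extract() leaves card.disposed === true, which is exactly the leak path the comment flags in the current code.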
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scan-auth/linkedin.mjs` around lines 208 - 234, Reorder and protect per-card
processing: after obtaining the handle from this.#getCard(page, i) call, call
this.isJobCardViewed(page, card) before invoking this.extractJob so you don't
click/open viewed listings; if skipViewed is true and the card is viewed, use a
new no-click helper (e.g., extractJobMetadataNoClick or similar) that reads
title/company/location/listingUrl from the card DOM without clicking, push the
skipped entry and stats, and skip extraction; wrap the whole per-card logic (the
sequence around this.#getCard, this.isJobCardViewed, this.extractJob and pushing
results) in a try/finally that always calls await card.dispose() in the finally
block so the handle is never leaked, and only call extractJob (which clicks)
when you know you will process the detail panel.
#unwrapRedirect(href) {
  const trimmed = (href || '').trim();
  if (!trimmed) return '';
  try {
    const u = new URL(trimmed);
    if (!u.hostname.includes('linkedin.com')) return trimmed;
    if (!u.pathname.includes('/safety/go')) return trimmed;
    const nested = u.searchParams.get('url');
    if (!nested) return trimmed;
    const decoded = decodeURIComponent(nested);
    new URL(decoded);
    return decoded;
Tighten the trusted LinkedIn redirect host check.
hostname.includes('linkedin.com') trusts hosts like notlinkedin.com or linkedin.com.evil.tld and unwraps their /safety/go?url=... targets. Match exact LinkedIn hosts before decoding nested URLs. As per coding guidelines, **/*.mjs: Check for command injection, path traversal, and SSRF.
🛡️ Proposed host validation fix
const u = new URL(trimmed);
- if (!u.hostname.includes('linkedin.com')) return trimmed;
- if (!u.pathname.includes('/safety/go')) return trimmed;
+ const hostname = u.hostname.toLowerCase();
+ const isLinkedInHost = hostname === 'linkedin.com' || hostname.endsWith('.linkedin.com');
+ if (!isLinkedInHost) return trimmed;
+ if (u.pathname !== '/safety/go') return trimmed;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scan-auth/linkedin.mjs` around lines 375 - 386, The hostname check in
`#unwrapRedirect` is too permissive (uses hostname.includes('linkedin.com'))
allowing domains like notlinkedin.com; update it to only accept exact LinkedIn
hostnames (e.g., linkedin.com, www.linkedin.com, touch.www.linkedin.com,
www.linkedin.cn, or other canonical LinkedIn hosts used by your app) before
processing /safety/go and decoding the nested url; ensure the check happens on
u.hostname (or normalized hostname) and only then call u.searchParams.get('url')
and decodeURIComponent, keeping the rest of the existing logic in
`#unwrapRedirect` intact.
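As a quick runnable illustration of why the substring check is weaker (the hostnames here are illustrative):

```javascript
// Substring vs. exact-host trust checks for the redirect unwrapper.
const substringCheck = (host) => host.includes('linkedin.com');
const exactCheck = (host) => {
  const h = host.toLowerCase();
  return h === 'linkedin.com' || h.endsWith('.linkedin.com');
};

for (const host of ['www.linkedin.com', 'notlinkedin.com', 'linkedin.com.evil.tld']) {
  // Only www.linkedin.com should be trusted; the substring check trusts all three.
  console.log(host, { substring: substringCheck(host), exact: exactCheck(host) });
}
```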
This logic is heuristic-based and any potential issues would require malicious code being hosted on the LinkedIn job portal.
@DSnoNintendo, that's a fair point. Since #unwrapRedirect only operates on URLs already present in LinkedIn's own DOM (sourced from the Apply on company website anchor), any abuse of the permissive .includes('linkedin.com') check would presuppose an attacker already controlling content served by LinkedIn's job portal. The decoded URL is also returned as a string rather than triggering an automated outbound fetch, which removes the server-side SSRF vector. I'll defer to your assessment here.
✏️ Learnings added
Learnt from: DSnoNintendo
Repo: santifer/career-ops PR: 379
File: scan-auth/linkedin.mjs:375-386
Timestamp: 2026-04-19T17:58:15.791Z
Learning: In `scan-auth/linkedin.mjs` (career-ops repo), the `#unwrapRedirect` method's `hostname.includes('linkedin.com')` check is intentionally heuristic. The URLs it processes are sourced exclusively from LinkedIn's own DOM (the `//a[aria-label='Apply on company website']` XPath selector), so exploiting the permissive check would require an attacker to already control content on LinkedIn's portal. The decoded result is stored as a string, not used for an automated server-side HTTP request, so there is no SSRF risk. Do not flag `#unwrapRedirect` for the permissive hostname check.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: DSnoNintendo
Repo: santifer/career-ops PR: 379
File: scan-auth/linkedin.mjs:464-475
Timestamp: 2026-04-19T17:56:07.061Z
Learning: In `scan-auth/linkedin.mjs` (career-ops repo), clicking a LinkedIn job card in the search results left rail triggers an **immediate/synchronous SPA navigation** that updates the `currentJobId` URL query parameter. A fixed `sleep(CARD_CLICK_DELAY_MS)` after the click is intentional — it is a rendering buffer for the detail panel, not a guard for the URL change. Do not flag `#clickAndExtractJobId` for using a fixed delay instead of `page.waitForFunction` for the URL/jobId change.
Learnt from: DSnoNintendo
Repo: santifer/career-ops PR: 379
File: scan-auth/linkedin.mjs:357-368
Timestamp: 2026-04-19T17:52:55.390Z
Learning: In `scan-auth/linkedin.mjs` (career-ops repo), LinkedIn job search filters (date posted, experience level) are applied via search query strings embedded in the keywords parameter, not via URL parameters like `f_TPR` or `f_E`. Do not flag the `#buildSearches` method for using query string text for filters — this is the current correct approach for LinkedIn job search URLs.
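Per the learning above, filters ride inside the keywords text rather than dedicated URL parameters. A standalone reconstruction of #buildSearches (the config values are illustrative) shows the resulting URL shape:

```javascript
// Reconstructed from the #buildSearches dump; config values are made up.
const config = { keywords: ['backend engineer'], experience_level: ['Senior'], date_posted: 'Week' };

const datePostedMap = { '24': 'past 24 hours', 'Week': 'past week', 'Month': 'past month' };
const dateSuffix = datePostedMap[config.date_posted] || '';
const levelPrefix = (config.experience_level || []).join(' or ');

const searches = config.keywords.map(kw => {
  let query = levelPrefix ? `${levelPrefix} ${kw}` : kw;
  if (dateSuffix) query += ` posted in the ${dateSuffix}`;
  const params = new URLSearchParams({ keywords: query });
  return { name: kw, url: `https://www.linkedin.com/jobs/search-results/?${params}` };
});

console.log(searches[0].url);
// → https://www.linkedin.com/jobs/search-results/?keywords=Senior+backend+engineer+posted+in+the+past+week
```

URLSearchParams handles the encoding, so the filter text ends up as plus-separated words inside the single keywords parameter.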
async #clickAndExtractJobId(page, card) {
  const clicked = await page.evaluate((c) => {
    if (c) { c.click(); return true; }
    return false;
  }, card);
  if (!clicked) return { clicked: false, jobId: '', listingUrl: '' };

  await sleep(CARD_CLICK_DELAY_MS);
  const url = page.url();
  try {
    const jobId = new URL(url).searchParams.get('currentJobId') || '';
    const listingUrl = jobId ? `https://www.linkedin.com/jobs/view/${jobId}/` : '';
🧩 Analysis chain
🏁 Script executed:
head -n 480 scan-auth/linkedin.mjs | tail -n 30
Repository: santifer/career-ops
Length of output: 1182
🏁 Script executed:
grep -n "CARD_CLICK_DELAY_MS\|NAV_TIMEOUT_MS" scan-auth/linkedin.mjs | head -20
Repository: santifer/career-ops
Length of output: 492
🏁 Script executed:
grep -n "#clickAndExtractJobId" scan-auth/linkedin.mjs
Repository: santifer/career-ops
Length of output: 279
🏁 Script executed:
grep -n "page.waitForFunction\|waitForFunction" scan-auth/linkedin.mjs
Repository: santifer/career-ops
Length of output: 124
🏁 Script executed:
grep -rn "waitForFunction" scan-auth/
Repository: santifer/career-ops
Length of output: 147
🏁 Script executed:
grep -n "sleep(" scan-auth/linkedin.mjs | head -20
Repository: santifer/career-ops
Length of output: 379
🏁 Script executed:
sed -n '425,445n' scan-auth/linkedin.mjs
Repository: santifer/career-ops
Length of output: 45
🏁 Script executed:
cat -n scan-auth/linkedin.mjs | sed -n '464,480p'
Repository: santifer/career-ops
Length of output: 791
🏁 Script executed:
cat -n scan-auth/linkedin.mjs | sed -n '425,440p'
Repository: santifer/career-ops
Length of output: 863
🏁 Script executed:
grep -B 5 -A 10 "page.waitForFunction" scan-auth/linkedin.mjs
Repository: santifer/career-ops
Length of output: 819
🏁 Script executed:
cat -n scan-auth/linkedin.mjs | sed -n '1,75p'
Repository: santifer/career-ops
Length of output: 3941
🏁 Script executed:
cat -n scan-auth/linkedin.mjs | sed -n '555,565p'
Repository: santifer/career-ops
Length of output: 571
🏁 Script executed:
grep -n "import.*page\|Playwright\|version" scan-auth/linkedin.mjs | head -10
Repository: santifer/career-ops
Length of output: 144
🏁 Script executed:
cat -n scan-auth/linkedin.mjs | sed -n '464,480p' && echo "---" && cat -n scan-auth/linkedin.mjs | sed -n '460,465p'
Repository: santifer/career-ops
Length of output: 1128
🏁 Script executed:
grep -B 5 "currentJobId" scan-auth/linkedin.mjs | head -20
Repository: santifer/career-ops
Length of output: 884
Replace fixed delay with change detection to reliably capture the updated job ID.
The URL's currentJobId parameter updates asynchronously after the card click. A fixed 1-second delay can race and read an empty or stale value, causing the scanner to extract job details from the wrong listing.
Use page.waitForFunction to wait for the parameter to actually change rather than relying on a fixed timeout:
Proposed fix
async #clickAndExtractJobId(page, card) {
+ const previousJobId = await page.evaluate(() => (
+ new URL(window.location.href).searchParams.get('currentJobId') || ''
+ )).catch(() => '');
+
const clicked = await page.evaluate((c) => {
if (c) { c.click(); return true; }
return false;
}, card);
if (!clicked) return { clicked: false, jobId: '', listingUrl: '' };
- await sleep(CARD_CLICK_DELAY_MS);
+ await page.waitForFunction((prev) => {
+ const current = new URL(window.location.href).searchParams.get('currentJobId') || '';
+ return current && current !== prev;
+ }, previousJobId, { timeout: NAV_TIMEOUT_MS }).catch(() => {});
+
  const url = page.url();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scan-auth/linkedin.mjs` around lines 464 - 475, In `#clickAndExtractJobId`,
replace the fixed sleep(CARD_CLICK_DELAY_MS) after clicking the card with a
wait-for-change using page.waitForFunction: capture the prior URL or prior
currentJobId before the click, perform the click via page.evaluate as currently
done, then call page.waitForFunction to poll until new
URL.searchParams.get('currentJobId') !== priorJobId (or until page.url()
changes), with a sensible timeout; after waitForFunction resolves, read the
updated page.url() and extract currentJobId and listingUrl as before to reliably
get the updated job id instead of racing with a fixed delay.
Although a page.waitForFunction call could work here, clicking a job card instantly redirects the browser to the associated job.
@DSnoNintendo, that's a good clarification — if the card click triggers an immediate SPA navigation that synchronously updates currentJobId in the URL, the waitForFunction polling loop would be redundant (it would resolve on the very first check anyway). The existing sleep(CARD_CLICK_DELAY_MS) makes sense as a rendering buffer for the detail panel to populate after the navigation, not as a guard on the URL change itself. Happy to withdraw this concern.
✏️ Learnings added
Learnt from: DSnoNintendo
Repo: santifer/career-ops PR: 379
File: scan-auth/linkedin.mjs:464-475
Timestamp: 2026-04-19T17:56:07.061Z
Learning: In `scan-auth/linkedin.mjs` (career-ops repo), clicking a LinkedIn job card in the search results left rail triggers an **immediate/synchronous SPA navigation** that updates the `currentJobId` URL query parameter. A fixed `sleep(CARD_CLICK_DELAY_MS)` after the click is intentional — it is a rendering buffer for the detail panel, not a guard for the URL change. Do not flag `#clickAndExtractJobId` for using a fixed delay instead of `page.waitForFunction` for the URL/jobId change.
🧠 Learnings used
Learnt from: DSnoNintendo
Repo: santifer/career-ops PR: 379
File: scan-auth/linkedin.mjs:357-368
Timestamp: 2026-04-19T17:52:55.390Z
Learning: In `scan-auth/linkedin.mjs` (career-ops repo), LinkedIn job search filters (date posted, experience level) are applied via search query strings embedded in the keywords parameter, not via URL parameters like `f_TPR` or `f_E`. Do not flag the `#buildSearches` method for using query string text for filters — this is the current correct approach for LinkedIn job search URLs.
Learnt from: CR
Repo: santifer/career-ops PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-19T10:52:21.261Z
Learning: Applies to {check-liveness.mjs,liveness-core.mjs} : Verify offer liveness using Playwright by navigating to the URL and taking a snapshot; only footer/navbar without JD content indicates closed posting, while title + description + Apply indicates active
// if no apply url or easy apply button is present, use the listing url
let applicationUrl = applyEl?.href?.trim() ?? `https://www.linkedin.com/jobs/view/${jobId}/`;
Do not synthesize a fallback URL when jobId is missing.
If currentJobId extraction fails and there is no external apply link, this returns https://www.linkedin.com/jobs/view//, which passes the non-empty applicationUrl check and gets saved. Return an empty URL unless a real jobId exists.
🐛 Proposed fallback fix
- // if no apply url or easy apply button is present, use the listing url
- let applicationUrl = applyEl?.href?.trim() ?? `https://www.linkedin.com/jobs/view/${jobId}/`;
+ // If no external apply URL is present, fall back only when a real job ID exists.
+ let applicationUrl = applyEl?.href?.trim() ?? '';
+ if (!applicationUrl && jobId) {
+ applicationUrl = `https://www.linkedin.com/jobs/view/${jobId}/`;
+ }
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
// If no external apply URL is present, fall back only when a real job ID exists.
let applicationUrl = applyEl?.href?.trim() ?? '';
if (!applicationUrl && jobId) {
  applicationUrl = `https://www.linkedin.com/jobs/view/${jobId}/`;
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scan-auth/linkedin.mjs` around lines 624 - 626, The code currently
synthesizes a LinkedIn view URL even when jobId is missing; update the logic
around applicationUrl (the variable set from applyEl?.href?.trim() ??
`https://www.linkedin.com/jobs/view/${jobId}/`) to only construct the
`https://www.linkedin.com/jobs/view/${jobId}/` fallback when `jobId` (or
currentJobId) is truthy/non-empty; otherwise set applicationUrl to an empty
string (or null) so no fake URL is saved. Ensure you still trim applyEl.href if
present and prefer it over the view URL, but guard the template fallback with an
explicit check for jobId before building it.
This logic only runs if an EasyApply button is available on the job page.
@DSnoNintendo, thanks for the context — that's a good clarification. You're right that in normal operation the ?? fallback is only reached for Easy Apply jobs, where there's no //a[@aria-label='Apply on company website'] element, and by that point a card has been clicked and the detail panel is open, so a valid currentJobId should always be present in the URL.
The concern is purely defensive: #clickAndExtractJobId already returns jobId: '' as a safety catch (e.g., if a try/catch fires or LinkedIn's URL structure changes), and that '' value is forwarded directly into #extractDetailFromPanel. If both conditions occur simultaneously — applyEl is null and jobId is '' — the current code produces https://www.linkedin.com/jobs/view// and it would silently pass the non-empty URL check downstream.
The proposed guard would add no overhead in the common case, but would prevent a malformed URL from being saved in that edge scenario:
let applicationUrl = applyEl?.href?.trim() ?? '';
if (!applicationUrl && jobId) {
applicationUrl = `https://www.linkedin.com/jobs/view/${jobId}/`;
}
That said, if you're confident the calling site always supplies a non-empty jobId before reaching this function, I understand leaving it as-is. Happy to defer to your judgment here.
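The failure mode the guard covers can also be reproduced in isolation (the values model the hypothetical double-failure path described above, not observed behavior):

```javascript
// When both the apply anchor and the jobId are missing, the nullish fallback
// still produces a non-empty but malformed URL.
const applyEl = null;   // Easy Apply job: no external apply anchor
const jobId = '';       // #clickAndExtractJobId's error path returns ''

const current = applyEl?.href?.trim() ?? `https://www.linkedin.com/jobs/view/${jobId}/`;
// current is 'https://www.linkedin.com/jobs/view//', which passes a
// `if (!data.applicationUrl)` truthiness check downstream.

// The guarded version returns '' instead, so the listing is dropped:
let guarded = applyEl?.href?.trim() ?? '';
if (!guarded && jobId) guarded = `https://www.linkedin.com/jobs/view/${jobId}/`;

console.log({ current, guarded });
```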
🧠 Learnings used
Learnt from: DSnoNintendo
Repo: santifer/career-ops PR: 379
File: scan-auth/linkedin.mjs:357-368
Timestamp: 2026-04-19T17:52:55.390Z
Learning: In `scan-auth/linkedin.mjs` (career-ops repo), LinkedIn job search filters (date posted, experience level) are applied via search query strings embedded in the keywords parameter, not via URL parameters like `f_TPR` or `f_E`. Do not flag the `#buildSearches` method for using query string text for filters — this is the current correct approach for LinkedIn job search URLs.
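Illustrating that learning, a minimal sketch of a search URL where the filter intent rides inside the keywords text rather than `f_TPR`/`f_E` URL parameters (the exact path and query shape are assumptions; the real `#buildSearches` may differ):

```javascript
// Hypothetical: filter terms ("past week", seniority words) are part of the
// keywords string itself, so only one query parameter is needed.
function buildSearchUrl(keywords) {
  return `https://www.linkedin.com/jobs/search/?keywords=${encodeURIComponent(keywords)}`;
}
```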
Actionable comments posted: 4
♻️ Duplicate comments (1)
.claude/skills/career-ops/SKILL.md (1)
30-35: ⚠️ Potential issue | 🟠 Major

Keep the router docs aligned with the portal-required CLI.

The router and menu still advertise bare `/career-ops scan-auth`, but `scan-auth.mjs` exits unless a portal is provided. Either route bare `scan-auth` to an interactive/default portal flow, or document `/career-ops scan-auth linkedin` consistently.

Docs-only alignment:

```diff
-| `scan-auth`, `scan-auth <portal>` | `scan-auth` |
+| `scan-auth <portal>` | `scan-auth` |
@@
-  /career-ops scan-auth → Authenticated portal scan (LinkedIn etc.)
+  /career-ops scan-auth linkedin → Authenticated portal scan (LinkedIn)
```

Also applies to: 63-63
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.claude/skills/career-ops/SKILL.md around lines 30 - 35, The router/menu docs advertise a bare "scan-auth" command but scan-auth.mjs requires a portal argument; update routing or docs to be consistent: either change the router/menu entry for scan-auth to require a portal (e.g., show "scan-auth <portal>" and examples like "scan-auth linkedin") or implement a default/interactive flow in the scan-auth handler that routes bare "scan-auth" to prompt for or default to a portal (e.g., invoke the same logic as scan-auth.mjs with a default portal or interactive prompt). Locate references to "scan-auth", "scan-auth.mjs", and any router/menu entries and make the change so the CLI usage in the docs and router behavior match.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.claude/skills/career-ops/SKILL.md:
- Around line 83-89: MD022 violations: add a single blank line after the
headings "Standalone modes (only their mode file):" and "Modes delegated to
subagent:" so each heading is followed by an empty line before the next
paragraph. Edit the SKILL.md content near those headings and insert one blank
line after each heading (and also ensure any subsequent headings like "Read
`modes/{mode}.md`" or the list starting with "For `scan`..." remain separated),
keeping the existing text and references to `modes/{mode}.md` and `_shared.md`
unchanged.
In `@scan-auth.mjs`:
- Around line 273-293: The saveJd function currently builds filenames from only
company and title which causes collisions; update saveJd to append a stable
unique suffix (e.g., the LinkedIn job ID from detail.jobId or detail.id or
detail.listingId) to the slugified filename so each listing produces a unique
file (fallback to a deterministic fallback like a timestamp only if no ID
exists), ensure the chosen ID field is sanitized the same way as the slug, and
keep returning the same jds/<filename> path; refer to saveJd, JDS_DIR, slug,
filename, filepath and detail.* to locate where to change the filename
generation and return value.
- Around line 75-97: Current parsing ignores unknown flags and extra positional
args which allows e.g. "node scan-auth.mjs linkedin --dry-run" to proceed;
change argument validation in the portalId/FLAGS logic to explicitly parse args
into two sets (recognized flags using FLAGS and positional args), reject any
unknown flag (error + process.exit(1)) and reject more than one positional
portal (error + process.exit(1)); update usage/error messages to include
supportedNames and mention invalid tokens; ensure FLAG.login is derived from the
parsed flags set and that SCANNERS[portalId] only runs after these validations
succeed.
- Around line 319-324: The "Next step" message printed after listing new results
is missing a connector, making the sentence awkward; update the console.log that
prints the next-step instruction (the console.log after the loop that references
"/career-ops pipeline") to include a connector — e.g. change its text to "Next
step: run /career-ops pipeline to process jobs saved to the pipeline." so the
command and the rest of the sentence read correctly.
---
Duplicate comments:
In @.claude/skills/career-ops/SKILL.md:
- Around line 30-35: The router/menu docs advertise a bare "scan-auth" command
but scan-auth.mjs requires a portal argument; update routing or docs to be
consistent: either change the router/menu entry for scan-auth to require a
portal (e.g., show "scan-auth <portal>" and examples like "scan-auth linkedin")
or implement a default/interactive flow in the scan-auth handler that routes
bare "scan-auth" to prompt for or default to a portal (e.g., invoke the same
logic as scan-auth.mjs with a default portal or interactive prompt). Locate
references to "scan-auth", "scan-auth.mjs", and any router/menu entries and make
the change so the CLI usage in the docs and router behavior match.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 1899db7a-2d86-4b7c-b192-2c00ed7d642a
📒 Files selected for processing (2)
- `.claude/skills/career-ops/SKILL.md`
- `scan-auth.mjs`
```diff
 ### Standalone modes (only their mode file):
 Read `modes/{mode}.md`

 Applies to: `tracker`, `deep`, `training`, `project`, `patterns`, `followup`

 ### Modes delegated to subagent:
-For `scan`, `apply` (with Playwright), and `pipeline` (3+ URLs): launch as Agent with the content of `_shared.md` + `modes/{mode}.md` injected into the subagent prompt.
+For `scan`, `scan-auth`, `apply` (with Playwright), and `pipeline` (3+ URLs): launch as Agent with the content of `_shared.md` + `modes/{mode}.md` injected into the subagent prompt.
```
Restore markdownlint-required blank lines after headings.
MD022 expects a blank line below these headings; the current text starts immediately after each heading.
Suggested markdownlint fix:

```diff
 ### Standalone modes (only their mode file):
+
 Read `modes/{mode}.md`
@@
 ### Modes delegated to subagent:
+
 For `scan`, `scan-auth`, `apply` (with Playwright), and `pipeline` (3+ URLs): launch as Agent with the content of `_shared.md` + `modes/{mode}.md` injected into the subagent prompt.
```

🧰 Tools
🪛 markdownlint-cli2 (0.22.0)
[warning] 83-83: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 88-88: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.claude/skills/career-ops/SKILL.md around lines 83 - 89, MD022 violations:
add a single blank line after the headings "Standalone modes (only their mode
file):" and "Modes delegated to subagent:" so each heading is followed by an
empty line before the next paragraph. Edit the SKILL.md content near those
headings and insert one blank line after each heading (and also ensure any
subsequent headings like "Read `modes/{mode}.md`" or the list starting with "For
`scan`..." remain separated), keeping the existing text and references to
`modes/{mode}.md` and `_shared.md` unchanged.
```js
const FLAGS = new Set(['--login']);
const portalId = (() => {
  for (let i = 0; i < args.length; i++) {
    if (FLAGS.has(args[i])) continue;
    return args[i];
  }
  return null;
})();

if (!portalId) {
  console.error(`Usage: node scan-auth.mjs [options] <portal>\n\nSupported portals: ${supportedNames.join(', ')}`);
  process.exit(1);
}
if (!SCANNERS[portalId]) {
  console.error(`Unknown portal: "${portalId}"\nSupported portals: ${supportedNames.join(', ')}`);
  process.exit(1);
}

const scanner = SCANNERS[portalId];

const FLAG = {
  login: args.includes('--login'),
};
```
Reject unsupported flags and extra positional args before scanning.
`node scan-auth.mjs linkedin --dry-run` currently scans and writes files because unknown args after the portal are ignored. Fail closed on unknown flags and multiple portals so typos or stale flags cannot trigger a mutating run.
Suggested CLI validation:

```diff
 const FLAGS = new Set(['--login']);
+const unknownFlags = args.filter(arg => arg.startsWith('--') && !FLAGS.has(arg));
+if (unknownFlags.length > 0) {
+  console.error(`Unknown option(s): ${unknownFlags.join(', ')}\n\nSupported options: ${[...FLAGS].join(', ')}`);
+  process.exit(1);
+}
+
+const positionalArgs = args.filter(arg => !arg.startsWith('--'));
+if (positionalArgs.length > 1) {
+  console.error(`Expected one portal, got: ${positionalArgs.join(', ')}\nSupported portals: ${supportedNames.join(', ')}`);
+  process.exit(1);
+}
+
 const portalId = (() => {
   for (let i = 0; i < args.length; i++) {
     if (FLAGS.has(args[i])) continue;
     return args[i];
   }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scan-auth.mjs` around lines 75 - 97, Current parsing ignores unknown flags
and extra positional args which allows e.g. "node scan-auth.mjs linkedin
--dry-run" to proceed; change argument validation in the portalId/FLAGS logic to
explicitly parse args into two sets (recognized flags using FLAGS and positional
args), reject any unknown flag (error + process.exit(1)) and reject more than
one positional portal (error + process.exit(1)); update usage/error messages to
include supportedNames and mention invalid tokens; ensure FLAG.login is derived
from the parsed flags set and that SCANNERS[portalId] only runs after these
validations succeed.
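A self-contained sketch of the fail-closed parsing this prompt asks for (function and field names here are illustrative, not the repo's):

```javascript
// Strict CLI parsing: unknown flags and extra positionals are rejected instead
// of being silently ignored, so a typo cannot trigger a mutating run.
function parseScanArgs(args, flags = new Set(['--login'])) {
  const unknown = args.filter(a => a.startsWith('--') && !flags.has(a));
  const positionals = args.filter(a => !a.startsWith('--'));
  if (unknown.length > 0) {
    return { error: `Unknown option(s): ${unknown.join(', ')}` };
  }
  if (positionals.length !== 1) {
    return { error: `Expected exactly one portal, got: ${positionals.join(', ') || '(none)'}` };
  }
  return { portalId: positionals[0], login: args.includes('--login') };
}
```

The caller can then `console.error` and `process.exit(1)` on any `error` result before touching `SCANNERS[portalId]`.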
```js
function saveJd(detail) {
  mkdirSync(JDS_DIR, { recursive: true });
  const slug = slugify(`${detail.company}-${detail.title}`);
  const filename = `${slug}.md`;
  const filepath = join(JDS_DIR, filename);

  const content = `---
title: ${yamlEscape(detail.title)}
company: ${yamlEscape(detail.company)}
application_url: ${yamlEscape(detail.applicationUrl || '')}
scraped: "${new Date().toISOString().split('T')[0]}"
source: ${portalId}
---

# ${detail.title} — ${detail.company}

${detail.jdText}
`;

  writeFileSync(filepath, content, 'utf-8');
  return `jds/${filename}`;
}
```
Prevent JD filename collisions for same company/title listings.
`saveJd()` uses only company-title for the filename, so two accepted LinkedIn jobs with the same company and title overwrite the same `jds/*.md` file and both pipeline entries can point to the final write. Include a stable unique suffix such as the LinkedIn job ID.
Suggested collision-resistant filename:

```diff
 function saveJd(detail) {
   mkdirSync(JDS_DIR, { recursive: true });
-  const slug = slugify(`${detail.company}-${detail.title}`);
+  const jobId = extractJobIdFromUrl(detail.url);
+  const slug = slugify(`${detail.company}-${detail.title}${jobId ? `-${jobId}` : ''}`);
   const filename = `${slug}.md`;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scan-auth.mjs` around lines 273 - 293, The saveJd function currently builds
filenames from only company and title which causes collisions; update saveJd to
append a stable unique suffix (e.g., the LinkedIn job ID from detail.jobId or
detail.id or detail.listingId) to the slugified filename so each listing
produces a unique file (fallback to a deterministic fallback like a timestamp
only if no ID exists), ensure the chosen ID field is sanitized the same way as
the slug, and keep returning the same jds/<filename> path; refer to saveJd,
JDS_DIR, slug, filename, filepath and detail.* to locate where to change the
filename generation and return value.
```js
if (results.listings.length > 0) {
  console.log('\nNew listings:');
  for (const l of results.listings) {
    console.log(`  • ${l.title} — ${l.company}`);
  }
  console.log(`\nNext step: run /career-ops pipeline jobs saved to the pipeline.`);
```
Clarify the next-step message.
Line 324 is missing the connector after the command, making the instruction awkward.
Suggested wording:

```diff
-  console.log(`\nNext step: run /career-ops pipeline jobs saved to the pipeline.`);
+  console.log('\nNext step: run /career-ops pipeline to process jobs saved to the pipeline.');
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scan-auth.mjs` around lines 319 - 324, The "Next step" message printed after
listing new results is missing a connector making the sentence awkward; update
the console.log that prints the next-step instruction (the console.log after the
loop that references "/career-ops pipeline") to include a connector — e.g.
change its text to "Next step: run the /career-ops pipeline jobs saved to the
pipeline." or "Next step: run the /career-ops pipeline jobs that were saved to
the pipeline." so the command and the rest of the sentence read correctly.
Authenticated Portal Scanning with LinkedIn Support
Harness for scanning job portals that require authentication & LinkedIn job portal scraper
Summary

- Adds a `scan-auth.mjs` harness and a `scan-auth/linkedin.mjs` scanner for portals that require login (starting with LinkedIn)
- Saves JDs to `jds/`, appends entries to `data/pipeline.md` and `data/scan-history.tsv` directly — no intermediate files or post-processing needed
- Jobs already seen (including by the API scanner, `scan.mjs`) are automatically skipped
- `scan-auth.mjs` is a standalone Node.js CLI that runs end-to-end without any LLM involvement. A `modes/scan-auth.md` mode is included so agents (Claude Code, OpenCode, Gemini CLI) can orchestrate it via `/career-ops scan-auth`, but the scanner itself requires no AI to operate.

How it works
The scanner opens LinkedIn search results for each keyword configured in `portals.yml`, then processes cards one at a time:

- Unwraps the `/safety/go` redirect on apply URLs; falls back to the listing URL for Easy Apply jobs
- Applies the filters from `portals.yml`, skips cards LinkedIn marks as "Viewed", and records them in scan history

Use with career-ops agent
The agent runs `node scan-auth.mjs linkedin`, waits for it to complete, and reports the summary. The scanner handles everything — JD files, pipeline entries, scan history — so the agent doesn't need to do any post-processing. New listings appear in `data/pipeline.md` ready for `/career-ops pipeline` to evaluate them.

If the scanner reports "Not logged in", the agent instructs the user to run `node scan-auth.mjs --login linkedin` in a separate terminal (the login flow requires an interactive browser window).

Interaction with existing codebase
- `scan.mjs` (API scanner): No changes. Both scanners share `data/scan-history.tsv` for dedup. The authenticated scanner loads `company::title` keys from all portals so a job found via Greenhouse won't be re-accepted from LinkedIn.
- `portals.yml`: Adds a `linkedin_searches` section (keywords, date filter, experience level, delays, employer blocklist). Existing `title_filter`, `tracked_companies`, and `search_queries` sections are unchanged.
- `data/pipeline.md`: Accepted listings are appended as `- [ ] local:jds/{file} | {company} | {title}` under the first `##` section. Language-agnostic — works regardless of section heading language.
- `data/scan-history.tsv`: Both accepted and skipped entries are recorded (statuses: `added`, `skipped_dup`, `skipped_viewed_linkedin`).
- `jds/`: JD files are saved with YAML frontmatter (`title`, `company`, `application_url`, `scraped`, `source`).

New files
- `scan-auth.mjs`
- `scan-auth/linkedin.mjs`
- `modes/scan-auth.md` (`/career-ops scan-auth`)

Usage
Test plan
- `node scan-auth.mjs` — prints usage and exits non-zero
- `node scan-auth.mjs --login linkedin` — opens browser, saves session, exits without scanning
- `node scan-auth.mjs linkedin` — full scan, writes JDs + pipeline + scan history, prints summary
- Add a known job to `scan-history.tsv`, confirm LinkedIn scanner skips it

Related issue
Main issue: #238
Umbrella issue: #230
Type of change
Checklist
- Ran `node test-all.mjs` and all tests pass

note: browser profiles are saved to (and overwritten at) `~/.scan-auth/<portal>/profile/` by `scan-auth.mjs`

Summary by CodeRabbit
New Features

- Authenticated portal scanning (`/career-ops scan-auth`) for LinkedIn job discovery
- `scan:auth` npm script for streamlined execution

Documentation