feat: authenticated portal scanning w/ LinkedIn support #379

Open
DSnoNintendo wants to merge 9 commits into santifer:main from DSnoNintendo:feat/linkedin-portal-support

Conversation

@DSnoNintendo DSnoNintendo commented Apr 19, 2026

Authenticated Portal Scanning with LinkedIn Support

A harness for scanning job portals that require authentication, plus a LinkedIn job portal scraper.

Summary

  • Adds scan-auth.mjs harness and scan-auth/linkedin.mjs scanner for portals that require login (starting with LinkedIn)
  • Scans LinkedIn job search results using Playwright with a persistent browser profile — login once, scan repeatedly
  • Writes accepted JDs to jds/, appends entries to data/pipeline.md and data/scan-history.tsv directly — no intermediate files or post-processing needed
  • Deduplicates across portals: jobs already found via the existing Greenhouse/Ashby/Lever API scanner (scan.mjs) are automatically skipped
  • The entire workflow is agent-agnostic — scan-auth.mjs is a standalone Node.js CLI that runs end-to-end without any LLM involvement. A modes/scan-auth.md mode is included so agents (Claude Code, OpenCode, Gemini CLI) can orchestrate it via /career-ops scan-auth, but the scanner itself requires no AI to operate.

How it works

The scanner opens LinkedIn search results for each keyword configured in portals.yml, then processes cards one at a time:

  1. Reads title and company from the card DOM (no click needed)
  2. Clicks the card to get the job ID from the URL and open the detail panel
  3. Expands the JD, scrapes the apply link and full description
  4. Runs employer blocklist (case-insensitive exact match), dedup, and title filter checks
  5. Unwraps LinkedIn's /safety/go redirect on apply URLs; falls back to the listing URL for Easy Apply jobs
  6. If specified in portals.yml, skips cards LinkedIn marks as "Viewed" and records them in scan history
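
Step 5 can be illustrated with a minimal sketch. This is not the PR's actual `#unwrapRedirect`; the function names, the `LINKEDIN_HOSTS` allowlist, and the choice to return an empty string when no job ID is known are assumptions made for illustration.

```javascript
// Hypothetical allowlist: the PR does not enumerate the hosts it accepts.
const LINKEDIN_HOSTS = new Set(['linkedin.com', 'www.linkedin.com']);

// Return the target of a LinkedIn /safety/go redirect, or the input unchanged.
function unwrapRedirect(rawUrl) {
  let u;
  try {
    u = new URL(rawUrl);
  } catch {
    return rawUrl; // not an absolute URL, leave it untouched
  }
  const isLinkedIn = LINKEDIN_HOSTS.has(u.hostname);
  if (!isLinkedIn || !u.pathname.startsWith('/safety/go')) return rawUrl;
  // searchParams.get() already percent-decodes the value once.
  return u.searchParams.get('url') || rawUrl;
}

// Prefer the external apply link; fall back to the listing URL for Easy Apply.
function resolveApplyUrl(applyHref, jobId) {
  const href = (applyHref || '').trim();
  if (href) return unwrapRedirect(href);
  // Assumption: without a job ID there is no listing URL worth saving.
  return jobId ? `https://www.linkedin.com/jobs/view/${jobId}/` : '';
}
```

Matching the hostname exactly, rather than by substring, keeps a host such as `notlinkedin.com` from being treated as a LinkedIn redirect.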

Use with career-ops agent

/career-ops scan-auth linkedin

The agent runs node scan-auth.mjs linkedin, waits for it to complete, and reports the summary. The scanner handles everything — JD files, pipeline entries, scan history — so the agent doesn't need to do any post-processing. New listings appear in data/pipeline.md ready for /career-ops pipeline to evaluate them.

If the scanner reports "Not logged in", the agent instructs the user to run node scan-auth.mjs --login linkedin in a separate terminal (the login flow requires an interactive browser window).

Interaction with existing codebase

  • scan.mjs (API scanner): No changes. Both scanners share data/scan-history.tsv for dedup. The authenticated scanner loads company::title keys from all portals so a job found via Greenhouse won't be re-accepted from LinkedIn.
  • portals.yml: Adds a linkedin_searches section (keywords, date filter, experience level, delays, employer blocklist). Existing title_filter, tracked_companies, and search_queries sections are unchanged.
  • data/pipeline.md: Accepted listings are appended as - [ ] local:jds/{file} | {company} | {title} under the first ## section. Language-agnostic — works regardless of section heading language.
  • data/scan-history.tsv: Both accepted and skipped entries are recorded (statuses: added, skipped_dup, skipped_viewed_linkedin).
  • jds/: JD files are saved with YAML frontmatter (title, company, application_url, scraped, source).
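
The pipeline entry format above can be sketched as a pure text transformation. The function names are hypothetical, and placing new entries at the end of the first `## ` section (before the next heading) is an assumption; the description only says entries go "under the first ## section".

```javascript
// Format one accepted listing as a pipeline checkbox line.
function formatPipelineEntry({ file, company, title }) {
  return `- [ ] local:jds/${file} | ${company} | ${title}`;
}

// Insert entries at the end of the first "## " section, whatever its heading text.
function insertUnderFirstH2(text, listings) {
  if (listings.length === 0) return text;
  const lines = text.split('\n');
  const h2 = lines.findIndex((line) => line.startsWith('## '));
  if (h2 === -1) return text; // no section to append to, leave the text unchanged
  // The section ends at the next level-1 or level-2 heading, or at end of file.
  let end = lines.findIndex((line, i) => i > h2 && /^#{1,2} /.test(line));
  if (end === -1) end = lines.length;
  // Back up over trailing blank lines so new entries sit right below existing ones.
  while (end > h2 + 1 && lines[end - 1].trim() === '') end--;
  lines.splice(end, 0, ...listings.map(formatPipelineEntry));
  return lines.join('\n');
}
```

Because the lookup keys on the `## ` prefix rather than on the heading text, the same logic applies whether the section is titled in English, Spanish, or any other language.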

New files

| File | Purpose |
| --- | --- |
| scan-auth.mjs | CLI harness: browser launch, login flow, scan history, JD/pipeline/TSV writing, summary output |
| scan-auth/linkedin.mjs | LinkedIn scanner class: config parsing, session checks, card extraction, pagination, filtering |
| modes/scan-auth.md | Agent mode instructions for /career-ops scan-auth |

Usage

# One-time login (opens browser, you log in manually)
node scan-auth.mjs --login linkedin

# Run a scan
node scan-auth.mjs linkedin

# Or through the career-ops agent
/career-ops scan-auth linkedin

Test plan

  • node scan-auth.mjs — prints usage and exits non-zero
  • node scan-auth.mjs --login linkedin — opens browser, saves session, exits without scanning
  • node scan-auth.mjs linkedin — full scan, writes JDs + pipeline + scan history, prints summary
  • Cross-portal dedup: add a job to scan-history.tsv, confirm LinkedIn scanner skips it
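
The cross-portal dedup check can be sketched as follows. The `company::title` key shape comes from the description above; the TSV column names (`company`, `title`), the lowercase/trim normalization, and the function names are assumptions, since the PR text does not show the file layout.

```javascript
// Build a company::title key. Lowercasing and trimming are assumptions.
function dedupKey(company, title) {
  return `${company.trim().toLowerCase()}::${title.trim().toLowerCase()}`;
}

// Collect keys from scan-history TSV text.
// Assumes a header row containing "company" and "title" columns (hypothetical layout).
function loadDedupKeys(tsvText) {
  const keys = new Set();
  const [header, ...rows] = tsvText.split('\n').filter((l) => l.trim() !== '');
  if (!header) return keys;
  const cols = header.split('\t');
  const companyIdx = cols.indexOf('company');
  const titleIdx = cols.indexOf('title');
  if (companyIdx === -1 || titleIdx === -1) return keys;
  for (const row of rows) {
    const cells = row.split('\t');
    if (cells[companyIdx] && cells[titleIdx]) {
      keys.add(dedupKey(cells[companyIdx], cells[titleIdx]));
    }
  }
  return keys;
}

// Return the scan-history status for a candidate job.
function classify(job, keys) {
  return keys.has(dedupKey(job.company, job.title)) ? 'skipped_dup' : 'added';
}
```

Since the keys ignore which portal produced a row, a job recorded by the Greenhouse/Ashby/Lever scanner is classified as `skipped_dup` when LinkedIn surfaces it again.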

Related issue

Main issue: #238
Umbrella issue: #230

Type of change

  • Bug fix
  • New feature
  • Documentation / translation
  • Refactor (no behavior change)

Checklist

  • I have read CONTRIBUTING.md
  • I linked a related issue above (required for features and architecture changes)
  • My PR does not include personal data (CV, email, real names)
  • I ran node test-all.mjs and all tests pass
  • My changes respect the Data Contract (no modifications to user-layer files)
    note: browser profiles are saved (and overwritten) under ~/.scan-auth/<portal>/profile/ by scan-auth.mjs
  • My changes align with the project roadmap

Summary by CodeRabbit

  • New Features

    • Added authenticated portal scanning mode (/career-ops scan-auth) for LinkedIn job discovery
    • LinkedIn scanner supports keyword searches, experience-level filtering, employer blocklists, and automatic job deduplication
    • Extracts job details and application URLs from logged-in sessions using persistent browser profiles
    • Added scan:auth npm script for streamlined execution
  • Documentation

    • Added setup and usage guides for authenticated LinkedIn scanning
    • Documented LinkedIn configuration options (keywords, date filters, max results, delays)

coderabbitai Bot commented Apr 19, 2026

📝 Walkthrough

This PR adds a new authenticated scanning mode "scan-auth": a CLI entrypoint (scan-auth.mjs), a LinkedIn scanner module (scan-auth/linkedin.mjs), config/examples, docs, router/skill updates to recognize the mode, and package/script entries for invoking authenticated Playwright-based scans using persistent browser profiles.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Skill & Router Docs: .claude/skills/career-ops/SKILL.md | Added scan-auth to CLI mode hints, router table, subagent/Agent injection rules, and discovery menu. |
| Top-level Documentation: README.md, CLAUDE.md, modes/scan-auth.md | New/expanded docs describing scan-auth mode, LinkedIn scanner workflow, login flow, config, outputs, and files (scan-auth.mjs, scan-auth/), plus mode documentation page. |
| Data Contract & Templates: DATA_CONTRACT.md, templates/portals.example.yml | Declared ~/.scan-auth/<portal>/profile/ in User Layer; added modes/scan-auth.md and scan-auth/*.mjs to System Layer; added linkedin_searches example config block. |
| CLI & Orchestrator: scan-auth.mjs | New Node CLI (Playwright persistent contexts) handling --login, session checks, cross-portal dedup via data/scan-history.tsv, writing jds/ markdown, appending pipeline entries and scan-history, and printing run summary. |
| LinkedIn Scanner: scan-auth/linkedin.mjs | New LinkedInScanner class: config parsing, login/session checks, search URL building, pagination, card extraction, click-to-get job ID, JD/apply URL extraction, filters (viewed, employer blocklist, title/keyword), deduplication, stats, error handling, and redirect unwrapping. |
| Package Scripts: package.json | Added scan:auth npm script → node scan-auth.mjs. |

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant CLI as scan-auth.mjs
    participant Browser as Playwright<br/>Browser Context
    participant Scanner as LinkedInScanner
    participant FS as File System

    User->>CLI: node scan-auth.mjs linkedin
    CLI->>Browser: Launch persistent profile context
    CLI->>Browser: Inject init script / hide navigator.webdriver
    CLI->>Browser: Check session (scanner.checkSession)
    Browser-->>CLI: session OK / not logged in

    alt --login flow
        User->>CLI: node scan-auth.mjs --login linkedin
        CLI->>Browser: Navigate to loginUrl (headful)
        CLI->>User: Prompt: press ENTER when logged in
        User->>Browser: Complete interactive login
        User->>CLI: Press ENTER
        CLI->>Browser: Re-check session
    end

    CLI->>FS: Load data/scan-history.tsv → dedup keys
    CLI->>Scanner: scan(context, config, { scanHistory })
    loop per keyword/search
        Scanner->>Browser: Navigate to search results page
        Scanner->>Browser: Extract job cards
        loop per visible card
            Scanner->>Browser: Click card → capture job ID / canonical URL
            Scanner->>Browser: Extract JD, apply URL, metadata
            alt passes filters & new
                Scanner->>CLI: add to listings
            else filtered/duplicate
                Scanner->>CLI: add to skipped
            end
        end
    end

    CLI->>FS: Write jds/{slug}.md (YAML frontmatter)
    CLI->>FS: Append entries to data/pipeline.md
    CLI->>FS: Append accepted & skipped rows to data/scan-history.tsv
    CLI->>User: Print summary (searched, found, saved, skipped, errors)
    CLI->>Browser: Close context

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 43.48% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title clearly and specifically describes the main change: introducing authenticated portal scanning with LinkedIn support, which is the core feature across all modified files. |

@DSnoNintendo DSnoNintendo changed the title from "feat: authenticated portal scanning (LinkedIn)" to "feat: authenticated portal scanning w/ LinkedIn support" on Apr 19, 2026

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
README.md (1)

298-304: ⚠️ Potential issue | 🟡 Minor

Update the mode count after adding scan-auth.

The tree now includes the new scan-auth mode, so # 14 skill modes is stale.

Proposed docs fix
-├── modes/                       # 14 skill modes
+├── modes/                       # 15 skill modes
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 298 - 304, Update the header comment that lists the
number of mode files: change the line containing "# 14 skill modes" to reflect
the new total after adding scan-auth.md (increment the count to include
scan-auth.md); ensure the comment above the modes/ tree matches the actual files
(e.g., "# 15 skill modes") so the README's modes/ section and the added
scan-auth.md stay in sync.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.claude/skills/career-ops/SKILL.md:
- Line 35: Update the docs to match the actual CLI behavior: in SKILL.md remove
references to unsupported flags `--dry-run`, `--search`, and `--max` from the
"scan-auth mode" description, leaving only the real supported `--login` flag, or
alternatively implement these flags in `scan-auth.mjs` (functions/arg parsing
tied to `scan-auth.mjs` and portal-specific `scan-auth/<portal>.mjs`) before
documenting them; ensure the README text and any router docs consistently
reflect whichever choice you make so agents won't rely on non-functional flags.

In `@CLAUDE.md`:
- Line 67: The docs currently list scan-auth.mjs in the main files but omit it
from the agent-facing skill-mode/command routing sections; update CLAUDE.md by
adding `scan-auth` (and a brief descriptor like "Authenticated portal scanner —
scan-auth.mjs") alongside every occurrence where `scan` is mentioned in the
skill-mode and command routing examples so agents can discover the authenticated
scanner; ensure you reference the main file name `scan-auth.mjs` and the related
folder `scan-auth/` in those routing lists and examples (same places that
mention `scan`) for consistency.

In `@modes/scan-auth.md`:
- Around line 9-12: Add blank lines before and after the fenced code block that
shows how to run the login command and set the fence language for the summary
output block; specifically, update the node login block around "node
scan-auth.mjs --login <portal>" to have an empty line before the opening ```bash
and after the closing ``` and change the summary output fence (the block that
starts with "{Portal} Scan — {YYYY-MM-DD}" in the same file) to use a language
tag such as text (i.e., ```text) instead of a bare fence; apply the same
blank-line-and-language fixes to the other similar fenced block around lines
39-53 to satisfy markdownlint.

In `@scan-auth.mjs`:
- Around line 84-87: The CLI currently exits when portalId is missing; change
this to the documented behavior by replacing the process.exit(1) path: when
portalId is falsy, iterate over supportedNames and invoke the existing scan
routine for each portal (the function in this file that handles a single portal
scan—refer to the scanner/handler used elsewhere in scan-auth.mjs) instead of
printing usage and exiting; retain the usage/exit behavior only for
invalid/unrecognized portal values and preserve the usage message when a
provided portal is not in supportedNames.
- Around line 319-325: The summary message when new listings are detected prints
an incorrect next-step command; in the block that checks results.listings (the
code using results.listings, portalId and the console.log that prints
`/career-ops ${portalId}`), change the guidance to instruct users to run
`/career-ops pipeline` instead of `/career-ops ${portalId}` so the next step
points to the routed pipeline mode.
- Around line 195-198: The appendToPipeline function currently returns silently
if PIPELINE_PATH doesn't exist; update it to handle a missing data/ directory by
either creating the parent directory and the pipeline.md file before writing or
by returning a write count / throwing a warning/error so the caller can detect
failure; specifically modify appendToPipeline (and its use sites) to check
existsSync(PIPELINE_PATH), and if missing create the directory
(fs.mkdirSync(path.dirname(PIPELINE_PATH), { recursive: true })) and an empty
PIPELINE_PATH file, or alternatively return 0 and log a warning via the same
logger so the caller reading the write count can report failure instead of
reporting success. Ensure the change touches appendToPipeline, PIPELINE_PATH
checks, and the code path that reports success so consumers know when pipeline
insertion was impossible.

In `@scan-auth/linkedin.mjs`:
- Around line 624-626: The code currently synthesizes a LinkedIn view URL even
when jobId is missing; update the logic around applicationUrl (the variable set
from applyEl?.href?.trim() ?? `https://www.linkedin.com/jobs/view/${jobId}/`) to
only construct the `https://www.linkedin.com/jobs/view/${jobId}/` fallback when
`jobId` (or currentJobId) is truthy/non-empty; otherwise set applicationUrl to
an empty string (or null) so no fake URL is saved. Ensure you still trim
applyEl.href if present and prefer it over the view URL, but guard the template
fallback with an explicit check for jobId before building it.
- Around line 375-386: The hostname check in `#unwrapRedirect` is too permissive
(uses hostname.includes('linkedin.com')) allowing domains like notlinkedin.com;
update it to only accept exact LinkedIn hostnames (e.g., linkedin.com,
www.linkedin.com, touch.www.linkedin.com, www.linkedin.cn, or other canonical
LinkedIn hosts used by your app) before processing /safety/go and decoding the
nested url; ensure the check happens on u.hostname (or normalized hostname) and
only then call u.searchParams.get('url') and decodeURIComponent, keeping the
rest of the existing logic in `#unwrapRedirect` intact.
- Around line 357-368: The current `#buildSearches` method appends human-readable
date and experience text into the keywords string (levelPrefix and dateSuffix)
which LinkedIn ignores; instead map config.date_posted to f_TPR values (e.g.,
'24' -> 'r86400', 'Week' -> 'r604800', 'Month' -> 'r2592000') and map
config.experience_level entries to LinkedIn numeric f_E codes (1..6) joined by
commas, then build URLSearchParams with keywords set to the plain kw and add
f_TPR and f_E parameters when present; update the code that constructs params
(replace the levelPrefix/dateSuffix logic and the params = new URLSearchParams({
keywords: query })) to include params.append('f_TPR', mappedValue) and
params.append('f_E', mappedCsv) so filters are sent as real LinkedIn filter
parameters.
- Around line 208-234: Reorder and protect per-card processing: after obtaining
the handle from this.#getCard(page, i) call, call this.isJobCardViewed(page,
card) before invoking this.extractJob so you don't click/open viewed listings;
if skipViewed is true and the card is viewed, use a new no-click helper (e.g.,
extractJobMetadataNoClick or similar) that reads
title/company/location/listingUrl from the card DOM without clicking, push the
skipped entry and stats, and skip extraction; wrap the whole per-card logic (the
sequence around this.#getCard, this.isJobCardViewed, this.extractJob and pushing
results) in a try/finally that always calls await card.dispose() in the finally
block so the handle is never leaked, and only call extractJob (which clicks)
when you know you will process the detail panel.
- Around line 464-475: In `#clickAndExtractJobId`, replace the fixed
sleep(CARD_CLICK_DELAY_MS) after clicking the card with a wait-for-change using
page.waitForFunction: capture the prior URL or prior currentJobId before the
click, perform the click via page.evaluate as currently done, then call
page.waitForFunction to poll until new URL.searchParams.get('currentJobId') !==
priorJobId (or until page.url() changes), with a sensible timeout; after
waitForFunction resolves, read the updated page.url() and extract currentJobId
and listingUrl as before to reliably get the updated job id instead of racing
with a fixed delay.

---

Outside diff comments:
In `@README.md`:
- Around line 298-304: Update the header comment that lists the number of mode
files: change the line containing "# 14 skill modes" to reflect the new total
after adding scan-auth.md (increment the count to include scan-auth.md); ensure
the comment above the modes/ tree matches the actual files (e.g., "# 15 skill
modes") so the README's modes/ section and the added scan-auth.md stay in sync.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1f4f2805-1431-455a-a8d0-df31ca12de79

📥 Commits

Reviewing files that changed from the base of the PR and between 411afb3 and a60cf34.

📒 Files selected for processing (9)
  • .claude/skills/career-ops/SKILL.md
  • CLAUDE.md
  • DATA_CONTRACT.md
  • README.md
  • modes/scan-auth.md
  • package.json
  • scan-auth.mjs
  • scan-auth/linkedin.mjs
  • templates/portals.example.yml

Comment thread .claude/skills/career-ops/SKILL.md Outdated
Comment thread CLAUDE.md
Comment thread modes/scan-auth.md
Comment thread scan-auth.mjs
Comment thread scan-auth.mjs
Comment on lines +195 to +198
function appendToPipeline(listings) {
  if (listings.length === 0) return;
  if (!existsSync(PIPELINE_PATH)) return;

@coderabbitai coderabbitai Bot Apr 19, 2026

⚠️ Potential issue | 🟠 Major

Do not silently drop pipeline entries when data/pipeline.md is missing.

Right now JDs and scan history can be written while pipeline.md is skipped, but Line 415 still reports success. Create the data/ file, or return a write count and warn/fail when pipeline insertion is impossible. As per coding guidelines, **/*.mjs scripts should handle missing data/ directories gracefully.

One possible fix direction
 function appendToPipeline(listings) {
-  if (listings.length === 0) return;
-  if (!existsSync(PIPELINE_PATH)) return;
+  if (listings.length === 0) return 0;
+  mkdirSync(dirname(PIPELINE_PATH), { recursive: true });
+  if (!existsSync(PIPELINE_PATH)) {
+    writeFileSync(PIPELINE_PATH, '# Pipeline\n\n## Pending\n', 'utf-8');
+  }
 
   let text = readFileSync(PIPELINE_PATH, 'utf-8');
@@
-  if (firstH2 === -1) return;
+  if (firstH2 === -1) {
+    warn(`Could not find a pending section in ${PIPELINE_PATH}; pipeline not updated`);
+    return 0;
+  }
@@
   writeFileSync(PIPELINE_PATH, text, 'utf-8');
+  return listings.length;
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-function appendToPipeline(listings) {
-  if (listings.length === 0) return;
-  if (!existsSync(PIPELINE_PATH)) return;
+function appendToPipeline(listings) {
+  if (listings.length === 0) return 0;
+  mkdirSync(dirname(PIPELINE_PATH), { recursive: true });
+  if (!existsSync(PIPELINE_PATH)) {
+    writeFileSync(PIPELINE_PATH, '# Pipeline\n\n## Pending\n', 'utf-8');
+  }
+  let text = readFileSync(PIPELINE_PATH, 'utf-8');
+  if (firstH2 === -1) {
+    warn(`Could not find a pending section in ${PIPELINE_PATH}; pipeline not updated`);
+    return 0;
+  }
+  writeFileSync(PIPELINE_PATH, text, 'utf-8');
+  return listings.length;
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scan-auth.mjs` around lines 195 - 198, The appendToPipeline function
currently returns silently if PIPELINE_PATH doesn't exist; update it to handle a
missing data/ directory by either creating the parent directory and the
pipeline.md file before writing or by returning a write count / throwing a
warning/error so the caller can detect failure; specifically modify
appendToPipeline (and its use sites) to check existsSync(PIPELINE_PATH), and if
missing create the directory (fs.mkdirSync(path.dirname(PIPELINE_PATH), {
recursive: true })) and an empty PIPELINE_PATH file, or alternatively return 0
and log a warning via the same logger so the caller reading the write count can
report failure instead of reporting success. Ensure the change touches
appendToPipeline, PIPELINE_PATH checks, and the code path that reports success
so consumers know when pipeline insertion was impossible.

@DSnoNintendo, understood — the silent no-op when pipeline.md is absent is intentional and consistent with scan.mjs's existing behavior. I'll note this for future reviews.


✏️ Learnings added
Learnt from: DSnoNintendo
Repo: santifer/career-ops PR: 379
File: scan-auth.mjs:195-198
Timestamp: 2026-04-19T17:51:06.950Z
Learning: In `scan-auth.mjs` (and consistent with `scan.mjs`), `appendToPipeline` intentionally returns silently (no error/warning) when `data/pipeline.md` does not exist. This is a deliberate design choice to mirror the existing `scan.mjs` behavior — do not flag it as a missing-data-directory handling issue.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: CR
Repo: santifer/career-ops PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-19T10:52:21.261Z
Learning: Applies to data/applications.md : Run `node dedup-tracker.mjs` to remove duplicate entries in the applications.md tracker based on company+role matching

Learnt from: CR
Repo: santifer/career-ops PR: 0
File: GEMINI.md:0-0
Timestamp: 2026-04-19T10:52:39.223Z
Learning: Run 'node verify-pipeline.mjs' to check data pipeline health and integrity

Learnt from: CR
Repo: santifer/career-ops PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-19T10:52:21.261Z
Learning: Run `node verify-pipeline.mjs` to health-check the tracker and reports pipeline

Learnt from: CR
Repo: santifer/career-ops PR: 0
File: GEMINI.md:0-0
Timestamp: 2026-04-19T10:52:39.223Z
Learning: Applies to **/*.{mjs,js} : Script files like update-system.mjs, verify-pipeline.mjs, generate-pdf.mjs, merge-tracker.mjs, and gemini-eval.mjs are auto-updatable system files — NEVER put user data in these files

Learnt from: CR
Repo: santifer/career-ops PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-19T10:52:21.261Z
Learning: Applies to {*.mjs,templates/**,config/**} : All scripts must be Node.js .mjs modules; use Playwright for PDF generation and web scraping, YAML for configuration, and HTML/CSS for CV templates

Learnt from: CR
Repo: santifer/career-ops PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-19T10:52:21.261Z
Learning: GitHub Actions CI/CD runs test-all.mjs (63+ checks) on every PR, applies auto-labels based on risk (🔴 core-architecture, ⚠️ agent-behavior, 📄 docs), and enforces status checks before merge to main

Comment thread scan-auth/linkedin.mjs
Comment on lines +208 to +234
const card = await this.#getCard(page, i);
const cardExists = await page.evaluate(c => c != null, card);
if (!cardExists) {
  warn(` ✗ Card ${i} not found`);
  stats.errors++;
  consecutiveFailures++;
  await card.dispose();
  continue;
}

// 1. Extract title, company, click card, extract detail
const data = await this.extractJob(page, card);

// 2. Check viewed label
if (skipViewed && await this.isJobCardViewed(page, card)) {
  log(` ✗ Viewed: skipped card ${i}`);
  stats.skipped_viewed++;
  skipped.push({
    url: data.listingUrl || '',
    title: data.title || '',
    company: data.company || '',
    status: 'skipped_viewed_linkedin',
  });
  await card.dispose();
  continue;
}
await card.dispose();

⚠️ Potential issue | 🟠 Major

❓ Verification inconclusive

Script executed:

wc -l scan-auth/linkedin.mjs

Repository: santifer/career-ops
Exit code: 0

stdout:

636 scan-auth/linkedin.mjs

Script executed:

cat -n scan-auth/linkedin.mjs | head -250

Repository: santifer/career-ops
Exit code: 0

stdout:

     1	/**
     2	 * LinkedIn Scanner
     3	 *
     4	 * All LinkedIn-specific logic: selectors, config parsing, session checks,
     5	 * pagination, card extraction, search URL construction, and the scan loop.
     6	 *
     7	 * Per-card flow (scan loop):
     8	 *   1. Resolve card element once via `#getCard` (evaluateHandle)
     9	 *   2. extractJob() reads title/company/location from the card DOM,
    10	 *      then clicks the card via `#clickAndExtractJobId` to get the job ID
    11	 *      from the URL's currentJobId param, then calls `#extractDetailFromPanel`
    12	 *      to scrape the apply link and JD text from the opened detail panel
    13	 *   3. isJobCardViewed() checks for LinkedIn's "Viewed" label on the card
    14	 *   4. Blocklist, dedup (job ID + company::title), and title filter run
    15	 *      against the job data — dedup catches both same-run duplicates
    16	 *      and cross-portal matches (Greenhouse/Ashby/Lever via scan-history.tsv)
    17	 *   5. JD-level filter and apply URL validation run post-click
    18	 *   6. Accepted listings and skipped entries are returned to the harness
    19	 *
    20	 * Apply URL resolution:
    21	 *   - External apply link (via xpathApplyUrl) → unwrapped from LinkedIn's
    22	 *     /safety/go redirect
    23	 *   - Easy Apply / no external link → falls back to the listing URL
    24	 *     (https://www.linkedin.com/jobs/view/{JOB_ID}/)
    25	 *
    26	 * Listing URL:
    27	 *   Always stored as https://www.linkedin.com/jobs/view/{JOB_ID}/, extracted
    28	 *   from the currentJobId URL param after clicking the card.
    29	 */
    30	
    31	
    32	import yaml from 'js-yaml';
    33	
    34	// ---------------------------------------------------------------------------
    35	// Selectors — grouped for easy maintenance when LinkedIn changes DOM
    36	// ---------------------------------------------------------------------------
    37	
    38	const SELECTORS = {
    39	  xpathListingCard: "//button[starts-with(@aria-label, 'Dismiss') and contains(@aria-label, 'job')]/ancestor::div[@role='button']",
    40	  /** Dismiss control on each left-rail card; aria-label is `Dismiss {job title} job` (EN UI). */
    41	  cardDismissButtonCss: 'button[aria-label^="Dismiss"][aria-label*=" job"]',
    42	  /** Company name often appears on this anchor when present. */
    43	  cardCompanyLinkQuery: 'a[href*="/company/"]',
    44	  xpathApplyUrl: "//a[@aria-label='Apply on company website']",
    45	  xpathMoreButton: "//span[normalize-space(text())='more']",
    46	  jdContent: 'span[data-testid="expandable-text-box"]',
    47	  loggedIn: 'a[aria-label*="My Network"]',
    48	  xpathCurrentPage: "//button[@aria-current='true'][starts-with(@aria-label, 'Page')]",
    49	  xpathPageButton: "//button[starts-with(@aria-label, 'Page')]",
    50	
    51	  viewedStatusTagQuery: 'p, span, li',
    52	  viewedStatusLabels: ['Viewed'],
    53	  /** Characters allowed between the label and the next status token in `innerText` (middle dot, bullet, pipe) */
    54	  viewedStatusLineSeparatorCharClass: '·•|',
    55	};
    56	
    57	const NOISE_LABELS = new Set([
    58	  'more', 'show more', 'see more',
    59	  'less', 'show less', 'see less',
    60	  'retry premium',
    61	]);
    62	const CARD_CLICK_DELAY_MS = 1000;
    63	const SESSION_CHECK_DELAY_MS = 3000;
    64	const NAV_TIMEOUT_MS = 30000;
    65	const DEFAULT_DELAY_PAGES_MS = [3000, 8000];
    66	const DEFAULT_DELAY_SEARCHES_MS = [5000, 15000];
    67	
    68	function randomDelay(range) {
    69	  const [min, max] = range;
    70	  return Math.floor(Math.random() * (max - min) + min);
    71	}
    72	function sleep(ms) { return new Promise(r => setTimeout(r, ms)); }
    73	
    74	function log(msg) { console.log(`[linkedin] ${msg}`); }
    75	function warn(msg) { console.warn(`[linkedin] ⚠ ${msg}`); }
    76	
    77	export default class LinkedInScanner {
    78	  name = 'LinkedIn';
    79	  portalId = 'linkedin';
    80	  loginUrl = 'https://www.linkedin.com/login';
    81	  feedUrl = 'https://www.linkedin.com/feed/';
    82	
    83	  // -------------------------------------------------------------------------
    84	  // Config parsing — extracts linkedin_searches section from portals.yml
    85	  // -------------------------------------------------------------------------
    86	
    87	  parseConfig(raw) {
    88	    const doc = yaml.load(raw);
    89	    const ls = doc.linkedin_searches || {};
    90	    return {
    91	      title_filter: doc.title_filter || { positive: [], negative: [] },
    92	      keywords: ls.keywords || [],
    93	      employer_blocklist: ls.employer_blocklist || [],
    94	      date_posted: ls.date_posted,
    95	      max_results: ls.max_results_per_search,
    96	      delay_pages: ls.delay_between_pages_ms,
    97	      delay_searches: ls.delay_between_searches_ms,
    98	      experience_level: ls.experience_level,
    99	      skip_viewed: ls.skip_viewed,
   100	    };
   101	  }
   102	
   103	  // -------------------------------------------------------------------------
   104	  // Session management
   105	  // -------------------------------------------------------------------------
   106	
   107	  async isLoggedIn(page) {
   108	    const url = page.url();
   109	    if (url.includes('/login') || url.includes('/uas/') || url.includes('/checkpoint/')) {
   110	      return false;
   111	    }
   112	    if (await page.$(SELECTORS.loggedIn)) return true;
   113	    return false;
   114	  }
   115	
   116	  async checkSession(page) {
   117	    await page.goto(this.feedUrl, { waitUntil: 'domcontentloaded', timeout: NAV_TIMEOUT_MS });
   118	    await sleep(SESSION_CHECK_DELAY_MS);
   119	    return this.isLoggedIn(page);
   120	  }
   121	
   122	  // -------------------------------------------------------------------------
   123	  // Scan — the main extraction loop
   124	  //
   125	  // Handles extraction, filtering, dedup, and employer blocklist.
   126	  // Returns only accepted listings ready to be saved.
   127	  //
   128	  // Options: { scanHistory, skipViewed }
   129	  // -------------------------------------------------------------------------
   130	
   131	  async scan(context, config, options = {}) {
   132	    const maxPerSearch = options.maxResults || config.max_results || 25;
   133	    const delayPages = config.delay_pages || DEFAULT_DELAY_PAGES_MS;
   134	    const delaySearches = config.delay_searches || DEFAULT_DELAY_SEARCHES_MS;
   135	    const titleFilter = config.title_filter;
   136	    const employerBlocklist = config.employer_blocklist || [];
   137	    const scanHistory = options.scanHistory || new Set();
   138	    /** Omit cards LinkedIn marks as already opened. Default true when `skip_viewed` is absent in portals.yml. */
   139	    const skipViewed = options.skipViewed !== undefined
   140	      ? Boolean(options.skipViewed)
   141	      : config.skip_viewed !== false;
   142	
   143	    const keywords = config.keywords || [];
   144	    if (keywords.length === 0) {
   145	      log('No keywords found in portals.yml');
   146	      return null;
   147	    }
   148	
   149	    const searches = this.#buildSearches(config);
   150	
   151	    const toRun = options.searchFilter
   152	      ? searches.filter(s => s.name === options.searchFilter)
   153	      : searches;
   154	
   155	    if (toRun.length === 0) {
   156	      log(`No keyword matching "${options.searchFilter}"`);
   157	      log(`Available: ${searches.map(s => s.name).join(', ')}`);
   158	      return null;
   159	    }
   160	
   161	    const listings = [];
   162	    const skipped = [];
   163	    const errors = [];
   164	    const stats = {
   165	      searched: 0, found: 0, extracted: 0,
   166	      skipped_filter: 0, skipped_dedup: 0, skipped_viewed: 0, errors: 0,
   167	    };
   168	
   169	    // Circuit breaker: bail out after too many consecutive extraction failures
   170	    const MAX_CONSECUTIVE_FAILURES = 15;
   171	    let consecutiveFailures = 0;
   172	
   173	    // Single page for all searches — avoids Playwright stealing window focus
   174	    const page = await context.newPage();
   175	
   176	    for (const search of toRun) {
   177	      log(`\n── Search: ${search.name} ──`);
   178	      stats.searched++;
   179	      consecutiveFailures = 0; // reset circuit breaker per search
   180	
   181	      try {
   182	        await page.goto(search.url, { waitUntil: 'domcontentloaded', timeout: NAV_TIMEOUT_MS });
   183	        await sleep(randomDelay(delayPages));
   184	
   185	        let accepted = 0;
   186	        let hasNextPage = true;
   187	
   188	        while (hasNextPage && accepted < maxPerSearch) {
   189	          const currentPage = await this.#getCurrentPage(page);
   190	          log(`Page ${currentPage || 1}`);
   191	
   192	          const cardCount = await this.#getCardCount(page);
   193	          log(`Found ${cardCount} job cards`);
   194	          stats.found += cardCount;
   195	
   196	          for (let i = 0; i < cardCount; i++) {
   197	            if (accepted >= maxPerSearch) {
   198	              log(`Reached max results (${maxPerSearch}) for this search`);
   199	              break;
   200	            }
   201	
   202	            // Circuit breaker: stop if extraction is consistently failing
   203	            if (consecutiveFailures >= MAX_CONSECUTIVE_FAILURES) {
   204	              warn(`${MAX_CONSECUTIVE_FAILURES} consecutive extraction failures — stopping this search (likely throttled or DOM changed)`);
   205	              break;
   206	            }
   207	
   208	            const card = await this.#getCard(page, i);
   209	            const cardExists = await page.evaluate(c => c != null, card);
   210	            if (!cardExists) {
   211	              warn(`  ✗ Card ${i} not found`);
   212	              stats.errors++;
   213	              consecutiveFailures++;
   214	              await card.dispose();
   215	              continue;
   216	            }
   217	
   218	            // 1. Extract title, company, click card, extract detail
   219	            const data = await this.extractJob(page, card);
   220	
   221	            // 2. Check viewed label
   222	            if (skipViewed && await this.isJobCardViewed(page, card)) {
   223	              log(`  ✗ Viewed: skipped card ${i}`);
   224	              stats.skipped_viewed++;
   225	              skipped.push({
   226	                url: data.listingUrl || '',
   227	                title: data.title || '',
   228	                company: data.company || '',
   229	                status: 'skipped_viewed_linkedin',
   230	              });
   231	              await card.dispose();
   232	              continue;
   233	            }
   234	            await card.dispose();
   235	
   236	            
   237	            if (!data.title) {
   238	              warn(`  ✗ No title on card ${i} (data); skipping`);
   239	              stats.errors++;
   240	              consecutiveFailures++;
   241	              continue;
   242	            }
   243	            // 3. Blocklist, dedup, and title filter
   244	            if (employerBlocklist.length && data.company) {
   245	              const companyLower = data.company.toLowerCase();
   246	              if (employerBlocklist.some(b => companyLower === b.toLowerCase())) {
   247	                log(`  ✗ Blocked employer: ${data.company}`);
   248	                stats.skipped_filter++;
   249	                continue;
   250	              }

Script executed:

cat -n scan-auth/linkedin.mjs | sed -n '190,280p'

Repository: santifer/career-ops
Exit code: 0

stdout:

   190	          log(`Page ${currentPage || 1}`);
   191	
   192	          const cardCount = await this.#getCardCount(page);
   193	          log(`Found ${cardCount} job cards`);
   194	          stats.found += cardCount;
   195	
   196	          for (let i = 0; i < cardCount; i++) {
   197	            if (accepted >= maxPerSearch) {
   198	              log(`Reached max results (${maxPerSearch}) for this search`);
   199	              break;
   200	            }
   201	
   202	            // Circuit breaker: stop if extraction is consistently failing
   203	            if (consecutiveFailures >= MAX_CONSECUTIVE_FAILURES) {
   204	              warn(`${MAX_CONSECUTIVE_FAILURES} consecutive extraction failures — stopping this search (likely throttled or DOM changed)`);
   205	              break;
   206	            }
   207	
   208	            const card = await this.#getCard(page, i);
   209	            const cardExists = await page.evaluate(c => c != null, card);
   210	            if (!cardExists) {
   211	              warn(`  ✗ Card ${i} not found`);
   212	              stats.errors++;
   213	              consecutiveFailures++;
   214	              await card.dispose();
   215	              continue;
   216	            }
   217	
   218	            // 1. Extract title, company, click card, extract detail
   219	            const data = await this.extractJob(page, card);
   220	
   221	            // 2. Check viewed label
   222	            if (skipViewed && await this.isJobCardViewed(page, card)) {
   223	              log(`  ✗ Viewed: skipped card ${i}`);
   224	              stats.skipped_viewed++;
   225	              skipped.push({
   226	                url: data.listingUrl || '',
   227	                title: data.title || '',
   228	                company: data.company || '',
   229	                status: 'skipped_viewed_linkedin',
   230	              });
   231	              await card.dispose();
   232	              continue;
   233	            }
   234	            await card.dispose();
   235	
   236	            
   237	            if (!data.title) {
   238	              warn(`  ✗ No title on card ${i} (data); skipping`);
   239	              stats.errors++;
   240	              consecutiveFailures++;
   241	              continue;
   242	            }
   243	            // 3. Blocklist, dedup, and title filter
   244	            if (employerBlocklist.length && data.company) {
   245	              const companyLower = data.company.toLowerCase();
   246	              if (employerBlocklist.some(b => companyLower === b.toLowerCase())) {
   247	                log(`  ✗ Blocked employer: ${data.company}`);
   248	                stats.skipped_filter++;
   249	                continue;
   250	              }
   251	            }
   252	
   253	            const companyTitleKey = (data.company && data.title)
   254	              ? `${data.company}::${data.title}`.toLowerCase() : null;
   255	            const dedupKey = data.jobId || companyTitleKey;
   256	            if ((dedupKey && scanHistory.has(dedupKey)) || (companyTitleKey && scanHistory.has(companyTitleKey))) {
   257	              log(`  ✗ Already seen: ${data.title} (${data.company})`);
   258	              stats.skipped_dedup++;
   259	              skipped.push({
   260	                url: data.listingUrl || '',
   261	                title: data.title || '',
   262	                company: data.company || '',
   263	                status: 'skipped_dup',
   264	              });
   265	              continue;
   266	            }
   267	
   268	            if (!this.#matchesFilter(data.title, '', titleFilter)) {
   269	              log(`  ✗ Filtered: ${data.title} (${data.company})`);
   270	              stats.skipped_filter++;
   271	              continue;
   272	            }
   273	
   274	            stats.extracted++;
   275	
   276	            if (!data.applicationUrl) {
   277	              log(`  ✗ No apply URL: ${data.title} (${data.company})`);
   278	              stats.skipped_filter++;
   279	              continue;
   280	            }

Script executed:

cat -n scan-auth/linkedin.mjs | sed -n '290,380p'

Repository: santifer/career-ops
Exit code: 0

stdout:

   290	              stats.errors++;
   291	              consecutiveFailures++;
   292	              continue;
   293	            }
   294	
   295	            consecutiveFailures = 0;
   296	            if (dedupKey) scanHistory.add(dedupKey);
   297	            if (companyTitleKey) scanHistory.add(companyTitleKey);
   298	            listings.push({
   299	              title: data.title,
   300	              company: data.company,
   301	              applicationUrl: data.applicationUrl,
   302	              jdText: data.jdText,
   303	              url: data.listingUrl,
   304	            });
   305	            accepted++;
   306	            log(`  ✓ Accepted: ${data.title} at ${data.company}`);
   307	          }
   308	
   309	          if (consecutiveFailures >= MAX_CONSECUTIVE_FAILURES) {
   310	            hasNextPage = false;
   311	          } else if (accepted < maxPerSearch) {
   312	            hasNextPage = await this.#goToNextPage(page);
   313	            if (hasNextPage) {
   314	              log(`Navigating to next page...`);
   315	              await sleep(randomDelay(delayPages));
   316	            }
   317	          } else {
   318	            hasNextPage = false;
   319	          }
   320	        }
   321	      } catch (e) {
   322	        log(`Search "${search.name}" failed: ${e.message}`);
   323	        errors.push({ search: search.name, error: e.message });
   324	        stats.errors++;
   325	      }
   326	
   327	      if (toRun.indexOf(search) < toRun.length - 1) {
   328	        const d = randomDelay(delaySearches);
   329	        log(`Waiting ${(d / 1000).toFixed(1)}s before next search...`);
   330	        await sleep(d);
   331	      }
   332	    }
   333	
   334	    await page.close();
   335	    return { listings, skipped, errors, stats };
   336	  }
   337	
   338	  // -------------------------------------------------------------------------
   339	  // Private — filtering
   340	  // -------------------------------------------------------------------------
   341	
   342	  #matchesFilter(title, jdText, filter) {
   343	    if (!filter) return true;
   344	    const combined = `${title} ${jdText}`.toLowerCase();
   345	    const titleLower = title.toLowerCase();
   346	    const hasPositive = !filter.positive?.length ||
   347	      filter.positive.some(kw => combined.includes(kw.toLowerCase()));
   348	    const hasNegative = filter.negative?.length &&
   349	      filter.negative.some(kw => titleLower.includes(kw.toLowerCase()));
   350	    return hasPositive && !hasNegative;
   351	  }
   352	
   353	  // -------------------------------------------------------------------------
   354	  // Private — search URL construction
   355	  // -------------------------------------------------------------------------
   356	
   357	  #buildSearches(config) {
   358	    const datePostedMap = { '24': 'past 24 hours', 'Week': 'past week', 'Month': 'past month' };
   359	    const dateSuffix = datePostedMap[config.date_posted] || '';
   360	    const levels = config.experience_level || [];
   361	    const levelPrefix = levels.length ? levels.join(' or ') : '';
   362	
   363	    return config.keywords.map(kw => {
   364	      let query = levelPrefix ? `${levelPrefix} ${kw}` : kw;
   365	      if (dateSuffix) query += ` posted in the ${dateSuffix}`;
   366	      const params = new URLSearchParams({ keywords: query });
   367	      return { name: kw, url: `https://www.linkedin.com/jobs/search-results/?${params}` };
   368	    });
   369	  }
   370	
   371	  // -------------------------------------------------------------------------
   372	  // Private — URL helpers
   373	  // -------------------------------------------------------------------------
   374	
   375	  #unwrapRedirect(href) {
   376	    const trimmed = (href || '').trim();
   377	    if (!trimmed) return '';
   378	    try {
   379	      const u = new URL(trimmed);
   380	      if (!u.hostname.includes('linkedin.com')) return trimmed;

Script executed:

grep -n "extractJob" scan-auth/linkedin.mjs | head -5

Repository: santifer/career-ops
Exit code: 0

stdout:

9: *   2. extractJob() reads title/company/location from the card DOM,
219:            const data = await this.extractJob(page, card);
488:  async extractJob(page, card) {

Script executed:

cat -n scan-auth/linkedin.mjs | sed -n '488,550p'

Repository: santifer/career-ops
Exit code: 0

stdout:

   488	  async extractJob(page, card) {
   489	    const data = await page.evaluate(({ card, sel, noise }) => {
   490	      const noiseSet = new Set((noise || []).map((s) => String(s).toLowerCase()));
   491	
   492	      let title = '';
   493	      const dismiss = card.querySelector(sel.cardDismissButtonCss || 'button[aria-label^="Dismiss"]');
   494	      if (dismiss) {
   495	        const al = dismiss.getAttribute('aria-label') || '';
   496	        const m = al.match(/^Dismiss\s+(.+?)\s+job\s*$/i);
   497	        if (m) title = m[1].trim();
   498	      }
   499	
   500	      let company = '';
   501	      for (const a of card.querySelectorAll(sel.cardCompanyLinkQuery || 'a[href*="/company/"]')) {
   502	        const t = (a.textContent ?? '').trim();
   503	        if (t.length >= 1 && t.length < 120 && !noiseSet.has(t.toLowerCase())) {
   504	          company = t;
   505	          break;
   506	        }
   507	      }
   508	
   509	      function looksLikeLocation(s) {
   510	        return /\((On-?site|Hybrid|Remote)\)/i.test(s)
   511	          || /,\s*[A-Z]{2}\b/.test(s)
   512	          || /\bRemote\b/i.test(s);
   513	      }
   514	
   515	      function looksLikeMetaLine(s) {
   516	        const lower = s.toLowerCase();
   517	        return lower === 'viewed'
   518	          || lower === '·'
   519	          || /school alumni work(s)? here/i.test(s)
   520	          || /early applicant/i.test(lower)
   521	          || /^\d+ benefits?$/i.test(s)
   522	          || /^posted on\b/i.test(s)
   523	          || /\b(hour|day|week|month)s?\s+ago$/i.test(s)
   524	          || /^[\d·|•\s]+$/.test(s);
   525	      }
   526	
   527	      if (!company) {
   528	        const paragraphs = [...card.querySelectorAll('p')]
   529	          .map((p) => (p.textContent ?? '').trim())
   530	          .filter((t) => t.length > 0 && t.length < 200);
   531	        for (const t of paragraphs) {
   532	          if (noiseSet.has(t.toLowerCase())) continue;
   533	          if (title && (t === title || t.includes(title))) continue;
   534	          if (looksLikeMetaLine(t)) continue;
   535	          if (looksLikeLocation(t)) continue;
   536	          if (t.length < 2) continue;
   537	          company = t;
   538	          break;
   539	        }
   540	      }
   541	
   542	      let location = '';
   543	      for (const t of [...card.querySelectorAll('p')].map((p) => (p.textContent ?? '').trim())) {
   544	        if (looksLikeLocation(t) && t !== title && t !== company) {
   545	          location = t;
   546	          break;
   547	        }
   548	      }
   549	
   550	      return {

Script executed:

grep -n "isJobCardViewed" scan-auth/linkedin.mjs | head -5

Repository: santifer/career-ops
Exit code: 0

stdout:

13: *   3. isJobCardViewed() checks for LinkedIn's "Viewed" label on the card
222:            if (skipViewed && await this.isJobCardViewed(page, card)) {
576:  async isJobCardViewed(page, card) {

Script executed:

cat -n scan-auth/linkedin.mjs | sed -n '560,620p'

Repository: santifer/career-ops
Exit code: 0

stdout:

   560	    data.listingUrl = listingUrl;
   561	    data.clicked = clicked;
   562	
   563	    // Extract detail from the now-open panel
   564	    if (clicked) {
   565	      const detail = await this.#extractDetailFromPanel(page, jobId);
   566	      data.applicationUrl = this.#unwrapRedirect(detail.applicationUrl);
   567	      data.jdText = detail.jdText;
   568	    }
   569	
   570	    return data;
   571	  }
   572	
   573	  /**
   574	   * Whether LinkedIn marks the listing card as already opened (see SELECTORS.viewedStatusLabels).
   575	   */
   576	  async isJobCardViewed(page, card) {
   577	    return page.evaluate(({ card, sel }) => {
   578	      const labels = sel.viewedStatusLabels || [];
   579	      const labelSet = new Set(labels.map((s) => String(s).toLowerCase()));
   580	      const tagQuery = sel.viewedStatusTagQuery || 'p, span, li';
   581	      for (const el of card.querySelectorAll(tagQuery)) {
   582	        const t = (el.textContent ?? '').trim();
   583	        if (t && labelSet.has(t.toLowerCase())) return true;
   584	      }
   585	
   586	      const sepClass = sel.viewedStatusLineSeparatorCharClass || '·•|';
   587	      const line = (card.innerText ?? '').replace(/\s+/g, ' ').trim();
   588	      for (const raw of labels) {
   589	        const esc = String(raw).replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
   590	        const re = new RegExp(`\\b${esc}\\b\\s*[${sepClass}]`, 'i');
   591	        if (re.test(line)) return true;
   592	      }
   593	
   594	      return false;
   595	    }, { card, sel: SELECTORS });
   596	  }
   597	
   598	  /**
   599	   * Scrape the detail panel (right side) after a card has been clicked.
   600	   * Clicks "more" to expand truncated JDs, then extracts the apply URL
   601	   * and full JD text. Falls back to the listing URL if no external apply
   602	   * link is found (Easy Apply jobs).
   603	   */
   604	  async #extractDetailFromPanel(page, jobId) {
   605	    // Try to expand the description
   606	    await page.evaluate(({ xpath }) => {
   607	      const result = document.evaluate(xpath, document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null);
   608	      const moreSpan = result.singleNodeValue;
   609	      if (moreSpan) moreSpan.click();
   610	    }, { xpath: SELECTORS.xpathMoreButton });
   611	    await sleep(250);
   612	
   613	    const detail = await page.evaluate(({ sel, jobId }) => {
   614	      function xpathFirst(expression) {
   615	        const result = document.evaluate(
   616	          expression, document, null,
   617	          XPathResult.FIRST_ORDERED_NODE_TYPE, null
   618	        );
   619	        return result.singleNodeValue;
   620	      }

Script executed:

cat -n scan-auth/linkedin.mjs | sed -n '550,575p'

Repository: santifer/career-ops
Exit code: 0

stdout:

   550	      return {
   551	        title: title || '',
   552	        company: company || '',
   553	        location: location || '',
   554	      };
   555	    }, { card, sel: SELECTORS, noise: [...NOISE_LABELS] });
   556	
   557	    // Click card and extract job ID from the resulting URL
   558	    const { clicked, jobId, listingUrl } = await this.#clickAndExtractJobId(page, card);
   559	    data.jobId = jobId;
   560	    data.listingUrl = listingUrl;
   561	    data.clicked = clicked;
   562	
   563	    // Extract detail from the now-open panel
   564	    if (clicked) {
   565	      const detail = await this.#extractDetailFromPanel(page, jobId);
   566	      data.applicationUrl = this.#unwrapRedirect(detail.applicationUrl);
   567	      data.jdText = detail.jdText;
   568	    }
   569	
   570	    return data;
   571	  }
   572	
   573	  /**
   574	   * Whether LinkedIn marks the listing card as already opened (see SELECTORS.viewedStatusLabels).
   575	   */

Script executed:

grep -n "clickAndExtractJobId" scan-auth/linkedin.mjs | head -3

Repository: santifer/career-ops
Exit code: 0

stdout:

10: *      then clicks the card via `#clickAndExtractJobId` to get the job ID
464:  async #clickAndExtractJobId(page, card) {
558:    const { clicked, jobId, listingUrl } = await this.#clickAndExtractJobId(page, card);

Script executed:

cat -n scan-auth/linkedin.mjs | sed -n '440,485p'

Repository: santifer/career-ops
Exit code: 0

stdout:

   440	
   441	  // -------------------------------------------------------------------------
   442	  // Private — extraction helpers
   443	  // -------------------------------------------------------------------------
   444	
   445	  async #getCardCount(page) {
   446	    return page.evaluate((xpath) => {
   447	      const result = document.evaluate(xpath, document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
   448	      return result.snapshotLength;
   449	    }, SELECTORS.xpathListingCard);
   450	  }
   451	
   452	  async #getCard(page, index) {
   453	    return page.evaluateHandle(({ xpath, idx }) => {
   454	      const result = document.evaluate(xpath, document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
   455	      return result.snapshotItem(idx) || null;
   456	    }, { xpath: SELECTORS.xpathListingCard, idx: index });
   457	  }
   458	
   459	  /**
   460	   * Click a card and extract the job ID from the resulting URL.
   461	   * LinkedIn updates the URL's currentJobId query param when a card is selected.
   462	   * Returns { clicked, jobId, listingUrl }.
   463	   */
   464	  async #clickAndExtractJobId(page, card) {
   465	    const clicked = await page.evaluate((c) => {
   466	      if (c) { c.click(); return true; }
   467	      return false;
   468	    }, card);
   469	    if (!clicked) return { clicked: false, jobId: '', listingUrl: '' };
   470	
   471	    await sleep(CARD_CLICK_DELAY_MS);
   472	    const url = page.url();
   473	    try {
   474	      const jobId = new URL(url).searchParams.get('currentJobId') || '';
   475	      const listingUrl = jobId ? `https://www.linkedin.com/jobs/view/${jobId}/` : '';
   476	      return { clicked: true, jobId, listingUrl };
   477	    } catch {
   478	      return { clicked: true, jobId: '', listingUrl: '' };
   479	    }
   480	  }
   481	
   482	  /**
   483	   * Extract all data for a single job card. Reads title/company/location from
   484	   * the card DOM (no click), then clicks the card to get the job ID and opens
   485	   * the detail panel to scrape the apply URL and JD text.

Script executed:

cat -n scan-auth/linkedin.mjs | sed -n '208,295p'

Repository: santifer/career-ops
Exit code: 0

stdout:

   208	            const card = await this.#getCard(page, i);
   209	            const cardExists = await page.evaluate(c => c != null, card);
   210	            if (!cardExists) {
   211	              warn(`  ✗ Card ${i} not found`);
   212	              stats.errors++;
   213	              consecutiveFailures++;
   214	              await card.dispose();
   215	              continue;
   216	            }
   217	
   218	            // 1. Extract title, company, click card, extract detail
   219	            const data = await this.extractJob(page, card);
   220	
   221	            // 2. Check viewed label
   222	            if (skipViewed && await this.isJobCardViewed(page, card)) {
   223	              log(`  ✗ Viewed: skipped card ${i}`);
   224	              stats.skipped_viewed++;
   225	              skipped.push({
   226	                url: data.listingUrl || '',
   227	                title: data.title || '',
   228	                company: data.company || '',
   229	                status: 'skipped_viewed_linkedin',
   230	              });
   231	              await card.dispose();
   232	              continue;
   233	            }
   234	            await card.dispose();
   235	
   236	            
   237	            if (!data.title) {
   238	              warn(`  ✗ No title on card ${i} (data); skipping`);
   239	              stats.errors++;
   240	              consecutiveFailures++;
   241	              continue;
   242	            }
   243	            // 3. Blocklist, dedup, and title filter
   244	            if (employerBlocklist.length && data.company) {
   245	              const companyLower = data.company.toLowerCase();
   246	              if (employerBlocklist.some(b => companyLower === b.toLowerCase())) {
   247	                log(`  ✗ Blocked employer: ${data.company}`);
   248	                stats.skipped_filter++;
   249	                continue;
   250	              }
   251	            }
   252	
   253	            const companyTitleKey = (data.company && data.title)
   254	              ? `${data.company}::${data.title}`.toLowerCase() : null;
   255	            const dedupKey = data.jobId || companyTitleKey;
   256	            if ((dedupKey && scanHistory.has(dedupKey)) || (companyTitleKey && scanHistory.has(companyTitleKey))) {
   257	              log(`  ✗ Already seen: ${data.title} (${data.company})`);
   258	              stats.skipped_dedup++;
   259	              skipped.push({
   260	                url: data.listingUrl || '',
   261	                title: data.title || '',
   262	                company: data.company || '',
   263	                status: 'skipped_dup',
   264	              });
   265	              continue;
   266	            }
   267	
   268	            if (!this.#matchesFilter(data.title, '', titleFilter)) {
   269	              log(`  ✗ Filtered: ${data.title} (${data.company})`);
   270	              stats.skipped_filter++;
   271	              continue;
   272	            }
   273	
   274	            stats.extracted++;
   275	
   276	            if (!data.applicationUrl) {
   277	              log(`  ✗ No apply URL: ${data.title} (${data.company})`);
   278	              stats.skipped_filter++;
   279	              continue;
   280	            }
   281	
   282	            if (!this.#matchesFilter(data.title, data.jdText || '', titleFilter)) {
   283	              log(`  ✗ Filtered after JD: ${data.title} (${data.company})`);
   284	              stats.skipped_filter++;
   285	              continue;
   286	            }
   287	
   288	            if (!data.jdText) {
   289	              warn(`  ✗ No JD content: ${data.title}`);
   290	              stats.errors++;
   291	              consecutiveFailures++;
   292	              continue;
   293	            }
   294	
   295	            consecutiveFailures = 0;

Check viewed status before extracting, and protect card disposal with try/finally.

Line 219 calls extractJob(), which clicks the card at line 558 and opens the detail panel. Only then, at line 222, does isJobCardViewed() check whether LinkedIn marked it as viewed—meaning viewed jobs are unnecessarily opened and scraped before being skipped. Additionally, if extractJob() throws, the card handle at line 208 is never disposed, leaking browser resources.

Move isJobCardViewed() before the click (since it only reads card DOM), create a no-click helper to extract title/company/location for skipped entries, and wrap per-card work in try/finally to ensure card.dispose() always runs.
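
The reordering described above can be sketched as follows. This is a minimal illustration, not the scanner's actual code: `processCard`, `isViewed`, `readCardMeta`, and `extractJob` are hypothetical stand-ins passed in as parameters, and `card` is assumed only to expose an async `dispose()` method.

```javascript
// Sketch only: every helper here is a hypothetical stand-in for the
// scanner's real methods, injected so the control flow can be shown alone.
async function processCard(card, { skipViewed, isViewed, readCardMeta, extractJob }) {
  try {
    // Read-only check first: no click, so the detail panel never opens.
    if (skipViewed && await isViewed(card)) {
      // Title/company are read from the card DOM without clicking.
      const meta = await readCardMeta(card);
      return { status: 'skipped_viewed', ...meta };
    }
    // Only now pay for the click and the detail-panel scrape.
    const data = await extractJob(card);
    return { status: 'extracted', ...data };
  } finally {
    // Runs on normal return and on throw, so the handle is never leaked.
    await card.dispose();
  }
}
```

Because the disposal sits in `finally`, a throwing `extractJob` still releases the handle, and the error propagates to the caller unchanged.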

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scan-auth/linkedin.mjs` around lines 208 - 234, Reorder and protect per-card
processing: after obtaining the handle from this.#getCard(page, i) call, call
this.isJobCardViewed(page, card) before invoking this.extractJob so you don't
click/open viewed listings; if skipViewed is true and the card is viewed, use a
new no-click helper (e.g., extractJobMetadataNoClick or similar) that reads
title/company/location/listingUrl from the card DOM without clicking, push the
skipped entry and stats, and skip extraction; wrap the whole per-card logic (the
sequence around this.#getCard, this.isJobCardViewed, this.extractJob and pushing
results) in a try/finally that always calls await card.dispose() in the finally
block so the handle is never leaked, and only call extractJob (which clicks)
when you know you will process the detail panel.

Comment thread scan-auth/linkedin.mjs
Comment on lines +375 to +386
#unwrapRedirect(href) {
  const trimmed = (href || '').trim();
  if (!trimmed) return '';
  try {
    const u = new URL(trimmed);
    if (!u.hostname.includes('linkedin.com')) return trimmed;
    if (!u.pathname.includes('/safety/go')) return trimmed;
    const nested = u.searchParams.get('url');
    if (!nested) return trimmed;
    const decoded = decodeURIComponent(nested);
    new URL(decoded);
    return decoded;

@coderabbitai coderabbitai Bot Apr 19, 2026


⚠️ Potential issue | 🟡 Minor

Tighten the trusted LinkedIn redirect host check.

hostname.includes('linkedin.com') trusts hosts like notlinkedin.com or linkedin.com.evil.tld and unwraps their /safety/go?url=... targets. Match exact LinkedIn hosts before decoding nested URLs. As per coding guidelines, **/*.mjs: Check for command injection, path traversal, and SSRF.
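
To make the difference concrete, here is a small comparison of the substring check and an exact-host check. It is a sketch under assumptions: the function names `looseIsLinkedIn` and `strictIsLinkedIn` and the sample URLs are illustrative and do not appear in the PR.

```javascript
// Permissive check as quoted in the review: substring match on the hostname.
const looseIsLinkedIn = (href) => new URL(href).hostname.includes('linkedin.com');

// Stricter check: the exact apex host or a true subdomain of it.
function strictIsLinkedIn(href) {
  const hostname = new URL(href).hostname.toLowerCase();
  return hostname === 'linkedin.com' || hostname.endsWith('.linkedin.com');
}

// Sample URLs are illustrative only.
const samples = [
  'https://www.linkedin.com/safety/go?url=https%3A%2F%2Fexample.com',
  'https://notlinkedin.com/safety/go?url=https%3A%2F%2Fexample.com',
  'https://linkedin.com.evil.tld/safety/go?url=https%3A%2F%2Fexample.com',
];
for (const href of samples) {
  console.log(href, { loose: looseIsLinkedIn(href), strict: strictIsLinkedIn(href) });
}
```

The loose check accepts all three samples, while the strict check accepts only the first.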

🛡️ Proposed host validation fix
       const u = new URL(trimmed);
-      if (!u.hostname.includes('linkedin.com')) return trimmed;
-      if (!u.pathname.includes('/safety/go')) return trimmed;
+      const hostname = u.hostname.toLowerCase();
+      const isLinkedInHost = hostname === 'linkedin.com' || hostname.endsWith('.linkedin.com');
+      if (!isLinkedInHost) return trimmed;
+      if (u.pathname !== '/safety/go') return trimmed;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scan-auth/linkedin.mjs` around lines 375 - 386, The hostname check in
`#unwrapRedirect` is too permissive (uses hostname.includes('linkedin.com'))
allowing domains like notlinkedin.com; update it to only accept exact LinkedIn
hostnames (e.g., linkedin.com, www.linkedin.com, touch.www.linkedin.com,
www.linkedin.cn, or other canonical LinkedIn hosts used by your app) before
processing /safety/go and decoding the nested url; ensure the check happens on
u.hostname (or normalized hostname) and only then call u.searchParams.get('url')
and decodeURIComponent, keeping the rest of the existing logic in
`#unwrapRedirect` intact.

Author

This logic is heuristic-based and any potential issues would require malicious code being hosted on the LinkedIn job portal.


@DSnoNintendo, that's a fair point. Since #unwrapRedirect only operates on URLs already present in LinkedIn's own DOM (sourced from the Apply on company website anchor), any abuse of the permissive .includes('linkedin.com') check would presuppose an attacker already controlling content served by LinkedIn's job portal. The decoded URL is also returned as a string rather than triggering an automated outbound fetch, which removes the server-side SSRF vector. I'll defer to your assessment here.


✏️ Learnings added
Learnt from: DSnoNintendo
Repo: santifer/career-ops PR: 379
File: scan-auth/linkedin.mjs:375-386
Timestamp: 2026-04-19T17:58:15.791Z
Learning: In `scan-auth/linkedin.mjs` (career-ops repo), the `#unwrapRedirect` method's `hostname.includes('linkedin.com')` check is intentionally heuristic. The URLs it processes are sourced exclusively from LinkedIn's own DOM (the `//a[aria-label='Apply on company website']` XPath selector), so exploiting the permissive check would require an attacker to already control content on LinkedIn's portal. The decoded result is stored as a string, not used for an automated server-side HTTP request, so there is no SSRF risk. Do not flag `#unwrapRedirect` for the permissive hostname check.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: DSnoNintendo
Repo: santifer/career-ops PR: 379
File: scan-auth/linkedin.mjs:464-475
Timestamp: 2026-04-19T17:56:07.061Z
Learning: In `scan-auth/linkedin.mjs` (career-ops repo), clicking a LinkedIn job card in the search results left rail triggers an **immediate/synchronous SPA navigation** that updates the `currentJobId` URL query parameter. A fixed `sleep(CARD_CLICK_DELAY_MS)` after the click is intentional — it is a rendering buffer for the detail panel, not a guard for the URL change. Do not flag `#clickAndExtractJobId` for using a fixed delay instead of `page.waitForFunction` for the URL/jobId change.

Learnt from: DSnoNintendo
Repo: santifer/career-ops PR: 379
File: scan-auth/linkedin.mjs:357-368
Timestamp: 2026-04-19T17:52:55.390Z
Learning: In `scan-auth/linkedin.mjs` (career-ops repo), LinkedIn job search filters (date posted, experience level) are applied via search query strings embedded in the keywords parameter, not via URL parameters like `f_TPR` or `f_E`. Do not flag the `#buildSearches` method for using query string text for filters — this is the current correct approach for LinkedIn job search URLs.

Comment thread scan-auth/linkedin.mjs
Comment on lines +464 to +475
async #clickAndExtractJobId(page, card) {
  const clicked = await page.evaluate((c) => {
    if (c) { c.click(); return true; }
    return false;
  }, card);
  if (!clicked) return { clicked: false, jobId: '', listingUrl: '' };

  await sleep(CARD_CLICK_DELAY_MS);
  const url = page.url();
  try {
    const jobId = new URL(url).searchParams.get('currentJobId') || '';
    const listingUrl = jobId ? `https://www.linkedin.com/jobs/view/${jobId}/` : '';

@coderabbitai coderabbitai Bot Apr 19, 2026


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

head -n 480 scan-auth/linkedin.mjs | tail -n 30

Repository: santifer/career-ops

Length of output: 1182


🏁 Script executed:

grep -n "CARD_CLICK_DELAY_MS\|NAV_TIMEOUT_MS" scan-auth/linkedin.mjs | head -20

Repository: santifer/career-ops

Length of output: 492


🏁 Script executed:

grep -n "#clickAndExtractJobId" scan-auth/linkedin.mjs

Repository: santifer/career-ops

Length of output: 279


🏁 Script executed:

grep -n "page.waitForFunction\|waitForFunction" scan-auth/linkedin.mjs

Repository: santifer/career-ops

Length of output: 124


🏁 Script executed:

grep -rn "waitForFunction" scan-auth/

Repository: santifer/career-ops

Length of output: 147


🏁 Script executed:

grep -n "sleep(" scan-auth/linkedin.mjs | head -20

Repository: santifer/career-ops

Length of output: 379


🏁 Script executed:

sed -n '425,445n' scan-auth/linkedin.mjs

Repository: santifer/career-ops

Length of output: 45


🏁 Script executed:

cat -n scan-auth/linkedin.mjs | sed -n '464,480p'

Repository: santifer/career-ops

Length of output: 791


🏁 Script executed:

cat -n scan-auth/linkedin.mjs | sed -n '425,440p'

Repository: santifer/career-ops

Length of output: 863


🏁 Script executed:

grep -B 5 -A 10 "page.waitForFunction" scan-auth/linkedin.mjs

Repository: santifer/career-ops

Length of output: 819


🏁 Script executed:

cat -n scan-auth/linkedin.mjs | sed -n '1,75p'

Repository: santifer/career-ops

Length of output: 3941


🏁 Script executed:

cat -n scan-auth/linkedin.mjs | sed -n '555,565p'

Repository: santifer/career-ops

Length of output: 571


🏁 Script executed:

grep -n "import.*page\|Playwright\|version" scan-auth/linkedin.mjs | head -10

Repository: santifer/career-ops

Length of output: 144


🏁 Script executed:

cat -n scan-auth/linkedin.mjs | sed -n '464,480p' && echo "---" && cat -n scan-auth/linkedin.mjs | sed -n '460,465p'

Repository: santifer/career-ops

Length of output: 1128


🏁 Script executed:

grep -B 5 "currentJobId" scan-auth/linkedin.mjs | head -20

Repository: santifer/career-ops

Length of output: 884


Replace fixed delay with change detection to reliably capture the updated job ID.

The URL's currentJobId parameter updates asynchronously after the card click. A fixed 1-second delay can race and read an empty or stale value, causing the scanner to extract job details from the wrong listing.

Use page.waitForFunction to wait for the parameter to actually change rather than relying on a fixed timeout:

Proposed fix
  async #clickAndExtractJobId(page, card) {
+   const previousJobId = await page.evaluate(() => (
+     new URL(window.location.href).searchParams.get('currentJobId') || ''
+   )).catch(() => '');
+
    const clicked = await page.evaluate((c) => {
      if (c) { c.click(); return true; }
      return false;
    }, card);
    if (!clicked) return { clicked: false, jobId: '', listingUrl: '' };
 
-   await sleep(CARD_CLICK_DELAY_MS);
+   await page.waitForFunction((prev) => {
+     const current = new URL(window.location.href).searchParams.get('currentJobId') || '';
+     return current && current !== prev;
+   }, previousJobId, { timeout: NAV_TIMEOUT_MS }).catch(() => {});
+
    const url = page.url();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scan-auth/linkedin.mjs` around lines 464 - 475, In `#clickAndExtractJobId`,
replace the fixed sleep(CARD_CLICK_DELAY_MS) after clicking the card with a
wait-for-change using page.waitForFunction: capture the prior URL or prior
currentJobId before the click, perform the click via page.evaluate as currently
done, then call page.waitForFunction to poll until new
URL.searchParams.get('currentJobId') !== priorJobId (or until page.url()
changes), with a sensible timeout; after waitForFunction resolves, read the
updated page.url() and extract currentJobId and listingUrl as before to reliably
get the updated job id instead of racing with a fixed delay.

Author

Although a page.waitForFunction could work here, clicking a job card instantly redirects a browser to the associated job.


@DSnoNintendo, that's a good clarification — if the card click triggers an immediate SPA navigation that synchronously updates currentJobId in the URL, the waitForFunction polling loop would be redundant (it would resolve on the very first check anyway). The existing sleep(CARD_CLICK_DELAY_MS) makes sense as a rendering buffer for the detail panel to populate after the navigation, not as a guard on the URL change itself. Happy to withdraw this concern.


✏️ Learnings added
Learnt from: DSnoNintendo
Repo: santifer/career-ops PR: 379
File: scan-auth/linkedin.mjs:464-475
Timestamp: 2026-04-19T17:56:07.061Z
Learning: In `scan-auth/linkedin.mjs` (career-ops repo), clicking a LinkedIn job card in the search results left rail triggers an **immediate/synchronous SPA navigation** that updates the `currentJobId` URL query parameter. A fixed `sleep(CARD_CLICK_DELAY_MS)` after the click is intentional — it is a rendering buffer for the detail panel, not a guard for the URL change. Do not flag `#clickAndExtractJobId` for using a fixed delay instead of `page.waitForFunction` for the URL/jobId change.


🧠 Learnings used
Learnt from: DSnoNintendo
Repo: santifer/career-ops PR: 379
File: scan-auth/linkedin.mjs:357-368
Timestamp: 2026-04-19T17:52:55.390Z
Learning: In `scan-auth/linkedin.mjs` (career-ops repo), LinkedIn job search filters (date posted, experience level) are applied via search query strings embedded in the keywords parameter, not via URL parameters like `f_TPR` or `f_E`. Do not flag the `#buildSearches` method for using query string text for filters — this is the current correct approach for LinkedIn job search URLs.

Learnt from: CR
Repo: santifer/career-ops PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-19T10:52:21.261Z
Learning: Applies to {check-liveness.mjs,liveness-core.mjs} : Verify offer liveness using Playwright by navigating to the URL and taking a snapshot; only footer/navbar without JD content indicates closed posting, while title + description + Apply indicates active

Comment thread scan-auth/linkedin.mjs
Comment on lines +624 to +626
// if no apply url or easy apply button is present, use the listing url
let applicationUrl = applyEl?.href?.trim() ?? `https://www.linkedin.com/jobs/view/${jobId}/`;


@coderabbitai coderabbitai Bot Apr 19, 2026


⚠️ Potential issue | 🟠 Major

Do not synthesize a fallback URL when jobId is missing.

If currentJobId extraction fails and there is no external apply link, this returns https://www.linkedin.com/jobs/view//, which passes the non-empty applicationUrl check and gets saved. Return an empty URL unless a real jobId exists.
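
The edge case can be reproduced in isolation. In this sketch, `currentFallback` mirrors the quoted expression and `guardedFallback` is one possible guarded variant; both names are hypothetical, and `applyHref` stands in for `applyEl?.href`.

```javascript
// Behaviour quoted in the review: the template runs even when jobId is ''.
const currentFallback = (applyHref, jobId) =>
  applyHref?.trim() ?? `https://www.linkedin.com/jobs/view/${jobId}/`;

// Guarded variant: build the listing URL only when a real job ID exists.
function guardedFallback(applyHref, jobId) {
  const external = applyHref?.trim() ?? '';
  if (external) return external;
  return jobId ? `https://www.linkedin.com/jobs/view/${jobId}/` : '';
}

console.log(currentFallback(undefined, '')); // https://www.linkedin.com/jobs/view//
console.log(guardedFallback(undefined, '')); // prints an empty line
console.log(guardedFallback(undefined, '123')); // https://www.linkedin.com/jobs/view/123/
```

With the guarded variant, a missing job ID yields an empty string, so a downstream non-empty check on the application URL can reject the listing instead of saving a malformed link.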

🐛 Proposed fallback fix
-      // if no apply url or easy apply button is present, use the listing url
-      let applicationUrl = applyEl?.href?.trim() ?? `https://www.linkedin.com/jobs/view/${jobId}/`;
+      // If no external apply URL is present, fall back only when a real job ID exists.
+      let applicationUrl = applyEl?.href?.trim() ?? '';
+      if (!applicationUrl && jobId) {
+        applicationUrl = `https://www.linkedin.com/jobs/view/${jobId}/`;
+      }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scan-auth/linkedin.mjs` around lines 624 - 626, The code currently
synthesizes a LinkedIn view URL even when jobId is missing; update the logic
around applicationUrl (the variable set from applyEl?.href?.trim() ??
`https://www.linkedin.com/jobs/view/${jobId}/`) to only construct the
`https://www.linkedin.com/jobs/view/${jobId}/` fallback when `jobId` (or
currentJobId) is truthy/non-empty; otherwise set applicationUrl to an empty
string (or null) so no fake URL is saved. Ensure you still trim applyEl.href if
present and prefer it over the view URL, but guard the template fallback with an
explicit check for jobId before building it.

Author

This logic only runs if an EasyApply button is available on the job page.


@DSnoNintendo, thanks for the context — that's a good clarification. You're right that in normal operation the ?? fallback is only reached for Easy Apply jobs, where there's no //a[@aria-label='Apply on company website'] element, and by that point a card has been clicked and the detail panel is open, so a valid currentJobId should always be present in the URL.

The concern is purely defensive: #clickAndExtractJobId already returns jobId: '' as a safety catch (e.g., if a try/catch fires or LinkedIn's URL structure changes), and that '' value is forwarded directly into #extractDetailFromPanel. If both conditions occur simultaneously — applyEl is null and jobId is '' — the current code produces https://www.linkedin.com/jobs/view// and it would silently pass the non-empty URL check downstream.

The proposed guard would add no overhead in the common case, but would prevent a malformed URL from being saved in that edge scenario:

let applicationUrl = applyEl?.href?.trim() ?? '';
if (!applicationUrl && jobId) {
  applicationUrl = `https://www.linkedin.com/jobs/view/${jobId}/`;
}

That said, if you're confident the calling site always supplies a non-empty jobId before reaching this function, I understand leaving it as-is. Happy to defer to your judgment here.


🧠 Learnings used
Learnt from: DSnoNintendo
Repo: santifer/career-ops PR: 379
File: scan-auth/linkedin.mjs:357-368
Timestamp: 2026-04-19T17:52:55.390Z
Learning: In `scan-auth/linkedin.mjs` (career-ops repo), LinkedIn job search filters (date posted, experience level) are applied via search query strings embedded in the keywords parameter, not via URL parameters like `f_TPR` or `f_E`. Do not flag the `#buildSearches` method for using query string text for filters — this is the current correct approach for LinkedIn job search URLs.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

♻️ Duplicate comments (1)
.claude/skills/career-ops/SKILL.md (1)

30-35: ⚠️ Potential issue | 🟠 Major

Keep the router docs aligned with the portal-required CLI.

The router and menu still advertise bare /career-ops scan-auth, but scan-auth.mjs exits unless a portal is provided. Either route bare scan-auth to an interactive/default portal flow, or document /career-ops scan-auth linkedin consistently.

Docs-only alignment
-| `scan-auth`, `scan-auth <portal>` | `scan-auth` |
+| `scan-auth <portal>` | `scan-auth` |
@@
-  /career-ops scan-auth → Authenticated portal scan (LinkedIn etc.)
+  /career-ops scan-auth linkedin → Authenticated portal scan (LinkedIn)

Also applies to: 63-63

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/career-ops/SKILL.md around lines 30 - 35, The router/menu
docs advertise a bare "scan-auth" command but scan-auth.mjs requires a portal
argument; update routing or docs to be consistent: either change the router/menu
entry for scan-auth to require a portal (e.g., show "scan-auth <portal>" and
examples like "scan-auth linkedin") or implement a default/interactive flow in
the scan-auth handler that routes bare "scan-auth" to prompt for or default to a
portal (e.g., invoke the same logic as scan-auth.mjs with a default portal or
interactive prompt). Locate references to "scan-auth", "scan-auth.mjs", and any
router/menu entries and make the change so the CLI usage in the docs and router
behavior match.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.claude/skills/career-ops/SKILL.md:
- Around line 83-89: MD022 violations: add a single blank line after the
headings "Standalone modes (only their mode file):" and "Modes delegated to
subagent:" so each heading is followed by an empty line before the next
paragraph. Edit the SKILL.md content near those headings and insert one blank
line after each heading (and also ensure any subsequent headings like "Read
`modes/{mode}.md`" or the list starting with "For `scan`..." remain separated),
keeping the existing text and references to `modes/{mode}.md` and `_shared.md`
unchanged.

In `@scan-auth.mjs`:
- Around line 273-293: The saveJd function currently builds filenames from only
company and title which causes collisions; update saveJd to append a stable
unique suffix (e.g., the LinkedIn job ID from detail.jobId or detail.id or
detail.listingId) to the slugified filename so each listing produces a unique
file (fall back to a timestamp only if no ID exists), ensure the chosen ID
field is sanitized the same way as the slug, and
keep returning the same jds/<filename> path; refer to saveJd, JDS_DIR, slug,
filename, filepath and detail.* to locate where to change the filename
generation and return value.
- Around line 75-97: Current parsing ignores unknown flags and extra positional
args which allows e.g. "node scan-auth.mjs linkedin --dry-run" to proceed;
change argument validation in the portalId/FLAGS logic to explicitly parse args
into two sets (recognized flags using FLAGS and positional args), reject any
unknown flag (error + process.exit(1)) and reject more than one positional
portal (error + process.exit(1)); update usage/error messages to include
supportedNames and mention invalid tokens; ensure FLAG.login is derived from the
parsed flags set and that SCANNERS[portalId] only runs after these validations
succeed.
- Around line 319-324: The "Next step" message printed after listing new results
is missing a connector after the command, making the sentence awkward; update the
console.log that prints the next-step instruction (the console.log after the loop
that references "/career-ops pipeline") to include a connector, e.g. change its
text to "Next step: run /career-ops pipeline to process jobs saved to the
pipeline." so the command and the rest of the sentence read correctly.

---

Duplicate comments:
In @.claude/skills/career-ops/SKILL.md:
- Around line 30-35: The router/menu docs advertise a bare "scan-auth" command
but scan-auth.mjs requires a portal argument; update routing or docs to be
consistent: either change the router/menu entry for scan-auth to require a
portal (e.g., show "scan-auth <portal>" and examples like "scan-auth linkedin")
or implement a default/interactive flow in the scan-auth handler that routes
bare "scan-auth" to prompt for or default to a portal (e.g., invoke the same
logic as scan-auth.mjs with a default portal or interactive prompt). Locate
references to "scan-auth", "scan-auth.mjs", and any router/menu entries and make
the change so the CLI usage in the docs and router behavior match.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1899db7a-2d86-4b7c-b192-2c00ed7d642a

📥 Commits

Reviewing files that changed from the base of the PR and between a60cf34 and 648b562.

📒 Files selected for processing (2)
  • .claude/skills/career-ops/SKILL.md
  • scan-auth.mjs

Comment on lines 83 to +89
### Standalone modes (only their mode file):
Read `modes/{mode}.md`

Applies to: `tracker`, `deep`, `training`, `project`, `patterns`, `followup`

### Modes delegated to subagent:
-For `scan`, `apply` (with Playwright), and `pipeline` (3+ URLs): launch as Agent with the content of `_shared.md` + `modes/{mode}.md` injected into the subagent prompt.
+For `scan`, `scan-auth`, `apply` (with Playwright), and `pipeline` (3+ URLs): launch as Agent with the content of `_shared.md` + `modes/{mode}.md` injected into the subagent prompt.

⚠️ Potential issue | 🟡 Minor

Restore markdownlint-required blank lines after headings.

MD022 expects a blank line below these headings; the current text starts immediately after each heading.

Suggested markdownlint fix
 ### Standalone modes (only their mode file):
+
 Read `modes/{mode}.md`
@@
 ### Modes delegated to subagent:
+
 For `scan`, `scan-auth`, `apply` (with Playwright), and `pipeline` (3+ URLs): launch as Agent with the content of `_shared.md` + `modes/{mode}.md` injected into the subagent prompt.
🧰 Tools
🪛 markdownlint-cli2 (0.22.0)

[warning] 83-83: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 88-88: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/career-ops/SKILL.md around lines 83 - 89, MD022 violations:
add a single blank line after the headings "Standalone modes (only their mode
file):" and "Modes delegated to subagent:" so each heading is followed by an
empty line before the next paragraph. Edit the SKILL.md content near those
headings and insert one blank line after each heading (and also ensure any
subsequent headings like "Read `modes/{mode}.md`" or the list starting with "For
`scan`..." remain separated), keeping the existing text and references to
`modes/{mode}.md` and `_shared.md` unchanged.

Comment thread scan-auth.mjs
Comment on lines +75 to +97
const FLAGS = new Set(['--login']);
const portalId = (() => {
  for (let i = 0; i < args.length; i++) {
    if (FLAGS.has(args[i])) continue;
    return args[i];
  }
  return null;
})();

if (!portalId) {
  console.error(`Usage: node scan-auth.mjs [options] <portal>\n\nSupported portals: ${supportedNames.join(', ')}`);
  process.exit(1);
}
if (!SCANNERS[portalId]) {
  console.error(`Unknown portal: "${portalId}"\nSupported portals: ${supportedNames.join(', ')}`);
  process.exit(1);
}

const scanner = SCANNERS[portalId];

const FLAG = {
  login: args.includes('--login'),
};

⚠️ Potential issue | 🟠 Major

Reject unsupported flags and extra positional args before scanning.

node scan-auth.mjs linkedin --dry-run currently scans and writes files because unknown args after the portal are ignored. Fail closed on unknown flags and multiple portals so typos or stale flags cannot trigger a mutating run.
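
A fail-closed parser along these lines might look like the sketch below. It assumes a hypothetical pure function `parseCliArgs` that throws instead of calling `process.exit`, which keeps it testable; the flag and portal names in the usage lines are examples only.

```javascript
// Hypothetical helper: validates everything before any scanner is selected.
function parseCliArgs(args, knownFlags, supportedPortals) {
  const flags = args.filter((a) => a.startsWith('--'));
  const positionals = args.filter((a) => !a.startsWith('--'));

  // Reject typos and stale flags such as --dry-run instead of ignoring them.
  const unknown = flags.filter((f) => !knownFlags.has(f));
  if (unknown.length > 0) {
    throw new Error(`Unknown option(s): ${unknown.join(', ')}`);
  }
  // Exactly one portal: zero means usage error, two or more is ambiguous.
  if (positionals.length !== 1) {
    throw new Error(`Expected exactly one portal, got ${positionals.length}`);
  }
  const [portalId] = positionals;
  if (!supportedPortals.includes(portalId)) {
    throw new Error(`Unknown portal: "${portalId}"`);
  }
  return { portalId, login: flags.includes('--login') };
}

const known = new Set(['--login']);
console.log(parseCliArgs(['linkedin', '--login'], known, ['linkedin']));
try {
  parseCliArgs(['linkedin', '--dry-run'], known, ['linkedin']);
} catch (err) {
  console.error(err.message); // Unknown option(s): --dry-run
}
```

A caller would catch the error at the top level, print it with the usage text, and exit non-zero before any file is written.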

Suggested CLI validation
 const FLAGS = new Set(['--login']);
+const unknownFlags = args.filter(arg => arg.startsWith('--') && !FLAGS.has(arg));
+if (unknownFlags.length > 0) {
+  console.error(`Unknown option(s): ${unknownFlags.join(', ')}\n\nSupported options: ${[...FLAGS].join(', ')}`);
+  process.exit(1);
+}
+
+const positionalArgs = args.filter(arg => !arg.startsWith('--'));
+if (positionalArgs.length > 1) {
+  console.error(`Expected one portal, got: ${positionalArgs.join(', ')}\nSupported portals: ${supportedNames.join(', ')}`);
+  process.exit(1);
+}
+
 const portalId = (() => {
   for (let i = 0; i < args.length; i++) {
     if (FLAGS.has(args[i])) continue;
     return args[i];
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scan-auth.mjs` around lines 75 - 97, Current parsing ignores unknown flags
and extra positional args which allows e.g. "node scan-auth.mjs linkedin
--dry-run" to proceed; change argument validation in the portalId/FLAGS logic to
explicitly parse args into two sets (recognized flags using FLAGS and positional
args), reject any unknown flag (error + process.exit(1)) and reject more than
one positional portal (error + process.exit(1)); update usage/error messages to
include supportedNames and mention invalid tokens; ensure FLAG.login is derived
from the parsed flags set and that SCANNERS[portalId] only runs after these
validations succeed.

Comment thread scan-auth.mjs
Comment on lines +273 to +293
function saveJd(detail) {
mkdirSync(JDS_DIR, { recursive: true });
const slug = slugify(`${detail.company}-${detail.title}`);
const filename = `${slug}.md`;
const filepath = join(JDS_DIR, filename);

const content = `---
title: ${yamlEscape(detail.title)}
company: ${yamlEscape(detail.company)}
application_url: ${yamlEscape(detail.applicationUrl || '')}
scraped: "${new Date().toISOString().split('T')[0]}"
source: ${portalId}
---

# ${detail.title} — ${detail.company}

${detail.jdText}
`;

writeFileSync(filepath, content, 'utf-8');
return `jds/${filename}`;

⚠️ Potential issue | 🟠 Major

Prevent JD filename collisions for same company/title listings.

saveJd() uses only company-title for the filename, so two accepted LinkedIn jobs with the same company and title overwrite the same jds/*.md file and both pipeline entries can point to the final write. Include a stable unique suffix such as the LinkedIn job ID.
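
One way to build a collision-resistant name is sketched below. The `slugify` shown here is a hypothetical stand-in (the real helper in scan-auth.mjs may behave differently), and the presence of a `jobId` field on the detail object is an assumption, as is the function name `jdFilename`.

```javascript
// Hypothetical slugify; the project's own helper may differ.
const slugify = (s) =>
  s.toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/^-+|-+$/g, '');

// Assumes the detail object carries a jobId; the suffix is omitted otherwise.
function jdFilename({ company, title, jobId }) {
  const base = slugify(`${company}-${title}`);
  const suffix = jobId ? `-${slugify(String(jobId))}` : '';
  return `${base}${suffix}.md`;
}

// Two listings with the same company and title no longer share a file.
console.log(jdFilename({ company: 'Acme', title: 'Data Engineer', jobId: '123' }));
// acme-data-engineer-123.md
console.log(jdFilename({ company: 'Acme', title: 'Data Engineer', jobId: '456' }));
// acme-data-engineer-456.md
```

Passing the ID through the same slug function keeps the filename free of path separators or other unexpected characters.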

Suggested collision-resistant filename
 function saveJd(detail) {
   mkdirSync(JDS_DIR, { recursive: true });
-  const slug = slugify(`${detail.company}-${detail.title}`);
+  const jobId = extractJobIdFromUrl(detail.url);
+  const slug = slugify(`${detail.company}-${detail.title}${jobId ? `-${jobId}` : ''}`);
   const filename = `${slug}.md`;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scan-auth.mjs` around lines 273 - 293, The saveJd function currently builds
filenames from only company and title which causes collisions; update saveJd to
append a stable unique suffix (e.g., the LinkedIn job ID from detail.jobId or
detail.id or detail.listingId) to the slugified filename so each listing
produces a unique file (fall back to a timestamp only if no ID exists), ensure
the chosen ID field is sanitized the same way as
the slug, and keep returning the same jds/<filename> path; refer to saveJd,
JDS_DIR, slug, filename, filepath and detail.* to locate where to change the
filename generation and return value.

Comment thread scan-auth.mjs
Comment on lines +319 to +324
if (results.listings.length > 0) {
console.log('\nNew listings:');
for (const l of results.listings) {
console.log(` • ${l.title} — ${l.company}`);
}
console.log(`\nNext step: run /career-ops pipeline jobs saved to the pipeline.`);

⚠️ Potential issue | 🟡 Minor

Clarify the next-step message.

Line 324 is missing the connector after the command, making the instruction awkward.

Suggested wording
-    console.log(`\nNext step: run /career-ops pipeline jobs saved to the pipeline.`);
+    console.log('\nNext step: run /career-ops pipeline to process jobs saved to the pipeline.');
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scan-auth.mjs` around lines 319 - 324, The "Next step" message printed after
listing new results is missing a connector after the command, making the sentence
awkward; update the console.log that prints the next-step instruction (the
console.log after the loop that references "/career-ops pipeline") to include a
connector, e.g. change its text to "Next step: run /career-ops pipeline to
process jobs saved to the pipeline." so the command and the rest of the sentence
read correctly.
