fix(liveness): detect closed postings that were slipping through #374

Open
osamaKhalid122 wants to merge 2 commits into santifer:main from osamaKhalid122:fix/liveness-detect-closed-postings

Conversation

@osamaKhalid122 osamaKhalid122 commented Apr 19, 2026

Closed job postings were being shortlisted because classifyLiveness failed in two ways:

  1. HARD_EXPIRED_PATTERNS didn't cover the banner text "Applications have closed for this job" (seen on mycareersfuture.gov.sg and similar ATS) or "Closed on {date}" timestamps.
  2. When no expired pattern matched and no apply button was visible, the function returned uncertain — letting the posting proceed to evaluation. An active posting virtually always has an apply button; content with no apply control is effectively closed.

Adds three patterns and flips the no-apply-button fallback from uncertain to expired.
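
The two failure modes described above can be sketched in a few lines. This is a minimal, hypothetical stand-in and not the code in liveness-core.mjs: the name `classifySketch`, the `noApplyFallback` parameter, the single-pattern list, and the assumption that `applyControls` is an array of button labels are all illustrative.

```javascript
// Hypothetical, simplified stand-in for the classifier described in this PR.
// Only one hard-expired pattern is listed here, for illustration.
const HARD_EXPIRED_SKETCH = [/applications? (have|are) closed/i];

// noApplyFallback is an illustrative knob: 'uncertain' models the old
// behavior, 'expired' models the flipped fallback proposed in this PR.
function classifySketch({ bodyText = '', applyControls = [] }, noApplyFallback = 'uncertain') {
  // 1. Hard-expired banner text wins over everything else.
  if (HARD_EXPIRED_SKETCH.some((re) => re.test(bodyText))) {
    return { result: 'expired', reason: 'hard-expired text matched' };
  }
  // 2. A visible apply control marks the posting as active.
  if (applyControls.length > 0) {
    return { result: 'active', reason: 'apply control visible' };
  }
  // 3. Content but no apply control: the verdict depends on the fallback.
  return { result: noApplyFallback, reason: 'content present but no visible apply control' };
}

const pageWithoutApply = { bodyText: 'Senior Engineer. Responsibilities and requirements.', applyControls: [] };
console.log(classifySketch(pageWithoutApply).result);            // old fallback
console.log(classifySketch(pageWithoutApply, 'expired').result); // flipped fallback
```

Under these assumptions, the same page moves from one verdict to the other purely because of the fallback, which is why the choice of fallback matters as much as the new patterns.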

What does this PR do?

Related issue

Type of change

  • Bug fix
  • New feature
  • Documentation / translation
  • Refactor (no behavior change)

Checklist

  • I have read CONTRIBUTING.md
  • I linked a related issue above (required for features and architecture changes)
  • My PR does not include personal data (CV, email, real names)
  • I ran node test-all.mjs and all tests pass
  • My changes respect the Data Contract (no modifications to user-layer files)
  • My changes align with the project roadmap

Questions? Join the Discord for faster feedback.

Summary by CodeRabbit

  • Bug Fixes
    • Improved detection of closed/expired job postings via expanded text-pattern matching, including "closed on {date}" variants.
    • Maintains existing behavior for apply-control, listing-page, and insufficient-content cases to avoid false expired classifications.
  • Tests
    • Added a targeted test case to ensure correct classification of a closed job posting example.

coderabbitai Bot commented Apr 19, 2026

Walkthrough

Added three new hard-expired body-text regexes to the liveness classifier to detect “applications/positions are closed” and “closed on {date}” phrasing; this expands conditions under which classifyLiveness may return { result: 'expired' }. No other control-flow branches were modified.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Core Classification Logic: liveness-core.mjs | Added three additional regexes to HARD_EXPIRED_PATTERNS to match variations of "applications/positions are closed" and "closed on {date}" in body text. No changes to function signatures or other classification branches. |
| Tests: test-all.mjs | Added a new liveness test case invoking classifyLiveness with a MyCareersFuture closed-job example (multi-line bodyText, finalUrl, and applyControls) and an assertion that the result is expired. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main change: adding detection for closed job postings that were previously missed. It directly relates to the core objective of expanding HARD_EXPIRED_PATTERNS to catch additional closed posting phrasings. |

@github-actions github-actions Bot left a comment

Welcome to career-ops, @osamaKhalid122! Thanks for your first PR.

A few things to know:

  • Tests will run automatically — check the status below
  • Make sure you've linked a related issue (required for features)
  • Read CONTRIBUTING.md if you haven't

We'll review your PR soon. Join our Discord if you have questions.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@liveness-core.mjs`:
- Line 77: The fallback in classifyLiveness currently returns {result:
'expired', ...} which removes the documented 'uncertain' verdict; change the
fallback return inside classifyLiveness back to {result: 'uncertain', reason:
'content present but no visible apply control — likely closed posting'} so the
triage signal is preserved, then update check-liveness.mjs to rely on that
verdict (its uncertain counter/icon will now increment) and add a test in
test-all.mjs that triggers this fallback path, plus ensure docs/SCRIPTS.md
continues to document active | expired | uncertain.
- Around line 12-14: The liveness regexes are too broad and partially redundant:
replace the loose `/closed on \w+ \d+/i` with a stricter date-aware pattern
(match actual month names or ISO-like numeric dates, e.g., month + day or
YYYY-MM-DD) and consolidate the overlapping patterns `/applications? (have|are)
closed/i` and `/applications? closed for this job/i` into a single, combined
pattern that also covers "applications closed on …" variants; update the pattern
collection referenced by classifyLiveness (and any variable holding those
regexes) in liveness-core.mjs accordingly, change the classifyLiveness fallback
to return 'uncertain' instead of 'expired' when confidence is low so it aligns
with check-liveness.mjs/docs, and add a regression test in test-all.mjs that
includes the mycareersfuture.gov.sg phrasing to protect these changes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3c9cc4ba-8644-48ca-953b-d75a16462b9e

📥 Commits

Reviewing files that changed from the base of the PR and between 10c496c and efb13af.

📒 Files selected for processing (1)
  • liveness-core.mjs

Comment thread liveness-core.mjs Outdated
Comment on lines +12 to +14
/applications? (have|are) closed/i,
/applications? closed for this job/i,
/closed on \w+ \d+/i,

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm there's no existing test exercising the new patterns, and surface any sample fixtures that might contain "closed on <non-date>" substrings.
rg -nP -C2 'applications?\s+(have|are)?\s*closed|closed on\s+\w+\s+\d+' --type=js --type=md --type=mjs 2>/dev/null
fd -e mjs -e md | xargs rg -nP 'classifyLiveness|HARD_EXPIRED_PATTERNS' 2>/dev/null

Repository: santifer/career-ops

Length of output: 747


🏁 Script executed:

cat -n liveness-core.mjs | head -80

Repository: santifer/career-ops

Length of output: 2926


🏁 Script executed:

wc -l test-all.mjs

Repository: santifer/career-ops

Length of output: 79


🏁 Script executed:

fd -e md docs/SCRIPTS.md

Repository: santifer/career-ops

Length of output: 45


🏁 Script executed:

cat -n test-all.mjs | sed -n '85,120p'

Repository: santifer/career-ops

Length of output: 1631


🏁 Script executed:

rg -n "uncertain" liveness-core.mjs check-liveness.mjs docs/ 2>/dev/null || echo "No matches for 'uncertain'"

Repository: santifer/career-ops

Length of output: 725


🏁 Script executed:

rg -l "SCRIPTS.md|documented|API" --type=md

Repository: santifer/career-ops

Length of output: 360


🏁 Script executed:

rg -n "uncertain" liveness-core.mjs

Repository: santifer/career-ops

Length of output: 45


🏁 Script executed:

git log --oneline -10 2>/dev/null | head -20

Repository: santifer/career-ops

Length of output: 135


🏁 Script executed:

git show HEAD:liveness-core.mjs 2>/dev/null | head -80 || echo "Git history not available"

Repository: santifer/career-ops

Length of output: 2380


🏁 Script executed:

git show HEAD~1:liveness-core.mjs 2>/dev/null | tail -5 || echo "No prior version available"

Repository: santifer/career-ops

Length of output: 45


🏁 Script executed:

git diff HEAD~1 HEAD -- liveness-core.mjs 2>/dev/null | head -50 || echo "No diff available"

Repository: santifer/career-ops

Length of output: 45


🏁 Script executed:

git log -p --follow liveness-core.mjs 2>/dev/null | head -200 || echo "Git history limited"

Repository: santifer/career-ops

Length of output: 45


🏁 Script executed:

rg -i "mycareersfuture|closed on" --type=md README.md CLAUDE.md 2>/dev/null || echo "No references found"

Repository: santifer/career-ops

Length of output: 82


🏁 Script executed:

# Check if 'uncertain' ever appears as a valid return value in the current code
rg -A5 -B5 "result.*uncertain" liveness-core.mjs

Repository: santifer/career-ops

Length of output: 45


The /closed on \w+ \d+/i regex is too broad and will cause false positives; patterns at lines 12–13 are redundant.

\w+ \d+ is not restricted to dates; it matches any single word followed by digits: "closed on page 3", "closed on Issue 123", "closed on floor 2", "closed on Monday 25" (e.g., in a holiday notice in footer text or unrelated company news). Anchoring to an actual month name or date format is essential to avoid marking otherwise active postings as expired.

Lines 12–13 overlap and can be consolidated: /applications? (have|are) closed/i and /applications? closed for this job/i can merge into a single pattern that also covers "applications closed on …" variants.

🛠️ Suggested tightening
-  /applications? (have|are) closed/i,
-  /applications? closed for this job/i,
-  /closed on \w+ \d+/i,
+  /applications? (?:have |are )?closed\b/i,
+  /closed on (?:\d{1,2}[\/\-.]\d{1,2}(?:[\/\-.]\d{2,4})?|(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*\s+\d{1,2})/i,
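
To make the difference concrete, here is a small sketch that runs the loose pattern and the tightened "closed on" pattern from the suggestion above against the phrases mentioned in this comment. The helper name `comparePatterns` and the sample list are illustrative, not part of the repository.

```javascript
// The loose pattern under review and the tightened pattern suggested above.
const loosePattern = /closed on \w+ \d+/i;
const tightenedPattern =
  /closed on (?:\d{1,2}[\/\-.]\d{1,2}(?:[\/\-.]\d{2,4})?|(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*\s+\d{1,2})/i;

// Illustrative helper: report which pattern matches a given phrase.
function comparePatterns(text) {
  return { loose: loosePattern.test(text), tightened: tightenedPattern.test(text) };
}

const samples = [
  'closed on page 3',     // non-date
  'closed on Issue 123',  // non-date
  'closed on floor 2',    // non-date
  'closed on Monday 25',  // weekday, not a month name
  'Closed on Nov 26',     // month-first date
  'Closed on 11/26/2025', // numeric date
];

for (const text of samples) {
  console.log(text, comparePatterns(text));
}
```

One thing this exercise surfaces: neither pattern matches the day-first form "Closed on 26 Nov 2025". After "26 " the loose pattern expects digits, and the tightened one expects either a numeric separator or a leading month name.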

Also: line 77 currently returns 'expired' as a fallback, but the documented API (docs/SCRIPTS.md) lists 'uncertain' as a valid verdict. The fallback should either return 'uncertain' for low-confidence cases (missing apply control but no hard-expired signal) or the documentation should be updated to reflect that only 'active' and 'expired' are possible outputs. Currently, check-liveness.mjs has infrastructure to count and report 'uncertain' verdicts, but that code path is unreachable from classifyLiveness.

Add a test case in test-all.mjs covering the mycareersfuture.gov.sg wording referenced in the PR description so these new patterns are regression-guarded.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@liveness-core.mjs` around lines 12 - 14, The liveness regexes are too broad
and partially redundant: replace the loose `/closed on \w+ \d+/i` with a
stricter date-aware pattern (match actual month names or ISO-like numeric dates,
e.g., month + day or YYYY-MM-DD) and consolidate the overlapping patterns
`/applications? (have|are) closed/i` and `/applications? closed for this job/i`
into a single, combined pattern that also covers "applications closed on …"
variants; update the pattern collection referenced by classifyLiveness (and any
variable holding those regexes) in liveness-core.mjs accordingly, change the
classifyLiveness fallback to return 'uncertain' instead of 'expired' when
confidence is low so it aligns with check-liveness.mjs/docs, and add a
regression test in test-all.mjs that includes the mycareersfuture.gov.sg
phrasing to protect these changes.

Comment thread liveness-core.mjs Outdated
@osamaKhalid122 osamaKhalid122 force-pushed the fix/liveness-detect-closed-postings branch from efb13af to ac183b8 Compare April 19, 2026 08:42

@coderabbitai coderabbitai Bot left a comment

♻️ Duplicate comments (1)
liveness-core.mjs (1)

12-14: ⚠️ Potential issue | 🟠 Major

Line 14 pattern /closed on \w+ \d+/i is too permissive and will mis-classify active postings as expired.

\w+ \d+ matches any word followed by any digits — not just dates. Real-world false positives on active job pages include footer/sidebar/news text like:

  • "Issue closed on GitHub 3" / "closed on issue 42"
  • "Office closed on floor 2"
  • "Ticket closed on case 1234"
  • "Closed on Monday 25" (holiday notice, non-date context)

Since this pattern short-circuits classification to expired before the apply-control check (line 59-62 runs before line 64), a single stray occurrence anywhere in bodyText will suppress an otherwise active posting. Anchor to an actual date format (month name or numeric date) to avoid collateral damage.

🛠️ Proposed tightening
   /applications? (have|are) closed/i,
   /applications? closed for this job/i,
-  /closed on \w+ \d+/i,
+  /closed on (?:\d{1,2}[\/\-.]\d{1,2}(?:[\/\-.]\d{2,4})?|\d{4}-\d{2}-\d{2}|(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*\.?\s+\d{1,2})/i,
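
The short-circuit concern can be illustrated with a sketch in which the pattern list is a parameter. Everything here is an assumption made for illustration: `classifyWith`, the shape of `applyControls` (an array of button labels), the `'uncertain'` fallback, and the sample page text. Only the two regexes are taken from this thread.

```javascript
// Hypothetical classifier: hard-expired patterns are checked BEFORE apply
// controls, mirroring the ordering described in this comment.
function classifyWith(patterns, bodyText, applyControls) {
  if (patterns.some((re) => re.test(bodyText))) return 'expired';
  if (applyControls.length > 0) return 'active';
  return 'uncertain'; // assumed fallback for this sketch
}

const loose = [/closed on \w+ \d+/i];
const proposed = [
  /closed on (?:\d{1,2}[\/\-.]\d{1,2}(?:[\/\-.]\d{2,4})?|\d{4}-\d{2}-\d{2}|(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*\.?\s+\d{1,2})/i,
];

// An active posting whose footer happens to contain a stray "closed on" phrase.
const activePage = 'Software Engineer\nApply now\nVisitor desk closed on floor 2 during renovation.';

console.log(classifyWith(loose, activePage, ['Apply now']));    // stray text wins
console.log(classifyWith(proposed, activePage, ['Apply now'])); // apply control wins
```

With the loose pattern the stray footer sentence decides the verdict before the apply control is ever consulted; with the date-anchored pattern the same page falls through to the apply-control check.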

Also worth adding a regression fixture in test-all.mjs with the mycareersfuture.gov.sg wording from the PR description so these patterns are guarded against future edits.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@liveness-core.mjs` around lines 12 - 14, The regex `/closed on \w+ \d+/i` in
the patterns array inside liveness-core.mjs is too permissive and causes false
positives; replace it with a tighter pattern that anchors to real date formats
(e.g., month names or numeric dates like `closed on (?:jan|feb|...)\s+\d{1,2}`
or `closed on \d{1,2}[\/\-]\d{1,2}(?:[\/\-]\d{2,4})?`), update the patterns
array entry accordingly, and add a regression fixture in test-all.mjs using the
mycareersfuture.gov.sg example to ensure apply-control (the early
expired-classification logic referenced around the apply-control check) no
longer misclassifies active postings.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: c2838f9c-4706-4f9d-a051-781dd4a597bc

📥 Commits

Reviewing files that changed from the base of the PR and between efb13af and ac183b8.

📒 Files selected for processing (1)
  • liveness-core.mjs

Closed job postings (e.g. MaxLinear on mycareersfuture.gov.sg) were
being shortlisted because HARD_EXPIRED_PATTERNS in classifyLiveness
didn't cover the banner text "Applications have closed for this job"
or "Closed on {date}" timestamps. APPLY_PATTERNS then false-positived
on the disabled "Log in to Apply" button and the posting was
classified as active.

Adds three patterns to HARD_EXPIRED_PATTERNS:
- applications (have|are|is)? closed — consolidated variant covering
  "application closed", "applications have/are closed", and
  "applications closed for this job"
- closed on {day} {month} — matches e.g. "Closed on 26 Nov 2025"
- closed on {month} {day} — matches e.g. "Closed on Nov 26"

Month names are explicit to avoid false positives on phrases like
"closed on page 3" or "closed on Monday 25".

Adds a regression test in test-all.mjs against the mycareersfuture
wording.
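
As a quick check of the behavior this commit message describes, the sketch below runs the three patterns (copied from the diff quoted in the following review thread) against the phrases named above. The helper `isHardExpired` and the array name are illustrative and are not the repository's own identifiers.

```javascript
// The three patterns as they appear in the reviewed diff.
const NEW_PATTERNS = [
  /applications?\s+(?:(?:have|are|is)\s+)?closed/i,
  /closed on \d{1,2}\s+(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)/i,
  /closed on (?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\w*\s+\d{1,2}/i,
];

// Illustrative helper: true when any of the new patterns matches the text.
function isHardExpired(text) {
  return NEW_PATTERNS.some((re) => re.test(text));
}

const shouldMatch = [
  'Applications have closed for this job',
  'Applications are closed',
  'application closed',
  'Closed on 26 Nov 2025',
  'Closed on Nov 26',
];
const shouldNotMatch = ['closed on page 3', 'closed on Monday 25'];

for (const text of shouldMatch) console.log('match expected:', text, isHardExpired(text));
for (const text of shouldNotMatch) console.log('no match expected:', text, isHardExpired(text));
```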
@osamaKhalid122 osamaKhalid122 force-pushed the fix/liveness-detect-closed-postings branch from ac183b8 to 824b92e Compare April 19, 2026 11:57

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@liveness-core.mjs`:
- Around line 12-14: HARD_EXPIRED_PATTERNS currently handles written-month dates
but misses ISO and numeric formats; update the pattern list used by
HARD_EXPIRED_PATTERNS to include regexes that match "Closed on YYYY-MM-DD"
(e.g., /closed on\s+\d{4}-\d{2}-\d{2}/i) and "Closed on MM/DD/YYYY" (e.g.,
/closed on\s+\d{1,2}\/\d{1,2}\/\d{4}/i), keeping the same case-insensitive flag
and placement so these checks still run before the hasApplyControl logic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 972e35d8-e2b7-416d-ae27-46570d46c6c4

📥 Commits

Reviewing files that changed from the base of the PR and between ac183b8 and 824b92e.

📒 Files selected for processing (2)
  • liveness-core.mjs
  • test-all.mjs

Comment thread liveness-core.mjs
Comment on lines +12 to +14
/applications?\s+(?:(?:have|are|is)\s+)?closed/i,
/closed on \d{1,2}\s+(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)/i,
/closed on (?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\w*\s+\d{1,2}/i,

🧹 Nitpick | 🔵 Trivial

Patterns look good; one minor gap on ISO-format close dates.

The three new patterns address the prior review feedback well: L12 consolidates "applications have/are/is closed" variants, and L13/L14 anchor "closed on …" against explicit month abbreviations in either order, avoiding the false-positive risk of the earlier \w+ \d+ approach. Ordering is also correct — HARD_EXPIRED_PATTERNS is checked at line 59 before hasApplyControl at line 64, so banners like "Applications have closed" correctly override a lingering "Log in to Apply" button.

One small gap worth noting: neither L13 nor L14 matches ISO-style "Closed on 2025-11-26" or "Closed on 11/26/2025" timestamps that some ATS emit. Not a blocker for this PR (the reported MyCareersFuture case uses "26 Nov 2025"), but worth considering a follow-up if you see similar misses from other ATS.

Optional extension to cover numeric date formats
   /closed on \d{1,2}\s+(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)/i,
   /closed on (?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\w*\s+\d{1,2}/i,
+  /closed on \d{4}-\d{1,2}-\d{1,2}/i,
+  /closed on \d{1,2}[\/.\-]\d{1,2}[\/.\-]\d{2,4}/i,
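
A small sketch of how the two optional numeric patterns above behave on the date styles mentioned in this comment. The helper `matchesNumericCloseDate` and the constant names are illustrative only.

```javascript
// The two optional patterns from the suggestion above.
const isoClosePattern = /closed on \d{4}-\d{1,2}-\d{1,2}/i;
const numericClosePattern = /closed on \d{1,2}[\/.\-]\d{1,2}[\/.\-]\d{2,4}/i;

// Illustrative helper: true when either numeric pattern matches the text.
function matchesNumericCloseDate(text) {
  return isoClosePattern.test(text) || numericClosePattern.test(text);
}

console.log(matchesNumericCloseDate('Closed on 2025-11-26'));  // ISO style
console.log(matchesNumericCloseDate('Closed on 11/26/2025'));  // slash style
console.log(matchesNumericCloseDate('Closed on 26.11.2025'));  // dot style
console.log(matchesNumericCloseDate('Closed on 26 Nov 2025')); // left to the month-name patterns
console.log(matchesNumericCloseDate('closed on page 3'));      // non-date
```

The written-month form is intentionally not covered here; it is handled by the two month-name patterns already in the diff.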
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@liveness-core.mjs` around lines 12 - 14, HARD_EXPIRED_PATTERNS currently
handles written-month dates but misses ISO and numeric formats; update the
pattern list used by HARD_EXPIRED_PATTERNS to include regexes that match "Closed
on YYYY-MM-DD" (e.g., /closed on\s+\d{4}-\d{2}-\d{2}/i) and "Closed on
MM/DD/YYYY" (e.g., /closed on\s+\d{1,2}\/\d{1,2}\/\d{4}/i), keeping the same
case-insensitive flag and placement so these checks still run before the
hasApplyControl logic.
