fix(liveness): detect closed postings that were slipping through #374
osamaKhalid122 wants to merge 2 commits into santifer:main
📝 Walkthrough
Added three new hard-expired body-text regexes to the liveness classifier to detect “applications/positions are closed” and “closed on {date}” phrasing; this expands the conditions under which …
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Welcome to career-ops, @osamaKhalid122! Thanks for your first PR.
A few things to know:
- Tests will run automatically — check the status below
- Make sure you've linked a related issue (required for features)
- Read CONTRIBUTING.md if you haven't
We'll review your PR soon. Join our Discord if you have questions.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@liveness-core.mjs`:
- Line 77: The fallback in classifyLiveness currently returns {result:
'expired', ...} which removes the documented 'uncertain' verdict; change the
fallback return inside classifyLiveness back to {result: 'uncertain', reason:
'content present but no visible apply control — likely closed posting'} so the
triage signal is preserved, then update check-liveness.mjs to rely on that
verdict (its uncertain counter/icon will now increment) and add a test in
test-all.mjs that triggers this fallback path, plus ensure docs/SCRIPTS.md
continues to document active | expired | uncertain.
- Around line 12-14: The liveness regexes are too broad and partially redundant:
replace the loose `/closed on \w+ \d+/i` with a stricter date-aware pattern
(match actual month names or ISO-like numeric dates, e.g., month + day or
YYYY-MM-DD) and consolidate the overlapping patterns `/applications? (have|are)
closed/i` and `/applications? closed for this job/i` into a single, combined
pattern that also covers "applications closed on …" variants; update the pattern
collection referenced by classifyLiveness (and any variable holding those
regexes) in liveness-core.mjs accordingly, change the classifyLiveness fallback
to return 'uncertain' instead of 'expired' when confidence is low so it aligns
with check-liveness.mjs/docs, and add a regression test in test-all.mjs that
includes the mycareersfuture.gov.sg phrasing to protect these changes.
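The decision order these comments describe can be sketched as follows. This is a minimal illustration, not the code in liveness-core.mjs: `classifyLivenessSketch`, its `{ bodyText, applyControls }` input shape, and the two short pattern lists are assumptions made for the example, and the reason strings are placeholders.

```javascript
// Hypothetical stand-ins; the real pattern lists in liveness-core.mjs are longer.
const HARD_EXPIRED_PATTERNS = [/applications? (?:have |are )?closed\b/i];
const APPLY_PATTERNS = [/\bapply\b/i];

// Sketch of the decision order: hard-expired text wins, then a visible
// apply control, then the low-confidence fallback.
function classifyLivenessSketch({ bodyText = '', applyControls = [] } = {}) {
  if (HARD_EXPIRED_PATTERNS.some((re) => re.test(bodyText))) {
    return { result: 'expired', reason: 'hard-expired phrase in body text' };
  }
  const hasApplyControl = applyControls.some((label) =>
    APPLY_PATTERNS.some((re) => re.test(label)),
  );
  if (hasApplyControl) {
    return { result: 'active', reason: 'visible apply control' };
  }
  // Low confidence: content is present but nothing says "closed" outright,
  // so keep the verdict distinct from 'expired' for manual triage.
  return { result: 'uncertain', reason: 'content present but no visible apply control' };
}
```

Keeping the third verdict means a caller that counts `uncertain` results still has a reachable path, while a closed banner continues to override a lingering "Log in to Apply" label.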
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 3c9cc4ba-8644-48ca-953b-d75a16462b9e
📒 Files selected for processing (1)
liveness-core.mjs
/applications? (have|are) closed/i,
/applications? closed for this job/i,
/closed on \w+ \d+/i,
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Confirm there's no existing test exercising the new patterns, and surface any sample fixtures that might contain "closed on <non-date>" substrings.
rg -nP -C2 'applications?\s+(have|are)?\s*closed|closed on\s+\w+\s+\d+' --type=js --type=md --type=mjs 2>/dev/null
fd -e mjs -e md | xargs rg -nP 'classifyLiveness|HARD_EXPIRED_PATTERNS' 2>/dev/null
Repository: santifer/career-ops
Length of output: 747
🏁 Script executed:
cat -n liveness-core.mjs | head -80
Repository: santifer/career-ops
Length of output: 2926
🏁 Script executed:
wc -l test-all.mjs
Repository: santifer/career-ops
Length of output: 79
🏁 Script executed:
fd -e md docs/SCRIPTS.md
Repository: santifer/career-ops
Length of output: 45
🏁 Script executed:
cat -n test-all.mjs | sed -n '85,120p'
Repository: santifer/career-ops
Length of output: 1631
🏁 Script executed:
rg -n "uncertain" liveness-core.mjs check-liveness.mjs docs/ 2>/dev/null || echo "No matches for 'uncertain'"
Repository: santifer/career-ops
Length of output: 725
🏁 Script executed:
rg -l "SCRIPTS.md|documented|API" --type=md
Repository: santifer/career-ops
Length of output: 360
🏁 Script executed:
rg -n "uncertain" liveness-core.mjs
Repository: santifer/career-ops
Length of output: 45
🏁 Script executed:
git log --oneline -10 2>/dev/null | head -20
Repository: santifer/career-ops
Length of output: 135
🏁 Script executed:
git show HEAD:liveness-core.mjs 2>/dev/null | head -80 || echo "Git history not available"
Repository: santifer/career-ops
Length of output: 2380
🏁 Script executed:
git show HEAD~1:liveness-core.mjs 2>/dev/null | tail -5 || echo "No prior version available"
Repository: santifer/career-ops
Length of output: 45
🏁 Script executed:
git diff HEAD~1 HEAD -- liveness-core.mjs 2>/dev/null | head -50 || echo "No diff available"
Repository: santifer/career-ops
Length of output: 45
🏁 Script executed:
git log -p --follow liveness-core.mjs 2>/dev/null | head -200 || echo "Git history limited"
Repository: santifer/career-ops
Length of output: 45
🏁 Script executed:
rg -i "mycareersfuture|closed on" --type=md README.md CLAUDE.md 2>/dev/null || echo "No references found"
Repository: santifer/career-ops
Length of output: 82
🏁 Script executed:
# Check if 'uncertain' ever appears as a valid return value in the current code
rg -A5 -B5 "result.*uncertain" liveness-core.mjs
Repository: santifer/career-ops
Length of output: 45
The /closed on \w+ \d+/i regex is too broad and will cause false positives; patterns at lines 12–13 are redundant.
\w+ \d+ is not specific to dates; it matches any single word followed by a space and one or more digits: "closed on page 3", "closed on Issue 123", "closed on floor 2", "closed on Monday 25" (e.g., in a holiday notice in footer text or unrelated company news). Anchoring to an actual month name or date format is essential to avoid marking otherwise active postings as expired.
Lines 12–13 overlap and can be consolidated: /applications? (have|are) closed/i and /applications? closed for this job/i can merge into a single pattern that also covers "applications closed on …" variants.
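A quick way to see the breadth of the loose pattern is to run it over a few strings. The sample strings below are illustrative (taken from or modeled on the examples in this comment), not fixtures from the repository.

```javascript
// The loose pattern under review, copied as written in the diff.
const loosePattern = /closed on \w+ \d+/i;

// Illustrative inputs: unrelated footer/news text plus two date styles.
const samples = [
  'Ticket closed on page 3',
  'Office closed on floor 2',
  'Closed on Monday 25',
  'Closed on Nov 26',
  'Closed on 26 Nov 2025',
];

for (const text of samples) {
  const verdict = loosePattern.test(text) ? 'match   ' : 'no match';
  console.log(`${verdict}  ${text}`);
}
```

The first four strings match, including the three that are not dates. The day-first form "Closed on 26 Nov 2025" does not match, because after `\w+` consumes "26" the pattern requires digits where "Nov" appears, so the loose regex is both too broad and too narrow for that date style.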
🛠️ Suggested tightening
- /applications? (have|are) closed/i,
- /applications? closed for this job/i,
- /closed on \w+ \d+/i,
+ /applications? (?:have |are )?closed\b/i,
+ /closed on (?:\d{1,2}[\/\-.]\d{1,2}(?:[\/\-.]\d{2,4})?|(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*\s+\d{1,2})/i,
Also: line 77 currently returns 'expired' as a fallback, but the documented API (docs/SCRIPTS.md) lists 'uncertain' as a valid verdict. The fallback should either return 'uncertain' for low-confidence cases (missing apply control but no hard-expired signal) or the documentation should be updated to reflect that only 'active' and 'expired' are possible outputs. Currently, check-liveness.mjs has infrastructure to count and report 'uncertain' verdicts, but that code path is unreachable from classifyLiveness.
Add a test case in test-all.mjs covering the mycareersfuture.gov.sg wording referenced in the PR description so these new patterns are regression-guarded.
efb13af to ac183b8
♻️ Duplicate comments (1)
liveness-core.mjs (1)
12-14: ⚠️ Potential issue | 🟠 Major
Line 14 pattern /closed on \w+ \d+/i is too permissive and will mis-classify active postings as expired.
\w+ \d+ matches any word followed by any digits, not just dates. Real-world false positives on active job pages include footer/sidebar/news text like:
- "Issue closed on GitHub 3" / "closed on issue 42"
- "Office closed on floor 2"
- "Ticket closed on case 1234"
- "Closed on Monday 25" (holiday notice, non-date context)
Since this pattern short-circuits classification to expired before the apply-control check (line 59-62 runs before line 64), a single stray occurrence anywhere in bodyText will suppress an otherwise active posting. Anchor to an actual date format (month name or numeric date) to avoid collateral damage.
🛠️ Proposed tightening
  /applications? (have|are) closed/i,
  /applications? closed for this job/i,
- /closed on \w+ \d+/i,
+ /closed on (?:\d{1,2}[\/\-.]\d{1,2}(?:[\/\-.]\d{2,4})?|\d{4}-\d{2}-\d{2}|(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*\.?\s+\d{1,2})/i,
Also worth adding a regression fixture in test-all.mjs with the mycareersfuture.gov.sg wording from the PR description so these patterns are guarded against future edits.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@liveness-core.mjs` around lines 12 - 14, The regex `/closed on \w+ \d+/i` in the patterns array inside liveness-core.mjs is too permissive and causes false positives; replace it with a tighter pattern that anchors to real date formats (e.g., month names or numeric dates like `closed on (?:jan|feb|...)\s+\d{1,2}` or `closed on \d{1,2}[\/\-]\d{1,2}(?:[\/\-]\d{2,4})?`), update the patterns array entry accordingly, and add a regression fixture in test-all.mjs using the mycareersfuture.gov.sg example to ensure apply-control (the early expired-classification logic referenced around the apply-control check) no longer misclassifies active postings.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: c2838f9c-4706-4f9d-a051-781dd4a597bc
📒 Files selected for processing (1)
liveness-core.mjs
Closed job postings (e.g. MaxLinear on mycareersfuture.gov.sg) were
being shortlisted because HARD_EXPIRED_PATTERNS in classifyLiveness
didn't cover the banner text "Applications have closed for this job"
or "Closed on {date}" timestamps. APPLY_PATTERNS then false-positived
on the disabled "Log in to Apply" button and the posting was
classified as active.
Adds three patterns to HARD_EXPIRED_PATTERNS:
- applications (have|are|is)? closed — consolidated variant covering
"application closed", "applications have/are closed", and
"applications closed for this job"
- closed on {day} {month} — matches e.g. "Closed on 26 Nov 2025"
- closed on {month} {day} — matches e.g. "Closed on Nov 26"
Month names are explicit to avoid false positives on phrases like
"closed on page 3" or "closed on Monday 25".
Adds a regression test in test-all.mjs against the mycareersfuture
wording.
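The behaviour described in this commit message can be checked with a short script. The three regexes are copied from the review diff for liveness-core.mjs; the `closedPostingPatterns` array name and the `looksClosed` helper are hypothetical names introduced only for this sketch.

```javascript
// Patterns as they appear in the review diff for this commit.
const closedPostingPatterns = [
  /applications?\s+(?:(?:have|are|is)\s+)?closed/i,
  /closed on \d{1,2}\s+(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)/i,
  /closed on (?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\w*\s+\d{1,2}/i,
];

// Hypothetical helper: true when any hard-expired phrase appears in the text.
const looksClosed = (bodyText) =>
  closedPostingPatterns.some((re) => re.test(bodyText));

// Banner and timestamp wording from the commit message, plus the
// non-date phrases the explicit month names are meant to exclude.
const samples = [
  'Applications have closed for this job',
  'Closed on 26 Nov 2025',
  'Closed on Nov 26',
  'closed on page 3',
  'closed on Monday 25',
];

for (const text of samples) {
  console.log(`${looksClosed(text) ? 'closed' : 'open? '}  ${text}`);
}
```

The first three samples are reported as closed and the last two are not, which is the split the commit message aims for. "Monday" is not caught because "mon" is not one of the listed month prefixes.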
ac183b8 to 824b92e
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@liveness-core.mjs`:
- Around line 12-14: HARD_EXPIRED_PATTERNS currently handles written-month dates
but misses ISO and numeric formats; update the pattern list used by
HARD_EXPIRED_PATTERNS to include regexes that match "Closed on YYYY-MM-DD"
(e.g., /closed on\s+\d{4}-\d{2}-\d{2}/i) and "Closed on MM/DD/YYYY" (e.g.,
/closed on\s+\d{1,2}\/\d{1,2}\/\d{4}/i), keeping the same case-insensitive flag
and placement so these checks still run before the hasApplyControl logic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 972e35d8-e2b7-416d-ae27-46570d46c6c4
📒 Files selected for processing (2)
liveness-core.mjs
test-all.mjs
/applications?\s+(?:(?:have|are|is)\s+)?closed/i,
/closed on \d{1,2}\s+(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)/i,
/closed on (?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\w*\s+\d{1,2}/i,
🧹 Nitpick | 🔵 Trivial
Patterns look good; one minor gap on ISO-format close dates.
The three new patterns address the prior review feedback well: L12 consolidates "applications have/are/is closed" variants, and L13/L14 anchor "closed on …" against explicit month abbreviations in either order, avoiding the false-positive risk of the earlier \w+ \d+ approach. Ordering is also correct — HARD_EXPIRED_PATTERNS is checked at line 59 before hasApplyControl at line 64, so banners like "Applications have closed" correctly override a lingering "Log in to Apply" button.
One small gap worth noting: neither L13 nor L14 matches ISO-style "Closed on 2025-11-26" or "Closed on 11/26/2025" timestamps that some ATS emit. Not a blocker for this PR (the reported MyCareersFuture case uses "26 Nov 2025"), but worth considering a follow-up if you see similar misses from other ATS.
Optional extension to cover numeric date formats
/closed on \d{1,2}\s+(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)/i,
/closed on (?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\w*\s+\d{1,2}/i,
+ /closed on \d{4}-\d{1,2}-\d{1,2}/i,
+ /closed on \d{1,2}[\/.\-]\d{1,2}[\/.\-]\d{2,4}/i,
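The optional numeric-date extension can be exercised on its own before deciding whether to adopt it. The two regexes are the ones proposed in this comment; `numericClosedPatterns` and `hasNumericCloseDate` are hypothetical names for the sketch.

```javascript
// The two numeric-date patterns proposed in the optional extension.
const numericClosedPatterns = [
  /closed on \d{4}-\d{1,2}-\d{1,2}/i,
  /closed on \d{1,2}[\/.\-]\d{1,2}[\/.\-]\d{2,4}/i,
];

// Hypothetical helper: true when a numeric "closed on" timestamp is present.
const hasNumericCloseDate = (bodyText) =>
  numericClosedPatterns.some((re) => re.test(bodyText));

const samples = [
  'Closed on 2025-11-26',
  'Closed on 11/26/2025',
  'Closed on 26.11.25',
  'closed on page 3',
  'closed on 1.2.10',
];

for (const text of samples) {
  console.log(`${hasNumericCloseDate(text) ? 'match   ' : 'no match'}  ${text}`);
}
```

The three date strings match and "closed on page 3" does not. One caveat worth weighing: a dotted version-like string such as "closed on 1.2.10" also matches the second pattern, so the dot separator may be worth dropping if that kind of text shows up on job pages.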
Closed job postings were being shortlisted because classifyLiveness failed two ways:
uncertain, letting the posting proceed to evaluation. An active posting virtually always has an apply button; content with no apply control is effectively closed.
Adds three patterns and flips the no-apply-button fallback from uncertain to expired.
What does this PR do?
Related issue
Type of change
Checklist
node test-all.mjs and all tests pass
Questions? Join the Discord for faster feedback.
Summary by CodeRabbit