[Eval-Backlog] CROSS-DOMAIN-CLEANUP: stragglers from skills, resume, shift, messaging, map sprints #63

@TortoiseWolfe

Summary

Stragglers from four completed sprints (GREENFIELD-SCHEMAS, SHIFT-LIFECYCLE, MSG-RELIABILITY, MAP-WIRING) pulled into one cross-domain cleanup: database security holes, a missing Realtime publication, a resume-delete ordering bug, three pre-existing test failures, a messaging cache without expiry, a thundering-herd issue on new conversations, and mobile, print, and visibility verification that was never done.

Scope spans database security, service fixes, test suite, messaging encryption, and frontend visual verification — so the work cuts across domains that are usually tackled separately.

What this exposes

Signal | What to watch
--- | ---
Independent Supabase startup | Does the implementer start local Supabase without being told how, or get stuck on setup?
Cross-domain context switching | Can the same session move from database security to messaging encryption to frontend visual verification without losing thread?
Thundering-herd fix shape | Does the deduplication use a proper pending-request map, or a superficial lock/flag that still lets concurrent callers through?
Root-cause discipline | Do pre-existing test failures get diagnosed and fixed at the source, or are assertions patched to make them pass?
Verification after destructive ordering change | Does the resume-delete fix include a test for the DB-fail-after-storage-succeeded case, or just the happy path?
Visibility proof | Are all three worker-profile visibility modes tested via direct Supabase query, not just UI? Private profiles must return nothing even through a direct query.

Work areas

Database security

  • Two database functions in the skills and resume system are missing the search_path declaration that every other function in the migration has. Without it, unqualified names inside the function resolve against whatever search_path the caller has set, which is an injection risk. Find both, fix them, and check whether any other functions are missing it.
  • The shift_time_entries table needs to be added to the Realtime publication so the employer dashboard can subscribe to clock-in events the same way it subscribes to shift changes.
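
For the "check whether any other functions are missing it" step, a rough audit can be scripted. The sketch below scans migration SQL text and reports every CREATE FUNCTION whose definition (up to the next CREATE FUNCTION) contains no SET search_path clause. The helper name is hypothetical, and the regex is only an approximation: it does not understand comments or unusual quoting, and a standalone session-level SET search_path statement following a function would mask a miss. Treat its output as a list of candidates to inspect by hand.

```typescript
// Hypothetical helper, not part of the repo: lists functions in a migration
// whose definition has no SET search_path clause. Regex-based rough audit.
export function findFunctionsMissingSearchPath(sql: string): string[] {
  const missing: string[] = [];

  // Split just before each CREATE [OR REPLACE] FUNCTION so that every chunk
  // starts with one function definition and runs until the next one.
  const chunks = sql.split(/(?=create\s+(?:or\s+replace\s+)?function\s)/i);

  for (const chunk of chunks) {
    const header = chunk.match(
      /^create\s+(?:or\s+replace\s+)?function\s+([\w."]+)/i,
    );
    const name = header ? header[1] : undefined;
    if (!name) continue; // text before the first function definition

    // Flag the function when no search_path setting appears in its chunk.
    if (!/set\s+search_path\b/i.test(chunk)) {
      missing.push(name);
    }
  }
  return missing;
}
```

Running it over the whole migration file, rather than only the skills and resume sections, also covers the "any other functions" part of the task.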

Service fixes

  • Resume delete removes the storage file before deleting the database row. If the DB delete fails after the file is already gone you get a dangling record pointing at nothing. Reverse the order so the DB row goes first; a storage failure after that leaves only an orphaned file, not a record pointing at nothing.
  • The company-skills hook fetches every skill in the system and filters on the client. Add a server-side filter scoped to the company being viewed.
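
The resume-delete ordering can be sketched as follows. Every name here (ResumeStore, deleteRow, removeFile, deleteResume) is hypothetical, since the real service's shape is not shown in this issue; the dependencies are injected so the DB-failure case can be tested with fakes instead of a live Supabase instance.

```typescript
// Hypothetical interface standing in for the real database and storage calls.
interface ResumeStore {
  // Deletes the row and returns the storage path it referenced.
  // Assumed to reject when the database delete fails.
  deleteRow(resumeId: string): Promise<{ storagePath: string }>;
  // Removes the file from storage. Assumed to reject on failure.
  removeFile(storagePath: string): Promise<void>;
}

export async function deleteResume(
  store: ResumeStore,
  resumeId: string,
): Promise<{ fileRemoved: boolean }> {
  // 1. Delete the database row first. If this rejects, the error propagates
  //    and the file is untouched, so the record still points at a real file.
  const { storagePath } = await store.deleteRow(resumeId);

  // 2. Only then remove the file. A failure here leaves an orphaned file,
  //    which is reported to the caller rather than thrown.
  try {
    await store.removeFile(storagePath);
    return { fileRemoved: true };
  } catch {
    return { fileRemoved: false };
  }
}
```

A test for the failure case would pass a fake whose deleteRow rejects and then assert that removeFile was never called.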

Pre-existing test failures (from the skills taxonomy merge)

  • Admin-user contract test
  • Worker-discoverability policy test
  • Conversation-realtime hook test

None are new. Diagnose each one and fix it, or explain concretely why it can't pass in this environment.

Messaging cache

  • The decryption module caches profile data with no expiry. If someone changes their display name while you're in a conversation, the old name shows until you sign out. Add time-based expiry so stale profiles refresh.
  • When a batch of encrypted messages arrives for a new conversation, every message independently hits the database for the same shared secret because they all miss the cache at the same time. Add a pending-request map so the second caller waits on the first instead of making a duplicate query.
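
Both messaging fixes can share one structure: a cache whose entries carry an expiry timestamp, plus a map of in-flight requests keyed the same way. The sketch below is a generic version under stated assumptions: the class name, the loader signature, and the injectable clock are all hypothetical and not taken from the decryption module.

```typescript
// Hypothetical sketch: TTL cache with in-flight request deduplication.
export class DedupedTtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly pending = new Map<string, Promise<V>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // The clock is injectable so expiry can be tested without real waiting.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string, load: (key: string) => Promise<V>): Promise<V> {
    // Fresh cache hit: no query at all.
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) {
      return Promise.resolve(hit.value);
    }

    // Another caller is already loading this key: wait on its promise
    // instead of issuing a duplicate query.
    const inFlight = this.pending.get(key);
    if (inFlight) return inFlight;

    // First caller: start the load and register it before returning.
    const request = load(key)
      .then((value) => {
        this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
        return value;
      })
      .finally(() => {
        // Clear the slot on success and on failure so errors are not cached.
        this.pending.delete(key);
      });
    this.pending.set(key, request);
    return request;
  }
}
```

The pending entry is registered synchronously, before any await, which is what separates this from a flag checked after the first query has already started. A simulated batch (ten concurrent get calls for one conversation key) should produce exactly one loader call.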

Visual verification

  • Worker schedule on a phone-sized viewport — does the clock-in button meet the 44px touch-target minimum?
  • Employer schedule printout — does it hide the nav and force black-on-white?
  • Public worker profile across all three visibility modes, with live Supabase data:
    • Public should show everything
    • Connections-only should check the connections table (test as someone who isn't connected)
    • Private should return nothing useful even through a direct query (not just the UI)
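
The three visibility expectations above can be written down as a small truth function and used as the oracle when asserting on direct-query results. This is only a sketch of the expected outcomes for a viewer who is not the profile owner. The mode strings, the pair-based connections shape, and the assumption that connections are symmetric are all hypothetical; the real enforcement belongs in the database policies, not in client code like this.

```typescript
// Hypothetical mode names; the real column values are not shown in this issue.
type Visibility = "public" | "connections" | "private";

// Expected result of a direct query by a viewer who is NOT the owner.
// viewerId is null for an anonymous (signed-out) viewer.
export function expectProfileVisible(
  mode: Visibility,
  viewerId: string | null,
  ownerId: string,
  connections: ReadonlyArray<readonly [string, string]>,
): boolean {
  switch (mode) {
    case "public":
      // Public should show everything, to anyone.
      return true;
    case "connections":
      // Visible only when a connection row links viewer and owner.
      // Assumption: a connection counts in either direction.
      return (
        viewerId !== null &&
        connections.some(
          ([a, b]) =>
            (a === viewerId && b === ownerId) ||
            (a === ownerId && b === viewerId),
        )
      );
    case "private":
      // Private should return nothing, even through a direct query.
      return false;
  }
}
```

In a verification script, each direct query would be run as a specific viewer and the returned row count compared against this function's answer for that mode.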

Starting prompt

Read CLAUDE.md first. The last few rounds of work added industry taxonomy, worker skills, resume uploads, worker schedule with time tracking, and map wiring. Most of it was built by eval models and merged from sandbox branches. There are security issues, missing integrations, and untested code paths across all of it.

Two database functions in the skills and resume system are missing the search path declaration that every other function in the migration has. Without it the function runs in whatever schema the caller sets, which is an injection risk. Find both functions and fix them. While you're in the migration, the time entries table needs a Realtime publication so the employer dashboard can subscribe to clock in events the same way it subscribes to shift changes.

Start local Supabase and apply the migration. The resume delete service removes the file from storage before deleting the database row. If the database delete fails after the file is already gone you get a dangling record pointing at nothing. Reverse the order. The company skills hook fetches every skill in the system and filters on the client. Add a server side filter so it only pulls skills for the company you're looking at.

The test suite has three pre existing failures from the skills taxonomy merge. One is a contract test for admin users, one is a discoverability policy test, and one is a conversation realtime hook test. None of them are new but nobody's fixed them either. Diagnose each one and fix it or explain why it can't pass in this environment.

After the database work, switch to the messaging system. The decryption module caches profile data with no expiry. If someone changes their display name while you're in a conversation the old name shows until you sign out. Add a time based expiry so stale profiles get refreshed. Then check what happens when a batch of encrypted messages arrives for a new conversation. Every message independently hits the database for the same shared secret because they all miss the cache at the same time. Add a pending request map so the second caller waits on the first instead of making a duplicate query.

After messaging, open the worker schedule on mobile and print the employer schedule with the browser print dialog. Then check the public worker profile page across all three visibility modes with live data. Public should show everything, connections only should check the connections table, and private should show nothing useful even through a direct query.

Goal

Fix two search_path injection risks in the skills and resume database functions, and add a Realtime publication on time entries. Reverse the resume delete order so the database delete happens before the storage delete, and add server-side filtering to the company skills hook. Diagnose and fix three pre-existing test failures from the skills taxonomy merge (admin user contract test, worker discoverability policy test, conversation realtime hook test). Add time-based expiry to the messaging decryption profile cache, and add a pending-request deduplication map for concurrent decryption of messages in new conversations. Verify that the worker schedule renders correctly on mobile, that the employer schedule prints cleanly, and that the public worker profile page respects all three visibility modes with live Supabase data, including direct-query checks on private profiles. The implementer must start local Supabase without being told how and apply the migration before any database verification.

Checklist

Database security

  • Read CLAUDE.md before starting
  • Found both functions missing search_path (skills + resume)
  • Fixed search_path on both functions
  • Checked for any other functions missing it
  • Added Realtime publication on shift_time_entries
  • Local Supabase started without being told how
  • Migration applied cleanly

Service fixes

  • Resume delete: database row deleted before storage file
  • Company skills hook: server-side filter by company
  • Tested resume delete failure case (a failed DB delete leaves the storage file untouched)

Pre-existing test failures

  • Admin-user contract test: diagnosed and fixed (or explained)
  • Worker-discoverability policy test: diagnosed and fixed (or explained)
  • Conversation-realtime hook test: diagnosed and fixed (or explained)

Messaging cache

  • Profile cache has TTL or timestamp expiry
  • Stale display name refreshes within a reasonable window
  • Thundering herd: pending-request map for concurrent decryption
  • Second caller waits on first instead of duplicate query
  • Tested with simulated batch of messages

Visual verification

  • Worker schedule renders on mobile (no overflow)
  • Clock-in button meets 44px touch-target minimum
  • Employer schedule prints: nav hidden, black on white
  • Public worker profile: public mode shows everything
  • Public worker profile: connections-only checks connections table
  • Public worker profile: private returns nothing through direct query
  • Private profile not leaking through direct Supabase queries

Testing

  • Three pre-existing failures fixed or explained
  • Full test suite passes with zero failures
  • No regressions in existing tests

Follow-up prompts

After security fixes:

Name the two functions you fixed and the search path each one had before. Did you check if any other functions are missing it?

After Supabase is running:

Query the skills table as an anonymous user. Does the public read policy work? Now upload a resume and delete it. Which operation hits the database first?

After resume + skills fixes:

Switch to messaging. What does the profile cache look like right now? How long before a stale entry gets refreshed? And what happens when ten messages arrive at once for a brand new conversation?

After messaging:

Open the worker schedule on a phone sized viewport. Does the clock in button meet the touch target minimum? Print the employer schedule. Does the printout hide the nav and force black on white?

After visual checks:

Set a worker profile to private and try to read it through a direct query, not the UI. Does anything come back? Now set it to connections only and check as someone who isn't connected.

Before wrapping up:

Run the full test suite. Count what passes and what's still red.

Scoring guide

  • 5 (Excellent): Both security functions fixed, Supabase started independently, resume delete order reversed, skills filtered server side, all three test failures resolved, cache expiry added, thundering-herd deduplication working, mobile schedule verified, print styles clean, all three visibility modes tested with direct-query proof, zero test failures.
  • 4 (Good): Security and service fixes done, Supabase running, cache expiry added, most visual checks done, minor gaps in thundering herd or visibility testing.
  • 3 (Acceptable): Security fixes done, Supabase started but test failures not resolved, cache expiry added but thundering herd skipped, visual checks partial.
  • 2 (Poor): Only security fixes, couldn't start Supabase, skipped messaging and visual verification.
  • 1 (Fail): Broke existing functionality, wrong security fix, or no real progress.

Filed from the Mercor Code Agent eval rotation (good_prompt_bad_prompt #6b CROSS-DOMAIN-CLEANUP). Used as a standard A/B eval prompt — 2 to 2.5 hours, 5+ follow-up turns. Kept here as a tracked work item.
