Skip to content

Readable Task Names#51

Closed
cooper-grc wants to merge 4 commits intomainfrom
cg/tasks-slugs
Closed

Readable Task Names#51
cooper-grc wants to merge 4 commits intomainfrom
cg/tasks-slugs

Conversation

@cooper-grc
Copy link
Copy Markdown
Contributor

@cooper-grc cooper-grc commented Feb 12, 2026

Summary by cubic

Adds human-readable task names (slugs) across API, CLI, and the mounted filesystem, with lookups by name and full UUID backward compatibility. Names are auto-generated from the prompt or image plus an 8‑char ID suffix.

  • New Features

    • Auto-generate task name slugs from prompt/image (+ first 8 of ID); slug utils and tests added.
    • Database: new task.name column with index; migration backfills existing tasks as task-<8>.
    • Repository: CreateTask sets Name; added GetTaskByName; queries/scans include Name.
    • API/gRPC: GetTask accepts UUID or name; Task includes name in JSON/proto; list endpoints return name.
    • Filesystem: task files use slug names (fallback to ID); lookups accept slug or UUID; 10MB sentinel file size.
    • CLI: list shows Name column; task detail prints Name when present.
  • Migration

    • Run DB migration 020_task_name.
    • Deploy services/clients with updated proto to consume the new Task.name field.

Written for commit 10aaa2f. Summary will update on new commits.

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 13 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="pkg/repository/task_postgres.go">

<violation number="1" location="pkg/repository/task_postgres.go:454">
P1: GetRetryableTasks now scans a `name` column, but its SELECT query doesn’t include `name`, which will cause a column count mismatch at runtime.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

taskId := g.extractTaskId(relPath)
if taskId == "" {
// Individual task file - extract task name from filename
taskName := g.extractTaskName(relPath)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do we guarantee uniqueness here?

func (g *FilesystemGroup) taskToVirtualFile(task *types.Task) *types.VirtualFile {
name := task.ExternalId + ".task"
if task.Name != "" {
name = task.Name + ".task"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may want to refactor .task suffix out to a const

func (g *FilesystemGroup) extractTaskId(relPath string) string {
// lookupTask finds a task by name slug, falling back to external ID for backwards compatibility.
func (g *FilesystemGroup) lookupTask(ctx context.Context, nameOrId string) (*types.Task, error) {
// Try name-based lookup first (covers new slug-based names)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above regarding uniqueness

@cooper-grc cooper-grc marked this pull request as draft February 23, 2026 16:55
Copy link
Copy Markdown
Contributor

@luke-lombardi luke-lombardi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: PR #51 — Readable Task Names

Overall this is a solid feature — the slug generation logic is clean, well-tested, and the backwards-compatibility story (UUID fallback) is thoughtful. A few issues need addressing before merge, ranging from a potential cross-workspace data leak down to minor inconsistencies.


🔴 Critical — GetTaskByName has no workspace isolation

pkg/repository/task_postgres.go (new GetTaskByName function, bottom of file)

SELECT ... FROM task WHERE name = $1

This query is not scoped to a workspace. Task names are derived from user prompts, so two different workspaces could end up with the same name (e.g. both have a task named fix-auth-bug-a1b2c3d4). As written, GetTaskByName will return a task from whichever workspace happens to appear first — leaking a task from workspace A to a lookup in workspace B.

Every other lookup in this repo filters by workspace_id. This one should too. Suggest adding a workspace_id parameter and AND workspace_id = $2 to the query, and threading the workspace ID through all callers.


🔴 Critical — No UNIQUE constraint on name

pkg/repository/backend_postgres_migrations/020_task_name.go

The migration creates a plain CREATE INDEX IF NOT EXISTS idx_task_name ON task(name) but not a unique index. Without a unique constraint (or a unique-per-workspace constraint), two tasks can end up with the same name. This makes GetTaskByName non-deterministic — the DB will return an arbitrary row when names collide.

At minimum add CREATE UNIQUE INDEX (or, better, a unique constraint scoped to (workspace_id, name)) so that GetTaskByName is guaranteed to return at most one row.


🟡 Important — Two-step name creation leaves a window with name = NULL

pkg/repository/task_postgres.go, CreateTask

// INSERT task first (name column is NULL at this point)
if _, err := b.db.ExecContext(ctx, `INSERT INTO task (...) VALUES (...)`, ...); err != nil { ... }

// Then generate and set the name in a separate UPDATE
task.Name = common.GenerateTaskName(task.Prompt, task.Image, task.ExternalId)
if _, err := b.db.ExecContext(ctx, `UPDATE task SET name = $1 WHERE id = $2`, task.Name, task.Id); err != nil {
    return fmt.Errorf("failed to set task name: %w", err)
}

If the UPDATE fails (e.g. network blip, constraint violation), the task row exists but has no name. Subsequent GetTask calls will return it with an empty Name, and the filesystem will fall back to the UUID filename — which is fine for compatibility but surprising. Worse, if a unique constraint is added (see above), a retry of CreateTask could fail on the INSERT before ever reaching the name step.

GenerateTaskName doesn't do any I/O, so the name can be computed before the INSERT. Recommend moving name generation before the INSERT and including name in the initial insert to make this atomic.


🟡 Important — Inconsistent lookup order between filesystem.go and the other handlers

pkg/api/v1/filesystem.go, lookupTask vs pkg/api/v1/tasks.go, GetTask and pkg/gateway/services/gateway.go, GetTask

  • tasks.go and gateway.go both check uuid.Parse(id) first: if it parses as a UUID → GetTask, otherwise → GetTaskByName.
  • filesystem.go's lookupTask always tries name first, then falls back to external ID:
    task, err := g.backend.GetTaskByName(ctx, nameOrId)
    if err == nil { return task, nil }
    return g.backend.GetTask(ctx, nameOrId)

This means a UUID-shaped string is looked up by name in the filesystem path (wasting a DB round trip) and by UUID first everywhere else. Should use the same uuid.Parse heuristic for consistency and efficiency. The same applies to pkg/filesystem/vnode/tasks.go's getTaskByName, which also does name-first / ID-fallback without the UUID check.


🟡 Minor — Sentinel file size (10 MB) will surprise users in ls -lh

pkg/filesystem/vnode/tasks.go

const taskFileSentinelSize int64 = 10 << 20 // 10MB

Reporting a fixed 10 MB for every task file in Getattr and Readdir means ls -lh /tasks will show every file as 10M regardless of actual content. The comment explains the FUSE motivation, but users who ls -lh the mount will see misleading sizes. Worth noting in the PR description / docs, and worth reconsidering whether a smaller sentinel (e.g. 1 MB, or even 64 KB) would still satisfy FUSE without being as surprising.


✅ What looks good

  • Slug generation logic in pkg/common/slug.go is clean and well-commented.
  • Test coverage for GenerateTaskName is thorough (stop words, truncation, format invariants, image extraction).
  • Backwards-compatibility via UUID fallback is correctly handled in the API and FUSE layers.
  • Migration includes a proper DOWN path.
  • Proto field is added at field number 11 (non-breaking addition).

Copy link
Copy Markdown
Contributor

@luke-lombardi luke-lombardi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall: Large, well-scoped PR with good backward compatibility story. The slug generation algorithm is clean and the tests cover the format invariants well. A few areas to look at:

Double-lookup overhead in filesystem lookupTask:
The filesystem path always tries name first, then falls back to external ID. If clients using legacy UUID-based filenames are still common, every read/stat will incur an extra GetTaskByName call before the successful GetTask. Consider checking whether the path looks like a UUID first (as the HTTP API handler does) to avoid the extra DB roundtrip.

HTTP API GetTask uses UUID parse to decide lookup path:
uuid.Parse(id) is used to dispatch between GetTask and GetTaskByName. This is correct in practice since generated slugs can't be valid UUIDs, but it means if someone has a name that happens to be a valid UUID string (unlikely but possible if manually set), it would be looked up as an ID and miss. This is probably fine, worth a comment.

taskToVirtualFile size reporting:
This function now returns a virtual file with taskFileSentinelSize (10MB). Given the chain of PRs #49/#50/#88, which approach to size reporting is this one aligning with? The sentinel here may need to be updated to use the cache-based approach from #50 or #88 to stay consistent.

Image fallback in slug generation:
extractImageName("my-sandbox") → "my-sandbox" — the hyphen in the image name passes through as-is into the slug. Since the slug later joins words with hyphens, an image like my-sandbox becomes sandbox-<suffix> (only last component, but with the hyphen treated as a separator). The test case confirms this gives sandbox-<suffix>, which looks correct.

DB migration:
The migration file isn't in the diff, but the summary mentions 020_task_name. Would be good to confirm the backfill runs efficiently for large task tables — a table scan with UPDATE generating slugs for every existing task could be slow on large deployments and may need to be batched or run offline.

Tests: Good slug unit tests and format invariant checks. A test for lookupTask covering the name→UUID fallback path in the filesystem layer would add confidence.

CLI change: Adding NAME as the first column in the table is a good UX call — truncating at 40 chars is reasonable.

Copy link
Copy Markdown
Contributor

@luke-lombardi luke-lombardi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR #51: Readable Task Names (Draft)

Large, cross-cutting PR that adds human-readable slug names for tasks throughout the API, filesystem, CLI, and database. The scope is broad but well-organized.

What looks good:

  • Slug generation algorithm is solid: stop-word filtering, 5-word cap, 50-char truncation, 8-char ID suffix for uniqueness.
  • GenerateTaskName tests are comprehensive and cover edge cases (special chars, non-ASCII, empty inputs, stop-words-only prompts).
  • Backwards compatibility is maintained throughout: UUID lookups fall back correctly in getTaskByName, lookupTask (filesystem API), and GetTask (tasks API).
  • taskFilename refactored to accept *types.Task instead of a bare ID string — cleaner and eliminates the need to reconstruct the task object at call sites.
  • DB migration backfills existing tasks with task-<8char> names — safe and non-destructive.

Issues / Observations:

  1. Lookup strategy inconsistency between filesystem API and tasks API. In filesystem.go, lookupTask always tries name first then ID. In tasks.go, GetTask checks uuid.Parse first and only tries name lookup if it's not a UUID. These two paths could diverge for inputs that happen to look like UUIDs but were intended as names (unlikely, but possible if name generation ever produces UUID-shaped strings). Standardizing on one strategy would be safer.

  2. Double DB lookup in lookupTask on cache miss. In getTaskByName (vnode), if the task isn't in the cache, two sequential DB calls are made (GetTaskByName then GetTask fallback). This is fine for correctness but adds latency on cache miss. Consider merging into a single query that accepts either name or external_id.

  3. idSuffix strips hyphens from UUIDs. The resulting suffix is dense hex (550e8400 from 550e8400-e29b-...). This is fine but worth documenting: the suffix is from the UUID without hyphens, not the UUID itself, which could confuse someone trying to find the task by the suffix alone.

  4. The test for image fallback expects sandbox-<suffix> from ghcr.io/org/my-sandbox:latest. The extractImageName strips the my- prefix via the stop-word filter? Actually — my is in the stop-word list and the slug is built from my-sandbox → lowercased → split on non-alphanumeric → ["my", "sandbox"]my filtered → ["sandbox"]sandbox. This is correct per the algorithm but could surprise users who named their image my-service and see only service in the task name.

  5. Migration dependency: The PR description correctly notes that DB migration 020_task_name must be run and services redeployed with the updated proto before this is safe to roll out. Worth adding a pre-merge checklist item or deploy note.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants