Skip to content

Add File Size Logic And Testing#50

Draft
cooper-grc wants to merge 4 commits intocg/tasks-files-2from
cg/tasks-files-cache
Draft

Add File Size Logic And Testing#50
cooper-grc wants to merge 4 commits intocg/tasks-files-2from
cg/tasks-files-cache

Conversation

@cooper-grc
Copy link
Copy Markdown
Contributor

@cooper-grc cooper-grc commented Feb 12, 2026

Summary by cubic

Expose task files with true sizes from a shared 5s content cache. Readdir builds content; getattr returns cached or last-known size without fetching, and read uses the same bytes for correct EOF.

  • New Features

    • Per-task content cache (5s TTL) for rendered content and last-known size; reused across getattr, readdir, and read.
    • Injected logsFetcher; gRPC mounts fetch logs via S2, non-GRPC shows a simple placeholder.
  • Bug Fixes

    • Getattr never fetches logs and returns 0 or last-known size until content is cached; readdir populates cache and returns actual readable size; read matches those bytes for correct EOF.
    • Added tests for large logs, size consistency across ops, EOF, and single fetch per TTL.

Written for commit f7929bd. Summary will update on new commits.

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 2 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="pkg/filesystem/vnode/tasks.go">

<violation number="1" location="pkg/filesystem/vnode/tasks.go:381">
P2: Getattr now fetches full task logs just to compute file size, which can make metadata lookups unexpectedly expensive and block stat calls. Avoid fetching logs during Getattr; only use cached content or a lightweight size placeholder.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="pkg/filesystem/vnode/tasks.go">

<violation number="1" location="pkg/filesystem/vnode/tasks.go:381">
P2: Getattr reports size 0 on cache misses. That makes file size incorrect unless the content was already cached, and it disagrees with the content returned by Read.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="pkg/filesystem/vnode/tasks.go">

<violation number="1" location="pkg/filesystem/vnode/tasks.go:381">
P2: Getattr now calls getTaskContent, which renders logs and can trigger expensive log fetches on every stat/ls. This defeats the fast-path behavior and can cause heavy backend calls just to compute size. Use cached content only and fall back to size 0 when not cached.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="pkg/filesystem/vnode/tasks.go">

<violation number="1" location="pkg/filesystem/vnode/tasks.go:396">
P2: Getattr returns size 0 on first access when no cache/size entry exists, so file size is incorrect until a read/readdir populates the cache. This breaks the expectation that getattr reports the actual size.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment on lines +396 to +401
size := int64(0)
if data, ok := t.getCachedTaskContent(task.ExternalId); ok {
size = int64(len(data))
} else if s := t.getLastKnownSize(task.ExternalId); s > 0 {
size = s
}
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai bot Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Getattr returns size 0 on first access when no cache/size entry exists, so file size is incorrect until a read/readdir populates the cache. This breaks the expectation that getattr reports the actual size.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At pkg/filesystem/vnode/tasks.go, line 396:

<comment>Getattr returns size 0 on first access when no cache/size entry exists, so file size is incorrect until a read/readdir populates the cache. This breaks the expectation that getattr reports the actual size.</comment>

<file context>
@@ -378,8 +393,12 @@ func (t *TasksVNode) Getattr(path string) (*FileInfo, error) {
 
-	data := t.getTaskContent(ctx, task)
-	size := int64(len(data))
+	size := int64(0)
+	if data, ok := t.getCachedTaskContent(task.ExternalId); ok {
+		size = int64(len(data))
</file context>
Suggested change
size := int64(0)
if data, ok := t.getCachedTaskContent(task.ExternalId); ok {
size = int64(len(data))
} else if s := t.getLastKnownSize(task.ExternalId); s > 0 {
size = s
}
size := int64(0)
if data, ok := t.getCachedTaskContent(task.ExternalId); ok {
size = int64(len(data))
} else if s := t.getLastKnownSize(task.ExternalId); s > 0 {
size = s
} else {
data := t.getTaskContent(ctx, task)
size = int64(len(data))
}
Fix with Cubic

Copy link
Copy Markdown
Contributor

@luke-lombardi luke-lombardi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall: Good progression from #49 — replacing the static sentinel with a real content cache that populates accurate sizes is the right direction. Tests are solid and cover the important invariants (size consistency, cache reuse, EOF). A few things worth discussing:

Potential performance issue in Readdir: Readdir now calls getTaskContent for every task in the listing, which in turn calls fetchTaskLogsGRPC (a gRPC round-trip) for every task not yet cached. If a workspace has many tasks, an ls /tasks could trigger N concurrent log fetches. The 5-second TTL helps for repeated listings, but the first call is an unbounded fan-out. Consider whether a lazy approach (skip log fetch in Readdir, just return cached or 0) is acceptable for the listing case, since clients should re-stat/read anyway.

Getattr returns 0 before any Readdir: The test TestTasksVNode_GetattrUsesPlaceholderUntilContentCached documents this explicitly and treats 0 as acceptable. Some FUSE clients may skip reading a file reported as 0 bytes (same problem as the original bug). If Readdir is always called before individual reads this is fine, but it's worth verifying with the specific FUSE/NFS client in use.

logsFetcher nil check in renderTaskContent: renderTaskContent guards on t.logsFetcher != nil and falls back to "(logs available via API)\n". This means NewTasksVNode (non-gRPC path) will silently produce stub content. Is that intentional? If logs aren't accessible without gRPC, the stub is fine — just making sure it's deliberate.

sizeCache is redundant when content is cached: When contentCache[taskId] is valid, sizeCache[taskId] should always equal len(data). The two-map design adds some complexity; alternatively sizeCache only needs to survive past the content TTL expiry so users can re-stat without re-fetching. The current design is correct — just noting it for future simplification if desired.

Tests: Good coverage of the key invariants. The large-logs test (>10MB) is a smart regression guard.

This was referenced Feb 27, 2026
Copy link
Copy Markdown
Contributor

@luke-lombardi luke-lombardi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR #50: Add File Size Logic And Testing (Draft)

Replaces the static 10MB sentinel from #49 with a proper per-task content cache (5s TTL) shared across Getattr, Readdir, and Read. This gives clients accurate file sizes and eliminates redundant log fetches within the TTL window.

What looks good:

  • The logsFetcher function injection pattern is clean — makes the gateway backend and gRPC paths testable without real connections.
  • Cache populated in Readdir (the path that triggers a full ls) and then reused in Read is the right flow order; eliminates a double fetch.
  • Getattr intentionally avoids triggering a fetch and uses the last-known size (or 0) — avoids latency spikes on stat calls.
  • Tests cover large files (>10MB), size consistency across ops, EOF handling, and single-fetch-per-TTL. Good coverage.
  • Nil guard on contentCache / sizeCache inside setCachedTaskContent is defensive, though the maps are always initialized in the constructors. Could remove the nil checks, but they're harmless.

Observations for before merge:

  1. getTaskContent always renders and caches, even for non-terminal tasks. Once a running task's content is cached for 5s, a Read during that window will serve stale status/logs. The 5s TTL is short enough that this is probably acceptable, but it's worth calling out — especially if log output is expected to update quickly.

  2. Memory growth. The content cache holds full rendered content (task info + logs) for all tasks that have been read within the last 5s. If logs are large (the test checks >10MB), cache entries can be substantial. Consider capping per-entry size or evicting when memory pressure is high, or document the expected max log size.

  3. logsFetcher is not set in NewTasksVNode (gateway backend mode). This means the gateway-mounted /tasks path will always show (logs available via API) in the rendered output. This appears intentional (logs are fetched via S2 only for gRPC mounts), but it's worth a comment explaining why.

  4. This PR's base branch is cg/tasks-files-2 (PR #49), not main. Ensure this is merged after #49 and rebased/retargeted before merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants