
Feature/getdents64 #48

Open

RaymondTana wants to merge 7 commits into main from feature/getdents64

Conversation


RaymondTana (Contributor) commented Feb 17, 2026

Implementation of a getdents64 syscall handler.

  • Fully virtualized response for the procfile backend.
  • Full passthrough for the tmp and passthrough backends.
  • Merged view for the cow backend: kernel entries (passthrough) + cow overlay entries (passthrough) minus tombstoned paths (in-memory).

Required adding a tombstones: Tombstones field (an in-memory hash set of deleted paths/directories) to the Supervisor.
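
A rough sketch of the idea; per the review thread below, the concrete type is a StringHashMapUnmanaged and callers guard access with supervisor.mutex. The method signatures here are assumptions, not the exact code:

```zig
const std = @import("std");

// Sketch only: an in-memory set of deleted, normalized paths.
// Locking is done by callers (under supervisor.mutex), not inside this type.
pub const Tombstones = struct {
    allocator: std.mem.Allocator,
    map: std.StringHashMapUnmanaged(void) = .{},

    pub fn add(self: *Tombstones, path: []const u8) !void {
        if (self.map.contains(path)) return; // already tombstoned
        const owned = try self.allocator.dupe(u8, path); // own the key
        try self.map.put(self.allocator, owned, {});
    }

    pub fn remove(self: *Tombstones, path: []const u8) void {
        if (self.map.fetchRemove(path)) |kv| self.allocator.free(kv.key);
    }

    pub fn isTombstoned(self: *const Tombstones, path: []const u8) bool {
        return self.map.contains(path);
    }

    // True when "<dir>/<name>" has been deleted, so getdents64 can drop it from listings.
    pub fn isChildTombstoned(self: *const Tombstones, dir: []const u8, name: []const u8) bool {
        var buf: [std.fs.max_path_bytes]u8 = undefined;
        const full = std.fmt.bufPrint(&buf, "{s}/{s}", .{ dir, name }) catch return false;
        return self.map.contains(full);
    }
};
```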

This also affected the handlers for fstatat64, faccessat, and openat for the CoW and tmp backends.


Summary by cubic

Adds a getdents64 syscall handler that virtualizes directory listings across backends. Adds in-memory tombstones with locking to hide deleted paths, and fixes lseek/dirent offset handling with proper error propagation.

  • New Features

    • getdents64 wired into syscall dispatch.
    • Passthrough/tmp: direct kernel read; errors propagated.
    • COW: merges kernel + overlay entries, deduped; filters tombstones; per-FD dirents_offset; larger limits; correct d_off; logs when truncated.
    • Proc: synthetic listings for /proc, /proc/self, and /proc/<pid>; directories are typed DIR, status is REG; statx reports correct modes.
    • Tombstones in Supervisor (add/remove/isTombstoned/isChildTombstoned), guarded by mutex.
    • dirent helpers for record sizing and serialization.
  • Refactors

    • faccessat/fstatat64/openat honor tombstones; openat clears the tombstone on O_CREAT, otherwise returns ENOENT.
    • File exposes getdents64; handler uses a 4KB buffer, continues for stdio, and locks for COW/proc to avoid races.
    • lseek dispatch fixed; seeking to 0 resets dirents_offset.

Written for commit 96fcbdb. Summary will update on new commits.


greptile-apps bot commented Feb 17, 2026

Greptile Summary

Adds comprehensive getdents64 syscall support with tombstone-aware directory listings. The implementation includes virtualized /proc directory listings, COW backend with kernel+overlay merging and deduplication, and passthrough for tmp/passthrough backends. All previous review thread issues have been addressed: d_off now uses sequential offsets, directory/file distinction fixed for /proc/self, truncation is logged, overlay errors propagate, lseek resets dirents_offset, and tombstone access is properly mutex-protected.

Key additions:

  • New Tombstones tracking system for deleted paths with proper locking
  • COW getdents64 merges kernel and overlay entries, filters tombstones, with increased limits (2048 entries, 32KB names)
  • Proc getdents64 synthesizes directory listings for /proc, /proc/self, and /proc/<pid>
  • File.dirents_offset tracks returned entries per FD, reset on lseek to position 0
  • openat, faccessat, fstatat64 honor tombstones for COW/tmp paths

Minor issue: tmp backend's getdents64 doesn't filter tombstones (unlike other tmp handlers), though this may be intentional for overlay directories.

Confidence Score: 4/5

  • Safe to merge with one minor inconsistency in tmp backend tombstone handling
  • All major issues from previous review threads were addressed. The implementation is thorough with proper mutex locking, comprehensive test coverage, and correct handling of edge cases. One point deducted for tmp backend getdents64 not filtering tombstones while all other tmp handlers do, creating a potential inconsistency where deleted files appear in directory listings but return ENOENT when accessed.
  • Check src/core/virtual/fs/backend/tmp.zig for tombstone filtering consistency

Important Files Changed

  • src/core/virtual/syscall/handlers/getdents64.zig: New handler implementing the getdents64 syscall with proper mutex locking for proc/cow backends and comprehensive test coverage
  • src/core/virtual/Tombstones.zig: New tombstone tracking system with proper memory management and comprehensive test coverage for file/dir deletion tracking
  • src/core/virtual/fs/backend/cow.zig: Added getdents64 implementation merging kernel and overlay entries with tombstone filtering, deduplication, and truncation logging
  • src/core/virtual/fs/backend/procfile.zig: Added getdents64 for /proc directory listings and fixed directory/file distinction for /proc/self and /proc/<pid>
  • src/core/virtual/fs/File.zig: Added getdents64 dispatcher, dirents_offset tracking field, and lseek now correctly resets dirents_offset to 0 when position is 0
  • src/core/virtual/syscall/handlers/openat.zig: Added mutex-protected tombstone checks for cow/tmp backends, clearing tombstone on O_CREAT or returning ENOENT
  • src/core/Supervisor.zig: Added tombstones field with proper init/deinit lifecycle management
  • src/core/virtual/fs/backend/tmp.zig: Added simple passthrough getdents64 without tombstone filtering (inconsistent with other tmp handlers)

Last reviewed commit: 96fcbdb

greptile-apps bot left a comment

14 files reviewed, 4 comments


Comment on lines 13 to 14 of src/core/virtual/fs/dirent.zig
std.mem.writeInt(u64, buf[0..8], ino, .little);
std.mem.writeInt(i64, buf[8..16], @intCast(ino), .little);

d_off is set to inode, not directory offset

The d_off field (bytes 8-16) is set to @intCast(ino), which copies the inode number into the directory offset field. In linux_dirent64, d_off should be the offset position for lseek/telldir to locate the next entry — not the inode number.

While this doesn't cause issues today (no code in the sandbox reads d_off), programs like ls may use telldir/seekdir on directory streams, and a non-monotonic d_off could cause them to skip or repeat entries.

Additionally, if ino exceeds maxInt(i64) (e.g., on XFS or btrfs which use large inode numbers), the @intCast(ino) from u64 to i64 will panic at runtime.

Consider using the cumulative byte offset or entry_idx for d_off instead:

Suggested change:
      std.mem.writeInt(u64, buf[0..8], ino, .little);
-     std.mem.writeInt(i64, buf[8..16], @intCast(ino), .little);
+     std.mem.writeInt(i64, buf[8..16], @intCast(ino + 1), .little);

(Using ino + 1 here still isn't ideal. The callers should pass a proper sequential offset as a separate parameter, rather than reusing the inode.)


RaymondTana (Contributor, Author) replied:

Made it so writeDirent takes a d_off argument which is determined by adding one to the entry_idx used by the caller in each backend.
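
A sketch of what that can look like for the fixed 19-byte linux_dirent64 header (d_ino, d_off, d_reclen, d_type), consistent with the call shape shown in the cow code later in this thread; the recLen helper and exact parameter names are assumptions:

```zig
const std = @import("std");

// linux_dirent64 header: d_ino (8) + d_off (8) + d_reclen (2) + d_type (1) = 19 bytes,
// followed by the NUL-terminated name, padded to 8-byte alignment.
const HEADER_LEN = 19;

fn recLen(name_len: usize) usize {
    return std.mem.alignForward(usize, HEADER_LEN + name_len + 1, 8);
}

// d_off is now an explicit argument (e.g. entry_idx + 1 from the caller),
// rather than being derived from the inode.
fn writeDirent(buf: []u8, ino: u64, d_off: i64, reclen: u16, d_type: u8, name: []const u8) void {
    std.mem.writeInt(u64, buf[0..8], ino, .little); // d_ino
    std.mem.writeInt(i64, buf[8..16], d_off, .little); // d_off: monotonic position, not the inode
    std.mem.writeInt(u16, buf[16..18], reclen, .little); // d_reclen
    buf[18] = d_type; // d_type
    @memcpy(buf[HEADER_LEN..][0..name.len], name); // d_name
    buf[HEADER_LEN + name.len] = 0; // NUL terminator
}
```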

Comment on lines 17 to 19 of src/core/virtual/fs/backend/procfile.zig
dir_proc, // new
dir_pid_self, // new
dir_pid: NsTgid, // new

dir_pid_self and dir_pid are unreachable dead code

The ProcTarget variants dir_pid_self and dir_pid are defined here and handled in the open switch (line 71), but parseProcPath never returns either of them:

  • /proc/self returns .self_nstgid (line 39), not .dir_pid_self
  • /proc/<N> returns .{ .nstgid = nstgid } (line 57), not .{ .dir_pid = nstgid }

These variants are unreachable dead code. If the intent is to distinguish "opening /proc/self as a directory for getdents64" from "reading /proc/self as a file", then parseProcPath needs to be updated to return these variants for the appropriate paths.


RaymondTana (Contributor, Author) replied:

Made use of these for the directory/file distinction mentioned in another comment.

(Linux treats /proc/self as a symlink to the directory /proc/<pid>)
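
A hedged sketch of the assumed routing after this change; the variant names come from the review comments above, while the /proc/self/status handling, the NsTgid width, and the parsing details are assumptions:

```zig
const std = @import("std");

const NsTgid = u32; // assumed width

const ProcTarget = union(enum) {
    dir_proc, // /proc itself
    dir_pid_self, // /proc/self opened as a directory (kernel-style symlink to /proc/<pid>)
    dir_pid: NsTgid, // /proc/<pid> opened as a directory
    self_nstgid, // file content for the calling task (e.g. /proc/self/status), an assumption
    nstgid: NsTgid, // file content for a specific pid (e.g. /proc/<pid>/status), an assumption
};

fn parseProcPath(path: []const u8) ?ProcTarget {
    if (std.mem.eql(u8, path, "/proc")) return .dir_proc;
    if (std.mem.eql(u8, path, "/proc/self")) return .dir_pid_self;
    if (std.mem.eql(u8, path, "/proc/self/status")) return .self_nstgid;
    if (std.mem.startsWith(u8, path, "/proc/")) {
        var it = std.mem.splitScalar(u8, path["/proc/".len..], '/');
        const pid_str = it.next() orelse return null;
        const pid = std.fmt.parseInt(NsTgid, pid_str, 10) catch return null;
        if (it.next()) |sub| {
            // Only one known subfile is virtualized so far.
            if (it.next() == null and std.mem.eql(u8, sub, "status")) return .{ .nstgid = pid };
            return null;
        }
        return .{ .dir_pid = pid };
    }
    return null;
}
```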

Comment on lines +146 to +151 of src/core/virtual/fs/backend/procfile.zig
} else {
    // /proc/<pid> or /proc/self — list known subfiles
    // TODO: expand as more /proc/<pid>/* files are virtualized
    names_buf[name_count] = "status";
    name_count += 1;
}

getdents64 on /proc/self lists "status" but open path routing disagrees

When a file is opened as /proc/self via openat, parseProcPath returns .self_nstgid (not .dir_pid_self), and ProcFile.open fills in the NsTgid content — treating it as a regular file read. However, getdents64 on this same file would take the else branch here (line 146) and return a directory listing with "status".

This means a guest can open("/proc/self"), get back file content containing the PID number, and then call getdents64 on the same fd to get a directory listing of [".", "..", "status"]. That's inconsistent — the same fd behaves as both a regular file and a directory depending on which syscall is used.


RaymondTana (Contributor, Author) replied:

Fixed this by following the kernel's behavior: treat /proc/self as a directory, and /proc/self/status as a file. Affected many tests throughout the codebase

Comment on lines +291 to +301 of src/core/virtual/fs/backend/cow.zig
const name = name_bytes[0..null_pos];

if (name_pos.* + name.len <= name_storage.len) {
    @memcpy(name_storage[name_pos.*..][0..name.len], name);
    entries[count.*] = .{
        .name = name_storage[name_pos.*..][0..name.len],
        .d_type = d_type,
    };
    count.* += 1;
    name_pos.* += name.len;
}

Silent data loss when name storage overflows

If the combined length of all directory entry names exceeds 4096 bytes (the name_storage buffer), entries are silently dropped — the if condition at line 293 is not met and the entry is skipped without any logging or error indication.

With MAX_DIR_ENTRIES = 256, even short names (average ~16 chars) would fill the 4096-byte buffer. Consider logging when this limit is hit, or increasing the buffer size to match the theoretical capacity.


RaymondTana (Contributor, Author) replied:

Added logging in case any data was truncated. This happens whenever a collectDirents/collectDirentsDedup call processes data (rc > 0) but can't add any more entries, which means we've hit either the MAX_DIR_ENTRIES or MAX_NAME_STORAGE limit

greptile-apps bot left a comment

14 files reviewed, 2 comments


Comment on lines 88 to 97 of src/core/virtual/syscall/handlers/openat.zig
// Check tombstones for COW/tmp paths
if (h.backend == .cow or h.backend == .tmp) {
    if (supervisor.tombstones.isTombstoned(h.normalized)) {
        if (flags.CREAT) {
            supervisor.tombstones.remove(h.normalized);
        } else {
            return LinuxErr.NOENT;
        }
    }
}

Tombstone access without mutex is a data race

supervisor.tombstones is a StringHashMapUnmanaged accessed here without holding supervisor.mutex. Line 90 reads the map (isTombstoned) and line 92 mutates it (remove). Since the Supervisor runs up to MAX_INFLIGHT = 8 concurrent handlers (see Supervisor.zig:78), a concurrent faccessat or fstatat64 handler reading tombstones while this handler calls remove is undefined behavior on the hash map.

The same unprotected read pattern exists in faccessat.zig:78 and fstatat64.zig:117.

This block should be inside a mutex-protected section, or tombstones should use a concurrent-safe data structure.


RaymondTana (Contributor, Author) replied:

Wrapping in supervisor.mutex whenever accessing tombstones.
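
Concretely, the block quoted above becomes something like the following (a sketch; it assumes supervisor.mutex is the same lock that guards the rest of the Supervisor state):

```zig
// Check tombstones for COW/tmp paths, now under the supervisor mutex.
if (h.backend == .cow or h.backend == .tmp) {
    supervisor.mutex.lock();
    defer supervisor.mutex.unlock();
    if (supervisor.tombstones.isTombstoned(h.normalized)) {
        if (flags.CREAT) {
            // Re-creating the file clears its tombstone.
            supervisor.tombstones.remove(h.normalized);
        } else {
            return LinuxErr.NOENT;
        }
    }
}
```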

Comment on lines +162 to +163 of src/core/virtual/fs/backend/cow.zig
// Reset kernel FD's offset so we always read the full directory
_ = linux.lseek(fd, 0, linux.SEEK.SET);

lseek reset discards kernel's directory position

Every call to cow.getdents64 resets the kernel FD offset to 0 with lseek and reads the entire directory into an in-memory array. This works for small directories, but with MAX_DIR_ENTRIES = 256 and name_storage of 4096 bytes, there's a risk for directories with many files:

  1. Directories with >256 entries will be silently truncated — the excess entries are never returned to the caller.
  2. The name_storage buffer (4096 bytes) is shared between kernel and overlay entries. With average filenames of ~16 chars, it fills up at ~256 names, but longer filenames will exhaust it sooner, silently dropping entries.

Consider documenting these limits or logging when entries are dropped.


RaymondTana (Contributor, Author) replied on Feb 17, 2026:

increasing these limits

RaymondTana (Contributor, Author) commented:

@greptile review after this latest commit

greptile-apps bot left a comment

25 files reviewed, 3 comments


Comment on lines 194 to 196 of src/core/virtual/fs/backend/cow.zig
while (entry_count < MAX_DIR_ENTRIES and name_pos < MAX_NAME_STORAGE) {
    const rc = linux.getdents64(overlay_fd, &overlay_buf, overlay_buf.len);
    if (linux.errno(rc) != .SUCCESS or rc == 0) break;

Overlay getdents64 errors silently swallowed

The kernel directory read loop (line 173) propagates errors via checkErr, but the overlay loop here silently ignores getdents64 failures by breaking out of the loop on any non-SUCCESS errno. Since openOverlayDir already returned successfully (the overlay directory exists and was opened), a failure in the subsequent getdents64 call would be unexpected and could mask real I/O issues.

Consider using checkErr here as well (matching the kernel read pattern), or at minimum logging the error:

Suggested change:
      while (entry_count < MAX_DIR_ENTRIES and name_pos < MAX_NAME_STORAGE) {
          const rc = linux.getdents64(overlay_fd, &overlay_buf, overlay_buf.len);
-         if (linux.errno(rc) != .SUCCESS or rc == 0) break;
+         try checkErr(rc, "cow.getdents64 overlay read", .{});
+         if (rc == 0) break;

RaymondTana (Contributor, Author) replied:

Added this checkErr

Comment on lines +162 to +163 of src/core/virtual/fs/backend/cow.zig
var entries: [MAX_DIR_ENTRIES]DirEntry = undefined;
var entry_count: usize = 0;

Stack arrays total ~100KB per call

entries is [2048]DirEntry where DirEntry is a struct with a slice (16 bytes) + u8 (padded to 24 bytes), so ~49KB. name_storage is 32KB. Combined with the 4KB kernel_buf and overlay_buf, that's approximately 89KB of stack per getdents64 invocation. With MAX_INFLIGHT = 8 concurrent handlers, this totals ~712KB of stack usage across all simultaneous calls.

This should be fine for typical Linux stack sizes (default 8MB), but it's worth keeping in mind if the limits are ever increased further or if there are other stack-heavy paths in the call chain.



RaymondTana (Contributor, Author) replied:

@erik-dunteman what do you think about this?

Currently set

const MAX_DIR_ENTRIES = 2048; // max number of entries we can report in a directory
const MAX_NAME_STORAGE = 32768; // total number of bytes we allocate to storing entry names on the stack

These numbers were chosen to balance around an approximate average of 16 bytes per file name. Exceeding either limit truncates the output of getdents64.

greptile-apps bot left a comment

25 files reviewed, 2 comments


In src/core/virtual/fs/File.zig (around line 33):

ref_count: AtomicUsize = undefined,
opened_path: ?[]u8 = null,
open_flags: linux.O = .{},
dirents_offset: usize = 0,

dirents_offset not reset by lseek/rewinddir

The dirents_offset field tracks how many entries have been returned to the guest, but it is never reset when the guest calls lseek(fd, 0, SEEK_SET) on a directory FD. The lseek handler (lseek.zig:41) dispatches to cow.lseek which resets the kernel FD position, but dirents_offset remains at its old value.

Since cow.getdents64 always resets the kernel FD to position 0 and re-reads the full directory, a guest that does:

  1. getdents64(fd) → entries returned, dirents_offset advances
  2. lseek(fd, 0, SEEK_SET) → kernel position reset, dirents_offset unchanged
  3. getdents64(fd) → all entries skipped because dirents_offset is still advanced → returns 0 (spurious EOF)

This breaks rewinddir() / seekdir() for COW and proc directories. glibc's rewinddir calls lseek(fd, 0, SEEK_SET) and expects subsequent getdents64 to restart from the beginning.

Consider resetting dirents_offset to 0 in File.lseek when whence is SEEK_SET and offset is 0 (or unconditionally for directory-backed files).


RaymondTana (Contributor, Author) replied:

Fixed lseek dispatch in File.zig to reset dirents_offset to zero whenever the resulting file's position is zero. Only affects directory FDs.
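
A minimal sketch of that shape; the stand-in File struct and the backendLseek helper are assumptions, only showing where the reset lives:

```zig
// Stripped-down stand-in for the real File struct.
const File = struct {
    dirents_offset: usize = 0, // how many dirents this FD has already returned

    // backendLseek stands in for the real per-backend dispatch (cow/tmp/proc/passthrough).
    fn backendLseek(self: *File, offset: i64, whence: u32) !usize {
        _ = self;
        _ = whence;
        return @intCast(@max(offset, 0)); // placeholder: pretend the backend honored the seek
    }

    pub fn lseek(self: *File, offset: i64, whence: u32) !usize {
        const new_pos = try self.backendLseek(offset, whence);
        // rewinddir() is lseek(fd, 0, SEEK_SET): when the resulting position is 0,
        // restart directory enumeration so the next getdents64 begins at the first entry.
        // Harmless for non-directories, which never consult dirents_offset.
        if (new_pos == 0) self.dirents_offset = 0;
        return new_pos;
    }
};
```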

Comment on lines 145 to 253 of src/core/virtual/fs/backend/cow.zig
) !usize {
    const fd = switch (self.*) {
        inline else => |backing_fd| backing_fd,
    };

    const path = dir_path orelse {
        const rc = linux.getdents64(fd, buf.ptr, buf.len);
        try checkErr(rc, "cow.getdents64", .{});
        return rc;
    };

    const logger = Logger.init(.supervisor);

    var name_storage: [MAX_NAME_STORAGE]u8 = undefined;
    var name_pos: usize = 0;
    var truncated = false;

    var entries: [MAX_DIR_ENTRIES]DirEntry = undefined;
    var entry_count: usize = 0;

    // Reset kernel FD's offset so we always read the full directory
    _ = linux.lseek(fd, 0, linux.SEEK.SET);

    // Read all kernel directory entries
    {
        var kernel_buf: [4096]u8 = undefined;
        while (entry_count < MAX_DIR_ENTRIES and name_pos < MAX_NAME_STORAGE) {
            const rc = linux.getdents64(fd, &kernel_buf, kernel_buf.len);
            try checkErr(rc, "cow.getdents64 kernel read", .{});
            if (rc == 0) break;
            const prev_count = entry_count;
            collectDirents(
                kernel_buf[0..rc],
                &entries,
                &entry_count,
                &name_storage,
                &name_pos,
            );
            if (entry_count == prev_count and rc > 0) {
                truncated = true;
                break;
            }
        }
    }

    // Read overlay directory entries (if overlay cow dir exists for this path)
    if (openOverlayDir(overlay, path)) |overlay_fd| {
        defer _ = linux.close(overlay_fd);
        var overlay_buf: [4096]u8 = undefined;
        while (entry_count < MAX_DIR_ENTRIES and name_pos < MAX_NAME_STORAGE) {
            const rc = linux.getdents64(overlay_fd, &overlay_buf, overlay_buf.len);
            if (linux.errno(rc) != .SUCCESS or rc == 0) break;
            const prev_count = entry_count;
            collectDirentsDedup(
                overlay_buf[0..rc],
                &entries,
                &entry_count,
                &name_storage,
                &name_pos,
            );
            if (entry_count == prev_count and rc > 0) {
                truncated = true;
                break;
            }
        }
    }

    if (truncated) {
        logger.log("cow.getdents64: directory listing truncated for {s} (entries={d}, name_bytes={d})", .{ path, entry_count, name_pos });
    }

    // Serialize entries, skipping already-returned and tombstoned ones
    var buf_pos: usize = 0;
    var entry_idx: usize = 0;

    for (entries[0..entry_count]) |entry| {
        if (entry_idx < dirents_offset.*) {
            entry_idx += 1;
            continue;
        }

        // Don't filter . and .., but filter tombstoned children
        if (!std.mem.eql(u8, entry.name, ".") and !std.mem.eql(u8, entry.name, "..") and tombstones.isChildTombstoned(path, entry.name)) {
            entry_idx += 1;
            dirents_offset.* += 1;
            continue;
        }

        const rec_len = dirent.recLen(entry.name.len);
        // Stop listing if exceeded length of buffer
        if (buf_pos + rec_len > buf.len) break;

        dirent.writeDirent(
            buf[buf_pos..],
            entry_idx + 1,
            @intCast(entry_idx + 1),
            @intCast(rec_len),
            entry.d_type,
            entry.name,
        );

        // Shift buffer position, entry index, and offset accordingly
        buf_pos += rec_len;
        entry_idx += 1;
        dirents_offset.* += 1;
    }

    return buf_pos;
}

Mutex held during kernel I/O in COW getdents64

This entire method is called while holding supervisor.mutex (from getdents64.zig:47). It performs multiple blocking kernel syscalls — lseek (line 166), a loop of getdents64 calls on the kernel FD (line 172), potentially another openat + loop of getdents64 for the overlay (lines 191-209). With MAX_INFLIGHT = 8 concurrent handlers, any handler needing the mutex (FD table lookups, tombstone checks, proc operations) will be blocked for the duration of directory reads on potentially large directories.

For now this is acceptable given the early stage, but as more syscalls are added, consider restructuring to release the mutex around the kernel I/O — e.g., read kernel+overlay entries without the mutex, then re-acquire it only for the tombstone filtering pass.



RaymondTana (Contributor, Author) commented:

@greptile check again

greptile-apps bot left a comment

25 files reviewed, 1 comment


Comment on lines +94 to +98 of src/core/virtual/fs/backend/tmp.zig
pub fn getdents64(self: *Tmp, buf: []u8) !usize {
    const rc = linux.getdents64(self.fd, buf.ptr, buf.len);
    try checkErr(rc, "tmp.getdents64", .{});
    return rc;
}

tmp backend doesn't filter tombstones in getdents64, creating inconsistency: openat, faccessat, and fstatat64 all check tombstones for tmp paths (returning ENOENT for deleted files), but getdents64 will list them. Consider whether tmp should filter tombstones like cow does, or if the passthrough behavior is intentional for tmp overlay directories.

