You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Tracking issue for follow-up work that emerged during the folder-quota PR (#23) but was deliberately deferred to keep that PR focused. Sub-issues will be opened for each item below as we pick them up; this is the master list.
Sweep all FSAL methods for handle-reuse + symlink-swap path traversal
PR #23 added a validate_path() re-check in LocalFilesystem::write() and LocalFilesystem::setattr_size() (PR comments #39 / #40), but the same exposure exists in every method that calls resolve_handle() and then operates on the returned path:
read()
setattr_mode() — particularly dangerous: arbitrary chmod outside the export
setattr_owner() — particularly dangerous: arbitrary chown outside the export
commit()
link() (source side)
readlink(), readdir(), getattr()
symlink(), mknod()
Attack: client looks up pvc-a/file, retains the handle, replaces the path with a symlink pointing outside the export, then reuses the handle — OpenOptions::open follows the symlink and operates on the wrong file.
Fix: add self.validate_path(&path)?; immediately after resolve_handle() in every affected method. Mostly mechanical (~10 sites).
Tighten the validate_path → open TOCTOU
validate_path and the subsequent open are not atomic — a sufficiently fast attacker can swap a symlink in between. Mitigations to evaluate:
O_NOFOLLOW on opens that should never traverse a link
openat2(RESOLVE_NO_SYMLINKS | RESOLVE_BENEATH) on Linux ≥ 5.6
nix or libc wrapper as needed
🟡 Correctness
Cross-PVC hard link accounting
link() does not run through write/create's quota path, so a hard link can be created across PVCs (or to a file in an untracked area) without the target PVC being charged. After the link, all writes through any name share the same inode blocks but are charged to whichever PVC the writer's path resolves to. Decide on policy:
(a) refuse to create cross-PVC hard links when quota is enabled; or
(b) charge the destination PVC at link time (and refund only on the last unlink — already tightened via refundable_bytes_on_unlink).
PR #23 already handles the unlink side correctly via nlink checks (PR comment #38).
Atomic check-and-reserve API
check_quota + add_usage are two separate calls, so concurrent writers can both pass the pre-check and then both account, briefly exceeding the limit by up to (N-1) × max_write_size. Documented in check_quota's doc-comment as an accepted limitation for the PVC use case (PR comments #7, #27). If we ever need strict enforcement:
add try_add_usage(dir, bytes) that holds the write lock across check + increment + persist
callers (write, setattr_size, rename) switch to it
needs a refund-on-write-failure path
Fail-closed mode for quota persistence failures
PR #23 chose fail-open after add_usage persist failures — log a warning and let the FS operation succeed (PR comments #16, #19, #22). Good for transient redb hiccups, but if redb is genuinely broken the quota silently stops enforcing until a restart.
Optional follow-up: a quota.fail_mode = \"open\" | \"closed\" config knob, plus a health flag that flips the manager into fail-closed after N consecutive persist failures (with a clear log line so ops notices).
🔵 Performance
Per-PVC lock granularity
Today QuotaManager serialises every quota mutation on a single RwLock<HashMap> write lock held across the redb commit. Fine for one or a few PVCs; could be a bottleneck for high-fan-out deployments. Options:
dashmap::DashMap for the in-memory cache (sharded locks under the hood)
hand-rolled HashMap<String, Arc<Mutex<QuotaEntry>>> with the outer map only locked for entry insert/remove
Note that redb itself is single-writer, so this only relieves the cache layer. If that becomes the bottleneck, the next step is per-PVC redb files or a backend that supports concurrent writers.
🟢 Operability / Features
Periodic background reconciliation
Reconciliation currently runs once at startup. Out-of-band changes (e.g. an admin running cp on the host) only get repaired at the next restart. Add a timer that re-runs reconcile_all on a configurable interval (e.g. quota.reconcile_interval_secs).
Admin CLI / runtime quota management
Quota limits today can only be set via the [quota.bootstrap] config block, which is idempotent (only inserts when missing). Missing operator capabilities:
update an existing PVC's limit at runtime
query current usage
remove a quota entry (the remove_quota API exists but has no entry point)
trigger reconciliation on demand
Form factor: a separate arcticwolf-quota binary, or an admin RPC over a unix socket.
Telemetry
No external observability for quota state right now. Suggest a Prometheus / OpenMetrics endpoint exporting per-PVC limit_bytes, usage_bytes, and usage_ratio, so ops can alert on PVCs approaching their limit.
Quota visibility to NFS clients
FSSTAT already returns quota-aware byte counts (PR #23 covered this), which df on a mounted PVC honours. Could go further with NFSv4 dacl extensions or an out-of-band query mechanism, but that's a much bigger lift and probably not worth it for NFSv3.
hard link behaviour (stays charged across the first unlink, refunds on the last)
symlink-swap rejection (mirrors of the new unit tests added in PR Add folder quota #23)
nfstest_posix --runtest=read,write is also a small subset of the POSIX suite — broaden over time as quota-relevant areas stabilise.
Operator documentation
Add a dedicated docs/quota.md:
configuration reference, including the requirement that db_path lives outside the export tree
sizing guidance (limits land on 4 KiB block boundaries on tmpfs/ext4 — values that are not block-aligned will round)
known limitations: bounded over-shoot under concurrency, hard-link cross-PVC behaviour, interaction with underlying OS quotas (XFS project quotas etc.)
reconciliation behaviour and how to invoke it (once we add a manual trigger)
Non-4 KiB filesystem support
Quota unit tests use BLOCK = 4096 and assume tmpfs/ext4 page size. On filesystems with different allocation units (ZFS records, btrfs CoW, network FS) the exact comparisons would need to be relaxed to ranges. Currently called out in a code comment; would be good to actually exercise on at least XFS in CI.
Tracking issue for follow-up work that emerged during the folder-quota PR (#23) but was deliberately deferred to keep that PR focused. Sub-issues will be opened for each item below as we pick them up; this is the master list.
Related: #22, #23
🔴 Security
Sweep all FSAL methods for handle-reuse + symlink-swap path traversal
PR #23 added a
validate_path()re-check inLocalFilesystem::write()andLocalFilesystem::setattr_size()(PR comments #39 / #40), but the same exposure exists in every method that callsresolve_handle()and then operates on the returned path:read()setattr_mode()— particularly dangerous: arbitrary chmod outside the exportsetattr_owner()— particularly dangerous: arbitrary chown outside the exportcommit()link()(source side)readlink(),readdir(),getattr()symlink(),mknod()Attack: client looks up
pvc-a/file, retains the handle, replaces the path with a symlink pointing outside the export, then reuses the handle —OpenOptions::openfollows the symlink and operates on the wrong file.Fix: add
self.validate_path(&path)?;immediately afterresolve_handle()in every affected method. Mostly mechanical (~10 sites).Tighten the validate_path → open TOCTOU
validate_pathand the subsequentopenare not atomic — a sufficiently fast attacker can swap a symlink in between. Mitigations to evaluate:O_NOFOLLOWon opens that should never traverse a linkopenat2(RESOLVE_NO_SYMLINKS | RESOLVE_BENEATH)on Linux ≥ 5.6🟡 Correctness
Cross-PVC hard link accounting
link()does not run through write/create's quota path, so a hard link can be created across PVCs (or to a file in an untracked area) without the target PVC being charged. After the link, all writes through any name share the same inode blocks but are charged to whichever PVC the writer's path resolves to. Decide on policy:refundable_bytes_on_unlink).PR #23 already handles the unlink side correctly via
nlinkchecks (PR comment #38).Atomic check-and-reserve API
check_quota+add_usageare two separate calls, so concurrent writers can both pass the pre-check and then both account, briefly exceeding the limit by up to(N-1) × max_write_size. Documented incheck_quota's doc-comment as an accepted limitation for the PVC use case (PR comments #7, #27). If we ever need strict enforcement:try_add_usage(dir, bytes)that holds the write lock across check + increment + persistwrite,setattr_size,rename) switch to itFail-closed mode for quota persistence failures
PR #23 chose fail-open after
add_usagepersist failures — log a warning and let the FS operation succeed (PR comments #16, #19, #22). Good for transient redb hiccups, but if redb is genuinely broken the quota silently stops enforcing until a restart.Optional follow-up: a
quota.fail_mode = \"open\" | \"closed\"config knob, plus a health flag that flips the manager into fail-closed after N consecutive persist failures (with a clear log line so ops notices).🔵 Performance
Per-PVC lock granularity
Today
QuotaManagerserialises every quota mutation on a singleRwLock<HashMap>write lock held across the redb commit. Fine for one or a few PVCs; could be a bottleneck for high-fan-out deployments. Options:dashmap::DashMapfor the in-memory cache (sharded locks under the hood)HashMap<String, Arc<Mutex<QuotaEntry>>>with the outer map only locked for entry insert/removeNote that redb itself is single-writer, so this only relieves the cache layer. If that becomes the bottleneck, the next step is per-PVC redb files or a backend that supports concurrent writers.
🟢 Operability / Features
Periodic background reconciliation
Reconciliation currently runs once at startup. Out-of-band changes (e.g. an admin running
cpon the host) only get repaired at the next restart. Add a timer that re-runsreconcile_allon a configurable interval (e.g.quota.reconcile_interval_secs).Admin CLI / runtime quota management
Quota limits today can only be set via the
[quota.bootstrap]config block, which is idempotent (only inserts when missing). Missing operator capabilities:remove_quotaAPI exists but has no entry point)Form factor: a separate
arcticwolf-quotabinary, or an admin RPC over a unix socket.Telemetry
No external observability for quota state right now. Suggest a Prometheus / OpenMetrics endpoint exporting per-PVC
limit_bytes,usage_bytes, andusage_ratio, so ops can alert on PVCs approaching their limit.Quota visibility to NFS clients
FSSTAT already returns quota-aware byte counts (PR #23 covered this), which
dfon a mounted PVC honours. Could go further with NFSv4daclextensions or an out-of-band query mechanism, but that's a much bigger lift and probably not worth it for NFSv3.🟠 Tests / Docs
Extend integration tests
tests/test_nfs_quota.shcovers under-limit write, over-limit EDQUOT, remove-then-write, truncate-then-write, anddf. Missing end-to-end coverage:renamenfstest_posix --runtest=read,writeis also a small subset of the POSIX suite — broaden over time as quota-relevant areas stabilise.Operator documentation
Add a dedicated
docs/quota.md:db_pathlives outside the export treeNon-4 KiB filesystem support
Quota unit tests use
BLOCK = 4096and assume tmpfs/ext4 page size. On filesystems with different allocation units (ZFS records, btrfs CoW, network FS) the exact comparisons would need to be relaxed to ranges. Currently called out in a code comment; would be good to actually exercise on at least XFS in CI.