Skip to content

Conversation

@ovaistariq
Copy link
Contributor

@ovaistariq ovaistariq commented Sep 28, 2025

Note

Adds a one-line install script, switches to a positive --tigris-prefetch flag defaulting off, and hardens directory listing with generation tracking and lock-order fixes; updates docs to v1.2.1 and refactors semaphores in tests.

  • Install/Docs:
    • One-line installer: New install.sh downloads the latest release, verifies checksums/signatures, detects OS/arch/pkg manager, and handles macFUSE on macOS.
    • README: Add one-line install; update downloads to v1.2.1; adjust mount paths to $HOME/mnt/tigris/<bucket> and /mnt/tigris/<bucket>; update macOS tarball URL.
  • Core (directory handling):
    • Introduce directory generation tracking in DirInodeData/DirHandle to detect structural changes and reset iteration safely.
    • Avoid deadlocks by unlocking around sealDir, slurp, and list calls; revalidate generation after operations.
    • Increment generation on structural mutations (insert/remove/removeAllChildren/sealDir).
  • Flags:
    • Replace --no-tigris-prefetch with --tigris-prefetch (default off); update flag parsing and defaults.
  • Cluster:
    • Propagate dir-handle generation when opening/reading directories in cluster paths.
  • Cache/Lookup:
    • Add recheckInodeByName and use it in RefreshInodeCache to avoid stale inode references.
  • Tests/Utils:
    • Replace custom channel-based semaphore with golang.org/x/sync/semaphore via new Semaphore wrapper; update test helpers accordingly.

Written by Cursor Bugbot for commit d3e4661. This will update automatically on new commits. Configure here.

Erisa and others added 7 commits May 11, 2025 06:41
chore: fix mount directory names in README
…invalidation (#39)

* fix: Resolve deadlock in directory operations using generation-based invalidation

This fixes issue #37 where tigrisfs would deadlock during concurrent directory
operations, particularly when listing directories.

The deadlock occurred because:
1. listObjectsFlat() held dh.mu while calling sealDir()
2. sealDir() -> removeExpired() -> removeChildUnlocked() tried to lock ALL
   directory handles' mutexes
3. Other threads could hold different directory handle mutexes while waiting
   for dh.mu, creating a circular lock dependency

The solution uses a generation counter approach:
- Each directory maintains an atomic generation counter
- The counter increments on structural changes (add/remove children)
- Directory handles check the generation before operations
- If generation changed, handles reset their position without locking

* fix: Address race condition in generation handling after sealDir

* fix: Initialize DirHandle generation correctly to prevent spurious invalidations
@ovaistariq ovaistariq changed the title Release chore: Release Sep 28, 2025
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Summary

This release PR (v1.2.1) consolidates several important fixes and improvements:

  • Deadlock Resolution: Implemented generation-based directory invalidation in core/dir.go to resolve deadlocks in directory operations, replacing direct handle manipulation with atomic counters
  • Semaphore Modernization: Replaced deprecated custom semaphore implementation with golang.org/x/sync/semaphore
  • Configuration Change: Changed Tigris prefetch to be disabled by default (performance optimization)
  • Installation Improvements: Added comprehensive installation script with multi-platform support and macFUSE handling
  • Documentation Updates: Version bumps to 1.2.1 and corrected mount path references

The most complex change is the directory operation deadlock fix using generation-based invalidation, which appears well-designed to avoid the mutex ordering issues that caused the original deadlock.

Confidence Score: 4/5

  • This PR is mostly safe to merge with one identified logical issue in test code
  • Score reflects solid engineering work on complex deadlock resolution and semaphore modernization, but the incorrect semaphore usage pattern in test code could lead to race conditions
  • Pay close attention to core/goofys_common_test.go for the semaphore race condition fix

Important Files Changed

File Analysis

Filename        Score        Overview
core/dir.go 4/5 Complex deadlock fix using generation-based directory invalidation instead of direct handle manipulation; includes proper mutex ordering
core/goofys_common_test.go 5/5 Updated semaphore usage from custom implementation to golang.org/x/sync/semaphore with correct P/V semantics
core/utils.go 5/5 Replaced deprecated custom semaphore implementation with golang.org/x/sync/semaphore
install.sh 5/5 New installation script with comprehensive platform support, checksum verification, and macFUSE handling

8 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Comment on lines +132 to 133
sem := NewSemaphore(concurrency)
var err error
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Semaphore pattern incorrect: should acquire slots before starting goroutines, not inside them. This creates a race where more goroutines can start than the semaphore allows.

Suggested change
sem := NewSemaphore(concurrency)
var err error
sem := NewSemaphore(concurrency)
var err error
for _, blobOuter := range blobs {
sem.P(1) // Acquire slot before spawning goroutine
go func(blob string) {
defer sem.V(1) // Release slot when goroutine completes
Prompt To Fix With AI
This is a comment left during a code review.
Path: core/goofys_common_test.go
Line: 132:133

Comment:
logic: Semaphore pattern incorrect: should acquire slots before starting goroutines, not inside them. This creates a race where more goroutines can start than the semaphore allows.

```suggestion
	sem := NewSemaphore(concurrency)
	var err error
	for _, blobOuter := range blobs {
		sem.P(1) // Acquire slot before spawning goroutine
		go func(blob string) {
			defer sem.V(1) // Release slot when goroutine completes
```

How can I resolve this? If you propose a fix, please make it concise.

ovaistariq and others added 2 commits September 29, 2025 11:24
This fixes the flaky test by addressing the root cause: stale inode references
causing removeChild to fail due to pointer mismatch.

The issue occurs when RefreshInodeCache receives an inode from fs.inodes that's
different from the instance in parent.dir.Children.

The solution introduces recheckInodeByName which:
1. First finds the current child by name from parent's children list
2. Uses that instance for removal if the file doesn't exist
3. Ensures we always work with the most up-to-date inode reference

This approach:
- Preserves all safety guarantees (pointer equality checks remain)
- Fixes the mutex and ID mismatch issues identified in review
- Doesn't reassign variables or violate locking rules
- Is clear about intent: we're rechecking the current state by name

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Claude <noreply@anthropic.com>
* fix: Prevent deadlock in loadListing unlock/relock pattern

This fixes potential deadlocks in the loadListing() function where
unlock/relock patterns could cause circular lock dependencies.

The deadlock could occur in two places:
1. When calling slurpOnce() at line 718-725
2. When calling listObjectsFlat() at line 737-742

Both cases released parent.mu (and dh.mu in case 1) while other
goroutines could be holding different locks and waiting, creating
circular dependencies.

The solution uses the generation counter approach from PR #39:
- Check generation before unlocking
- After relocking, detect if generation changed
- If changed, reset directory handle position

This ensures proper synchronization without risking deadlocks.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants