Project
vgrep
Description
The database schema in src/core/db.rs defines a foreign key constraint with ON DELETE CASCADE:
FOREIGN KEY (file_id) REFERENCES files(id) ON DELETE CASCADE
However, SQLite does NOT enforce foreign key constraints by default. The PRAGMA foreign_keys = ON; statement must be executed for each database connection to enable enforcement. Without this, the cascade delete is completely ignored.
This causes:
- Orphaned chunks: When files are deleted, their chunks remain in the database
- Database bloat: Orphaned data accumulates over time
- Stale search results: Searches may return content from deleted files
- Data integrity corruption:
file_id in chunks can reference non-existent files
Affected Files
src/core/db.rs (lines 48-52, 73, 112)
Evidence
Current Implementation (missing PRAGMA):
impl Database {
pub fn new(path: &Path) -> Result<Self> {
let conn = Connection::open(path)?;
let db = Self { conn };
db.init_schema()?;
Ok(db)
}
Schema with CASCADE that won't work:
CREATE TABLE IF NOT EXISTS chunks (
id INTEGER PRIMARY KEY AUTOINCREMENT,
file_id INTEGER NOT NULL,
chunk_index INTEGER NOT NULL,
content TEXT NOT NULL,
start_line INTEGER NOT NULL,
end_line INTEGER NOT NULL,
embedding BLOB NOT NULL,
FOREIGN KEY (file_id) REFERENCES files(id) ON DELETE CASCADE,
UNIQUE(file_id, chunk_index)
);
INSERT OR REPLACE triggers implicit DELETE:
pub fn insert_file(&self, path: &Path, hash: &str) -> Result<i64> {
let path_str = path.to_string_lossy();
let now = Utc::now().to_rfc3339();
self.conn.execute(
"INSERT OR REPLACE INTO files (path, hash, indexed_at) VALUES (?, ?, ?)",
params![path_str.as_ref(), hash, now],
)?;
Ok(self.conn.last_insert_rowid())
}
Confirmed Bug Verification
# Test without PRAGMA foreign_keys = ON
Before delete:
Files: 1, Chunks: 1
After delete (without PRAGMA):
Files: 0, Chunks: 1 ← 🚨 ORPHANED CHUNK!
# Test WITH PRAGMA foreign_keys = ON
After delete (WITH PRAGMA):
Files: 0, Chunks: 0 ← ✅ CASCADE works!
Confirmed Failure Cases Where Orphaned Chunks Occur
Case 1: Race Condition Between prepare_file() and insert_file()
The indexer has a significant time gap between checking for existing files and inserting:
src/core/indexer.rs workflow:
Phase 1 (Line 98): prepare_file() checks DB and deletes if exists
Phase 2 (Line 139): Embedding generation (5-30+ seconds)
Phase 3 (Line 163): insert_file() with INSERT OR REPLACE
Race Timeline:
Time 0s: Process A: prepare_file("foo.rs") → file not in DB, no delete needed
Time 1s: Process B: indexes "foo.rs" → creates file_id=5 with 10 chunks
Time 5s: Process A: Still in Phase 2 (embedding generation)...
Time 10s: Process A: insert_file("foo.rs")
→ INSERT OR REPLACE sees UNIQUE constraint on path
→ SQLite internally DELETEs file_id=5
→ CASCADE doesn't trigger (foreign keys OFF)
→ Creates new file_id=6 with Process A's chunks
RESULT: Process B's 10 chunks are ORPHANED (file_id=5 no longer exists)
When this happens:
- Running
vgrep index in multiple terminals
- Running
vgrep index while vgrep watch is active
- CI/CD pipelines with parallel indexing jobs
Case 2: Watcher + Manual Index Race Condition
Terminal 1: vgrep watch # File watcher running
Terminal 2: vgrep index # User runs manual re-index
# Both detect same file change:
Watcher (Terminal 1):
Line 326: if let Some(entry) = db.get_file_by_path(path)?
Line 327: db.delete_file(entry.id)? → Deletes file_id=5 + chunks
Line 376: db.insert_file(path, &hash)? → Creates file_id=6
Indexer (Terminal 2):
Line 215: if let Some(existing) = self.db.get_file_by_path(path)?
→ File doesn't exist yet (watcher hasn't inserted)
Line 163: db.insert_file(&pending.path, &pending.hash)?
→ INSERT OR REPLACE DELETEs file_id=6 (watcher's file)
→ CASCADE doesn't work
→ Watcher's chunks with file_id=6 are ORPHANED
Case 3: INSERT OR REPLACE Implicit Delete
The insert_file() function uses INSERT OR REPLACE:
self.conn.execute(
"INSERT OR REPLACE INTO files (path, hash, indexed_at) VALUES (?, ?, ?)",
...
)?;
When a file with the same path exists:
- SQLite detects UNIQUE constraint violation
- SQLite DELETEs the existing row internally
- SQLite INSERTs the new row
- Since foreign keys are OFF, step 2 does NOT cascade to chunks
This happens even in single-threaded scenarios when:
- A file is modified and re-indexed
- The manual
delete_file() call is skipped due to error
- Any code path calls
insert_file() without first calling delete_file()
Case 4: Error During Manual Deletion
The manual deletion is a two-step non-atomic operation:
pub fn delete_file(&self, file_id: i64) -> Result<()> {
self.conn
.execute("DELETE FROM chunks WHERE file_id = ?", params![file_id])?; // Step 1
self.conn
.execute("DELETE FROM files WHERE id = ?", params![file_id])?; // Step 2
Ok(())
}
If Step 1 succeeds but Step 2 fails (disk full, process killed, etc.):
- Chunks are deleted but file record remains
- This is the inverse problem (file without chunks)
If the process is killed between prepare_file() deleting and insert_file() creating:
- Old file and chunks are gone
- New file never created
- Data loss
Impact
- Data Corruption: Chunks referencing deleted files cause integrity issues
- Stale Results: Searches return outdated/deleted content
- Database Bloat: Orphaned data grows database size indefinitely
- Performance Degradation: More data to scan during searches
- Unpredictable Behavior: JOINs may fail or return unexpected results
- Silent Failures: No error messages, users never know
Error Message
Debug Logs
System Information
Bounty Version: 0.1.0
OS: Ubuntu 24.04 LTS
CPU: AMD EPYC-Genoa Processor (8 cores)
RAM: 15 GB
Screenshots
No response
Steps to Reproduce
Method 1: Direct SQL Test
sqlite3 ~/.vgrep/projects/*.db "
PRAGMA foreign_keys; -- Will show 0 (OFF)
-- Simulate INSERT OR REPLACE orphaning chunks
INSERT INTO files (path, hash, indexed_at) VALUES ('/test.rs', 'hash1', '2024-01-01');
INSERT INTO chunks (file_id, chunk_index, content, start_line, end_line, embedding)
VALUES (last_insert_rowid(), 0, 'test', 0, 1, X'00');
-- This will orphan the chunk:
INSERT OR REPLACE INTO files (path, hash, indexed_at) VALUES ('/test.rs', 'hash2', '2024-01-02');
-- Check for orphans:
SELECT COUNT(*) FROM chunks WHERE file_id NOT IN (SELECT id FROM files);
"
Method 2: Race Condition Test
# Terminal 1
cd /project && vgrep watch
# Terminal 2 (run while watcher is active)
cd /project && vgrep index --force
# Check for orphans after both complete
python3 -c "
import sqlite3
conn = sqlite3.connect('$HOME/.vgrep/projects/*.db')
cur = conn.cursor()
cur.execute('SELECT COUNT(*) FROM chunks WHERE file_id NOT IN (SELECT id FROM files)')
print(f'Orphaned chunks: {cur.fetchone()[0]}')
"
Expected Behavior
- Foreign key constraints are enforced on every database connection
ON DELETE CASCADE automatically removes orphaned chunks
- Database maintains referential integrity at all times
- No manual cleanup of chunks is required when files are deleted
INSERT OR REPLACE operations cascade properly to dependent tables
Actual Behavior
Foreign key constraints are never enabled because PRAGMA foreign_keys = ON is not executed. The ON DELETE CASCADE clause in the schema is completely ignored.
Actual database state after deleting a file:
files table: 0 rows (file deleted)
chunks table: N rows (chunks remain - ORPHANED!)
Actual behaviors:
- Foreign key constraints are disabled (SQLite default)
ON DELETE CASCADE is silently ignored
- Orphaned chunks accumulate in the database over time
- Manual
delete_file() method attempts cleanup but can be bypassed
INSERT OR REPLACE creates orphaned chunks from the replaced row
- No error or warning is shown to users
- Search results may include content from deleted/stale files
Additional Context
No response
Project
vgrep
Description
The database schema in
src/core/db.rsdefines a foreign key constraint withON DELETE CASCADE:However, SQLite does NOT enforce foreign key constraints by default. The
PRAGMA foreign_keys = ON;statement must be executed for each database connection to enable enforcement. Without this, the cascade delete is completely ignored.This causes:
file_idin chunks can reference non-existent filesAffected Files
src/core/db.rs(lines 48-52, 73, 112)Evidence
Current Implementation (missing PRAGMA):
Schema with CASCADE that won't work:
INSERT OR REPLACE triggers implicit DELETE:
Confirmed Bug Verification
Confirmed Failure Cases Where Orphaned Chunks Occur
Case 1: Race Condition Between
prepare_file()andinsert_file()The indexer has a significant time gap between checking for existing files and inserting:
Race Timeline:
When this happens:
vgrep indexin multiple terminalsvgrep indexwhilevgrep watchis activeCase 2: Watcher + Manual Index Race Condition
Case 3: INSERT OR REPLACE Implicit Delete
The
insert_file()function usesINSERT OR REPLACE:When a file with the same
pathexists:This happens even in single-threaded scenarios when:
delete_file()call is skipped due to errorinsert_file()without first callingdelete_file()Case 4: Error During Manual Deletion
The manual deletion is a two-step non-atomic operation:
If Step 1 succeeds but Step 2 fails (disk full, process killed, etc.):
If the process is killed between
prepare_file()deleting andinsert_file()creating:Impact
Error Message
Debug Logs
System Information
Screenshots
No response
Steps to Reproduce
Method 1: Direct SQL Test
Method 2: Race Condition Test
Expected Behavior
ON DELETE CASCADEautomatically removes orphaned chunksINSERT OR REPLACEoperations cascade properly to dependent tablesActual Behavior
Foreign key constraints are never enabled because
PRAGMA foreign_keys = ONis not executed. TheON DELETE CASCADEclause in the schema is completely ignored.Actual database state after deleting a file:
Actual behaviors:
ON DELETE CASCADEis silently ignoreddelete_file()method attempts cleanup but can be bypassedINSERT OR REPLACEcreates orphaned chunks from the replaced rowAdditional Context
No response