Description
Problem
When running `cgr start --clean`, the Memgraph database is wiped but the per-repo hash cache file (`.cgr-hash-cache.json`) is left intact. On the next `--update-graph` run, the hash cache marks every file as unchanged, so 0 functions are re-indexed into the now-empty database.
This means `--clean` followed by `--update-graph` silently produces an empty graph unless the user manually deletes the cache file first.
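To illustrate the failure mode, here is a minimal, self-contained sketch. It is not the project's actual code: it assumes the cache is a JSON map from file name to content hash, and `files_to_index`, `save_cache` and `simulate_clean_then_update` are hypothetical names used only for this demo.

```python
from __future__ import annotations

import hashlib
import json
import tempfile
from pathlib import Path

HASH_CACHE_FILENAME = ".cgr-hash-cache.json"  # cache filename named in this issue


def file_hash(path: Path) -> str:
    """Content hash of a single file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def files_to_index(repo: Path, files: list[Path]) -> list[Path]:
    """Hypothetical change detection: keep files whose hash differs from the cache."""
    cache_path = repo / HASH_CACHE_FILENAME
    cached = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    return [f for f in files if cached.get(f.name) != file_hash(f)]


def save_cache(repo: Path, files: list[Path]) -> None:
    """Record the current hash of every indexed file."""
    hashes = {f.name: file_hash(f) for f in files}
    (repo / HASH_CACHE_FILENAME).write_text(json.dumps(hashes))


def simulate_clean_then_update() -> tuple[int, int]:
    """Return (files indexed on first run, files indexed after a DB-only wipe)."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        source = repo / "module.py"
        source.write_text("def f():\n    return 1\n")

        first = files_to_index(repo, [source])  # no cache yet, so the file is indexed
        save_cache(repo, [source])

        # "--clean" wipes the database here but leaves the cache file on disk.
        second = files_to_index(repo, [source])  # cache still matches, nothing to index
        return len(first), len(second)
```

Under these assumptions the second pass finds nothing to index, even though the database it would write to is empty.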
Steps to reproduce
```sh
cgr start --clean --repo-path /some/repo --update-graph
# Graph is wiped, but .cgr-hash-cache.json still marks all files as unchanged
# Result: 0 nodes indexed
```
Multi-repo scenario (makes this worse)
When indexing multiple repos into the same database, the problem persists even if `--clean` deletes the cache for the repo passed via `--repo-path`: the other repos' caches remain stale. Example:
```sh
# Wipe DB and reindex repo A (assuming --clean also deletes A's cache)
cgr start --clean --update-graph --repo-path /repos/A
# Reindex repo B: its cache still reflects the pre-wipe state,
# so all files appear unchanged and 0 nodes are added
cgr start --update-graph --repo-path /repos/B
```
The user must manually `rm /repos/B/.cgr-hash-cache.json` before re-indexing B.
Expected behaviour
`--clean` should delete (or invalidate) the hash cache at `/.cgr-hash-cache.json` so the subsequent update processes all files from scratch. For multi-repo workflows, users need to either delete all caches manually or run `--clean --update-graph` once per repo.
Longer term, a `--clean-all` flag or a standalone `cgr clean` command that wipes both the database and all known repo caches would address the multi-repo case cleanly.
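As a rough sketch of the cache-clearing half of such a command: the helper below removes the hash cache from each repo in an explicit list. The name `clear_hash_caches` is hypothetical, and how the tool would discover "all known repos" is left open here, so the caller passes the paths in.

```python
from __future__ import annotations

from pathlib import Path

HASH_CACHE_FILENAME = ".cgr-hash-cache.json"  # cache filename named in this issue


def clear_hash_caches(repo_paths: list[str]) -> list[Path]:
    """Delete the hash cache in each given repo; return the paths actually removed."""
    removed = []
    for repo in repo_paths:
        cache_path = Path(repo) / HASH_CACHE_FILENAME
        if cache_path.exists():
            cache_path.unlink()
            removed.append(cache_path)
    return removed
```

For the example above, `clear_hash_caches(["/repos/A", "/repos/B"])` after the database wipe would force both repos to be re-indexed from scratch on their next `--update-graph` run.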
Proposed fix (single-repo)
In `cli.py`, after `ingestor.clean_database()`:
```python
# Remove the stale hash cache so the next update re-indexes every file
cache_path = repo_to_update / cs.HASH_CACHE_FILENAME
if cache_path.exists():
    cache_path.unlink()
```
This is a small, localized fix with no architectural impact.