fix: --clean should also delete the repo hash cache #434

@dj0nes

Problem

When running `cgr start --clean`, the Memgraph database is wiped but the per-repo hash cache file (`.cgr-hash-cache.json`) is left intact. On the next `--update-graph` run, the hash cache marks every file as unchanged, so 0 functions are re-indexed into the now-empty database.

This means `--clean` followed by `--update-graph` silently produces an empty graph unless the user manually deletes the cache file first.
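To make the failure mode concrete, here is a minimal sketch of how a hash-cache check of this kind typically behaves. It is not cgr's actual implementation: the `changed_files` helper and the cache format (a JSON map from relative path to SHA-256 digest) are assumptions for illustration; only the cache filename comes from this issue.

```python
import hashlib
import json
from pathlib import Path

# Filename taken from this issue; everything else here is hypothetical.
HASH_CACHE_FILENAME = ".cgr-hash-cache.json"


def changed_files(repo: Path) -> list[Path]:
    """Return the .py files whose content hash differs from the cached hash.

    Assumes the cache is a JSON object mapping relative paths to SHA-256
    hex digests. A missing cache means every file counts as changed.
    """
    cache_path = repo / HASH_CACHE_FILENAME
    cached = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    changed = []
    for path in sorted(repo.rglob("*.py")):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if cached.get(str(path.relative_to(repo))) != digest:
            changed.append(path)
    return changed
```

With a cache left over from before the wipe, a check like this returns an empty list, so nothing is written to the freshly emptied database. Once the cache file is gone, every file is reported as changed and gets re-indexed.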

Steps to reproduce

```sh
cgr start --clean --repo-path /some/repo --update-graph

# Graph is wiped, but .cgr-hash-cache.json still marks all files as unchanged
# Result: 0 nodes indexed
```

Multi-repo scenario (makes this worse)

When indexing multiple repos into the same database, `--clean` only deletes the cache for the repo passed via `--repo-path`. Other repos' caches remain stale. Example:

```sh
# Wipe DB and reindex repo A (deletes A's cache correctly)
cgr start --clean --update-graph --repo-path /repos/A

# Reindex repo B: its cache still reflects the pre-wipe state,
# so all files appear unchanged and 0 nodes are added
cgr start --update-graph --repo-path /repos/B
```

The user must manually `rm /repos/B/.cgr-hash-cache.json` before re-indexing B.

Expected behaviour

`--clean` should delete (or invalidate) the hash cache (`.cgr-hash-cache.json`) of the repo passed via `--repo-path`, so that the subsequent update processes all files from scratch. For multi-repo workflows, users need to either delete all caches manually or run `--clean --update-graph` once per repo.

Longer term, a `--clean-all` flag or a standalone `cgr clean` command that wipes both the database and all known repo caches would address the multi-repo case cleanly.
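As a rough sketch of what the cache side of such a command could do, assuming the caller can supply the list of repo paths (how cgr would track "all known repos" is not specified in this issue, and `delete_hash_caches` is a hypothetical helper, not an existing cgr function):

```python
from pathlib import Path
from typing import Iterable

# Filename taken from this issue.
HASH_CACHE_FILENAME = ".cgr-hash-cache.json"


def delete_hash_caches(repo_paths: Iterable[Path]) -> list[Path]:
    """Delete the hash cache in each given repo and return the paths removed.

    Hypothetical helper: repos without a cache file are skipped silently.
    """
    deleted = []
    for repo in repo_paths:
        cache_path = Path(repo) / HASH_CACHE_FILENAME
        if cache_path.is_file():
            cache_path.unlink()
            deleted.append(cache_path)
    return deleted
```

Returning the deleted paths would let the CLI log exactly which caches were invalidated, which helps when several repos share one database.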

Proposed fix (single-repo)

In `cli.py`, after `ingestor.clean_database()`:

```python
# Remove the stale hash cache so the next update re-indexes every file
cache_path = repo_to_update / cs.HASH_CACHE_FILENAME
if cache_path.exists():
    cache_path.unlink()
```

This is a three-line fix with no architectural impact.
