
feat: semantic query via sentence-transformers embeddings #424

Open
juhii31 wants to merge 2 commits into safishamsi:v4 from juhii31:feature/semantic-query-embeddings

juhii31 commented Apr 17, 2026

Closes #1

What this does

Adds semantic embedding support to graphify query, using
sentence-transformers with the all-MiniLM-L6-v2 model (about 80 MB,
runs locally, no API key required).

Changes

  • graphify/embed.py — new module with embed_graph() function

    • Embeds all node labels + docstrings using all-MiniLM-L6-v2
    • Computes pairwise cosine similarity matrix
    • Adds semantically_similar_to edges between node pairs whose
      similarity exceeds a configurable threshold (default 0.82)
    • Tags edges as INFERRED with confidence_score = cosine similarity
    • Caches embedding vectors in graphify-out/cache/embeddings.json
    • On re-runs, embeds only new or changed nodes (detected via a SHA256 cache key)
  • tests/test_embed.py — 3 tests covering:

    • Similar nodes get connected
    • Cache file is created on first run
    • No duplicate edges on repeated runs
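
A minimal sketch of the edge-creation step described above, in plain Python so it runs without downloading the model. It is not the code in graphify/embed.py: it assumes node vectors are already computed and held in a dict keyed by node ID, and the edge dicts (keys source, target, relation, confidence, confidence_score) and the function names are hypothetical stand-ins for whatever graph structure graphify actually uses.

```python
import math
from itertools import combinations


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def add_similarity_edges(
    vectors: dict[str, list[float]],
    edges: list[dict],
    threshold: float = 0.82,
) -> list[dict]:
    """Append semantically_similar_to edges for pairs scoring above threshold.

    Pairs that already have such an edge (in either direction) are skipped,
    so calling this repeatedly does not create duplicates.
    """
    existing = {
        frozenset((e["source"], e["target"]))
        for e in edges
        if e["relation"] == "semantically_similar_to"
    }
    added = []
    # combinations() yields each unordered pair once, i.e. the upper
    # triangle of the pairwise similarity matrix.
    for a, b in combinations(sorted(vectors), 2):
        score = cosine(vectors[a], vectors[b])
        if score <= threshold or frozenset((a, b)) in existing:
            continue
        edge = {
            "source": a,
            "target": b,
            "relation": "semantically_similar_to",
            "confidence": "INFERRED",   # inferred from embeddings, not extracted
            "confidence_score": score,  # raw cosine similarity
        }
        edges.append(edge)
        existing.add(frozenset((a, b)))
        added.append(edge)
    return added


vectors = {
    "parse_config": [0.9, 0.1, 0.0],
    "load_config": [0.85, 0.15, 0.0],
    "render_html": [0.0, 0.2, 0.9],
}
edges: list[dict] = []
print(add_similarity_edges(vectors, edges))  # one edge: load_config -> parse_config
print(add_similarity_edges(vectors, edges))  # [] on the second run
```

The second call adds nothing, which is the property the "no duplicate edges on repeated runs" test is described as checking. The loop visits n(n-1)/2 pairs; for large graphs, computing the whole matrix at once (for example with sentence_transformers.util.cos_sim on the embedding tensor) is the usual way to get the same scores faster.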
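
The SHA256 caching behaviour can be sketched as follows. Everything here is an assumption rather than the PR's actual format: the cache is taken to be a flat JSON object mapping the SHA-256 hex digest of each node's text (label plus docstring, joined by the caller) to its vector, and embed_texts is any callable that maps a list of strings to a list of vectors. With sentence-transformers, SentenceTransformer("all-MiniLM-L6-v2").encode could fill that role. The names text_key and cached_embeddings are hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable

# Hypothetical embedder type: list of texts in, list of vectors out.
Embedder = Callable[[list[str]], list[list[float]]]


def text_key(text: str) -> str:
    """SHA-256 hex digest of a node's text, used as its cache key."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def cached_embeddings(
    node_texts: dict[str, str],
    embed_texts: Embedder,
    cache_path: Path,
) -> dict[str, list[float]]:
    """Return a vector per node ID, calling embed_texts only for uncached texts."""
    cache: dict[str, list[float]] = {}
    if cache_path.exists():
        cache = json.loads(cache_path.read_text(encoding="utf-8"))

    # New nodes, and nodes whose label/docstring changed, hash to unseen keys.
    missing: dict[str, str] = {}
    for text in node_texts.values():
        key = text_key(text)
        if key not in cache:
            missing[key] = text

    if missing:
        keys = list(missing)
        vectors = embed_texts([missing[k] for k in keys])  # one batched call
        for key, vec in zip(keys, vectors):
            cache[key] = [float(x) for x in vec]  # plain floats are JSON-safe
        cache_path.parent.mkdir(parents=True, exist_ok=True)
        cache_path.write_text(json.dumps(cache), encoding="utf-8")

    return {node: cache[text_key(text)] for node, text in node_texts.items()}
```

Keying on a hash of the text rather than the node ID means an edited docstring automatically gets a fresh embedding, while identical texts share one entry. This sketch never prunes stale entries, so the cache file only grows.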

Notes

  • Works fully offline after the one-time model download, with zero API cost
  • All 393 applicable tests pass (symlink and git hook tests are skipped
    due to a pre-existing Windows privilege limitation)

juhii31 (author) commented Apr 17, 2026

This PR adds the embedding engine, caching, and edge creation. Query wiring into main.py and the pyproject.toml optional dependency will follow in the next commit, before merge.

juhii31 (author) commented Apr 18, 2026

Updated: query wiring into main.py and the pyproject.toml optional dependency are now included. graphify query "question" --embeddings ranks results by semantic cosine similarity instead of BFS keyword matching. This fully closes #1.
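
Ranking for graphify query "question" --embeddings can be pictured as: embed the question with the same model used for the nodes, score every node by cosine similarity, and keep the best matches. This is a hedged sketch, not the code wired into main.py; the function name rank_nodes and the top_k parameter are hypothetical, and the vectors are plain lists so it runs without the model.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_nodes(
    query_vec: list[float],
    node_vecs: dict[str, list[float]],
    top_k: int = 5,
) -> list[tuple[str, float]]:
    """Return up to top_k (node_id, score) pairs, most similar first."""
    scored = ((node, cosine(query_vec, vec)) for node, vec in node_vecs.items())
    # nlargest avoids sorting every node when only the top few are needed.
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[1])


node_vecs = {
    "auth.login": [1.0, 0.0],
    "auth.logout": [0.8, 0.6],
    "ui.render": [0.0, 1.0],
}
for node, score in rank_nodes([1.0, 0.0], node_vecs, top_k=2):
    print(f"{node}\t{score:.2f}")
```

Unlike a BFS keyword match, a node can rank highly without sharing any word with the question, as long as its embedding is close. The query vector must come from the same all-MiniLM-L6-v2 model as the node vectors; scores between vectors from different models are not comparable.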


Development

Successfully merging this pull request may close these issues.

v3: semantic query with embeddings

1 participant