feat: code embeddings combined with graph structure for improved retrieval #387

@vitali87


Task Description

Explore combining code embeddings with graph structural information to improve code retrieval quality, inspired by CodeFuse-CGM, which achieved high rankings on SWE-bench using this approach.

The idea is to enrich the vector similarity search with graph topology signals (e.g., call graphs, import chains, class hierarchies) so that structurally related code is ranked higher even when lexically dissimilar.
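One possible shape for this, as a minimal sketch rather than the CodeFuse-CGM method: take the top embedding hits as seeds, score every other node by its hop distance from a seed in the code graph, and blend the two signals with a weight. The names `hybrid_rank`, `hop_distances`, `alpha`, and `n_seeds`, the toy vectors, and the `1 / (1 + hops)` decay are all assumptions made for illustration.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hop_distances(graph, seeds):
    """BFS hop counts from the nearest seed over an adjacency dict."""
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def hybrid_rank(query_vec, embeddings, graph, alpha=0.7, n_seeds=1):
    """Rank nodes by alpha * embedding similarity + (1 - alpha) * graph proximity."""
    sims = {name: cosine(query_vec, vec) for name, vec in embeddings.items()}
    # Assumption: the best embedding hits act as anchors in the graph.
    seeds = sorted(sims, key=sims.get, reverse=True)[:n_seeds]
    dist = hop_distances(graph, seeds)
    scores = {}
    for name in embeddings:
        # Nodes unreachable from any seed get no structural bonus.
        graph_score = 1.0 / (1 + dist[name]) if name in dist else 0.0
        scores[name] = alpha * sims[name] + (1 - alpha) * graph_score
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: load_yaml is called by parse_config but is
# dissimilar to the query in embedding space.
embeddings = {
    "parse_config": [1.0, 0.0],
    "load_yaml": [0.0, 1.0],
    "render_html": [0.28, 0.96],
}
graph = {"parse_config": ["load_yaml"], "load_yaml": ["parse_config"]}
query = [1.0, 0.0]

embedding_only = hybrid_rank(query, embeddings, graph, alpha=1.0)
hybrid = hybrid_rank(query, embeddings, graph, alpha=0.5)
```

With `alpha=1.0` the ranking is pure embedding similarity and `load_yaml` comes last; with `alpha=0.5` it moves above `render_html` because it is one hop from the top hit. Edge types (calls, imports, inheritance) could be given different weights, which this sketch does not attempt.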

Acceptance Criteria

  • Research the CodeFuse-CGM approach and identify applicable techniques
  • Prototype a retrieval method that combines embedding similarity with graph structure scores
  • Benchmark retrieval quality against the current approach
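
For the benchmarking criterion, one option is to compare the current and prototype retrievers on the same labeled queries using recall@k and mean reciprocal rank. The sketch below assumes each query comes with a set of known-relevant code units; the function names and the `runs` structure are hypothetical, and the issue does not prescribe these metrics.

```python
def recall_at_k(ranked, relevant, k):
    """Fraction of relevant items that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is returned."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs, k=5):
    """Average both metrics over runs, a list of (ranked_list, relevant_set) pairs."""
    n = len(runs)
    return {
        "recall@k": sum(recall_at_k(r, rel, k) for r, rel in runs) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in runs) / n,
    }


# Hypothetical usage: two queries, each with one relevant item.
runs = [(["a", "b"], {"b"}), (["c", "d"], {"c"})]
metrics = evaluate(runs, k=1)
```

Running `evaluate` once per retriever over the same query set gives directly comparable numbers.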

References

Priority

  • High
  • Medium
  • Low

Estimated Effort

  • Small (< 2 hours)
  • Medium (2-8 hours)
  • Large (> 8 hours)

Metadata

Assignees

No one assigned

    Labels

    enhancement: New feature or request

    Projects

    Status

    No status

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
