Master#24
Conversation
Code Review
This pull request introduces an EmbeddingService supporting NVIDIA and OpenAI providers and integrates it into the StaticSemanticEnricher and SeedWalkEngine. However, the PR is currently in a broken state as multiple files (including docker-compose.yml, pyproject.toml, and several Python modules) contain unresolved git merge conflict markers. Additionally, the EmbeddingService requires refactoring to correctly handle provider-specific configuration defaults and to reduce code duplication in the embedding logic.
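As a quick sanity check before re-requesting review, the remaining conflict markers can be found mechanically. The helper below is purely illustrative (not part of this PR) and assumes a small allow list of file extensions:

# Illustrative only: scan the working tree for leftover git conflict markers.
from pathlib import Path

CONFLICT_MARKERS = ("<<<<<<<", "=======", ">>>>>>>")

def find_conflict_markers(root: str = ".") -> list[str]:
    """Return files under `root` that still contain conflict markers."""
    flagged: list[str] = []
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix not in {".py", ".yml", ".yaml", ".toml"}:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        if any(line.startswith(CONFLICT_MARKERS) for line in text.splitlines()):
            flagged.append(str(path))
    return flagged

print("\n".join(find_conflict_markers()) or "no conflict markers found")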
<<<<<<< HEAD
- "7475:7474" # Host 7475 maps to Container 7474
- "7688:7687" # Host 7688 maps to Container 7687
=======
- "7474:7474"
- "7687:7687"
>>>>>>> 87cfd9650622e51c4c94d43d490450a82a87ad3d
<<<<<<< HEAD
"chromadb",
=======
>>>>>>> 87cfd9650622e51c4c94d43d490450a82a87ad3d
<<<<<<< HEAD
"""Static semantic enricher with optional LLM-based embedding."""
=======
"""Static semantic enricher — AST-based extraction.

Extracts docstrings, inline comments, decorators, type annotations,
and computes source hashes purely from the AST.
No LLM or embedding generation.
"""
>>>>>>> 87cfd9650622e51c4c94d43d490450a82a87ad3d
<<<<<<< HEAD
@abc.abstractmethod
async def embed_batch(self, texts: list[str]) -> list[list[float]]:
    """Generate embeddings for multiple texts."""

=======
>>>>>>> 87cfd9650622e51c4c94d43d490450a82a87ad3d
<<<<<<< HEAD
delegate: QueryEngineInterface | None = None,
=======
>>>>>>> 87cfd9650622e51c4c94d43d490450a82a87ad3d
self._provider = provider
self._api_key = api_key or os.environ.get("NVIDIA_NIM_API_KEY") or os.environ.get("OPENAI_API_KEY", "")
self._model = model or os.environ.get("EMBEDDING_MODEL", "nvidia/nv-embed-qa-4")
self._base_url = base_url or os.environ.get(
    "EMBEDDING_BASE_URL", "https://integrate.api.nvidia.com/v1"
)
The initialization logic for _api_key, _model, and _base_url does not correctly differentiate between providers: with provider='openai', the service can still pick up an NVIDIA API key, the NVIDIA default model, or the NVIDIA base URL. It is safer to select defaults based on the provider value.
self._provider = provider
if api_key:
    self._api_key = api_key
else:
    env_key = "NVIDIA_NIM_API_KEY" if provider == "nvidia" else "OPENAI_API_KEY"
    self._api_key = os.environ.get(env_key, "")
self._model = model or os.environ.get(
    "EMBEDDING_MODEL",
    "nvidia/nv-embed-qa-4" if provider == "nvidia" else "text-embedding-3-small",
)
default_url = (
    "https://integrate.api.nvidia.com/v1"
    if provider == "nvidia"
    else "https://api.openai.com/v1"
)
self._base_url = base_url or os.environ.get("EMBEDDING_BASE_URL", default_url)
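For clarity, construction under the revised defaults would look roughly like this. The snippet is only a sketch: it assumes the constructor keyword arguments shown in the diff and that the corresponding API keys are set in the environment.

# Sketch only, assuming the keyword arguments shown above.
nvidia_embedder = EmbeddingService(provider="nvidia")
# picks up NVIDIA_NIM_API_KEY, model "nvidia/nv-embed-qa-4",
# base URL https://integrate.api.nvidia.com/v1

openai_embedder = EmbeddingService(provider="openai")
# picks up OPENAI_API_KEY, model "text-embedding-3-small",
# base URL https://api.openai.com/v1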
async def _embed_nvidia(self, text: str) -> list[float]:
    payload = {
        "input": text,
        "model": self._model,
    }
    response = await self._client.post("/embeddings", json=payload)
    response.raise_for_status()
    data = response.json()
    return data["data"][0]["embedding"]

async def _embed_batch_nvidia(self, texts: list[str]) -> list[list[float]]:
    payload = {
        "input": texts,
        "model": self._model,
    }
    response = await self._client.post("/embeddings", json=payload)
    response.raise_for_status()
    data = response.json()
    return [item["embedding"] for item in data["data"]]

async def _embed_openai(self, text: str) -> list[float]:
    payload = {
        "input": text,
        "model": self._model,
    }
    response = await self._client.post("/embeddings", json=payload)
    response.raise_for_status()
    data = response.json()
    return data["data"][0]["embedding"]

async def _embed_batch_openai(self, texts: list[str]) -> list[list[float]]:
    payload = {
        "input": texts,
        "model": self._model,
    }
    response = await self._client.post("/embeddings", json=payload)
    response.raise_for_status()
    data = response.json()
    return [item["embedding"] for item in data["data"]]
The embedding methods for the NVIDIA and OpenAI providers are identical. Consolidating this logic into a single private helper will reduce code duplication and simplify future updates.
# Requires `from typing import Any` at module level for the return annotation.
async def _fetch_embeddings(self, input_data: str | list[str]) -> Any:
    payload = {"input": input_data, "model": self._model}
    response = await self._client.post("/embeddings", json=payload)
    response.raise_for_status()
    return response.json()["data"]

async def _embed_nvidia(self, text: str) -> list[float]:
    data = await self._fetch_embeddings(text)
    return data[0]["embedding"]

async def _embed_batch_nvidia(self, texts: list[str]) -> list[list[float]]:
    data = await self._fetch_embeddings(texts)
    return [item["embedding"] for item in data]

async def _embed_openai(self, text: str) -> list[float]:
    data = await self._fetch_embeddings(text)
    return data[0]["embedding"]

async def _embed_batch_openai(self, texts: list[str]) -> list[list[float]]:
    data = await self._fetch_embeddings(texts)
    return [item["embedding"] for item in data]
some improvements