
Master #24

Merged
offx-zinth merged 3 commits into main from master
Apr 19, 2026
Conversation

@offx-zinth
Owner

some improvements

@offx-zinth merged commit b45d8fb into main Apr 19, 2026
1 check failed

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces an EmbeddingService supporting NVIDIA and OpenAI providers and integrates it into the StaticSemanticEnricher and SeedWalkEngine. However, the PR is currently in a broken state as multiple files (including docker-compose.yml, pyproject.toml, and several Python modules) contain unresolved git merge conflict markers. Additionally, the EmbeddingService requires refactoring to correctly handle provider-specific configuration defaults and to reduce code duplication in the embedding logic.

Comment thread docker-compose.yml
Comment on lines +9 to +15
<<<<<<< HEAD
- "7475:7474" # Host 7475 maps to Container 7474
- "7688:7687" # Host 7688 maps to Container 7687
=======
- "7474:7474"
- "7687:7687"
>>>>>>> 87cfd9650622e51c4c94d43d490450a82a87ad3d

critical

Merge conflict markers have been committed to the repository. This results in invalid YAML syntax and will cause the docker-compose command to fail. Please resolve the conflicts and remove the markers.
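Assuming the HEAD side is the intended resolution (host ports remapped to avoid clashing with another service already bound to the defaults), the cleaned-up block would carry no markers; a sketch of that resolution:

```yaml
ports:
  - "7475:7474"   # host 7475 -> container 7474 (HTTP)
  - "7688:7687"   # host 7688 -> container 7687 (Bolt)
```

If the default mapping from the other branch is preferred instead, the same rule applies: keep exactly one variant and delete all three marker lines.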

Comment thread pyproject.toml
Comment on lines +24 to +27
<<<<<<< HEAD
"chromadb",
=======
>>>>>>> 87cfd9650622e51c4c94d43d490450a82a87ad3d

critical

Merge conflict markers found in the dependencies section. This will prevent the project from being installed or built correctly. Please resolve the conflict.
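Since this PR adds embedding support, keeping the HEAD side (which retains chromadb) looks like the plausible resolution; a sketch of the cleaned-up entry, assuming that choice and an otherwise-standard PEP 621 dependencies list:

```toml
dependencies = [
    "chromadb",
]
```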

Comment thread smp/engine/enricher.py
Comment on lines +1 to +10
<<<<<<< HEAD
"""Static semantic enricher with optional LLM-based embedding."""
=======
"""Static semantic enricher — AST-based extraction.

Extracts docstrings, inline comments, decorators, type annotations,
and computes source hashes purely from the AST.
No LLM or embedding generation.
"""
>>>>>>> 87cfd9650622e51c4c94d43d490450a82a87ad3d

critical

Merge conflict markers detected in the module docstring. This is invalid Python code and will cause a SyntaxError at runtime.
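Leftover markers like these can be caught before they are committed: `git diff --check` flags conflict markers in staged changes, and a recursive grep covers the whole tree. A minimal sketch (the file globs are illustrative):

```shell
# Print every line in Python/YAML/TOML files that still carries a
# git conflict marker; grep exits 0 only when something is found.
grep -rnE '^(<<<<<<< |=======$|>>>>>>> )' \
  --include='*.py' --include='*.yml' --include='*.toml' . \
  && echo "unresolved conflict markers found"
```

Wiring this into a pre-commit hook or CI step would have turned this PR's "1 check failed" into a much earlier, clearer failure.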

Comment thread smp/engine/interfaces.py
Comment on lines +58 to +64
<<<<<<< HEAD
@abc.abstractmethod
async def embed_batch(self, texts: list[str]) -> list[list[float]]:
"""Generate embeddings for multiple texts."""

=======
>>>>>>> 87cfd9650622e51c4c94d43d490450a82a87ad3d

critical

Merge conflict markers found in the SemanticEnricher abstract class. This will break the engine's interface definitions.
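For context, with the HEAD side restored the abstract method makes embed_batch part of the enricher contract, so any concrete enricher that omits it fails at instantiation rather than at call time. A minimal sketch (everything except the embed_batch signature is illustrative):

```python
import abc


class SemanticEnricher(abc.ABC):
    """Illustrative slice of the interface; the real class has more methods."""

    @abc.abstractmethod
    async def embed_batch(self, texts: list[str]) -> list[list[float]]:
        """Generate embeddings for multiple texts."""


class IncompleteEnricher(SemanticEnricher):
    pass  # forgets to implement embed_batch


class CompleteEnricher(SemanticEnricher):
    async def embed_batch(self, texts: list[str]) -> list[list[float]]:
        return [[0.0] for _ in texts]  # stub embedding


try:
    IncompleteEnricher()
except TypeError as exc:
    print("abstract:", "embed_batch" in str(exc))  # abstract: True

CompleteEnricher()  # fine: all abstract methods implemented
```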

Comment thread smp/engine/seed_walk.py
Comment on lines +102 to +105
<<<<<<< HEAD
delegate: QueryEngineInterface | None = None,
=======
>>>>>>> 87cfd9650622e51c4c94d43d490450a82a87ad3d

critical

Merge conflict markers in the constructor parameters. This will prevent the SeedWalkEngine from being instantiated.

Comment thread smp/engine/embedding.py
Comment on lines +26 to +31
self._provider = provider
self._api_key = api_key or os.environ.get("NVIDIA_NIM_API_KEY") or os.environ.get("OPENAI_API_KEY", "")
self._model = model or os.environ.get("EMBEDDING_MODEL", "nvidia/nv-embed-qa-4")
self._base_url = base_url or os.environ.get(
"EMBEDDING_BASE_URL", "https://integrate.api.nvidia.com/v1"
)

high

The initialization logic for _api_key, _model, and _base_url does not differentiate between providers. With provider='openai', the service can still pick up an NVIDIA API key, default to the NVIDIA embedding model, and call the NVIDIA base URL. It is safer to select all three defaults based on the provider value.

        self._provider = provider
        if api_key:
            self._api_key = api_key
        else:
            env_key = "NVIDIA_NIM_API_KEY" if provider == "nvidia" else "OPENAI_API_KEY"
            self._api_key = os.environ.get(env_key, "")

        self._model = model or os.environ.get(
            "EMBEDDING_MODEL",
            "nvidia/nv-embed-qa-4" if provider == "nvidia" else "text-embedding-3-small",
        )

        default_url = "https://integrate.api.nvidia.com/v1" if provider == "nvidia" else "https://api.openai.com/v1"
        self._base_url = base_url or os.environ.get("EMBEDDING_BASE_URL", default_url)

Comment thread smp/engine/embedding.py
Comment on lines +76 to +114
async def _embed_nvidia(self, text: str) -> list[float]:
payload = {
"input": text,
"model": self._model,
}
response = await self._client.post("/embeddings", json=payload)
response.raise_for_status()
data = response.json()
return data["data"][0]["embedding"]

async def _embed_batch_nvidia(self, texts: list[str]) -> list[list[float]]:
payload = {
"input": texts,
"model": self._model,
}
response = await self._client.post("/embeddings", json=payload)
response.raise_for_status()
data = response.json()
return [item["embedding"] for item in data["data"]]

async def _embed_openai(self, text: str) -> list[float]:
payload = {
"input": text,
"model": self._model,
}
response = await self._client.post("/embeddings", json=payload)
response.raise_for_status()
data = response.json()
return data["data"][0]["embedding"]

async def _embed_batch_openai(self, texts: list[str]) -> list[list[float]]:
payload = {
"input": texts,
"model": self._model,
}
response = await self._client.post("/embeddings", json=payload)
response.raise_for_status()
data = response.json()
return [item["embedding"] for item in data["data"]]

medium

The implementation of embedding methods for both NVIDIA and OpenAI providers is identical. Consolidating this logic into a single private method will reduce code duplication and simplify future updates.

    async def _fetch_embeddings(self, input_data: str | list[str]) -> Any:
        payload = {"input": input_data, "model": self._model}
        response = await self._client.post("/embeddings", json=payload)
        response.raise_for_status()
        return response.json()["data"]

    async def _embed_nvidia(self, text: str) -> list[float]:
        data = await self._fetch_embeddings(text)
        return data[0]["embedding"]

    async def _embed_batch_nvidia(self, texts: list[str]) -> list[list[float]]:
        data = await self._fetch_embeddings(texts)
        return [item["embedding"] for item in data]

    async def _embed_openai(self, text: str) -> list[float]:
        data = await self._fetch_embeddings(text)
        return data[0]["embedding"]

    async def _embed_batch_openai(self, texts: list[str]) -> list[list[float]]:
        data = await self._fetch_embeddings(texts)
        return [item["embedding"] for item in data]

