840 changes: 840 additions & 0 deletions docs/design/llama-stack-config-merge/llama-stack-config-merge-spike.md

Large diffs are not rendered by default.

502 changes: 502 additions & 0 deletions docs/design/llama-stack-config-merge/llama-stack-config-merge.md

Large diffs are not rendered by default.

docs/design/llama-stack-config-merge/poc-evidence/library-mode/README.md
@@ -0,0 +1,26 @@
# Library-mode PoC evidence

Command:
```bash
export OPENAI_API_KEY=<redacted>
export E2E_OPENAI_MODEL=gpt-4o-mini
uv run lightspeed-stack -c docs/design/llama-stack-config-merge/poc-evidence/lightspeed-stack-unified-library.yaml
```
Comment on lines +3 to +8
⚠️ Potential issue | 🟡 Minor

Minor: blank line around fenced block (MD031).

markdownlint flags missing blank lines around the fenced bash block. Trivial; this evidence doc is slated for removal before merge.

🧰 Tools
🪛 markdownlint-cli2 (0.22.0)

[warning] 4-4: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/design/llama-stack-config-merge/poc-evidence/library-mode/README.md`
around lines 3-8: the fenced bash code block starting with "```bash" is missing
blank lines before and/or after (markdownlint MD031); add a single blank line
immediately above the opening "```bash" line and a single blank line immediately
below the closing "```" line so the fenced block is separated from the
surrounding text and satisfies MD031.


## What the unified config does

- `llama_stack.config.profile: /abs/path/to/tests/e2e/configs/run-ci.yaml` — the baseline is loaded from the CI profile
- `llama_stack.config.native_override.safety.default_shield_id: llama-guard` — an override that proves the merge works (see the sketch below)
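
A minimal sketch of the merge semantics exercised here, assuming a plain recursive dict merge: the `profile` supplies the baseline and `native_override` is layered on top. The `deep_merge` helper below is hypothetical and only illustrates the intent; the real `synthesize_configuration` in `llama_stack_configuration.py` may handle lists, env-var references, and validation differently.

```python
# Hedged sketch: baseline-from-profile plus native_override, as a recursive merge.
# `deep_merge` is a hypothetical helper, not the project's implementation.
from copy import deepcopy
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict with `override` recursively merged onto `base`."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Tiny stand-in for what run-ci.yaml provides, plus the PoC's override.
profile_baseline = {"safety": {"default_shield_id": None}, "server": {"port": 8321}}
native_override = {"safety": {"default_shield_id": "llama-guard"}}

print(deep_merge(profile_baseline, native_override))
# {'safety': {'default_shield_id': 'llama-guard'}, 'server': {'port': 8321}}
```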

## Evidence

- `synthesized-run.yaml` — the full run.yaml LCORE produced from the unified config
- `query-response.json` — a successful `/v1/query` round-trip

## Proves

- `llama_stack.library_client_config_path` was NOT used (no external run.yaml needed)
- `llama_stack.config.profile` was used as the synthesis baseline (path resolution works with absolute paths)
- `llama_stack.config.native_override` was merged onto the baseline
- `AsyncLlamaStackAsLibraryClient` accepts the synthesized file path (answered item #24: file-only, not dict; see the sketch below)
- `/v1/query` succeeded end-to-end through the synthesized stack
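
For context, the file-path handoff proven here reduces to roughly the snippet below. The constructor and `initialize()` calls mirror `src/client.py`; the import path for `AsyncLlamaStackAsLibraryClient` is an assumption (the corresponding import in `src/client.py` sits outside the visible diff), and `OPENAI_API_KEY` / `E2E_OPENAI_MODEL` must be exported as in the command above.

```python
# Sketch only: construct the library client from a run.yaml *file path* (not a
# dict), matching the calls in src/client.py. The import path is an assumption.
import asyncio

from llama_stack import AsyncLlamaStackAsLibraryClient  # assumed import location


async def main() -> None:
    # The synthesized run.yaml committed as evidence alongside this README.
    config_path = (
        "docs/design/llama-stack-config-merge/poc-evidence/library-mode/"
        "synthesized-run.yaml"
    )
    client = AsyncLlamaStackAsLibraryClient(config_path)  # file path, not a dict
    await client.initialize()


asyncio.run(main())
```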
docs/design/llama-stack-config-merge/poc-evidence/library-mode/query-response.json
@@ -0,0 +1 @@
{"conversation_id":"976ef32527283085ba2f1d0cfb4c16d97071bf64391a8200","response":"The three primary colors are red, blue, and yellow.","rag_chunks":[],"referenced_documents":[],"truncated":false,"input_tokens":24,"output_tokens":12,"available_quotas":{},"tool_calls":[],"tool_results":[]}
docs/design/llama-stack-config-merge/poc-evidence/library-mode/synthesized-run.yaml
@@ -0,0 +1,148 @@
apis:
- agents
- batches
- datasetio
- eval
- files
- inference
- safety
- scoring
- tool_runtime
- vector_io
benchmarks: []
datasets: []
image_name: starter
providers:
agents:
- config:
persistence:
agent_state:
backend: kv_default
namespace: agents_state
responses:
backend: sql_default
table_name: agents_responses
provider_id: meta-reference
provider_type: inline::meta-reference
batches:
- config:
kvstore:
backend: kv_default
namespace: batches_store
provider_id: reference
provider_type: inline::reference
datasetio:
- config:
kvstore:
backend: kv_default
namespace: huggingface_datasetio
provider_id: huggingface
provider_type: remote::huggingface
- config:
kvstore:
backend: kv_default
namespace: localfs_datasetio
provider_id: localfs
provider_type: inline::localfs
eval:
- config:
kvstore:
backend: kv_default
namespace: eval_store
provider_id: meta-reference
provider_type: inline::meta-reference
files:
- config:
metadata_store:
backend: sql_default
table_name: files_metadata
storage_dir: ~/.llama/storage/files
provider_id: meta-reference-files
provider_type: inline::localfs
inference:
- config:
allowed_models:
- ${env.E2E_OPENAI_MODEL:=gpt-4o-mini}
api_key: ${env.OPENAI_API_KEY}
provider_id: openai
provider_type: remote::openai
- config: {}
provider_id: sentence-transformers
provider_type: inline::sentence-transformers
safety:
- config:
excluded_categories: []
provider_id: llama-guard
provider_type: inline::llama-guard
scoring:
- config: {}
provider_id: basic
provider_type: inline::basic
- config: {}
provider_id: llm-as-judge
provider_type: inline::llm-as-judge
- config:
openai_api_key: '********'
provider_id: braintrust
provider_type: inline::braintrust
tool_runtime:
- config: {}
provider_id: rag-runtime
provider_type: inline::rag-runtime
- config: {}
provider_id: model-context-protocol
provider_type: remote::model-context-protocol
vector_io: []
registered_resources:
benchmarks: []
datasets: []
models:
- metadata:
embedding_dimension: 768
model_id: all-mpnet-base-v2
model_type: embedding
provider_id: sentence-transformers
provider_model_id: all-mpnet-base-v2
scoring_fns: []
shields:
- provider_id: llama-guard
provider_shield_id: openai/gpt-4o-mini
shield_id: llama-guard
Comment on lines +107 to +110
⚠️ Potential issue | 🟡 Minor

Suspicious provider_shield_id value in evidence artifact.

The llama-guard shield lists provider_shield_id: openai/gpt-4o-mini, i.e. an OpenAI chat model id is being registered as the Llama Guard shield id. This almost certainly reflects a misconfigured native_override in the PoC input rather than a working safety shield (Llama Guard expects a guard model id such as meta-llama/Llama-Guard-3-8B). The artifact is being removed pre-merge, but worth noting so the same shape is not copied into the spec or any follow-up defaults.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@docs/design/llama-stack-config-merge/poc-evidence/library-mode/synthesized-run.yaml`
around lines 107-110: the shields entry has an incorrect provider_shield_id
value. Replace the OpenAI chat model id in the shields list (symbols: shields,
provider_id, provider_shield_id, shield_id) with the correct Llama Guard model
id, or remove the mistaken native_override that injected it; specifically, ensure
provider_shield_id uses a guard model identifier such as
"meta-llama/Llama-Guard-3-8B" (or the intended guard model), and fix the
native_override/source that produced the OpenAI id so future artifacts don't
carry an OpenAI model as a Llama Guard shield.

tool_groups:
- provider_id: rag-runtime
toolgroup_id: builtin::rag
vector_stores: []
safety:
default_shield_id: llama-guard
scoring_fns: []
server:
port: 8321
storage:
backends:
kv_default:
db_path: ${env.KV_STORE_PATH:=~/.llama/storage/kv_store.db}
type: kv_sqlite
sql_default:
db_path: ${env.SQL_STORE_PATH:=~/.llama/storage/sql_store.db}
type: sql_sqlite
stores:
conversations:
backend: sql_default
table_name: openai_conversations
inference:
backend: sql_default
max_write_queue_size: 10000
num_writers: 4
table_name: inference_store
metadata:
backend: kv_default
namespace: registry
prompts:
backend: kv_default
namespace: prompts
vector_stores:
default_embedding_model:
model_id: all-mpnet-base-v2
provider_id: sentence-transformers
default_provider_id: faiss
version: 2
docs/design/llama-stack-config-merge/poc-evidence/lightspeed-stack-unified-library.yaml
@@ -0,0 +1,33 @@
name: Lightspeed Core Service (LCS) - Unified PoC
service:
host: 0.0.0.0
port: 8080
base_url: http://localhost:8080
auth_enabled: false
workers: 1
color_log: true
access_log: true
# Unified mode: no `library_client_config_path`. Operational LS config is
# synthesized by LCORE from `llama_stack.config` below.
llama_stack:
use_as_library_client: true
config:
# Use the CI-friendly baseline via `profile` (no EXTERNAL_PROVIDERS_DIR
# env var required). Equivalent to what tests/e2e/configs/run-ci.yaml
# provides; this exercises the `profile:` path of the synthesizer.
profile: /home/msvistun/repos/lightspeed/stack/tests/e2e/configs/run-ci.yaml
⚠️ Potential issue | 🟡 Minor

Remove the machine-local profile path before merge.

This config will fail anywhere except the author’s workstation and should not be committed as reusable evidence. If this PoC artifact must remain, use a repo-relative path instead.

Suggested portable path if the artifact is retained
-    profile: /home/msvistun/repos/lightspeed/stack/tests/e2e/configs/run-ci.yaml
+    profile: ../../../../tests/e2e/configs/run-ci.yaml
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@docs/design/llama-stack-config-merge/poc-evidence/lightspeed-stack-unified-library.yaml`
at line 18: the committed YAML contains a machine-local absolute path in the
profile key ('profile:
/home/msvistun/repos/lightspeed/stack/tests/e2e/configs/run-ci.yaml'). Remove or
replace it with a portable repo-relative path (for example
'./tests/e2e/configs/run-ci.yaml'), or delete the profile entry if it is not
needed, so the PoC artifact is reusable outside the author's workstation; update
any references that expect the old absolute path (search for the profile key) to
use the new relative path.

# Small native_override: prove overrides take effect end-to-end.
native_override:
safety:
default_shield_id: llama-guard
user_data_collection:
feedback_enabled: false
feedback_storage: "/tmp/lcore-836-poc/feedback"
transcripts_enabled: false
transcripts_storage: "/tmp/lcore-836-poc/transcripts"
conversation_cache:
type: "sqlite"
sqlite:
db_path: "/tmp/lcore-836-poc/conversation-cache.db"
authentication:
module: "noop"
12 changes: 9 additions & 3 deletions scripts/llama-stack-entrypoint.sh
@@ -1,6 +1,12 @@
#!/bin/bash
# Entrypoint for llama-stack container.
# Enriches config with lightspeed dynamic values, then starts llama-stack.
# Produces the run.yaml from lightspeed-stack.yaml then starts llama-stack.
#
# Two modes, auto-detected by the Python CLI (llama_stack_configuration.py):
# - Unified (LCORE-836): `llama_stack.config` present in lightspeed-stack.yaml.
# The full run.yaml is SYNTHESIZED from the unified block; -i is ignored.
# - Legacy: `run.yaml` is mounted separately and ENRICHED with BYOK RAG / Solr /
# Azure Entra ID values from lightspeed-stack.yaml.

set -e

@@ -9,9 +15,9 @@ ENRICHED_CONFIG="/opt/app-root/run.yaml"
LIGHTSPEED_CONFIG="${LIGHTSPEED_CONFIG:-/opt/app-root/lightspeed-stack.yaml}"
ENV_FILE="/opt/app-root/.env"

# Enrich config if lightspeed config exists
# Run the config producer if lightspeed config exists
if [ -f "$LIGHTSPEED_CONFIG" ]; then
echo "Enriching llama-stack config..."
echo "Preparing llama-stack config from $LIGHTSPEED_CONFIG ..."
ENRICHMENT_FAILED=0
python3 /opt/app-root/llama_stack_configuration.py \
-c "$LIGHTSPEED_CONFIG" \
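
The two-mode auto-detection described in the new entrypoint header comment might look roughly like the sketch below. `synthesize_configuration` follows the call shape used in `src/client.py`; the legacy enrichment step is only indicated in a comment because the `enrich_byok_rag` / `enrich_solr` signatures are not shown in this diff, and the real CLI's argument handling may differ.

```python
# Hedged sketch of the unified-vs-legacy selection done by llama_stack_configuration.py.
from pathlib import Path

import yaml

from llama_stack_configuration import synthesize_configuration


def produce_run_yaml(lightspeed_config: str, mounted_run_yaml: str, output: str) -> None:
    """Pick the mode from lightspeed-stack.yaml and write the resulting run.yaml."""
    lcs = yaml.safe_load(Path(lightspeed_config).read_text(encoding="utf-8"))

    if lcs.get("llama_stack", {}).get("config") is not None:
        # Unified (LCORE-836): synthesize the full run.yaml; any mounted -i input is ignored.
        run_config = synthesize_configuration(
            lcs, config_file_dir=Path(lightspeed_config).resolve().parent
        )
    else:
        # Legacy: start from the mounted run.yaml; the real CLI then applies
        # BYOK RAG / Solr / Azure Entra ID enrichment (enrich_byok_rag, enrich_solr).
        run_config = yaml.safe_load(Path(mounted_run_yaml).read_text(encoding="utf-8"))

    Path(output).write_text(yaml.safe_dump(run_config), encoding="utf-8")
```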
65 changes: 57 additions & 8 deletions src/client.py
@@ -3,6 +3,7 @@
import json
import os
import tempfile
from pathlib import Path
from typing import Optional

import yaml
@@ -11,7 +12,12 @@
from llama_stack_client import APIConnectionError, AsyncLlamaStackClient

from configuration import configuration
from llama_stack_configuration import YamlDumper, enrich_byok_rag, enrich_solr
from llama_stack_configuration import (
YamlDumper,
enrich_byok_rag,
enrich_solr,
synthesize_configuration,
)
from log import get_logger
from models.config import LlamaStackConfiguration
from models.responses import ServiceUnavailableResponse
@@ -44,22 +50,65 @@ async def load(self, llama_stack_config: LlamaStackConfiguration) -> None:
async def _load_library_client(self, config: LlamaStackConfiguration) -> None:
"""Initialize client in library mode.

Two paths:
- Unified mode (`config.config` set): synthesize full run.yaml from the
lightspeed-stack config and write to a deterministic path.
- Legacy mode (`config.library_client_config_path` set): read the
external run.yaml and apply in-place enrichment.

Stores the final config path for use in reload.
"""
if config.library_client_config_path is None:
if config.config is not None:
logger.info("Using Llama stack as library client (unified mode)")
self._config_path = self._synthesize_library_config()
elif config.library_client_config_path is not None:
logger.info("Using Llama stack as library client (legacy mode)")
self._config_path = self._enrich_library_config(
config.library_client_config_path
)
else:
raise ValueError(
"Configuration problem: library_client_config_path is not set"
"Configuration problem: neither `llama_stack.config` (unified) "
"nor `llama_stack.library_client_config_path` (legacy) is set"
)
logger.info("Using Llama stack as library client")

self._config_path = self._enrich_library_config(
config.library_client_config_path
)

client = AsyncLlamaStackAsLibraryClient(self._config_path)
await client.initialize()
self._lsc = client

def _synthesize_library_config(self) -> str:
"""Synthesize the full Llama Stack run.yaml from unified-mode config.

Library-client-friendly: writes to a file since the Llama Stack library
client only accepts a file path (not a dict). Returns the path to the
synthesized file.

The synthesizer preserves env-var references (`${env.FOO}`) verbatim;
secrets are not resolved into the file on disk.

Returns:
str: Path to the synthesized run.yaml.
"""
lcs_config_dict = configuration.configuration.model_dump(
exclude_none=True, mode="python"
)
config_file_dir: Optional[Path] = None
env_path = os.environ.get("LIGHTSPEED_STACK_CONFIG_PATH")
if env_path:
config_file_dir = Path(env_path).resolve().parent

ls_config = synthesize_configuration(
lcs_config_dict, config_file_dir=config_file_dir
)

synthesized_path = os.path.join(
tempfile.gettempdir(), "llama_stack_synthesized_config.yaml"
)
with open(synthesized_path, "w", encoding="utf-8") as f:
yaml.dump(ls_config, f, Dumper=YamlDumper, default_flow_style=False)
Comment on lines +104 to +108
⚠️ Potential issue | 🟠 Major

Do not write the synthesized config to a fixed /tmp filename.

/tmp/llama_stack_synthesized_config.yaml is predictable and shared across processes, so concurrent instances can overwrite each other and a pre-existing symlink can redirect the write. Use a securely created file or an app-owned directory with restrictive permissions.

Safer temp-file creation
-        synthesized_path = os.path.join(
-            tempfile.gettempdir(), "llama_stack_synthesized_config.yaml"
-        )
-        with open(synthesized_path, "w", encoding="utf-8") as f:
-            yaml.dump(ls_config, f, Dumper=YamlDumper, default_flow_style=False)
+        with tempfile.NamedTemporaryFile(
+            "w",
+            encoding="utf-8",
+            prefix="llama_stack_synthesized_config_",
+            suffix=".yaml",
+            delete=False,
+        ) as f:
+            synthesized_path = f.name
+            yaml.dump(ls_config, f, Dumper=YamlDumper, default_flow_style=False)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/client.py` around lines 104-108: the code currently writes the
synthesized config to a predictable temp path via the synthesized_path variable
and open(...), which is unsafe against concurrent processes and symlink attacks.
Create a secure, unique temp file instead (e.g. via
tempfile.NamedTemporaryFile(delete=False) or os.open with tempfile.mkstemp),
write ls_config to that securely opened file using yaml.dump, set restrictive
permissions (chmod 0o600) on the new file, and close it before returning/using
the path; reference the synthesized_path creation, the open(...) write block, and
the yaml.dump call when making the change.

logger.info("Wrote synthesized Llama Stack config to %s", synthesized_path)
return synthesized_path

def _load_service_client(self, config: LlamaStackConfiguration) -> None:
"""Initialize client in service mode (remote HTTP)."""
logger.info("Using Llama stack running as a service")