LCORE-836 spike: merge run.yaml into lightspeed-stack.yaml#1580
max-svistunov wants to merge 3 commits into lightspeed-core:main from
Conversation
Add a unified `llama_stack.config` sub-section to `lightspeed-stack.yaml` that lets operators express the Llama Stack operational configuration in one place, eliminating the need for a separately maintained `run.yaml`. Legacy mode (`llama_stack.library_client_config_path` + external run.yaml) is preserved and mutually exclusive with the new path.

New Pydantic classes `UnifiedLlamaStackConfig`, `UnifiedInferenceSection`, and `UnifiedInferenceProvider` define the unified schema; a new `synthesize_configuration` pipeline applies profile (or baseline) → existing BYOK RAG / Solr OKP enrichment → high-level sections → `native_override` (deep-merge, list-replacement). A `baseline: default | empty` field enables a strict lossless round-trip for the migration tool.

Library-mode wiring in `src/client.py` detects the unified form and writes the synthesized file to disk for `AsyncLlamaStackAsLibraryClient` (which the PoC confirmed requires a file path, not a dict). The legacy enrichment path is unchanged.

A `--migrate-config` flag on the `lightspeed-stack` CLI produces a unified single-file config from a legacy (run.yaml, lightspeed-stack.yaml) pair (dumb lift-and-shift: content goes under `native_override` with `baseline: empty`, and `library_client_config_path` is removed).

The LS container's `llama_stack_configuration.py` CLI now auto-detects unified vs legacy based on the presence of `llama_stack.config`; the entrypoint script requires no functional change (comment clarified). `test.containerfile` copies `src/data/` into the container so the shipped default baseline resolves at runtime.

Tests: 22 new unit tests covering merge semantics, high-level inference expansion, the full synthesize pipeline, profile loading, precedence (profile < high-level < native_override), and migrate-then-synthesize round-trip lossless equality. 3 new schema tests cover unified/legacy mutual exclusion.
5 existing dump-configuration expectations updated for the new `config: None` field; 1 client error-message regex updated. Full `uv run make verify` passes (black, pylint 10/10, ruff, docstyle, mypy). `uv run pytest tests/unit/` — 2098 passed, 1 skipped, 0 failed.
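The dumb lift-and-shift described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual implementation; the real `migrate_config_dumb` may differ in shape and field handling:

```python
from typing import Any


def migrate_config_dumb(run_yaml: dict[str, Any], ls_yaml: dict[str, Any]) -> dict[str, Any]:
    """Lift-and-shift migration sketch: the whole legacy run.yaml lands under
    native_override with baseline: empty, and the legacy file pointer is
    dropped. Returns a new dict; inputs are not mutated."""
    unified = {k: v for k, v in ls_yaml.items() if k != "llama_stack"}
    llama_stack = dict(ls_yaml.get("llama_stack") or {})
    llama_stack.pop("library_client_config_path", None)
    llama_stack["config"] = {"baseline": "empty", "native_override": run_yaml}
    unified["llama_stack"] = llama_stack
    return unified


legacy_ls = {"name": "lcore", "llama_stack": {"library_client_config_path": "run.yaml"}}
legacy_run = {"version": 2, "providers": {}}
unified = migrate_config_dumb(legacy_run, legacy_ls)
print(unified["llama_stack"]["config"]["baseline"])            # → empty
print("library_client_config_path" in unified["llama_stack"])  # → False
```

Because the baseline is `empty` and the full run.yaml is carried verbatim under `native_override`, re-synthesizing should reproduce the original content, which is what the round-trip tests check.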
Add the spike doc (decisions up front, background below, 7 proposed JIRAs) and the spec doc (requirements R1..R11, architecture, implementation guide, migration worked example) under `docs/design/llama-stack-config-merge/`.

Key decisions captured for reviewer confirmation:

- Overall shape: Option C (high-level + native_override) with Option E (profile feature, no shipped profiles) as an optional layer.
- Deprecation: calendar-based (e.g., "legacy path removed no sooner than 6 months after WARN begins"); concrete timing deferred to PM review.
- Override precedence: deep-merge with list replacement at leaf level.
- Secrets handling: env-var references preserved verbatim in synthesized files; never resolved to disk.
- Format detection: shape-based, with an optional `config_format_version` field that, if present, must agree with the shape.
- Migration tool shape: `--migrate-config` flag (no CLI refactor); dumb lift-and-shift mode only in v1; smart mode deferred.
- Profile distribution: feature only; LCORE ships no profiles of its own beyond reference examples under `examples/profiles/`.
- LS process supervision and hot-reload: out of scope (LCORE-777, LCORE-778, LCORE-781 territory).

The spike's PoC validated library-mode end-to-end: a `lightspeed-stack.yaml` containing only `llama_stack.config` (no external run.yaml) boots LCORE, serves /v1/query with a real model response, and a `native_override` value demonstrably takes effect in the synthesized run.yaml. Server-mode end-to-end through docker-compose was skipped because the LS container image rebuild (~2 GB, UBI + llama-stack llslibdev dependency sync) was impractical for the spike timeline; the same synthesis code path is exercised by the unit tests, including the lossless migrate-then-synthesize round-trip.

PoC evidence is under `poc-evidence/library-mode/` as reference material for reviewers, and per the spike howto it is intended to be removed from the branch prior to merge.
The spike doc and spec doc remain permanent.
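The "deep-merge with list replacement at leaf level" precedence rule above can be illustrated with a small sketch (illustrative only; the PR's actual `deep_merge_list_replace` may differ in details):

```python
from typing import Any


def deep_merge_list_replace(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Merge override into base: nested dicts merge key-by-key, while lists
    (and every other leaf value) in the override replace the base value
    wholesale. Returns a new dict; neither input is mutated."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge_list_replace(merged[key], value)
        else:
            merged[key] = value
    return merged


baseline = {"server": {"port": 8321}, "models": [{"model_id": "a"}]}
override = {"models": [{"model_id": "b"}]}
result = deep_merge_list_replace(baseline, override)
print(result["models"])          # → [{'model_id': 'b'}]  (list replaced, not appended)
print(result["server"]["port"])  # → 8321  (untouched branch survives)
```

Replacing lists wholesale keeps override semantics predictable: an operator who overrides `models` gets exactly the list they wrote, never a merge of baseline and override entries.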
Walkthrough

This pull request introduces a unified configuration model for Llama Stack that consolidates Lightspeed Stack high-level settings with Llama Stack operational configuration. It includes new configuration model definitions, synthesis logic to generate

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant LibClient as Library Client<br/>(AsyncLlamaStackAsLibraryClient)
    participant ConfigLoader as Config Loader<br/>(_synthesize_library_config)
    participant BaselineLoader as Baseline/Profile<br/>Loader
    participant Synthesizer as Config<br/>Synthesizer
    participant FileWriter as File Writer
    participant LlamaStack as Llama Stack
    Client->>LibClient: Initialize with unified config
    LibClient->>ConfigLoader: _load_library_client(config)
    ConfigLoader->>BaselineLoader: Load default baseline or profile
    BaselineLoader-->>ConfigLoader: Baseline dict
    ConfigLoader->>Synthesizer: synthesize_configuration()
    Synthesizer->>Synthesizer: apply_high_level_inference()
    Synthesizer->>Synthesizer: deep_merge_list_replace(native_override)
    Synthesizer-->>ConfigLoader: Complete run.yaml dict
    ConfigLoader->>FileWriter: synthesize_to_file()
    FileWriter->>FileWriter: Write YAML to temp file
    FileWriter-->>ConfigLoader: File path
    ConfigLoader-->>LibClient: Synthesized config file path
    LibClient->>LlamaStack: Initialize with synthesized file
    LlamaStack-->>LibClient: Ready
    LibClient-->>Client: Client ready
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~65 minutes

🚥 Pre-merge checks: ✅ Passed checks (5 passed)
Actionable comments posted: 14
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/lightspeed_stack.py (1)
117-165: ⚠️ Potential issue | 🟡 Minor

Stale `main()` docstring — references a non-existent flag and omits the new `--migrate-config` flow.

The docstring still mentions `--generate-llama-stack-configuration`, which is not a CLI flag in this parser, and it does not describe the newly added `--migrate-config` / `--run-yaml` / `--migrate-output` branch that exits before `load_configuration` runs. Please update the docstring to match the actual behavior so the `Raises: SystemExit` paths and flag list stay accurate.

✏️ Proposed docstring fix

```diff
   - If --dump-schema is provided, writes the active configuration schema
     to schema.json and exits (exits with status 1 on failure).
-  - If --generate-llama-stack-configuration is provided, generates and stores
-    the Llama Stack configuration to the specified output file and exits
-    (exits with status 1 on failure).
+  - If --migrate-config is provided, migrates the legacy
+    (run.yaml + lightspeed-stack.yaml) setup into a unified single-file
+    configuration at --migrate-output and exits (status 1 on failure).
+    This branch bypasses configuration.load_configuration().
   - Otherwise, sets LIGHTSPEED_STACK_CONFIG_PATH for worker processes,
     starts the quota scheduler, and starts the Uvicorn web service.

 Raises:
-    SystemExit: when configuration dumping or Llama Stack generation fails
-    (exits with status 1).
+    SystemExit: when configuration dumping, schema dumping, or config
+    migration fails (exits with status 1).
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/lightspeed_stack.py` around lines 117-165: Update the main() docstring to remove the obsolete reference to --generate-llama-stack-configuration, add documentation for the new --migrate-config flow and its related flags (--run-yaml and --migrate-output), and state that when args.migrate_config is true the migration branch (handled by migrate_config_dumb) runs and exits before load_configuration; also keep the note that failures in dumping/migration raise SystemExit (status 1). Reference main(), args.migrate_config, and migrate_config_dumb so the updated docstring accurately matches the implemented control flow.

src/llama_stack_configuration.py (1)
918-931: ⚠️ Potential issue | 🔴 Critical

Separate raw and expanded config to prevent writing secrets to the output file.

`synthesize_to_file()` is documented to preserve env-var references like `${env.FOO}` verbatim in the output. The current code expands all environment variables on line 920 before passing to `synthesize_to_file()`, causing secrets to be written in plaintext to the generated `run.yaml`.

Use `raw_config` for `synthesize_to_file()` (preserves env refs as documented), and `expanded_config` for `setup_azure_entra_id_token()` (which requires actual credential values) and `generate_configuration()` (legacy mode enrichment):

Keep raw unified config separate from expanded legacy config

```diff
     with open(args.config, "r", encoding="utf-8") as f:
-        config = yaml.safe_load(f)
-    config = replace_env_vars(config)
+        raw_config = yaml.safe_load(f)
+    expanded_config = replace_env_vars(raw_config)

-    unified_present = (config.get("llama_stack") or {}).get("config") is not None
+    unified_present = (raw_config.get("llama_stack") or {}).get("config") is not None

     if unified_present:
         logger.info("Unified mode detected (llama_stack.config present)")
         # Azure Entra ID side-effect (writes .env) stays part of boot — still run it.
-        setup_azure_entra_id_token(config.get("azure_entra_id"), args.env_file)
+        setup_azure_entra_id_token(expanded_config.get("azure_entra_id"), args.env_file)
         synthesize_to_file(
-            config,
+            raw_config,
             args.output,
             config_file_dir=Path(args.config).resolve().parent,
         )
     else:
         logger.info("Legacy mode detected (no llama_stack.config)")
-        generate_configuration(args.input, args.output, config, args.env_file)
+        generate_configuration(args.input, args.output, expanded_config, args.env_file)
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/llama_stack_configuration.py` around lines 918-931: The code currently calls replace_env_vars on the loaded config and then passes that expanded config into synthesize_to_file, which will write expanded secrets into the output; instead, keep two variables: raw_config = yaml.safe_load(...) (no replace_env_vars) and expanded_config = replace_env_vars(raw_config); use raw_config when calling synthesize_to_file(...) so env-var references remain verbatim, and use expanded_config when calling setup_azure_entra_id_token(...) and when invoking generate_configuration(...) or any legacy-mode enrichment that needs real values; update references of config in this block to the appropriate raw_config or expanded_config names accordingly.
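The raw-vs-expanded split the comment asks for can be sketched as follows. This is a minimal illustration under assumptions: the `replace_env_vars` here is a stand-in for the project's helper, and only the `${env.VAR}` form from the docs is handled:

```python
import os
import re
from typing import Any

_ENV_REF = re.compile(r"\$\{env\.([A-Za-z_][A-Za-z0-9_]*)\}")


def replace_env_vars(node: Any) -> Any:
    """Recursively expand ${env.VAR} references from the process environment,
    leaving unknown references untouched. Returns new structures."""
    if isinstance(node, dict):
        return {k: replace_env_vars(v) for k, v in node.items()}
    if isinstance(node, list):
        return [replace_env_vars(v) for v in node]
    if isinstance(node, str):
        return _ENV_REF.sub(lambda m: os.environ.get(m.group(1), m.group(0)), node)
    return node


os.environ["API_KEY"] = "s3cret"
raw_config = {"llama_stack": {"config": {"api_key": "${env.API_KEY}"}}}
expanded_config = replace_env_vars(raw_config)

# Shape-based detection runs on the raw form; only in-process consumers see secrets.
unified_present = (raw_config.get("llama_stack") or {}).get("config") is not None
print(unified_present)                                      # → True
print(raw_config["llama_stack"]["config"]["api_key"])       # → ${env.API_KEY}
print(expanded_config["llama_stack"]["config"]["api_key"])  # → s3cret
```

The raw form is what gets written to disk, so the synthesized file carries only the reference; the expanded form stays in memory for calls that need real credential values.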
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/design/llama-stack-config-merge/llama-stack-config-merge-spike.md`:
- Around line 274-284: The markdown file contains fenced code blocks that lack
surrounding blank lines and trigger markdownlint MD031; for each fenced block
(e.g., the "Agentic tool instruction" block and the other blocks at the noted
ranges like 310-317, 343-349, 376-381, 411-416, 438-443, 463-468, 598-626) add
an empty line immediately before the opening ``` and an empty line immediately
after the closing ``` so every fenced code block is separated by blank lines
from surrounding text/content.
- Line 147: The in-page anchor in the link "See [Merge semantics worked
examples](`#merge-semantics-worked-examples`)." doesn't match the actual heading
"### Merge semantics — worked examples"; update either the link target or the
heading so they match under markdownlint rules — e.g., change the heading to use
a plain hyphen "### Merge semantics - worked examples" or change the link to the
exact slug produced from the em-dash (percent-encoded or the renderer-specific
slug), ensuring the text in the link target and the heading "Merge semantics —
worked examples" are identical.
In `@docs/design/llama-stack-config-merge/llama-stack-config-merge.md`:
- Around line 280-291: Update the doc to stop claiming synthesized files are
safe as world-readable and instead require restrictive file permissions and
recommend env-var references; specifically change wording around
native_override, apply_high_level_inference, replace_env_vars and run.yaml to
note that native_override and migrations may include literal secrets (e.g., API
keys), that apply_high_level_inference may not always emit only ${env.<VAR>} and
replace_env_vars is not a guarantee for all inputs, and mandate default
restrictive filesystem permissions and operator guidance to prefer env refs and
secrets management rather than declaring world-readable output acceptable.
In `@docs/design/llama-stack-config-merge/poc-evidence/library-mode/README.md`:
- Around line 3-8: The fenced bash code block starting with "```bash" in the
README.md is missing blank lines before and/or after (markdownlint MD031); add a
single blank line immediately above the opening ```bash line and a single blank
line immediately below the closing ``` line so the fenced block is separated
from surrounding text and satisfies MD031.
In
`@docs/design/llama-stack-config-merge/poc-evidence/library-mode/synthesized-run.yaml`:
- Around line 107-110: The shields entry has an incorrect provider_shield_id
value: replace the OpenAI chat model id in the shields list (symbols: shields,
provider_id, provider_shield_id, shield_id) with the correct Llama Guard model
id or remove the mistaken native_override that injected it; specifically, ensure
provider_shield_id uses a guard model identifier such as
"meta-llama/Llama-Guard-3-8B" (or the intended guard model) and verify the
native_override/source that produced the OpenAI id is fixed so future artifacts
don’t carry an OpenAI model as a Llama Guard shield.
In
`@docs/design/llama-stack-config-merge/poc-evidence/lightspeed-stack-unified-library.yaml`:
- Line 18: The committed YAML contains a machine-local absolute path in the
profile key ('profile:
/home/msvistun/repos/lightspeed/stack/tests/e2e/configs/run-ci.yaml'); remove or
replace this with a portable repo-relative path (for example
'./tests/e2e/configs/run-ci.yaml') or delete the profile entry if not needed so
the PoC artifact is reusable outside the author’s workstation; update any
references that expect the old absolute path (search for the profile key) to use
the new relative path.
In `@src/client.py`:
- Around line 104-108: The code currently writes the synthesized config to a
predictable temp path via the synthesized_path variable and open(...), which is
unsafe for concurrent processes and symlink attacks; change this to create a
secure, unique temp file (e.g. via tempfile.NamedTemporaryFile(delete=False) or
os.open with tempfile.mkstemp) and write ls_config to that securely opened file
using yaml.dump, then set restrictive permissions (chmod 0o600) on the new file
and close it before returning/using the path; reference the synthesized_path
creation, the open(...) write block, and the yaml.dump call when making the
change.
In `@src/data/default_run.yaml`:
- Line 155: Remove the trailing blank line at the end of the file to satisfy
YAMLlint's empty-lines rule: edit src/data/default_run.yaml (the file containing
default run config) and delete the final blank/empty line so the file ends with
the last YAML content line only (ensure there is no extra empty line after the
last document or mapping).
- Line 18: The baseline exposes external_providers_dir:
${env.EXTERNAL_PROVIDERS_DIR} without a safe fallback, causing configs to break
when the env var is absent; either provide a sensible default (e.g., set
external_providers_dir to a known safe path or an empty string/null) or remove
the key from the baseline so profiles supply it; update the default_run.yaml
entry for external_providers_dir to use a safe literal default value or delete
the line, and ensure any code reading external_providers_dir handles the chosen
default.
In `@src/lightspeed_stack.py`:
- Around line 150-165: The except block for the migrate_config_dumb call should
log the full traceback; replace the logger.error("Migration failed: %s", e) call
with logger.exception("Migration failed") so the stack trace is captured when
args.migrate_config triggers migrate_config_dumb (refer to migrate_config_dumb
and the surrounding try/except using logger).
In `@src/llama_stack_configuration.py`:
- Around line 825-829: The file write is using open(output_file, "w") which can
create world-readable files; change the write to create the file with explicit
restrictive permissions (mode 0o600). After ensuring the parent directory is
created (the existing Path(output_file).parent.mkdir call is fine), open the
file using a low-level create with os.open(...) with flags
O_WRONLY|O_CREAT|O_TRUNC and mode 0o600, wrap the returned fd with os.fdopen to
get a text file object, then use yaml.dump(ls_config, f, Dumper=YamlDumper,
default_flow_style=False) to write; reference the variables/functions ls_config,
output_file, synthesize_configuration, and YamlDumper so you update the existing
block that writes the synthesized configuration.
In `@tests/unit/test_client.py`:
- Around line 84-90: Replace the current test that relies on
AsyncLlamaStackClientHolder().load(...) to trigger a runtime ValueError with a
direct validation test against the Pydantic model: construct an invalid
LlamaStackConfiguration(...) instance (setting library_client_config_path=None
and leaving both unified and legacy unset) inside pytest.raises(ValueError,
match="neither .*unified.* nor .*legacy.* is set") so the
check_llama_stack_model validator on the LlamaStackConfiguration model is
exercised; this ensures validation fails at model construction rather than via
AsyncLlamaStackClientHolder.load or _load_library_client runtime checks.
In `@tests/unit/test_llama_stack_synthesize.py`:
- Around line 1-371: This PoC test suite should not be merged as-is; either
delete the whole test file (remove the new tests referencing
synthesize_configuration/migrate_config_dumb/apply_high_level_inference) or, if
you intentionally keep it, change tests to avoid relying on in-place mutation by
using the functional return value of apply_high_level_inference (don't assert
that the passed ls_config instance was mutated) and annotate the
MINIMAL_BASELINE constant with Final[dict[str, Any]] and update any tests that
mutate it to operate on a copy (e.g., use copy.deepcopy) so the baseline is not
altered in-place.
---
Outside diff comments:
In `@src/lightspeed_stack.py`:
- Around line 117-165: Update the main() docstring to remove the obsolete
reference to --generate-llama-stack-configuration, add documentation for the new
--migrate-config flow and its related flags (--run-yaml and --migrate-output),
and state that when args.migrate_config is true the migration branch (handled by
migrate_config_dumb) runs and exits before load_configuration; also keep the
note that failures in dumping/migration raise SystemExit (status 1). Reference
main(), args.migrate_config, and migrate_config_dumb so the updated docstring
accurately matches the implemented control flow.
In `@src/llama_stack_configuration.py`:
- Around line 918-931: The code currently calls replace_env_vars on the loaded
config and then passes that expanded config into synthesize_to_file, which will
write expanded secrets into the output; instead, keep two variables: raw_config
= yaml.safe_load(...) (no replace_env_vars) and expanded_config =
replace_env_vars(raw_config); use raw_config when calling
synthesize_to_file(...) so env-var references remain verbatim, and use
expanded_config when calling setup_azure_entra_id_token(...) and when invoking
generate_configuration(...) or any legacy-mode enrichment that needs real
values; update references of config in this block to the appropriate raw_config
or expanded_config names accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: e3d88bb1-37ac-4c41-b1f8-3fd87572f9ef
📒 Files selected for processing (17)
- docs/design/llama-stack-config-merge/llama-stack-config-merge-spike.md
- docs/design/llama-stack-config-merge/llama-stack-config-merge.md
- docs/design/llama-stack-config-merge/poc-evidence/library-mode/README.md
- docs/design/llama-stack-config-merge/poc-evidence/library-mode/query-response.json
- docs/design/llama-stack-config-merge/poc-evidence/library-mode/synthesized-run.yaml
- docs/design/llama-stack-config-merge/poc-evidence/lightspeed-stack-unified-library.yaml
- scripts/llama-stack-entrypoint.sh
- src/client.py
- src/data/default_run.yaml
- src/lightspeed_stack.py
- src/llama_stack_configuration.py
- src/models/config.py
- test.containerfile
- tests/unit/models/config/test_dump_configuration.py
- tests/unit/models/config/test_llama_stack_configuration.py
- tests/unit/test_client.py
- tests/unit/test_llama_stack_synthesize.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: build-pr
- GitHub Check: unit_tests (3.12)
- GitHub Check: Pylinter
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Use absolute imports for internal modules: `from authentication import get_auth_dependency`
Import FastAPI dependencies with: `from fastapi import APIRouter, HTTPException, Request, status, Depends`
Import Llama Stack client with: `from llama_stack_client import AsyncLlamaStackClient`
Check `constants.py` for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use `logger = get_logger(__name__)` from `log.py` for module logging
Type aliases defined at module level for clarity
Use Final[type] as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for parameters and return types in functions
Use `typing_extensions.Self` for model validators in Pydantic models
Use modern union type syntax `str | int` instead of `Union[str, int]`
Use `Optional[Type]` for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Useasync deffor I/O operations and external API calls
HandleAPIConnectionErrorfrom Llama Stack in error handling
Use standard log levels with clear purposes: debug, info, warning, error
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use ABC for abstract base classes with `@abstractmethod` decorators
Use `@model_validator` and `@field_validator` for Pydantic model validation
Complete type annotations for all class attributes; use specific types, not `Any`
Follow Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections
Files:
tests/unit/test_client.py, tests/unit/models/config/test_dump_configuration.py, src/lightspeed_stack.py, tests/unit/models/config/test_llama_stack_configuration.py, src/client.py, src/models/config.py, tests/unit/test_llama_stack_synthesize.py, src/llama_stack_configuration.py
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest; pytest is the standard for this project
Use `pytest-mock` for AsyncMock objects in tests
Use marker `pytest.mark.asyncio` for async tests
Unit tests require 60% coverage, integration tests 10%
Files:
tests/unit/test_client.py, tests/unit/models/config/test_dump_configuration.py, tests/unit/models/config/test_llama_stack_configuration.py, tests/unit/test_llama_stack_synthesize.py
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Pydantic models extend `ConfigurationBase` for config, `BaseModel` for data models
Files:
src/lightspeed_stack.py, src/client.py, src/models/config.py, src/llama_stack_configuration.py
src/**/config*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/config*.py: All config uses Pydantic models extending `ConfigurationBase`
Base class sets `extra="forbid"` to reject unknown fields in Pydantic models
Use `@field_validator` and `@model_validator` for custom validation in Pydantic models
Use type hints like `Optional[FilePath]`, `PositiveInt`, `SecretStr` in Pydantic models
Files:
src/models/config.py
🧠 Learnings (7)
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Applies to **/*.py : Import Llama Stack client with: `from llama_stack_client import AsyncLlamaStackClient`
Applied to files:
tests/unit/test_client.py, src/client.py, src/models/config.py
📚 Learning: 2025-12-18T10:21:09.038Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 935
File: run.yaml:114-115
Timestamp: 2025-12-18T10:21:09.038Z
Learning: In Llama Stack version 0.3.x, telemetry provider configuration is not supported under the `providers` section in run.yaml configuration files. Telemetry can be enabled with just `telemetry.enabled: true` without requiring an explicit provider block.
Applied to files:
docs/design/llama-stack-config-merge/poc-evidence/library-mode/synthesized-run.yaml, docs/design/llama-stack-config-merge/poc-evidence/lightspeed-stack-unified-library.yaml, src/data/default_run.yaml, src/models/config.py
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Applies to **/*.py : Handle `APIConnectionError` from Llama Stack in error handling
Applied to files:
src/client.py
📚 Learning: 2026-04-07T15:03:11.530Z
Learnt from: jrobertboos
Repo: lightspeed-core/lightspeed-stack PR: 1396
File: src/app/endpoints/conversations_v1.py:6-6
Timestamp: 2026-04-07T15:03:11.530Z
Learning: In the `llama_stack_api` package, all imports MUST use the flat form `from llama_stack_api import <symbol>`. Sub-module imports (e.g., `from llama_stack_api.common.errors import ConversationNotFoundError`) are explicitly NOT supported and considered a code smell, as stated in `llama_stack_api/__init__.py` lines 15-19. Do not flag or suggest changing root-package imports to sub-module imports for this package.
Applied to files:
src/client.py
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Applies to src/**/config*.py : All config uses Pydantic models extending `ConfigurationBase`
Applied to files:
src/models/config.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.
Applied to files:
src/models/config.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.
Applied to files:
src/models/config.py
🪛 LanguageTool
docs/design/llama-stack-config-merge/llama-stack-config-merge-spike.md
[style] ~451-~451: Try moving the adverb to make the sentence clearer.
Context: ... Link to the migration doc. Legacy mode continues to fully function. Scope: - Warning emission point: ...
(SPLIT_INFINITIVE)
[style] ~655-~655: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ncluding fields LCORE doesn't model. 3. Existing CI/CD templating that treats run.yaml...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~656-~656: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...s run.yaml as a separate artifact. 4. Existing enrichment behavior (Azure Entra ID, BY...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~673-~673: To elevate your writing, try using more formal phrasing here.
Context: ...|---|---|---| | Do nothing | 0 | Legacy keeps working until deprecation window closes | | Lif...
(CONTINUE_TO_VB)
docs/design/llama-stack-config-merge/llama-stack-config-merge.md
[style] ~303-~303: To elevate your writing, try using more formal phrasing here.
Context: ...|---|---|---| | Do nothing | 0 | Legacy keeps working until deprecation closes | | Lift-and-s...
(CONTINUE_TO_VB)
[style] ~405-~405: The double modal “requires supervised” is nonstandard (only accepted in certain dialects). Consider “to be supervised”.
Context: ...oad means any implementation requires supervised restart, which is out of scope here. - ...
(NEEDS_FIXED)
🪛 markdownlint-cli2 (0.22.0)
docs/design/llama-stack-config-merge/llama-stack-config-merge-spike.md
[warning] 147-147: Link fragments should be valid
(MD051, link-fragments)
[warning] 275-275: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 311-311: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 344-344: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 377-377: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 412-412: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 439-439: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 464-464: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 599-599: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 612-612: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 618-618: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
docs/design/llama-stack-config-merge/poc-evidence/library-mode/README.md
[warning] 4-4: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
🪛 YAMLlint (1.38.0)
src/data/default_run.yaml
[error] 155-155: too many blank lines (1 > 0)
(empty-lines)
🔇 Additional comments (6)
test.containerfile (1)
39-46: LGTM. Ownership and copy additions align with the unified-mode entrypoint flow.
tests/unit/models/config/test_dump_configuration.py (1)
147-147: Assertion updates are consistent. Adding `"config": None` to all five dumped-config expectations matches the new Optional `config` field on `LlamaStackConfiguration`.

docs/design/llama-stack-config-merge/poc-evidence/library-mode/query-response.json (1)

1-1: PoC evidence artifact — no action. Per PR description, PoC evidence is expected to be removed prior to merge.
scripts/llama-stack-entrypoint.sh (1)
3-20: Comment/log update LGTM. Header and log string accurately describe the two auto-detected modes; no behavior change.
tests/unit/models/config/test_llama_stack_configuration.py (1)
85-145: LGTM — validation coverage matches the new unified/legacy rules. The tests cover the missing-config error, unified library mode, mutual exclusion, and URL normalization with unified config.
src/models/config.py (1)
750-787: LGTM — mutual-exclusion validation is enforced at model load. The validator rejects unified+legacy mode, requires one config source in library mode, and preserves legacy path validation.
no list-merge tarpit, keeps scalar + map overrides minimal. Implemented in
`deep_merge_list_replace()`. Confidence: 70%.

See [Merge semantics worked examples](#merge-semantics-worked-examples).
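For readers unfamiliar with the merge rule cited above, a minimal sketch of the "list-replace" semantics (maps merge recursively; lists and scalars from the overlay win wholesale). This is an illustrative reimplementation, not the PR's code, which lives in `src/llama_stack_configuration.py`:

```python
from typing import Any


def deep_merge_list_replace(base: dict[str, Any], overlay: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict: maps merge recursively; overlay lists and scalars replace."""
    result = dict(base)  # shallow copy so neither input is mutated
    for key, value in overlay.items():
        if isinstance(result.get(key), dict) and isinstance(value, dict):
            result[key] = deep_merge_list_replace(result[key], value)
        else:
            # Lists are replaced, not appended -- the "no list-merge tarpit" rule.
            result[key] = value
    return result
```

This matches the behavior the PR's unit tests assert: scalar replace, nested map merge, whole-list replacement, and no mutation of inputs.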
Fix the broken in-page anchor.
The target heading is ### Merge semantics — worked examples, so the current fragment does not resolve under markdownlint’s anchor rules.
One way to make the anchor stable
```diff
-See [Merge semantics worked examples](#merge-semantics-worked-examples).
+See [Merge semantics — worked examples](#merge-semantics--worked-examples).
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```suggestion
See [Merge semantics — worked examples](#merge-semantics--worked-examples).
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/design/llama-stack-config-merge/llama-stack-config-merge-spike.md` at
line 147, The in-page anchor in the link "See [Merge semantics worked
examples](`#merge-semantics-worked-examples`)." doesn't match the actual heading
"### Merge semantics — worked examples"; update either the link target or the
heading so they match under markdownlint rules — e.g., change the heading to use
a plain hyphen "### Merge semantics - worked examples" or change the link to the
exact slug produced from the em-dash (percent-encoded or the renderer-specific
slug), ensuring the text in the link target and the heading "Merge semantics —
worked examples" are identical.
**Agentic tool instruction**:

```text
Read the "Architecture" and "Implementation Suggestions" sections of
docs/design/llama-stack-config-merge/llama-stack-config-merge.md.
Key files to create or modify:
src/models/config.py (new classes; modify LlamaStackConfiguration)
src/llama_stack_configuration.py (synthesize_configuration + helpers)
src/data/default_run.yaml (new)
src/client.py (library-mode wiring)
To verify: run a unified-mode config end-to-end via `uv run lightspeed-stack -c <config>` and confirm /v1/query succeeds.
```
Add blank lines around fenced code blocks.
These fences violate markdownlint MD031; add an empty line before and after each fenced block.
Also applies to: 310-317, 343-349, 376-381, 411-416, 438-443, 463-468, 598-626
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/design/llama-stack-config-merge/llama-stack-config-merge-spike.md`
around lines 274 - 284, The markdown file contains fenced code blocks that lack
surrounding blank lines and trigger markdownlint MD031; for each fenced block
(e.g., the "Agentic tool instruction" block and the other blocks at the noted
ranges like 310-317, 343-349, 376-381, 411-416, 438-443, 463-468, 598-626) add
an empty line immediately before the opening ``` and an empty line immediately
after the closing ``` so every fenced code block is separated by blank lines
from surrounding text/content.
- **No secrets written to disk**: `apply_high_level_inference` emits
  `${env.<VAR>}` references, never the resolved secret. The synthesized
  `run.yaml` is safe to log path-wise; its contents only contain env
  references for secrets.
- **`native_override` is raw YAML**: content is operator-controlled, so
  no new injection surface — same trust model as the existing
  `run.yaml`. LCORE does no template expansion other than the existing
  `replace_env_vars()` step in the load pipeline.
- **Synthesized file location**: persistent known path, world-readable
  by default in a container. This is acceptable because the file
  contains only env-var references for secrets; operators who want
  stricter filesystem permissions should tighten the mount.
Do not assume synthesized files contain no literal secrets.
native_override and dumb migration can carry an existing run.yaml verbatim, including literal API keys. The design should require restrictive file permissions and recommend env refs, not declare world-readable output acceptable.
Suggested wording direction
```diff
-- **No secrets written to disk**: `apply_high_level_inference` emits
-  `${env.<VAR>}` references, never the resolved secret. The synthesized
-  `run.yaml` is safe to log path-wise; its contents only contain env
-  references for secrets.
+- **Prefer env references and protect synthesized files**:
+  `apply_high_level_inference` emits `${env.<VAR>}` references, but
+  `native_override` and migrated legacy configs may contain literal secret
+  values. Write synthesized files with restrictive permissions and document
+  env references as the recommended operator pattern.
```

📝 Committable suggestion

```suggestion
- **Prefer env references and protect synthesized files**:
  `apply_high_level_inference` emits `${env.<VAR>}` references, but
  `native_override` and migrated legacy configs may contain literal secret
  values. Write synthesized files with restrictive permissions and document
  env references as the recommended operator pattern.
- **`native_override` is raw YAML**: content is operator-controlled, so
  no new injection surface — same trust model as the existing
  `run.yaml`. LCORE does no template expansion other than the existing
  `replace_env_vars()` step in the load pipeline.
- **Synthesized file location**: persistent known path, world-readable
  by default in a container. This is acceptable because the file
  contains only env-var references for secrets; operators who want
  stricter filesystem permissions should tighten the mount.
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/design/llama-stack-config-merge/llama-stack-config-merge.md` around
lines 280 - 291, Update the doc to stop claiming synthesized files are safe as
world-readable and instead require restrictive file permissions and recommend
env-var references; specifically change wording around native_override,
apply_high_level_inference, replace_env_vars and run.yaml to note that
native_override and migrations may include literal secrets (e.g., API keys),
that apply_high_level_inference may not always emit only ${env.<VAR>} and
replace_env_vars is not a guarantee for all inputs, and mandate default
restrictive filesystem permissions and operator guidance to prefer env refs and
secrets management rather than declaring world-readable output acceptable.
Command:

```bash
export OPENAI_API_KEY=<redacted>
export E2E_OPENAI_MODEL=gpt-4o-mini
uv run lightspeed-stack -c docs/design/llama-stack-config-merge/poc-evidence/lightspeed-stack-unified-library.yaml
```
Minor: blank line around fenced block (MD031).
markdownlint flags missing blank lines around the fenced bash block. Trivial; evidence doc slated for removal pre-merge.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/design/llama-stack-config-merge/poc-evidence/library-mode/README.md`
around lines 3 - 8, The fenced bash code block starting with "```bash" in the
README.md is missing blank lines before and/or after (markdownlint MD031); add a
single blank line immediately above the opening ```bash line and a single blank
line immediately below the closing ``` line so the fenced block is separated
from surrounding text and satisfies MD031.
```yaml
shields:
  - provider_id: llama-guard
    provider_shield_id: openai/gpt-4o-mini
    shield_id: llama-guard
```
Suspicious provider_shield_id value in evidence artifact.
The llama-guard shield lists provider_shield_id: openai/gpt-4o-mini, i.e. an OpenAI chat model id is being registered as the Llama Guard shield id. This almost certainly reflects a misconfigured native_override in the PoC input rather than a working safety shield (Llama Guard expects a guard model id such as meta-llama/Llama-Guard-3-8B). The artifact is being removed pre-merge, but worth noting so the same shape is not copied into the spec or any follow-up defaults.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@docs/design/llama-stack-config-merge/poc-evidence/library-mode/synthesized-run.yaml`
around lines 107 - 110, The shields entry has an incorrect provider_shield_id
value: replace the OpenAI chat model id in the shields list (symbols: shields,
provider_id, provider_shield_id, shield_id) with the correct Llama Guard model
id or remove the mistaken native_override that injected it; specifically, ensure
provider_shield_id uses a guard model identifier such as
"meta-llama/Llama-Guard-3-8B" (or the intended guard model) and verify the
native_override/source that produced the OpenAI id is fixed so future artifacts
don’t carry an OpenAI model as a Llama Guard shield.
```python
# --migrate-config runs standalone; does not load config into the singleton,
# since the input may be in legacy form and we are producing its successor.
if args.migrate_config:
    # pylint: disable=import-outside-toplevel
    from llama_stack_configuration import migrate_config_dumb

    try:
        migrate_config_dumb(args.run_yaml, args.config_file, args.migrate_output)
        logger.info(
            "Migration complete. Wrote unified config to %s",
            args.migrate_output,
        )
    except Exception as e:
        logger.error("Migration failed: %s", e)
        raise SystemExit(1) from e
    return
```
🧹 Nitpick | 🔵 Trivial
Migration branch looks reasonable for a spike.
Deferred import avoids pulling synthesis code into the hot boot path, and broad except Exception is acceptable at the CLI boundary. One small nit: consider logger.exception(...) instead of logger.error("Migration failed: %s", e) so the traceback is captured — helpful when users report migration failures.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/lightspeed_stack.py` around lines 150 - 165, The except block for the
migrate_config_dumb call should log the full traceback; replace the
logger.error("Migration failed: %s", e) call with logger.exception("Migration
failed") so the stack trace is captured when args.migrate_config triggers
migrate_config_dumb (refer to migrate_config_dumb and the surrounding try/except
using logger).
```python
def apply_high_level_inference(
    ls_config: dict[str, Any], inference: dict[str, Any]
) -> None:
    """Apply a high-level `inference` block into `ls_config['providers']['inference']`.

    Replaces the inference provider list entirely. Use `native_override` for
    additive tweaks.

    Parameters:
        ls_config: Llama Stack config dict (modified in place).
        inference: High-level inference section as a dict (with 'providers' list).
    """
    providers_out: list[dict[str, Any]] = []
    for provider in inference.get("providers", []):
        p_type = provider["type"]
        entry: dict[str, Any] = {
            "provider_id": p_type,
            "provider_type": PROVIDER_TYPE_MAP[p_type],
        }
        cfg: dict[str, Any] = {}
        if provider.get("api_key_env"):
            cfg["api_key"] = f"${{env.{provider['api_key_env']}}}"
        if provider.get("allowed_models"):
            cfg["allowed_models"] = provider["allowed_models"]
        if provider.get("extra"):
            cfg.update(provider["extra"])
        if cfg:
            entry["config"] = cfg
        providers_out.append(entry)

    if "providers" not in ls_config:
        ls_config["providers"] = {}
    ls_config["providers"]["inference"] = providers_out
```
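For context, a unified config that exercises this expansion would look roughly like the following. The shape is inferred from the unit tests in this PR (`llama_stack.config` with `inference.providers`, `native_override`, and optional `profile`); exact field names should be checked against `UnifiedLlamaStackConfig`:

```yaml
llama_stack:
  config:
    # Optional: path to a baseline profile; omitted here, so the shipped default applies.
    # profile: profiles/minimal.yaml
    inference:
      providers:
        - type: openai
          api_key_env: OPENAI_API_KEY   # emitted as ${env.OPENAI_API_KEY}, never resolved
          allowed_models:
            - gpt-4o-mini
    native_override:
      safety:
        default_shield_id: llama-guard  # deep-merged last; lists would be replaced
```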
Do not use high-level provider type as the emitted provider_id.
For sentence_transformers, this emits provider_id: sentence_transformers, while the baseline and model/vector-store references use sentence-transformers. That leaves synthesized configs with references to a provider ID that no longer exists after providers.inference is replaced.
Normalize emitted provider IDs separately from provider types
```diff
 PROVIDER_TYPE_MAP: dict[str, str] = {
     "openai": "remote::openai",
     "sentence_transformers": "inline::sentence-transformers",
@@
     "vllm_rhel_ai": "remote::vllm",
 }
+
+PROVIDER_ID_MAP: dict[str, str] = {
+    "openai": "openai",
+    "sentence_transformers": "sentence-transformers",
+    "azure": "azure",
+    "vertexai": "vertexai",
+    "watsonx": "watsonx",
+    "vllm_rhaiis": "vllm-rhaiis",
+    "vllm_rhel_ai": "vllm-rhel-ai",
+}
@@
     p_type = provider["type"]
     entry: dict[str, Any] = {
-        "provider_id": p_type,
+        "provider_id": PROVIDER_ID_MAP[p_type],
         "provider_type": PROVIDER_TYPE_MAP[p_type],
     }
```

```python
ls_config = synthesize_configuration(lcs_config, config_file_dir=config_file_dir)
logger.info("Writing synthesized Llama Stack configuration to %s", output_file)
Path(output_file).parent.mkdir(parents=True, exist_ok=True)
with open(output_file, "w", encoding="utf-8") as f:
    yaml.dump(ls_config, f, Dumper=YamlDumper, default_flow_style=False)
```
Write synthesized configs with restrictive permissions.
native_override can contain literal secrets from a migrated run.yaml, so writing with default open(..., "w") can create a world-readable file depending on umask.
Use an explicit `0600` create mode
```diff
 ls_config = synthesize_configuration(lcs_config, config_file_dir=config_file_dir)
 logger.info("Writing synthesized Llama Stack configuration to %s", output_file)
 Path(output_file).parent.mkdir(parents=True, exist_ok=True)
-with open(output_file, "w", encoding="utf-8") as f:
+fd = os.open(output_file, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
+with os.fdopen(fd, "w", encoding="utf-8") as f:
     yaml.dump(ls_config, f, Dumper=YamlDumper, default_flow_style=False)
```

📝 Committable suggestion

```suggestion
ls_config = synthesize_configuration(lcs_config, config_file_dir=config_file_dir)
logger.info("Writing synthesized Llama Stack configuration to %s", output_file)
Path(output_file).parent.mkdir(parents=True, exist_ok=True)
fd = os.open(output_file, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
with os.fdopen(fd, "w", encoding="utf-8") as f:
    yaml.dump(ls_config, f, Dumper=YamlDumper, default_flow_style=False)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/llama_stack_configuration.py` around lines 825 - 829, The file write is
using open(output_file, "w") which can create world-readable files; change the
write to create the file with explicit restrictive permissions (mode 0o600).
After ensuring the parent directory is created (the existing
Path(output_file).parent.mkdir call is fine), open the file using a low-level
create with os.open(...) with flags O_WRONLY|O_CREAT|O_TRUNC and mode 0o600,
wrap the returned fd with os.fdopen to get a text file object, then use
yaml.dump(ls_config, f, Dumper=YamlDumper, default_flow_style=False) to write;
reference the variables/functions ls_config, output_file,
synthesize_configuration, and YamlDumper so you update the existing block that
writes the synthesized configuration.
```diff
 cfg.library_client_config_path = None
 with pytest.raises(
     ValueError,
-    match="Configuration problem: library_client_config_path is not set",
+    match="neither .*unified.* nor .*legacy.* is set",
 ):
     client = AsyncLlamaStackClientHolder()
     await client.load(cfg)
```
🧹 Nitpick | 🔵 Trivial
Consider testing validation at the model layer instead of relying on load() runtime checks.
The test passes, but not through Pydantic re-validation as suggested. The load() method does not re-run the check_llama_stack_model validator. Instead, the ValueError is raised by explicit runtime checks inside _load_library_client() (src/client.py:73-77). While this works today, the test would silently pass if those runtime checks are ever removed or refactored. A more robust test would construct the invalid config directly via LlamaStackConfiguration(...) inside pytest.raises to pin the expected validation behavior at the model layer where the check_llama_stack_model validator should enforce it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/test_client.py` around lines 84 - 90, Replace the current test
that relies on AsyncLlamaStackClientHolder().load(...) to trigger a runtime
ValueError with a direct validation test against the Pydantic model: construct
an invalid LlamaStackConfiguration(...) instance (setting
library_client_config_path=None and leaving both unified and legacy unset)
inside pytest.raises(ValueError, match="neither .*unified.* nor .*legacy.* is
set") so the check_llama_stack_model validator on the LlamaStackConfiguration
model is exercised; this ensures validation fails at model construction rather
than via AsyncLlamaStackClientHolder.load or _load_library_client runtime
checks.
| """Unit tests for unified-mode synthesizer and migration tool (LCORE-836).""" | ||
|
|
||
| from pathlib import Path | ||
| from typing import Any | ||
|
|
||
| import pytest | ||
| import yaml | ||
|
|
||
| from llama_stack_configuration import ( | ||
| PROVIDER_TYPE_MAP, | ||
| apply_high_level_inference, | ||
| deep_merge_list_replace, | ||
| load_default_baseline, | ||
| migrate_config_dumb, | ||
| synthesize_configuration, | ||
| ) | ||
|
|
||
| # ============================================================================= | ||
| # deep_merge_list_replace | ||
| # ============================================================================= | ||
|
|
||
|
|
||
| def test_deep_merge_scalar_replace() -> None: | ||
| """Overlay scalar replaces base scalar.""" | ||
| result = deep_merge_list_replace({"a": 1}, {"a": 2}) | ||
| assert result == {"a": 2} | ||
|
|
||
|
|
||
| def test_deep_merge_adds_new_keys() -> None: | ||
| """Overlay keys not in base are added.""" | ||
| result = deep_merge_list_replace({"a": 1}, {"b": 2}) | ||
| assert result == {"a": 1, "b": 2} | ||
|
|
||
|
|
||
| def test_deep_merge_nested_map_merges() -> None: | ||
| """Nested maps merge recursively.""" | ||
| base = {"a": {"x": 1, "y": 2}} | ||
| overlay = {"a": {"y": 20, "z": 30}} | ||
| result = deep_merge_list_replace(base, overlay) | ||
| assert result == {"a": {"x": 1, "y": 20, "z": 30}} | ||
|
|
||
|
|
||
| def test_deep_merge_list_replaces() -> None: | ||
| """Lists are replaced, not appended.""" | ||
| base = {"items": [1, 2, 3]} | ||
| overlay = {"items": [9]} | ||
| result = deep_merge_list_replace(base, overlay) | ||
| assert result == {"items": [9]} | ||
|
|
||
|
|
||
| def test_deep_merge_does_not_mutate_inputs() -> None: | ||
| """Neither base nor overlay are mutated.""" | ||
| base = {"a": {"x": 1}} | ||
| overlay = {"a": {"x": 2}} | ||
| result = deep_merge_list_replace(base, overlay) | ||
| assert base == {"a": {"x": 1}} | ||
| assert overlay == {"a": {"x": 2}} | ||
| assert result == {"a": {"x": 2}} | ||
|
|
||
|
|
||
| def test_deep_merge_type_mismatch_replaces() -> None: | ||
| """If overlay type != base type at same key, overlay wins.""" | ||
| # base is map, overlay is scalar | ||
| result = deep_merge_list_replace({"a": {"x": 1}}, {"a": "replaced"}) | ||
| assert result == {"a": "replaced"} | ||
|
|
||
|
|
||
| # ============================================================================= | ||
| # apply_high_level_inference | ||
| # ============================================================================= | ||
|
|
||
|
|
||
| def test_apply_high_level_inference_single_provider() -> None: | ||
| """Single provider with api_key_env and allowed_models.""" | ||
| ls_config: dict[str, Any] = {} | ||
| inference = { | ||
| "providers": [ | ||
| { | ||
| "type": "openai", | ||
| "api_key_env": "OPENAI_API_KEY", | ||
| "allowed_models": ["gpt-4o-mini"], | ||
| } | ||
| ] | ||
| } | ||
| apply_high_level_inference(ls_config, inference) | ||
| assert ls_config["providers"]["inference"] == [ | ||
| { | ||
| "provider_id": "openai", | ||
| "provider_type": "remote::openai", | ||
| "config": { | ||
| "api_key": "${env.OPENAI_API_KEY}", | ||
| "allowed_models": ["gpt-4o-mini"], | ||
| }, | ||
| } | ||
| ] | ||
|
|
||
|
|
||
| def test_apply_high_level_inference_replaces_existing() -> None: | ||
| """Providers list is replaced entirely, not merged.""" | ||
| ls_config = {"providers": {"inference": [{"provider_id": "stale"}]}} | ||
| apply_high_level_inference( | ||
| ls_config, {"providers": [{"type": "sentence_transformers"}]} | ||
| ) | ||
| assert ls_config["providers"]["inference"] == [ | ||
| { | ||
| "provider_id": "sentence_transformers", | ||
| "provider_type": "inline::sentence-transformers", | ||
| } | ||
| ] | ||
|
|
||
|
|
||
| def test_apply_high_level_inference_extra_merged() -> None: | ||
| """`extra` dict fields merge into emitted config.""" | ||
| ls_config: dict[str, Any] = {} | ||
| inference = { | ||
| "providers": [ | ||
| { | ||
| "type": "vertexai", | ||
| "extra": {"project_id": "my-project", "location": "us-central1"}, | ||
| } | ||
| ] | ||
| } | ||
| apply_high_level_inference(ls_config, inference) | ||
| assert ls_config["providers"]["inference"][0]["config"] == { | ||
| "project_id": "my-project", | ||
| "location": "us-central1", | ||
| } | ||
|
|
||
|
|
||
| def test_provider_type_map_covers_all_literals() -> None: | ||
| """Every Literal value declared on UnifiedInferenceProvider.type has a mapping.""" | ||
| # pylint: disable=import-outside-toplevel | ||
| from models.config import UnifiedInferenceProvider | ||
|
|
||
| literal_values = ( | ||
| UnifiedInferenceProvider.model_fields[ # pylint: disable=unsubscriptable-object | ||
| "type" | ||
| ].annotation.__args__ | ||
| ) | ||
| for value in literal_values: | ||
| assert value in PROVIDER_TYPE_MAP | ||
|
|
||
|
|
||
| # ============================================================================= | ||
| # synthesize_configuration | ||
| # ============================================================================= | ||
|
|
||
|
|
||
| MINIMAL_BASELINE: dict[str, Any] = { | ||
| "version": 2, | ||
| "apis": ["inference"], | ||
| "providers": { | ||
| "inference": [ | ||
| {"provider_id": "stock", "provider_type": "remote::stock", "config": {}} | ||
| ] | ||
| }, | ||
| "safety": {"default_shield_id": "llama-guard"}, | ||
| } | ||
|
|
||
|
|
||
| def test_synthesize_errors_without_config() -> None: | ||
| """Without llama_stack.config present, synthesize raises ValueError.""" | ||
| with pytest.raises(ValueError, match="llama_stack.config"): | ||
| synthesize_configuration({"llama_stack": {}}) | ||
|
|
||
|
|
||
| def test_synthesize_uses_default_baseline_when_no_profile() -> None: | ||
| """With neither profile nor native_override, result is the baseline (through enrichment).""" | ||
| lcs_config: dict[str, Any] = {"llama_stack": {"config": {}}} | ||
| result = synthesize_configuration(lcs_config, default_baseline=MINIMAL_BASELINE) | ||
| # Baseline preserved (enrichment is a no-op without byok_rag/rag/okp) | ||
| assert result["safety"] == {"default_shield_id": "llama-guard"} | ||
| assert result["providers"]["inference"] == [ | ||
| {"provider_id": "stock", "provider_type": "remote::stock", "config": {}} | ||
| ] | ||
|
|
||
|
|
||
| def test_synthesize_loads_profile_from_path(tmp_path: Path) -> None: | ||
| """Profile path is loaded as the baseline.""" | ||
| profile_data = { | ||
| "version": 2, | ||
| "apis": ["inference"], | ||
| "providers": {"inference": [{"provider_id": "profile_p"}]}, | ||
| } | ||
| profile_path = tmp_path / "profile.yaml" | ||
| profile_path.write_text(yaml.dump(profile_data)) | ||
|
|
||
| lcs_config: dict[str, Any] = { | ||
| "llama_stack": {"config": {"profile": str(profile_path)}} | ||
| } | ||
| result = synthesize_configuration(lcs_config) | ||
| assert result["providers"]["inference"] == [{"provider_id": "profile_p"}] | ||
|
|
||
|
|
||
| def test_synthesize_profile_relative_path(tmp_path: Path) -> None: | ||
| """Relative profile path resolves against config_file_dir.""" | ||
| profile_data = {"version": 2} | ||
| (tmp_path / "p.yaml").write_text(yaml.dump(profile_data)) | ||
| lcs_config: dict[str, Any] = {"llama_stack": {"config": {"profile": "p.yaml"}}} | ||
| result = synthesize_configuration(lcs_config, config_file_dir=tmp_path) | ||
| assert result == {"version": 2} | ||
|
|
||
|
|
||
| def test_synthesize_applies_high_level_inference() -> None: | ||
| """High-level inference section expands into native providers list.""" | ||
| lcs_config: dict[str, Any] = { | ||
| "llama_stack": { | ||
| "config": { | ||
| "inference": { | ||
| "providers": [{"type": "openai", "api_key_env": "OPENAI_API_KEY"}] | ||
| } | ||
| } | ||
| } | ||
| } | ||
| result = synthesize_configuration(lcs_config, default_baseline=MINIMAL_BASELINE) | ||
| assert result["providers"]["inference"] == [ | ||
| { | ||
| "provider_id": "openai", | ||
| "provider_type": "remote::openai", | ||
| "config": {"api_key": "${env.OPENAI_API_KEY}"}, | ||
| } | ||
| ] | ||
|
|
||
|
|
||
| def test_synthesize_native_override_deep_merges() -> None: | ||
| """native_override deep-merges on top (scalar path).""" | ||
| lcs_config: dict[str, Any] = { | ||
| "llama_stack": { | ||
| "config": { | ||
| "native_override": { | ||
| "safety": {"default_shield_id": "overridden"}, | ||
| } | ||
| } | ||
| } | ||
| } | ||
| result = synthesize_configuration(lcs_config, default_baseline=MINIMAL_BASELINE) | ||
| assert result["safety"]["default_shield_id"] == "overridden" | ||
|
|
||
|
|
||
def test_synthesize_native_override_list_replaces() -> None:
    """native_override replaces lists, not appends."""
    lcs_config: dict[str, Any] = {
        "llama_stack": {
            "config": {
                "native_override": {
                    "providers": {
                        "inference": [{"provider_id": "override-only"}],
                    }
                }
            }
        }
    }
    result = synthesize_configuration(lcs_config, default_baseline=MINIMAL_BASELINE)
    assert result["providers"]["inference"] == [{"provider_id": "override-only"}]


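The merge semantics the two tests above exercise — dicts merge recursively, lists and scalars from the override replace the base value wholesale — can be sketched like this (an illustrative stand-in for the PR's `deep_merge_list_replace`; the shipped implementation may differ):

```python
from typing import Any


def deep_merge_list_replace(
    base: dict[str, Any], override: dict[str, Any]
) -> dict[str, Any]:
    """Recursively merge override onto base. Nested dicts merge key by key;
    lists and scalars in the override replace the base value outright.
    Returns a new dict; neither input is mutated at the top level."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge_list_replace(merged[key], value)
        else:
            merged[key] = value  # lists replace, never append
    return merged
```

List replacement (rather than concatenation) is what makes `native_override` deterministic: an operator overriding `providers.inference` gets exactly the list they wrote, with no baseline entries leaking through.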
def test_synthesize_precedence_override_beats_high_level() -> None:
    """When high-level and native_override both touch the same path, override wins."""
    lcs_config: dict[str, Any] = {
        "llama_stack": {
            "config": {
                "inference": {"providers": [{"type": "openai"}]},
                "native_override": {
                    "providers": {
                        "inference": [{"provider_id": "override-wins"}],
                    }
                },
            }
        }
    }
    result = synthesize_configuration(lcs_config, default_baseline=MINIMAL_BASELINE)
    assert result["providers"]["inference"] == [{"provider_id": "override-wins"}]


def test_synthesize_preserves_env_var_refs_verbatim() -> None:
    """Secrets stay as ${env.FOO} references; never resolved into the output."""
    lcs_config: dict[str, Any] = {
        "llama_stack": {
            "config": {
                "inference": {
                    "providers": [{"type": "openai", "api_key_env": "OPENAI_API_KEY"}]
                }
            }
        }
    }
    result = synthesize_configuration(lcs_config, default_baseline=MINIMAL_BASELINE)
    api_key_value = result["providers"]["inference"][0]["config"]["api_key"]
    assert api_key_value == "${env.OPENAI_API_KEY}"


# =============================================================================
# Built-in default baseline loader
# =============================================================================


def test_load_default_baseline_returns_dict() -> None:
    """The shipped default baseline loads as a dict with expected keys."""
    baseline = load_default_baseline()
    assert isinstance(baseline, dict)
    assert baseline.get("version") == 2
    assert "providers" in baseline


# =============================================================================
# migrate_config_dumb
# =============================================================================


def test_migrate_dumb_lossless_roundtrip(tmp_path: Path) -> None:
    """Dumb migration places full run.yaml under config.native_override."""
    run_yaml_content = {
        "version": 2,
        "apis": ["inference"],
        "providers": {"inference": [{"provider_id": "opa"}]},
    }
    lcs_yaml_content = {
        "name": "LCS",
        "llama_stack": {
            "use_as_library_client": True,
            "library_client_config_path": str(tmp_path / "run.yaml"),
        },
    }

    run_yaml_path = tmp_path / "run.yaml"
    run_yaml_path.write_text(yaml.dump(run_yaml_content))
    lcs_yaml_path = tmp_path / "lightspeed-stack.yaml"
    lcs_yaml_path.write_text(yaml.dump(lcs_yaml_content))
    output_path = tmp_path / "unified.yaml"

    migrate_config_dumb(str(run_yaml_path), str(lcs_yaml_path), str(output_path))

    result = yaml.safe_load(output_path.read_text())

    # Legacy path is gone
    assert "library_client_config_path" not in result["llama_stack"]
    # Unified config has full run.yaml under native_override
    assert result["llama_stack"]["config"]["native_override"] == run_yaml_content
    # Other fields preserved
    assert result["llama_stack"]["use_as_library_client"] is True
    assert result["name"] == "LCS"


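Setting the file I/O aside, the core of the dumb lift-and-shift migration can be sketched on already-parsed dicts. The helper name `migrate_config_dumb_dict` is invented for illustration; the real `migrate_config_dumb` reads and writes the YAML files and may differ in detail:

```python
from typing import Any


def migrate_config_dumb_dict(
    run_yaml: dict[str, Any], lcs: dict[str, Any]
) -> dict[str, Any]:
    """Sketch of the dumb migration on parsed dicts: the entire run.yaml
    content lands under llama_stack.config.native_override with
    baseline: empty, and the legacy path field is dropped."""
    unified = dict(lcs)
    llama_stack = dict(unified.get("llama_stack", {}))
    llama_stack.pop("library_client_config_path", None)  # legacy field removed
    llama_stack["config"] = {"baseline": "empty", "native_override": run_yaml}
    unified["llama_stack"] = llama_stack
    return unified
```

With `baseline: empty`, synthesis starts from nothing and applies only `native_override`, which is what makes the migrate-then-synthesize round trip lossless at the dict level.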
def test_migrate_then_synthesize_reproduces_run_yaml(tmp_path: Path) -> None:
    """End-to-end round trip: run.yaml → migrate → synthesize → original content."""
    run_yaml_content = {
        "version": 2,
        "apis": ["inference", "vector_io"],
        "providers": {
            "inference": [{"provider_id": "rt", "provider_type": "remote::rt"}]
        },
        "safety": {"default_shield_id": "guard"},
    }
    lcs_yaml_content = {
        "name": "LCS",
        "llama_stack": {
            "use_as_library_client": True,
            "library_client_config_path": str(tmp_path / "run.yaml"),
        },
    }
    run_yaml_path = tmp_path / "run.yaml"
    run_yaml_path.write_text(yaml.dump(run_yaml_content))
    lcs_yaml_path = tmp_path / "lightspeed-stack.yaml"
    lcs_yaml_path.write_text(yaml.dump(lcs_yaml_content))
    output_path = tmp_path / "unified.yaml"
    migrate_config_dumb(str(run_yaml_path), str(lcs_yaml_path), str(output_path))

    unified = yaml.safe_load(output_path.read_text())
    synthesized = synthesize_configuration(unified)

    # Synthesized == original run.yaml (lossless round trip in dumb mode)
    assert synthesized == run_yaml_content
Remove this PoC-only test file before merge.
The PR objectives say PoC code and tests are expected to be removed before merging, with only docs/spec remaining. This new unit suite keeps the spike synthesizer/migration APIs in the main test surface and should be deleted before release.
If this file is intentionally retained, also update it so that it does not codify in-place mutation behavior around `apply_high_level_inference`, and annotate `MINIMAL_BASELINE` with `Final`. As per coding guidelines, "Avoid in-place parameter modification anti-patterns; return new data structures instead" and "Use `Final[type]` as type hint for all constants."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/test_llama_stack_synthesize.py` around lines 1 - 371, This PoC
test suite should not be merged as-is; either delete the whole test file (remove
the new tests referencing
synthesize_configuration/migrate_config_dumb/apply_high_level_inference) or, if
you intentionally keep it, change tests to avoid relying on in-place mutation by
using the functional return value of apply_high_level_inference (don't assert
that the passed ls_config instance was mutated) and annotate the
MINIMAL_BASELINE constant with Final[dict[str, Any]] and update any tests that
mutate it to operate on a copy (e.g., use copy.deepcopy) so the baseline is not
altered in-place.
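The two guideline points above might look like this in the test module (illustrative only; the `make_baseline` fixture helper is invented for this sketch):

```python
import copy
from typing import Any, Final

# Constant annotated with Final, per the coding guidelines.
MINIMAL_BASELINE: Final[dict[str, Any]] = {
    "version": 2,
    "providers": {"inference": []},
    "safety": {"default_shield_id": "orig"},
}


def make_baseline() -> dict[str, Any]:
    """Hand each test its own deep copy so no test mutates the shared constant."""
    return copy.deepcopy(MINIMAL_BASELINE)


baseline = make_baseline()
baseline["safety"]["default_shield_id"] = "changed"
# The shared constant is untouched:
assert MINIMAL_BASELINE["safety"]["default_shield_id"] == "orig"
```

Note that `Final` only stops rebinding of the name under a type checker; it does not freeze the dict's contents, which is why the deep copy is still needed.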
…entation) Incorporates reviewer request: the work on this feature kicks off with a Story that authors the behave `.feature` files for unified mode BEFORE the feature is implemented. The intent is to keep test-shape authorship free of implementation bias and to surface any architectural gaps early.

Adds two JIRAs to the spike doc's proposed-JIRAs list, bringing the total from 7 to 9:

1. LCORE-???? (Story, inserted first) — E2E feature files for unified mode (no step implementation). Authors Gherkin scenarios against the spec doc's R1..R11 requirements. Explicitly forbids reading the implementation JIRAs or the synthesizer code while authoring. behave marks the resulting steps as undefined; test-e2e stays green (undefined scenarios are reported, not failed).
2. LCORE-???? (Task, inserted after the migrate-e2e-configs Story) — Implement behave step definitions for the kickoff feature files. Takes the Gherkin as-is (does not water down the tests to fit the implementation). Blocked by the kickoff ticket plus the feature-implementation tickets (schema + synthesizer, migration tool, LS container entrypoint).

Filing both tickets together (rather than filing only the kickoff and "letting the step-def ticket appear later") makes the dependency chain explicit from the start and ensures the step-def work is not forgotten. No other JIRAs change scope. The PR template is updated to reflect the new count and to widen the "Full JIRA list" link range to cover both new sections.
LCORE-836: merge run.yaml into lightspeed-stack.yaml (spike)
Spike deliverable for LCORE-836 — single-file operator-facing configuration for Lightspeed Core.
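For orientation, here is an illustrative unified `lightspeed-stack.yaml` in the shape this spike proposes. Field names follow the `llama_stack.config` schema described in this PR; the service name, env var, and shield id are made-up example values:

```yaml
name: Lightspeed Core Service
llama_stack:
  use_as_library_client: true
  config:
    baseline: default            # or "empty" for a strict lossless round trip
    inference:
      providers:
        - type: openai
          api_key_env: OPENAI_API_KEY   # stays a ${env.*} reference, never resolved
    native_override:             # applied last; dicts deep-merge, lists replace
      safety:
        default_shield_id: example-shield
```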
What's in this PR
Design docs (`docs/design/llama-stack-config-merge/`):

- `llama-stack-config-merge-spike.md` (use the "Outline" button) — the spike: research, design alternatives, PoC results, reviewer decisions (S1–S4, T1–T9), proposed JIRAs
- `llama-stack-config-merge.md` (use the "Outline" button) — feature spec — changeable based on final decisions, will be kept in the repo long-term

PoC code (will stay in-branch for review; see "Before merge" below):

- `src/models/config.py` — `UnifiedLlamaStackConfig`, `UnifiedInferenceSection`, `UnifiedInferenceProvider` + modified `LlamaStackConfiguration` (mutual-exclusion validator)
- `src/llama_stack_configuration.py` — `synthesize_configuration` pipeline, `deep_merge_list_replace`, `apply_high_level_inference`, `load_default_baseline`, `synthesize_to_file`, `migrate_config_dumb`; CLI auto-detect
- `src/data/default_run.yaml` — shipped default baseline (note: needs slimming before production; see findings #4, Python project structure)
- `src/client.py` — library-mode wiring branches on unified vs legacy
- `src/lightspeed_stack.py` — `--migrate-config` / `--run-yaml` / `--migrate-output` flags
- `test.containerfile` — copies `src/data/` into the LS container
- `tests/unit/test_llama_stack_synthesize.py` — 22 new tests (synth + migration round-trip)
- `config: None` field in dump expectations and error-message regex

PoC evidence (`docs/design/llama-stack-config-merge/poc-evidence/`):

- `lightspeed-stack.yaml` used for validation
- `run.yaml` produced at boot
- `/v1/query` response JSON

Main findings

- A `lightspeed-stack.yaml` with `llama_stack.config` (no external `run.yaml`) boots LCORE in library mode, serves `/v1/query`, and applies `native_override` values correctly. 2098 unit tests pass, including a lossless migrate-then-synthesize round trip.
- `llama_stack.library_client_config_path` and the new `llama_stack.config` are mutually exclusive and coexist through a deprecation window. The migration tool (`lightspeed-stack --migrate-config`) produces a lossless unified file from the legacy pair.
- `AsyncLlamaStackAsLibraryClient` in `llama-stack` requires a file path, not a dict (confirmed by the PoC). Consequence: library mode must write the synthesized file to disk. Noted in Decision T9.
- The `${env.EXTERNAL_PROVIDERS_DIR}` requirement comes from the repo's `run.yaml` (the PoC uses it verbatim). The implementation JIRA should slim the shipped baseline or change the reference to a default-aware form like `${env.EXTERNAL_PROVIDERS_DIR:=~/.llama/providers.d}`. Flagged in the spike doc's "Surprise discovered during PoC" section.
- `sentence_transformers` (underscore) differs from the baseline's `sentence-transformers` (hyphen) that other parts of the baseline reference. Implementation JIRA to resolve (most likely: change the Literal to use hyphenated forms matching the LS ecosystem). Flagged in the spike doc.

For reviewers
Strategic decisions — @sbunciak (PM) / @tisnik:
Technical decisions — @tisnik / team leads:
`baseline` field (added during PoC). (`tekton/`)

Proposed JIRAs — review scope and ordering:
Doc structure note: The decisions and proposed-JIRAs sections of the spike doc are where your input is needed. They link to background sections later in the doc (current architecture, design alternatives, merge-semantics worked examples, backward-compat scope) — read those if you need more context on a specific point, but it is optional.
Before merge
Per the spike howto, step 10: once decisions are confirmed and JIRAs are filed (via `dev-tools/file-jiras.sh`), the following are expected to be removed from this branch prior to merge:

- `src/` (schema, synthesizer, library-mode wiring, migration tool) — becomes implementation in the filed JIRAs
- `tests/` — becomes part of the implementation JIRAs
- `src/data/default_run.yaml` — becomes part of the implementation JIRAs
- `test.containerfile` change — becomes part of the implementation JIRAs
- `docs/design/llama-stack-config-merge/poc-evidence/`
Tools used to create PR
Related Tickets & Documents
Checklist