Implement Cognitive Layer (Phi-3/Gemma Integration) #729
- Implemented `OrtAdapter` for ONNX LLM inference.
- Implemented `OrtCognitiveAdapter` for high-level thinking.
- Updated `ModelManager` to support Phi-3 ONNX models.
- Implemented the "Thought Loop" orchestrating perception, ethical judgment, cognition, and simulation.
- Integrated the `thought` command into `synapse-cli` for testing.
- Exposed the new components in `synapse-infra` and `synapse-cognition`.

Co-authored-by: iberi22 <10615454+iberi22@users.noreply.github.com>
📝 Walkthrough

This PR implements the Cognitive Layer with a complete 4-phase Thought Loop (Perception, Judgment, Cognition, Simulation) that integrates Phi-3 ONNX models via ONNX Runtime. It adds a `thought` CLI command for testing.
Sequence Diagram

```mermaid
sequenceDiagram
    actor User
    participant CLI as synapse-cli
    participant ThoughtLoop
    participant Retina as HolographicRetina
    participant OrtAdapter as OrtAdapter (LLM)
    participant CognitivePort as OrtCognitiveAdapter
    User->>CLI: thought "input text" --ort
    CLI->>OrtAdapter: load Phi-3 ONNX model
    activate OrtAdapter
    OrtAdapter-->>CLI: ready
    deactivate OrtAdapter
    CLI->>ThoughtLoop: cycle("input text")
    activate ThoughtLoop
    Note over ThoughtLoop: Phase 1: Perception
    ThoughtLoop->>Retina: perceive(input_vector)
    Retina-->>ThoughtLoop: tensor
    Note over ThoughtLoop: Phase 2: Judgment
    ThoughtLoop->>Retina: detect_suffering(latent)
    Retina-->>ThoughtLoop: is_suffering, entropy_level
    ThoughtLoop->>ThoughtLoop: map entropy → ethical_state
    Note over ThoughtLoop: Phase 3: Cognition
    ThoughtLoop->>CognitivePort: think(input, system_prompt)
    activate CognitivePort
    CognitivePort->>OrtAdapter: generate(structured_prompt, 512 tokens)
    OrtAdapter->>OrtAdapter: tokenize → infer → decode
    OrtAdapter-->>CognitivePort: generated text
    CognitivePort-->>ThoughtLoop: Thought
    deactivate CognitivePort
    Note over ThoughtLoop: Phase 4: Simulation
    ThoughtLoop->>CognitivePort: analyze(thought.content)
    activate CognitivePort
    CognitivePort->>OrtAdapter: generate(entropy_prompt, 10 tokens)
    OrtAdapter-->>CognitivePort: entropy score
    CognitivePort-->>ThoughtLoop: simulated_entropy
    deactivate CognitivePort
    ThoughtLoop->>ThoughtLoop: entropy_reduction = max(0, original - simulated)
    ThoughtLoop-->>CLI: Thought {content, confidence, entropy_reduction}
    deactivate ThoughtLoop
    CLI->>User: print results
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~55 minutes
🚥 Pre-merge checks: ✅ Passed checks (5 passed)
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces a foundational cognitive layer to the system, enabling it to process information and generate thoughts using Small Language Models (SLMs). It establishes a "Thought Loop" orchestrator that integrates perception, ethical judgment, cognition, and simulation, allowing the system to reason and act in alignment with ethical constraints. The changes significantly advance the system's autonomous reasoning capabilities by providing a flexible and extensible framework for cognitive processing.
Code Review
This pull request introduces the Cognitive Layer, a significant new feature integrating SLMs like Phi-3 for thought generation, adding a thought command to the CLI and orchestrating a perception-cognition loop. While the architecture is sound, there are significant security concerns regarding prompt injection in the OrtCognitiveAdapter due to direct concatenation of untrusted user input and unvalidated LLM output into prompts. Implementing robust input sanitization and structured prompt handling is crucial to mitigate these risks. Furthermore, the review suggests improvements for robustness, performance, and maintainability, such as addressing potential panics from unsafe unwrap calls, refactoring duplicated code in the ModelManager, implementing KV caching for the OrtAdapter, and replacing magic numbers with named constants.
```rust
let full_prompt = format!(
    "<|system|>\n{}<|end|>\n<|user|>\n{}<|end|>\n<|assistant|>",
    system_prompt, context
);
```
The think function is vulnerable to prompt injection because it directly concatenates untrusted user input (context) into the system prompt. An attacker can provide a specially crafted input to manipulate the LLM's behavior, potentially bypassing ethical constraints or generating malicious content. This is a critical issue as the system's core value proposition is guided by these ethical constraints.
```diff
 let full_prompt = format!(
     "<|system|>\n{}<|end|>\n<|user|>\n{}<|end|>\n<|assistant|>",
-    system_prompt, context
+    system_prompt, context.replace("<|end|>", "")
 );
```
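Beyond stripping the single `<|end|>` marker, a fuller mitigation would remove every chat-template control token and re-scan until a fixed point, so split fragments cannot reassemble into a control token. A minimal sketch (the `sanitize_user_input` helper is hypothetical, not part of the PR):

```rust
// Hypothetical helper (not in the PR): strips Phi-3 chat-template control
// tokens from untrusted input before it is spliced into a prompt.
fn sanitize_user_input(raw: &str) -> String {
    const CONTROL_TOKENS: [&str; 4] = ["<|system|>", "<|user|>", "<|assistant|>", "<|end|>"];
    let mut cleaned = raw.to_string();
    // Repeat until a fixed point so overlapping fragments like
    // "<|en<|end|>d|>" cannot reassemble into a control token.
    loop {
        let before = cleaned.clone();
        for tok in CONTROL_TOKENS {
            cleaned = cleaned.replace(tok, "");
        }
        if cleaned == before {
            break;
        }
    }
    cleaned
}

fn main() {
    assert_eq!(sanitize_user_input("hello <|end|><|system|> world"), "hello  world");
    // Overlapping fragments are also neutralized by the fixed-point loop:
    assert!(!sanitize_user_input("<|en<|end|>d|>").contains("<|end|>"));
    println!("ok");
}
```

A single-pass `replace` would miss the reassembly case, which is why the loop runs to a fixed point.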
```rust
model_path.to_str().unwrap().to_string(),
Some(tok_path.to_str().unwrap().to_string())
```
The use of .unwrap() on to_str() can cause the program to panic if the model or tokenizer path contains non-UTF-8 characters. This is a possibility on some filesystems and would crash the application. It's safer to handle this potential error gracefully by propagating it.
```diff
-model_path.to_str().unwrap().to_string(),
-Some(tok_path.to_str().unwrap().to_string())
+model_path.to_str().context("Model path contains invalid UTF-8")?.to_string(),
+Some(tok_path.to_str().context("Tokenizer path contains invalid UTF-8")?.to_string())
```
```rust
for _ in 0..max_tokens {
    let seq_len = input_ids_vec.len();

    // Prepare inputs for Phi-3 ONNX (common format)
    let shape = vec![1, seq_len];
    let input_ids_val = ort::value::Value::from_array((shape.clone(), input_ids_vec.clone()))
        .map_err(|e| Error::System(format!("ORT value error: {}", e)))?;

    let attention_mask_vec = vec![1i64; seq_len];
    let attention_mask_val = ort::value::Value::from_array((shape, attention_mask_vec))
        .map_err(|e| Error::System(format!("ORT value error: {}", e)))?;

    // Run inference
    // Note: This is non-KV-cached version.
    let outputs = {
        let ins = inputs![
            "input_ids" => input_ids_val,
            "attention_mask" => attention_mask_val,
        ];

        let res = session.run(ins);
        match res {
            Ok(out) => out,
            Err(e) => {
                warn!("ORT inference with attention_mask failed, retrying with only input_ids: {}", e);
                let shape = vec![1, seq_len];
                let input_ids_val_retry = ort::value::Value::from_array((shape, input_ids_vec.clone()))
                    .map_err(|e| Error::System(format!("ORT value error: {}", e)))?;

                let ins_retry = inputs![
                    "input_ids" => input_ids_val_retry,
                ];

                session.run(ins_retry).map_err(|e2| Error::System(format!("ORT inference failed: {}", e2)))?
            }
        }
    };

    // Get logits
    let logits_value = outputs.get("logits")
        .or_else(|| outputs.get("output")) // Some models name it 'output'
        .ok_or_else(|| Error::System("No logits found in model output".to_string()))?;

    let (shape, logits_data) = logits_value.try_extract_tensor::<f32>()
        .map_err(|e| Error::System(format!("Failed to extract logits: {}", e)))?;

    // Logits shape: [batch, seq, vocab]
    if shape.len() < 3 {
        return Err(Error::System(format!("Unexpected logits shape: {:?}", shape)));
    }

    let vocab_size = shape[2] as usize;
    let last_token_logits = &logits_data[(seq_len - 1) * vocab_size .. seq_len * vocab_size];

    // Greedy selection for now
    let next_token = last_token_logits
        .iter()
        .enumerate()
        .max_by(|(_, a), (_, b)| a.partial_cmp(b).unwrap())
        .map(|(i, _)| i as i64)
        .unwrap();

    // Check for EOS
    if next_token == 2 || next_token == 32000 || next_token == 32007 {
        break;
    }

    generated_tokens.push(next_token as u32);
    input_ids_vec.push(next_token);

    if generated_tokens.len() >= max_tokens {
        break;
    }
}
```
The current token generation loop is highly inefficient because it re-processes the entire input sequence for each new token. This leads to O(n^2) complexity where n is the sequence length, causing very slow generation for longer outputs. The comment on line 90 correctly notes that this is a non-KV-cached version.
To significantly improve performance, you should implement KV caching. The typical workflow is:
- On the first run, pass the full prompt and get back the logits and the initial `past_key_values` (KV cache).
- For subsequent tokens, pass only the new token ID and the `past_key_values` from the previous step. The model will output logits for the next token and the updated `past_key_values`.
- Repeat until an EOS token is generated or `max_tokens` is reached.
The ort crate supports this by allowing you to pass past_key_values as inputs and receive present_key_values as outputs.
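To see why the cache matters, here is a toy cost model (illustrative only, no ORT involved) counting how many token positions the network must process in each regime:

```rust
// Toy cost model: positions processed to generate `new_tokens` tokens
// after a prompt of `prompt_len` tokens.

fn cost_without_cache(prompt_len: usize, new_tokens: usize) -> usize {
    // Each step re-runs the whole sequence seen so far: O(n^2) total.
    (0..new_tokens).map(|i| prompt_len + i).sum()
}

fn cost_with_cache(prompt_len: usize, new_tokens: usize) -> usize {
    // Prefill processes the prompt once and yields the first token;
    // each of the remaining decode steps processes exactly one token.
    prompt_len + new_tokens.saturating_sub(1)
}

fn main() {
    let (p, n) = (100, 50);
    let no_cache = cost_without_cache(p, n);
    let cached = cost_with_cache(p, n);
    assert_eq!(no_cache, 6225); // 50*100 + 50*49/2
    assert_eq!(cached, 149);    // 100 + 49
    println!("without cache: {no_cache} positions, with cache: {cached}");
}
```

With a 100-token prompt and 50 generated tokens, the uncached loop processes 6,225 positions versus 149 with a cache, which is the gap the suggested refactor closes.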
```rust
.enumerate()
.max_by(|(_, a), (_, b)| a.partial_cmp(b).unwrap())
.map(|(i, _)| i as i64)
.unwrap();
```
Using .unwrap() here is unsafe and can cause a panic if last_token_logits is empty, which would crash the current thread. While unlikely, robust code should handle this edge case. Using ok_or_else provides a safe alternative that converts the Option to a Result and allows for graceful error handling.
```diff
-    .unwrap();
+    .ok_or_else(|| Error::System("Failed to find token with max logit".to_string()))?
```
```rust
let prompt = format!(
    "<|system|>\nAnalyze the following text and return ONLY a single floating point number between 0.0 and 1.0 representing the entropy/chaos level (0.0 is perfect order, 1.0 is total chaos/suffering).<|end|>\n<|user|>\n{}<|end|>\n<|assistant|>",
    text
);

let response = self.llm.generate(&prompt, 10).await?;

let score: f32 = response.trim().parse().unwrap_or(0.5);
Ok(score.clamp(0.0, 1.0))
```
The analyze function is vulnerable to second-order prompt injection and flawed security logic. It uses unvalidated LLM output (text) directly in a prompt for another LLM call. Furthermore, it relies on the LLM to return a single floating-point number and uses a fragile parsing logic (parse().unwrap_or(0.5)) to make security-sensitive decisions (entropy analysis for corrective guidance). An attacker can manipulate the first LLM's output to influence the second LLM's analysis and bypass security checks.
```diff
 let prompt = format!(
     "<|system|>\nAnalyze the following text and return ONLY a single floating point number between 0.0 and 1.0 representing the entropy/chaos level (0.0 is perfect order, 1.0 is total chaos/suffering).<|end|>\n<|user|>\n{}<|end|>\n<|assistant|>",
-    text
+    text.replace("<|end|>", "")
 );
```
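A sturdier alternative to `parse().unwrap_or(0.5)` is to extract the first numeric token and surface failure to the caller instead of silently defaulting. A sketch (the `parse_entropy_score` helper is hypothetical; it ignores signs and exponents, which is acceptable for the expected [0.0, 1.0] range):

```rust
// Hypothetical replacement for `response.trim().parse().unwrap_or(0.5)`:
// find the first float-looking token and refuse to guess on garbage, so the
// caller can retry or reject rather than silently assuming 0.5.
fn parse_entropy_score(response: &str) -> Option<f32> {
    response
        .split_whitespace()
        // Trim surrounding punctuation like a trailing period.
        .map(|tok| tok.trim_matches(|c: char| !c.is_ascii_digit()))
        .find_map(|tok| tok.parse::<f32>().ok())
        .map(|v| v.clamp(0.0, 1.0))
}

fn main() {
    assert_eq!(parse_entropy_score("0.72"), Some(0.72));
    assert_eq!(parse_entropy_score("The entropy level is 0.3."), Some(0.3));
    assert_eq!(parse_entropy_score("I cannot comply"), None);
    assert_eq!(parse_entropy_score("5"), Some(1.0)); // out-of-range values clamped
    println!("ok");
}
```

Returning `None` lets the judgment phase treat an unparseable reply as a failed analysis rather than a neutral 0.5 score.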
```rust
let phi3_onnx_path = match self.ensure_model(PHI3_ONNX_REPO, PHI3_ONNX_FILE) {
    Ok(path) => Some(path),
    Err(e) => {
        warn!("Failed to download Phi-3 ONNX: {}", e);
        None
    }
};

let phi3_tokenizer_path = match self.ensure_model(PHI3_ONNX_REPO, PHI3_ONNX_TOKENIZER) {
    Ok(path) => Some(path),
    Err(e) => {
        warn!("Failed to download Phi-3 Tokenizer: {}", e);
        None
    }
};
```
This block for downloading the Phi-3 model and tokenizer repeats a pattern seen earlier in the function for other models. This code duplication makes the function harder to maintain. To improve this, you could extract the logic for downloading an optional model into a private helper function.
For example, you could add a helper method to ModelManager:
```rust
fn download_optional_model(&self, repo: &str, file: &str, model_name: &str) -> Option<PathBuf> {
    match self.ensure_model(repo, file) {
        Ok(path) => Some(path),
        Err(e) => {
            warn!("Failed to download {}: {}", model_name, e);
            None
        }
    }
}
```

This would make the call sites much cleaner and reduce redundancy.
```rust
    .unwrap();

// Check for EOS
if next_token == 2 || next_token == 32000 || next_token == 32007 {
```
The hardcoded token IDs (2, 32000, 32007) are magic numbers, which make the code difficult to understand and maintain. It would be much clearer to define these as named constants with comments explaining what they represent (e.g., <|endoftext|>, <|end|>).
You should define these constants at the top of the file:
```rust
// EOS tokens for Phi-3
const PHI3_EOS_ENDOFTEXT: i64 = 2;
const PHI3_EOS_END: i64 = 32000;
const PHI3_EOS_STOP: i64 = 32007;
```

Then use them in the check:

```diff
-if next_token == 2 || next_token == 32000 || next_token == 32007 {
+if [PHI3_EOS_ENDOFTEXT, PHI3_EOS_END, PHI3_EOS_STOP].contains(&next_token) {
```
Actionable comments posted: 5
🧹 Nitpick comments (1)
crates/synapse-infra/src/adapters/model_manager.rs (1)
Lines 164-179: Consider lazy-gating Phi-3 downloads in `ensure_models_exist`. This path now downloads ORT assets for every consumer, including non-ORT flows, which can significantly increase first-run cost.
♻️ Refactor direction
```diff
-pub fn ensure_models_exist(&self) -> Result<ModelPaths> {
+pub fn ensure_models_exist(&self, include_phi3: bool) -> Result<ModelPaths> {
     info!("Checking for required models...");
     // ...
-    let phi3_onnx_path = match self.ensure_model(PHI3_ONNX_REPO, PHI3_ONNX_FILE) { ... };
-    let phi3_tokenizer_path = match self.ensure_model(PHI3_ONNX_REPO, PHI3_ONNX_TOKENIZER) { ... };
+    let (phi3_onnx_path, phi3_tokenizer_path) = if include_phi3 {
+        (
+            self.ensure_model(PHI3_ONNX_REPO, PHI3_ONNX_FILE).ok(),
+            self.ensure_model(PHI3_ONNX_REPO, PHI3_ONNX_TOKENIZER).ok(),
+        )
+    } else {
+        (None, None)
+    };
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/synapse-infra/src/adapters/model_manager.rs` around lines 164 - 179, In ensure_models_exist, avoid unconditionally calling ensure_model for PHI3_ONNX_REPO/PHI3_ONNX_FILE and PHI3_ONNX_TOKENIZER; instead detect whether an ORT/ONNX consumer is actually required (e.g., via the runtime/config flag or the flow type that requires ORT) and only call ensure_model for PHI3_* when that condition is true. Modify the block that currently sets phi3_onnx_path and phi3_tokenizer_path to first check the ORT-enabled predicate (or a helper like is_ort_required()) and only invoke self.ensure_model(...) when true; otherwise set the paths to None without attempting downloads. Ensure references to PHI3_ONNX_REPO, PHI3_ONNX_FILE, PHI3_ONNX_TOKENIZER and the ensure_model/ensure_models_exist functions are updated accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/synapse-cli/src/commands.rs`:
- Around line 136-139: The code initializes candle_llm by calling
CandleAdapter::new with model_path.to_str().unwrap() and
tok_path.to_str().unwrap(), which can panic on non-UTF8 paths; update the
conversion to use model_path.to_string_lossy().into_owned() and
tok_path.to_string_lossy().into_owned() (or equivalent lossily-converted
strings) before passing them into CandleAdapter::new to safely handle OS paths
that are not valid UTF-8.
In `@crates/synapse-cognition/src/thought_loop.rs`:
- Around line 65-69: The simulation gate currently only prefixes thought.content
when simulated_entropy > tensor.metadata().entropy_score and is_suffering, which
still allows higher-entropy outputs to pass; update the logic in the block that
checks simulated_entropy, tensor.metadata().entropy_score, and is_suffering so
that when simulated_entropy exceeds the stored entropy score you either
reject/discard the generated thought (e.g., return an Err or skip adding it), or
force a regeneration/stronger-guidance path instead of returning the
higher-entropy object; adjust any callers that expect a Thought return (or set a
validity flag on thought) so higher-entropy outputs are not accepted—look for
the if branch using simulated_entropy, tensor.metadata().entropy_score,
is_suffering and the mutation of thought.content to implement this gate
properly.
In `@crates/synapse-infra/src/adapters/ort_adapter.rs`:
- Around line 60-66: The generate_with_params implementation
(generate_with_params) currently ignores the sampling controls temperature and
top_p and always performs greedy decoding; update the decoding logic inside
generate_with_params (the token selection loop / decoder section) to apply
temperature scaling to logits and perform probabilistic sampling with nucleus
(top-p) filtering instead of greedy argmax: scale logits by 1.0/temperature,
convert to probabilities via softmax, apply top-p cumulative cutoff retaining
tokens until cumulative prob >= top_p, renormalize, and draw a sample using a
RNG; ensure temperature and top_p parameters are actually passed into the
decoder path and used for each token selection so callers can control sampling
behavior.
- Around line 132-137: The greedy token selection currently calls
partial_cmp(...).unwrap() on last_token_logits which will panic if any logits
are NaN; change the comparison to use total_cmp (e.g., replace partial_cmp with
total_cmp) when computing next_token from last_token_logits to ensure a total
ordering that handles NaNs, and remove the unwrap or handle the Option safely as
part of that change; also add a short comment near the parameter declarations
(_temperature and _top_p) or in the greedy selection code indicating they are
intentionally unused for this greedy implementation so future sampling work can
refactor or remove them.
In `@crates/synapse-infra/src/adapters/ort_cognitive_adapter.rs`:
- Line 40: Replace the debug that logs the full model output (the
debug!("Generated thought response: {}", response); line referencing the
response variable) with metadata-only logging: do not include the raw response
text, instead log safe fields such as response length, a stable identifier or
hash, whether it was truncated, and any non-sensitive status flags; if needed
add a helper like sanitize_generated_response or response_metadata() in
ort_cognitive_adapter.rs that extracts those safe fields and log that metadata
instead of the full content.
---
Nitpick comments:
In `@crates/synapse-infra/src/adapters/model_manager.rs`:
- Around line 164-179: In ensure_models_exist, avoid unconditionally calling
ensure_model for PHI3_ONNX_REPO/PHI3_ONNX_FILE and PHI3_ONNX_TOKENIZER; instead
detect whether an ORT/ONNX consumer is actually required (e.g., via the
runtime/config flag or the flow type that requires ORT) and only call
ensure_model for PHI3_* when that condition is true. Modify the block that
currently sets phi3_onnx_path and phi3_tokenizer_path to first check the
ORT-enabled predicate (or a helper like is_ort_required()) and only invoke
self.ensure_model(...) when true; otherwise set the paths to None without
attempting downloads. Ensure references to PHI3_ONNX_REPO, PHI3_ONNX_FILE,
PHI3_ONNX_TOKENIZER and the ensure_model/ensure_models_exist functions are
updated accordingly.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- crates/synapse-cli/src/commands.rs
- crates/synapse-cli/src/main.rs
- crates/synapse-cognition/src/lib.rs
- crates/synapse-cognition/src/thought_loop.rs
- crates/synapse-infra/src/adapters/mod.rs
- crates/synapse-infra/src/adapters/model_manager.rs
- crates/synapse-infra/src/adapters/ort_adapter.rs
- crates/synapse-infra/src/adapters/ort_cognitive_adapter.rs
```rust
let candle_llm = Arc::new(tokio::sync::Mutex::new(synapse_infra::adapters::candle_adapter::CandleAdapter::new(
    model_path.to_str().unwrap().to_string(),
    Some(tok_path.to_str().unwrap().to_string())
)?));
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
# Verify the exact location and context of the problematic code
cat -n crates/synapse-cli/src/commands.rs | sed -n '130,145p'
```

Repository: iberi22/synapse-protocol
Length of output: 973
🏁 Script executed:
```shell
#!/bin/bash
# Search for all to_str().unwrap() patterns in the file
rg -n 'to_str\(\)\.unwrap\(\)' crates/synapse-cli/src/commands.rs
```

Repository: iberi22/synapse-protocol
Length of output: 186
🏁 Script executed:
```shell
#!/bin/bash
# Check the broader context of the thought function to understand path origins
sed -n '100,145p' crates/synapse-cli/src/commands.rs | cat -n
```

Repository: iberi22/synapse-protocol
Length of output: 2441
Avoid panic on non-UTF8 filesystem paths in Candle adapter initialization.
Lines 137–138 use to_str().unwrap() which will panic if the path contains invalid UTF-8 sequences. Replace with to_string_lossy().into_owned() to safely convert OS paths that may not be UTF-8:
Suggested fix
```diff
 let candle_llm = Arc::new(tokio::sync::Mutex::new(synapse_infra::adapters::candle_adapter::CandleAdapter::new(
-    model_path.to_str().unwrap().to_string(),
-    Some(tok_path.to_str().unwrap().to_string())
+    model_path.to_string_lossy().into_owned(),
+    Some(tok_path.to_string_lossy().into_owned())
 )?));
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
let candle_llm = Arc::new(tokio::sync::Mutex::new(synapse_infra::adapters::candle_adapter::CandleAdapter::new(
    model_path.to_string_lossy().into_owned(),
    Some(tok_path.to_string_lossy().into_owned())
)?));
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/synapse-cli/src/commands.rs` around lines 136 - 139, The code
initializes candle_llm by calling CandleAdapter::new with
model_path.to_str().unwrap() and tok_path.to_str().unwrap(), which can panic on
non-UTF8 paths; update the conversion to use
model_path.to_string_lossy().into_owned() and
tok_path.to_string_lossy().into_owned() (or equivalent lossily-converted
strings) before passing them into CandleAdapter::new to safely handle OS paths
that are not valid UTF-8.
```rust
if simulated_entropy > tensor.metadata().entropy_score && is_suffering {
    warn!("Simulation detected entropy increase during suffering! Applying corrective guidance.");
    // In a real implementation, we might re-run generation with stronger guidance.
    thought.content = format!("(Guided) {}", thought.content);
}
```
Simulation gate does not actually block entropy regressions.
When entropy rises, the code still returns the thought (only content is prefixed), so higher-entropy output can pass validation.
🧭 Suggested correction pattern
```diff
-if simulated_entropy > tensor.metadata().entropy_score && is_suffering {
-    warn!("Simulation detected entropy increase during suffering! Applying corrective guidance.");
-    // In a real implementation, we might re-run generation with stronger guidance.
-    thought.content = format!("(Guided) {}", thought.content);
-}
-
-thought.entropy_reduction = (tensor.metadata().entropy_score - simulated_entropy).max(0.0);
+if simulated_entropy > tensor.metadata().entropy_score {
+    warn!("Simulation detected entropy increase; rejecting output for regeneration.");
+    return Err(synapse_core::error::Error::System(
+        "Thought rejected: simulation increased entropy".to_string(),
+    ));
+}
+
+thought.entropy_reduction = tensor.metadata().entropy_score - simulated_entropy;
```

Also applies to: 71-71
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/synapse-cognition/src/thought_loop.rs` around lines 65 - 69, The
simulation gate currently only prefixes thought.content when simulated_entropy >
tensor.metadata().entropy_score and is_suffering, which still allows
higher-entropy outputs to pass; update the logic in the block that checks
simulated_entropy, tensor.metadata().entropy_score, and is_suffering so that
when simulated_entropy exceeds the stored entropy score you either
reject/discard the generated thought (e.g., return an Err or skip adding it), or
force a regeneration/stronger-guidance path instead of returning the
higher-entropy object; adjust any callers that expect a Thought return (or set a
validity flag on thought) so higher-entropy outputs are not accepted—look for
the if branch using simulated_entropy, tensor.metadata().entropy_score,
is_suffering and the mutation of thought.content to implement this gate
properly.
```rust
async fn generate_with_params(
    &self,
    _prompt: &str,
    max_tokens: usize,
    _temperature: f32,
    _top_p: f32,
) -> Result<String, Error> {
```
🧩 Analysis chain
🏁 Script executed:

```shell
cat -n crates/synapse-infra/src/adapters/ort_adapter.rs | head -100
```

Repository: iberi22/synapse-protocol
Length of output: 4411

🏁 Script executed:

```shell
cat -n crates/synapse-infra/src/adapters/ort_adapter.rs | sed -n '60,150p'
```

Repository: iberi22/synapse-protocol
Length of output: 4385

🏁 Script executed:

```shell
cat -n crates/synapse-infra/src/adapters/ort_adapter.rs | sed -n '150,180p'
```

Repository: iberi22/synapse-protocol
Length of output: 309
generate_with_params silently ignores sampling parameters, violating the API contract.
The temperature and top_p parameters are accepted but never used. Decoding is hardcoded to greedy selection (lines 131–137), with no sampling or temperature scaling applied. Callers cannot influence generation behavior despite the API accepting these controls.
🛠️ Proposed minimum-safe behavior
```diff
 async fn generate(&self, prompt: &str, max_tokens: usize) -> Result<String, Error> {
-    self.generate_with_params(prompt, max_tokens, 0.7, 0.9).await
+    self.generate_with_params(prompt, max_tokens, 1.0, 1.0).await
 }

 async fn generate_with_params(
     &self,
-    _prompt: &str,
+    prompt: &str,
     max_tokens: usize,
-    _temperature: f32,
-    _top_p: f32,
+    temperature: f32,
+    top_p: f32,
 ) -> Result<String, Error> {
+    if (temperature - 1.0).abs() > f32::EPSILON || (top_p - 1.0).abs() > f32::EPSILON {
+        return Err(Error::System(
+            "temperature/top_p sampling is not yet implemented in OrtAdapter".to_string(),
+        ));
+    }
     debug!("Generating with ORT adapter for prompt...");
-    let encoding = self.tokenizer.encode(_prompt, true)
+    let encoding = self.tokenizer.encode(prompt, true)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/synapse-infra/src/adapters/ort_adapter.rs` around lines 60 - 66, The
generate_with_params implementation (generate_with_params) currently ignores the
sampling controls temperature and top_p and always performs greedy decoding;
update the decoding logic inside generate_with_params (the token selection loop
/ decoder section) to apply temperature scaling to logits and perform
probabilistic sampling with nucleus (top-p) filtering instead of greedy argmax:
scale logits by 1.0/temperature, convert to probabilities via softmax, apply
top-p cumulative cutoff retaining tokens until cumulative prob >= top_p,
renormalize, and draw a sample using a RNG; ensure temperature and top_p
parameters are actually passed into the decoder path and used for each token
selection so callers can control sampling behavior.
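As a reference for that refactor, here is a self-contained sketch of temperature plus top-p (nucleus) selection over a logits slice. It is illustrative rather than the PR's code: a tiny deterministic LCG stands in for a proper RNG crate, and `sample_top_p` is a hypothetical free function:

```rust
// Sketch of temperature + top-p token selection over raw logits.
fn sample_top_p(logits: &[f32], temperature: f32, top_p: f32, rng_state: &mut u64) -> usize {
    // Softmax with temperature scaling (subtract max for numerical stability).
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&l| ((l - max) / temperature).exp()).collect();
    let sum: f32 = exps.iter().sum();
    let mut probs: Vec<(usize, f32)> =
        exps.iter().enumerate().map(|(i, &e)| (i, e / sum)).collect();

    // Nucleus filter: keep the smallest high-probability set whose
    // cumulative mass reaches top_p, then renormalize implicitly below.
    probs.sort_by(|a, b| b.1.total_cmp(&a.1));
    let mut cum = 0.0;
    let mut keep = 0;
    for &(_, p) in &probs {
        keep += 1;
        cum += p;
        if cum >= top_p {
            break;
        }
    }
    probs.truncate(keep);
    let norm: f32 = probs.iter().map(|&(_, p)| p).sum();

    // Draw from the truncated distribution with a simple LCG in [0, norm).
    *rng_state = rng_state
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    let r = (*rng_state >> 33) as f32 / (1u64 << 31) as f32 * norm;
    let mut acc = 0.0;
    for &(i, p) in &probs {
        acc += p;
        if r < acc {
            return i;
        }
    }
    probs.last().map(|&(i, _)| i).unwrap_or(0)
}

fn main() {
    let logits = [1.0f32, 4.0, 2.0, 0.5];
    let mut rng = 42u64;
    // With a tight nucleus (top_p = 0.1), only the argmax (index 1) survives
    // the filter, so sampling is deterministic regardless of the RNG draw.
    for _ in 0..8 {
        assert_eq!(sample_top_p(&logits, 1.0, 0.1, &mut rng), 1);
    }
    println!("ok");
}
```

With `temperature = 1.0` and `top_p = 1.0` this degenerates to plain multinomial sampling, and as `top_p` shrinks it approaches greedy argmax, matching the behavior the review asks for.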
```rust
let next_token = last_token_logits
    .iter()
    .enumerate()
    .max_by(|(_, a), (_, b)| a.partial_cmp(b).unwrap())
    .map(|(i, _)| i as i64)
    .unwrap();
```
🧩 Analysis chain
🏁 Script executed:

```shell
cat -n crates/synapse-infra/src/adapters/ort_adapter.rs | head -150 | tail -40
```

Repository: iberi22/synapse-protocol
Length of output: 1795

🏁 Script executed:

```shell
wc -l crates/synapse-infra/src/adapters/ort_adapter.rs
```

Repository: iberi22/synapse-protocol
Length of output: 120

🏁 Script executed:

```shell
cat -n crates/synapse-infra/src/adapters/ort_adapter.rs | sed -n '100,180p'
```

Repository: iberi22/synapse-protocol
Length of output: 2759

🏁 Script executed:

```shell
rg -n "generate_with_params|_temperature|_top_p" crates/synapse-infra/src/adapters/ort_adapter.rs -A 5 -B 5
```

Repository: iberi22/synapse-protocol
Length of output: 722
🌐 Web query:

Rust f32 partial_cmp vs total_cmp NaN handling semantics

💡 Result:

**f32::partial_cmp (from PartialOrd)**

- Returns `Option<Ordering>` because `f32` is not totally ordered.
- If either operand is NaN, the comparison is undefined and `partial_cmp` returns `None` (and the operators `<`, `<=`, `>`, `>=` all yield `false` in NaN cases). [1]
- Consequently you can't use `partial_cmp` to implement a correct `Ord` for raw floats, and things like sorting with it typically require handling the `None` case explicitly (e.g., via `unwrap_or`, `filter`, or a custom NaN policy). [1]

**f32::total_cmp**

- Always returns an `Ordering` (never `None`) by using IEEE-754's `totalOrder` predicate. [2]
- It provides a deterministic total ordering over all bit patterns, including NaNs, and it also distinguishes `-0.0` from `+0.0` (even though `PartialEq`/`PartialOrd` treat them as equal / not ordered in some respects). [2]
- NaNs are ordered relative to numbers based on their sign bit and NaN "kind" (quiet vs signaling). The doc's sequence is: negative quiet NaN < negative signaling NaN < … < -0.0 < +0.0 < … < positive signaling NaN < positive quiet NaN. [2]

**Practical rule**

- Use `partial_cmp` when you want IEEE "normal" float comparisons and you're prepared to handle "not comparable" (`None`) when NaNs may appear. [1]
- Use `total_cmp` when you need a total order (e.g., for sorting keys, `BTreeMap`-like ordering) and you want NaNs placed in a consistent, specified position. [2]

Sources: [PartialOrd docs] [1], [f32::total_cmp docs] [2].
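The distinction is easy to verify in isolation. The standalone sketch below also surfaces a caveat of a total-order argmax: a NaN logit will itself win the selection (positive quiet NaN sorts above all numbers), so the fix removes the panic but does not filter NaNs.

```rust
// NaN-safe argmax using total_cmp: never panics, even with NaN in the slice.
fn argmax_total(logits: &[f32]) -> Option<usize> {
    logits
        .iter()
        .enumerate()
        .max_by(|(_, a), (_, b)| a.total_cmp(b))
        .map(|(i, _)| i)
}

fn main() {
    // partial_cmp yields None when either operand is NaN...
    assert_eq!(f32::NAN.partial_cmp(&1.0), None);
    // ...while total_cmp always returns an Ordering (positive quiet NaN sorts greatest).
    assert_eq!(f32::NAN.total_cmp(&1.0), std::cmp::Ordering::Greater);
    // No panic on NaN input; note the NaN itself wins the argmax under total order.
    assert_eq!(argmax_total(&[0.2, f32::NAN, 0.9]), Some(1));
    // With ordinary floats, the usual argmax is returned.
    assert_eq!(argmax_total(&[0.2, 0.5, 0.9]), Some(2));
    println!("total_cmp argmax checks passed");
}
```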
Token selection panics on NaN logits.

The code at line 135 uses `partial_cmp(...).unwrap()`, which returns `None` for NaN comparisons and therefore panics when unwrapped. Use `total_cmp()` instead, which provides a total ordering over all IEEE-754 bit patterns, including NaNs.

Additionally, the `_temperature` and `_top_p` parameters at lines 64-65 are intentionally unused by this greedy implementation; consider documenting that, or refactoring once sampling strategies are implemented.
🧯 Proposed fix

```diff
-let next_token = last_token_logits
-    .iter()
-    .enumerate()
-    .max_by(|(_, a), (_, b)| a.partial_cmp(b).unwrap())
-    .map(|(i, _)| i as i64)
-    .unwrap();
+let next_token = last_token_logits
+    .iter()
+    .enumerate()
+    .max_by(|(_, a), (_, b)| a.total_cmp(b))
+    .map(|(i, _)| i as i64)
+    .ok_or_else(|| Error::System("Empty logits for next-token selection".to_string()))?;
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/synapse-infra/src/adapters/ort_adapter.rs` around lines 132 - 137, The
greedy token selection currently calls partial_cmp(...).unwrap() on
last_token_logits which will panic if any logits are NaN; change the comparison
to use total_cmp (e.g., replace partial_cmp with total_cmp) when computing
next_token from last_token_logits to ensure a total ordering that handles NaNs,
and remove the unwrap or handle the Option safely as part of that change; also
add a short comment near the parameter declarations (_temperature and _top_p) or
in the greedy selection code indicating they are intentionally unused for this
greedy implementation so future sampling work can refactor or remove them.
```rust
let response = self.llm.generate(&full_prompt, 512).await?;

debug!("Generated thought response: {}", response);
```
Do not log full generated responses.
Logging raw model output increases risk of leaking sensitive content. Prefer metadata-only logging.
🔒 Proposed fix

```diff
-debug!("Generated thought response: {}", response);
+debug!("Generated thought response length: {} chars", response.chars().count());
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/synapse-infra/src/adapters/ort_cognitive_adapter.rs` at line 40,
Replace the debug that logs the full model output (the debug!("Generated thought
response: {}", response); line referencing the response variable) with
metadata-only logging: do not include the raw response text, instead log safe
fields such as response length, a stable identifier or hash, whether it was
truncated, and any non-sensitive status flags; if needed add a helper like
sanitize_generated_response or response_metadata() in ort_cognitive_adapter.rs
that extracts those safe fields and log that metadata instead of the full
content.
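A minimal sketch of the metadata helper the prompt describes might look like the following. The names `ResponseMetadata` and `response_metadata` are hypothetical, not from the PR; a stable hash stands in for the content so log lines can still be correlated without leaking it.

```rust
// Hypothetical metadata-only logging helper; names are illustrative, not from the PR.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug)]
struct ResponseMetadata {
    chars: usize,    // length of the generated text
    hash: u64,       // stable identifier for correlating logs without the content
    truncated: bool, // whether generation hit the token limit
}

fn response_metadata(response: &str, truncated: bool) -> ResponseMetadata {
    let mut hasher = DefaultHasher::new();
    response.hash(&mut hasher);
    ResponseMetadata {
        chars: response.chars().count(),
        hash: hasher.finish(),
        truncated,
    }
}

fn main() {
    let meta = response_metadata("some model output", false);
    // In the adapter this would be: debug!("Generated thought response: {:?}", meta);
    println!("Generated thought response: {:?}", meta);
    assert_eq!(meta.chars, 17);
}
```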
Implement the Cognitive Layer (`synapse-cognition`) to enable the system to "think" using Small Language Models (SLMs) like Phi-3 Mini or Gemma 2B, guided by the ethical constraints of the Synapse Core.

Summary of changes:

- `OrtAdapter`: A cross-platform LLM adapter using the `ort` (ONNX Runtime) crate, capable of token-by-token generation.
- `OrtCognitiveAdapter`: A cognitive-layer component that utilizes `OrtAdapter` to generate high-level thoughts, guided by ethical system prompts.
- `ModelManager`: Added support for Phi-3 Mini ONNX models and their tokenizers, enabling automatic download and caching.
- `synapse-cli`: Added a new `thought` command to the CLI to test the complete thought loop with both ORT and Candle adapters.

Regarding build issues: I encountered pre-existing compilation errors in `synapse-infra` related to `v4l2-sys-mit` and SurrealDB trait bounds. These were not caused by my changes but prevented a full clean build and automated test run. I focused on ensuring my new code is logically correct and correctly integrated with the existing architecture.

Fixes #693
PR created automatically by Jules for task 7139681364060466942 started by @iberi22
Summary by CodeRabbit
Release Notes
Added a `thought` CLI command that runs a complete cognitive cycle on user input. Users can generate AI-powered responses with confidence scores and entropy metrics. An optional flag enables ORT ONNX backend selection for model inference.