Skip to content

Implement Cognitive Layer (Phi-3/Gemma Integration)#729

Open
iberi22 wants to merge 1 commit intomainfrom
cognitive-layer-implementation-7139681364060466942
Open

Implement Cognitive Layer (Phi-3/Gemma Integration)#729
iberi22 wants to merge 1 commit intomainfrom
cognitive-layer-implementation-7139681364060466942

Conversation

@iberi22
Copy link
Owner

@iberi22 iberi22 commented Mar 3, 2026

Implement the Cognitive Layer (synapse-cognition) to enable the system to "think" using Small Language Models (SLMs) like Phi-3 Mini or Gemma 2B, guided by the ethical constraints of the Synapse Core.

Summary of changes:

  1. Implemented OrtAdapter: A cross-platform LLM adapter using the ort (ONNX Runtime) crate, capable of token-by-token generation.
  2. Implemented OrtCognitiveAdapter: A cognitive layer component that utilizes OrtAdapter to generate high-level thoughts, guided by ethical system prompts.
  3. Updated ModelManager: Added support for Phi-3 Mini ONNX models and their tokenizers, enabling automatic download and caching.
  4. Implemented "Thought Loop" Logic: Created an orchestrator that connects Perception, Judgment (via Genesis Matrix), Cognition, and Simulation stages.
  5. Integrated with synapse-cli: Added a new thought command to the CLI to test the complete thought loop with both ORT and Candle adapters.
  6. Exposed Components: Updated module declarations and exports to make the new functionality accessible across crates.

Regarding build issues: I encountered pre-existing compilation errors in synapse-infra related to v4l2-sys-mit and SurrealDB trait bounds. These were not caused by my changes but prevented a full clean build and automated test run. I focused on ensuring my new code is logically correct and correctly integrated with the existing architecture.

Fixes #693


PR created automatically by Jules for task 7139681364060466942 started by @iberi22

Summary by CodeRabbit

Release Notes

  • New Features
    • Added a new thought CLI command that runs a complete cognitive cycle on user input. Users can generate AI-powered responses with confidence scores and entropy metrics. An optional flag enables ORT ONNX backend selection for model inference.

- Implemented `OrtAdapter` for ONNX LLM inference.
- Implemented `OrtCognitiveAdapter` for high-level thinking.
- Updated `ModelManager` to support Phi-3 ONNX models.
- Implemented "Thought Loop" orchestrating perception, ethical judgment, cognition, and simulation.
- Integrated `thought` command into `synapse-cli` for testing.
- Exposed new components in `synapse-infra` and `synapse-cognition`.

Co-authored-by: iberi22 <10615454+iberi22@users.noreply.github.com>
@google-labs-jules
Copy link
Contributor

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@coderabbitai
Copy link

coderabbitai bot commented Mar 3, 2026

📝 Walkthrough

Walkthrough

This PR implements the Cognitive Layer with a complete 4-phase Thought Loop (Perception, Judgment, Cognition, Simulation) that integrates Phi-3 ONNX models via ONNX Runtime. It adds a CLI thought command, a ThoughtLoop orchestrator that bridges perception and cognitive processing, cross-platform ONNX model adapters, and entropy-guided ethical reasoning for LLM-generated content.

Changes

Cohort / File(s) Summary
CLI Integration
crates/synapse-cli/src/commands.rs, crates/synapse-cli/src/main.rs
Added new thought subcommand with input and --ort flag; implements async thought() function that orchestrates model initialization, thought cycle execution, and result printing.
Thought Loop Core
crates/synapse-cognition/src/lib.rs, crates/synapse-cognition/src/thought_loop.rs
Introduced ThoughtLoop struct and module; implements 4-phase cycle (Perception → Judgment → Cognition → Simulation) with entropy tracking, ethical state derivation, and entropy reduction calculation based on suffering detection.
ORT Adapter Infrastructure
crates/synapse-infra/src/adapters/mod.rs, crates/synapse-infra/src/adapters/ort_adapter.rs, crates/synapse-infra/src/adapters/ort_cognitive_adapter.rs
Enabled ORT modules; implemented OrtAdapter providing ONNX Runtime-based LLM inference with tokenization, fallback inference paths, and greedy decoding; implemented OrtCognitiveAdapter for trait-based cognitive operations (think, analyze).
Model Configuration
crates/synapse-infra/src/adapters/model_manager.rs
Added Phi-3 ONNX model constants and download integration; extended ModelPaths with optional phi3 ONNX and tokenizer paths; added per-step model download with warning-level error handling.

Sequence Diagram

sequenceDiagram
    actor User
    participant CLI as synapse-cli
    participant ThoughtLoop
    participant Retina as HolographicRetina
    participant OrtAdapter as OrtAdapter (LLM)
    participant CognitivePort as OrtCognitiveAdapter
    
    User->>CLI: thought "input text" --ort
    CLI->>OrtAdapter: load Phi-3 ONNX model
    activate OrtAdapter
    OrtAdapter-->>CLI: ready
    deactivate OrtAdapter
    
    CLI->>ThoughtLoop: cycle("input text")
    activate ThoughtLoop
    
    Note over ThoughtLoop: Phase 1: Perception
    ThoughtLoop->>Retina: perceive(input_vector)
    Retina-->>ThoughtLoop: tensor
    
    Note over ThoughtLoop: Phase 2: Judgment
    ThoughtLoop->>Retina: detect_suffering(latent)
    Retina-->>ThoughtLoop: is_suffering, entropy_level
    ThoughtLoop->>ThoughtLoop: map entropy → ethical_state
    
    Note over ThoughtLoop: Phase 3: Cognition
    ThoughtLoop->>CognitivePort: think(input, system_prompt)
    activate CognitivePort
    CognitivePort->>OrtAdapter: generate(structured_prompt, 512 tokens)
    OrtAdapter->>OrtAdapter: tokenize → infer → decode
    OrtAdapter-->>CognitivePort: generated text
    CognitivePort-->>ThoughtLoop: Thought
    deactivate CognitivePort
    
    Note over ThoughtLoop: Phase 4: Simulation
    ThoughtLoop->>CognitivePort: analyze(thought.content)
    activate CognitivePort
    CognitivePort->>OrtAdapter: generate(entropy_prompt, 10 tokens)
    OrtAdapter-->>CognitivePort: entropy score
    deactivate CognitivePort
    CognitivePort-->>ThoughtLoop: simulated_entropy
    
    ThoughtLoop->>ThoughtLoop: entropy_reduction = max(0, original - simulated)
    ThoughtLoop-->>CLI: Thought {content, confidence, entropy_reduction}
    deactivate ThoughtLoop
    
    CLI->>User: print results
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~55 minutes


✨ Four phases dance in harmony—
Perception sees, judgment weighs the soul,
Cognition dreams while ethics guide the way,
Simulation tests if thoughts reduce the pain,
A loop of wisdom, conscience, and repose. 🧠💭

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: implementing the Cognitive Layer with Phi-3/Gemma integration, which is the primary objective of this PR.
Linked Issues check ✅ Passed All coding requirements from issue #693 are fulfilled: synapse-cognition library created, CognitivePort trait implemented, OrtCognitiveAdapter with ONNX Runtime support, Thought Loop orchestrator built, and synapse-cli integration completed.
Out of Scope Changes check ✅ Passed All changes directly support the Cognitive Layer implementation. Module re-exports in adapters/mod.rs, model manager extensions, and CLI integration are all necessary scope items for enabling the complete Thought Loop system.
Docstring Coverage ✅ Passed Docstring coverage is 84.62% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch cognitive-layer-implementation-7139681364060466942

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Copy link

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a foundational cognitive layer to the system, enabling it to process information and generate thoughts using Small Language Models (SLMs). It establishes a "Thought Loop" orchestrator that integrates perception, ethical judgment, cognition, and simulation, allowing the system to reason and act in alignment with ethical constraints. The changes significantly advance the system's autonomous reasoning capabilities by providing a flexible and extensible framework for cognitive processing.

Highlights

  • ORT Adapter Implementation: Implemented OrtAdapter, a cross-platform LLM adapter using the ort (ONNX Runtime) crate, capable of token-by-token generation.
  • ORT Cognitive Adapter: Introduced OrtCognitiveAdapter, a cognitive layer component that leverages OrtAdapter to generate high-level thoughts, guided by ethical system prompts.
  • Model Manager Update: Updated ModelManager to support Phi-3 Mini ONNX models and their tokenizers, including automatic download and caching.
  • Thought Loop Logic: Created an orchestrator for the "Thought Loop" logic, connecting Perception, Judgment (via Genesis Matrix), Cognition, and Simulation stages.
  • CLI Integration: Integrated the new functionality with synapse-cli by adding a thought command to test the complete thought loop using both ORT and Candle adapters.
  • Component Exposure: Updated module declarations and exports to make the new cognitive layer functionality accessible across relevant crates.
Changelog
  • crates/synapse-cli/src/commands.rs
    • Added the thought asynchronous function to execute a full cognitive thought loop cycle, including model initialization, adapter selection (ORT or Candle), and result printing.
  • crates/synapse-cli/src/main.rs
    • Defined a new Thought subcommand with input and ort arguments for the CLI.
    • Integrated the thought command handler into the main CLI execution logic.
  • crates/synapse-cognition/src/lib.rs
    • Declared the new thought_loop module, making its contents publicly accessible within the crate.
  • crates/synapse-cognition/src/thought_loop.rs
    • Added a new module defining the ThoughtLoop struct and its cycle method, orchestrating the Perception, Judgment, Cognition, and Simulation stages.
  • crates/synapse-infra/src/adapters/mod.rs
    • Made the ort_adapter and ort_cognitive_adapter modules public.
    • Exported OrtAdapter and OrtCognitiveAdapter for easier access within the synapse-infra crate.
  • crates/synapse-infra/src/adapters/model_manager.rs
    • Defined constants for Phi-3 ONNX model and tokenizer repositories and file names.
    • Extended ModelPaths to include optional paths for Phi-3 ONNX model and tokenizer.
    • Updated ensure_models_exist to attempt downloading and caching the Phi-3 ONNX model and its tokenizer.
  • crates/synapse-infra/src/adapters/ort_adapter.rs
    • Added a new module implementing OrtAdapter, which uses ONNX Runtime for LLM inference.
    • Provided new method for initialization and implemented the LlmPort trait for token-by-token generation.
  • crates/synapse-infra/src/adapters/ort_cognitive_adapter.rs
    • Added a new module implementing OrtCognitiveAdapter, which acts as a CognitivePort.
    • Defined new method and implemented think and analyze methods using an OrtAdapter for cognitive processing and ethical analysis.
Activity
  • The pull request was automatically created by Jules for task 7139681364060466942, initiated by @iberi22.
  • The author noted pre-existing compilation errors in synapse-infra related to v4l2-sys-mit and SurrealDB trait bounds, which were not introduced by this PR but prevented a full clean build and automated test run.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the Cognitive Layer, a significant new feature integrating SLMs like Phi-3 for thought generation, adding a thought command to the CLI and orchestrating a perception-cognition loop. While the architecture is sound, there are significant security concerns regarding prompt injection in the OrtCognitiveAdapter due to direct concatenation of untrusted user input and unvalidated LLM output into prompts. Implementing robust input sanitization and structured prompt handling is crucial to mitigate these risks. Furthermore, the review suggests improvements for robustness, performance, and maintainability, such as addressing potential panics from unsafe unwrap calls, refactoring duplicated code in the ModelManager, implementing KV caching for the OrtAdapter, and replacing magic numbers with named constants.

Comment on lines +33 to +36
let full_prompt = format!(
"<|system|>\n{}<|end|>\n<|user|>\n{}<|end|>\n<|assistant|>",
system_prompt, context
);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The think function is vulnerable to prompt injection because it directly concatenates untrusted user input (context) into the system prompt. An attacker can provide a specially crafted input to manipulate the LLM's behavior, potentially bypassing ethical constraints or generating malicious content. This is a critical issue as the system's core value proposition is guided by these ethical constraints.

Suggested change
let full_prompt = format!(
"<|system|>\n{}<|end|>\n<|user|>\n{}<|end|>\n<|assistant|>",
system_prompt, context
);
let full_prompt = format!(
"<|system|>\n{}<|end|>\n<|user|>\n{}<|end|>\n<|assistant|>",
system_prompt, context.replace("<|end|>", "")
);

Comment on lines +137 to +138
model_path.to_str().unwrap().to_string(),
Some(tok_path.to_str().unwrap().to_string())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The use of .unwrap() on to_str() can cause the program to panic if the model or tokenizer path contains non-UTF-8 characters. This is a possibility on some filesystems and would crash the application. It's safer to handle this potential error gracefully by propagating it.

Suggested change
model_path.to_str().unwrap().to_string(),
Some(tok_path.to_str().unwrap().to_string())
model_path.to_str().context("Model path contains invalid UTF-8")?.to_string(),
Some(tok_path.to_str().context("Tokenizer path contains invalid UTF-8")?.to_string())

Comment on lines +77 to +150
for _ in 0..max_tokens {
let seq_len = input_ids_vec.len();

// Prepare inputs for Phi-3 ONNX (common format)
let shape = vec![1, seq_len];
let input_ids_val = ort::value::Value::from_array((shape.clone(), input_ids_vec.clone()))
.map_err(|e| Error::System(format!("ORT value error: {}", e)))?;

let attention_mask_vec = vec![1i64; seq_len];
let attention_mask_val = ort::value::Value::from_array((shape, attention_mask_vec))
.map_err(|e| Error::System(format!("ORT value error: {}", e)))?;

// Run inference
// Note: This is non-KV-cached version.
let outputs = {
let ins = inputs![
"input_ids" => input_ids_val,
"attention_mask" => attention_mask_val,
];

let res = session.run(ins);
match res {
Ok(out) => out,
Err(e) => {
warn!("ORT inference with attention_mask failed, retrying with only input_ids: {}", e);
let shape = vec![1, seq_len];
let input_ids_val_retry = ort::value::Value::from_array((shape, input_ids_vec.clone()))
.map_err(|e| Error::System(format!("ORT value error: {}", e)))?;

let ins_retry = inputs![
"input_ids" => input_ids_val_retry,
];

session.run(ins_retry).map_err(|e2| Error::System(format!("ORT inference failed: {}", e2)))?
}
}
};

// Get logits
let logits_value = outputs.get("logits")
.or_else(|| outputs.get("output")) // Some models name it 'output'
.ok_or_else(|| Error::System("No logits found in model output".to_string()))?;

let (shape, logits_data) = logits_value.try_extract_tensor::<f32>()
.map_err(|e| Error::System(format!("Failed to extract logits: {}", e)))?;

// Logits shape: [batch, seq, vocab]
if shape.len() < 3 {
return Err(Error::System(format!("Unexpected logits shape: {:?}", shape)));
}

let vocab_size = shape[2] as usize;
let last_token_logits = &logits_data[(seq_len - 1) * vocab_size .. seq_len * vocab_size];

// Greedy selection for now
let next_token = last_token_logits
.iter()
.enumerate()
.max_by(|(_, a), (_, b)| a.partial_cmp(b).unwrap())
.map(|(i, _)| i as i64)
.unwrap();

// Check for EOS
if next_token == 2 || next_token == 32000 || next_token == 32007 {
break;
}

generated_tokens.push(next_token as u32);
input_ids_vec.push(next_token);

if generated_tokens.len() >= max_tokens {
break;
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current token generation loop is highly inefficient because it re-processes the entire input sequence for each new token. This leads to O(n^2) complexity where n is the sequence length, causing very slow generation for longer outputs. The comment on line 90 correctly notes that this is a non-KV-cached version.

To significantly improve performance, you should implement KV caching. The typical workflow is:

  1. On the first run, pass the full prompt and get back the logits and the initial past_key_values (KV cache).
  2. For subsequent tokens, pass only the new token ID and the past_key_values from the previous step. The model will output logits for the next token and the updated past_key_values.
  3. Repeat until an EOS token is generated or max_tokens is reached.

The ort crate supports this by allowing you to pass past_key_values as inputs and receive present_key_values as outputs.

.enumerate()
.max_by(|(_, a), (_, b)| a.partial_cmp(b).unwrap())
.map(|(i, _)| i as i64)
.unwrap();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Using .unwrap() here is unsafe and can cause a panic if last_token_logits is empty, which would crash the current thread. While unlikely, robust code should handle this edge case. Using ok_or_else provides a safe alternative that converts the Option to a Result and allows for graceful error handling.

Suggested change
.unwrap();
.ok_or_else(|| Error::System("Failed to find token with max logit".to_string()))?

Comment on lines +53 to +61
let prompt = format!(
"<|system|>\nAnalyze the following text and return ONLY a single floating point number between 0.0 and 1.0 representing the entropy/chaos level (0.0 is perfect order, 1.0 is total chaos/suffering).<|end|>\n<|user|>\n{}<|end|>\n<|assistant|>",
text
);

let response = self.llm.generate(&prompt, 10).await?;

let score: f32 = response.trim().parse().unwrap_or(0.5);
Ok(score.clamp(0.0, 1.0))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

The analyze function is vulnerable to second-order prompt injection and flawed security logic. It uses unvalidated LLM output (text) directly in a prompt for another LLM call. Furthermore, it relies on the LLM to return a single floating-point number and uses a fragile parsing logic (parse().unwrap_or(0.5)) to make security-sensitive decisions (entropy analysis for corrective guidance). An attacker can manipulate the first LLM's output to influence the second LLM's analysis and bypass security checks.

Suggested change
let prompt = format!(
"<|system|>\nAnalyze the following text and return ONLY a single floating point number between 0.0 and 1.0 representing the entropy/chaos level (0.0 is perfect order, 1.0 is total chaos/suffering).<|end|>\n<|user|>\n{}<|end|>\n<|assistant|>",
text
);
let response = self.llm.generate(&prompt, 10).await?;
let score: f32 = response.trim().parse().unwrap_or(0.5);
Ok(score.clamp(0.0, 1.0))
let prompt = format!(
"<|system|>\nAnalyze the following text and return ONLY a single floating point number between 0.0 and 1.0 representing the entropy/chaos level (0.0 is perfect order, 1.0 is total chaos/suffering).<|end|>\n<|user|>\n{}<|end|>\n<|assistant|>",
text.replace("<|end|>", "")
);

Comment on lines +165 to +179
let phi3_onnx_path = match self.ensure_model(PHI3_ONNX_REPO, PHI3_ONNX_FILE) {
Ok(path) => Some(path),
Err(e) => {
warn!("Failed to download Phi-3 ONNX: {}", e);
None
}
};

let phi3_tokenizer_path = match self.ensure_model(PHI3_ONNX_REPO, PHI3_ONNX_TOKENIZER) {
Ok(path) => Some(path),
Err(e) => {
warn!("Failed to download Phi-3 Tokenizer: {}", e);
None
}
};

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This block for downloading the Phi-3 model and tokenizer repeats a pattern seen earlier in the function for other models. This code duplication makes the function harder to maintain. To improve this, you could extract the logic for downloading an optional model into a private helper function.

For example, you could add a helper method to ModelManager:

fn download_optional_model(&self, repo: &str, file: &str, model_name: &str) -> Option<PathBuf> {
    match self.ensure_model(repo, file) {
        Ok(path) => Some(path),
        Err(e) => {
            warn!("Failed to download {}: {}", model_name, e);
            None
        }
    }
}

This would make the call sites much cleaner and reduce redundancy.

.unwrap();

// Check for EOS
if next_token == 2 || next_token == 32000 || next_token == 32007 {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The hardcoded token IDs (2, 32000, 32007) are magic numbers, which make the code difficult to understand and maintain. It would be much clearer to define these as named constants with comments explaining what they represent (e.g., <|endoftext|>, <|end|>).

You should define these constants at the top of the file:

// EOS tokens for Phi-3
const PHI3_EOS_ENDOFTEXT: i64 = 2;
const PHI3_EOS_END: i64 = 32000;
const PHI3_EOS_STOP: i64 = 32007;

Then use them in the check.

Suggested change
if next_token == 2 || next_token == 32000 || next_token == 32007 {
if [PHI3_EOS_ENDOFTEXT, PHI3_EOS_END, PHI3_EOS_STOP].contains(&next_token) {

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (1)
crates/synapse-infra/src/adapters/model_manager.rs (1)

164-179: Consider lazy-gating Phi-3 downloads in ensure_models_exist.

This path now downloads ORT assets for every consumer, including non-ORT flows, which can significantly increase first-run cost.

♻️ Refactor direction
-pub fn ensure_models_exist(&self) -> Result<ModelPaths> {
+pub fn ensure_models_exist(&self, include_phi3: bool) -> Result<ModelPaths> {
     info!("Checking for required models...");
     // ...
-    let phi3_onnx_path = match self.ensure_model(PHI3_ONNX_REPO, PHI3_ONNX_FILE) { ... };
-    let phi3_tokenizer_path = match self.ensure_model(PHI3_ONNX_REPO, PHI3_ONNX_TOKENIZER) { ... };
+    let (phi3_onnx_path, phi3_tokenizer_path) = if include_phi3 {
+        (
+            self.ensure_model(PHI3_ONNX_REPO, PHI3_ONNX_FILE).ok(),
+            self.ensure_model(PHI3_ONNX_REPO, PHI3_ONNX_TOKENIZER).ok(),
+        )
+    } else {
+        (None, None)
+    };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/synapse-infra/src/adapters/model_manager.rs` around lines 164 - 179,
In ensure_models_exist, avoid unconditionally calling ensure_model for
PHI3_ONNX_REPO/PHI3_ONNX_FILE and PHI3_ONNX_TOKENIZER; instead detect whether an
ORT/ONNX consumer is actually required (e.g., via the runtime/config flag or the
flow type that requires ORT) and only call ensure_model for PHI3_* when that
condition is true. Modify the block that currently sets phi3_onnx_path and
phi3_tokenizer_path to first check the ORT-enabled predicate (or a helper like
is_ort_required()) and only invoke self.ensure_model(...) when true; otherwise
set the paths to None without attempting downloads. Ensure references to
PHI3_ONNX_REPO, PHI3_ONNX_FILE, PHI3_ONNX_TOKENIZER and the
ensure_model/ensure_models_exist functions are updated accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/synapse-cli/src/commands.rs`:
- Around line 136-139: The code initializes candle_llm by calling
CandleAdapter::new with model_path.to_str().unwrap() and
tok_path.to_str().unwrap(), which can panic on non-UTF8 paths; update the
conversion to use model_path.to_string_lossy().into_owned() and
tok_path.to_string_lossy().into_owned() (or equivalent lossily-converted
strings) before passing them into CandleAdapter::new to safely handle OS paths
that are not valid UTF-8.

In `@crates/synapse-cognition/src/thought_loop.rs`:
- Around line 65-69: The simulation gate currently only prefixes thought.content
when simulated_entropy > tensor.metadata().entropy_score and is_suffering, which
still allows higher-entropy outputs to pass; update the logic in the block that
checks simulated_entropy, tensor.metadata().entropy_score, and is_suffering so
that when simulated_entropy exceeds the stored entropy score you either
reject/discard the generated thought (e.g., return an Err or skip adding it), or
force a regeneration/stronger-guidance path instead of returning the
higher-entropy object; adjust any callers that expect a Thought return (or set a
validity flag on thought) so higher-entropy outputs are not accepted—look for
the if branch using simulated_entropy, tensor.metadata().entropy_score,
is_suffering and the mutation of thought.content to implement this gate
properly.

In `@crates/synapse-infra/src/adapters/ort_adapter.rs`:
- Around line 60-66: The generate_with_params implementation
(generate_with_params) currently ignores the sampling controls temperature and
top_p and always performs greedy decoding; update the decoding logic inside
generate_with_params (the token selection loop / decoder section) to apply
temperature scaling to logits and perform probabilistic sampling with nucleus
(top-p) filtering instead of greedy argmax: scale logits by 1.0/temperature,
convert to probabilities via softmax, apply top-p cumulative cutoff retaining
tokens until cumulative prob >= top_p, renormalize, and draw a sample using a
RNG; ensure temperature and top_p parameters are actually passed into the
decoder path and used for each token selection so callers can control sampling
behavior.
- Around line 132-137: The greedy token selection currently calls
partial_cmp(...).unwrap() on last_token_logits which will panic if any logits
are NaN; change the comparison to use total_cmp (e.g., replace partial_cmp with
total_cmp) when computing next_token from last_token_logits to ensure a total
ordering that handles NaNs, and remove the unwrap or handle the Option safely as
part of that change; also add a short comment near the parameter declarations
(_temperature and _top_p) or in the greedy selection code indicating they are
intentionally unused for this greedy implementation so future sampling work can
refactor or remove them.

In `@crates/synapse-infra/src/adapters/ort_cognitive_adapter.rs`:
- Line 40: Replace the debug that logs the full model output (the
debug!("Generated thought response: {}", response); line referencing the
response variable) with metadata-only logging: do not include the raw response
text, instead log safe fields such as response length, a stable identifier or
hash, whether it was truncated, and any non-sensitive status flags; if needed
add a helper like sanitize_generated_response or response_metadata() in
ort_cognitive_adapter.rs that extracts those safe fields and log that metadata
instead of the full content.

---

Nitpick comments:
In `@crates/synapse-infra/src/adapters/model_manager.rs`:
- Around line 164-179: In ensure_models_exist, avoid unconditionally calling
ensure_model for PHI3_ONNX_REPO/PHI3_ONNX_FILE and PHI3_ONNX_TOKENIZER; instead
detect whether an ORT/ONNX consumer is actually required (e.g., via the
runtime/config flag or the flow type that requires ORT) and only call
ensure_model for PHI3_* when that condition is true. Modify the block that
currently sets phi3_onnx_path and phi3_tokenizer_path to first check the
ORT-enabled predicate (or a helper like is_ort_required()) and only invoke
self.ensure_model(...) when true; otherwise set the paths to None without
attempting downloads. Ensure references to PHI3_ONNX_REPO, PHI3_ONNX_FILE,
PHI3_ONNX_TOKENIZER and the ensure_model/ensure_models_exist functions are
updated accordingly.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d60204b and 871dd5b.

📒 Files selected for processing (8)
  • crates/synapse-cli/src/commands.rs
  • crates/synapse-cli/src/main.rs
  • crates/synapse-cognition/src/lib.rs
  • crates/synapse-cognition/src/thought_loop.rs
  • crates/synapse-infra/src/adapters/mod.rs
  • crates/synapse-infra/src/adapters/model_manager.rs
  • crates/synapse-infra/src/adapters/ort_adapter.rs
  • crates/synapse-infra/src/adapters/ort_cognitive_adapter.rs

Comment on lines +136 to +139
let candle_llm = Arc::new(tokio::sync::Mutex::new(synapse_infra::adapters::candle_adapter::CandleAdapter::new(
model_path.to_str().unwrap().to_string(),
Some(tok_path.to_str().unwrap().to_string())
)?));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify the exact location and context of the problematic code
cat -n crates/synapse-cli/src/commands.rs | sed -n '130,145p'

Repository: iberi22/synapse-protocol

Length of output: 973


🏁 Script executed:

#!/bin/bash
# Search for all to_str().unwrap() patterns in the file
rg -n 'to_str\(\)\.unwrap\(\)' crates/synapse-cli/src/commands.rs

Repository: iberi22/synapse-protocol

Length of output: 186


🏁 Script executed:

#!/bin/bash
# Check the broader context of the thought function to understand path origins
sed -n '100,145p' crates/synapse-cli/src/commands.rs | cat -n

Repository: iberi22/synapse-protocol

Length of output: 2441


Avoid panic on non-UTF8 filesystem paths in Candle adapter initialization.

Lines 137–138 use to_str().unwrap() which will panic if the path contains invalid UTF-8 sequences. Replace with to_string_lossy().into_owned() to safely convert OS paths that may not be UTF-8:

Suggested fix
         let candle_llm = Arc::new(tokio::sync::Mutex::new(synapse_infra::adapters::candle_adapter::CandleAdapter::new(
-            model_path.to_str().unwrap().to_string(),
-            Some(tok_path.to_str().unwrap().to_string())
+            model_path.to_string_lossy().into_owned(),
+            Some(tok_path.to_string_lossy().into_owned())
         )?));
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
let candle_llm = Arc::new(tokio::sync::Mutex::new(synapse_infra::adapters::candle_adapter::CandleAdapter::new(
model_path.to_str().unwrap().to_string(),
Some(tok_path.to_str().unwrap().to_string())
)?));
let candle_llm = Arc::new(tokio::sync::Mutex::new(synapse_infra::adapters::candle_adapter::CandleAdapter::new(
model_path.to_string_lossy().into_owned(),
Some(tok_path.to_string_lossy().into_owned())
)?));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/synapse-cli/src/commands.rs` around lines 136 - 139, The code
initializes candle_llm by calling CandleAdapter::new with
model_path.to_str().unwrap() and tok_path.to_str().unwrap(), which can panic on
non-UTF8 paths; update the conversion to use
model_path.to_string_lossy().into_owned() and
tok_path.to_string_lossy().into_owned() (or equivalent lossily-converted
strings) before passing them into CandleAdapter::new to safely handle OS paths
that are not valid UTF-8.

Comment on lines +65 to +69
if simulated_entropy > tensor.metadata().entropy_score && is_suffering {
warn!("Simulation detected entropy increase during suffering! Applying corrective guidance.");
// In a real implementation, we might re-run generation with stronger guidance.
thought.content = format!("(Guided) {}", thought.content);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Simulation gate does not actually block entropy regressions.

When entropy rises, the code still returns the thought (only content is prefixed), so higher-entropy output can pass validation.

🧭 Suggested correction pattern
-        if simulated_entropy > tensor.metadata().entropy_score && is_suffering {
-            warn!("Simulation detected entropy increase during suffering! Applying corrective guidance.");
-            // In a real implementation, we might re-run generation with stronger guidance.
-            thought.content = format!("(Guided) {}", thought.content);
-        }
-
-        thought.entropy_reduction = (tensor.metadata().entropy_score - simulated_entropy).max(0.0);
+        if simulated_entropy > tensor.metadata().entropy_score {
+            warn!("Simulation detected entropy increase; rejecting output for regeneration.");
+            return Err(synapse_core::error::Error::System(
+                "Thought rejected: simulation increased entropy".to_string(),
+            ));
+        }
+
+        thought.entropy_reduction = tensor.metadata().entropy_score - simulated_entropy;

Also applies to: 71-71

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/synapse-cognition/src/thought_loop.rs` around lines 65 - 69, The
simulation gate currently only prefixes thought.content when simulated_entropy >
tensor.metadata().entropy_score and is_suffering, which still allows
higher-entropy outputs to pass; update the logic in the block that checks
simulated_entropy, tensor.metadata().entropy_score, and is_suffering so that
when simulated_entropy exceeds the stored entropy score you either
reject/discard the generated thought (e.g., return an Err or skip adding it), or
force a regeneration/stronger-guidance path instead of returning the
higher-entropy object; adjust any callers that expect a Thought return (or set a
validity flag on thought) so higher-entropy outputs are not accepted—look for
the if branch using simulated_entropy, tensor.metadata().entropy_score,
is_suffering and the mutation of thought.content to implement this gate
properly.

Comment on lines +60 to +66
async fn generate_with_params(
&self,
_prompt: &str,
max_tokens: usize,
_temperature: f32,
_top_p: f32,
) -> Result<String, Error> {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cat -n crates/synapse-infra/src/adapters/ort_adapter.rs | head -100

Repository: iberi22/synapse-protocol

Length of output: 4411


🏁 Script executed:

cat -n crates/synapse-infra/src/adapters/ort_adapter.rs | sed -n '60,150p'

Repository: iberi22/synapse-protocol

Length of output: 4385


🏁 Script executed:

cat -n crates/synapse-infra/src/adapters/ort_adapter.rs | sed -n '150,180p'

Repository: iberi22/synapse-protocol

Length of output: 309


generate_with_params silently ignores sampling parameters, violating the API contract.

The temperature and top_p parameters are accepted but never used. Decoding is hardcoded to greedy selection (lines 131–137), with no sampling or temperature scaling applied. Callers cannot influence generation behavior despite the API accepting these controls.

🛠️ Proposed minimum-safe behavior
async fn generate(&self, prompt: &str, max_tokens: usize) -> Result<String, Error> {
-   self.generate_with_params(prompt, max_tokens, 0.7, 0.9).await
+   self.generate_with_params(prompt, max_tokens, 1.0, 1.0).await
}

async fn generate_with_params(
    &self,
-   _prompt: &str,
+   prompt: &str,
    max_tokens: usize,
-   _temperature: f32,
-   _top_p: f32,
+   temperature: f32,
+   top_p: f32,
) -> Result<String, Error> {
+   if (temperature - 1.0).abs() > f32::EPSILON || (top_p - 1.0).abs() > f32::EPSILON {
+       return Err(Error::System(
+           "temperature/top_p sampling is not yet implemented in OrtAdapter".to_string(),
+       ));
+   }
    debug!("Generating with ORT adapter for prompt...");

-   let encoding = self.tokenizer.encode(_prompt, true)
+   let encoding = self.tokenizer.encode(prompt, true)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/synapse-infra/src/adapters/ort_adapter.rs` around lines 60 - 66, The
generate_with_params implementation (generate_with_params) currently ignores the
sampling controls temperature and top_p and always performs greedy decoding;
update the decoding logic inside generate_with_params (the token selection loop
/ decoder section) to apply temperature scaling to logits and perform
probabilistic sampling with nucleus (top-p) filtering instead of greedy argmax:
scale logits by 1.0/temperature, convert to probabilities via softmax, apply
top-p cumulative cutoff retaining tokens until cumulative prob >= top_p,
renormalize, and draw a sample using a RNG; ensure temperature and top_p
parameters are actually passed into the decoder path and used for each token
selection so callers can control sampling behavior.

Comment on lines +132 to +137
let next_token = last_token_logits
.iter()
.enumerate()
.max_by(|(_, a), (_, b)| a.partial_cmp(b).unwrap())
.map(|(i, _)| i as i64)
.unwrap();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cat -n crates/synapse-infra/src/adapters/ort_adapter.rs | head -150 | tail -40

Repository: iberi22/synapse-protocol

Length of output: 1795


🏁 Script executed:

wc -l crates/synapse-infra/src/adapters/ort_adapter.rs

Repository: iberi22/synapse-protocol

Length of output: 120


🏁 Script executed:

cat -n crates/synapse-infra/src/adapters/ort_adapter.rs | sed -n '100,180p'

Repository: iberi22/synapse-protocol

Length of output: 2759


🏁 Script executed:

rg -n "generate_with_params|_temperature|_top_p" crates/synapse-infra/src/adapters/ort_adapter.rs -A 5 -B 5

Repository: iberi22/synapse-protocol

Length of output: 722


🌐 Web query:

Rust f32 partial_cmp vs total_cmp NaN handling semantics

💡 Result:

f32::partial_cmp (from PartialOrd)

  • Returns Option<Ordering> because f32 is not totally ordered.
  • If either operand is NaN, the comparison is undefined and partial_cmp returns None (and the operators <, <=, >, >= all yield false in NaN cases). [1]
  • Consequently you can’t use partial_cmp to implement a correct Ord for raw floats, and things like sorting with it typically require handling the None case explicitly (e.g., via unwrap_or, filter, custom NaN policy, etc.). [1]

f32::total_cmp

  • Always returns an Ordering (never None) by using IEEE-754’s totalOrder predicate. [2]
  • It provides a deterministic total ordering over all bit patterns, including NaNs, and it also distinguishes -0.0 from +0.0 (even though PartialEq/PartialOrd treat them as equal / not ordered in some respects). [2]
  • NaNs are ordered relative to numbers based on their sign bit and NaN “kind” (quiet vs signaling). The doc’s sequence is:
    negative quiet NaN < negative signaling NaN < … < -0.0 < +0.0 < … < positive signaling NaN < positive quiet NaN. [2]

Practical rule

  • Use partial_cmp when you want IEEE “normal” float comparisons and you’re prepared to handle “not comparable” (None) when NaNs may appear. [1]
  • Use total_cmp when you need a total order (e.g., for sorting keys, BTreeMap-like ordering) and you want NaNs placed in a consistent, specified position. [2]

Sources: [PartialOrd docs] [1], [f32::total_cmp docs] [2].


Token selection panics on NaN logits.

The code at line 135 uses partial_cmp(...).unwrap() which returns None for NaN comparisons and will panic when unwrapped. Use total_cmp() instead, which provides total ordering over all IEEE-754 bit patterns including NaNs.

Additionally, _temperature and _top_p parameters at line 64-65 are intentionally unused for this greedy implementation—consider whether this should be documented or refactored once sampling strategies are implemented.

🧯 Proposed fix
-            let next_token = last_token_logits
-                .iter()
-                .enumerate()
-                .max_by(|(_, a), (_, b)| a.partial_cmp(b).unwrap())
-                .map(|(i, _)| i as i64)
-                .unwrap();
+            let next_token = last_token_logits
+                .iter()
+                .enumerate()
+                .max_by(|(_, a), (_, b)| a.total_cmp(b))
+                .map(|(i, _)| i as i64)
+                .ok_or_else(|| Error::System("Empty logits for next-token selection".to_string()))?;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
let next_token = last_token_logits
.iter()
.enumerate()
.max_by(|(_, a), (_, b)| a.partial_cmp(b).unwrap())
.map(|(i, _)| i as i64)
.unwrap();
let next_token = last_token_logits
.iter()
.enumerate()
.max_by(|(_, a), (_, b)| a.total_cmp(b))
.map(|(i, _)| i as i64)
.ok_or_else(|| Error::System("Empty logits for next-token selection".to_string()))?;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/synapse-infra/src/adapters/ort_adapter.rs` around lines 132 - 137, The
greedy token selection currently calls partial_cmp(...).unwrap() on
last_token_logits which will panic if any logits are NaN; change the comparison
to use total_cmp (e.g., replace partial_cmp with total_cmp) when computing
next_token from last_token_logits to ensure a total ordering that handles NaNs,
and remove the unwrap or handle the Option safely as part of that change; also
add a short comment near the parameter declarations (_temperature and _top_p) or
in the greedy selection code indicating they are intentionally unused for this
greedy implementation so future sampling work can refactor or remove them.


let response = self.llm.generate(&full_prompt, 512).await?;

debug!("Generated thought response: {}", response);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Do not log full generated responses.

Logging raw model output increases risk of leaking sensitive content. Prefer metadata-only logging.

🔒 Proposed fix
-        debug!("Generated thought response: {}", response);
+        debug!("Generated thought response length: {} chars", response.chars().count());
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
debug!("Generated thought response: {}", response);
debug!("Generated thought response length: {} chars", response.chars().count());
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/synapse-infra/src/adapters/ort_cognitive_adapter.rs` at line 40,
Replace the debug that logs the full model output (the debug!("Generated thought
response: {}", response); line referencing the response variable) with
metadata-only logging: do not include the raw response text, instead log safe
fields such as response length, a stable identifier or hash, whether it was
truncated, and any non-sensitive status flags; if needed add a helper like
sanitize_generated_response or response_metadata() in ort_cognitive_adapter.rs
that extracts those safe fields and log that metadata instead of the full
content.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement Cognitive Layer (Phi-3/Gemma Integration)

1 participant