Feature/fp32 support for metal #9
Conversation
Critical fixes:
- codegraph/sync.py: Batch FalkorDB deletes to avoid N+1 query pattern
- benchmark/behavior.py: Use itertools.accumulate instead of O(n²) string concat
- ai/tools/deeplake/tool.py: Replace exception-based control flow with dict.get()

High priority fixes:
- instrumentation/run_logger.py: Document file handle lifecycle
- instrumentation/hdf5_store.py: Add context manager protocol
- tui/input_bar.py: Add MAX_HISTORY_SIZE (1000) with LRU eviction
- ai/process/__init__.py: Add MAX_HISTORY_SIZE (10000) with LRU eviction
- core/event_bus.py: Add max_queue_size (100000) to prevent unbounded growth

Medium priority optimizations:
- benchmark/artifacts.py: Use fields() + getattr() instead of repeated asdict()
- benchmark/artifacts.py: Use str.translate() for LaTeX escaping
- layer/memory_block/ngram_cache.py: Extract tensor creation from loop body
- ai/tools/deeplake/tool.py: Add _cleanup_deeplake_tool() for resource cleanup
- ai/tools/codegraph/tool.py: Use list comprehension for row conversion
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
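As a rough illustration of two of the patterns this commit describes (itertools.accumulate for running values, dict.get() for lookups), here is a minimal sketch; the function names and data are illustrative only, not the repo's benchmark/behavior.py or deeplake tool code.

```python
from itertools import accumulate

# Quadratic pattern: recomputing each running total from scratch.
def running_totals_slow(xs: list[float]) -> list[float]:
    return [sum(xs[: i + 1]) for i in range(len(xs))]  # O(n^2)

# Linear pattern: itertools.accumulate reuses the previous partial result.
def running_totals_fast(xs: list[float]) -> list[float]:
    return list(accumulate(xs))  # O(n)

# Exception-driven lookup replaced by dict.get() with a default.
def count_for(counts: dict[str, int], key: str) -> int:
    return counts.get(key, 0)

assert running_totals_slow([1, 2, 3]) == running_totals_fast([1, 2, 3]) == [1, 3, 6]
```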
…543tv8g3aojo-7jbM7 Fix performance anti-patterns across multiple modules
Major changes to the AI package architecture:
1. Development Workflow (OpenHands SDK):
- Created new development process using OpenHands SDK for contained
code modifications (ai/process/development/)
- OpenHandsWorkspace wraps the SDK for git operations and code editing
- DevelopmentProcess coordinates feature analysis, implementation, and review
2. Manifest Workflow:
- Created manifest process for AI-driven experiment management
(ai/process/manifest/)
- ManifestBuilder for constructing valid experiment configurations
- ResultsCollector for gathering and summarizing experiment outputs
- ManifestProcess enables propose → run → analyze research loops
3. Structured Outputs (output_schema):
- Added output_schema to verifier.yml for verdict responses
- Created specialized personas with output_schema:
- ideator_* (architect, catalyst, ml) for structured idea proposals
- file_selector for file selection
- feature_analyst for feature analysis
- code_reviewer for code review
- experiment_proposer for experiment proposals
- results_analyzer for results analysis
- Updated platform_improve/process.py to use Pydantic models
instead of manual JSON parsing
4. Process Registry:
- Updated experiment/runner.py to register development and manifest
processes alongside brainstorm
- Removed platform_improve from registry (replaced by development)
The three main workflows are now:
1. General discussion with persistence (existing brainstorm)
2. Development workflow using OpenHands SDK
3. Manifest workflow for building/running/collecting experiments
- Add openhands-sdk and openhands-tools to pyproject.toml dependencies
- Add new packages to setuptools packages list
- Add docker-compose services for new personas:
  - feature-analyst
  - code-reviewer
  - experiment-proposer
  - results-analyzer
- Update root agent's sub_agents list to include new personas
- Update root agent instructions to document new expert agents

The system is now fully self-contained - just start the TUI and talk to the root agent.
- Fix _run_task stub to extract output from OpenHands conversation history
- Add shlex.quote() for shell command safety to prevent injection
- Use secure tempfile.TemporaryDirectory instead of hardcoded /tmp
- Use exit codes for test success detection instead of string matching
- Catch specific RuntimeError exceptions instead of broad Exception
- Add platform_improve back to process map in runner.py
- Add DevelopmentProcessConfig and ManifestProcessConfig to agents.py
- Update DevelopmentProcess and ManifestProcess to use proper config types
- Standardize files_to_touch to array type in ideator personas and models
- Clean up temporary directory on workspace cleanup
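A small sketch of the safety patterns this commit lists (shlex.quote for shell arguments, tempfile.TemporaryDirectory instead of a fixed /tmp path, exit-code-based success checks); the helper below is hypothetical, not the actual OpenHands workspace code.

```python
import shlex
import subprocess
import tempfile
from pathlib import Path

def run_tests_in_scratch(repo_url: str, test_cmd: list[str]) -> bool:
    """Clone into a private temp dir and report success via the exit code."""
    with tempfile.TemporaryDirectory() as scratch:  # auto-cleaned, not a shared /tmp path
        workdir = Path(scratch) / "repo"
        # shlex.quote protects against injection when a shell string is unavoidable.
        clone_cmd = f"git clone --depth 1 {shlex.quote(repo_url)} {shlex.quote(str(workdir))}"
        subprocess.run(clone_cmd, shell=True, check=True)
        # Prefer argument lists over shell strings, and exit codes over string matching.
        result = subprocess.run(test_cmd, cwd=workdir)
        return result.returncode == 0
```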
Review AI package for machine learning assistance
Caution: Review failed. The pull request is closed.

Summary by CodeRabbit

📝 Walkthrough
Introduces Llama 3.2 checkpoints, migrates event serialization from JSON to Cap'n Proto, adds memory autotuning with ResonantRouter, implements Constructive Compression Learning (CCL), adds Vast.ai orchestration and PyTorch Lightning engine support, and extends fp32 support across Metal optimizers. Removes best-effort terminology throughout the codebase.

Changes
Sequence Diagram(s)

sequenceDiagram
participant Trainer as Trainer (standard.py)
participant Memory as MemoryBlockMemory
participant Tuner as UniversalMemoryTuner
participant Router as ResonantRouter
participant Telemetry as MemoryHealthTelemetry
Trainer->>Memory: forward(batch, ctx)
activate Memory
Memory->>Router: route(tag, collect_aux=True)
activate Router
Router->>Router: Iterative dynamics<br/>(steps, coupling, damping)
Router-->>Memory: Routing{logits, aux}
deactivate Router
Memory->>Memory: collect_health_telemetry()
activate Memory
Memory->>Telemetry: Build metrics<br/>(util, resonant, vsa)
Telemetry-->>Memory: MemoryHealthTelemetry
deactivate Memory
Memory->>Tuner: update(telemetry)
activate Tuner
Tuner->>Tuner: Update EMA signals<br/>Compute objective
Tuner->>Tuner: Explore parameters<br/>(if adaptive & post-warmup)
Tuner-->>Memory: metrics{coupling_mult, damping_mult, ...}
deactivate Tuner
Memory-->>Trainer: routing_output + telemetry
deactivate Memory
Trainer->>Trainer: Log telemetry & viz
sequenceDiagram
participant Runner as experiment/runner.py
participant VastAI as VastAIClient
participant Lightning as LightningEngine
participant Trainer as Trainer (standard/ccl)
participant Dataset as Dataset
Runner->>Runner: check_target_readiness()
alt compute.type == "vast_ai"
Runner->>VastAI: VastAIClient(api_key)
Runner->>VastAI: run_lifecycle(config)
activate VastAI
VastAI->>VastAI: find_best_offer()
VastAI->>VastAI: provision_instance()
VastAI->>VastAI: wait_for_ssh()
VastAI-->>Runner: ssh_string
deactivate VastAI
end
alt backend == "lightning"
Runner->>Lightning: LightningEngine()
Runner->>Lightning: run_experiment(manifest, target)
activate Lightning
Lightning->>Dataset: build_dataset()
Lightning->>Trainer: wrap in CarambaLightningModule
Lightning->>Lightning: Trainer.fit(model, dataloader)
Lightning-->>Runner: results
deactivate Lightning
else backend == "torch"
Runner->>Trainer: TorchEngine.run_experiment()
end
sequenceDiagram
participant EventBus as core/event_bus.py
participant TaskQueue as core/task_queue.py
participant Codec as CapnpEventCodec
participant Encoder as CapnpEventEncoder
EventBus->>EventBus: publish(event)
alt task_queue configured
EventBus->>EventBus: Construct Task<br/>(id, context_id, metadata)
EventBus->>TaskQueue: push(task)
else fallback
EventBus->>EventBus: publish_sync()
end
Note over EventBus,Encoder: Event Encoding Path
EventBus->>Encoder: encode(EventEnvelope)
activate Encoder
Encoder->>Encoder: Load Cap'n Proto schema
Encoder->>Encoder: Build message<br/>(fields + payload JSON)
Encoder->>Codec: Serialize to bytes
Encoder->>Encoder: Convert to uint8 tensor
Encoder-->>EventBus: Tensor(int64)
deactivate Encoder
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
📜 Recent review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (2)
📒 Files selected for processing (117)
Summary of Changes

Hello @TheApeMachine, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request fundamentally transforms the platform's architecture by integrating a sophisticated multi-agent system, enabling complex task orchestration and collaborative AI workflows. Concurrently, it delivers substantial performance gains on Apple Silicon hardware through enhanced FP32 support for Metal kernels and introduces advanced, self-tuning memory mechanisms for neural networks. These improvements are complemented by an expanded suite of specialized MCP tools and optimized Docker deployments, collectively boosting the platform's scalability, efficiency, and agent-driven capabilities for ML experimentation.

Highlights
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini: You can request assistance from Gemini at any point by creating a comment using either
Customization: To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a

Limitations & Feedback: Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Code Review
This pull request introduces a major overhaul and expansion of the AI agent system, transitioning to a more robust and scalable architecture based on Google's Agent Development Kit (ADK) and the Agent-to-Agent (A2A) protocol.

Key changes include new comprehensive CLAUDE.md documentation, a refactored ai module with explicit __all__ exports (though noted by the reviewer as a potential maintenance burden), and the introduction of Agent, AgentFactory, RootAgent, and LeadAgent classes for hierarchical agent orchestration. New components like ConnectionManager, PendingTaskManager, ADKAgentExecutor, StreamingExecutor, AgentServer, DatabaseSessionService, TaskQueue, and push notification mechanisms (InMemoryPushNotificationConfigStore, HttpPushNotificationSender) were added to support asynchronous, persistent, and distributed agent interactions. The .gitignore file was updated to include ADK.txt, with a reviewer suggestion to also ignore .vscode/.

Several MCP tools (codegraph, deeplake, docling, filesystem, projectboard, webcrawl) were updated to use streamable-http transport and include TransportSecuritySettings, with performance optimizations noted by the reviewer in codegraph (batch deletes) and deeplake (efficient row data extraction). Additionally, new persona configurations (data_scientist, product_owner, researcher, tester, project_manager) were added, and existing ones (architect, catalyst, developer, mathematician, ml_expert, reviewer, verifier, writer, root) were updated to reflect the new A2A structure, including explicit URLs and model references.

The ExperimentRunner was updated to support remote compute provisioning via Vast.ai, and memory block layers gained a 'resonant' router option with adaptive auto-tuning, including new telemetry and tuner components. Metal kernels for adamw, layernorm, and rope were updated to support float32 on MPS, and Triton kernels for DBA attention were refined with explicit type casting for masks.
I am having trouble creating individual review comments; my feedback is listed below.
.vscode/c_cpp_properties.json (1-26)
This file contains user-specific absolute paths (e.g., /Users/theapemachine/...). Such IDE configuration files should not be checked into version control as they will cause issues for other developers on different machines. Please add .vscode/ to the .gitignore file to prevent this from being committed in the future.
ai/__init__.py (51-87)
This __all__ list is quite long and appears to be manually maintained. This can become a maintenance burden and a source of ImportError for users doing from caramba.ai import * if it gets out of sync with the actual imports in the module. While not a critical issue, consider if there's a way to automate this or if a more granular import strategy might be better for consumers of this package.
ai/connection.py (352)
The line self._pending_tasks.add(task.id) seems redundant. The PendingTaskManager is already responsible for tracking pending tasks via its register method. The _pending_tasks set on the RemoteAgent class itself doesn't appear to be used anywhere else. This looks like leftover code and could be removed to avoid confusion.
ai/tools/codegraph/tool.py (81-84)
Using a list comprehension here is a good optimization over appending to a list in a loop, especially for potentially large result sets. It's more concise and generally faster.
ai/tools/deeplake/tool.py (167-188)
Refactoring the data extraction from the DeepLake row to use a helper function _rowview_to_str_dict and .get() with defaults is a great improvement. It avoids the performance overhead of repeated try...except blocks inside the loop and makes the code more robust and readable.
codegraph/sync.py (111)
Switching from deleting nodes file-by-file in a loop to a single batch delete operation (_delete_files_batch) is an excellent performance optimization. This avoids the N+1 query problem and will be significantly faster when syncing multiple files.
core/event_codec/__init__.py (1-23)
The switch from JSON/binary codecs to a Cap'n Proto based codec is a significant performance optimization. Using a zero-copy serialization format like Cap'n Proto is an excellent choice for a high-throughput event processing system. This change should dramatically reduce serialization/deserialization overhead.
docker/Dockerfile.persona-a2a (1-131)
The optimization of this Dockerfile using a multi-stage build is a fantastic improvement. By separating dependency installation and copying only the necessary source files for the agent runtime, you've likely reduced the image size and build time by an order of magnitude. This significantly improves the development lifecycle, CI/CD performance, and resource consumption. Excellent work.
Actionable comments posted: 49
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
optimizer/metal/layernorm.py (1)
164-175: Critical: Non-gradient path still hardcodes fp16, breaking fp32 support. Lines 170 and 174 still explicitly cast weight and bias to torch.float16, which is inconsistent with the gradient path (lines 72, 78) that uses x.dtype. This breaks fp32 support for inference scenarios where gradients are not required.

🐛 Proposed fix to use x.dtype consistently

- w2 = weight.to(device=x.device, dtype=torch.float16).contiguous()
+ w2 = weight.to(device=x.device, dtype=x.dtype).contiguous()
  if bias is None:
      return ops.layernorm_weight(x2, w2, float(eps))
- b2 = bias.to(device=x.device, dtype=torch.float16).contiguous()
+ b2 = bias.to(device=x.device, dtype=x.dtype).contiguous()
  return ops.layernorm(x2, w2, b2, float(eps))

optimizer/metal/lion.py (1)
21-38: lion_fp16 name no longer matches behavior; also address Ruff TRY003 if CI enforces it.

Lines 33-34: dtype broadening is correct, but consider adding a clearer alias (e.g., lion_step / lion) or a docstring note that lion_fp16 supports fp16/fp32. Also, if Ruff TRY003 is enforced, suppress or refactor the long inline exception message.

Proposed minimal adjustments

  def lion_fp16(
  @@
  ) -> Tensor:
+     """Run one fused Lion step on Metal (supports fp16/fp32; p/grad/m dtypes must match)."""
  @@
-     if p.dtype not in (torch.float16, torch.float32) or grad.dtype != p.dtype or m.dtype != p.dtype:
-         raise RuntimeError("Metal Lion currently supports fp16/fp32 tensors only (matching)")
+     if p.dtype not in (torch.float16, torch.float32) or grad.dtype != p.dtype or m.dtype != p.dtype:
+         raise RuntimeError("Metal Lion currently supports fp16/fp32 tensors only (matching)")  # noqa: TRY003

optimizer/metal/rope.py (1)
38-52: Dtype propagation is coherent; TRY003 linting will flag inline exception messages.

Lines 44–45 and 66–67 correctly cast cos/sin to x.dtype and align grad_out to the saved dtype, making the fp32 path consistent end-to-end. However, with TRY003 enabled in ruff.toml, the six RuntimeError raises (lines 39, 41, 59, 61, 91, 93) with inline messages will trigger linting warnings. Address by adding # noqa: TRY003 to each raise statement, or by centralizing repeated messages as module-level constants.

optimizer/metal/rmsnorm.py (1)
85-118: Function name rmsnorm_fp16 is now misleading.

The function handles both FP16 and FP32, but the name suggests FP16-only. Consider renaming to rmsnorm_metal or rmsnorm_mps for clarity.

optimizer/metal/ops.mm (1)
2016-2041: PYBIND docstrings are outdated - still claim fp16-only.

The module docstrings reference "fp16" but all operations now support FP32. Consider updating to "(Metal/MPS, fp16/fp32)" for accuracy.

Example fix for a few entries

- m.def("rmsnorm", &rmsnorm, "RMSNorm Forward (Metal/MPS, fp16)");
+ m.def("rmsnorm", &rmsnorm, "RMSNorm Forward (Metal/MPS, fp16/fp32)");
- m.def("rope", &rope, "RoPE Apply (Metal/MPS, fp16)");
+ m.def("rope", &rope, "RoPE Apply (Metal/MPS, fp16/fp32)");
- m.def("lion_step", &lion_step, "Lion step update (Metal/MPS, fp16)");
+ m.def("lion_step", &lion_step, "Lion step update (Metal/MPS, fp16/fp32)");
- m.def("adamw_master_step", &adamw_master_step, "AdamW master step update (Metal/MPS, fp16 params + fp32 state)");
+ m.def("adamw_master_step", &adamw_master_step, "AdamW master step update (Metal/MPS, fp16/fp32 params + fp32 state)");

optimizer/adamw_master.py (1)
109-120: MPS fused path has critical dtype mismatch with Metal kernel.

The Metal kernel at ops.mm:1707 enforces grad.dtype() == p.dtype(), but the fused condition in adamw_master.py allows FP32 parameters (line 111) while requiring FP16 gradients (line 113). This creates an incompatibility: when p.dtype == torch.float32, the kernel expects grad.dtype == torch.float32, but the condition enforces grad.dtype == torch.float16, causing a runtime assertion failure.

Either expand the grad dtype check to allow both FP16 and FP32 to match the parameter dtype, or restrict the parameter dtype to FP16 only to match the gradient requirement.
🤖 Fix all issues with AI agents
In @.vscode/c_cpp_properties.json:
- Around line 1-26: The .vscode/c_cpp_properties.json currently contains
hardcoded absolute include paths (the entries in the "includePath" array
pointing to /Users/theapemachine/...) which breaks portability; update those
specific includePath entries to use workspace-relative variables (e.g.
${workspaceFolder}/...) or env vars (e.g. ${env:PYENV} or
${workspaceFolder}/.venv/...) so all contributors resolve headers locally, and
either add .vscode/c_cpp_properties.json to .gitignore or add a template file
(c_cpp_properties.json.example) for local customization; target the
"includePath" array and the file-level config in .vscode/c_cpp_properties.json
when making the change.
In @artifacts/checkpoints/original/params.json:
- Around line 1-12: Remove the unused "use_scaled_rope" key from params.json and
add a trailing newline at EOF; the RoPE scaling is already handled via the
rope_scaling dict (see config/layer.py) and consumed by layer/rope.py and
attention implementations, so deleting use_scaled_rope avoids confusion and the
newline ensures standard EOF formatting.
In @artifacts/checkpoints/tokenizer_config.json:
- Around line 1-2061: Repo artifacts are bloating the repository because large
tokenizer files (tokenizer.json ~8.7M and tokenizer.model ~2.1M) are committed
under artifacts/checkpoints alongside tokenizer_config.json; remove or offload
them to avoid heavy PRs. Fix by: (1) remove large binary tokenizer files from
the commit and add them to Git LFS (or delete and re-add via git rm --cached) so
tokenizer.json and tokenizer.model are tracked via LFS; (2) update
documentation/README to state that artifacts are fetched at runtime from the
HuggingFace Hub (use AutoTokenizer.from_pretrained or transformers'
from_pretrained for the model repo) and include a fallback local path for
tokenizer_config.json if you must keep a small config; (3) ensure CI/deployment
pulls tokenizer artifacts at build/runtime (reference files tokenizer.json,
tokenizer.model and tokenizer_config.json in artifacts/checkpoints and functions
that load them) so the repo contains only the lightweight config while large
blobs live in LFS or are downloaded on demand.
In @artifacts/checkpoints/USE_POLICY.md:
- Around line 45-52: Replace the redirect/incorrect hrefs in the policy links so
the visible link text points directly to the intended secure URLs: update the
"Reporting issues with the model" link to use the direct HTTPS URL
https://github.com/meta-llama/llama-models/issues instead of the l.workplace.com
redirect, and change the other two links to HTTPS (use
https://developers.facebook.com/llama_output_feedback and
https://facebook.com/whitehat/info) while leaving the LlamaUseReport@meta.com
contact as-is; ensure the markdown link targets (the hrefs) are the direct
secure URLs rather than redirects.
- Around line 1-43: Convert the initial bold title into a proper Markdown
heading (e.g., use a single-level heading like "# Llama 3.2 Acceptable Use
Policy") and ensure a blank line follows it; normalize ordered-list markers so
nested and top-level numbered lists use a consistent prefix style (prefer using
"1." for every ordered item per MD029 or sequential numbers consistently), fix
nested indentation for sublists so they render as distinct levels (use two or
four spaces per indent), remove all trailing spaces and hard tabs (address
MD009), and ensure there is a blank line before/after block-level elements
(lists/headings) to satisfy MD041/MD036/MD029/MD009 checks.
In @config/compute_test.py:
- Around line 31-33: The test constructs VastAIComputeConfig without the
required field gpu_name so the ValidationError will be raised for missing
gpu_name instead of negative max_price_per_hr; update the test to pass a valid
gpu_name (e.g., "A100") when calling VastAIComputeConfig(max_price_per_hr=-1.0,
gpu_name="A100") so the validation failure is due to the negative price as
intended.
- Around line 51-54: Add an explicit runtime type check for target.compute so
static analyzers know its concrete type: after creating target via
ExperimentTargetConfig.model_validate(data), assert isinstance(target.compute,
VastAIComputeConfig) (or use typing.cast) before accessing
target.compute.gpu_name; reference ExperimentTargetConfig, target.compute, and
VastAIComputeConfig to locate where to add this check.
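A hedged sketch of how both test fixes above could look; the import paths and the target payload (vast_ai_target_data) are guesses at the interface, with only the field and class names taken from the comments.

```python
import pytest
from pydantic import ValidationError

from caramba.config.compute import VastAIComputeConfig
from caramba.config.target import ExperimentTargetConfig

def test_negative_price_rejected() -> None:
    # gpu_name supplied, so the only invalid field is the negative price.
    with pytest.raises(ValidationError):
        VastAIComputeConfig(max_price_per_hr=-1.0, gpu_name="A100")

def test_vast_ai_target_exposes_gpu_name(vast_ai_target_data: dict) -> None:
    # vast_ai_target_data stands in for whatever payload the existing test already builds.
    target = ExperimentTargetConfig.model_validate(vast_ai_target_data)
    # Narrow the union so static analysis knows the concrete compute config.
    assert isinstance(target.compute, VastAIComputeConfig)
    assert target.compute.gpu_name == "A100"
```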
In @config/compute.py:
- Line 43: The default Docker image string assigned to image (image: str =
"pytorch/pytorch:2.2.1-cuda12.1-cudnn8-devel") is hardcoded and may become
stale; make this configurable or document it by replacing the literal with a
configurable value (e.g., read from an env var or config setting) and/or add a
comment next to the image variable explaining how to override it and
recommending updating to a current PyTorch+CUDA tag (mentioning current project
uses PyTorch 2.9.1), so future maintainers know to pin or update the image.
- Around line 15-21: LocalComputeConfig currently inherits from BaseModel which
bypasses the project's Config behavior (notably extra="forbid"); change the base
class to the project's Config (imported from caramba.config as Config) so
LocalComputeConfig: class LocalComputeConfig(Config): ... and do the same for
the other compute-related classes in this file (the block spanning lines 23-46)
to ensure YAML typo validation is enforced; update imports to remove BaseModel
and import Config from caramba.config.
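One possible shape for the changes above, assuming the project's Config base (with extra="forbid") lives at caramba.config.Config; the CARAMBA_VAST_IMAGE env var and the field defaults are illustrative choices, not the project's actual settings.

```python
import os

from caramba.config import Config  # project base model with extra="forbid"

class LocalComputeConfig(Config):
    type: str = "local"

class VastAIComputeConfig(Config):
    type: str = "vast_ai"
    gpu_name: str
    max_price_per_hr: float = 1.0
    # Override via CARAMBA_VAST_IMAGE; keep the tag pinned and update it alongside
    # the project's PyTorch version (currently 2.9.1 per the review note).
    image: str = os.environ.get(
        "CARAMBA_VAST_IMAGE", "pytorch/pytorch:2.2.1-cuda12.1-cudnn8-devel"
    )
```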
In @config/defaults.py:
- Around line 42-46: DefaultsCompute defines vast_ai_api_key with an alias
"VAST_AI_API_KEY" but Pydantic v2 won't accept that alias as a keyword when
instantiating the model unless you enable population by alias/name; update the
DefaultsCompute model to include a model_config that enables populate_by_name
(and/or populate_by_alias if you prefer alias-first population) so instantiating
with DefaultsCompute(VAST_AI_API_KEY="...") or loading from dict/YAML using that
alias works as intended.
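A minimal sketch of enabling population by both the field name and the alias with Pydantic v2's model_config; surrounding fields are omitted.

```python
from pydantic import BaseModel, ConfigDict, Field

class DefaultsCompute(BaseModel):
    model_config = ConfigDict(populate_by_name=True)

    # Accepts both DefaultsCompute(vast_ai_api_key=...) and the
    # VAST_AI_API_KEY alias in keyword/dict/YAML input.
    vast_ai_api_key: str | None = Field(default=None, alias="VAST_AI_API_KEY")

assert DefaultsCompute(VAST_AI_API_KEY="k").vast_ai_api_key == "k"
assert DefaultsCompute(vast_ai_api_key="k").vast_ai_api_key == "k"
```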
In @config/layer.py:
- Around line 489-490: The mem_autotune field currently typed as str allows any
value; change its annotation to a Literal of the allowed values (Literal["off",
"monitor", "adaptive"]) and import Literal from typing (or typing_extensions for
older Python) so Pydantic enforces and validates allowed values; keep the
default "off" and, if present, adjust or remove any custom validators that
duplicate this check to rely on Pydantic's validation instead.
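A minimal sketch of the Literal-typed field; the class name here is illustrative, not the actual model in config/layer.py.

```python
from typing import Literal

from pydantic import BaseModel, ValidationError

class MemoryBlockLayerConfig(BaseModel):
    # Pydantic rejects anything outside the three allowed modes.
    mem_autotune: Literal["off", "monitor", "adaptive"] = "off"

assert MemoryBlockLayerConfig().mem_autotune == "off"
try:
    MemoryBlockLayerConfig(mem_autotune="auto")
except ValidationError:
    pass  # typo caught at config-load time instead of deep in training
```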
In @config/presets/mosaic_resonant.yml:
- Around line 26-27: The vars define mem_buckets and mem_hashes but the layer
blocks still use hardcoded values, causing inconsistency; update the layer
configuration entries that currently set mem_buckets and mem_hashes (also the
occurrences around lines 75-77) to reference the variables (${mem_buckets} and
${mem_hashes}) or delete the unused vars block—pick one approach and apply it
consistently across all layer entries so mem_buckets and mem_hashes are not
defined in two places.
In @core/event_codec/capnp_codec_test.py:
- Around line 67-70: Rename the unused loop index `i` to `_` in the for-loop
that iterates over zipped pairs of `events` and `decoded`, and call zip with
`strict=True` so the loop becomes: iterate over zip(events, decoded,
strict=True) to enforce equal lengths; keep the existing assertions that compare
`dec.type`, `dec.sender`, and `dec.payload` against `orig` unchanged.
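The adjusted loop, for reference; events and decoded stand in for the test's actual fixtures.

```python
# Before: for i, (orig, dec) in enumerate(zip(events, decoded)):  # i unused, lengths unchecked
# After: strict=True raises if the decoder drops or duplicates events.
for orig, dec in zip(events, decoded, strict=True):
    assert dec.type == orig.type
    assert dec.sender == orig.sender
    assert dec.payload == orig.payload
```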
In @core/event_codec/capnp_codec.py:
- Around line 20-27: Add return type annotations for the private helper
functions to satisfy the static analyzer: annotate _load_schema() and
_get_schema() with an appropriate return type (use typing.Any if pycapnp types
are unavailable), and add an import for Any from typing. Ensure the function
signatures include the return annotation (e.g., def _load_schema() -> Any:) and
keep behavior unchanged.
- Around line 119-120: The current tensor-to-bytes conversion creates a Python
list and iterates over it (vals =
ids.detach().cpu().to(dtype=torch.int64).tolist(); raw = bytes(...)), which is
slow; change it to a zero-copy path by converting the tensor to uint8 on CPU and
using numpy().tobytes() (e.g. replace the to(dtype=torch.int64).tolist() +
bytes(...) logic with
ids.detach().cpu().to(dtype=torch.uint8).numpy().tobytes()), ensuring you
preserve the intended byte masking and avoid intermediate Python-level
iteration.
- Line 73: The tensor created from the byte buffer is using dtype=torch.long and
should use dtype=torch.uint8 to store bytes efficiently: change the return in
the function that does "return torch.tensor(list(buf), dtype=torch.long)" to use
torch.uint8; also update the corresponding decoder function (the decoder at the
review note) and the encode_padded implementation (encode_padded) to produce and
consume torch.uint8 tensors, and add explicit casting to larger integer types
only where arithmetic expects int64 so behavior doesn’t change.
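For the uint8/zero-copy points above, a small round-trip sketch; the helper names are illustrative and do not mirror the codec's real methods.

```python
import torch

def bytes_to_tensor(buf: bytes) -> torch.Tensor:
    # Store raw serialized bytes as uint8 instead of int64 (8x smaller per element).
    return torch.frombuffer(bytearray(buf), dtype=torch.uint8).clone()

def tensor_to_bytes(ids: torch.Tensor) -> bytes:
    # Avoid tolist() + bytes(...) over a Python list; go through numpy instead.
    return ids.detach().cpu().to(dtype=torch.uint8).numpy().tobytes()

payload = b"capnp-framed-message"
assert tensor_to_bytes(bytes_to_tensor(payload)) == payload
```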
In @examples/lightning_vastai_demo.yml:
- Around line 4-6: Replace the literal placeholder value for VAST_AI_API_KEY
under the defaults -> compute section with an environment-variable reference or
clear comment: use a substitution token like ${VAST_AI_API_KEY} (if the config
loader supports it) or add a comment next to VAST_AI_API_KEY stating this is an
example placeholder that must be replaced with a real key via environment
variables; update the defaults/compute entry for VAST_AI_API_KEY accordingly so
static scanners won't flag a hardcoded secret.
In @experiment/runner.py:
- Line 143: The logged message at the logger.info call inside
experiment/runner.py is using an unnecessary f-string; remove the f prefix so
the call uses a plain string literal (change logger.info(f"Instance ready.
Remote run not fully implemented, continuing locally for demonstration.") to
logger.info("Instance ready. Remote run not fully implemented, continuing
locally for demonstration.")) to avoid misleading usage of f-strings where no
interpolation occurs.
- Around line 131-136: Accessing self.manifest.defaults.compute can raise
AttributeError when defaults or compute are missing; before constructing
VastAIClient use a defensive lookup (e.g., check hasattr(self.manifest,
"defaults") and hasattr(self.manifest.defaults, "compute") or use
getattr(self.manifest, "defaults", None) and getattr(..., "compute", None)) and
pass a None-safe api_key to VastAIClient
(getattr(self.manifest.defaults.compute, "vast_ai_api_key", None)) or handle the
missing defaults.compute case by raising a clear RuntimeError; update the block
that builds VastAIClient and reads the api_key so it won’t assume
self.manifest.defaults.compute exists.
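One defensive shape for the runner change above; the manifest attributes and VastAIClient signature come from the comment, while the import path and helper name are assumptions.

```python
from typing import Any

from caramba.orchestrator.compute.vast_ai import VastAIClient

def build_vast_client(manifest: Any) -> VastAIClient:
    compute = getattr(getattr(manifest, "defaults", None), "compute", None)
    if compute is None:
        raise RuntimeError(
            "compute.type is 'vast_ai' but manifest.defaults.compute is not configured"
        )
    # api_key stays None-safe; VastAIClient decides how to handle a missing key.
    return VastAIClient(api_key=getattr(compute, "vast_ai_api_key", None))
```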
In @infer/event_runtime.py:
- Around line 54-56: The delimiter=0 field and comment are wrong for Cap'n Proto
framing and risk collisions because Cap'n Proto payloads can contain 0 bytes;
change the framing: either replace delimiter default with a safe unused byte
(e.g., delimiter: int = 255) and update encode_with_delimiter() and the decoder
logic in EventEncoder/EventDecoder to use that byte, or implement proper
length-prefixed framing (write a 4-byte length before each encoded message and
update decode logic to read length then that many bytes) so the decoder can
unambiguously parse messages; also remove or correct the misleading comment
about Cap'n Proto using 0-byte delimiters.
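A sketch of the length-prefixed framing option, which avoids picking any in-band delimiter byte at all; frame/deframe here are placeholders, not the runtime's EventEncoder/EventDecoder API.

```python
import struct

def frame(message: bytes) -> bytes:
    # 4-byte big-endian length prefix, then the raw Cap'n Proto bytes.
    return struct.pack(">I", len(message)) + message

def deframe(stream: bytes) -> list[bytes]:
    messages, offset = [], 0
    while offset + 4 <= len(stream):
        (length,) = struct.unpack_from(">I", stream, offset)
        offset += 4
        messages.append(stream[offset : offset + length])
        offset += length
    return messages

chunks = frame(b"\x00first\x00") + frame(b"second")  # payloads may contain 0x00 safely
assert deframe(chunks) == [b"\x00first\x00", b"second"]
```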
In @layer/attention/standard/layer.py:
- Around line 100-101: The file references torch (e.g., torch.float32) in the
condition near the SDPA fallback but never imports the torch module; add an
import for the torch package at the top of the file (alongside existing imports
like "from torch import Tensor, nn" and "import torch.nn.functional as F") so
symbols like torch.float32 and torch.device are defined and the NameError is
resolved.
In @layer/memory_block/memory/memory.py:
- Around line 510-514: The method collect_health_telemetry currently performs a
local import of MemoryHealthTelemetry, ResonantSettlingMetrics and
VsaNoveltyMetrics on every call, causing unnecessary overhead; move the import
statement for MemoryHealthTelemetry, ResonantSettlingMetrics, VsaNoveltyMetrics
to the module top-level with the other imports and remove the inline import
inside collect_health_telemetry so the function simply references those classes
directly.
- Around line 524-531: ResonantSettlingMetrics is being populated with
resonant_convergence_steps from routing but ResonantRouter.route() never sets
"resonant_convergence_steps", so it always defaults to 0; either have
ResonantRouter.route() compute and emit convergence steps into its aux/routing
dict (e.g., track when energy history converges and set
"resonant_convergence_steps") or remove the field usage here and stop reading
"resonant_convergence_steps" in memory.py; locate ResonantRouter.route and the
place where routing/aux is returned to add the convergence counter, or adjust
the construction of ResonantSettlingMetrics to omit/respectably default that
metric.
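If the first option is taken, a toy sketch of what emitting the counter could look like inside the settling loop; the real ResonantRouter dynamics are more involved, and the update rule, tolerance, and dict key here are assumptions.

```python
import torch

def settle_with_convergence(x: torch.Tensor, steps: int, tol: float = 1e-4) -> dict:
    """Toy settling loop that records the step at which the state stops moving."""
    aux = {"resonant_convergence_steps": steps}
    prev = x
    for s in range(1, steps + 1):
        x = torch.tanh(prev * 0.9)               # stand-in for the real dynamics update
        drift = (x - prev).pow(2).mean().item()  # convergence signal, not raw energy
        if drift < tol:
            aux["resonant_convergence_steps"] = s
            break
        prev = x
    return aux
```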
In @layer/memory_block/memory/routing.py:
- Around line 264-265: The variable energy_history is being collected in the
settling loop but never returned; update the code so the auxiliary output
includes it: after the settling loop add energy_history to the aux dict (e.g.,
aux['energy_history'] = energy_history) or, if you decide not to expose it,
remove the energy_history list and any pushes to it to avoid wasted work; locate
the energy_history variable and the aux dict in the settling function in
routing.py and either attach the list to aux before returning or delete all
energy_history usage.
- Around line 228-232: After applying tuner adjustments in the block that
multiplies coupling/damping and adds resonant_steps_delta to steps (the code
using self.tuner and getattr(self.tuner, "resonant_steps_delta", 0)), clamp or
validate the resulting steps to be at least 1 (or another safe minimum) before
the settling loop runs; e.g., compute steps = max(1, steps + getattr(self.tuner,
"resonant_steps_delta", 0)) and optionally emit a warning via the existing
logger if the clamp was applied so callers know the tuner produced a
non-positive delta.
- Around line 277-284: The energy metric is computed after you normalize x and y
(so x**2 + y**2 ≈ 1), yielding constant values; move the energy computation to
before the normalization (compute energy = (x**2 + y**2).mean().item() prior to
computing mag and dividing x,y) or replace it with a convergence metric (e.g.,
mean squared change between iterations: ((x - x_prev)**2 + (y -
y_prev)**2).mean().item()) and append that to energy_history within the existing
collect_aux and s_idx % 5 check in the same block where x and y are updated.
- Around line 211-213: The _check_nan method currently prints a message via
print, which should be replaced with structured logging: add or use a
module/class logger (e.g., logger = logging.getLogger(__name__) or a class
attribute on ResonantRouter) and replace the print call with logger.error (or
logger.warning) including the name and useful tensor diagnostics (shape, dtype,
mean/std or a sample) so NaN/Inf events are recorded with severity and visible
in production logs; keep the function behavior otherwise (do not swallow
exceptions) and ensure the logging import is added if missing.
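Sketches for two of the routing.py points above (structured logging in the NaN check, and clamping tuner-adjusted steps); these paraphrase the intent rather than copying ResonantRouter's actual code.

```python
import logging

import torch

logger = logging.getLogger(__name__)

def check_nan(name: str, t: torch.Tensor) -> None:
    # Structured logging with diagnostics instead of print(); behavior otherwise unchanged.
    if torch.isnan(t).any() or torch.isinf(t).any():
        logger.error(
            "NaN/Inf detected in %s (shape=%s, dtype=%s, mean=%s)",
            name, tuple(t.shape), t.dtype, t.float().mean().item(),
        )

def adjusted_steps(base_steps: int, tuner: object) -> int:
    # Never let a negative tuner delta drive the settling loop to zero iterations.
    steps = base_steps + getattr(tuner, "resonant_steps_delta", 0)
    if steps < 1:
        logger.warning("resonant steps clamped to 1 (tuner delta too negative)")
        steps = 1
    return steps
```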
In @layer/memory_block/memory/telemetry.py:
- Around line 55-86: The to_dict method currently omits nested metrics
(resonant.energy_drop, resonant.state_drift, vsa.tag_collision_rate,
rmf.delta_rms); add an optional parameter to_dict(self, verbose: bool = False)
and when verbose is True include these extra fields under the same flattened
keys (e.g. "memory/resonant/energy_drop", "memory/resonant/state_drift",
"memory/vsa/tag_collision_rate", "memory/rmf/delta_rms"); keep the existing
compact output when verbose is False, update the docstring to document the new
parameter, and ensure you access the same attributes on
self.resonant/self.vsa/self.rmf only after the corresponding truthy checks
already present.
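A sketch of the verbose flag on to_dict, trimmed to the resonant sub-metrics for brevity (the real class also carries vsa and rmf); attribute names come from the keys listed above, the rest is illustrative.

```python
from dataclasses import dataclass

@dataclass
class ResonantSettlingMetrics:
    energy: float = 0.0
    energy_drop: float = 0.0
    state_drift: float = 0.0

@dataclass
class MemoryHealthTelemetry:
    utilization: float = 0.0
    resonant: ResonantSettlingMetrics | None = None

    def to_dict(self, verbose: bool = False) -> dict[str, float]:
        """Compact by default; verbose=True adds the nested extras."""
        out = {"memory/utilization": self.utilization}
        if self.resonant:
            out["memory/resonant/energy"] = self.resonant.energy
            if verbose:
                out["memory/resonant/energy_drop"] = self.resonant.energy_drop
                out["memory/resonant/state_drift"] = self.resonant.state_drift
        return out
```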
In @layer/memory_block/memory/tuner_test.py:
- Around line 10-61: Add a complementary unit test that verifies high
utilization (oversaturation) causes the tuner to increase thresholds rather than
decrease: inside TestUniversalMemoryTuner add a method (e.g.,
test_memory_oversaturation_increases_thresholds) that constructs a
UniversalMemoryTuner in "adaptive" mode with a known target_utilization (0.5)
and max_delta_per_step (e.g., 0.01), supply a MemoryHealthTelemetry with
utilization > target (e.g., 0.9), call tuner.update(tel) and assert that
target_write_threshold and write_threshold_mult have increased
(write_threshold_mult == 1.0 + max_delta_per_step after first update when
appropriate) and that repeated updates move the multiplier further up but still
respect gradual increments; reference UniversalMemoryTuner,
MemoryHealthTelemetry, target_write_threshold, write_threshold_mult, and
max_delta_per_step to locate the relevant logic.
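A hedged sketch of the complementary test, using only the names the comment mentions; the constructor arguments and import paths are guesses at the interface.

```python
from caramba.layer.memory_block.memory.telemetry import MemoryHealthTelemetry
from caramba.layer.memory_block.memory.tuner import UniversalMemoryTuner

def test_memory_oversaturation_increases_thresholds() -> None:
    tuner = UniversalMemoryTuner(mode="adaptive", target_utilization=0.5, max_delta_per_step=0.01)
    tel = MemoryHealthTelemetry(utilization=0.9)  # well above target -> back off writes

    tuner.update(tel)
    first = tuner.write_threshold_mult
    assert first > 1.0
    assert first <= 1.0 + tuner.max_delta_per_step  # increments stay gradual

    tuner.update(tel)
    assert tuner.write_threshold_mult >= first  # keeps nudging upward
```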
In @layer/memory_block/memory/tuner.py:
- Line 170: The setattr call in MemoryTuner (look for setattr(self, actual_name,
int(round(final_val)))) uses an unnecessary outer int() because round(final_val)
already returns an int in Python 3; remove the redundant int() and set the
attribute with round(final_val) directly to improve clarity.
In @layer/memory_block/memory/writer.py:
- Line 51: The field tuner is typed as Any which loses type safety; define a
Protocol (e.g., MemoryTuner) that declares the expected attributes
write_threshold_mult and vsa_novelty_mult and update the tuner annotation to use
this Protocol instead of Any so the writer code (references to tuner,
write_threshold_mult, vsa_novelty_mult) gets static type checks and clearer
documentation.
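One way to type the writer's tuner field with a structural Protocol, as suggested above; only the two attributes the writer reads are declared, and the class names are flexible.

```python
from typing import Protocol

class MemoryTunerLike(Protocol):
    """Anything exposing the two multipliers the writer consumes."""
    write_threshold_mult: float
    vsa_novelty_mult: float

class MemoryWriter:  # illustrative stand-in for the real writer class
    def __init__(self, tuner: MemoryTunerLike | None = None) -> None:
        self.tuner = tuner

    def effective_threshold(self, base: float) -> float:
        mult = self.tuner.write_threshold_mult if self.tuner else 1.0
        return base * mult
```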
In @optimizer/metal/adamw.py:
- Around line 37-38: The Metal AdamW wrapper enforces p.dtype == grad.dtype, but
adamw_master.py's MPS fused-path only checks g.dtype == torch.float16 causing a
mismatch for fp32 params; update the fused-path condition in adamw_master.py
(the block that currently tests g.dtype == torch.float16) to require g.dtype ==
p.dtype (or equivalently allow both torch.float16 and torch.float32 by checking
g.dtype in (torch.float16, torch.float32) and g.dtype == p.dtype) so the fused
path is only taken when gradient and parameter dtypes match the wrapper's
expectation.
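A sketch of the corrected gate for the fused path (condition only; the surrounding optimizer step is omitted and the helper name is hypothetical).

```python
import torch

def can_use_fused_metal_path(p: torch.Tensor, g: torch.Tensor) -> bool:
    # Match the Metal kernel's contract: fp16 or fp32, and grad dtype equal to param dtype.
    return (
        p.device.type == "mps"
        and p.dtype in (torch.float16, torch.float32)
        and g.dtype == p.dtype
    )
```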
In @optimizer/metal/layernorm.metal:
- Around line 169-170: The threadgroup arrays ts[8] and tq[8] in layernorm.metal
are hardcoded to NSIMD==8 and will break if TG/SIMD constants change; update
layernorm_impl usage by either documenting the dependency with a clear inline
comment referencing NSIMD/TG/SIMD and layernorm_impl<...>, or add a compile-time
check (static_assert-like macro) that validates the array size equals NSIMD (and
mirror the same change for the other occurrence at the region around lines
250-251); ensure the assertion/comment references the symbols ts, tq, NSIMD,
SIMD/TG, and layernorm_impl so future maintainers see the coupling.
- Around line 322-433: The two backward kernels layernorm_bwd_x_impl and
layernorm_bwd_x_noweight_impl are nearly identical; replace them with a single
templated kernel (e.g., template<typename T, bool HAS_WEIGHT>) that takes the
same arguments but conditionally applies weight only when HAS_WEIGHT is true
(use weight parameter when HAS_WEIGHT, otherwise ignore it), keep all reductions
and shared memory logic identical, remove layernorm_bwd_x_noweight_impl, and
update any callers to instantiate the unified function with HAS_WEIGHT=true or
false as appropriate.
In @optimizer/metal/layernorm.py:
- Around line 129-130: The function name layernorm_fp16 is misleading because
the implementation supports both fp16 and fp32; rename the function to
layernorm_metal (or add a compatibility alias layernorm_fp16 = layernorm_metal)
and update any references/exports/tests to use the new name; make sure the
docstring and any public API lists (e.g., __all__ or module exports) reflect the
new name to avoid breaking callers.
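A minimal sketch of the rename-with-compatibility-alias approach; the signature is assumed and the body is elided since only the naming changes.

```python
def layernorm_metal(x, weight, bias=None, eps: float = 1e-5):
    """LayerNorm on Metal/MPS; supports fp16 and fp32 inputs."""
    ...  # existing implementation, unchanged

# Backwards-compatible alias so existing callers keep working.
layernorm_fp16 = layernorm_metal

__all__ = ["layernorm_metal", "layernorm_fp16"]
```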
In @optimizer/metal/ops.mm:
- Around line 1706-1707: The C++ check in optimizer/metal/ops.mm enforces p and
grad have matching FP16/FP32 dtypes, but adamw_master.py currently hardcodes
g.dtype == torch.float16; update adamw_master.py to compare grad.dtype against
p.dtype (or assert grad.dtype == p.dtype) instead of hardcoding float16, and
adjust any related casting/initialization logic in the adamw_master function so
it works for both torch.float16 and torch.float32 and matches the C++
adamw_master_step constraint.
In @orchestrator/compute/vast_ai.py:
- Around line 47-57: The wait_for_ssh function currently ignores the timeout and
contains a mock sleep; add a TODO at the top of wait_for_ssh noting that the
timeout parameter must be enforced when wiring to the real Vast.ai API, and
replace the mock with polling logic that queries
vastai.get_instance(instance_id) until the instance status is 'running' and an
IP is available, aborting (return None or raise) when the timeout elapses;
reference the wait_for_ssh function and the timeout parameter so future
implementers know to use them and return a proper SSH string only when ready.
- Around line 67-75: run_lifecycle currently provisions an instance
(provision_instance) then waits for SSH (wait_for_ssh) but never decommissions
it; change run_lifecycle to ensure decommission_instance is called on the
provisioned instance when finished or on error by implementing a
context-manager/RAII pattern (e.g., return a context manager object or generator
that yields the connection info and calls decommission_instance(instance_id) in
its __exit__/finally block) and ensure all code paths (success and exception)
invoke decommission_instance for the instance_id returned by provision_instance.
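A sketch combining both suggestions: wait_for_ssh polls with a real deadline, and the lifecycle becomes a context manager that always decommissions. The client methods and the instance-info keys (get_instance, ssh_host, ssh_port) are assumptions about the eventual Vast.ai wiring.

```python
import time
from collections.abc import Iterator
from contextlib import contextmanager

def wait_for_ssh(client, instance_id: str, timeout: float = 600.0, poll: float = 10.0) -> str:
    """Poll until the instance is running and reachable, honoring the timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        info = client.get_instance(instance_id)  # assumed API call
        if info.get("status") == "running" and info.get("ssh_host"):
            return f"ssh root@{info['ssh_host']} -p {info['ssh_port']}"
        time.sleep(poll)
    raise TimeoutError(f"instance {instance_id} not reachable within {timeout:.0f}s")

@contextmanager
def provisioned_instance(client, config) -> Iterator[str]:
    """Provision, yield an SSH string, and always decommission (success or error)."""
    instance_id = client.provision_instance(config)
    try:
        yield wait_for_ssh(client, instance_id)
    finally:
        client.decommission_instance(instance_id)
```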
In @pyproject.toml:
- Around line 70-71: Update the dependency entry for "asyncpg" in pyproject.toml
to pin it to a specific stable version (e.g., change the plain "asyncpg" entry
to a fixed version such as "asyncpg==0.31.0" or a bounded spec like
"asyncpg>=0.31.0,<0.32.0") so builds are reproducible; locate the "asyncpg" line
in the dependency list and replace it with the chosen version constraint.
In @research/ccp/notes copy.md:
- Line 25: Remove the trailing whitespace characters on the two affected
Markdown lines: delete the extra space after "modify:" (the text "Files to
modify:") and after "# Gate operations based on opcode" to satisfy MD009; you
can enable trim-trailing-whitespace in your editor or run a markdown
linter/formatter to strip trailing spaces automatically and re-run the linter to
confirm the fixes.
- Line 5: Replace the bolded section title "**What exists:**" with a proper
Markdown heading (e.g., "## What exists") to satisfy MD036; also apply the same
treatment to similar titles such as changing "**What's missing:**" to "## What's
Missing" while leaving comments/sections like "Implementation skeleton"
unchanged.
- Line 1: Add a top-level Markdown heading as the first line by turning the
existing first-line text into a level-1 heading: prepend "# " to "Complete
MOSAIC Differentiable VM Architecture" so the document begins with "# Complete
MOSAIC Differentiable VM Architecture" (ensure there is a single blank line
after the heading if needed).
- Around line 20-25: The ordered list under the "Phase 1: Wire Opcodes to
Control Behavior" heading has inconsistent numbering (lines currently show 2–7)
and triggers MD029; fix it by either converting the items to an unordered list
(using "-" or "*") or renumbering the list to start at 1 and be sequential;
update the list items referenced in that section so they are consistently
formatted and valid Markdown (e.g., restart numbering at 1 or replace the
numeric prefixes with "-" for each item).
- Around line 1-10: This file research/ccp/notes copy.md is unrelated to the
"Feature/fp32 support for metal" PR and should be either removed or explicitly
linked to the PR as design/reference material; either delete the file from the
branch, move it into a dedicated documentation/design PR, or add a clear note in
the commit message and at the top of the file stating it is reference material
for this PR (and rename to remove "copy" from the filename) so reviewers
understand its intent.
In @runtime/engine/lightning_engine.py:
- Around line 103-109: Replace the invalid devices=0 logic in the L.Trainer call
so CPU runs use a positive device count; specifically update the devices
argument in the L.Trainer(...) call (where trainer is created) to use a valid
value such as devices=1 when train_cfg.device == "cpu" and otherwise "auto"
(e.g., devices=1 if train_cfg.device == "cpu" else "auto"), leaving accelerator
and precision logic (train_cfg.use_amp) unchanged.
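The device-count fix as a sketch, keeping the accelerator/precision choices described above; train_cfg fields beyond device and use_amp are assumptions.

```python
import lightning as L

def build_trainer(train_cfg) -> L.Trainer:
    on_cpu = train_cfg.device == "cpu"
    return L.Trainer(
        accelerator="cpu" if on_cpu else "auto",
        devices=1 if on_cpu else "auto",  # devices=0 is invalid; CPU needs a positive count
        precision="16-mixed" if train_cfg.use_amp else "32-true",
        max_epochs=train_cfg.max_epochs,
    )
```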
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (4)
- artifacts/mosaic/paper.pdf is excluded by !**/*.pdf
- research/ccp/Cognitive Control Plane.pdf is excluded by !**/*.pdf
- research/ccp/Cognitive_Control_Plane_MOSAIC_Runtime.pdf is excluded by !**/*.pdf
- uv.lock is excluded by !**/*.lock
📒 Files selected for processing (61)
- .vscode/c_cpp_properties.json
- artifacts/checkpoints/.gitattributes
- artifacts/checkpoints/LICENSE.txt
- artifacts/checkpoints/README.md
- artifacts/checkpoints/USE_POLICY.md
- artifacts/checkpoints/config.json
- artifacts/checkpoints/generation_config.json
- artifacts/checkpoints/original/params.json
- artifacts/checkpoints/original/tokenizer.model
- artifacts/checkpoints/special_tokens_map.json
- artifacts/checkpoints/tokenizer.json
- artifacts/checkpoints/tokenizer_config.json
- config/compute.py
- config/compute_test.py
- config/defaults.py
- config/layer.py
- config/presets/mosaic_resonant.yml
- config/target.py
- core/__init__.py
- core/event_codec/__init__.py
- core/event_codec/binary_codec.py
- core/event_codec/binary_codec_test.py
- core/event_codec/capnp_codec.py
- core/event_codec/capnp_codec_test.py
- core/event_codec/event.capnp
- core/event_codec/json_codec.py
- core/event_codec/stream_parser.py
- examples/lightning_vastai_demo.yml
- experiment/runner.py
- infer/event_runtime.py
- instrumentation/viz.py
- layer/attention/standard/layer.py
- layer/memory_block/memory/memory.py
- layer/memory_block/memory/memory_autotune_test.py
- layer/memory_block/memory/reader.py
- layer/memory_block/memory/routing.py
- layer/memory_block/memory/telemetry.py
- layer/memory_block/memory/tuner.py
- layer/memory_block/memory/tuner_test.py
- layer/memory_block/memory/vsa.py
- layer/memory_block/memory/writer.py
- optimizer/adamw_master.py
- optimizer/kernels.py
- optimizer/metal/adamw.metal
- optimizer/metal/adamw.py
- optimizer/metal/layernorm.metal
- optimizer/metal/layernorm.py
- optimizer/metal/lion.metal
- optimizer/metal/lion.py
- optimizer/metal/ops.mm
- optimizer/metal/rmsnorm.metal
- optimizer/metal/rmsnorm.py
- optimizer/metal/rope.metal
- optimizer/metal/rope.py
- orchestrator/compute/vast_ai.py
- pyproject.toml
- research/ccp/notes copy.md
- research/ccp/paper.tex
- research/ccp/references.bib
- runtime/engine/lightning_engine.py
- trainer/standard.py
💤 Files with no reviewable changes (4)
- core/event_codec/stream_parser.py
- core/event_codec/binary_codec.py
- core/event_codec/binary_codec_test.py
- core/event_codec/json_codec.py
🧰 Additional context used
🧬 Code graph analysis (13)
config/compute_test.py (2)
  config/compute.py (1)
    LocalComputeConfig (15-20)
  config/target.py (1)
    ExperimentTargetConfig (21-77)
config/target.py (1)
  config/compute.py (1)
    LocalComputeConfig (15-20)
orchestrator/compute/vast_ai.py (2)
  config/compute.py (1)
    VastAIComputeConfig (23-45)
  console/logger.py (4)
    warning (77-79)
    info (69-71)
    success (73-75)
    error (81-83)
optimizer/metal/layernorm.py (4)
  trainer/distributed.py (1)
    device (206-208)
  optimizer/metal/rmsnorm.py (1)
    save_for_backward (20-20)
  optimizer/metal/rope.py (1)
    save_for_backward (20-20)
  optimizer/metal/ssm.py (1)
    save_for_backward (19-19)
optimizer/metal/rmsnorm.py (3)
  model/generic_system.py (1)
    to (67-69)
  topology/graph.py (1)
    to (157-159)
  trainer/distributed.py (1)
    device (206-208)
layer/attention/standard/layer.py (1)
  trainer/distributed.py (1)
    device (206-208)
layer/memory_block/memory/tuner.py (1)
  layer/memory_block/memory/telemetry.py (1)
    MemoryHealthTelemetry (41-86)
trainer/standard.py (1)
  trainer/distributed.py (1)
    device (206-208)
core/event_codec/__init__.py (1)
  core/event_codec/capnp_codec.py (2)
    CapnpEventDecoder (102-164)
    CapnpEventEncoder (41-99)
layer/memory_block/memory/writer.py (1)
  orchestrator/telemetry.py (1)
    threshold (174-176)
core/event_codec/capnp_codec.py (1)
  core/event.py (1)
    EventEnvelope (18-142)
layer/memory_block/memory/memory.py (2)
  layer/memory_block/memory/routing.py (4)
    ResonantRouter (179-319)
    route (53-67)
    route (117-130)
    route (215-319)
  layer/memory_block/memory/telemetry.py (4)
    to_dict (55-86)
    MemoryHealthTelemetry (41-86)
    ResonantSettlingMetrics (13-19)
    VsaNoveltyMetrics (23-28)
layer/memory_block/memory/telemetry.py (1)
  layer/memory_block/memory/tuner.py (1)
    update (72-106)
🪛 Checkov (3.2.334)
artifacts/checkpoints/tokenizer_config.json
[low] 2060-2061: Base64 High Entropy String
(CKV_SECRET_6)
examples/lightning_vastai_demo.yml
[low] 6-7: Base64 High Entropy String
(CKV_SECRET_6)
🪛 LanguageTool
artifacts/checkpoints/USE_POLICY.md
[grammar] ~41-~41: Use a hyphen to join words.
Context: ...of your AI system 5. Interact with third party tools, models, or software designe...
(QB_NEW_EN_HYPHEN)
[style] ~42-~42: ‘With respect to’ might be wordy. Consider a shorter alternative.
Context: ... are associated with Meta or Llama 3.2 With respect to any multimodal models included in Llama...
(EN_WORDINESS_PREMIUM_WITH_RESPECT_TO)
artifacts/checkpoints/LICENSE.txt
[style] ~26-~26: This phrase is redundant. Consider using “outside”.
Context: ...eta Platforms, Inc. (if you are located outside of the EEA or Switzerland). By clicking...
(OUTSIDE_OF)
[grammar] ~46-~46: Use a hyphen to join words.
Context: ...e Llama Materials to create, train, fine tune, or otherwise improve an AI model, ...
(QB_NEW_EN_HYPHEN)
[grammar] ~51-~51: Use a hyphen to join words.
Context: ... a Licensee as part of an integrated end user product, then Section 2 of this Agr...
(QB_NEW_EN_HYPHEN)
[style] ~64-~64: This phrasing could be wordy, so try replacing it with something more concise.
Context: ...or Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the...
(MORE_THAN_EXCEEDS)
[style] ~79-~79: Consider simply using “of” instead.
Context: ...ES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING. 5. Intellectual Propert...
(OF_ANY_OF)
[style] ~91-~91: ‘with respect to’ might be wordy. Consider a shorter alternative.
Context: ...ls and derivatives made by or for Meta, with respect to any derivative works and modificati...
(EN_WORDINESS_PREMIUM_WITH_RESPECT_TO)
[style] ~97-~97: Consider simply using “of” instead.
Context: ... outputs or results, or any portion of any of the foregoing, constitutes infringement...
(OF_ANY_OF)
artifacts/checkpoints/README.md
[grammar] ~224-~224: Use a hyphen to join words.
Context: ...t). The Llama 3.2 instruction-tuned text only models are optimized for multilingu...
(QB_NEW_EN_HYPHEN)
[grammar] ~224-~224: Use a hyphen to join words.
Context: ...ey outperform many of the available open source and closed chat models on common ...
(QB_NEW_EN_HYPHEN)
[grammar] ~251-~251: Use a hyphen to join words.
Context: ...h use in multiple languages. Instruction tuned text only models are intended for ...
(QB_NEW_EN_HYPHEN)
[grammar] ~251-~251: Use a hyphen to join words.
Context: ...ltiple languages. Instruction tuned text only models are intended for assistant-l...
(QB_NEW_EN_HYPHEN)
[grammar] ~251-~251: Use a hyphen to join words.
Context: ...e retrieval and summarization, mobile AI powered writing assistants and query and...
(QB_NEW_EN_HYPHEN)
[style] ~251-~251: Consider using a more concise synonym.
Context: ...g. Pretrained models can be adapted for a variety of additional natural language generation ...
(A_VARIETY_OF)
[style] ~251-~251: Consider using a more concise synonym.
Context: ...ly, quantized models can be adapted for a variety of on-device use-cases with limited comput...
(A_VARIETY_OF)
[grammar] ~293-~293: Use a hyphen to join words.
Context: ...custom training libraries, Meta's custom built GPU cluster, and production infras...
(QB_NEW_EN_HYPHEN)
[style] ~297-~297: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...adjusted for power usage efficiency. Training Greenhouse Gas Emissions: Estimated t...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~323-~323: ‘taking into account’ might be wordy. Consider a shorter alternative.
Context: ... framework and Arm CPU backend in mind, taking into account metrics including model quality, prefil...
(EN_WORDINESS_PREMIUM_TAKING_INTO_ACCOUNT)
[grammar] ~331-~331: Use a hyphen to join words.
Context: ... stages, using the same data as the full precision models. To initialize QAT, we ...
(QB_NEW_EN_HYPHEN)
[uncategorized] ~331-~331: Do not mix variants of the same word (‘adapter’ and ‘adaptor’) within a single text.
Context: ...3) (i.e., quantization followed by LoRA adapters), we refer this method as QLoRA. Finall...
(EN_WORD_COHERENCY)
[grammar] ~353-~353: Use a hyphen to join words.
Context: ...0 | em | 96.8 | 1 | 1 | ### Instruction Tuned Models | Capability | | Benchmar...
(QB_NEW_EN_HYPHEN)
[style] ~424-~424: Consider using a more concise synonym.
Context: ...ional technology designed to be used in a variety of use cases. Examples on how Meta’s Llama...
(A_VARIETY_OF)
[grammar] ~456-~456: Use a hyphen to join words.
Context: ...nd appropriate risk mitigations via fine tuning. We leveraged those expert red te...
(QB_NEW_EN_HYPHEN)
[grammar] ~456-~456: Use a hyphen to join words.
Context: ...ed new in-depth sessions using objective based methodologies to assess the model ...
(QB_NEW_EN_HYPHEN)
[grammar] ~456-~456: Use a hyphen to join words.
Context: ...g content while taking account of market specific nuances or experiences. **3. ...
(QB_NEW_EN_HYPHEN)
[grammar] ~463-~463: Use a hyphen to join words.
Context: ...uations. Our Purple Llama tools are open sourced for the community to use and wid...
(QB_NEW_EN_HYPHEN)
[uncategorized] ~463-~463: The official name of this software platform is spelled with a capital “H”.
Context: ...courage community contributions to our [Github repository](https://github.com/meta-lla...
(GITHUB)
🪛 markdownlint-cli2 (0.18.1)
artifacts/checkpoints/USE_POLICY.md
1-1: First line in a file should be a top-level heading
(MD041, first-line-heading, first-line-h1)
5-5: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
20-20: Ordered list item prefix
Expected: 1; Actual: 2; Style: 1/1/1
(MD029, ol-prefix)
21-21: Ordered list item prefix
Expected: 1; Actual: 3; Style: 1/1/1
(MD029, ol-prefix)
22-22: Ordered list item prefix
Expected: 1; Actual: 4; Style: 1/1/1
(MD029, ol-prefix)
23-23: Ordered list item prefix
Expected: 1; Actual: 5; Style: 1/1/1
(MD029, ol-prefix)
24-24: Ordered list item prefix
Expected: 1; Actual: 6; Style: 1/1/1
(MD029, ol-prefix)
25-25: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
25-25: Ordered list item prefix
Expected: 1; Actual: 7; Style: 1/1/1
(MD029, ol-prefix)
39-39: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
52-52: Bare URL used
(MD034, no-bare-urls)
research/ccp/notes copy.md
1-1: First line in a file should be a top-level heading
(MD041, first-line-heading, first-line-h1)
5-5: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
20-20: Ordered list item prefix
Expected: 1; Actual: 2; Style: 1/1/1
(MD029, ol-prefix)
21-21: Ordered list item prefix
Expected: 1; Actual: 3; Style: 1/1/1
(MD029, ol-prefix)
22-22: Ordered list item prefix
Expected: 1; Actual: 4; Style: 1/1/1
(MD029, ol-prefix)
23-23: Ordered list item prefix
Expected: 1; Actual: 5; Style: 1/1/1
(MD029, ol-prefix)
24-24: Ordered list item prefix
Expected: 1; Actual: 6; Style: 1/1/1
(MD029, ol-prefix)
25-25: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
25-25: Ordered list item prefix
Expected: 1; Actual: 7; Style: 1/1/1
(MD029, ol-prefix)
39-39: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
52-52: Bare URL used
(MD034, no-bare-urls)
artifacts/checkpoints/README.md
222-222: First line in a file should be a top-level heading
(MD041, first-line-heading, first-line-h1)
247-247: Link text should be descriptive
(MD059, descriptive-link-text)
287-287: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
311-311: Link text should be descriptive
(MD059, descriptive-link-text)
465-465: Link text should be descriptive
(MD059, descriptive-link-text)
🪛 Ruff (0.14.10)
config/compute.py
48-51: Type alias ComputeConfig uses TypeAlias annotation instead of the type keyword
Use the type keyword
(UP040)
orchestrator/compute/vast_ai.py
37-37: Unused method argument: config
(ARG002)
47-47: Unused method argument: timeout
(ARG002)
optimizer/metal/layernorm.py
41-41: Avoid specifying long messages outside the exception class
(TRY003)
137-137: Avoid specifying long messages outside the exception class
(TRY003)
139-139: Avoid specifying long messages outside the exception class
(TRY003)
layer/memory_block/memory/routing.py
196-196: Dynamically typed expressions (typing.Any) are disallowed in tuner
(ANN401)
218-218: Avoid specifying long messages outside the exception class
(TRY003)
optimizer/metal/adamw.py
38-38: Avoid specifying long messages outside the exception class
(TRY003)
layer/memory_block/memory/vsa.py
73-73: Avoid specifying long messages outside the exception class
(TRY003)
optimizer/metal/rmsnorm.py
40-40: Avoid specifying long messages outside the exception class
(TRY003)
100-100: Avoid specifying long messages outside the exception class
(TRY003)
102-102: Avoid specifying long messages outside the exception class
(TRY003)
optimizer/metal/lion.py
34-34: Avoid specifying long messages outside the exception class
(TRY003)
optimizer/metal/rope.py
41-41: Avoid specifying long messages outside the exception class
(TRY003)
91-91: Avoid specifying long messages outside the exception class
(TRY003)
93-93: Avoid specifying long messages outside the exception class
(TRY003)
layer/memory_block/memory/memory_autotune_test.py
27-27: Use a regular assert instead of unittest-style assertIsNotNone
Replace assertIsNotNone(...) with assert ...
(PT009)
28-28: Use a regular assert instead of unittest-style assertEqual
Replace assertEqual(...) with assert ...
(PT009)
49-49: Use a regular assert instead of unittest-style assertIn
Replace assertIn(...) with assert ...
(PT009)
50-50: Use a regular assert instead of unittest-style assertIn
Replace assertIn(...) with assert ...
(PT009)
51-51: Use a regular assert instead of unittest-style assertIn
Replace assertIn(...) with assert ...
(PT009)
54-54: Use a regular assert instead of unittest-style assertIsInstance
Replace assertIsInstance(...) with assert ...
(PT009)
layer/memory_block/memory/tuner_test.py
15-15: Use a regular assert instead of unittest-style assertEqual
Replace assertEqual(...) with assert ...
(PT009)
16-16: Use a regular assert instead of unittest-style assertEqual
Replace assertEqual(...) with assert ...
(PT009)
25-25: Use a regular assert instead of unittest-style assertLess
Replace assertLess(...) with assert ...
(PT009)
26-26: Use a regular assert instead of unittest-style assertAlmostEqual
(PT009)
32-32: Use a regular assert instead of unittest-style assertLess
Replace assertLess(...) with assert ...
(PT009)
33-33: Use a regular assert instead of unittest-style assertGreater
Replace assertGreater(...) with assert ...
(PT009)
42-42: Use a regular assert instead of unittest-style assertEqual
Replace assertEqual(...) with assert ...
(PT009)
44-44: Use a regular assert instead of unittest-style assertAlmostEqual
(PT009)
55-55: Use a regular assert instead of unittest-style assertEqual
Replace assertEqual(...) with assert ...
(PT009)
60-60: Use a regular assert instead of unittest-style assertGreaterEqual
Replace assertGreaterEqual(...) with assert ...
(PT009)
61-61: Use a regular assert instead of unittest-style assertGreaterEqual
Replace assertGreaterEqual(...) with assert ...
(PT009)
core/event_codec/capnp_codec_test.py
67-67: Loop control variable i not used within loop body
Rename unused i to _i
(B007)
67-67: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
runtime/engine/lightning_engine.py
21-21: Missing return type annotation for special method __init__
Add return type annotation: None
(ANN204)
23-23: Dynamically typed expressions (typing.Any) are disallowed in system
(ANN401)
24-24: Dynamically typed expressions (typing.Any) are disallowed in objective
(ANN401)
25-25: Dynamically typed expressions (typing.Any) are disallowed in train_cfg
(ANN401)
37-37: Unused method argument: batch_idx
(ARG002)
67-67: Unused method argument: manifest
(ARG002)
71-71: Dynamically typed expressions (typing.Any) are disallowed in run_experiment
(ANN401)
layer/attention/standard/layer.py
101-101: Undefined name torch
(F821)
layer/memory_block/memory/tuner.py
170-170: Value being cast to int is already an integer
Remove unnecessary int call
(RUF046)
core/event_codec/capnp_codec.py
20-20: Missing return type annotation for private function _load_schema
(ANN202)
25-25: Avoid specifying long messages outside the exception class
(TRY003)
34-34: Missing return type annotation for private function _get_schema
(ANN202)
49-49: Avoid specifying long messages outside the exception class
(TRY003)
71-71: Avoid specifying long messages outside the exception class
(TRY003)
83-83: Avoid specifying long messages outside the exception class
(TRY003)
86-86: Avoid specifying long messages outside the exception class
(TRY003)
90-90: Avoid specifying long messages outside the exception class
(TRY003)
110-110: Avoid specifying long messages outside the exception class
(TRY003)
112-112: Avoid specifying long messages outside the exception class
(TRY003)
114-114: Avoid specifying long messages outside the exception class
(TRY003)
149-149: Avoid specifying long messages outside the exception class
(TRY003)
151-151: Avoid specifying long messages outside the exception class
(TRY003)
153-155: Avoid specifying long messages outside the exception class
(TRY003)
157-157: Avoid specifying long messages outside the exception class
(TRY003)
infer/event_runtime.py
189-189: Avoid specifying long messages outside the exception class
(TRY003)
210-212: try-except-continue detected, consider logging the exception
(S112)
210-210: Do not catch blind exception: Exception
(BLE001)
237-237: Avoid specifying long messages outside the exception class
(TRY003)
layer/memory_block/memory/memory.py
244-244: Avoid specifying long messages outside the exception class
(TRY003)
289-289: Avoid specifying long messages outside the exception class
(TRY003)
450-450: Avoid specifying long messages outside the exception class
(TRY003)
510-510: Undefined name MemoryHealthTelemetry
(F821)
experiment/runner.py
136-136: Avoid specifying long messages outside the exception class
(TRY003)
143-143: f-string without any placeholders
Remove extraneous f prefix
(F541)
| { | ||
| "configurations": [ | ||
| { | ||
| "name": "Mac", | ||
| "includePath": [ | ||
| "${workspaceFolder}/**", | ||
| "/Users/theapemachine/go/src/github.com/theapemachine/caramba/.venv/lib/python3.12/site-packages/torch/include", | ||
| "/Users/theapemachine/go/src/github.com/theapemachine/caramba/.venv/lib/python3.12/site-packages/torch/include/torch/csrc/api/include", | ||
| "/usr/local/include", | ||
| "/Library/Developer/CommandLineTools/usr/include/c++/v1" | ||
| ], | ||
| "defines": [ | ||
| "TORCH_EXTENSION_NAME=caramba_metal_ops", | ||
| "TORCH_API_INCLUDE_EXTENSION_H" | ||
| ], | ||
| "macFrameworkPath": [ | ||
| "/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/System/Library/Frameworks" | ||
| ], | ||
| "compilerPath": "/usr/bin/clang", | ||
| "cStandard": "c17", | ||
| "cppStandard": "c++17", | ||
| "intelliSenseMode": "macos-clang-arm64" | ||
| } | ||
| ], | ||
| "version": 4 | ||
| } |
Replace hardcoded absolute paths with portable alternatives.
The configuration contains hardcoded absolute paths specific to one developer's machine (lines 7-8), which will break IntelliSense for other contributors. VSCode C/C++ configurations should use workspace-relative paths or environment variables.
🔧 Proposed fix using portable paths
{
"configurations": [
{
"name": "Mac",
"includePath": [
"${workspaceFolder}/**",
- "/Users/theapemachine/go/src/github.com/theapemachine/caramba/.venv/lib/python3.12/site-packages/torch/include",
- "/Users/theapemachine/go/src/github.com/theapemachine/caramba/.venv/lib/python3.12/site-packages/torch/include/torch/csrc/api/include",
+ "${workspaceFolder}/.venv/lib/python3.12/site-packages/torch/include",
+ "${workspaceFolder}/.venv/lib/python3.12/site-packages/torch/include/torch/csrc/api/include",
"/usr/local/include",
"/Library/Developer/CommandLineTools/usr/include/c++/v1"
],
"defines": [
"TORCH_EXTENSION_NAME=caramba_metal_ops",
"TORCH_API_INCLUDE_EXTENSION_H"
],
"macFrameworkPath": [
"/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/System/Library/Frameworks"
],
"compilerPath": "/usr/bin/clang",
"cStandard": "c17",
"cppStandard": "c++17",
"intelliSenseMode": "macos-clang-arm64"
}
],
"version": 4
}

Alternatively, consider adding this file to .gitignore and providing a .vscode/c_cpp_properties.json.example template that developers can copy and customize locally.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| { | |
| "configurations": [ | |
| { | |
| "name": "Mac", | |
| "includePath": [ | |
| "${workspaceFolder}/**", | |
| "/Users/theapemachine/go/src/github.com/theapemachine/caramba/.venv/lib/python3.12/site-packages/torch/include", | |
| "/Users/theapemachine/go/src/github.com/theapemachine/caramba/.venv/lib/python3.12/site-packages/torch/include/torch/csrc/api/include", | |
| "/usr/local/include", | |
| "/Library/Developer/CommandLineTools/usr/include/c++/v1" | |
| ], | |
| "defines": [ | |
| "TORCH_EXTENSION_NAME=caramba_metal_ops", | |
| "TORCH_API_INCLUDE_EXTENSION_H" | |
| ], | |
| "macFrameworkPath": [ | |
| "/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/System/Library/Frameworks" | |
| ], | |
| "compilerPath": "/usr/bin/clang", | |
| "cStandard": "c17", | |
| "cppStandard": "c++17", | |
| "intelliSenseMode": "macos-clang-arm64" | |
| } | |
| ], | |
| "version": 4 | |
| } | |
| { | |
| "configurations": [ | |
| { | |
| "name": "Mac", | |
| "includePath": [ | |
| "${workspaceFolder}/**", | |
| "${workspaceFolder}/.venv/lib/python3.12/site-packages/torch/include", | |
| "${workspaceFolder}/.venv/lib/python3.12/site-packages/torch/include/torch/csrc/api/include", | |
| "/usr/local/include", | |
| "/Library/Developer/CommandLineTools/usr/include/c++/v1" | |
| ], | |
| "defines": [ | |
| "TORCH_EXTENSION_NAME=caramba_metal_ops", | |
| "TORCH_API_INCLUDE_EXTENSION_H" | |
| ], | |
| "macFrameworkPath": [ | |
| "/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/System/Library/Frameworks" | |
| ], | |
| "compilerPath": "/usr/bin/clang", | |
| "cStandard": "c17", | |
| "cppStandard": "c++17", | |
| "intelliSenseMode": "macos-clang-arm64" | |
| } | |
| ], | |
| "version": 4 | |
| } |
🤖 Prompt for AI Agents
In @.vscode/c_cpp_properties.json around lines 1-26: the file currently contains hardcoded absolute include paths (the entries in the "includePath" array pointing to /Users/theapemachine/...), which breaks portability. Update those specific includePath entries to use workspace-relative variables (e.g. ${workspaceFolder}/...) or env vars (e.g. ${env:PYENV} or ${workspaceFolder}/.venv/...) so all contributors resolve headers locally, and either add .vscode/c_cpp_properties.json to .gitignore or add a template file (c_cpp_properties.json.example) for local customization. Target the "includePath" array and the file-level config in .vscode/c_cpp_properties.json when making the change.
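If the template route is taken, a minimal sketch of the workflow (assuming the c_cpp_properties.json.example filename suggested above) could look like this; the committed example should already contain the portable ${workspaceFolder} paths from the proposed fix:

```bash
# Commit a portable example and ignore the machine-specific copy.
cp .vscode/c_cpp_properties.json .vscode/c_cpp_properties.json.example
echo ".vscode/c_cpp_properties.json" >> .gitignore
git rm --cached .vscode/c_cpp_properties.json
git add .gitignore .vscode/c_cpp_properties.json.example
```

New contributors would then copy the example back to .vscode/c_cpp_properties.json and adjust any local paths as needed.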
| { | ||
| "dim": 2048, | ||
| "ffn_dim_multiplier": 1.5, | ||
| "multiple_of": 256, | ||
| "n_heads": 32, | ||
| "n_kv_heads": 8, | ||
| "n_layers": 16, | ||
| "norm_eps": 1e-05, | ||
| "rope_theta": 500000.0, | ||
| "use_scaled_rope": true, | ||
| "vocab_size": 128256 | ||
| } No newline at end of file |
🧩 Analysis chain
🏁 Script executed:
# Check if the params.json file has a trailing newline
od -c artifacts/checkpoints/original/params.json | tail -2
Repository: TheApeMachine/caramba
Length of output: 105
🏁 Script executed:
# Search for use_scaled_rope usage in the codebase
rg "use_scaled_rope" -t json -t py -t js -A 2 -B 2Repository: TheApeMachine/caramba
Length of output: 378
🏁 Script executed:
# Find loaders and converters that might consume params.json
rg "params\.json" -t py -t js -A 3 -B 1Repository: TheApeMachine/caramba
Length of output: 47
🏁 Script executed:
# Search for rope_scaling (HF convention) to understand the relationship
rg "rope_scaling|rope_theta" -t py -t json -A 2 -B 2Repository: TheApeMachine/caramba
Length of output: 3930
🏁 Script executed:
# List files in the checkpoints directory to understand structure
find artifacts/checkpoints -type f -name "*.json" | head -20
Repository: TheApeMachine/caramba
Length of output: 313
🏁 Script executed:
# Check for trailing newline in params.json using cat -A (shows $ at EOF)
cat -A artifacts/checkpoints/original/params.json | tail -3
Repository: TheApeMachine/caramba
Length of output: 118
🏁 Script executed:
# Search for all use_scaled_rope references across entire codebase
rg "use_scaled_rope" -t py -t json -t ts -t jsRepository: TheApeMachine/caramba
Length of output: 134
🏁 Script executed:
# Find where params.json is loaded
rg "params\.json|load.*params" -t py -A 5 -B 2Repository: TheApeMachine/caramba
Length of output: 981
🏁 Script executed:
# Check what the config loader expects/accepts
rg "class.*[Cc]onfig|def.*load.*config" -t py -A 10 -B 1 | head -80Repository: TheApeMachine/caramba
Length of output: 4162
🏁 Script executed:
# Search more carefully for how params.json is loaded - check imports and usage patterns
rg "json\.load|load.*json|open.*params" -t py -A 3 -B 1 | head -100Repository: TheApeMachine/caramba
Length of output: 6264
🏁 Script executed:
# Check if there's a config loader that specifically handles params.json
rg "def load|from_pretrained|artifacts/checkpoints" -t py -A 5 -B 2 | head -80Repository: TheApeMachine/caramba
Length of output: 5526
🏁 Script executed:
# Look for the config layer initialization to understand field mapping
cat -n config/layer.py | head -80
Repository: TheApeMachine/caramba
Length of output: 2926
🏁 Script executed:
# Continue reading config/layer.py to find rope_scaling definition
cat -n config/layer.py | sed -n '190,250p'
Repository: TheApeMachine/caramba
Length of output: 2537
🏁 Script executed:
# Search for where params.json might be loaded in model initialization
rg "artifacts/checkpoints.*original|params\.json" -t py -A 5 -B 2Repository: TheApeMachine/caramba
Length of output: 47
🏁 Script executed:
# Look for any imports or usage of params.json in Python files
rg "params" -t py -A 2 -B 2 | grep -A 2 -B 2 "json\|load" | head -60Repository: TheApeMachine/caramba
Length of output: 1501
🏁 Script executed:
# Check if use_scaled_rope is referenced anywhere in Python code
rg "use_scaled_rope" -t pyRepository: TheApeMachine/caramba
Length of output: 47
Add trailing newline and remove unused use_scaled_rope field.
The file lacks a trailing newline. Additionally, use_scaled_rope is never consumed by any loader or converter in the codebase. The actual RoPE scaling configuration is loaded from HF config.json via the rope_scaling dict (defined in config/layer.py and used throughout layer/rope.py and attention implementations). Remove the orphaned use_scaled_rope field and add a newline at EOF to maintain consistency with standard tooling expectations.
🤖 Prompt for AI Agents
In @artifacts/checkpoints/original/params.json around lines 1-12: remove the unused "use_scaled_rope" key from params.json and add a trailing newline at EOF. The RoPE scaling is already handled via the rope_scaling dict (see config/layer.py) and consumed by layer/rope.py and the attention implementations, so deleting use_scaled_rope avoids confusion and the newline ensures standard EOF formatting.
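One way to apply this cleanup, sketched here with jq (an assumption; any JSON tool works), is:

```bash
# Drop the orphaned key and rewrite the file; jq emits a trailing newline by default.
jq 'del(.use_scaled_rope)' artifacts/checkpoints/original/params.json > /tmp/params.json \
  && mv /tmp/params.json artifacts/checkpoints/original/params.json
```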
| { | ||
| "added_tokens_decoder": { | ||
| "128000": { | ||
| "content": "<|begin_of_text|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128001": { | ||
| "content": "<|end_of_text|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128002": { | ||
| "content": "<|reserved_special_token_0|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128003": { | ||
| "content": "<|reserved_special_token_1|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128004": { | ||
| "content": "<|finetune_right_pad_id|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128005": { | ||
| "content": "<|reserved_special_token_2|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128006": { | ||
| "content": "<|start_header_id|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128007": { | ||
| "content": "<|end_header_id|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128008": { | ||
| "content": "<|eom_id|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128009": { | ||
| "content": "<|eot_id|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128010": { | ||
| "content": "<|python_tag|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128011": { | ||
| "content": "<|reserved_special_token_3|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128012": { | ||
| "content": "<|reserved_special_token_4|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128013": { | ||
| "content": "<|reserved_special_token_5|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128014": { | ||
| "content": "<|reserved_special_token_6|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128015": { | ||
| "content": "<|reserved_special_token_7|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128016": { | ||
| "content": "<|reserved_special_token_8|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128017": { | ||
| "content": "<|reserved_special_token_9|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128018": { | ||
| "content": "<|reserved_special_token_10|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128019": { | ||
| "content": "<|reserved_special_token_11|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128020": { | ||
| "content": "<|reserved_special_token_12|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128021": { | ||
| "content": "<|reserved_special_token_13|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128022": { | ||
| "content": "<|reserved_special_token_14|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128023": { | ||
| "content": "<|reserved_special_token_15|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128024": { | ||
| "content": "<|reserved_special_token_16|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128025": { | ||
| "content": "<|reserved_special_token_17|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128026": { | ||
| "content": "<|reserved_special_token_18|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128027": { | ||
| "content": "<|reserved_special_token_19|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128028": { | ||
| "content": "<|reserved_special_token_20|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128029": { | ||
| "content": "<|reserved_special_token_21|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128030": { | ||
| "content": "<|reserved_special_token_22|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128031": { | ||
| "content": "<|reserved_special_token_23|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128032": { | ||
| "content": "<|reserved_special_token_24|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128033": { | ||
| "content": "<|reserved_special_token_25|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128034": { | ||
| "content": "<|reserved_special_token_26|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128035": { | ||
| "content": "<|reserved_special_token_27|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128036": { | ||
| "content": "<|reserved_special_token_28|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128037": { | ||
| "content": "<|reserved_special_token_29|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128038": { | ||
| "content": "<|reserved_special_token_30|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128039": { | ||
| "content": "<|reserved_special_token_31|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128040": { | ||
| "content": "<|reserved_special_token_32|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128041": { | ||
| "content": "<|reserved_special_token_33|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128042": { | ||
| "content": "<|reserved_special_token_34|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128043": { | ||
| "content": "<|reserved_special_token_35|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128044": { | ||
| "content": "<|reserved_special_token_36|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128045": { | ||
| "content": "<|reserved_special_token_37|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128046": { | ||
| "content": "<|reserved_special_token_38|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128047": { | ||
| "content": "<|reserved_special_token_39|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128048": { | ||
| "content": "<|reserved_special_token_40|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128049": { | ||
| "content": "<|reserved_special_token_41|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128050": { | ||
| "content": "<|reserved_special_token_42|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128051": { | ||
| "content": "<|reserved_special_token_43|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128052": { | ||
| "content": "<|reserved_special_token_44|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128053": { | ||
| "content": "<|reserved_special_token_45|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128054": { | ||
| "content": "<|reserved_special_token_46|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128055": { | ||
| "content": "<|reserved_special_token_47|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128056": { | ||
| "content": "<|reserved_special_token_48|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128057": { | ||
| "content": "<|reserved_special_token_49|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128058": { | ||
| "content": "<|reserved_special_token_50|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128059": { | ||
| "content": "<|reserved_special_token_51|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128060": { | ||
| "content": "<|reserved_special_token_52|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128061": { | ||
| "content": "<|reserved_special_token_53|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128062": { | ||
| "content": "<|reserved_special_token_54|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128063": { | ||
| "content": "<|reserved_special_token_55|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128064": { | ||
| "content": "<|reserved_special_token_56|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128065": { | ||
| "content": "<|reserved_special_token_57|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128066": { | ||
| "content": "<|reserved_special_token_58|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128067": { | ||
| "content": "<|reserved_special_token_59|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128068": { | ||
| "content": "<|reserved_special_token_60|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128069": { | ||
| "content": "<|reserved_special_token_61|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128070": { | ||
| "content": "<|reserved_special_token_62|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128071": { | ||
| "content": "<|reserved_special_token_63|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128072": { | ||
| "content": "<|reserved_special_token_64|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128073": { | ||
| "content": "<|reserved_special_token_65|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128074": { | ||
| "content": "<|reserved_special_token_66|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128075": { | ||
| "content": "<|reserved_special_token_67|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128076": { | ||
| "content": "<|reserved_special_token_68|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128077": { | ||
| "content": "<|reserved_special_token_69|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128078": { | ||
| "content": "<|reserved_special_token_70|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128079": { | ||
| "content": "<|reserved_special_token_71|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128080": { | ||
| "content": "<|reserved_special_token_72|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128081": { | ||
| "content": "<|reserved_special_token_73|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128082": { | ||
| "content": "<|reserved_special_token_74|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128083": { | ||
| "content": "<|reserved_special_token_75|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128084": { | ||
| "content": "<|reserved_special_token_76|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128085": { | ||
| "content": "<|reserved_special_token_77|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128086": { | ||
| "content": "<|reserved_special_token_78|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128087": { | ||
| "content": "<|reserved_special_token_79|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128088": { | ||
| "content": "<|reserved_special_token_80|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128089": { | ||
| "content": "<|reserved_special_token_81|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128090": { | ||
| "content": "<|reserved_special_token_82|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128091": { | ||
| "content": "<|reserved_special_token_83|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128092": { | ||
| "content": "<|reserved_special_token_84|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128093": { | ||
| "content": "<|reserved_special_token_85|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128094": { | ||
| "content": "<|reserved_special_token_86|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128095": { | ||
| "content": "<|reserved_special_token_87|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128096": { | ||
| "content": "<|reserved_special_token_88|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128097": { | ||
| "content": "<|reserved_special_token_89|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128098": { | ||
| "content": "<|reserved_special_token_90|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128099": { | ||
| "content": "<|reserved_special_token_91|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128100": { | ||
| "content": "<|reserved_special_token_92|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128101": { | ||
| "content": "<|reserved_special_token_93|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128102": { | ||
| "content": "<|reserved_special_token_94|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128103": { | ||
| "content": "<|reserved_special_token_95|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128104": { | ||
| "content": "<|reserved_special_token_96|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128105": { | ||
| "content": "<|reserved_special_token_97|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128106": { | ||
| "content": "<|reserved_special_token_98|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128107": { | ||
| "content": "<|reserved_special_token_99|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128108": { | ||
| "content": "<|reserved_special_token_100|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128109": { | ||
| "content": "<|reserved_special_token_101|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128110": { | ||
| "content": "<|reserved_special_token_102|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128111": { | ||
| "content": "<|reserved_special_token_103|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128112": { | ||
| "content": "<|reserved_special_token_104|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128113": { | ||
| "content": "<|reserved_special_token_105|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128114": { | ||
| "content": "<|reserved_special_token_106|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128115": { | ||
| "content": "<|reserved_special_token_107|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128116": { | ||
| "content": "<|reserved_special_token_108|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128117": { | ||
| "content": "<|reserved_special_token_109|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128118": { | ||
| "content": "<|reserved_special_token_110|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128119": { | ||
| "content": "<|reserved_special_token_111|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128120": { | ||
| "content": "<|reserved_special_token_112|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128121": { | ||
| "content": "<|reserved_special_token_113|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128122": { | ||
| "content": "<|reserved_special_token_114|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128123": { | ||
| "content": "<|reserved_special_token_115|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128124": { | ||
| "content": "<|reserved_special_token_116|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128125": { | ||
| "content": "<|reserved_special_token_117|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128126": { | ||
| "content": "<|reserved_special_token_118|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128127": { | ||
| "content": "<|reserved_special_token_119|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128128": { | ||
| "content": "<|reserved_special_token_120|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128129": { | ||
| "content": "<|reserved_special_token_121|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128130": { | ||
| "content": "<|reserved_special_token_122|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128131": { | ||
| "content": "<|reserved_special_token_123|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128132": { | ||
| "content": "<|reserved_special_token_124|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128133": { | ||
| "content": "<|reserved_special_token_125|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128134": { | ||
| "content": "<|reserved_special_token_126|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128135": { | ||
| "content": "<|reserved_special_token_127|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128136": { | ||
| "content": "<|reserved_special_token_128|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128137": { | ||
| "content": "<|reserved_special_token_129|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128138": { | ||
| "content": "<|reserved_special_token_130|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128139": { | ||
| "content": "<|reserved_special_token_131|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128140": { | ||
| "content": "<|reserved_special_token_132|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128141": { | ||
| "content": "<|reserved_special_token_133|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128142": { | ||
| "content": "<|reserved_special_token_134|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128143": { | ||
| "content": "<|reserved_special_token_135|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128144": { | ||
| "content": "<|reserved_special_token_136|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128145": { | ||
| "content": "<|reserved_special_token_137|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128146": { | ||
| "content": "<|reserved_special_token_138|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128147": { | ||
| "content": "<|reserved_special_token_139|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128148": { | ||
| "content": "<|reserved_special_token_140|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128149": { | ||
| "content": "<|reserved_special_token_141|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128150": { | ||
| "content": "<|reserved_special_token_142|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128151": { | ||
| "content": "<|reserved_special_token_143|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128152": { | ||
| "content": "<|reserved_special_token_144|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128153": { | ||
| "content": "<|reserved_special_token_145|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128154": { | ||
| "content": "<|reserved_special_token_146|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128155": { | ||
| "content": "<|reserved_special_token_147|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128156": { | ||
| "content": "<|reserved_special_token_148|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128157": { | ||
| "content": "<|reserved_special_token_149|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128158": { | ||
| "content": "<|reserved_special_token_150|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128159": { | ||
| "content": "<|reserved_special_token_151|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128160": { | ||
| "content": "<|reserved_special_token_152|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128161": { | ||
| "content": "<|reserved_special_token_153|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128162": { | ||
| "content": "<|reserved_special_token_154|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128163": { | ||
| "content": "<|reserved_special_token_155|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128164": { | ||
| "content": "<|reserved_special_token_156|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128165": { | ||
| "content": "<|reserved_special_token_157|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128166": { | ||
| "content": "<|reserved_special_token_158|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128167": { | ||
| "content": "<|reserved_special_token_159|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128168": { | ||
| "content": "<|reserved_special_token_160|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128169": { | ||
| "content": "<|reserved_special_token_161|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128170": { | ||
| "content": "<|reserved_special_token_162|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128171": { | ||
| "content": "<|reserved_special_token_163|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128172": { | ||
| "content": "<|reserved_special_token_164|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128173": { | ||
| "content": "<|reserved_special_token_165|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128174": { | ||
| "content": "<|reserved_special_token_166|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128175": { | ||
| "content": "<|reserved_special_token_167|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128176": { | ||
| "content": "<|reserved_special_token_168|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128177": { | ||
| "content": "<|reserved_special_token_169|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128178": { | ||
| "content": "<|reserved_special_token_170|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128179": { | ||
| "content": "<|reserved_special_token_171|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128180": { | ||
| "content": "<|reserved_special_token_172|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128181": { | ||
| "content": "<|reserved_special_token_173|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128182": { | ||
| "content": "<|reserved_special_token_174|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128183": { | ||
| "content": "<|reserved_special_token_175|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128184": { | ||
| "content": "<|reserved_special_token_176|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128185": { | ||
| "content": "<|reserved_special_token_177|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128186": { | ||
| "content": "<|reserved_special_token_178|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128187": { | ||
| "content": "<|reserved_special_token_179|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128188": { | ||
| "content": "<|reserved_special_token_180|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128189": { | ||
| "content": "<|reserved_special_token_181|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128190": { | ||
| "content": "<|reserved_special_token_182|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128191": { | ||
| "content": "<|reserved_special_token_183|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128192": { | ||
| "content": "<|reserved_special_token_184|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128193": { | ||
| "content": "<|reserved_special_token_185|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128194": { | ||
| "content": "<|reserved_special_token_186|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128195": { | ||
| "content": "<|reserved_special_token_187|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128196": { | ||
| "content": "<|reserved_special_token_188|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128197": { | ||
| "content": "<|reserved_special_token_189|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128198": { | ||
| "content": "<|reserved_special_token_190|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128199": { | ||
| "content": "<|reserved_special_token_191|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128200": { | ||
| "content": "<|reserved_special_token_192|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128201": { | ||
| "content": "<|reserved_special_token_193|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128202": { | ||
| "content": "<|reserved_special_token_194|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128203": { | ||
| "content": "<|reserved_special_token_195|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128204": { | ||
| "content": "<|reserved_special_token_196|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128205": { | ||
| "content": "<|reserved_special_token_197|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128206": { | ||
| "content": "<|reserved_special_token_198|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128207": { | ||
| "content": "<|reserved_special_token_199|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128208": { | ||
| "content": "<|reserved_special_token_200|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128209": { | ||
| "content": "<|reserved_special_token_201|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128210": { | ||
| "content": "<|reserved_special_token_202|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128211": { | ||
| "content": "<|reserved_special_token_203|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128212": { | ||
| "content": "<|reserved_special_token_204|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128213": { | ||
| "content": "<|reserved_special_token_205|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128214": { | ||
| "content": "<|reserved_special_token_206|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128215": { | ||
| "content": "<|reserved_special_token_207|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128216": { | ||
| "content": "<|reserved_special_token_208|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128217": { | ||
| "content": "<|reserved_special_token_209|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128218": { | ||
| "content": "<|reserved_special_token_210|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128219": { | ||
| "content": "<|reserved_special_token_211|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128220": { | ||
| "content": "<|reserved_special_token_212|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128221": { | ||
| "content": "<|reserved_special_token_213|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128222": { | ||
| "content": "<|reserved_special_token_214|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128223": { | ||
| "content": "<|reserved_special_token_215|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128224": { | ||
| "content": "<|reserved_special_token_216|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128225": { | ||
| "content": "<|reserved_special_token_217|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128226": { | ||
| "content": "<|reserved_special_token_218|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128227": { | ||
| "content": "<|reserved_special_token_219|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128228": { | ||
| "content": "<|reserved_special_token_220|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128229": { | ||
| "content": "<|reserved_special_token_221|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128230": { | ||
| "content": "<|reserved_special_token_222|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128231": { | ||
| "content": "<|reserved_special_token_223|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128232": { | ||
| "content": "<|reserved_special_token_224|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128233": { | ||
| "content": "<|reserved_special_token_225|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128234": { | ||
| "content": "<|reserved_special_token_226|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128235": { | ||
| "content": "<|reserved_special_token_227|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128236": { | ||
| "content": "<|reserved_special_token_228|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128237": { | ||
| "content": "<|reserved_special_token_229|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128238": { | ||
| "content": "<|reserved_special_token_230|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128239": { | ||
| "content": "<|reserved_special_token_231|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128240": { | ||
| "content": "<|reserved_special_token_232|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128241": { | ||
| "content": "<|reserved_special_token_233|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128242": { | ||
| "content": "<|reserved_special_token_234|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128243": { | ||
| "content": "<|reserved_special_token_235|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128244": { | ||
| "content": "<|reserved_special_token_236|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128245": { | ||
| "content": "<|reserved_special_token_237|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128246": { | ||
| "content": "<|reserved_special_token_238|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128247": { | ||
| "content": "<|reserved_special_token_239|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128248": { | ||
| "content": "<|reserved_special_token_240|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128249": { | ||
| "content": "<|reserved_special_token_241|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128250": { | ||
| "content": "<|reserved_special_token_242|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128251": { | ||
| "content": "<|reserved_special_token_243|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128252": { | ||
| "content": "<|reserved_special_token_244|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128253": { | ||
| "content": "<|reserved_special_token_245|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128254": { | ||
| "content": "<|reserved_special_token_246|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| }, | ||
| "128255": { | ||
| "content": "<|reserved_special_token_247|>", | ||
| "lstrip": false, | ||
| "normalized": false, | ||
| "rstrip": false, | ||
| "single_word": false, | ||
| "special": true | ||
| } | ||
| }, | ||
| "bos_token": "<|begin_of_text|>", | ||
| "clean_up_tokenization_spaces": true, | ||
| "eos_token": "<|end_of_text|>", | ||
| "model_input_names": [ | ||
| "input_ids", | ||
| "attention_mask" | ||
| ], | ||
| "model_max_length": 131072, | ||
| "tokenizer_class": "PreTrainedTokenizerFast" | ||
| } |
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check the actual file size and line count
echo "=== File Statistics ==="
wc -l artifacts/checkpoints/tokenizer_config.json
du -sh artifacts/checkpoints/tokenizer_config.json
# Check the artifacts directory total size
echo -e "\n=== Artifacts Directory Size ==="
du -sh artifacts/checkpoints/
find artifacts/checkpoints/ -type f -exec du -h {} \; | sort -h
# Extract and validate key fields from the JSON
echo -e "\n=== JSON Validation ==="
python3 << 'EOF'
import json
with open('artifacts/checkpoints/tokenizer_config.json', 'r') as f:
    config = json.load(f)

# Check added_tokens_decoder
added_tokens = config.get('added_tokens_decoder', {})
token_ids = sorted([int(k) for k in added_tokens.keys()])
print(f"Total special tokens: {len(token_ids)}")
print(f"Token ID range: {min(token_ids)} to {max(token_ids)}")
print(f"Contiguous range: {token_ids[0]}-{token_ids[-1]}")

# Check for specific tokens mentioned in review
special_token_names = {}
for token_id, token_info in added_tokens.items():
    content = token_info.get('content', '')
    special_token_names[content] = token_id

print(f"\nNamed special tokens mentioned in review:")
for name in ['<|begin_of_text|>', '<|end_of_text|>', '<|start_header_id|>', '<|end_header_id|>', '<|eom_id|>', '<|eot_id|>']:
    if name in special_token_names:
        print(f"  {name}: {special_token_names[name]}")

# Check other config fields
print(f"\nModel max length: {config.get('model_max_length')}")
print(f"Tokenizer class: {config.get('tokenizer_class')}")
print(f"BOS token: {config.get('bos_token')}")
print(f"EOS token: {config.get('eos_token')}")
EOF
Repository: TheApeMachine/caramba
Length of output: 1170
🏁 Script executed:
#!/bin/bash
# Check end of file and line 2060 specifically
echo "=== Lines 2055-2061 (end of file) ==="
tail -20 artifacts/checkpoints/tokenizer_config.json | head -20
# Count reserved tokens vs named tokens
echo -e "\n=== Token Classification ==="
python3 << 'EOF'
import json
with open('artifacts/checkpoints/tokenizer_config.json', 'r') as f:
    config = json.load(f)

added_tokens = config.get('added_tokens_decoder', {})

# Classify tokens
named_tokens = {}
reserved_tokens = {}
for token_id, token_info in added_tokens.items():
    content = token_info.get('content', '')
    if 'reserved_special_token' in content:
        reserved_tokens[content] = token_id
    else:
        named_tokens[content] = token_id

print(f"Named/special tokens (non-reserved): {len(named_tokens)}")
print(f"Reserved tokens: {len(reserved_tokens)}")
print(f"Total: {len(named_tokens) + len(reserved_tokens)}")
print(f"\nNamed tokens:")
for name in sorted(named_tokens.keys()):
    print(f"  {name}")
EOF
Repository: TheApeMachine/caramba
Length of output: 834
LGTM - Standard tokenizer configuration with extensive special tokens.
The tokenizer configuration is correctly structured for PreTrainedTokenizerFast with:
- 256 special tokens (IDs 128000-128255) properly mapped
- Begin/end of text tokens at 128000/128001
- Named special tokens for chat formatting (start_header_id, end_header_id, eom_id, eot_id, python_tag, finetune_right_pad_id)
- 248 reserved tokens for future extensibility
- Model max length of 131072 (128K context) aligns with Llama 3.2 specifications
Consider repository bloat: The artifacts directory is 11M, primarily driven by tokenizer.json (8.7M) and tokenizer.model (2.1M) rather than the config file itself (52K). For production deployments, consider:
- Using Git LFS for large model artifacts
- Fetching tokenizer configs from HuggingFace Hub at runtime (see the sketch below)
- Documenting why local copies are necessary if they must be included
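As a rough sketch of the runtime-fetch option (not part of this PR), assuming the committed checkpoint mirrors a Hub repo such as meta-llama/Llama-3.2-1B; the repo id and the fallback path are illustrative assumptions, not taken from this diff:
# Hedged sketch: resolve the tokenizer from the HuggingFace Hub at runtime and
# fall back to the lightweight in-repo config if the Hub is unreachable.
# "meta-llama/Llama-3.2-1B" is an assumed repo id, not confirmed by this PR.
from transformers import AutoTokenizer

def load_tokenizer(repo_id: str = "meta-llama/Llama-3.2-1B",
                   local_dir: str = "artifacts/checkpoints"):
    try:
        # Pulls tokenizer.json / tokenizer_config.json into the HF cache on demand,
        # so the large blobs never have to live in git history.
        return AutoTokenizer.from_pretrained(repo_id)
    except OSError:
        # Offline or gated-repo fallback: load whatever small config ships in-repo.
        return AutoTokenizer.from_pretrained(local_dir)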
🧰 Tools
🪛 Checkov (3.2.334)
[low] 2060-2061: Base64 High Entropy String
(CKV_SECRET_6)
🤖 Prompt for AI Agents
In @artifacts/checkpoints/tokenizer_config.json around lines 1 - 2061, Repo
artifacts are bloating the repository because large tokenizer files
(tokenizer.json ~8.7M and tokenizer.model ~2.1M) are committed under
artifacts/checkpoints alongside tokenizer_config.json; remove or offload them to
avoid heavy PRs. Fix by: (1) remove large binary tokenizer files from the commit
and add them to Git LFS (or delete and re-add via git rm --cached) so
tokenizer.json and tokenizer.model are tracked via LFS; (2) update
documentation/README to state that artifacts are fetched at runtime from the
HuggingFace Hub (use AutoTokenizer.from_pretrained or transformers'
from_pretrained for the model repo) and include a fallback local path for
tokenizer_config.json if you must keep a small config; (3) ensure CI/deployment
pulls tokenizer artifacts at build/runtime (reference files tokenizer.json,
tokenizer.model and tokenizer_config.json in artifacts/checkpoints and functions
that load them) so the repo contains only the lightweight config while large
blobs live in LFS or are downloaded on demand.
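A minimal sketch of item (3) above, assuming the CI/deployment environment can reach the HuggingFace Hub; the repo id, pattern list, and destination directory are illustrative assumptions:
# Hedged sketch: pull only the tokenizer artifacts at build/deploy time instead of
# committing them, using huggingface_hub.snapshot_download with a pattern filter.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="meta-llama/Llama-3.2-1B",                          # assumed repo id
    allow_patterns=["tokenizer*", "special_tokens_map.json"],   # only tokenizer files
    local_dir="artifacts/checkpoints",                          # keeps existing load paths intact
)
With that in place, only the lightweight tokenizer_config.json needs to stay in git, which matches the size breakdown in the review comment above.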
| **Llama 3.2** **Acceptable Use Policy** | ||
|
|
||
| Meta is committed to promoting safe and fair use of its tools and features, including Llama 3.2. If you access or use Llama 3.2, you agree to this Acceptable Use Policy (“**Policy**”). The most recent copy of this policy can be found at [https://www.llama.com/llama3_2/use-policy](https://www.llama.com/llama3_2/use-policy). | ||
|
|
||
| **Prohibited Uses** | ||
|
|
||
| We want everyone to use Llama 3.2 safely and responsibly. You agree you will not use, or allow others to use, Llama 3.2 to: | ||
|
|
||
|
|
||
|
|
||
| 1. Violate the law or others’ rights, including to: | ||
| 1. Engage in, promote, generate, contribute to, encourage, plan, incite, or further illegal or unlawful activity or content, such as: | ||
| 1. Violence or terrorism | ||
| 2. Exploitation or harm to children, including the solicitation, creation, acquisition, or dissemination of child exploitative content or failure to report Child Sexual Abuse Material | ||
| 3. Human trafficking, exploitation, and sexual violence | ||
| 4. The illegal distribution of information or materials to minors, including obscene materials, or failure to employ legally required age-gating in connection with such information or materials. | ||
| 5. Sexual solicitation | ||
| 6. Any other criminal activity | ||
| 1. Engage in, promote, incite, or facilitate the harassment, abuse, threatening, or bullying of individuals or groups of individuals | ||
| 2. Engage in, promote, incite, or facilitate discrimination or other unlawful or harmful conduct in the provision of employment, employment benefits, credit, housing, other economic benefits, or other essential goods and services | ||
| 3. Engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or related professional practices | ||
| 4. Collect, process, disclose, generate, or infer private or sensitive information about individuals, including information about individuals’ identity, health, or demographic information, unless you have obtained the right to do so in accordance with applicable law | ||
| 5. Engage in or facilitate any action or generate any content that infringes, misappropriates, or otherwise violates any third-party rights, including the outputs or results of any products or services using the Llama Materials | ||
| 6. Create, generate, or facilitate the creation of malicious code, malware, computer viruses or do anything else that could disable, overburden, interfere with or impair the proper working, integrity, operation or appearance of a website or computer system | ||
| 7. Engage in any action, or facilitate any action, to intentionally circumvent or remove usage restrictions or other safety measures, or to enable functionality disabled by Meta | ||
| 2. Engage in, promote, incite, facilitate, or assist in the planning or development of activities that present a risk of death or bodily harm to individuals, including use of Llama 3.2 related to the following: | ||
| 8. Military, warfare, nuclear industries or applications, espionage, use for materials or activities that are subject to the International Traffic Arms Regulations (ITAR) maintained by the United States Department of State or to the U.S. Biological Weapons Anti-Terrorism Act of 1989 or the Chemical Weapons Convention Implementation Act of 1997 | ||
| 9. Guns and illegal weapons (including weapon development) | ||
| 10. Illegal drugs and regulated/controlled substances | ||
| 11. Operation of critical infrastructure, transportation technologies, or heavy machinery | ||
| 12. Self-harm or harm to others, including suicide, cutting, and eating disorders | ||
| 13. Any content intended to incite or promote violence, abuse, or any infliction of bodily harm to an individual | ||
| 3. Intentionally deceive or mislead others, including use of Llama 3.2 related to the following: | ||
| 14. Generating, promoting, or furthering fraud or the creation or promotion of disinformation | ||
| 15. Generating, promoting, or furthering defamatory content, including the creation of defamatory statements, images, or other content | ||
| 16. Generating, promoting, or further distributing spam | ||
| 17. Impersonating another individual without consent, authorization, or legal right | ||
| 18. Representing that the use of Llama 3.2 or outputs are human-generated | ||
| 19. Generating or facilitating false online engagement, including fake reviews and other means of fake online engagement | ||
| 4. Fail to appropriately disclose to end users any known dangers of your AI system | ||
| 5. Interact with third party tools, models, or software designed to generate unlawful content or engage in unlawful or harmful conduct and/or represent that the outputs of such tools, models, or software are associated with Meta or Llama 3.2 | ||
|
|
||
| With respect to any multimodal models included in Llama 3.2, the rights granted under Section 1(a) of the Llama 3.2 Community License Agreement are not being granted to you if you are an individual domiciled in, or a company with a principal place of business in, the European Union. This restriction does not apply to end users of a product or service that incorporates any such multimodal models. |
Fix markdownlint blockers (headings, ordered-list prefixes, trailing spaces).
If markdownlint-cli2 is enforced, current MD041/MD036/MD029/MD009 findings will fail CI.
Proposed fix (format-only)
-**Llama 3.2** **Acceptable Use Policy**
+# Llama 3.2 Acceptable Use Policy
@@
-**Prohibited Uses**
+## Prohibited Uses
@@
-1. Violate the law or others’ rights, including to:
+1. Violate the law or others’ rights, including to:
1. Engage in, promote, generate, contribute to, encourage, plan, incite, or further illegal or unlawful activity or content, such as:
@@
-2. Engage in, promote, incite, facilitate, or assist in the planning or development of activities that present a risk of death or bodily harm to individuals, including use of Llama 3.2 related to the following:
+1. Engage in, promote, incite, facilitate, or assist in the planning or development of activities that present a risk of death or bodily harm to individuals, including use of Llama 3.2 related to the following:
@@
-3. Intentionally deceive or mislead others, including use of Llama 3.2 related to the following:
+1. Intentionally deceive or mislead others, including use of Llama 3.2 related to the following:
@@
-4. Fail to appropriately disclose to end users any known dangers of your AI system
-5. Interact with third party tools, models, or software designed to generate unlawful content or engage in unlawful or harmful conduct and/or represent that the outputs of such tools, models, or software are associated with Meta or Llama 3.2
+1. Fail to appropriately disclose to end users any known dangers of your AI system
+1. Interact with third-party tools, models, or software designed to generate unlawful content or engage in unlawful or harmful conduct and/or represent that the outputs of such tools, models, or software are associated with Meta or Llama 3.2
🧰 Tools
🪛 LanguageTool
[grammar] ~41-~41: Use a hyphen to join words.
Context: ...of your AI system 5. Interact with third party tools, models, or software designe...
(QB_NEW_EN_HYPHEN)
[style] ~42-~42: ‘With respect to’ might be wordy. Consider a shorter alternative.
Context: ... are associated with Meta or Llama 3.2 With respect to any multimodal models included in Llama...
(EN_WORDINESS_PREMIUM_WITH_RESPECT_TO)
🪛 markdownlint-cli2 (0.18.1)
1-1: First line in a file should be a top-level heading
(MD041, first-line-heading, first-line-h1)
5-5: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
20-20: Ordered list item prefix
Expected: 1; Actual: 2; Style: 1/1/1
(MD029, ol-prefix)
21-21: Ordered list item prefix
Expected: 1; Actual: 3; Style: 1/1/1
(MD029, ol-prefix)
22-22: Ordered list item prefix
Expected: 1; Actual: 4; Style: 1/1/1
(MD029, ol-prefix)
23-23: Ordered list item prefix
Expected: 1; Actual: 5; Style: 1/1/1
(MD029, ol-prefix)
24-24: Ordered list item prefix
Expected: 1; Actual: 6; Style: 1/1/1
(MD029, ol-prefix)
25-25: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
25-25: Ordered list item prefix
Expected: 1; Actual: 7; Style: 1/1/1
(MD029, ol-prefix)
39-39: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
🤖 Prompt for AI Agents
In @artifacts/checkpoints/USE_POLICY.md around lines 1 - 43, Convert the initial
bold title into a proper Markdown heading (e.g., use a single-level heading like
"# Llama 3.2 Acceptable Use Policy") and ensure a blank line follows it;
normalize ordered-list markers so nested and top-level numbered lists use a
consistent prefix style (prefer using "1." for every ordered item per MD029 or
sequential numbers consistently), fix nested indentation for sublists so they
render as distinct levels (use two or four spaces per indent), remove all
trailing spaces and hard tabs (address MD009), and ensure there is a blank line
before/after block-level elements (lists/headings) to satisfy
MD041/MD036/MD029/MD009 checks.
| Please report any violation of this Policy, software “bug,” or other problems that could lead to a violation of this Policy through one of the following means: | ||
|
|
||
|
|
||
|
|
||
| * Reporting issues with the model: [https://github.com/meta-llama/llama-models/issues](https://l.workplace.com/l.php?u=https%3A%2F%2Fgithub.com%2Fmeta-llama%2Fllama-models%2Fissues&h=AT0qV8W9BFT6NwihiOHRuKYQM_UnkzN_NmHMy91OT55gkLpgi4kQupHUl0ssR4dQsIQ8n3tfd0vtkobvsEvt1l4Ic6GXI2EeuHV8N08OG2WnbAmm0FL4ObkazC6G_256vN0lN9DsykCvCqGZ) | ||
| * Reporting risky content generated by the model: [developers.facebook.com/llama_output_feedback](http://developers.facebook.com/llama_output_feedback) | ||
| * Reporting bugs and security concerns: [facebook.com/whitehat/info](http://facebook.com/whitehat/info) | ||
| * Reporting violations of the Acceptable Use Policy or unlicensed uses of Llama 3.2: LlamaUseReport@meta.com |
Replace the redirected/mismatched GitHub Issues link target; prefer https links.
Line 49’s link text is GitHub, but the href is a l.workplace.com redirect—please make the href the direct GitHub URL (and consider upgrading the http:// links on Lines 50-51 to https:// if supported).
Proposed fix (safer URLs)
-* Reporting issues with the model: [https://github.com/meta-llama/llama-models/issues](https://l.workplace.com/l.php?u=https%3A%2F%2Fgithub.com%2Fmeta-llama%2Fllama-models%2Fissues&h=AT0qV8W9BFT6NwihiOHRuKYQM_UnkzN_NmHMy91OT55gkLpgi4kQupHUl0ssR4dQsIQ8n3tfd0vtkobvsEvt1l4Ic6GXI2EeuHV8N08OG2WnbAmm0FL4ObkazC6G_256vN0lN9DsykCvCqGZ)
-* Reporting risky content generated by the model: [developers.facebook.com/llama_output_feedback](http://developers.facebook.com/llama_output_feedback)
-* Reporting bugs and security concerns: [facebook.com/whitehat/info](http://facebook.com/whitehat/info)
+* Reporting issues with the model: [https://github.com/meta-llama/llama-models/issues](https://github.com/meta-llama/llama-models/issues)
+* Reporting risky content generated by the model: [developers.facebook.com/llama_output_feedback](https://developers.facebook.com/llama_output_feedback)
+* Reporting bugs and security concerns: [facebook.com/whitehat/info](https://facebook.com/whitehat/info)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| Please report any violation of this Policy, software “bug,” or other problems that could lead to a violation of this Policy through one of the following means: | |
| * Reporting issues with the model: [https://github.com/meta-llama/llama-models/issues](https://l.workplace.com/l.php?u=https%3A%2F%2Fgithub.com%2Fmeta-llama%2Fllama-models%2Fissues&h=AT0qV8W9BFT6NwihiOHRuKYQM_UnkzN_NmHMy91OT55gkLpgi4kQupHUl0ssR4dQsIQ8n3tfd0vtkobvsEvt1l4Ic6GXI2EeuHV8N08OG2WnbAmm0FL4ObkazC6G_256vN0lN9DsykCvCqGZ) | |
| * Reporting risky content generated by the model: [developers.facebook.com/llama_output_feedback](http://developers.facebook.com/llama_output_feedback) | |
| * Reporting bugs and security concerns: [facebook.com/whitehat/info](http://facebook.com/whitehat/info) | |
| * Reporting violations of the Acceptable Use Policy or unlicensed uses of Llama 3.2: LlamaUseReport@meta.com | |
| Please report any violation of this Policy, software "bug," or other problems that could lead to a violation of this Policy through one of the following means: | |
| * Reporting issues with the model: [https://github.com/meta-llama/llama-models/issues](https://github.com/meta-llama/llama-models/issues) | |
| * Reporting risky content generated by the model: [developers.facebook.com/llama_output_feedback](https://developers.facebook.com/llama_output_feedback) | |
| * Reporting bugs and security concerns: [facebook.com/whitehat/info](https://facebook.com/whitehat/info) | |
| * Reporting violations of the Acceptable Use Policy or unlicensed uses of Llama 3.2: LlamaUseReport@meta.com |
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
52-52: Bare URL used
(MD034, no-bare-urls)
🤖 Prompt for AI Agents
In @artifacts/checkpoints/USE_POLICY.md around lines 45 - 52, Replace the
redirect/incorrect hrefs in the policy links so the visible link text points
directly to the intended secure URLs: update the "Reporting issues with the
model" link to use the direct HTTPS URL
https://github.com/meta-llama/llama-models/issues instead of the l.workplace.com
redirect, and change the other two links to HTTPS (use
https://developers.facebook.com/llama_output_feedback and
https://facebook.com/whitehat/info) while leaving the LlamaUseReport@meta.com
contact as-is; ensure the markdown link targets (the hrefs) are the direct
secure URLs rather than redirects.
research/ccp/notes copy.md
Outdated
| Complete MOSAIC Differentiable VM Architecture | ||
| This plan implements the five missing architectural elements discussed in the meeting notes to transform MOSAIC into a fully autonomous "Differentiable Virtual Machine." | ||
|
|
||
| Current State Analysis | ||
| What exists: | ||
|
|
||
| HomeostaticLoop, IntrinsicDrive, DriveBand in core/homeostasis.py - primitives only, not integrated | ||
| MosaicOpcode ISA in layer/mosaic/isa.py - defines 10 opcodes (NOP, READ_MEM, WRITE_MEM, CLEAR_MEM, IDLE, GATE_UP, GATE_DOWN, SCAN, COMMIT, RESPOND) | ||
| opcode_head in MemoryBlockLayer - emit-only, does not control behavior | ||
| sleep_replay_per_pair in datasets - training data exists, no runtime loop |
❓ Verification inconclusive
Script executed:
# Check if the file exists and read initial content
find . -type f -name "notes copy.md" -o -name "*notes*copy*"
Repository: TheApeMachine/caramba
Exit code: 0
stdout:
./research/ccp/notes copy.md
Script executed:
# Also search more broadly for this path
fd -t f "notes copy" .
Repository: TheApeMachine/caramba
Exit code: 0
stdout:
./research/ccp/notes copy.md
Script executed:
# List files in the research/ccp directory to understand structure
ls -la research/ccp/ 2>/dev/null || echo "Directory not found at research/ccp"
Repository: TheApeMachine/caramba
Exit code: 0
stdout:
total 16792
drwxr-xr-x 2 jailuser jailuser 140 Jan 10 22:36 .
drwxr-xr-x 5 jailuser jailuser 100 Jan 10 22:36 ..
-rw-r--r-- 1 jailuser jailuser 605422 Jan 10 22:36 Cognitive Control Plane.pdf
-rw-r--r-- 1 jailuser jailuser 16495005 Jan 10 22:36 Cognitive_Control_Plane_MOSAIC_Runtime.pdf
-rw-r--r-- 1 jailuser jailuser 8039 Jan 10 22:36 notes copy.md
-rw-r--r-- 1 jailuser jailuser 67873 Jan 10 22:36 paper.tex
-rw-r--r-- 1 jailuser jailuser 10941 Jan 10 22:36 references.bib
Script executed:
# Search for fp32 and Metal-related content in the repository
rg -l "fp32|FP32" --type md --max-count 20Repository: TheApeMachine/caramba
Repository: TheApeMachine/caramba
Exit code: 0
stdout:
docs/inference.md
CLAUDE.md
Script executed:
# Search for Metal-related mentions
rg -l "metal|Metal" --type md --max-count 20Repository: TheApeMachine/caramba
Repository: TheApeMachine/caramba
Exit code: 0
stdout:
docs/optimization.md
docs/getting-started.md
docker/OPTIMIZATION.md
README.md
artifacts/ai/meeting_notes/temp.md
CLAUDE.md
Script executed:
# Read the actual file content to verify scope
cat -n ./research/ccp/notes\ copy.md | head -80
Repository: TheApeMachine/caramba
Exit code: 0
stdout:
1 Complete MOSAIC Differentiable VM Architecture
2 This plan implements the five missing architectural elements discussed in the meeting notes to transform MOSAIC into a fully autonomous "Differentiable Virtual Machine."
3
4 Current State Analysis
5 What exists:
6
7 HomeostaticLoop, IntrinsicDrive, DriveBand in core/homeostasis.py - primitives only, not integrated
8 MosaicOpcode ISA in layer/mosaic/isa.py - defines 10 opcodes (NOP, READ_MEM, WRITE_MEM, CLEAR_MEM, IDLE, GATE_UP, GATE_DOWN, SCAN, COMMIT, RESPOND)
9 opcode_head in MemoryBlockLayer - emit-only, does not control behavior
10 sleep_replay_per_pair in datasets - training data exists, no runtime loop
11 EventResponder + ModelHandler in infer/event_runtime.py - reactive only (prompt-triggered)
12 mosaic_idle.yml preset - agent process exists but is a research loop, not dVM idle
13 What's missing:
14
15 Opcodes don't control registers/memory (just logged)
16 No impulse-driven awakening (only prompt-triggered)
17 No idle-time compute loop (model waits passively)
18 No medium/slow timescale learning (only fast writes)
19 No tool creation mechanism (only tool usage)
20 ---
21
22 Phase 1: Wire Opcodes to Control Behavior
23 Currently opcodes are emit-only. Wire them to actually gate subsystem operations.
24
25 Files to modify:
26
27 layer/mosaic/block.py - Add opcode dispatch logic
28 Implementation:
29
30 # In _process_token or forward loop:
31 if self.opcodes_enabled:
32 op_logits = self.opcode_head(u_t)
33 op_id = op_logits.argmax(dim=-1) # Hard decode
34
35 # Use STE for training gradients
36 op_soft = F.softmax(op_logits, dim=-1)
37 op_hard = F.one_hot(op_id, self.opcode_vocab)
38 op_sel = (op_hard - op_soft).detach() + op_soft
39
40 # Gate operations based on opcode
41 do_read = op_sel[:, MosaicOpcode.READ_MEM]
42 do_write = op_sel[:, MosaicOpcode.WRITE_MEM] * write_gate
43 do_clear = op_sel[:, MosaicOpcode.CLEAR_MEM]
44 Behavior mapping:
45
46 | Opcode | Effect |
47
48 |--------|--------|
49
50 | READ_MEM | Enable memory read path |
51
52 | WRITE_MEM | Gate write operation (AND with existing write_gate) |
53
54 | CLEAR_MEM | Zero target register/memory slot |
55
56 | GATE_UP/DOWN | Boost/suppress fusion gates |
57
58 | IDLE | Suppress output (internal consolidation) |
59
60 | COMMIT | Trigger commitment delta |
61
62 ---
63
64 Phase 2: Integrate Homeostatic Impulses into Runtime
65 Connect HomeostaticLoop to the inference runtime so the model can self-activate.
66
67 Files to modify:
68
69 infer/event_runtime.py - Add impulse handling
70 New file: infer/autonomous_runtime.py - Idle loop with homeostasis
71 Key metrics to expose:
72
73 Memory utilization (write rate, occupancy)
74 Routing entropy
75 Commitment balance (open vs closed)
76 Output confidence (logit entropy)
77 Implementation skeleton:
78
79 @dataclass
80 class AutonomousRuntime:
Script executed:
# Check file size and line count
wc -l ./research/ccp/notes\ copy.md
Repository: TheApeMachine/caramba
Exit code: 0
stdout:
261 ./research/ccp/notes copy.md
Script executed:
# Search for any mentions of fp32, FP32, metal, or Metal in this file
rg -i "fp32|metal" ./research/ccp/notes\ copy.mdRepository: TheApeMachine/caramba
Repository: TheApeMachine/caramba
Exit code: 1
Script executed:
# Check git status and recent commits to understand PR context
git log --oneline -20 2>/dev/null | head -20
Repository: TheApeMachine/caramba
Exit code: 0
stdout:
45e6115 Merge branch 'main' into feature/fp32-support-for-metal
Script executed:
# Look for any other files with fp32 or metal in their names or recent changes
git diff --name-only HEAD~5..HEAD 2>/dev/null | head -20
Repository: TheApeMachine/caramba
Exit code: 0
Script executed:
# Check git status to see all modified/added files in the current branch
git diff --name-only main..HEAD 2>/dev/null | head -30
Repository: TheApeMachine/caramba
Exit code: 0
stdout:
.vscode/c_cpp_properties.json
artifacts/checkpoints/.gitattributes
artifacts/checkpoints/LICENSE.txt
artifacts/checkpoints/README.md
artifacts/checkpoints/USE_POLICY.md
artifacts/checkpoints/config.json
artifacts/checkpoints/generation_config.json
artifacts/checkpoints/original/params.json
artifacts/checkpoints/original/tokenizer.model
artifacts/checkpoints/special_tokens_map.json
artifacts/checkpoints/tokenizer.json
artifacts/checkpoints/tokenizer_config.json
artifacts/mosaic/paper.pdf
config/compute.py
config/compute_test.py
config/defaults.py
config/layer.py
config/presets/mosaic_resonant.yml
config/target.py
core/__init__.py
core/event_codec/__init__.py
core/event_codec/binary_codec.py
core/event_codec/binary_codec_test.py
core/event_codec/capnp_codec.py
core/event_codec/capnp_codec_test.py
core/event_codec/event.capnp
core/event_codec/json_codec.py
core/event_codec/stream_parser.py
examples/lightning_vastai_demo.yml
experiment/runner.py
Script executed:
# Search the entire codebase for actual fp32 or metal implementations
rg -i "fp32" --type py --max-count 10Repository: TheApeMachine/caramba
Repository: TheApeMachine/caramba
Exit code: 0
stdout:
trainer/standard.py: # TF32 is safe and commonly enabled on Ampere+ for faster fp32 matmuls
trainer/standard.py: # (many reductions / softmax stats run in fp32 even when weights are bf16).
trainer/standard.py: "Supported: MPS fp16/fp32, CUDA fp16/bf16."
trainer/steppers/global_stepper.py: # IMPORTANT: compute CE in fp32 for numerical stability (esp. MPS + 128k vocab).
trainer/steppers/global_stepper.py: logger.warning("Disabling fp16 autocast on MPS (no GradScaler); using fp32 math for stability.")
trainer/steppers/global_orchestrated.py: # MPS: fp16 *weights* + AdamW-style updates is a common source of NaNs (no fp32 master weights).
trainer/chunked_cross_entropy.py: # Use fp32 for stability/perf, but preserve fp64 for gradcheck correctness.
trainer/chunked_cross_entropy.py: # Compute LSE in fp32 for stability; matmul uses tensor cores for bf16/fp16.
trainer/chunked_cross_entropy.py: # Recompute LSE in compute_dtype (fp32 fast path, fp64 for gradcheck).
trainer/diffusion_codegen/training.py: if d in {"float32", "fp32"}:
orchestrator/strategy.py: # MPS + fp16 weights + AdamW can poison weights without fp32 master params.
orchestrator/nowcast.py: "predictor_dtype": str(self._pred_dtype) if self._pred_dtype is not None else "fp32",
optimizer/dba_attention_triton.py: self._require(q_sem.dtype in (torch.float16, torch.bfloat16, torch.float32), msg="DBA attention supports fp16/bf16/fp32.")
optimizer/dba_attention_triton_kernels_fwd.py: # (accumulation still happens in fp32).
optimizer/fused_attention.py: """Single source of truth for decoupled score computation in fp32.
optimizer/fused_attention.py: fp32 accumulation avoids mixed-precision drift when validating policies.
optimizer/adamw_master.py:"""AdamW with fp32 master weights for fp16 models (MPS-safe).
optimizer/adamw_master.py:- The standard fix is to keep fp32 "master" weights for the optimizer update,
optimizer/adamw_master.py:but maintains AdamW state + updates in fp32.
optimizer/adamw_master.py: """AdamW optimizer with fp32 master weights stored in state."""
optimizer/adamw_master.py: # fp32 master copy + fp32 moments
optimizer/kernels.py: msg=f"RMSNorm on MPS requires fp16/fp32, got dtype={x.dtype}.",
optimizer/kernels.py: msg=f"RoPE on MPS requires fp16/fp32, got dtype={x.dtype}.",
optimizer/kernels.py: msg=f"LayerNorm on MPS requires fp16/fp32, got dtype={x.dtype}.",
optimizer/kernels.py: msg=f"AdamW step on MPS requires fp16/fp32 params, got dtype={p.dtype}.",
optimizer/kernels.py: msg="AdamW step on CUDA requires fp32 master/exp_avg/exp_avg_sq.",
optimizer/kernels.py: msg=f"Lion step on MPS requires fp16/fp32 params, got dtype={p.dtype}.",
optimizer/metal/adamw.py: raise RuntimeError("Metal AdamWMaster requires fp16/fp32 p/grad (matching)")
optimizer/metal/adamw.py: raise RuntimeError("Metal AdamWMaster requires fp32 master/exp_avg/exp_avg_sq")
optimizer/metal/rmsnorm.py: raise RuntimeError("Metal RMSNorm currently supports fp16/fp32 only")
optimizer/metal/rmsnorm.py: """Fused RMSNorm (MPS/Metal) for fp16/fp32 tensors.
optimizer/metal/rmsnorm.py: x: (..., D) fp16/fp32 tensor on MPS (contiguous required)
optimizer/metal/rmsnorm.py: weight: (D,) fp16/fp32 tensor on MPS, or None for no affine scale
optimizer/metal/rmsnorm.py: raise RuntimeError("Metal RMSNorm currently supports fp16/fp32 only")
optimizer/metal/rope.py: raise RuntimeError("Metal RoPE currently supports fp16/fp32 only")
optimizer/metal/rope.py: """Apply RoPE using the Metal extension (fp16/fp32).
optimizer/metal/rope.py: raise RuntimeError("Metal RoPE currently supports fp16/fp32 only")
optimizer/metal/lion.py: raise RuntimeError("Metal Lion currently supports fp16/fp32 tensors only (matching)")
optimizer/metal/layernorm.py: raise RuntimeError("Metal LayerNorm currently supports fp16/fp32 only")
optimizer/metal/layernorm.py: """Fused LayerNorm (MPS/Metal) for fp16/fp32 tensors.
optimizer/metal/layernorm.py: raise RuntimeError("Metal LayerNorm currently supports fp16/fp32 only")
optimizer/metal/ssm_test.py: # Reference in fp32 for numerics, cast back to fp16 like other Metal tests.
optimizer/metal/rope_test.py: # Reference in fp32 for numerics.
optimizer/metal/rmsnorm_test.py: # Reference in fp32 for better numerics.
optimizer/metal/layernorm_test.py: # Reference in fp32 for numerics.
optimizer/metal/dba_decode_test.py: # Reference (PyTorch) in fp32 for numerics.
optimizer/flash_attention_triton_kernels_fwd.py: # (accumulation still happens in fp32).
optimizer/flash_attention_triton.py: self._require(q.dtype in (torch.float16, torch.bfloat16, torch.float32), msg="FlashAttention supports fp16/bf16/fp32.")
optimizer/adamw_triton_kernels.py:- State: fp32 master weights + fp32 exp_avg + fp32 exp_avg_sq (updated)
optimizer/adamw_triton_kernels.py: # Load grad + state. Grad/param are fp16 or bf16; compute in fp32.
optimizer/adamw_triton.py: _require(master.dtype == torch.float32, msg="AdamW master must be fp32.")
optimizer/adamw_triton.py: _require(exp_avg.dtype == torch.float32 and exp_avg_sq.dtype == torch.float32, msg="AdamW moments must be fp32.")
infer/cache_policy.py: if kind in (KVCacheKind.FP16, KVCacheKind.FP32):
data/transforms/cast_dtype.py: Parses common dtype name variations (like "fp32" or "float32") into PyTorch
data/transforms/cast_dtype.py: if t in ("float32", "fp32"):
config/kvcache.py: FP16/FP32: Full precision (highest quality, most memory)
config/kvcache.py: FP32 = "fp32"
cache/tensor.py: - fp16/fp32: Full precision storage
cache/tensor.py: if self.kind in ("fp16", "fp32"):
cache/tensor.py: return self.kind not in ("fp16", "fp32")
cache/tensor.py: if self.kind in ("fp16", "fp32"):
cache/tensor.py: raise RuntimeError("Expected fp buffer for fp16/fp32 cache")
cache/tensor.py: if self.kind in ("fp16", "fp32"):
cache/tensor.py: raise RuntimeError("Expected fp buffer for fp16/fp32 cache")
cache/tensor.py: if self.kind in ("fp16", "fp32"):
cache/tensor.py: raise RuntimeError("Expected fp buffer for fp16/fp32 cache")
benchmark/memory.py: "fp32": 4.0,
cache/layer.py: synchronized. Supports both fp16/fp32 storage and quantized formats.
carmath/carmath_test.py: assert bytes_per_kind("fp32") == 4.0
carmath/bytes.py: - fp32, fp16
carmath/bytes.py: if k == "fp32":
Script executed:
# Search for metal-related code in Python files
rg -i "metal" --type py --max-count 10Repository: TheApeMachine/caramba
Repository: TheApeMachine/caramba
Exit code: 0
stdout:
runtime/readiness_test.py: monkeypatch.setattr(readiness, "metal_supported", False)
runtime/readiness_test.py: monkeypatch.setattr(readiness, "metal_supported", False)
runtime/readiness_test.py: monkeypatch.setattr(readiness, "metal_build_tools_available", True)
runtime/readiness_test.py:def test_metal_build_tools_missing_is_error_or_warning_based_on_best_effort(monkeypatch) -> None:
runtime/readiness_test.py: monkeypatch.setattr(readiness, "metal_supported", True)
runtime/readiness_test.py: monkeypatch.setattr(readiness, "metal_build_tools_available", False)
runtime/readiness_test.py: assert any(i.code == "metal_build_tools_missing" for i in rep_err.errors)
runtime/readiness_test.py: assert any(i.code == "metal_build_tools_missing" for i in rep_warn.warnings)
runtime/readiness.py:- Missing *performance* backends (Metal/Triton) are **errors** by default.
runtime/readiness.py: metal_build_tools_available,
runtime/readiness.py: metal_supported,
runtime/readiness.py: # Metal (MPS) fused decode is used for fp16 caches on MPS.
runtime/readiness.py: if not _probe_flag(metal_supported):
runtime/readiness.py: elif not _probe_flag(metal_build_tools_available):
runtime/readiness.py: code="metal_build_tools_missing",
runtime/readiness.py: "Metal build tools are not available (missing `metal`/`metallib` in the active Xcode toolchain); "
runtime/readiness.py: " - `xcrun -sdk macosx --find metal`\n"
runtime/readiness.py: " - `xcrun -sdk macosx --find metallib`"
layer/ssm.py: from caramba.optimizer.metal import MetalSSMSelectiveScan
layer/ssm.py: scan = MetalSSMSelectiveScan()
layer/attention/decoupled/layer.py: allowed = {"auto", "triton", "sdpa", "metal"}
layer/attention/decoupled/layer.py: # "metal" is a manifest-friendly alias for the SDPA-style path on MPS.
layer/attention/decoupled/layer.py: force_sdpa = (dba_backend == "sdpa" or dba_backend == "metal")
layer/attention/decoupled/decode.py: from caramba.optimizer.metal import dba_decode_fp16, metal_dba_decode_available
layer/attention/decoupled/decode.py: if not metal_dba_decode_available():
layer/attention/decoupled/decode.py: raise RuntimeError("Metal DBA decode kernel is unavailable on this runtime.")
layer/attention/decoupled/decode.py: "Metal DBA decode requires fp16 KV caches on MPS.\n"
experiment/runner.py: if best_effort and w.code in {"metal_build_tools_missing", "triton_missing"}:
core/platform.py: "metal": "mps",
config/layer.py: # Note: "metal" is accepted for MPS/Metal runs and maps to an SDPA-style path
config/layer.py: dba_train_backend: Literal["auto", "triton", "sdpa", "metal"] = "auto"
config/defaults.py: # If true, allow "best effort" fallbacks for optional perf backends (Metal/Triton),
optimizer/kernels.py:_METAL_IMPORT_ERROR: Exception | None = None
optimizer/kernels.py:_MetalSSMSelectiveScan: Any | None = None
optimizer/kernels.py: from caramba.optimizer.metal import (
optimizer/kernels.py: MetalSSMSelectiveScan as _MetalSSMSelectiveScan,
optimizer/kernels.py: _METAL_IMPORT_ERROR = e
optimizer/kernels.py: bool(KERNELS.mps_available and KERNELS.metal_ops_loaded),
optimizer/kernels.py: msg="RMSNorm on MPS requires the Metal extension to be available and loaded at startup.",
optimizer/kernels.py: raise RuntimeError(f"Metal RMSNorm import failed: {_METAL_IMPORT_ERROR!r}")
optimizer/kernels.py: bool(KERNELS.mps_available and KERNELS.metal_ops_loaded),
optimizer/kernels.py: msg="RoPE on MPS requires the Metal extension to be available and loaded at startup.",
optimizer/adamw_master.py: # Fast path: fused HAL kernel (Metal on MPS, Triton on CUDA).
optimizer/triton_runtime.py:The Metal/MPS fused DBA work requires backend-agnostic detection, which now lives in
optimizer/runtime.py:"""Backend availability detection (Triton + Metal/MPS).
optimizer/runtime.py:- Metal (MPS) for Apple Silicon fused DBA decode (custom MSL kernel + ObjC++ bridge)
optimizer/runtime.py: "metal_supported",
optimizer/runtime.py:def metal_build_tools_available() -> bool:
optimizer/runtime.py: """Whether the host can compile Metal shaders via Xcode toolchain.
optimizer/runtime.py: must contain the `metal` and `metallib` tools.
optimizer/runtime.py: # Do not call metal_supported() here: metal_supported() is a *runtime* check,
optimizer/runtime.py: subprocess.check_output(["xcrun", "-sdk", "macosx", "--find", "metal"], stderr=subprocess.STDOUT)
optimizer/runtime.py: subprocess.check_output(["xcrun", "-sdk", "macosx", "--find", "metallib"], stderr=subprocess.STDOUT)
optimizer/runtime.py:def metal_supported() -> bool:
optimizer/metal/layernorm_test.py:from caramba.optimizer.runtime import metal_supported
optimizer/metal/layernorm_test.py:def metal_ops() -> object:
optimizer/metal/layernorm_test.py: """Build/load the Metal extension or skip the suite."""
optimizer/metal/layernorm_test.py: from caramba.optimizer.metal.jit import load_caramba_metal_ops
optimizer/metal/layernorm_test.py: return load_caramba_metal_ops(verbose=False)
optimizer/metal/layernorm_test.py: pytest.skip(f"caramba metal extension unavailable: {e}")
optimizer/metal/layernorm_test.py:@pytest.mark.skipif(not metal_supported(), reason="Metal/MPS not supported on this platform")
optimizer/metal/layernorm_test.py:def test_metal_layernorm_matches_reference(metal_ops: object) -> None:
optimizer/metal/layernorm_test.py: _ = metal_ops
optimizer/metal/layernorm_test.py: from caramba.optimizer.metal import layernorm_fp16
optimizer/metal/rope.py:"""Fused RoPE wrapper for the Metal extension."""
optimizer/metal/rope.py:from caramba.optimizer.runtime import metal_supported
optimizer/metal/rope.py:from .jit import load_caramba_metal_ops
optimizer/metal/rope.py:def metal_rope_available() -> bool:
optimizer/metal/rope.py: """Whether the runtime is capable of using the Metal RoPE path."""
optimizer/metal/rope.py: return metal_supported()
optimizer/metal/rope.py:class _MetalRoPEFn(torch.autograd.Function):
optimizer/metal/rope.py: raise RuntimeError("Metal RoPE requires device.type == 'mps'")
optimizer/metal/rope.py: raise RuntimeError("Metal RoPE currently supports fp16/fp32 only")
optimizer/metal/rope.py: ops = load_caramba_metal_ops(verbose=bool(verbose_build))
optimizer/metal/rmsnorm.py:"""Fused RMSNorm wrapper for the Metal extension."""
optimizer/metal/rmsnorm.py:from caramba.optimizer.runtime import metal_supported
optimizer/metal/rmsnorm.py:from .jit import load_caramba_metal_ops
optimizer/metal/rmsnorm.py:def metal_rmsnorm_available() -> bool:
optimizer/metal/rmsnorm.py: """Whether the runtime is capable of using the Metal RMSNorm path."""
optimizer/metal/rmsnorm.py: return metal_supported()
optimizer/metal/rmsnorm.py:class _MetalRMSNormFn(torch.autograd.Function):
optimizer/metal/rmsnorm.py: raise RuntimeError("Metal RMSNorm requires device.type == 'mps'")
optimizer/metal/rmsnorm.py: raise RuntimeError("Metal RMSNorm currently supports fp16/fp32 only")
optimizer/metal/rmsnorm.py: ops = load_caramba_metal_ops(verbose=bool(verbose_build))
optimizer/metal/lion.py:"""Fused Lion optimizer update wrapper for the Metal extension."""
optimizer/metal/lion.py:from caramba.optimizer.runtime import metal_supported
optimizer/metal/lion.py:from .jit import load_caramba_metal_ops
optimizer/metal/lion.py:def metal_lion_available() -> bool:
optimizer/metal/lion.py: return metal_supported()
optimizer/metal/lion.py: raise RuntimeError("Metal Lion requires device.type == 'mps'")
optimizer/metal/lion.py: raise RuntimeError("Metal Lion currently supports fp16/fp32 tensors only (matching)")
optimizer/metal/lion.py: f"Metal Lion requires matching shapes for p/grad/m, got p={tuple(p.shape)}, grad={tuple(grad.shape)}, m={tuple(m.shape)}"
optimizer/metal/lion.py: ops = load_caramba_metal_ops(verbose=bool(verbose_build))
optimizer/metal/layernorm.py:"""Fused LayerNorm wrapper for the Metal extension."""
optimizer/metal/layernorm.py:from caramba.optimizer.runtime import metal_supported
optimizer/metal/layernorm.py:from .jit import load_caramba_metal_ops
optimizer/metal/layernorm.py:def metal_layernorm_available() -> bool:
optimizer/metal/layernorm.py: """Whether the runtime is capable of using the Metal LayerNorm path."""
optimizer/metal/layernorm.py: return metal_supported()
optimizer/metal/layernorm.py:class _MetalLayerNormFn(torch.autograd.Function):
optimizer/metal/layernorm.py: raise RuntimeError("Metal LayerNorm requires device.type == 'mps'")
optimizer/metal/layernorm.py: raise RuntimeError("Metal LayerNorm currently supports fp16/fp32 only")
optimizer/metal/layernorm.py: raise RuntimeError("Metal LayerNorm expects x.dim() >= 1")
optimizer/metal/ssm.py:from caramba.optimizer.runtime import metal_supported
optimizer/metal/ssm.py:from .jit import load_caramba_metal_ops
optimizer/metal/ssm.py:class MetalSSMScanAvailability:
optimizer/metal/ssm.py: return metal_supported()
optimizer/metal/ssm.py:class MetalSSMSelectiveScanFn(torch.autograd.Function):
optimizer/metal/ssm.py: raise RuntimeError("Metal SSM selective scan requires device.type == 'mps'")
optimizer/metal/ssm.py: raise RuntimeError("Metal SSM selective scan currently supports fp16 only")
optimizer/metal/ssm.py: ops = load_caramba_metal_ops(verbose=bool(verbose_build))
optimizer/metal/ssm.py: raise RuntimeError("Metal SSM selective scan backward requires grad_y")
optimizer/metal/ssm.py: raise RuntimeError("Metal SSM selective scan backward requires MPS grad_y")
optimizer/metal/rope_test.py:from caramba.optimizer.runtime import metal_supported
optimizer/metal/rope_test.py:@pytest.mark.skipif(not metal_supported(), reason="Metal/MPS not supported on this platform")
optimizer/metal/rope_test.py:def test_metal_rope_matches_reference() -> None:
optimizer/metal/rope_test.py: from caramba.optimizer.metal.jit import load_caramba_metal_ops
optimizer/metal/rope_test.py: _ = load_caramba_metal_ops(verbose=False)
optimizer/metal/rope_test.py: pytest.skip(f"caramba metal extension unavailable: {e}")
optimizer/metal/rope_test.py: from caramba.optimizer.metal import rope_fp16
optimizer/metal/rope_test.py: out_metal = rope_fp16(x=x, cos=cos, sin=sin, rot_dim=rot)
optimizer/metal/rope_test.py: assert out_metal.shape == out_ref.shape
optimizer/metal/rope_test.py: assert torch.allclose(out_metal, out_ref, atol=3e-2, rtol=3e-2)
optimizer/metal/ssm_test.py:from caramba.optimizer.runtime import metal_supported
optimizer/metal/ssm_test.py:def _skip_if_no_metal_extension() -> None:
optimizer/metal/ssm_test.py: from caramba.optimizer.metal.jit import load_caramba_metal_ops
optimizer/metal/ssm_test.py: _ = load_caramba_metal_ops(verbose=False)
optimizer/metal/ssm_test.py: pytest.skip(f"caramba metal extension unavailable: {e}")
optimizer/metal/ssm_test.py:@pytest.mark.skipif(not metal_supported(), reason="Metal/MPS not supported on this platform")
optimizer/metal/ssm_test.py:def test_metal_ssm_scan_forward_matches_reference() -> None:
optimizer/metal/ssm_test.py: _skip_if_no_metal_extension()
optimizer/metal/ssm_test.py: from caramba.optimizer.metal import MetalSSMSelectiveScan
optimizer/metal/ssm_test.py: y_metal = MetalSSMSelectiveScan().run(x=x, dt=dt, A=A, B=B, C=C, D=D, verbose_build=False)
optimizer/metal/rmsnorm_test.py:from caramba.optimizer.runtime import metal_supported
optimizer/metal/rmsnorm_test.py:@pytest.mark.skipif(not metal_supported(), reason="Metal/MPS not supported on this platform")
optimizer/metal/rmsnorm_test.py:def test_metal_rmsnorm_matches_reference() -> None:
optimizer/metal/rmsnorm_test.py: from caramba.optimizer.metal.jit import load_caramba_metal_ops
optimizer/metal/rmsnorm_test.py: _ = load_caramba_metal_ops(verbose=False)
optimizer/metal/rmsnorm_test.py: pytest.skip(f"caramba metal extension unavailable: {e}")
optimizer/metal/rmsnorm_test.py: from caramba.optimizer.metal import rmsnorm_fp16
optimizer/metal/rmsnorm_test.py: out_metal = rmsnorm_fp16(x=x, weight=w, eps=eps)
optimizer/metal/rmsnorm_test.py: assert out_metal.shape == out_ref.shape
optimizer/metal/rmsnorm_test.py: assert torch.allclose(out_metal, out_ref, atol=2e-2, rtol=2e-2)
optimizer/metal/setup.py:def compile_metal(out_dir: Path) -> None:
optimizer/metal/setup.py: HERE / "dba_decode.metal",
optimizer/metal/setup.py: HERE / "rmsnorm.metal",
optimizer/metal/setup.py: HERE / "layernorm.metal",
optimizer/metal/setup.py: HERE / "rope.metal",
optimizer/metal/setup.py: HERE / "lion.metal",
optimizer/metal/setup.py: HERE / "adamw.metal",
optimizer/metal/setup.py: HERE / "ssm_scan.metal",
optimizer/metal/setup.py: raise FileNotFoundError(f"Missing Metal source: {src}")
optimizer/metal/setup.py: metallib = out_dir / "caramba_ops.metallib"
optimizer/metal/adamw.py:from caramba.optimizer.runtime import metal_supported
optimizer/metal/adamw.py:from .jit import load_caramba_metal_ops
optimizer/metal/adamw.py:def metal_adamw_available() -> bool:
optimizer/metal/adamw.py: return metal_supported()
optimizer/metal/adamw.py: raise RuntimeError("Metal AdamWMaster requires device.type == 'mps'")
optimizer/metal/adamw.py: raise RuntimeError("Metal AdamWMaster requires fp16/fp32 p/grad (matching)")
optimizer/metal/adamw.py: raise RuntimeError("Metal AdamWMaster requires fp32 master/exp_avg/exp_avg_sq")
optimizer/metal/adamw.py: raise RuntimeError("Metal AdamWMaster requires p and grad shapes to match")
optimizer/metal/adamw.py: raise RuntimeError("Metal AdamWMaster requires state tensors to match param shape")
optimizer/metal/adamw.py: ops = load_caramba_metal_ops(verbose=bool(verbose_build))
optimizer/metal/lion_test.py:from caramba.optimizer.runtime import metal_supported
optimizer/metal/lion_test.py:@pytest.mark.skipif(not metal_supported(), reason="Metal/MPS not supported on this platform")
optimizer/metal/lion_test.py:def test_metal_lion_step_matches_reference() -> None:
optimizer/metal/lion_test.py: from caramba.optimizer.metal.jit import load_caramba_metal_ops
optimizer/metal/lion_test.py: _ = load_caramba_metal_ops(verbose=False)
optimizer/metal/lion_test.py: pytest.skip(f"caramba metal extension unavailable: {e}")
optimizer/metal/lion_test.py: from caramba.optimizer.metal import lion_fp16
optimizer/metal/dba_decode.py:"""Fused DBA decode wrapper for the Metal extension."""
optimizer/metal/dba_decode.py:from caramba.optimizer.runtime import metal_supported
optimizer/metal/dba_decode.py:from .jit import load_caramba_metal_ops
optimizer/metal/dba_decode.py:def metal_dba_decode_available() -> bool:
optimizer/metal/dba_decode.py: """Whether the runtime is capable of using the Metal DBA decode path.
optimizer/metal/dba_decode.py: return metal_supported()
optimizer/metal/dba_decode.py: """Fused DBA decode (MPS/Metal) for fp16 KV caches.
optimizer/metal/dba_decode.py: raise RuntimeError("Metal DBA decode requires device.type == 'mps'")
optimizer/metal/dba_decode.py: raise ValueError(f"Metal DBA decode requires v_hd <= 256 (got v_hd={v_hd})")
optimizer/metal/dba_decode.py: ops = load_caramba_metal_ops(verbose=bool(verbose_build))
optimizer/metal/attention_training_test.py:from caramba.optimizer.runtime import metal_supported
optimizer/metal/attention_training_test.py:def _skip_if_no_metal_attention_extension() -> None:
optimizer/metal/attention_training_test.py: from caramba.optimizer.metal.attention_jit import load_caramba_metal_attention_ops
optimizer/metal/attention_training_test.py: _ = load_caramba_metal_attention_ops(verbose=False)
optimizer/metal/attention_training_test.py: pytest.skip(f"caramba metal attention extension unavailable: {e}")
optimizer/metal/attention_training_test.py:@pytest.mark.skipif(not metal_supported(), reason="Metal/MPS not supported on this platform")
optimizer/metal/attention_training_test.py:def test_metal_attention_training_forward_matches_sdpa(causal: bool) -> None:
optimizer/metal/attention_training_test.py: _skip_if_no_metal_attention_extension()
optimizer/metal/attention_training_test.py: from caramba.optimizer.metal.attention_training import MetalAttentionTraining
optimizer/metal/attention_training_test.py: y = MetalAttentionTraining().run(q=q, k=k, v=v, causal=causal, scale=scale, dropout_p=0.0)
optimizer/metal/attention_jit.py:"""JIT build + load the Metal attention training extension.
optimizer/metal/attention_jit.py:This is intentionally separate from `optimizer/metal/jit.py` to keep the
optimizer/metal/attention_jit.py:from caramba.optimizer.runtime import metal_supported, metal_build_tools_available
optimizer/metal/attention_jit.py: " `xcrun -sdk macosx --find metal`\n"
optimizer/metal/attention_jit.py: " `xcrun -sdk macosx --find metallib`\n"
optimizer/metal/attention_jit.py:def _compile_attention_metallib(*, out_dir: Path, verbose: bool) -> Path:
optimizer/metal/attention_jit.py: """Compile attention training Metal shaders -> metallib in `out_dir`."""
optimizer/metal/attention_jit.py: src = _this_dir() / "attention_train.metal"
optimizer/metal/attention_jit.py: metallib = out_dir / "caramba_attention_ops.metallib"
optimizer/metal/attention_jit.py: metal = _xcrun_find("metal")
optimizer/metal/rmsnorm_autograd_test.py:class TestMetalRMSNormAutograd:
optimizer/metal/rmsnorm_autograd_test.py: # Metal
optimizer/metal/rmsnorm_autograd_test.py: from caramba.optimizer.metal.rmsnorm import rmsnorm_fp16
optimizer/metal/dba_decode_test.py:from caramba.optimizer.runtime import metal_supported
optimizer/metal/dba_decode_test.py:@pytest.mark.skipif(not metal_supported(), reason="Metal/MPS not supported on this platform")
optimizer/metal/dba_decode_test.py:def test_metal_dba_decode_matches_reference() -> None:
optimizer/metal/dba_decode_test.py: from caramba.optimizer.metal.jit import load_caramba_metal_ops
optimizer/metal/dba_decode_test.py: _ = load_caramba_metal_ops(verbose=False)
optimizer/metal/dba_decode_test.py: pytest.skip(f"caramba metal extension unavailable: {e}")
optimizer/metal/dba_decode_test.py: from caramba.optimizer.metal import dba_decode_fp16
optimizer/metal/dba_decode_test.py: out_metal = dba_decode_fp16(
optimizer/metal/dba_decode_test.py: assert out_metal.shape == out_ref.shape
optimizer/metal/dba_decode_test.py: assert torch.allclose(out_metal, out_ref, atol=3e-2, rtol=3e-2)
optimizer/lion.py:"""Lion optimizer (with optional Metal fast path).
optimizer/lion.py:for MPS/Metal fused update experiments.
optimizer/lion.py: # Metal fused path (fp16 on MPS).
optimizer/metal/layernorm_autograd_test.py:from caramba.optimizer.runtime import metal_supported
optimizer/metal/layernorm_autograd_test.py:def _skip_if_no_metal_extension() -> None:
optimizer/metal/layernorm_autograd_test.py: from caramba.optimizer.metal.jit import load_caramba_metal_ops
optimizer/metal/layernorm_autograd_test.py: _ = load_caramba_metal_ops(verbose=False)
optimizer/metal/layernorm_autograd_test.py: pytest.skip(f"caramba metal extension unavailable: {e}")
optimizer/metal/layernorm_autograd_test.py:@pytest.mark.skipif(not metal_supported(), reason="Metal/MPS not supported on this platform")
optimizer/metal/layernorm_autograd_test.py: _skip_if_no_metal_extension()
optimizer/metal/layernorm_autograd_test.py: from caramba.optimizer.metal.layernorm import layernorm_fp16
optimizer/metal/layernorm_autograd_test.py:@pytest.mark.skipif(not metal_supported(), reason="Metal/MPS not supported on this platform")
optimizer/metal/layernorm_autograd_test.py: _skip_if_no_metal_extension()
optimizer/kernel_registry.py: metal_build_tools_available,
optimizer/kernel_registry.py: metal_supported,
optimizer/kernel_registry.py: metal_supported: bool
optimizer/kernel_registry.py: metal_build_tools_available: bool
optimizer/kernel_registry.py: metal_ops_loaded: bool
optimizer/kernel_registry.py: metal_supported=bool(metal_supported()),
optimizer/kernel_registry.py: metal_build_tools_available=bool(metal_build_tools_available()),
optimizer/kernel_registry.py: metal_ops_loaded=False,
optimizer/kernel_registry.py: # ---- Metal/MPS validation (compile+load extension at startup) ----
optimizer/kernel_registry.py: metal_ops_loaded = False
optimizer/metal/__init__.py:"""Metal (MPS) fused kernels for Apple Silicon.
optimizer/metal/__init__.py:attention *decode* path for MPS, backed by a custom Metal Shading Language kernel
optimizer/metal/__init__.py:softmax/value matmuls. The Metal kernel performs a numerically-stable, two-pass
optimizer/metal/__init__.py:from .dba_decode import dba_decode_fp16, metal_dba_decode_available
optimizer/metal/__init__.py:from .attention_training import MetalAttentionTraining, metal_attention_training_available
optimizer/metal/__init__.py:from .adamw import AdamWMasterStep, metal_adamw_available
optimizer/metal/__init__.py:from .rmsnorm import metal_rmsnorm_available, rmsnorm_fp16
optimizer/metal/__init__.py:from .layernorm import layernorm_fp16, metal_layernorm_available
optimizer/metal/__init__.py:from .rope import metal_rope_available, rope_fp16
optimizer/metal/__init__.py:from .lion import lion_fp16, metal_lion_available
optimizer/attention.py:- MPS training uses custom Metal fused attention (forward+backward).
optimizer/attention.py:from caramba.optimizer.metal.attention_training import MetalAttentionTraining
optimizer/attention.py: return MetalAttentionTraining().run(
optimizer/metal/jit.py:"""JIT build + load the Metal extension.
optimizer/metal/jit.py:from caramba.optimizer.runtime import metal_build_tools_available, metal_supported
optimizer/metal/jit.py: " `xcrun -sdk macosx --find metal`\n"
optimizer/metal/jit.py: " `xcrun -sdk macosx --find metallib`\n"
optimizer/metal/jit.py:def _compile_metallib(*, out_dir: Path, verbose: bool) -> Path:
optimizer/metal/jit.py: """Compile Metal shaders -> `caramba_ops.metallib` in `out_dir`."""
optimizer/metal/jit.py: _this_dir() / "dba_decode.metal",
optimizer/metal/jit.py: _this_dir() / "rmsnorm.metal",
optimizer/metal/jit.py: _this_dir() / "layernorm.metal",
optimizer/metal/jit.py: _this_dir() / "rope.metal",
optimizer/adamw_triton_test.py: # Matches Metal kernel semantics exactly.
optimizer/metal/attention_training.py:"""Metal fused attention training (forward + backward).
optimizer/metal/attention_training.py:from caramba.optimizer.runtime import metal_supported
optimizer/metal/attention_training.py:from .attention_jit import load_caramba_metal_attention_ops
optimizer/metal/attention_training.py:def metal_attention_training_available() -> bool:
optimizer/metal/attention_training.py: """Whether the runtime is capable of using the Metal attention training path."""
optimizer/metal/attention_training.py: return metal_supported()
optimizer/metal/attention_training.py:class _MetalAttnTrainFn(torch.autograd.Function):
optimizer/metal/attention_training.py: raise RuntimeError("MetalAttentionTraining requires device.type == 'mps'")
optimizer/metal/attention_training.py: raise RuntimeError(f"MetalAttentionTraining requires fp16 inputs on MPS (got dtype={q.dtype})")
optimizer/metal/attention_training.py: raise RuntimeError("MetalAttentionTraining requires q/k/v shapes to match (B,H,T,D)")
optimizer/adamw_triton_kernels.py:Implements the same update semantics as the Metal `adamw_master_step_fp16` kernel:
Script executed:
# Get the full PR diff summary to understand scope
git diff --stat main..HEAD 2>/dev/null | head -30
Repository: TheApeMachine/caramba
Exit code: 0
stdout:
.vscode/c_cpp_properties.json | 26 +
artifacts/checkpoints/.gitattributes | 35 +
artifacts/checkpoints/LICENSE.txt | 111 +
artifacts/checkpoints/README.md | 473 +
artifacts/checkpoints/USE_POLICY.md | 52 +
artifacts/checkpoints/config.json | 35 +
artifacts/checkpoints/generation_config.json | 9 +
artifacts/checkpoints/original/params.json | 12 +
artifacts/checkpoints/original/tokenizer.model | 128000 ++++++
artifacts/checkpoints/special_tokens_map.json | 16 +
artifacts/checkpoints/tokenizer.json | 410563 ++++++++++++++++++
artifacts/checkpoints/tokenizer_config.json | 2061 +
artifacts/mosaic/paper.pdf | Bin 589308 -> 598293 bytes
config/compute.py | 51 +
config/compute_test.py | 54 +
config/defaults.py | 9 +-
config/layer.py | 9 +-
config/presets/mosaic_resonant.yml | 122 +
config/target.py | 2 +
core/__init__.py | 6 +-
core/event_codec/__init__.py | 27 +-
core/event_codec/binary_codec.py | 81 -
core/event_codec/binary_codec_test.py | 33 -
core/event_codec/capnp_codec.py | 164 +
core/event_codec/capnp_codec_test.py | 70 +
core/event_codec/event.capnp | 38 +
core/event_codec/json_codec.py | 103 -
core/event_codec/stream_parser.py | 41 -
examples/lightning_vastai_demo.yml | 53 +
experiment/runner.py | 28 +-
Script executed:
# Check if there are actual implementation files for MOSAIC in this PR
git diff --name-only main..HEAD | grep -i mosaic | head -20
Repository: TheApeMachine/caramba
Exit code: 0
stdout:
artifacts/mosaic/paper.pdf
config/presets/mosaic_resonant.yml
research/ccp/Cognitive_Control_Plane_MOSAIC_Runtime.pdf
Script executed:
# Read more of the notes file to see if it documents the PRs actual changes
cat -n ./research/ccp/notes\ copy.md | tail -100
Repository: TheApeMachine/caramba
Exit code: 0
stdout:
163 optimizer: torch.optim.Optimizer
164 update_interval: int = 1000 # tokens between updates
165 replay_buffer_size: int = 10000
166
167 def maybe_step(self, loss: Tensor, token_count: int):
168 self.buffer.append((loss.detach(), token_count))
169
170 if token_count % self.update_interval == 0:
171 # Compute gradient on adapter params only
172 for p in self.model.parameters():
173 p.requires_grad = p in self.adapter_params
174
175 # Aggregate recent losses
176 recent_loss = torch.stack([l for l, _ in self.buffer[-100:]])
177 recent_loss.mean().backward()
178 self.optimizer.step()
179 self.optimizer.zero_grad()
180 ---
181
182 Phase 5: Native Tool Building
183 Implement the mechanism for the model to create new tools, not just use predefined ones.
184
185 Approach: The model emits a structured "tool definition" event that gets validated and registered.
186
187 Files to create/modify:
188
189 New file: ai/tools/builder.py - Tool builder/registry
190 core/event.py - Add ToolDefinition event type
191 infer/event_runtime.py - Handle tool creation events
192 Tool Definition Schema:
193
194 @dataclass
195 class ToolDefinition:
196 name: str
197 description: str
198 parameters: dict[str, ParameterSpec]
199 implementation: str # Python code or MCP endpoint
200 sandbox: bool = True # Run in sandbox by default
201
202 def validate(self) -> list[str]:
203 """Return list of validation errors, empty if valid."""
204 errors = []
205 if not self.name.isidentifier():
206 errors.append(f"Invalid tool name: {self.name}")
207 # ... more validation
208 return errors
209 Event flow:
210
211 ToolRegistry
212 ToolBuilder
213 EventBus
214 Model
215 ToolRegistry
216 ToolBuilder
217 EventBus
218 Model
219 alt
220 [Valid]
221 [Invalid]
222 Emit ToolDefinition event
223 Handle event
224 Validate definition
225 Register new tool
226 Emit ToolRegistered event
227 Emit ToolRejected event
228 Safety constraints:
229
230 Sandboxed execution (no filesystem/network by default)
231 Rate limiting on tool creation
232 Human approval for privileged tools
233 Automatic deprecation of unused tools
234 ---
235
236 Architecture Diagram
237 Tool System
238 Continuous Learning
239 MOSAIC Model
240 Autonomous Runtime
241 ToolBuilder
242 ToolRegistry
243 Fast: Memory Writes
244 Medium: Adapter Updates
245 Slow: Full Training
246 OpcodeHead
247 MosaicBlock
248 Memory
249 Registers
250 EventBus
251 HomeostaticLoop
252 ImpulseRouter
253 ---
254
255 Testing Strategy
256 Each phase includes verification tests:
257
258 Opcodes: Unit test that READ_MEM opcode gates memory read contribution
259 Homeostasis: Test that high-entropy state triggers impulse event
260 Idle compute: Test that consolidation runs during no-event periods
261 Continuous learning: Test adapter weights change during inference
262 Tool building: Test that valid ToolDefinition creates usable tool
Script executed:
# Check the specific PR changes to optimizer/metal and related fp32 files
git diff --stat main..HEAD | grep -E "(metal|fp32)" | head -20
Repository: TheApeMachine/caramba
Exit code: 0
stdout:
optimizer/metal/adamw.metal | 46 +-
optimizer/metal/adamw.py | 4 +-
optimizer/metal/layernorm.metal | 670 +-
optimizer/metal/layernorm.py | 26 +-
optimizer/metal/lion.metal | 40 +-
optimizer/metal/lion.py | 4 +-
optimizer/metal/ops.mm | 234 +-
optimizer/metal/rmsnorm.metal | 484 +-
optimizer/metal/rmsnorm.py | 27 +-
optimizer/metal/rope.metal | 108 +-
optimizer/metal/rope.py | 27 +-
Resolve mismatch between PR title and this document's scope.
The PR is titled "Feature/fp32 support for metal" and includes substantial Metal GPU fp32 optimizations (optimizer/metal/ kernels and wrappers). However, this markdown document describes a general MOSAIC Differentiable VM architecture roadmap with no Metal or fp32-specific content.
Either:
- This document should be removed from this PR, or
- Clarify whether it is intended as reference/design material for this PR (if so, note that in a commit message or comment), or
- Move this to a separate documentation or design PR
Also, consider renaming the file to remove "copy" from the filename.
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
1-1: First line in a file should be a top-level heading
(MD041, first-line-heading, first-line-h1)
5-5: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🤖 Prompt for AI Agents
In @research/ccp/notes copy.md around lines 1 - 10, This file research/ccp/notes
copy.md is unrelated to the "Feature/fp32 support for metal" PR and should be
either removed or explicitly linked to the PR as design/reference material;
either delete the file from the branch, move it into a dedicated
documentation/design PR, or add a clear note in the commit message and at the
top of the file stating it is reference material for this PR (and rename to
remove "copy" from the filename) so reviewers understand its intent.
This plan implements the five missing architectural elements discussed in the meeting notes to transform MOSAIC into a fully autonomous "Differentiable Virtual Machine."

Current State Analysis
**What exists:**
Convert emphasis to a proper heading.
Line 5 uses bold emphasis (**What exists:**) where a Markdown heading would be more semantically appropriate (MD036).
📝 Proposed fix
-**What exists:**
+## What Exists
Consider converting other section titles similarly:
- Line 13: **What's missing:** → ## What's Missing
- Line 27: **Implementation skeleton:** → (already a comment, skip)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
-What exists:
+## What Exists
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
5-5: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🤖 Prompt for AI Agents
In @research/ccp/notes copy.md at line 5, Replace the bolded section title
"**What exists:**" with a proper Markdown heading (e.g., "## What exists") to
satisfy MD036; also apply the same treatment to similar titles such as changing
"**What's missing:**" to "## What's Missing" while leaving comments/sections
like "Implementation skeleton" unchanged.
---

Phase 1: Wire Opcodes to Control Behavior
Currently opcodes are emit-only. Wire them to actually gate subsystem operations.

Files to modify:
Fix ordered list numbering.
Lines 20–25 have inconsistent list prefixes (2, 3, 4, 5, 6, 7). Markdown list items should be numbered sequentially starting from 1 (MD029). This section appears to be a simple enumeration; use unordered lists (- or *) or restart numbering at 1.
📝 Proposed fix
-2. Opcodes don't control registers/memory (just logged)
-3. No impulse-driven awakening (only prompt-triggered)
-4. No idle-time compute loop (model waits passively)
-5. No medium/slow timescale learning (only fast writes)
-6. No tool creation mechanism (only tool usage)
+1. Opcodes don't control registers/memory (just logged)
+2. No impulse-driven awakening (only prompt-triggered)
+3. No idle-time compute loop (model waits passively)
+4. No medium/slow timescale learning (only fast writes)
+5. No tool creation mechanism (only tool usage)
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
20-20: Ordered list item prefix
Expected: 1; Actual: 2; Style: 1/1/1
(MD029, ol-prefix)
21-21: Ordered list item prefix
Expected: 1; Actual: 3; Style: 1/1/1
(MD029, ol-prefix)
22-22: Ordered list item prefix
Expected: 1; Actual: 4; Style: 1/1/1
(MD029, ol-prefix)
23-23: Ordered list item prefix
Expected: 1; Actual: 5; Style: 1/1/1
(MD029, ol-prefix)
24-24: Ordered list item prefix
Expected: 1; Actual: 6; Style: 1/1/1
(MD029, ol-prefix)
25-25: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
25-25: Ordered list item prefix
Expected: 1; Actual: 7; Style: 1/1/1
(MD029, ol-prefix)
🤖 Prompt for AI Agents
In @research/ccp/notes copy.md around lines 20 - 25, The ordered list under the
"Phase 1: Wire Opcodes to Control Behavior" heading has inconsistent numbering
(lines currently show 2–7) and triggers MD029; fix it by either converting the
items to an unordered list (using "-" or "*") or renumbering the list to start
at 1 and be sequential; update the list items referenced in that section so they
are consistently formatted and valid Markdown (e.g., restart numbering at 1 or
replace the numeric prefixes with "-" for each item).
Phase 1: Wire Opcodes to Control Behavior
Currently opcodes are emit-only. Wire them to actually gate subsystem operations.

Files to modify:
Remove trailing spaces.
Lines 25 and 39 contain trailing whitespace (MD009), which is not allowed per Markdown style guidelines.
🧹 Proposed fix
Use a linter or editor with trim-trailing-whitespace enabled to automatically fix these.
Alternatively, manually remove the trailing space on:
- Line 25 (after "modify:")
- Line 39 (after "# Gate operations based on opcode")
Also applies to: 39-39
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
25-25: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
25-25: Ordered list item prefix
Expected: 1; Actual: 7; Style: 1/1/1
(MD029, ol-prefix)
🤖 Prompt for AI Agents
In @research/ccp/notes copy.md at line 25, Remove the trailing whitespace
characters on the two affected Markdown lines: delete the extra space after
"modify:" (the text "Files to modify:") and after "# Gate operations based on
opcode" to satisfy MD009; you can enable trim-trailing-whitespace in your editor
or run a markdown linter/formatter to strip trailing spaces automatically and
re-run the linter to confirm the fixes.
runtime/engine/lightning_engine.py
Outdated
dataset_comp = self.registry.build(target.data, backend="torch")
system = self.registry.build(target.system, backend="torch")
objective = self.registry.build(target.objective, backend="torch")

# For Lightning, we typically run one "representative" run or handle multiple?
# Standard Caramba runner iterates over target.runs.
results = {}
for run in target.runs:
    train_cfg = run.train
    if not train_cfg:
        continue

    # Wrap in LightningModule
    lightning_model = CarambaLightningModule(system, objective, train_cfg)
System is shared across runs without re-initialization.
The system is built once (line 78) but used for all runs in the loop (line 90). If runs are meant to be independent experiments, this causes weight state to carry over. Either rebuild the system per run or document that runs are sequential fine-tuning.
🔧 Option A: Rebuild per run
results = {}
for run in target.runs:
    train_cfg = run.train
    if not train_cfg:
        continue

+   # Build fresh system per run for independent experiments
+   system = self.registry.build(target.system, backend="torch")
+   objective = self.registry.build(target.objective, backend="torch")
+
    # Wrap in LightningModule
    lightning_model = CarambaLightningModule(system, objective, train_cfg)
🔧 Option B: Add documentation
- # For Lightning, we typically run one "representative" run or handle multiple?
- # Standard Caramba runner iterates over target.runs.
+ # Note: system and objective are shared across runs, so runs represent
+ # sequential training (weights carry over). For independent runs,
+ # move system/objective building inside the loop.
results = {}
Review continued from previous batch...
core/event_codec/capnp_codec.py
Outdated
def _load_schema():
    """Load the Cap'n Proto schema file."""
    schema_path = Path(__file__).parent / "event.capnp"

    if not schema_path.exists():
        raise FileNotFoundError(f"Cap'n Proto schema not found: {schema_path}")

    return capnp.load(str(schema_path))
🧹 Nitpick | 🔵 Trivial
Optional: Add return type annotations to private functions.
While not critical, adding return type annotations improves IDE support and type checking. The static analyzer flags missing annotations for _load_schema() and _get_schema().
📝 Proposed type annotations
-def _load_schema():
+def _load_schema() -> Any:  # or the specific capnp module type if available
     """Load the Cap'n Proto schema file."""
-def _get_schema():
+def _get_schema() -> Any:  # or the specific capnp module type if available
     global _schema
Note: The exact type depends on the pycapnp library's type stubs. Using Any is acceptable if precise types are unavailable.
Also applies to: 34-38
🧰 Tools
🪛 Ruff (0.14.10)
20-20: Missing return type annotation for private function _load_schema
(ANN202)
25-25: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
In @core/event_codec/capnp_codec.py around lines 20 - 27, Add return type
annotations for the private helper functions to satisfy the static analyzer:
annotate _load_schema() and _get_schema() with an appropriate return type (use
typing.Any if pycapnp types are unavailable), and add an import for Any from
typing. Ensure the function signatures include the return annotation (e.g., def
_load_schema() -> Any:) and keep behavior unchanged.
core/event_codec/capnp_codec.py
Outdated
if not buf:
    raise ValueError("Cap'n Proto serialization produced empty bytes")

return torch.tensor(list(buf), dtype=torch.long)
🧹 Nitpick | 🔵 Trivial
Consider using torch.uint8 instead of torch.long for byte storage.
The current implementation uses torch.long (int64) to store byte values, which consumes 8x more memory than necessary. Since Cap'n Proto serialization produces bytes (0-255), using torch.uint8 would be more memory-efficient and semantically correct.
♻️ Proposed refactor to use torch.uint8
-    return torch.tensor(list(buf), dtype=torch.long)
+    return torch.tensor(list(buf), dtype=torch.uint8)
Additionally, update the decoder at line 119 to match:
-    vals = ids.detach().cpu().to(dtype=torch.int64).tolist()
+    vals = ids.detach().cpu().to(dtype=torch.uint8).tolist()
And line 93 in encode_padded:
-    ids = torch.full((batch_size, max_len), pad, dtype=torch.long)
+    ids = torch.full((batch_size, max_len), pad, dtype=torch.uint8)
This change reduces memory footprint by 87.5% for encoded events.
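For scale, an isolated check with plain torch tensors (not the codec itself) shows where the 87.5% figure comes from: int64 elements take 8 bytes each, uint8 elements take 1.

import torch

buf = bytes(range(256)) * 4  # 1024 bytes of sample payload
as_long = torch.tensor(list(buf), dtype=torch.long)
as_u8 = torch.tensor(list(buf), dtype=torch.uint8)
print(as_long.nelement() * as_long.element_size())  # 8192 bytes
print(as_u8.nelement() * as_u8.element_size())      # 1024 bytes, i.e. 87.5% smaller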
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
-return torch.tensor(list(buf), dtype=torch.long)
+return torch.tensor(list(buf), dtype=torch.uint8)
🤖 Prompt for AI Agents
In @core/event_codec/capnp_codec.py at line 73, The tensor created from the byte
buffer is using dtype=torch.long and should use dtype=torch.uint8 to store bytes
efficiently: change the return in the function that does "return
torch.tensor(list(buf), dtype=torch.long)" to use torch.uint8; also update the
corresponding decoder function (the decoder at the review note) and the
encode_padded implementation (encode_padded) to produce and consume torch.uint8
tensors, and add explicit casting to larger integer types only where arithmetic
expects int64 so behavior doesn’t change.
encoder: EventEncoder = field(default_factory=EventEncoder)
decoder: EventDecoder = field(default_factory=EventDecoder)
delimiter: int = 0  # Cap'n Proto uses 0-byte as segment delimiter
🧩 Analysis chain
🌐 Web query:
Cap'n Proto message framing format segment delimiter null byte
💡 Result:
Short answer: No — there is no extra "null byte" delimiter between segments. The stream framing is:
- 4 bytes: (segment count − 1), little-endian.
- N × 4 bytes: each segment length in 64-bit words, little-endian.
- 0 or 4 bytes padding to the next 8-byte (word) boundary.
- Then the raw contents of each segment concatenated in order.
Segments are just concatenated back-to-back (with the above header/padding), and packing uses its own tag bytes inside words — there is no separate null-byte segment delimiter. [1][2]
Sources:
[1] Cap'n Proto Encoding / Serialization Over a Stream.
[2] Stack Overflow discussion: Cap'n Proto supports delimited messages.
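As a rough illustration of that layout (a sketch based only on the framing described above, not on pycapnp's API), the stream header can be decoded like this:

import struct

def parse_capnp_stream_header(buf: bytes) -> tuple[list[int], int]:
    # 4 bytes: (segment count - 1), little-endian.
    (count_minus_one,) = struct.unpack_from("<I", buf, 0)
    n = count_minus_one + 1
    # N x 4 bytes: each segment length in 64-bit words, little-endian.
    sizes = list(struct.unpack_from(f"<{n}I", buf, 4))
    header_len = 4 + 4 * n
    if header_len % 8:
        header_len += 4  # 0 or 4 bytes of padding to the next word boundary
    return sizes, header_len  # segment data starts at header_len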
🏁 Script executed:
# Find how delimiter is used in the codebase
rg "delimiter" infer/ -B 2 -A 2Repository: TheApeMachine/caramba
Length of output: 2833
🏁 Script executed:
# Look at the EventStreamCodec class and its usage of delimiter
rg -A 10 "class EventStreamCodec" infer/event_runtime.pyRepository: TheApeMachine/caramba
Length of output: 634
🏁 Script executed:
# Check if there's any framing logic or how messages are actually serialized
rg "EventEncoder|EventDecoder" infer/ -A 3 | head -50Repository: TheApeMachine/caramba
Length of output: 701
🏁 Script executed:
# Find EventEncoder and EventDecoder definitions
fd -e py "event_codec" infer/ --exec head -100 {} \;Repository: TheApeMachine/caramba
Length of output: 47
🏁 Script executed:
# Search for EventEncoder implementation details
rg "class EventEncoder" -A 20Repository: TheApeMachine/caramba
Length of output: 47
🏁 Script executed:
# Check EventDecoder to see how it handles/expects the delimiter
rg "class EventDecoder" -A 20Repository: TheApeMachine/caramba
Length of output: 47
🏁 Script executed:
# Look for token vocabulary or encoder range information
rg "EventEncoder|EventDecoder|vocab" infer/ -B 1 -A 1 | head -60Repository: TheApeMachine/caramba
Length of output: 3546
🏁 Script executed:
# Find event_codec file location
fd "event_codec" --type fRepository: TheApeMachine/caramba
Length of output: 47
🏁 Script executed:
# Check core directory structure
fd --type f -e py caramba/core/ | head -20
Repository: TheApeMachine/caramba
Length of output: 47
🏁 Script executed:
# If file exists, read EventEncoder implementation
find . -name "event_codec.py" -exec head -150 {} \;Repository: TheApeMachine/caramba
Length of output: 47
🏁 Script executed:
# List the actual directory structure
git ls-files | grep -E "caramba|core|event" | head -30Repository: TheApeMachine/caramba
Length of output: 947
🏁 Script executed:
# Check for requirements or setup files that might indicate external dependencies
git ls-files | grep -E "requirements|setup|pyproject|poetry"Repository: TheApeMachine/caramba
Length of output: 384
🏁 Script executed:
# Look at the imports more carefully in event_runtime.py
head -30 infer/event_runtime.py
Repository: TheApeMachine/caramba
Length of output: 1008
🏁 Script executed:
# Check what's exported from event_codec/__init__.py
cat core/event_codec/__init__.py
Repository: TheApeMachine/caramba
Length of output: 611
🏁 Script executed:
# Read the capnp_codec.py file to find EventEncoder and EventDecoder definitions
cat core/event_codec/capnp_codec.py
Repository: TheApeMachine/caramba
Length of output: 5890
The comment about Cap'n Proto using 0-byte delimiters is incorrect, but the delimiter choice is still problematic.
Cap'n Proto uses length-prefixed framing (4-byte segment count, lengths, padding), not null-byte delimiters. However, Cap'n Proto can produce null bytes in its serialized output, which will collide with the delimiter=0 value appended by encode_with_delimiter(). The decoder cannot distinguish between a 0 byte that is part of the Cap'n Proto message and the 0 byte terminator. Either use a different delimiter byte (e.g., 255), implement length-prefixed framing like Cap'n Proto does internally, or document why this collision is safe in your use case.
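If the length-prefixed option is chosen, a minimal sketch (hypothetical frame/iter_frames helpers, not the existing EventStreamCodec API) could look like:

import struct

def frame(payload: bytes) -> bytes:
    # 4-byte little-endian length prefix; payload bytes (including 0x00) stay opaque.
    return struct.pack("<I", len(payload)) + payload

def iter_frames(stream: bytes):
    offset = 0
    while offset < len(stream):
        (length,) = struct.unpack_from("<I", stream, offset)
        offset += 4
        yield stream[offset:offset + length]
        offset += length

msgs = [b"\x00\x01capnp", b"second\x00msg"]
assert list(iter_frames(b"".join(frame(m) for m in msgs))) == msgs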
🤖 Prompt for AI Agents
In @infer/event_runtime.py around lines 54 - 56, The delimiter=0 field and
comment are wrong for Cap'n Proto framing and risk collisions because Cap'n
Proto payloads can contain 0 bytes; change the framing: either replace delimiter
default with a safe unused byte (e.g., delimiter: int = 255) and update
encode_with_delimiter() and the decoder logic in EventEncoder/EventDecoder to
use that byte, or implement proper length-prefixed framing (write a 4-byte
length before each encoded message and update decode logic to read length then
that many bytes) so the decoder can unambiguously parse messages; also remove or
correct the misleading comment about Cap'n Proto using 0-byte delimiters.
| if "resonant_final_sim" in routing: | ||
| res_metrics = ResonantSettlingMetrics( | ||
| final_sim=float(routing["resonant_final_sim"]), | ||
| convergence_steps=int(routing.get("resonant_convergence_steps", 0)), | ||
| energy_drop=0.0, # TODO: implement if needed | ||
| bucket_entropy=float(routing.get("resonant_bucket_entropy", 0.0)), | ||
| state_drift=0.0, | ||
| ) |
🧩 Analysis chain
🏁 Script executed:
# First, find the routing.py file and verify its location
find . -name "routing.py" -type f | head -20Repository: TheApeMachine/caramba
Length of output: 103
🏁 Script executed:
# Search for resonant_convergence_steps in the entire codebase
rg -n "resonant_convergence_steps" --type pyRepository: TheApeMachine/caramba
Length of output: 189
🏁 Script executed:
# Check if ResonantRouter class exists and where it's defined
rg -n "class ResonantRouter" --type pyRepository: TheApeMachine/caramba
Length of output: 138
🏁 Script executed:
# Read the ResonantRouter class, starting from line 179
sed -n '179,350p' layer/memory_block/memory/routing.py
Repository: TheApeMachine/caramba
Length of output: 5593
🏁 Script executed:
# Check the context around line 527 in memory.py to understand the flow
sed -n '510,540p' layer/memory_block/memory/memory.py
Repository: TheApeMachine/caramba
Length of output: 1612
resonant_convergence_steps is never populated by ResonantRouter.
ResonantRouter.route() does not add resonant_convergence_steps to the aux dict, so this will always default to 0. The method tracks energy history internally but never exports it. Either add convergence step tracking to ResonantRouter or remove this field from ResonantSettlingMetrics.
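One way the first option could look, assuming the router keeps a per-call list of settling energies (illustrative names, not the actual ResonantRouter internals):

def convergence_steps(energy_history: list[float], tol: float = 1e-4) -> int:
    # First step at which the energy change falls below tol; otherwise the full length.
    for i in range(1, len(energy_history)):
        if abs(energy_history[i] - energy_history[i - 1]) < tol:
            return i
    return len(energy_history)

# Inside route(), after the settling loop, the aux dict could then be populated with:
# aux["resonant_convergence_steps"] = convergence_steps(energy_history)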
🤖 Prompt for AI Agents
In @layer/memory_block/memory/memory.py around lines 524 - 531,
ResonantSettlingMetrics is being populated with resonant_convergence_steps from
routing but ResonantRouter.route() never sets "resonant_convergence_steps", so
it always defaults to 0; either have ResonantRouter.route() compute and emit
convergence steps into its aux/routing dict (e.g., track when energy history
converges and set "resonant_convergence_steps") or remove the field usage here
and stop reading "resonant_convergence_steps" in memory.py; locate
ResonantRouter.route and the place where routing/aux is returned to add the
convergence counter, or adjust the construction of ResonantSettlingMetrics to
omit/respectably default that metric.
layer/memory_block/memory/tuner.py
Outdated
if self.resonant_sim_ema is not None:
    if self.resonant_sim_ema < 0.4:
        # Weak coupling -> increase target
        self.target_resonant_coupling = min(5.0, self.target_resonant_coupling * 1.05)
    elif self.resonant_sim_ema > 0.9:
        # Strong overlap -> relax coupling
        self.target_resonant_coupling = max(0.2, self.target_resonant_coupling * 0.98)

if self.resonant_steps_ema is not None and self.resonant_steps_ema > 15:
    self.target_resonant_steps_delta = min(20.0, self.target_resonant_steps_delta + 0.1)

# --- Storage/Write Logic ---
if self.utilization_ema is not None:
    if self.utilization_ema < self.target_utilization * 0.5:
        # Starvation -> lower targets to admit more
        self.target_write_threshold = max(0.1, self.target_write_threshold * 0.95)
        self.target_vsa_novelty = max(0.1, self.target_vsa_novelty * 0.95)

if self.conflict_ema is not None and self.conflict_ema > 0.3:
    # Saturation -> increase targets
    self.target_vsa_novelty = min(10.0, self.target_vsa_novelty * 1.05)
    self.target_write_threshold = min(10.0, self.target_write_threshold * 1.05)
🧹 Nitpick | 🔵 Trivial
Document or extract magic numbers as named constants.
The heuristic thresholds (0.4, 0.9, 15, 0.5, 0.3) and adjustment factors (1.05, 0.98, 0.95) appear to be empirically derived but lack documentation. Consider extracting these as class constants with descriptive names or documenting their rationale in comments.
📚 Example refactor with named constants
Add class constants at the beginning of the class:
class UniversalMemoryTuner:
    """Cybernetic controller for memory blocks."""
    # Heuristic thresholds
    WEAK_COUPLING_THRESHOLD = 0.4
    STRONG_OVERLAP_THRESHOLD = 0.9
    SLOW_CONVERGENCE_STEPS = 15
    STARVATION_THRESHOLD = 0.5  # fraction of target_utilization
    HIGH_CONFLICT_THRESHOLD = 0.3
    # Adjustment factors
    INCREASE_FACTOR = 1.05
    DECREASE_FACTOR = 0.98
    AGGRESSIVE_DECREASE_FACTOR = 0.95
Then update the heuristics:
if self.resonant_sim_ema is not None:
-    if self.resonant_sim_ema < 0.4:
+    if self.resonant_sim_ema < self.WEAK_COUPLING_THRESHOLD:
        # Weak coupling -> increase target
-        self.target_resonant_coupling = min(5.0, self.target_resonant_coupling * 1.05)
+        self.target_resonant_coupling = min(5.0, self.target_resonant_coupling * self.INCREASE_FACTOR)
This improves maintainability and makes tuning adjustments easier.
layer/memory_block/memory/tuner.py
Outdated
# 3. Update state
# Convert back to int for discrete levers
if actual_name == "resonant_steps_delta":
    setattr(self, actual_name, int(round(final_val)))
🧹 Nitpick | 🔵 Trivial
Remove redundant int() call.
The round() function already returns an int in Python 3, making the outer int() call unnecessary. While not harmful, removing it improves clarity.
✨ Proposed simplification
- setattr(self, actual_name, int(round(final_val)))
+ setattr(self, actual_name, round(final_val))📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
-setattr(self, actual_name, int(round(final_val)))
+setattr(self, actual_name, round(final_val))
🧰 Tools
🪛 Ruff (0.14.10)
170-170: Value being cast to int is already an integer
Remove unnecessary int call
(RUF046)
🤖 Prompt for AI Agents
In @layer/memory_block/memory/tuner.py at line 170, The setattr call in
MemoryTuner (look for setattr(self, actual_name, int(round(final_val)))) uses an
unnecessary outer int() because round(final_val) already returns an int in
Python 3; remove the redundant int() and set the attribute with round(final_val)
directly to improve clarity.
…ore, added CCL trainer, and improved error handling across various modules.
Actionable comments posted: 81
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
api/app.py (1)
281-294: Narrow theexcept Exceptionaroundos.kill()toOSError(fix Ruff BLE001).
os.kill()raisesOSErrorsubclasses; catchingExceptioncan mask unrelated bugs.Proposed diff
if proc is None: # best-effort by pid try: os.kill(pid, signal.SIGTERM) except ProcessLookupError: return True - except Exception as e: + except OSError as e: logger.warning(f"StopRun: Failed to terminate process {pid} (best-effort): {e}") return False return Trueai/worker.py (1)
28-40: Add error handling to prevent worker crashes.The infinite loop in the
run()method has no error handling around the database operations. Iflist_tasksraises an exception (e.g., database connection failure), the entire worker will crash and stop processing tasks.🛡️ Proposed fix to add error handling
async def run(self) -> None: """Run the background worker.""" logger.trace("Running TaskWorker") while True: - await asyncio.sleep(1) - logger.trace("Polling for pending tasks") - tasks = [t async for t in self.task_store.list_tasks(tenant=self.tenant)] - logger.trace(f"Found {len(tasks)} pending tasks") - - for task in tasks: - logger.trace(f"Enqueuing task {task.id}") - await self.queue.put(task) + try: + await asyncio.sleep(1) + logger.trace("Polling for pending tasks") + tasks = [t async for t in self.task_store.list_tasks(tenant=self.tenant)] + logger.trace(f"Found {len(tasks)} pending tasks") + + for task in tasks: + logger.trace(f"Enqueuing task {task.id}") + await self.queue.put(task) + except Exception as e: + logger.error(f"Error polling tasks: {e}", exc_info=True) + await asyncio.sleep(5) # Back off on errorai/push_notifications.py (1)
11-11: Pin the httpx dependency to a specific version in pyproject.toml.The httpx library is listed in
pyproject.toml(line 64) without any version constraint. This allows any version to be installed, creating non-reproducible builds and risking breaking changes or security vulnerabilities. Specify an explicit version constraint (e.g.,"httpx>=0.24.0,<0.25.0").benchmark/artifacts.py (1)
155-163: Add BLE001 suppression and optionally format exception details more concisely.The
except Exceptionon line 159 is intentional for best-effort behavior (never fail the artifact run). Add# noqa: BLE001 – best-effort artifact generationto suppress the Ruff lint warning and document the justification.Line 162: Consider formatting the exception details as
f"({type(e).__name__}: {e})"for consistency with similar patterns in the codebase (e.g., trainer/ccl.py, trainer/upcycle.py); this reduces verbosity if the exception message is large. The import on line 161 is safe and widely used throughout the codebase with no documented issues.Suggested improvement (optional)
- except Exception as e: + except Exception as e: # noqa: BLE001 – best-effort artifact generation # Best-effort: never fail the run due to LaTeX formatting. from caramba.console import logger - logger.warning(f"ArtifactGenerator: Failed to write LaTeX behavior table: {e}") + logger.warning( + f"ArtifactGenerator: Failed to write LaTeX behavior table " + f"({type(e).__name__}: {e})" + )core/task_queue.py (1)
45-64: Use timezone-aware timestamps (and ideally TIMESTAMPTZ) for consistency across machines.Right now you write naive
datetime.now()intoTIMESTAMPcolumns (Ruff DTZ005 flagged Line 135), which is ambiguous if hosts differ in local timezone/DST. At minimum, write UTC-aware datetimes consistently; ideally, store asTIMESTAMPTZ.Minimal code-side fix (UTC-aware)
@@ -from datetime import datetime +from datetime import datetime, timezone @@ - datetime.now(), + datetime.now(timezone.utc), @@ - datetime.now(), + datetime.now(timezone.utc),Schema improvement (breaking; consider migration)
@@ - created_at TIMESTAMP NOT NULL, - updated_at TIMESTAMP NOT NULL + created_at TIMESTAMPTZ NOT NULL, + updated_at TIMESTAMPTZ NOT NULLAlso applies to: 135-136, 160-163
layer/memory_block/block/layer.py (1)
288-293: Variablestepis re-declared, shadowing the earlier assignment.Line 291 re-declares
stepfromctx.step, butstepwas already computed on lines 240-247 with fallback logic. This shadows the earlier value and could cause inconsistent behavior ifctx.stepdiffers from the fallback-computed step.If the intention is to use
ctx.stepspecifically for the write warmup check, consider using a different variable name.🔧 Suggested fix
if ctx is not None: warm = int(getattr(ctx, "memblock_write_warmup_steps", 0) or 0) - step = int(getattr(ctx, "step", 0) or 0) - if warm > 0 and step > 0 and step <= warm: + ctx_step = int(getattr(ctx, "step", 0) or 0) + if warm > 0 and ctx_step > 0 and ctx_step <= warm: write_mask = torch.zeros((int(B), int(T)), device=x.device, dtype=torch.float32)
🤖 Fix all issues with AI agents
In @.gitignore:
- Line 35: Remove the redundant specific artifact ignore entries now covered by
the broad "artifacts/" pattern: delete the lines containing
"artifacts/llama32_1b_dba_paper_efficiency/" and "artifacts/platform_improve/"
(and any duplicate occurrences such as the one noted at line 43) so only the
single "artifacts/" pattern remains.
In @ai/push_notifications.py:
- Around line 124-138: The credential normalization currently uses unsafe
str(val) conversions; change the logic in the credential handling block
(variables: cred, val, key, headers) to only treat a value as a token if it is
already a string (isinstance(val, str)); if val is bytes try decoding with utf-8
and use that string; otherwise treat it as invalid (leave key None) so you don't
set headers["Authorization"] to a coerced non-string token. Ensure the
if-branches that set key use these validations (replace str(val) calls) so only
legitimate string tokens populate headers.
In @ai/server.py:
- Around line 270-276: The in-loop import of MessageSendParams should be moved
out of the loop and merged with the module-level import from a2a.types (where
RequestContext is already imported); remove the repeated "from a2a.types import
MessageSendParams" inside the while True loop and add MessageSendParams to the
existing top-level import list so the RequestContext(...) call can continue to
use MessageSendParams without runtime repeated imports.
In @ai/worker.py:
- Line 12: Several files still import from the old module path
`a2a.server.tasks`; update those imports to the new internal path under
`caramba.ai.task` (e.g., replace `a2a.server.tasks` with `caramba.ai.task` in
ai/push_notifications.py, ai/server.py (all three import sites),
ai/task_store.py, ai/executor.py, and ai/task/store/__init__.py), ensuring each
referenced symbol such as TaskStore and any task-related classes/functions keep
their original names but point to the new module; after changes, run the test
suite or import checks to verify there are no broken references.
In @ccl/context_counts.py:
- Around line 119-124: The assignment uses a redundant int() around len(uniq);
replace "num_classes = int(len(uniq))" with "num_classes = len(uniq)" to avoid
the unnecessary cast while preserving behavior (uniq and len(uniq) are already
ints); keep the subsequent validation "if int(num_classes) <= 0" as-is or
optionally remove its cast if you want consistency.
- Line 11: Replace legacy typing imports on the top-level import line in
ccl/context_counts.py by removing Dict, List, Tuple, Optional and instead import
Iterable and Sequence from collections.abc; then update all type annotations in
this module: use built-in generic types (dict[int, int], list[ClassCountsModel],
tuple[int, int], etc.) and replace Optional[SparseCounts] with the union syntax
SparseCounts | None; ensure any remaining Iterable/Sequence references are using
the imported names from collections.abc rather than typing.
In @ccl/patch_vq.py:
- Around line 76-82: The unpacked but unused variables should be prefixed with
an underscore to satisfy linters: change the unpack of x.shape from "n, d =
x.shape" to "n, _d = x.shape" (or "n, _ = x.shape") and similarly rename the
unused unpacked "k" at the later unpack site to "_k" (or "_") so the variables
are clearly marked as intentionally unused; update any references if they were
accidentally relying on those names.
- Around line 164-177: The _decode_grid function currently accepts out_h and
out_w but never uses them; either remove these parameters from the _decode_grid
signature and docstring and update all call sites that pass out_h/out_w to stop
providing them (search for _decode_grid usages), or keep the parameters and add
validation inside _decode_grid that computed h and w (from ht,wt,stride,patch)
match out_h and out_w (raise a clear ValueError with context if they differ);
update callers only if you choose to remove the parameters.
In @ccl/system.py:
- Around line 1-8: The module docstring in ccl.system is placed after the from
__future__ import annotations statement so it isn't recognized as the module
__doc__; move the triple-quoted module docstring to be the very first statement
in the file (before from __future__ import annotations) so it becomes the module
docstring while keeping the existing content and preserving the future import.
- Around line 82-90: The local import of loglik_grid inside the forward
implementation causes unnecessary per-call lookup overhead; move the import to
module scope by adding "from caramba.ccl.context_counts import loglik_grid" at
the top of ccl/system.py and remove the in-method import on line containing
"from caramba.ccl.context_counts import loglik_grid", leaving the loop that
calls loglik_grid(m, grid) unchanged.
In @codegraph/parser.py:
- Line 271: The warning log currently uses an f-string which eagerly formats
even when the message will be suppressed; change the call to use logging's
parameterized form by passing a format string and arguments to logger.warning
(replace logger.warning(f"Codegraph: Failed to read {file_path}: {e}") with a
parameterized call that references file_path and e), ensuring lazy evaluation
and preserving the same message content and variables.
In @config/presets/ccl_mnist.yml:
- Around line 85-86: The file ends with a trailing blank line after the YAML key
"dtype: float32"; remove the extra empty line at the end of
config/presets/ccl_mnist.yml so the file terminates immediately after the
"dtype: float32" line (no final blank line) to satisfy YAML linting.
In @console/logger.py:
- Around line 352-360: The conditional sets speed_style to an undefined theme
key "amber" (in the velocity < 0 branch), causing fallback; update the velocity
branch in the logger where speed_style and arrow are set (the block assigning
speed_style/arrow for velocity >0, <0, ==0) to use the theme's correct key
"warning" instead of "amber" so negative velocities use the defined warning
style.
- Around line 536-543: The finally block cleans _live_display and
_live_renderable but leaves panel state intact, so clear both panels on exit by
resetting self._tuner_panel and self._health_panel to None in the same cleanup
path; locate the context manager where self._live_display is stopped and add
nullification of self._tuner_panel and self._health_panel (or call their
appropriate cleanup methods if present) so no stale panel data persists when the
logger is reused.
- Around line 404-406: Remove the redundant inner imports of Table, Panel, and
Text in console/logger.py: delete the lines "from rich.table import Table",
"from rich.panel import Panel", and "from rich.text import Text" that appear
inside the function/block (they duplicate the top-of-file imports for Table,
Panel, Text); ensure no local shadowing remains and run lint/tests to confirm
nothing else relies on those inner imports.
- Around line 55-59: The __init__ currently sets self.console,
self._live_display and self._live_renderable but never initializes
self._tuner_panel and self._health_panel used by tuner_status() and
health_bars(); update the __init__ of the Console class to set self._tuner_panel
= None and self._health_panel = None (alongside existing initializations) so
those attributes exist predictably and avoid hasattr() reliance when
tuner_status() or health_bars() run.
In @core/task_queue.py:
- Around line 78-83: push() mutates the caller's Task by assigning
task.status.state = TaskState.submitted and uses naive datetime.now(); instead,
avoid mutating the input Task by creating a local copy (e.g., deep copy or new
Task/TaskStatus instance) and set the state on that local copy before
persisting, and change all naive datetime.now() calls in push() and update()
(and the other occurrences at the noted sites) to timezone-aware calls such as
datetime.now(timezone.utc) and ensure timezone is imported from datetime. Keep
modifications scoped to push(), update(), and the datetime usages so callers’
Task objects remain untouched and timestamps are UTC-aware.
In @data/hf_image_classification.py:
- Line 10: The import statement brings in Callable and Optional which are
unused; remove those two names from the typing import so only actually used
symbols (e.g., Any) remain. Update the import line that currently reads "from
typing import Any, Callable, Optional" to import only the symbols used in this
module (remove Callable and Optional).
- Around line 53-60: The current branch that checks isinstance(img, Image.Image)
then calls Image.open(img) should also detect dict-style HuggingFace image
entries (e.g., {"bytes": ..., "path": ...}) and handle them explicitly: if img
is a dict and contains "bytes", wrap the bytes in io.BytesIO and call Image.open
on that stream; if it contains "path" use that path with Image.open; preserve
the existing TypeError fallback for unsupported types and ensure any caught
exception is raised from the original for traceback clarity (refer to the
variables img, im and the Image.open call).
In @data/icl_rule_test.py:
- Around line 52-57: The code sets V = int(256) unnecessarily; change it to V =
256 in the test so V is a plain integer literal. Update the assignment in
data/icl_rule_test.py where V is defined (near the logits construction that uses
V) to remove the redundant int() call.
In @infer/event_runtime_test.py:
- Line 49: Remove the unused local variable prompt_len from the test (it’s
assigned but never used); update the code in infer/event_runtime_test.py to
delete the line setting prompt_len = 71 so the test relies on self._prompt_len
as intended and no unused-variable remains.
- Around line 83-84: Move the mid-file import "from caramba.core.event_codec
import EventEncoder" into the module's top import block with the other imports
and remove the duplicate from between class definitions; update import ordering
to match project style (group stdlib, third-party, local) if needed and ensure
any references to EventEncoder in the classes still work after the move (no
local shadowing or conditional imports).
In @infer/event_runtime.py:
- Around line 219-234: The except Exception: pass in the replay recording block
silently swallows errors from building or adding the replay seq (involving
self.replay_max_len, prompt, out_ids, delim, self.replay.add), losing visibility
into failures; change it to catch Exception as e and log the failure (using the
module/class logger or process logger) with contextual details (e.g., values or
shapes of prompt/out_ids, max_len, delim and the exception message/stack) while
preserving best-effort behavior (do not re-raise so inference continues).
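A hedged example of the logging change; _build_replay_seq is a hypothetical stand-in for however the replay sequence is assembled, logger is whatever module/process logger event_runtime.py already uses, and exc_info assumes a stdlib-style logger:
try:
    seq = _build_replay_seq(prompt, out_ids, delim, max_len=self.replay_max_len)
    self.replay.add(seq)
except Exception as e:
    # Best-effort: inference keeps running, but the failure is now visible.
    logger.warning(
        "Replay recording failed (prompt=%s, out_ids=%s, max_len=%s, delim=%r): %s",
        getattr(prompt, "shape", None), getattr(out_ids, "shape", None),
        self.replay_max_len, delim, e, exc_info=True,
    )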
In @instrumentation/run_logger.py:
- Around line 67-72: Move the inline import of console_logger out of the
exception handler and into the module top-level (e.g., add "from caramba.console
import logger as console_logger" at the top of instrumentation/run_logger.py),
then remove the local import inside the except block that logs the failure to
create self.out_dir; keep the same warning call and setting self.enabled = False
so behavior of RunLogger remains unchanged.
In @instrumentation/training_metrics.py:
- Around line 36-49: The module claims "zero-overhead" yet the property getters
loss, accuracy, and step acquire self._lock, so update the module docstring and
the singleton comment to reflect that reads are protected and incur minimal
locking overhead (not literally zero), and add a short note advising callers
that very frequent reads (per-step) should be batched or sampled to avoid
unnecessary lock contention; locate and edit the module-level docstring and any
comment around the singleton and the properties loss, accuracy, step to make
this clarification.
- Around line 12-25: The class TrainingMetrics is annotated with @dataclass but
defines a manual __init__ and uses object.__setattr__ unnecessarily; replace the
custom initializer by keeping @dataclass and declare _lock =
field(default_factory=Lock) with _loss: float | None = None, _accuracy: float |
None = None, _step: int = 0, remove the __init__ method entirely, and import
field from dataclasses, or alternatively remove @dataclass and keep the existing
__init__; update references to _lock, _loss, _accuracy, and _step accordingly.
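A minimal sketch of the dataclass variant suggested above (Python 3.10+ union syntax; only one property shown):
from dataclasses import dataclass, field
from threading import Lock

@dataclass
class TrainingMetrics:
    _lock: Lock = field(default_factory=Lock, repr=False)
    _loss: float | None = None
    _accuracy: float | None = None
    _step: int = 0

    @property
    def loss(self) -> float | None:
        with self._lock:  # reads are protected, so "zero-overhead" no longer applies literally
            return self._loss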
In @instrumentation/wandb_writer.py:
- Around line 79-81: Import console_logger once at module level and remove the
repeated inline imports in each exception handler: add "from caramba.console
import logger as console_logger" to the top of instrumentation/wandb_writer.py,
then replace occurrences of "from caramba.console import logger as
console_logger" inside exception blocks (e.g., in the except handling within
WandBWriter methods that log "WandBWriter: Failed to dump config" and similar
messages) with direct uses of console_logger.warning(...). Keep the broad
"except Exception" handlers as-is per the module's design.
In @layer/attention/standard/layer.py:
- Around line 101-102: Add a unit/integration test that exercises the standard
attention path when qh.dtype is torch.float32 and qh.device.type == "mps" to
assert the code falls back to SDPA rather than attempting the
AttentionTraining/Triton kernel; specifically, create a test that constructs
tensors resembling qh, invokes the standard attention entry point (the branch
that checks the guard containing qh.dtype and qh.device.type), and verifies the
SDPA implementation is selected (and runs) while ensuring
DecoupledAttentionTraining is not used.
In @layer/memory_block/block/viz_test.py:
- Around line 39-42: The test accesses mock_routing.call_args without confirming
the mock was invoked; update each test in this class that patches
self.layer.memory.compute_routing (e.g., the block using "with
patch.object(self.layer.memory, 'compute_routing', ... ) as mock_routing:") to
assert the mock was called before unpacking call_args—use
mock_routing.assert_called() or self.assertTrue(mock_routing.called) immediately
after invoking self.layer(x, ctx=ctx) and before reading mock_routing.call_args
to avoid AttributeError when compute_routing wasn't called.
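A short sketch of the assert-before-unpack pattern; the input helpers (_make_inputs, _fake_routing) are hypothetical placeholders for whatever the test already builds:
from unittest.mock import patch

def test_routing_inputs(self):
    x, ctx = self._make_inputs()
    with patch.object(self.layer.memory, "compute_routing", return_value=self._fake_routing()) as mock_routing:
        self.layer(x, ctx=ctx)
        mock_routing.assert_called()            # fail with a clear message if routing never ran
        args, kwargs = mock_routing.call_args   # only unpack once the call is confirmed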
In @layer/memory_block/memory/memory.py:
- Around line 638-643: The code imports the statistics module inside the method
when computing loss_variance from self.loss_history (the import appears before
calculating loss_variance and the variable loss_variance is set from
statistics.variance); move the import to the top of the module (module-level
import statistics) and remove the in-function import so the method simply calls
statistics.variance(self.loss_history) to avoid repeated imports and minor
runtime overhead.
- Around line 522-557: Remove the duplicated comment lines around the post-write
telemetry block and avoid re-importing modules on every hot-path call: delete
the repeated "After write..." line, and move or cache the conditional imports
for should_update_tuner, get_shared_tuner and get_training_metrics so they
aren’t executed on every call when routing.collect_aux is true (e.g., import
once at module load or assign them to local/static variables on first use). Keep
the rest of the logic (checks of self.mem_autotune, routing.get("collect_aux"),
global_step logic, tuner.update, mem_autotune_viz block calling
tuner.get_viz_data/get_health_metrics) unchanged.
- Around line 625-636: The loop that converts routing values to a float
(iterating keys ("read_teacher_agree_free", "read_teacher_agree",
"vq_read_group_acc", "vq_write_group_acc") and using
v.detach().float().mean().item()) currently swallows all exceptions; instead
restrict the except to likely conversion errors (e.g., RuntimeError, ValueError,
TypeError) or validate v (isinstance(v, torch.Tensor) and v.numel() > 0 and
torch.isfinite(v).all()) before converting, and on failure emit a warning or
debug log that includes the routing key and a brief representation of v so
malformed values are visible; keep falling back to metrics.accuracy only after
these checks and logging.
- Around line 656-662: The bare "except Exception: pass" around the telemetry
assignments (setting tel.aux["table_utilization"], tel.aux["write_fire_frac"]
and tel.aux["write_threshold_eff"] when routing["write_threshold_eff"] is a
torch.Tensor) silently swallows errors; replace it by catching only expected
errors (e.g., ValueError, TypeError, AttributeError) and log the exception
details instead of ignoring them (use the module/class logger or raise after
logging), so failures in tel.aux assignment are visible and include the
exception message and stack trace.
In @layer/memory_block/memory/reader.py:
- Around line 184-193: The duplicated pattern that fetches the shared tuner when
self.tuner_mode != "off" should be extracted into a helper on the class to avoid
repetition; add a method (e.g., _get_tuner or _shared_tuner) that checks
self.tuner_mode, imports and calls get_shared_tuner(mode=self.tuner_mode) and
returns the tuner or None, then update both places that currently mutate
vsa_weight (vsa weight scaling) and read_temp (read temp scaling) to call this
helper and apply getattr(tuner, "...", 1.0) only if the helper returned a tuner.
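One way the shared helper could look; the import path and the multiplier attribute names read via getattr are assumptions based on the surrounding review items:
def _get_tuner(self):
    """Return the shared tuner, or None when tuning is disabled."""
    if self.tuner_mode == "off":
        return None
    from caramba.layer.memory_block.memory.tuner import get_shared_tuner
    return get_shared_tuner(mode=self.tuner_mode)

# Both call sites then become:
tuner = self._get_tuner()
if tuner is not None:
    vsa_weight = vsa_weight * getattr(tuner, "vsa_novelty_mult", 1.0)
    read_temp = read_temp * getattr(tuner, "read_temp_mult", 1.0)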
In @layer/memory_block/memory/routing.py:
- Around line 306-309: The variable energy_history is collected but never
returned; either remove its collection or attach it to the aux output. To fix,
update the function that defines energy_history (variable energy_history) and
either (a) delete the energy_history list and the append at the accumulation
site, or (b) add energy_history to the aux dict returned (e.g., ensure aux or
aux_requested handling adds aux["energy_history"] = energy_history before the
function returns), so telemetry/convergence data is exposed; adjust any
conditional that only populates aux so energy_history is included when aux is
requested.
- Around line 326-339: Remove the unused local variables u_r and u_i that are
assigned from u_r_t.transpose... and u_i_t.transpose... (they are never used
later); keep the computations that use u_r_t and u_i_t and the subsequent
v_r_t/v_i_t and transposes, and simply delete the two lines that create u_r and
u_i to resolve the unused-assignment warning.
- Around line 222-224: Replace the print call in method _check_nan of
ResonantRouter with the project logger: add "from caramba.console import logger"
at the top of the file and change the print(...) to logger.warning(...)
(including the same message and the name variable) so warnings go through the
standard logging pipeline; ensure the message text remains clear and use
logger.warning for non-fatal NaN/Inf detection.
In @layer/memory_block/memory/tuner_test.py:
- Around line 47-57: The test modifies a frozen dataclass field
(MemoryHealthTelemetry.step) causing FrozenInstanceError; change the loop in
test_warmup_period_no_exploration to create a new MemoryHealthTelemetry instance
each iteration (e.g., tel = MemoryHealthTelemetry(step=i,
utilization=previous_utilization)) before calling tuner.update(tel) so the
UniversalMemoryTuner.update sees the incremented step without mutating a frozen
object and the assertion on tuner.resonant_coupling_mult remains valid.
- Around line 103-108: The test's change detection is redundant because it
compares tuner.resonant_damping_mult to tuner.explorers["damping"].value (and
similarly for novelty), which will always be equal; update the test to capture
initial values for all three tracked params (e.g., initial_coupling =
tuner.resonant_coupling_mult, initial_damping = tuner.resonant_damping_mult,
initial_novelty = tuner.vsa_novelty_mult) before warmup and then compute changed
by comparing current values to those saved initials
(tuner.resonant_coupling_mult != initial_coupling or tuner.resonant_damping_mult
!= initial_damping or tuner.vsa_novelty_mult != initial_novelty) so the
assertion actually detects exploration.
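A sketch combining both test fixes above: a fresh frozen telemetry instance per step, and change detection against captured initial values (num_steps and the utilization value are placeholders):
def test_exploration_changes_parameters(self):
    tuner = UniversalMemoryTuner()
    initial_coupling = tuner.resonant_coupling_mult
    initial_damping = tuner.resonant_damping_mult
    initial_novelty = tuner.vsa_novelty_mult

    for i in range(num_steps):
        # MemoryHealthTelemetry is frozen, so build a new instance each iteration.
        tel = MemoryHealthTelemetry(step=i, utilization=0.5)
        tuner.update(tel)

    changed = (
        tuner.resonant_coupling_mult != initial_coupling
        or tuner.resonant_damping_mult != initial_damping
        or tuner.vsa_novelty_mult != initial_novelty
    )
    self.assertTrue(changed)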
In @layer/memory_block/memory/tuner.py:
- Around line 259-307: The _compute_objective method embeds hardcoded "magic"
weights; extract these into configurable class-level constants or constructor
parameters (e.g., UTIL_CRITICAL_PENALTY, UTIL_OVER_PENALTY, UTIL_HEALTHY_BONUS,
ACCURACY_WEIGHT, LOSS_VARIANCE_WEIGHT, CONFLICT_WEIGHT, RESONANT_MAX_SCORE) and
replace the numeric literals in _compute_objective with those symbols, providing
sensible defaults and allowing overrides in __init__ so experiments can tune
weights without editing the method.
- Around line 189-190: The method resonant_steps_delta currently returns
int(round(self.explorers["steps"].value)) which redundantly wraps round(...)
with int(); remove the outer int() and return
round(self.explorers["steps"].value) directly so the return type remains an int;
update the resonant_steps_delta implementation accordingly, referencing the
explorers["steps"].value expression.
- Around line 409-437: The module-level singleton and step-tracking globals
(_shared_tuner, _shared_tuner_last_step) are not thread-safe; update the
docstrings/comments for get_shared_tuner, should_update_tuner, and
reset_shared_tuner to state the single-threaded assumption and potential race on
concurrent calls, and suggest remedies (e.g., protecting access with a
threading.Lock around reads/writes or using a thread-safe singleton pattern) so
callers know to serialize access or change the implementation if used across
threads.
- Line 178: Replace the dict comprehension used to initialize self.deltas with
dict.fromkeys for brevity and style: in the Tuner (or the class where
self.deltas is set) change the initialization of self.deltas from the
comprehension over self.explorers.keys() to using
dict.fromkeys(self.explorers.keys(), 0.0) (or dict.fromkeys(self.explorers,
0.0)) so the intent is clearer and the code is more concise.
- Around line 12-13: The file imports unused symbols `torch` and `nn`; remove
these imports from the top of the file so the tuner only uses plain
floats/telemetry types—delete or replace the lines "import torch" and "from
torch import nn" and run linters/tests to verify no other references to `torch`
or `nn` remain in the module.
- Around line 45-111: Add two unit tests for ParameterExplorer.step(): one that
forces a braking-edge collision by setting up an explorer with momentum=1,
braking=True, direction pointing toward a boundary, and a non-improving step so
momentum falls to 0 and triggers the braking-reversal logic; assert that after
the step the direction has flipped, momentum is reset to >=1, value is clamped
inside [min_value, max_value], and braking is cleared. The second test should
cover the cooldown-to-active reactivation: initialize explorer.active=False,
steps_since_deactivation set to cooldown-1, then call step and assert it returns
0.0 until cooldown is reached, after which explorer.active becomes True,
steps_since_deactivation and steps_without_improvement are zero, momentum==1,
and direction is set to 1.0 or -1.0 based on value vs (min_value+max_value)/2;
use the ParameterExplorer.step, active, cooldown, steps_since_deactivation,
steps_without_improvement, momentum, direction, base_step_size, min_value and
max_value attributes to drive and verify behavior.
In @optimizer/metal/resonant_jit.py:
- Around line 120-134: The code uses the private API ce._get_build_directory to
construct build_dir; replace it by calling the public
torch.utils.cpp_extension.get_default_build_root() (import it) and set build_dir
= Path(get_default_build_root()) / name, or simply omit manual build_dir
construction and pass no build_directory to ce.load so PyTorch uses its
default/TORCH_EXTENSIONS_DIR; update references to ce._get_build_directory to
use get_default_build_root and ensure the variable name (name) is appended when
forming build_dir before passing build_directory to ce.load.
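A sketch of the public-API replacement; the extension name and the sources list are assumptions, while get_default_build_root and the build_directory argument are part of torch.utils.cpp_extension:
from pathlib import Path

import torch.utils.cpp_extension as ce
from torch.utils.cpp_extension import get_default_build_root

name = "caramba_metal_resonant_ops"               # assumed extension name
build_dir = Path(get_default_build_root()) / name
build_dir.mkdir(parents=True, exist_ok=True)

module = ce.load(
    name=name,
    sources=sources,                              # .mm/.cpp sources assembled elsewhere in the module
    build_directory=str(build_dir),
    verbose=False,
)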
In @optimizer/metal/resonant_update_autograd_test.py:
- Around line 33-37: The compound assertion checking all gradients at once
(assert x.grad is not None and y.grad is not None and vr.grad is not None and
vi.grad is not None) should be split into individual assertions so failures
identify which gradient is missing; add four separate assertions (e.g., assert
x.grad is not None, "x.grad is None") for x.grad, y.grad, vr.grad, and vi.grad
before the detach/clone lines (gx_m, gy_m, gvr_m, gvi_m) so the test fails with
a clear diagnostic pointing to the missing gradient.
- Around line 12-14: The test function test_backward_matches_pytorch_reference
currently returns when MPS isn't available which marks the test as passed;
replace the bare return with pytest.skip("MPS backend unavailable") and ensure
pytest is imported at the top of the file (add import pytest if missing) so the
test is reported as skipped rather than silently succeeding.
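For reference, a minimal version of the skip pattern:
import pytest
import torch

def test_backward_matches_pytorch_reference():
    if not torch.backends.mps.is_available():
        pytest.skip("MPS backend unavailable")
    # ... rest of the test unchanged ...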
In @optimizer/metal/resonant_update.py:
- Around line 74-78: The code uses verbose getattr calls to read attributes from
the saved context; replace getattr(ctx, "scale"), getattr(ctx, "damping"), and
getattr(ctx, "zero_diag") with direct attribute access (ctx.scale, ctx.damping,
ctx.zero_diag) and keep the existing float()/bool() casts for scale, damping,
and zero_diag respectively; update the block that unpacks x, y, diag, a, b,
inv_r and calls load_caramba_metal_resonant_ops to use the direct attribute
accesses.
- Around line 48-66: The forward saves the original x, y, diag but calls the
Metal op with x.contiguous(), y.contiguous(), diag.contiguous(), so change the
saved tensors to the contiguous versions used in the op: save the contiguous
copies (the same objects passed to resonant_update_forward_fp32) via
ctx.save_for_backward instead of the original x, y, diag; keep storing
ctx.scale, ctx.damping, ctx.zero_diag as before and ensure xo, yo, a, b, inv_r
assignment and return are unchanged.
In @optimizer/resonant_update_triton_kernels.py:
- Around line 111-142: The loads for x and y (the tl.load calls assigning to
variables x and y from x_ptr and y_ptr) are dead code and should be removed to
avoid wasted memory bandwidth; delete the two lines that load x and y and their
.to(tl.float32) conversions inside the backward kernel loop, but keep the kernel
signature parameters x_ptr and y_ptr intact for API compatibility (do not change
other computations that use offs, d, h, diag or the stores to
grad_x_ptr/grad_y_ptr/grad_vr_ptr/grad_vi_ptr).
In @optimizer/resonant_update_triton_test.py:
- Around line 34-38: Split the single compound assertion into separate checks so
failures indicate which tensor lacks grad: replace the single assert that checks
x.grad, y.grad, vr.grad and vi.grad together with four distinct assertions (or
individual if statements) that validate x.grad is not None, y.grad is not None,
vr.grad is not None and vi.grad is not None before cloning to gx_m, gy_m, gvr_m
and gvi_m; reference the existing symbols x.grad, y.grad, vr.grad, vi.grad and
the subsequent clones gx_m, gy_m, gvr_m, gvi_m to ensure each grad is present
and provide clearer diagnostics on failure.
- Around line 11-15: Replace the early returns in
test_backward_matches_pytorch_reference with pytest.skip calls: import pytest if
not already, then use pytest.skip("CUDA not available") when
torch.cuda.is_available() is False and pytest.skip("Triton not supported") when
triton_supported() is False so the test reports as skipped rather than silently
passing; reference the existing torch.cuda.is_available() and triton_supported()
checks and update them accordingly.
In @optimizer/resonant_update_triton.py:
- Around line 129-133: The code currently only checks grad_xo's device in
ResonantPhaseUpdateTriton; add a similar CUDA device validation for grad_yo
using the same _require pattern (e.g., _require(grad_yo.device.type == "cuda",
msg="grad_yo must be CUDA for ResonantPhaseUpdateTriton.")) so both gradients
are verified before creating gx/gy/gvr/gvi.
In @pyproject.toml:
- Around line 24-73: The dependency list in pyproject.toml has been unpinned
(e.g., "torch", "numpy", "transformers", etc.), removing reproducibility and
risking silent breakages; restore explicit version constraints by pinning each
dependency to the tested version or using bounded ranges (e.g.,
torch==<tested_version> or torch>=<min_version>,< <upper_bound>) for all listed
packages (torch, numpy, tiktoken, datasets, transformers, huggingface_hub,
pydantic, fastapi, uvicorn[standard], pytorch-lightning, wandb, etc.), ensuring
the chosen versions reflect what CI/local tests currently validate and applying
the same pattern to optional extras like "uvicorn[standard]" and "mcp[cli]".
In @research/ccl/ccl_mnist.py:
- Line 295: The zip() usages in the training and test loops (the tqdm loops that
iterate over zip(tokens, labels) and zip(test_tokens, y_test)) can silently
truncate on length mismatch; update both zip(...) calls to zip(..., strict=True)
so mismatched lengths raise an error, and keep the existing total=... and desc
arguments unchanged (refer to the loop variables grid, y in the training loop
and the analogous variables in the test loop).
- Around line 59-63: Replace the overly broad "except Exception as e" in the
import try/except blocks with "except ImportError as e" so only import failures
are caught; specifically update the try/except that wraps "import torch; from
torchvision import datasets" and the other two analogous import blocks
referenced in the review (the blocks around lines 117-120 and 629-632) to use
"except ImportError as e" and re-raise the same RuntimeError with the original
exception as the cause.
- Line 273: The assert num_classes == 10 is unsafe because assertions can be
disabled; replace it with an explicit runtime check such as: if num_classes !=
10: raise ValueError("This script requires num_classes == 10 (MNIST labels
0..9)"), updating the check near the existing reference to num_classes in
ccl_mnist.py so the script always enforces the requirement.
- Line 151: Replace the bare assert with an explicit runtime check: verify that
D == patch * patch using an if statement and raise a ValueError with a clear
message (e.g., f"Expected D == patch*patch, got D={D}, patch={patch}") so the
validation always runs even when Python is optimized; update the check
surrounding the variables D and patch in ccl_mnist.py accordingly.
- Around line 483-490: The variable lam_uni2 is assigned but never used; remove
lam_uni2 from the renormalization branch (and any other assignment sites) and
keep using the original lam_uni in the sampling logic, or alternatively update
the sampling code to use lam_uni2 consistently; specifically edit the
model.use_full conditional that sets lam_full2, lam_mid2, lam_uni2 so it only
defines lam_full2 and lam_mid2 (or ensure downstream sampling thresholds
reference lam_uni2 instead of implicit unigram), and update any comments to
reflect the chosen approach.
- Around line 227-258: SparseCounts, _update_sparse and _prob_sparse are
duplicated here and in ccl/context_counts.py; remove the local definitions and
import the shared implementations instead. Replace the local dataclass and
functions SparseCounts, _update_sparse, and _prob_sparse with from
ccl.context_counts import SparseCounts, _update_sparse, _prob_sparse (or the
public names used there), and update any local references (e.g., usages in
ClassModel.mid/full and calls to _update_sparse/_prob_sparse) to use the
imported symbols so behavior remains identical and duplication is eliminated.
- Line 36: The file currently imports legacy typing names (Dict, Tuple,
Optional, List); remove those from the "from typing ..." import and update all
annotations that use them to Python 3.9+ built-in generics and union syntax
(e.g., Dict -> dict[…], Tuple -> tuple[…], List -> list[…], and Optional[X] -> X
| None where appropriate). Locate uses of the symbols Dict, Tuple,
Optional, and List in the module and replace each with the corresponding
built-in generic or union form so annotations use modern typing syntax.
- Around line 407-436: The function _sample_from_sparse_dirichlet_smoothed
declares an unused parameter alpha; remove alpha from the function signature and
from all call sites so the function only accepts (ent, rng, alphaK, K), or
alternatively if API must remain, add a short docstring note explaining alpha is
unused and keep it—but preferred fix is to delete the alpha parameter and update
every caller that passes alpha to pass only the remaining parameters (search for
_sample_from_sparse_dirichlet_smoothed usages to update).
In @research/ccl/requirements.txt:
- Around line 1-6: The requirements file lists unpinned packages (numpy,
matplotlib, tqdm, torch, torchvision, scikit-learn); pin each to explicit,
tested versions in the requirements.txt so installs are reproducible and
torch/torchvision remain compatible; choose specific version numbers consistent
with your target Metal/Triton kernel and any parent/CI manifests, verify
torch/torchvision pairs against the PyTorch compatibility matrix, and update
related manifests/CI to match the chosen pins.
In @runtime/engine/lightning_engine.py:
- Around line 50-58: The Lightning module's configure_optimizers currently
hardcodes AdamW which diverges from Caramba's optimizer construction; replace
the fixed optimizer with Caramba's builder by delegating to
StandardTrainer._build_optimizer (or the same internal factory) using the
module's training config (self.train_cfg) and model parameters so options like
AdamWMaster, Lion, SGD and schedulers are preserved; update configure_optimizers
to call that builder and return whatever optimizer/scheduler tuple it produces,
ensuring any required flags or device/amp settings from the Lightning context
are forwarded.
- Around line 104-110: The Trainer instantiation sets devices and accelerator
incorrectly; update the L.Trainer call in lightning_engine.py (the trainer =
L.Trainer(...) block) so that CPU training uses devices=1 (not 0) and remove the
inverted explicit CPU branch—prefer accelerator="auto" and set devices to 1 when
train_cfg.device == "cpu" or to "auto"/appropriate GPU count when not, and keep
precision as "16-mixed" if train_cfg.use_amp else "32-true"; ensure the
devices/accelerator logic reflects that devices=0 is invalid and that
accelerator="auto" can handle CPU fallback.
In @trainer/ccl.py:
- Around line 47-106: The _collect_images_labels function can consume huge
memory when max_items is None because it preallocates all images; add a brief
docstring note describing memory implications and also emit a runtime
warning/log when max_items is None and the dataset length (len(cast(Sized, ds)))
exceeds a safe threshold (e.g., >10_000 or make threshold a small constant),
using the module logger or warnings.warn to advise callers to pass max_items or
use a streaming approach; update the _collect_images_labels docstring and insert
the warning check early (before allocation) so users see it before large memory
is reserved.
- Around line 376-406: The try/except around "from caramba.ccl.context_counts
import sample_grid" and "import matplotlib.pyplot as plt" is catching all
Exception; narrow it to only import-related errors by catching ImportError and
ModuleNotFoundError (e.g., replace "except Exception as e" with "except
(ImportError, ModuleNotFoundError) as e") so runtime errors in the generation
logic (sample_grid, codec.decode, plotting) still surface; keep the existing
logger.warning message and the rest of the else block unchanged.
- Around line 109-116: The current _split_train_val implementation slices the
dataset in order, which can bias splits if data is ordered; instead, generate a
list of indices (0..n-1), shuffle them (e.g., via random.shuffle or a passed RNG
for reproducibility), then take the first n_train indices for train and the rest
for val and construct Subset(ds, train_indices) and Subset(ds, val_indices);
update _split_train_val to perform this shuffled-index split so train/val are
randomized.
- Around line 403-406: After saving the figure with plt.savefig(out_path) in the
CCL sampling code, explicitly close the Matplotlib figure to avoid resource
leaks; update the block around plt.tight_layout(pad=0.2), plt.savefig(out_path),
logger.info(...) to call plt.close(fig) (or plt.close() if no fig variable is
used) immediately after logger.info so the figure is released.
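A sketch of the shuffled split suggested above (val_fraction and the seed handling are assumptions); the same trainer should also call plt.close(fig) right after plt.savefig(out_path) so the sampling figure is released:
import random

from torch.utils.data import Subset

def _split_train_val(ds, val_fraction: float = 0.1, seed: int = 0):
    n = len(ds)
    indices = list(range(n))
    random.Random(seed).shuffle(indices)               # reproducible shuffle
    n_train = max(1, int(n * (1.0 - val_fraction)))
    return Subset(ds, indices[:n_train]), Subset(ds, indices[n_train:])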
In @trainer/gradient_isolation.py:
- Around line 420-421: The warning message in the except block using
logger.warning(...) incorrectly states "failed to merge MOSAIC aux outputs" even
though this try/except is counting keys that start with "mosaic_" rather than
performing a merge; update the log text in trainer/gradient_isolation.py (the
except block catching Exception as e) to accurately reflect the operation (e.g.,
"failed to count MOSAIC aux output keys" or "error while counting mosaic_ keys")
and keep the exception variable (e) included in the message so the original
behavior and diagnostics remain unchanged.
- Around line 197-199: The current except block in GradientIsolationTrainer that
catches Exception e and only logs a warning should not silently degrade to
single-device mode; change it to log an error-level message that clearly states
distributed wrapping failed and distributed training has been disabled, then
re-raise the exception (or raise a new RuntimeError) so callers can fail fast
and handle the failure; update the handler around the system module wrapping
(the try/except around the distributed wrapping code in the
GradientIsolationTrainer class/method) to use logger.error and then raise the
original exception (or an explicit RuntimeError) instead of continuing.
In @trainer/standard.py:
- Around line 343-346: Keep the inline import of reset_shared_tuner where it’s
called rather than moving it to module level to preserve the established pattern
used across the codebase; ensure the code still calls reset_shared_tuner() in
the same place and add a brief comment above the inline import explaining the
consistency rationale (the function name to locate is reset_shared_tuner).
In @trainer/swap_manager.py:
- Around line 32-34: The current broad "except Exception as e" in the
SwapManager swap/optimizer load path should be narrowed to only the expected
failure types to avoid hiding unexpected bugs; replace the blanket except with a
targeted catch (e.g., IOError/OSError for file access, pickle.UnpicklingError
for deserialization, KeyError/ValueError for missing or malformed state, and any
framework-specific load errors) while keeping the
console_logger.warning(f"SwapManager: Failed to load optimizer state: {e}")
behavior; locate the except block in SwapManager (the optimizer-state load
method) and update the exception tuple accordingly so unexpected exceptions
still propagate.
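A short sketch of the narrowed handler; the exception tuple is one plausible choice (RuntimeError covers most torch.load failures) and the load calls are placeholders for the existing code:
import pickle

import torch

try:
    state = torch.load(path, map_location="cpu")   # placeholder for the existing load path
    optimizer.load_state_dict(state)
except (OSError, pickle.UnpicklingError, KeyError, ValueError, RuntimeError) as e:
    from caramba.console import logger as console_logger
    console_logger.warning(f"SwapManager: Failed to load optimizer state: {e}")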
| *.synctex.gz | ||
| artifacts/mosaic/CONTEXT.md | ||
| artifacts/ai/brainstorm.jsonl | ||
| artifacts/ |
🛠️ Refactor suggestion | 🟠 Major
Remove redundant artifact-specific patterns.
Adding a broad artifacts/ ignore pattern (line 35) is appropriate given the PR's extensive artifact additions. However, it makes the specific artifact directory patterns at lines 6-7 (artifacts/llama32_1b_dba_paper_efficiency/ and artifacts/platform_improve/) redundant; they should be removed to avoid maintenance confusion.
🧹 Proposed cleanup
Remove lines 6-7, as they are now covered by the broad artifacts/ pattern:
__pycache__/
.DS_Store
.venv/
.pytest_cache/
.pycache/
- artifacts/llama32_1b_dba_paper_efficiency/
- artifacts/platform_improve/
internal/
caramba.egg-info/
WARP.md
.env
wandb/
runs/
.caramba/
train.h5
x.jsonl
fineweb_100m.npy
fineweb_100m.npy.meta
fineweb_20b.npy
fineweb_20b.npy.meta
*.crdownload
train.h5
.coverage
fineweb_edu_v29_100m.npy
NOTES.md
*.fdb_latexmk
*.aux
*.blg
*.bbl
*.fls
*.log
*.out
*.toc
*.synctex.gz
+ artifacts/
.cache/
.tmp/
*.pt
tmp/
*.npy
*.npy.meta
ADK.txt
+ research/dba/runs100k.zip
Also applies to: 43-43
🤖 Prompt for AI Agents
In @.gitignore at line 35, Remove the redundant specific artifact ignore entries
now covered by the broad "artifacts/" pattern: delete the lines containing
"artifacts/llama32_1b_dba_paper_efficiency/" and "artifacts/platform_improve/"
(and any duplicate occurrences such as the one noted at line 43) so only the
single "artifacts/" pattern remains.
ai/push_notifications.py
Outdated
| # A2A SDKs differ in how credentials are represented (string vs object). | ||
| # Normalize to a string token if present. | ||
| key: str | None = None | ||
| if isinstance(cred, str): | ||
| key = cred | ||
| elif isinstance(cred, dict): | ||
| val = cred.get("key") or cred.get("token") | ||
| key = str(val) if val else None | ||
| else: | ||
| val = getattr(cred, "key", None) or getattr(cred, "token", None) | ||
| key = str(val) if val else None | ||
|
|
||
| if key: | ||
| headers["Authorization"] = f"Bearer {key}" | ||
| break |
Add type validation before converting credentials to strings.
The str(val) conversions at lines 131 and 134 are unsafe. If val is not a string (e.g., a boolean, number, or complex object), the conversion could produce invalid authentication tokens. For example:
- str(False) → "False"
- str(0) → "0"
- str(SomeObject()) → potentially arbitrary string representation
This could lead to authentication failures or mask configuration errors.
🔒 Proposed fix with type validation
# Add authentication if configured
if config.authentication:
for cred in config.authentication.credentials or []:
# A2A SDKs differ in how credentials are represented (string vs object).
# Normalize to a string token if present.
key: str | None = None
if isinstance(cred, str):
key = cred
elif isinstance(cred, dict):
val = cred.get("key") or cred.get("token")
- key = str(val) if val else None
+ key = val if isinstance(val, str) else None
+ if val and not isinstance(val, str):
+ _logger.warning(
+ f"Invalid credential type {type(val).__name__} for task {task.id}, expected string"
+ )
else:
val = getattr(cred, "key", None) or getattr(cred, "token", None)
- key = str(val) if val else None
+ key = val if isinstance(val, str) else None
+ if val and not isinstance(val, str):
+ _logger.warning(
+ f"Invalid credential type {type(val).__name__} for task {task.id}, expected string"
+ )
if key:
headers["Authorization"] = f"Bearer {key}"
break
🤖 Prompt for AI Agents
In @ai/push_notifications.py around lines 124 - 138, The credential
normalization currently uses unsafe str(val) conversions; change the logic in
the credential handling block (variables: cred, val, key, headers) to only treat
a value as a token if it is already a string (isinstance(val, str)); if val is
bytes try decoding with utf-8 and use that string; otherwise treat it as invalid
(leave key None) so you don't set headers["Authorization"] to a coerced
non-string token. Ensure the if-branches that set key use these validations
(replace str(val) calls) so only legitimate string tokens populate headers.
| from a2a.types import MessageSendParams | ||
| context = RequestContext( | ||
| message=message, | ||
| current_task=task, | ||
| request=MessageSendParams(message=message), | ||
| task_id=task.id, | ||
| context_id=task.context_id, | ||
| task=task, | ||
| ) |
🧩 Analysis chain
🏁 Script executed:
# First, let's examine the file structure and locate the imports section
head -n 20 ai/server.py
Repository: TheApeMachine/caramba
Length of output: 591
🏁 Script executed:
# Check the imports and context around lines 180-290
sed -n '180,290p' ai/server.py | cat -n
Repository: TheApeMachine/caramba
Length of output: 7279
🏁 Script executed:
# Search for RequestContext usage patterns in the codebase
rg -n 'RequestContext' --type py -B 2 -A 2
Repository: TheApeMachine/caramba
Length of output: 5137
🏁 Script executed:
# Check if MessageSendParams exists in the codebase and where it's used
rg -n 'MessageSendParams' --type py
Repository: TheApeMachine/caramba
Length of output: 378
🏁 Script executed:
# Look at the file structure to understand the loop context
sed -n '260,280p' ai/server.py | cat -n
Repository: TheApeMachine/caramba
Length of output: 1151
🏁 Script executed:
# Check the function structure and loop context around these imports
sed -n '150,200p' ai/server.py | cat -n
Repository: TheApeMachine/caramba
Length of output: 3142
🏁 Script executed:
# Check if there's a while True loop and confirm the import placement
sed -n '230,280p' ai/server.py | cat -n
Repository: TheApeMachine/caramba
Length of output: 3156
🏁 Script executed:
# Verify RequestContext signature in a2a.server.agent_execution
rg -n 'class RequestContext|def __init__' --type py -B 1 -A 5 | grep -A 10 'RequestContext'
Repository: TheApeMachine/caramba
Length of output: 668
🏁 Script executed:
# Check imports in other files to see how MessageSendParams is typically imported
rg -n 'from a2a.types import.*MessageSendParams' --type py -B 2 -A 2
Repository: TheApeMachine/caramba
Length of output: 366
🏁 Script executed:
# Search for RequestContext class definition to verify constructor signature
rg -n 'class RequestContext' --type py -A 10
Repository: TheApeMachine/caramba
Length of output: 47
🏁 Script executed:
# Check the actual a2a-sdk package info if available
find . -name 'pyproject.toml' -o -name 'requirements.txt' -o -name 'setup.py' | head -5
Repository: TheApeMachine/caramba
Length of output: 255
🏁 Script executed:
# Look for any constraints on the RequestContext usage
rg -n 'RequestContext(' --type py -B 2 -A 5
Repository: TheApeMachine/caramba
Length of output: 145
🏁 Script executed:
# Search for RequestContext construction with escaped parenthesis
rg -n 'RequestContext\(' --type py -B 2 -A 5
Repository: TheApeMachine/caramba
Length of output: 545
🏁 Script executed:
# Check if RequestContext is defined in the codebase or from a2a-sdk
rg -n 'class RequestContext' --type py
Repository: TheApeMachine/caramba
Length of output: 47
🏁 Script executed:
# Look at the imports at the top of ai/server.py to understand dependencies
head -n 50 ai/server.py
Repository: TheApeMachine/caramba
Length of output: 1914
🏁 Script executed:
# Check the main project requirements to understand a2a-sdk version
cat pyproject.toml 2>/dev/null | grep -A 50 'dependencies' || cat setup.py 2>/dev/null || find . -maxdepth 2 -name 'requirements*.txt' -exec cat {} \;
Repository: TheApeMachine/caramba
Length of output: 980
🏁 Script executed:
# Check what's actually in a2a.types module (if available in the repo)
find . -path '*/a2a/types*' -type f 2>/dev/null
Repository: TheApeMachine/caramba
Length of output: 47
🏁 Script executed:
# Verify there are no other conditional imports of MessageSendParams in the loop
sed -n '180,300p' ai/server.py | grep -n 'import.*MessageSendParams'
Repository: TheApeMachine/caramba
Length of output: 127
🏁 Script executed:
# Confirm the exact line numbers where imports occur within the loop
sed -n '170,280p' ai/server.py | grep -n 'from a2a\|while True' | head -20
Repository: TheApeMachine/caramba
Length of output: 366
Move MessageSendParams import outside the loop.
The MessageSendParams import on line 270 is inside the while True loop and should be consolidated with the existing imports at line 186. Both import from a2a.types, so they can be combined into a single import statement. While Python caches imports, eliminating the repeated lookup on each loop iteration improves efficiency and code organization.
Move line 270's import to line 186:
-from a2a.types import Message, Part, TextPart, Role
+from a2a.types import Message, MessageSendParams, Part, TextPart, Role
Then remove line 270:
- from a2a.types import MessageSendParams
context = RequestContext(
The RequestContext constructor signature using request, task_id, context_id, and task parameters is consistent with the a2a-sdk API as observed across the codebase.
🤖 Prompt for AI Agents
In @ai/server.py around lines 270 - 276, The in-loop import of MessageSendParams
should be moved out of the loop and merged with the module-level import from
a2a.types (where RequestContext is already imported); remove the repeated "from
a2a.types import MessageSendParams" inside the while True loop and add
MessageSendParams to the existing top-level import list so the
RequestContext(...) call can continue to use MessageSendParams without runtime
repeated imports.
| import logging | ||
|
|
||
| from a2a.server.tasks import TaskStore | ||
| from caramba.ai.task.store import TaskStore |
🧩 Analysis chain
🏁 Script executed:
# Check if the new module exists
find . -path "*caramba/ai/task/store*" -type f
# Check ai/worker.py content
head -40 ai/worker.py
# Search for remaining references to old import path
rg 'a2a\.server\.tasks' --type py
Repository: TheApeMachine/caramba
Length of output: 1946
Update remaining imports from old module path.
While the import in ai/worker.py has been correctly refactored to caramba.ai.task.store, multiple other files still reference the old a2a.server.tasks module path:
- ai/push_notifications.py
- ai/server.py (3 separate imports)
- ai/task_store.py
- ai/executor.py
- ai/task/store/__init__.py
These imports should be updated to use the new internal module path as part of the same refactoring effort to complete the migration.
🤖 Prompt for AI Agents
In @ai/worker.py at line 12, Several files still import from the old module path
`a2a.server.tasks`; update those imports to the new internal path under
`caramba.ai.task` (e.g., replace `a2a.server.tasks` with `caramba.ai.task` in
ai/push_notifications.py, ai/server.py (all three import sites),
ai/task_store.py, ai/executor.py, and ai/task/store/__init__.py), ensuring each
referenced symbol such as TaskStore and any task-related classes/functions keep
their original names but point to the new module; after changes, run the test
suite or import checks to verify there are no broken references.
| await asyncio.sleep(1) | ||
| logger.trace("Polling for pending tasks") | ||
| tasks = await self.task_store.list_tasks(tenant=self.tenant) | ||
| tasks = [t async for t in self.task_store.list_tasks(tenant=self.tenant)] |
🧹 Nitpick | 🔵 Trivial
Consider refactoring to avoid holding database connections during collection.
The async comprehension immediately materializes all tasks from the generator, which keeps the database connection open throughout the iteration. Since the list_tasks implementation in ai/task/store/__init__.py already loads all tasks into memory with .all() before yielding, the generator pattern doesn't provide streaming benefits here.
♻️ Suggested approaches
Option 1 (Preferred): Refactor list_tasks to return List[Task] directly
In ai/task/store/__init__.py, change the signature and close the connection immediately:
async def list_tasks(self, tenant: str) -> list[Task]:
"""List tasks for a tenant"""
async with self.engine.connect() as connect:
result = await connect.execute(
select(self.task_model).where(self.task_model.context_id == tenant)
)
return [self._from_orm(task) for task in result.scalars().all()]
Then in ai/worker.py:
-tasks = [t async for t in self.task_store.list_tasks(tenant=self.tenant)]
+tasks = await self.task_store.list_tasks(tenant=self.tenant)
Option 2: Process tasks without materialization
If streaming is desired in the future, consume the generator without building an intermediate list:
-tasks = [t async for t in self.task_store.list_tasks(tenant=self.tenant)]
-logger.trace(f"Found {len(tasks)} pending tasks")
-
-for task in tasks:
+task_count = 0
+async for task in self.task_store.list_tasks(tenant=self.tenant):
+ task_count += 1
logger.trace(f"Enqueuing task {task.id}")
await self.queue.put(task)
+logger.trace(f"Enqueued {task_count} pending tasks")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| tasks = [t async for t in self.task_store.list_tasks(tenant=self.tenant)] | |
| task_count = 0 | |
| async for task in self.task_store.list_tasks(tenant=self.tenant): | |
| task_count += 1 | |
| logger.trace(f"Enqueuing task {task.id}") | |
| await self.queue.put(task) | |
| logger.trace(f"Enqueued {task_count} pending tasks") |
| def configure_optimizers(self): | ||
| # This is a bit tricky because Caramba has its own optimizer building logic. | ||
| # For now, we'll implement a simple version or try to reuse Caramba's builders. | ||
| optimizer = torch.optim.AdamW( | ||
| self.model.parameters(), | ||
| lr=self.train_cfg.lr, | ||
| weight_decay=getattr(self.train_cfg, "weight_decay", 0.0) | ||
| ) | ||
| return optimizer |
🧹 Nitpick | 🔵 Trivial
Consider reusing Caramba's optimizer building logic.
The current implementation uses a simple AdamW, but Caramba's StandardTrainer._build_optimizer supports:
- AdamWMaster with fused kernels (the main feature of this PR)
- SGD
- Learning rate schedulers
This may cause behavioral differences between Lightning and standard training paths.
Would you like me to help integrate Caramba's optimizer building logic into the Lightning module?
🤖 Prompt for AI Agents
In @runtime/engine/lightning_engine.py around lines 50 - 58, The Lightning
module's configure_optimizers currently hardcodes AdamW which diverges from
Caramba's optimizer construction; replace the fixed optimizer with Caramba's
builder by delegating to StandardTrainer._build_optimizer (or the same internal
factory) using the module's training config (self.train_cfg) and model
parameters so options like AdamWMaster, Lion, SGD and schedulers are preserved;
update configure_optimizers to call that builder and return whatever
optimizer/scheduler tuple it produces, ensuring any required flags or device/amp
settings from the Lightning context are forwarded.
trainer/gradient_isolation.py
Outdated
| except Exception as e: | ||
| logger.warning(f"GradientIsolationTrainer: [best-effort] failed to wrap system module for distributed training: {e}") | ||
| # Best-effort: continue without distributed wrapping if it fails. |
Silently degrading to single-device training may mask critical distributed setup issues.
When distributed wrapping fails, continuing execution without it means the user-requested distributed training strategy is ignored. This could lead to:
- Unexpected single-device training with potential OOM or drastically slower training
- Training results that differ from expectations (e.g., effective batch size, gradient synchronization)
- Difficult-to-diagnose issues when users assume distributed training is active
Consider re-raising the exception or at minimum elevating this to an error-level log with a prominent notice that distributed training has been disabled.
🔧 Proposed fix: Re-raise the exception
- except Exception as e:
- logger.warning(f"GradientIsolationTrainer: [best-effort] failed to wrap system module for distributed training: {e}")
- # Best-effort: continue without distributed wrapping if it fails.
+ except Exception as e:
+ raise RuntimeError(f"Failed to wrap system module for distributed training: {e}") from e📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| except Exception as e: | |
| logger.warning(f"GradientIsolationTrainer: [best-effort] failed to wrap system module for distributed training: {e}") | |
| # Best-effort: continue without distributed wrapping if it fails. | |
| except Exception as e: | |
| raise RuntimeError(f"Failed to wrap system module for distributed training: {e}") from e |
🧰 Tools
🪛 Ruff (0.14.10)
197-197: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
In @trainer/gradient_isolation.py around lines 197 - 199, The current except
block in GradientIsolationTrainer that catches Exception e and only logs a
warning should not silently degrade to single-device mode; change it to log an
error-level message that clearly states distributed wrapping failed and
distributed training has been disabled, then re-raise the exception (or raise a
new RuntimeError) so callers can fail fast and handle the failure; update the
handler around the system module wrapping (the try/except around the distributed
wrapping code in the GradientIsolationTrainer class/method) to use logger.error
and then raise the original exception (or an explicit RuntimeError) instead of
continuing.
| except Exception as e: | ||
| from caramba.console import logger as console_logger | ||
| console_logger.warning(f"SwapManager: Failed to load optimizer state: {e}") |
🧹 Nitpick | 🔵 Trivial
Good: Logging improves observability of swap failures.
The change from silent exception swallowing to explicit warning logging aligns with the broader PR pattern of improving error visibility.
♻️ Optional: Consider narrowing exception types
While the broad Exception catch aligns with the "best-effort" design (documented in the docstring), you could improve specificity:
- except Exception as e:
+ except (RuntimeError, IOError, torch.cuda.OutOfMemoryError) as e:
from caramba.console import logger as console_logger
console_logger.warning(f"SwapManager: Failed to load optimizer state: {e}")This would catch the most likely failure modes while still allowing truly unexpected exceptions to propagate. However, given the "best-effort and safe" design goal, the current broad catch may be intentional.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| except Exception as e: | |
| from caramba.console import logger as console_logger | |
| console_logger.warning(f"SwapManager: Failed to load optimizer state: {e}") | |
| except (RuntimeError, IOError, torch.cuda.OutOfMemoryError) as e: | |
| from caramba.console import logger as console_logger | |
| console_logger.warning(f"SwapManager: Failed to load optimizer state: {e}") |
🧰 Tools
🪛 Ruff (0.14.10)
32-32: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
In @trainer/swap_manager.py around lines 32 - 34, The current broad "except
Exception as e" in the SwapManager swap/optimizer load path should be narrowed
to only the expected failure types to avoid hiding unexpected bugs; replace the
blanket except with a targeted catch (e.g., IOError/OSError for file access,
pickle.UnpicklingError for deserialization, KeyError/ValueError for missing or
malformed state, and any framework-specific load errors) while keeping the
console_logger.warning(f"SwapManager: Failed to load optimizer state: {e}")
behavior; locate the except block in SwapManager (the optimizer-state load
method) and update the exception tuple accordingly so unexpected exceptions
still propagate.
Review continued from previous batch...
|
|
||
| uniq = sorted({int(v) for v in y.tolist()}) | ||
| if num_classes is None: | ||
| num_classes = int(len(uniq)) | ||
| if int(num_classes) <= 0: | ||
| raise ValueError("num_classes must be > 0") |
🧹 Nitpick | 🔵 Trivial
Redundant int() cast.
Line 122: len(uniq) is already an int. The cast is unnecessary.
♻️ Remove redundant cast
if num_classes is None:
- num_classes = int(len(uniq))
+ num_classes = len(uniq)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| uniq = sorted({int(v) for v in y.tolist()}) | |
| if num_classes is None: | |
| num_classes = int(len(uniq)) | |
| if int(num_classes) <= 0: | |
| raise ValueError("num_classes must be > 0") | |
| uniq = sorted({int(v) for v in y.tolist()}) | |
| if num_classes is None: | |
| num_classes = len(uniq) | |
| if int(num_classes) <= 0: | |
| raise ValueError("num_classes must be > 0") |
🧰 Tools
🪛 Ruff (0.14.10)
122-122: Value being cast to int is already an integer
Remove unnecessary int call
(RUF046)
124-124: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
In @ccl/context_counts.py around lines 119 - 124, The assignment uses a
redundant int() around len(uniq); replace "num_classes = int(len(uniq))" with
"num_classes = len(uniq)" to avoid the unnecessary cast while preserving
behavior (uniq and len(uniq) are already ints); keep the subsequent validation
"if int(num_classes) <= 0" as-is or optionally remove its cast if you want
consistency.
| from __future__ import annotations | ||
|
|
||
| """CCL system wrapper. | ||
|
|
||
| This is a small nn.Module so it can participate in the same orchestration path | ||
| as other Caramba systems (e.g., can be returned from a trainer and inspected). | ||
| """ | ||
|
|
Module docstring should come before from __future__ import.
PEP 257 specifies that module docstrings should be the first statement in a module file. The current placement after the future import means Python won't recognize it as the module's __doc__.
Proposed fix
+"""CCL system wrapper.
+
+This is a small nn.Module so it can participate in the same orchestration path
+as other Caramba systems (e.g., can be returned from a trainer and inspected).
+"""
+
from __future__ import annotations
-"""CCL system wrapper.
-
-This is a small nn.Module so it can participate in the same orchestration path
-as other Caramba systems (e.g., can be returned from a trainer and inspected).
-"""📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| from __future__ import annotations | |
| """CCL system wrapper. | |
| This is a small nn.Module so it can participate in the same orchestration path | |
| as other Caramba systems (e.g., can be returned from a trainer and inspected). | |
| """ | |
| """CCL system wrapper. | |
| This is a small nn.Module so it can participate in the same orchestration path | |
| as other Caramba systems (e.g., can be returned from a trainer and inspected). | |
| """ | |
| from __future__ import annotations | |
🤖 Prompt for AI Agents
In @ccl/system.py around lines 1 - 8, The module docstring in ccl.system is
placed after the from __future__ import annotations statement so it isn't
recognized as the module __doc__; move the triple-quoted module docstring to be
the very first statement in the file (before from __future__ import annotations)
so it becomes the module docstring while keeping the existing content and
preserving the future import.
ccl/system.py
Outdated
| b = int(tokens.shape[0]) | ||
| c = int(len(self.models)) | ||
| logits = np.empty((b, c), dtype=np.float32) | ||
| from caramba.ccl.context_counts import loglik_grid | ||
| for i in range(b): | ||
| grid = tokens[i] | ||
| for j, m in enumerate(self.models): | ||
| logits[i, j] = float(loglik_grid(m, grid)) | ||
| return {"logits": torch.from_numpy(logits)} |
🧹 Nitpick | 🔵 Trivial
Move import out of forward method to avoid per-call overhead.
The loglik_grid import on line 85 executes on every forward() call. While Python caches imports, there's still lookup overhead. Since loglik_grid is from the same package, move it to module-level imports.
Proposed fix
from caramba.ccl.context_counts import ClassCountsModel, ContextTemplate, predict_class
+from caramba.ccl.context_counts import loglik_grid
from caramba.ccl.patch_vq import PatchKMeansVQ
Then remove line 85:
logits = np.empty((b, c), dtype=np.float32)
- from caramba.ccl.context_counts import loglik_grid
for i in range(b):
🧰 Tools
🪛 Ruff (0.14.10)
83-83: Value being cast to int is already an integer
Remove unnecessary int call
(RUF046)
🤖 Prompt for AI Agents
In @ccl/system.py around lines 82 - 90, The local import of loglik_grid inside
the forward implementation causes unnecessary per-call lookup overhead; move the
import to module scope by adding "from caramba.ccl.context_counts import
loglik_grid" at the top of ccl/system.py and remove the in-method import on line
containing "from caramba.ccl.context_counts import loglik_grid", leaving the
loop that calls loglik_grid(m, grid) unchanged.
core/event_codec/capnp_codec.py
Outdated
def encode(self, event: EventEnvelope) -> Tensor:
    if not isinstance(event, EventEnvelope):
        raise TypeError(f"Expected EventEnvelope, got {type(event).__name__}")

    schema = _get_schema()

    # Create message
    msg = schema.EventEnvelope.new_message()
    msg.id = str(event.id)
    msg.ts = float(event.ts)
    msg.type = str(event.type)
    msg.sender = str(event.sender)
    msg.priority = int(event.priority)
    msg.budgetMs = int(event.budget_ms) if event.budget_ms is not None else -1
    msg.commitmentDelta = int(event.commitment_delta)
    msg.commitmentId = str(event.commitment_id) if event.commitment_id else ""

    # Serialize payload to JSON bytes
    payload_bytes = json.dumps(event.payload, ensure_ascii=False).encode("utf-8")
    msg.payload = payload_bytes

    # Serialize to bytes
    buf = msg.to_bytes()
    if not buf:
        raise ValueError("Cap'n Proto serialization produced empty bytes")

    return torch.tensor(list(buf), dtype=torch.long)
🧹 Nitpick | 🔵 Trivial
Memory inefficiency: encoding bytes as an int64 tensor is wasteful.
Each byte (0-255) is stored as a 64-bit integer, resulting in 8x memory overhead. Consider using torch.uint8 or torch.int8 for efficiency.
♻️ Proposed optimization to use uint8
- return torch.tensor(list(buf), dtype=torch.long)
+ return torch.tensor(list(buf), dtype=torch.uint8)

Note: This would require updating the decoder and any downstream consumers that expect torch.long.
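For reference, a minimal sketch of what the byte/tensor round-trip could look like once uint8 is adopted; the helper names below are illustrative and not the codec's actual API:

import torch

def bytes_to_tensor(buf: bytes) -> torch.Tensor:
    # One element per byte (uint8) instead of 8 bytes per element (int64).
    return torch.frombuffer(bytearray(buf), dtype=torch.uint8)

def tensor_to_bytes(t: torch.Tensor) -> bytes:
    # Accept either the new uint8 layout or a legacy int64 tensor.
    if t.dtype != torch.uint8:
        t = t.to(torch.uint8)
    return bytes(t.tolist())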
🧰 Tools
🪛 Ruff (0.14.10)
52-52: Avoid specifying long messages outside the exception class
(TRY003)
74-74: Avoid specifying long messages outside the exception class
(TRY003)
def _collect_images_labels(
    ds: Dataset[Any],
    *,
    input_key: str,
    target_key: str,
    max_items: int | None,
) -> tuple[np.ndarray, np.ndarray]:
    n = int(len(cast(Sized, ds)))
    if max_items is not None:
        n = min(n, int(max_items))
    if n <= 0:
        raise ValueError("Dataset is empty")

    # Discover shape from first sample.
    ex0 = _as_dict(ds[0])
    x0 = ex0.get(str(input_key), None)
    y0 = ex0.get(str(target_key), None)
    if x0 is None or y0 is None:
        raise KeyError(f"Dataset items must contain keys {input_key!r} and {target_key!r}")

    def to_np_img(x: object) -> np.ndarray:
        if isinstance(x, torch.Tensor):
            t = x.detach().cpu()
            if t.dtype != torch.float32:
                t = t.float()
            return t.numpy()
        if isinstance(x, np.ndarray):
            return x.astype(np.float32, copy=False)
        raise TypeError(f"Unsupported image type {type(x).__name__}")

    def to_int(y: Any) -> int:
        if isinstance(y, torch.Tensor):
            return int(y.detach().cpu().item())
        return int(y)

    # Collect images into (N,C,H,W) float32
    x0n = to_np_img(x0)
    if x0n.ndim == 2:
        x0n = x0n[None, :, :]  # 1HW
    if x0n.ndim != 3:
        raise ValueError(f"Expected image as (H,W) or (C,H,W), got {x0n.shape}")
    c, h, w = int(x0n.shape[0]), int(x0n.shape[1]), int(x0n.shape[2])

    images = np.empty((n, c, h, w), dtype=np.float32)
    labels = np.empty((n,), dtype=np.int64)
    images[0] = x0n.astype(np.float32, copy=False)
    labels[0] = np.int64(to_int(y0))

    for i in range(1, n):
        ex = _as_dict(ds[i])
        xi = to_np_img(ex[str(input_key)])
        yi = to_int(ex[str(target_key)])
        if xi.ndim == 2:
            xi = xi[None, :, :]
        if xi.shape != (c, h, w):
            raise ValueError(f"Inconsistent image shapes: expected {(c,h,w)}, got {xi.shape}")
        images[i] = xi.astype(np.float32, copy=False)
        labels[i] = np.int64(yi)

    return images, labels
🧹 Nitpick | 🔵 Trivial
Consider potential memory pressure with large datasets.
The _collect_images_labels function pre-allocates and fills arrays sequentially. For very large datasets with max_items=None, this could cause memory pressure since all images are loaded into memory at once.
The implementation is correct, but consider adding a warning or documentation note about memory requirements for large datasets.
🧰 Tools
🪛 Ruff (0.14.10)
54-54: Value being cast to int is already an integer
Remove unnecessary int call
(RUF046)
58-58: Avoid specifying long messages outside the exception class
(TRY003)
65-65: Avoid specifying long messages outside the exception class
(TRY003)
75-75: Avoid specifying long messages outside the exception class
(TRY003)
77-77: Dynamically typed expressions (typing.Any) are disallowed in y
(ANN401)
87-87: Avoid specifying long messages outside the exception class
(TRY003)
102-102: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
In @trainer/ccl.py around lines 47 - 106, The _collect_images_labels function
can consume huge memory when max_items is None because it preallocates all
images; add a brief docstring note describing memory implications and also emit
a runtime warning/log when max_items is None and the dataset length
(len(cast(Sized, ds))) exceeds a safe threshold (e.g., >10_000 or make threshold
a small constant), using the module logger or warnings.warn to advise callers to
pass max_items or use a streaming approach; update the _collect_images_labels
docstring and insert the warning check early (before allocation) so users see it
before large memory is reserved.
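A minimal sketch of such a guard, placed before the allocation; the 10_000 threshold, warning category, and helper name are illustrative assumptions rather than existing code:

import warnings
from typing import Any, Sized, cast

_MEMORY_WARN_THRESHOLD = 10_000  # illustrative cut-off, tune per deployment

def _warn_if_large(ds: Any, max_items: int | None) -> None:
    # Emit a heads-up before the full (N, C, H, W) float32 buffer is allocated.
    n = int(len(cast(Sized, ds)))
    if max_items is None and n > _MEMORY_WARN_THRESHOLD:
        warnings.warn(
            f"_collect_images_labels will load all {n} images into memory; "
            "pass max_items or switch to a streaming approach for large datasets.",
            ResourceWarning,
            stacklevel=2,
        )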
def _split_train_val(ds: Dataset[Any], *, val_frac: float) -> tuple[Subset[Any], Subset[Any]]:
    n = int(len(cast(Sized, ds)))
    n_val = int(round(float(n) * float(val_frac)))
    n_val = max(1, min(n - 1, n_val)) if n > 1 else 0
    n_train = int(n - n_val)
    train = Subset(ds, range(0, n_train))
    val = Subset(ds, range(n_train, n))
    return train, val
Validation split doesn't shuffle - potential data ordering bias.
The train/val split takes the first n_train items for training and the remainder for validation. If the dataset has any ordering (e.g., sorted by class), this could introduce bias.
🔧 Proposed fix to shuffle indices before splitting
-def _split_train_val(ds: Dataset[Any], *, val_frac: float) -> tuple[Subset[Any], Subset[Any]]:
+def _split_train_val(ds: Dataset[Any], *, val_frac: float, seed: int = 0) -> tuple[Subset[Any], Subset[Any]]:
n = int(len(cast(Sized, ds)))
n_val = int(round(float(n) * float(val_frac)))
n_val = max(1, min(n - 1, n_val)) if n > 1 else 0
n_train = int(n - n_val)
- train = Subset(ds, range(0, n_train))
- val = Subset(ds, range(n_train, n))
+ rng = np.random.default_rng(seed)
+ indices = rng.permutation(n).tolist()
+ train = Subset(ds, indices[:n_train])
+ val = Subset(ds, indices[n_train:])
return train, val

🧰 Tools
🪛 Ruff (0.14.10)
110-110: Value being cast to int is already an integer
Remove unnecessary int call
(RUF046)
111-111: Value being cast to int is already an integer
Remove unnecessary int call
(RUF046)
114-114: Unnecessary start argument in range
Remove start argument
(PIE808)
🤖 Prompt for AI Agents
In @trainer/ccl.py around lines 109 - 116, The current _split_train_val
implementation slices the dataset in order, which can bias splits if data is
ordered; instead, generate a list of indices (0..n-1), shuffle them (e.g., via
random.shuffle or a passed RNG for reproducibility), then take the first n_train
indices for train and the rest for val and construct Subset(ds, train_indices)
and Subset(ds, val_indices); update _split_train_val to perform this
shuffled-index split so train/val are randomized.
try:
    from caramba.ccl.context_counts import sample_grid
    import matplotlib.pyplot as plt  # type: ignore
except Exception as e:
    logger.warning(f"CCL generation skipped (missing deps): {type(e).__name__}: {e}")
else:
    ht, wt = int(train_tokens.shape[1]), int(train_tokens.shape[2])
    # Assume channels inferred from data.
    channels = int(train_images.shape[1])
    n_per = int(max(1, self.n_gen_per_class))
    c = int(len(models))
    fig = plt.figure(figsize=(max(6, int(n_per * 1.2)), max(3, int(c * 0.8))), dpi=150)
    plot_idx = 1
    for ci in range(c):
        for j in range(n_per):
            g = sample_grid(models[ci], ht=ht, wt=wt, seed=int(self.seed + 1000 + ci * 100 + j))
            img = codec.decode(g, centers=centers, channels=int(channels))
            ax = fig.add_subplot(c, n_per, plot_idx)
            if channels == 1:
                ax.imshow(img[0], cmap="gray", vmin=0.0, vmax=1.0)
            else:
                ax.imshow(np.transpose(img, (1, 2, 0)))
            ax.set_xticks([])
            ax.set_yticks([])
            if j == 0:
                ax.set_ylabel(str(class_to_label.get(ci, ci)), rotation=0, labelpad=10)
            plot_idx += 1
    plt.tight_layout(pad=0.2)
    out_path = base_out / "generated_grid.png"
    plt.savefig(out_path)
    logger.info(f"Saved CCL samples to {out_path}")
Broad exception catch suppresses potentially important errors.
Catching bare Exception at line 379 may suppress important errors beyond just missing dependencies. Consider catching more specific exceptions.
🔧 Proposed fix to catch specific exceptions
if bool(self.generate):
try:
from caramba.ccl.context_counts import sample_grid
import matplotlib.pyplot as plt # type: ignore
- except Exception as e:
- logger.warning(f"CCL generation skipped (missing deps): {type(e).__name__}: {e}")
+ except (ImportError, ModuleNotFoundError) as e:
+ logger.warning(f"CCL generation skipped (missing deps): {type(e).__name__}: {e}")
else:

🧰 Tools
🪛 Ruff (0.14.10)
379-379: Do not catch blind exception: Exception
(BLE001)
386-386: Value being cast to int is already an integer
Remove unnecessary int call
(RUF046)
🤖 Prompt for AI Agents
In @trainer/ccl.py around lines 376 - 406, The try/except around "from
caramba.ccl.context_counts import sample_grid" and "import matplotlib.pyplot as
plt" is catching all Exception; narrow it to only import-related errors by
catching ImportError and ModuleNotFoundError (e.g., replace "except Exception as
e" with "except (ImportError, ModuleNotFoundError) as e") so runtime errors in
the generation logic (sample_grid, codec.decode, plotting) still surface; keep
the existing logger.warning message and the rest of the else block unchanged.
plt.tight_layout(pad=0.2)
out_path = base_out / "generated_grid.png"
plt.savefig(out_path)
logger.info(f"Saved CCL samples to {out_path}")
Missing plt.close(fig) may cause resource leak.
The matplotlib figure is created but never explicitly closed. In long-running processes or repeated calls, this can lead to memory accumulation.
🔧 Proposed fix to close the figure after saving
plt.tight_layout(pad=0.2)
out_path = base_out / "generated_grid.png"
plt.savefig(out_path)
+ plt.close(fig)
logger.info(f"Saved CCL samples to {out_path}")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Original:
plt.tight_layout(pad=0.2)
out_path = base_out / "generated_grid.png"
plt.savefig(out_path)
logger.info(f"Saved CCL samples to {out_path}")

Suggested:
plt.tight_layout(pad=0.2)
out_path = base_out / "generated_grid.png"
plt.savefig(out_path)
plt.close(fig)
logger.info(f"Saved CCL samples to {out_path}")
🤖 Prompt for AI Agents
In @trainer/ccl.py around lines 403 - 406, after saving the figure with
plt.savefig(out_path) in the CCL sampling code, explicitly close the Matplotlib
figure to avoid resource leaks; update the block around
plt.tight_layout(pad=0.2), plt.savefig(out_path), logger.info(...) to call
plt.close(fig) (or plt.close() if no fig variable is used) right after
plt.savefig so the figure is released.
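If the figure should be released even when plotting or saving fails, a try/finally variant is another option. This is only a sketch of the pattern, with the plotting body elided and the figure size reused from the existing code:

fig = plt.figure(figsize=(8, 4), dpi=150)  # figure setup as in the existing code
try:
    # ... add subplots and images here, as in the loop above ...
    plt.tight_layout(pad=0.2)
    out_path = base_out / "generated_grid.png"
    fig.savefig(out_path)
finally:
    plt.close(fig)  # always release the figure, even if savefig raises
logger.info(f"Saved CCL samples to {out_path}")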
# Reset shared tuner for new training runs
from caramba.layer.memory_block.memory.tuner import reset_shared_tuner
reset_shared_tuner()
🧹 Nitpick | 🔵 Trivial
🧩 Analysis chain
🏁 Script executed:
# First, let's explore the repository structure
git ls-files | grep -E "(trainer|caramba)" | head -20
Repository: TheApeMachine/caramba
Length of output: 644
🏁 Script executed:
# Find and read the trainer/standard.py file to understand context
find . -name "standard.py" -path "*/trainer/*" | head -5Repository: TheApeMachine/caramba
Length of output: 86
🏁 Script executed:
# Get the file size first to determine how to read it
wc -l trainer/standard.py 2>/dev/null || find . -name "standard.py" -path "*/trainer/*" -exec wc -l {} \;
Repository: TheApeMachine/caramba
Length of output: 89
🏁 Script executed:
# Read the context around lines 343-346
sed -n '340,350p' trainer/standard.py
Repository: TheApeMachine/caramba
Length of output: 499
🏁 Script executed:
# Read the module-level imports at the top of trainer/standard.py
head -50 trainer/standard.py
Repository: TheApeMachine/caramba
Length of output: 1664
🏁 Script executed:
# Check if the tuner module exists and examine its imports
find . -path "*/memory_block/memory/tuner.py" | head -1
Repository: TheApeMachine/caramba
Length of output: 101
🏁 Script executed:
# Read the tuner.py file to check for circular dependencies
head -50 ./layer/memory_block/memory/tuner.py
Repository: TheApeMachine/caramba
Length of output: 1788
🏁 Script executed:
# Check if reset_shared_tuner is ever imported at module level
rg "from caramba.layer.memory_block.memory.tuner import|import.*reset_shared_tuner" --type pyRepository: TheApeMachine/caramba
Length of output: 1490
🏁 Script executed:
# Find where reset_shared_tuner is defined and understand it
rg "def reset_shared_tuner" -A 5Repository: TheApeMachine/caramba
Length of output: 445
🏁 Script executed:
# Check if trainer/standard.py imports from caramba.layer modules at the top
sed -n '1,60p' trainer/standard.py | grep "caramba.layer"
Repository: TheApeMachine/caramba
Length of output: 177
🏁 Script executed:
# Check if there's any circular dependency by looking at what tuner imports
cat ./layer/memory_block/memory/tuner.py | head -100
Repository: TheApeMachine/caramba
Length of output: 4000
🏁 Script executed:
# Check the frequency of _run_single being called to understand overhead impact
rg "_run_single" --type py -B 2 -A 2Repository: TheApeMachine/caramba
Length of output: 3119
🏁 Script executed:
# Let's see the broader context of the method to understand when/how often this is called
sed -n '330,360p' trainer/standard.py
Repository: TheApeMachine/caramba
Length of output: 1213
🏁 Script executed:
# Check if trainer/standard.py is imported by tuner or any of its dependencies
rg "from caramba.trainer|import.*trainer" ./layer/memory_block/memory/tuner.py ./layer/memory_block/memory/telemetry.pyRepository: TheApeMachine/caramba
Length of output: 47
🏁 Script executed:
# Check if any layer modules that trainer imports also import from trainer (circular)
grep -n "from caramba.trainer" ./layer/memory_block/block.py ./layer/memory_block/memory/memory.py ./layer/memory_block/memory/routing.py 2>/dev/null | head -20Repository: TheApeMachine/caramba
Length of output: 47
🏁 Script executed:
# Let's see the full method signature and understand how many times _run_single is called
sed -n '330,340p' trainer/standard.py
Repository: TheApeMachine/caramba
Length of output: 346
Moving the inline import to module level is optional: no circular dependency exists, but the pattern is intentionally consistent across the codebase.
There is no circular import issue with moving from caramba.layer.memory_block.memory.tuner import reset_shared_tuner to the module level. The tuner module only imports from torch, math, typing, and internal telemetry—it has no dependencies on the trainer module. However, the inline import pattern is used consistently throughout the codebase (routing.py, writer.py, reader.py, memory.py), suggesting a deliberate architectural choice. Since the function is called once per training run (not in a tight loop), the overhead is negligible. If you prefer consistency with the rest of the codebase, keep it inline; if you prefer cleanliness at the module level, moving it is safe.
🤖 Prompt for AI Agents
In @trainer/standard.py around lines 343 - 346, Keep the inline import of
reset_shared_tuner where it’s called rather than moving it to module level to
preserve the established pattern used across the codebase; ensure the code still
calls reset_shared_tuner() in the same place and add a brief comment above the
inline import explaining the consistency rationale (the function name to locate
is reset_shared_tuner).
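For what it's worth, keeping the import inline with a one-line rationale could look like the sketch below; the comment wording is only a suggestion, the import and call are taken from the existing code:

# Deliberate local import: matches the inline-import pattern used in
# routing.py / writer.py / reader.py / memory.py, and this runs once per
# training run, so the lookup overhead is negligible.
from caramba.layer.memory_block.memory.tuner import reset_shared_tuner
reset_shared_tuner()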
Review continued from previous batch...
def step(self, improved: bool) -> float:
    """Take a step based on whether last move improved objective.

    Momentum increments/decrements by 1 each step, acting as multiplier.

    Args:
        improved: True if objective improved, False if degraded

    Returns:
        Delta applied to value
    """
    if not self.active:
        # Check if cooldown period has passed
        self.steps_since_deactivation += 1
        if self.steps_since_deactivation >= self.cooldown:
            # Reactivate
            self.active = True
            self.steps_without_improvement = 0
            self.steps_since_deactivation = 0
            self.momentum = 1  # Start with small momentum
            self.direction = 1.0 if self.value < (self.min_value + self.max_value) / 2 else -1.0
        else:
            return 0.0

    if improved:
        # Accelerate: increase momentum by 1
        self.momentum = min(self.momentum + 1, self.max_momentum)
        self.steps_without_improvement = 0
        self.braking = False  # Not braking if improving
    else:
        # Decelerate: decrease momentum by 1
        self.momentum = max(self.momentum - 1, 0)
        self.steps_without_improvement += 1

    # If momentum reached zero, decide what to do
    if self.momentum == 0:
        if self.braking:
            # We were braking and have now fully stopped - reverse direction
            self.direction *= -1.0
            self.momentum = 1  # Start moving in new direction
            self.braking = False
            self.steps_without_improvement = 0
        elif self.steps_without_improvement >= self.patience:
            # Stop exploring after patience exhausted
            self.active = False
            self.steps_since_deactivation = 0
            return 0.0
        else:
            # Start braking phase (will reverse after fully stopped)
            self.braking = True
            self.momentum = 1  # Continue decelerating

    # Apply step: momentum * base_step_size * direction
    delta = self.direction * self.momentum * self.base_step_size
    new_value = self.value + delta

    # Bounce off boundaries
    if new_value < self.min_value or new_value > self.max_value:
        # Hit boundary - reverse and reduce momentum
        self.direction *= -1.0
        self.momentum = max(1, self.momentum // 2)
        new_value = max(self.min_value, min(self.max_value, new_value))

    actual_delta = new_value - self.value
    self.value = new_value

    return actual_delta
🧩 Analysis chain
🏁 Script executed:
# Check for existing unit tests for ParameterExplorer
rg -n "ParameterExplorer" --type=py -g "*test*"Repository: TheApeMachine/caramba
Length of output: 698
🏁 Script executed:
# First, let's locate the file and understand its structure
find . -name "tuner.py" -type f | head -20Repository: TheApeMachine/caramba
Length of output: 101
🏁 Script executed:
# Get the full class definition to understand ParameterExplorer
rg -A 200 "class ParameterExplorer" --type=pyRepository: TheApeMachine/caramba
Length of output: 15059
🏁 Script executed:
# Now let's read the test file to see what edge cases are actually covered
cat -n layer/memory_block/memory/tuner_test.py
Repository: TheApeMachine/caramba
Length of output: 6814
Add unit tests for two specific edge cases in ParameterExplorer.step().
Tests exist but don't cover: (1) boundary collision while in braking state (when momentum reaches 0 and direction reverses), and (2) full cooldown-to-active reactivation cycle. The state machine logic itself appears sound, but these transitions should be explicitly verified with test cases.
🧰 Tools
🪛 Ruff (0.14.10)
45-45: Boolean-typed positional argument in function definition
(FBT001)
🤖 Prompt for AI Agents
In @layer/memory_block/memory/tuner.py around lines 45 - 111, Add two unit tests
for ParameterExplorer.step(): one that forces a braking-edge collision by
setting up an explorer with momentum=1, braking=True, direction pointing toward
a boundary, and a non-improving step so momentum falls to 0 and triggers the
braking-reversal logic; assert that after the step the direction has flipped,
momentum is reset to >=1, value is clamped inside [min_value, max_value], and
braking is cleared. The second test should cover the cooldown-to-active
reactivation: initialize explorer.active=False, steps_since_deactivation set to
cooldown-1, then call step and assert it returns 0.0 until cooldown is reached,
after which explorer.active becomes True, steps_since_deactivation and
steps_without_improvement are zero, momentum==1, and direction is set to 1.0 or
-1.0 based on value vs (min_value+max_value)/2; use the ParameterExplorer.step,
active, cooldown, steps_since_deactivation, steps_without_improvement, momentum,
direction, base_step_size, min_value and max_value attributes to drive and
verify behavior.
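Building on that prompt, a minimal pytest sketch for the two cases. The ParameterExplorer constructor keywords below are assumptions (the real signature may differ), and the tests set internal state directly to force each transition:

import pytest

from caramba.layer.memory_block.memory.tuner import ParameterExplorer


def test_braking_reversal_when_momentum_hits_zero():
    # Assumed constructor kwargs; adjust to the real signature.
    explorer = ParameterExplorer(value=0.9, min_value=0.0, max_value=1.0, base_step_size=0.1)
    explorer.active = True
    explorer.braking = True
    explorer.momentum = 1
    explorer.direction = 1.0  # pointing toward the upper boundary

    delta = explorer.step(improved=False)  # momentum -> 0, triggers braking reversal

    assert explorer.direction == -1.0        # direction flipped
    assert explorer.momentum >= 1            # restarted with small momentum
    assert explorer.braking is False         # braking cleared
    assert 0.0 <= explorer.value <= 1.0      # value stays inside bounds
    assert delta == pytest.approx(explorer.value - 0.9)


def test_cooldown_reactivation_cycle():
    explorer = ParameterExplorer(value=0.2, min_value=0.0, max_value=1.0, base_step_size=0.1)
    explorer.active = False
    explorer.cooldown = 3
    explorer.steps_since_deactivation = 1

    assert explorer.step(improved=False) == 0.0  # still cooling down (2 < 3)

    explorer.step(improved=True)  # reaches the cooldown threshold and reactivates
    assert explorer.active is True
    assert explorer.steps_since_deactivation == 0
    assert explorer.steps_without_improvement == 0
    assert explorer.direction == 1.0  # value below midpoint -> explore upward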
…or handling, and updating documentation for clarity. Adjusted various modules to improve consistency and readability.
No description provided.