Releases: CloudLLM-ai/cloudllm
0.11.2
0.11.2 FEB/25/2026
- maint: Remove deprecated grok-2-image-1212 image generation model
- [REMOVED] ImageModel::Grok2Image enum variant mapped to deprecated grok-2-image-1212
- xAI deprecates the grok-2-image-1212 model effective FEB/28/2026; image generation now exclusively uses grok-imagine-image
- Simplified ImageModel enum to single GrokImagineImage variant
- Simplified image_model_to_string() to single pattern match (no breaking change; function is private)
- docs: Updated README.md Grok Imagine model table from grok-2-image-1212 to grok-imagine-image
- test: Updated image generation tests to expect grok-imagine-image model name
0.7.2
GPT-5.2 support
Full Changelog: 0.7.1...0.7.2
0.6.3
0.6.3 DEC/04/2025
- feat: xAI Responses API Support for Agentic Tool Calling
- GrokClient now automatically switches between Chat Completions and Responses API
- When tools are provided (web_search, x_search, etc.), uses /v1/responses endpoint
- Without tools, uses standard /v1/chat/completions endpoint
- Real-time web search now working with current data and citations
- new: send_and_track_responses() in common.rs for Responses API calls
- refactor: GrokClient restructured with direct openai_rust::Client access
- Removed OpenAIClient delegation pattern for better API control
- Added token_usage tracking directly in GrokClient
- deps: openai-rust2 upgraded to 1.7.1
- Adds ResponsesArguments, ResponsesMessage, ResponsesCompletion types
- Adds Client::create_responses() for /v1/responses endpoint
- grok_tools field serializes as "server_tools" for Chat Completions
- fix: ClientWrapper trait updated for GrokTool support across all clients
- docs: Updated examples and tests for new Grok tool calling API
- See: https://docs.x.ai/docs/guides/tools/search-tools
Here is how to use the search tools with a Grok-based LLM session:

```rust
let secret_key = std::env::var("XAI_API_KEY").expect("XAI_API_KEY not set");

// Use grok-4-1-fast-reasoning, which supports server_tools (web_search, x_search, etc.)
let client = GrokClient::new_with_model_enum(&secret_key, grok::Model::Grok41FastReasoning);
let mut llm_session: crate::LLMSession = crate::LLMSession::new(
    std::sync::Arc::new(client),
    "You are a helpful assistant with access to web search and X search.".to_string(),
    1048576,
);

// Create a new Tokio runtime
let rt = tokio::runtime::Runtime::new().unwrap();

// Use the new xAI Agent Tools API (replaces the deprecated Live Search)
// See: https://docs.x.ai/docs/guides/tools/overview
let grok_tools = vec![GrokTool::web_search(), GrokTool::x_search()];

let response_message: Message = rt.block_on(async {
    let s = llm_session
        .send_message(
            crate::Role::User,
            "What's the current price of Bitcoin? Search the web for the latest information."
                .to_string(),
            Some(grok_tools),
        )
        .await;
    s.unwrap_or_else(|e| {
        log::error!("Error: {}", e);
        Message {
            role: crate::Role::System,
            content: format!("An error occurred: {:?}", e).into(),
        }
    })
});
```
```
0.6.0
0.6.0 OCT/27/2025
- feat: MCPServerBuilder - Simplified MCP Server Creation
- New MCPServerBuilder with fluent API for creating MCP servers
- Built-in tool support: Memory, Bash, and custom tools via with_*_tool() methods
- IP filtering with single IP, IPv6, and CIDR block support
- Authentication support: Bearer tokens and basic auth
- Pluggable HTTP adapter trait for framework flexibility (Axum, Actix, Warp, etc.)
- Localhost-only convenience method for secure local development
- Methods: allow_ip(), allow_cidr(), allow_localhost_only(), with_bearer_token(), with_basic_auth()
- example: Comprehensive MCP Server with All Tools
- New examples/mcp_server_all_tools.rs demonstrating complete server setup
- All 5 framework tools integrated: Memory, Calculator, FileSystem, HTTP Client, Bash
- Localhost-only security (127.0.0.1, ::1)
- Ready for OpenAI Desktop Client integration at http://localhost:8008/mcp
- Beautifully formatted startup output with tool descriptions and usage examples
- CustomToolProtocol wrappers for flexible tool integration
- feat: Calculator Tool Enhancement
- Migrated from unmaintained meval v0.2 to actively maintained evalexpr v12.0.3
- Added inverse hyperbolic functions: asinh(), acosh(), atanh()
- Significantly improved performance and reliability
- Removed future-incompatible nom v1.2.4 dependency
- fix: HTTP Adapter Borrow Checker Issues
- Fixed move conflicts in route handlers by wrapping bearer_token and allowed_ips in Arc
- Pre-cloned Arc values before passing to route handlers
- HTTP adapter now compiles cleanly with --features mcp-server
- feat: Documentation and Examples
- Updated all example documentation to clarify HTTP (not HTTPS) is secure for localhost-only servers
- Explanation that localhost binding provides security, not protocol choice
- Follows industry standard practice (npm, Flask, Django, etc.)
- fix: Code Quality
- Removed unused std::fs import in filesystem.rs tests
- Added #[allow(dead_code)] attribute for example struct fields not currently used
- All 185+ tests passing with zero warnings
- deps: Updated dependencies
- axum updated to 0.8.6 (from 0.7.9)
- tower updated to 0.5.2 (from 0.4.13)
- Improved HTTP server stability and performance
Full Changelog: 0.4.0...0.6.0
0.4.0
CloudLLM 0.4.0: Introducing Multi-Agent Council Support and basic (alpha) MCP Tool Support
Deploy compute to as many agents as you can afford and organize them under different topologies:
- Parallel Mode - Independent Expert Analysis
- Round-Robin Mode - Sequential Deliberation
- Moderated Mode - Expert Panel with Chair
- Hierarchical Mode - Multi-Layer Problem Solving
- Debate Mode - Adversarial Refinement with Convergence
We've also introduced early support for tool use; we'll focus heavily on tool usage for agents in our next releases.
See the TUTORIAL and examples
0.4.0 OCT/13/2025
- feat: Multi-Agent Council System
- New council.rs module implementing collaborative multi-agent orchestration
- Five council modes for different collaboration patterns:
- Parallel: All agents process prompt simultaneously, responses aggregated
- RoundRobin: Agents take sequential turns building on previous responses
- Moderated: Agents submit proposals, moderator synthesizes final answer
- Hierarchical: Lead agent coordinates, specialists handle specific aspects
- Debate: Agents discuss and challenge each other until convergence
- Agent identity system with name, expertise, personality, and optional tool access
- Conversation history tracking with CouncilMessage and round metadata
- CouncilResponse includes final answer, message history, rounds executed, convergence score, and total tokens used
- feat: Tool Protocol Abstraction Layer (tool_protocol.rs)
- Flexible abstraction for connecting agents to various tool protocols
- ToolProtocol trait with execute(), list_tools(), get_tool_metadata()
- Support for MCP (Model Context Protocol), custom functions, and user-defined protocols
- ToolResult struct with success status, output, error, and execution metadata
- ToolParameter system with support for String, Number, Integer, Boolean, Array, Object types
- ToolMetadata with parameter definitions and protocol-specific metadata
- ToolRegistry for centralized tool management
- Tool and ToolError types for type-safe tool operations
- feat: Tool Adapters (tool_adapters.rs)
- CustomToolAdapter: Execute user-defined Rust closures as tools
- MCPToolAdapter: Integration with Model Context Protocol servers
- OpenAIToolAdapter: Compatible with OpenAI function calling format
- All adapters implement async ToolProtocol trait
- feat: Automatic Tool Execution in Agent Generation
- Agents automatically discover and execute tools during response generation
- Tool information injected into system prompts with name, description, and parameters
- JSON-based tool calling format: {"tool_call": {"name": "...", "parameters": {...}}}
- Automatic tool execution loop with max 5 iterations to prevent infinite loops
- Tool results fed back to LLM as user messages for continued generation
- Token usage tracked cumulatively across all LLM calls and tool executions
- New AgentResponse struct returns both content and token usage
- Agent::generate_with_tokens() method for internal use with token tracking
- feat: Convergence Detection for Debate Mode
- Jaccard similarity-based convergence detection for debate termination
- Compares word sets between consecutive debate rounds
- Configurable convergence threshold (default: 0.75 / 75% similarity)
- Early termination when agents reach consensus, saving tokens and cost
- Convergence score returned in CouncilResponse for inspection
- calculate_convergence_score() and jaccard_similarity() helper methods
- feat: Token Usage Tracking in Council Modes
- Parallel mode tracks and aggregates tokens from all concurrent agents
- RoundRobin mode accumulates tokens across sequential turns
- Token usage includes all LLM calls plus tool execution overhead
- CouncilResponse.total_tokens_used provides complete cost visibility
- feat: Comprehensive Multi-Agent Tutorial (COUNCIL_TUTORIAL.md)
- Cookbook-style tutorial with progressive complexity
- Five detailed recipes demonstrating each council mode
- Real-world carbon capture strategy problem domain
- Examples use multiple LLM providers (OpenAI, Claude, Gemini, Grok)
- Up to 5 agents per example with distinct expertise and personalities
- Tool integration example with MCPToolAdapter
- Best practices guide for agent design and council mode selection
- Troubleshooting section with common issues and solutions
- Complete multi-stage pipeline example combining multiple modes
- test: Added comprehensive test coverage
- test_agent_with_tool_execution: Validates tool discovery, execution, and result integration
- test_debate_mode_convergence: Validates convergence detection with mock agents
- test_parallel_execution: Tests concurrent agent execution
- test_round_robin_execution: Tests sequential turn-taking
- test_moderated_execution: Tests proposal aggregation
- test_hierarchical_execution: Tests lead-specialist coordination
- test_debate_execution: Tests debate discussion flow
- All 12 tests passing
- refactor: Code quality improvements
- Fixed all compiler warnings
- Switched from std::sync::Mutex to tokio::sync::Mutex for async compatibility
- Removed unused imports and assignments
- Improved error handling in council execution paths
- docs: Added inline documentation for council system and tool protocol
- Module-level documentation with architecture diagrams
- Example code in doc comments
- Detailed parameter and return value documentation
0.3.0
CloudLLM 0.3.0 Change Report
Comparison window: tag 0.2.9 → current 0.3.0 (HEAD)
TL;DR
- First-class response streaming across the stack (new `MessageChunk` type, `ClientWrapper::send_message_stream`, `LLMSession::send_message_stream`, streaming examples & tests).
- Client trait shake-up: `send_message` now accepts `&[Message]`, token usage retrieval is `async`, `Message` stores content as `Arc<str>`, and `model_name()` is exposed on every client/session.
- Session engine revamp focused on throughput: bump allocation for message bodies, pre-transmission trimming of the prompt, reusable request buffers, and token-count caching.
- New Claude client plus expanded Grok and Gemini support, all sharing a pooled HTTP client with persistent connections.
- Extensive test/doc updates (integration tests moved under `tests/`, new streaming walkthroughs, refreshed README, new logo).
- New dependencies: `bumpalo`, `lazy_static`, `reqwest`, `futures-util`; `tokio::sync::Mutex` replaces std `Mutex` in async paths.
Release Timeline Since 0.2.9
| Version | Highlights |
|---|---|
| 0.2.10 | Added Grok model enums (e.g., grok-4-fast-reasoning, grok-code-fast-1). |
| 0.2.11 | Introduced the Claude client (delegates to OpenAI transport), README updated to list Claude. |
| 0.2.12 | Claude client finalized with full model list, examples updated accordingly. |
| 0.3.0 | Major feature release: streaming, session optimization, async token usage, pooled HTTP connections, docs/tests overhaul. |
(Changelog entries for these releases are consolidated in changelog.txt; the diff confirms the implementation of each item listed there.)
Major Features & Enhancements
1. Streaming Support (0.3.0)
- Core trait additions (commits `a12234f`, `8f98bb6`):
  - `MessageChunk` (content + optional `finish_reason`).
  - `MessageChunkStream` and `MessageStreamFuture` type aliases.
  - Default `ClientWrapper::send_message_stream` returning `None`; OpenAI and Grok override it.
- `LLMSession::send_message_stream` mirrors the session-aware flow; it reuses trimming and request buffering, returning an optional stream (commit `a12234f`).
- New examples:
  - `examples/streaming_example.rs` & `.md`: minimal streaming walkthrough.
  - `examples/interactive_streaming_session.rs` & `.md`: interactive CLI with live token rendering.
- `tests/streaming_tests.rs` covers chunk delivery paths.
- README now advertises streaming as a top-line feature.
2. Client Infrastructure Improvements
- HTTP connection pooling (`21cfbf9`): shared `reqwest::Client` (via `lazy_static`) with tuned keep-alive and idle timeouts (`clients/common.rs`). Every client constructor now uses `new_with_client(_and_base_url)`.
- Token usage tracking:
  - Swapped to `tokio::sync::Mutex` to avoid blocking in async contexts (`5e0791f`).
  - Trait hook `get_last_usage` is now an `async fn`, delegating to the mutex.
  - Additional tests under `tests/client_wrappers_tests.rs` verify concurrency.
- Claude client (`ec09a0b`–`9a8f9ca`):
  - New file `src/cloudllm/clients/claude.rs` with the full model enum, constructors, and streaming compatibility via the OpenAI delegate.
  - README and examples updated accordingly.
- Gemini & Grok clients follow the new trait contracts, reuse the pooled HTTP client, and forward streaming to the delegate (OpenAI for Grok).
3. Session Engine Overhaul
- `LLMSession` rewrites (series of commits through `a12234f`):
  - Stores conversation history in `Vec<Message>` with parallel `cached_token_counts` to avoid repeated token estimation.
  - Uses a `Bump` arena for message bodies and a reusable `request_buffer` for outbound payloads (commits `6354b2c`, `6a3646c`).
  - Adds pre-transmission trimming: history is pruned before hitting the API when estimated tokens exceed `max_tokens` (`0528983`).
  - After the response, actual usage (when available) triggers another pruning pass.
  - New helpers: `LLMSession::model_name()`, `LLMSession::last_token_usage()`.
  - The streaming method described earlier integrates with the same trimming pipeline.
4. Documentation & Examples
- README v2:
  - New inline logo asset (`logo.png` replaces the external link).
  - Highlights streaming; marks Claude as "supported".
  - Dependency line now reflects 0.2.12 (likely needs another bump to 0.3.0 before publishing).
- Fresh guides (`examples/*_streaming_*.rs`/`.md`) demonstrate using `futures_util::StreamExt` with the new API.
- `changelog.txt` expanded with multi-level bullet items covering each addition.
5. Testing
- Unit/integration tests extracted from client modules into `tests/` (`b0955e2`, `5392b09`): `llm_session_tests.rs`, `llm_session_bump_allocations_test.rs`, `connection_pooling_test.rs`, `client_tests.rs`, etc.
- Coverage now includes buffer reuse, token caching, async mutex behavior, and streaming flows.
- Example binaries (`run_streaming_example`, `run_example`) accompany the docs.
API & Behavioral Changes (Action Required)
| Area | Old (≤0.2.9) | New (0.3.0) | Impact |
|---|---|---|---|
| `ClientWrapper::send_message` | `async fn send_message(&self, messages: Vec<Message>, …)` | `async fn send_message(&self, messages: &[Message], …)` | Callers must pass a slice reference. Reuse the same `Vec` to avoid allocations. |
| `ClientWrapper::get_last_usage` | Sync method returning `Option<TokenUsage>` | `async fn get_last_usage(&self) -> Option<TokenUsage>` | Requires `.await`. Update call sites, including in examples/tests. |
| `Message` struct | `content: String` | `content: Arc<str>` | When constructing manually, convert with `"text".into()` or `Arc::<str>::from("text")`. |
| Streaming | Not available | `ClientWrapper::send_message_stream` and `LLMSession::send_message_stream` | Requires `futures_util::StreamExt` (already added) to iterate over chunks. |
| Session API | No streaming, limited inspection | Added `send_message_stream`, `model_name`, `last_token_usage` | Enables richer instrumentation and streaming integration. |
| Token usage mutex | `std::sync::Mutex` | `tokio::sync::Mutex` | Removes potential deadlocks in async contexts; trait methods now async. |
| HTTP client reuse | Each client built its own internal client | Shared pooled `reqwest::Client` | Reduced connection churn; no action required by consumers. |
Code migration examples
Before (0.2.9):
```rust
let mut messages = vec![
    Message { role: Role::System, content: "You are helpful.".to_string() },
    Message { role: Role::User, content: user_prompt.clone() },
];
let reply = client.send_message(messages, None).await?;
if let Some(usage) = client.get_last_usage() {
    println!("total tokens: {}", usage.total_tokens);
}
```

After (0.3.0):

```rust
let messages = vec![
    Message { role: Role::System, content: "You are helpful.".into() },
    Message { role: Role::User, content: Arc::<str>::from(user_prompt.as_str()) },
];
let reply = client.send_message(&messages, None).await?;
if let Some(usage) = client.get_last_usage().await {
    println!("total tokens: {}", usage.total_tokens);
}
```

Streaming integration (new):

```rust
use futures_util::StreamExt;

if let Some(mut stream) = session
    .send_message_stream(Role::User, "Tell me a joke".into(), None)
    .await?
{
    while let Some(chunk) = stream.next().await {
        let chunk = chunk?;
        print!("{}", chunk.content);
        if let Some(reason) = chunk.finish_reason {
            println!("\nStream finished because: {reason}");
        }
    }
} else {
    println!("This client does not support streaming yet.");
}
```

Dependencies & Tooling
- `Cargo.toml` now pulls in:
  - `bumpalo = "3.16"` (arena allocator for messages).
  - `lazy_static = "1.5.0"` (shared HTTP client).
  - `reqwest = "0.12"` (transport for pooled HTTP connections).
  - `futures-util = "0.3"` (stream combinators for chunk handling).
- Existing dependencies (e.g., `tokio 1.47.1`, `openai-rust2 1.6.0`) remain, but the new features rely on them more heavily.
Additional Notes & Observations
- `LLMSession` still trims conversation history using `Vec::remove(0)` despite the changelog mentioning `VecDeque`; functionally, trimming is improved through cached token counts and pre-send pruning, but removal is not O(1).
- `TokenUsage` is now `Clone + Debug`, reflecting caching requirements.
- `LLMSession::token_usage()` still returns totals accumulated from provider-reported usage; be aware that streaming paths currently do not update usage counters automatically (explicitly documented in the changelog and code comments).
- The README dependency example still states `cloudllm = "0.2.12"`; update to `0.3.0` before publishing the crates.io release notes.
Recommended Migration Checklist for Integrators
- Update dependencies to `cloudllm = "0.3.0"` and ensure `futures-util` is available if you plan to consume streaming APIs.
- Adjust client calls:
  - Pass message slices (`&messages`) to `send_message`.
  - Await `client.get_last_usage().await`.
  - Review any direct `Message` constructions for the `Arc<str>` content type.
- Adopt optional streaming where beneficial; see `examples/streaming_example.rs` for end-to-end usage.
- Review session hooks (`model_name`, `last_token_usage`) for monitoring/telemetry integration.
- If you implemented custom `ClientWrapper`s, implement the new trait methods:
  - Accept `&[Message]`, return `Arc<str>` content, provide `model_name`, and ensure `get_last_usage` awaits a `tokio::sync::Mutex`.
  - Optionally override `send_message_stream` to expose streaming.
- Update documentation/tests in your codebase to reflect async token usage and streaming behaviour.
...
0.2.11
0.2.11 SEP/21/2025
- Added Claude client implementation at src/cloudllm/clients/claude.rs:
- ClaudeClient struct follows the same delegate pattern as GrokClient, using OpenAIClient internally
- Supports Anthropic API base URL (https://api.anthropic.com/v1)
- Includes 6 Claude model variants: Claude35Sonnet20241022, Claude35Haiku20241022, Claude3Opus20240229, Claude35Sonnet20240620, Claude3Sonnet20240229, Claude3Haiku20240307
- Implements all standard constructor methods and ClientWrapper trait
- Added test function with CLAUDE_API_KEY environment variable
- Updated README.md to mark Claude as supported (removed "Coming Soon")
- Added Claude example to interactive_session.rs example
0.2.10 SEP/21/2025
- New Grok model enums added to the GrokClient:
- grok-4-fast-reasoning
- grok-4-fast-non-reasoning
- grok-code-fast-1
0.2.9
0.2.8
0.2.8
- Bumped cloudllm version to 0.2.8
- Upgraded tokio dependency from 1.44.5 to 1.46.1
- Updated Grok client model names and enums in src/cloudllm/clients/grok.rs:
- Renamed Grok3MiniFastBeta to Grok3MiniFast, Grok3MiniBeta to Grok3Mini, Grok3FastBeta to Grok3Fast, Grok3Beta to Grok3, and Grok3Latest to Grok4_0709
- Updated model_to_string function to reflect new model names
- Changed test client initialization to use Grok4_0709 instead of Grok3Latest
- Updated Gemini client model names and enums in src/cloudllm/clients/gemini.rs:
- Renamed Gemini25FlashPreview0520 to Gemini25Flash and Gemini25ProPreview0506 to Gemini25Pro to reflect stable releases
- Added new model enum Gemini25FlashLitePreview0617 for lightweight preview model
- Updated model_to_string function to map new enum names: gemini-2.5-flash, gemini-2.5-pro, and gemini-2.5-flash-lite-preview-06-17
0.2.7
0.2.7
- Bumped cloudllm version to 0.2.7
- Upgraded openai-rust2 dependency from 1.5.9 to 1.6.0
- Extended ChatArguments and client wrappers for search and tool support:
  - Added `SearchParameters` struct and `with_search_parameters()` builder to `openai_rust::chat::ChatArguments`
  - Added `ToolType` enum and `Tool` struct, plus a `tools` field and `with_tools()` builder (snake_case serialization)
  - Updated the `ClientWrapper::send_message` signature to accept `optional_search_parameters: Option<SearchParameters>`
  - Modified `send_and_track()` in `clients/common.rs` to take and inject `optional_search_parameters`
  - Updated `OpenAIClient`, `GeminiClient`, and `GrokClient` to forward `optional_search_parameters` to `send_and_track`
  - Exposed `optional_search_parameters` through `LLMSession::send_message` and its callers
- Other updates:
  - Added `Grok3Latest` variant to the `grok::Model` enum and updated the test to use it
  - Ensured backward compatibility: all existing call sites default `optional_search_parameters` to `None`
Example of how to use the new optional SearchParameters with the GrokClient:

```rust
#[test]
pub fn test_grok_client() {
    // initialize logger
    crate::init_logger();
    let secret_key = env::var("XAI_API_KEY").expect("XAI_API_KEY not set");
    // LiveSearch only works with this model for now (most expensive output tokens)
    let client = GrokClient::new_with_model_enum(&secret_key, Grok3Latest);
    let mut llm_session: LLMSession = LLMSession::new(
        std::sync::Arc::new(client),
        "You are a math professor.".to_string(),
        1048576,
    );
    // Create a new Tokio runtime
    let rt = Runtime::new().unwrap();
    let search_parameters =
        openai_rust::chat::SearchParameters::new(SearchMode::On).with_citations(true);
    let response_message: Message = rt.block_on(async {
        let s = llm_session
            .send_message(
                Role::User,
                "Using your Live search capabilities: What's the current price of Bitcoin?"
                    .to_string(),
                Some(search_parameters),
            )
            .await;
        match s {
            Ok(msg) => msg,
            Err(e) => {
                error!("Error: {}", e);
                Message {
                    role: Role::System,
                    content: format!("An error occurred: {:?}", e),
                }
            }
        }
    });
    info!("test_grok_client() response: {}", response_message.content);
}
```