Releases: CloudLLM-ai/cloudllm

0.11.2

26 Feb 04:38
cfa172c

0.11.2 FEB/25/2026

  • maint: Remove deprecated grok-2-image-1212 image generation model
    • [REMOVED] ImageModel::Grok2Image enum variant mapped to deprecated grok-2-image-1212
    • xAI deprecated the grok-2-image-1212 model effective FEB/28/2026; grok-imagine-image is now used exclusively
    • Simplified ImageModel enum to single GrokImagineImage variant
    • Simplified image_model_to_string() to single pattern match (no breaking change; function is private)
  • docs: Updated README.md Grok Imagine model table from grok-2-image-1212 to grok-imagine-image
  • test: Updated image generation tests to expect grok-imagine-image model name
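
The simplified enum and its private string mapping can be sketched as follows (a minimal stdlib-only sketch; the crate's actual definitions may differ):

```rust
// Minimal sketch of the simplified ImageModel enum described above.
// After removing Grok2Image, a single variant remains.
enum ImageModel {
    GrokImagineImage,
}

// image_model_to_string() collapses to a single pattern match.
fn image_model_to_string(model: &ImageModel) -> &'static str {
    match model {
        ImageModel::GrokImagineImage => "grok-imagine-image",
    }
}
```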

0.7.2

13 Dec 01:43
8be8bf6

GPT-5.2 support

Full Changelog: 0.7.1...0.7.2

0.6.3

05 Dec 13:52
735fa26

0.6.3 DEC/04/2025

  • feat: xAI Responses API Support for Agentic Tool Calling
    • GrokClient now automatically switches between Chat Completions and Responses API
    • When tools are provided (web_search, x_search, etc.), uses /v1/responses endpoint
    • Without tools, uses standard /v1/chat/completions endpoint
    • Real-time web search now working with current data and citations
  • new: send_and_track_responses() in common.rs for Responses API calls
  • refactor: GrokClient restructured with direct openai_rust::Client access
    • Removed OpenAIClient delegation pattern for better API control
    • Added token_usage tracking directly in GrokClient
  • deps: openai-rust2 upgraded to 1.7.1
    • Adds ResponsesArguments, ResponsesMessage, ResponsesCompletion types
    • Adds Client::create_responses() for /v1/responses endpoint
    • grok_tools field serializes as "server_tools" for Chat Completions
  • fix: ClientWrapper trait updated for GrokTool support across all clients
  • docs: Updated examples and tests for new Grok tool calling API
  • See: https://docs.x.ai/docs/guides/tools/search-tools
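
The automatic endpoint switching above can be sketched like this (GrokTool here is a stand-in unit type, and endpoint_for is illustrative, not the crate's API):

```rust
// Stand-in for the crate's GrokTool type.
struct GrokTool;

// When tools are supplied, the /v1/responses endpoint is used;
// otherwise the standard /v1/chat/completions endpoint.
fn endpoint_for(tools: &Option<Vec<GrokTool>>) -> &'static str {
    match tools {
        Some(t) if !t.is_empty() => "/v1/responses",     // agentic tool calling
        _ => "/v1/chat/completions",                     // plain chat
    }
}
```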

Here is how to use the search tools with a Grok-based LLM session:

let secret_key = std::env::var("XAI_API_KEY").expect("XAI_API_KEY not set");

// Use grok-4-1-fast-reasoning, which supports server_tools (web_search, x_search, etc.)
let client = GrokClient::new_with_model_enum(&secret_key, grok::Model::Grok41FastReasoning);
let mut llm_session: crate::LLMSession = crate::LLMSession::new(
    std::sync::Arc::new(client),
    "You are a helpful assistant with access to web search and X search.".to_string(),
    1048576,
);

// Create a new Tokio runtime
let rt = tokio::runtime::Runtime::new().unwrap();

// Use the new xAI Agent Tools API (replaces the deprecated Live Search)
// See: https://docs.x.ai/docs/guides/tools/overview
let grok_tools = vec![GrokTool::web_search(), GrokTool::x_search()];

let response_message: Message = rt.block_on(async {
    let s = llm_session
        .send_message(
            crate::Role::User,
            "What's the current price of Bitcoin? Search the web for the latest information."
                .to_string(),
            Some(grok_tools),
        )
        .await;

    s.unwrap_or_else(|e| {
        log::error!("Error: {}", e);
        Message {
            role: crate::Role::System,
            content: format!("An error occurred: {:?}", e).into(),
        }
    })
});

0.6.0

27 Oct 15:51
91550cf

0.6.0 OCT/27/2025

  • feat: MCPServerBuilder - Simplified MCP Server Creation
    • New MCPServerBuilder with fluent API for creating MCP servers
    • Built-in tool support: Memory, Bash, and custom tools via with_*_tool() methods
    • IP filtering with single IP, IPv6, and CIDR block support
    • Authentication support: Bearer tokens and basic auth
    • Pluggable HTTP adapter trait for framework flexibility (Axum, Actix, Warp, etc.)
    • Localhost-only convenience method for secure local development
    • Methods: allow_ip(), allow_cidr(), allow_localhost_only(), with_bearer_token(), with_basic_auth()
  • example: Comprehensive MCP Server with All Tools
    • New examples/mcp_server_all_tools.rs demonstrating complete server setup
    • All 5 framework tools integrated: Memory, Calculator, FileSystem, HTTP Client, Bash
    • Localhost-only security (127.0.0.1, ::1)
    • Ready for OpenAI Desktop Client integration at http://localhost:8008/mcp
    • Beautiful formatted startup output with tool descriptions and usage examples
    • CustomToolProtocol wrappers for flexible tool integration
  • feat: Calculator Tool Enhancement
    • Migrated from unmaintained meval v0.2 to actively maintained evalexpr v12.0.3
    • Added inverse hyperbolic functions: asinh(), acosh(), atanh()
    • Significantly improved performance and reliability
    • Removed future-incompatible nom v1.2.4 dependency
  • fix: HTTP Adapter Borrow Checker Issues
    • Fixed move conflicts in route handlers by wrapping bearer_token and allowed_ips in Arc
    • Pre-cloned Arc values before passing to route handlers
    • HTTP adapter now compiles cleanly with --features mcp-server
  • feat: Documentation and Examples
    • Updated all example documentation to clarify HTTP (not HTTPS) is secure for localhost-only servers
    • Explanation that localhost binding provides security, not protocol choice
    • Follows industry standard practice (npm, Flask, Django, etc.)
  • fix: Code Quality
    • Removed unused std::fs import in filesystem.rs tests
    • Added #[allow(dead_code)] attribute for example struct fields not currently used
    • All 185+ tests passing with zero warnings
  • deps: Updated dependencies
    • axum updated to 0.8.6 (from 0.7.9)
    • tower updated to 0.5.2 (from 0.4.13)
    • Improved HTTP server stability and performance
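
The fluent-builder pattern MCPServerBuilder follows can be sketched with an illustrative mock; the method names mirror the release notes (allow_ip, allow_localhost_only, with_bearer_token), but this is not the real API:

```rust
// Illustrative mock of the fluent-builder pattern, not the real MCPServerBuilder.
#[derive(Default)]
struct MCPServerBuilder {
    allowed_ips: Vec<String>,
    bearer_token: Option<String>,
}

impl MCPServerBuilder {
    fn new() -> Self {
        Self::default()
    }

    // Each method consumes and returns self, enabling chained calls.
    fn allow_ip(mut self, ip: &str) -> Self {
        self.allowed_ips.push(ip.to_string());
        self
    }

    // Convenience for secure local development: IPv4 and IPv6 loopback only.
    fn allow_localhost_only(self) -> Self {
        self.allow_ip("127.0.0.1").allow_ip("::1")
    }

    fn with_bearer_token(mut self, token: &str) -> Self {
        self.bearer_token = Some(token.to_string());
        self
    }
}
```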

Full Changelog: 0.4.0...0.6.0

0.4.0

15 Oct 19:01
aac7300

CloudLLM 0.4.0: Introducing Multi-Agent Council Support and basic (alpha) MCP Tool Support

Deploy compute to as many agents as you can afford and organize them under different topologies:

  • Parallel Mode - Independent Expert Analysis
  • Round-Robin Mode - Sequential Deliberation
  • Moderated Mode - Expert Panel with Chair
  • Hierarchical Mode - Multi-Layer Problem Solving
  • Debate Mode - Adversarial Refinement with Convergence

We've also introduced early support for tool use; tool usage for agents will be a major focus of our next releases.

See the TUTORIAL and examples

0.4.0 OCT/13/2025

  • feat: Multi-Agent Council System
    • New council.rs module implementing collaborative multi-agent orchestration
    • Five council modes for different collaboration patterns:
      • Parallel: All agents process prompt simultaneously, responses aggregated
      • RoundRobin: Agents take sequential turns building on previous responses
      • Moderated: Agents submit proposals, moderator synthesizes final answer
      • Hierarchical: Lead agent coordinates, specialists handle specific aspects
      • Debate: Agents discuss and challenge each other until convergence
    • Agent identity system with name, expertise, personality, and optional tool access
    • Conversation history tracking with CouncilMessage and round metadata
    • CouncilResponse includes final answer, message history, rounds executed, convergence score, and total tokens used
  • feat: Tool Protocol Abstraction Layer (tool_protocol.rs)
    • Flexible abstraction for connecting agents to various tool protocols
    • ToolProtocol trait with execute(), list_tools(), get_tool_metadata()
    • Support for MCP (Model Context Protocol), custom functions, and user-defined protocols
    • ToolResult struct with success status, output, error, and execution metadata
    • ToolParameter system with support for String, Number, Integer, Boolean, Array, Object types
    • ToolMetadata with parameter definitions and protocol-specific metadata
    • ToolRegistry for centralized tool management
    • Tool and ToolError types for type-safe tool operations
  • feat: Tool Adapters (tool_adapters.rs)
    • CustomToolAdapter: Execute user-defined Rust closures as tools
    • MCPToolAdapter: Integration with Model Context Protocol servers
    • OpenAIToolAdapter: Compatible with OpenAI function calling format
    • All adapters implement async ToolProtocol trait
  • feat: Automatic Tool Execution in Agent Generation
    • Agents automatically discover and execute tools during response generation
    • Tool information injected into system prompts with name, description, and parameters
    • JSON-based tool calling format: {"tool_call": {"name": "...", "parameters": {...}}}
    • Automatic tool execution loop with max 5 iterations to prevent infinite loops
    • Tool results fed back to LLM as user messages for continued generation
    • Token usage tracked cumulatively across all LLM calls and tool executions
    • New AgentResponse struct returns both content and token usage
    • Agent::generate_with_tokens() method for internal use with token tracking
  • feat: Convergence Detection for Debate Mode
    • Jaccard similarity-based convergence detection for debate termination
    • Compares word sets between consecutive debate rounds
    • Configurable convergence threshold (default: 0.75 / 75% similarity)
    • Early termination when agents reach consensus, saving tokens and cost
    • Convergence score returned in CouncilResponse for inspection
    • calculate_convergence_score() and jaccard_similarity() helper methods
  • feat: Token Usage Tracking in Council Modes
    • Parallel mode tracks and aggregates tokens from all concurrent agents
    • RoundRobin mode accumulates tokens across sequential turns
    • Token usage includes all LLM calls plus tool execution overhead
    • CouncilResponse.total_tokens_used provides complete cost visibility
  • feat: Comprehensive Multi-Agent Tutorial (COUNCIL_TUTORIAL.md)
    • Cookbook-style tutorial with progressive complexity
    • Five detailed recipes demonstrating each council mode
    • Real-world carbon capture strategy problem domain
    • Examples use multiple LLM providers (OpenAI, Claude, Gemini, Grok)
    • Up to 5 agents per example with distinct expertise and personalities
    • Tool integration example with MCPToolAdapter
    • Best practices guide for agent design and council mode selection
    • Troubleshooting section with common issues and solutions
    • Complete multi-stage pipeline example combining multiple modes
  • test: Added comprehensive test coverage
    • test_agent_with_tool_execution: Validates tool discovery, execution, and result integration
    • test_debate_mode_convergence: Validates convergence detection with mock agents
    • test_parallel_execution: Tests concurrent agent execution
    • test_round_robin_execution: Tests sequential turn-taking
    • test_moderated_execution: Tests proposal aggregation
    • test_hierarchical_execution: Tests lead-specialist coordination
    • test_debate_execution: Tests debate discussion flow
    • All 12 tests passing
  • refactor: Code quality improvements
    • Fixed all compiler warnings
    • Switched from std::sync::Mutex to tokio::sync::Mutex for async compatibility
    • Removed unused imports and assignments
    • Improved error handling in council execution paths
  • docs: Added inline documentation for council system and tool protocol
    • Module-level documentation with architecture diagrams
    • Example code in doc comments
    • Detailed parameter and return value documentation
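
The Jaccard-based convergence check from Debate mode above can be sketched with the standard library alone (a simplified stand-in for the crate's jaccard_similarity() helper; word sets from consecutive rounds are compared against the 0.75 default threshold):

```rust
use std::collections::HashSet;

// Jaccard similarity over word sets: |A ∩ B| / |A ∪ B|.
fn jaccard_similarity(a: &str, b: &str) -> f64 {
    let words_a: HashSet<&str> = a.split_whitespace().collect();
    let words_b: HashSet<&str> = b.split_whitespace().collect();
    if words_a.is_empty() && words_b.is_empty() {
        return 1.0; // two empty responses are trivially identical
    }
    let intersection = words_a.intersection(&words_b).count() as f64;
    let union = words_a.union(&words_b).count() as f64;
    intersection / union
}

// Debate terminates early once similarity meets the threshold (default 0.75).
fn has_converged(prev_round: &str, curr_round: &str, threshold: f64) -> bool {
    jaccard_similarity(prev_round, curr_round) >= threshold
}
```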

0.3.0

11 Oct 20:38
b0276a7

CloudLLM 0.3.0 Change Report

Comparison window: tag 0.2.9 → current 0.3.0 (HEAD)

TL;DR

  • First-class response streaming across the stack (new MessageChunk type, ClientWrapper::send_message_stream, LLMSession::send_message_stream, streaming examples & tests).
  • Client trait shake-up: send_message now accepts &[Message], token usage retrieval is async, Message stores content as Arc<str>, and model_name() is exposed on every client/session.
  • Session engine revamp focused on throughput: bump allocation for message bodies, pre-transmission trimming of the prompt, reusable request buffers, and token-count caching.
  • New Claude client plus expanded Grok and Gemini support, all sharing a pooled HTTP client with persistent connections.
  • Extensive test/doc updates (integration tests moved under tests/, new streaming walkthroughs, refreshed README, new logo).
  • New dependencies: bumpalo, lazy_static, reqwest, futures-util; tokio::sync::Mutex replaces std Mutex in async paths.

Release Timeline Since 0.2.9

  • 0.2.10 — Added Grok model enums (e.g., grok-4-fast-reasoning, grok-code-fast-1).
  • 0.2.11 — Introduced the Claude client (delegates to OpenAI transport); README updated to list Claude.
  • 0.2.12 — Claude client finalized with full model list; examples updated accordingly.
  • 0.3.0 — Major feature release: streaming, session optimization, async token usage, pooled HTTP connections, docs/tests overhaul.

(Changelog entries for these releases are consolidated in changelog.txt; the diff confirms the implementation of each item listed there.)


Major Features & Enhancements

1. Streaming Support (0.3.0)

  • Core trait additions (commit a12234f, 8f98bb6):
    • MessageChunk (content + optional finish_reason).
    • MessageChunkStream and MessageStreamFuture type aliases.
    • Default ClientWrapper::send_message_stream returning None; OpenAI and Grok override it.
  • LLMSession::send_message_stream mirrors session-aware flow; it reuses trimming and request buffering, returning an optional stream (commit a12234f).
  • New examples:
    • examples/streaming_example.rs & .md: minimal streaming walkthrough.
    • examples/interactive_streaming_session.rs & .md: interactive CLI with live token rendering.
  • tests/streaming_tests.rs covers chunk delivery paths.
  • README now advertises streaming as a top-line feature.

2. Client Infrastructure Improvements

  • HTTP connection pooling (21cfbf9): shared reqwest::Client (via lazy_static) with tuned keep-alive and idle timeouts (clients/common.rs). Every client constructor now uses new_with_client(_and_base_url).
  • Token usage tracking:
    • Swapped to tokio::sync::Mutex to avoid blocking in async contexts (5e0791f).
    • Trait hook get_last_usage is now async fn, delegating to the mutex.
    • Additional tests under tests/client_wrappers_tests.rs verify concurrency.
  • Claude client (commits ec09a0b, 9a8f9ca):
    • New file src/cloudllm/clients/claude.rs with full model enum, constructors, and streaming compatibility via OpenAI delegate.
    • README and examples updated accordingly.
  • Gemini & Grok clients follow new trait contracts, reuse the pooled HTTP client, and forward streaming to the delegate (OpenAI for Grok).

3. Session Engine Overhaul

  • LLMSession rewrites (a series of commits through a12234f):
    • Stores conversation history in Vec<Message> with parallel cached_token_counts to avoid repeated token estimation.
    • Uses a Bump arena for message bodies and a reusable request_buffer for outbound payloads (commits 6354b2c, 6a3646c).
    • Adds pre-transmission trimming: history is pruned before hitting the API when estimated tokens exceed max_tokens (0528983).
    • After the response, actual usage (when available) triggers another pruning pass.
    • New helpers: LLMSession::model_name(), LLMSession::last_token_usage().
    • Streaming method described earlier integrates with the same trimming pipeline.
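
The pre-transmission trimming described above can be sketched as follows; trim_history and the flat vector of cached per-message token counts are illustrative stand-ins for LLMSession's internals:

```rust
// Drop the oldest messages until the cached token estimate fits under max_tokens.
// Returns how many messages were pruned.
fn trim_history(cached_token_counts: &mut Vec<usize>, max_tokens: usize) -> usize {
    let mut removed = 0;
    while !cached_token_counts.is_empty()
        && cached_token_counts.iter().sum::<usize>() > max_tokens
    {
        cached_token_counts.remove(0); // oldest message first
        removed += 1;
    }
    removed
}
```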

4. Documentation & Examples

  • README v2:
    • New inline logo asset (logo.png replaces external link).
    • Highlights streaming, marks Claude as “supported”.
    • Dependency line now reflects 0.2.12 (likely needs another bump to 0.3.0 before publishing).
  • Fresh guides (examples/*_streaming_*.rs/.md) demonstrate using futures_util::StreamExt with the new API.
  • changelog.txt expanded with multi-level bullet items covering each addition.

5. Testing

  • Unit/integration tests extracted from client modules into tests/ (b0955e2, 5392b09):
    • llm_session_tests.rs, llm_session_bump_allocations_test.rs, connection_pooling_test.rs, client_tests.rs, etc.
    • Coverage now includes buffer reuse, token caching, async mutex behavior, and streaming flows.
  • Example binaries (run_streaming_example, run_example) accompany the docs.

API & Behavioral Changes (Action Required)

  • ClientWrapper::send_message
    • Old (≤0.2.9): async fn send_message(&self, messages: Vec<Message>, …)
    • New (0.3.0): async fn send_message(&self, messages: &[Message], …)
    • Impact: Callers must pass a slice reference. Reuse the same Vec to avoid allocations.
  • ClientWrapper::get_last_usage
    • Old: Sync method returning Option<TokenUsage>
    • New: async fn get_last_usage(&self) -> Option<TokenUsage>
    • Impact: Requires .await. Update call sites, including in examples/tests.
  • Message struct
    • Old: content: String
    • New: content: Arc<str>
    • Impact: When constructing manually, convert with "text".into() or Arc::<str>::from("text").
  • Streaming
    • Old: Not available
    • New: ClientWrapper::send_message_stream and LLMSession::send_message_stream
    • Impact: Requires futures_util::StreamExt (already added) to iterate over chunks.
  • Session API
    • Old: No streaming, limited inspection
    • New: Added send_message_stream, model_name, last_token_usage
    • Impact: Enables richer instrumentation and streaming integration.
  • Token usage mutex
    • Old: std::sync::Mutex
    • New: tokio::sync::Mutex
    • Impact: Removes potential deadlocks in async contexts; trait methods are now async.
  • HTTP client reuse
    • Old: Each client built its own internal client
    • New: Shared pooled reqwest::Client
    • Impact: Reduced connection churn; no action required by consumers.

Code migration examples

Before (0.2.9):

let mut messages = vec![
    Message { role: Role::System, content: "You are helpful.".to_string() },
    Message { role: Role::User, content: user_prompt.clone() },
];

let reply = client.send_message(messages, None).await?;
if let Some(usage) = client.get_last_usage() {
    println!("total tokens: {}", usage.total_tokens);
}

After (0.3.0):

let messages = vec![
    Message { role: Role::System, content: "You are helpful.".into() },
    Message { role: Role::User, content: Arc::<str>::from(user_prompt.as_str()) },
];

let reply = client.send_message(&messages, None).await?;

if let Some(usage) = client.get_last_usage().await {
    println!("total tokens: {}", usage.total_tokens);
}

Streaming integration (new):

use futures_util::StreamExt;

if let Some(mut stream) = session
    .send_message_stream(Role::User, "Tell me a joke".into(), None)
    .await?
{
    while let Some(chunk) = stream.next().await {
        let chunk = chunk?;
        print!("{}", chunk.content);
        if let Some(reason) = chunk.finish_reason {
            println!("\nStream finished because: {reason}");
        }
    }
} else {
    println!("This client does not support streaming yet.");
}

Dependencies & Tooling

  • Cargo.toml now pulls in:
    • bumpalo = "3.16" (arena allocator for messages).
    • lazy_static = "1.5.0" (shared HTTP client).
    • reqwest = "0.12" (transport for pooled HTTP connections).
    • futures-util = "0.3" (stream combinators for chunk handling).
  • Existing dependencies (e.g., tokio 1.47.1, openai-rust2 1.6.0) remain but the new features rely on them more heavily.

Additional Notes & Observations

  • LLMSession still trims conversation history using Vec::remove(0) despite the changelog mentioning VecDeque; functionality-wise, trimming behavior is improved through cached token counts and pre-send pruning, but not O(1) removal.
  • TokenUsage is now Clone + Debug, reflecting caching requirements.
  • LLMSession::token_usage() still returns totals accumulated from provider-reported usage; be aware that streaming paths currently do not update usage counters automatically (explicitly documented in changelog and code comments).
  • README dependency example still states cloudllm = "0.2.12"—update to 0.3.0 before publishing crates.io release notes.
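
The Vec::remove(0) vs VecDeque distinction from the first note, in miniature (illustrative only): Vec::remove(0) shifts every remaining element (O(n)), while VecDeque::pop_front removes the oldest entry in O(1):

```rust
use std::collections::VecDeque;

// O(1) removal of the oldest entry; Vec::remove(0) would shift all remaining items.
fn drop_oldest(history: &mut VecDeque<String>) -> Option<String> {
    history.pop_front()
}
```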

Recommended Migration Checklist for Integrators

  1. Update dependencies to cloudllm = "0.3.0" and ensure futures-util is available if you plan to consume streaming APIs.
  2. Adjust client calls:
    • Pass message slices (&messages) to send_message.
    • Await client.get_last_usage().await.
    • Review any direct Message constructions for the Arc<str> content type.
  3. Adopt optional streaming where beneficial; see examples/streaming_example.rs for end-to-end usage.
  4. Review session hooks (model_name, last_token_usage) for monitoring/telemetry integration.
  5. If you implemented custom ClientWrappers, implement the new trait methods:
    • Accept &[Message], return Arc<str> content, provide model_name, ensure get_last_usage awaits a tokio::sync::Mutex.
    • Optionally override send_message_stream to expose streaming.
  6. Update documentation/tests in your codebase to reflect async token usage and streaming behaviour.
    ...
0.2.11

22 Sep 01:47
9a8f9ca

0.2.11 SEP/21/2025

  • Added Claude client implementation at src/cloudllm/clients/claude.rs:
    • ClaudeClient struct follows the same delegate pattern as GrokClient, using OpenAIClient internally
    • Supports Anthropic API base URL (https://api.anthropic.com/v1)
    • Includes 6 Claude model variants: Claude35Sonnet20241022, Claude35Haiku20241022, Claude3Opus20240229, Claude35Sonnet20240620, Claude3Sonnet20240229, Claude3Haiku20240307
    • Implements all standard constructor methods and ClientWrapper trait
    • Added test function with CLAUDE_API_KEY environment variable
    • Updated README.md to mark Claude as supported (removed "Coming Soon")
    • Added Claude example to interactive_session.rs example

0.2.10 SEP/21/2025

  • New Grok model enums added to the GrokClient:
    • grok-4-fast-reasoning
    • grok-4-fast-non-reasoning
    • grok-code-fast-1

0.2.9

07 Aug 23:09
56cb0a3

0.2.9

  • New OpenAI model enums added to the OpenAIClient:
    • gpt-5
    • gpt-5-mini
    • gpt-5-nano
    • gpt-5-chat-latest
  • Upgraded tokio to 1.47.1

0.2.8

17 Jul 21:33
2fb702b

0.2.8

  • Bumped cloudllm version to 0.2.8
  • Upgraded tokio dependency from 1.44.5 to 1.46.1
  • Updated Grok client model names and enums in src/cloudllm/clients/grok.rs:
  • Renamed Grok3MiniFastBeta to Grok3MiniFast, Grok3MiniBeta to Grok3Mini, Grok3FastBeta to Grok3Fast, Grok3Beta to Grok3, and Grok3Latest to Grok4_0709
  • Updated model_to_string function to reflect new model names
  • Changed test client initialization to use Grok4_0709 instead of Grok3Latest
  • Updated Gemini client model names and enums in src/cloudllm/clients/gemini.rs:
  • Renamed Gemini25FlashPreview0520 to Gemini25Flash and Gemini25ProPreview0506 to Gemini25Pro to reflect stable releases
  • Added new model enum Gemini25FlashLitePreview0617 for lightweight preview model
  • Updated model_to_string function to map new enum names: gemini-2.5-flash, gemini-2.5-pro, and gemini-2.5-flash-lite-preview-06-17

0.2.7

28 May 22:17
12eb152

0.2.7

  • Bumped cloudllm version to 0.2.7
  • Upgraded openai-rust2 dependency from 1.5.9 to 1.6.0
  • Extended ChatArguments and client wrappers for search and tool support:
    • Added SearchParameters struct and with_search_parameters() builder to openai_rust::chat::ChatArguments
    • Added ToolType enum and Tool struct, plus tools field and with_tools() builder (snake_case serialization)
    • Updated ClientWrapper::send_message signature to accept optional_search_parameters: Option<SearchParameters>
    • Modified clients/common.rs send_and_track() to take and inject optional_search_parameters
    • Updated OpenAIClient, GeminiClient, and GrokClient to forward optional_search_parameters to send_and_track
    • Exposed optional_search_parameters through LLMSession::send_message and its callers
  • Other updates:
    • Added Grok3Latest variant to grok::Model enum and updated test to use it
    • Ensured backward compatibility: all existing call sites default optional_search_parameters to None

Example of how to use the new optional SearchParameters with the GrokClient

use std::env;
use log::{error, info};
use tokio::runtime::Runtime;
// SearchMode is assumed to come from openai_rust::chat alongside SearchParameters
use openai_rust::chat::SearchMode;

#[test]
pub fn test_grok_client() {
    // initialize logger
    crate::init_logger();

    let secret_key = env::var("XAI_API_KEY").expect("XAI_API_KEY not set");
    // LiveSearch only works with this model for now; most expensive output tokens
    let client = GrokClient::new_with_model_enum(&secret_key, Grok3Latest);
    let mut llm_session: LLMSession = LLMSession::new(
        std::sync::Arc::new(client),
        "You are a math professor.".to_string(),
        1048576,
    );

    // Create a new Tokio runtime
    let rt = Runtime::new().unwrap();

    let search_parameters =
        openai_rust::chat::SearchParameters::new(SearchMode::On).with_citations(true);

    let response_message: Message = rt.block_on(async {
        let s = llm_session
            .send_message(
                Role::User,
                "Using your Live search capabilities: What's the current price of Bitcoin?"
                    .to_string(),
                Some(search_parameters),
            )
            .await;

        match s {
            Ok(msg) => msg,
            Err(e) => {
                error!("Error: {}", e);
                Message {
                    role: Role::System,
                    content: format!("An error occurred: {:?}", e),
                }
            }
        }
    });

    info!("test_grok_client() response: {}", response_message.content);
}