
Conversation

cpetersen

What this does

This PR adds support for the Red Candle provider, enabling local LLM execution using quantized GGUF models directly in Ruby without requiring external API calls.

Key Implementation Details

Red Candle is fundamentally different from other providers: while all other RubyLLM providers communicate over HTTP APIs, Red Candle runs models locally using the Candle Rust crate. This brings true local inference to Ruby, with no network latency and no API costs.

Dependency Management

Since Red Candle requires a Rust toolchain at build time, we've made it optional at two levels:

  • For end users: red-candle is NOT a gemspec dependency. Users must explicitly add gem 'red-candle' to their Gemfile to use this provider (see the Gemfile sketch below).
  • For contributors: We've added an optional Bundler group so developers can work on RubyLLM without installing Rust. Enable it with bundle config set --local with red_candle.
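
For illustration, an application's Gemfile would end up looking roughly like this (a minimal sketch; only the gem names come from this PR, the rest is assumed):

source 'https://rubygems.org'

gem 'ruby_llm'
gem 'red-candle' # optional: enables the :red_candle provider and builds the Candle Rust extension at install time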

Testing Strategy

We implemented a comprehensive mocking system to keep tests fast (a sketch of the availability guard follows this list):

  • Stubbed mode (default): Uses MockCandleModel to simulate responses without actual inference
  • Real inference mode: Set RED_CANDLE_REAL_INFERENCE=true to run actual model inference (downloads ~4.5 GB of models on first run)
  • Not installed mode: Tests skip gracefully when Red Candle isn't available
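
The guard behind these modes might look roughly like the sketch below (RED_CANDLE_AVAILABLE, the :red_candle RSpec tag, and require 'candle' are illustrative assumptions, not necessarily what red_candle_test_helper.rb actually does):

begin
  require 'candle' # require name for the red-candle gem (assumption)
  RED_CANDLE_AVAILABLE = true
rescue LoadError
  RED_CANDLE_AVAILABLE = false
end

REAL_INFERENCE = ENV['RED_CANDLE_REAL_INFERENCE'] == 'true'

RSpec.configure do |config|
  # Skip examples tagged :red_candle when the gem isn't installed ("not installed" mode)
  config.before(:each, :red_candle) do
    skip 'red-candle not installed' unless RED_CANDLE_AVAILABLE
    # In stubbed mode (REAL_INFERENCE false), specs would swap in MockCandleModel here
    # instead of loading real weights.
  end
end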

Changes Made

  • Added RubyLLM::Providers::RedCandle with full chat support including streaming
  • Implemented model management with automatic GGUF file downloads from HuggingFace
  • Created comprehensive test mocks in red_candle_test_helper.rb
  • Added conditional loading in ruby_llm.rb and spec_helper.rb to handle the optional dependency (sketched below this list)
  • Updated models_to_test.rb to conditionally include Red Candle models
  • Added documentation in CONTRIBUTING.md for managing the optional dependency
  • Implemented proper Content object handling for structured responses

How to Test

# Test without Red Candle (default for new contributors)
bundle install
bundle exec rspec  # Red Candle tests will be skipped

# Test with Red Candle stubbed (fast)
bundle config set --local with red_candle
bundle install
bundle exec rspec  # Uses mocked responses

# Test with real inference (slow, downloads models)
bundle config set --local with red_candle
bundle install
huggingface-cli login # Make sure to accept the Mistral model terms
RED_CANDLE_REAL_INFERENCE=true bundle exec rspec

Once red-candle is enabled, you can turn it back off with:

bundle config unset with

And turn it back on again with:

bundle config set --local with red_candle

Try it out

bundle exec irb
require 'ruby_llm'

chat = RubyLLM.chat(
  provider: :red_candle,
  model: 'Qwen/Qwen2.5-1.5B-Instruct-GGUF' # 'TheBloke/Mistral-7B-Instruct-v0.2-GGUF' is another option
)
response = chat.ask("What are the benefits of functional programming?")
puts response.content
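
Streaming is also supported. Assuming RubyLLM's usual block-based streaming API applies to this provider too (not verified against this PR), it would look something like:

chat.ask("Explain pattern matching in Ruby") do |chunk|
  print chunk.content
end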

Type of change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation
  • Performance improvement

Scope check

  • I read the Contributing Guide
  • This aligns with RubyLLM's focus on LLM communication
  • This isn't application-specific logic that belongs in user code
  • This benefits most users, not just my specific use case

Quality check

  • I ran overcommit --install and all hooks pass
  • I tested my changes thoroughly
    • For provider changes: Re-recorded VCR cassettes with bundle exec rake vcr:record[provider_name]
    • All tests pass: bundle exec rspec
  • I updated documentation if needed
  • I didn't modify auto-generated files manually (models.json, aliases.json)

API changes

  • Breaking change
  • New public methods/classes
  • Changed method signatures
  • No API changes

Related issues

Fixes #394

@tpaulshippy (Contributor) left a comment

I do see 4 rubocop offenses. Can you clean those up?

Love the test helper.

# Red Candle doesn't provide token counts, but we can estimate them
content = result[:content]
# Rough estimation: ~4 characters per token
estimated_output_tokens = (content.length / 4.0).round

Contributor:

Is this just for funsies?

Contributor:

I noticed a few things like this (infinity tokens per dollar) that I don't see in the Ollama provider. While adding these lines of code may have value, I'm not really seeing it.

Author:

You're definitely right about the infinity tokens per dollar; that was a little too cute. I removed the whole pricing bit (I don't think it's necessary).

As for the estimated_output_tokens, the specs require this in these two places:

We can get real token counts from red-candle, but we'd need to retokenize, which seems wasteful (and I couldn't figure out how to reasonably get access to the underlying Candle::LLM right here), so we decided to estimate. I'm open to other methods of estimating (we could split on a regex or something); this just seemed simple and efficient.
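
For concreteness, the two cheap estimates being discussed would be something like this (illustrative only, not code from the PR):

content = result[:content]
estimated_output_tokens = (content.length / 4.0).round   # current approach: ~4 characters per token
# estimated_output_tokens = content.scan(/\S+/).length   # alternative: count whitespace-delimited chunks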

end

def render_payload(messages, tools:, temperature:, model:, stream:, schema:) # rubocop:disable Metrics/ParameterLists
# Red Candle doesn't support tools

Contributor:

Sad. At least it has structured generation.

This is a planned red-candle feature - just not there yet.

require_relative 'support/streaming_error_helpers'
require_relative 'support/provider_capabilities_helper'

# Handle Red Candle provider based on availability and environment

Contributor:

May consider putting this in a separate file to follow the pattern set.

done

@tpaulshippy (Contributor)

Looks like this won't work with Ruby 3.1 (which is currently part of the CI for RubyLLM). Probably need to figure this out.
https://github.com/tpaulshippy/ruby_llm_community/actions/runs/17664667094/job/50204162482?pr=15

I tried it out though - love it!

@cpetersen (Author)

@tpaulshippy Thank you for the review and the feedback! We've made some changes and I think this is ready for another look.

@crmne (Owner) commented Sep 14, 2025

> Looks like this won't work with Ruby 3.1 (which is currently part of the CI for RubyLLM). Probably need to figure this out.

This is unfortunately a blocker. I'm not gonna drop Ruby 3.1 support soon as I know many users of RubyLLM are still running that. I think this patch may need to wait.

@tpaulshippy (Contributor) commented Sep 14, 2025

> This is unfortunately a blocker. I'm not gonna drop Ruby 3.1 support soon as I know many users of RubyLLM are still running that. I think this patch may need to wait.

9ab992d resolved this.

@orangewolf

@crmne with the 3.1 blocker removed (it turned out to be already working and just needed testing), is there anything else keeping this from moving forward, in your opinion?
