Changes from all commits (59 commits)
2e84006
13: Failing specs
tpaulshippy Jun 9, 2025
be61e48
13: Get caching specs passing for Bedrock
tpaulshippy Jun 9, 2025
edec138
13: Remove comments in specs
tpaulshippy Jun 9, 2025
971f176
13: Add unused param on other providers
tpaulshippy Jun 9, 2025
557a5ee
13: Rubocop -A
tpaulshippy Jun 9, 2025
9673b13
13: Add cassettes for bedrock cache specs
tpaulshippy Jun 9, 2025
c47d270
13: Resolve Rubocop aside from Metrics/ParameterLists
tpaulshippy Jun 9, 2025
eaf0876
13: Use large enough prompt to hit cache meaningfully
tpaulshippy Jun 9, 2025
160d9ab
13: Ensure cache tokens are being used
tpaulshippy Jun 9, 2025
d1698bf
13: Refactor completion parameters
tpaulshippy Jun 9, 2025
344729f
16: Add guide for prompt caching
tpaulshippy Jun 9, 2025
7b98277
Add real anthropic cassettes ($0.03)
tpaulshippy Jun 12, 2025
fd30f14
Merge branch 'main' into prompt-caching
tpaulshippy Jun 12, 2025
a91d07e
Switch from large_prompt.txt to 10,000 of the letter a
tpaulshippy Jul 19, 2025
f40f37d
Make that 2048 * 4 (2048 tokens for Haiku)
tpaulshippy Jul 19, 2025
109bb51
Rename properties on message class
tpaulshippy Jul 19, 2025
1c6cbf7
Revert "13: Refactor completion parameters"
tpaulshippy Jul 19, 2025
4d78a09
Address rubocop
tpaulshippy Jul 19, 2025
25b3660
Merge remote-tracking branch 'origin/main' into prompt-caching
tpaulshippy Jul 19, 2025
8e80f08
Update docs
tpaulshippy Jul 19, 2025
d42d074
Actually return the payload
tpaulshippy Jul 19, 2025
97b1ace
Add support for cache token counts in gemini and openai
tpaulshippy Jul 19, 2025
269122e
Merge branch 'main' into prompt-caching
tpaulshippy Jul 24, 2025
2c88266
Improve specs to do double calls and check cached tokens
tpaulshippy Jul 29, 2025
8c39dc1
Do the double call in the openai/gemini specs
tpaulshippy Jul 29, 2025
24cdb63
Set cache control on last message only
tpaulshippy Jul 29, 2025
97bde47
Merge branch 'main' into prompt-caching
tpaulshippy Jul 29, 2025
8aff99a
Merge branch 'main' into prompt-caching
tpaulshippy Aug 3, 2025
7c5d792
Fix some merge issues
tpaulshippy Aug 3, 2025
2d49d5f
Get openai prompt cache reporting to work
tpaulshippy Aug 3, 2025
013b527
Fix gemini prompt caching reporting
tpaulshippy Aug 3, 2025
9dbdd12
Add comment about why gemini is special
tpaulshippy Aug 3, 2025
5f6b9b3
Resolve rubocop offenses
tpaulshippy Aug 3, 2025
f591ab1
Merge branch 'main' into prompt-caching
tpaulshippy Aug 7, 2025
dd7abc9
Merge branch 'main' into prompt-caching
tpaulshippy Aug 8, 2025
ace160c
Merge branch 'main' into prompt-caching
tpaulshippy Aug 14, 2025
74846b2
Merge branch 'main' into prompt-caching
tpaulshippy Aug 14, 2025
91032de
Clean up the aaaaaaaaaaaa prompts in VCRs
tpaulshippy Aug 14, 2025
05cc1d9
Reduce line length
tpaulshippy Aug 14, 2025
f861b63
Support caching in rails model
tpaulshippy Aug 15, 2025
f923385
Merge branch 'main' into prompt-caching
tpaulshippy Aug 25, 2025
970deba
Merge branch 'main' into prompt-caching
tpaulshippy Aug 28, 2025
010f889
Merge branch 'main' into prompt-caching
tpaulshippy Aug 29, 2025
5c31698
Merge branch 'main' into prompt-caching
tpaulshippy Aug 30, 2025
00e69ae
Merge branch 'main' into prompt-caching
tpaulshippy Sep 8, 2025
d6f36f3
Add with_provider_options and use that for opting into caching
tpaulshippy Sep 8, 2025
31b8b0e
Remove unused hash and add example to doc
tpaulshippy Sep 8, 2025
7b8b280
Remove extra unnecessary comment
tpaulshippy Sep 8, 2025
9a0ec36
Update appraisal gemfiles
tpaulshippy Sep 8, 2025
d833156
Revert "Update appraisal gemfiles"
tpaulshippy Sep 22, 2025
c90d6cd
Revert "Remove extra unnecessary comment"
tpaulshippy Sep 22, 2025
a16d6dd
Revert "Remove unused hash and add example to doc"
tpaulshippy Sep 22, 2025
2e586e1
Revert "Add with_provider_options and use that for opting into caching"
tpaulshippy Sep 22, 2025
7f30f58
Merge branch 'main' into prompt-caching
tpaulshippy Sep 22, 2025
4fdc805
Update docs to reflect new API
tpaulshippy Sep 22, 2025
f5c3825
Take cache setting as parameter
tpaulshippy Sep 22, 2025
581a568
Update specs and refactor a bit
tpaulshippy Sep 22, 2025
3da7f26
Get specs passing
tpaulshippy Sep 22, 2025
7e6fa0d
Update appraisal gemfiles
tpaulshippy Sep 22, 2025
36 changes: 36 additions & 0 deletions docs/_core_features/chat.md
@@ -520,6 +520,42 @@ puts "Total Conversation Tokens: #{total_conversation_tokens}"

Refer to the [Working with Models Guide]({% link _advanced/models.md %}) for details on accessing model-specific pricing.

## Prompt Caching

### Enabling
For Anthropic models, RubyLLM automatically opts in to prompt caching, which is documented more fully in the [Anthropic API docs](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching).

Disable prompt caching using configuration:

```ruby
RubyLLM.configure do |config|
config.cache_prompts = false # Disable prompt caching with Anthropic models
end
```

Or specify exactly which pieces you want to enable caching for:
```ruby
# Enable caching only for specific types of content
chat = RubyLLM.chat(model: 'claude-3-5-haiku-20241022', cache: :system) # Cache system instructions
chat = RubyLLM.chat(model: 'claude-3-5-haiku-20241022', cache: :user) # Cache user messages
chat = RubyLLM.chat(model: 'claude-3-5-haiku-20241022', cache: :tools) # Cache tool definitions

# Or a combination
chat = RubyLLM.chat(model: 'claude-3-5-haiku-20241022', cache: [:system, :tools]) # Cache system instructions and tool definitions

# Or do the same on the ask method
chat.ask("What do you think?", cache: :system)
chat.ask("What do you think?", cache: :user)
chat.ask("What do you think?", cache: :tools)
chat.ask("What do you think?", cache: [:system, :tools])

```

### Checking cached token counts
For Anthropic, OpenAI, and Gemini, you can see the number of tokens read from cache by looking at the `cached_tokens` property on the output messages.

For Anthropic, you can see the tokens written to cache by looking at the `cache_creation_tokens` property.
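To make the two counters concrete, here is a minimal, self-contained stand-in for the message object (the `CacheUsage` struct and `cache_hit_ratio` helper are hypothetical illustrations, not part of RubyLLM's API; the real properties live on the returned message):

```ruby
# Hypothetical stand-in showing how the cache counters might be inspected.
# cached_tokens          — tokens read from the provider's prompt cache
# cache_creation_tokens  — tokens written to the cache (Anthropic only)
CacheUsage = Struct.new(:input_tokens, :cached_tokens, :cache_creation_tokens,
                        keyword_init: true) do
  # Fraction of the prompt that was served from cache (nil-safe).
  def cache_hit_ratio
    return 0.0 if input_tokens.nil? || input_tokens.zero?

    (cached_tokens || 0).fdiv(input_tokens)
  end
end

usage = CacheUsage.new(input_tokens: 8192, cached_tokens: 6144,
                       cache_creation_tokens: 0)
puts usage.cache_hit_ratio # 0.75
```

In real code you would read the same-named properties off the message returned by `chat.ask` rather than building a struct.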

## Chat Event Handlers

You can register blocks to be called when certain events occur during the chat lifecycle. This is particularly useful for UI updates, logging, analytics, or building real-time chat interfaces.
11 changes: 8 additions & 3 deletions gemfiles/rails_7.1.gemfile.lock
@@ -148,6 +148,7 @@ GEM
concurrent-ruby (~> 1.1)
webrick (~> 1.7)
websocket-driver (~> 0.7)
ffi (1.17.2-arm64-darwin)
ffi (1.17.2-x86_64-linux-gnu)
fiber-annotation (0.2.0)
fiber-local (1.1.0)
@@ -180,12 +181,12 @@ GEM
ruby-vips (>= 2.0.17, < 3)
iniparse (1.5.0)
io-console (0.8.1)
io-event (1.14.0)
io-event (1.11.2)
irb (1.15.2)
pp (>= 0.6.0)
rdoc (>= 4.0.0)
reline (>= 0.4.2)
json (2.14.1)
json (2.15.0)
json-schema (6.0.0)
addressable (~> 2.8)
bigdecimal (~> 3.1)
@@ -224,6 +225,8 @@ GEM
net-smtp (0.5.1)
net-protocol
nio4r (2.7.4)
nokogiri (1.18.10-arm64-darwin)
racc (~> 1.4)
nokogiri (1.18.10-x86_64-linux-gnu)
racc (~> 1.4)
os (1.1.4)
@@ -318,7 +321,7 @@ GEM
rubocop-ast (>= 1.46.0, < 2.0)
ruby-progressbar (~> 1.7)
unicode-display_width (>= 2.4.0, < 4.0)
rubocop-ast (1.47.0)
rubocop-ast (1.47.1)
parser (>= 3.3.7.2)
prism (~> 1.4)
rubocop-performance (1.26.0)
@@ -355,6 +358,7 @@ GEM
simplecov (~> 0.19)
simplecov-html (0.13.2)
simplecov_json_formatter (0.1.4)
sqlite3 (2.7.4-arm64-darwin)
sqlite3 (2.7.4-x86_64-linux-gnu)
stringio (3.1.7)
thor (1.4.0)
@@ -380,6 +384,7 @@ GEM
zeitwerk (2.7.3)

PLATFORMS
arm64-darwin-22
x86_64-linux

DEPENDENCIES
11 changes: 8 additions & 3 deletions gemfiles/rails_7.2.gemfile.lock
@@ -142,6 +142,7 @@ GEM
concurrent-ruby (~> 1.1)
webrick (~> 1.7)
websocket-driver (~> 0.7)
ffi (1.17.2-arm64-darwin)
ffi (1.17.2-x86_64-linux-gnu)
fiber-annotation (0.2.0)
fiber-local (1.1.0)
@@ -174,12 +175,12 @@ GEM
ruby-vips (>= 2.0.17, < 3)
iniparse (1.5.0)
io-console (0.8.1)
io-event (1.14.0)
io-event (1.11.2)
irb (1.15.2)
pp (>= 0.6.0)
rdoc (>= 4.0.0)
reline (>= 0.4.2)
json (2.14.1)
json (2.15.0)
json-schema (6.0.0)
addressable (~> 2.8)
bigdecimal (~> 3.1)
@@ -217,6 +218,8 @@ GEM
net-smtp (0.5.1)
net-protocol
nio4r (2.7.4)
nokogiri (1.18.10-arm64-darwin)
racc (~> 1.4)
nokogiri (1.18.10-x86_64-linux-gnu)
racc (~> 1.4)
os (1.1.4)
@@ -311,7 +314,7 @@ GEM
rubocop-ast (>= 1.46.0, < 2.0)
ruby-progressbar (~> 1.7)
unicode-display_width (>= 2.4.0, < 4.0)
rubocop-ast (1.47.0)
rubocop-ast (1.47.1)
parser (>= 3.3.7.2)
prism (~> 1.4)
rubocop-performance (1.26.0)
@@ -348,6 +351,7 @@ GEM
simplecov (~> 0.19)
simplecov-html (0.13.2)
simplecov_json_formatter (0.1.4)
sqlite3 (2.7.4-arm64-darwin)
sqlite3 (2.7.4-x86_64-linux-gnu)
stringio (3.1.7)
thor (1.4.0)
@@ -374,6 +378,7 @@ GEM
zeitwerk (2.7.3)

PLATFORMS
arm64-darwin-22
x86_64-linux

DEPENDENCIES
11 changes: 8 additions & 3 deletions gemfiles/rails_8.0.gemfile.lock
@@ -142,6 +142,7 @@ GEM
concurrent-ruby (~> 1.1)
webrick (~> 1.7)
websocket-driver (~> 0.7)
ffi (1.17.2-arm64-darwin)
ffi (1.17.2-x86_64-linux-gnu)
fiber-annotation (0.2.0)
fiber-local (1.1.0)
@@ -174,12 +175,12 @@ GEM
ruby-vips (>= 2.0.17, < 3)
iniparse (1.5.0)
io-console (0.8.1)
io-event (1.14.0)
io-event (1.11.2)
irb (1.15.2)
pp (>= 0.6.0)
rdoc (>= 4.0.0)
reline (>= 0.4.2)
json (2.14.1)
json (2.15.0)
json-schema (6.0.0)
addressable (~> 2.8)
bigdecimal (~> 3.1)
@@ -217,6 +218,8 @@ GEM
net-smtp (0.5.1)
net-protocol
nio4r (2.7.4)
nokogiri (1.18.10-arm64-darwin)
racc (~> 1.4)
nokogiri (1.18.10-x86_64-linux-gnu)
racc (~> 1.4)
os (1.1.4)
@@ -311,7 +314,7 @@ GEM
rubocop-ast (>= 1.46.0, < 2.0)
ruby-progressbar (~> 1.7)
unicode-display_width (>= 2.4.0, < 4.0)
rubocop-ast (1.47.0)
rubocop-ast (1.47.1)
parser (>= 3.3.7.2)
prism (~> 1.4)
rubocop-performance (1.26.0)
@@ -348,6 +351,7 @@ GEM
simplecov (~> 0.19)
simplecov-html (0.13.2)
simplecov_json_formatter (0.1.4)
sqlite3 (2.7.4-arm64-darwin)
sqlite3 (2.7.4-x86_64-linux-gnu)
stringio (3.1.7)
thor (1.4.0)
@@ -374,6 +378,7 @@ GEM
zeitwerk (2.7.3)

PLATFORMS
arm64-darwin-22
x86_64-linux

DEPENDENCIES
3 changes: 2 additions & 1 deletion lib/ruby_llm/active_record/chat_methods.rb
@@ -179,7 +179,8 @@ def create_user_message(content, with: nil)
message_record
end

def ask(message, with: nil, &)
def ask(message, with: nil, cache: nil, &)
to_llm.instance_variable_set(:@cache_prompts, cache)
create_user_message(message, with:)
complete(&)
end
7 changes: 5 additions & 2 deletions lib/ruby_llm/chat.rb
@@ -7,7 +7,7 @@ class Chat

attr_reader :model, :messages, :tools, :params, :headers, :schema

def initialize(model: nil, provider: nil, assume_model_exists: false, context: nil)
def initialize(model: nil, provider: nil, assume_model_exists: false, context: nil, cache: nil)
if assume_model_exists && !provider
raise ArgumentError, 'Provider must be specified if assume_model_exists is true'
end
@@ -19,6 +19,7 @@ def initialize(model: nil, provider: nil, assume_model_exists: false, context: n
@temperature = nil
@messages = []
@tools = {}
@cache_prompts = cache.nil? ? @config.cache_prompts : cache
@params = {}
@headers = {}
@schema = nil
@@ -30,7 +31,8 @@ def initialize(model: nil, provider: nil, assume_model_exists: false, context: n
}
end

def ask(message = nil, with: nil, &)
def ask(message = nil, with: nil, cache: nil, &)
@cache_prompts = cache if cache
add_message role: :user, content: Content.new(message, with)
complete(&)
end
@@ -127,6 +129,7 @@ def complete(&) # rubocop:disable Metrics/PerceivedComplexity
tools: @tools,
temperature: @temperature,
model: @model,
cache_prompts: @cache_prompts.dup,
params: @params,
headers: @headers,
schema: @schema,
4 changes: 3 additions & 1 deletion lib/ruby_llm/configuration.rb
@@ -43,7 +43,8 @@ class Configuration
:logger,
:log_file,
:log_level,
:log_stream_debug
:log_stream_debug,
:cache_prompts

def initialize
@request_timeout = 120
@@ -64,6 +65,7 @@ def initialize
@log_file = $stdout
@log_level = ENV['RUBYLLM_DEBUG'] ? Logger::DEBUG : Logger::INFO
@log_stream_debug = ENV['RUBYLLM_STREAM_DEBUG'] == 'true'
@cache_prompts = true
end

def instance_variables
9 changes: 7 additions & 2 deletions lib/ruby_llm/message.rb
@@ -5,7 +5,8 @@ module RubyLLM
class Message
ROLES = %i[system user assistant tool].freeze

attr_reader :role, :tool_calls, :tool_call_id, :input_tokens, :output_tokens, :model_id, :raw
attr_reader :role, :tool_calls, :tool_call_id, :input_tokens, :output_tokens, :model_id, :raw,
:cached_tokens, :cache_creation_tokens
attr_writer :content

def initialize(options = {})
@@ -16,6 +17,8 @@ def initialize(options = {})
@output_tokens = options[:output_tokens]
@model_id = options[:model_id]
@tool_call_id = options[:tool_call_id]
@cached_tokens = options[:cached_tokens]
@cache_creation_tokens = options[:cache_creation_tokens]
@raw = options[:raw]

ensure_valid_role
@@ -49,7 +52,9 @@ def to_h
tool_call_id: tool_call_id,
input_tokens: input_tokens,
output_tokens: output_tokens,
model_id: model_id
model_id: model_id,
cache_creation_tokens: cache_creation_tokens,
cached_tokens: cached_tokens
}.compact
end

4 changes: 3 additions & 1 deletion lib/ruby_llm/provider.rb
@@ -37,7 +37,8 @@ def configuration_requirements
self.class.configuration_requirements
end

def complete(messages, tools:, temperature:, model:, params: {}, headers: {}, schema: nil, &) # rubocop:disable Metrics/ParameterLists
def complete(messages, tools:, temperature:, model:, params: {}, headers: {}, schema: nil, # rubocop:disable Metrics/ParameterLists
cache_prompts: nil, &)
normalized_temperature = maybe_normalize_temperature(temperature, model)

payload = Utils.deep_merge(
@@ -46,6 +47,7 @@ def complete(messages, tools:, temperature:, model:, params: {}, headers: {}, sc
tools: tools,
temperature: normalized_temperature,
model: model,
cache_prompts: cache_prompts,
stream: block_given?,
schema: schema
),
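The payload above is built with `Utils.deep_merge`, combining the caching-aware render payload with caller-supplied `params`. A minimal sketch of such a recursive merge follows; this is an assumption about the helper's behavior, and the gem's actual implementation may handle edge cases differently:

```ruby
# Recursively merge override into base: nested hashes merge key-by-key,
# any other conflicting value is overwritten by the override.
def deep_merge(base, override)
  base.merge(override) do |_key, old_val, new_val|
    if old_val.is_a?(Hash) && new_val.is_a?(Hash)
      deep_merge(old_val, new_val)
    else
      new_val
    end
  end
end

payload = deep_merge({ model: 'x', opts: { stream: true } },
                     { opts: { cache: :system } })
p payload # nested :opts hashes are merged rather than replaced
```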
1 change: 1 addition & 0 deletions lib/ruby_llm/providers/anthropic.rb
@@ -10,6 +10,7 @@ class Anthropic < Provider
include Anthropic::Models
include Anthropic::Streaming
include Anthropic::Tools
include Anthropic::Cache

def api_base
'https://api.anthropic.com'
19 changes: 19 additions & 0 deletions lib/ruby_llm/providers/anthropic/cache.rb
@@ -0,0 +1,19 @@
# frozen_string_literal: true

module RubyLLM
module Providers
class Anthropic
# Handles caching of prompts for Anthropic
module Cache
def should_cache?(type)
return false unless cache_prompts
return true if cache_prompts == true
return true if cache_prompts.is_a?(Array) && cache_prompts.include?(type)
return true if cache_prompts.is_a?(Symbol) && cache_prompts == type

false
end
end
end
end
end
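The predicate above can be exercised in isolation. In this self-contained re-creation, the `CacheCheck` module name and the explicit `setting` argument are my additions for illustration, since the real module reads `cache_prompts` from the including provider:

```ruby
module CacheCheck
  # Mirrors the dispatch in Anthropic::Cache#should_cache?:
  # the setting may be true (cache everything), a Symbol, or an
  # Array of Symbols naming the content types to cache.
  def self.should_cache?(setting, type)
    return false unless setting
    return true if setting == true
    return true if setting.is_a?(Array) && setting.include?(type)
    return true if setting.is_a?(Symbol) && setting == type

    false
  end
end

p CacheCheck.should_cache?(true, :system)            # true  -- cache everything
p CacheCheck.should_cache?(%i[system tools], :tools) # true  -- listed type
p CacheCheck.should_cache?(:user, :system)           # false -- different type
p CacheCheck.should_cache?(nil, :system)             # false -- caching disabled
```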