
Conversation

@imkhalil

Summary

Adds support for Gemini's explicit context caching API, enabling significant token savings (a 99.8% reduction in testing) on repeated requests that share static content such as system instructions, tool definitions, and reference documents.

Changes

  • Added cached_content parameter to Chat.__init__() to accept existing cache references
  • Added 4 cache management methods:
    • create_cache() - Create a new cache from system instructions, tools, and contents
    • delete_cache() - Delete the cache and clear local cache state
    • update_cache() - Update the cache's TTL (time to live)
    • get_cache() - Retrieve cache metadata
  • Modified _call() to pass cached_content to litellm when cache exists
  • Modified _prep_msg() to exclude system instruction when it's already in cache
  • Requires a pinned Gemini model version with the -001 suffix (e.g., gemini-2.0-flash-001), since explicit caching is tied to a specific model version
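The interaction between `cached_content` and the system-instruction exclusion in `_prep_msg` can be sketched as follows. This is a minimal, self-contained illustration rather than the PR's actual code; the function name `prep_msgs` and the message shapes are placeholders:

```python
def prep_msgs(msgs, system=None, cached_content=None):
    """Build the message list for a request.

    When `cached_content` references an existing cache, the system
    instruction is assumed to already live inside that cache, so it is
    excluded here to avoid sending (and paying for) it twice.
    """
    out = []
    if system and not cached_content:
        out.append({"role": "system", "content": system})
    out.extend(msgs)
    return out

# Without a cache, the system instruction is sent inline.
full = prep_msgs([{"role": "user", "content": "hi"}], system="You are terse.")

# With a cache reference, it is omitted from the request payload.
cached = prep_msgs([{"role": "user", "content": "hi"}],
                   system="You are terse.",
                   cached_content="cachedContents/abc123")
```

In the real implementation, `_call()` would additionally forward `cached_content` to litellm so the provider resolves the cached prefix server-side.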

Testing

All methods were tested against the Gemini API (paid tier). Verified:

  • Cache creation and usage
  • Token savings (99.8% reduction confirmed)
  • All CRUD operations work correctly
  • Regular chat functionality remains unaffected
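The headline 99.8% figure is what you would expect when a large static prefix is cached and only a small live prompt is sent per request. Illustrative numbers (not taken from the PR's test run):

```python
cached_tokens = 50_000  # system prompt + tools + reference docs held in the cache
live_tokens = 100       # new user turn sent with each request

# Fraction of the prompt that no longer needs to be resent on each call
reduction = cached_tokens / (cached_tokens + live_tokens)
print(f"{reduction:.1%}")  # → 99.8%
```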

Related

Addresses a request from a community member for Gemini caching support.

@jph00
Contributor

jph00 commented Nov 18, 2025

Thanks for the PR! This is an nbdev project, so the source is the notebooks, not the .py file. You should add your changes to the notebooks. Also add documentation and examples and tests there. Try to follow the coding style in the rest of the notebook, which is based on: https://docs.fast.ai/dev/style.html.

I see you're heavily leaning on AI here, which is fine, but do it in a way where you understand and check each line of code. Ideally, open the repo in Solveit, tell the AI my feedback, and have it help you with the process, including using nbdev_export to get the code exported.

@jph00 jph00 marked this pull request as draft November 24, 2025 06:24
