@MasuRii
Contributor

@MasuRii MasuRii commented Dec 21, 2025

📝 Summary

Fix quota exhaustion detection to ensure the proxy accurately respects API quota limits and avoids making doomed requests that would result in StreamedAPIError.

✨ Changes

  • 🔧 Centralized Quota Check: Added _is_quota_exhausted(key, model) method to UsageManager for consistent quota exhaustion detection across all code paths
  • 🔄 Updated Key Acquisition: Modified get_available_credentials_for_model and both priority/non-priority paths in acquire_key to skip exhausted credentials
  • ⏰ Enhanced Quota Sync: Updated update_quota_baseline to accept optional quota_reset_ts from API responses and auto-set model cooldowns when quota is depleted
  • 🔗 Quota Group Propagation: Ensured cooldowns and reset timestamps are propagated across all models in a quota group (e.g., Gemini 3 Pro variants)
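
A minimal sketch of what a centralized check like `_is_quota_exhausted(key, model)` might look like. The class name `UsageManagerSketch` and the storage layout (a per-key `models` dict) are assumptions made for illustration; only the field names `request_count` and `quota_max_requests` come from this PR's review discussion, and the real `UsageManager` may differ.

```python
from typing import Any, Dict


class UsageManagerSketch:
    """Hypothetical stand-in for UsageManager; the storage layout is assumed."""

    def __init__(self) -> None:
        # Assumed shape:
        # key -> {"models": {model -> {"request_count": int,
        #                              "quota_max_requests": int}}}
        self._usage: Dict[str, Dict[str, Any]] = {}

    def _is_quota_exhausted(self, key: str, model: str) -> bool:
        """Return True only when a known quota limit has been fully consumed."""
        model_data = self._usage.get(key, {}).get("models", {}).get(model)
        if not model_data:
            # No baseline recorded yet, so exhaustion cannot be claimed.
            return False
        max_requests = model_data.get("quota_max_requests")
        if not max_requests:
            # Unknown (or zero) limit: treat the key as usable.
            return False
        return model_data.get("request_count", 0) >= max_requests
```

Returning `False` when no baseline exists keeps the check conservative: a key is only skipped when the polled data positively shows that the quota is used up.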

📁 Files Changed

| File | Change Type | Impact |
| --- | --- | --- |
| src/rotator_library/usage_manager.py | 🔧 Modified | Core logic for quota exhaustion detection |
| src/rotator_library/providers/utilities/antigravity_quota_tracker.py | 🔧 Modified | Pass reset timestamp to quota baseline update |

🧪 Testing

  • Manual Testing: ~8 hours in live environment with no issues
  • All existing functionality remains intact

📋 Result

The proxy will now accurately respect quota limits fetched from the API and avoid making requests that would result in StreamedAPIError. Previously, credentials with 100% consumed quota could still be selected if no explicit cooldown was set yet.
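
To illustrate the gap described above, here is a rough sketch of a credential filter that skips a key either when it has an active cooldown or when its quota is fully consumed. The function names `is_quota_exhausted` and `available_credentials`, and the dict layout, are hypothetical; only the `model_cooldowns`, `request_count`, and `quota_max_requests` field names appear in this PR.

```python
import time
from typing import Any, Dict, List, Optional


def is_quota_exhausted(key_data: Dict[str, Any], model: str) -> bool:
    """True when a known request limit for this model is fully consumed."""
    model_data = key_data.get("models", {}).get(model, {})
    max_requests = model_data.get("quota_max_requests")
    if not max_requests:
        return False  # no known limit, so nothing to compare against
    return model_data.get("request_count", 0) >= max_requests


def available_credentials(
    usage: Dict[str, Dict[str, Any]],
    credentials: List[str],
    model: str,
    now_ts: Optional[float] = None,
) -> List[str]:
    """Return credentials that are neither cooling down nor exhausted."""
    now_ts = time.time() if now_ts is None else now_ts
    available = []
    for key in credentials:
        key_data = usage.get(key, {})
        cooldown_until = key_data.get("model_cooldowns", {}).get(model, 0)
        if cooldown_until > now_ts:
            continue  # layer 1: explicit, time-based cooldown
        if is_quota_exhausted(key_data, model):
            continue  # layer 2: quota used up, even without a cooldown
        available.append(key)
    return available
```

Without the second check, a key whose quota is at 100% but has no cooldown entry yet would still be returned.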


Important

Centralized quota exhaustion detection in UsageManager to prevent using exhausted credentials and enhance quota synchronization.

  • Centralized Quota Check:
    • Added _is_quota_exhausted(key, model) in UsageManager for consistent quota exhaustion detection.
  • Key Acquisition:
    • Updated get_available_credentials_for_model and acquire_key to skip exhausted credentials.
  • Quota Synchronization:
    • Enhanced update_quota_baseline to accept quota_reset_ts and auto-set cooldowns.
  • Quota Group Propagation:
    • Propagated cooldowns and reset timestamps across quota groups.
  • Files Changed:
    • usage_manager.py: Core logic for quota exhaustion detection.
    • antigravity_quota_tracker.py: Pass reset timestamp to quota baseline update.

This description was created by Ellipsis for c361cdb.

…oomed requests

Add a `_is_quota_exhausted` method to UsageManager that checks if quota
for a credential/model combination is fully consumed. This method is now
called in all key acquisition paths to ensure exhausted keys are skipped.

Changes:
- Add `_is_quota_exhausted(key, model)` method for centralized quota check
- Update `get_available_credentials_for_model` to skip exhausted keys
- Update `acquire_key` priority path to skip exhausted keys
- Update `acquire_key` non-priority path to skip exhausted keys
- Enhance `update_quota_baseline` to accept optional `quota_reset_ts`
- Auto-set model cooldowns when quota is 0% and reset time is known
- Propagate cooldowns across all models in a quota group

This ensures the proxy accurately respects API quota limits and avoids
making requests that would result in StreamedAPIError.

Tested: ~8 hours in live environment with no issues.
@mirrobot-agent

Starting my review of the quota exhaustion detection fix. I'll be examining the new _is_quota_exhausted method and how it integrates across the credential acquisition paths. Back shortly with findings.

@mirrobot-agent mirrobot-agent bot left a comment

Overall Assessment

This PR implements a well-structured fix for quota exhaustion detection. The centralized _is_quota_exhausted method is cleanly integrated across all three credential acquisition paths (get_available_credentials_for_model, priority path in acquire_key, and non-priority path in acquire_key).

I initially questioned whether the quota check needed to account for quota groups, but on closer inspection, the design is correct: update_quota_baseline already syncs both request_count and quota_max_requests across all models in a quota group at write time, so the per-model check naturally works for grouped quotas as well.
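
The write-time sync described here could be sketched roughly as follows. The function name `sync_quota_group`, the `models` dict layout, and the example model names are assumptions for illustration, not the actual `update_quota_baseline` code.

```python
from typing import Any, Dict, Iterable


def sync_quota_group(
    key_data: Dict[str, Any],
    group_models: Iterable[str],
    request_count: int,
    quota_max_requests: int,
) -> None:
    """Write the same baseline to every model that shares one quota pool."""
    models = key_data.setdefault("models", {})
    for grouped_model in group_models:
        model_data = models.setdefault(grouped_model, {})
        model_data["request_count"] = request_count
        model_data["quota_max_requests"] = quota_max_requests


# Hypothetical model names sharing one quota group.
key_data: Dict[str, Any] = {}
sync_quota_group(key_data, ["model-a-high", "model-a-low"], 40, 40)
```

Because every model in the group ends up with identical counters, a plain per-model comparison of `request_count` against `quota_max_requests` gives the same answer for each member of the group.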

Architectural Feedback

The approach of adding quota_reset_ts passthrough from the API response and auto-setting cooldowns when quota is depleted is sensible. This creates a dual-layer protection: immediate quota checks prevent selection of exhausted credentials, while cooldowns provide time-based recovery.

Key Suggestions

  1. Documentation: The 0.001 threshold for triggering cooldowns could use a brief comment explaining its purpose.
  2. DRY: The cooldown-setting logic for grouped models is duplicated and could be extracted into a helper.

Both are minor maintainability improvements and non-blocking.

Nitpicks and Minor Points

None beyond the inline comments.

Questions for the Author

  • Is the 0.001 threshold intentionally chosen as a precision tolerance (accounting for floating-point issues), or does it represent a "practically zero" quota state?

This review was generated by an AI assistant.

if quota_reset_ts:
    model_data["quota_reset_ts"] = quota_reset_ts
    # Also set as model cooldown if it's in the future and quota is low
    if quota_reset_ts > now_ts and remaining_fraction <= 0.001:

The threshold 0.001 (0.1% remaining) triggers cooldowns when quota is nearly exhausted. Consider adding a brief inline comment explaining this value (e.g., # 0.1% threshold - effectively zero quota) to document the design intent for future maintainers.

Comment on lines 2181 to 2188
if quota_reset_ts:
    other_model_data["quota_reset_ts"] = quota_reset_ts
    # Also set as model cooldown if it's in the future and quota is low
    if quota_reset_ts > now_ts and remaining_fraction <= 0.001:
        model_cooldowns = key_data.setdefault(
            "model_cooldowns", {}
        )
        model_cooldowns[grouped_model] = quota_reset_ts

This cooldown-setting logic for grouped models duplicates lines 2148-2154. Consider extracting a helper like _set_cooldown_if_exhausted(key_data, model, quota_reset_ts, now_ts, remaining_fraction) to centralize this logic and reduce maintenance burden.
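
One possible shape for such a helper, written as a standalone function using the parameter list suggested above. The constant name `EXHAUSTED_FRACTION`, the boolean return value, and the assumption that per-model data lives under `key_data["models"]` are illustrative choices, not taken from the PR's code.

```python
from typing import Any, Dict, Optional

# 0.1% remaining: treated as effectively zero quota (assumed constant name).
EXHAUSTED_FRACTION = 0.001


def set_cooldown_if_exhausted(
    key_data: Dict[str, Any],
    model: str,
    quota_reset_ts: Optional[float],
    now_ts: float,
    remaining_fraction: float,
) -> bool:
    """Record the reset time; set a cooldown when quota is effectively gone.

    Returns True if a cooldown was set for the model.
    """
    if not quota_reset_ts:
        return False
    # Assumed layout: per-model data is stored under key_data["models"].
    model_data = key_data.setdefault("models", {}).setdefault(model, {})
    model_data["quota_reset_ts"] = quota_reset_ts
    # Only a future reset time combined with depleted quota triggers a cooldown.
    if quota_reset_ts > now_ts and remaining_fraction <= EXHAUSTED_FRACTION:
        key_data.setdefault("model_cooldowns", {})[model] = quota_reset_ts
        return True
    return False
```

Both the primary model and each grouped model could then go through the same call, so the threshold and the future-timestamp condition live in one place.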

Address PR Mirrowel#51 review suggestions from mirrobot-agent:

- Extract duplicate cooldown logic into `_set_quota_reset_and_cooldown()`
  helper method, eliminating code duplication in quota sync flow
- Add documentation explaining the 0.001 (0.1%) threshold meaning:
  quota is "practically zero" when below this level, not a float
  precision guard

Refs: Mirrowel#51
@Mirrowel
Owner

Mirrowel commented Dec 21, 2025

How would it know we used up 100%? Only real way of knowing is making a request and getting an error, or polling the API to get the current quota.
Any other way has inaccuracies.

Also, same reason the error/api reset timers are used as definitive - they are most accurate.

@MasuRii
Contributor Author

MasuRii commented Dec 21, 2025

> How would it know we used up 100%? Only real way of knowing is making a request and getting an error, or polling the API to get the current quota. Any other way has inaccuracies.
>
> Also, same reason the error/api reset timers are used as definitive - they are most accurate.

We're still using the same API polling to get the current quota. The only problem is that the usage manager isn't checking the key's quota and still sees the key as available, so it will use the key even though it's already exhausted. So there are no changes to the polling.
