Open
41 commits
fffba36
feat: add basic llm token rate limit
dancing-ui Jul 24, 2025
e2c8056
feat: optimize the basic token update method with cache
dancing-ui Jul 24, 2025
13641ed
feat: filter rules and optimize redis's key
dancing-ui Jul 24, 2025
cc2b52a
refactor: wrapper global object
dancing-ui Jul 25, 2025
134ebdd
fix: fix dependency error
dancing-ui Jul 25, 2025
4f26090
feat: add predictive error temporal amortized throttling
dancing-ui Aug 1, 2025
f4c2f76
fix: fix correct error
dancing-ui Aug 1, 2025
ab2c1ae
fix: fix duplicated token error
dancing-ui Aug 2, 2025
0894b8c
examples: add new rate limiting application examples
dancing-ui Aug 3, 2025
e2997dc
fix: add maximum binary search iterations
dancing-ui Aug 8, 2025
1b69115
refactor: simplify the usage steps of rate limiting
dancing-ui Aug 9, 2025
ed9b346
test: fix test case errors
dancing-ui Aug 9, 2025
8092cb2
refactor: update token encoding ways
dancing-ui Aug 23, 2025
040e59f
fix: fix the issue of excessive goroutines
dancing-ui Aug 23, 2025
4bf1a19
refactor: optimize the calculation method of token prediction
dancing-ui Aug 29, 2025
70d0d66
refactor: optimize the code structure and introduce log auditing
dancing-ui Aug 30, 2025
177056f
deps: update Redis to v8
dancing-ui Sep 1, 2025
807a501
feat: support Redis configuration with multiple addresses
dancing-ui Sep 1, 2025
4daede3
fix: fix abnormal issues in token correction
dancing-ui Sep 2, 2025
0123eaa
fix: fix issues with data dependency in token prediction
dancing-ui Sep 2, 2025
6ab0495
refactor: delete the unused functions
dancing-ui Sep 2, 2025
ecad15b
fix: fix token prediction accuracy and response header issues
dancing-ui Sep 2, 2025
11896b8
fix: fix missing response header issue in fixed window strategy
dancing-ui Sep 3, 2025
2cfb2bb
feat: adapt to eino framework; fix initialization issues
dancing-ui Sep 5, 2025
2372981
fix: fix go.yml test case path error
dancing-ui Sep 5, 2025
1cb6955
feat: optimize response header information
dancing-ui Sep 5, 2025
73e923c
fix: fix go.yml error
dancing-ui Sep 5, 2025
5980541
remove: remove dependency test cases
dancing-ui Sep 5, 2025
c433503
feat: remove binding hit mechanism between input token and total token
dancing-ui Sep 6, 2025
48c9bb4
test: add identifier_checker and rule_collector unit test cases
dancing-ui Sep 6, 2025
d9f372e
fix: fix lint error
dancing-ui Sep 6, 2025
36fc56f
test: add context, request_info, util unit test cases
dancing-ui Sep 7, 2025
2b721d6
fix: fix lint error
dancing-ui Sep 7, 2025
6203b55
test: add rule_filter unit test cases
dancing-ui Sep 8, 2025
f3d015e
test: add resource benchmark test example
dancing-ui Sep 11, 2025
82b629d
feat: support multi-architecture Redis service
dancing-ui Sep 12, 2025
acf9702
feat: add metric logger
dancing-ui Sep 12, 2025
8e89592
test: add all unit test cases
dancing-ui Sep 14, 2025
b11e357
docs: add llm token rate limit integration steps
dancing-ui Sep 14, 2025
311ca30
docs: add llm token rate limit adapter usage
dancing-ui Sep 14, 2025
930604d
docs: update llm token rate limit usage
dancing-ui Sep 14, 2025
6 changes: 6 additions & 0 deletions .gitignore
Expand Up @@ -64,3 +64,9 @@ Temporary Items

# coverage file
coverage.html
coverage.txt

pkg/adapters/eino/*_test.go
pkg/adapters/langchaingo/*_test.go

.env
4 changes: 3 additions & 1 deletion api/api.go
Expand Up @@ -138,7 +138,9 @@ func Entry(resource string, opts ...EntryOption) (*base.SentinelEntry, *base.Blo
 	}()

 	for _, opt := range opts {
-		opt(options)
+		if opt != nil {
+			opt(options)
+		}
 	}
 	if options.slotChain == nil {
 		options.slotChain = GlobalSlotChain()
Expand Down
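The guard added in this hunk makes `Entry` tolerate nil options. A minimal standalone sketch of the same pattern follows; `entryOptions`, `EntryOption`, `WithBatchCount`, and `applyOptions` here are simplified stand-ins written for illustration, not the definitions in `api/api.go`.

```go
package main

import "fmt"

// entryOptions and EntryOption are simplified stand-ins for the real types.
type entryOptions struct {
	batchCount uint32
}

type EntryOption func(*entryOptions)

// WithBatchCount is an illustrative option for this sketch only.
func WithBatchCount(n uint32) EntryOption {
	return func(o *entryOptions) { o.batchCount = n }
}

// applyOptions mirrors the guarded loop: nil options are skipped instead of
// being called, which would otherwise panic.
func applyOptions(opts ...EntryOption) *entryOptions {
	options := &entryOptions{batchCount: 1}
	for _, opt := range opts {
		if opt != nil {
			opt(options)
		}
	}
	return options
}

func main() {
	var unset EntryOption // nil, e.g. an option the caller decided not to build
	o := applyOptions(unset, WithBatchCount(3))
	fmt.Println(o.batchCount) // prints 3
}
```

Skipping nil options lets callers assemble an option list conditionally without filtering it first.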
16 changes: 16 additions & 0 deletions api/init.go
Expand Up @@ -20,6 +20,7 @@ import (
"net/http"

"github.com/alibaba/sentinel-golang/core/config"
llmtokenratelimit "github.com/alibaba/sentinel-golang/core/llm_token_ratelimit"
"github.com/alibaba/sentinel-golang/core/log/metric"
"github.com/alibaba/sentinel-golang/core/system_metric"
metric_exporter "github.com/alibaba/sentinel-golang/exporter/metric"
Expand Down Expand Up @@ -134,6 +135,21 @@ func initCoreComponents() error {
return nil
}

if err := llmtokenratelimit.InitMetricLogger(&llmtokenratelimit.MetricLoggerConfig{
AppName: config.AppName(),
LogDir: config.LogBaseDir(),
MaxFileSize: config.MetricLogSingleFileMaxSize(),
MaxFileAmount: config.MetricLogMaxFileAmount(),
FlushInterval: config.MetricLogFlushIntervalSec(),
UsePid: config.LogUsePid(),
}); err != nil {
return err
}

if err := llmtokenratelimit.Init(config.LLMTokenRateLimit()); err != nil {
return err
}

return nil
}

Expand Down
3 changes: 3 additions & 0 deletions api/slot_chain.go
Expand Up @@ -20,6 +20,7 @@ import (
"github.com/alibaba/sentinel-golang/core/flow"
"github.com/alibaba/sentinel-golang/core/hotspot"
"github.com/alibaba/sentinel-golang/core/isolation"
llmtokenratelimit "github.com/alibaba/sentinel-golang/core/llm_token_ratelimit"
"github.com/alibaba/sentinel-golang/core/log"
"github.com/alibaba/sentinel-golang/core/stat"
"github.com/alibaba/sentinel-golang/core/system"
Expand All @@ -40,11 +41,13 @@ func BuildDefaultSlotChain() *base.SlotChain {
sc.AddRuleCheckSlot(isolation.DefaultSlot)
sc.AddRuleCheckSlot(hotspot.DefaultSlot)
sc.AddRuleCheckSlot(circuitbreaker.DefaultSlot)
sc.AddRuleCheckSlot(llmtokenratelimit.DefaultSlot)

sc.AddStatSlot(stat.DefaultSlot)
sc.AddStatSlot(log.DefaultSlot)
sc.AddStatSlot(flow.DefaultStandaloneStatSlot)
sc.AddStatSlot(hotspot.DefaultConcurrencyStatSlot)
sc.AddStatSlot(circuitbreaker.DefaultMetricStatSlot)
sc.AddStatSlot(llmtokenratelimit.DefaultLLMTokenRatelimitStatSlot)
return sc
}
14 changes: 8 additions & 6 deletions core/base/result.go
Expand Up @@ -28,16 +28,18 @@ const (
 	BlockTypeCircuitBreaking
 	BlockTypeSystemFlow
 	BlockTypeHotSpotParamFlow
+	BlockTypeLLMTokenRateLimit
 )

 var (
 	blockTypeMap = map[BlockType]string{
-		BlockTypeUnknown:          "BlockTypeUnknown",
-		BlockTypeFlow:             "BlockTypeFlowControl",
-		BlockTypeIsolation:        "BlockTypeIsolation",
-		BlockTypeCircuitBreaking:  "BlockTypeCircuitBreaking",
-		BlockTypeSystemFlow:       "BlockTypeSystem",
-		BlockTypeHotSpotParamFlow: "BlockTypeHotSpotParamFlow",
+		BlockTypeUnknown:           "BlockTypeUnknown",
+		BlockTypeFlow:              "BlockTypeFlowControl",
+		BlockTypeIsolation:         "BlockTypeIsolation",
+		BlockTypeCircuitBreaking:   "BlockTypeCircuitBreaking",
+		BlockTypeSystemFlow:        "BlockTypeSystem",
+		BlockTypeHotSpotParamFlow:  "BlockTypeHotSpotParamFlow",
+		BlockTypeLLMTokenRateLimit: "BlockTypeLLMTokenRateLimit",
 	}
 	blockTypeExisted = fmt.Errorf("block type existed")
 )
Expand Down
15 changes: 9 additions & 6 deletions core/base/result_test.go
Expand Up @@ -70,19 +70,22 @@ func (t BlockType) stringSwitch() string {
 		return "System"
 	case BlockTypeHotSpotParamFlow:
 		return "HotSpotParamFlow"
+	case BlockTypeLLMTokenRateLimit:
+		return "LLMTokenRateLimit"
 	default:
 		return fmt.Sprintf("%d", t)
 	}
 }

 var (
 	blockTypeNames = []string{
-		BlockTypeUnknown:          "Unknown",
-		BlockTypeFlow:             "FlowControl",
-		BlockTypeIsolation:        "BlockTypeIsolation",
-		BlockTypeCircuitBreaking:  "CircuitBreaking",
-		BlockTypeSystemFlow:       "System",
-		BlockTypeHotSpotParamFlow: "HotSpotParamFlow",
+		BlockTypeUnknown:           "Unknown",
+		BlockTypeFlow:              "FlowControl",
+		BlockTypeIsolation:         "BlockTypeIsolation",
+		BlockTypeCircuitBreaking:   "CircuitBreaking",
+		BlockTypeSystemFlow:        "System",
+		BlockTypeHotSpotParamFlow:  "HotSpotParamFlow",
+		BlockTypeLLMTokenRateLimit: "LLMTokenRateLimit",
 	}
 	blockTypeErr = fmt.Errorf("block type err")
 )
Expand Down
5 changes: 5 additions & 0 deletions core/config/config.go
Expand Up @@ -21,6 +21,7 @@ import (
"strconv"
"sync"

llmtokenratelimit "github.com/alibaba/sentinel-golang/core/llm_token_ratelimit"
"github.com/alibaba/sentinel-golang/logging"
"github.com/alibaba/sentinel-golang/util"
"github.com/pkg/errors"
Expand Down Expand Up @@ -262,3 +263,7 @@ func MetricStatisticIntervalMs() uint32 {
func MetricStatisticSampleCount() uint32 {
return globalCfg.MetricStatisticSampleCount()
}

func LLMTokenRateLimit() *llmtokenratelimit.Config {
return globalCfg.LLMTokenRateLimit()
}
7 changes: 7 additions & 0 deletions core/config/entity.go
Expand Up @@ -19,6 +19,7 @@ import (
"fmt"

"github.com/alibaba/sentinel-golang/core/base"
llmtokenratelimit "github.com/alibaba/sentinel-golang/core/llm_token_ratelimit"
"github.com/alibaba/sentinel-golang/logging"
"github.com/pkg/errors"
)
Expand Down Expand Up @@ -46,6 +47,8 @@ type SentinelConfig struct {
Stat StatConfig
// UseCacheTime indicates whether to cache time(ms)
UseCacheTime bool `yaml:"useCacheTime"`
// LLMTokenRateLimit represents configuration items related to llm token rate limit.
LLMTokenRateLimit *llmtokenratelimit.Config `yaml:"llmTokenRatelimit"`
}

// ExporterConfig represents configuration items related to exporter, like metric exporter.
Expand Down Expand Up @@ -259,3 +262,7 @@ func (entity *Entity) MetricStatisticIntervalMs() uint32 {
func (entity *Entity) MetricStatisticSampleCount() uint32 {
return entity.Sentinel.Stat.MetricStatisticSampleCount
}

func (entity *Entity) LLMTokenRateLimit() *llmtokenratelimit.Config {
return entity.Sentinel.LLMTokenRateLimit
}
199 changes: 199 additions & 0 deletions core/llm_token_ratelimit/README_en.md
@@ -0,0 +1,199 @@
#### Integration Steps

To integrate the token rate limiting provided by Sentinel, a user needs to complete the following steps:

1. Prepare a Redis instance

2. Configure and initialize Sentinel's runtime environment.
   - Only initialization from a YAML file is supported.

3. Define the resources (instrumentation points). The resource type is fixed: `ResourceType=ResTypeCommon` and `TrafficType=Inbound`

4. Load the rules, following the configuration file description below. The configuration items include: resource name, rate limiting strategy, specific rule items, Redis configuration, error code, and error message. The following is an example of loading a rule; the meaning of each field is detailed in "Configuration File Description" below.

```go
// Limit every resource, for any header identifier and key, to
// 1000 total tokens per 60-second fixed window.
_, err = llmtokenratelimit.LoadRules([]*llmtokenratelimit.Rule{
	{
		Resource: ".*",
		Strategy: llmtokenratelimit.FixedWindow,
		SpecificItems: []llmtokenratelimit.SpecificItem{
			{
				Identifier: llmtokenratelimit.Identifier{
					Type:  llmtokenratelimit.Header,
					Value: ".*",
				},
				KeyItems: []llmtokenratelimit.KeyItem{
					{
						Key: ".*",
						Token: llmtokenratelimit.Token{
							Number:        1000,
							CountStrategy: llmtokenratelimit.TotalTokens,
						},
						Time: llmtokenratelimit.Time{
							Unit:  llmtokenratelimit.Second,
							Value: 60,
						},
					},
				},
			},
		},
	},
})
```

5. Optional: Create an LLM instance and wrap it with the provided adapter


#### Configuration File Description

Overall rule configuration

| Configuration Item | Type | Required | Default Value | Description |
| :----------------- | :------------------- | :------- | :------------------ | :----------------------------------------------------------- |
| enabled | bool | No | false | Whether to enable the LLM Token rate limiting function. Values: false (disable), true (enable) |
| rules | array of rule object | No | nil | Rate limiting rules |
| redis | object | No | | Redis instance connection information |
| errorCode | int | No | 429 | Error code. Will be changed to 429 if set to 0 |
| errorMessage | string | No | "Too Many Requests" | Error message |

rule configuration

| Configuration Item | Type | Required | Default Value | Description |
| :----------------- | :--------------------------- | :------- | :-------------- | :----------------------------------------------------------- |
| resource | string | No | ".*" | Rule resource name, supporting regular expressions. Values: ".*" (global match), user-defined regular expressions |
| strategy | string | No | "fixed-window" | Rate limiting strategy. Values: fixed-window, peta (predictive error temporal allocation) |
| encoding | object | No | | Token encoding method, **exclusively for peta rate limiting strategy** |
| specificItems | array of specificItem object | Yes | | Specific rule items |
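
Since `resource` accepts regular expressions, a rule's pattern can be checked against a resource name with Go's standard `regexp` package. The sketch below is an illustration only: `matchResource` is a hypothetical helper, and anchoring the pattern so that the whole name must match is an assumption, because the matching semantics are not visible in this excerpt.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchResource reports whether the resource name matches the rule pattern.
// Assumption: the whole name must match, so the pattern is anchored here.
func matchResource(pattern, resource string) (bool, error) {
	re, err := regexp.Compile("^(?:" + pattern + ")$")
	if err != nil {
		return false, fmt.Errorf("invalid resource pattern %q: %w", pattern, err)
	}
	return re.MatchString(resource), nil
}

func main() {
	cases := []struct{ pattern, resource string }{
		{".*", "chat-completion"},      // global match
		{"chat-.*", "chat-completion"}, // user-defined pattern, matches
		{"chat-.*", "embedding"},       // user-defined pattern, no match
	}
	for _, tc := range cases {
		ok, err := matchResource(tc.pattern, tc.resource)
		fmt.Println(tc.pattern, tc.resource, ok, err)
	}
}
```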

encoding configuration

| Configuration Item | Type | Required | Default Value | Description |
| :----------------- | :----- | :------- | :------------ | :-------------------- |
| provider | string | No | "openai" | Model provider |
| model | string | No | "gpt-4" | Model name |

specificItem configuration

| Configuration Item | Type | Required | Default Value | Description |
| :----------------- | :---------------------- | :------- | :------------ | :------------------------------------------- |
| identifier | object | No | | Request identifier |
| keyItems | array of keyItem object | Yes | | Key-value information for rule matching |

identifier configuration

| Configuration Item | Type | Required | Default Value | Description |
| :----------------- | :----- | :------- | :------------ | :----------------------------------------------------------- |
| type | string | No | "all" | Request identifier type. Values: all (global rate limiting), header |
| value | string | No | ".*" | Request identifier value, supporting regular expressions. Values: ".*" (global match), user-defined regular expressions |

keyItem configuration

| Configuration Item | Type | Required | Default Value | Description |
| :----------------- | :----- | :------- | :------------ | :----------------------------------------------------------- |
| key | string | No | ".*" | Specific rule item value, supporting regular expressions. Values: ".*" (global match), user-defined regular expressions |
| token | object | Yes | | Token quantity and calculation strategy configuration |
| time | object | Yes | | Time unit and cycle configuration |

token configuration

| Configuration Item | Type | Required | Default Value | Description |
| :----------------- | :----- | :------- | :-------------- | :----------------------------------------------------------- |
| number | int | Yes | | Token quantity, greater than or equal to 0 |
| countStrategy | string | No | "total-tokens" | Token calculation strategy. Values: input-tokens, output-tokens, total-tokens |

time configuration

| Configuration Item | Type | Required | Default Value | Description |
| :----------------- | :----- | :------- | :------------ | :----------------------------------------------------------- |
| unit | string | Yes | | Time unit. Values: second, minute, hour, day |
| value | int | Yes | | Time value, greater than or equal to 0 |
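
The `unit` and `value` fields together define the length of the rate limiting cycle. A minimal sketch of that conversion, assuming the four documented units; `windowDuration` is a hypothetical helper name, not a function from this PR:

```go
package main

import (
	"fmt"
	"time"
)

// windowDuration converts the unit and value fields of a time configuration
// into a time.Duration.
func windowDuration(unit string, value int) (time.Duration, error) {
	if value < 0 {
		return 0, fmt.Errorf("time value must be >= 0, got %d", value)
	}
	units := map[string]time.Duration{
		"second": time.Second,
		"minute": time.Minute,
		"hour":   time.Hour,
		"day":    24 * time.Hour,
	}
	base, ok := units[unit]
	if !ok {
		return 0, fmt.Errorf("unknown time unit %q", unit)
	}
	return time.Duration(value) * base, nil
}

func main() {
	d, err := windowDuration("second", 60)
	fmt.Println(d, err) // 1m0s <nil>
}
```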

redis configuration

| Configuration Item | Type | Required | Default Value | Description |
| :----------------- | :------------------- | :------- | :----------------------------------- | :----------------------------------------------------------- |
| addrs | array of addr object | No | [{name: "127.0.0.1", port: 6379}] | Redis node services, **see notes below** |
| username | string | No | Empty string | Redis username |
| password | string | No | Empty string | Redis password |
| dialTimeout | int | No | 0 | Maximum waiting time for establishing a Redis connection, unit: milliseconds |
| readTimeout | int | No | 0 | Maximum waiting time for Redis server response, unit: milliseconds |
| writeTimeout | int | No | 0 | Maximum time for sending command data to the network connection, unit: milliseconds |
| poolTimeout | int | No | 0 | Maximum waiting time for getting an idle connection from the connection pool, unit: milliseconds |
| poolSize | int | No | 10 | Number of connections in the connection pool |
| minIdleConns | int | No | 5 | Minimum number of idle connections in the connection pool |
| maxRetries | int | No | 3 | Maximum number of retries for failed operations |

addr configuration

| Configuration Item | Type | Required | Default Value | Description |
| :----------------- | :----- | :------- | :------------- | :----------------------------------------------------------- |
| name | string | No | "127.0.0.1" | Redis node service name, a complete [FQDN](https://en.wikipedia.org/wiki/Fully_qualified_domain_name) with service type, e.g., my-redis.dns, redis.my-ns.svc.cluster.local |
| port | int | No | 6379 | Redis node service port |
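
The Notes section states that fewer than two entries in `addrs` results in Redis standalone mode. The sketch below illustrates that rule together with the documented default address. `Addr`, `endpoints`, and `mode` are stand-ins written for this example and are not taken from the PR.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// Addr mirrors the documented addr fields; it is a stand-in type.
type Addr struct {
	Name string
	Port int
}

// endpoints applies the documented default (127.0.0.1:6379) when no address
// is configured and renders each node as "host:port".
func endpoints(addrs []Addr) []string {
	if len(addrs) == 0 {
		addrs = []Addr{{Name: "127.0.0.1", Port: 6379}}
	}
	out := make([]string, 0, len(addrs))
	for _, a := range addrs {
		out = append(out, net.JoinHostPort(a.Name, strconv.Itoa(a.Port)))
	}
	return out
}

// mode follows the rule from the Notes: two or more addresses select cluster
// mode, anything less falls back to standalone mode.
func mode(addrs []Addr) string {
	if len(addrs) >= 2 {
		return "cluster"
	}
	return "standalone"
}

func main() {
	single := []Addr{{Name: "my-redis.dns", Port: 6379}}
	cluster := []Addr{
		{Name: "redis-0.my-ns.svc.cluster.local", Port: 6379},
		{Name: "redis-1.my-ns.svc.cluster.local", Port: 6379},
	}
	fmt.Println(mode(single), endpoints(single))
	fmt.Println(mode(cluster), endpoints(cluster))
}
```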


#### Overall Configuration File Example

```YAML
version: "v1"
sentinel:
  app:
    name: sentinel-go-demo
  log:
    metric:
      maxFileCount: 7
  llmTokenRatelimit:
    enabled: true
    rules:
      - resource: ".*"
        strategy: "fixed-window"
        specificItems:
          - identifier:
              type: "header"
              value: ".*"
            keyItems:
              - key: ".*"
                token:
                  number: 1000
                  countStrategy: "total-tokens"
                time:
                  unit: "second"
                  value: 60

    errorCode: 429
    errorMessage: "Too Many Requests"

    redis:
      addrs:
        - name: "127.0.0.1"
          port: 6379
      username: "redis"
      password: "redis"
      dialTimeout: 5000
      readTimeout: 5000
      writeTimeout: 5000
      poolTimeout: 5000
      poolSize: 10
      minIdleConns: 5
      maxRetries: 3
```

#### LLM Framework Adaptation
Sentinel's token rate limiting can currently be integrated non-intrusively with the LangChainGo and Eino frameworks. This mainly applies to text generation scenarios. For usage details, refer to:
- pkg/adapters/langchaingo/wrapper.go
- pkg/adapters/eino/wrapper.go

#### Notes

- Since only input tokens can be predicted at present, **it is recommended to use PETA for rate limiting specifically targeting input tokens**
- PETA uses tiktoken to estimate input token consumption, which requires downloading or preconfiguring the `Byte Pair Encoding (BPE)` dictionary
  - Online mode
    - tiktoken downloads the encoding files online on first use
  - Offline mode
    - Prepare pre-cached tiktoken encoding files (**not directly downloaded files, but files processed by tiktoken**) in advance, and specify the file directory via the TIKTOKEN_CACHE_DIR environment variable
- Rule deduplication description
  - In keyItems, if only the number differs, the latest number will be retained after deduplication
  - In specificItems, only deduplicated keyItems will be retained
  - In resource, only the latest resource will be retained
- Redis configuration description
  - **If the connected Redis is in cluster mode, the number of addresses in addrs must be at least 2; otherwise, it will default to Redis standalone mode, causing rate limiting to fail**
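
The keyItems deduplication rule above can be sketched as follows. This is an illustration under assumptions, not the PR's code: `KeyItem` is a flattened stand-in for the documented object, two items count as duplicates when every field except the number is equal, and keeping the first-seen order is a choice made for this example.

```go
package main

import "fmt"

// KeyItem is a flattened stand-in for the documented keyItem object.
type KeyItem struct {
	Key           string
	Number        int
	CountStrategy string
	TimeUnit      string
	TimeValue     int
}

// identity holds every field except Number; items with the same identity
// are treated as duplicates.
type identity struct {
	key, countStrategy, timeUnit string
	timeValue                    int
}

// dedupKeyItems keeps one item per identity, preserving first-seen order and
// taking the Number of the latest duplicate.
func dedupKeyItems(items []KeyItem) []KeyItem {
	index := make(map[identity]int, len(items))
	out := make([]KeyItem, 0, len(items))
	for _, it := range items {
		id := identity{it.Key, it.CountStrategy, it.TimeUnit, it.TimeValue}
		if i, seen := index[id]; seen {
			out[i].Number = it.Number // latest number wins
			continue
		}
		index[id] = len(out)
		out = append(out, it)
	}
	return out
}

func main() {
	items := []KeyItem{
		{Key: ".*", Number: 1000, CountStrategy: "total-tokens", TimeUnit: "second", TimeValue: 60},
		{Key: ".*", Number: 2000, CountStrategy: "total-tokens", TimeUnit: "second", TimeValue: 60},
		{Key: ".*", Number: 500, CountStrategy: "input-tokens", TimeUnit: "second", TimeValue: 60},
	}
	for _, it := range dedupKeyItems(items) {
		fmt.Println(it.CountStrategy, it.Number)
	}
}
```

With the sample input, the two `total-tokens` entries collapse into one carrying the number 2000, and the `input-tokens` entry is kept unchanged.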