Update in_cloudwatch_logs.rb for better throttling handling #264
base: master
Conversation
If we look at this Ruby code, there are 2 potential issues:
1. The current implementation has a flaw in its retry mechanism. When AWS throttles the get_log_events call, the code attempts to retry, but there is no limit on how many times it will retry.
2. The retry mechanism is inefficient because it retries the entire get_events method when throttling occurs if there are more than x events.
@cosmo0920 for your approval, please let me know if you see any issues.
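To illustrate the direction of the change, here is a minimal sketch of a bounded retry wrapper with exponential backoff and jitter, based on the fragments in the diff below (throttling_retry_seconds, max_retry_count, and the wait-time formula). The block-based call style and the rescued error class are assumptions for illustration, not necessarily the exact code in this PR.

# Sketch only: retry a single API call with a bounded, exponentially backed-off wait,
# instead of retrying the whole get_events method.
def throttling_handler(method_name)
  retry_count = 0
  begin
    yield
  rescue Aws::CloudWatchLogs::Errors::ThrottlingException
    raise unless @throttling_retry_seconds && retry_count < @max_retry_count
    # Exponential backoff with roughly +/-10% jitter
    wait_time = @throttling_retry_seconds * (2 ** retry_count) * (0.9 + 0.2 * rand)
    log.warn "ThrottlingException on #{method_name}. Retry #{retry_count + 1}/#{@max_retry_count}. Waiting #{wait_time.round(2)} seconds."
    sleep wait_time
    retry_count += 1
    retry
  end
end

# Usage: only the API call itself is wrapped, so the surrounding pagination logic is not repeated.
response = throttling_handler("get_log_events") { @logs.get_log_events(request) }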
@hd40910 Thanks for this improvement! Sorry for the delay.
This direction looks good to me.
I have commented on some points.
Please check them.
In addition, could you please consider the following points?
- Remove unnecessary spaces.
- Some new blank lines contain needless whitespaces.
- Update README.
- Add description for the new parameter.
- Change the description of throttling_retry_seconds to mention the exponential interval.
- Fix failing tests on CI.
- Looks like they fail because the log message has changed.
- Since the wait time will be random, it would be good to loosen the assert condition. It is unnecessary to assert an exact match for the entire message (see the sketch after this list for one possible approach).
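For example (a sketch only; the test driver variable d and the exact message text are assumptions, not the actual test code), matching a stable substring of the warning would be enough:

# Assert on a stable part of the log line rather than the full message with the random wait time
assert(d.logs.any? { |line| line.include?("ThrottlingException on get_log_events") })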
@@ -43,6 +43,7 @@ class CloudwatchLogsInput < Input
config_param :end_time, :string, default: nil
config_param :time_range_format, :string, default: "%Y-%m-%d %H:%M:%S"
config_param :throttling_retry_seconds, :time, default: nil
config_param :max_retry_count, :integer, default: 999 #TODO
config_param :max_retry_count, :integer, default: 999 #TODO
Please consider the appropriate default value and remove the TODO comment.
I'm not familiar with CloudWatch.
Can there be a case where a user wants to retry an unlimited number of times, like the current version?
If so, retry should be unlimited when this value is nil.
And it might be better for the default value to be nil for compatibility.
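A possible sketch of that behaviour (the retry_allowed? helper is hypothetical and only illustrates the suggested semantics):

config_param :max_retry_count, :integer, default: nil

# nil keeps the current behaviour: retry without an upper bound
def retry_allowed?(retry_count)
  @max_retry_count.nil? || retry_count < @max_retry_count
end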
request[:next_token] = log_next_token if !log_next_token.nil? && !log_next_token.empty?
request[:start_from_head] = true if read_from_head?(log_next_token)

# Only apply throttling retry to the API call, not the whole method
# Only apply throttling retry to the API call, not the whole method
Information that can be found in the code is not required for comments.
(Maybe those comments are for PR. Thanks. I understand the code. Let's remove them now.)
request[:next_token] = next_token if next_token
request[:log_stream_name_prefix] = log_stream_name_prefix if log_stream_name_prefix

# Only apply throttling retry to the API call
# Only apply throttling retry to the API call
end

def throttling_handler(method_name)
# New method to handle API calls with throttling retry with exponential backoff
# New method to handle API calls with throttling retry with exponential backoff
if @throttling_retry_seconds && retry_count < @max_retry_count
# Calculate backoff with jitter: base_time * (2^retry_count) + random_jitter
wait_time = @throttling_retry_seconds * (2 ** retry_count) * (0.9 + 0.2 * rand)
log.warn "Haia - ThrottlingException on #{method_name}. Retry #{retry_count+1}/#{@max_retry_count}. Waiting #{wait_time.round(2)} seconds."
Could you tell me what Haia means?
log.warn "Haia - ThrottlingException on #{method_name}. Retry #{retry_count+1}/#{@max_retry_count}. Waiting #{wait_time.round(2)} seconds." | ||
sleep wait_time | ||
|
||
# Only retry the API call itself, not recursively |
# Only retry the API call itself, not recursively
@@ -341,7 +358,12 @@ def describe_log_groups(log_group_name_prefix, log_groups = nil, next_token = ni
log_group_name_prefix: log_group_name_prefix
}
request[:next_token] = next_token if next_token
response = @logs.describe_log_groups(request)

# Apply throttling handling to describe_log_groups too
# Apply throttling handling to describe_log_groups too