GL-Data 1071: Stale data check dedup logic update#12876
Conversation
Force-pushed from 03271ae to 43289c9
# Data warehouse duplicate log count check
cloudwatch_duplicate_log_counter_job: {
  class: 'DataWarehouse::CloudwatchDuplicateLogCounterJob',
  cron: '5,25,45 * * * *', # run 3x per hour, at 5, 25, and 45 minutes past the hour
Is the only reason this is running 3x/hour to prevent CloudWatch log bloat and timeout issues?
I'm giving the job a chance to process a specific hour set more than once, just in case the first and/or second runs fail for some reason we cannot control, such as a network issue or an AWS outage.
I think GoodJob has a built-in retry mechanism, but I'm not sure if it would catch those failures.
If the retry mechanism does catch them, then this would be unnecessary.
@MrNagoo can confirm
I can confirm, but I still needed to re-read the docs 😆 for syntax.
https://guides.rubyonrails.org/active_job_basics.html#retrying-or-discarding-failed-jobs
Since we're already using begin/rescue, we know we're capturing errors from the client, so something like
retry_on StandardError, wait: 10.minutes, attempts: 3 could replace the forced interval and free up significant resources.
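As a rough illustration of the semantics being suggested, the retry_on call amounts to something like the plain-Ruby retry loop below. This is only a sketch: ActiveJob's real retry_on is declarative and handled by the framework, and run_with_retries is a hypothetical helper, not code from this PR.

```ruby
# Hypothetical helper approximating what
#   retry_on StandardError, wait: 10.minutes, attempts: 3
# does for a job body: re-run the block up to `attempts` times,
# sleeping `wait` seconds between failures, then re-raise.
def run_with_retries(attempts: 3, wait: 0)
  tries = 0
  begin
    tries += 1
    yield
  rescue StandardError
    if tries < attempts
      sleep(wait)
      retry
    else
      raise
    end
  end
end

# Example: fail twice, succeed on the third attempt.
calls = 0
result = run_with_retries(attempts: 3, wait: 0) do
  calls += 1
  raise "transient CloudWatch error" if calls < 3
  :ok
end
```

With retries handled this way, the cron entry could fire once per hour instead of three times, since transient failures would be retried in-process rather than by the next scheduled run.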
Reminder to tag at least one of the commits (and the final squashed one) with the full GitLab issue URL to preserve linking
…loudwatch stale data check (issue 1071 - https://gitlab.login.gov/lg-teams/Team-Data/data-warehouse-ag/-/issues/1071)
Force-pushed from ba60462 to d5518c8
…ndling in duplicate_row_count_file_path method
MrNagoo left a comment
There are a few things we may need to adjust, but I'm not seeing anything that won't work.
module DataWarehouse
  module Shared
    module StaleDataUtils
      NUM_THREADS = 6
Why are we using 6 threads? I ask because we have a hard ceiling of 5 requests per second on CloudWatch, and we have battled with RateLimitExceeded errors.
6 threads was meant to cover 6 × 10 minutes = 60 minutes / 1 hour. Testing in lower environments did not produce an error with 6 threads.
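For context on the rate-limit concern, one way to let 6 worker threads share the 5 requests/second ceiling is a small shared limiter that spaces out calls. This is only an illustrative sketch; SimpleRateLimiter is hypothetical and not code from this PR.

```ruby
# Illustrative only: a minimal shared limiter that spaces out CloudWatch
# calls so 6 threads together stay under 5 requests/second.
class SimpleRateLimiter
  def initialize(max_per_second)
    @min_interval = 1.0 / max_per_second
    @mutex = Mutex.new
    @next_allowed = Time.now.to_f
  end

  # Blocks the calling thread until its request slot arrives.
  def acquire
    sleep_for = @mutex.synchronize do
      now = Time.now.to_f
      wait = [@next_allowed - now, 0].max
      @next_allowed = [@next_allowed, now].max + @min_interval
      wait
    end
    sleep(sleep_for) if sleep_for > 0
  end
end

limiter = SimpleRateLimiter.new(5) # CloudWatch hard ceiling: 5 req/s
threads = 6.times.map do
  Thread.new do
    2.times do
      limiter.acquire
      # a CloudWatch query call would go here
    end
  end
end
threads.each(&:join)
```

With a limiter like this, the thread count stops mattering for the rate limit: 12 requests across 6 threads still arrive no faster than 5 per second.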
🎫 Ticket
Issue-1071
🛠 Summary of changes
📜 Testing Plan
Tested in Sandbox
Run the new hourly job: CloudwatchDuplicateLogCounterJob (check production.log and events.log).
Run the existing modified daily job: TableSummaryStatsExportJob
Verifying the results
Returns: [screenshots]
Verifying the data in S3