Feat-Implement httpcache middleware for GitHub API #203
| github.com/golang/protobuf v1.4.3 // indirect | ||
| github.com/google/go-github/v32 v32.1.0 | ||
| github.com/kr/text v0.2.0 // indirect | ||
| github.com/naveensrinivasan/httpcache v1.2.1 |
This is a fork of gregjones/httpcache; see this PR: gregjones/httpcache#104.
|
Integration tests failure for 35f3b88dbb4e5dad84ca2916dc6db3d8e7e32d05 |
|
Integration tests success for 35f3b88dbb4e5dad84ca2916dc6db3d8e7e32d05 |
|
Integration tests success for 73ebc1a54963240d4ff9241dce169df0f5131478 |
|
Integration tests failure for e5b609b096fc67b393e928b3245a122f3919ef31 |
|
Integration tests success for e5b609b096fc67b393e928b3245a122f3919ef31 |
dlorenc
left a comment
A couple tiny nits, LGTM!
|
|
||
| ### Caching | ||
|
|
||
| Scorecard uses `httpcache` with <https://docs.github.com/en/rest/overview/resources-in-the-rest-api#conditional-requests> for caching httpresponse. The default cache is in-memory. |
Maybe link to the etags stuff in GitHub - the real benefit is avoiding the API quota
| // shouldUseDiskCache checks the env variables USE_DISK_CACHE and DISK_CACHE_PATH to determine if | ||
| // disk should be used for caching. | ||
| func shouldUseDiskCache() (string, bool) { | ||
| if isDiskCache := os.Getenv(UseDiskCache); isDiskCache != "" { |
nit: I think you can avoid this if statement and go straight into ParseBool since "" parses as false.
inferno-chromium
left a comment
Thanks a lot! This is very exciting.
.gitignore
Outdated
| # tools | ||
| bin | ||
|
|
||
| #temp |
README.md
Outdated
|
|
||
| To use disk cache two env variables have to be set `USE_DISK_CACHE=true` and `DISK_CACHE_PATH=./cache`. | ||
|
|
||
| There are not TTL on cache. |
| } | ||
| } | ||
| } | ||
| return "", false |
nit: maybe nil instead of ""
can't do nil for a string in go
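For readers less familiar with Go, a small illustration of the point: a plain `string` has the zero value `""` and can never be `nil`, while a `*string` can. The `describe` helper is hypothetical and exists only for this example.

```go
package main

import "fmt"

// describe renders an optional string. Only the pointer type can be nil;
// a plain string always holds a value, and its zero value is "".
func describe(s *string) string {
	if s == nil {
		return "<nil>"
	}
	return *s
}

func main() {
	var path string      // zero value is "", never nil
	var optional *string // zero value is nil

	fmt.Println(path == "")
	fmt.Println(describe(optional))

	optional = &path
	fmt.Println(describe(optional) == "")
}
```

Returning `("", false)` keeps the signature simple; the boolean already tells the caller whether the path is meaningful.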
roundtripper/roundtripper_test.go
Outdated
| t.Parallel() | ||
| tests := []struct { | ||
| name string | ||
| want string |
s/want/diskCachePath
s/want1/useDiskCache
e5b609b to 9645825
|
Integration tests success for 9645825ac6716a3a988b682ddc00145bb62695df |
* set GITHUB_TOKEN as default token
* updates
* Update doc
* Update doc
* updates
* updates
* update
* update
* update
* update
* updates
What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)
Feature - Caching
What is the current behavior? (You can also link to an open issue here)
Scorecard scalability limitation:
Reduce GitHub API calls #80
Reducing GitHub API calls to scale scanning repositories #202
What is the new behavior (if this is a feature change)?
The GitHub API supports conditional requests
https://docs.github.com/en/rest/overview/resources-in-the-rest-api#conditional-requests
https://github.com/google/go-github supports Conditional requests
https://github.com/google/go-github#conditional-requests
As we are scaling more and more projects this would add a lot of value.
The initial run fetches information using `httpcache` as a middleware, which caches the HTTP responses, initially on a large disk (PVC); the cache will probably move to Redis later.
Subsequent cron runs will use `httpcache` to check whether content has been modified and load it from the cache if it hasn't, which reduces the risk of hitting the GitHub API rate limit.
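To make the mechanism concrete, here is a stdlib-only sketch of a conditional-request cache built as an `http.RoundTripper`. It is not the `httpcache` implementation: `etagTransport`, `entry`, and `demo` are hypothetical names, the cache is an in-memory map, and only `ETag` / `If-None-Match` is handled. A local `httptest` server stands in for the GitHub API.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
	"sync/atomic"
)

// entry is a cached body together with the ETag it was served with.
type entry struct {
	etag string
	body []byte
}

// etagTransport is a simplified, hypothetical caching transport.
type etagTransport struct {
	next  http.RoundTripper
	mu    sync.Mutex
	cache map[string]entry
}

func (t *etagTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	if req.Method != http.MethodGet {
		return t.next.RoundTrip(req)
	}
	key := req.URL.String()
	t.mu.Lock()
	cached, ok := t.cache[key]
	t.mu.Unlock()
	if ok {
		// Clone first: a RoundTripper must not modify the caller's request.
		req = req.Clone(req.Context())
		req.Header.Set("If-None-Match", cached.etag)
	}
	resp, err := t.next.RoundTrip(req)
	if err != nil {
		return nil, err
	}
	switch {
	case ok && resp.StatusCode == http.StatusNotModified:
		// Unchanged on the server: serve the stored body instead of the empty 304.
		resp.Body.Close()
		resp.StatusCode, resp.Status = http.StatusOK, "200 OK"
		resp.Body = io.NopCloser(bytes.NewReader(cached.body))
		resp.ContentLength = int64(len(cached.body))
	case resp.StatusCode == http.StatusOK && resp.Header.Get("ETag") != "":
		// Fresh response: store it, then hand the caller a re-readable body.
		body, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			return nil, err
		}
		t.mu.Lock()
		t.cache[key] = entry{etag: resp.Header.Get("ETag"), body: body}
		t.mu.Unlock()
		resp.Body = io.NopCloser(bytes.NewReader(body))
	}
	return resp, nil
}

// demo fetches the same URL twice from a local test server and reports how
// many full (non-304) responses the server sent, plus the bodies the client saw.
func demo() (int32, []string) {
	var full int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		const etag = `"v1"`
		if r.Header.Get("If-None-Match") == etag {
			w.WriteHeader(http.StatusNotModified)
			return
		}
		atomic.AddInt32(&full, 1)
		w.Header().Set("ETag", etag)
		fmt.Fprint(w, "payload")
	}))
	defer srv.Close()

	client := &http.Client{Transport: &etagTransport{next: http.DefaultTransport, cache: map[string]entry{}}}
	var bodies []string
	for i := 0; i < 2; i++ {
		resp, err := client.Get(srv.URL)
		if err != nil {
			panic(err)
		}
		b, _ := io.ReadAll(resp.Body)
		resp.Body.Close()
		bodies = append(bodies, string(b))
	}
	return atomic.LoadInt32(&full), bodies
}

func main() {
	full, bodies := demo()
	fmt.Println(full, bodies)
}
```

The second request is answered with a 304 and served from the cache, so the caller sees the same body while the server sends the payload only once. The conditional-requests documentation linked above describes how GitHub treats such 304 responses with respect to the rate limit.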
Also fixed the golang-ci warnings.
Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)
None
Other information:
Subsequent cached runs on 50 repositories take about 18 minutes with 3 GitHub tokens.
Folder size
Files in the folder