
Conversation


9ary commented Aug 17, 2024

Hi!

This PR is a dump of changes I've made while operating this script. Since this repo was relatively easy to find when I was looking for this exact functionality, I'm putting this here for visibility. I don't expect things to be mergeable in this state, but I'd be happy to clean it up if there's interest.

Other than the first commit (updating dependencies), which was necessary to get going, the only change I originally intended to make was adding support for syncing tags. Things quickly spiraled out of control when that change made the runtime explode, sending me down the rabbit hole of optimizing the script.
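
Roughly, the tag syncing boils down to a sketch like the following (not the literal diff; src_repo and dst_repo stand for PyGithub Repository handles for the upstream repo and its mirror fork):

```python
# Minimal sketch of tag syncing with PyGithub (not the literal diff).
# src_repo / dst_repo are Repository handles for the upstream repo and
# its mirror fork.
existing = {t.name: t.commit.sha for t in dst_repo.get_tags()}

for tag in src_repo.get_tags():
    if tag.name not in existing:
        # Forks share their network's object store, so the sha already
        # exists downstream and a new ref can point at it directly.
        dst_repo.create_git_ref(ref=f"refs/tags/{tag.name}", sha=tag.commit.sha)
    elif existing[tag.name] != tag.commit.sha:
        # The tag moved upstream: force-move the mirror's ref to match.
        dst_repo.get_git_ref(f"tags/{tag.name}").edit(tag.commit.sha, force=True)
```

Note how the naive version costs one request per out-of-date ref, which is exactly what blows up on forks with hundreds of tags.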

For context, the organization I'm mirroring has ~90 repos, most of which rarely change, but a few are forks of high-profile projects with hundreds of tags. On my first attempt, the job ran into the rate limit after about 40 minutes, and I ended up interrupting it.

I noticed the code was fetching downstream references one by one, so I changed it to fetch the entire list at once. This cut the number of requests dramatically and let the job complete faster than it did before I touched it.
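
Concretely, the batched version is essentially one paginated listing per mirror instead of one request per ref (a sketch, assuming PyGithub; each page of get_git_refs() is a single request covering many refs):

```python
# One paginated get_git_refs() listing per mirror instead of one
# get_git_ref() request per branch/tag.
dst_refs = {r.ref: r.object.sha for r in dst_repo.get_git_refs()}

# Afterwards, lookups are plain dict reads with no extra API requests:
sha = dst_refs.get("refs/tags/v1.0")
```
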
I then added some more batching, which improved runtimes a bit more, and finally threading for request concurrency, which got the runtime on par with the old fast updates.
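
The concurrency part is a plain thread pool over the per-repo work (a sketch; sync_repo is a hypothetical function that updates one mirror, org is the PyGithub Organization handle, and max_workers is a number to tune against the rate limit):

```python
from concurrent.futures import ThreadPoolExecutor

# PyGithub calls are blocking HTTP requests, so a thread pool overlaps
# the network latency across repos. sync_repo is a hypothetical per-repo
# update function; 8 workers is a guess, tune against the rate limit.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(sync_repo, org.get_repos()))
```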

I also attempted to reduce the amount of work by comparing dates for all repos, but ended up reverting that since the reliability tradeoff wasn't worth it.
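
For the curious, it was roughly this kind of short-circuit (a sketch of the reverted idea, not the exact commit; repo_pairs is a hypothetical list of (upstream, mirror) handles):

```python
# Sketch of the reverted idea: trust pushed_at timestamps to skip repos
# whose upstream hasn't changed. One plausible catch: if a run dies
# partway, the timestamps can claim a mirror is current when it isn't.
for src_repo, dst_repo in repo_pairs:  # hypothetical (upstream, mirror) pairs
    if src_repo.pushed_at <= dst_repo.pushed_at:
        continue  # assume the mirror is already up to date
    ...  # otherwise do the full branch/tag sync
```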

Since it's so fast now, I disabled fast updates entirely and am now running full updates twice daily.

Here are some rough numbers to give you an idea of the runtime progression:

  • initial run: 3 minutes (this only involved creating forks)
  • subsequent updates: 6-8 minutes for full runs and about 30s for fast runs
  • initial attempt at adding tags: 1h+, did not finish (interrupted)
  • first successful run with tags: 5m9s
  • subsequent full runs with tags: under 4m
  • fetching downstream repos in a single call: under 3m
  • caching pip packages: no measurable speedup
  • threading: 40s

It's possible to reduce the number of requests a bit more, by combining src_repo.get_branches() and src_repo.get_tags() into a single .get_git_refs() call, but otherwise I think it's close to optimal (unless it's possible to do more batching).
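
That consolidation would look roughly like this (a sketch; one caveat is that for annotated tags, get_git_refs() yields the sha of the tag object rather than the commit, unlike get_tags()):

```python
# One get_git_refs() listing replaces the separate get_branches() and
# get_tags() calls.
branches, tags = {}, {}
for ref in src_repo.get_git_refs():
    name = ref.ref.split("/", 2)[-1]
    if ref.ref.startswith("refs/heads/"):
        branches[name] = ref.object.sha
    elif ref.ref.startswith("refs/tags/"):
        tags[name] = ref.object.sha  # tag-object sha for annotated tags
```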

Reducing the number of API calls helps a lot with avoiding rate limits, while pure wall-clock optimizations are useful if the repo running the mirroring job is private, since github then "bills" by the minute (the free plan gets 2000 minutes monthly, but public repos are apparently unmetered).
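
(Back of the envelope: two full runs a day at under 4 minutes each, rounded up per job, is about 2 × 4 × 30 ≈ 240 billed minutes a month, comfortably inside the 2000-minute free tier.)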

9ary added 13 commits May 13, 2024 14:53

  • This should cut down on the number of requests significantly to speed up the process and avoid hitting the rate limit.
  • Fast runs finish very quickly, but still count as a full minute due to how github does accounting. It's cheaper on credits to run the full update a bit more often, and more reliable too.
  • No point over-optimizing for now, let's prioritize reliability. This reverts commit 1ba8993.
