Parallel time-range sync for 4-5x speedup#12

Open
kittleik wants to merge 1 commit into warproxxx:main from kittleik:feat/parallel-sync

@kittleik

What

New parallel_sync.py script that splits the sync time range into N segments and syncs them in parallel using ThreadPoolExecutor.
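The splitting step can be sketched roughly as follows. This assumes integer Unix-second timestamps and half-open segments [start, end), so adjacent workers neither overlap nor leave gaps. `split_range`, `sync_segment` and `parallel_sync` are hypothetical names rather than the functions in parallel_sync.py, and `sync_segment` is only a placeholder for the real Goldsky paging logic.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def split_range(start: int, end: int, n: int) -> list[tuple[int, int]]:
    """Split the half-open range [start, end) into n contiguous segments.

    The last segment absorbs any remainder, so the segments exactly cover
    the original range with no gaps or overlaps.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    step = (end - start) // n
    bounds = [start + i * step for i in range(n)] + [end]
    return [(bounds[i], bounds[i + 1]) for i in range(n)]


def sync_segment(worker_id: int, seg_start: int, seg_end: int) -> int:
    """Hypothetical stand-in for the real per-segment sync.

    A real implementation would page through records with timestamps in
    [seg_start, seg_end) and write them to a worker-specific temp file.
    """
    return 0  # number of records synced (placeholder)


def parallel_sync(start: int, end: int, workers: int) -> int:
    """Sync every segment on its own thread and return the total record count."""
    segments = split_range(start, end, workers)
    total = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(sync_segment, i, seg_start, seg_end)
            for i, (seg_start, seg_end) in enumerate(segments)
        ]
        for fut in as_completed(futures):
            total += fut.result()  # re-raises any exception from the worker
    return total
```

For example, `split_range(0, 100, 3)` gives `[(0, 33), (33, 66), (66, 100)]`. Calling `fut.result()` surfaces a worker's exception in the main thread, so a failed segment is not silently dropped.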

Key features

  • Parallel workers: Splits (last_timestamp → now) into N segments, each synced independently
  • Sticky cursor optimization: Skips the tiny follow-up batch when the boundary count is below 100 (these batches previously made up ~50% of all batches)
  • Graceful shutdown: SIGINT/SIGTERM handling, preserves temp files
  • Per-worker logging: Each worker logs to parallel_logs/worker_N.log
  • Safe merge: Temp files merged into main CSV in timestamp order after all workers complete
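A minimal sketch of the timestamp-ordered merge follows. It assumes that each worker's temp CSV shares the main CSV's header, is already sorted, and stores integer timestamps in a column called `timestamp` (a hypothetical name). `merge_temp_files` is also hypothetical. `heapq.merge` streams the files instead of loading them into memory. Because the segments do not overlap, concatenating the files in segment order would give the same result.

```python
import csv
import heapq
from pathlib import Path


def merge_temp_files(temp_paths: list[Path], main_csv: Path, ts_field: str = "timestamp") -> int:
    """Append rows from per-worker temp CSVs to main_csv in timestamp order.

    Assumes every temp file has the same header as main_csv and is already
    sorted by ts_field (a hypothetical column name). Returns the row count.
    """
    handles = [open(p, newline="") for p in temp_paths]
    try:
        readers = [csv.DictReader(h) for h in handles]
        # Lazily interleave the already-sorted readers by timestamp.
        merged = heapq.merge(*readers, key=lambda row: int(row[ts_field]))
        write_header = not main_csv.exists() or main_csv.stat().st_size == 0
        count = 0
        with open(main_csv, "a", newline="") as out:
            writer = None
            for row in merged:
                if writer is None:
                    writer = csv.DictWriter(out, fieldnames=list(row.keys()))
                    if write_header:
                        writer.writeheader()
                writer.writerow(row)
                count += 1
        return count
    finally:
        for h in handles:
            h.close()
```

In this sketch, the caller would delete the temp files only after the function returns successfully, so an interrupted merge leaves them in place.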

Test results

  • 2 workers, ~30 min range: 36,380 records in 28.6 s (1,273 rec/s)
  • Original sequential sync: ~1,000 rec/s
  • 5 workers on the full 50-day gap: 4-5x throughput expected (projected, not yet benchmarked)

Usage

pkill -f update_goldsky.py  # stop current sync first
python3 parallel_sync.py --workers 5

- Splits time range into N segments, syncs in parallel with ThreadPoolExecutor
- Sticky cursor optimization: skips follow-up batches when boundary count < threshold
- Graceful shutdown on SIGINT/SIGTERM, per-worker logging
- Tested: 2 workers, 36k records in 28.6s (1273 rec/s)
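One way to implement the shutdown and logging behaviour above is a shared `threading.Event` that the SIGINT/SIGTERM handlers set and that workers check between batches, plus one `logging.FileHandler` per worker. Python runs signal handlers only in the main thread, so worker threads cannot be interrupted directly and need a flag like this. This is a sketch under assumptions: `run_worker`, its `batches` argument and the other helper names are hypothetical. Only the `parallel_logs/worker_N.log` path comes from the PR.

```python
import logging
import signal
import threading
from collections.abc import Iterable
from pathlib import Path

# Set by the signal handler; workers poll it between batches.
stop_event = threading.Event()


def handle_signal(signum, frame) -> None:
    """Request a graceful stop instead of killing worker threads mid-write."""
    stop_event.set()


def install_signal_handlers() -> None:
    # signal.signal() may only be called from the main thread.
    signal.signal(signal.SIGINT, handle_signal)
    signal.signal(signal.SIGTERM, handle_signal)


def make_worker_logger(worker_id: int, log_dir: Path = Path("parallel_logs")) -> logging.Logger:
    """Return a logger that writes to <log_dir>/worker_<id>.log."""
    log_dir.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(f"worker_{worker_id}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep worker output out of the root logger
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.FileHandler(log_dir / f"worker_{worker_id}.log")
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


def run_worker(worker_id: int, batches: Iterable[list], log_dir: Path = Path("parallel_logs")) -> int:
    """Process batches until they run out or a shutdown is requested.

    `batches` is a hypothetical iterable of record lists. On early stop the
    worker simply returns, leaving any temp file it wrote on disk.
    """
    log = make_worker_logger(worker_id, log_dir)
    done = 0
    for batch in batches:
        if stop_event.is_set():
            log.info("stop requested after %d records; keeping temp file", done)
            break
        done += len(batch)
        log.info("synced batch of %d (total %d)", len(batch), done)
    return done
```

Checking the event between batches means a worker finishes the batch it is writing before exiting, which is what keeps its temp file consistent.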