
Faster ArgoIndex #577

Draft: gmaze wants to merge 7 commits into master from faster-argoindex


gmaze (Member) commented Jan 29, 2026

Goal: Use parallelization to speed up ArgoIndex search queries wherever relevant.

We found that searching for many WMOs (e.g. more than 100) takes too long (more than 30 s), particularly with the Pandas backend.

We also note that, with the current search implementation, the Pandas backend is 2 times slower than the Pyarrow backend.

Benchmark

I have implemented parallelization of the WMO search with multi-threading (concurrent.futures.ThreadPoolExecutor).
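A minimal sketch of the one-WMO-per-thread pattern described above, assuming a hypothetical toy index held as a plain list of file paths. The real ArgoIndex keeps its index in a Pandas or Pyarrow table, and `search_one_wmo` / `search_wmos_threaded` are illustrative names, not argopy API.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy index: one file path per profile, laid out as
# "<dac>/<wmo>/profiles/R<wmo>_<cycle>.nc" (illustration only).
INDEX = [
    "aoml/13857/profiles/R13857_001.nc",
    "aoml/13857/profiles/R13857_002.nc",
    "coriolis/6902746/profiles/R6902746_001.nc",
    "coriolis/6902746/profiles/R6902746_002.nc",
    "coriolis/6902747/profiles/R6902747_001.nc",
]


def search_one_wmo(wmo):
    """Return the index rows belonging to a single float (WMO)."""
    return [path for path in INDEX if path.split("/")[1] == str(wmo)]


def search_wmos_threaded(wmos, max_workers=8):
    """Run one search per WMO in a thread pool and merge results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the order of `wmos`, not completion order
        per_wmo = executor.map(search_one_wmo, wmos)
        return [row for rows in per_wmo for row in rows]
```

For example, `search_wmos_threaded([6902746, 13857])` returns the two 6902746 rows followed by the two 13857 rows.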

When each thread executes the search for a single WMO, I find that:

  • Pandas is 1.5 times faster with multi-threading
  • Pyarrow is 7.5 times faster with multi-threading
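The speed-up figures above compare sequential and multi-threaded searches. The harness below is a hedged sketch of how such a comparison can be timed, using a hypothetical synthetic index and pure-Python filtering. With pure-Python code like this, the GIL will likely keep the ratio close to (or below) 1; larger gains presumably depend on the backend releasing the GIL inside its vectorized operations, which may be one reason Pyarrow benefits more than Pandas.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def search_one_wmo(index, wmo):
    """Return the rows of `index` whose WMO field matches `wmo`."""
    return [path for path in index if path.split("/")[1] == str(wmo)]


def run_sequential(index, wmos):
    # Baseline: one search after the other in the calling thread
    return [row for wmo in wmos for row in search_one_wmo(index, wmo)]


def run_threaded(index, wmos, max_workers=8):
    # One task per WMO, results merged in input order
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        per_wmo = executor.map(lambda wmo: search_one_wmo(index, wmo), wmos)
        return [row for rows in per_wmo for row in rows]


def best_time(fn, index, wmos, repeat=3):
    """Best wall-clock time (seconds) over `repeat` runs of fn(index, wmos)."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn(index, wmos)
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    # Hypothetical synthetic index: 200 floats x 20 cycles
    floats = range(6900000, 6900200)
    index = [f"dac/{w}/profiles/R{w}_{c:03d}.nc" for w in floats for c in range(1, 21)]
    wmos = list(floats)[:100]
    t_seq = best_time(run_sequential, index, wmos)
    t_thr = best_time(run_threaded, index, wmos)
    print(f"sequential {t_seq:.3f}s, threaded {t_thr:.3f}s, speed-up x{t_seq / t_thr:.2f}")
```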

To make this PR more complete, I also benchmarked search time with several chunk sizes (several WMOs per thread) and with multi-processing, but the results were not satisfactory. The overhead of spawning processes is too large for such quick searches.
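The chunked variant mentioned above can be sketched as follows: the WMO list is split into chunks and each thread searches one chunk. This is an illustration under assumptions (hypothetical `chunks`, `search_chunk` and `search_wmos_chunked` helpers, index as a list of paths); with `chunk_size=1` it reduces to the one-WMO-per-thread pattern.

```python
from concurrent.futures import ThreadPoolExecutor


def chunks(items, chunk_size):
    """Split `items` into consecutive lists of at most `chunk_size` elements."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def search_chunk(index, wmos):
    """Return the rows of `index` matching any WMO of the chunk (in index order)."""
    wanted = {str(w) for w in wmos}
    return [path for path in index if path.split("/")[1] in wanted]


def search_wmos_chunked(index, wmos, chunk_size=10, max_workers=8):
    """Submit one task per chunk of WMOs instead of one task per WMO."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        parts = executor.map(lambda chunk: search_chunk(index, chunk), chunks(list(wmos), chunk_size))
        return [row for part in parts for row in part]
```

Swapping `ThreadPoolExecutor` for `concurrent.futures.ProcessPoolExecutor` is the usual route to multi-processing, but every task's arguments must then be pickled and sent to another process (and lambdas cannot be pickled), which is one plausible source of the overhead reported above.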

To conclude:

  • The Pandas backend performs much worse than the Pyarrow backend, and the gap widens with multi-threading
  • Multi-threading benefits Pyarrow much more than Pandas
  • Giving each thread a chunk of WMOs does not bring further improvement
  • Multi-processing overhead is too large

PR to-do

  • Implement multi-threading and LRU caching for both the Pandas and Pyarrow backends in:
    • .query.wmo
    • .query.cyc
    • .query.wmo_cyc
    • .query.params
  • In the Pandas backend, warn users searching for more than 30 WMOs that the Pyarrow backend is recommended for large requests
  • Check that CI tests pass
  • Update What's New documentation section
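The caching and warning items of the to-do list above could be combined roughly as follows. This is a hedged sketch, not the PR's implementation: it assumes `functools.lru_cache` as the caching mechanism, and `INDEX`, `search_one_wmo`, `query_wmo` and `PANDAS_WARN_THRESHOLD` are hypothetical names.

```python
import functools
import warnings
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy index kept as an immutable tuple: cached results stay
# valid only as long as the index itself does not change.
INDEX = (
    "aoml/13857/profiles/R13857_001.nc",
    "aoml/13857/profiles/R13857_002.nc",
    "coriolis/6902746/profiles/R6902746_001.nc",
)

PANDAS_WARN_THRESHOLD = 30  # number of WMOs above which Pyarrow is recommended


@functools.lru_cache(maxsize=1024)
def search_one_wmo(wmo):
    """Cached single-WMO search; tuples prevent callers from mutating cached results."""
    return tuple(path for path in INDEX if path.split("/")[1] == str(wmo))


def query_wmo(wmos, backend="pandas", max_workers=8):
    """Hypothetical multi-threaded WMO query with an LRU cache and a size warning."""
    wmos = list(wmos)
    if backend == "pandas" and len(wmos) > PANDAS_WARN_THRESHOLD:
        warnings.warn(
            f"Searching for {len(wmos)} WMOs with the Pandas backend may be slow; "
            "the Pyarrow backend is recommended for large requests.",
            UserWarning,
            stacklevel=2,
        )
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        per_wmo = executor.map(search_one_wmo, wmos)
        return [row for rows in per_wmo for row in rows]
```

`lru_cache` is safe to call from several threads, but two threads asking for the same uncached WMO at the same time may both compute it. If the index is reloaded, `search_one_wmo.cache_clear()` must be called to drop stale results.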

Benchmark figures

(Images not reproduced: dev-pr577-Faster-index-06-benchmark-legacy, -06-benchmark-results, -06-benchmark-fig1, -06-benchmark-fig2 and -07-benchmark-fig1.)

gmaze marked this pull request as draft on February 9, 2026 09:27
gmaze added the performance label (Should make argopy faster or better designed) on Feb 9, 2026

Projects

Status: Todo
