`parlay::sequence` constructor much slower than reserve + fill, and both slower than `std::vector`

The following code measures
1. Time for creating `parlay::sequence` via reserve + fill and the number of distinct buffers allocated in 100000 iterations of a parallel for loop.
2. Time for  creating `parlay::sequence` via its constructor and the number of distinct buffers allocated in 100000 iterations of a parallel for loop.
3. Time for  creating `std::vector` via its constructor and the number of distinct buffers allocated in 100000 iterations of a parallel for loop.

```
#include <iostream>
#include <vector>
#include <chrono>
// #include "infra/infra.hpp"
#include <parlay>

int main() {
  int ntab = 100000;
  auto now = std::chrono::steady_clock::now();
  auto ptrs1 = parlay::tabulate(ntab, [&](int i)->auto {
    parlay::sequence<double> a; a.reserve(20000); std::fill(a.begin(), a.end(), 0);
    parlay::sequence<double> b; b.reserve(20000); std::fill(b.begin(), b.end(), 0);
    return std::pair(a.data(), b.data());
  });
  std::cout << "Reserve and fill, time cost = " << std::chrono::duration_cast<
    std::chrono::microseconds> (std::chrono::steady_clock::now() - now).count();
  std::cout << "\nNumber of distinct vectors allocated = " << 
  parlay::remove_duplicates(parlay::slice(
      &ptrs1.front().first, &ptrs1.back().second + 1)).size() << "\n\n";
  decltype(ptrs1)().swap(ptrs1);
  
  
  now = std::chrono::steady_clock::now();
  auto ptrs2 = parlay::tabulate(ntab, [&](int i)->auto {
    parlay::sequence<double> a(20000, 0);
    parlay::sequence<double> b(20000, 0);
    return std::pair(a.data(), b.data());
  });
  std::cout << "Plain initialization, time cost = " << std::chrono::duration_cast<
    std::chrono::microseconds> (std::chrono::steady_clock::now() - now).count();
  std::cout << "\nNumber of distinct vectors allocated = " << 
    parlay::remove_duplicates(parlay::slice(
        &ptrs2.front().first, &ptrs2.back().second + 1)).size() << "\n\n";
  decltype(ptrs2)().swap(ptrs2);
  
  
  now = std::chrono::steady_clock::now();
  auto ptrs3 = parlay::tabulate(ntab, [&](int i)->auto {
    std::vector<double> a(20000, 0);
    std::vector<double> b(20000, 0);
    return std::pair(a.data(), b.data());
  });
  std::cout << "Use std::vector, time cost = " << std::chrono::duration_cast<
    std::chrono::microseconds> (std::chrono::steady_clock::now() - now).count();
  std::cout << "\nNumber of distinct vectors allocated = " << 
    parlay::remove_duplicates(parlay::slice(
        &ptrs3.front().first, &ptrs3.back().second + 1)).size() << "\n\n";
  decltype(ptrs3)().swap(ptrs3);
} 
```

Compiling with `g++  -std=c++20 -O2  -lpthread  -march=native  code/cpp/parlayTest.cpp -o test`, a sample printout can be:

```
Reserve and fill, time cost = 59335
Number of distinct vectors allocated = 28

Plain initialization, time cost = 154726
Number of distinct vectors allocated = 49

Use std::vector, time cost = 50010
Number of distinct vectors allocated = 34

```
The numbers `28` and `49` will change in different runs.

What surprised me a little bit is that (i) `parlay::sequence()` is substantially slower  than `reserve + fill` and triggers more new allocations, (ii) using `parlay::sequence` is slower than using `std::vector`.

I have long believed in the superiority of `parlay`'s memory allocation, but this example says otherwise. Is there a simple explanation? 

What's the rule of thumb for allocating sequences within a parallel for loop using `parlay`?

Thanks a lot!  

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`parlay::sequence` constructor much slower than reserve + fill, and both slower than `std::vector` #87

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

parlay::sequence constructor much slower than reserve + fill, and both slower than std::vector #87

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

`parlay::sequence` constructor much slower than reserve + fill, and both slower than `std::vector` #87