A high-performance hybrid container combining the speed of std::vector with the stability of std::deque.
The tiered_vector is a C++ container designed to reduce performance cliffs and memory inefficiencies when working with very large datasets, high-concurrency workloads, or unpredictable growth patterns.
It can act as a drop-in replacement for std::vector when copy-on-resize costs are prohibitive, or for std::deque when iteration speed matters.
- This is NOT a replacement for `std::vector` in all cases: `std::vector` is preferable when contiguous storage is required. `tiered_vector` can outperform `std::vector` in specific scenarios (see benchmarks below), particularly for growth-heavy workloads and reference-stability needs.
`tiered_vector` is a high-performance, block-based (segmented) container designed to minimize the dynamic allocation and reallocation overhead found in `std::vector`.
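A minimal usage sketch, assuming `tiered_vector` exposes a `std::vector`-like interface (the header name and exact API below are assumptions, not the confirmed interface):

```cpp
#include "tiered_vector.hpp"  // assumed header name

int main() {
    tiered_vector<int> tv;
    for (int i = 0; i < 100000; ++i)
        tv.push_back(i);      // grows one 1024-element block at a time

    int& ref = tv[512];       // elements never move, so this reference
    tv.push_back(-1);         // survives any amount of further growth
    return ref == 512 ? 0 : 1;
}
```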
1) Two-Tiered Memory Structure
Unlike std::vector, which stores all elements in a single contiguous block, this container uses a tiered memory approach.
- Tier 1: An array of pointers to fixed-size blocks of 1024 elements each (`T** pdata`).
- Tier 2: The fixed-size 1024-element blocks themselves.
- Indexing: Elements are accessed using bitwise operators for maximum speed. For index `i`, the block is located at `i >> 10` and the offset within the block is `i & 1023` (see the sketch below).
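A minimal sketch of the two-tier lookup. `pdata` is named in the text; the struct and constant names are illustrative, not the actual implementation:

```cpp
#include <cstddef>

// Illustrative two-tier indexing; member and constant names are hypothetical.
template <typename T>
struct tiered_vector_indexing_sketch {
    static constexpr std::size_t kBlockBits = 10;                     // 2^10 = 1024
    static constexpr std::size_t kBlockMask = (1 << kBlockBits) - 1;  // 1023

    T** pdata;  // Tier 1: directory of pointers to 1024-element blocks (Tier 2)

    T& operator[](std::size_t i) {
        // Tier 1 picks the block, Tier 2 picks the slot: pdata[i >> 10][i & 1023]
        return pdata[i >> kBlockBits][i & kBlockMask];
    }
};
```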
2) Small Buffer Optimization (SBO)

To avoid heap allocations for smaller datasets, the container includes an internal array.
- `internal_pdata[8]`: If the container needs 8 or fewer blocks, it uses this stack-allocated pointer array instead of allocating a directory on the heap. This makes it faster for small-to-medium workloads (see the sketch below).
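A sketch of what the SBO layout might look like; `internal_pdata` comes from the text, while the surrounding struct and the `block_capacity` member are assumptions:

```cpp
#include <cstddef>

// Hypothetical layout of the small-buffer path (not the actual implementation).
template <typename T>
struct tiered_vector_sbo_sketch {
    static constexpr std::size_t kInlineBlocks = 8;

    T*  internal_pdata[kInlineBlocks] = {};     // directory lives inside the object
    T** pdata = internal_pdata;                 // repointed to a heap directory only
    std::size_t block_capacity = kInlineBlocks; // once more than 8 blocks are needed

    bool using_sbo() const { return pdata == internal_pdata; }
};
```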
3) Allocation Strategy and Performance
- Alloc/Realloc Strategy: When the container grows and the block directory still has spare capacity, a new block is allocated only once the previous 1024-element block is filled; blocks are never allocated eagerly unless the resize function is called. When the directory itself runs out of slots, the same reallocation strategy as `std::vector` is applied to the pointer array, where only the pointers get reallocated, never the elements or the blocks themselves (see the sketch after this list).
- Memory Efficiency: Because growth is incremental (1024 elements at a time), the container avoids the large over-allocation overhead of geometric growth.
- Pointer Stability: Since elements are never moved during reallocations, pointers and references to elements remain valid for the lifetime of the element.
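A hedged sketch of that growth path, under the layout sketched above; all names are illustrative, and SBO and exception handling are omitted for brevity:

```cpp
#include <cstddef>
#include <cstring>
#include <new>

// Sketch of push_back growth: blocks are added one at a time, and only the
// pointer directory is ever reallocated. Elements never move.
template <typename T>
void push_back_sketch(T**& pdata, std::size_t& size,
                      std::size_t& block_capacity, const T& value) {
    constexpr std::size_t kBlockSize = 1024;
    const std::size_t block  = size >> 10;   // directory slot for the new element
    const std::size_t offset = size & 1023;  // position inside that block

    if (offset == 0) {                        // current block is full (or none yet)
        if (block == block_capacity) {        // directory full: vector-style regrow,
            std::size_t new_cap = block_capacity * 2;   // but only POINTERS move
            T** new_dir = new T*[new_cap];
            std::memcpy(new_dir, pdata, block_capacity * sizeof(T*));
            delete[] pdata;
            pdata = new_dir;
            block_capacity = new_cap;
        }
        // Allocate exactly one new 1024-element block; existing data stays put.
        pdata[block] = static_cast<T*>(::operator new(kBlockSize * sizeof(T)));
    }
    new (&pdata[block][offset]) T(value);     // construct in place
    ++size;
}
```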
4) Hysteresis in pop_back
- This container takes a different approach to shrinking. When `pop_back` is called on a typical vector implementation, the size is decremented but the capacity is never reduced, leaving large amounts of memory allocated but unused.
- `tiered_vector` implements a hysteresis-like behaviour to balance memory usage and performance. When a block becomes empty via `pop_back`, instead of deleting the block the instant it empties, the container waits until more than one full block is empty, then deletes the excess block from the pointer array (see the sketch below).
- This one-block buffer prevents repeated allocation and destruction of blocks when a user calls `push_back` and `pop_back` repeatedly at the boundary of a block.
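A minimal sketch of the hysteresis rule, assuming a counter that tracks the number of currently allocated blocks (all names are illustrative):

```cpp
#include <cstddef>
#include <new>

// Sketch of pop_back with a one-block hysteresis buffer.
template <typename T>
void pop_back_sketch(T** pdata, std::size_t& size, std::size_t& allocated_blocks) {
    constexpr std::size_t kBlockSize = 1024;
    --size;
    pdata[size >> 10][size & 1023].~T();   // destroy the popped element

    // Number of blocks actually needed to hold the remaining elements:
    const std::size_t needed = (size + kBlockSize - 1) / kBlockSize;

    // Hysteresis: tolerate ONE fully empty block as a reuse buffer, so that
    // push_back/pop_back ping-ponging at a block boundary never reallocates.
    // Only when a SECOND block sits empty is the outermost one released.
    if (allocated_blocks > needed + 1) {
        --allocated_blocks;
        ::operator delete(pdata[allocated_blocks]);
        pdata[allocated_blocks] = nullptr;
    }
}
```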
Context:
- A high-level overview comparing the three major operations (push_back, sequential scan, random access) along with memory footprint (bytes/element).
The Mechanism:
- `std::vector` uses a single contiguous block of memory. When it is full, it allocates a new block twice the size, copies everything over, and deletes the old block.
- `std::deque` uses a map of pointers to small fixed-size arrays. It never copies existing elements, but the arrays are often small, leading to frequent allocations and cache misses.
- `tiered_vector` uses an array of pointers to moderately large, power-of-two-sized data chunks. When full, it adds new chunks and never moves existing data.
Expected Observation and Reason:
- push_back (no initial reserve): `tiered_vector` dominates, because no reallocation or copying of large amounts of data ever happens. `std::deque` loses here due to high allocator pressure.
- seq_scan: `std::vector` wins, but `tiered_vector` is competitive. A vector is a single contiguous block, allowing the CPU prefetcher to predict 100% of the memory accesses; `tiered_vector` introduces a hop every 1024 elements, which is a negligible penalty compared to `std::deque`'s.
- random_access: `std::vector` wins, but `tiered_vector` is very competitive. Vector wins simply because there is less arithmetic per access, while `tiered_vector` needs two dereferences. Deque loses because of more cache misses.

| Count | Type | Push (ms) | SeqScan (ms) | RndAcc (ms) | Total (MB) | Bytes/Elem |
|---|---|---|---|---|---|---|
| 1000000 | tiered_vector | 17.5 | 6.6 | 11.9 | 3.91 | 4.10 |
| 1000000 | deque | 67.2 | 79.2 | 81.7 | 2.76 | 2.89 |
| 1000000 | vector | 74.2 | 5.6 | 11.8 | 4.69 | 4.92 |
| 10000000 | tiered_vector | 171.2 | 66.6 | 59.5 | 36.33 | 3.81 |
| 10000000 | deque | 681.8 | 777.3 | 404.4 | 38.70 | 4.06 |
| 10000000 | vector | 759.6 | 46.0 | 47.9 | 45.02 | 4.72 |
| 33000000 | tiered_vector | 412.8 | 140.5 | 38.4 | 118.54 | 3.77 |
| 33000000 | deque | 1065.5 | 841.8 | 132.9 | 130.55 | 4.15 |
| 33000000 | vector | 803.9 | 57.2 | 18.3 | 156.76 | 4.98 |
| 60000000 | tiered_vector | 323.6 | 124.1 | 18.5 | 198.19 | 3.46 |
| 60000000 | deque | 1286.8 | 1473.6 | 131.6 | 238.38 | 4.17 |
| 60000000 | vector | 1480.9 | 103.9 | 18.3 | 259.76 | 4.54 |
| 65000000 | tiered_vector | 352.7 | 134.1 | 18.4 | 217.34 | 3.51 |
| 65000000 | deque | 1395.2 | 1592.0 | 127.6 | 258.34 | 4.17 |
| 65000000 | vector | 1590.1 | 112.7 | 18.3 | 278.83 | 4.50 |
Context:
- Aims to capture micro-level benchmarks in two modes: initially unreserved vs. initially reserved.
The Mechanism:
- Simulates both unreserved and reserved growth. If memory is reserved up front, no reallocation ever happens in a vector, which we take as the ideal case (see the sketch below).
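A sketch of how such a timing probe might be written with `std::chrono`; the container type and element count are parameters, and nothing here is the actual harness:

```cpp
#include <chrono>
#include <cstddef>

// Sketch of one probe: time n push_backs, optionally reserving first.
// `Container` stands in for std::vector<int> or tiered_vector<int>; note that
// std::deque has no reserve(), so the reserved mode skips it.
template <typename Container>
double time_push_back(std::size_t n, bool reserved) {
    Container c;
    if (reserved) c.reserve(n);   // reserved mode: no element ever moves
    const auto t0 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i)
        c.push_back(typename Container::value_type(i));
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();   // seconds
}
```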
Expected Observation and Reason:
- Unreserved Growth: `tiered_vector` wins the push_back benchmark, for the reasons discussed earlier.
- Reserved Growth: `std::vector` is king in all aspects, because no reallocation of elements happens at any time. In this case it behaves essentially as fast as a raw array, approaching the RAM bandwidth limit.
All times below are in seconds.

BENCHMARK: UNRESERVED GROWTH (push_back)

| N Elements | std::vector | std::deque | TieredVec | Winner |
|---|---|---|---|---|
| 1000 | 0.00004 | 0.00003 | 0.00001 | TieredVec |
| 10000 | 0.00040 | 0.00031 | 0.00010 | TieredVec |
| 100000 | 0.00384 | 0.00333 | 0.00089 | TieredVec |
| 1000000 | 0.04927 | 0.05785 | 0.01115 | TieredVec |
| 10000000 | 0.69572 | 0.57949 | 0.13706 | TieredVec |
| 73000000 | 1.75732 | 3.35112 | 1.21763 | TieredVec |

BENCHMARK: RESERVED WRITE (operator[])

| N Elements | std::vector | std::deque | TieredVec | Winner |
|---|---|---|---|---|
| 1000 | 0.00000 | 0.00002 | 0.00000 | Vector |
| 10000 | 0.00002 | 0.00023 | 0.00002 | Vector |
| 100000 | 0.00016 | 0.00236 | 0.00019 | Vector |
| 1000000 | 0.00161 | 0.02399 | 0.00191 | Vector |
| 10000000 | 0.01589 | 0.23690 | 0.01936 | Vector |
| 73000000 | 0.11689 | 1.73158 | 0.14085 | Vector |

BENCHMARK: SEQUENTIAL READ (Linear Scan)

| N Elements | std::vector | std::deque | TieredVec | Winner |
|---|---|---|---|---|
| 1000 | 0.00000 | 0.00002 | 0.00000 | Vector |
| 10000 | 0.00002 | 0.00023 | 0.00002 | Vector |
| 100000 | 0.00027 | 0.00234 | 0.00020 | TieredVec |
| 1000000 | 0.00174 | 0.02370 | 0.00230 | Vector |
| 10000000 | 0.01659 | 0.23787 | 0.02046 | Vector |
| 73000000 | 0.12037 | 1.72742 | 0.15057 | Vector |

BENCHMARK: RANDOM ACCESS (Index Hop)

| N Elements | std::vector | std::deque | TieredVec | Winner |
|---|---|---|---|---|
| 1000 | 0.00001 | 0.00003 | 0.00001 | Vector (Close) |
| 10000 | 0.00007 | 0.00028 | 0.00007 | TieredVec |
| 100000 | 0.00067 | 0.00304 | 0.00083 | Vector |
| 1000000 | 0.01025 | 0.03681 | 0.01387 | Vector |
| 10000000 | 0.50245 | 1.19925 | 0.52536 | Vector (Close) |
| 73000000 | 4.60984 | 11.60753 | 5.10716 | Vector |
Context:
- Measures wasted memory, i.e. memory allocated but not containing user data.
The Mechanism:
- `std::vector`, as discussed earlier, doubles its capacity from n to 2n on reallocation, wasting memory for a major boost in speed.
- `tiered_vector` allocates fixed-size chunks. If a chunk holds 1024 items, waste is never more than 1023 items (the unfilled tail of the last chunk), regardless of total size, assuming elements are pushed back sequentially.
- If elements are not pushed sequentially (e.g. the pointer directory is reserved up front), the worst case grows to 1023 wasted elements per block.
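To make the contrast concrete with the 1048577-element row of the table below: a doubling `std::vector` jumps its capacity from 1048576 to 2097152 slots, so roughly half of its 8.0 MB is waste, while `tiered_vector` allocates ceil(1048577 / 1024) = 1025 blocks and wastes only the 1023 unfilled slots of the last block, effectively 0%.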
Expected Observation and Reason:
- Pretty trivial: `tiered_vector` and deque have minimal waste, while vector's overhead becomes huge as n scales.
| Elements | std::vector (Waste %) | std::deque (Waste %) | TieredVec (Waste %) |
|---|---|---|---|
| 1000 | 0.0MB (2%) | 0.0MB (4%) | 0.0MB (4%) |
| 100000 | 0.5MB (24%) | 0.4MB (2%) | 0.4MB (0%) |
| 1000000 | 4.0MB (5%) | 3.9MB (2%) | 3.8MB (0%) |
| 1048577 | 8.0MB (50%) | 4.1MB (2%) | 4.0MB (0%) |
| 10000000 | 64.0MB (40%) | 38.7MB (2%) | 38.3MB (0%) |
| 33554433 | 256.0MB (50%) | 130.0MB (2%) | 128.5MB (0%) |
| 73000000 | 512.0MB (46%) | 282.8MB (2%) | 279.5MB (0%) |

TEST 2: THE SHRINK TEST (Memory Reclamation). Popping 50% of elements (36.5M items):

| Elements | std::vector (Waste %) | std::deque (Waste %) | TieredVec (Waste %) |
|---|---|---|---|
| 36500000 | 512.0MB (73%) | 141.4MB (2%) | 140.2MB (0%) |
Context:
- The "killer feature" of this data structure. 16 threads attempting to write/read random indices simultaneously.
- Have not implemented thread safety in tiered_vector yet, so we will be dealing with w/r operations for now
The Mechanism:
- Global Locking (Vector): Because `std::vector` may resize and move memory, pointers and references are invalidated during reallocation, so every access must be serialized behind a single global lock.
- Segmented Locking (Tiered_Vector): Because `tiered_vector` guarantees reference stability, a mutex per chunk can be used: Thread A writing to Chunk 1 does not block Thread B writing to Chunk 2 (see the sketch below).
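A minimal sketch of per-chunk locking under these assumptions; the wrapper type and `chunk_mutexes` are illustrative, not part of `tiered_vector`:

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Sketch of segmented locking. Safe only because tiered_vector never relocates
// elements; std::vector would need one global mutex instead.
template <typename Container>
struct segmented_lock_sketch {
    Container& tv;                          // e.g. a tiered_vector<int>
    std::vector<std::mutex> chunk_mutexes;  // one mutex per 1024-element chunk

    segmented_lock_sketch(Container& c, std::size_t num_chunks)
        : tv(c), chunk_mutexes(num_chunks) {}

    template <typename T>
    void write(std::size_t i, const T& value) {
        // Lock only the chunk owning index i: writers to other chunks proceed.
        std::lock_guard<std::mutex> guard(chunk_mutexes[i >> 10]);
        tv[i] = value;
    }
};
```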
Expected Observation and Reason:
- `tiered_vector` effectively runs in parallel in the average case (up to 16 threads working at once in the best case). This makes it dramatically more efficient for this niche scenario.
LOCK GRANULARITY SHOWDOWN (16 Threads, 50M Random Writes)

Scenario: 16 threads trying to write to random indices. Expectation: Vector locks globally. Tiered locks locally.

| Strategy | Time | Throughput |
|---|---|---|
| Global Lock (Vector) | 19.938 s | 2.5 M ops/sec |
| Segmented Lock (Tiered) | 0.523 s | 95.6 M ops/sec |