Skip to content

r10a/elcrq

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ELCRQ - Event-count LCRQ

License: BSD-3-Clause Platform

ELCRQ is a high-performance, lock-free, blocking-when-necessary concurrent queue implementation based on the LCRQ (Linked Concurrent Ring Queue) algorithm by Adam Morrison and Yehuda Afek.

Three-process roundtrip scenario

Table of Contents

Overview

ELCRQ extends the LCRQ algorithm with an Event-Count mechanism that allows threads to block efficiently when the queue is empty, rather than busy-waiting. This design significantly reduces CPU usage and power consumption in scenarios where the queue may be empty for extended periods.

The project provides implementations in both C and C++, with the C++ implementation being recommended for production use due to its more stable memory management.

Features

  • Lock-free Operations: Both enqueue and dequeue operations are lock-free, ensuring system-wide progress
  • Linearizable: All operations are linearizable, providing strong consistency guarantees
  • Block-when-necessary: Threads can efficiently wait when the queue is empty using Linux futex-based Event-Counts
  • Shared Memory Support: Designed for inter-process communication via shared memory
  • Hazard Pointers: Safe memory reclamation without garbage collection (C++ implementation)
  • High Throughput: Optimized for high-concurrency scenarios with multiple producer/consumer threads
  • Cache-line Aligned: Data structures are aligned to avoid false sharing

Architecture

Core Components

┌─────────────────────────────────────────────────────────────┐
│                         ELCRQ/SCRQueue                       │
│  ┌──────────────────────────────────────────────────────┐  │
│  │                     EventCount                        │  │
│  │  (Futex-based blocking mechanism)                     │  │
│  └──────────────────────────────────────────────────────┘  │
│  ┌──────────────────────────────────────────────────────┐  │
│  │                      LCRQueue                         │  │
│  │  ┌─────────┐   ┌─────────┐   ┌─────────┐            │  │
│  │  │  Ring   │──▶│  Ring   │──▶│  Ring   │──▶ ...    │  │
│  │  │  Queue  │   │  Queue  │   │  Queue  │            │  │
│  │  └─────────┘   └─────────┘   └─────────┘            │  │
│  │      ▲                            ▲                   │  │
│  │      │                            │                   │  │
│  │    head                         tail                  │  │
│  └──────────────────────────────────────────────────────┘  │
│  ┌──────────────────────────────────────────────────────┐  │
│  │                   Hazard Pointers                     │  │
│  │  (Safe memory reclamation - C++ only)                 │  │
│  └──────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘

Ring Queue Structure

Each ring queue is a fixed-size circular buffer with the following properties:

  • Size: 2^RING_POW elements (configurable)
  • Each cell contains a value and an index
  • Cells are 128-byte aligned to prevent false sharing
  • When a ring fills up, a new ring is allocated and linked

Event-Count Mechanism

The Event-Count provides efficient blocking without locks:

  1. prepareWait(): Register intent to wait, get current epoch
  2. wait(key): Block until epoch changes (new item enqueued)
  3. cancelWait(): Cancel waiting if item was found
  4. notify(): Wake one waiting thread after enqueue

Implementations

C++ Implementation (Recommended)

Located in the c++/ directory. Uses:

  • Boost.Interprocess for shared memory management
  • Hazard Pointers for safe memory reclamation
  • Modern C++ atomics and memory ordering

Components:

  • SCRQueue.hpp - Main queue interface with Event-Count
  • LCRQueue.hpp - Core LCRQ implementation
  • EventCount.hpp - Futex-based blocking mechanism
  • HazardPointers.hpp - Memory reclamation
  • Futex.hpp/cpp - Linux futex wrapper

C Implementation

Located in the c/ directory. Uses:

  • Custom shared memory allocator (shm_malloc)
  • Inline assembly for atomic operations

⚠️ Warning: The C implementation uses an experimental shared memory allocator that may be unstable. Use for reference/proof-of-concept only.

Components:

  • ELCRQ.h - Complete queue implementation
  • EventCount.h - Futex-based Event-Count
  • primitives.h - Atomic operation primitives
  • malloc.c/h - Shared memory allocator

API Reference

C++ API (SCRQueue)

template<typename T>
class SCRQueue {
    // Constructor
    SCRQueue(fixed_managed_shared_memory *main_pool, 
             fixed_managed_shared_memory *mem_pool, 
             const int num_threads);
    
    // Blocking enqueue - notifies waiters after enqueue
    void enqueue(T *item, const int tid);
    
    // Non-blocking enqueue - no notification
    void spinEnqueue(T *item, const int tid);
    
    // Blocking dequeue - waits if queue is empty
    T* dequeue(const int tid);
    
    // Non-blocking dequeue with patience limit
    T* spinDequeue(const int tid);
};

C API (ELCRQ)

// Initialize a new queue
void init_queue(ELCRQ* q);

// Blocking enqueue with notification
void enqueue(Object arg, int pid, ELCRQ* q);

// Blocking dequeue - waits on empty queue
Object dequeue(int pid, ELCRQ* q);

// Non-blocking enqueue
void spinEnqueue(Object arg, int pid, ELCRQ* q);

// Non-blocking dequeue with patience limit
Object spinDequeue(int pid, ELCRQ* q);

EventCount API

class EventCount {
    Key prepareWait();    // Register intent to wait
    void wait(Key key);   // Block until notified
    void cancelWait();    // Cancel waiting
    void notify();        // Wake one waiter
    void notifyAll();     // Wake all waiters
};

Performance

The three-process roundtrip benchmark demonstrates inter-process queue communication:

Configuration Throughput
6 threads/process (C) See benchmark results
8 threads/process (C++) See benchmark results

Note: Performance depends heavily on:

  • Number of CPU cores (minimum 18-24 for default configuration)
  • Memory bandwidth
  • Cache coherence protocol efficiency

Requirements

System Requirements

  • OS: Linux (uses futex system call)
  • Architecture: x86-64 (uses 128-bit CAS via cmpxchg16b)
  • CPU Cores: Minimum 18 (C) or 24 (C++) cores for full benchmark

C++ Dependencies

  • C++11 or later
  • Boost (Interprocess library)
  • CMake 3.x+

C Dependencies

  • C99 or later
  • CMake 3.x+
  • pthreads

Building

C++ Implementation

# Install Boost (Ubuntu/Debian)
sudo apt-get install libboost-all-dev

# Clone and build
git clone https://github.com/r10a/elcrq
cd elcrq/c++
cmake .
make

# Run benchmark
./main

C Implementation

git clone https://github.com/r10a/elcrq
cd elcrq/c
cmake .
make

# Run benchmark
./c

Configuration

C++ Configuration (main.cpp)

#define NUM_THREAD 8    // Threads per process (total = 3 * NUM_THREAD)

C Configuration

main.c:

#define NUM_THREAD 6    // Threads per process
#define NUM_ITERS 10    // Simulation iterations
#define NUM_RUNS 100    // Elements per iteration
#define SHM_FILE "/shm" // Shared memory file name

ELCRQ.h:

#define RING_POW 17         // Ring size = 2^17 = 131072 elements
#define MAX_PATIENCE 1000   // Spin attempts before blocking
#define Object uint64_t     // Element type

Ring Size Considerations

The ring size (2^RING_POW) affects:

  • Memory usage: Larger rings consume more memory
  • Performance: Larger rings reduce ring allocation frequency
  • Latency: Smaller rings may cause more frequent blocking

Usage Example

C++ Example

#include <boost/interprocess/managed_shared_memory.hpp>
#include "SCRQueue.hpp"

using namespace boost::interprocess;

int main() {
    // Create shared memory
    shared_memory_object::remove("MySharedMemory");
    managed_shared_memory segment(create_only, "MySharedMemory", 65536);
    
    // Create queue
    SCRQueue<int>* queue = segment.construct<SCRQueue<int>>("queue")(
        &segment, &segment, 4 /* num_threads */
    );
    
    // Thread 0: Enqueue
    int* item = segment.construct<int>(anonymous_instance)(42);
    queue->enqueue(item, 0);
    
    // Thread 1: Dequeue (will block if empty)
    int* result = queue->dequeue(1);
    
    return 0;
}

C Example

#include "ELCRQ.h"

int main() {
    ELCRQ queue;
    init_queue(&queue);
    
    // Thread 0: Enqueue
    enqueue(42, 0, &queue);
    
    // Thread 1: Dequeue (blocks if empty)
    Object result = dequeue(1, &queue);
    
    return 0;
}

References

  1. LCRQ Paper: Morrison, A., & Afek, Y. (2013). Fast concurrent queues for x86 processors. PPoPP '13.

  2. Event-Counts:

  3. Hazard Pointers: Michael, M. M. (2004). Hazard pointers: Safe memory reclamation for lock-free objects.

  4. Shared Memory (C):

License

This project is licensed under the BSD 3-Clause License - see the copyright notice in source files for details.

Copyright (c) 2013, Adam Morrison and Yehuda Afek.
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
...

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

Guidelines

  1. Follow existing code style
  2. Add tests for new functionality
  3. Update documentation as needed
  4. Ensure changes compile on Linux x86-64

About

Lockless block-when necessary queue based on LCRQ.

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors