# ELCRQ

ELCRQ is a high-performance, lock-free, blocking-when-necessary concurrent queue implementation based on the LCRQ (Linked Concurrent Ring Queue) algorithm by Adam Morrison and Yehuda Afek.
## Table of Contents

- Overview
- Features
- Architecture
- Implementations
- API Reference
- Performance
- Requirements
- Building
- Configuration
- Usage Example
- References
- License
- Contributing
## Overview

ELCRQ extends the LCRQ algorithm with an Event-Count mechanism that allows threads to block efficiently when the queue is empty, rather than busy-waiting. This design significantly reduces CPU usage and power consumption in scenarios where the queue may be empty for extended periods.
The project provides implementations in both C and C++, with the C++ implementation being recommended for production use due to its more stable memory management.
## Features

- Lock-free Operations: Both enqueue and dequeue operations are lock-free, ensuring system-wide progress
- Linearizable: All operations are linearizable, providing strong consistency guarantees
- Block-when-necessary: Threads can efficiently wait when the queue is empty using Linux futex-based Event-Counts
- Shared Memory Support: Designed for inter-process communication via shared memory
- Hazard Pointers: Safe memory reclamation without garbage collection (C++ implementation)
- High Throughput: Optimized for high-concurrency scenarios with multiple producer/consumer threads
- Cache-line Aligned: Data structures are aligned to avoid false sharing
## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                       ELCRQ/SCRQueue                        │
│  ┌──────────────────────────────────────────────────────┐   │
│  │                      EventCount                      │   │
│  │           (Futex-based blocking mechanism)           │   │
│  └──────────────────────────────────────────────────────┘   │
│  ┌──────────────────────────────────────────────────────┐   │
│  │                       LCRQueue                       │   │
│  │  ┌─────────┐   ┌─────────┐   ┌─────────┐             │   │
│  │  │  Ring   │──▶│  Ring   │──▶│  Ring   │──▶ ...      │   │
│  │  │  Queue  │   │  Queue  │   │  Queue  │             │   │
│  │  └─────────┘   └─────────┘   └─────────┘             │   │
│  │       ▲                           ▲                  │   │
│  │       │                           │                  │   │
│  │     head                        tail                 │   │
│  └──────────────────────────────────────────────────────┘   │
│  ┌──────────────────────────────────────────────────────┐   │
│  │                   Hazard Pointers                    │   │
│  │         (Safe memory reclamation - C++ only)         │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
```
### Ring Queue

Each ring queue is a fixed-size circular buffer with the following properties:

- Size: `2^RING_POW` elements (configurable)
- Each cell contains a value and an index
- Cells are 128-byte aligned to prevent false sharing
- When a ring fills up, a new ring is allocated and linked
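To make these properties concrete, here is a hypothetical sketch of a cell and ring layout. The names and fields are illustrative only, not the actual definitions in `LCRQueue.hpp`/`ELCRQ.h`:

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

constexpr int RING_POW = 17;
constexpr std::size_t RING_SIZE = std::size_t{1} << RING_POW; // 131072 cells

// Each cell pairs a value with the index (round) it belongs to.
// alignas(128) gives each cell its own pair of cache lines, so two
// threads operating on neighboring cells never share a line.
struct alignas(128) Cell {
    std::atomic<std::uint64_t> val;
    std::atomic<std::uint64_t> idx;
};

struct Ring {
    std::atomic<std::uint64_t> head;
    std::atomic<std::uint64_t> tail;
    std::atomic<Ring*> next;    // link installed when this ring fills up
    Cell cells[RING_SIZE];      // fixed-size circular buffer
};
```

With a power-of-two ring size, a position maps to a slot with a mask instead of a modulo: `slot = pos & (RING_SIZE - 1)`.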
### Event-Count

The Event-Count provides efficient blocking without locks:

- `prepareWait()`: Register intent to wait, get current epoch
- `wait(key)`: Block until epoch changes (new item enqueued)
- `cancelWait()`: Cancel waiting if item was found
- `notify()`: Wake one waiting thread after enqueue
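The wait protocol above can be made concrete with a minimal, self-contained sketch. This is not the project's futex-based implementation (`EventCount.hpp` blocks on a Linux futex word); it substitutes a portable `std::condition_variable` so the epoch-based protocol is visible on its own:

```cpp
#include <atomic>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Portable stand-in for the futex-based EventCount (illustrative only).
class EventCount {
    std::atomic<std::uint64_t> epoch_{0};
    std::mutex m_;
    std::condition_variable cv_;
public:
    using Key = std::uint64_t;

    // Register intent to wait: snapshot the current epoch.
    Key prepareWait() { return epoch_.load(std::memory_order_acquire); }

    // Block until the epoch moves past the snapshot (someone notified).
    void wait(Key key) {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [&] {
            return epoch_.load(std::memory_order_acquire) != key;
        });
    }

    // Nothing to undo in this simplified version.
    void cancelWait() {}

    // Advance the epoch, then wake a waiter (the real version uses futex).
    void notify() {
        epoch_.fetch_add(1, std::memory_order_acq_rel);
        std::lock_guard<std::mutex> lk(m_);
        cv_.notify_one();
    }
};
```

A dequeuer typically spins on the queue up to `MAX_PATIENCE` times, then calls `prepareWait()`, re-checks the queue (calling `cancelWait()` if an item appeared in the meantime), and only then blocks in `wait(key)`; every blocking `enqueue()` ends with `notify()`.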
## Implementations

### C++ Implementation (Recommended)

Located in the `c++/` directory. Uses:
- Boost.Interprocess for shared memory management
- Hazard Pointers for safe memory reclamation
- Modern C++ atomics and memory ordering
Components:
- `SCRQueue.hpp` - Main queue interface with Event-Count
- `LCRQueue.hpp` - Core LCRQ implementation
- `EventCount.hpp` - Futex-based blocking mechanism
- `HazardPointers.hpp` - Memory reclamation
- `Futex.hpp/cpp` - Linux futex wrapper
### C Implementation

Located in the `c/` directory. Uses:

- Custom shared memory allocator (`shm_malloc`)
- Inline assembly for atomic operations
⚠️ Warning: The C implementation uses an experimental shared memory allocator that may be unstable. Use for reference/proof-of-concept only.
Components:
- `ELCRQ.h` - Complete queue implementation
- `EventCount.h` - Futex-based Event-Count
- `primitives.h` - Atomic operation primitives
- `malloc.c/h` - Shared memory allocator
## API Reference

### C++: `SCRQueue`

```cpp
template<typename T>
class SCRQueue {
    // Constructor
    SCRQueue(fixed_managed_shared_memory *main_pool,
             fixed_managed_shared_memory *mem_pool,
             const int num_threads);

    // Blocking enqueue - notifies waiters after enqueue
    void enqueue(T *item, const int tid);

    // Non-blocking enqueue - no notification
    void spinEnqueue(T *item, const int tid);

    // Blocking dequeue - waits if queue is empty
    T* dequeue(const int tid);

    // Non-blocking dequeue with patience limit
    T* spinDequeue(const int tid);
};
```

### C: `ELCRQ`

```c
// Initialize a new queue
void init_queue(ELCRQ* q);

// Blocking enqueue with notification
void enqueue(Object arg, int pid, ELCRQ* q);

// Blocking dequeue - waits on empty queue
Object dequeue(int pid, ELCRQ* q);

// Non-blocking enqueue
void spinEnqueue(Object arg, int pid, ELCRQ* q);

// Non-blocking dequeue with patience limit
Object spinDequeue(int pid, ELCRQ* q);
```

### EventCount

```cpp
class EventCount {
    Key prepareWait();   // Register intent to wait
    void wait(Key key);  // Block until notified
    void cancelWait();   // Cancel waiting
    void notify();       // Wake one waiter
    void notifyAll();    // Wake all waiters
};
```

## Performance

The three-process roundtrip benchmark demonstrates inter-process queue communication:
| Configuration | Throughput |
|---|---|
| 6 threads/process (C) | See benchmark results |
| 8 threads/process (C++) | See benchmark results |
Note: Performance depends heavily on:
- Number of CPU cores (minimum 18-24 for default configuration)
- Memory bandwidth
- Cache coherence protocol efficiency
## Requirements

- OS: Linux (uses the `futex` system call)
- Architecture: x86-64 (uses 128-bit CAS via `cmpxchg16b`)
- CPU Cores: Minimum 18 (C) or 24 (C++) cores for the full benchmark
### C++

- C++11 or later
- Boost (Interprocess library)
- CMake 3.x+
### C

- C99 or later
- CMake 3.x+
- pthreads
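The x86-64 requirement comes from the 16-byte `cmpxchg16b` compare-and-swap the queue uses to update a cell's value and index as one atomic unit. The same packed-word technique can be shown portably at half width; names here are illustrative, not the project's code:

```cpp
#include <atomic>
#include <cstdint>

// Pack two 32-bit fields (value, index) into one 64-bit word so both
// can be compared and swapped in a single atomic instruction. The real
// queue does this at full width: a 16-byte (value, index) pair via
// cmpxchg16b on x86-64.
struct Packed {
    static std::uint64_t pack(std::uint32_t val, std::uint32_t idx) {
        return (std::uint64_t{val} << 32) | idx;
    }
    static std::uint32_t val(std::uint64_t w) { return std::uint32_t(w >> 32); }
    static std::uint32_t idx(std::uint64_t w) { return std::uint32_t(w); }
};

// Claim a cell: succeed only if the stored (value, index) pair still
// matches the snapshot we observed, installing the new pair atomically.
inline bool claim(std::atomic<std::uint64_t>& cell, std::uint64_t expected,
                  std::uint32_t new_val, std::uint32_t new_idx) {
    return cell.compare_exchange_strong(expected,
                                        Packed::pack(new_val, new_idx));
}
```

Updating the pair in one step is what keeps enqueue/dequeue lock-free: a thread with a stale snapshot simply fails the CAS and retries instead of observing a half-written cell.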
## Building

### C++

```sh
# Install Boost (Ubuntu/Debian)
sudo apt-get install libboost-all-dev

# Clone and build
git clone https://github.com/r10a/elcrq
cd elcrq/c++
cmake .
make

# Run benchmark
./main
```

### C

```sh
git clone https://github.com/r10a/elcrq
cd elcrq/c
cmake .
make

# Run benchmark
./c
```

## Configuration

### C++

```c
#define NUM_THREAD 8  // Threads per process (total = 3 * NUM_THREAD)
```

### C

`main.c`:
```c
#define NUM_THREAD 6    // Threads per process
#define NUM_ITERS 10    // Simulation iterations
#define NUM_RUNS 100    // Elements per iteration
#define SHM_FILE "/shm" // Shared memory file name
```

`ELCRQ.h`:

```c
#define RING_POW 17       // Ring size = 2^17 = 131072 elements
#define MAX_PATIENCE 1000 // Spin attempts before blocking
#define Object uint64_t   // Element type
```

The ring size (`2^RING_POW`) affects:
- Memory usage: Larger rings consume more memory
- Performance: Larger rings reduce ring allocation frequency
- Latency: Smaller rings may cause more frequent blocking
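As a rough sizing aid, and assuming each cell occupies a full 128 bytes (implied by the 128-byte cell alignment; the exact footprint depends on the actual cell layout), per-ring memory follows directly from `RING_POW`:

```cpp
#include <cstdint>

// Estimated bytes per ring: 2^ring_pow cells at an assumed 128 bytes
// each. Illustrative arithmetic, not a measured footprint.
constexpr std::uint64_t ring_bytes(int ring_pow) {
    return (std::uint64_t{1} << ring_pow) * 128;
}

// Default RING_POW = 17: 131072 cells, about 16 MiB per ring.
static_assert(ring_bytes(17) == 16u * 1024 * 1024, "16 MiB at RING_POW 17");

// Dropping to RING_POW = 16 halves memory, but a full ring then forces
// allocating and linking a new one twice as often.
static_assert(ring_bytes(16) == 8u * 1024 * 1024, "8 MiB at RING_POW 16");
```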
## Usage Example

### C++

```cpp
#include <boost/interprocess/managed_shared_memory.hpp>
#include "SCRQueue.hpp"

using namespace boost::interprocess;

int main() {
    // Create shared memory
    shared_memory_object::remove("MySharedMemory");
    managed_shared_memory segment(create_only, "MySharedMemory", 65536);

    // Create queue
    SCRQueue<int>* queue = segment.construct<SCRQueue<int>>("queue")(
        &segment, &segment, 4 /* num_threads */
    );

    // Thread 0: Enqueue
    int* item = segment.construct<int>(anonymous_instance)(42);
    queue->enqueue(item, 0);

    // Thread 1: Dequeue (will block if empty)
    int* result = queue->dequeue(1);

    return 0;
}
```

### C

```c
#include "ELCRQ.h"

int main() {
    ELCRQ queue;
    init_queue(&queue);

    // Thread 0: Enqueue
    enqueue(42, 0, &queue);

    // Thread 1: Dequeue (blocks if empty)
    Object result = dequeue(1, &queue);

    return 0;
}
```

## References

- LCRQ Paper: Morrison, A., & Afek, Y. (2013). Fast concurrent queues for x86 processors. PPoPP '13.
- Event-Counts
- Hazard Pointers: Michael, M. M. (2004). Hazard pointers: Safe memory reclamation for lock-free objects.
- Shared Memory (C)
## License

This project is licensed under the BSD 3-Clause License; see the copyright notice in the source files for details.
Copyright (c) 2013, Adam Morrison and Yehuda Afek.
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
...
## Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.
- Follow existing code style
- Add tests for new functionality
- Update documentation as needed
- Ensure changes compile on Linux x86-64
