
Arm #1

Merged
snigenigmatic merged 2 commits into main from arm
Feb 4, 2026

Conversation

@snigenigmatic
Owner

No description provided.

Copilot AI review requested due to automatic review settings February 4, 2026 10:00

Copilot AI left a comment

Pull request overview

This PR aims to improve portability (notably ARM support) by replacing x86_64-specific cycle timing with a platform-agnostic timer, and documenting ARM benchmark results.

Changes:

  • Replaced rdtsc() usage in the dispatcher with now_ns() (steady-clock nanoseconds).
  • Added ARM benchmark results to the README.
  • Adjusted CI to trigger on pushes to the arm branch.
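For context, a `now_ns()` helper of the kind described in the first bullet could look like the sketch below. This is an assumption about its shape, not the code from this PR: only the name `now_ns()` and the use of a steady clock from `std::chrono` are stated in the review.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical sketch; the actual now_ns() in include/orderbook/timing.hpp
// may differ in signature or details.
inline std::uint64_t now_ns()
{
    using namespace std::chrono;
    // steady_clock is monotonic, so the difference between two reads
    // is never negative, which makes it suitable for interval timing.
    return static_cast<std::uint64_t>(
        duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count());
}
```

Because it relies only on the standard library, a helper like this compiles on both x86_64 and ARM. The trade-off is that the result is wall-clock nanoseconds rather than CPU cycles.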

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/dispatcher.cpp | Switches latency measurement from rdtsc() to now_ns(). |
| include/orderbook/timing.hpp | Introduces now_ns() using std::chrono; leaves old rdtsc code commented out. |
| README.md | Adds an ARM results section and labels x64 results. |
| .github/workflows/ci-cd.yml | Triggers workflow on pushes to arm branch in addition to main. |
Comments suppressed due to low confidence (1)

.github/workflows/ci-cd.yml:17

  • CI is now triggered for pushes to the arm branch, but the job matrix still only targets ubuntu-latest/windows-latest runners (typically x86_64). If the goal is to validate ARM support, consider adding an ARM build/test job (e.g., an ARM runner or a cross-compile step) so changes on arm are actually exercised on the intended architecture.
    branches: [ "main" , "arm"]
  pull_request:
    branches: [ "main" ]

jobs:
  build:
    name: ${{ matrix.os }} - ${{ matrix.build_type }}
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest, windows-latest]
        build_type: [Release]

Comment on lines 11 to 36
@@ -31,7 +31,7 @@ void Dispatcher::run(
 break;
 }

-uint64_t end = rdtsc();
+uint64_t end = now_ns();
 recorder.record(end - start);
 }

Copilot AI Feb 4, 2026

now_ns() returns elapsed time in nanoseconds, but the value is being recorded via LatencyRecorder::record(uint64_t cycles) and later reported as "Latency (cycles)". This makes the benchmark output units incorrect (and likely misleading). Consider either renaming the recorder API/output to nanoseconds (or generic "ticks") or using a true cycle counter on each architecture and keeping the cycle-based reporting consistent.
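One way to resolve the unit mismatch described above is to make the unit explicit in both the recorder API and the printed label. The sketch below is a hypothetical stand-in for the project's `LatencyRecorder`; the class name `NsLatencyRecorder`, the nearest-rank percentile method, and the `report()` format are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical recorder whose parameter name and report label both say
// "nanoseconds", so samples from now_ns() cannot be mistaken for cycles.
class NsLatencyRecorder {
public:
    void record(std::uint64_t nanoseconds) { samples_.push_back(nanoseconds); }

    // Nearest-rank percentile for p in (0, 100]. Returns 0 when no samples exist.
    std::uint64_t percentile(double p) const
    {
        if (samples_.empty()) return 0;
        std::vector<std::uint64_t> sorted(samples_);
        std::sort(sorted.begin(), sorted.end());
        const double n = static_cast<double>(sorted.size());
        std::size_t rank = static_cast<std::size_t>(std::ceil(p / 100.0 * n));
        if (rank < 1) rank = 1;
        if (rank > sorted.size()) rank = sorted.size();
        return sorted[rank - 1];
    }

    // Prints the same percentiles as the README, with the unit and timing source.
    void report() const
    {
        std::printf("Latency (ns, std::chrono::steady_clock)\n");
        std::printf("p50: %llu\n", static_cast<unsigned long long>(percentile(50.0)));
        std::printf("p90: %llu\n", static_cast<unsigned long long>(percentile(90.0)));
        std::printf("p99: %llu\n", static_cast<unsigned long long>(percentile(99.0)));
        std::printf("p99.9: %llu\n", static_cast<unsigned long long>(percentile(99.9)));
    }

private:
    std::vector<std::uint64_t> samples_;
};
```

The dispatcher would then call `recorder.record(end - start)` exactly as before; only the declared unit and the output label change.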

Comment on lines +10 to +28

// #if defined(__x86_64__) || defined(_M_X64)
// #ifdef _MSC_VER
// #include <intrin.h>
// static inline uint64_t rdtsc()
// {
// return __rdtsc();
// }
// #else
// static inline uint64_t rdtsc()
// {
// uint32_t lo, hi;
// asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
// return (static_cast<uint64_t>(hi) << 32) | lo;
// }
// #endif
// #else
// #error "rdtsc only supported on x86_64"
// #endif

Copilot AI Feb 4, 2026

This header leaves the old rdtsc implementation commented out. Keeping large blocks of commented-out code in a public header hurts readability and maintainability. Prefer removing it, or reintroducing it as an active #if/#elif implementation (e.g., use rdtsc on x86_64 and now_ns/ARM counter on other targets).

Suggested change: delete the commented-out rdtsc block quoted above (lines +10 to +28).
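The alternative mentioned in this comment, an active `#if`/`#else` implementation, might look like the sketch below. The names `read_ticks()` and `tick_unit()` are hypothetical and do not appear in the PR. The non-x86 branch falls back to `std::chrono::steady_clock` rather than a true ARM cycle counter, which is an assumption made to keep the example portable.

```cpp
#include <chrono>
#include <cstdint>

#if defined(__x86_64__) || defined(_M_X64)
  #ifdef _MSC_VER
    #include <intrin.h>      // __rdtsc on MSVC
  #else
    #include <x86intrin.h>   // __rdtsc on GCC and Clang
  #endif

// x86_64: read the time-stamp counter, as the original rdtsc() did.
inline std::uint64_t read_ticks() { return __rdtsc(); }
inline const char* tick_unit() { return "cycles"; }

#else

// Other targets (for example ARM): fall back to steady-clock nanoseconds.
inline std::uint64_t read_ticks()
{
    using namespace std::chrono;
    return static_cast<std::uint64_t>(
        duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count());
}
inline const char* tick_unit() { return "ns"; }

#endif
```

Pairing the counter with a unit string lets benchmark output print `Latency (cycles)` or `Latency (ns)` according to the build target, so results from different architectures are not labeled with the wrong unit.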

Comment on lines +41 to +52
Latency (cycles)
p50: 42
p90: 84
p99: 167
p99.9: 1167

Vector order book
Latency (cycles)
p50: 42
p90: 125
p99: 375
p99.9: 625

Copilot AI Feb 4, 2026

The README still labels these results as "Latency (cycles)", but the code now uses std::chrono::steady_clock nanoseconds (via now_ns()). Please update the unit label (and ideally note the timing source) so the published benchmark numbers are interpretable and comparable.

Comment on lines 23 to 35
@@ -34,3 +34,20 @@ Latency (cycles)
p99: 3520
p99.9: 5640

Copilot AI Feb 4, 2026

These benchmark results are labeled "Latency (cycles)", but the timing code has been switched to now_ns() (nanoseconds). Update this label (and any surrounding text) to match the units being recorded so readers don't interpret nanoseconds as CPU cycles.

@snigenigmatic snigenigmatic merged commit 2b51753 into main Feb 4, 2026
10 checks passed