Conversation
There was a problem hiding this comment.
Pull request overview
This PR aims to improve portability (notably ARM support) by replacing x86_64-specific cycle timing with a platform-agnostic timer, and documenting ARM benchmark results.
Changes:
- Replaced
rdtsc()usage in the dispatcher withnow_ns()(steady-clock nanoseconds). - Added ARM benchmark results to the README.
- Adjusted CI to trigger on pushes to the
armbranch.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
src/dispatcher.cpp |
Switches latency measurement from rdtsc() to now_ns(). |
include/orderbook/timing.hpp |
Introduces now_ns() using std::chrono; leaves old rdtsc code commented out. |
README.md |
Adds an ARM results section and labels x64 results. |
.github/workflows/ci-cd.yml |
Triggers workflow on pushes to arm branch in addition to main. |
Comments suppressed due to low confidence (1)
.github/workflows/ci-cd.yml:17
- CI is now triggered for pushes to the
armbranch, but the job matrix still only targetsubuntu-latest/windows-latestrunners (typically x86_64). If the goal is to validate ARM support, consider adding an ARM build/test job (e.g., an ARM runner or a cross-compile step) so changes onarmare actually exercised on the intended architecture.
branches: [ "main" , "arm"]
pull_request:
branches: [ "main" ]
jobs:
build:
name: ${{ matrix.os }} - ${{ matrix.build_type }}
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false
matrix:
os: [ubuntu-latest, windows-latest]
build_type: [Release]
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -31,7 +31,7 @@ void Dispatcher::run( | |||
| break; | |||
| } | |||
|
|
|||
| uint64_t end = rdtsc(); | |||
| uint64_t end = now_ns(); | |||
| recorder.record(end - start); | |||
| } | |||
There was a problem hiding this comment.
now_ns() returns elapsed time in nanoseconds, but the value is being recorded via LatencyRecorder::record(uint64_t cycles) and later reported as "Latency (cycles)". This makes the benchmark output units incorrect (and likely misleading). Consider either renaming the recorder API/output to nanoseconds (or generic "ticks") or using a true cycle counter on each architecture and keeping the cycle-based reporting consistent.
|
|
||
| // #if defined(__x86_64__) || defined(_M_X64) | ||
| // #ifdef _MSC_VER | ||
| // #include <intrin.h> | ||
| // static inline uint64_t rdtsc() | ||
| // { | ||
| // return __rdtsc(); | ||
| // } | ||
| // #else | ||
| // static inline uint64_t rdtsc() | ||
| // { | ||
| // uint32_t lo, hi; | ||
| // asm volatile("rdtsc" : "=a"(lo), "=d"(hi)); | ||
| // return (static_cast<uint64_t>(hi) << 32) | lo; | ||
| // } | ||
| // #endif | ||
| // #else | ||
| // #error "rdtsc only supported on x86_64" | ||
| // #endif |
There was a problem hiding this comment.
This header leaves the old rdtsc implementation commented out. Keeping large blocks of commented-out code in a public header hurts readability and maintainability. Prefer removing it, or reintroducing it as an active #if/#elif implementation (e.g., use rdtsc on x86_64 and now_ns/ARM counter on other targets).
| // #if defined(__x86_64__) || defined(_M_X64) | |
| // #ifdef _MSC_VER | |
| // #include <intrin.h> | |
| // static inline uint64_t rdtsc() | |
| // { | |
| // return __rdtsc(); | |
| // } | |
| // #else | |
| // static inline uint64_t rdtsc() | |
| // { | |
| // uint32_t lo, hi; | |
| // asm volatile("rdtsc" : "=a"(lo), "=d"(hi)); | |
| // return (static_cast<uint64_t>(hi) << 32) | lo; | |
| // } | |
| // #endif | |
| // #else | |
| // #error "rdtsc only supported on x86_64" | |
| // #endif |
| Latency (cycles) | ||
| p50: 42 | ||
| p90: 84 | ||
| p99: 167 | ||
| p99.9: 1167 | ||
|
|
||
| Vector order book | ||
| Latency (cycles) | ||
| p50: 42 | ||
| p90: 125 | ||
| p99: 375 | ||
| p99.9: 625 |
There was a problem hiding this comment.
The README still labels these results as "Latency (cycles)", but the code now uses std::chrono::steady_clock nanoseconds (via now_ns()). Please update the unit label (and ideally note the timing source) so the published benchmark numbers are interpretable and comparable.
| @@ -34,3 +34,20 @@ Latency (cycles) | |||
| p99: 3520 | |||
| p99.9: 5640 | |||
There was a problem hiding this comment.
These benchmark results are labeled "Latency (cycles)", but the timing code has been switched to now_ns() (nanoseconds). Update this label (and any surrounding text) to match the units being recorded so readers don't interpret nanoseconds as CPU cycles.
No description provided.