cpu-o3: using reverse ordered tick & refactor the stalls logic by tastynoob · Pull Request #756 · OpenXiangShan/GEM5

tastynoob · 2026-02-09T11:32:25Z

Change-Id: I9b599a4e0d704215ad1a3bf543dbd075384fe1f4

Summary by CodeRabbit

Refactor
- Major rewrite of pipeline buffering and stall signaling: per-thread buffers consolidated, stage wiring updated, and time-buffered stage flow introduced for more consistent instruction movement.
- Simplified SMT/thread handling and unified stall propagation across fetch, decode, rename, execute and commit stages.
Bug Fixes
- Improved commit/ROB readiness signaling and stall logic to reduce starvation and misordering.
- Safer load/memory handling and clearer diagnostics to aid reliability and debugging.

coderabbitai · 2026-02-09T11:32:46Z

📝 Walkthrough

Walkthrough

Refactors O3 pipeline buffering and stall signaling: introduces a centralized StallSignals type, replaces many per-thread queues with per-thread circular fixed buffers across stages (fetch/decode/rename/iew/commit), rewires stall propagation and stage wiring, reorders CPU tick (commit→iew→rename→decode→fetch), and updates related APIs, logging, and small ROB/LSQ behaviors.
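The reverse-ordered tick (commit→iew→rename→decode→fetch) can be illustrated with a toy model. This is a hypothetical sketch, not the gem5 implementation: `ToyPipeline`, `ToyStallSignals`, `slot`, `block` and `commitStalled` are made-up names, there is a single thread, and each stage holds at most one instruction. Because downstream stages run first, the stall flag a stage reads was computed in the same cycle from its neighbour's up-to-date occupancy. In this model a full pipeline still advances one step per cycle, and a commit stall freezes every stage within the same tick.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Toy model (hypothetical, not gem5): four single-slot stages, index 0 = fetch,
// index kStages - 1 = commit. slot[s] == 0 means the stage is empty.
constexpr int kStages = 4;

struct ToyStallSignals {
    // block[s] is written by stage s + 1 and read by stage s.
    std::array<bool, kStages> block{};
};

struct ToyPipeline {
    std::array<int, kStages> slot{};
    ToyStallSignals sig;
    bool commitStalled = false;  // e.g. ROB head not ready
    int nextId = 1;
    std::vector<int> retired;

    // Reverse-ordered tick: commit first, fetch last.
    void tick()
    {
        for (int s = kStages - 1; s >= 0; --s) {
            if (s == kStages - 1) {
                // Commit retires its instruction unless externally stalled.
                if (slot[s] != 0 && !commitStalled) {
                    retired.push_back(slot[s]);
                    slot[s] = 0;
                }
            } else if (slot[s] != 0 && !sig.block[s]) {
                // Stage s + 1 already ran this cycle, so block[s] == false
                // means its slot is free right now.
                assert(slot[s + 1] == 0);
                slot[s + 1] = slot[s];
                slot[s] = 0;
            }
            if (s == 0 && slot[0] == 0)
                slot[0] = nextId++;  // fetch a new instruction
            // Publish this stage's back-pressure for the upstream stage.
            if (s > 0)
                sig.block[s - 1] = (slot[s] != 0);
        }
    }
};
```

With no stall, eight ticks retire instructions 1 to 4 (the first instruction needs four ticks to reach commit). In a forward-ordered version of the same toy model, each stage would see its downstream slot still occupied and hold back, inserting bubbles.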

Changes

Cohort / File(s)	Summary
Communication Structures `src/cpu/o3/comm.hh`	Introduce new public `StallSignals` struct (per-thread blockFetch/blockDecode/blockRename/blockIEW). Remove scattered boolean stall signals and adjust TimeStruct to reference grouped stall signals.
Commit Stage `src/cpu/o3/commit.hh`, `src/cpu/o3/commit.cc`	Add per-thread `fixedbuffer` (boost::circular_buffer), `StallSignals* stallSig` and setter; replace `getInsts()` with `moveInstsToBuffer()`; change commit buffering/ROB head signaling (add robheadNotReadySeqNum), adjust commit timing and tracing, and shift ROB broadcast behavior.
CPU Controller `src/cpu/o3/cpu.hh`, `src/cpu/o3/cpu.cc`	Replace per-stage queues with `*Timebuffer` variants; add `StallSignals` members and wiring; change tick order to commit→iew→rename→decode→fetch; advance new timebuffers and update drain/erase/squash iterator APIs.
Fetch Stage `src/cpu/o3/fetch.hh`, `src/cpu/o3/fetch.cc`	Replace per-thread stall helpers with centralized `StallSignals* stallSig`; remove SMT selection helpers and use single-thread placeholder (tid=0) in places; simplify cache/stall handling and sendInstructionsToDecode to use stall signals.
Decode Stage `src/cpu/o3/decode.hh`, `src/cpu/o3/decode.cc`	Replace inst/skid queues with per-thread `fixedbuffer` and a `stallBuffer`/`eachstallSize`; rename APIs (e.g., `sortInsts`→`moveInstsToBuffer`, `squash`→`selfSquash`, `updateStatus`→`updateActivate`); centralize stall/activation logic and single-active-thread decode flow; add `setStallSignals`.
Rename Stage `src/cpu/o3/rename.hh`, `src/cpu/o3/rename.cc`	Replace InstQueue/skidBuffer with per-thread `fixedbuffer`; remove many per-thread counters and serialize state; add `setStallSignals()`; rename `sortInsts`→`moveInstsToBuffer` and simplify rename/canRename/activation flow.
IEW Stage `src/cpu/o3/iew.hh`, `src/cpu/o3/iew.cc`	Replace inst/skid buffers with per-thread `fixedbuffer`; add `StallSignals* stallSig` and `serializeOnNextInst`; rename APIs (`updateStatus`→`updateActivate`, `sortInsts`→`moveInstsToBuffer`, etc.); consolidate dispatch/dispatchInsts to fixedbuffer-centric flow.
ROB & Commit Integration `src/cpu/o3/rob.cc`, `src/cpu/o3/rob.hh`	Add `ROB::getHeadGroupLastNotReadySeq(ThreadID)` and rename `findInst` param; used by commit/rob-head gating logic.
Misc: Logging, Guards, Small Fixes `src/cpu/o3/dyn_inst.hh`, `src/cpu/o3/lsq_unit.cc`, `src/cpu/o3/lsq.cc`, `src/cpu/o3/inst_queue.cc`, `src/cpu/o3/issue_queue.cc`, `src/cpu/o3/rename_map.cc`, `src/cpu/o3/regfile.hh`, `src/cpu/reg_class.hh`	Remove `DynInst::isTempSerializeAfter`; adjust debug categories to `Scoreboard`; add safer packet data copying and debug guards; minor panic/assert message tweaks and small refactors.
Header API updates `src/cpu/o3/{commit,fetch,decode,iew,rename}.hh`, `src/cpu/o3/cpu.hh`	Add `setStallSignals(StallSignals*)` to several stages; add `StallSignals` members; change several public method signatures (e.g., `getInsts()`→`moveInstsToBuffer()`, `squashInstIt` return type change, other rename/dispatch method renames).

Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant CPU as CPU (tick)
    participant Commit as Commit
    participant IEW as IEW
    participant Rename as Rename
    participant Decode as Decode
    participant Fetch as Fetch
    participant Stall as StallSignals

    rect rgba(200,200,255,0.5)
    CPU->>Commit: tick -> process commit stage
    Commit->>Stall: read/write robheadNotReadySeqNum / blockRename
    Commit->>IEW: insert retired instructions (from fixedbuffer)
    end

    rect rgba(200,255,200,0.5)
    CPU->>IEW: tick -> process IEW stage
    IEW->>Stall: consult blockRename / blockIEW
    IEW->>Rename: dispatch (moveInstsToBuffer / serialize checks)
    end

    rect rgba(255,220,180,0.5)
    CPU->>Rename: tick -> process rename stage
    Rename->>Stall: consult blockDecode / blockRename
    Rename->>Decode: moveInstsToBuffer (per-thread fixedbuffer)
    end

    rect rgba(255,200,200,0.5)
    CPU->>Decode: tick -> process decode stage
    Decode->>Stall: consult blockFetch / blockDecode
    Decode->>Fetch: request more instructions or stall signal
    end

    rect rgba(220,220,220,0.5)
    CPU->>Fetch: tick -> process fetch stage
    Fetch->>Stall: respect blockFetch signals
    Fetch->>Decode: send instructions into decode fixedbuffer
    end
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

cpu-o3: rename execWB to toCommit wire for instruction squashing #678 — Modifies IEW→Commit signaling and toCommit wire usage; closely related to commit/IEW wiring changes.
support ChampSim and CBP2025 trace simulation #649 — Touches comm.hh TimeStruct/CommitComm layout and stall signal fields similar to this PR.

Suggested reviewers

jensen-yan
happy-lx

Poem

🐰
I hopped through buffers, round and neat,
Herded stall flags, kept the beat.
One signal bundle ties the flow—
Threads align and pipelines go.
A crunchy carrot: smooth commit!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.83% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main changes: reverse-ordered tick execution in CPU and refactoring stall signal handling across multiple pipeline stages.


coderabbitai

Actionable comments posted: 16

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)

src/cpu/o3/fetch.hh (1)

634-641: ⚠️ Potential issue | 🟡 Minor

Remove dead Stalls struct from fetch header; clarify incomplete drain handling.

The Stalls struct at lines 634-638 is unused since stalls[MaxThreads] was replaced by StallSignals* stallSig. Unlike the Rename stage which still actively uses its Stalls struct, fetch only references stallSig with its blockFetch member. The struct definition can be removed.

Additionally, StallSignals provides no drain equivalent, and drainStall() in fetch.cc is an empty stub. This suggests drain handling was intentionally removed or remains incomplete. Clarify whether drain functionality is intentional or needs implementation. The stallSig pointer initialization via setStallSignals() called during CPU construction is safe and does not require nullptr initialization.

src/cpu/o3/commit.cc (1)

1789-1857: ⚠️ Potential issue | 🟠 Major

Protect fixedbuffer intake when prior batch hasn’t drained.

getInsts() assumes the per-thread fixedbuffer is empty whenever rename delivers a new batch. If that invariant ever breaks, the new batch is mixed with older entries. A runtime guard makes this robust even in non-assert builds.
🛠️ Suggested guard

```diff
-    ThreadID tid = insts_from_rename > 0 ? fromRename->insts[0]->threadNumber : -1;
-    if (tid != -1) assert(fixedbuffer[tid].empty());
+    ThreadID tid = insts_from_rename > 0 ? fromRename->insts[0]->threadNumber
+                                         : InvalidThreadID;
+    if (tid != InvalidThreadID && !fixedbuffer[tid].empty()) {
+        stallSig->blockIEW[tid] = true;
+        DPRINTF(Commit, "[tid:%i] Fixedbuffer not drained; deferring rename intake\n", tid);
+        return;
+    }
```

src/cpu/o3/rename.cc (2)

652-675: ⚠️ Potential issue | 🟠 Major

Rename stage never deactivates.

any_unblocking is hardcoded to true, so the stage stays Active even when idle, which prevents the CPU from idling. Compute it from the stall signals and buffer state instead.
🛠️ Suggested fix

```diff
-    bool any_unblocking = true;
+    bool any_unblocking = false;
+    for (ThreadID tid = 0; tid < numThreads; ++tid) {
+        if (!stallSig->blockDecode[tid] && !fixedbuffer[tid].empty()) {
+            any_unblocking = true;
+            break;
+        }
+    }
```
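The suggested check can be sketched as a free function so it is testable in isolation. `anyUnblocking`, `kMaxThreads` and the `std::array`/`std::deque` containers are hypothetical stand-ins (gem5 uses `MaxThreads`, raw `bool` arrays and `DynInstPtr` buffers). The caller passes whichever per-thread stall array governs the stage; per the comm.hh comments, `blockRename` is IEW's stall toward rename and `blockDecode` is rename's stall toward decode.

```cpp
#include <array>
#include <cassert>
#include <deque>

constexpr int kMaxThreads = 4;  // stand-in for gem5's MaxThreads

// Hypothetical helper: the stage has useful work this cycle only if at least
// one thread is not stalled and still has buffered instructions.
bool anyUnblocking(const std::array<bool, kMaxThreads> &blocked,
                   const std::array<std::deque<int>, kMaxThreads> &buffers,
                   int numThreads)
{
    assert(numThreads > 0 && numThreads <= kMaxThreads);
    for (int tid = 0; tid < numThreads; ++tid) {
        if (!blocked[tid] && !buffers[tid].empty())
            return true;
    }
    return false;
}
```

An idle stage (all buffers empty), or one whose only non-empty thread is stalled, then reports no activity, which is what allows the CPU to deactivate it.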

477-606: ⚠️ Potential issue | 🟡 Minor

Fix DPRINTF format/argument mismatch causing undefined behavior.

The format string at lines 501-502 has 2 specifiers (%llu, %s) but 3 arguments are passed (tid, inst->seqNum, inst->pcState()). The tid argument is missing from the format string, which causes the printf-style formatter to misalign arguments and trigger undefined behavior.
Suggested fix

```diff
-            DPRINTF(Rename, "[sn:%llu] instruction  with PC %s is squashed, skipping.\n",
-                    tid, inst->seqNum, inst->pcState());
+            DPRINTF(Rename, "[tid:%i] [sn:%llu] instruction with PC %s is squashed, skipping.\n",
+                    tid, inst->seqNum, inst->pcState());
```

src/cpu/o3/decode.cc (1)

336-371: ⚠️ Potential issue | 🟡 Minor

updateActivate is effectively never called.

status_change is never set in tick(), so updateActivate() won’t reflect block/unblock transitions. Call it unconditionally or compute status_change from stall-signal changes.
🛠️ Minimal fix

```diff
-    if (status_change) {
-        updateActivate();
-    }
+    updateActivate();
```

Also applies to: 464-469
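If updateActivate() should run only on real transitions rather than every cycle, status_change can be derived by comparing this cycle's stall flags with the previous cycle's. A hypothetical sketch; `StallTransitionTracker` and `kMaxThreads` are illustrative names, not gem5 APIs:

```cpp
#include <array>
#include <cassert>

constexpr int kMaxThreads = 4;  // stand-in for gem5's MaxThreads

// Remembers last cycle's per-thread block flags and reports whether any
// thread flipped between blocked and unblocked since the previous call.
class StallTransitionTracker
{
  public:
    bool update(const bool (&blocked)[kMaxThreads], int numThreads)
    {
        assert(numThreads > 0 && numThreads <= kMaxThreads);
        bool changed = false;
        for (int tid = 0; tid < numThreads; ++tid) {
            if (blocked[tid] != prev_[tid]) {
                changed = true;
                prev_[tid] = blocked[tid];
            }
        }
        return changed;
    }

  private:
    std::array<bool, kMaxThreads> prev_{};  // all threads start unblocked
};
```

In decode's tick(), the return value could be assigned to status_change ahead of the existing `if (status_change)` check, keeping activation updates tied to actual block/unblock edges.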

src/cpu/o3/iew.hh (1)

556-559: ⚠️ Potential issue | 🟡 Minor

Typo in comment: "fetrenamech" should be "rename".
Proposed fix

```diff
-        /** Distribution of number of fetrenamech stall reasons each tick. */
+        /** Distribution of number of rename stall reasons each tick. */
```

🤖 Fix all issues with AI agents

In `@src/cpu/o3/comm.hh`:
- Around line 328-335: The StallSignals struct's bool arrays (blockFetch,
blockDecode, blockRename, blockIEW) are left uninitialized causing undefined
behavior; update StallSignals to value-initialize these arrays (e.g., add
default member initializers or a constructor that sets all elements to false) so
every entry for MaxThreads is explicitly false on construction, ensuring any
reader sees deterministic values.

In `@src/cpu/o3/commit.hh`:
- Around line 179-182: Initialize the raw pointer member stallSig to nullptr in
the commit class declaration and add a direct include for
<boost/circular_buffer.hpp> to this header; specifically, in commit.hh set
`StallSignals* stallSig = nullptr;` (so the uses of stallSig in commit.cc are safe)
and add `#include <boost/circular_buffer.hpp>` near the other includes so
fixedbuffer (boost::circular_buffer<DynInstPtr> fixedbuffer[MaxThreads]) no
longer relies on a transitive include.

In `@src/cpu/o3/cpu.hh`:
- Around line 529-546: Remove the redundant member declaration stallSig from the
class (it duplicates the existing StallSignals stallSignals and is never used);
delete the line declaring StallSignals stallSig; and ensure no other code relies
on that member (the stages use setStallSignals() and their own stallSig
members), then rebuild to verify there are no references to the removed symbol.

In `@src/cpu/o3/decode.cc`:
- Around line 112-115: Decode::clearStates currently does nothing but must purge
any per-thread buffered state for the given ThreadID to avoid stale entries when
TIDs are reused; implement it to remove/clear entries belonging to tid from
Decode's per-thread containers (e.g., fixedbuffer and stallBuffer) and reset any
other per-thread bookkeeping used by Decode so removeThread() no longer leaves
stale instructions. Locate Decode::clearStates and ensure it clears
fixedbuffer[tid] and walks stallBuffer, erasing elements whose ThreadID matches
the tid argument (and clearing any related per-thread maps/counters), leaving
other threads' data intact.

In `@src/cpu/o3/decode.hh`:
- Around line 186-187: The member pointer stallSig is declared uninitialized
(StallSignals* stallSig;) and should be set to a null pointer to avoid
indeterminate-value access before setStallSignals is called; initialize stallSig
to nullptr at declaration or assign nullptr in the class constructor (the class
that contains stallSig and the setStallSignals method) so all code paths see a
well-defined value until setStallSignals populates it.

In `@src/cpu/o3/fetch.cc`:
- Around line 1294-1325: The code hardcodes ThreadID tid = 0 which disables SMT;
replace this with the proper thread selection logic (use selectFetchThread() or
the same selection used by sendInstructionsToDecode()) so tid is set to the
chosen thread before checking stallSig->blockFetch[tid] and pulling from
fetchQueue[tid]; if SMT selection is not yet implemented, add a fail-fast
(e.g. `fatal_if`) when numThreads > 1 to avoid silent starvation, and document that
selectFetchThread()/sendInstructionsToDecode() must be used to pick tid.

In `@src/cpu/o3/iew.cc`:
- Around line 820-837: The loop currently sets stallSig->blockRename for both
threads when multiple active threads are found (using tid,
stallSig->blockRename, fixedbuffer, canInsertLDSTQue), which stalls all SMT
threads; change it to allow exactly one active thread to proceed (e.g., pick a
single thread to keep blockRename=false and set blockRename=true for all other
active threads) and implement a simple selection policy (round-robin using a
persistent nextThread index or a priority tie-breaker) so that when scanning
fixedbuffer[] you assign tid to the chosen thread and mark every other active
thread's stallSig->blockRename true instead of blocking both.
- Line 429: The IEW→Commit timing is broken because toCommit is reading
iewQueue->getWire(0) while Commit expects the data at offset -iewToCommitDelay;
restore the intended zero-latency alignment by changing the producer read to use
the negative delay: set toCommit = iewQueue->getWire(-iewToCommitDelay)
(symbols: toCommit, iewQueue, getWire, iewToCommitDelay, fromIEW, Commit) so
both IEW and Commit use the same buffer offset convention.
- Around line 798-811: The issue is that canInsertLDSTQue(ThreadID) performs
side-effectful calls getAndResetLastLQPopEntries/getAndResetLastSQPopEntries
which are only executed when stallSig->blockIEW[i] is true (due to short-circuit
&&), so counters aren't reset otherwise; fix by invoking the reset logic
unconditionally and then using its result in the block condition: call a new or
existing getter that performs
getAndResetLastLQPopEntries/getAndResetLastSQPopEntries (or call
canInsertLDSTQue(i) and store its bool result) before computing bool block =
stallSig->blockIEW[i] && !... so that the reset side-effects always run, then
use the stored boolean in the original condition (or refactor canInsertLDSTQue
into pure check + separate reset function and call reset unconditionally in
dispatchInsts()).
- Around line 724-748: When detecting a squash in the loop
(fromCommit->commitInfo[i].squash) clear fixedbuffer[i] before calling squash(i)
and returning so any stale instructions are discarded; specifically, move or add
fixedbuffer[i].clear() immediately when fromCommit->commitInfo[i].squash is true
(before squash(i) and the return) because IEW::squash() does not clear the
buffer unlike decode/rename; keep the existing clear for the robSquashing path
unchanged.
- Around line 118-119: The postfix increment and decrement operators for the
wire class are returning references to temporaries; change the signatures of
wire::operator++(int) and wire::operator--(int) to return by value (wire)
instead of wire& and update their definitions accordingly so they return the
temporary (wire(this, i)) by value; ensure both the declaration in the class and
the out-of-line definitions (operator++(int) and operator--(int)) are adjusted
to match the new return type.
- Around line 755-759: The assert fixedbuffer[tid].empty() can fire legitimately
when dispatch stalls; change it to tolerate non-empty buffers by replacing the
unconditional assert with a conditional that either clears retained instructions
or asserts only if blockRename[tid] is false. Specifically, in the block
handling insts_from_rename/fromRename, replace "if (tid != -1)
assert(fixedbuffer[tid].empty());" with logic that: if tid != -1 and
fixedbuffer[tid] is not empty, then either call a clearing/rollback helper to
remove the newly moved instructions (so moveInstsToBuffer() leaves no leftovers)
or assert(!blockRename[tid]) (i.e., only assert emptiness when blockRename[tid]
is not set); reference functions/fields: insts_from_rename, fromRename->insts,
fixedbuffer, IEW::tick(), dispatchInsts(), blockRename, moveInstsToBuffer().

In `@src/cpu/o3/iew.hh`:
- Line 157: Initialize the raw pointer StallSignals* stallSig to nullptr (e.g.,
in IEW's member initializer or declaration) and update uses to check/assert
before dereferencing: add a null-check or assertion at the start of methods that
use stallSig (notably squash and squashDueToBranch) and in any code paths that
might run before setStallSignals() is called; ensure setStallSignals() still
assigns the pointer when available.
- Line 51: Remove the unused boost include: delete the line with
`#include <boost/circular_buffer.hpp>` from iew.hh because this header is not referenced in
this file (the container used is fixedbuffer declared as
std::deque<DynInstPtr>); ensure no other symbols in iew.hh depend on
boost::circular_buffer after removal and run a quick build to confirm no missing
includes.

In `@src/cpu/o3/regfile.hh`:
- Around line 423-427: The debug print in the VecPredRegClass case uses the
wrong regClass instance: replace the call to
vectorRegFile.regClass.valString(val) with
vecPredRegFile.regClass.valString(val) so the DPRINTF reflects the same register
file that vecPredRegFile.set(idx, val) writes to; update the VecPredRegClass
case handling (the DPRINTF line) to use vecPredRegFile.regClass.valString(val).

In `@src/cpu/o3/rename.cc`:
- Around line 389-410: The bug is that releaseSeq is computed using
historyBuffer->empty() which only checks thread 0; in Rename::releasePhysRegs
you must check the current thread’s history buffer (use
historyBuffer[tid].empty()) and, when non-empty, use
historyBuffer[tid].back().instSeqNum to set releaseSeq; update the assignment
inside the loop where fromCommit->commitInfo[tid].doneSeqNum is handled so it
references historyBuffer[tid] instead of historyBuffer.
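For the IEW item about SMT dispatch (letting exactly one active thread proceed instead of blocking all of them), a round-robin pick could look like the following. This is a hypothetical, standalone sketch: `RoundRobinPicker`, `blockAllButChosen`, `kMaxThreads` and `kInvalidThread` are illustrative stand-ins for gem5's ThreadID/InvalidThreadID machinery, and `hasWork[tid]` stands for "fixedbuffer[tid] is non-empty and the thread can otherwise dispatch".

```cpp
#include <array>
#include <cassert>

constexpr int kMaxThreads = 4;      // stand-in for gem5's MaxThreads
constexpr int kInvalidThread = -1;  // stand-in for InvalidThreadID

// Round-robin arbiter: start searching at nextThread_, return the first
// thread with work, and resume just after it on the next call.
class RoundRobinPicker
{
  public:
    explicit RoundRobinPicker(int numThreads) : numThreads_(numThreads)
    {
        assert(numThreads > 0 && numThreads <= kMaxThreads);
    }

    int pick(const std::array<bool, kMaxThreads> &hasWork)
    {
        for (int i = 0; i < numThreads_; ++i) {
            int tid = (nextThread_ + i) % numThreads_;
            if (hasWork[tid]) {
                nextThread_ = (tid + 1) % numThreads_;
                return tid;
            }
        }
        return kInvalidThread;  // nothing to dispatch this cycle
    }

  private:
    int numThreads_;
    int nextThread_ = 0;
};

// Stall every thread that has work except the chosen one; idle threads are
// left unblocked.
void blockAllButChosen(int chosen, const std::array<bool, kMaxThreads> &hasWork,
                       std::array<bool, kMaxThreads> &blockRename, int numThreads)
{
    for (int tid = 0; tid < numThreads; ++tid)
        blockRename[tid] = hasWork[tid] && tid != chosen;
}
```

Advancing `nextThread_` past the winner is what keeps one busy thread from starving the others; a priority policy (for example, oldest instruction first) would only need a different `pick()`.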

🧹 Nitpick comments (5)

src/cpu/o3/commit.cc (1)

168-187: Prefer fatal_if for the renameToROBDelay invariant.

This constraint is fundamental to the new buffering path; enforcing it in release builds avoids silent misconfigurations.
🔧 Suggested change

```diff
-    assert(renameToROBDelay == 1);
+    fatal_if(renameToROBDelay != 1,
+             "renameToROBDelay must be 1 for O3 commit (got %d).",
+             renameToROBDelay);
```

src/cpu/o3/rename.cc (1)

76-84: Prefer fatal_if for the decodeToRenameDelay invariant.

This is a hard requirement for the refactor; enforcing it in release builds avoids silent misconfigurations.
🔧 Suggested change

```diff
-    assert(decodeToRenameDelay == 1);
+    fatal_if(decodeToRenameDelay != 1,
+             "decodeToRenameDelay must be 1 for O3 rename (got %d).",
+             decodeToRenameDelay);
```
src/cpu/o3/iew.hh (1)

391-391: Consider renaming fixedbuffer to something more descriptive.

The name fixedbuffer is vague — it doesn't convey that this is the per-thread dispatch buffer between rename and IEW. A name like renameBuffer or dispatchBuffer would improve readability and align with the role described in the AI summary. Also, std::deque is not "fixed" in any meaningful sense.
src/cpu/o3/iew.cc (2)
896-901: Commented-out serialize stall check should be removed or restored.

This entire block is commented out with no explanation. If the serialize handling is intentionally removed as part of this refactor, delete the dead code. If it's temporarily disabled for debugging, add a TODO or FIXME comment explaining why.

1650-1653: Tick ordering: moveInstsToBuffer() before checkSquash() may process instructions that should be squashed.

In the reverse-ordered tick, moveInstsToBuffer() runs first (line 1651), populating fixedbuffer with instructions from rename. Then checkSquash() runs (line 1652), potentially discovering a squash from commit. As noted in a separate comment, checkSquash doesn't clear fixedbuffer on the squash path. Even if it did, the ordering means one wasted cycle of work moving instructions that will immediately be discarded. Consider swapping the order so checkSquash() runs first.
Proposed reordering

```diff
     // dispatch
-    moveInstsToBuffer();
     checkSquash();
+    moveInstsToBuffer();
     dispatchInsts();
```

coderabbitai · 2026-02-09T11:44:45Z

src/cpu/o3/comm.hh

```diff
+struct StallSignals
+{
+
+    bool blockFetch[MaxThreads];// decode to fetch
+    bool blockDecode[MaxThreads];// rename to decode
+    bool blockRename[MaxThreads];// iew to rename (if iew is stalling, rename all threads would be stalled)
+    bool blockIEW[MaxThreads];// commit to iew
 };
```


⚠️ Potential issue | 🟠 Major

StallSignals members are left uninitialized by default — risk of undefined behavior.

The bool arrays have no default member initializers and no constructor, so they contain indeterminate values until explicitly written. Any stage that reads a stall signal before the first write will observe undefined behavior.

Add default member initializers or a constructor:

🛡️ Proposed fix — value-initialize all arrays

```diff
 struct StallSignals
 {
-
-    bool blockFetch[MaxThreads];// decode to fetch
-    bool blockDecode[MaxThreads];// rename to decode
-    bool blockRename[MaxThreads];// iew to rename (if iew is stalling, rename all threads would be stalled)
-    bool blockIEW[MaxThreads];// commit to iew
+    bool blockFetch[MaxThreads]{}; // decode to fetch
+    bool blockDecode[MaxThreads]{}; // rename to decode
+    bool blockRename[MaxThreads]{}; // iew to rename
+    bool blockIEW[MaxThreads]{}; // commit to iew
 };
```
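A standalone check of the proposed fix: with the `{}` default member initializers, every flag is false whether the object lives on the stack or the heap, so no stage can read an indeterminate value before the first write. `MaxThreads` here is a local stand-in for gem5's constant, and `clearThread()` is a hypothetical convenience helper that is not part of the PR.

```cpp
#include <cassert>

constexpr int MaxThreads = 4;  // local stand-in for gem5's MaxThreads

struct StallSignals
{
    // `{}` value-initializes each array, so every element starts as false,
    // even when the enclosing object is only default-initialized.
    bool blockFetch[MaxThreads]{};   // decode to fetch
    bool blockDecode[MaxThreads]{};  // rename to decode
    bool blockRename[MaxThreads]{};  // iew to rename
    bool blockIEW[MaxThreads]{};     // commit to iew

    // Hypothetical helper: drop every stall raised against one thread.
    void clearThread(int tid)
    {
        assert(tid >= 0 && tid < MaxThreads);
        blockFetch[tid] = blockDecode[tid] = false;
        blockRename[tid] = blockIEW[tid] = false;
    }
};
```

Without the `{}` initializers, `StallSignals s;` at block scope or `new StallSignals` would leave the flags indeterminate.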


coderabbitai · 2026-02-09T11:44:46Z

src/cpu/o3/commit.hh

```diff
+    boost::circular_buffer<DynInstPtr> fixedbuffer[MaxThreads];
+
+    StallSignals* stallSig;
+
```


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Check includes in commit.hh
echo "=== Includes in commit.hh (lines 44-69) ==="
sed -n '44,69p' src/cpu/o3/commit.hh
echo ""
echo "=== Lines around 179-182 in commit.hh ==="
sed -n '175,185p' src/cpu/o3/commit.hh
echo ""
echo "=== Check for stallSig declaration and initialization ==="
rg -n 'stallSig' src/cpu/o3/commit.hh src/cpu/o3/commit.cc
echo ""
echo "=== Commit constructor signature and initialization ==="
rg -n -A 50 'Commit::Commit' src/cpu/o3/commit.cc | head -80
```

Repository: OpenXiangShan/GEM5

Length of output: 5937

🏁 Script executed:

```bash
#!/bin/bash
# Check fixedbuffer initialization in constructor body
echo "=== Search for fixedbuffer initialization ==="
rg -n -B 5 -A 5 'fixedbuffer' src/cpu/o3/commit.cc
echo ""
echo "=== Check full constructor initialization list ==="
sed -n '103,200p' src/cpu/o3/commit.cc | grep -E '(fixedbuffer|stallSig|:.*\()'
```

Repository: OpenXiangShan/GEM5

Length of output: 2935

Initialize stallSig to nullptr and add direct include for boost::circular_buffer.

The raw pointer stallSig is declared uninitialized and later used directly in multiple places (lines 1807, 1813, 1814, 1855 in commit.cc) without null-checks. It should be initialized to nullptr.

Also, <boost/circular_buffer.hpp> is not directly included here—only transitively available via decode.hh. Adding a direct include would be more robust.

Note: fixedbuffer is already properly initialized with capacity in the constructor (line 186: fixedbuffer[tid] = boost::circular_buffer<DynInstPtr>(renameWidth)), so no changes needed there.

Proposed fix

```diff
     boost::circular_buffer<DynInstPtr> fixedbuffer[MaxThreads];
 
-    StallSignals* stallSig;
+    StallSignals* stallSig = nullptr;
```

And add to the includes section:

```cpp
#include <boost/circular_buffer.hpp>
```


coderabbitai · 2026-02-09T11:44:46Z

src/cpu/o3/cpu.hh

```diff
+    StallSignals stallSignals;
+
     /** The main time buffer to do backwards communication. */
     TimeBuffer<TimeStruct> timeBuffer;
 
     /** The fetch stage's instruction queue. */
-    TimeBuffer<FetchStruct> fetchQueue;
+    TimeBuffer<FetchStruct> fetchTimebuffer;
 
     /** The decode stage's instruction queue. */
-    TimeBuffer<DecodeStruct> decodeQueue;
+    TimeBuffer<DecodeStruct> decodeTimebuffer;
 
     /** The rename stage's instruction queue. */
-    TimeBuffer<RenameStruct> renameQueue;
+    TimeBuffer<RenameStruct> renameTimebuffer;
 
     /** The IEW stage's instruction queue. */
-    TimeBuffer<IEWStruct> iewQueue;
+    TimeBuffer<IEWStruct> iewTimebuffer;
+
+    StallSignals stallSig;
```


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Check which StallSignals member is actually used in cpu.cc
echo "=== StallSignals usage in cpu.cc ==="
rg -n 'stallSignals|stallSig|setStallSignals' --type=cpp -g '*cpu.cc'
echo ""
echo "=== StallSignals declarations in cpu.hh ==="
rg -n 'StallSignals' --type=cpp -g '*cpu.hh'
```

Repository: OpenXiangShan/GEM5

Length of output: 566

🏁 Script executed:

```bash
#!/bin/bash
# Search entire codebase for stallSig usage
echo "=== All references to stallSig in the entire codebase ==="
rg -n 'stallSig' --type=cpp
echo ""
echo "=== All references to stallSignals in the entire codebase ==="
rg -n 'stallSignals' --type=cpp | head -30
```

Repository: OpenXiangShan/GEM5

Length of output: 4386

Remove unused stallSig member on line 546.

The stallSig member declared at line 546 is unused dead code. All stages correctly receive stallSignals (line 529) via setStallSignals() in cpu.cc (lines 199–203). Each stage stores this pointer in their own local stallSig member; the CPU's duplicate stallSig is never accessed. Remove line 546 to eliminate confusion about which member serves as the shared stall signal object.


coderabbitai · 2026-02-09T11:44:46Z

src/cpu/o3/decode.cc

```diff
 Decode::clearStates(ThreadID tid)
 {
-    decodeStatus[tid] = Idle;
-    stalls[tid].rename = false;
+
 }
```


⚠️ Potential issue | 🟠 Major

clearStates should purge per-thread buffers.

removeThread() relies on decode.clearStates(), but this is now empty; stale instructions can remain in fixedbuffer/stallBuffer if the TID is reused.

🛠️ Suggested fix

```diff
 void Decode::clearStates(ThreadID tid)
 {
-
+    fixedbuffer[tid].clear();
+
+    auto delIt = stallBuffer.begin();
+    for (auto it0 = eachstallSize.begin(); it0 != eachstallSize.end();) {
+        int size = *it0;
+        auto start_it = delIt;
+        auto end_it = start_it + size;
+        if ((*start_it)->threadNumber == tid) {
+            delIt = stallBuffer.erase(start_it, end_it);
+            it0 = eachstallSize.erase(it0);
+        } else {
+            delIt = end_it;
+            ++it0;
+        }
+    }
 }
```
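To check the index bookkeeping in the suggested fix, here is a standalone version over `std::deque`. It assumes, as the suggestion implies, that `stallBuffer` stores stalled groups back to back, that `eachstallSize` records each group's length, and that every group belongs to a single thread; `InstSketch` and `purgeThread` are hypothetical names. Unlike the suggestion, it checks for an empty group before dereferencing the group's first entry.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical stand-in for a DynInstPtr: only the fields the purge needs.
struct InstSketch {
    int threadNumber;
    long seqNum;
};

// Erase every stalled group that belongs to tid, keeping stallBuffer and
// eachstallSize in sync.
void purgeThread(std::deque<InstSketch> &stallBuffer,
                 std::deque<int> &eachstallSize, int tid)
{
    // Precondition: the recorded group sizes cover the buffer exactly.
    std::size_t total = 0;
    for (int s : eachstallSize)
        total += static_cast<std::size_t>(s);
    assert(total == stallBuffer.size());

    auto instIt = stallBuffer.begin();
    for (auto sizeIt = eachstallSize.begin(); sizeIt != eachstallSize.end();) {
        const int size = *sizeIt;
        auto groupEnd = instIt + size;
        if (size > 0 && instIt->threadNumber == tid) {
            // erase() returns a valid iterator to the element after the group.
            instIt = stallBuffer.erase(instIt, groupEnd);
            sizeIt = eachstallSize.erase(sizeIt);
        } else {
            instIt = groupEnd;
            ++sizeIt;
        }
    }
}
```

`std::deque::erase` in the middle of the container invalidates other iterators into it, which is why the loop always continues from the iterator that `erase` returns rather than from one saved earlier.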


coderabbitai · 2026-02-09T11:44:46Z

src/cpu/o3/decode.hh

```diff
+    StallSignals* stallSig;
+
```


⚠️ Potential issue | 🟡 Minor

stallSig pointer should be initialized to nullptr.

Without initialization, stallSig holds an indeterminate value until setStallSignals is called, risking undefined behavior on any early access.

Proposed fix

```diff
-    StallSignals* stallSig;
+    StallSignals* stallSig = nullptr;
```

📝 Committable suggestion


Suggested change

StallSignals* stallSig;

StallSignals* stallSig = nullptr;

🤖 Prompt for AI Agents

In `@src/cpu/o3/decode.hh` around lines 186 - 187, The member pointer stallSig is declared uninitialized (StallSignals* stallSig;) and should be set to a null pointer to avoid indeterminate-value access before setStallSignals is called; initialize stallSig to nullptr at declaration or assign nullptr in the class constructor (the class that contains stallSig and the setStallSignals method) so all code paths see a well-defined value until setStallSignals populates it.


coderabbitai · 2026-02-09T11:44:46Z

src/cpu/o3/iew.hh

 #include <set>
 #include <vector>

+#include <boost/circular_buffer.hpp>


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if boost::circular_buffer is used anywhere in iew.hh or iew.cc
rg -n 'circular_buffer' src/cpu/o3/iew.hh src/cpu/o3/iew.cc

Repository: OpenXiangShan/GEM5

Length of output: 119

🏁 Script executed:

#!/bin/bash
# Check if circular_buffer is used anywhere else in the o3 directory or related files
rg -n 'circular_buffer' src/cpu/o3/ --type h --type cc
# Also verify fixedbuffer declaration to confirm it's std::deque
rg -n 'fixedbuffer' src/cpu/o3/iew.hh | head -5

Repository: OpenXiangShan/GEM5

Length of output: 161

🏁 Script executed:

#!/bin/bash
# Search for circular_buffer more broadly to ensure it's not used anywhere
rg 'circular_buffer' src/cpu/o3/

Repository: OpenXiangShan/GEM5

Length of output: 1260

Remove unused #include <boost/circular_buffer.hpp>.

fixedbuffer is declared as std::deque<DynInstPtr> (line 391) in this file. While boost::circular_buffer is used in other o3 header files (rename.hh, decode.hh, lsq_unit.hh, commit.hh), it is not used in iew.hh. This include is a leftover and should be removed.

Proposed fix

-#include <boost/circular_buffer.hpp>
-

📝 Committable suggestion


Suggested change

#include <boost/circular_buffer.hpp>

🤖 Prompt for AI Agents

In `@src/cpu/o3/iew.hh` at line 51, Remove the unused boost include: delete the line with `#include <boost/circular_buffer.hpp>` from iew.hh because this header is not referenced in this file (the container used is fixedbuffer declared as std::deque<DynInstPtr>); ensure no other symbols in iew.hh depend on boost::circular_buffer after removal and run a quick build to confirm no missing includes.

coderabbitai · 2026-02-09T11:44:47Z

src/cpu/o3/iew.hh

    ProbePointArg<DynInstPtr> *ppToCommit;

-    bool disp_stall = false;
+    StallSignals* stallSig;


⚠️ Potential issue | 🟠 Major

stallSig raw pointer is not initialized in the constructor.

stallSig is a raw StallSignals* with no default initializer. If any code path in IEW executes before setStallSignals() is called (e.g., during squash or squashDueToBranch), dereferencing stallSig reads an indeterminate pointer, which is undefined behavior. Consider initializing it to nullptr here, so an early access can be caught by a well-defined null check, and adding a guard or assertion before first use.

Proposed fix

-    StallSignals* stallSig;
+    StallSignals* stallSig = nullptr;

📝 Committable suggestion


Suggested change

StallSignals* stallSig;

StallSignals* stallSig = nullptr;

🤖 Prompt for AI Agents

In `@src/cpu/o3/iew.hh` at line 157, Initialize the raw pointer StallSignals* stallSig to nullptr (e.g., in IEW's member initializer or declaration) and update uses to check/assert before dereferencing: add a null-check or assertion at the start of methods that use stallSig (notably squash and squashDueToBranch) and in any code paths that might run before setStallSignals() is called; ensure setStallSignals() still assigns the pointer when available.
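A minimal sketch of the suggested pattern, assuming StallSignals holds one fixed-size array of flags per stage (the std::array type and MaxThreads are assumptions; the real struct lives in src/cpu/o3/comm.hh). StageSketch is hypothetical: the default member initializer makes the pointer well-defined before wiring, and the accessor checks it before dereferencing. In gem5 itself, panic_if would be a more robust check than a plain assert, which disappears in NDEBUG builds.

```cpp
#include <array>
#include <cassert>

constexpr int MaxThreads = 4;  // assumption for the sketch

// Illustrative stand-in for the PR's StallSignals (per-thread block flags).
struct StallSignals {
    std::array<bool, MaxThreads> blockFetch{};
    std::array<bool, MaxThreads> blockDecode{};
    std::array<bool, MaxThreads> blockRename{};
    std::array<bool, MaxThreads> blockIEW{};
};

class StageSketch {
  public:
    void setStallSignals(StallSignals *sig) { stallSig = sig; }

    bool isWired() const { return stallSig != nullptr; }

    // Every consumer verifies the wiring before dereferencing.
    bool iewBlocked(int tid) const
    {
        assert(stallSig && "setStallSignals() must run before first use");
        return stallSig->blockIEW[tid];
    }

  private:
    // Default member initializer: well-defined before setStallSignals().
    StallSignals *stallSig = nullptr;
};
```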


coderabbitai · 2026-02-09T11:44:47Z

src/cpu/o3/rename.cc

+void
+Rename::releasePhysRegs()
+{
+    // Release physical registers up to releaseWidth
+    auto threads = activeThreads->begin();
    if (releaseSeq + releaseWidth < finalCommitSeq) {
        releaseSeq += releaseWidth;
    } else {
        releaseSeq = finalCommitSeq;
    }
-
-    while (threads != end) {
+    while (threads != activeThreads->end()) {
        ThreadID tid = *threads++;

        removeFromHistory(releaseSeq, tid);
-
        // If we committed this cycle then doneSeqNum will be > 0
        if (fromCommit->commitInfo[tid].doneSeqNum != 0 &&
-            !fromCommit->commitInfo[tid].squash &&
-            renameStatus[tid] != Squashing) {
+            !fromCommit->commitInfo[tid].squash) {

            finalCommitSeq = fromCommit->commitInfo[tid].doneSeqNum;
            releaseSeq = historyBuffer->empty() ? 0 : historyBuffer[tid].back().instSeqNum;
        }
    }


⚠️ Potential issue | 🟠 Major

releaseSeq uses the wrong thread’s history buffer.

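historyBuffer->empty() is equivalent to historyBuffer[0].empty(), so it checks only thread 0's buffer. For any other tid this can corrupt releaseSeq, and if thread 0 has history while historyBuffer[tid] is empty, historyBuffer[tid].back() is undefined behavior.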

🛠️ Suggested fix

-            releaseSeq = historyBuffer->empty() ? 0 : historyBuffer[tid].back().instSeqNum;
+            releaseSeq = historyBuffer[tid].empty() ? 0 : historyBuffer[tid].back().instSeqNum;

📝 Committable suggestion


Suggested change

void
Rename::releasePhysRegs()
{
    // Release physical registers up to releaseWidth
    auto threads = activeThreads->begin();
    if (releaseSeq + releaseWidth < finalCommitSeq) {
        releaseSeq += releaseWidth;
    } else {
        releaseSeq = finalCommitSeq;
    }
    while (threads != activeThreads->end()) {
        ThreadID tid = *threads++;

        removeFromHistory(releaseSeq, tid);
        // If we committed this cycle then doneSeqNum will be > 0
        if (fromCommit->commitInfo[tid].doneSeqNum != 0 &&
            !fromCommit->commitInfo[tid].squash) {

            finalCommitSeq = fromCommit->commitInfo[tid].doneSeqNum;
            releaseSeq = historyBuffer[tid].empty() ? 0 : historyBuffer[tid].back().instSeqNum;
        }
    }
}

🤖 Prompt for AI Agents

In `@src/cpu/o3/rename.cc` around lines 389 - 410, The bug is that releaseSeq is computed using historyBuffer->empty() which only checks thread 0; in Rename::releasePhysRegs you must check the current thread’s history buffer (use historyBuffer[tid].empty()) and, when non-empty, use historyBuffer[tid].back().instSeqNum to set releaseSeq; update the assignment inside the loop where fromCommit->commitInfo[tid].doneSeqNum is handled so it references historyBuffer[tid] instead of historyBuffer.
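To see why historyBuffer->empty() only ever inspects thread 0: for a C array of containers, the array name decays to a pointer to its first element, so historyBuffer->empty() means historyBuffer[0].empty(). The sketch assumes a per-thread C array of std::list<RenameHistory> (modeled on upstream gem5's rename stage; this fork may use a different container). newestHistorySeq is a hypothetical helper that checks and dereferences the same buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <list>

using InstSeqNum = uint64_t;
constexpr int MaxThreads = 2;  // assumption for the sketch

struct RenameHistory {
    InstSeqNum instSeqNum;
};

// One history list per hardware thread, stored as a plain C array.
using HistoryBuffers = std::list<RenameHistory>[MaxThreads];

// Check and dereference the *same* per-thread buffer.
InstSeqNum newestHistorySeq(const HistoryBuffers &hb, int tid)
{
    return hb[tid].empty() ? 0 : hb[tid].back().instSeqNum;
}
```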

coderabbitai

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

src/cpu/o3/rename.cc (1)

592-613: ⚠️ Potential issue | 🟠 Major

Rename stage is forced active every cycle.

any_unblocking is initialized to true, so the deactivation branch is unreachable.

Suggested fix

-    bool any_unblocking = true;
+    bool any_unblocking = false;
+    for (auto tid : *activeThreads) {
+        if (!stallSig->blockRename[tid]) {
+            any_unblocking = true;
+            break;
+        }
+    }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/rename.cc` around lines 592 - 613, any_unblocking is incorrectly
initialized to true causing the rename stage to be forced Active every cycle;
change the initialization of any_unblocking so it reflects the real unblocking
condition (e.g., start as false and set to true only when you detect unblocking
events such as free physical regs, completed renames, or other conditions used
elsewhere in this file), then keep the existing branch that flips _status
between Inactive and Active and calls cpu->activateStage(CPU::RenameIdx) /
cpu->deactivateStage(CPU::RenameIdx); locate the variable any_unblocking and the
surrounding status logic (references: any_unblocking, _status, Inactive, Active,
cpu->activateStage, cpu->deactivateStage, CPU::RenameIdx) and compute its value
from the proper signals instead of hardcoding true.
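The suggested loop is equivalent to a std::any_of over the active threads. A self-contained sketch, assuming activeThreads is a std::list<ThreadID> and blockRename is indexable by thread id (the std::array type and MaxThreads are assumptions). Whether "not blocked" is the complete activation criterion is a separate design question: the prompt also mentions events such as freed physical registers, which this helper does not model.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <list>

constexpr int MaxThreads = 4;  // assumption for the sketch
using ThreadID = int;

// Hypothetical helper: rename should stay Active if at least one active
// thread is not blocked.
bool anyUnblocking(const std::list<ThreadID> &activeThreads,
                   const std::array<bool, MaxThreads> &blockRename)
{
    return std::any_of(activeThreads.begin(), activeThreads.end(),
                       [&](ThreadID tid) { return !blockRename[tid]; });
}
```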

src/cpu/o3/cpu.cc (1)

1497-1515: ⚠️ Potential issue | 🔴 Critical

Fix iterator underflow in squashInstIt (can hit UB at list boundaries).

Line 1515 always decrements instIt even when the iterator is at instList.begin() (or the list becomes empty after erase). That can cause undefined behavior during squash walks.

Proposed fix

-CPU::ListIt
-CPU::squashInstIt(ListIt &instIt, ThreadID tid)
+CPU::ListIt
+CPU::squashInstIt(ListIt instIt, ThreadID tid)
 {
+    const bool atBegin = (instIt == instList.begin());
+
     if ((*instIt)->threadNumber == tid) {
         DPRINTF(O3CPU, "Squashing instruction, "
                 "[tid:%i] [sn:%lli] PC %s\n",
                 (*instIt)->threadNumber,
                 (*instIt)->seqNum,
                 (*instIt)->pcState());

         // Mark it as squashed.
         (*instIt)->setSquashed();

         // `@todo`: Formulate a consistent method for deleting
         // instructions from the instruction list
         // Remove the instruction from the list.
         instIt = instList.erase(instIt);
     }
-    return --instIt;
+    if (instList.empty()) {
+        return instList.end();
+    }
+    if (atBegin) {
+        return instList.begin();
+    }
+    --instIt;
+    return instIt;
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/cpu.cc` around lines 1497 - 1515, The code in CPU::squashInstIt
decrements instIt unconditionally after erasing, which underflows if erase
returned begin() or the list became empty; change the logic so after
instList.erase(instIt) you check for boundary conditions: if instList.empty()
return instList.end(); otherwise if instIt == instList.begin() return instIt;
else return --instIt; this ensures you never pre-decrement begin() and handles
the empty-list case safely.
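A standalone version of the proposed boundary handling, using std::list and a simplified Inst in place of DynInstPtr (both assumptions; squashStep is a hypothetical name). It squashes the current entry if it belongs to tid, then steps one position toward the head without ever decrementing begin(). Returning begin() does not by itself tell a backward-walking caller to stop, so the caller still needs its own termination condition.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Simplified stand-in for a dynamic instruction.
struct Inst {
    int threadNumber;
    int seqNum;
    bool squashed = false;
};

using InstList = std::list<Inst>;
using ListIt = InstList::iterator;

// Squash *it if it belongs to tid, then return the iterator one step toward
// the head of the list, never stepping before begin().
ListIt squashStep(InstList &list, ListIt it, int tid)
{
    const bool atBegin = (it == list.begin());
    if (it->threadNumber == tid) {
        it->squashed = true;
        it = list.erase(it);  // iterator to the element after the erased one
    }
    if (list.empty())
        return list.end();
    if (atBegin)
        return list.begin();
    return std::prev(it);
}
```

The test below walks a three-entry list from the tail, as a squash walk would.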

♻️ Duplicate comments (6)

src/cpu/o3/fetch.cc (1)

1305-1305: ⚠️ Potential issue | 🟠 Major

Hardcoded tid = 0 still disables SMT fetch/dispatch.

Line 1305 and Line 1675 keep single-thread hardcoding, so non-zero threads won’t make progress when numThreads > 1.

Minimal fail-fast until SMT selection is restored

 ThreadID
 Fetch::selectFetchThread()
 {
-    ThreadID tid = 0; // TODO: smt support
+    fatal_if(numThreads > 1,
+             "SMT fetch selection is not implemented; set numThreads=1.");
+    ThreadID tid = 0; // TODO: smt support

Also applies to: 1675-1675

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/fetch.cc` at line 1305, Replace the hardcoded ThreadID tid = 0
usages with the actual thread identifier used by the fetch/dispatch code path so
SMT can make progress; locate the two occurrences where ThreadID tid is set (the
one currently written as "ThreadID tid = 0" and the duplicate at the other
occurrence) and initialize tid from the caller/context (e.g., use the
method/argument that carries thread index or the FetchState/ThreadContext’s
threadId) and ensure any subsequent SMT selection logic (in the same function
and the corresponding dispatch path) uses this tid rather than a constant 0.

src/cpu/o3/cpu.hh (1)

529-547: ⚠️ Potential issue | 🟡 Minor

Remove the redundant CPU-level stallSig member.

Line 529 already defines the shared stall state as stallSignals, but Line 546 adds another StallSignals instance. Keeping both invites accidental split wiring.

Suggested cleanup

     StallSignals stallSignals;
@@
-    StallSignals stallSig;

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/cpu.hh` around lines 529 - 547, Remove the redundant CPU-level
member StallSignals stallSig: delete the declaration of stallSig and replace any
usage sites referencing stallSig to use the existing stallSignals instance
instead (search for stallSig and update references in methods/functions that
currently access it to use stallSignals). Ensure there are no remaining
duplicate definitions, update include/forward declarations if needed, and run
the build/tests to confirm no references remain to stallSig.
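The risk with two StallSignals members is that some stages end up wired to one instance and some to the other, so a stall raised through one is never seen through the other. A sketch of the single-owner alternative, with hypothetical CpuSketch and StageStub types and an assumed array layout for StallSignals: the CPU owns one instance and hands every stage the same pointer. Copying is deleted because a copied CpuSketch would still point its stages at the original's signals.

```cpp
#include <array>
#include <cassert>
#include <initializer_list>

constexpr int MaxThreads = 4;  // assumption for the sketch

struct StallSignals {
    std::array<bool, MaxThreads> blockFetch{};
    std::array<bool, MaxThreads> blockDecode{};
    std::array<bool, MaxThreads> blockRename{};
    std::array<bool, MaxThreads> blockIEW{};
};

struct StageStub {
    StallSignals *stallSig = nullptr;
    void setStallSignals(StallSignals *s) { stallSig = s; }
};

// Hypothetical CPU owning exactly one StallSignals instance.
struct CpuSketch {
    StallSignals stallSignals;  // the single source of truth
    StageStub fetch, decode, rename, iew;

    CpuSketch()
    {
        for (StageStub *s : {&fetch, &decode, &rename, &iew})
            s->setStallSignals(&stallSignals);
    }

    // A copy would leave its stages pointing at the original's signals.
    CpuSketch(const CpuSketch &) = delete;
    CpuSketch &operator=(const CpuSketch &) = delete;
};
```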

src/cpu/o3/rename.cc (1)

401-406: ⚠️ Potential issue | 🟠 Major

releaseSeq still checks the wrong history buffer.

historyBuffer->empty() does not check the current thread’s buffer and can corrupt per-thread release progression.

Suggested fix

-            releaseSeq = historyBuffer->empty() ? 0 : historyBuffer[tid].back().instSeqNum;
+            releaseSeq = historyBuffer[tid].empty() ? 0
+                                                    : historyBuffer[tid].back().instSeqNum;

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/rename.cc` around lines 401 - 406, The code uses
historyBuffer->empty() which checks thread 0's buffer instead of the
per-thread buffer and can corrupt per-thread release progression; change the
check to historyBuffer[tid].empty() and compute releaseSeq from
historyBuffer[tid].back().instSeqNum only when that per-thread buffer is
non-empty (i.e., replace historyBuffer->empty() ? 0 :
historyBuffer[tid].back().instSeqNum with historyBuffer[tid].empty() ? 0 :
historyBuffer[tid].back().instSeqNum) to ensure releaseSeq is derived from the
correct thread buffer (symbols: fromCommit, commitInfo, tid, releaseSeq,
historyBuffer).

src/cpu/o3/decode.cc (1)

112-115: ⚠️ Potential issue | 🟠 Major

clearStates still doesn’t clear per-thread decode buffers.

This remains a no-op, so buffered entries for tid can survive remove/reuse cycles.

Suggested fix outline

 void
 Decode::clearStates(ThreadID tid)
 {
-
+    fixedbuffer[tid].clear();
+
+    auto delIt = stallBuffer.begin();
+    for (auto it = eachstallSize.begin(); it != eachstallSize.end();) {
+        const int size = *it;
+        auto start = delIt;
+        auto end = start + size;
+        if ((*start)->threadNumber == tid) {
+            delIt = stallBuffer.erase(start, end);
+            it = eachstallSize.erase(it);
+        } else {
+            delIt = end;
+            ++it;
+        }
+    }
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/decode.cc` around lines 112 - 115, Implement
Decode::clearStates(ThreadID tid) so it actually clears all per-thread decode
buffers and associated indices/state for the given tid: locate the
Decode::clearStates(ThreadID tid) definition and add logic to remove/clear any
per-thread buffer containers, queues or maps (the per-thread decode buffers and
their entries), reset head/tail or index counters and any cached decode state
for tid, and free or shrink underlying storage as needed so buffered entries
cannot survive remove/reuse cycles.

src/cpu/o3/iew.cc (2)

424-430: ⚠️ Potential issue | 🟠 Major

Validate IEW→Commit wire offset; code and comment are currently inconsistent.

The comment states IEW writes [-1] and Commit reads [-1], but Line 430 uses getWire(0). If Commit still consumes -iewToCommitDelay, this changes latency/visibility semantics.

#!/bin/bash
# Verify IEW producer and Commit consumer offsets for IEW queue.
rg -n 'toCommit\s*=\s*iewQueue->getWire' src/cpu/o3/iew.cc -C2
rg -n 'fromIEW\s*=\s*iewQueue->getWire' src/cpu/o3/commit.cc src/cpu/o3/commit.hh -C2
rg -n 'iewToCommitDelay' src/cpu/o3/iew.cc src/cpu/o3/commit.cc src/cpu/o3/commit.hh -C2

If zero-latency alignment is still intended

-    toCommit = iewQueue->getWire(0);
+    toCommit = iewQueue->getWire(-iewToCommitDelay);

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/iew.cc` around lines 424 - 430, Comment and code disagree on the
IEW→Commit wire offset: the comment says IEW writes at [-1] but the code sets
toCommit = iewQueue->getWire(0); update the code so the IEW producer and Commit
consumer use the same queue offset (or update the comment to match the intended
behavior). Specifically, either change the assignment in iew.cc to use
iewQueue->getWire(-iewToCommitDelay) so it matches the "[-1]"/zero-latency
intent, or if zero-latency is not intended, change the comment to reflect
getWire(0); also verify the consumer binding in commit.cc/commit.hh (fromIEW)
uses the same offset and make them consistent (ensure symbols toCommit,
iewToCommitDelay, and fromIEW reference the same offset policy).
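To reason about the offset question, it can help to model the buffer explicitly. MiniTimeBuffer below is a simplified, hypothetical model, not gem5's TimeBuffer class: wire 0 is the current cycle's slot, wire -k holds what was written at wire 0 exactly k advances ago, and advance() is called once per cycle.

```cpp
#include <cassert>
#include <vector>

// Simplified model (not gem5's TimeBuffer): wire 0 is the current cycle's
// slot, and wire -k holds what was written at wire 0 exactly k advances ago.
template <typename T>
class MiniTimeBuffer {
  public:
    explicit MiniTimeBuffer(int past)
        : past_(past), slots_(past + 1), head_(0) {}

    T &getWire(int idx)
    {
        assert(idx <= 0 && -idx <= past_);
        const int n = static_cast<int>(slots_.size());
        return slots_[((head_ + idx) % n + n) % n];
    }

    // Step to the next cycle; the new wire 0 starts out default-constructed.
    void advance()
    {
        head_ = (head_ + 1) % static_cast<int>(slots_.size());
        slots_[head_] = T{};
    }

  private:
    int past_;
    std::vector<T> slots_;
    int head_;
};
```

In this model, a value written at offset w becomes visible at offset r after w - r advances. When producer and consumer use the same offset, visibility depends on whether the reader runs before or after the writer within the cycle, which is exactly what the reversed commit→iew tick order changes.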

805-817: ⚠️ Potential issue | 🟠 Major

Avoid short-circuiting canInsertLDSTQue(); it has reset side effects.

canInsertLDSTQue() calls getAndResetLastLQPopEntries() / getAndResetLastSQPopEntries(). With stallSig->blockIEW[i] || !canInsertLDSTQue(i), those resets are skipped whenever blockIEW[i] is true.

Proposed fix

     for (int i = 0; i < numThreads; i++) {
-        bool block = stallSig->blockIEW[i] || !canInsertLDSTQue(i);
+        bool canInsert = canInsertLDSTQue(i);
+        bool block = stallSig->blockIEW[i] || !canInsert;
         bool active = !block && !fixedbuffer[i].empty();

Also applies to: 829-833

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/iew.cc` around lines 805 - 817, The expression using
short-circuiting with stallSig->blockIEW[i] prevents canInsertLDSTQue(i) from
being called and thus skips its
getAndResetLastLQPopEntries/getAndResetLastSQPopEntries side effects; fix by
ensuring canInsertLDSTQue(i) is always invoked (e.g., evaluate and store its
result in a local bool before combining with stallSig->blockIEW[i], or
explicitly call the reset/getters when blockIEW is true) so the reset side
effects in canInsertLDSTQue (and its internal calls to
getAndResetLastLQPopEntries/getAndResetLastSQPopEntries) always run; apply the
same change for the similar occurrence around the code referenced (lines
829-833) where short-circuiting currently skips the call.
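A small sketch of why the evaluation order matters. LdstSketch is a hypothetical stand-in for an LSQ whose query also consumes per-cycle state, mirroring (by assumption) the getAndResetLastLQPopEntries()/getAndResetLastSQPopEntries() calls inside canInsertLDSTQue(). With ||, the right-hand call never runs when blocked is true, so the stale count survives into the next cycle.

```cpp
#include <cassert>

// Hypothetical stand-in for canInsertLDSTQue(): answers a question *and*
// consumes per-cycle state, like the getAndReset...() calls it makes.
struct LdstSketch {
    int pendingPops = 0;  // entries popped since the last query
    int resets = 0;       // how many times the state was consumed

    bool canInsert()
    {
        ++resets;
        pendingPops = 0;  // side effect that must happen every cycle
        return true;
    }
};

// Short-circuit form: canInsert() is skipped whenever blocked is true.
bool blockShortCircuit(bool blocked, LdstSketch &q)
{
    return blocked || !q.canInsert();
}

// Suggested form: evaluate the side-effecting call unconditionally first.
bool blockAlwaysQuery(bool blocked, LdstSketch &q)
{
    const bool canInsert = q.canInsert();
    return blocked || !canInsert;
}
```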

🧹 Nitpick comments (1)

src/cpu/o3/issue_queue.cc (1)

331-331: Minor: Missing newline in panic message.

The panic format string is missing a trailing \n, which is inconsistent with other panic/DPRINTF calls in this file (e.g., lines 335, 348).

Suggested fix

-            if (!dst_inst->isLoad()) panic("dst[sn:%llu] is not load, src[sn:%llu]", dst_inst->seqNum, inst->seqNum);
+            if (!dst_inst->isLoad()) panic("dst[sn:%llu] is not load, src[sn:%llu]\n", dst_inst->seqNum, inst->seqNum);

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/issue_queue.cc` at line 331, The panic call in the issue queue
check (the line calling panic(...) that verifies dst_inst->isLoad()) is missing
a trailing "\n" in its format string; update the panic format to include a
newline so it matches other panic/DPRINTF usage (e.g., add "\n" at the end of
the format string referencing dst_inst->seqNum and inst->seqNum).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cpu/o3/commit.cc`:
- Around line 1936-1937: The check uses rob->findInst(0, inst->seqNum) and will
panic for non-zero threads; update the call to use the instruction's thread
identifier (e.g., rob->findInst(inst->thread, inst->seqNum) or the correct field
name on Instruction if different) so the ROB lookup uses the instruction's
thread, keeping the panic_if and message unchanged.
- Around line 1387-1392: The code unsafely copies head_inst->effSize bytes into
an 8-byte buffer and then reads via an unaligned cast; instead, ensure effSize
is bounded and avoid the unaligned cast by copying into a properly typed
uint64_t: declare uint64_t value = 0; if (head_inst->memData &&
head_inst->effSize > 0) std::memcpy(&value, head_inst->memData,
std::min<std::size_t>(head_inst->effSize, sizeof(value))); then set load_value =
value and call loadTripleCounter.update(load_pc, load_addr, load_value); this
prevents overflow and UB from unaligned/oversized reads while preserving zero
extension for short loads.

In `@src/cpu/o3/decode.cc`:
- Around line 434-435: The activation path in tick() is never reached because
status_change is set to false and never updated; change the initialization to
call updateActivate() (or assign status_change = updateActivate()) so the
boolean reflects the actual activation transition, and ensure toRenameIndex is
still reset afterwards; apply the same fix to the second occurrence (the block
around the other initialization). Locate tick(), the local variable
status_change, the updateActivate() call, and toRenameIndex to implement this
swap so the activation logic runs when updateActivate() indicates a change.

In `@src/cpu/o3/fetch.cc`:
- Around line 1301-1304: In sendInstructionsToDecode(), when
stallSig->blockFetch[tid] is true the code must not drain fetchQueue[tid] or
forward instructions for that tid; add a guard around the per-thread
drain/forward logic (referencing fetchQueue[tid] and tid) to skip processing
when blockFetch[tid] is set. Also ensure that when no threads are active
(any_thread_active is false) you still reset or clear numInst before returning
so numInst cannot remain stale; move or add the numInst reset/clear immediately
before the early return that checks any_thread_active. Apply the same changes in
the nearby blocks mentioned (1308-1326, 1340-1341) where per-thread draining and
numInst handling occur.

In `@src/cpu/o3/iew.cc`:
- Around line 734-744: The loop currently returns after handling the first
thread with fromCommit->commitInfo[i].squash, skipping later threads; change the
control flow so all threads are processed: replace the early `return` in the
block that calls squash(i) with a `continue` (or otherwise let the loop proceed)
so squash(i), localSquashVer.update(...), fetchRedirect[i] = false and the
iewStats/stallEvents/ROBWalk and setAllStalls(StallReason::CommitSquash) logic
run for every thread that has commitInfo[i].squash; ensure any shared post-loop
behavior still executes only once if needed.

In `@src/cpu/o3/lsq.cc`:
- Around line 1664-1671: The code in SingleReq::recvTimingResp uses a fixed
8-byte local buffer and memcpy(buffer, pkt->getPtr<char>(), pkt->getSize())
which can overflow if pkt->getSize() > 8; change the logic in recvTimingResp to
copy at most sizeof(buffer) bytes (e.g., use std::min<std::size_t>(pkt->getSize(),
sizeof(buffer))) into buffer and, if pkt->getSize() is smaller than 8, zero-fill the
remaining bytes so the subsequent *((uint64_t*)buffer) read is safe; reference
the symbols recvTimingResp, pkt->getSize(), pkt->getPtr, buffer, and the DPRINTF
line when making the fix.

In `@src/cpu/o3/rename.cc`:
- Around line 483-485: The DPRINTF in rename.cc has a format/argument mismatch:
it prints "[sn:%llu] ... %s" but passes tid first and an extra argument; fix it
by making the sequence number the first argument and removing tid (or add a
corresponding format if tid should be logged). Concretely, update the
DPRINTF(...) call to pass (unsigned long long)inst->seqNum as the first argument
and inst->pcState() as the second (keep the "%llu" and "%s" format), referencing
the DPRINTF invocation and the inst->seqNum / inst->pcState() symbols to locate
the change.

In `@src/cpu/o3/rename.hh`:
- Around line 211-212: stallSig is a raw pointer that may be dereferenced in
tick() before setStallSignals() is called; initialize stallSig to nullptr at its
declaration and add a defensive null-check (or assert) inside tick() before any
dereference to avoid undefined behavior, and apply the same initialization/check
fix for the other pointer instance noted around line 342; reference the member
name stallSig, the setter setStallSignals(), and the consumer tick() when making
the changes.
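Both the commit.cc load-value path and lsq.cc recvTimingResp copy a packet-sized payload into 8 bytes. A minimal sketch of the bounded copy the prompts describe; loadValueBounded is a hypothetical helper. Copying into a zero-initialized uint64_t avoids both the overflow and the unaligned pointer cast. One caveat: if the size argument has a type other than std::size_t (Packet::getSize() returns an unsigned int, as far as this sketch assumes), std::min needs an explicit template argument, as in std::min<std::size_t>(...).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy up to 8 bytes of `data` into a zero-initialized uint64_t.
// Bytes beyond sizeof(uint64_t) are ignored; for short loads the remaining
// bytes stay zero (on a little-endian host this is zero extension).
uint64_t loadValueBounded(const void *data, std::size_t size)
{
    uint64_t value = 0;
    if (data && size > 0)
        std::memcpy(&value, data, std::min(size, sizeof(value)));
    return value;
}
```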

---

Outside diff comments:
In `@src/cpu/o3/cpu.cc`:
- Around line 1497-1515: The code in CPU::squashInstIt decrements instIt
unconditionally after erasing, which underflows if erase returned begin() or the
list became empty; change the logic so after instList.erase(instIt) you check
for boundary conditions: if instList.empty() return instList.end(); otherwise if
instIt == instList.begin() return instIt; else return --instIt; this ensures you
never pre-decrement begin() and handles the empty-list case safely.

In `@src/cpu/o3/rename.cc`:
- Around line 592-613: any_unblocking is incorrectly initialized to true causing
the rename stage to be forced Active every cycle; change the initialization of
any_unblocking so it reflects the real unblocking condition (e.g., start as
false and set to true only when you detect unblocking events such as free
physical regs, completed renames, or other conditions used elsewhere in this
file), then keep the existing branch that flips _status between Inactive and
Active and calls cpu->activateStage(CPU::RenameIdx) /
cpu->deactivateStage(CPU::RenameIdx); locate the variable any_unblocking and the
surrounding status logic (references: any_unblocking, _status, Inactive, Active,
cpu->activateStage, cpu->deactivateStage, CPU::RenameIdx) and compute its value
from the proper signals instead of hardcoding true.

---

Duplicate comments:
In `@src/cpu/o3/cpu.hh`:
- Around line 529-547: Remove the redundant CPU-level member StallSignals
stallSig: delete the declaration of stallSig and replace any usage sites
referencing stallSig to use the existing stallSignals instance instead (search
for stallSig and update references in methods/functions that currently access it
to use stallSignals). Ensure there are no remaining duplicate definitions,
update include/forward declarations if needed, and run the build/tests to
confirm no references remain to stallSig.

In `@src/cpu/o3/decode.cc`:
- Around line 112-115: Implement Decode::clearStates(ThreadID tid) so it
actually clears all per-thread decode buffers and associated indices/state for
the given tid: locate the Decode::clearStates(ThreadID tid) definition and add
logic to remove/clear any per-thread buffer containers, queues or maps (the
per-thread decode buffers and their entries), reset head/tail or index counters
and any cached decode state for tid, and free or shrink underlying storage as
needed so buffered entries cannot survive remove/reuse cycles.

In `@src/cpu/o3/fetch.cc`:
- Line 1305: Replace the hardcoded ThreadID tid = 0 usages with the actual
thread identifier used by the fetch/dispatch code path so SMT can make progress;
locate the two occurrences where ThreadID tid is set (the one currently written
as "ThreadID tid = 0" and the duplicate at the other occurrence) and initialize
tid from the caller/context (e.g., use the method/argument that carries thread
index or the FetchState/ThreadContext’s threadId) and ensure any subsequent SMT
selection logic (in the same function and the corresponding dispatch path) uses
this tid rather than a constant 0.

In `@src/cpu/o3/iew.cc`:
- Around line 424-430: Comment and code disagree on the IEW→Commit wire offset:
the comment says IEW writes at [-1] but the code sets toCommit =
iewQueue->getWire(0); update the code so the IEW producer and Commit consumer
use the same queue offset (or update the comment to match the intended
behavior). Specifically, either change the assignment in iew.cc to use
iewQueue->getWire(-iewToCommitDelay) so it matches the "[-1]"/zero-latency
intent, or if zero-latency is not intended, change the comment to reflect
getWire(0); also verify the consumer binding in commit.cc/commit.hh (fromIEW)
uses the same offset and make them consistent (ensure symbols toCommit,
iewToCommitDelay, and fromIEW reference the same offset policy).
- Around line 805-817: The expression using short-circuiting with
stallSig->blockIEW[i] prevents canInsertLDSTQue(i) from being called and thus
skips its getAndResetLastLQPopEntries/getAndResetLastSQPopEntries side effects;
fix by ensuring canInsertLDSTQue(i) is always invoked (e.g., evaluate and store
its result in a local bool before combining with stallSig->blockIEW[i], or
explicitly call the reset/getters when blockIEW is true) so the reset side
effects in canInsertLDSTQue (and its internal calls to
getAndResetLastLQPopEntries/getAndResetLastSQPopEntries) always run; apply the
same change for the similar occurrence around the code referenced (lines
829-833) where short-circuiting currently skips the call.

In `@src/cpu/o3/rename.cc`:
- Around line 401-406: The code uses historyBuffer->empty() which checks the
buffer of thread 0 instead of the current thread's buffer and can corrupt per-thread
release progression; change the check to historyBuffer[tid].empty() and compute
releaseSeq from historyBuffer[tid].back().instSeqNum only when that per-thread
buffer is non-empty (i.e., replace historyBuffer->empty() ? 0 :
historyBuffer[tid].back().instSeqNum with historyBuffer[tid].empty() ? 0 :
historyBuffer[tid].back().instSeqNum) to ensure releaseSeq is derived from the
correct thread buffer (symbols: fromCommit, commitInfo, tid, releaseSeq,
historyBuffer).

---

Nitpick comments:
In `@src/cpu/o3/issue_queue.cc`:
- Line 331: The panic call in the issue queue check (the line calling panic(...)
that verifies dst_inst->isLoad()) is missing a trailing "\n" in its format
string; update the panic format to include a newline so it matches other
panic/DPRINTF usage (e.g., add "\n" at the end of the format string referencing
dst_inst->seqNum and inst->seqNum).

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4fc39e5 and 33b75e7.

📒 Files selected for processing (23)

src/cpu/o3/comm.hh
src/cpu/o3/commit.cc
src/cpu/o3/commit.hh
src/cpu/o3/cpu.cc
src/cpu/o3/cpu.hh
src/cpu/o3/decode.cc
src/cpu/o3/decode.hh
src/cpu/o3/dyn_inst.hh
src/cpu/o3/fetch.cc
src/cpu/o3/fetch.hh
src/cpu/o3/iew.cc
src/cpu/o3/iew.hh
src/cpu/o3/inst_queue.cc
src/cpu/o3/issue_queue.cc
src/cpu/o3/lsq.cc
src/cpu/o3/lsq_unit.cc
src/cpu/o3/regfile.hh
src/cpu/o3/rename.cc
src/cpu/o3/rename.hh
src/cpu/o3/rename_map.cc
src/cpu/o3/rob.cc
src/cpu/o3/rob.hh
src/cpu/reg_class.hh

💤 Files with no reviewable changes (2)

src/cpu/o3/dyn_inst.hh
src/cpu/o3/inst_queue.cc

🚧 Files skipped from review as they are similar to previous changes (3)

src/cpu/o3/regfile.hh
src/cpu/o3/lsq_unit.cc
src/cpu/o3/fetch.hh


coderabbitai · 2026-02-28T07:41:26Z

src/cpu/o3/commit.cc

+            panic_if(!rob->findInst(0, inst->seqNum), "[sn:%llu] Committed instruction not found in ROB",
+                     inst->seqNum);


⚠️ Potential issue | 🟠 Major

Use the instruction’s thread when checking ROB membership.

Line 1936 hardcodes rob->findInst(0, inst->seqNum). For non-zero threads this can panic incorrectly.

Suggested fix

-            panic_if(!rob->findInst(0, inst->seqNum), "[sn:%llu] Committed instruction not found in ROB",
+            panic_if(!rob->findInst(inst->threadNumber, inst->seqNum),
+                     "[sn:%llu] Committed instruction not found in ROB",
                      inst->seqNum);

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/cpu/o3/commit.cc` around lines 1936 - 1937, the check uses rob->findInst(0, inst->seqNum) and will panic for non-zero threads; update the call to use the instruction's thread identifier (inst->threadNumber, the field already used elsewhere in this PR, e.g. in CPU::squashInstIt) so the ROB lookup uses the instruction's thread, keeping the panic_if and message unchanged.
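This is a minimal sketch of why the thread argument matters. `MiniRob`, `MiniInst` and `demoLookup` are hypothetical stand-ins, loosely modeled on a ROB that keeps one instruction list per thread. Only the `findInst(tid, seqNum)` call shape comes from the review above; this is not gem5's actual `ROB` class.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <utility>
#include <vector>

using ThreadID = int16_t;    // hypothetical stand-in for gem5's ThreadID
using InstSeqNum = uint64_t; // hypothetical stand-in for gem5's InstSeqNum

struct MiniInst
{
    ThreadID threadNumber;
    InstSeqNum seqNum;
};

// Hypothetical ROB that keeps one instruction list per hardware thread.
class MiniRob
{
  public:
    explicit MiniRob(int num_threads) : lists(num_threads) {}

    void insert(const MiniInst &inst) { lists[inst.threadNumber].push_back(inst); }

    // Only the list that belongs to `tid` is searched.
    bool findInst(ThreadID tid, InstSeqNum seq) const
    {
        for (const auto &inst : lists[tid]) {
            if (inst.seqNum == seq)
                return true;
        }
        return false;
    }

  private:
    std::vector<std::list<MiniInst>> lists;
};

// An instruction from thread 1: {lookup with hardcoded 0, lookup with its own thread}.
std::pair<bool, bool> demoLookup()
{
    MiniRob rob(2);
    MiniInst inst{1, 42};
    rob.insert(inst);
    return {rob.findInst(0, inst.seqNum), rob.findInst(inst.threadNumber, inst.seqNum)};
}
```

With the hardcoded `0`, a committed thread-1 instruction is reported as missing. That is the false panic described above.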

coderabbitai · 2026-02-28T07:41:26Z

src/cpu/o3/fetch.cc

+    if (!any_thread_active) {
+        // All threads are blocked, no instructions to send
+        return;
+    }


⚠️ Potential issue | 🟠 Major

Blocked path still forwards instructions and can leave numInst stale.

Two correctness issues in sendInstructionsToDecode():

when stallSig->blockFetch[tid] is true, the function still drains fetchQueue[tid];

if all threads are blocked, it returns before resetting numInst.

Suggested fix

     if (!any_thread_active) {
         // All threads are blocked, no instructions to send
+        numInst = 0;
         return;
     }
@@
     if (stallSig->blockFetch[tid]) {
         // If decode stalled, use decode's stall reason
         DPRINTF(Fetch, "[tid:%i] Fetch stalled\n", tid);
         setAllFetchStalls(fromDecode->decodeInfo[tid].blockReason);
+        numInst = 0;
+        return;
     }

Also applies to: 1308-1326, 1340-1341

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/cpu/o3/fetch.cc` around lines 1301 - 1304, In sendInstructionsToDecode(), when stallSig->blockFetch[tid] is true the code must not drain fetchQueue[tid] or forward instructions for that tid; add a guard around the per-thread drain/forward logic (referencing fetchQueue[tid] and tid) to skip processing when blockFetch[tid] is set. Also ensure that when no threads are active (any_thread_active is false) you still reset or clear numInst before returning so numInst cannot remain stale; move or add the numInst reset/clear immediately before the early return that checks any_thread_active. Apply the same changes in the nearby blocks mentioned (1308-1326, 1340-1341) where per-thread draining and numInst handling occur.
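The two rules from this finding can be sketched with a simplified model: honour `blockFetch[tid]`, and reset `numInst` on every exit path. `FetchSketch`, `decodeWidth` and the int-valued instructions are assumptions for illustration. The real `sendInstructionsToDecode()` also sets stall reasons and fills the decode time-buffer entry; both are omitted here. The suggested fix returns because `tid` is pinned to 0. This sketch instead skips a blocked thread with `continue`, so it also covers more than one thread.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical, heavily simplified model of fetch -> decode forwarding.
struct FetchSketch
{
    std::vector<std::deque<int>> fetchQueue; // per-thread fetched insts
    std::vector<bool> blockFetch;            // decode -> fetch backpressure
    std::vector<int> toDecode;               // insts forwarded this cycle
    int numInst = 0;                         // per-cycle fetch counter
    int decodeWidth = 4;                     // assumed parameter

    explicit FetchSketch(int threads) : fetchQueue(threads), blockFetch(threads, false) {}

    void sendInstructionsToDecode()
    {
        bool any_thread_active = false;
        for (bool blocked : blockFetch)
            any_thread_active = any_thread_active || !blocked;

        if (!any_thread_active) {
            numInst = 0;  // reset the counter before the early return
            return;
        }

        for (std::size_t tid = 0; tid < fetchQueue.size(); ++tid) {
            if (blockFetch[tid])
                continue; // honour backpressure: leave this thread's queue intact
            auto &q = fetchQueue[tid];
            while (!q.empty() && static_cast<int>(toDecode.size()) < decodeWidth) {
                toDecode.push_back(q.front());
                q.pop_front();
            }
        }
        numInst = 0;      // same reset on the normal path
    }
};

// A blocked thread keeps its queue and the stale counter is cleared.
bool demoBlockedThread()
{
    FetchSketch f(1);
    f.fetchQueue[0] = {1, 2, 3};
    f.blockFetch[0] = true;
    f.numInst = 5;    // stale value from an earlier cycle
    f.sendInstructionsToDecode();
    return f.fetchQueue[0].size() == 3 && f.numInst == 0 && f.toDecode.empty();
}

// An unblocked thread forwards everything that fits in decodeWidth.
std::size_t demoUnblockedThread()
{
    FetchSketch f(1);
    f.fetchQueue[0] = {1, 2, 3};
    f.sendInstructionsToDecode();
    return f.toDecode.size();
}
```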

coderabbitai · 2026-02-28T07:41:26Z

src/cpu/o3/iew.cc

+    for (int i = 0; i < numThreads; i++) {
+        if (fromCommit->commitInfo[i].squash) {
+            squash(i);
+            localSquashVer.update(fromCommit->commitInfo[i].squashVersion.getVersion());
+            DPRINTF(IEW, "Updating squash version to %u\n", localSquashVer.getVersion());

-        if (dispatchStatus[tid] == Blocked ||
-            dispatchStatus[tid] == Unblocking) {
-            toRename->iewUnblock[tid] = true;
-            wroteToTimeBuffer = true;
+            fetchRedirect[i] = false;
+            iewStats.stallEvents[ROBWalk]++;
+            setAllStalls(StallReason::CommitSquash);
+            return;
        }


⚠️ Potential issue | 🟠 Major

Don’t return after the first squash; process all thread squash signals in the loop.

The current early return exits on the first thread with commitInfo[i].squash, skipping squash handling for later threads in the same tick.

Proposed fix

-            return;
+            continue;

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/cpu/o3/iew.cc` around lines 734 - 744, The loop currently returns after handling the first thread with fromCommit->commitInfo[i].squash, skipping later threads; change the control flow so all threads are processed: replace the early `return` in the block that calls squash(i) with a `continue` (or otherwise let the loop proceed) so squash(i), localSquashVer.update(...), fetchRedirect[i] = false and the iewStats/stallEvents/ROBWalk and setAllStalls(StallReason::CommitSquash) logic run for every thread that has commitInfo[i].squash; ensure any shared post-loop behavior still executes only once if needed.
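This sketch shows the loop shape, assuming a hypothetical `IewSketch` in which `squashed[i] = true` stands in for `squash(i)`. It does not return from inside the loop. Instead it records whether any thread squashed and reports that once, so a caller can still skip the rest of the tick without dropping later threads. That covers the "shared post-loop behavior" the prompt mentions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-thread squash signal from commit.
struct CommitInfo
{
    bool squash = false;
    unsigned squashVersion = 0;
};

// Hypothetical model: handles every thread that commit flags in one pass.
struct IewSketch
{
    std::vector<bool> squashed;   // squashed[i] = true stands in for squash(i)
    unsigned localSquashVer = 0;
    int robWalkStallEvents = 0;

    explicit IewSketch(int threads) : squashed(threads, false) {}

    // Returns true if any thread squashed, so the caller can end the tick once.
    bool checkSquash(const std::vector<CommitInfo> &fromCommit)
    {
        bool any_squash = false;
        for (std::size_t i = 0; i < fromCommit.size(); ++i) {
            if (!fromCommit[i].squash)
                continue;
            squashed[i] = true;
            localSquashVer = fromCommit[i].squashVersion;
            ++robWalkStallEvents;
            any_squash = true;   // keep scanning instead of returning here
        }
        return any_squash;
    }
};

// Both threads squash in the same tick; both must be handled.
bool demoTwoThreadSquash()
{
    IewSketch iew(2);
    std::vector<CommitInfo> info(2);
    info[0].squash = true;
    info[0].squashVersion = 3;
    info[1].squash = true;
    info[1].squashVersion = 4;
    bool any = iew.checkSquash(info);
    return any && iew.squashed[0] && iew.squashed[1] && iew.robWalkStallEvents == 2;
}
```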

coderabbitai · 2026-02-28T07:41:26Z

src/cpu/o3/lsq.cc

+    if (debug::LSQ) {
+        char buffer[8];
+        std::memcpy(buffer, pkt->getPtr<char>(), pkt->getSize());
+        DPRINTF(LSQ, "Single Req::recvTimingResp: inst: %llu, pkt: %#lx, isLoad: %d, "
+                    "isLLSC: %d, isUncache: %d, isCachehit: %d, data: %d\n",
+                    pkt->req->getReqInstSeqNum(), pkt->getAddr(), isLoad(), mainReq()->isLLSC(),
+                    mainReq()->isUncacheable(), cacheHit, *((uint64_t*)buffer));
+    }


⚠️ Potential issue | 🟠 Major

Potential buffer overflow when packet size exceeds 8 bytes.

The code copies pkt->getSize() bytes into an 8-byte buffer without checking bounds. If the packet size exceeds 8 bytes, this will overflow buffer.

🛡️ Proposed fix

     if (debug::LSQ) {
-        char buffer[8];
-        std::memcpy(buffer, pkt->getPtr<char>(), pkt->getSize());
+        uint64_t data = 0;
+        std::memcpy(&data, pkt->getPtr<char>(),
+                    std::min<size_t>(pkt->getSize(), sizeof(data)));
         DPRINTF(LSQ, "Single Req::recvTimingResp: inst: %llu, pkt: %#lx, isLoad: %d, "
                     "isLLSC: %d, isUncache: %d, isCachehit: %d, data: %d\n",
                     pkt->req->getReqInstSeqNum(), pkt->getAddr(), isLoad(), mainReq()->isLLSC(),
-                    mainReq()->isUncacheable(), cacheHit, *((uint64_t*)buffer));
+                    mainReq()->isUncacheable(), cacheHit, data);
     }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/cpu/o3/lsq.cc` around lines 1664 - 1671, the code in SingleReq::recvTimingResp uses a fixed 8-byte local buffer and memcpy(buffer, pkt->getPtr<char>(), pkt->getSize()), which can overflow if pkt->getSize() > 8. Change the logic in recvTimingResp to copy at most sizeof(buffer) bytes into buffer (e.g., std::min<size_t>(pkt->getSize(), sizeof(buffer)); the explicit template argument is needed because getSize() and sizeof yield different integer types). If pkt->getSize() is smaller than 8, zero-fill the remaining bytes so the subsequent *((uint64_t*)buffer) read is safe. Reference the symbols recvTimingResp, pkt->getSize(), pkt->getPtr, buffer, and the DPRINTF line when making the fix.
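This self-contained helper illustrates the bounded copy; the name `lowBytesAsU64` is hypothetical. Copying straight into a zero-initialized `uint64_t` removes both the overflow and the `*((uint64_t*)buffer)` cast.

The explicit `std::min<std::size_t>` matters under one assumption: gem5's `Packet::getSize()` returns `unsigned` while `sizeof` yields `std::size_t`. If so, `std::min` cannot deduce a single type from the two arguments, and the unqualified call in the proposed fix would not compile on LP64 hosts.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copies at most 8 bytes of `data` into a zero-initialized uint64_t.
// memcpy into the integer avoids the stack overflow and the aliasing and
// alignment problems of reading through a (uint64_t *) cast.
// For sizes below 8 the result depends on host byte order, as with the
// original cast.
uint64_t lowBytesAsU64(const void *data, std::size_t size)
{
    uint64_t value = 0;
    if (data != nullptr && size > 0) {
        std::size_t n = std::min<std::size_t>(size, sizeof(value));
        std::memcpy(&value, data, n);
    }
    return value;
}

// A 16-byte payload is clamped to 8 bytes instead of overflowing.
uint64_t demoOversizedPacket()
{
    unsigned char payload[16];
    std::memset(payload, 0xFF, sizeof(payload));
    return lowBytesAsU64(payload, sizeof(payload));
}
```

The same helper fits the commit-side `memData`/`effSize` finding, where the extracted value is passed to `loadTripleCounter.update()`.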

coderabbitai · 2026-02-28T07:41:26Z

src/cpu/o3/rename.cc

+            DPRINTF(Rename, "[sn:%llu] instruction  with PC %s is squashed, skipping.\n",
                    tid, inst->seqNum, inst->pcState());



⚠️ Potential issue | 🟡 Minor

Fix DPRINTF format/argument mismatch.

Line 483 formats [sn:%llu] but passes tid first, and also passes one extra argument.

Suggested fix

-            DPRINTF(Rename, "[sn:%llu] instruction  with PC %s is squashed, skipping.\n",
-                    tid, inst->seqNum, inst->pcState());
+            DPRINTF(Rename,
+                    "[tid:%i] [sn:%llu] instruction with PC %s is squashed, skipping.\n",
+                    tid, inst->seqNum, inst->pcState());

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/cpu/o3/rename.cc` around lines 483 - 485, the DPRINTF in rename.cc has a format/argument mismatch: it prints "[sn:%llu] ... %s" but passes tid first, plus an extra argument. Fix it by adding a "[tid:%i] " prefix to the format so the three arguments tid, inst->seqNum and inst->pcState() line up with %i, %llu and %s. Alternatively, drop tid from the argument list if it should not be logged. Use the DPRINTF invocation and the inst->seqNum / inst->pcState() symbols to locate the change.
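To check that the corrected format and argument list line up, the message can be rebuilt with standard `std::snprintf` as a stand-in for `DPRINTF`. `squashedMsg` is a hypothetical helper, and the PC is passed as a plain string rather than a `pcState()` object. Nothing here assumes how gem5's own formatter reacts to the original mismatch.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Rebuilds the corrected log line with std::snprintf as a stand-in for
// DPRINTF: three conversions (%i, %llu, %s) and exactly three arguments.
std::string squashedMsg(int tid, unsigned long long seq_num, const std::string &pc)
{
    char buf[128];
    std::snprintf(buf, sizeof(buf),
                  "[tid:%i] [sn:%llu] instruction with PC %s is squashed, skipping.\n",
                  tid, seq_num, pc.c_str());
    return std::string(buf);
}
```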

coderabbitai · 2026-02-28T07:41:26Z

src/cpu/o3/rename.hh

+    void setStallSignals(StallSignals* stall_signals) { stallSig = stall_signals; }
+


⚠️ Potential issue | 🟠 Major

Initialize stallSig defensively before first use.

stallSig is introduced as a raw pointer but is not initialized at its declaration. If tick() runs before setStallSignals(), dereferencing it is undefined behavior.

Suggested fix

-    StallSignals* stallSig;
+    StallSignals* stallSig = nullptr;

Also applies to: 342-342

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/cpu/o3/rename.hh` around lines 211 - 212, stallSig is a raw pointer that may be dereferenced in tick() before setStallSignals() is called. Initialize stallSig to nullptr at its declaration and add a defensive null-check (or assert) inside tick() before any dereference to avoid undefined behavior. Apply the same initialization/check fix to the other pointer instance noted around line 342. Reference the member name stallSig, the setter setStallSignals(), and the consumer tick() when making the changes.
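This is a minimal sketch of the suggested pattern. It uses a hypothetical `RenameSketch` and a trimmed `StallSignals` with a single four-entry array. The member starts as `nullptr`, and `tick()` checks it before use. Returning early is only one option; an `assert` or `panic_if` on a null pointer would surface mis-wiring more loudly.

```cpp
#include <cassert>

// Hypothetical trimmed copy of the struct; only one array is kept.
struct StallSignals
{
    bool blockDecode[4]{};
};

// Hypothetical stage: defined pointer state plus a guard before use.
class RenameSketch
{
  public:
    void setStallSignals(StallSignals *stall_signals) { stallSig = stall_signals; }

    // Returns whether thread 0 may proceed; false while still unwired.
    bool tick()
    {
        if (stallSig == nullptr)
            return false;
        return !stallSig->blockDecode[0];
    }

  private:
    StallSignals *stallSig = nullptr;  // defined value before setStallSignals()
};

bool demoUnwired()
{
    RenameSketch r;
    return r.tick();
}

bool demoWired()
{
    StallSignals s;
    RenameSketch r;
    r.setStallSignals(&s);
    return r.tick();
}
```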

github-actions · 2026-02-28T07:43:14Z

🚀 Coremark Smoke Test Results

Branch	IPC	Change
Base (`xs-dev`)	`2.1727`	-
This PR	`1.9349`	📉 `-0.2378` (`-10.94%`)

✅ Difftest smoke test passed!

github-actions · 2026-02-28T07:45:52Z

🚀 Performance test triggered: spec06-0.8c

Change-Id: I1c63c8081da4c313df326ea7cc40e391e8a1d922

github-actions · 2026-02-28T09:48:43Z

🚀 Performance test triggered: spec06-0.8c

github-actions · 2026-02-28T09:54:46Z

🚀 Coremark Smoke Test Results

Branch	IPC	Change
Base (`xs-dev`)	`2.1727`	-
This PR	`1.9349`	📉 `-0.2378` (`-10.94%`)

✅ Difftest smoke test passed!

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

src/cpu/o3/rename.cc (1)

590-603: ⚠️ Potential issue | 🟠 Major

updateActivate() currently cannot deactivate Rename.

Line 592 sets any_unblocking to true unconditionally, so the inactive/deactivate path is never taken.

Proposed fix

 void
 Rename::updateActivate()
 {
-    bool any_unblocking = true;
+    bool any_unblocking = false;
+    for (ThreadID tid = 0; tid < numThreads; ++tid) {
+        if (!stallSig->blockDecode[tid] && !fixedbuffer[tid].empty()) {
+            any_unblocking = true;
+            break;
+        }
+    }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/rename.cc` around lines 590 - 603, Rename::updateActivate
currently hardcodes any_unblocking = true which prevents deactivation; change it
to compute the real unblocking condition (replace the constant true with the
actual boolean expression that checks Rename's internal state: e.g., whether
there are pending entries, free rename/rob slots, or other wakeup conditions
used elsewhere in Rename) and then handle the deactivation path: when
any_unblocking is false and _status == Active set _status = Inactive, log
DPRINTF(Activity, "Deactivating stage.\n"); and call
cpu->deactivateStage(CPU::RenameIdx) so the stage can properly deactivate.
Ensure you update references to any_unblocking, _status, Inactive, Active,
cpu->activateStage and cpu->deactivateStage in Rename::updateActivate.
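One way to compute the flag is sketched below. It follows the condition in the proposed fix: a thread has buffered instructions and `blockDecode[tid]` is clear. `RenameActivity` and `Status` are hypothetical, and two counters stand in for `cpu->activateStage()` / `cpu->deactivateStage()`. Whether that condition is the right wake-up criterion for this PR's Rename is not verified here.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

enum class Status { Active, Inactive };

// Hypothetical model of Rename's activate/deactivate decision.
struct RenameActivity
{
    std::vector<bool> blockDecode;            // rename -> decode stall flags
    std::vector<std::deque<int>> fixedbuffer; // per-thread buffered insts
    Status status = Status::Inactive;
    int activations = 0;                      // stands in for cpu->activateStage()
    int deactivations = 0;                    // stands in for cpu->deactivateStage()

    explicit RenameActivity(int threads) : blockDecode(threads, false), fixedbuffer(threads) {}

    void updateActivate()
    {
        bool any_unblocking = false;          // computed, not hardcoded to true
        for (std::size_t tid = 0; tid < fixedbuffer.size(); ++tid) {
            if (!blockDecode[tid] && !fixedbuffer[tid].empty()) {
                any_unblocking = true;
                break;
            }
        }

        if (any_unblocking && status == Status::Inactive) {
            status = Status::Active;
            ++activations;
        } else if (!any_unblocking && status == Status::Active) {
            status = Status::Inactive;
            ++deactivations;
        }
    }
};

bool demoDeactivates()
{
    RenameActivity r(1);
    r.fixedbuffer[0].push_back(7);
    r.updateActivate();   // work pending: becomes Active
    r.fixedbuffer[0].clear();
    r.updateActivate();   // nothing pending: becomes Inactive
    return r.status == Status::Inactive && r.activations == 1 && r.deactivations == 1;
}
```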

src/cpu/o3/cpu.cc (1)

1497-1515: ⚠️ Potential issue | 🔴 Critical

squashInstIt() can decrement an invalid iterator after erase.

At Line 1515, return --instIt; is unsafe when instIt is at begin() or the list becomes empty after erase. This can trigger undefined behavior in squash paths.

Suggested fix

 CPU::ListIt
 CPU::squashInstIt(ListIt &instIt, ThreadID tid)
 {
     if ((*instIt)->threadNumber == tid) {
@@
-        instIt = instList.erase(instIt);
+        instIt = instList.erase(instIt);
+        if (instList.empty() || instIt == instList.begin()) {
+            return instIt;
+        }
+        auto prev = instIt;
+        --prev;
+        return prev;
     }
-    return --instIt;
+    if (instIt == instList.begin()) {
+        return instIt;
+    }
+    auto prev = instIt;
+    --prev;
+    return prev;
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/cpu.cc` around lines 1497 - 1515, The function CPU::squashInstIt
currently does an unconditional return --instIt which can decrement begin() or
an invalid iterator after erase; change the logic so after possibly erasing (in
CPU::squashInstIt) you check if instIt == instList.begin() (or instIt ==
instList.end() && instList.empty()) and return instIt as-is, otherwise decrement
and return (--instIt); ensure both the erased-path and non-erased-path use this
safe check so you never decrement begin() or an empty list iterator.
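This sketch shows the safe step-back on a plain `std::list`, with thread ids as elements. `squashInstItSketch` is hypothetical and only mirrors the erase-then-step-back shape discussed above. A single `begin()` check is enough: when `erase` empties the list it returns `end()`, which then equals `begin()`.

```cpp
#include <cassert>
#include <iterator>
#include <list>

using InstList = std::list<int>;   // hypothetical: elements are thread ids
using ListIt = InstList::iterator;

// Erases *it if it belongs to `tid`, then steps back one element without
// ever decrementing begin(). Without a predecessor, the iterator left by
// erase (possibly end()) is returned unchanged.
ListIt squashInstItSketch(InstList &list, ListIt it, int tid)
{
    assert(it != list.end());
    if (*it == tid)
        it = list.erase(it);
    if (it == list.begin())   // also true for an empty list, where begin() == end()
        return it;
    return std::prev(it);
}

// Erasing the only element leaves an empty list and returns end().
bool demoEraseOnly()
{
    InstList l{0};
    ListIt r = squashInstItSketch(l, l.begin(), 0);
    return l.empty() && r == l.end();
}

// Erasing the middle element returns the element before it.
bool demoEraseMiddle()
{
    InstList l{0, 1, 0};
    ListIt r = squashInstItSketch(l, std::next(l.begin()), 1);
    return l.size() == 2 && r == l.begin();
}
```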

♻️ Duplicate comments (13)

src/cpu/o3/rename.cc (1)

401-406: ⚠️ Potential issue | 🟠 Major

Use the current thread’s history buffer when computing releaseSeq.

On Line 405, historyBuffer->empty() checks only thread 0. In SMT, this can compute the wrong release boundary for other threads.

Proposed fix

-            releaseSeq = historyBuffer->empty() ? 0 : historyBuffer[tid].back().instSeqNum;
+            releaseSeq = historyBuffer[tid].empty() ? 0 : historyBuffer[tid].back().instSeqNum;

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/rename.cc` around lines 401 - 406, The releaseSeq computation uses
historyBuffer->empty() which checks only thread 0; change it to test the current
thread's history buffer (use historyBuffer[tid].empty()) and then use
historyBuffer[tid].back().instSeqNum for releaseSeq when not empty; update the
block around fromCommit->commitInfo[tid], tid, releaseSeq, and historyBuffer
references accordingly so SMT threads use their own history buffers.
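Here is why `historyBuffer->empty()` is wrong. For an array, `p->empty()` is `p[0].empty()`, so the test always looks at thread 0. It can also be unsafe: if thread 0's buffer is non-empty and the current thread's buffer is empty, the original expression calls `back()` on an empty deque. The sketch assumes a hypothetical `HistoryEntry` and a two-thread `MaxThreads`.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

struct HistoryEntry
{
    uint64_t instSeqNum;
};

constexpr int MaxThreads = 2;                       // hypothetical value
std::deque<HistoryEntry> historyBuffer[MaxThreads]; // one buffer per thread

// Reads the current thread's own buffer, as in the proposed fix.
uint64_t releaseSeqFor(int tid)
{
    return historyBuffer[tid].empty() ? 0 : historyBuffer[tid].back().instSeqNum;
}

// Thread 0 empty, thread 1 holding two entries.
bool demoPerThreadRelease()
{
    historyBuffer[0].clear();
    historyBuffer[1].clear();
    historyBuffer[1].push_back(HistoryEntry{10});
    historyBuffer[1].push_back(HistoryEntry{11});
    // historyBuffer->empty() inspects thread 0 only, so the original
    // expression would yield 0 for thread 1 here.
    return historyBuffer->empty() && releaseSeqFor(1) == 11 && releaseSeqFor(0) == 0;
}
```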

src/cpu/o3/comm.hh (1)

330-337: ⚠️ Potential issue | 🟠 Major

Initialize StallSignals arrays to deterministic defaults.

Lines 333-336 declare raw bool arrays with no initialization, so early reads may observe garbage values.

Proposed fix

 struct StallSignals
 {
-    bool blockFetch[MaxThreads];// decode to fetch
-    bool blockDecode[MaxThreads];// rename to decode
-    bool blockRename[MaxThreads];// iew to rename (if iew is stalling, rename all threads would be stalled)
-    bool blockIEW[MaxThreads];// commit to iew
+    bool blockFetch[MaxThreads]{};   // decode to fetch
+    bool blockDecode[MaxThreads]{};  // rename to decode
+    bool blockRename[MaxThreads]{};  // iew to rename
+    bool blockIEW[MaxThreads]{};     // commit to iew
 };

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/comm.hh` around lines 330 - 337, StallSignals declares raw bool
arrays (blockFetch, blockDecode, blockRename, blockIEW) that are uninitialized;
add deterministic initialization by either providing an in-struct initializer
(e.g., = {false} / = {} for each array) or adding a default constructor
StallSignals() that zeroes all MaxThreads entries for blockFetch, blockDecode,
blockRename, and blockIEW so every element is false on construction.
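This sketch shows the in-struct initializer variant, with a hypothetical `MaxThreads` value of 4. The empty braces value-initialize every element to `false`. Those default member initializers also apply to `new StallSignals` without parentheses, which would otherwise leave the arrays indeterminate.

```cpp
#include <cassert>

constexpr int MaxThreads = 4;   // hypothetical; gem5 defines its own MaxThreads

struct StallSignals
{
    bool blockFetch[MaxThreads]{};   // decode to fetch
    bool blockDecode[MaxThreads]{};  // rename to decode
    bool blockRename[MaxThreads]{};  // iew to rename
    bool blockIEW[MaxThreads]{};     // commit to iew
};

// True when no stall flag is set for any thread.
bool allClear(const StallSignals &s)
{
    for (int i = 0; i < MaxThreads; ++i) {
        if (s.blockFetch[i] || s.blockDecode[i] || s.blockRename[i] || s.blockIEW[i])
            return false;
    }
    return true;
}

// `new T` (no parentheses) still applies the default member initializers.
bool demoHeapAllocated()
{
    StallSignals *s = new StallSignals;
    bool ok = allClear(*s);
    delete s;
    return ok;
}
```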

src/cpu/o3/commit.cc (2)

1936-1937: ⚠️ Potential issue | 🟠 Major

Use the instruction’s thread in ROB lookup.

Line 1936 hardcodes rob->findInst(0, inst->seqNum). This is wrong for non-zero threads and can panic incorrectly.

Proposed fix

-            panic_if(!rob->findInst(0, inst->seqNum), "[sn:%llu] Committed instruction not found in ROB",
+            panic_if(!rob->findInst(inst->threadNumber, inst->seqNum),
+                     "[sn:%llu] Committed instruction not found in ROB",
                      inst->seqNum);

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/commit.cc` around lines 1936 - 1937, the ROB lookup uses a
hardcoded thread 0: change the call rob->findInst(0, inst->seqNum) to use the
instruction's actual thread identifier (rob->findInst(inst->threadNumber,
inst->seqNum), the field already used by CPU::squashInstIt) so the lookup uses
inst's thread rather than 0 and avoids false panics; update the call site in
commit.cc where rob->findInst and inst->seqNum are referenced.

1387-1392: ⚠️ Potential issue | 🔴 Critical

Fix unsafe load-value extraction (overflow + UB risk).

Line 1389 copies effSize bytes into an 8-byte buffer without bounds, and Line 1391 reads through a casted pointer. This can overflow and trigger undefined behavior.

Proposed fix

-                    char buffer[8] = {0};
-                    if (head_inst->memData) {
-                        std::memcpy(buffer, head_inst->memData, head_inst->effSize);
-                    }
-                    Addr load_value = *((uint64_t *)buffer);
+                    uint64_t load_value_u64 = 0;
+                    if (head_inst->memData && head_inst->effSize > 0) {
+                        const size_t copy_size =
+                            std::min<size_t>(head_inst->effSize, sizeof(load_value_u64));
+                        std::memcpy(&load_value_u64, head_inst->memData, copy_size);
+                    }
+                    Addr load_value = static_cast<Addr>(load_value_u64);

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/commit.cc` around lines 1387 - 1392, The code unsafely copies
head_inst->effSize bytes into an 8-byte stack buffer and then reads it via a
uint64_t cast, which can overflow and cause UB; fix by guarding and building the
64-bit value safely: ensure head_inst->memData is non-null, clamp the copy
length to at most 8 bytes (e.g., size_t n = std::min<size_t>(head_inst->effSize,
8)), zero-initialize the destination (or initialize a uint64_t value = 0), then
copy only n bytes into that uint64_t using memcpy or a byte-wise loop to avoid
alignment/aliasing issues, and finally use that constructed load_value when
calling loadTripleCounter.update(load_pc, load_addr, load_value).

src/cpu/o3/decode.cc (2)

112-115: ⚠️ Potential issue | 🟠 Major

clearStates() should clear per-thread buffered state.

Lines 112-115 are currently a no-op, so stale instructions for tid can remain in fixedbuffer/stall queues when the thread is reused.

Proposed fix

 void
 Decode::clearStates(ThreadID tid)
 {
-
+    fixedbuffer[tid].clear();
+
+    auto delIt = stallBuffer.begin();
+    for (auto it0 = eachstallSize.begin(); it0 != eachstallSize.end();) {
+        int size = *it0;
+        auto start_it = delIt;
+        auto end_it = start_it + size;
+        if ((*start_it)->threadNumber == tid) {
+            delIt = stallBuffer.erase(start_it, end_it);
+            it0 = eachstallSize.erase(it0);
+        } else {
+            delIt = end_it;
+            ++it0;
+        }
+    }
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/decode.cc` around lines 112 - 115, Decode::clearStates currently
does nothing, leaving per-thread buffered instructions alive across thread
reuse; implement Decode::clearStates(ThreadID tid) to remove/clear any entries
associated with the given tid from per-thread structures such as fixedbuffer and
any stall queues (and other per-thread buffers maintained by Decode), e.g.
iterate fixedbuffer and stall queue containers and erase or reset entries whose
thread id equals tid, and reset any per-thread bookkeeping counters/flags so the
thread starts with a clean state.
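The grouped-erase logic from the proposed fix can be exercised in isolation. Like the fix, this sketch assumes that `stallBuffer` holds consecutive runs of same-thread instructions, with the run lengths recorded in order in `eachstallSize`. `DecodeSketch` is hypothetical and stores thread ids instead of instruction pointers. Erasing from the middle of a `std::deque` invalidates its other iterators, so the loop continues from the iterator that `erase` returns.

```cpp
#include <cassert>
#include <deque>
#include <list>
#include <vector>

// Hypothetical model of Decode's buffers; entries store only a thread id.
// stallBuffer holds consecutive same-thread runs whose lengths are listed,
// in order, in eachstallSize.
struct DecodeSketch
{
    std::vector<std::deque<int>> fixedbuffer;
    std::deque<int> stallBuffer;
    std::list<int> eachstallSize;

    explicit DecodeSketch(int threads) : fixedbuffer(threads) {}

    void clearStates(int tid)
    {
        fixedbuffer[tid].clear();

        auto del_it = stallBuffer.begin();
        for (auto size_it = eachstallSize.begin(); size_it != eachstallSize.end();) {
            assert(*size_it > 0);                 // every recorded run is non-empty
            auto group_start = del_it;
            auto group_end = group_start + *size_it;
            if (*group_start == tid) {
                // erase invalidates other deque iterators; continue from its result.
                del_it = stallBuffer.erase(group_start, group_end);
                size_it = eachstallSize.erase(size_it);
            } else {
                del_it = group_end;
                ++size_it;
            }
        }
    }
};

bool demoClearThread1()
{
    DecodeSketch d(2);
    d.stallBuffer = {0, 0, 1, 1, 1, 0};
    d.eachstallSize = {2, 3, 1};
    d.fixedbuffer[1].push_back(9);
    d.clearStates(1);
    return d.stallBuffer == std::deque<int>{0, 0, 0} &&
           d.eachstallSize == std::list<int>{2, 1} && d.fixedbuffer[1].empty();
}
```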

434-473: ⚠️ Potential issue | 🟠 Major

updateActivate() is never reached due to a dead gate flag.

status_change is set to false on Line 434 and never updated, so the call on Line 471 does not run.

Proposed fix

-    bool status_change = false;
+    bool status_change = false;
@@
-    if (status_change) {
-        updateActivate();
-    }
+    updateActivate();

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/decode.cc` around lines 434 - 473, status_change is initialized
false and never updated so updateActivate() never runs; change the code so any
operation that can flip thread activation returns a bool and ORs into
status_change (for example make moveInstsToBuffer() and checkSquash() return a
bool and call status_change |= moveInstsToBuffer(); status_change |=
checkSquash(); or have those functions set a provided reference flag), and also
ensure decodeInsts(tid) reports/returns any activation changes (status_change |=
decodeInsts(tid); or decodeInsts sets the flag) before the final if
(status_change) updateActivate(); so updateActivate() is invoked when any of
those routines change status.
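This sketch shows the accumulate-then-act pattern, using hypothetical step functions that report whether they changed anything. Using `|=` rather than `||` matters: `a || b()` skips `b()` once `a` is true, while `status_change |= b()` always runs the step.

```cpp
#include <cassert>

// Hypothetical decode tick: each step reports whether it changed state
// that affects activation, and the flags are accumulated before acting.
struct DecodeTickSketch
{
    bool pendingSquash = false;
    bool newInsts = false;
    int updateActivateCalls = 0;

    bool checkSquash()
    {
        bool changed = pendingSquash;
        pendingSquash = false;
        return changed;
    }

    bool moveInstsToBuffer()
    {
        bool changed = newInsts;
        newInsts = false;
        return changed;
    }

    void updateActivate() { ++updateActivateCalls; }

    void tick()
    {
        bool status_change = false;
        status_change |= checkSquash();        // |= never skips a step
        status_change |= moveInstsToBuffer();
        if (status_change)
            updateActivate();
    }
};

int demoCalls()
{
    DecodeTickSketch d;
    d.tick();            // nothing changed: no call
    d.newInsts = true;
    d.tick();            // buffer changed: one call
    return d.updateActivateCalls;
}
```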

src/cpu/o3/iew.cc (3)

828-830: ⚠️ Potential issue | 🟠 Major

canInsertLDSTQue() should be evaluated unconditionally each tick.

canInsertLDSTQue() performs getAndResetLastLQPopEntries() / getAndResetLastSQPopEntries() side effects. With the current || expression, those resets are skipped whenever stallSig->blockIEW[i] is true.

Suggested fix

     for (int i = 0; i < numThreads; i++) {
-        bool block = stallSig->blockIEW[i] || !canInsertLDSTQue(i);
+        bool can_insert_ldst = canInsertLDSTQue(i);
+        bool block = stallSig->blockIEW[i] || !can_insert_ldst;
         bool active = !block && !fixedbuffer[i].empty();

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/iew.cc` around lines 828 - 830, The expression currently
short-circuits so canInsertLDSTQue(i) isn't called when stallSig->blockIEW[i] is
true; evaluate canInsertLDSTQue(i) every tick into a local bool (e.g., bool
canInsert = canInsertLDSTQue(i)) and then compute block as bool block =
stallSig->blockIEW[i] || !canInsert; keep the subsequent active calculation
(bool active = !block && !fixedbuffer[i].empty()) unchanged so the
side-effecting getAndResetLastLQPopEntries()/getAndResetLastSQPopEntries()
always run.
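The short-circuit effect can be shown with a hypothetical counter standing in for `getAndResetLastLQPopEntries()` / `getAndResetLastSQPopEntries()`. While `blockIEW` stays true, the `||` form never calls `canInsertLDSTQue()`, whereas the hoisted form calls it every tick.

```cpp
#include <cassert>

// Hypothetical stand-in for an LSQ whose readiness check has a side effect.
struct LsqSketch
{
    int lastPopEntries = 0;
    int resets = 0;

    bool canInsertLDSTQue()
    {
        ++resets;            // per-tick bookkeeping that must not be skipped
        lastPopEntries = 0;
        return true;
    }
};

// blockIEW stays true for `ticks` ticks; returns how often the check ran.
int resetsShortCircuit(int ticks)
{
    LsqSketch lsq;
    bool blockIEW = true;
    for (int t = 0; t < ticks; ++t) {
        bool block = blockIEW || !lsq.canInsertLDSTQue();  // right side skipped
        (void)block;
    }
    return lsq.resets;
}

int resetsHoisted(int ticks)
{
    LsqSketch lsq;
    bool blockIEW = true;
    for (int t = 0; t < ticks; ++t) {
        bool can_insert_ldst = lsq.canInsertLDSTQue();     // always evaluated
        bool block = blockIEW || !can_insert_ldst;
        (void)block;
    }
    return lsq.resets;
}
```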

833-841: ⚠️ Potential issue | 🟠 Major

Current multi-thread arbitration can starve non-zero thread IDs.

When multiple threads are active, the logic keeps the first discovered thread as tid and blocks rename for both, which can repeatedly favor lower-index threads and degrade SMT fairness.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/iew.cc` around lines 833 - 841, The current arbitration in the
active-thread selection block (variables tid, i, InvalidThreadID and
stallSig->blockRename in iew.cc) always keeps the first-seen active thread and
marks both threads blocked, which biases lower-index threads; change this to a
fair round-robin/rotating selection: introduce or use a rotating start index
(e.g., lastChosenThread) and scan threads beginning after it to pick the next
active tid, then only set stallSig->blockRename for other threads as appropriate
(instead of marking both), and update lastChosenThread when you pick tid so
subsequent arbitration rotates priority and prevents starvation of non-zero
thread IDs.
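This is a minimal rotating-priority picker, offered as one possible shape for the fix. `RoundRobinArbiter` and `lastChosen` are hypothetical names. Deciding which other threads should then see `blockRename` is left out.

```cpp
#include <cassert>
#include <vector>

constexpr int InvalidThreadID = -1;   // hypothetical stand-in

// Hypothetical rotating-priority arbiter: scanning starts just after the
// previous winner, so every active thread is eventually selected.
class RoundRobinArbiter
{
  public:
    explicit RoundRobinArbiter(int num_threads) : numThreads(num_threads) {}

    int pick(const std::vector<bool> &active)
    {
        for (int k = 1; k <= numThreads; ++k) {
            int tid = (lastChosen + k) % numThreads;
            if (active[tid]) {
                lastChosen = tid;
                return tid;
            }
        }
        return InvalidThreadID;
    }

  private:
    int numThreads;
    int lastChosen = -1;   // so the very first scan starts at thread 0
};

// Two always-active threads alternate 0, 1, 0, 1.
bool demoAlternates()
{
    RoundRobinArbiter arb(2);
    std::vector<bool> active{true, true};
    return arb.pick(active) == 0 && arb.pick(active) == 1 &&
           arb.pick(active) == 0 && arb.pick(active) == 1;
}

bool demoNoneActive()
{
    RoundRobinArbiter arb(2);
    std::vector<bool> active{false, false};
    return arb.pick(active) == InvalidThreadID;
}
```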

734-744: ⚠️ Potential issue | 🟠 Major

Do not return on first squash inside checkSquash() loop.

At Line 743, returning after the first squashing thread skips squash handling for later threads in the same tick.

Suggested fix

-            return;
+            continue;

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/iew.cc` around lines 734 - 744, The loop in checkSquash()
currently returns after handling the first squashing thread, which skips
processing subsequent threads; change the control flow to handle all threads by
replacing the early "return" with a "continue" (or otherwise removing the
return) so squash(i), localSquashVer.update(...), fetchRedirect[i] = false,
iewStats.stallEvents[ROBWalk]++, and setAllStalls(StallReason::CommitSquash)
execute for each thread that has fromCommit->commitInfo[i].squash; ensure the
remaining loop iterations still run and that any shared side-effects remain
consistent after this change.

src/cpu/o3/decode.hh (1)

186-186: ⚠️ Potential issue | 🟠 Major

Initialize stallSig to nullptr to avoid indeterminate pointer state.

Without default initialization, any pre-wiring access path can observe garbage pointer values.

Suggested fix

-    StallSignals* stallSig;
+    StallSignals* stallSig = nullptr;

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/decode.hh` at line 186, The pointer member StallSignals* stallSig
is uninitialized and can contain garbage; initialize it to a known null state by
setting stallSig = nullptr where it's declared or by adding stallSig(nullptr) to
the constructor initializer list for the class that contains it (reference
symbol: stallSig / StallSignals* stallSig) so any pre-wiring access sees a
defined null pointer.

src/cpu/o3/fetch.cc (2)

1301-1312: ⚠️ Potential issue | 🟠 Major

Block/no-active paths should not forward instructions and must clear numInst.

At Line 1301, sendInstructionsToDecode() returns without clearing numInst, and at Line 1308 it still drains fetchQueue[tid] after detecting stallSig->blockFetch[tid]. This can leak stale per-cycle fetch accounting and violate decode backpressure.

Suggested fix

     if (!any_thread_active) {
         // All threads are blocked, no instructions to send
+        numInst = 0;
         return;
     }
@@
     if (stallSig->blockFetch[tid]) {
         // If decode stalled, use decode's stall reason
         DPRINTF(Fetch, "[tid:%i] Fetch stalled\n", tid);
         setAllFetchStalls(fromDecode->decodeInfo[tid].blockReason);
+        numInst = 0;
+        return;
     }

Also applies to: 1341-1341

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/fetch.cc` around lines 1301 - 1312, sendInstructionsToDecode()
currently returns early when no threads are active or when
stallSig->blockFetch[tid] is set but leaves per-cycle counters and still drains
fetchQueue, which can leak stale numInst and break decode backpressure; update
the logic so that before each early return (both the any_thread_active false
path and the stallSig->blockFetch[tid] path referenced in the fetch loop and the
similar path around the other occurrence) you explicitly reset the per-cycle
counter (numInst = 0) and refrain from draining fetchQueue[tid]; ensure you
still call setAllFetchStalls(fromDecode->decodeInfo[tid].blockReason) on stall
and return immediately after clearing numInst so no instructions are forwarded
or accounted for in that cycle.

1305-1305: ⚠️ Potential issue | 🟠 Major

Hardcoded tid = 0 still disables SMT progress.

Line 1305 and Line 1675 pin behavior to thread 0. With numThreads > 1, other threads can be starved indefinitely.

Minimal fail-fast until SMT scheduling is restored

-    ThreadID tid = 0; // TODO: smt support
+    fatal_if(numThreads > 1,
+             "SMT fetch selection is not implemented; set numThreads=1.");
+    ThreadID tid = 0; // TODO: smt support

Also applies to: 1675-1675

src/cpu/o3/iew.hh (1)

159-159: ⚠️ Potential issue | 🟠 Major

Initialize stallSig to nullptr at declaration.

stallSig currently has an indeterminate value until setStallSignals() is called, which is unsafe for any early-access path.

Suggested fix

-    StallSignals* stallSig;
+    StallSignals* stallSig = nullptr;

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/iew.hh` at line 159, Initialize the member pointer stallSig to
nullptr where it's declared (StallSignals* stallSig) in the iew.hh class
declaration so it has a defined value before setStallSignals() is called; update
the declaration to assign nullptr and ensure any early-access code guards
against a null stallSig until setStallSignals() assigns the real pointer.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cpu/o3/commit.cc`:
- Line 1246: The file uses DPRINTF with the debug categories CommitTrace and IEW
(e.g., the DPRINTF call that prints head_inst->genDisassembly()) but does not
include their debug headers; add the missing debug header includes for those
categories (for example include the corresponding debug/CommitTrace.hh and
debug/IEW.hh headers) near the other includes in the file so the CommitTrace and
IEW symbols are declared and compilation no longer depends on transitive
includes.

In `@src/cpu/o3/rename_map.cc`:
- Around line 111-121: The file uses DPRINTF(Scoreboard, ...) (seen around the
map assignment and refcount logs) but doesn't include the Scoreboard debug
header; add an include for "debug/Scoreboard.hh" alongside the existing
"debug/Rename.hh" include in src/cpu/o3/rename_map.cc so the Scoreboard debug
flag symbol is defined and builds resolve (search for the DPRINTF(Scoreboard,
...) calls to place the include near other debug headers).

---

Outside diff comments:
In `@src/cpu/o3/cpu.cc`:
- Around line 1497-1515: The function CPU::squashInstIt currently does an
unconditional return --instIt which can decrement begin() or an invalid iterator
after erase; change the logic so after possibly erasing (in CPU::squashInstIt)
you check if instIt == instList.begin() (or instIt == instList.end() &&
instList.empty()) and return instIt as-is, otherwise decrement and return
(--instIt); ensure both the erased-path and non-erased-path use this safe check
so you never decrement begin() or an empty list iterator.

In `@src/cpu/o3/rename.cc`:
- Around line 590-603: Rename::updateActivate currently hardcodes any_unblocking
= true which prevents deactivation; change it to compute the real unblocking
condition (replace the constant true with the actual boolean expression that
checks Rename's internal state: e.g., whether there are pending entries, free
rename/rob slots, or other wakeup conditions used elsewhere in Rename) and then
handle the deactivation path: when any_unblocking is false and _status == Active
set _status = Inactive, log DPRINTF(Activity, "Deactivating stage.\n"); and call
cpu->deactivateStage(CPU::RenameIdx) so the stage can properly deactivate.
Ensure you update references to any_unblocking, _status, Inactive, Active,
cpu->activateStage and cpu->deactivateStage in Rename::updateActivate.
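A simplified model of the activate/deactivate logic the comment asks for. Everything here is hypothetical: `StageModel`, `ActivityTracker` and the `pendingInsts` proxy for `any_unblocking` are illustrative, since the real condition depends on Rename's internal buffers.

```cpp
#include <cassert>
#include <cstddef>

enum class StageStatus { Active, Inactive };

// Counts calls that stand in for cpu->activateStage/deactivateStage.
struct ActivityTracker
{
    int activations = 0;
    int deactivations = 0;
};

struct StageModel
{
    StageStatus status = StageStatus::Inactive;
    std::size_t pendingInsts = 0;        // assumed proxy for "any_unblocking"
    ActivityTracker *tracker = nullptr;  // must be wired before use

    void updateActivate()
    {
        // Compute the condition instead of hard-coding it to true.
        const bool anyUnblocking = pendingInsts > 0;
        if (anyUnblocking && status == StageStatus::Inactive) {
            status = StageStatus::Active;
            ++tracker->activations;
        } else if (!anyUnblocking && status == StageStatus::Active) {
            status = StageStatus::Inactive;
            ++tracker->deactivations;
        }
    }
};
```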

---

Duplicate comments:
In `@src/cpu/o3/comm.hh`:
- Around line 330-337: StallSignals declares raw bool arrays (blockFetch,
blockDecode, blockRename, blockIEW) that are uninitialized; add deterministic
initialization by either providing an in-struct initializer (e.g., = {false} / =
{} for each array) or adding a default constructor StallSignals() that zeroes
all MaxThreads entries for blockFetch, blockDecode, blockRename, and blockIEW so
every element is false on construction.
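The in-struct initializer option can be sketched as follows, assuming the four-array layout described in the comment; the value of `MaxThreads` is an assumption for illustration only.

```cpp
#include <cassert>

constexpr int MaxThreads = 4;  // assumed value for this sketch

// An empty brace initializer value-initializes every array element, so each
// flag starts as false however the struct is created (stack, heap, member).
struct StallSignals
{
    bool blockFetch[MaxThreads] = {};
    bool blockDecode[MaxThreads] = {};
    bool blockRename[MaxThreads] = {};
    bool blockIEW[MaxThreads] = {};
};
```

This avoids a hand-written constructor and keeps the struct an aggregate.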

In `@src/cpu/o3/commit.cc`:
- Around line 1936-1937: The ROB lookup is using a hardcoded thread 0: change
the call rob->findInst(0, inst->seqNum) to use the instruction's actual thread
identifier (e.g. rob->findInst(inst->threadId(), inst->seqNum) or
rob->findInst(inst->thread->id, inst->seqNum) depending on the project API) so
the lookup uses inst's thread rather than 0 and avoids false panics; update the
call site in commit.cc where rob->findInst and inst->seqNum are referenced.
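A toy model showing why the hardcoded `0` gives false panics on SMT. `ToyROB` and `Inst` are hypothetical; the field name `threadNumber` mirrors the thread field used by gem5's `DynInst`, but it should be confirmed on this branch.

```cpp
#include <cassert>
#include <cstdint>
#include <list>

using ThreadID = int16_t;      // assumed alias for this sketch
constexpr int MaxThreads = 2;  // assumed

struct Inst
{
    ThreadID threadNumber;
    uint64_t seqNum;
};

// Toy ROB holding one instruction list per thread.
struct ToyROB
{
    std::list<Inst> instList[MaxThreads];

    bool findInst(ThreadID tid, uint64_t seqNum) const
    {
        for (const Inst &i : instList[tid])
            if (i.seqNum == seqNum)
                return true;
        return false;
    }
};
```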
- Around line 1387-1392: The code unsafely copies head_inst->effSize bytes into
an 8-byte stack buffer and then reads it via a uint64_t cast, which can overflow
and cause UB; fix by guarding and building the 64-bit value safely: ensure
head_inst->memData is non-null, clamp the copy length to at most 8 bytes (e.g.,
size_t n = std::min<size_t>(head_inst->effSize, 8)), zero-initialize the
destination (or initialize a uint64_t value = 0), then copy only n bytes into
that uint64_t using memcpy or a byte-wise loop to avoid alignment/aliasing
issues, and finally use that constructed load_value when calling
loadTripleCounter.update(load_pc, load_addr, load_value).
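A sketch of the clamped copy described above. `safeLoadValue` is a hypothetical helper; it assumes `memData` is a byte pointer and `effSize` a byte count, as the comment implies.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Builds a 64-bit value from at most 8 bytes of raw memory data.
uint64_t
safeLoadValue(const uint8_t *memData, std::size_t effSize)
{
    uint64_t value = 0;  // unused high bytes stay zero
    if (memData == nullptr)
        return value;
    const std::size_t n = std::min<std::size_t>(effSize, sizeof(value));
    // memcpy into the uint64_t itself: no stack overflow, no alignment or
    // strict-aliasing issues from casting a byte buffer to uint64_t*.
    std::memcpy(&value, memData, n);
    return value;
}
```

Bytes are copied into the start of the integer's storage, so the numeric result follows host byte order; that matches a little-endian guest on a little-endian host.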

In `@src/cpu/o3/decode.cc`:
- Around line 112-115: Decode::clearStates currently does nothing, leaving
per-thread buffered instructions alive across thread reuse; implement
Decode::clearStates(ThreadID tid) to remove/clear any entries associated with
the given tid from per-thread structures such as fixedbuffer and any stall
queues (and other per-thread buffers maintained by Decode), e.g. iterate
fixedbuffer and stall queue containers and erase or reset entries whose thread
id equals tid, and reset any per-thread bookkeeping counters/flags so the thread
starts with a clean state.
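A minimal sketch of a per-thread reset. `DecodeModel`, `stallBuffer` and `decodedThisCycle` are hypothetical names, and `std::deque` replaces `boost::circular_buffer` only to keep the example self-contained. Because the buffers are already indexed by thread, clearing the slot for `tid` is enough; no per-entry thread filtering is needed.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

constexpr int MaxThreads = 2;  // assumed
using ThreadID = int;

struct Inst { uint64_t seqNum; };

struct DecodeModel
{
    std::deque<Inst> fixedbuffer[MaxThreads];
    std::deque<Inst> stallBuffer[MaxThreads];     // hypothetical stall queue
    unsigned decodedThisCycle[MaxThreads] = {};   // hypothetical counter

    void clearStates(ThreadID tid)
    {
        // Drop only this thread's instructions and bookkeeping.
        fixedbuffer[tid].clear();
        stallBuffer[tid].clear();
        decodedThisCycle[tid] = 0;
    }
};
```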
- Around line 434-473: status_change is initialized false and never updated so
updateActivate() never runs; change the code so any operation that can flip
thread activation returns a bool and ORs into status_change (for example make
moveInstsToBuffer() and checkSquash() return a bool and call status_change |=
moveInstsToBuffer(); status_change |= checkSquash(); or have those functions set
a provided reference flag), and also ensure decodeInsts(tid) reports/returns any
activation changes (status_change |= decodeInsts(tid); or decodeInsts sets the
flag) before the final if (status_change) updateActivate(); so updateActivate()
is invoked when any of those routines change status.
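The "return a bool and OR it in" option can be modelled like this. `StageTickModel` and its fields are hypothetical; only the helper names echo the comment.

```cpp
#include <cassert>

struct StageTickModel
{
    bool squashPending = false;
    int incoming = 0;
    int updateCalls = 0;

    // Each helper reports whether it changed any activity-relevant state.
    bool moveInstsToBuffer()
    {
        const bool moved = incoming > 0;
        incoming = 0;
        return moved;
    }
    bool checkSquash()
    {
        const bool squashed = squashPending;
        squashPending = false;
        return squashed;
    }
    void updateActivate() { ++updateCalls; }

    void tick()
    {
        bool status_change = false;
        // |= rather than ||, so every helper still runs each cycle.
        status_change |= moveInstsToBuffer();
        status_change |= checkSquash();
        if (status_change)
            updateActivate();
    }
};
```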

In `@src/cpu/o3/decode.hh`:
- Line 186: The pointer member StallSignals* stallSig is uninitialized and can
contain garbage; initialize it to a known null state by setting stallSig =
nullptr where it's declared or by adding stallSig(nullptr) to the constructor
initializer list for the class that contains it (reference symbol: stallSig /
StallSignals* stallSig) so any pre-wiring access sees a defined null pointer.

In `@src/cpu/o3/fetch.cc`:
- Around line 1301-1312: sendInstructionsToDecode() currently returns early when
no threads are active or when stallSig->blockFetch[tid] is set, but it does not
reset the per-cycle counters and still drains fetchQueue, which can leak a stale
numInst and break decode backpressure; update the logic so that before each early return
(both the any_thread_active false path and the stallSig->blockFetch[tid] path
referenced in the fetch loop and the similar path around the other occurrence)
you explicitly set numInst = 0 for the corresponding ThreadID and refrain from
draining fetchQueue[tid]; ensure you still call
setAllFetchStalls(fromDecode->decodeInfo[tid].blockReason) on stall and return
immediately after clearing numInst so no instructions are forwarded or accounted
for in that cycle.
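A simplified model of the stalled path. `FetchModel` and `toDecode` are hypothetical; `blockFetch` stands in for `stallSig->blockFetch`, and the `setAllFetchStalls` bookkeeping is omitted.

```cpp
#include <cassert>
#include <deque>

constexpr int MaxThreads = 2;  // assumed
using ThreadID = int;

struct FetchModel
{
    std::deque<int> fetchQueue[MaxThreads];
    unsigned numInst[MaxThreads] = {};
    bool blockFetch[MaxThreads] = {};
    std::deque<int> toDecode;

    void sendInstructionsToDecode(ThreadID tid, unsigned width)
    {
        numInst[tid] = 0;  // never carry a count over from an earlier cycle
        if (blockFetch[tid]) {
            // Stalled: forward nothing and leave fetchQueue untouched.
            return;
        }
        while (numInst[tid] < width && !fetchQueue[tid].empty()) {
            toDecode.push_back(fetchQueue[tid].front());
            fetchQueue[tid].pop_front();
            ++numInst[tid];
        }
    }
};
```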

In `@src/cpu/o3/iew.cc`:
- Around line 828-830: The expression currently short-circuits so
canInsertLDSTQue(i) isn't called when stallSig->blockIEW[i] is true; evaluate
canInsertLDSTQue(i) every tick into a local bool (e.g., bool canInsert =
canInsertLDSTQue(i)) and then compute block as bool block =
stallSig->blockIEW[i] || !canInsert; keep the subsequent active calculation
(bool active = !block && !fixedbuffer[i].empty()) unchanged so the
side-effecting getAndResetLastLQPopEntries()/getAndResetLastSQPopEntries()
always run.
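The short-circuit problem in isolation. `ToyLSQ` is hypothetical; its `canInsert()` has a side effect standing in for the counter resets performed inside `canInsertLDSTQue()`.

```cpp
#include <cassert>

struct ToyLSQ
{
    int lastPopped = 3;
    int queries = 0;
    bool canInsert()
    {
        ++queries;
        lastPopped = 0;  // side effect that must happen every tick
        return true;
    }
};

// Original shape: canInsert() is skipped whenever the thread is stalled.
bool blockedShortCircuit(bool stalled, ToyLSQ &lsq)
{
    return stalled || !lsq.canInsert();
}

// Suggested shape: evaluate the side-effecting call unconditionally first.
bool blockedAlwaysEvaluate(bool stalled, ToyLSQ &lsq)
{
    const bool canInsert = lsq.canInsert();
    return stalled || !canInsert;
}
```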
- Around line 833-841: The current arbitration in the active-thread selection
block (variables tid, i, InvalidThreadID and stallSig->blockRename in iew.cc)
always keeps the first-seen active thread and marks both threads blocked, which
biases lower-index threads; change this to a fair round-robin/rotating
selection: introduce or use a rotating start index (e.g., lastChosenThread) and
scan threads beginning after it to pick the next active tid, then only set
stallSig->blockRename for other threads as appropriate (instead of marking
both), and update lastChosenThread when you pick tid so subsequent arbitration
rotates priority and prevents starvation of non-zero thread IDs.
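One way to implement the rotating priority, as a standalone sketch. `RoundRobinArbiter` and `lastChosen` are hypothetical; `block` stands in for `stallSig->blockRename`. Only active threads that lose arbitration are blocked.

```cpp
#include <cassert>

constexpr int MaxThreads = 4;  // assumed
using ThreadID = int;
constexpr ThreadID InvalidThreadID = -1;

struct RoundRobinArbiter
{
    ThreadID lastChosen = MaxThreads - 1;  // thread 0 gets first priority

    ThreadID pick(const bool active[MaxThreads], bool block[MaxThreads])
    {
        ThreadID chosen = InvalidThreadID;
        // Scan starting just after the previous winner.
        for (int k = 1; k <= MaxThreads; ++k) {
            const ThreadID i = (lastChosen + k) % MaxThreads;
            if (active[i] && chosen == InvalidThreadID)
                chosen = i;
        }
        // Block every other active thread for this cycle.
        for (ThreadID i = 0; i < MaxThreads; ++i)
            block[i] = active[i] && i != chosen;
        if (chosen != InvalidThreadID)
            lastChosen = chosen;
        return chosen;
    }
};
```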
- Around line 734-744: The loop in checkSquash() currently returns after
handling the first squashing thread, which skips processing subsequent threads;
change the control flow to handle all threads by replacing the early "return"
with a "continue" (or otherwise removing the return) so squash(i),
localSquashVer.update(...), fetchRedirect[i] = false,
iewStats.stallEvents[ROBWalk]++, and setAllStalls(StallReason::CommitSquash)
execute for each thread that has fromCommit->commitInfo[i].squash; ensure the
remaining loop iterations still run and that any shared side-effects remain
consistent after this change.
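The `continue`-based control flow in a reduced form. `SquashModel` is hypothetical and only counts calls to `squash(i)`; the stats and stall bookkeeping are omitted.

```cpp
#include <cassert>

constexpr int MaxThreads = 2;  // assumed
using ThreadID = int;

struct SquashModel
{
    bool squashReq[MaxThreads] = {};  // stands in for commitInfo[i].squash
    int squashed[MaxThreads] = {};

    void squash(ThreadID tid) { ++squashed[tid]; }

    void checkSquash()
    {
        for (ThreadID i = 0; i < MaxThreads; ++i) {
            if (!squashReq[i])
                continue;
            squash(i);
            // No early return: later squashing threads are still handled.
        }
    }
};
```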

In `@src/cpu/o3/iew.hh`:
- Line 159: Initialize the member pointer stallSig to nullptr where it's
declared (StallSignals* stallSig) in the iew.hh class declaration so it has a
defined value before setStallSignals() is called; update the declaration to
assign nullptr and ensure any early-access code guards against a null stallSig
until setStallSignals() assigns the real pointer.
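A sketch combining the default member initializer with a guarded accessor. `StageWithSignals` and `isBlocked` are hypothetical; `setStallSignals` follows the setter shape shown in this PR, and the `StallSignals` here is simplified.

```cpp
#include <cassert>

struct StallSignals { bool blockIEW[4] = {}; };  // simplified, size assumed

struct StageWithSignals
{
    // Defined value before the CPU wires the real signals in.
    StallSignals *stallSig = nullptr;

    void setStallSignals(StallSignals *stall_signals) { stallSig = stall_signals; }

    bool isBlocked(int tid) const
    {
        // Treat "not yet wired" as not blocked; a stricter design could
        // assert(stallSig) here to catch wiring-order bugs instead.
        return stallSig != nullptr && stallSig->blockIEW[tid];
    }
};
```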

In `@src/cpu/o3/rename.cc`:
- Around line 401-406: The releaseSeq computation uses historyBuffer->empty()
which checks only thread 0; change it to test the current thread's history
buffer (use historyBuffer[tid].empty()) and then use
historyBuffer[tid].back().instSeqNum for releaseSeq when not empty; update the
block around fromCommit->commitInfo[tid], tid, releaseSeq, and historyBuffer
references accordingly so SMT threads use their own history buffers.
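Why `historyBuffer->empty()` only sees thread 0: an array of containers decays to a pointer to its first element. The `releaseSeqFor` helper and its `fallback` argument are hypothetical, since the original value used for an empty buffer is not shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

constexpr int MaxThreads = 2;  // assumed

struct HistoryEntry { uint64_t instSeqNum; };

// historyBuffer->empty() is historyBuffer[0].empty(), whatever tid is.
std::deque<HistoryEntry> historyBuffer[MaxThreads];

uint64_t releaseSeqFor(int tid, uint64_t fallback)
{
    return historyBuffer[tid].empty() ? fallback
                                      : historyBuffer[tid].back().instSeqNum;
}
```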

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 33b75e7 and f77f257.

📒 Files selected for processing (23)

src/cpu/o3/comm.hh
src/cpu/o3/commit.cc
src/cpu/o3/commit.hh
src/cpu/o3/cpu.cc
src/cpu/o3/cpu.hh
src/cpu/o3/decode.cc
src/cpu/o3/decode.hh
src/cpu/o3/dyn_inst.hh
src/cpu/o3/fetch.cc
src/cpu/o3/fetch.hh
src/cpu/o3/iew.cc
src/cpu/o3/iew.hh
src/cpu/o3/inst_queue.cc
src/cpu/o3/issue_queue.cc
src/cpu/o3/lsq.cc
src/cpu/o3/lsq_unit.cc
src/cpu/o3/regfile.hh
src/cpu/o3/rename.cc
src/cpu/o3/rename.hh
src/cpu/o3/rename_map.cc
src/cpu/o3/rob.cc
src/cpu/o3/rob.hh
src/cpu/reg_class.hh

💤 Files with no reviewable changes (2)

src/cpu/o3/dyn_inst.hh
src/cpu/o3/inst_queue.cc

✅ Files skipped from review due to trivial changes (1)

src/cpu/o3/regfile.hh

🚧 Files skipped from review as they are similar to previous changes (6)

src/cpu/o3/issue_queue.cc
src/cpu/o3/lsq_unit.cc
src/cpu/o3/lsq.cc
src/cpu/o3/rob.hh
src/cpu/o3/fetch.hh
src/cpu/o3/rob.cc

coderabbitai · 2026-02-28T09:57:18Z

src/cpu/o3/commit.cc

                cpu->perfCCT->commitMeta(head_inst->seqNum);
-                head_inst->printDisassemblyAndResult(cpu->name());
+
+                DPRINTF(CommitTrace, "CT: %s\n", head_inst->genDisassembly());


⚠️ Potential issue | 🔴 Critical

Use declared debug flags (or add required debug headers).

Line 1246 logs with CommitTrace and Line 1850 logs with IEW, but this file does not include their corresponding debug headers. That can fail compilation depending on include order/transitive headers.

Proposed fix (use existing category in this file)

- DPRINTF(CommitTrace, "CT: %s\n", head_inst->genDisassembly());
+ DPRINTF(Commit, "CT: %s\n", head_inst->genDisassembly());
...
- DPRINTF(IEW, "Multiple active threads detected, blocking all threads\n");
+ DPRINTF(Commit, "Multiple active threads detected, blocking all threads\n");

Also applies to: 1850-1850

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/cpu/o3/commit.cc` at line 1246, The file uses DPRINTF with the debug categories CommitTrace and IEW (e.g., the DPRINTF call that prints head_inst->genDisassembly()) but does not include their debug headers; add the missing debug header includes for those categories (for example include the corresponding debug/CommitTrace.hh and debug/IEW.hh headers) near the other includes in the file so the CommitTrace and IEW symbols are declared and compilation no longer depends on transitive includes.

coderabbitai · 2026-02-28T09:57:18Z

src/cpu/o3/rename_map.cc

+        DPRINTF(Scoreboard, "Get free reg p%i\n", renamed_reg.PhyReg()->flatIndex());
        map[arch_reg.index()] = renamed_reg;
        renamed_reg.PhyReg()->setNumPinnedWrites(arch_reg.getNumPinnedWrites());
        renamed_reg.PhyReg()->setNumPinnedWritesToComplete(
            arch_reg.getNumPinnedWrites() + 1);
-        DPRINTF(Rename, "set refcnt of p%i to %i\n",
+        DPRINTF(Scoreboard, "set refcnt of p%i to %i\n",
                renamed_reg.PhyReg()->flatIndex(), renamed_reg.PhyReg()->getRef());
    }

-    DPRINTF(Rename, "Renamed reg %d to physical reg %d (%d) old mapping was"
+    DPRINTF(Scoreboard, "Renamed reg %d to physical reg %d (%d) old mapping was"
            " %d (%d)\n",


⚠️ Potential issue | 🔴 Critical

Add the missing debug header for Scoreboard logging.

DPRINTF(Scoreboard, ...) is used on Line 111, Line 116, and Line 120, but this file still only includes debug/Rename.hh. Add debug/Scoreboard.hh to avoid unresolved debug flag symbol errors at build time.

Proposed fix

 #include "cpu/reg_class.hh"
 #include "debug/Rename.hh"
+#include "debug/Scoreboard.hh"

🧰 Tools

🪛 Cppcheck (2.19.0)

[error] 119-119: Reference to temporary returned.

(returnTempReference)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/cpu/o3/rename_map.cc` around lines 111 - 121, The file uses DPRINTF(Scoreboard, ...) (seen around the map assignment and refcount logs) but doesn't include the Scoreboard debug header; add an include for "debug/Scoreboard.hh" alongside the existing "debug/Rename.hh" include in src/cpu/o3/rename_map.cc so the Scoreboard debug flag symbol is defined and builds resolve (search for the DPRINTF(Scoreboard, ...) calls to place the include near other debug headers).

tastynoob requested review from happy-lx and jensen-yan February 9, 2026 11:32

coderabbitai bot reviewed Feb 9, 2026

tastynoob force-pushed the refactor-stalls branch from 4fc39e5 to 33b75e7 Compare February 28, 2026 07:31

coderabbitai bot reviewed Feb 28, 2026

tastynoob added the perf label Feb 28, 2026

cpu-o3: using reverse ordered tick & refactor the stalls logic

f77f257

Change-Id: I1c63c8081da4c313df326ea7cc40e391e8a1d922

tastynoob force-pushed the refactor-stalls branch from 33b75e7 to f77f257 Compare February 28, 2026 09:42

tastynoob added perf and removed perf labels Feb 28, 2026

coderabbitai bot reviewed Feb 28, 2026

		boost::circular_buffer<DynInstPtr> fixedbuffer[MaxThreads];

		StallSignals* stallSig = nullptr;

		panic_if(!rob->findInst(inst->threadNumber, inst->seqNum), "[sn:%llu] Committed instruction not found in ROB",
		inst->seqNum);

		DPRINTF(Rename, "[tid:%i] [sn:%llu] instruction with PC %s is squashed, skipping.\n",
		tid, inst->seqNum, inst->pcState());

		void setStallSignals(StallSignals* stall_signals) { stallSig = stall_signals; }

coderabbitai bot commented Feb 9, 2026

❌ Failed checks (1 warning)

github-actions bot commented Feb 28, 2026

🚀 Coremark Smoke Test Results
