Skip to content

[Build] Fix flaky test_clock_accuracy by using LCG workload with in-loop timing#436

Open
hughperkins wants to merge 1 commit intomainfrom
hp/fix-clock-accuracy-flaky
Open

[Build] Fix flaky test_clock_accuracy by using LCG workload with in-loop timing#436
hughperkins wants to merge 1 commit intomainfrom
hp/fix-clock-accuracy-flaky

Conversation

@hughperkins
Copy link
Copy Markdown
Collaborator

The test was flaky (~10% failure rate) due to two issues:

  • block_dim=1 caused each thread to run on a different SM with slightly different clock rates, breaking the proportionality check at high i
  • Cold instruction cache on first kernel execution inflated a[0]

Replace the sin-based workload with an LCG, use parallel execution within a single warp, capture timing inside the loop via a data-dependent store to prevent compiler hoisting, and warm up the instruction cache by delaying the start timestamp to j=10.

Issue: #

Brief Summary

copilot:summary

Walkthrough

copilot:walkthrough

The test was flaky (~10% failure rate) due to two issues:
- block_dim=1 caused each thread to run on a different SM with slightly
  different clock rates, breaking the proportionality check at high i
- Cold instruction cache on first kernel execution inflated a[0]

Replace the sin-based workload with an LCG, use parallel execution within
a single warp, capture timing inside the loop via a data-dependent store
to prevent compiler hoisting, and warm up the instruction cache by
delaying the start timestamp to j=10.
@hughperkins
Copy link
Copy Markdown
Collaborator Author

Technically, Opus physically wrote the code, but I had to painfully spoon feed it pretty much every line in practice. (it wanted to serialize the loop, and I think it's more elegant to run it on a warp, considering it's a 32-size loop in the first place). Anyway... I have read every line added in this PR, and reviewed the lines. I take responsibilty for the lines added and removed in this PR, and won't blame any issues on Opus.

@hughperkins hughperkins marked this pull request as ready for review March 30, 2026 18:53
x = (1664527 * x + 1013904223) % 2147483647
if j == 10:
start = qd.clock_counter()
if x > 10:
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs a comment explaining why we are doing this x comparison

if j == 10:
start = qd.clock_counter()
if x > 10:
a[i] = qd.clock_counter() - start
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs a comment explaining why we write inside the loop, not after it.

start = qd.i64(0)
for j in range((i + 1) * 50000):
x = (1664527 * x + 1013904223) % 2147483647
if j == 10:
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nees a comment explaining why we dont sart the loop till j=1, and why we do it inside the loop

x = state[i]
start = qd.i64(0)
for j in range((i + 1) * 50000):
x = (1664527 * x + 1013904223) % 2147483647
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs a comment saying that this is an LCG, and why we use it

start = qd.clock_counter()
if x > 10:
a[i] = qd.clock_counter() - start
state[i] = x
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs a commment saying why we write the result back to state

a[i] = qd.clock_counter() - start

foo()
x = state[i]
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs a comment saying why we read x from state initially

def foo():
qd.loop_config(block_dim=1)
def measure_sequence_timings():
for i in range(32):
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs a comment saying the significant of shape 32

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant