Skip to content

Fix incorrect DROP_SIZE usage #296

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Mike1117
Copy link

Previously, measure() only recorded timings for indices in the middle range [DROP_SIZE, N_MEASURES - DROP_SIZE), while update_statistics() assumed all entries were available from index 0.

This mismatch led to statistical analysis on unmeasured (zero-filled) samples, potentially skewing results or preventing detection thresholds from being reached.

Now:

  • measure() records all samples
  • update_statistics() discards DROP_SIZE samples on both ends
  • Sample accounting matches ENOUGH_MEASURE estimation

Change-Id: Ibb1515043da5f56d72fe34fd5c78e2283df9a993

@Mike1117 Mike1117 changed the title Fix: correct DROP_SIZE usage Fix incorrect DROP_SIZE usage Jun 28, 2025
Previously, measure() only recorded timings for indices in the middle
range [DROP_SIZE, N_MEASURES - DROP_SIZE), while update_statistics()
assumed all entries were available from index 0.

This mismatch led to statistical analysis on unmeasured (zero-filled)
samples, potentially skewing results or preventing detection thresholds
from being reached.

Now:
- measure() records all samples
- update_statistics() discards DROP_SIZE samples on both ends
- Sample accounting matches ENOUGH_MEASURE estimation

Change-Id: Ibb1515043da5f56d72fe34fd5c78e2283df9a993
Copy link
Contributor

@jserv jserv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Show more experiment results about "statistical analysis on unmeasured (zero-filled)
samples, potentially skewing results or preventing detection thresholds
from being reached."

@Mike1117
Copy link
Author

To validate the claim about "statistical analysis on unmeasured (zero-filled) samples", I added debug output in update_statistics() to print all exec_times[i] and corresponding classes[i].

Below are two outputs:


Before fix (measure() only measures middle range)

Note: exec_times[i] = 0 for unmeasured indices (head and tail), but update_statistics() still included them starting from i = 10.

i = 10 | class = 1 | exec_time = 0
i = 11 | class = 1 | exec_time = 0
i = 12 | class = 0 | exec_time = 0
...
i =   18 | class = 0 | exec_time = 0
i =   19 | class = 1 | exec_time = 0
i =   20 | class = 0 | exec_time = 1178
i =   21 | class = 1 | exec_time = 1140
i =   22 | class = 1 | exec_time = 1064
...
i = 129 | class = 1 | exec_time = 1064
i = 130 | class = 0 | exec_time = 0
...
i =  143 | class = 1 | exec_time = 0
i =  144 | class = 0 | exec_time = 0
i =  145 | class = 1 | exec_time = 0
i =  146 | class = 1 | exec_time = 0
i =  147 | class = 0 | exec_time = 0
i =  148 | class = 1 | exec_time = 0
i =  149 | class = 0 | exec_time = 0

This shows:

  • [10,19] and [130,149] contain zero values
  • These were being pushed into t_push(...) → skewing statistical results
  • Consider the t-value definition:
    $t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}}}$

Zero-filled execution times will

  • Reduce the sample means $\bar{X}_1$, $\bar{X}_2$
  • Inflate variances $s_1^2$, $s_2^2$

Both of them will suppress the t-value, especially when the sample size is still small, and will cause the test to overestimate the number of additional measurements required.
Furthermore, this may delay or entirely prevent the detection of actual timing leaks, resulting in false negatives.


After fix (measure() records all, update_statistics() uses DROP_SIZE range)

i = 0 | class = 0 | exec_time = 1102
i = 1 | class = 1 | exec_time = 1101
...
i = 149 | class = 1 | exec_time = 1063
  • Now all samples have proper timing values
  • update_statistics() starts from i = DROP_SIZE and end in i < N_MEASURES - DROP_SIZE.
    This avoids head/tail measurements to eliminate measurement bias caused by warm-up effects
  • Results become consistent and threshold will be reached as expected

This supports the need for consistent range handling between measurement and statistical analysis.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants