Fix incorrect DROP_SIZE usage #296

Mike1117 · 2025-06-28T06:42:10Z

Previously, measure() only recorded timings for indices in the middle range [DROP_SIZE, N_MEASURES - DROP_SIZE), while update_statistics() assumed all entries were available from index 0.

This mismatch led to statistical analysis on unmeasured (zero-filled) samples, potentially skewing results or preventing detection thresholds from being reached.

Now:

measure() records all samples
update_statistics() discards DROP_SIZE samples on both ends
Sample accounting matches ENOUGH_MEASURE estimation

Change-Id: Ibb1515043da5f56d72fe34fd5c78e2283df9a993

Previously, measure() only recorded timings for indices in the middle range [DROP_SIZE, N_MEASURES - DROP_SIZE), while update_statistics() assumed all entries were available from index 0. This mismatch led to statistical analysis on unmeasured (zero-filled) samples, potentially skewing results or preventing detection thresholds from being reached. Now: - measure() records all samples - update_statistics() discards DROP_SIZE samples on both ends - Sample accounting matches ENOUGH_MEASURE estimation Change-Id: Ibb1515043da5f56d72fe34fd5c78e2283df9a993

jserv

Show more experiment results about "statistical analysis on unmeasured (zero-filled)
samples, potentially skewing results or preventing detection thresholds
from being reached."

Mike1117 · 2025-06-28T15:09:03Z

To validate the claim about "statistical analysis on unmeasured (zero-filled) samples", I added debug output in update_statistics() to print all exec_times[i] and corresponding classes[i].

Below are two outputs:

Before fix (measure() only measures middle range)

Note: exec_times[i] = 0 for unmeasured indices (head and tail), but update_statistics() still included them starting from i = 10.

i = 10 | class = 1 | exec_time = 0
i = 11 | class = 1 | exec_time = 0
i = 12 | class = 0 | exec_time = 0
...
i =   18 | class = 0 | exec_time = 0
i =   19 | class = 1 | exec_time = 0
i =   20 | class = 0 | exec_time = 1178
i =   21 | class = 1 | exec_time = 1140
i =   22 | class = 1 | exec_time = 1064
...
i = 129 | class = 1 | exec_time = 1064
i = 130 | class = 0 | exec_time = 0
...
i =  143 | class = 1 | exec_time = 0
i =  144 | class = 0 | exec_time = 0
i =  145 | class = 1 | exec_time = 0
i =  146 | class = 1 | exec_time = 0
i =  147 | class = 0 | exec_time = 0
i =  148 | class = 1 | exec_time = 0
i =  149 | class = 0 | exec_time = 0

This shows:

[10,19] and [130,149] contain zero values
These were being pushed into t_push(...) → skewing statistical results
Consider the t-value definition:
$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}}}$

Zero-filled execution times will

Reduce the sample means $\bar{X}_1$, $\bar{X}_2$
Inflate variances $s_1^2$, $s_2^2$

Both of them will suppress the t-value, especially when the sample size is still small, and will cause the test to overestimate the number of additional measurements required.
Furthermore, this may delay or entirely prevent the detection of actual timing leaks, resulting in false negatives.

After fix (measure() records all, update_statistics() uses DROP_SIZE range)

i = 0 | class = 0 | exec_time = 1102
i = 1 | class = 1 | exec_time = 1101
...
i = 149 | class = 1 | exec_time = 1063

Now all samples have proper timing values
update_statistics() starts from i = DROP_SIZE and end in i < N_MEASURES - DROP_SIZE.
This avoids head/tail measurements to eliminate measurement bias caused by warm-up effects
Results become consistent and threshold will be reached as expected

This supports the need for consistent range handling between measurement and statistical analysis.

Mike1117 changed the title ~~Fix: correct DROP_SIZE usage~~ Fix incorrect DROP_SIZE usage Jun 28, 2025

Mike1117 force-pushed the fix/drop-size branch from aea35da to 75aeade Compare June 28, 2025 08:25

jserv requested changes Jun 28, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix incorrect DROP_SIZE usage #296

Fix incorrect DROP_SIZE usage #296

Uh oh!

Mike1117 commented Jun 28, 2025

Uh oh!

jserv left a comment

Uh oh!

Mike1117 commented Jun 28, 2025

Uh oh!

Uh oh!

Fix incorrect DROP_SIZE usage #296

Are you sure you want to change the base?

Fix incorrect DROP_SIZE usage #296

Uh oh!

Conversation

Mike1117 commented Jun 28, 2025

Uh oh!

jserv left a comment

Choose a reason for hiding this comment

Uh oh!

Mike1117 commented Jun 28, 2025

Before fix (measure() only measures middle range)

After fix (measure() records all, update_statistics() uses DROP_SIZE range)

Uh oh!

Uh oh!