-
-
Notifications
You must be signed in to change notification settings - Fork 19.2k
Reimplemented Pearson's correlation to use two pass Welford's model #62750
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
0823950
e20b045
60471c2
28fb765
d1c1e83
8af06ed
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -345,7 +345,8 @@ def nancorr(const float64_t[:, :] mat, bint cov=False, minp=None): | |
float64_t[:, ::1] result | ||
uint8_t[:, :] mask | ||
int64_t nobs = 0 | ||
float64_t vx, vy, dx, dy, meanx, meany, divisor, ssqdmx, ssqdmy, covxy, val | ||
float64_t vx, vy, meanx, meany, divisor, ssqdmx, ssqdmy, cxy, val | ||
float64_t sumx, sumy | ||
|
||
N, K = (<object>mat).shape | ||
if minp is None: | ||
|
@@ -358,39 +359,42 @@ def nancorr(const float64_t[:, :] mat, bint cov=False, minp=None): | |
|
||
with nogil: | ||
for xi in range(K): | ||
for yi in range(xi + 1): | ||
for yi in range(xi+1): | ||
# Welford's method for the variance-calculation | ||
# https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance | ||
nobs = ssqdmx = ssqdmy = covxy = meanx = meany = 0 | ||
# Changed to Welford's two-pass for improved numeric stability | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This comment is misleading. |
||
nobs = ssqdmx = ssqdmy = cxy = meanx = meany = 0 | ||
sumx = sumy = 0 | ||
for i in range(N): | ||
if mask[i, xi] and mask[i, yi]: | ||
vx = mat[i, xi] | ||
vy = mat[i, yi] | ||
sumx += mat[i, xi] | ||
sumy += mat[i, yi] | ||
nobs += 1 | ||
dx = vx - meanx | ||
dy = vy - meany | ||
meanx += 1. / nobs * dx | ||
meany += 1. / nobs * dy | ||
ssqdmx += (vx - meanx) * dx | ||
ssqdmy += (vy - meany) * dy | ||
covxy += (vx - meanx) * dy | ||
|
||
if nobs < minpv: | ||
result[xi, yi] = result[yi, xi] = NaN | ||
continue | ||
meanx = sumx / nobs | ||
meany = sumy / nobs | ||
for i in range(N): | ||
if mask[i, xi] and mask[i, yi]: | ||
vx = mat[i, xi] - meanx | ||
vy = mat[i, yi] - meany | ||
Comment on lines
+380
to
+381
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. One of the reasons the Welford's algorithm is considered a "stable" algorithm is by mitigating cancellation in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The problem with the current implementation is the accumulation of round off errors. The Kahan Summation can mitigate this issue. |
||
cxy += vx * vy | ||
ssqdmx += vx * vx | ||
ssqdmy += vy * vy | ||
divisor = (nobs - 1.0) if cov else sqrt(ssqdmx * ssqdmy) | ||
|
||
# clip `covxy / divisor` to ensure coeff is within bounds | ||
if divisor != 0: | ||
val = cxy / divisor | ||
if not cov: | ||
if val > 1.0: | ||
val = 1.0 | ||
elif val < -1.0: | ||
val = -1.0 | ||
result[xi, yi] = result[yi, xi] = val | ||
else: | ||
divisor = (nobs - 1.0) if cov else sqrt(ssqdmx * ssqdmy) | ||
|
||
# clip `covxy / divisor` to ensure coeff is within bounds | ||
if divisor != 0: | ||
val = covxy / divisor | ||
if not cov: | ||
if val > 1.0: | ||
val = 1.0 | ||
elif val < -1.0: | ||
val = -1.0 | ||
result[xi, yi] = result[yi, xi] = val | ||
else: | ||
result[xi, yi] = result[yi, xi] = NaN | ||
result[xi, yi] = result[yi, xi] = NaN | ||
|
||
return result.base | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it exists. You are simply using two-pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Must have conflated the two, will update