Conversation
Thanks a lot! I think it would be better if you also created a small test case to check the FDP and power of the method on simulated data (ideally FDP < threshold and power greater than 0); you can check examples in the test folder. A related point is that we may need a general procedure for creating e-values, since as far as I'm aware there is no such implementation in Python. Note that there are many unorganized things in the knockoff module, so I plan to do a refactoring soon, but I have not found the time yet. That being said, I will do it after the merge of this PR.
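A minimal sketch of such a test, just to give the idea: here knockoff_aggregation is assumed to return the indices of the selected variables, and the import path and signature below are guesses that would need adapting to the actual API.

import numpy as np

# Hypothetical import path and signature; adjust to the real module layout.
from hidimstat.knockoff_aggregation import knockoff_aggregation


def test_knockoff_aggregation_evalues():
    n, p, n_signal, fdr = 300, 100, 10, 0.1
    rng = np.random.RandomState(42)
    X = rng.randn(n, p)
    beta = np.zeros(p)
    beta[:n_signal] = 2.0
    y = X @ beta + rng.randn(n)

    # assumed to return the indices of the selected variables
    selected = knockoff_aggregation(X, y, fdr=fdr)

    true_support = np.arange(n_signal)
    n_false = np.sum(~np.isin(selected, true_support))
    fdp = n_false / max(len(selected), 1)
    power = np.sum(np.isin(true_support, selected)) / n_signal

    assert fdp <= fdr   # empirical FDP below the nominal level on this seed
    assert power > 0.0  # at least one true variable recovered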
Indeed, this PR obviously needs tests and an example.
Also note that Zhimei Ren has an R implementation here: https://github.com/zhimeir/derandomized_knockoffs_fdr |
bthirion left a comment:
Sounds promising, thx!
@@ -11,6 +11,8 @@
To reduce the script runtime it is desirable to increase n_jobs parameter.
What is the status of this 'examples_not_exhibited' directory? If these are non-working examples, it should be removed ;-)
This is one of my TODOs for the knockoff module: after the refactoring, move knockoffs to the main branch with a different example for a fast build (the current one uses n=500, p=1000 and 2500 simulations, hence not really the friendliest example to run).
    print('Done!')


main()
We avoid this structure in examples to improve readability.
Yes, all the examples need rework IMO.
    return np.array(pvals)


def _empirical_eval(test_score, fdr=0.1, offset=1):
Why do you expose the offset parameter? I think it should be fixed.
It is exposed in the rest of the code (i.e. it is a parameter of knockoff_aggregation, etc.); should we remove it altogether?
I would, as I don't see any use case for changing it. @tbng, any opinion on this?
Yes, this was decided long before; anyway I think it's OK if we remove the offset and simply use the default value.
evals_sorted = -np.sort(-evals)  # sort in descending order
selected_index = 2 * n_features
for i in range(n_features - 1, -1, -1):
    if evals_sorted[i] >= n_features / (fdr * (i + 1)):
I think that you can avoid the for loop.
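For instance, something along these lines could work; this is only a sketch, reusing the same evals, n_features and fdr variables, and assuming the loop is looking for the largest index i satisfying the e-BH condition, with 2 * n_features as the "nothing found" sentinel:

# Loop-free equivalent (sketch): keep the largest k such that the
# k-th largest e-value is at least n_features / (fdr * k).
evals_sorted = -np.sort(-evals)  # sort in descending order
ks = np.arange(1, n_features + 1)
passing = np.nonzero(evals_sorted >= n_features / (fdr * ks))[0]
selected_index = passing[-1] if passing.size > 0 else 2 * n_features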
    else:
        return -1.0


def _ebh_threshold(evals, fdr=0.1):
This function should have a test.
Should I add a test_utils file to the test folder? The utils file is not tested as of now.
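A rough sketch of what such a test could contain, assuming _ebh_threshold returns the e-value cutoff and -1.0 when nothing can be selected (the import path is a guess):

import numpy as np

# Hypothetical import path; adjust to wherever _ebh_threshold actually lives.
from hidimstat.utils import _ebh_threshold


def test_ebh_threshold_selects_strong_evalues():
    n_features = 100
    evals = np.ones(n_features)
    evals[:5] = 10.0 * n_features  # five very large e-values
    threshold = _ebh_threshold(evals, fdr=0.1)
    # the five large e-values should clear the e-BH cutoff
    assert np.sum(evals >= threshold) == 5


def test_ebh_threshold_no_selection():
    # all e-values equal to 1: nothing should be selected at fdr=0.1
    evals = np.ones(100)
    assert _ebh_threshold(evals, fdr=0.1) == -1.0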
Resurrecting this PR as I'm working to finish it right now. Edit: with @alexblnn |
As discussed with @tbng, @bthirion and @pneuvial, this PR aims at adding knockoff aggregation using e-values, as described in Ren and Barber 2022 (https://arxiv.org/pdf/2205.15461.pdf).
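For context, a rough sketch of the per-run e-value construction from that paper: the knockoff statistics are thresholded with the knockoff+ rule at an intermediate level, and each feature receives an e-value proportional to the indicator that its statistic clears the threshold. This is only an illustration of the idea, not the code added in this PR; the function name and defaults are made up.

import numpy as np


def evalues_from_knockoff_stats(W, fdr_kn=0.1):
    """Illustrative per-run e-value construction from a vector of knockoff
    statistics W (shape (n_features,)); names and defaults are made up."""
    n_features = W.shape[0]
    # candidate thresholds: non-zero magnitudes of the statistics
    candidates = np.sort(np.abs(W[W != 0]))
    T = np.inf
    for t in candidates:
        # knockoff+ ratio (offset 1) at an intermediate level fdr_kn
        if (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t)) <= fdr_kn:
            T = t
            break
    # e_j is large when W_j clears the threshold T; the normalization by the
    # number of large negative statistics is what makes the e-values valid
    return n_features * (W >= T) / (1 + np.sum(W <= -T))

Averaging these e-values over several knockoff draws and passing the average to an e-BH step (such as _ebh_threshold above) then gives the derandomized selection at the target FDR.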