-
Notifications
You must be signed in to change notification settings - Fork 79
Add 'mutation' mode #3314
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add 'mutation' mode #3314
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3314 +/- ##
=======================================
Coverage 89.80% 89.80%
=======================================
Files 29 29
Lines 31043 31052 +9
Branches 5682 5685 +3
=======================================
+ Hits 27877 27886 +9
Misses 1779 1779
Partials 1387 1387
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
divergence_matrix to this mode; document divergence_matrix
b5fcad5 to
a05b7ad
Compare
|
Note that the docs are a bit light on what the mode means generally, but that's intentional, since currently it's only used in these two functions. |
|
I vote for the breaking change and throwing an error because the proposed change is doing the right thing. |
|
I agree with @gregorgorjanc |
|
I've added an informative warning saying to change "site" to "mutation" now, which removes any hesitation I had about the breaking change. (It still errors, though.) |
jeromekelleher
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a good idea, but I've gotten lost in what's actually being done here. Why are we updating all the code for divergence matrix, if it's not mentioned in the changelog? The changes made here feel a lot bigger than what's being discussed, but I've forgotten where we actually landed with divmat in the end.
|
The connection is that genetic_relatedness_matrix uses divergence _matrix under the hood, so this is actually affecting both. My mistake missing it from the changelog. |
Co-authored-by: Jerome Kelleher <jk@well.ox.ac.uk>
|
I did not put the warning in for divmat because it was previously undocumented. |
|
Hi @petrelharp, I'm trying to get my head around this. |
|
@benjeffery yes, that is my understanding. |
|
I also can't see any tests (I might have missed them) that aren't |
Sorry, re-read Peter's initial message and he mentions it there. |
|
See updated explanation above. |
It is mentioned in the CHANGELOG: the changelog says |
jeromekelleher
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! That all makes sense, thanks.
I've logged #3320 to track documenting threading parameters.
|
I'm still wondering if we need tests with multiple mutations per site here? |
|
Easy enough to add? |
Ah right, missed that query, sorry. We do already - this is not new functionality, and when @jeromekelleher implemented this he did have those tests, for instance here. |
|
Sorry @petrelharp, I'm still confused. In #3299 you say that the existing site mode is only a valid mutation mode for a tree sequence where there is only one mutation per site (i.e. infinite sites model). If we are are renaming the existing site mode to be a true mutation mode, don't we need to make it a valid mutation mode for all tree sequences? |
Hm, that's certainly not what I meant to say. What I meant to say is that there are two similar but distinct modes, "site" and "mutation", that we would sometimes previously confuse for each other. As I've written in this PR:
We might phrase it a different way: suppose that we have a function
The thing about infinite sites is that if no site has more than one mutation, then We would ideally implement Does this answer your question? |
Yes, I get the distinction between the different modes, and how infinite sites makes In To explain differently, the only place Sorry to keep on! |
|
Ah! Sorry, let me investigate. |
|
Dang, good catch - looks like you're right, @benjeffery. I'm not sure how that happened (but suspect it was my fault). I'll just verify (probably with a test) that this is what's going on, and if so, put up a new PR with just a corrected docstring (and documentation for |

Edit: more detail here. Context:
genetic_relatedness_matrixis built on top ofdivergence_matrixdivergence_matrixwas undocumenteddivergence_matrix, and hencegenetic_relatedness_matrix, uses the Scheiber-Vishkin algorithm, which has to do with TMRCAs; so, it is not suitable tositemode, and instead computesmutationmodemutationmode, andsitemode is very similar, so previouslygenetic_relatedness_matrixhadsiteandbranchmode, and a caveat in the docs saying "this isn't really quite site mode in the same sense as elsewhere in the docs"So, here I'm
genetic_relatedness_matrixanddivergence_matrixtake "mutation" mode, and not site modedivergence_matrixgenetic_relatedness_matrixwithsitemode saying "you should use mutation mode" (it still throws an error)divergence_matrixbecause it was previously undocumented.This has very few changes, because it's just a re-naming of the previous behavior. In other words, the mechanism has not changed, just what it's called. I couldn't help but do a little bit of refactoring of some tests along the way, apologies.
This is a breaking change, since now
ts.genetic_relatedness_matrix(..., mode="site")will now throw an error. Note however thatts.genetic_relatedness_matrix(...)does not change, because previously the default wasmode="site"and now it ismode="mutation", which is the same thing. So, we could allowmode="site"to keep working, but deprecated, but this isn't right, because it is not computing the same thing asgenetic_relatedness(..., mode="site"); it will be computing the same thing asgenetic_relatedness(..., mode="mutation")once the latter is implemented.