Clarification of trim order of operations on allele normalization #594
theferrit32 started this conversation in General
Replies: 0 comments
Just something that came to mind. This may not need much discussion if the order doesn't matter, but if that's the case, could the spec state explicitly that the order doesn't matter? Or perhaps swap the order to at least match the existing reference implementation in bioutils?

a. Trim common suffix sequence (if any) from both of the Allele Sequences and decrement end by the length of the trimmed suffix.
b. Trim common prefix sequence (if any) from both of the Allele Sequences and increment start by the length of the trimmed prefix.
Source: https://vrs.ga4gh.org/en/latest/conventions/normalization.html#allele-normalization
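The two quoted steps can be sketched as below, with a hypothetical `suffix_first` flag to swap their order. This is a minimal illustration assuming 0-based interbase `[start, end)` coordinates and plain strings for the Allele Sequences; it is not the vrs-python or bioutils code. When the common prefix and common suffix overlap, as in a deletion inside a homopolymer, the two orders can leave the trimmed allele at different coordinates:

```python
from typing import Tuple

# Hypothetical allele tuple: (start, end, ref, alt), 0-based interbase [start, end)
Allele = Tuple[int, int, str, str]


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def trim_suffix(allele: Allele) -> Allele:
    # Step a: drop the shared suffix and decrement end by its length.
    start, end, ref, alt = allele
    n = common_prefix_len(ref[::-1], alt[::-1])
    return start, end - n, ref[: len(ref) - n], alt[: len(alt) - n]


def trim_prefix(allele: Allele) -> Allele:
    # Step b: drop the shared prefix and increment start by its length.
    start, end, ref, alt = allele
    n = common_prefix_len(ref, alt)
    return start + n, end, ref[n:], alt[n:]


def trim(allele: Allele, suffix_first: bool = True) -> Allele:
    """Apply both trims; suffix_first=True follows the spec's a-then-b order."""
    if suffix_first:
        return trim_prefix(trim_suffix(allele))
    return trim_suffix(trim_prefix(allele))


# ref "AA" to alt "A" over [2, 4) of a hypothetical reference "GCAAT"
print(trim((2, 4, "AA", "A")))                      # (2, 3, 'A', '')
print(trim((2, 4, "AA", "A"), suffix_first=False))  # (3, 4, 'A', '')
```

For a simple substitution such as ref "CAG" to alt "CTG" over `[0, 3)`, both orders give `(1, 2, 'A', 'T')`, so the difference only shows up when the shared prefix and suffix compete for the same bases.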
This says to trim the suffix first, then the prefix. Does this order matter? In vrs-python, the trimmed sequences are always rolled back out to the left and right afterward (mode=EXPAND), so perhaps not. The trimmed result is used as the repeat subunit in the case of ReferenceLengthExpressions, but the trimmed coordinates are not used. So probably no distinction ends up being made between a trim that left-aligned and one that right-aligned a sequence that then became an RLE.
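The reasoning above can be checked on a small deletion. This sketch does not reproduce vrs-python's rolling code; as a swapped-in technique, it finds the fully-justified region of a pure insertion or deletion directly from the common prefix and common suffix of the reference and the edited sequence. The assumption is that full justification depends only on the edited sequence, and both trim orders describe the same edited sequence. The inputs are the two trimmed forms of ref "AA" to alt "A" over `[2, 4)` of a hypothetical reference "GCAAT": `(2, 3, "A", "")` from suffix-first trimming and `(3, 4, "A", "")` from prefix-first trimming.

```python
from typing import Tuple


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def expand_indel(seq: str, start: int, end: int, ref: str, alt: str) -> Tuple[int, int, str, str]:
    """Hypothetical helper: widen a trimmed pure insertion/deletion to the
    full region where it could equivalently be placed (0-based interbase)."""
    if seq[start:end] != ref:
        raise ValueError("ref does not match the reference sequence")
    if bool(ref) == bool(alt):
        raise ValueError("this sketch only handles pure insertions or deletions")
    var = seq[:start] + alt + seq[end:]  # the edited (alternate) sequence
    prefix = common_prefix_len(seq, var)
    suffix = common_prefix_len(seq[::-1], var[::-1])
    left = min(len(seq), len(var)) - suffix       # start of leftmost valid placement
    right = prefix + max(0, len(seq) - len(var))  # end of rightmost valid placement
    delta = len(var) - len(seq)
    return left, right, seq[left:right], var[left : right + delta]


seq = "GCAAT"  # hypothetical reference
print(expand_indel(seq, 2, 3, "A", ""))  # suffix-first trim result
print(expand_indel(seq, 3, 4, "A", ""))  # prefix-first trim result
```

Both calls return `(2, 4, 'AA', 'A')`: the left-aligned and right-aligned trims expand to the same allele, which is consistent with the trimmed coordinates not affecting the final output.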
Here is the bioutils trim code, which trims the prefix first, then the suffix:
https://github.com/biocommons/bioutils/blob/0.6.1/src/bioutils/normalize.py#L114-L122