🐞 fix: TSNE now encodes query sequence correctly +++ fixed types in C… #22
Hi there,
Even though the repo looks fairly stale at this point, I would still like to submit a fix for a bug I found in the TSNE computation in the script `ClusterMSA.py`. Specifically, in line 200 the sequences are converted to lists and subsequently concatenated. However, `query_.sequence.tolist()` is itself wrapped in square brackets, which turns it into a nested list. As a consequence, the final element of the input to the `encode_seqs` call (i.e. the entry for the query sequence) is a single-entry list rather than a string. The encoding therefore does not work for this entry, which also places the query sequence at the wrong location in the TSNE plot.

What the inputs actually look like
As a simple test to demonstrate this, run the following (with any a3m file you have lying around as input):
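Independently of any a3m file, the nesting effect can be sketched with plain lists. In this minimal sketch, `cluster_seqs` and `query_seqs` are hypothetical stand-ins for the `.tolist()` results, and `one_hot` is a hypothetical toy encoder (not the actual `encode_seqs`) that, like a character-wise one-hot encoder, compares each position against a residue alphabet. A nested entry yields the whole sequence as a single "character", which matches no residue, so its row encodes to all zeros.

```python
# Hypothetical alphabet and fixed length, chosen for illustration only.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"
MAX_LEN = 8


def one_hot(seqs):
    """Toy encoder (hypothetical): one flat row of MAX_LEN * len(ALPHABET) flags per sequence."""
    rows = []
    for seq in seqs:
        row = [0] * (MAX_LEN * len(ALPHABET))
        # Iterating a string yields residues; iterating a one-element list
        # yields the whole sequence string, which matches no residue.
        for i, char in enumerate(seq[:MAX_LEN]):
            for k, res in enumerate(ALPHABET):
                if char == res:
                    row[i * len(ALPHABET) + k] = 1
        rows.append(row)
    return rows


# Stand-ins for the cluster sequences and the one-row query selection.
cluster_seqs = ["MKV-LA", "MKVQLA", "MRV-LA"]
query_seqs = ["MKVALA"]  # a one-element list, like query_.sequence.tolist()

buggy = cluster_seqs + [query_seqs]  # extra brackets nest the query list
fixed = cluster_seqs + query_seqs    # flat concatenation of two lists

print(buggy[-1])                 # ['MKVALA']
print(fixed[-1])                 # MKVALA
print(sum(one_hot(buggy)[-1]))   # 0: the query row is empty
print(sum(one_hot(fixed)[-1]))   # 6: one flag per residue
```

Dropping the extra brackets is enough here because `.tolist()` on a one-row selection already returns a list, so `+` concatenates two flat lists of strings.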
Also, I found that when custom values were passed via the command line interface for arguments like `min_samples`, they were retained as `str` even though the defaults were numeric. Therefore, I added type declarations to the affected CLI arguments to ensure that the values are properly converted.
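The underlying `argparse` behavior can be sketched as follows. Without `type=`, a value supplied on the command line stays a string, while an unused numeric default keeps its own type, so the same argument can end up as either `int` or `str` depending on how the script was called. The flag spelling `--min_samples` and the default `3` are assumptions for illustration, not taken from `ClusterMSA.py`.

```python
import argparse


def build_parser(typed: bool) -> argparse.ArgumentParser:
    """Hypothetical parser with a single numeric option, with or without type=int."""
    parser = argparse.ArgumentParser()
    if typed:
        parser.add_argument("--min_samples", type=int, default=3)
    else:
        parser.add_argument("--min_samples", default=3)
    return parser


untyped_default = build_parser(typed=False).parse_args([])
untyped = build_parser(typed=False).parse_args(["--min_samples", "5"])
typed = build_parser(typed=True).parse_args(["--min_samples", "5"])

print(repr(untyped_default.min_samples))  # 3   (default keeps its int type)
print(repr(untyped.min_samples))          # '5' (CLI value stays a str)
print(repr(typed.min_samples))            # 5   (converted by type=int)
```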
Cheers,
Noah ☀️