Fix off by one bam add tests #1
Merged
ghuls merged 2 commits into ghuls:fix_off_by_one_bam from Feb 4, 2025
Author
Here is the existing BAM file used to test d4tools, which I am looking at above, if you would like to take a look: https://github.com/ghuls/d4-format/blob/master/d4tools/test/create/from-bam/small.bam
Owner
You need to make sure to use the exact same filter settings for the reads:
# SAMtools depth:
# -aa: Report all positions
# -J: Count deletions as coverage
# -g 1796: Keep reads that have any of the following flags set (UNMAP,SECONDARY,QCFAIL,DUP); these reads are filtered out by default.
# Convert SAMtools depth to bedGraph output with: awk '{print $1 "\t" $2 - 1 "\t" $2 "\t" $3}'
# Compact adjacent bedGraph intervals with the same value with bedGraphPack: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/bedGraphPack
samtools depth -aa -J -g 1796 -Q 0 ./d4tools/test/create/from-bam/small.bam \
| awk '{print $1 "\t" $2 - 1 "\t" $2 "\t" $3}' \
| bedGraphPack /dev/stdin small.samtools_depth.keep_bad_reads.bdg
# Create d4 file, but keep reads with minimum mapping quality of 0 or higher:
d4tools create -q 0 ./d4tools/test/create/from-bam/small.bam small.d4tools_create.keep_bad_reads.d4
# Convert d4 file to bedGraph:
d4tools view small.d4tools_create.keep_bad_reads.d4 > small.d4tools_create.keep_bad_reads.bdg
# Check checksum of both files
$ md5sum small.samtools_depth.keep_bad_reads.bdg small.d4tools_create.keep_bad_reads.bdg
cfee19ad801ce0f09ec513e38e7df302 small.samtools_depth.keep_bad_reads.bdg
cfee19ad801ce0f09ec513e38e7df302 small.d4tools_create.keep_bad_reads.bdg
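The two text-processing steps in the pipeline above can be tried without a BAM file or bedGraphPack. The following is only a rough sketch under stated assumptions: the input rows are made-up samtools depth lines chosen to mirror the region around position 10150, and pack_bedgraph is a hypothetical awk stand-in for bedGraphPack, not the UCSC tool itself.

```shell
# Hypothetical samtools depth rows: chrom, 1-based position, depth.
depth_rows() {
  printf '1\t10150\t2\n'
  for pos in 10151 10152 10153 10154 10155 10156 10157; do
    printf '1\t%s\t1\n' "$pos"
  done
}

# Step 1: turn each 1-based position into a 0-based, half-open bedGraph
# interval (same awk expression as in the commands above).
to_bedgraph() {
  awk '{print $1 "\t" $2 - 1 "\t" $2 "\t" $3}'
}

# Step 2: merge adjacent intervals that carry the same value.
# Assumption: this imitates what bedGraphPack does; it is not bedGraphPack.
pack_bedgraph() {
  awk 'BEGIN { OFS = "\t" }
    # Same chromosome, contiguous, same value: extend the open interval.
    NR > 1 && $1 == chrom && $2 == end && $4 == value { end = $3; next }
    # Otherwise flush the open interval before starting a new one.
    NR > 1 { print chrom, start, end, value }
    { chrom = $1; start = $2; end = $3; value = $4 }
    END { if (NR > 0) print chrom, start, end, value }'
}

depth_rows | to_bedgraph | pack_bedgraph
```

With these made-up rows, the depth reported at 1-based position 10150 ends up in the interval that starts at 10149, which is the coordinate shift discussed in this thread.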
# SAMtools depth:
# -aa: Report all positions
# -J: Count deletions as coverage
# Filter out reads with the following flags (UNMAP,SECONDARY,QCFAIL,DUP)
# Convert SAMtools depth to bedGraph output with: awk '{print $1 "\t" $2 - 1 "\t" $2 "\t" $3}'
# Compact adjacent bedGraph intervals with the same value with bedGraphPack: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/bedGraphPack
samtools depth -aa -J -Q 0 ./d4tools/test/create/from-bam/small.bam \
| awk '{print $1 "\t" $2 - 1 "\t" $2 "\t" $3}' \
| bedGraphPack /dev/stdin small.samtools_depth.bdg
# Create d4 file, but keep reads with minimum mapping quality of 0 or higher if they don't have one of the following flags set UNMAP,SECONDARY,QCFAIL,DUP:
d4tools create -q 0 -F '~1796' ./d4tools/test/create/from-bam/small.bam small.d4tools_create.d4
# Convert d4 file to bedGraph:
d4tools view small.d4tools_create.d4 > small.d4tools_create.bdg
# Check checksum of both files.
$ md5sum small.samtools_depth.bdg small.d4tools_create.bdg
12d7a0f2aa8c29f334d77edd001f79d7 small.samtools_depth.bdg
12d7a0f2aa8c29f334d77edd001f79d7 small.d4tools_create.bdg
# Check that position:
$ rg 10150 small.*.bdg
small.d4tools_create.bdg
56:1 10149 10150 2
57:1 10150 10157 1
small.d4tools_create_keep_bad_reads.bdg
58:1 10149 10150 3
59:1 10150 10157 1
small.samtools_depth.bdg
55:1 10149 10150 2
56:1 10150 10157 1
small.samtools_depth_keep_bad_reads.bdg
58:1 10149 10150 3
59:1 10150 10157 1
# To get the filter flags settings, you can run samtools flags with the flag names:
$ samtools flags UNMAP,SECONDARY,QCFAIL,DUP
0x704 1796 UNMAP,SECONDARY,QCFAIL,DUP
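The value 1796 can also be reproduced by hand, which may help when samtools is not installed. This is a small sketch using the flag bit values from the SAM specification; is_excluded is a hypothetical helper name, and the two read flags tested (99 and 1123) are made-up examples, not reads from small.bam.

```shell
# SAM flag bits, as defined in the SAM specification.
UNMAP=$((0x4))
SECONDARY=$((0x100))
QCFAIL=$((0x200))
DUP=$((0x400))

# OR the bits together to get the combined mask.
mask=$((UNMAP | SECONDARY | QCFAIL | DUP))
printf '0x%x %d\n' "$mask" "$mask"

# Hypothetical helper: succeeds if a read flag has any of the masked bits set,
# i.e. the read would be dropped by a filter on this mask.
is_excluded() {
  [ $(( $1 & mask )) -ne 0 ]
}

# 99 is a properly paired primary read; 1123 is the same read marked as duplicate.
if is_excluded 99; then echo "99: excluded"; else echo "99: kept"; fi
if is_excluded 1123; then echo "1123: excluded"; else echo "1123: kept"; fi
```

The first line printed is the same hex and decimal pair that samtools flags reports above.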
Author
OK great, thanks 👌 Feel free to go ahead and merge this branch into yours if you feel OK with it (I have no write permissions here, so this is in your hands).
Adding two unit tests - one for the single read scenario, and one with multiple reads as suggested by @ghuls over at 38#97 (comment)
The change naturally fails a different test, though, one that uses an existing BAM file. This needs some more inspection.
Results are not entirely in line with samtools depth output, even when running samtools depth -J. It would be good to verify the results here, so that no other bugs sneak through.
Looking at the results, they look similar to me, but not exactly the same. There are coverage levels present in the samtools depth output that are not present in the d4tools output. Look, for instance, at position 10150 in the samtools output; I guess this would correspond to 10149 in the d4 output. It should not be 3, but 2 (confirmed in IGV). Do you think this is related to some of the other issues you have spotted, @ghuls?
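The coordinate correspondence guessed at above (1-based position 10150 in samtools depth versus interval start 10149 in the bedGraph view of the d4 file) can be checked with a short awk lookup. This is just a sketch: covering_interval is a hypothetical helper name, it ignores the chromosome column for brevity, and the two input rows are copied from the unfiltered bedGraph output quoted in this conversation.

```shell
# Print the bedGraph interval(s) that cover a 1-based position.
# $1: 1-based position; bedGraph rows (chrom, start, end, value) come from stdin.
# bedGraph intervals are 0-based and half-open, so a 1-based position p is
# covered when start < p <= end.
covering_interval() {
  awk -v pos="$1" '$2 < pos && pos <= $3'
}

# Rows taken from the keep_bad_reads bedGraph output shown in this thread.
printf '1\t10149\t10150\t3\n1\t10150\t10157\t1\n' | covering_interval 10150
```

Only the interval starting at 10149 is printed, with depth 3, which is the value that drops to 2 once reads with the UNMAP, SECONDARY, QCFAIL, or DUP flags are filtered out.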