Skip to content

Fix off by one bam add tests#1

Merged
ghuls merged 2 commits intoghuls:fix_off_by_one_bamfrom
Jakob37:fix_off_by_one_bam_add_tests
Feb 4, 2025
Merged

Fix off by one bam add tests#1
ghuls merged 2 commits intoghuls:fix_off_by_one_bamfrom
Jakob37:fix_off_by_one_bam_add_tests

Conversation

@Jakob37
Copy link

@Jakob37 Jakob37 commented Jan 23, 2025

Adding two unit tests - one for the single read scenario, and one with multiple reads as suggested by @ghuls over at 38#97 (comment)

The change naturally fails a different test though with an existing bam file. This would need some more inspection.

Results are not entirely in line with samtools depth output, even when running samtools depth -J.

Would be good to verify the results here, such that there are no other bugs sneaking through.

Looking at the results, they look similar to me, but not exactly the same. There are coverage levels present in samtools depth that are not present in d4tools.

$ samtools depth -J small.bam | tail -20
1       10138   21
1       10139   19
1       10140   19
1       10141   19
1       10142   18
1       10143   18
1       10144   15
1       10145   15
1       10146   13
1       10147   12
1       10148   9
1       10149   3
1       10150   2
1       10151   1
1       10152   1
1       10153   1
1       10154   1
1       10155   1
1       10156   1
1       10157   1
$ d4tools view small.d4 | tail
1       10138   10141   21
1       10141   10143   20
1       10143   10145   17
1       10145   10146   15
1       10146   10147   14
1       10147   10148   11
1       10148   10149   4
1       10149   10150   3
1       10150   10157   1
1       10157   20000   0

Looking for instance specifically at position 10150 in the samtools output. I guess this would correspond to 10149 in the d4 output. It should not be 3, but 2 (confirmed in IGV).

two_pos

Is this related to some of the other issues you have spotted you think @ghuls ?

@Jakob37
Copy link
Author

Jakob37 commented Jan 23, 2025

Here is the existing bam file used to test d4tools which I am looking at above if you would like to take a look:

https://github.com/ghuls/d4-format/blob/master/d4tools/test/create/from-bam/small.bam

@ghuls
Copy link
Owner

ghuls commented Jan 23, 2025

You need to make sure to use the exact same filter settings for the reads:

# SAMtools depth:
#   -aa: Report all positons
#   -J: Count deletions as coverage
#   -g 1796: Do not filter out reads with the following flags (UNMAP,SECONDARY,QCFAIL,DUP) ==> are normally filtered out.
# Convert SAMtools depth to bedGraph output with: awk '{print $1 "\t" $2 - 1 "\t" $2 "\t" $3}'
# Compact adjacent bedGraph intervals with the same value with bedGraphPack: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/bedGraphPack
samtools depth -aa -J -g 1796 -Q 0 ./d4tools/test/create/from-bam/small.bam \
  | awk '{print $1 "\t" $2 - 1 "\t" $2 "\t" $3}' \
  | bedGraphPack /dev/stdin small.samtools_depth.keep_bad_reads.d4

# Create d4 file, but keep reads with minimum mapping quality of 0 or higher:
d4tools create -q 0 ./d4tools/test/create/from-bam/small.bam small.d4tools_create.keep_bad_reads.d4

# Convert d4 file to bedGraph:
d4tools view small.d4tools_create.keep_bad_reads.d4 > small.d4tools_create.keep_bad_reads.bdg

# Check checksum of both files
$ md5sum small.samtools_depth.keep_bad_reads.bdg small.d4tools_create.keep_bad_reads.bdg 
cfee19ad801ce0f09ec513e38e7df302  small.samtools_depth.keep_bad_reads.bdg
cfee19ad801ce0f09ec513e38e7df302  small.d4tools_create.keep_bad_reads.bdg




# SAMtools depth:
#   -aa: Report all positons
#   -J: Count deletions as coverage
#   Filter out reads with the following flags (UNMAP,SECONDARY,QCFAIL,DUP)
# Convert SAMtools depth to bedGraph output with: awk '{print $1 "\t" $2 - 1 "\t" $2 "\t" $3}'
# Compact adjacent bedGraph intervals with the same value with bedGraphPack: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/bedGraphPack
samtools depth -aa -J -Q 0 ./d4tools/test/create/from-bam/small.bam \
  | awk '{print $1 "\t" $2 - 1 "\t" $2 "\t" $3}' \
  | bedGraphPack /dev/stdin small.samtools_depth.d4

# Create d4 file, but keep reads with minimum mapping quality of 0 or higher if they don't have one of the following flags set UNMAP,SECONDARY,QCFAIL,DUP: 
d4tools create -q 0 -F '~1796' ./d4tools/test/create/from-bam/small.bam small.d4tools_create.d4

# Convert d4 file to bedGraph:
d4tools view small.d4tools_create.d4 > small.d4tools_create.bdg

# Check checksum of both files.
$ md5sum small.samtools_depth.d4 small.d4tools_create.bdg
12d7a0f2aa8c29f334d77edd001f79d7  small.samtools_depth.d4
12d7a0f2aa8c29f334d77edd001f79d7  small.d4tools_create.bdg


# Check that position:
$ rg 10150 small.*.bdg
small.d4tools_create.bdg
56:1    10149   10150   2
57:1    10150   10157   1

small.d4tools_create_keep_bad_reads.bdg
58:1    10149   10150   3
59:1    10150   10157   1

small.samtools_depth.bdg
55:1    10149   10150   2
56:1    10150   10157   1

small.samtools_depth_keep_bad_reads.bdg
58:1    10149   10150   3
59:1    10150   10157   1


# To get the filter flags settings, you can run samtools flags with the flag names:
$ samtools flags UNMAP,SECONDARY,QCFAIL,DUP
0x704   1796    UNMAP,SECONDARY,QCFAIL,DUP

Setting/recommending -F '~1796 by default would be a good idea IMHO.

@Jakob37
Copy link
Author

Jakob37 commented Jan 23, 2025

You need to make sure to use the exact same filter settings for the reads:

OK great, thanks 👌

Feel free to go ahead merging this branch into your if you feel OK with it (I have no write permissions here so this is in your hands)

@ghuls ghuls merged commit 4c65f2e into ghuls:fix_off_by_one_bam Feb 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants