When reading alignments from a BAM or CRAM file and writing them to another BAM/CRAM file as they are, the values of integer tags may change.
Repro
$ samtools view int_tag_overflow.bam
r1 4 * 0 0 * * 0 0 ATGC #### XA:i:4294967295
(require '[cljam.io.sam :as sam])
(with-open [r (sam/reader "int_tag_overflow.bam")]
  (doall (sam/read-alignments r)))
;=>
({:qname "r1",
  :flag 4,
  :rname "*",
  ...
  :seq "ATGC",
  :qual "####",
  :options ({:XA {:type "i", :value 4294967295}})})
(with-open [r (sam/reader "int_tag_overflow.bam")
            w (sam/writer "int_tag_overflow.rewrite.bam")]
  (sam/write-header w (sam/read-header r))
  (sam/write-refs w (sam/read-refs r))
  (sam/write-alignments w (sam/read-alignments r) (sam/read-header r)))
(with-open [r (sam/reader "int_tag_overflow.rewrite.bam")]
  (doall (sam/read-alignments r)))
;=>
({:qname "r1",
  :flag 4,
  :rname "*",
  ...
  :seq "ATGC",
  :qual "####",
  :options ({:XA {:type "i", :value -1}})}) ;; <- this value has changed from the original one
Cause
- The SAM format defines only one integer tag type, i (signed arbitrary-precision integer), while the BAM/CRAM formats have an i tag type with different semantics (signed 32-bit integer) as well as other integer types (c/C/s/S/I).
- cljam's BAM/CRAM reader reports every integer tag value as the i tag type, whichever integer type it was stored as.
- cljam's BAM/CRAM writer doesn't check whether each integer tag value fits the specified tag type. It writes the value as the i tag type even if it can't be represented as a signed 32-bit integer.
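The arithmetic behind the changed value can be sketched in plain Clojure, independent of cljam. Truncating 4294967295 to 32 bits and reading it back as signed gives -1, which matches the repro above. The helper `smallest-int-tag-type` is hypothetical and not part of cljam; it only illustrates one way a writer could pick a BAM integer type that actually fits the value, assuming the usual ranges for c/C/s/S/i/I.

```clojure
;; Truncating to 32 bits and reinterpreting as signed reproduces the symptom.
(unchecked-int 4294967295)
;;=> -1

;; Reading the same 32 bits as unsigned recovers the original value.
(bit-and -1 0xFFFFFFFF)
;;=> 4294967295

;; Hypothetical helper (not cljam's API): choose the smallest BAM integer
;; tag type whose range contains v, and fail loudly when none does.
(defn smallest-int-tag-type
  [v]
  (cond
    (<= -128 v 127)               \c  ; int8
    (<= 0 v 255)                  \C  ; uint8
    (<= -32768 v 32767)           \s  ; int16
    (<= 0 v 65535)                \S  ; uint16
    (<= -2147483648 v 2147483647) \i  ; int32
    (<= 0 v 4294967295)           \I  ; uint32
    :else (throw (ex-info "Integer tag value does not fit any BAM integer type"
                          {:value v}))))

(smallest-int-tag-type 4294967295)
;;=> \I
```

With a check along these lines, the value in the repro would be written as an I (unsigned 32-bit) tag rather than being silently truncated as an i tag.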