Qualimap assigns more reads to exonic regions than total reads in the sample #1273

### Description of the bug

A while ago we realized that `QUALIMAP` assigns more reads to exonic + intronic + intergenic regions than there are reads in the sample. We subsampled the paired-end SRA FASTQ files to exactly 1M fragments by keeping the first 4,000,000 lines (1M four-line records) of each mate file, i.e. `head -n 4000000 file_1.fastq > subsampled_1.fastq` and likewise for `file_2.fastq`.
Broadly speaking, the pipeline then runs the samples through STAR, Picard MarkDuplicates and Qualimap.
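For anyone reproducing the setup, here is a minimal Python sketch of the same subsampling step: it copies the first N four-line records from each mate file and then checks that both outputs stay in sync read for read. It assumes uncompressed FASTQ with exactly 4 lines per record and mate IDs that match after stripping an optional `/1` or `/2` suffix. The helper names (`head_records`, `mate_ids`, `subsample_pair`) are hypothetical and not part of the pipeline.

```python
from itertools import islice
from pathlib import Path
from typing import Iterator


def head_records(src: Path, dst: Path, n_records: int) -> int:
    """Copy the first n_records FASTQ records (4 lines each) from src to dst.

    Same idea as `head -n $((4 * n_records)) src > dst`, assuming an
    uncompressed FASTQ with exactly 4 lines per record.
    Returns the number of records written.
    """
    n_lines = 0
    with src.open() as fin, dst.open("w") as fout:
        for line in islice(fin, 4 * n_records):
            fout.write(line)
            n_lines += 1
    if n_lines % 4:
        raise ValueError(f"{src} ends with a truncated FASTQ record")
    return n_lines // 4


def mate_ids(path: Path) -> Iterator[str]:
    """Yield each record's read ID: first header token, without '@' or a /1 or /2 suffix."""
    with path.open() as fh:
        for i, line in enumerate(fh):
            if i % 4 == 0:  # header lines are every 4th line
                rid = line[1:].split()[0]
                yield rid[:-2] if rid.endswith(("/1", "/2")) else rid


def subsample_pair(r1: Path, r2: Path, out1: Path, out2: Path, n_fragments: int) -> int:
    """Keep the first n_fragments pairs and check that both mate files stay in sync."""
    n1 = head_records(r1, out1, n_fragments)
    n2 = head_records(r2, out2, n_fragments)
    if n1 != n2:
        raise ValueError(f"mate files differ in length: {n1} vs {n2} records")
    for i, (a, b) in enumerate(zip(mate_ids(out1), mate_ids(out2))):
        if a != b:
            raise ValueError(f"record {i}: mate IDs differ ({a!r} vs {b!r})")
    return n1
```

Usage would be along the lines of `subsample_pair(Path("file_1.fastq"), Path("file_2.fastq"), Path("subsampled_1.fastq"), Path("subsampled_2.fastq"), 1_000_000)`. Because SRA mate files are written in the same order, taking the head of each file keeps the pairs intact; the ID check just makes that assumption explicit.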

The problem is that, although the samples have exactly 1M reads (the initial FastQC report confirms exactly 1M reads, and the STAR logs also report exactly 1M input reads), Qualimap _consistently assigns more than 1M reads to genomic features, such as ~1.5M reads to exons_, plus a few more to introns and intergenic regions. The reads that Qualimap assigns to genomic features also do not add up to 2M, so Qualimap is not simply counting individual reads instead of fragments, which would otherwise have explained the discrepancy.
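The fragments-versus-reads comparison above can be made explicit with a small sketch. One assumption worth stating: under read-level counting the expected total is twice the number of *uniquely mapped pairs*, not 2M, so the comparison depends on STAR's unique mapping rate (reported in `Log.final.out`). The function names are hypothetical, and the sketch only compares the observed sum against the two expectations; it does not reproduce what Qualimap does internally.

```python
from __future__ import annotations


def expected_totals(uniquely_mapped_pairs: int) -> dict[str, int]:
    """Feature-assignment totals expected under two counting hypotheses.

    fragments: each uniquely mapped pair is counted once
    reads:     each mate of a uniquely mapped pair is counted once
    """
    return {"fragments": uniquely_mapped_pairs, "reads": 2 * uniquely_mapped_pairs}


def closest_hypothesis(exonic: int, intronic: int, intergenic: int,
                       uniquely_mapped_pairs: int) -> tuple[str, float]:
    """Return the hypothesis whose expected total is nearest to the observed sum,
    plus the relative deviation from that expectation (negative = fewer than expected)."""
    observed = exonic + intronic + intergenic
    totals = expected_totals(uniquely_mapped_pairs)
    best = min(totals, key=lambda name: abs(observed - totals[name]))
    return best, (observed - totals[best]) / totals[best]
```

The inputs would be the exonic, intronic and intergenic counts from the Qualimap report and the uniquely mapped pair count from STAR; a large deviation from both expectations would indicate that neither simple hypothesis explains the totals.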

Once the pipeline variables are translated into arguments, Qualimap is invoked as follows, which is correct for our data (paired-end, reverse-stranded):
```
qualimap \
        rnaseq \
        -bam input.markdup.sorted.bam \
        -gtf annotation.gtf \
        -p strand-specific-reverse \
        -pe \
        -outdir out/
```

We also made sure that the counting algorithm is `uniquely-mapped-reads`, as shown in this screenshot from the Qualimap report. My understanding is that this counting algorithm excludes chimeric and secondary alignments, which could otherwise have explained the higher numbers:
![image](https://github.com/nf-core/rnaseq/assets/49684557/e4daa0a0-3ae1-4898-96d2-e3de6e6d3fb9)
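To illustrate why the counting algorithm matters, a minimal sketch that counts alignment lines versus uniquely mapped primary reads and fragments, using the FLAG bits from the SAM specification. Records are plain `(qname, flag, nh)` tuples so the example is self-contained; reading a real BAM would need a library such as pysam. Treating "unique" as `NH == 1` is an assumption based on the `NH` tag STAR writes by default; this is not Qualimap's code and does not claim to reproduce how `uniquely-mapped-reads` is implemented.

```python
from __future__ import annotations

from collections.abc import Iterable

# FLAG bits from the SAM specification
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def summarize(records: Iterable[tuple[str, int, int]]) -> dict[str, int]:
    """Summarize (qname, flag, nh) tuples, one per alignment line of a BAM.

    nh is the NH tag written by STAR (number of loci the read maps to);
    "unique" is assumed here to mean NH == 1.
    """
    lines = 0
    unique_reads = 0
    unique_fragments: set[str] = set()
    for qname, flag, nh in records:
        lines += 1
        # skip unmapped reads, secondary hits of multimappers and supplementary (chimeric) pieces
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY) or nh != 1:
            continue
        unique_reads += 1
        unique_fragments.add(qname)
    return {"alignment_lines": lines, "unique_primary_reads": unique_reads,
            "unique_fragments": len(unique_fragments)}
```

In practice the tuples could be taken from `samtools view` output (QNAME is column 1, FLAG is column 2, and NH is the `NH:i:` tag). Counting alignment lines instead of filtered primary records is one generic way totals can exceed the number of sequenced reads, which is why the counting mode is worth confirming.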

We know that this is not an `nf-core/rnaseq` issue; we also reported the same problem on Qualimap's Bitbucket repository a while ago. The issue is [here](https://bitbucket.org/kokonech/qualimap/issues/81/qualimap-assigns-more-reads-to-exonic), but it cannot be accessed because it has not been approved yet. It was submitted on 22 January 2024, and after 8 weeks there have been no responses or updates.

Maybe there is an explanation that we are missing, but we have tried everything we could think of and still cannot explain how Qualimap assigns more reads to genomic features than there are reads in the sample. For now, we have decided to drop the Qualimap results and compute these statistics with an in-house script, and we wanted to open the discussion here.

### Command used and terminal output

_No response_

### Relevant files

_No response_

### System information

_No response_
