Skip to content

Conversation

rcastelo
Copy link
Contributor

Following our discussion in this other PR, this PR adds the parameter strandMode to the function readGAlignmentsList(), which returns a GAlignmentsList object with the real strand calculated according to the strandMode argument. This fixes the call to summarizeOverlaps(..., fragments=TRUE), so that it gives correct results and, more generally, it fixes the use of findOverlaps() with GAlignmentsList objects obtained with this new version of the readGAlignmentsList() function. However, findOverlaps() will still give the wrong result for any previously serialized GAlignmentsList objects and, because the strandMode is not stored in the object, the following functionality will differ with respect to GAlignmentPairs objects:

  • the strandMode used to create the GAlignmentsList object cannot be displayed through the show() method.
  • the original strand of the internal GAlignment objects is lost.
  • the strandMode cannot be changed after reading the object.
  • the coercion method from GAlignmentsList to GAlignmentPairs will produce a GAlignmentPairs object with the wrong strand, since GAlignmentPairs objects store the original strand, but now GAlignmentsList objects will only have the real strand.

@hpages
Copy link
Contributor

hpages commented Mar 26, 2023

Hi Robert,

Remember that we don't want to change the default behavior of readGAlignmentsList()!

Old behavior:

library(GenomicAlignments)
fl <- system.file("extdata", "ex1.bam", package="Rsamtools")
readGAlignmentsList(fl)[[1]]
# GAlignments object with 2 alignments and 0 metadata columns:
#       seqnames strand       cigar    qwidth     start       end     width
#          <Rle>  <Rle> <character> <integer> <integer> <integer> <integer>
#   [1]     seq1      +         35M        35        36        70        35
#   [2]     seq1      -         35M        35       185       219        35
#           njunc
#       <integer>
#   [1]         0
#   [2]         0
#   -------
#   seqinfo: 2 sequences from an unspecified genome

New behavior:

readGAlignmentsList(fl)[[1]]
# GAlignments object with 2 alignments and 0 metadata columns:
#       seqnames strand       cigar    qwidth     start       end     width
#          <Rle>  <Rle> <character> <integer> <integer> <integer> <integer>
#   [1]     seq1      +         35M        35        36        70        35
#   [2]     seq1      +         35M        35       185       219        35
#           njunc
#       <integer>
#   [1]         0
#   [2]         0
#   -------
#   seqinfo: 2 sequences from an unspecified genome

The old behavior has been around forever i.e. since the introduction of the readBamGAlignmentsList() function in Rsamtools in 2013 by Martin Morgan and Valerie Obenchain, which was later renamed and moved to the GenomicAlignments package. Changing it now 10 years later will have all kinds of unfortunate consequences on existing workflows.

So instead of using strandMode=1 by default, maybe the default should be something like strandMode=NA. Calling readGAlignmentsList(..., strandMode=NA) should provide the traditional behavior, whatever we think of this behavior 😉

If we want to discuss changing the default to strandMode=1, this will need to be a separate discussion, and we should do it after the BioC 3.17 release. If we ever decide to go for it, we will need to be very careful to do it in a way that minimizes disruption as much as possible.

About the description of the strandMode argument in the man page:

  \item{strandMode}{
    Strand mode to set on the returned \link{GAlignmentPairs} object.
    See \code{?\link{strandMode}} for more information.
  }

Should probably be modified to distinguish between the readGAlignmentPairs() and readGAlignmentsList() situations, since what happens in one case or the other is quite different.

Also maybe:

  • add something in the \details section, in the readGAlignmentsList subsection
  • add some example in "C. readGAlignmentsList()" to show how strandMode affects the output

Thanks,
H.

…ards compatibility. Update help page of readGAlignmentsList() on strandMode, including a new example
@rcastelo
Copy link
Contributor Author

Hi Hervé,

sure, no problem, I fully understand the situation. I've updated the PR to set strandMode=NA by default in readGAlignmentsList(), which results in the former historical behavior of readGAlignmentsList(), and updated the documentation in the places that you suggest.

@hpages
Copy link
Contributor

hpages commented Mar 29, 2023

Hi Robert,

Thanks for the update. Will take a 2nd look soon. Won't be before next week though, as I'll be out of town starting tomorrow (Wed), chaperoning for a 3-day field trip to a remote area with no internet and no cell reception.

Best,
H.

@rcastelo
Copy link
Contributor Author

Sounds good, enjoy nature!

@rcastelo
Copy link
Contributor Author

Hi @hpages any chance you can still take a 2nd look at this and if everything looks fine, integrate it into the next release of GenomicAlignments?

@hpages hpages merged commit 9ce3cdc into Bioconductor:devel Apr 11, 2023
@hpages
Copy link
Contributor

hpages commented Apr 11, 2023

Done, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants