'init' not passed on to bpiterate in .reduceByYield_iterate #5

@teunbrand

Hello everyone,

I was trying to use reduceByYield(..., init = DF, iterate = TRUE, parallel = TRUE), but the init argument does not seem to be passed on to the downstream reduce function. I can show the problem by adapting an example from the documentation. The setup below is identical to that example:

suppressPackageStartupMessages({
    library(Rsamtools)
    library(GenomicFiles)
})

fl <- system.file(package="Rsamtools", "extdata", "ex1.bam")
bf <- BamFile(fl, yieldSize=500)

YIELD <- function(X, ...) {
    flag = scanBamFlag(isUnmappedQuery=FALSE)
    param = ScanBamParam(flag=flag, what="seq")
    scanBam(X, param=param, ...)[[1]][['seq']]
}
MAP <- function(value, ...) {
    requireNamespace("Biostrings", quietly=TRUE)
    Biostrings::alphabetFrequency(value, collapse=TRUE)
}
REDUCE <- `+`

Now suppose we want to offset every count by 100 and try to do this through the init argument.

init <- alphabetFrequency(DNAStringSet())
init <- setNames(rep(100, ncol(init)), colnames(init))
print(init)
#>   A   C   G   T   M   R   W   S   Y   K   V   H   D   B   N   -   +   . 
#> 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100

When we do this, the output is identical to the output we'd get if we had not set the init argument.

outcome <- reduceByYield(bf, YIELD, MAP, REDUCE, parallel=TRUE, init = init)

print(outcome)
#>     A     C     G     T     M     R     W     S     Y     K     V     H     D 
#> 39904 23195 20477 31681     0     0     0     0     0     0     0     0     0 
#>     B     N     -     +     . 
#>     0    29     0     0     0

The following is the outcome I had expected, and is also the outcome when setting parallel = FALSE.

print(outcome + 100)
#>     A     C     G     T     M     R     W     S     Y     K     V     H     D 
#> 40004 23295 20577 31781   100   100   100   100   100   100   100   100   100 
#>     B     N     -     +     . 
#>   100   129   100   100   100
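To make the expected semantics concrete, here is a minimal base R sketch of a fold with and without a seed value. It uses base R's Reduce rather than reduceByYield, and the chunk values are made up for illustration (they are not taken from ex1.bam), so it only shows what I would expect init to do, not how GenomicFiles implements it.

```r
# Toy stand-ins for per-chunk MAP results (hypothetical values).
chunks <- list(c(A = 1, C = 2), c(A = 3, C = 4), c(A = 5, C = 6))
init <- c(A = 100, C = 100)

# Fold seeded with init: init is combined with the first chunk, then the rest.
with_init <- Reduce(`+`, chunks, init)

# Fold without a seed: only the chunks are combined.
without_init <- Reduce(`+`, chunks)

# For an associative REDUCE such as `+`, the two differ exactly by init.
print(with_init)
print(without_init + init)
```

Both print calls should show the same vector, which is the relationship I expected between the parallel = TRUE and parallel = FALSE results.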

I think the line below, in .reduceByYield_iterate, doesn't pass the init argument on to bpiterate, but I don't know whether this is intended.

result <- bpiterate(ITER, FUN=MAP, REDUCE=REDUCE, ...)

I assumed this is a bug, because changing the parallel argument shouldn't affect the outcome, but it does, so I'm reporting it here.
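In the meantime, one possible workaround is to apply init after the parallel call has returned. The sketch below assumes REDUCE is associative (as `+` is); apply_init is a hypothetical helper, not part of GenomicFiles, and the outcome vector is a stand-in built from the first four counts printed above rather than a live call to reduceByYield.

```r
# Hypothetical helper: fold `init` into a result that was reduced without it.
# Only valid when REDUCE is associative, so the seed can be applied last.
apply_init <- function(result, REDUCE, init) {
    if (missing(init)) result else REDUCE(init, result)
}

# Stand-in for the value returned with parallel = TRUE (counts for A, C, G, T).
outcome <- c(A = 39904, C = 23195, G = 20477, T = 31681)
init <- setNames(rep(100, length(outcome)), names(outcome))

adjusted <- apply_init(outcome, `+`, init)
print(adjusted)
```

For a REDUCE that is not associative, this would not be equivalent to seeding the fold, so it is not a general substitute for passing init through.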

Thanks for reading!
