Description of the bug
The pipeline stops with the error message: A USER ERROR has occurred: Contig chr1_KI270706v1_random not present in reads sequence dictionary
I have downloaded the relevant igenomes data locally:
$ ls /data/ref/igenomes/Homo_sapiens/GATK/GRCh38/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.fasta
/data/ref/igenomes/Homo_sapiens/GATK/GRCh38/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.fasta
My test.csv file contains:
$ cat test.csv
sample,bam,bai
SLPD001,/datf/sl/users/ivana/BTB/SLPD001/wgs-b/151013_ST-E00269_0029_BHGC5YCCXX/P2233_137_S9_L005_R1_001.bam,/datf/sl/users/ivana/BTB/SLPD001/wgs-b/151013_ST-E00269_0029_BHGC5YCCXX/P2233_137_S9_L005_R1_001.bam.bai
I made sure that the offending contig is absent from the .bam file. (When present, in my first attempt, the same error resulted.)
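For anyone trying to reproduce this, here is a minimal sketch of one way to list contigs that the interval list names but the BAM header lacks, which is what the wording of the error message seems to point at. It only reads the `@SQ` lines of two SAM-style headers. The helper name `missing_contigs` and the file name `bam_header.txt` are made up for illustration, and the commented usage assumes `samtools` is installed.

```shell
# missing_contigs FIRST SECOND
# Prints every contig named in an @SQ line of FIRST that has no @SQ line in SECOND.
# Both files are expected to carry SAM-style header lines (tab-separated, SN:<name>).
# Non-header rows (e.g. the interval rows of an interval_list) are ignored.
missing_contigs() {
  awk -F'\t' '
    $1 == "@SQ" {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^SN:/) {
          name = substr($i, 4)                    # drop the "SN:" prefix
          if (FILENAME == ARGV[1]) wanted[name] = 1
          else                     seen[name] = 1
        }
      }
    }
    END {
      for (n in wanted) if (!(n in seen)) print n
    }
  ' "$1" "$2" | sort
}

# Hypothetical usage (requires samtools; file names are placeholders):
#   samtools view -H P2233_137_S9_L005_R1_001.bam > bam_header.txt
#   missing_contigs genome.interval_list bam_header.txt
```

If this prints `chr1_KI270706v1_random`, the interval list still references a contig that the BAM header does not declare, so removing the contig from the .bam alone would not be expected to make the error go away.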
Command used and terminal output
$ nextflow run nf-core/createpanelrefs -r dev -profile singularity --input test.csv --tools germlinecnvcaller --genome GATK.GRCh38 --igenomes_base /data/ref/igenomes/ --outdir test_out
.......
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/createpanelrefs] Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_CREATEPANELREFS:CREATEPANELREFS:GERMLINECNVCALLER_COHORT:GATK4_COLLECTREADCOUNTS (SLPD001)'
Caused by:
Process `NFCORE_CREATEPANELREFS:CREATEPANELREFS:GERMLINECNVCALLER_COHORT:GATK4_COLLECTREADCOUNTS (SLPD001)` terminated with an error exit status (2)
Command executed:
gatk --java-options "-Xmx9830M -XX:-UsePerfData" \
CollectReadCounts \
--input P2233_137_S9_L005_R1_001.bam \
--intervals genome.interval_list \
--output SLPD001.hdf5 \
--reference Homo_sapiens_assembly38.fasta \
--tmp-dir . \
--format HDF5 --imr OVERLAPPING_ONLY
cat <<-END_VERSIONS > versions.yml
"NFCORE_CREATEPANELREFS:CREATEPANELREFS:GERMLINECNVCALLER_COHORT:GATK4_COLLECTREADCOUNTS":
gatk4: $(echo $(gatk --version 2>&1) | sed 's/^.*(GATK) v//; s/ .*$//')
END_VERSIONS
Command exit status:
2
Command output:
(empty)
Command error:
INFO: Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
INFO: Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
Using GATK jar /opt/conda/share/gatk4-4.6.2.0-0/gatk-package-4.6.2.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx9830M -XX:-UsePerfData -jar /opt/conda/share/gatk4-4.6.2.0-0/gatk-package-4.6.2.0-local.jar CollectReadCounts --input P2233_137_S9_L005_R1_001.bam --intervals genome.interval_list --output SLPD001.hdf5 --reference Homo_sapiens_assembly38.fasta --tmp-dir . --format HDF5 --imr OVERLAPPING_ONLY
08:23:35.823 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/opt/conda/share/gatk4-4.6.2.0-0/gatk-package-4.6.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
08:23:36.177 INFO CollectReadCounts - ------------------------------------------------------------
08:23:36.182 INFO CollectReadCounts - The Genome Analysis Toolkit (GATK) v4.6.2.0
08:23:36.182 INFO CollectReadCounts - For support and documentation go to https://software.broadinstitute.org/gatk/
08:23:36.182 INFO CollectReadCounts - Executing as peter@monod33.mbb.ki.se on Linux v4.18.0-553.53.1.el8_10.x86_64 amd64
08:23:36.182 INFO CollectReadCounts - Java runtime: OpenJDK 64-Bit Server VM v17.0.11-internal+0-adhoc..src
08:23:36.183 INFO CollectReadCounts - Start Date/Time: December 10, 2025 at 8:23:35 AM GMT
08:23:36.183 INFO CollectReadCounts - ------------------------------------------------------------
08:23:36.183 INFO CollectReadCounts - ------------------------------------------------------------
08:23:36.184 INFO CollectReadCounts - HTSJDK Version: 4.2.0
08:23:36.184 INFO CollectReadCounts - Picard Version: 3.4.0
08:23:36.184 INFO CollectReadCounts - Built for Spark Version: 3.5.0
08:23:36.187 INFO CollectReadCounts - HTSJDK Defaults.COMPRESSION_LEVEL : 2
08:23:36.187 INFO CollectReadCounts - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
08:23:36.188 INFO CollectReadCounts - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
08:23:36.188 INFO CollectReadCounts - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
08:23:36.188 INFO CollectReadCounts - Deflater: IntelDeflater
08:23:36.188 INFO CollectReadCounts - Inflater: IntelInflater
08:23:36.188 INFO CollectReadCounts - GCS max retries/reopens: 20
08:23:36.188 INFO CollectReadCounts - Requester pays: disabled
08:23:36.189 INFO CollectReadCounts - Initializing engine
08:23:36.889 INFO FeatureManager - Using codec IntervalListCodec to read file file://genome.interval_list
08:23:45.352 INFO IntervalArgumentCollection - Processing 3043969085 bp from intervals
08:23:45.462 INFO CollectReadCounts - Done initializing engine
08:23:45.470 WARN CollectReadCounts - Sequence dictionary in BAM does not match the master sequence dictionary.
08:23:45.471 INFO CollectReadCounts - Collecting read counts...
08:23:45.471 INFO ProgressMeter - Starting traversal
08:23:45.472 INFO ProgressMeter - Current Locus Elapsed Minutes Reads Processed Reads/Minute
08:23:45.566 INFO CollectReadCounts - Shutting down engine
[December 10, 2025 at 8:23:45 AM GMT] org.broadinstitute.hellbender.tools.copynumber.CollectReadCounts done. Elapsed time: 0.17 minutes.
Runtime.totalMemory()=2013265920
***********************************************************************
A USER ERROR has occurred: Contig chr1_KI270706v1_random not present in reads sequence dictionary
Relevant files
System information
Nextflow version 25.10.0
Almalinux 8
nf-core/createpanelrefs 1.0.0
createpanelrefs [thirsty_varahamihira] DSL2 - revision: ab14ab0 [dev]