
Pipeline exits with error in reads sequence dictionary #65

@pl-ki

Description of the bug

The pipeline stops with the error message: A USER ERROR has occurred: Contig chr1_KI270706v1_random not present in reads sequence dictionary

I have downloaded the relevant igenomes data locally:
$ ls /data/ref/igenomes/Homo_sapiens/GATK/GRCh38/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.fasta
/data/ref/igenomes/Homo_sapiens/GATK/GRCh38/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.fasta
My test.csv file contains:
$ cat test.csv
sample,bam,bai
SLPD001,/datf/sl/users/ivana/BTB/SLPD001/wgs-b/151013_ST-E00269_0029_BHGC5YCCXX/P2233_137_S9_L005_R1_001.bam,/datf/sl/users/ivana/BTB/SLPD001/wgs-b/151013_ST-E00269_0029_BHGC5YCCXX/P2233_137_S9_L005_R1_001.bam.bai
I made sure that the offending contig is absent from the BAM file. (In my first attempt, when it was present, the same error occurred.)
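The error message names a single contig, but the interval list passed to `--intervals` may reference other contigs that are also missing from the BAM header, and the log further down warns that the BAM's sequence dictionary does not match the master dictionary. A minimal sketch for listing every such contig at once, assuming `samtools` is available to dump the BAM header and that `genome.interval_list` carries the usual SAM-style `@SQ` header lines. The function name `missing_contigs` is hypothetical and not part of the pipeline or GATK.

```shell
#!/bin/sh
# Hypothetical helper, not part of nf-core/createpanelrefs or GATK.
# Prints every contig declared in an @SQ line of header file $1 (e.g. an
# interval_list or a .dict file) that has no @SQ line in header file $2
# (e.g. the output of `samtools view -H` on the BAM).
missing_contigs() {
  awk -F'\t' '
    /^@SQ/ {
      # Extract the SN:<name> field of this @SQ line.
      name = ""
      for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) name = substr($i, 4)
      if (name == "") next
      if (FILENAME == ARGV[1]) have[name] = 1   # first file read: the BAM header ($2)
      else if (!(name in have)) print name      # second file ($1): report contigs the BAM lacks
    }
  ' "$2" "$1"
}
```

For example, from the failed task's work directory one could run `samtools view -H P2233_137_S9_L005_R1_001.bam > bam.header` and then `missing_contigs genome.interval_list bam.header`; an empty result would mean the BAM header declares every contig that the interval list does.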

Command used and terminal output

$ nextflow run nf-core/createpanelrefs -r dev -profile singularity --input test.csv --tools germlinecnvcaller --genome GATK.GRCh38 --igenomes_base /data/ref/igenomes/ --outdir test_out
.......
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/createpanelrefs] Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_CREATEPANELREFS:CREATEPANELREFS:GERMLINECNVCALLER_COHORT:GATK4_COLLECTREADCOUNTS (SLPD001)'

Caused by:
  Process `NFCORE_CREATEPANELREFS:CREATEPANELREFS:GERMLINECNVCALLER_COHORT:GATK4_COLLECTREADCOUNTS (SLPD001)` terminated with an error exit status (2)


Command executed:

  gatk --java-options "-Xmx9830M -XX:-UsePerfData" \
      CollectReadCounts \
      --input P2233_137_S9_L005_R1_001.bam \
      --intervals genome.interval_list \
      --output SLPD001.hdf5 \
      --reference Homo_sapiens_assembly38.fasta \
      --tmp-dir . \
      --format HDF5 --imr OVERLAPPING_ONLY
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_CREATEPANELREFS:CREATEPANELREFS:GERMLINECNVCALLER_COHORT:GATK4_COLLECTREADCOUNTS":
      gatk4: $(echo $(gatk --version 2>&1) | sed 's/^.*(GATK) v//; s/ .*$//')
  END_VERSIONS

Command exit status:
  2

Command output:
  (empty)

Command error:
  INFO:    Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
  Using GATK jar /opt/conda/share/gatk4-4.6.2.0-0/gatk-package-4.6.2.0-local.jar
  Running:
      java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx9830M -XX:-UsePerfData -jar /opt/conda/share/gatk4-4.6.2.0-0/gatk-package-4.6.2.0-local.jar CollectReadCounts --input P2233_137_S9_L005_R1_001.bam --intervals genome.interval_list --output SLPD001.hdf5 --reference Homo_sapiens_assembly38.fasta --tmp-dir . --format HDF5 --imr OVERLAPPING_ONLY
  08:23:35.823 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/opt/conda/share/gatk4-4.6.2.0-0/gatk-package-4.6.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
  08:23:36.177 INFO  CollectReadCounts - ------------------------------------------------------------
  08:23:36.182 INFO  CollectReadCounts - The Genome Analysis Toolkit (GATK) v4.6.2.0
  08:23:36.182 INFO  CollectReadCounts - For support and documentation go to https://software.broadinstitute.org/gatk/
  08:23:36.182 INFO  CollectReadCounts - Executing as peter@monod33.mbb.ki.se on Linux v4.18.0-553.53.1.el8_10.x86_64 amd64
  08:23:36.182 INFO  CollectReadCounts - Java runtime: OpenJDK 64-Bit Server VM v17.0.11-internal+0-adhoc..src
  08:23:36.183 INFO  CollectReadCounts - Start Date/Time: December 10, 2025 at 8:23:35 AM GMT
  08:23:36.183 INFO  CollectReadCounts - ------------------------------------------------------------
  08:23:36.183 INFO  CollectReadCounts - ------------------------------------------------------------
  08:23:36.184 INFO  CollectReadCounts - HTSJDK Version: 4.2.0
  08:23:36.184 INFO  CollectReadCounts - Picard Version: 3.4.0
  08:23:36.184 INFO  CollectReadCounts - Built for Spark Version: 3.5.0
  08:23:36.187 INFO  CollectReadCounts - HTSJDK Defaults.COMPRESSION_LEVEL : 2
  08:23:36.187 INFO  CollectReadCounts - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
  08:23:36.188 INFO  CollectReadCounts - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
  08:23:36.188 INFO  CollectReadCounts - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
  08:23:36.188 INFO  CollectReadCounts - Deflater: IntelDeflater
  08:23:36.188 INFO  CollectReadCounts - Inflater: IntelInflater
  08:23:36.188 INFO  CollectReadCounts - GCS max retries/reopens: 20
  08:23:36.188 INFO  CollectReadCounts - Requester pays: disabled
  08:23:36.189 INFO  CollectReadCounts - Initializing engine
  08:23:36.889 INFO  FeatureManager - Using codec IntervalListCodec to read file file://genome.interval_list
  08:23:45.352 INFO  IntervalArgumentCollection - Processing 3043969085 bp from intervals
  08:23:45.462 INFO  CollectReadCounts - Done initializing engine
  08:23:45.470 WARN  CollectReadCounts - Sequence dictionary in BAM does not match the master sequence dictionary.
  08:23:45.471 INFO  CollectReadCounts - Collecting read counts...
  08:23:45.471 INFO  ProgressMeter - Starting traversal
  08:23:45.472 INFO  ProgressMeter -        Current Locus  Elapsed Minutes       Reads Processed     Reads/Minute
  08:23:45.566 INFO  CollectReadCounts - Shutting down engine
  [December 10, 2025 at 8:23:45 AM GMT] org.broadinstitute.hellbender.tools.copynumber.CollectReadCounts done. Elapsed time: 0.17 minutes.
  Runtime.totalMemory()=2013265920
  ***********************************************************************
  
  A USER ERROR has occurred: Contig chr1_KI270706v1_random not present in reads sequence dictionary
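The log shows only a WARN about the mismatched sequence dictionaries before the traversal aborts on a contig that the BAM header does not list. One possible workaround, untested here and offered only as an assumption until the pipeline itself is fixed, is to drop intervals on such contigs before rerunning `CollectReadCounts` by hand. The sketch keeps the interval_list header unchanged and keeps only records whose first (contig) column appears as an `@SQ SN:` entry in the BAM header. The function name `filter_intervals` is hypothetical. Note that the resulting counts would then cover fewer intervals than the pipeline intended.

```shell
#!/bin/sh
# Hypothetical workaround sketch, not part of nf-core/createpanelrefs or GATK.
# $1: BAM header text (e.g. from `samtools view -H`); $2: Picard-style interval_list.
# Writes the interval_list to stdout, keeping all header (@) lines but only the
# interval records whose contig (column 1) is declared in the BAM header.
filter_intervals() {
  awk -F'\t' '
    FILENAME == ARGV[1] {
      # First file: remember every SN:<name> from the BAM header @SQ lines.
      if ($0 ~ /^@SQ/)
        for (i = 2; i <= NF; i++)
          if ($i ~ /^SN:/) have[substr($i, 4)] = 1
      next
    }
    /^@/ { print; next }   # keep the interval_list header unchanged
    ($1 in have)           # keep intervals on contigs the BAM knows about
  ' "$1" "$2"
}
```

Usage, under the same assumptions: `filter_intervals bam.header genome.interval_list > genome.filtered.interval_list`, then pass `genome.filtered.interval_list` to `--intervals` in place of `genome.interval_list`.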

Relevant files

nextflow.log

System information

Nextflow version 25.10.0
Almalinux 8
nf-core/createpanelrefs 1.0.0
createpanelrefs [thirsty_varahamihira] DSL2 - revision: ab14ab0 [dev]
