fdog.assembly runs without errors, but .phyloprofile contains only reference species, ignoring all input assemblies #62

@lolala37

Hi there,
I'm using fdog.assembly to search for orthologs of the gene Xrcc2 across ~67 assemblies. The run completes very quickly (~16 seconds) and without any error messages. However, the resulting .phyloprofile file contains only the reference species (NASVI@7425@2), and no orthologs are detected in any other taxa. Because the run finishes so fast, I suspect that fdog.assembly is actually ignoring my assemblies. I added the assemblies manually; I did not use fdog.addAssembly.

My test script looks like this:

fdog.assembly \
  --gene Xrcc2 \
  --refSpec NASVI@7425@2 \
  --assemblyPath /mnt/ceph-hdd/workspaces/ws/scc_ubzo_sscheu/u17921-bachelor_project/tools/fDOG/fdog/data/assembly_dir \
  --dataPath /mnt/ceph-hdd/workspaces/ws/scc_ubzo_sscheu/u17921-bachelor_project/tools/fDOG/fdog/data \
  --coregroupPath /mnt/ceph-hdd/workspaces/ws/scc_ubzo_sscheu/u17921-bachelor_project/data/fdog_input/orthologs/nasvi/core_orthologs \
  --out /mnt/ceph-hdd/workspaces/ws/scc_ubzo_sscheu/u17921-bachelor_project/results/ \
  --augustus \
  --augustusRefSpec tribolium2012 \
  --checkCoorthologsRef \
  --parallel \
  --gff \
  --isoforms \
  --force

Output:

(fdog_env) ./test_assembly.sh 
Gene: Xrcc2
fDOG reference species: NASVI@7425@2 

Building a consensus sequence
	 ...finished

/mnt/ceph-hdd/workspaces/ws/scc_ubzo_sscheu/u17921-bachelor_project/results//Xrcc2//tmp/Xrcc2.con
Building a block profile ...
	 ...finished 

/mnt/ceph-hdd/workspaces/ws/scc_ubzo_sscheu/u17921-bachelor_project/results//Xrcc2//tmp/Xrcc2.prfl
Searching for orthologs ...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 67/67 [00:10<00:00,  6.39it/s]
	 ...finished 

Calculating FAS scores ...
	 ...finished 

fDOG-Assembly finished completely in 16.10099506378174seconds.
Group preparation: 0.07499980926513672 	 Ortholog search: 10.596382856369019 	 FAS: 5.388456344604492 

The structure of my --dataPath looks like this:

├── annotation_dir -> /mnt/ceph-hdd/workspaces/ws/scc_ubzo_sscheu/u17921-bachelor_project/fdog_output/annotation_dir
├── assembly_dir
├── coreTaxa_dir -> /mnt/ceph-hdd/workspaces/ws/scc_ubzo_sscheu/u17921-bachelor_project/fdog_output/coreTaxa_dir
├── searchTaxa_dir -> /mnt/ceph-hdd/workspaces/ws/scc_ubzo_sscheu/u17921-bachelor_project/fdog_output/searchTaxa_dir

For example, I want to search for orthologs of my reference species (NASVI@7425@2) in ACHCO@229769@15102025:

└── assembly_dir
    └── ACHCO@229769@15102025
        ├── blast_dir
        │   ├── ACHCO@229769@15102025.fa.ndb
        │   ├── ACHCO@229769@15102025.fa.nhr
        │   ├── ACHCO@229769@15102025.fa.nin
        │   ├── ACHCO@229769@15102025.fa.njs
        │   ├── ACHCO@229769@15102025.fa.nog
        │   ├── ACHCO@229769@15102025.fa.nos
        │   ├── ACHCO@229769@15102025.fa.not
        │   ├── ACHCO@229769@15102025.fa.nsq
        │   ├── ACHCO@229769@15102025.fa.ntf
        │   └── ACHCO@229769@15102025.fa.nto
        └── ACHCO@229769@15102025.fna

annotation_dir/NASVI@7425@2.json 
coreTaxa_dir/NASVI@7425@2
├── NASVI@7425@2.fa
├── NASVI@7425@2.fa.checked
├── NASVI@7425@2.fa.fai
├── NASVI@7425@2.pdb
├── NASVI@7425@2.phr
├── NASVI@7425@2.pin
├── NASVI@7425@2.pjs
├── NASVI@7425@2.pot
├── NASVI@7425@2.psq
├── NASVI@7425@2.ptf
└── NASVI@7425@2.pto
searchTaxa_dir/NASVI@7425@2
├── NASVI@7425@2.fa
├── NASVI@7425@2.fa.checked
└── NASVI@7425@2.fa.fai

All other assemblies have the same structure.
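
To confirm that claim across all 67 folders rather than by eye, the layout could be checked with a minimal sketch like the one below. It is not part of fDOG. The function name `summarize_assembly_dir`, the list of FASTA suffixes, and the `fasta_matches_db` check are assumptions made for illustration. Whether fdog.assembly actually requires the FASTA file name to match the BLAST database prefix is not something this sketch can confirm. It only reports what is on disk, for example that the listing above shows a `.fna` file next to a database built with the prefix `.fa`.

```python
from pathlib import Path

# BLAST nucleotide database suffixes, taken from the directory listing above.
BLAST_SUFFIXES = {".ndb", ".nhr", ".nin", ".njs", ".nog",
                  ".nos", ".not", ".nsq", ".ntf", ".nto"}
# Assumption: common FASTA file extensions; adjust if needed.
FASTA_SUFFIXES = {".fa", ".fna", ".fasta"}


def summarize_assembly_dir(assembly_root):
    """Report, per taxon folder, the FASTA files and BLAST db prefixes found."""
    report = {}
    for taxon_dir in sorted(Path(assembly_root).iterdir()):
        if not taxon_dir.is_dir():
            continue
        # FASTA files sitting directly inside the taxon folder.
        fasta_files = sorted(
            p.name for p in taxon_dir.iterdir()
            if p.is_file() and p.suffix in FASTA_SUFFIXES
        )
        # Database prefix = file name without the final BLAST suffix,
        # e.g. "X.fa.nsq" -> "X.fa".
        blast_dir = taxon_dir / "blast_dir"
        db_prefixes = set()
        if blast_dir.is_dir():
            db_prefixes = {
                p.stem for p in blast_dir.iterdir()
                if p.suffix in BLAST_SUFFIXES
            }
        report[taxon_dir.name] = {
            "fasta_files": fasta_files,
            "blast_db_prefixes": sorted(db_prefixes),
            # True only if some FASTA file is named exactly like a db prefix.
            "fasta_matches_db": any(n in db_prefixes for n in fasta_files),
        }
    return report
```

Calling `summarize_assembly_dir(...)` on the folder passed as `--assemblyPath` and printing the entries where `fasta_matches_db` is false would list every taxon whose FASTA name differs from its database prefix.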
If it matters: the header lines in every assembly FASTA file look similar to this:
>JAQFYK010000001.1 Achipteria coleoptrata isolate Gr-Ori-00870 scaffold1_size53908, whole genome shotgun sequence

Do you have any idea what could be causing this?
