Hi there,
I'm using fdog.assembly to search for orthologs of the gene Xrcc2 across ~67 assemblies. The run completes very quickly (about 16 seconds) without any error messages. However, the resulting .phyloprofile file contains only the reference species (NASVI@7425@2), and no orthologs are detected in any other taxon. Because the run is so fast, I suspect that fdog.assembly is ignoring my assemblies. I added the assemblies manually; I did not use fdog.addAssembly.
My test script looks like this:
fdog.assembly \
--gene Xrcc2 \
--refSpec NASVI@7425@2 \
--assemblyPath /mnt/ceph-hdd/workspaces/ws/scc_ubzo_sscheu/u17921-bachelor_project/tools/fDOG/fdog/data/assembly_dir \
--dataPath /mnt/ceph-hdd/workspaces/ws/scc_ubzo_sscheu/u17921-bachelor_project/tools/fDOG/fdog/data \
--coregroupPath /mnt/ceph-hdd/workspaces/ws/scc_ubzo_sscheu/u17921-bachelor_project/data/fdog_input/orthologs/nasvi/core_orthologs \
--out /mnt/ceph-hdd/workspaces/ws/scc_ubzo_sscheu/u17921-bachelor_project/results/ \
--augustus \
--augustusRefSpec tribolium2012 \
--checkCoorthologsRef \
--parallel \
--gff \
--isoforms \
--force
Output:
(fdog_env) ./test_assembly.sh
Gene: Xrcc2
fDOG reference species: NASVI@7425@2
Building a consensus sequence
...finished
/mnt/ceph-hdd/workspaces/ws/scc_ubzo_sscheu/u17921-bachelor_project/results//Xrcc2//tmp/Xrcc2.con
Building a block profile ...
...finished
/mnt/ceph-hdd/workspaces/ws/scc_ubzo_sscheu/u17921-bachelor_project/results//Xrcc2//tmp/Xrcc2.prfl
Searching for orthologs ...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 67/67 [00:10<00:00, 6.39it/s]
...finished
Calculating FAS scores ...
...finished
fDOG-Assembly finished completely in 16.10099506378174seconds.
Group preparation: 0.07499980926513672 Ortholog search: 10.596382856369019 FAS: 5.388456344604492
The structure of my --dataPath looks like this:
├── annotation_dir -> /mnt/ceph-hdd/workspaces/ws/scc_ubzo_sscheu/u17921-bachelor_project/fdog_output/annotation_dir
├── assembly_dir
├── coreTaxa_dir -> /mnt/ceph-hdd/workspaces/ws/scc_ubzo_sscheu/u17921-bachelor_project/fdog_output/coreTaxa_dir
├── searchTaxa_dir -> /mnt/ceph-hdd/workspaces/ws/scc_ubzo_sscheu/u17921-bachelor_project/fdog_output/searchTaxa_dir
For example, I want to find orthologs of the gene from my reference species (NASVI@7425@2) in ACHCO@229769@15102025:
└── assembly_dir
    └── ACHCO@229769@15102025
        ├── blast_dir
        │   ├── ACHCO@229769@15102025.fa.ndb
        │   ├── ACHCO@229769@15102025.fa.nhr
        │   ├── ACHCO@229769@15102025.fa.nin
        │   ├── ACHCO@229769@15102025.fa.njs
        │   ├── ACHCO@229769@15102025.fa.nog
        │   ├── ACHCO@229769@15102025.fa.nos
        │   ├── ACHCO@229769@15102025.fa.not
        │   ├── ACHCO@229769@15102025.fa.nsq
        │   ├── ACHCO@229769@15102025.fa.ntf
        │   └── ACHCO@229769@15102025.fa.nto
        └── ACHCO@229769@15102025.fna
annotation_dir/NASVI@7425@2.json
coreTaxa_dir/NASVI@7425@2
├── NASVI@7425@2.fa
├── NASVI@7425@2.fa.checked
├── NASVI@7425@2.fa.fai
├── NASVI@7425@2.pdb
├── NASVI@7425@2.phr
├── NASVI@7425@2.pin
├── NASVI@7425@2.pjs
├── NASVI@7425@2.pot
├── NASVI@7425@2.psq
├── NASVI@7425@2.ptf
└── NASVI@7425@2.pto
searchTaxa_dir/NASVI@7425@2
├── NASVI@7425@2.fa
├── NASVI@7425@2.fa.checked
└── NASVI@7425@2.fa.fai
All other assemblies follow the same structure.
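In case it helps with debugging, here is a minimal sketch of how the assembly_dir layout could be summarized for all taxa at once. It is not part of fDOG: the function name `summarize_assembly_dir` and the list of FASTA extensions are my own assumptions, and I do not know which file names fdog.assembly actually requires. The script only reports what is on disk and flags folders where the BLAST database prefix does not match any FASTA file name.

```python
from pathlib import Path

# Assumption: these are the FASTA extensions worth reporting.
FASTA_SUFFIXES = {".fa", ".fna", ".fasta"}


def summarize_assembly_dir(assembly_dir):
    """Return one summary dict per taxon folder found in assembly_dir."""
    summaries = []
    for taxon_dir in sorted(Path(assembly_dir).iterdir()):
        if not taxon_dir.is_dir():
            continue
        # FASTA files sitting directly in the taxon folder.
        fasta_files = sorted(
            p.name
            for p in taxon_dir.iterdir()
            if p.is_file() and p.suffix in FASTA_SUFFIXES
        )
        # A single-volume nucleotide BLAST database stores its sequences
        # in <prefix>.nsq, so the prefix is the file name minus ".nsq".
        blast_dir = taxon_dir / "blast_dir"
        db_prefixes = []
        if blast_dir.is_dir():
            db_prefixes = sorted(
                p.name[: -len(".nsq")] for p in blast_dir.glob("*.nsq")
            )
        summaries.append(
            {
                "taxon": taxon_dir.name,
                "fasta_files": fasta_files,
                "blast_db_prefixes": db_prefixes,
                # True when a BLAST prefix has no FASTA file of the same name.
                "name_mismatch": any(
                    prefix not in fasta_files for prefix in db_prefixes
                ),
            }
        )
    return summaries


if __name__ == "__main__":
    import sys

    for entry in summarize_assembly_dir(sys.argv[1]):
        print(entry)
```

Run against the layout shown above, the one visible difference this reports is that the FASTA file ends in .fna while the BLAST database prefix ends in .fa. I do not know whether that matters to fdog.assembly.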
If it matters: the header lines in every assembly FASTA file look similar to this:
>JAQFYK010000001.1 Achipteria coleoptrata isolate Gr-Ori-00870 scaffold1_size53908, whole genome shotgun sequence
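Since I am not sure whether the long header lines play a role, here is a small sketch of how the sequence IDs could be extracted and checked for duplicates. It assumes the usual FASTA convention that the ID is the text between `>` and the first whitespace; the helper names `sequence_id` and `duplicate_ids` are hypothetical and not part of fDOG.

```python
from collections import Counter


def sequence_id(header_line):
    """Return the ID of a FASTA header: text after '>' up to the first whitespace."""
    parts = header_line[1:].split(maxsplit=1)
    return parts[0] if parts else ""


def duplicate_ids(fasta_lines):
    """Return the sorted IDs that occur in more than one header line."""
    counts = Counter(
        sequence_id(line) for line in fasta_lines if line.startswith(">")
    )
    return sorted(seq_id for seq_id, n in counts.items() if n > 1)


if __name__ == "__main__":
    import sys

    # Usage: python check_headers.py path/to/assembly.fna
    with open(sys.argv[1]) as handle:
        print(duplicate_ids(handle))
```

For the header shown above, `sequence_id` returns `JAQFYK010000001.1`, so the description after the first space would not be part of the ID under this convention.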
Do you have any idea what could be causing this?