Hi @AndreLamurias,
I have no problems following the preprocessing pipeline. However, when I randomly subsample the reads (25%) and then assemble them with metaFlye (same parameters), I get the following error after preprocessing.
(graphmb) bash-4.2$ graphmb --assembly ./assembly_input --outdir output_graphmb --numcores 12
logging to output_graphmb/20241026-095226graphmb_output.log
Running GraphMB 0.2.6
using cuda: False
setting seed to 1
setting tf seed
Reading cache from
DATASET STATS:
number of sequences: 1356
assembly length: 0.116 Gb
assembly N50: 0.258 Mb
assembly average length (Mb): 0.086 max: 3.596 min: 0.001
coverage samples: 1
Graph file found and read
graph edges: 522
contig paths: 1335
total ref markers sets: 58
total ref markers: 104
contigs with one or more markers: 472/1356
max SCGs on one contig: 104, average(excluding 0): 6.809
candidate k0s [33, 34, 35, 36, 37, 38]
SCG contig count min: 16 contigs
edges with overlapping scgs (max=20): [(14, 2), (6, 2), (1, 2)]
==============Running VAE model=====================
setting tf seed
edges with overlapping scgs (max=20): [(14, 2), (6, 2), (1, 2)]
deleted 6 edges with same SCGs
**** Num of edges: 1822
******* Running model: CCVAE **********
***** using edge weights: True ******
***** cluster markers only: False *****
***** self edges only: False *****
***** Using raw kmer+abund features: True
***** SCG neg pairs: (15328, 2)
***** input features dimension: (1356, 104)
Uncaught exception
Traceback (most recent call last):
File "/home/jiz322/miniconda3/envs/graphmb/bin/graphmb", line 8, in <module>
sys.exit(main())
File "/home/jiz322/miniconda3/envs/graphmb/lib/python3.9/site-packages/graphmb/main.py", line 499, in main
vae_embs, _, _ = train_ccvae.run_model_ccvae(dataset, args, logger, 0,
File "/home/jiz322/miniconda3/envs/graphmb/lib/python3.9/site-packages/graphmb/train_ccvae.py", line 170, in run_model_ccvae
cluster_labels, stats, _, hq_bins = compute_clusters_and_stats(
File "/home/jiz322/miniconda3/envs/graphmb/lib/python3.9/site-packages/graphmb/evaluate.py", line 367, in compute_clusters_and_stats
unresolved_contigs_with_scgs = np.array([n for i,n in enumerate(node_names)
File "/home/jiz322/miniconda3/envs/graphmb/lib/python3.9/site-packages/graphmb/evaluate.py", line 368, in <listcomp>
if labels[i] not in positive_clusters and len(dataset.contig_markers[n]) > 0])
KeyError: 'edge_26'
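For reference, the expression in the traceback indexes `dataset.contig_markers` directly by contig name, so any contig without an entry in that dict would raise a `KeyError`. Below is a minimal sketch of that pattern using made-up stand-in data. Only the list comprehension mirrors the traceback; the function names, the sample values, and the tolerant `dict.get` variant are assumptions for illustration, not GraphMB's code or a confirmed fix.

```python
import numpy as np

# Hypothetical stand-in data; in GraphMB these come from the dataset object.
node_names = ["edge_1", "edge_26", "edge_3"]
labels = [0, 1, 1]
positive_clusters = {0}
# Placeholder marker names; note there is no entry for "edge_26".
contig_markers = {"edge_1": {"marker_a": 1}, "edge_3": {"marker_b": 1}}


def unresolved_strict(node_names, labels, positive_clusters, contig_markers):
    """Same indexing pattern as the traceback: a missing contig raises KeyError."""
    return np.array([n for i, n in enumerate(node_names)
                     if labels[i] not in positive_clusters
                     and len(contig_markers[n]) > 0])


def unresolved_tolerant(node_names, labels, positive_clusters, contig_markers):
    """Illustrative variant: treat a contig with no entry as having no markers."""
    return np.array([n for i, n in enumerate(node_names)
                     if labels[i] not in positive_clusters
                     and len(contig_markers.get(n, {})) > 0])


# Diagnostic: list contigs that have no entry in the marker dict.
missing = [n for n in node_names if n not in contig_markers]
print("contigs without a marker entry:", missing)

try:
    unresolved_strict(node_names, labels, positive_clusters, contig_markers)
except KeyError as err:
    print("strict lookup failed on:", err)

print("tolerant lookup:", unresolved_tolerant(
    node_names, labels, positive_clusters, contig_markers).tolist())
```

Running the `missing` check against the real `node_names` and `contig_markers` would show whether `edge_26` is the only contig absent from the marker dict or one of many.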
Additionally, the data can be found here: https://drive.google.com/file/d/1ztlDGWfkPf7AZlH4Ey39RWUHg8Q8Js7u/view?usp=sharing
What could be the reason? I installed from the most recent GitHub version.
Thanks,
Jianshu