Scaffolding steps and checkpoints #3

@andreaswallberg

Hi @markhilt !

I am running ARBitR on a very large and quite fragmented genome assembly. The program has reached the stage where it runs many processes in parallel:

[Wed Oct 14 18:45:43 2020] Trimming contig ends...
[Wed Oct 14 20:25:35 2020] [ TRIMMING ] Completed: 100.0% (24098 out of 24098)
[Wed Oct 14 20:29:38 2020] Creating scaffolds...
[Wed Oct 14 20:29:38 2020] Number of paths: 24098
[Thu Oct 15 22:52:48 2020] [ SCAFFOLDING ] Completed: 99.67% (24018 out of 24098)

All 16 processes are running at between 55% and 99% CPU each. However, the analysis seems to have been stuck at 99.67% completed for over 10 hours now. Up to this point, scaffolding progressed quite quickly.
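To put "quite quickly" in numbers, the two scaffolding timestamps in the log above give an average rate. The sketch below is a hypothetical helper (not part of ARBitR) that parses the log's `[timestamp]` prefix and the `(N out of M)` counter, assuming progress is roughly linear. At the average rate the last 80 paths would take only about five minutes. A stall of 10+ hours therefore suggests that the remaining paths are far more expensive than the average one, although the log alone cannot confirm why.

```python
import re
from datetime import datetime

# Start of scaffolding and the last progress report, copied from the log above.
LOG = """\
[Wed Oct 14 20:29:38 2020] Number of paths: 24098
[Thu Oct 15 22:52:48 2020] [ SCAFFOLDING ] Completed: 99.67% (24018 out of 24098)
"""

STAMP = re.compile(r"^\[(?P<ts>[^\]]+)\]")                      # leading [timestamp]
PROGRESS = re.compile(r"\((?P<done>\d+) out of (?P<total>\d+)\)")  # (N out of M)


def parse_ts(line: str) -> datetime:
    """Parse the bracketed timestamp at the start of a log line."""
    m = STAMP.match(line)
    return datetime.strptime(m.group("ts"), "%a %b %d %H:%M:%S %Y")


def estimate(log: str):
    """Return elapsed seconds, paths/s, paths left and seconds left at that rate."""
    lines = [line for line in log.splitlines() if line.strip()]
    start, last = parse_ts(lines[0]), lines[-1]
    end = parse_ts(last)
    m = PROGRESS.search(last)
    done, total = int(m.group("done")), int(m.group("total"))
    elapsed = (end - start).total_seconds()
    rate = done / elapsed          # assumes a roughly constant rate
    remaining = total - done
    return elapsed, rate, remaining, remaining / rate


elapsed, rate, remaining, eta = estimate(LOG)
print(f"elapsed: {elapsed / 3600:.1f} h, rate: {rate:.3f} paths/s")
print(f"{remaining} paths left, ~{eta / 60:.1f} min at the average rate")
```

Running it prints an elapsed time of about 26.4 h and roughly 0.25 paths/s.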

Is there a particularly time/CPU consuming step at the very end of the scaffolding process?

Also, since the program has already produced various output files, including the GFA and path text files, is it possible to resume the analysis from these checkpoint files if I have to cancel it? It has now been running for one week.