Database for MMseqs2 search #2

@d-courtine

Current behavior

Currently, a user can pass three kinds of values to --mmseqs-search-db to select the reference database:

  1. A path to a valid MMseqs2 database that already exists somewhere on the system.
  2. A FastA/Q[.gz] file. KRYPTON sets up the database from this file and stores it in the current project directory, given by --out.
  3. The name of a database that can be downloaded with mmseqs databases. This database is also stored in the current project directory.
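
The three cases above could be told apart roughly as follows. This is only a sketch, not KRYPTON's actual code: the function name `classify_search_db` and the suffix list are hypothetical, and it assumes that an MMseqs2 database can be recognized by the `<prefix>.dbtype` file that sits next to it.

```python
from pathlib import Path

# Hypothetical list of accepted FastA/Q suffixes (checked after stripping ".gz").
FASTX_SUFFIXES = (".fa", ".fasta", ".fna", ".faa", ".fq", ".fastq")


def classify_search_db(value: str) -> str:
    """Return 'database', 'fastx' or 'name' for a --mmseqs-search-db value."""
    # Assumption: an existing MMseqs2 database has a companion '<prefix>.dbtype' file.
    if Path(value + ".dbtype").is_file():
        return "database"

    path = Path(value)
    name = path.name.lower()
    if name.endswith(".gz"):
        name = name[:-3]
    if path.is_file() and name.endswith(FASTX_SUFFIXES):
        return "fastx"

    # Anything else is treated as the name of a downloadable database;
    # whether that name is actually valid is left for MMseqs2 to decide.
    return "name"
```

For instance, `classify_search_db("UniRef50")` falls through to `"name"` as long as no file called `UniRef50.dbtype` exists in the working directory.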

Problem

If the user chooses option 2 or 3 for X samples, KRYPTON runs the database setup (and, for option 3, the download) X times. This can become a bottleneck, and I do not want this to happen.

Solution

  • Add another parameter, e.g. --user-db-path, which accepts a path on the system where the user wants to store and keep a given database.

It will work as follows:

  • A valid database is passed to --mmseqs-search-db -> --user-db-path is ignored
  • A FastA/Q file is passed to --mmseqs-search-db -> does --user-db-path contain a valid path?
    • No: set up the database within the project (--out)
    • Yes: does a database with this name already exist in the --user-db-path directory?
      • No: set up the database there
      • Yes: use the existing database as the reference database
  • The name of a database downloadable by MMseqs2 is passed to --mmseqs-search-db -> same as for a FastA/Q file
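
The decision tree above can be sketched as a single function. This is a minimal illustration under stated assumptions, not the final implementation: `resolve_reference_db` and `is_mmseqs_db` are hypothetical names, a database is assumed to be recognizable by its `<prefix>.dbtype` file, "a valid path" is read as "an existing directory", and the database is named after the last component of the value given to --mmseqs-search-db.

```python
from pathlib import Path
from typing import Optional, Tuple


def is_mmseqs_db(prefix: Path) -> bool:
    # Assumption: a database exists if its '<prefix>.dbtype' companion file exists.
    return Path(str(prefix) + ".dbtype").is_file()


def resolve_reference_db(
    search_db: str, out_dir: str, user_db_path: Optional[str] = None
) -> Tuple[Path, bool]:
    """Return (database prefix, needs_setup) for the given options."""
    given = Path(search_db)

    # Case 1: an existing database was passed, so --user-db-path is ignored.
    if is_mmseqs_db(given):
        return given, False

    # Cases 2 and 3 (FastA/Q file or downloadable name) are handled the same way.
    db_name = given.name  # hypothetical naming rule

    # No valid --user-db-path: set up the database inside the project (--out).
    if not user_db_path or not Path(user_db_path).is_dir():
        return Path(out_dir) / db_name, True

    # Valid --user-db-path: reuse the database if it is already there,
    # otherwise it has to be set up in that directory.
    target = Path(user_db_path) / db_name
    return target, not is_mmseqs_db(target)
```

With this layout, the first sample that runs with a given --user-db-path pays for the setup, and the following samples get `needs_setup == False` for the same database name. Concurrent runs that start at the same time would still need some form of locking, which this sketch does not cover.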
