Status: Open
Bug Report
The _resolve_genbank_accession() function in podp_antismash_downloader.py resolves the GenBank accession of a genome assembly to its RefSeq accession. If no RefSeq accession exists for the given assembly, the function raises a ValueError, and processing of that assembly crashes.
Steps to Reproduce
- Use the NCBI Datasets API to fetch the RefSeq assembly accession for a given GenBank assembly accession.
- If a RefSeq accession is available, the function works as expected. Example response:
  { "assembly_revisions": [ { "genbank_accession": "GCA_000175835.1", "refseq_accession": "GCF_000175835.1", "assembly_name": "ASM17583v1", "assembly_level": "contig", "release_date": "2009-12-15" } ], "total_count": 1 }
- However, when the API response does not include a refseq_accession field, the function fails. Example response:
  { "assembly_revisions": [ { "genbank_accession": "GCA_003326215.1", "assembly_name": "ASM332621v1", "assembly_level": "contig", "release_date": "2018-07-18", "sequencing_technology": "Illumina MiSeq" } ], "total_count": 1 }
- This results in the following error:

    File ~/coding/NPLinker_workshop_2025/nplinker/src/nplinker/genomics/antismash/podp_antismash_downloader.py:284, in _resolve_genbank_accession(genbank_id)
        282     if resp.status_code == httpx.codes.OK:
        283         data = resp.json()
    --> 284         latest_entry = max(
        285             (entry for entry in data["assembly_revisions"] if "refseq_accession" in entry),
        286             key=lambda x: x["release_date"],
        287         )
        288         refseq_id = latest_entry["refseq_accession"]
        289 except httpx.ReadTimeout:

    ValueError: max() arg is an empty sequence
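The failure can be reproduced offline. The sketch below copies the selection expression from the traceback (lines 284 to 288) into a standalone function and applies it to the two example responses. `select_latest_refseq` is a hypothetical helper name, not part of NPLinker, and no HTTP request is made. When the generator passed to max() yields nothing, max() raises ValueError.

```python
# Offline reproduction of the ValueError from the traceback.
# `select_latest_refseq` is a hypothetical helper mirroring lines 284-288.

WITH_REFSEQ = {
    "assembly_revisions": [
        {
            "genbank_accession": "GCA_000175835.1",
            "refseq_accession": "GCF_000175835.1",
            "assembly_name": "ASM17583v1",
            "assembly_level": "contig",
            "release_date": "2009-12-15",
        }
    ],
    "total_count": 1,
}

WITHOUT_REFSEQ = {
    "assembly_revisions": [
        {
            "genbank_accession": "GCA_003326215.1",
            "assembly_name": "ASM332621v1",
            "assembly_level": "contig",
            "release_date": "2018-07-18",
            "sequencing_technology": "Illumina MiSeq",
        }
    ],
    "total_count": 1,
}


def select_latest_refseq(data: dict) -> str:
    """Same selection logic as the original code, with no guard for missing RefSeq entries."""
    latest_entry = max(
        (entry for entry in data["assembly_revisions"] if "refseq_accession" in entry),
        key=lambda x: x["release_date"],
    )
    return latest_entry["refseq_accession"]


if __name__ == "__main__":
    print(select_latest_refseq(WITH_REFSEQ))  # prints GCF_000175835.1
    try:
        select_latest_refseq(WITHOUT_REFSEQ)
    except ValueError as exc:
        # The exact message text differs between Python versions.
        print(f"ValueError: {exc}")
```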
Suggested Fix
Return an empty string when no RefSeq accession is found, instead of letting max() raise a ValueError.
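Here is a minimal sketch of the suggested fix. It assumes the parsed JSON has the shape shown in the example responses. `pick_refseq_accession` is a hypothetical name. In NPLinker, the equivalent change would go inside _resolve_genbank_accession() right after resp.json(). The key change is passing `default=None` to the built-in max(), so an empty iterable returns None instead of raising. The `data.get(..., [])` fallback is an extra defensive assumption and is not in the original code.

```python
# Sketch of the suggested fix: return "" instead of letting max() raise.
# `pick_refseq_accession` is a hypothetical helper name.


def pick_refseq_accession(data: dict) -> str:
    """Return the RefSeq accession of the most recent revision, or "" if none exists."""
    candidates = (
        entry
        for entry in data.get("assembly_revisions", [])
        if "refseq_accession" in entry
    )
    # default=None makes max() return None for an empty iterable instead of raising.
    # release_date values are ISO 8601 dates (YYYY-MM-DD), so plain string
    # comparison orders them chronologically.
    latest_entry = max(candidates, key=lambda x: x["release_date"], default=None)
    if latest_entry is None:
        return ""
    return latest_entry["refseq_accession"]


if __name__ == "__main__":
    with_refseq = {
        "assembly_revisions": [
            {
                "genbank_accession": "GCA_000175835.1",
                "refseq_accession": "GCF_000175835.1",
                "release_date": "2009-12-15",
            }
        ],
        "total_count": 1,
    }
    without_refseq = {
        "assembly_revisions": [
            {"genbank_accession": "GCA_003326215.1", "release_date": "2018-07-18"}
        ],
        "total_count": 1,
    }
    print(repr(pick_refseq_accession(with_refseq)))  # prints 'GCF_000175835.1'
    print(repr(pick_refseq_accession(without_refseq)))  # prints ''
```

Returning "" keeps the function's return type a plain string, but callers must treat the empty string as "no RefSeq accession". Returning `Optional[str]` (None) would be an alternative that makes the missing case explicit to type checkers.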
Metadata
Labels: none
Project status: Backlog