Hello! In a program I'm writing, I need to retrieve several hundred MS spectra from an mzXML file, given a list of scan numbers. I found the `get_by_id()` function in the pyteomics `XML` class, which generally does this. Looking at the code, there appear to be two options: either it parses from the beginning of the file until it finds a spectrum with the scan number passed in as a parameter, or it looks the spectrum up in a dictionary. The latter is what I'm looking for, because starting from the beginning of the file causes huge delays for large scan numbers. However, I have not been able to activate the dictionary lookup. I have been trying several ideas. My test for whether a method works is to check whether it still takes significantly longer to retrieve spectra with large scan numbers than spectra with small scan numbers. With a dictionary lookup, retrieval time should no longer grow with the scan's position in the file.
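The check described above can be sketched as a small timing harness. This is a hypothetical helper (`time_lookups` is not part of pyteomics), demonstrated on toy in-memory data rather than a real mzXML file: a linear search stands in for parsing from the top of the file, and a plain dict stands in for an index.

```python
import time


def time_lookups(lookup, scan_ids):
    """Call lookup(scan_id) for each ID and return {scan_id: elapsed seconds}."""
    timings = {}
    for scan_id in scan_ids:
        start = time.perf_counter()
        lookup(scan_id)
        timings[scan_id] = time.perf_counter() - start
    return timings


# Hypothetical toy data standing in for parsed spectra (not real mzXML content).
spectra = [{"num": str(i)} for i in range(1, 200_001)]
by_num = {s["num"]: s for s in spectra}


def linear_lookup(scan_id):
    # Walks from the start on every call, like parsing the file from the top.
    return next(s for s in spectra if s["num"] == scan_id)


def dict_lookup(scan_id):
    # Average constant-time access, independent of where the scan sits.
    return by_num[scan_id]


ids = ["10", "199990"]
print(time_lookups(linear_lookup, ids))
print(time_lookups(dict_lookup, ids))
```

With the linear version, elapsed time grows with the scan's position; with the dict version it should stay roughly flat. With a pyteomics reader you could pass something like `lambda scan_id: reader[scan_id]` as `lookup`, assuming the reader is keyed by scan-number strings.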
I am continuing to investigate the issue and possible workarounds, but I thought I'd reach out and ask about it. Is there a more straightforward way to trigger the dictionary lookup? Am I misunderstanding the intended use of the `xml.get_by_id()` function? Do you have recommendations for how to move forward?
Hi!
Generally, you shouldn't need to do anything special to achieve this. `use_index=True` is needed if you create the parser with `mzxml.read()`; if you call `mzxml.MzXML` instead, indexing is enabled by default.
In any case, the created object should have `_offset_index` populated with byte offsets of elements. You can check `len(reader)` to get the number of items in the index.
The behavior you need is indeed defined in `get_by_id`, but not in the version on the `XML` class; rather, in the one on `IndexedXML`. You don't need to call it directly, though: you can just use dict-like syntax on the reader object. You should even be able to get all your spectra at once by requesting a list of IDs. `build_id_cache` i…
I hope this helps somewhat. If not, I suggest sharing some of the relevant code where you do the lookups.
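The reason an indexed reader avoids the slowdown is that it records where each element starts, so a lookup becomes a `seek()` instead of a parse from the top of the file. Below is a minimal, stdlib-only sketch of that general byte-offset idea, assuming a toy mzXML-like document held in memory. `build_offset_index`, `read_scan`, the regex, and the toy data are hypothetical illustrations, not pyteomics' actual implementation; with pyteomics itself you would simply use dict-like syntax on the reader, as described in the answer.

```python
import io
import re

# Hypothetical toy stand-in for an mzXML file; only the <scan ...> start tags matter here.
TOY_MZXML = b"""<?xml version="1.0"?>
<mzXML>
<msRun>
<scan num="1" msLevel="1"></scan>
<scan num="2" msLevel="2"></scan>
<scan num="3" msLevel="1"></scan>
</msRun>
</mzXML>
"""

# Matches a scan start tag and captures its num attribute.
SCAN_START = re.compile(rb'<scan\b[^>]*?\bnum="(\d+)"')


def build_offset_index(stream):
    """Read the stream once; map each scan number to the byte offset of its <scan tag."""
    stream.seek(0)
    data = stream.read()
    return {m.group(1).decode(): m.start() for m in SCAN_START.finditer(data)}


def read_scan(stream, index, scan_id):
    """Jump straight to a scan via the index instead of parsing from the top."""
    stream.seek(index[scan_id])
    # Toy shortcut: each scan sits on one line here; a real parser would parse the element.
    return stream.readline().strip()


stream = io.BytesIO(TOY_MZXML)
index = build_offset_index(stream)
print(index)
print(read_scan(stream, index, "3"))
```

The index is built with one pass over the file; after that, reaching scan 3 costs the same as reaching scan 1. A real reader still has to parse the full element after seeking; the point here is only that lookup cost no longer depends on how far into the file the scan is.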