mzident parser

Hi,
we would like to use the pyteomics mzIdentML parser for our metaproteomics software Prophane. We require the following information about protein groups (“ProteinAmbiguityGroup”):
* the protein accessions per protein group 
* the number and spectra identifiers of PSMs assigned to each accession

This information is found in the following mzIdentML elements:
* the “DBSequence.accession” referenced by each “ProteinDetectionHypothesis” within a “ProteinAmbiguityGroup”
* the spectrumID of the “SpectrumIdentificationResult” that contains each “SpectrumIdentificationItem”
* the threshold flags: “ProteinDetectionHypothesis.passThreshold”, the cvParam "protein group passes threshold" of the “ProteinAmbiguityGroup”, and “SpectrumIdentificationItem.passThreshold”

So far, we have tried the iterfind("ProteinAmbiguityGroup") method of the pyteomics.mzid.MzIdentML class, but ran into the following issues:
1. it takes several hours for large mzIdentML files (>2 GB), probably because a lot of information we do not need is extracted as well
2. we found no way to get the ‘spectrumID’ of the “SpectrumIdentificationResult” that is the parent of a given “SpectrumIdentificationItem”.
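
To make the target output concrete, here is a minimal sketch of the extraction we are after. It does not use pyteomics at all; it streams the file with the standard library's xml.etree.ElementTree.iterparse. The function name protein_group_spectra, the only_passing parameter and the SAMPLE document are made up for illustration, and the sketch assumes the mzIdentML 1.1 namespace and that DBSequence and SpectrumIdentificationResult elements appear before the ProteinDetectionList. It keeps a lookup of all SpectrumIdentificationItem ids in memory, so it is not tuned for very large files.

```python
import io
import xml.etree.ElementTree as ET

# Assumption: the file uses the mzIdentML 1.1 namespace.
NS = "{http://psidev.info/psi/pi/mzIdentML/1.1}"


def protein_group_spectra(source, only_passing=True):
    """Return {group id: {accession: sorted spectrumIDs}} in one streaming pass."""
    accession_by_dbseq = {}  # DBSequence id -> accession
    spectrum_by_sii = {}     # SpectrumIdentificationItem id -> (spectrumID, passed)
    groups = {}
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == NS + "DBSequence":
            accession_by_dbseq[elem.get("id")] = elem.get("accession")
            elem.clear()  # drop attributes and children that are no longer needed
        elif elem.tag == NS + "SpectrumIdentificationResult":
            # The spectrumID lives on the parent result, not on the item.
            spectrum_id = elem.get("spectrumID")
            for sii in elem.findall(NS + "SpectrumIdentificationItem"):
                passed = sii.get("passThreshold") == "true"
                spectrum_by_sii[sii.get("id")] = (spectrum_id, passed)
            elem.clear()
        elif elem.tag == NS + "ProteinAmbiguityGroup":
            per_accession = {}
            for pdh in elem.findall(NS + "ProteinDetectionHypothesis"):
                accession = accession_by_dbseq.get(pdh.get("dBSequence_ref"))
                spectra = set()
                for ref in pdh.iter(NS + "SpectrumIdentificationItemRef"):
                    sii_id = ref.get("spectrumIdentificationItem_ref")
                    spectrum_id, passed = spectrum_by_sii[sii_id]
                    if passed or not only_passing:
                        spectra.add(spectrum_id)
                per_accession[accession] = sorted(spectra)
            groups[elem.get("id")] = per_accession
            elem.clear()
    return groups


# Tiny made-up document, reduced to the elements the sketch reads.
SAMPLE = b"""<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.1">
 <SequenceCollection>
  <DBSequence id="DBSeq1" accession="P12345"/>
 </SequenceCollection>
 <DataCollection><AnalysisData>
  <SpectrumIdentificationList id="SIL1">
   <SpectrumIdentificationResult id="SIR1" spectrumID="index=0">
    <SpectrumIdentificationItem id="SII1" passThreshold="true"/>
   </SpectrumIdentificationResult>
   <SpectrumIdentificationResult id="SIR2" spectrumID="index=1">
    <SpectrumIdentificationItem id="SII2" passThreshold="false"/>
   </SpectrumIdentificationResult>
  </SpectrumIdentificationList>
  <ProteinDetectionList id="PDL1">
   <ProteinAmbiguityGroup id="PAG1">
    <ProteinDetectionHypothesis id="PDH1" dBSequence_ref="DBSeq1" passThreshold="true">
     <PeptideHypothesis peptideEvidence_ref="PE1">
      <SpectrumIdentificationItemRef spectrumIdentificationItem_ref="SII1"/>
      <SpectrumIdentificationItemRef spectrumIdentificationItem_ref="SII2"/>
     </PeptideHypothesis>
    </ProteinDetectionHypothesis>
   </ProteinAmbiguityGroup>
  </ProteinDetectionList>
 </AnalysisData></DataCollection>
</MzIdentML>"""

print(protein_group_spectra(io.BytesIO(SAMPLE)))
# {'PAG1': {'P12345': ['index=0']}}
```

The point is the shape of the result (accessions per protein group, with the spectrumIDs of their PSMs), not this particular implementation. We would prefer to get the same thing through pyteomics if that is feasible.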

Any ideas on how to address this would be highly appreciated. Is it possible to adapt pyteomics.mzid.MzIdentML for our task?

mzident parser #44
