Hi,
we would like to use the pyteomics mzIdentML parser for our metaproteomics software Prophane. We require the following information about protein groups (“ProteinAmbiguityGroup”):
- the protein accessions per protein group
- the number and spectra identifiers of PSMs assigned to each accession
This information is found in the following mzIdentML sections:
- the “DBSequence.accession” of each “ProteinDetectionHypothesis” in the “ProteinAmbiguityGroup”
- the spectrumID of the “SpectrumIdentificationResult” that contains each “SpectrumIdentificationItem”
- the threshold flags: “ProteinDetectionHypothesis.passThreshold”, the cvParam "protein group passes threshold" of the “ProteinAmbiguityGroup”, and “SpectrumIdentificationItem.passThreshold”
So far, we have tried the iterfind("ProteinAmbiguityGroup") method of the pyteomics.mzid.MzIdentML class, but ran into the following issues:
- it takes several hours for large mzIdentML files (>2 GB), probably because a lot of information we do not need is extracted
- we found no way to get the ‘spectrumID’ of the parent “SpectrumIdentificationResult” of each “SpectrumIdentificationItem”.
Any ideas on how to address this would be highly appreciated. Is it possible to adapt pyteomics.mzid.MzIdentML for our task?
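To make the requirement concrete, here is a minimal sketch of the extraction we have in mind. It deliberately uses only the standard library's `xml.etree.ElementTree.iterparse` rather than pyteomics, so it says nothing about how pyteomics works internally. The function name `read_protein_groups` and the shape of the returned dictionary are hypothetical. The element and attribute names follow our reading of the mzIdentML 1.1 schema, and the sketch assumes that “DBSequence” and “SpectrumIdentificationResult” elements appear before the “ProteinAmbiguityGroup” elements in the file.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace from a tag name."""
    return tag.rsplit("}", 1)[-1]


def read_protein_groups(source):
    """Stream an mzIdentML file and collect, per protein group, the
    accessions and the spectrumIDs of the PSMs assigned to them.

    Hypothetical helper; the returned layout is only an illustration.
    """
    accession_by_dbseq = {}  # DBSequence id -> accession
    spectrum_by_sii = {}     # SpectrumIdentificationItem id -> parent spectrumID
    groups = {}              # ProteinAmbiguityGroup id -> {accession: info}

    for _, elem in ET.iterparse(source, events=("end",)):
        name = local(elem.tag)
        if name == "DBSequence":
            accession_by_dbseq[elem.get("id")] = elem.get("accession")
            elem.clear()  # drop child nodes we no longer need
        elif name == "SpectrumIdentificationResult":
            # Remember which spectrum each identification item belongs to.
            for child in elem:
                if local(child.tag) == "SpectrumIdentificationItem":
                    spectrum_by_sii[child.get("id")] = elem.get("spectrumID")
            elem.clear()
        elif name == "ProteinAmbiguityGroup":
            members = {}
            for pdh in elem:
                if local(pdh.tag) != "ProteinDetectionHypothesis":
                    continue
                accession = accession_by_dbseq.get(pdh.get("dBSequence_ref"))
                spectra = set()
                for ref in pdh.iter():
                    if local(ref.tag) == "SpectrumIdentificationItemRef":
                        sii_id = ref.get("spectrumIdentificationItem_ref")
                        spectra.add(spectrum_by_sii.get(sii_id))
                members[accession] = {
                    "passThreshold": pdh.get("passThreshold"),
                    "spectra": spectra,
                }
            groups[elem.get("id")] = members
            elem.clear()
    return groups
```

Calling `read_protein_groups("results.mzid")` (the file name is a placeholder) would give us one entry per protein group, which is the information Prophane needs. We would prefer to get the same result through pyteomics if that is feasible.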