mzident parser #44

@jschmacht

Hi,
we would like to use the pyteomics mzIdentML parser for our metaproteomics software Prophane. We need the following information about protein groups (“ProteinAmbiguityGroup”):

  • the protein accessions per protein group
  • the number of PSMs assigned to each accession and their spectrum identifiers

This information is stored in the following mzIdentML elements:

  • the “DBSequence.accession” referenced by each “ProteinDetectionHypothesis” in a “ProteinAmbiguityGroup”
  • the spectrumID of the “SpectrumIdentificationResult” that contains each “SpectrumIdentificationItem”
  • “ProteinDetectionHypothesis.passThreshold”, the cvParam "protein group passes threshold" of the “ProteinAmbiguityGroup”, and “SpectrumIdentificationItem.passThreshold”

So far, we have tried the iterfind("ProteinAmbiguityGroup") function of the pyteomics.mzid.MzIdentML class, but ran into the following issues:

  1. It takes several hours for large mzIdentML files (>2 GB), probably because a lot of information we do not need is extracted along the way.
  2. We found no way to get the ‘spectrumID’ of the “SpectrumIdentificationResult” that is the parent of a given “SpectrumIdentificationItem”.

Any ideas on how to address this would be highly appreciated. Is it possible to adapt pyteomics.mzid.MzIdentML to our task?
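
To make the request concrete, here is a rough sketch of the kind of output we are after. It does not use pyteomics at all; it is a single streaming pass with Python's standard xml.etree.ElementTree.iterparse, shown only as a stand-in. The function name protein_groups, the helper local and the SAMPLE document are made up for illustration, and SAMPLE is trimmed far below what the mzIdentML schema requires. The sketch assumes that the DBSequence and SpectrumIdentificationResult elements appear before the ProteinDetectionList in the file, and it ignores the group-level cvParam "protein group passes threshold".

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed mzIdentML fragment (not schema-valid).
SAMPLE = """<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.1">
  <SequenceCollection>
    <DBSequence id="DB_P1" accession="P1"/>
    <DBSequence id="DB_P2" accession="P2"/>
  </SequenceCollection>
  <DataCollection><AnalysisData>
    <SpectrumIdentificationList id="SIL_1">
      <SpectrumIdentificationResult id="SIR_1" spectrumID="scan=1">
        <SpectrumIdentificationItem id="SII_1_1" passThreshold="true"/>
      </SpectrumIdentificationResult>
      <SpectrumIdentificationResult id="SIR_2" spectrumID="scan=2">
        <SpectrumIdentificationItem id="SII_2_1" passThreshold="true"/>
        <SpectrumIdentificationItem id="SII_2_2" passThreshold="false"/>
      </SpectrumIdentificationResult>
    </SpectrumIdentificationList>
    <ProteinDetectionList id="PDL_1">
      <ProteinAmbiguityGroup id="PAG_1">
        <ProteinDetectionHypothesis id="PDH_1" dBSequence_ref="DB_P1" passThreshold="true">
          <PeptideHypothesis peptideEvidence_ref="PE_1">
            <SpectrumIdentificationItemRef spectrumIdentificationItem_ref="SII_1_1"/>
            <SpectrumIdentificationItemRef spectrumIdentificationItem_ref="SII_2_1"/>
          </PeptideHypothesis>
        </ProteinDetectionHypothesis>
        <ProteinDetectionHypothesis id="PDH_2" dBSequence_ref="DB_P2" passThreshold="true">
          <PeptideHypothesis peptideEvidence_ref="PE_2">
            <SpectrumIdentificationItemRef spectrumIdentificationItem_ref="SII_2_1"/>
            <SpectrumIdentificationItemRef spectrumIdentificationItem_ref="SII_2_2"/>
          </PeptideHypothesis>
        </ProteinDetectionHypothesis>
      </ProteinAmbiguityGroup>
    </ProteinDetectionList>
  </AnalysisData></DataCollection>
</MzIdentML>"""


def local(tag):
    """Strip the '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def protein_groups(source):
    """Return {group id: {accession: [spectrumID, ...]}} from one streaming pass."""
    accession_by_dbseq = {}  # DBSequence id -> accession
    spectrum_by_sii = {}     # passing SpectrumIdentificationItem id -> parent spectrumID
    groups = {}
    for _, elem in ET.iterparse(source, events=("end",)):
        name = local(elem.tag)
        if name == "DBSequence":
            accession_by_dbseq[elem.get("id")] = elem.get("accession")
            elem.clear()  # drop children and attributes we no longer need
        elif name == "SpectrumIdentificationResult":
            # The parent is complete here, so its spectrumID is at hand.
            for item in elem:
                if (local(item.tag) == "SpectrumIdentificationItem"
                        and item.get("passThreshold") == "true"):
                    spectrum_by_sii[item.get("id")] = elem.get("spectrumID")
            elem.clear()
        elif name == "ProteinAmbiguityGroup":
            members = {}
            for pdh in elem:
                if local(pdh.tag) != "ProteinDetectionHypothesis":
                    continue
                if pdh.get("passThreshold") != "true":
                    continue
                accession = accession_by_dbseq.get(pdh.get("dBSequence_ref"))
                spectra = members.setdefault(accession, [])
                for ref in pdh.iter():
                    if local(ref.tag) == "SpectrumIdentificationItemRef":
                        sii_id = ref.get("spectrumIdentificationItem_ref")
                        if sii_id in spectrum_by_sii:
                            spectra.append(spectrum_by_sii[sii_id])
            groups[elem.get("id")] = members
            elem.clear()
    return groups


groups = protein_groups(io.BytesIO(SAMPLE.encode("utf-8")))
print(groups)
```

The number of PSMs per accession is then simply the length of each list. Note that elem.clear() only empties the processed elements; they stay attached to their parents, so memory use on a multi-gigabyte file would still need checking.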
