[Discussion] Deisotoping parameters #6

@jspaezp

[WIP]

@mobiusklein from https://github.com/mobiusklein/ms_deisotope reached out with a couple of comments on our current use of deisotoping.

I will try to distill the contents and implications here:

  1. Numpy array support

Reading your code, I noticed you weren’t relying on deconvolute_peaks to do input coercion for you, which it turns out was because I wasn’t calling prepare_peaklist before passing the peak list into the deconvoluter itself. I’ve fixed that. I’ve also made it so prepare_peaklist will work with a pair of numpy arrays for m/z and intensity without needing to zip them together yourself first. This fix will be live in version v0.0.46, which I’ll release tonight.

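This entails bumping our ms_deisotope dependency to v0.0.46 or later and using the new API: passing the m/z and intensity arrays to prepare_peaklist directly instead of zipping them together first.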

There were two things I wanted to ask about though.

https://github.com/TalusBio/diadem/blob/113521ff7cf5ecb807695f1d706319b7a4ebb053/diadem/mzml.py#L565-L566
The first is where you intend to use the deisotoped output? The way you’re using it, you’re letting ms_deisotope strip out all the isotopic peaks, but then you’re keeping the charge state-specific m/z values. You probably want to work with all singly charged values downstream in your code, which means you should pass your deconvoluted peaks to ms_deisotope.decharge first, which transforms all peaks to be singly charged. Otherwise, your downstream code will miss out on those multiply charged ions unless you search for their m/zs explicitly but you’ll have discarded all the evidence for those charge state assignments.
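To make the decharging point concrete for myself, here is a minimal sketch of the arithmetic behind converting a charge state-specific m/z to its singly charged equivalent. It is not the ms_deisotope.decharge implementation; the helper name `to_singly_charged_mz` is hypothetical, and it assumes positive mode with a proton as the charge carrier.

```python
# Illustrative arithmetic only, not the ms_deisotope.decharge implementation.
# Assumption: positive-mode data where every charge is carried by a proton.
PROTON_MASS = 1.007276466  # Da


def to_singly_charged_mz(mz: float, charge: int) -> float:
    """Convert an m/z observed at `charge` to the equivalent m/z at charge 1+."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    # Remove the charge carriers to get the neutral mass ...
    neutral_mass = (mz - PROTON_MASS) * charge
    # ... then add a single proton back.
    return neutral_mass + PROTON_MASS


# A fragment observed as 2+ at m/z 500.5 lands just below m/z 1000 as 1+.
singly_charged = to_singly_charged_mz(500.5, 2)
```

Without this step, a downstream search that only looks for 1+ fragment m/z values would never match the 2+ peak at 500.5.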

The second is w.r.t. the comment “I do not really have a reason to use one scorer over other rn”. I think your choice of MSDeconVFitter is probably safe, especially for MS2 data. If you’re finding you’re missing peaks downstream, you can safely lower the threshold from 10 to 0 and/or pass retention_strategy=ms_deisotope.deconvolution.TopNRetentionStrategy(50) to deconvolute_peaks. That will keep the top 50 most abundant peaks as singly charged even if they didn’t pass the deconvolution score threshold. Setting the threshold to 0 means the deconvoluter will reject outright bad matches, but will accept more truncated isotopic patterns. This is especially true of Orbitrap data which will discard low abundance isotopic peaks.
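As I understand the retention idea, it boils down to something like the sketch below. This is a simplified illustration and not the library's TopNRetentionStrategy code: the function name `retain_top_n` and the plain (m/z, intensity) tuples are hypothetical, and I am assuming the input is the set of peaks that did not end up in an accepted isotopic pattern.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def retain_top_n(unassigned: List[Peak], n: int = 50) -> List[Tuple[float, float, int]]:
    """Keep the n most intense unassigned peaks, labelling each as charge 1+.

    Simplified stand-in for a top-N retention strategy; the real library
    works on its own peak objects and may differ in the details.
    """
    # Rank by intensity, most abundant first, and keep at most n peaks.
    top = sorted(unassigned, key=lambda peak: peak[1], reverse=True)[:n]
    # Return (m/z, intensity, charge) tuples ordered by m/z.
    return sorted((mz, intensity, 1) for mz, intensity in top)


leftover = [(100.0, 5.0), (200.0, 50.0), (300.0, 20.0)]
kept = retain_top_n(leftover, n=2)
```

With `n=2`, the weakest peak (m/z 100.0) is dropped and the other two are kept as assumed 1+ peaks.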

At the moment I am using it as:

peaks = prepare_peaklist(
    [
        (mz, inten)
        for mz, inten in zip(curr_spec.mz, curr_spec.intensity)
    ]
)
deconvoluted_peaks, _ = ms_deisotope.deconvolute_peaks(
    peaks,
    averagine=ms_deisotope.peptide,
    scorer=ms_deisotope.MSDeconVFitter(10.0),
    charge_range=(1, 3),
)

Following the second suggestion, this would entail changing the deconvolute_peaks call to:

deconvoluted_peaks, _ = ms_deisotope.deconvolute_peaks(
    peaks,
    averagine=ms_deisotope.peptide,
    scorer=ms_deisotope.MSDeconVFitter(0),
    retention_strategy=ms_deisotope.deconvolution.TopNRetentionStrategy(50),
    charge_range=(1, 3),
)

Labels: question (Further information is requested)
