michaelmarty/MassSpecCodingClub

Mass Spec Coding Club

The Mass Spec Coding Club (MSCC) is a community dedicated to teaching computer coding for mass spectrometry applications. Our goal is to make coding accessible to mass spectrometry researchers and to provide free resources and open-source examples.

As the community develops, we will continue to post more content, and we welcome contributions from anyone.

Discord Server

Want to chat with community members and join meetings? Join us on the Mass Spec Coding Club Discord Server. It's easy to set up, and you can run it from a browser if you'd like. We will pick a time soon and start hosting meetings/office hours there.

In the meantime, feel free to post questions to the text channels there, and people can answer.

Learning Modules

Module 0: Setting Up Python and Plotting A Spectrum

This series of lessons will cover how to set up Python from scratch and write a simple script to plot a mass spectrum. Skills and learning outcomes are outlined below each video.

The data files, Python code, and notes used in this module are available in the "Module 0" folder.
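For a flavor of where Module 0 ends up, here is a minimal sketch of a spectrum-plotting script. It is not the code from the lessons: the data are made-up numbers, the two-column text layout (m/z, intensity) is an assumption, and the output file name `spectrum.png` is just a placeholder. It assumes NumPy and Matplotlib are installed.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Made-up two-column text data (m/z, intensity) standing in for a real spectrum file
example_text = """\
1000.0 0.0
1000.5 12.0
1001.0 85.0
1001.5 30.0
1002.0 2.0
"""

# np.loadtxt accepts a file path or a file-like object;
# with a real file you would pass its path instead of the StringIO object
data = np.loadtxt(io.StringIO(example_text))
mz = data[:, 0]         # first column: m/z
intensity = data[:, 1]  # second column: intensity

# Plot intensity against m/z and label the axes
fig, ax = plt.subplots()
ax.plot(mz, intensity)
ax.set_xlabel("m/z")
ax.set_ylabel("Intensity")
fig.savefig("spectrum.png")
```

To view the plot interactively instead of saving it, drop the `matplotlib.use("Agg")` line and call `plt.show()` at the end.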

Module 1: Calculating Masses

The goal of module 1 is to show how Python can be used to predict masses of various molecules, starting with proteins.
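As a rough illustration of the idea, the sketch below sums average amino acid residue masses and adds one water to estimate the mass of an unmodified protein, then converts that mass to m/z for a given charge state. The function names `protein_mass` and `mz_from_mass` are hypothetical and are not necessarily the ones used in the lessons. The residue masses are commonly tabulated average values, so results may differ slightly from other tools, and the sketch ignores modifications and disulfide bonds.

```python
# Average residue masses in Da (amino acid minus one water), commonly tabulated values
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528   # average mass of H2O in Da
PROTON_MASS = 1.007276  # mass of a proton in Da


def protein_mass(sequence):
    """Estimate the average mass of an unmodified protein from its sequence."""
    # Sum the residue masses, then add one water for the free N- and C-termini.
    # An unknown letter raises a KeyError rather than being silently skipped.
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER_MASS


def mz_from_mass(mass, charge):
    """Convert a neutral mass to m/z for a positive ion with `charge` added protons."""
    return (mass + charge * PROTON_MASS) / charge


# Example: a short, made-up peptide sequence at charge state 2
example_mass = protein_mass("PEPTIDE")
example_mz = mz_from_mass(example_mass, 2)
print(f"mass = {example_mass:.3f} Da, m/z at 2+ = {example_mz:.3f}")
```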

Module 2: Parsing FASTA and Excel Files

The goal of module 2 is to parse larger FASTA and Excel files to pull in the desired information and write it to an Excel file output.

  • Lesson 2.0: Importing FASTA Files
    • Importing Pyteomics library
    • Importing our own functions from Module 1
    • Looping through a FASTA file to calculate masses
  • Lesson 2.1: Parsing FASTA Descriptions and Writing to Excel
    • Using string splitting to parse protein names and entries
    • Creating lists and adding to a Pandas DataFrame
    • Exporting DataFrames
  • Lesson 2.2: Reading, Parsing, and Exporting DataFrames - DDA to MRM
    • Reading Excel files to Pandas DataFrames
    • Parsing DataFrame rows
    • Extracting the position of the most abundant product ion
    • Exporting Transition List to Excel
  • Lesson 2.3: Too Fast, Go Back - NumPy Array Slicing and Boolean Indexing
    • Slicing NumPy arrays
    • Using Boolean indexing for more sophisticated array slicing
  • Homework 2
    • For those who want more practice, follow up on my suggestion from Lesson 2.2 to turn an MRM list into a PRM list. There are several ways you could do this. You could pick the top N peaks in the product ion spectrum (np.sort might be useful here). You could also pick any peak above a specific relative intensity threshold (Boolean indexing might be useful here). Just add each selected peak as a new row in the datalist, keeping the other values in the row the same.
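If you get stuck on the homework, the two peak-picking strategies can be sketched roughly as follows. This is one possible approach, not the official solution: the spectrum values, the compound name, and the row layout are all made up, and the function names are hypothetical. It uses `np.argsort` rather than `np.sort` so that each intensity stays paired with its m/z value.

```python
import numpy as np

# Made-up product ion spectrum: m/z values and matching intensities
mz = np.array([175.1, 262.2, 375.3, 488.3, 601.4, 714.5])
intensity = np.array([1200.0, 300.0, 8500.0, 4100.0, 150.0, 2600.0])


def top_n_peaks(mz, intensity, n):
    """Return the n most intense peaks, most intense first."""
    # argsort gives indices in ascending order; reverse it and keep the first n
    order = np.argsort(intensity)[::-1][:n]
    return mz[order], intensity[order]


def peaks_above_threshold(mz, intensity, rel_threshold):
    """Return peaks whose intensity relative to the base peak is >= rel_threshold."""
    relative = intensity / intensity.max()
    mask = relative >= rel_threshold  # Boolean array, True where the peak passes
    return mz[mask], intensity[mask]


top_mz, top_int = top_n_peaks(mz, intensity, 3)
thresh_mz, thresh_int = peaks_above_threshold(mz, intensity, 0.1)

# Build one row per selected product ion, keeping the other values the same.
# The name and precursor m/z are hypothetical placeholders.
name = "ExamplePeptide"
precursor_mz = 523.8
datalist = []
for product_mz in top_mz:
    datalist.append([name, precursor_mz, float(product_mz)])

print(datalist)
```

The top-N approach always returns a fixed number of transitions, while the threshold approach returns more rows for rich spectra and fewer for sparse ones, so which is better depends on the method you are building.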

Check back for more videos, and reach out at mtmarty@utexas.edu if you like these.

Ideas for Future Tutorials

Here are some ideas that users have suggested. If you have other suggestions, please enter them in the "What Projects Would You Like to See?" discussion. If you would like to volunteer to make a module on one of these topics, please add your name here.

  • Plotting multiple spectra with for loops and string parsing (Michael Marty)
  • Reading vendor files
  • Writing to different output files
  • Exploring other Python MS packages
  • How to use public databases (Ming?)
  • Applications to polymers and oligonucleotides
  • Ion mobility
  • Using Git and GitHub
  • Gasp, R!
    • There are a lot of great R resources for MS already, so maybe we could organize and link those here too.

Funding

Funding is provided by the National Science Foundation: CHE-1845230.
