The Mass Spec Coding Club (MSCC) is a community dedicated to teaching computer coding for mass spectrometry applications. Our goal is to make coding accessible to mass spectrometry researchers by providing free resources and open-source examples.
As the community develops, we will continue to post more content, and we welcome contributions from anyone.
Want to chat with community members and join meetings? Join us on the Mass Spec Coding Club Discord Server. It's easy to set up, and you can run it from a browser if you'd like. We will pick a time soon and start hosting meetings/office hours there.
In the meantime, feel free to post questions to the text channels there, and people can answer.
This series of lessons will cover how to set up Python from scratch and write a simple script to plot a mass spectrum. Skills and learning outcomes are outlined under each lesson.
- Lesson 0.0: Setting up Python from Scratch
  - How to set up and run Python
  - Setting variables
  - Printing to the terminal
- Lesson 0.1: Loading Data Into Python
  - Importing libraries
  - Reading from text files into NumPy arrays
  - Intro to array slicing
- Lesson 0.2: Plotting a Spectrum
  - Plotting a spectrum with Matplotlib
  - Normalizing the y-axis
- Lesson 0.3: Too Fast, Go Back - Review and Background from Module 0
  - Fundamentals of how computers work
  - Basics of code concepts
  - Discussion of variables, functions, and classes
  - How to define functions
The data files, Python code, and notes used in this module are available in the "Module 0" folder.
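
As a rough preview of what Module 0 builds toward, here is a minimal sketch of loading a two-column spectrum into a NumPy array, slicing out the columns, and normalizing the y-axis. It is not the code from the lessons: the data values, the variable names, and the `plot_spectrum` helper are all made up for illustration, and the text is read from an in-memory string so the example runs without a data file.

```python
import io

import numpy as np

# Hypothetical two-column spectrum (m/z, intensity) standing in for a text file.
example_text = """\
500.0 10.0
500.5 250.0
501.0 1000.0
501.5 400.0
502.0 20.0
"""

# np.loadtxt accepts a file path or a file-like object.
data = np.loadtxt(io.StringIO(example_text))

# Array slicing: every row of column 0 is m/z, every row of column 1 is intensity.
mz = data[:, 0]
intensity = data[:, 1]

# Normalize the y-axis so the tallest peak has a value of 1.
normalized = intensity / np.max(intensity)


def plot_spectrum(x, y):
    """Plot a spectrum with Matplotlib (imported here so the rest runs without it)."""
    import matplotlib.pyplot as plt

    plt.plot(x, y)
    plt.xlabel("m/z")
    plt.ylabel("Relative intensity")
    plt.show()
```

Calling `plot_spectrum(mz, normalized)` should open a plot window if Matplotlib is installed. To read a real file, pass its path to `np.loadtxt` instead of the `io.StringIO` object.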
The goal of Module 1 is to show how Python can be used to predict masses of various molecules, starting with proteins.
- Lesson 1.0: Calculating Protein Masses
  - Using a Dictionary
  - Creating a function
  - Looping through protein sequence to calculate the protein mass
- Lesson 1.1: Improving Our Protein Mass Calculator
  - String manipulation
  - Passing variables through functions
  - If/then statements
  - Monoisotopic mass calculation for protein
- Lesson 1.2: Calculating Masses from Glycans, SMILES, and Formulas
  - Using Glypy to calculate masses from GlycoCT strings
  - Using molmass to calculate masses from formulas
  - Using RDKit to calculate masses from SMILES strings
- Lesson 1.3: Too Fast, Go Back - For Loops, If/Then, and Function Options
  - Writing For loops
  - If/Then statements and Boolean tests
  - Passing options to functions
- Homework 1
  - For those who want to test their skills and calculate some RNA masses, check out homework1.py in Module 1.
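
To give a flavor of the dictionary, function, loop, and if/then ideas listed above, here is a minimal sketch of a monoisotopic protein mass calculator. It is an illustration under stated assumptions rather than the code from the lessons: the names `RESIDUE_MASSES`, `WATER_MASS`, and `calc_protein_mass` are hypothetical, and the masses are commonly tabulated monoisotopic residue values rounded to five decimal places.

```python
# Monoisotopic residue masses in Da (amino acid minus one water),
# rounded to five decimals. The dictionary name is hypothetical.
RESIDUE_MASSES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

# Monoisotopic mass of water, added once for the free N- and C-termini.
WATER_MASS = 18.01056


def calc_protein_mass(sequence):
    """Return the monoisotopic mass of an unmodified protein sequence."""
    # String manipulation: drop surrounding whitespace and force upper case.
    sequence = sequence.strip().upper()

    mass = WATER_MASS
    # Loop through the sequence, looking up each residue in the dictionary.
    for aa in sequence:
        if aa in RESIDUE_MASSES:
            mass += RESIDUE_MASSES[aa]
        else:
            raise ValueError(f"Unknown residue: {aa}")
    return mass


print(calc_protein_mass("GA"))
```

Swapping in a dictionary of average residue masses would give average masses instead. The sketch ignores modifications and disulfide bonds.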
The goal of Module 2 is to parse larger FASTA and Excel files, pull out the desired information, and write it to an Excel output file.
- Lesson 2.0: Importing FASTA Files
  - Importing Pyteomics library
  - Importing our own functions from Module 1
  - Looping through a FASTA file to calculate masses
- Lesson 2.1: Parsing FASTA Descriptions and Writing to Excel
  - Using string splitting to parse protein names and entries
  - Creating lists and adding to a Pandas DataFrame
  - Exporting DataFrames
- Lesson 2.2: Reading, Parsing, and Exporting DataFrames - DDA to MRM
  - Reading Excel files to Pandas DataFrames
  - Parsing DataFrame rows
  - Extracting the position of the most abundant product ion
  - Exporting Transition List to Excel
- Lesson 2.3: Too Fast, Go Back - NumPy Array Slicing and Boolean Indexing
  - Slicing NumPy arrays
  - Using Boolean indexing for more sophisticated array slicing
- Homework 2
  - For those who want more practice, follow up on my suggestion from Lesson 2.2 to turn an MRM list into a PRM list. There are several ways you could do this. You could pick the top N peaks in the product ion spectrum (np.sort might be useful here). You could also pick any peak above a specific relative intensity threshold (Boolean indexing might be useful here). Just add each selected peak as a new row in the datalist, keeping the other fields the same.
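
The two peak-picking strategies mentioned in Homework 2 can be sketched roughly as follows. This is one possible approach, not a reference solution: the spectrum values, the precursor m/z, the function names, and the two-column row layout are all hypothetical, and the sketch uses `np.argsort` (a close relative of `np.sort` that returns indices) so the m/z values can be selected along with the intensities.

```python
import numpy as np

# Hypothetical product ion spectrum: m/z values and matching intensities.
mz = np.array([175.12, 262.15, 375.23, 488.32, 601.40])
intensity = np.array([1200.0, 300.0, 4500.0, 900.0, 2250.0])


def top_n_peaks(mz, intensity, n):
    """Return the n most intense peaks, most intense first."""
    # argsort gives indices from lowest to highest intensity;
    # reverse the order and keep the first n.
    order = np.argsort(intensity)[::-1][:n]
    return mz[order], intensity[order]


def peaks_above_threshold(mz, intensity, rel_threshold):
    """Return peaks whose relative intensity is at least rel_threshold."""
    relative = intensity / intensity.max()
    # Boolean indexing: the mask is True where the peak passes the threshold.
    mask = relative >= rel_threshold
    return mz[mask], intensity[mask]


top_mz, top_int = top_n_peaks(mz, intensity, 2)
kept_mz, kept_int = peaks_above_threshold(mz, intensity, 0.25)

# Build one row per selected product ion, repeating the other fields.
# The precursor value and the [precursor, product] layout are made up.
precursor_mz = 650.35
datalist = [[precursor_mz, float(product)] for product in top_mz]
```

The top-N approach always returns a fixed number of transitions, while the threshold approach adapts to how many strong peaks each spectrum contains.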
Check back for more videos, and reach out at mtmarty@utexas.edu if you like these.
Here are some ideas that users have suggested. If you have other suggestions, please enter them in the "What Projects Would You Like to See?" discussion. If you would like to volunteer to make a module on one of these topics, please add your name here.
- Plotting multiple spectra with for loops and string parsing (Michael Marty)
- Reading vendor files
- Writing to different output files
- Exploring other Python MS packages
- How to use public databases (Ming?)
- Applications to polymers and oligonucleotides
- Ion mobility
- Using Git and GitHub
- Gasp, R!
  - There are a lot of great R resources for MS already, so maybe we could organize and link those here too.
Funding is provided by the National Science Foundation: CHE-1845230.