This repository contains materials for the Introduction to Paleo- and Environmental Proteomics course (March 17-21, 2025) held at the University of Tartu. The course introduces the foundations of paleoproteomics, focusing on the software and computational methods used to analyze ancient protein data.
By the end of this course, students will:
- Understand the theoretical basis of paleoproteomics and its applications
- Gain knowledge of computational tools used in protein sequence identification and analysis
- Learn Python as a tool for proteomic data analysis
- Explore challenges in analyzing ancient protein data, including degradation patterns and contamination risks
- notebooks/: Jupyter notebooks for data analysis exercises
- lectures/: Course lecture materials
- data/: Sample datasets for practical exercises
- exercises/: Hands-on exercises and instructions
- resources/: Additional reading materials and references
The course consists of daily lectures and practical sessions:
- Introduction to paleoproteomics, ZooMS, and LC-MS/MS
- Overview of software tools for data analysis
- MassDebating: Review of literature and selection of presentation groups
- Detailed instruction on mass spectrometry data processing
- Challenges in paleoproteomic data interpretation
- MassDebating: Discussion of format for the session
- Guided setup and analysis of pre-existing ZooMS data
- Case studies and troubleshooting
- Introduction to PRIDE and UniProt
- Guided setup and analysis of pre-existing LC-MS/MS data
- Advanced analysis concepts, including spectral annotation
- Reporting and presenting findings from data analysis
- Best practices in data deposition and accessibility
- MassDebate: Current state and future prospects of paleoproteomics
The course focuses on several key tools:
- NovorCloud: For de novo peptide sequencing and protein identification
- Python Libraries: For data processing and visualization
- PRIDE Database: For accessing and depositing proteomics data
- UniProt: For protein sequence and functional information
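As a rough illustration of the kind of Python data processing these tools support, the sketch below computes the theoretical m/z of a peptide from standard monoisotopic residue masses, with optional hydroxylation and deamidation mass shifts. The function name `peptide_mz`, its parameters, and the example sequence are assumptions made for this illustration; they are not taken from the course notebooks.

```python
# Standard monoisotopic residue masses in daltons.
MONOISOTOPIC = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056          # added once per peptide for the free termini
PROTON = 1.00728          # mass of each charge-carrying proton
HYDROXYLATION = 15.99491  # e.g. proline to hydroxyproline
DEAMIDATION = 0.98402     # N to D or Q to E


def peptide_mz(sequence, charge=1, hydroxylations=0, deamidations=0):
    """Return the theoretical m/z of a peptide (hypothetical helper).

    The number of modifications is supplied by the caller; this sketch
    does not check that the sequence has enough modifiable residues.
    """
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    try:
        residues = sum(MONOISOTOPIC[aa] for aa in sequence.upper())
    except KeyError as exc:
        raise ValueError(f"unknown residue: {exc.args[0]}") from None
    neutral = (residues + WATER
               + hydroxylations * HYDROXYLATION
               + deamidations * DEAMIDATION)
    return (neutral + charge * PROTON) / charge


# Arbitrary collagen-like sequence, used only to exercise the function.
example = "GPPGPAGPR"
print(f"{example} [M+H]+  : {peptide_mz(example):.4f}")
print(f"{example} [M+2H]2+: {peptide_mz(example, charge=2):.4f}")
print(f"with one hydroxylation: {peptide_mz(example, hydroxylations=1):.4f}")
```

Keeping the modification counts as explicit arguments makes it easy to see how a single deamidation or hydroxylation shifts the expected peak, which is the kind of comparison made when inspecting spectra by hand.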
A key exercise in the course involves analyzing data from the following study:
Tuinstra, L. et al. Evidence for endogenous collagen in Edmontosaurus fossil bone. Anal. Chem. (2025) doi:10.1021/acs.analchem.4c03115.
This practical explores the controversial field of dinosaur protein analysis, addressing whether original proteins can be preserved over extremely long time periods.
The course also covers a comprehensive data standard for archaeological and biomolecular research, including:
- Taxonomic identification
- Site and stratigraphic context
- Tissue and sample type
- Geospatial coordinates
- Dating information
- Sample preparation details
- Mass spectrometry parameters
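One way to picture such a standard is as a structured record with one field per category listed above. The sketch below is a minimal illustration only: the class name `SampleRecord`, its field names, the validation rules, and all example values are hypothetical and do not reproduce the schema taught in the course.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SampleRecord:
    """Hypothetical metadata record; field names are illustrative."""
    taxon: str                  # taxonomic identification
    site: str                   # site name
    stratigraphic_context: str  # layer or context description
    tissue: str                 # e.g. bone, dentine
    sample_type: str            # e.g. bulk powder, chip
    latitude: float             # decimal degrees
    longitude: float            # decimal degrees
    dating: str                 # free-text dating information
    preparation: str            # sample preparation details
    ms_parameters: dict = field(default_factory=dict)

    def validate(self):
        """Return a list of human-readable problems (empty if none)."""
        problems = []
        for name in ("taxon", "site", "tissue", "sample_type"):
            if not getattr(self, name).strip():
                problems.append(f"{name} must not be empty")
        if not -90 <= self.latitude <= 90:
            problems.append("latitude must be between -90 and 90")
        if not -180 <= self.longitude <= 180:
            problems.append("longitude must be between -180 and 180")
        return problems


# Placeholder values, invented purely for demonstration.
record = SampleRecord(
    taxon="Bos taurus",
    site="Example Site",
    stratigraphic_context="Layer 3",
    tissue="bone",
    sample_type="powder",
    latitude=58.38,
    longitude=26.72,
    dating="placeholder date range",
    preparation="placeholder extraction protocol",
    ms_parameters={"instrument": "placeholder", "mode": "positive"},
)

print(record.validate())
print(json.dumps(asdict(record), indent=2))
```

Serializing the record with `asdict` and `json.dumps` keeps the metadata in a plain, tool-independent form that can travel alongside deposited data.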
- Clone this repository
- Install required dependencies: pip install -r requirements.txt
- Follow the setup instructions in the setup/ directory
- Explore the Jupyter notebooks in the notebooks/ directory
This course is taught by:
- Matthew Collins
- Mari Tõrv
- Ester Oras
This project is licensed under the MIT License - see the LICENSE file for details.
If you use these materials in your research, please cite:
Collins, M., Tõrv, M., & Oras, E. (2025). Introduction to Paleo- and Environmental Proteomics.
University of Tartu, Estonia. https://github.com/YourUsername/PalaeoProteomics
For questions regarding this repository, please contact [your email or contact information].