Move beyond the basics and start analyzing real-world research data.
This 2-hour, hands-on workshop is designed for researchers, students, and faculty who have a basic grasp of Python and are ready to tackle the "messy" side of data science. Transitioning from basic scripts & notebooks to a professional local development workflow, we will use Positron—the new data science IDE from the creators of RStudio—to manage a complete analysis pipeline.
Using the OASIS-1 neuroscience dataset, we will work through the practical steps of transforming raw MRI demographics into research insights. By the end of the session, we will answer a specific clinical question: Does normalized brain volume differ by dementia rating?
By the end of this session, learners will be able to:
- Establish a reproducible research workflow by setting up a local environment with Positron and virtual environments.
- Load tabular data into a pandas DataFrame and inspect its structure using `info()`, `head()`, `describe()`, and related methods
- Select specific rows, columns, and subsets of data using bracket notation and `.loc[]`
- Clean a real-world dataset by handling missing values (`dropna()`, `fillna()`), renaming columns, dropping unnecessary columns, and converting data types
- Summarize data by grouping and aggregating with `groupby()` to answer a specific research question
- Visualize data distributions and relationships using seaborn (if time permits)
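As a rough preview of the inspect, select, and clean steps listed above, here is a minimal sketch on a tiny in-memory table. The rows are invented for illustration and are not real OASIS-1 values. The column names (`ID`, `M/F`, `Hand`, `Age`, `SES`, `CDR`, `nWBV`) are assumptions about the raw CSV and may differ from what the workshop notebook uses.

```python
import io

import pandas as pd

# Invented rows for illustration only; NOT real OASIS-1 data.
# Column names are assumptions about the raw CSV.
csv_text = """ID,M/F,Hand,Age,SES,CDR,nWBV
S01,F,R,74,3,0,0.743
S02,M,R,55,,,0.810
S03,F,R,73,2,0.5,0.708
S04,M,R,28,,,0.861
S05,F,R,81,,1,0.689
"""
df = pd.read_csv(io.StringIO(csv_text))

# Inspect: column types, first rows, summary statistics
df.info()
print(df.head())
print(df.describe())

# Select: one column with brackets, then rows and columns together with .loc[]
ages = df["Age"]
older = df.loc[df["Age"] >= 60, ["ID", "Age", "CDR"]]

# Clean: rename, drop an uninformative column, drop rows with no rating
clean = (
    df.rename(columns={"M/F": "sex", "Age": "age"})
      .drop(columns=["Hand"])
      .dropna(subset=["CDR"])
)
# Fill remaining gaps in SES with the median, then convert a dtype
clean = clean.assign(SES=clean["SES"].fillna(clean["SES"].median()))
clean["sex"] = clean["sex"].astype("category")
print(clean)
```

Filling `SES` with the median is only one possible choice, shown here to illustrate `fillna()`. Whether to drop or impute missing values depends on the analysis.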
| Time | Topic |
|---|---|
| 20 min | Conceptual overview (slides) |
| 15 min | Live environment setup |
| 70 min | Hands-on notebook: pandas, cleaning, groupby, seaborn |
| 12 min | Script demo: analysis.py |
| 3 min | Wrap-up & resources |
Research question we answer: Does normalized brain volume differ by dementia rating in the OASIS-1 dataset?
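The grouping step behind this question can be sketched as follows. The numbers are invented for illustration and say nothing about the real dataset. `CDR` (dementia rating) and `nWBV` (normalized whole brain volume) are assumed column names.

```python
import pandas as pd

# Invented values for illustration only; NOT real OASIS-1 data.
# Column names CDR and nWBV are assumptions.
df = pd.DataFrame({
    "CDR":  [0.0, 0.0, 0.5, 0.5, 1.0, 1.0],
    "nWBV": [0.78, 0.76, 0.73, 0.71, 0.69, 0.67],
})

# One row per rating, with group size, mean, and spread of brain volume
summary = df.groupby("CDR")["nWBV"].agg(["count", "mean", "std"])
print(summary)
```

A table like this describes how the groups differ. Deciding whether the difference is statistically meaningful would take a further step, such as a formal test.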
- Completion of Python 1 (or equivalent familiarity with Python basics, loops, functions, NumPy, and matplotlib)
- A local Python environment — see setup instructions below
Download and install Positron — a data science IDE built on VS Code.
git clone https://github.com/CWML/python2.git
cd python2

# macOS/Linux
python3 -m venv .venv
source .venv/bin/activate

# Windows
python -m venv .venv
.venv\Scripts\activate

pip install -r requirements.txt

Open python2_workshop.ipynb in Positron and select your .venv kernel.
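To confirm that a terminal or notebook kernel is really running from `.venv`, a quick check like the one below can help. It is a minimal sketch using only the standard library. The helper name `in_virtualenv` is made up for this example and is not part of the workshop materials.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```

If the second line prints `False`, or the interpreter path does not sit inside the project's `.venv` folder, the kernel selection in Positron is worth rechecking.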
python2/
├── python2_workshop.ipynb              # Main workshop notebook
├── analysis.py                         # Standalone script for end-of-class demo
├── requirements.txt                    # Python dependencies
├── data/
│   ├── oasis_cross-sectional.csv       # Raw OASIS-1 dataset
│   └── oasis_cleaned.csv               # Cleaned dataset (produced during workshop)
├── outputs/                            # Script output directory (PNG plots)
└── docs/
    ├── python2_workshop_slides.html    # Slide deck (open in any browser)
    ├── python2_workshop_slides.pptx    # PowerPoint version
    └── python2_workshop_slides.md      # MARP source (editable)
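OASIS-1: Open Access Series of Imaging Studies

Cross-sectional MRI data and demographics: 436 imaging sessions from 416 subjects (ages 18–96), including nondemented and demented older adults. Twenty nondemented subjects have a repeat reliability scan, which accounts for the extra sessions.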
Marcus, D.S., Wang, T.H., Parker, J., Csernansky, J.G., Morris, J.C., Buckner, R.L. (2007). Open Access Series of Imaging Studies (OASIS): Cross-Sectional MRI Data in Young, Middle Aged, Nondemented and Demented Older Adults. Journal of Cognitive Neuroscience, 19(9), 1498–1507.
The slide deck is available as:
- HTML: open `docs/python2_workshop_slides.html` in any browser (recommended for presenting)
- PPTX: open `docs/python2_workshop_slides.pptx` in PowerPoint or Google Slides
To regenerate slides from the Markdown source after edits:
npm install # first time only
npx marp docs/python2_workshop_slides.md --html --pptx

Cushing/Whitney Medical Library Data Services
library.medicine.yale.edu/research-data