diff --git a/.gitignore b/.gitignore
index 1cee3417b..aedd23156 100644
--- a/.gitignore
+++ b/.gitignore
@@ -25,3 +25,5 @@ _freeze/
 *.pdf
 rsconnect
 *.md
+
+**/*.quarto_ipynb
diff --git a/_quarto-book.yml b/_quarto-book.yml
index 7f777a49d..f78818872 100644
--- a/_quarto-book.yml
+++ b/_quarto-book.yml
@@ -38,6 +38,7 @@ book:
     - appendices-are-prereqs.qmd
     - math-prereqs.qmd
     - probability.qmd
+    - classification.qmd
     - estimation.qmd
     - inference.qmd
     - intro-MLEs.qmd
diff --git a/classification.qmd b/classification.qmd
new file mode 100644
index 000000000..b85756ad5
--- /dev/null
+++ b/classification.qmd
@@ -0,0 +1,195 @@
+{{< include macros.qmd >}}
+
+# Classification {#sec-classification}
+
+---
+
+Classification problems occur frequently in epidemiology and diagnostic medicine.
+For example, we may need to determine whether an individual has a particular disease or condition based on test results or other indicators.
+
+---
+
+:::{#def-classification}
+
+#### Classification
+
+A **classification problem** is a statistical problem in which we seek to assign observations to one of two or more discrete categories (classes) based on observed features or predictors.
+In the binary case, each observation is assigned to one of two classes, often labeled "positive" and "negative" or "diseased" and "healthy".
+
+:::
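+
+As a minimal sketch of this definition, the Python function below is a hypothetical binary classification rule. Each observation has a single numeric feature (for example, a biomarker measurement), and the rule assigns the observation to the "positive" class when that feature is at or above a cutoff. The name `classify`, the cutoff of 1.0, and the feature values are illustrative assumptions, not taken from any real test.
+
+```python
+def classify(value: float, cutoff: float) -> str:
+    """Assign one observation to a class by comparing its feature to a cutoff."""
+    # Hypothetical rule: values at or above the cutoff are labeled "positive".
+    return "positive" if value >= cutoff else "negative"
+
+
+# Hypothetical feature values for five observations.
+values = [0.2, 1.7, 0.9, 3.1, 1.0]
+labels = [classify(v, cutoff=1.0) for v in values]
+print(labels)
+```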
+
+---
+
+Understanding how to interpret diagnostic tests requires knowledge of key statistical concepts including sensitivity, specificity, and predictive values.
+
+In this section, we explore how Bayes' theorem allows us to calculate the probability that a person has a disease given a positive test result.
+This is particularly important in public health decision-making, where we must understand not just how accurate a test is in general, but how to interpret test results for individuals in specific populations.
+
+---
+
+### Diagnostic test characteristics
+
+When evaluating a diagnostic test, we consider several key performance measures:
+
+:::{#def-sensitivity}
+
+#### Sensitivity
+
+The probability that the test is positive given that the person has the disease, denoted $\pmf{\text{positive} \mid \text{disease}}$.
+
+:::
+
+:::{#def-specificity}
+
+#### Specificity
+
+The probability that the test is negative given that the person does not have the disease, denoted $\pmf{\text{negative} \mid \text{no disease}}$.
+
+:::
+
+:::{#def-ppv}
+
+#### Positive Predictive Value (PPV)
+
+The probability that a person has the disease given that their test is positive, denoted $\pmf{\text{disease} \mid \text{positive}}$.
+
+:::
+
+:::{#def-npv}
+
+#### Negative Predictive Value (NPV)
+
+The probability that a person does not have the disease given that their test is negative, denoted $\pmf{\text{no disease} \mid \text{negative}}$.
+
+:::
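+
+These definitions are conditional probabilities. Given counts from a study that cross-classifies test results against true disease status, each one can be estimated by the corresponding sample proportion. The Python sketch below uses hypothetical counts for 1,000 people (not real data). The class name `ConfusionCounts` and its fields `tp`, `fp`, `fn`, `tn` (true/false positives and negatives) are illustrative choices.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass
+class ConfusionCounts:
+    """Counts from cross-classifying test results against true disease status."""
+    tp: int  # test positive, has the disease
+    fp: int  # test positive, does not have the disease
+    fn: int  # test negative, has the disease
+    tn: int  # test negative, does not have the disease
+
+    def sensitivity(self) -> float:
+        # Estimate of P(positive | disease)
+        return self.tp / (self.tp + self.fn)
+
+    def specificity(self) -> float:
+        # Estimate of P(negative | no disease)
+        return self.tn / (self.tn + self.fp)
+
+    def ppv(self) -> float:
+        # Estimate of P(disease | positive)
+        return self.tp / (self.tp + self.fp)
+
+    def npv(self) -> float:
+        # Estimate of P(no disease | negative)
+        return self.tn / (self.tn + self.fn)
+
+
+# Hypothetical counts for 1,000 people (illustrative only).
+counts = ConfusionCounts(tp=90, fp=45, fn=10, tn=855)
+print(counts.sensitivity(), counts.specificity(), counts.ppv(), counts.npv())
+```
+
+Here the sensitivity is 0.90 and the specificity is 0.95, but only 90 of the 135 positive tests (about 67%) are true positives. Because the PPV and NPV are computed from the same sample, they reflect that sample's prevalence (100 of 1,000, or 10%). They would change in a population with a different prevalence.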
+
+---
+
+### Example: COVID-19 testing
+
+Suppose we have a COVID-19 test with the following characteristics:
+
+- **99% sensitive**: If a person has COVID-19, the test will be positive 99% of the time
+- **99% specific**: If a person does not have COVID-19, the test will be negative 99% of the time
+
+---
+
+Let's define our events:
+
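+- Let $D$ denote the event "person has COVID-19", and let $\neg D$ denote its complement, "person does not have COVID-19"
+- Let $+$ denote the event "test is positive", and let $-$ denote its complement, "test is negative"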
+
+Then our test characteristics can be written as:
+
+$$
+\pmf{+ \mid D} = 0.99 \quad \text{(sensitivity)}
+$$
+
+$$
+\pmf{- \mid \neg D} = 0.99 \quad \text{(specificity)}
+$$
+
+---
+
+Note that if specificity is 0.99, then the false positive rate is:
+$$
+\pmf{+ \mid \neg D} = 1 - 0.99 = 0.01
+$$
+
+Suppose the **prevalence** of COVID-19 in the population is 7%:
+
+$$
+\pmf{D} = 0.07
+$$
+
+$$
+\pmf{\neg D} = 1 - 0.07 = 0.93
+$$
+
+---
+
+### Calculating positive predictive value
+
+The key question we want to answer is: **If someone tests positive, what is the probability they actually have COVID-19?**
+
+This is the positive predictive value:
+$$
+\pmf{D \mid +} = \, ?
+$$
+
+---
+
+We can use **Bayes' theorem** to calculate this:
+
+$$
+\pmf{D \mid +} = \frac{\pmf{+ \mid D} \cd \pmf{D}}{\pmf{+}}
+$$
+
+To find $\pmf{+}$, we use the **law of total probability**:
+
+$$
+\pmf{+} = \pmf{+ \mid D} \cd \pmf{D} + \pmf{+ \mid \neg D} \cd \pmf{\neg D}
+$$
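+
+The two formulas can be combined into a short Python sketch that computes $\pmf{+}$ and the positive predictive value from the sensitivity, specificity, and prevalence. The function names are hypothetical. The `npv` companion is an added assumption for symmetry: it applies the same reasoning to a negative test, using $\pmf{-} = 1 - \pmf{+}$.
+
+```python
+def prob_positive(sens: float, spec: float, prev: float) -> float:
+    """P(+) by the law of total probability."""
+    # P(+ | D) P(D) + P(+ | not D) P(not D), with P(+ | not D) = 1 - spec
+    return sens * prev + (1 - spec) * (1 - prev)
+
+
+def ppv(sens: float, spec: float, prev: float) -> float:
+    """P(D | +) by Bayes' theorem."""
+    return sens * prev / prob_positive(sens, spec, prev)
+
+
+def npv(sens: float, spec: float, prev: float) -> float:
+    """P(not D | -) by Bayes' theorem, using P(-) = 1 - P(+)."""
+    return spec * (1 - prev) / (1 - prob_positive(sens, spec, prev))
+
+
+# Inputs from the COVID-19 example above.
+print(prob_positive(0.99, 0.99, 0.07))  # total probability of a positive test
+print(ppv(0.99, 0.99, 0.07))            # positive predictive value
+```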
+
+---
+
+Now we can calculate each component:
+
+**Probability of testing positive and having the disease (true positive):**
+$$
+\pmf{+ \mid D} \cd \pmf{D} = 0.99 \times 0.07 = 0.0693
+$$
+
+**Probability of testing positive without having the disease (false positive):**
+$$
+\pmf{+ \mid \neg D} \cd \pmf{\neg D} = 0.01 \times 0.93 = 0.0093
+$$
+
+---
+
+**Total probability of positive test:**
+$$
+\pmf{+} = 0.0693 + 0.0093 = 0.0786
+$$
+
+**Positive predictive value:**
+$$
+\pmf{D \mid +} = \frac{0.0693}{0.0786} \approx 0.88
+$$
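+
+The value 0.88 is rounded. As a check on the arithmetic, the Python sketch below repeats each step with exact rational numbers from the standard-library `fractions` module, which avoids floating-point rounding. The variable names are illustrative.
+
+```python
+from fractions import Fraction
+
+# Inputs from the COVID-19 example, as exact rationals.
+sens = Fraction("0.99")  # P(+ | D)
+spec = Fraction("0.99")  # P(- | not D)
+prev = Fraction("0.07")  # P(D)
+
+true_pos = sens * prev               # P(+ | D) * P(D)
+false_pos = (1 - spec) * (1 - prev)  # P(+ | not D) * P(not D)
+p_pos = true_pos + false_pos         # P(+), by the law of total probability
+ppv = true_pos / p_pos               # P(D | +), by Bayes' theorem
+
+print(true_pos, false_pos, p_pos)  # 693/10000 93/10000 393/5000
+print(ppv, round(float(ppv), 4))   # 231/262 0.8817
+```
+
+The unrounded positive predictive value is $231/262 \approx 0.8817$.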
+
+---
+
+Therefore, even with a highly accurate test (99% sensitive and 99% specific), only about 88% of people who test positive actually have COVID-19.
+This is because the prevalence is relatively low (7%). The 93% of people without COVID-19 produce false positives at a rate of 1%, and these false positives account for about $0.0093 / 0.0786 \approx 12\%$ of all positive tests.
+
+::: notes
+This counterintuitive result demonstrates the importance of considering disease prevalence when interpreting test results.
+Even highly accurate tests can have relatively low positive predictive values when the disease is rare.
+:::
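+
+A simulation offers a frequency-based view of the same result. The Python sketch below assumes a hypothetical population of 200,000 people. Each person independently has COVID-19 with probability 0.07. A diseased person tests positive with probability equal to the sensitivity, and a disease-free person with probability equal to the false positive rate. The function name `simulate_ppv` and the seed are arbitrary choices, and the estimates vary slightly with the seed.
+
+```python
+import random
+
+
+def simulate_ppv(sens, spec, prev, n, seed=0):
+    """Estimate P(+) and P(D | +) by simulating n independent people."""
+    rng = random.Random(seed)  # seeded so the run is reproducible
+    n_pos = 0       # people who test positive
+    n_true_pos = 0  # people who test positive and have the disease
+    for _ in range(n):
+        diseased = rng.random() < prev
+        # P(+ | D) = sens for diseased people; P(+ | not D) = 1 - spec otherwise.
+        p_positive = sens if diseased else 1 - spec
+        if rng.random() < p_positive:
+            n_pos += 1
+            if diseased:
+                n_true_pos += 1
+    return n_pos / n, n_true_pos / n_pos
+
+
+p_pos_hat, ppv_hat = simulate_ppv(0.99, 0.99, 0.07, n=200_000)
+print(f"estimated P(+) = {p_pos_hat:.4f}, estimated PPV = {ppv_hat:.3f}")
+```
+
+The simulated fraction of positives who are diseased should land close to the exact value of about 0.88. The simulation also makes the source of the gap concrete. Most simulated positives are true positives, but the much larger disease-free group still contributes a noticeable number of false positives.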
+
+---
+
+### Alternative formulation
+
+We can rearrange Bayes' theorem to express the positive predictive value in terms of the sensitivity, specificity, and disease prevalence:
+
+$$
+\begin{aligned}
+\pmf{D \mid +} &= \frac{\pmf{+ \mid D} \cd \pmf{D}}{\pmf{+}} \\
+&= \frac{\pmf{+ \mid D} \cd \pmf{D}}{\pmf{+ \mid D} \cd \pmf{D} + \pmf{+ \mid \neg D} \cd \pmf{\neg D}} \\
+&= \frac{\pmf{D}}{\pmf{D} + \frac{\pmf{+ \mid \neg D}}{\pmf{+ \mid D}} \cd \pmf{\neg D}} \\
+&= \frac{1}{1 + \frac{\pmf{+ \mid \neg D}}{\pmf{+ \mid D}} \cd \frac{\pmf{\neg D}}{\pmf{D}}} \\
+&= \frac{1}{1 + \frac{1 - \text{spec}}{\text{sens}} \cd \frac{1 - \text{prev}}{\text{prev}}}
+\end{aligned}
+$$
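+
+Each step of this derivation divides the numerator and denominator by the same nonzero quantity. The first and last lines should therefore agree whenever the sensitivity and prevalence are positive. The Python sketch below checks this numerically on a small, arbitrarily chosen grid of parameter values. The function names are hypothetical.
+
+```python
+def ppv_bayes(sens, spec, prev):
+    """P(D | +) from Bayes' theorem with the law of total probability."""
+    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))
+
+
+def ppv_closed_form(sens, spec, prev):
+    """P(D | +) from the last line of the derivation (needs sens > 0, prev > 0)."""
+    return 1 / (1 + (1 - spec) / sens * (1 - prev) / prev)
+
+
+# Compare the two forms on a small grid of (sens, spec, prev) values.
+for sens in (0.8, 0.95, 0.99):
+    for spec in (0.9, 0.99):
+        for prev in (0.01, 0.07, 0.3):
+            a = ppv_bayes(sens, spec, prev)
+            b = ppv_closed_form(sens, spec, prev)
+            assert abs(a - b) < 1e-12, (sens, spec, prev)
+print("both forms agree on the grid")
+```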
+
+---
+
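+This final form shows that the positive predictive value is determined by the product of two ratios. The first is the false positive rate divided by the sensitivity, $(1 - \text{spec}) / \text{sens}$. The second is the ratio of non-diseased to diseased individuals in the population, $(1 - \text{prev}) / \text{prev}$.
+The larger this product, the lower the positive predictive value. Because the second ratio grows without bound as prevalence approaches zero, even a test with very high sensitivity and specificity can have a low positive predictive value when the disease is rare.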
+
+::: notes
+This algebraic form is useful for understanding how the different parameters interact.
+Notice how the ratio $\pmf{\neg D}/\pmf{D}$ (the odds against having the disease) appears explicitly in the denominator.
+When the disease is rare, this ratio is large, which reduces the positive predictive value.
+:::
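+
+To see this dependence numerically, the Python sketch below evaluates the final form of the positive predictive value at several prevalences. It uses the 99% sensitive, 99% specific test from the COVID-19 example. The list of prevalences is an illustrative assumption, not data.
+
+```python
+def ppv(sens, spec, prev):
+    """P(D | +) via 1 / (1 + (1 - spec)/sens * (1 - prev)/prev)."""
+    return 1 / (1 + (1 - spec) / sens * (1 - prev) / prev)
+
+
+# Hold the test fixed at 99% sensitivity and 99% specificity; vary prevalence.
+prevalences = [0.001, 0.01, 0.07, 0.2, 0.5]
+for prev in prevalences:
+    print(f"prevalence {prev:>6.3f}: PPV = {ppv(0.99, 0.99, prev):.3f}")
+```
+
+For this test, the positive predictive value varies with prevalence as follows:
+
+- 0.99 at 50% prevalence
+- about 0.96 at 20%
+- about 0.88 at 7%
+- exactly 0.5 at 1%, where the two ratios cancel
+- only about 0.09 at 0.1%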
diff --git a/macros.qmd b/macros.qmd
index c7588919f..8ef506c92 100644
--- a/macros.qmd
+++ b/macros.qmd
@@ -363,6 +363,7 @@
 \def\reglincomb{\vx \cdot \vb}
 \def\regbetasum{\beta_1 x_1+ \dots + \beta_p x_p}
 \def\pdf{\distop{f}}
+\providecommand{\pmf}[1]{\distop{P}\paren{#1}}
 \def\cdf{\distop{F}}
 \def\defLik{\Lik(\theta) \eqdef \p(\vX = \vx | \Theta = \theta)}
 \def\defLogLik{\lik \eqdef \logf{\Lik(\vx|\th)}}
diff --git a/probability.qmd b/probability.qmd
index 31062a145..fbde5c0ec 100644
--- a/probability.qmd
+++ b/probability.qmd
@@ -524,6 +524,7 @@ $\dsn{X}$.
 
 {{< include sec-CLT.qmd >}}
 
+{{< include classification.qmd >}}
 
 ## Additional resources