From b5efbe4bd6ddf151d278e118b44b586b8ed851b1 Mon Sep 17 00:00:00 2001 From: Douglas Ezra Morrison Date: Thu, 13 Nov 2025 22:11:09 -0800 Subject: [PATCH 01/12] add classification notes --- .gitignore | 2 ++ classification.qmd | 51 ++++++++++++++++++++++++++++++++++++++++++++++ macros.qmd | 1 + probability.qmd | 1 + 4 files changed, 55 insertions(+) create mode 100644 classification.qmd diff --git a/.gitignore b/.gitignore index 1cee3417b..aedd23156 100644 --- a/.gitignore +++ b/.gitignore @@ -25,3 +25,5 @@ _freeze/ *.pdf rsconnect *.md + +**/*.quarto_ipynb diff --git a/classification.qmd b/classification.qmd new file mode 100644 index 000000000..cfb5437b1 --- /dev/null +++ b/classification.qmd @@ -0,0 +1,51 @@ +{{< include macros.qmd >}} + +## Introduction to classification {#sec-classification} + +### Positive predictive value + +Suppose a test is 99 sensitive, 99 specific; + +99% Sensitive means if the person has disease, the test is positive, 99% of +time: + +$$\pmf{ + | D} = .99$$ + +99% specific means if they don't have covid, the test says no covid, 99% +time + +7% of people actually have covid: + +$$\mass(A) = 0.07$$ + +$$\mass(\neg A) = .93$$ + + + +$p\left( negative \middle| no\ covid \right) = .99$: +$p\left( B \middle| !A \right)$ + +$$p\left( Covid \middle| positive \right) = ?$$ + +$$p\left( A \middle| B \right) = \frac{p\left( B \middle| A \right)p(A)}{p(B)}$$ + +$$p(B) = p\left( B \middle| A \right)p(A) + p\left( B \middle| !A \right)p(!A)$$ + +$$p\left( B \middle| A \right)p(A) = .99*\ .07 = .0693$$ + +$$\ p\left( B \middle| !A \right)p(!A) = .01*.93 = .0093$$ + +$$p(B) = .0693 + .0093 = .0786$$ + +$$p\left( A \middle| B \right) = .0693/.0786$$ + +$$= .88$$ + +$${p\left( A \middle| B \right) = \frac{p\left( B \middle| A \right)p(A)}{p(B)} +}{= p\left( B \middle| A \right)\frac{p(A)}{p(B)} +}{= p\left( B \middle| A \right)\frac{p(A)}{p\left( B \middle| A \right)p(A) + p\left( B \middle| !A \right)p(!A)}}$$ + +$$= \frac{p(A)}{p(A) + \frac{p\left( B 
\middle| !A \right)}{p\left( B \middle| A \right)}p(!A)}$$ + +$$= \frac{1}{1 + \frac{p\left( B \middle| !A \right)}{p\left( B \middle| A \right)}\frac{p(!A)}{p(A)}} +$$ diff --git a/macros.qmd b/macros.qmd index c7588919f..8ef506c92 100644 --- a/macros.qmd +++ b/macros.qmd @@ -363,6 +363,7 @@ \def\reglincomb{\vx \cdot \vb} \def\regbetasum{\beta_1 x_1+ \dots + \beta_p x_p} \def\pdf{\distop{f}} +\providecommand{\pmf}[1]{\distop{P}\paren{#1}} \def\cdf{\distop{F}} \def\defLik{\Lik(\theta) \eqdef \p(\vX = \vx | \Theta = \theta)} \def\defLogLik{\lik \eqdef \logf{\Lik(\vx|\th)}} diff --git a/probability.qmd b/probability.qmd index 31062a145..fbde5c0ec 100644 --- a/probability.qmd +++ b/probability.qmd @@ -524,6 +524,7 @@ $\dsn{X}$. {{< include sec-CLT.qmd >}} +{{< include classification.qmd >}} ## Additional resources From bf43deeb293b2b712c8ac14fcb2c5fb34f0d0b86 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Fri, 14 Nov 2025 06:12:49 +0000 Subject: [PATCH 02/12] Initial plan From 11ba1a6e7cb02cd0d43aa696553937681076c038 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Fri, 14 Nov 2025 06:17:00 +0000 Subject: [PATCH 03/12] Complete and polish classification section introduction Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com> --- classification.qmd | 112 +++++++++++++++++++++++++++++++++++---------- 1 file changed, 87 insertions(+), 25 deletions(-) diff --git a/classification.qmd b/classification.qmd index cfb5437b1..1085b949e 100644 --- a/classification.qmd +++ b/classification.qmd @@ -2,50 +2,112 @@ ## Introduction to classification {#sec-classification} -### Positive predictive value +Classification is a fundamental concept in epidemiology and diagnostic medicine, where we need to determine whether an individual has a particular disease or condition based on test results or other indicators. 
Understanding how to interpret diagnostic tests requires knowledge of key statistical concepts including sensitivity, specificity, and predictive values. -Suppose a test is 99 sensitive, 99 specific; +In this section, we explore how Bayes' theorem allows us to calculate the probability that a person has a disease given a positive test result. This is particularly important in public health decision-making, where we must understand not just how accurate a test is in general, but how to interpret test results for individuals in specific populations. -99% Sensitive means if the person has disease, the test is positive, 99% of -time: +### Diagnostic test characteristics -$$\pmf{ + | D} = .99$$ +When evaluating a diagnostic test, we consider several key performance measures: -99% specific means if they don't have covid, the test says no covid, 99% -time +- **Sensitivity**: The probability that the test is positive given that the person has the disease, denoted $\Pr(\text{positive} \mid \text{disease})$ +- **Specificity**: The probability that the test is negative given that the person does not have the disease, denoted $\Pr(\text{negative} \mid \text{no disease})$ +- **Positive Predictive Value (PPV)**: The probability that a person has the disease given that their test is positive, denoted $\Pr(\text{disease} \mid \text{positive})$ +- **Negative Predictive Value (NPV)**: The probability that a person does not have the disease given that their test is negative, denoted $\Pr(\text{no disease} \mid \text{negative})$ -7% of people actually have covid: +### Example: COVID-19 testing -$$\mass(A) = 0.07$$ +Suppose we have a COVID-19 test with the following characteristics: -$$\mass(\neg A) = .93$$ +- **99% sensitive**: If a person has COVID-19, the test will be positive 99% of the time +- **99% specific**: If a person does not have COVID-19, the test will be negative 99% of the time +Let's define our events: +- Let $D$ denote the event "person has COVID-19" +- Let $+$ denote 
the event "test is positive" -$p\left( negative \middle| no\ covid \right) = .99$: -$p\left( B \middle| !A \right)$ +Then our test characteristics can be written as: -$$p\left( Covid \middle| positive \right) = ?$$ +$$ +\Pr(+ \mid D) = 0.99 \quad \text{(sensitivity)} +$$ -$$p\left( A \middle| B \right) = \frac{p\left( B \middle| A \right)p(A)}{p(B)}$$ +$$ +\Pr(- \mid \neg D) = 0.99 \quad \text{(specificity)} +$$ -$$p(B) = p\left( B \middle| A \right)p(A) + p\left( B \middle| !A \right)p(!A)$$ +Note that if specificity is 0.99, then the false positive rate is: +$$ +\Pr(+ \mid \neg D) = 1 - 0.99 = 0.01 +$$ -$$p\left( B \middle| A \right)p(A) = .99*\ .07 = .0693$$ +Suppose the **prevalence** of COVID-19 in the population is 7%: -$$\ p\left( B \middle| !A \right)p(!A) = .01*.93 = .0093$$ +$$ +\Pr(D) = 0.07 +$$ -$$p(B) = .0693 + .0093 = .0786$$ +$$ +\Pr(\neg D) = 0.93 +$$ -$$p\left( A \middle| B \right) = .0693/.0786$$ +### Calculating positive predictive value -$$= .88$$ +The key question we want to answer is: **If someone tests positive, what is the probability they actually have COVID-19?** -$${p\left( A \middle| B \right) = \frac{p\left( B \middle| A \right)p(A)}{p(B)} -}{= p\left( B \middle| A \right)\frac{p(A)}{p(B)} -}{= p\left( B \middle| A \right)\frac{p(A)}{p\left( B \middle| A \right)p(A) + p\left( B \middle| !A \right)p(!A)}}$$ +This is the positive predictive value: +$$ +\Pr(D \mid +) = \, ? 
+$$ -$$= \frac{p(A)}{p(A) + \frac{p\left( B \middle| !A \right)}{p\left( B \middle| A \right)}p(!A)}$$ +We can use **Bayes' theorem** to calculate this: -$$= \frac{1}{1 + \frac{p\left( B \middle| !A \right)}{p\left( B \middle| A \right)}\frac{p(!A)}{p(A)}} $$ +\Pr(D \mid +) = \frac{\Pr(+ \mid D) \cdot \Pr(D)}{\Pr(+)} +$$ + +To find $\Pr(+)$, we use the **law of total probability**: + +$$ +\Pr(+) = \Pr(+ \mid D) \cdot \Pr(D) + \Pr(+ \mid \neg D) \cdot \Pr(\neg D) +$$ + +Now we can calculate each component: + +**Probability of being positive with disease:** +$$ +\Pr(+ \mid D) \cdot \Pr(D) = 0.99 \times 0.07 = 0.0693 +$$ + +**Probability of being positive without disease (false positive):** +$$ +\Pr(+ \mid \neg D) \cdot \Pr(\neg D) = 0.01 \times 0.93 = 0.0093 +$$ + +**Total probability of positive test:** +$$ +\Pr(+) = 0.0693 + 0.0093 = 0.0786 +$$ + +**Positive predictive value:** +$$ +\Pr(D \mid +) = \frac{0.0693}{0.0786} = 0.88 +$$ + +Therefore, even with a highly accurate test (99% sensitive and 99% specific), only about 88% of people who test positive actually have COVID-19. This is because the disease prevalence is relatively low (7%), so false positives make up a meaningful fraction of all positive tests. + +### Alternative formulation + +We can rearrange Bayes' theorem to express the positive predictive value in terms of the sensitivity, specificity, and disease prevalence: + +$$ +\begin{align} +\Pr(D \mid +) &= \frac{\Pr(+ \mid D) \cdot \Pr(D)}{\Pr(+)} \\ +&= \frac{\Pr(+ \mid D) \cdot \Pr(D)}{\Pr(+ \mid D) \cdot \Pr(D) + \Pr(+ \mid \neg D) \cdot \Pr(\neg D)} \\ +&= \frac{\Pr(D)}{\Pr(D) + \frac{\Pr(+ \mid \neg D)}{\Pr(+ \mid D)} \cdot \Pr(\neg D)} \\ +&= \frac{1}{1 + \frac{\Pr(+ \mid \neg D)}{\Pr(+ \mid D)} \cdot \frac{\Pr(\neg D)}{\Pr(D)}} +\end{align} +$$ + +This final form emphasizes the ratio of the false positive rate to the sensitivity, weighted by the ratio of non-diseased to diseased individuals in the population. 
It shows that even with a very high sensitivity and specificity, the positive predictive value depends strongly on disease prevalence. From 4abab2a7c9212d385a712214fba80785a8fa376b Mon Sep 17 00:00:00 2001 From: Douglas Ezra Morrison Date: Thu, 13 Nov 2025 23:54:21 -0800 Subject: [PATCH 04/12] Update classification.qmd Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- classification.qmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/classification.qmd b/classification.qmd index cfb5437b1..79b889e6a 100644 --- a/classification.qmd +++ b/classification.qmd @@ -4,7 +4,7 @@ ### Positive predictive value -Suppose a test is 99 sensitive, 99 specific; +Suppose a test is 99% sensitive, 99% specific; 99% Sensitive means if the person has disease, the test is positive, 99% of time: From 3a95514091a8055955e5683095f827cc6749839f Mon Sep 17 00:00:00 2001 From: Douglas Ezra Morrison Date: Thu, 13 Nov 2025 23:54:37 -0800 Subject: [PATCH 05/12] Update classification.qmd Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- classification.qmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/classification.qmd b/classification.qmd index 79b889e6a..748542192 100644 --- a/classification.qmd +++ b/classification.qmd @@ -7,7 +7,7 @@ Suppose a test is 99% sensitive, 99% specific; 99% Sensitive means if the person has disease, the test is positive, 99% of -time: +the time: $$\pmf{ + | D} = .99$$ From e9990f3851e4b3327f5299c65236c5c5670da8cf Mon Sep 17 00:00:00 2001 From: Douglas Ezra Morrison Date: Thu, 13 Nov 2025 23:54:58 -0800 Subject: [PATCH 06/12] Update classification.qmd Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- classification.qmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/classification.qmd b/classification.qmd index 748542192..8abd6303d 100644 --- a/classification.qmd +++ b/classification.qmd @@ -12,7 +12,7 @@ the time: $$\pmf{ + | D} = .99$$ 99% 
specific means if they don't have covid, the test says no covid, 99% -time +of the time: 7% of people actually have covid: From 4f30f9bb1f84a91cb92f93b5ce738e6b826d9406 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Tue, 9 Dec 2025 21:17:40 +0000 Subject: [PATCH 07/12] Add newlines after sentences and use custom macros Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com> --- classification.qmd | 54 +++++++++++++++++++++++++--------------------- 1 file changed, 29 insertions(+), 25 deletions(-) diff --git a/classification.qmd b/classification.qmd index 1085b949e..aa0ebf494 100644 --- a/classification.qmd +++ b/classification.qmd @@ -2,18 +2,20 @@ ## Introduction to classification {#sec-classification} -Classification is a fundamental concept in epidemiology and diagnostic medicine, where we need to determine whether an individual has a particular disease or condition based on test results or other indicators. Understanding how to interpret diagnostic tests requires knowledge of key statistical concepts including sensitivity, specificity, and predictive values. +Classification is a fundamental concept in epidemiology and diagnostic medicine, where we need to determine whether an individual has a particular disease or condition based on test results or other indicators. +Understanding how to interpret diagnostic tests requires knowledge of key statistical concepts including sensitivity, specificity, and predictive values. -In this section, we explore how Bayes' theorem allows us to calculate the probability that a person has a disease given a positive test result. This is particularly important in public health decision-making, where we must understand not just how accurate a test is in general, but how to interpret test results for individuals in specific populations. 
+In this section, we explore how Bayes' theorem allows us to calculate the probability that a person has a disease given a positive test result. +This is particularly important in public health decision-making, where we must understand not just how accurate a test is in general, but how to interpret test results for individuals in specific populations. ### Diagnostic test characteristics When evaluating a diagnostic test, we consider several key performance measures: -- **Sensitivity**: The probability that the test is positive given that the person has the disease, denoted $\Pr(\text{positive} \mid \text{disease})$ -- **Specificity**: The probability that the test is negative given that the person does not have the disease, denoted $\Pr(\text{negative} \mid \text{no disease})$ -- **Positive Predictive Value (PPV)**: The probability that a person has the disease given that their test is positive, denoted $\Pr(\text{disease} \mid \text{positive})$ -- **Negative Predictive Value (NPV)**: The probability that a person does not have the disease given that their test is negative, denoted $\Pr(\text{no disease} \mid \text{negative})$ +- **Sensitivity**: The probability that the test is positive given that the person has the disease, denoted $\pmf{\text{positive} \mid \text{disease}}$ +- **Specificity**: The probability that the test is negative given that the person does not have the disease, denoted $\pmf{\text{negative} \mid \text{no disease}}$ +- **Positive Predictive Value (PPV)**: The probability that a person has the disease given that their test is positive, denoted $\pmf{\text{disease} \mid \text{positive}}$ +- **Negative Predictive Value (NPV)**: The probability that a person does not have the disease given that their test is negative, denoted $\pmf{\text{no disease} \mid \text{negative}}$ ### Example: COVID-19 testing @@ -30,26 +32,26 @@ Let's define our events: Then our test characteristics can be written as: $$ -\Pr(+ \mid D) = 0.99 \quad \text{(sensitivity)} 
+\pmf{+ \mid D} = 0.99 \quad \text{(sensitivity)} $$ $$ -\Pr(- \mid \neg D) = 0.99 \quad \text{(specificity)} +\pmf{- \mid \neg D} = 0.99 \quad \text{(specificity)} $$ Note that if specificity is 0.99, then the false positive rate is: $$ -\Pr(+ \mid \neg D) = 1 - 0.99 = 0.01 +\pmf{+ \mid \neg D} = 1 - 0.99 = 0.01 $$ Suppose the **prevalence** of COVID-19 in the population is 7%: $$ -\Pr(D) = 0.07 +\pmf{D} = 0.07 $$ $$ -\Pr(\neg D) = 0.93 +\pmf{\neg D} = 0.93 $$ ### Calculating positive predictive value @@ -58,44 +60,45 @@ The key question we want to answer is: **If someone tests positive, what is the This is the positive predictive value: $$ -\Pr(D \mid +) = \, ? +\pmf{D \mid +} = \, ? $$ We can use **Bayes' theorem** to calculate this: $$ -\Pr(D \mid +) = \frac{\Pr(+ \mid D) \cdot \Pr(D)}{\Pr(+)} +\pmf{D \mid +} = \frac{\pmf{+ \mid D} \cd \pmf{D}}{\pmf{+}} $$ -To find $\Pr(+)$, we use the **law of total probability**: +To find $\pmf{+}$, we use the **law of total probability**: $$ -\Pr(+) = \Pr(+ \mid D) \cdot \Pr(D) + \Pr(+ \mid \neg D) \cdot \Pr(\neg D) +\pmf{+} = \pmf{+ \mid D} \cd \pmf{D} + \pmf{+ \mid \neg D} \cd \pmf{\neg D} $$ Now we can calculate each component: **Probability of being positive with disease:** $$ -\Pr(+ \mid D) \cdot \Pr(D) = 0.99 \times 0.07 = 0.0693 +\pmf{+ \mid D} \cd \pmf{D} = 0.99 \times 0.07 = 0.0693 $$ **Probability of being positive without disease (false positive):** $$ -\Pr(+ \mid \neg D) \cdot \Pr(\neg D) = 0.01 \times 0.93 = 0.0093 +\pmf{+ \mid \neg D} \cd \pmf{\neg D} = 0.01 \times 0.93 = 0.0093 $$ **Total probability of positive test:** $$ -\Pr(+) = 0.0693 + 0.0093 = 0.0786 +\pmf{+} = 0.0693 + 0.0093 = 0.0786 $$ **Positive predictive value:** $$ -\Pr(D \mid +) = \frac{0.0693}{0.0786} = 0.88 +\pmf{D \mid +} = \frac{0.0693}{0.0786} = 0.88 $$ -Therefore, even with a highly accurate test (99% sensitive and 99% specific), only about 88% of people who test positive actually have COVID-19. 
This is because the disease prevalence is relatively low (7%), so false positives make up a meaningful fraction of all positive tests. +Therefore, even with a highly accurate test (99% sensitive and 99% specific), only about 88% of people who test positive actually have COVID-19. +This is because the disease prevalence is relatively low (7%), so false positives make up a meaningful fraction of all positive tests. ### Alternative formulation @@ -103,11 +106,12 @@ We can rearrange Bayes' theorem to express the positive predictive value in term $$ \begin{align} -\Pr(D \mid +) &= \frac{\Pr(+ \mid D) \cdot \Pr(D)}{\Pr(+)} \\ -&= \frac{\Pr(+ \mid D) \cdot \Pr(D)}{\Pr(+ \mid D) \cdot \Pr(D) + \Pr(+ \mid \neg D) \cdot \Pr(\neg D)} \\ -&= \frac{\Pr(D)}{\Pr(D) + \frac{\Pr(+ \mid \neg D)}{\Pr(+ \mid D)} \cdot \Pr(\neg D)} \\ -&= \frac{1}{1 + \frac{\Pr(+ \mid \neg D)}{\Pr(+ \mid D)} \cdot \frac{\Pr(\neg D)}{\Pr(D)}} +\pmf{D \mid +} &= \frac{\pmf{+ \mid D} \cd \pmf{D}}{\pmf{+}} \\ +&= \frac{\pmf{+ \mid D} \cd \pmf{D}}{\pmf{+ \mid D} \cd \pmf{D} + \pmf{+ \mid \neg D} \cd \pmf{\neg D}} \\ +&= \frac{\pmf{D}}{\pmf{D} + \frac{\pmf{+ \mid \neg D}}{\pmf{+ \mid D}} \cd \pmf{\neg D}} \\ +&= \frac{1}{1 + \frac{\pmf{+ \mid \neg D}}{\pmf{+ \mid D}} \cd \frac{\pmf{\neg D}}{\pmf{D}}} \end{align} $$ -This final form emphasizes the ratio of the false positive rate to the sensitivity, weighted by the ratio of non-diseased to diseased individuals in the population. It shows that even with a very high sensitivity and specificity, the positive predictive value depends strongly on disease prevalence. +This final form emphasizes the ratio of the false positive rate to the sensitivity, weighted by the ratio of non-diseased to diseased individuals in the population. +It shows that even with a very high sensitivity and specificity, the positive predictive value depends strongly on disease prevalence. 
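The arithmetic in the worked COVID-19 example above can be reproduced numerically. Below is a minimal Python sketch, assuming the example's values (sensitivity 0.99, specificity 0.99, prevalence 0.07). The function name `ppv_by_bayes` and its dictionary keys are hypothetical illustrations, not part of the book's code. Each intermediate quantity mirrors one displayed equation: the two joint terms, the law-of-total-probability sum, and the Bayes ratio.

```python
def ppv_by_bayes(sens: float, spec: float, prev: float) -> dict[str, float]:
    """Positive predictive value via Bayes' theorem (hypothetical helper).

    sens = P(+ | D), spec = P(- | not D), prev = P(D).
    """
    false_pos_rate = 1 - spec                       # P(+ | not D)
    true_pos_joint = sens * prev                    # P(+ | D) * P(D)
    false_pos_joint = false_pos_rate * (1 - prev)   # P(+ | not D) * P(not D)
    p_pos = true_pos_joint + false_pos_joint        # P(+), law of total probability
    return {
        "true_pos_joint": true_pos_joint,
        "false_pos_joint": false_pos_joint,
        "p_pos": p_pos,
        "ppv": true_pos_joint / p_pos,              # P(D | +), Bayes' theorem
    }


if __name__ == "__main__":
    parts = ppv_by_bayes(sens=0.99, spec=0.99, prev=0.07)
    for name, value in parts.items():
        print(f"{name}: {value:.4f}")
```

With these inputs the sketch prints 0.0693, 0.0093, and 0.0786 for the three intermediate terms and 0.8817 for the PPV, which the text rounds to 0.88.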
From 42af0210eca7df1ca8fc469e776504986cb3d11f Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Mon, 5 Jan 2026 23:51:07 +0000 Subject: [PATCH 08/12] Simplify sentence structure and add formal classification definition Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com> --- classification.qmd | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/classification.qmd b/classification.qmd index aa0ebf494..87a329e0a 100644 --- a/classification.qmd +++ b/classification.qmd @@ -2,7 +2,18 @@ ## Introduction to classification {#sec-classification} -Classification is a fundamental concept in epidemiology and diagnostic medicine, where we need to determine whether an individual has a particular disease or condition based on test results or other indicators. +Classification problems occur frequently in epidemiology and diagnostic medicine. +For example, we may need to determine whether an individual has a particular disease or condition based on test results or other indicators. + +:::{#def-classification} + +#### Classification + +A **classification problem** is a statistical problem in which we seek to assign observations to one of two or more discrete categories (classes) based on observed features or predictors. +In the binary case, we assign each observation to one of two classes, often labeled as "positive" or "negative", "diseased" or "healthy", etc. + +::: + Understanding how to interpret diagnostic tests requires knowledge of key statistical concepts including sensitivity, specificity, and predictive values. In this section, we explore how Bayes' theorem allows us to calculate the probability that a person has a disease given a positive test result. 
From 814d956b3f8daee9c1341617c0af5895dc01f0c5 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Tue, 6 Jan 2026 03:09:11 +0000 Subject: [PATCH 09/12] Add slide breaks and speaker notes for RevealJS format Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com> --- classification.qmd | 39 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 39 insertions(+) diff --git a/classification.qmd b/classification.qmd index 87a329e0a..4f70ffcfb 100644 --- a/classification.qmd +++ b/classification.qmd @@ -2,9 +2,13 @@ ## Introduction to classification {#sec-classification} +--- + Classification problems occur frequently in epidemiology and diagnostic medicine. For example, we may need to determine whether an individual has a particular disease or condition based on test results or other indicators. +--- + :::{#def-classification} #### Classification @@ -14,11 +18,15 @@ In the binary case, we assign each observation to one of two classes, often labe ::: +--- + Understanding how to interpret diagnostic tests requires knowledge of key statistical concepts including sensitivity, specificity, and predictive values. In this section, we explore how Bayes' theorem allows us to calculate the probability that a person has a disease given a positive test result. This is particularly important in public health decision-making, where we must understand not just how accurate a test is in general, but how to interpret test results for individuals in specific populations. 
+--- + ### Diagnostic test characteristics When evaluating a diagnostic test, we consider several key performance measures: @@ -28,6 +36,8 @@ When evaluating a diagnostic test, we consider several key performance measures: - **Positive Predictive Value (PPV)**: The probability that a person has the disease given that their test is positive, denoted $\pmf{\text{disease} \mid \text{positive}}$ - **Negative Predictive Value (NPV)**: The probability that a person does not have the disease given that their test is negative, denoted $\pmf{\text{no disease} \mid \text{negative}}$ +--- + ### Example: COVID-19 testing Suppose we have a COVID-19 test with the following characteristics: @@ -35,6 +45,8 @@ Suppose we have a COVID-19 test with the following characteristics: - **99% sensitive**: If a person has COVID-19, the test will be positive 99% of the time - **99% specific**: If a person does not have COVID-19, the test will be negative 99% of the time +--- + Let's define our events: - Let $D$ denote the event "person has COVID-19" @@ -50,6 +62,8 @@ $$ \pmf{- \mid \neg D} = 0.99 \quad \text{(specificity)} $$ +--- + Note that if specificity is 0.99, then the false positive rate is: $$ \pmf{+ \mid \neg D} = 1 - 0.99 = 0.01 @@ -65,6 +79,8 @@ $$ \pmf{\neg D} = 0.93 $$ +--- + ### Calculating positive predictive value The key question we want to answer is: **If someone tests positive, what is the probability they actually have COVID-19?** @@ -74,6 +90,8 @@ $$ \pmf{D \mid +} = \, ? 
$$ +--- + We can use **Bayes' theorem** to calculate this: $$ @@ -86,6 +104,8 @@ $$ \pmf{+} = \pmf{+ \mid D} \cd \pmf{D} + \pmf{+ \mid \neg D} \cd \pmf{\neg D} $$ +--- + Now we can calculate each component: **Probability of being positive with disease:** @@ -98,6 +118,8 @@ $$ \pmf{+ \mid \neg D} \cd \pmf{\neg D} = 0.01 \times 0.93 = 0.0093 $$ +--- + **Total probability of positive test:** $$ \pmf{+} = 0.0693 + 0.0093 = 0.0786 @@ -108,9 +130,18 @@ $$ \pmf{D \mid +} = \frac{0.0693}{0.0786} = 0.88 $$ +--- + Therefore, even with a highly accurate test (99% sensitive and 99% specific), only about 88% of people who test positive actually have COVID-19. This is because the disease prevalence is relatively low (7%), so false positives make up a meaningful fraction of all positive tests. +::: notes +This counterintuitive result demonstrates the importance of considering disease prevalence when interpreting test results. +Even highly accurate tests can have relatively low positive predictive values when the disease is rare. +::: + +--- + ### Alternative formulation We can rearrange Bayes' theorem to express the positive predictive value in terms of the sensitivity, specificity, and disease prevalence: @@ -124,5 +155,13 @@ $$ \end{align} $$ +--- + This final form emphasizes the ratio of the false positive rate to the sensitivity, weighted by the ratio of non-diseased to diseased individuals in the population. It shows that even with a very high sensitivity and specificity, the positive predictive value depends strongly on disease prevalence. + +::: notes +This algebraic form is useful for understanding how the different parameters interact. +Notice how the prevalence ratio $\pmf{\neg D}/\pmf{D}$ appears explicitly in the denominator. +When the disease is rare, this ratio is large, which reduces the positive predictive value. 
+::: From b4811b521be54082f007d6484c683e3c4866a505 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Tue, 6 Jan 2026 05:14:54 +0000 Subject: [PATCH 10/12] Fix LaTeX equation nesting error: use aligned instead of align Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com> --- classification.qmd | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/classification.qmd b/classification.qmd index 4f70ffcfb..4a010e231 100644 --- a/classification.qmd +++ b/classification.qmd @@ -147,12 +147,12 @@ Even highly accurate tests can have relatively low positive predictive values wh We can rearrange Bayes' theorem to express the positive predictive value in terms of the sensitivity, specificity, and disease prevalence: $$ -\begin{align} +\begin{aligned} \pmf{D \mid +} &= \frac{\pmf{+ \mid D} \cd \pmf{D}}{\pmf{+}} \\ &= \frac{\pmf{+ \mid D} \cd \pmf{D}}{\pmf{+ \mid D} \cd \pmf{D} + \pmf{+ \mid \neg D} \cd \pmf{\neg D}} \\ &= \frac{\pmf{D}}{\pmf{D} + \frac{\pmf{+ \mid \neg D}}{\pmf{+ \mid D}} \cd \pmf{\neg D}} \\ &= \frac{1}{1 + \frac{\pmf{+ \mid \neg D}}{\pmf{+ \mid D}} \cd \frac{\pmf{\neg D}}{\pmf{D}}} -\end{align} +\end{aligned} $$ --- From 08992c4c1bdde64881e23e6e7a4e4c532d4a6488 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Tue, 6 Jan 2026 06:16:04 +0000 Subject: [PATCH 11/12] Add definition blocks for diagnostic test characteristics and abbreviated notation Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com> --- classification.qmd | 38 +++++++++++++++++++++++++++++++++----- 1 file changed, 33 insertions(+), 5 deletions(-) diff --git a/classification.qmd b/classification.qmd index 4a010e231..3f883d689 100644 --- a/classification.qmd +++ b/classification.qmd @@ -31,10 +31,37 @@ This is particularly important in public health decision-making, where we must u When evaluating a diagnostic test, we consider 
several key performance measures: -- **Sensitivity**: The probability that the test is positive given that the person has the disease, denoted $\pmf{\text{positive} \mid \text{disease}}$ -- **Specificity**: The probability that the test is negative given that the person does not have the disease, denoted $\pmf{\text{negative} \mid \text{no disease}}$ -- **Positive Predictive Value (PPV)**: The probability that a person has the disease given that their test is positive, denoted $\pmf{\text{disease} \mid \text{positive}}$ -- **Negative Predictive Value (NPV)**: The probability that a person does not have the disease given that their test is negative, denoted $\pmf{\text{no disease} \mid \text{negative}}$ +:::{#def-sensitivity} + +#### Sensitivity + +The probability that the test is positive given that the person has the disease, denoted $\pmf{\text{positive} \mid \text{disease}}$. + +::: + +:::{#def-specificity} + +#### Specificity + +The probability that the test is negative given that the person does not have the disease, denoted $\pmf{\text{negative} \mid \text{no disease}}$. + +::: + +:::{#def-ppv} + +#### Positive Predictive Value (PPV) + +The probability that a person has the disease given that their test is positive, denoted $\pmf{\text{disease} \mid \text{positive}}$. + +::: + +:::{#def-npv} + +#### Negative Predictive Value (NPV) + +The probability that a person does not have the disease given that their test is negative, denoted $\pmf{\text{no disease} \mid \text{negative}}$. 
+ +::: --- @@ -151,7 +178,8 @@ $$ \pmf{D \mid +} &= \frac{\pmf{+ \mid D} \cd \pmf{D}}{\pmf{+}} \\ &= \frac{\pmf{+ \mid D} \cd \pmf{D}}{\pmf{+ \mid D} \cd \pmf{D} + \pmf{+ \mid \neg D} \cd \pmf{\neg D}} \\ &= \frac{\pmf{D}}{\pmf{D} + \frac{\pmf{+ \mid \neg D}}{\pmf{+ \mid D}} \cd \pmf{\neg D}} \\ -&= \frac{1}{1 + \frac{\pmf{+ \mid \neg D}}{\pmf{+ \mid D}} \cd \frac{\pmf{\neg D}}{\pmf{D}}} +&= \frac{1}{1 + \frac{\pmf{+ \mid \neg D}}{\pmf{+ \mid D}} \cd \frac{\pmf{\neg D}}{\pmf{D}}} \\ +&= \frac{1}{1 + \frac{1 - \text{spec}}{\text{sens}} \cd \frac{1 - \text{prev}}{\text{prev}}} \end{aligned} $$ From c42ca401ae81d923358680e24870cf4889ac3816 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Tue, 6 Jan 2026 06:53:18 +0000 Subject: [PATCH 12/12] Move classification section to appendix chapter Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com> --- _quarto-book.yml | 1 + classification.qmd | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/_quarto-book.yml b/_quarto-book.yml index 7f777a49d..f78818872 100644 --- a/_quarto-book.yml +++ b/_quarto-book.yml @@ -38,6 +38,7 @@ book: - appendices-are-prereqs.qmd - math-prereqs.qmd - probability.qmd + - classification.qmd - estimation.qmd - inference.qmd - intro-MLEs.qmd diff --git a/classification.qmd b/classification.qmd index 3f883d689..b85756ad5 100644 --- a/classification.qmd +++ b/classification.qmd @@ -1,6 +1,6 @@ {{< include macros.qmd >}} -## Introduction to classification {#sec-classification} +# Classification {#sec-classification} ---