From a945b714404672e643ccf32fafe5e9ee4d44a6d7 Mon Sep 17 00:00:00 2001
From: r-keller
Date: Mon, 1 Oct 2018 19:31:54 -0400
Subject: [PATCH 1/2] update grammar

---
 book/ens.tex | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/book/ens.tex b/book/ens.tex
index 82aaaa7..ae2ac3f 100644
--- a/book/ens.tex
+++ b/book/ens.tex
@@ -235,7 +235,7 @@ \section{Boosting Weak Learners}
 $\vx$; otherwise $f(\vx) = -1$ for all $\vx$.  To make the problem
 moderately interesting, suppose that in the original training set,
 there are $80$ positive examples and $20$ negative examples.  In this
-case, $f\oth(\vx)=+1$.  It's weighted error rate will be $\hat\ep\oth
+case, $f\oth(\vx)=+1$.  Its weighted error rate will be $\hat\ep\oth
 = 0.2$ because it gets every negative example wrong.  Computing, we
 get $\al\oth = \frac12\log 4$.  Before normalization, we get the new
 weight for each positive (correct) example to be $1 \exp[-\frac12\log
@@ -271,7 +271,7 @@ \section{Boosting Weak Learners}
 In fact, a very popular weak learner is a decision \concept{decision
 stump}: a decision tree that can only ask \emph{one} question.  This
-may seem like a silly model (and, in fact, it is on it's own), but
+may seem like a silly model (and, in fact, it is on its own), but
 when combined with boosting, it becomes very effective.  To understand
 why, suppose for a moment that our data consists only of binary
 features, so that any question that a decision tree might ask is of

From 43a0794ea941c0eb288935a178b7826509b822a9 Mon Sep 17 00:00:00 2001
From: r-keller
Date: Mon, 1 Oct 2018 19:33:26 -0400
Subject: [PATCH 2/2] update grammar

---
 book/dt.tex | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/book/dt.tex b/book/dt.tex
index 9ba95c1..b772ecf 100644
--- a/book/dt.tex
+++ b/book/dt.tex
@@ -501,7 +501,7 @@ \section{Formalizing the Learning Problem}
 $\hat \vx$ to corresponding prediction $\hat y$.  The key property
 that $f$ should obey is that it should do well (as measured by $\ell$)
 on future examples that are \emph{also} drawn from $\cD$.  Formally,
-it's \concept{expected loss} $\ep$ over $\cD$ with repsect to $\ell$
+its \concept{expected loss} $\ep$ over $\cD$ with respect to $\ell$
 should be as small as possible:
 \begin{align} \label{eq:expectederror}
   \ep
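Aside, not part of either patch: the `ens.tex` hunk touched by PATCH 1/2 walks through one round of AdaBoost arithmetic ($\hat\ep\oth = 0.2$, $\al\oth = \frac12\log 4$, and the pre-normalization weights $\exp[\pm\frac12\log 4]$). A minimal Python sketch of that computation (variable names are my own, not from the book) confirms the numbers:

```python
import math

# AdaBoost round 1 from the ens.tex passage: 80 positive and 20 negative
# examples, all with equal initial weight; the constant classifier
# f(x) = +1 gets every negative example wrong.
n_pos, n_neg = 80, 20
eps = n_neg / (n_pos + n_neg)            # weighted error rate = 0.2
alpha = 0.5 * math.log((1 - eps) / eps)  # = (1/2) log 4 = log 2

w_correct = math.exp(-alpha)  # unnormalized weight of each (correct) positive
w_wrong = math.exp(alpha)     # unnormalized weight of each (wrong) negative

z = n_pos * w_correct + n_neg * w_wrong  # normalization constant
print(round(eps, 3), round(alpha, 4))    # prints: 0.2 0.6931
print(round(n_pos * w_correct / z, 3))   # total mass on positives: 0.5
print(round(n_neg * w_wrong / z, 3))     # total mass on negatives: 0.5
```

After normalization the positive and negative examples each carry half the total weight, which is exactly why the next weak learner is forced to pay attention to the previously misclassified negatives.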