From a945b714404672e643ccf32fafe5e9ee4d44a6d7 Mon Sep 17 00:00:00 2001
From: r-keller
Date: Mon, 1 Oct 2018 19:31:54 -0400
Subject: [PATCH 1/2] update grammar

---
 book/ens.tex | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/book/ens.tex b/book/ens.tex
index 82aaaa7..ae2ac3f 100644
--- a/book/ens.tex
+++ b/book/ens.tex
@@ -235,7 +235,7 @@ \section{Boosting Weak Learners}
 $\vx$; otherwise $f(\vx) = -1$ for all $\vx$.  To make the problem
 moderately interesting, suppose that in the original training set,
 there are $80$ positive examples and $20$ negative examples.  In this
-case, $f\oth(\vx)=+1$.  It's weighted error rate will be $\hat\ep\oth
+case, $f\oth(\vx)=+1$.  Its weighted error rate will be $\hat\ep\oth
 = 0.2$ because it gets every negative example wrong.  Computing, we
 get $\al\oth = \frac12\log 4$.  Before normalization, we get the new
 weight for each positive (correct) example to be $1 \exp[-\frac12\log
@@ -271,7 +271,7 @@ \section{Boosting Weak Learners}
 In fact, a very popular weak learner is a decision \concept{decision
 stump}: a decision tree that can only ask \emph{one} question.  This
-may seem like a silly model (and, in fact, it is on it's own), but
+may seem like a silly model (and, in fact, it is on its own), but
 when combined with boosting, it becomes very effective.  To understand
 why, suppose for a moment that our data consists only of binary
 features, so that any question that a decision tree might ask is of

From 43a0794ea941c0eb288935a178b7826509b822a9 Mon Sep 17 00:00:00 2001
From: r-keller
Date: Mon, 1 Oct 2018 19:33:26 -0400
Subject: [PATCH 2/2] update grammar

---
 book/dt.tex | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/book/dt.tex b/book/dt.tex
index 9ba95c1..b772ecf 100644
--- a/book/dt.tex
+++ b/book/dt.tex
@@ -501,7 +501,7 @@ \section{Formalizing the Learning Problem}
 $\hat \vx$ to corresponding prediction $\hat y$.  The key property
 that $f$ should obey is that it should do well (as measured by $\ell$)
 on future examples that are \emph{also} drawn from $\cD$.  Formally,
-it's \concept{expected loss} $\ep$ over $\cD$ with repsect to $\ell$
+its \concept{expected loss} $\ep$ over $\cD$ with respect to $\ell$
 should be as small as possible:
 \begin{align} \label{eq:expectederror}
   \ep
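Aside, not part of either patch: the `ens.tex` hunk touched by PATCH 1/2 walks through one round of AdaBoost arithmetic ($\hat\ep\oth = 0.2$, $\al\oth = \frac12\log 4$, and the pre-normalization weights $\exp[\pm\frac12\log 4]$). A minimal Python sketch of that computation (variable names are my own, not from the book) confirms the numbers:

```python
import math

# AdaBoost round 1 from the ens.tex passage: 80 positive and 20 negative
# examples, all with equal initial weight; the constant classifier
# f(x) = +1 gets every negative example wrong.
n_pos, n_neg = 80, 20
eps = n_neg / (n_pos + n_neg)            # weighted error rate = 0.2
alpha = 0.5 * math.log((1 - eps) / eps)  # = (1/2) log 4 = log 2

w_correct = math.exp(-alpha)  # unnormalized weight of each (correct) positive
w_wrong = math.exp(alpha)     # unnormalized weight of each (wrong) negative

z = n_pos * w_correct + n_neg * w_wrong  # normalization constant
print(round(eps, 3), round(alpha, 4))    # prints: 0.2 0.6931
print(round(n_pos * w_correct / z, 3))   # total mass on positives: 0.5
print(round(n_neg * w_wrong / z, 3))     # total mass on negatives: 0.5
```

After normalization the positive and negative examples each carry half the total weight, which is exactly why the next weak learner is forced to pay attention to the previously misclassified negatives.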