diff --git a/book/dt.tex b/book/dt.tex
index 9ba95c1..b772ecf 100644
--- a/book/dt.tex
+++ b/book/dt.tex
@@ -501,7 +501,7 @@ \section{Formalizing the Learning Problem}
 $\hat \vx$ to corresponding prediction $\hat y$.  The key property
 that $f$ should obey is that it should do well (as measured by
 $\ell$) on future examples that are \emph{also} drawn from $\cD$.  Formally,
-it's \concept{expected loss} $\ep$ over $\cD$ with repsect to $\ell$
+its \concept{expected loss} $\ep$ over $\cD$ with respect to $\ell$
 should be as small as possible:
 \begin{align} \label{eq:expectederror}
   \ep
diff --git a/book/ens.tex b/book/ens.tex
index 82aaaa7..ae2ac3f 100644
--- a/book/ens.tex
+++ b/book/ens.tex
@@ -235,7 +235,7 @@ \section{Boosting Weak Learners}
 $\vx$; otherwise $f(\vx) = -1$ for all $\vx$.  To make the problem
 moderately interesting, suppose that in the original training set,
 there are $80$ positive examples and $20$ negative examples.  In this
-case, $f\oth(\vx)=+1$.  It's weighted error rate will be $\hat\ep\oth
+case, $f\oth(\vx)=+1$.  Its weighted error rate will be $\hat\ep\oth
 = 0.2$ because it gets every negative example wrong.  Computing, we
 get $\al\oth = \frac12\log 4$.  Before normalization, we get the new
 weight for each positive (correct) example to be $1 \exp[-\frac12\log
@@ -271,7 +271,7 @@ \section{Boosting Weak Learners}
 In fact, a very popular weak learner is a decision
 \concept{decision stump}: a decision tree that can only ask
 \emph{one} question.  This
-may seem like a silly model (and, in fact, it is on it's own), but
+may seem like a silly model (and, in fact, it is on its own), but
 when combined with boosting, it becomes very effective.  To understand
 why, suppose for a moment that our data consists only of binary
 features, so that any question that a decision tree might ask is of
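The ens.tex hunk above walks through one round of AdaBoost arithmetic: with 80 positive and 20 negative examples and a weak classifier that predicts $+1$ everywhere, the weighted error is $0.2$, so $\al\oth = \frac12\log 4$, and correct examples get weight $\exp[-\al\oth]$ while wrong ones get $\exp[+\al\oth]$. A short numeric sketch of that round (variable names are our own, not from the text):

```python
import math

# Setup from the text: 80 positive, 20 negative examples, and a first
# weak classifier f(x) = +1 on every example.
n_pos, n_neg = 80, 20
n = n_pos + n_neg

# Weighted error under uniform initial weights: f gets every negative wrong.
eps = n_neg / n                           # 0.2
alpha = 0.5 * math.log((1 - eps) / eps)   # (1/2) log 4

# Unnormalized weight updates: exp(-alpha) for correctly classified
# (positive) examples, exp(+alpha) for misclassified (negative) ones.
w_correct = math.exp(-alpha)   # 4^(-1/2) = 1/2
w_wrong = math.exp(alpha)      # 4^(+1/2) = 2

# Normalize and compare total weight on each class.
z = n_pos * w_correct + n_neg * w_wrong
total_pos = n_pos * w_correct / z
total_neg = n_neg * w_wrong / z
print(alpha, w_correct, w_wrong, total_pos, total_neg)
```

After normalization the two classes carry equal total weight (0.5 each), which is exactly what forces the next weak learner to pay attention to the previously misclassified negatives.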