
Commit 3aed695

more links, comments on notes
1 parent 2558b89 commit 3aed695

2 files changed: +9 / -11 lines


docs/src/models/basics.md

Lines changed: 1 addition & 1 deletion
@@ -74,7 +74,7 @@ julia> Flux.withgradient(g, nt)
 (val = 1, grad = ((a = [0.0, 2.0], b = [-0.0, -2.0], c = nothing),))
 ```

-!!! note
+!!! note "Implicit gradients"
     Flux used to handle many parameters in a different way, using the [`params`](@ref Flux.params) function.
     This uses a method of `gradient` which takes a zero-argument function, and returns a dictionary
     through which the resulting gradients can be looked up:
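
For readers unfamiliar with that older style, a minimal sketch of what such an implicit-mode lookup looked like (the arrays and the loss here are placeholders, not the `g`/`nt` example from the docs):

```julia
using Flux

W = [1.0, 2.0]
b = [3.0, 4.0]
f(x) = sum(abs2, W .* x .- b)   # a throwaway loss that closes over W and b

pars = Flux.params(W, b)                           # implicit parameter collection
grads = Flux.gradient(() -> f([1.0, 1.0]), pars)   # zero-argument function
grads[W], grads[b]                                 # gradients looked up per array
```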

docs/src/training/training.md

Lines changed: 8 additions & 10 deletions
@@ -59,7 +59,7 @@ structures are what Zygote calls "explicit" gradients.
 It is important that the execution of the model takes place inside the call to `gradient`,
 in order for the influence of the model's parameters to be observed by Zygote.

-!!! note
+!!! note "Explicit vs implicit gradients"
     Flux used to use Zygote's "implicit" mode, which looks like this:
     ```
     pars = Flux.params(model)
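
By contrast, the explicit mode that the surrounding text recommends looks roughly like this minimal sketch (the model, data, and loss are placeholders):

```julia
using Flux

model = Dense(2 => 1)
x = rand(Float32, 2, 8)
y = rand(Float32, 1, 8)

# The model is called inside `gradient`, and we differentiate with
# respect to the model object itself ("explicit" mode).
grads = Flux.gradient(model) do m
    sum(abs2, m(x) .- y)
end
grads[1]   # a NamedTuple whose fields mirror the model's parameters
```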
@@ -79,10 +79,10 @@ within the call to `gradient`. For instance, we could define a function
 loss(y_hat, y) = sum((y_hat .- y).^2)
 ```
 or write this directly inside the `do` block above. Many commonly used functions,
-like `mse` for mean squared error or `crossentropy` for cross-entropy loss,
+like [`mse`](@ref Flux.Losses.mse) for mean-squared error or [`crossentropy`](@ref Flux.Losses.crossentropy) for cross-entropy loss,
 are available from the [`Flux.Losses`](../models/losses.md) module.

-!!! note
+!!! note "Implicit-style loss functions"
     Flux used to need a loss function which closed over a reference to the model,
     instead of being a pure function. Thus in old code you may see something like
     ```
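
As a quick illustration of the built-in losses the new links point to (placeholder arrays; note that `mse` averages where the hand-written loss above sums):

```julia
using Flux
using Flux.Losses: mse

y_hat = rand(Float32, 3, 4)
y = rand(Float32, 3, 4)

# `mse` takes the mean, so it matches the summed loss up to `length(y)`.
mse(y_hat, y) ≈ sum((y_hat .- y).^2) / length(y)
```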
@@ -110,14 +110,14 @@ fmap(model, grads[1]) do p, g
 end
 ```

-This is wrapped up as a function `update!`, which can be used as follows:
+This is wrapped up as a function [`update!`](@ref Flux.Optimise.update!), which can be used as follows:

 ```julia
 Flux.update!(Descent(0.01), model, grads[1])
 ```

 There are many other optimisation rules, which adjust the step size and direction.
-Most require some memory of the gradients from earlier steps. The function `setup`
+Most require some memory of the gradients from earlier steps. The function [`setup`](@ref Flux.Train.setup)
 creates the necessary storage for this, for a particular model. This should be done
 once, before training, and looks like this:

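Put together, one explicit-style update step looks roughly like the following sketch (placeholder model and data, not taken from the page above):

```julia
using Flux

model = Dense(2 => 1)
x, y = rand(Float32, 2, 8), rand(Float32, 1, 8)

opt_state = Flux.setup(Adam(0.001), model)   # done once, before training

grads = Flux.gradient(m -> sum(abs2, m(x) .- y), model)
Flux.update!(opt_state, model, grads[1])     # mutates model and opt_state in place
```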
@@ -133,11 +133,11 @@ for data in train_set
 end
 ```

-Many commonly used optimisation rules, such as `Adam`, are built-in.
+Many commonly used optimisation rules, such as [`Adam`](@ref Flux.Optimise.Adam), are built-in.
 These are listed on the [optimisers](@ref man-optimisers) page.


-!!! note
+!!! note "Implicit-style optimiser state"
     This `setup` makes another tree-like structure. Old versions of Flux did not do this,
     and instead stored a dictionary-like structure within the optimiser `Adam(0.001)`.
     This was initialised on first use of the version of `update!` for "implicit" parameters.
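
For reference, the tree-like structure mentioned in this note mirrors the model itself; a minimal sketch with a placeholder model:

```julia
using Flux

model = Chain(Dense(2 => 3, relu), Dense(3 => 1))
opt_state = Flux.setup(Adam(0.001), model)
# `opt_state` has the same nesting as `model`, holding the Adam rule and its
# moment buffers for each trainable array (weights and biases).
```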
@@ -183,8 +183,6 @@ the two words mean the same thing) both for efficiency and for better results.
 This can be easily done using the [`DataLoader`](@ref Flux.Data.DataLoader):

 ```julia
-X = rand(28, 28, 60_000)
-Y = rand(0:9, 60_000)
 data = Flux.DataLoader((X, Y), batchsize=32)

 x1, y1 = first(data)
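
The removed lines defined `X` and `Y` inline; a self-contained sketch of the same `DataLoader` usage, with placeholder random data standing in for them:

```julia
using Flux

X = rand(Float32, 28, 28, 60_000)   # placeholder images
Y = rand(0:9, 60_000)               # placeholder labels

data = Flux.DataLoader((X, Y), batchsize=32)

x1, y1 = first(data)
size(x1)       # (28, 28, 32): batches are taken along the last dimension
length(data)   # 1875 batches of 32 samples each
```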
@@ -209,7 +207,7 @@ train!(model, train_set, opt) do m, x, y
 end
 ```

-!!! note
+!!! note "Implicit-style `train!`"
     This is the "explicit" method of `train!`, which takes the result of `setup` as its 4th argument.
     The 1st argument (from the `do` block) is a function which accepts the model itself.
     Old Flux versions provided a method of `train!` for "implicit" parameters,
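
A self-contained sketch of the explicit-style `train!` call that this note describes (the model, data, and loss are placeholders):

```julia
using Flux

model = Dense(2 => 1)
train_set = [(rand(Float32, 2, 8), rand(Float32, 1, 8)) for _ in 1:10]
opt = Flux.setup(Adam(0.001), model)

# The do-block is the loss: it receives the model itself, then each batch's
# contents; the optimiser state from `setup` is the 4th argument.
Flux.train!(model, train_set, opt) do m, x, y
    sum(abs2, m(x) .- y)
end
```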
