@@ -59,7 +59,7 @@ structures are what Zygote calls "explicit" gradients.
It is important that the execution of the model takes place inside the call to `gradient`,
in order for the influence of the model's parameters to be observed by Zygote.

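For instance, a minimal sketch of this explicit style (the tiny `Dense` layer and random data below are purely illustrative):

```julia
using Flux

m = Dense(2 => 1)                                # a tiny illustrative model
x, y = rand(Float32, 2, 5), rand(Float32, 1, 5)  # made-up input and target batch

# calling the model *inside* gradient lets Zygote trace how its parameters are used
grads = Flux.gradient(m -> sum(abs2, m(x) .- y), m)

grads[1].weight   # the gradient has the same nested structure as the model
```
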
- !!! note
+ !!! note "Explicit vs implicit gradients"
    Flux used to use Zygote's "implicit" mode, which looks like this:
    ```
    pars = Flux.params(model)
@@ -79,10 +79,10 @@ within the call to `gradient`. For instance, we could define a function
loss(y_hat, y) = sum((y_hat .- y).^2)
```
or write this directly inside the `do` block above. Many commonly used functions,
- like `mse` for mean squared error or `crossentropy` for cross-entropy loss,
+ like [`mse`](@ref Flux.Losses.mse) for mean-squared error or [`crossentropy`](@ref Flux.Losses.crossentropy) for cross-entropy loss,
are available from the [`Flux.Losses`](../models/losses.md) module.

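As a sketch, the gradient call can then use one of these built-in losses directly, assuming `model`, `x` and `y` are the model and data batch from the examples above:

```julia
using Flux

# mse(ŷ, y) is the mean of (ŷ .- y).^2; crossentropy is the usual choice for classification
grads = Flux.gradient(m -> Flux.Losses.mse(m(x), y), model)
```
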
- !!! note
+ !!! note "Implicit-style loss functions"
    Flux used to need a loss function which closed over a reference to the model,
    instead of being a pure function. Thus in old code you may see something like
    ```
@@ -110,14 +110,14 @@ fmap(model, grads[1]) do p, g
end
```

- This is wrapped up as a function `update!`, which can be used as follows:
+ This is wrapped up as a function [`update!`](@ref Flux.Optimise.update!), which can be used as follows:

```julia
Flux.update!(Descent(0.01), model, grads[1])
```

There are many other optimisation rules, which adjust the step size and direction.
- Most require some memory of the gradients from earlier steps. The function `setup`
+ Most require some memory of the gradients from earlier steps. The function [`setup`](@ref Flux.Train.setup)
creates the necessary storage for this, for a particular model. This should be done
once, before training, and looks like this:

@@ -133,11 +133,11 @@ for data in train_set
end
```

- Many commonly used optimisation rules, such as `Adam`, are built-in.
+ Many commonly used optimisation rules, such as [`Adam`](@ref Flux.Optimise.Adam), are built-in.
These are listed on the [optimisers](@ref man-optimisers) page.


- !!! note
+ !!! note "Implicit-style optimiser state"
    This `setup` makes another tree-like structure. Old versions of Flux did not do this,
    and instead stored a dictionary-like structure within the optimiser `Adam(0.001)`.
    This was initialised on first use of the version of `update!` for "implicit" parameters.
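Putting the explicit pieces together, one pass over the data might look like the following sketch, where the `Adam(0.001)` rule and `mse` loss are only illustrative choices:

```julia
using Flux

opt_state = Flux.setup(Adam(0.001), model)   # done once, before training

for (x, y) in train_set
    grads = Flux.gradient(m -> Flux.Losses.mse(m(x), y), model)
    Flux.update!(opt_state, model, grads[1])  # mutates both opt_state and model
end
```
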
@@ -183,8 +183,6 @@ the two words mean the same thing) both for efficiency and for better results.
This can be easily done using the [`DataLoader`](@ref Flux.Data.DataLoader):

```julia
- X = rand(28, 28, 60_000)
- Y = rand(0:9, 60_000)
data = Flux.DataLoader((X, Y), batchsize=32)

x1, y1 = first(data)
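# Assuming X is a (28, 28, 60_000) array of images and Y a length-60_000 label vector,
# DataLoader batches along the last dimension, so as a sketch:
size(x1)   # (28, 28, 32) -- the last dimension is the batch
size(y1)   # (32,)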
@@ -209,7 +207,7 @@ train!(model, train_set, opt) do m, x, y
end
```

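In full, such a call might look like the following sketch, again with `Adam` and `mse` as stand-in choices:

```julia
using Flux

opt = Flux.setup(Adam(), model)

# train! loops over train_set, takes the gradient of the do-block's return value,
# and updates the model and the optimiser state after each batch
Flux.train!(model, train_set, opt) do m, x, y
    Flux.Losses.mse(m(x), y)
end
```
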
- !!! note
+ !!! note "Implicit-style `train!`"
    This is the "explicit" method of `train!`, which takes the result of `setup` as its 4th argument.
    The 1st argument (from the `do` block) is a function which accepts the model itself.
    Old Flux versions provided a method of `train!` for "implicit" parameters,