# **Gradient Checkpointing**

## **1. Definition**
Gradient checkpointing is a technique used in deep learning to reduce memory usage during training by storing only a selected subset of intermediate activations (the checkpoints) and recomputing the others as needed during the backward pass. This allows training larger models, or using larger batch sizes, without exceeding memory limits.
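
To make this concrete, here is a minimal sketch of what checkpointing looks like in practice, using PyTorch's `torch.utils.checkpoint` API; the toy network and the way it is split into two segments are illustrative assumptions, not part of the definition above:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# A toy network split into two segments. Activations created *inside*
# a checkpointed segment are freed after the forward pass and
# recomputed when the backward pass reaches that segment.
segment1 = nn.Sequential(nn.Linear(512, 512), nn.ReLU(),
                         nn.Linear(512, 512), nn.ReLU())
segment2 = nn.Sequential(nn.Linear(512, 512), nn.ReLU(),
                         nn.Linear(512, 10))

x = torch.randn(32, 512, requires_grad=True)

h = checkpoint(segment1, x, use_reentrant=False)    # keeps x, drops segment1 internals
out = checkpoint(segment2, h, use_reentrant=False)  # keeps h, drops segment2 internals
out.sum().backward()  # each segment is re-run internally to produce gradients
```

Only the segment boundaries (`x` and `h`) survive between the forward and backward passes; everything else is rebuilt on demand.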

## **2. Why Use Gradient Checkpointing?**
* **Reduced Memory Usage:** Storing fewer activations lowers peak memory, which makes deeper or larger models trainable.
* **Larger Batches/Models:** The memory saved can be spent on bigger batch sizes or more parameters on the same hardware.
* **The Tradeoff:** Some activations must be recomputed during the backward pass, so training takes more compute time.

## **3. Gradient Checkpointing Mechanism**
Suppose a model consists of $N$ layers, each represented by a function $f_i$. Normally, the forward pass stores all intermediate activations:

$$
\begin{aligned}
A_0 &= x \\
A_1 &= f_1(A_0) \\
A_2 &= f_2(A_1) \\
&\;\;\vdots \\
A_N &= f_N(A_{N-1})
\end{aligned}
$$

With gradient checkpointing, only a subset of the $A_i$ (the checkpoints) is stored; the others are recomputed as needed during backpropagation. In the simplest case, you store only the input and the output and recompute all intermediates when needed.

**Example:**
If you have three functions $f_1, f_2, f_3$ and input $x$:
* Forward: $A_1 = f_1(x)$, $A_2 = f_2(A_1)$, $A_3 = f_3(A_2)$
* With checkpointing, you might store only $x$ and $A_3$, and recompute $A_1$ and $A_2$ during the backward pass (a sketch of this recompute-in-backward logic follows below).
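
The recompute-in-backward step can be written directly as a custom PyTorch `autograd.Function`. The following is a minimal sketch of the mechanism, not PyTorch's actual implementation; it deliberately ignores details such as RNG state, multiple inputs, and keyword arguments:

```python
import torch

class CheckpointSegment(torch.autograd.Function):
    """Run `fn` without keeping its intermediates; recompute them in backward."""

    @staticmethod
    def forward(ctx, fn, x):
        ctx.fn = fn
        ctx.save_for_backward(x)   # keep only the segment input (the checkpoint)
        with torch.no_grad():      # no graph is recorded, so intermediates are freed
            return fn(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        x = x.detach().requires_grad_(True)
        with torch.enable_grad():
            y = ctx.fn(x)          # second forward pass, this time recording a graph
        # Backpropagate through the recomputed graph; this also accumulates
        # gradients into any parameters used inside `fn`.
        torch.autograd.backward(y, grad_output)
        return None, x.grad        # no gradient for the `fn` argument itself
```

In the three-layer example above, `CheckpointSegment.apply(lambda t: f3(f2(f1(t))), x)` stores only $x$; $A_1$ and $A_2$ exist only transiently, once in each of the two forward passes.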

## **4. Applications of Gradient Checkpointing**
Gradient checkpointing is widely used in training:
* **Very Deep Neural Networks:** Transformers, ResNets, and other architectures with many layers.
* **Large-Scale Models:** Language models, vision models, and more.
* **Memory-Constrained Environments:** When the hardware cannot hold all activations in memory at once.
* **Any Training Setup** where activation memory is the bottleneck.

Gradient checkpointing is a powerful tool to enable training of large models on limited hardware, at the cost of extra computation.
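
In practice, deep sequential stacks are usually checkpointed in evenly sized segments rather than one layer at a time. The sketch below uses PyTorch's `torch.utils.checkpoint.checkpoint_sequential`; the depth, widths, and number of segments are illustrative choices, not requirements:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# A deep trunk of 32 identical blocks, standing in for a Transformer
# or ResNet body.
blocks = [nn.Sequential(nn.Linear(256, 256), nn.ReLU()) for _ in range(32)]
model = nn.Sequential(*blocks)

x = torch.randn(64, 256, requires_grad=True)

# Split the 32 blocks into 4 checkpointed segments: only the segment
# boundary activations stay in memory; everything inside a segment is
# recomputed during the backward pass.
out = checkpoint_sequential(model, 4, x, use_reentrant=False)
out.sum().backward()
```

With $k$ segments over $N$ layers, roughly $k + N/k$ activations are alive at peak, which is minimized near $k \approx \sqrt{N}$; this is the classic sublinear-memory schedule, paid for with one extra forward pass per segment.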