
Commit 073a455

add learn about section
1 parent 7d0aeea commit 073a455

File tree: questions/195_gradient-checkpointing (1 file changed: 36 additions, 47 deletions)
@@ -1,47 +1,36 @@
## Solution Explanation

Add intuition, math, and step-by-step reasoning here.

### Writing Mathematical Expressions with LaTeX

This editor supports LaTeX for rendering mathematical equations and expressions. Here's how you can use it:

1. **Inline Math**:
   - Wrap your expression with single `$` symbols.
   - Example: `$E = mc^2$` → Renders as: ( $E = mc^2$ )

2. **Block Math**:
   - Wrap your expression with double `$$` symbols.
   - Example:
     ```
     $$
     \int_a^b f(x) \, dx
     $$
     ```
     Renders as:
     $$
     \int_a^b f(x) \, dx
     $$

3. **Math Functions**:
   - Use standard LaTeX functions like `\frac`, `\sqrt`, `\sum`, etc.
   - Examples:
     - `$\frac{a}{b}$` → ( $\frac{a}{b}$ )
     - `$\sqrt{x}$` → ( $\sqrt{x}$ )

4. **Greek Letters and Symbols**:
   - Use commands like `\alpha`, `\beta`, etc., for Greek letters.
   - Example: `$\alpha + \beta = \gamma$` → ( $\alpha + \beta = \gamma$ )

5. **Subscripts and Superscripts**:
   - Use `_{}` for subscripts and `^{}` for superscripts.
   - Examples:
     - `$x_i$` → ( $x_i$ )
     - `$x^2$` → ( $x^2$ )

6. **Combined Examples**:
   - `$\sum_{i=1}^n i^2 = \frac{n(n+1)(2n+1)}{6}$`
     Renders as:
     $\sum_{i=1}^n i^2 = \frac{n(n+1)(2n+1)}{6}$

Feel free to write your own mathematical expressions, and they will be rendered beautifully in the preview!

# **Gradient Checkpointing**

## **1. Definition**
Gradient checkpointing is a technique used in deep learning to reduce memory usage during training by selectively storing only a subset of intermediate activations (checkpoints) and recomputing the others as needed during the backward pass. This allows training of larger models or using larger batch sizes without exceeding memory limits.

## **2. Why Use Gradient Checkpointing?**

* **Reduce Memory Usage:** By storing fewer activations, memory requirements are reduced, enabling training of deeper or larger models.
* **Enable Larger Batches/Models:** Makes it possible to fit larger models or use larger batch sizes on limited hardware.
* **Tradeoff:** The main tradeoff is increased computation time, as some activations must be recomputed during the backward pass (a rough quantification follows below).
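
To get a rough feel for this tradeoff (an illustrative note, not part of the original explanation): if a checkpoint is kept every $N/k$ layers of an $N$-layer model, peak activation memory scales with the number of checkpoints plus the length of one segment. This is minimized by choosing $k \approx \sqrt{N}$, and the recomputation costs roughly one extra forward pass:

$$
\text{memory} \propto k + \frac{N}{k}, \qquad \arg\min_k \left(k + \frac{N}{k}\right) = \sqrt{N} \;\Rightarrow\; \text{memory} \propto \sqrt{N}
$$
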
## **3. Gradient Checkpointing Mechanism**

Suppose a model consists of $N$ layers, each represented by a function $f_i$. Normally, the forward pass stores all intermediate activations:

$$
A_0 = x \\
A_1 = f_1(A_0) \\
A_2 = f_2(A_1) \\
\ldots \\
A_N = f_N(A_{N-1})
$$

With gradient checkpointing, only a subset of $A_i$ are stored (the checkpoints). The others are recomputed as needed during backpropagation. In the simplest case, you can store only the input and output, and recompute all intermediates when needed.

**Example:**

If you have three functions $f_1, f_2, f_3$ and input $x$:

* Forward: $A_1 = f_1(x)$, $A_2 = f_2(A_1)$, $A_3 = f_3(A_2)$
* With checkpointing, you might store only $x$ and $A_3$, and recompute $A_1$ and $A_2$ during the backward pass, as in the sketch below.
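
A minimal sketch of this example (not part of the original text) using PyTorch's `torch.utils.checkpoint.checkpoint`; the layer shapes, batch size, and the `use_reentrant` flag are illustrative assumptions and depend on your PyTorch version:

```python
import torch
from torch.utils.checkpoint import checkpoint

# Illustrative stand-ins for f1, f2, f3; sizes are arbitrary.
f1 = torch.nn.Linear(128, 128)
f2 = torch.nn.Linear(128, 128)
f3 = torch.nn.Linear(128, 128)

def block(x):
    # A_1 and A_2 are produced here, but when the block is wrapped in
    # checkpoint() they are not kept for the backward pass.
    return f3(f2(f1(x)))

x = torch.randn(32, 128, requires_grad=True)

# Stores (roughly) only x, the checkpoint's saved input, and the output A_3;
# A_1 and A_2 are recomputed when gradients are needed.
A3 = checkpoint(block, x, use_reentrant=False)

loss = A3.sum()
loss.backward()  # re-runs block's forward internally to rebuild A_1 and A_2
```

In practice, a model is usually split into several such checkpointed segments rather than a single one, trading a bit more stored memory for less recomputation.
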
## **4. Applications of Gradient Checkpointing**

Gradient checkpointing is widely used in training:

* **Very Deep Neural Networks:** Transformers, ResNets, and other architectures with many layers.
* **Large-Scale Models:** Language models, vision models, and more.
* **Memory-Constrained Environments:** When hardware cannot fit all activations in memory.
* **Any training workload** where activation memory is the bottleneck.

Gradient checkpointing is a powerful tool to enable training of large models on limited hardware, at the cost of extra computation.
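
For deep sequential stacks like those listed above, a segment-based sketch (again an illustration, not part of the original text) can use PyTorch's `torch.utils.checkpoint.checkpoint_sequential`; the depth, width, segment count, and `use_reentrant` flag below are assumptions:

```python
import torch
from torch.utils.checkpoint import checkpoint_sequential

# A deep stack; 32 layers of width 256, chosen only for illustration.
model = torch.nn.Sequential(*[torch.nn.Linear(256, 256) for _ in range(32)])
x = torch.randn(8, 256, requires_grad=True)

# Split the stack into 4 segments: only segment-boundary activations are
# kept; activations inside each segment are recomputed during backward.
out = checkpoint_sequential(model, 4, x, use_reentrant=False)
out.sum().backward()
```

Choosing the number of segments trades the memory held at segment boundaries against the memory needed to recompute within a segment; a common rule of thumb is on the order of $\sqrt{N}$ segments for an $N$-layer stack.
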
