README.md (2 changes: 1 addition, 1 deletion)
@@ -48,7 +48,7 @@ We added a PyTorch implementation of the sliding window attention that doesn't r

**Limitations**: uses 2x more memory (but fp16 offsets that), and doesn’t support dilation and autoregressive attention (not needed for finetuning)

-therefore, it is suitable for finetuning on downstream tasks but not a good choice for language modeling. The code snippit below and the TriviaQA scripts were updated to use this new implementation.
+therefore, it is suitable for finetuning on downstream tasks but not a good choice for language modeling. The code snippet below and the TriviaQA scripts were updated to use this new implementation.

**\*\*\*\*\* End new information \*\*\*\*\***

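For context on what the hunk describes: a minimal sketch of non-dilated sliding window attention in plain PyTorch, using `Tensor.unfold` to materialize each query's key/value window. This is illustrative only, not the repo's implementation; the function name and shapes below are made up for the sketch. Materializing the (2w + 1)-wide windows is the kind of trade that costs extra memory, consistent with the limitation noted in the diff.

```python
import torch
import torch.nn.functional as F

def sliding_window_attention(q, k, v, w):
    """Illustrative sketch. q, k, v: (batch, seq_len, dim); w = one-sided window size."""
    b, n, d = q.shape
    # Pad keys/values so every query position sees exactly 2*w + 1 slots.
    k_pad = F.pad(k, (0, 0, w, w))            # (b, n + 2w, d)
    v_pad = F.pad(v, (0, 0, w, w))
    # unfold turns the sequence dim into sliding windows: (b, n, d, 2w + 1)
    k_win = k_pad.unfold(1, 2 * w + 1, 1)
    v_win = v_pad.unfold(1, 2 * w + 1, 1)
    # Scaled dot-product scores of each query against its own window.
    scores = torch.einsum('bnd,bndw->bnw', q, k_win) / d ** 0.5
    # Mask the padding slots that fall outside the real sequence.
    idx = torch.arange(n).unsqueeze(1) + torch.arange(2 * w + 1) - w
    invalid = (idx < 0) | (idx >= n)          # (n, 2w + 1)
    scores = scores.masked_fill(invalid.to(scores.device), float('-inf'))
    probs = scores.softmax(dim=-1)
    # Weighted sum over each query's value window: (b, n, d)
    return torch.einsum('bnw,bndw->bnd', probs, v_win)

# Example: one sequence of 8 tokens, dim 16, window of 2 on each side.
q = k = v = torch.randn(1, 8, 16)
out = sliding_window_attention(q, k, v, w=2)
print(out.shape)  # torch.Size([1, 8, 16])
```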