6 changes: 0 additions & 6 deletions src/input-format.md
@@ -6,12 +6,6 @@ r[input.syntax]
@root CHAR -> <a Unicode scalar value>

NUL -> U+0000

TAB -> U+0009

LF -> U+000A

CR -> U+000D
```

r[input.intro]
39 changes: 26 additions & 13 deletions src/whitespace.md
@@ -1,21 +1,34 @@
r[lex.whitespace]
# Whitespace

r[whitespace.syntax]
```grammar,lexer
@root WHITESPACE ->
// end of line
LF
| U+000B // vertical tabulation
| U+000C // form feed
| CR
| U+0085 // Unicode next line
| U+2028 // Unicode LINE SEPARATOR
| U+2029 // Unicode PARAGRAPH SEPARATOR
// Ignorable Code Point
| U+200E // Unicode LEFT-TO-RIGHT MARK
| U+200F // Unicode RIGHT-TO-LEFT MARK
// horizontal whitespace
| TAB
| U+0020 // space ' '

TAB -> U+0009 // horizontal tab ('\t')

LF -> U+000A // line feed ('\n')

CR -> U+000D // carriage return ('\r')
```

r[lex.whitespace.intro]
Whitespace is any non-empty string containing only characters that have the
[`Pattern_White_Space`] Unicode property, namely:

- `U+0009` (horizontal tab, `'\t'`)
- `U+000A` (line feed, `'\n'`)
- `U+000B` (vertical tab)
- `U+000C` (form feed)
- `U+000D` (carriage return, `'\r'`)
- `U+0020` (space, `' '`)
- `U+0085` (next line)
- `U+200E` (left-to-right mark)
- `U+200F` (right-to-left mark)
- `U+2028` (line separator)
- `U+2029` (paragraph separator)
[`Pattern_White_Space`] Unicode property.
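
As an illustration of the set described above, here is a minimal sketch of a predicate covering the eleven code points. The names `is_pattern_white_space` and `leading_whitespace_len` are hypothetical and are not taken from the reference or from any compiler source; this is one possible way to express the rule, not how any particular lexer implements it.

```rust
/// Returns `true` if `c` is one of the eleven code points listed for
/// `Pattern_White_Space`. (Hypothetical helper name, for illustration only.)
fn is_pattern_white_space(c: char) -> bool {
    matches!(
        c,
        '\u{0009}'..='\u{000D}' // tab, line feed, vertical tab, form feed, carriage return
            | '\u{0020}' // space
            | '\u{0085}' // next line
            | '\u{200E}' // left-to-right mark
            | '\u{200F}' // right-to-left mark
            | '\u{2028}' // line separator
            | '\u{2029}' // paragraph separator
    )
}

/// Returns the length in bytes of the whitespace run at the start of `s`.
/// (Hypothetical helper, sketching how a lexer might skip whitespace.)
fn leading_whitespace_len(s: &str) -> usize {
    s.char_indices()
        .find(|&(_, c)| !is_pattern_white_space(c))
        .map_or(s.len(), |(i, _)| i)
}
```

Note that this set differs from the broader Unicode `White_Space` property: for example, `U+00A0` (no-break space) is rejected by the predicate above, while `U+200E` and `U+200F` are accepted.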

r[lex.whitespace.token-sep]
Rust is a "free-form" language, meaning that all forms of whitespace serve only