Commit f4846fc

docs: document flow for parsing and compiling
1 parent 78b4fc9 commit f4846fc

1 file changed

+311
-0
lines changed

crates/djc-template-parser/README.md

Lines changed: 311 additions & 0 deletions
@@ -140,3 +140,314 @@ The parser uses Pest's declarative grammar to define Django template syntax rule
4. Add tests to both test files if needed
5. Run `maturin develop` to test your changes
6. Ensure all tests pass before submitting a PR

## On template tag parser

The template syntax parser is implemented with [Pest](https://pest.rs/). Working with Pest involves 3 parts:

1. Grammar rules: the definitions of the patterns that the template tag syntax supports. Together they form the grammar of the language.

Pest has its own language for defining these rules, see `djc-template-parser/src/grammar.pest`.

This is similar to [Backus–Naur Form](https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form), e.g.

```
<postal-address> ::= <name-part> <street-address> <zip-part>
<name-part>      ::= <personal-part> <last-name> <opt-suffix-part> <EOL> | <personal-part> <name-part>
<street-address> ::= <house-num> <street-name> <opt-apt-num> <EOL>
<zip-part>       ::= <town-name> "," <state-code> <ZIP-code> <EOL>
```

Or to MDN's formal syntax notation, e.g. [here](https://developer.mozilla.org/en-US/docs/Web/CSS/border-left-width#formal_syntax):
```
border-left-width =
  <line-width>

<line-width> =
  <length [0,∞]>  |
  thin            |
  medium          |
  thick
```

All permissible patterns are defined in this Pest grammar. For example, here is a high-level rule for a `{% ... %}` template tag (NOTE: outdated version):

```
// The full tag is a sequence of attributes
// E.g. `{% slot key=val key2=val2 %}`
tag_wrapper = { SOI ~ django_tag ~ EOI }

django_tag = { "{%" ~ tag_content ~ "%}" }

// The contents of a tag, without the delimiters
tag_content = ${
    spacing*                      // Optional leading whitespace/comments
    ~ tag_name                    // The tag name must come first, MAY be preceded by whitespace
    ~ (spacing+ ~ attribute)*     // Then zero or more attributes, MUST be separated by whitespace/comments
    ~ spacing*                    // Optional trailing whitespace/comments
    ~ self_closing_slash?         // Optional self-closing slash
    ~ spacing*                    // More optional trailing whitespace
}
```

2. Parsing and handling of the matched grammar rules.

Each rule in the grammar has its own name, e.g. `django_tag`.

When a text is parsed with Pest in Rust, the result is a `Pairs` iterator over the matched rules. Each matched rule (a `Pair`) exposes the rules nested inside it in the same way.

Since the grammar describes the entire `{% ... %}` template tag, and the input string starts with `{%` and ends with `%}`, the result should be exactly one match of the top-level `tag_wrapper` rule.

If anything else is matched in its place, we raise an error.

Once we have `tag_wrapper`, we walk down it, rule by rule, constructing the AST from the patterns we come across.
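
The walk described above can be sketched in Python. This is only an illustration: the real parser does this in Rust on Pest's parse result, the `Pair` class and `build_tag()` function are hypothetical stand-ins, and the resulting dict is far simpler than the real AST. The rule names are taken from the grammar excerpt.

```python
from dataclasses import dataclass, field


@dataclass
class Pair:
    """Hypothetical stand-in for one matched grammar rule."""
    rule: str
    text: str
    inner: list["Pair"] = field(default_factory=list)


def build_tag(pairs: list[Pair]) -> dict:
    # The top level must be exactly one `tag_wrapper` match.
    if len(pairs) != 1 or pairs[0].rule != "tag_wrapper":
        raise ValueError("expected exactly one tag_wrapper rule")

    tag = {"name": None, "attrs": [], "is_self_closing": False}

    def walk(pair: Pair) -> None:
        # Handle the rules we care about, descend into everything else.
        if pair.rule == "tag_name":
            tag["name"] = pair.text
        elif pair.rule == "attribute":
            tag["attrs"].append(pair.text)
        elif pair.rule == "self_closing_slash":
            tag["is_self_closing"] = True
        else:
            for child in pair.inner:
                walk(child)

    walk(pairs[0])
    return tag


# Hand-written parse result for `{% slot key=val / %}`
parsed = [
    Pair("tag_wrapper", "{% slot key=val / %}", [
        Pair("django_tag", "{% slot key=val / %}", [
            Pair("tag_content", " slot key=val / ", [
                Pair("tag_name", "slot"),
                Pair("attribute", "key=val"),
                Pair("self_closing_slash", "/"),
            ]),
        ]),
    ]),
]
tag = build_tag(parsed)
```

Passing anything other than a single `tag_wrapper` match raises an error, which mirrors the check described above.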

3. Constructing the AST.

The AST consists of these nodes: `Tag`, `TagAttr`, `TagToken`, `TagValue`, `TagValueFilter`.

- `Tag` - the entire `{% ... %}`, e.g. `{% my_tag x ...[1, 2, 3] key=val / %}`

  - The first word inside a `Tag` is the `tag_name`, e.g. `my_tag`.
  - After the tag name, there are zero or more `TagAttr` nodes. These cover ALL inputs, both positional and keyword.
    - Tag attrs are `x`, `...[1, 2, 3]`, `key=val`
    - If a tag attribute has a key, it is stored on the `TagAttr`.
    - But ALL `TagAttr` nodes MUST have a value.
  - `TagValue` holds a single value, which may have filters applied, e.g. `"cool"|upper`
    - A `TagValue` may be of different kinds, e.g. string, int, float, literal list, literal dict, variable, translation `_('mystr')`, etc. The specific kind is determined by which rules were matched, and the resulting `TagValue` nodes are distinguished by `ValueKind`, an enum with values like `"string"`, `"float"`, etc.
    - Since a `TagValue` can also be e.g. a literal list, `TagValue` nodes may contain other `TagValue` nodes. This implies that:
      1. Lists and dicts themselves can have filters applied to them, e.g. `[1, 2, 3]|append:4`
      2. Items inside lists and dicts can have filters applied to them too, e.g. `[1|add:1, 2|add:2]`
    - Any `TagValue` can have 0 or more filters applied to it. Filters have a name and an optional argument, e.g. `3|add:2` has filter name `add` and arg `2`. These filters are held by `TagValueFilter`.
      - While the filter name is a plain identifier, the argument can be yet another `TagValue`. So even literal lists and dicts are permitted as filter arguments, e.g. `[1]|extend:[2, 3]`

- Lastly, `TagToken` is a secondary object used by the nodes above. It contains info about the original raw string, and the line / col where the string was found.
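
To illustrate how `TagValue` nodes nest (items inside lists, and values used as filter arguments), here is a minimal Python sketch. The dataclasses are a simplified, hypothetical mirror of the Rust nodes (positions and `TagToken` are left out, and `raw` and `name` are plain strings), and `collect_variables()` is an illustrative helper, not part of the library.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TagValueFilter:
    name: str
    arg: Optional["TagValue"] = None  # The filter argument is itself a TagValue


@dataclass
class TagValue:
    raw: str
    kind: str  # e.g. "variable", "int", "list"
    children: list["TagValue"] = field(default_factory=list)
    filters: list[TagValueFilter] = field(default_factory=list)


def collect_variables(value: TagValue) -> list[str]:
    """Recursively collect variable names from a value, its items and its filter args."""
    names = [value.raw] if value.kind == "variable" else []
    for child in value.children:
        names.extend(collect_variables(child))
    for flt in value.filters:
        if flt.arg is not None:
            names.extend(collect_variables(flt.arg))
    return names


# Hand-built node for `[1|add:x, y]|extend:[z]`
value = TagValue(
    raw="[1|add:x, y]",
    kind="list",
    children=[
        TagValue("1", "int", filters=[TagValueFilter("add", TagValue("x", "variable"))]),
        TagValue("y", "variable"),
    ],
    filters=[
        TagValueFilter("extend", TagValue("[z]", "list", children=[TagValue("z", "variable")])),
    ],
)
variables = collect_variables(value)
```

Here `variables` contains `x`, `y` and `z`: one found in a filter argument of a list item, one as a list item, and one inside a list used as a filter argument.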

The final AST can look like this:

INPUT:
```django
{% my_tag value|lower %}
```

AST:
```rs
Tag {
    name: TagToken {
        token: "my_tag".to_string(),
        start_index: 3,
        end_index: 9,
        line_col: (1, 4),
    },
    attrs: vec![TagAttr {
        key: None,
        value: TagValue {
            token: TagToken {
                token: "value".to_string(),
                start_index: 10,
                end_index: 15,
                line_col: (1, 11),
            },
            children: vec![],
            spread: None,
            filters: vec![TagValueFilter {
                arg: None,
                token: TagToken {
                    token: "lower".to_string(),
                    start_index: 16,
                    end_index: 21,
                    line_col: (1, 17),
                },
                start_index: 15,
                end_index: 21,
                line_col: (1, 16),
            }],
            kind: ValueKind::Variable,
            start_index: 10,
            end_index: 21,
            line_col: (1, 11),
        },
        is_flag: false,
        start_index: 10,
        end_index: 21,
        line_col: (1, 11),
    }],
    is_self_closing: false,
    syntax: TagSyntax::Django,
    start_index: 0,
    end_index: 24,
    line_col: (1, 4),
}
```
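
The positions in this AST can be checked against the input string. Judging from the example, `start_index` and `end_index` appear to be 0-based, end-exclusive character offsets, and `line_col` appears to be a 1-based `(line, column)` pair. The Python snippet below verifies that reading for the name, value and filter positions. The `line_col()` helper is hypothetical and written only for this check.

```python
src = "{% my_tag value|lower %}"


def line_col(text, index):
    """1-based (line, column) of a 0-based character offset (hypothetical helper)."""
    line = text.count("\n", 0, index) + 1
    col = index - (text.rfind("\n", 0, index) + 1) + 1
    return (line, col)


# Expected substring -> (start_index, end_index, line_col) as listed in the AST
spans = {
    "my_tag": (3, 9, (1, 4)),
    "value": (10, 15, (1, 11)),
    "lower": (16, 21, (1, 17)),
    "|lower": (15, 21, (1, 16)),
    "value|lower": (10, 21, (1, 11)),
}

for text, (start, end, expected_line_col) in spans.items():
    assert src[start:end] == text
    assert line_col(src, start) == expected_line_col

# The tag's `end_index` equals the length of the input
assert len(src) == 24
```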


## On template tag compilation

Another important part is the "tag compiler". It turns the parsed AST into an executable Python function. When this function is called with the `Context` object, it resolves the tag's inputs into Python args and kwargs.

```py
from djc_core import parse_tag, compile_tag

ast = parse_tag('{% my_tag var1 ...[2, 3] key=val ...{"other": "x"} / %}')
tag_fn = compile_tag(ast)

args, kwargs = tag_fn({"var1": "hello", "val": "abc"})

assert args == ["hello", 2, 3]
assert kwargs == {"key": "abc", "other": "x"}
```

How it works:

1. We start with the AST of the template tag.
2. `TagAttr` nodes with keys become the function's kwargs, and `TagAttr` nodes without keys become the function's args.
3. For each `TagAttr`, we walk down its value and handle each `ValueKind` differently:
   - Literals, e.g. `1`, `1.5`, `"abc"` - compiled as literal Python values.
   - Variables, e.g. `my_var` - replaced with the function call `variable(context, "my_var")`.
   - Filters, e.g. `my_var|add:"txt"` - replaced with the function call `filter(context, "add", variable(context, "my_var"), "txt")`.
   - Translations, e.g. `_("abc")` - replaced with the function call `translation(context, "abc")`.
   - Strings with nested template tags, e.g. `"Hello {{ first_name }}"` - replaced with the function call `template_string(context, "Hello {{ first_name }}")`.
   - Literal lists and dicts - the structure is preserved, and we walk down and convert each item, key and value.

Input:

```django
{% component my_var|add:"txt" / %}
```

Generated function:

```py
def compiled_func(context, *, template_string, translation, variable, filter):
    args = []
    kwargs = []
    args.append(filter(context, 'add', variable(context, 'my_var'), "txt"))
    return args, kwargs
```
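
To see the generated function run without Django, it can be called with simple stand-ins for the injected helpers. In the sketch below, `stub_variable`, `stub_filter`, `unused` and the `FILTERS` table are hypothetical and exist only for this illustration. A plain dict stands in for the `Context` object.

```python
def compiled_func(context, *, template_string, translation, variable, filter):
    args = []
    kwargs = []
    args.append(filter(context, 'add', variable(context, 'my_var'), "txt"))
    return args, kwargs


# Hypothetical stand-ins for the helpers that django-components would inject
def stub_variable(context, name):
    return context[name]


FILTERS = {"add": lambda value, arg: value + arg}


def stub_filter(context, name, value, arg=None):
    func = FILTERS[name]
    return func(value) if arg is None else func(value, arg)


def unused(context, value):
    # Not needed for this tag, since it has no translations or nested template strings
    raise NotImplementedError


args, kwargs = compiled_func(
    {"my_var": "some "},
    template_string=unused,
    translation=unused,
    variable=stub_variable,
    filter=stub_filter,
)
```

With these stand-ins, the single positional input resolves to `"some txt"` and no kwargs are produced.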

4. Apply Django-specific logic

As you can see, the generated function receives the implementations of `variable()`, `filter()`, etc. as arguments.

This means that these functions are implemented in Python, so we can still easily change how individual features are handled. The definitions of `variable()` etc. are NOT exposed to the users of django-components.

The implementation lives in django-components, and it looks something like the code below.

There you can see, for example, that when the Rust compiler came across the variable `my_var`, it generated a `variable(...)` call, and that the implementation of `variable(...)` calls Django's `Variable(var).resolve(ctx)`.

So in the end we still use the same Django logic to resolve variables into actual values.

```py
from typing import Any

from django.template import Context, TemplateSyntaxError, Variable, VariableDoesNotExist

# NOTE: `DynamicFilterExpression`, `filters`, `tags`, `compiled_tag` and `context`
# come from the surrounding django-components code and are not shown here.

def resolve_template_string(ctx: Context, expr: str) -> Any:
    return DynamicFilterExpression(
        expr_str=expr,
        filters=filters,
        tags=tags,
    ).resolve(ctx)

def resolve_filter(_ctx: Context, name: str, value: Any, arg: Any) -> Any:
    if name not in filters:
        raise TemplateSyntaxError(f"Invalid filter: '{name}'")

    filter_func = filters[name]
    if arg is None:
        return filter_func(value)
    else:
        return filter_func(value, arg)

def resolve_variable(ctx: Context, var: str) -> Any:
    try:
        return Variable(var).resolve(ctx)
    except VariableDoesNotExist:
        return ""

def resolve_translation(ctx: Context, var: str) -> Any:
    # The compiler gives us the string stripped of the leading `_("` and the trailing `")`,
    # so we put them back for Django's Variable class to interpret it as a translation.
    translation_var = "_('" + var + "')"
    return Variable(translation_var).resolve(ctx)

args, kwargs = compiled_tag(
    context=context,
    template_string=resolve_template_string,
    variable=resolve_variable,
    translation=resolve_translation,
    filter=resolve_filter,
)
```

5. Call the component with the args and kwargs

The compiled function returns the resolved args and kwargs. We then simply pass these on to the implementation of the `{% component %}` node.

So a template tag like this:

```django
{% component "my_table" var1 ...[2, 3] key=val ...{"other": "x"} / %}
```

eventually gets resolved to something like this:

```py
ComponentNode.render("my_table", var1, 2, 3, key=val, other="x")
```
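
As a rough sketch of this last step, assume the kwargs arrive as a list of `(key, value)` pairs, as in the generated function shown earlier. They can then be forwarded with Python's unpacking syntax. The `render()` function is a hypothetical stand-in for the component's entry point, and the values are the ones from the example context (`var1` is `"hello"`, `val` is `"abc"`).

```python
def render(name, *args, **kwargs):
    """Hypothetical stand-in for the component's render entry point."""
    return name, args, kwargs


# What the compiled function might return for
# `{% component "my_table" var1 ...[2, 3] key=val ...{"other": "x"} / %}`
args = ["my_table", "hello", 2, 3]
kwargs = [("key", "abc"), ("other", "x")]

# Positional inputs are unpacked with `*`, keyword inputs with `**`
result = render(*args, **dict(kwargs))
```

Note that `dict(kwargs)` keeps only the last value for a repeated key. How the real implementation treats duplicate keys is not covered here.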

**Validation**

The template tag inputs follow Python's convention of not allowing positional args after kwargs.

When compiling the AST into a Python function, we can detect the obvious cases and raise an error early, e.g.:

```django
{% component key=val my_var / %} {# Error! #}
```
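
A compile-time check of this kind could look roughly like the Python sketch below. It is only an illustration of the idea: the real check lives in the Rust compiler, and `check_arg_order()` and its `(key, raw_value, is_spread)` tuples are hypothetical. Spreads are skipped because their kind is not known until render time.

```python
def check_arg_order(attrs):
    """Raise if a positional attr follows a keyword attr.

    `attrs` is a list of (key, raw_value, is_spread) tuples,
    where `key` is None for positional attrs.
    """
    kwarg_seen = False
    for key, raw_value, is_spread in attrs:
        if is_spread:
            # Could be a list or a dict, so this is only known at render time
            continue
        if key is not None:
            kwarg_seen = True
        elif kwarg_seen:
            raise SyntaxError(
                f"positional argument '{raw_value}' follows keyword argument"
            )


# `{% component my_var key=val / %}` is fine
check_arg_order([(None, "my_var", False), ("key", "val", False)])

# `{% component key=val my_var / %}` is rejected
try:
    check_arg_order([("key", "val", False), (None, "my_var", False)])
    rejected = False
except SyntaxError:
    rejected = True
```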
However, some cases can only be detected at render time, because the spread syntax `...my_var` can be used with either a list of args or a dict of kwargs.

So we need to wait for the Context object to find out whether this:
```django
{% component ...items my_var / %}
```
resolves to a list (OK):
```django
{% component ...[1, 2, 3] my_var / %}
```
or to a dict (Error):
```django
{% component ...{"key": "x"} my_var / %}
```

So when we detect a spread within the template tag, we add a render-time function that checks whether the spread resolves to a list or a dict, and raises an error if that is not permitted at its position:

INPUT:
```django
{% component key1="value1" ...options1 key2="value2" ...options2 / %}
```

Generated function:
```py
def compiled_func(context, *, template_string, translation, variable, filter):
    def _handle_spread(value, raw_token_str, args, kwargs, kwarg_seen):
        if hasattr(value, "keys"):
            kwargs.extend(value.items())
            return True
        else:
            if kwarg_seen:
                raise SyntaxError("positional argument follows keyword argument")
            try:
                args.extend(value)
            except TypeError:
                raise TypeError(
                    f"Value of '...{raw_token_str}' must be a mapping or an iterable, "
                    f"not {type(value).__name__}."
                )
            return False

    args = []
    kwargs = []
    kwargs.append(('key1', "value1"))
    kwarg_seen = True
    kwarg_seen = _handle_spread(variable(context, 'options1'), """options1""", args, kwargs, kwarg_seen)
    kwargs.append(('key2', "value2"))
    kwarg_seen = _handle_spread(variable(context, 'options2'), """options2""", args, kwargs, kwarg_seen)
    return args, kwargs
```
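
The render-time behaviour can be tried out in isolation. The sketch below copies the spread handler from the generated function into a standalone `handle_spread()` and wraps it in a hypothetical `render()` that mimics `{% component ...items my_var / %}`, with `my_var` already resolved to `"x"`. The guard before `args.append` is an assumption about how a positional argument after a spread would be checked. It is not taken from the generated code shown here.

```python
def handle_spread(value, raw_token_str, args, kwargs, kwarg_seen):
    if hasattr(value, "keys"):
        # Mapping: spread into kwargs, and remember that a kwarg was seen
        kwargs.extend(value.items())
        return True
    if kwarg_seen:
        raise SyntaxError("positional argument follows keyword argument")
    try:
        # Iterable: spread into args
        args.extend(value)
    except TypeError:
        raise TypeError(
            f"Value of '...{raw_token_str}' must be a mapping or an iterable, "
            f"not {type(value).__name__}."
        )
    return False


def render(items):
    """Mimics `{% component ...items my_var / %}` (hypothetical wrapper)."""
    args, kwargs = [], []
    kwarg_seen = handle_spread(items, "items", args, kwargs, False)
    # Assumed guard for the positional `my_var` that follows the spread
    if kwarg_seen:
        raise SyntaxError("positional argument follows keyword argument")
    args.append("x")
    return args, kwargs


ok_args, ok_kwargs = render([1, 2, 3])  # The spread resolves to a list: OK

try:
    render({"key": "x"})  # The spread resolves to a dict: error
    dict_rejected = False
except SyntaxError:
    dict_rejected = True
```

A value that is neither a mapping nor an iterable, e.g. `render(5)`, raises the `TypeError` from the handler.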
