<instantiating-crate> := <path-prefix>
```
### Punycode Identifiers
Punycode generates strings of the form `([[:ascii:]]+-)?[[:alnum:]]+`. This is problematic for two reasons:
- Generated strings can contain a `-` character, which is not in the supported character set.
- Generated strings can start with a digit, which makes them clash with the byte-count prefix of the `<identifier>` production.

For these reasons, vanilla Punycode strings are further encoded during mangling:
- The `-` character is simply replaced by a `_` character.
- The part of the Punycode string that encodes the non-ASCII characters is a base-36 number, using `[a-z0-9]` as its "digits". We want to get rid of the decimal digits in there, so we simply remap `0-9` to `A-J`.

With this post-processing in place, Punycode strings can be treated like regular identifiers and need no further special handling.
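The two post-processing rules can be sketched in Rust as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the input is assumed to already be a valid vanilla Punycode string (producing one is out of scope), and the name `postprocess_punycode` is hypothetical. It relies on the Punycode convention (RFC 3492) that the basic ASCII characters precede the *last* `-`, so only characters after that delimiter are remapped.

```rust
/// Hypothetical helper: applies the mangling post-processing to a string
/// that is assumed to already be vanilla Punycode.
fn postprocess_punycode(punycode: &str) -> String {
    // Basic ASCII characters come before the *last* `-`; everything after it
    // is the base-36 encoded part. Without a `-`, the whole string is encoded.
    let encoded_start = punycode.rfind('-').map_or(0, |pos| pos + 1);

    punycode
        .char_indices()
        .map(|(i, c)| match c {
            // Rule 1: `-` is not in the supported character set.
            '-' => '_',
            // Rule 2: in the encoded part, remap the decimal "digits" 0-9 to A-J.
            '0'..='9' if i >= encoded_start => (b'A' + (c as u8 - b'0')) as char,
            other => other,
        })
        .collect()
}
```

For example, `münchen` is encoded by vanilla Punycode as `mnchen-3ya`, which this sketch turns into `mnchen_Dya`: the delimiter becomes `_` and the encoded digit `3` becomes `D`.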
## Compression
The compression algorithm is defined in terms of the AST: starting at the root, recursively replace each child node with its compressed version. A node is compressed by replacing it with a `<substitution>` node from the dictionary, which will contain a matching entry if an *equivalent* node has already been encountered. If the dictionary doesn't contain a matching substitution, compression is instead applied recursively to all child nodes, and then the current node is added to the dictionary.

Things to note:
- Child nodes have to be compressed in the same order in which they lexically occur in the mangled name. Processing order matters because it defines which substitution indices are allocated for which node.
- Nodes are "equivalent" if they result in the *same demangling*. Usually, this means that equivalence can be tested by simply comparing the subtrees rooted at the two nodes. However, there are some *additional* equivalences that have to be considered when doing a dictionary lookup:

  - An `<absolute-path>` node is equivalent to its `<path-prefix>` child node if its `<generic-arguments>` child node is empty.

  - A `<path-root>` node of the form `M <type>` is equivalent to its `<type>` child node.

  - A `<type>` node with a single `<absolute-path>` child is equivalent to this child node.

All productions that have a `<substitution>` on their right-hand side are added to the substitution dictionary: `<absolute-path>`, `<path-prefix>`, and `<type>`. The only exception is `<type>` nodes that consist of a `<basic-type>`; these are not added to the dictionary. Also, if the dictionary already contains a node `Y` that is equivalent to a node `X`, then `X` is not added either. For example, `<absolute-path>` nodes with empty `<generic-arguments>` are never added to the dictionary, because it always already contains the `<path-prefix>` child node that is equivalent to its parent `<absolute-path>`.

TODO: add pseudo code implementation?
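A minimal Rust sketch of the dictionary-driven traversal described above. It uses a hypothetical toy AST (`Node`) rather than the real grammar: `Path` stands in for the substitutable productions, `BasicType` is never added to the dictionary, and equivalence is plain structural equality, so the additional equivalences listed above are *not* modelled. Indices are allocated in the order in which nodes are finished: children left to right, then the parent.

```rust
use std::collections::HashMap;

/// Hypothetical toy AST, standing in for the real mangling grammar.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Node {
    /// An identifier leaf; never substituted in this sketch.
    Ident(String),
    /// A basic type such as `u8`; never added to the dictionary.
    BasicType(&'static str),
    /// Stands in for `<path-prefix>` / `<absolute-path>` / `<type>`:
    /// a node with children that *is* added to the dictionary.
    Path(Vec<Node>),
    /// A `<substitution>` back-reference to dictionary entry N.
    Subst(usize),
}

/// Compresses `node`, recording substitutable nodes in `dict`.
fn compress(node: &Node, dict: &mut HashMap<Node, usize>) -> Node {
    match node {
        Node::Path(children) => {
            // Structural equality only; the extra equivalences are not modelled.
            if let Some(&index) = dict.get(node) {
                return Node::Subst(index);
            }
            // Compress children in the order they occur in the mangled name.
            let compressed: Vec<Node> = children.iter().map(|c| compress(c, dict)).collect();
            // Only after all children are done does the parent get its index.
            let next = dict.len();
            dict.insert(node.clone(), next);
            Node::Path(compressed)
        }
        // Leaves and basic types are emitted as-is and never recorded.
        other => other.clone(),
    }
}
```

With `ab` standing for a path `a::b`, compressing `Path([ab, u8, ab])` yields `Path([ab, u8, Subst(0)])`: the first `ab` is recorded at index 0, the second one becomes a back-reference, and the root itself is recorded at index 1.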
## Decompression
### Note on Efficient Demangling
## Mapping Rust Items to Mangled Names
# Drawbacks
[drawbacks]: #drawbacks
# Unresolved questions
[unresolved-questions]: #unresolved-questions
# Appendix A - Suggested Demangling
This RFC suggests that names be demangled to a form that matches Rust syntax as it is used in source code and compiler error messages:
- Path components should be separated by `::`.
- If the path root is a `<crate-id>` it should be printed as the crate name. If the context requires it for correctness, the crate disambiguator should be printed too, as in, for example, `std[a0b1c2d3]::collections::HashMap`. In this case `a0b1c2d3` would be the disambiguator. Usually, the disambiguator can be omitted for better readability.
- If the path root is a trait impl, it should be printed as `<SelfType as Trait>`, like the compiler does in error messages.
- The list of generic arguments should be demangled as `<T1, T2, T3>`.
- Identifiers and trait impl path roots can have a numeric disambiguator (the `<disambiguator>` production). The syntactic version of the numeric disambiguator maps to a numeric index. If the disambiguator is not present, this index is 0. If it is of the form `s_`, the index is 1. If it is of the form `s<hex-digit>_`, the index is `<hex-digit> + 2`. The suggested demangling of a disambiguator is `'<index>`. However, for better readability, these disambiguators should usually be omitted from the demangling altogether. Disambiguators with index zero can always be omitted.

  Closures are the exception here. Since they do not have a name, the disambiguator is the only thing identifying them. The suggested demangling for closures is thus `{closure}'<index>`.
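The suggestions above can be sketched as a small Rust printer. It is illustrative only: the `Root` type and the `render`, `disambiguator_index`, and `closure_name` functions are hypothetical names, generic arguments are simply appended to the end of the path, and `<hex-digit>` is read as a hexadecimal number of one or more digits.

```rust
/// Hypothetical model of a demangled path root.
enum Root {
    /// A `<crate-id>` root: crate name plus optional disambiguator.
    Crate { name: String, disambiguator: Option<String> },
    /// A trait impl root, printed as `<SelfType as Trait>`.
    TraitImpl { self_type: String, trait_name: String },
}

/// Renders a path: root, then `::`-separated components, then `<T1, T2, ...>`.
fn render(root: &Root, components: &[&str], generic_args: &[&str], show_crate_disambiguator: bool) -> String {
    let mut out = match root {
        // Print the crate disambiguator only when the caller asks for it.
        Root::Crate { name, disambiguator: Some(d) } if show_crate_disambiguator => format!("{name}[{d}]"),
        Root::Crate { name, .. } => name.clone(),
        Root::TraitImpl { self_type, trait_name } => format!("<{self_type} as {trait_name}>"),
    };
    for component in components {
        out.push_str("::");
        out.push_str(component);
    }
    if !generic_args.is_empty() {
        out.push('<');
        out.push_str(&generic_args.join(", "));
        out.push('>');
    }
    out
}

/// Maps the syntactic `<disambiguator>` to its numeric index:
/// absent -> 0, `s_` -> 1, `s<hex>_` -> hex + 2. Returns `None` if malformed.
fn disambiguator_index(disambiguator: Option<&str>) -> Option<u64> {
    let text = match disambiguator {
        None => return Some(0),
        Some(text) => text,
    };
    let hex = text.strip_prefix('s')?.strip_suffix('_')?;
    if hex.is_empty() {
        return Some(1);
    }
    if !hex.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    u64::from_str_radix(hex, 16).ok()?.checked_add(2)
}

/// Closures have no name, so their disambiguator index is always printed.
fn closure_name(index: u64) -> String {
    format!("{{closure}}'{index}")
}
```

Whether the crate disambiguator appears is left to the caller through `show_crate_disambiguator`, mirroring the suggestion that it be printed only when the context requires it; for example, `disambiguator_index(Some("sa_"))` gives 12, since hexadecimal `a` is 10.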