Commit 836fc86

Switch from hex to base-62 as the non-decimal encoding.
1 parent 40037cf commit 836fc86

1 file changed: 6 additions and 5 deletions

text/0000-symbol-name-mangling-v2.md

Lines changed: 6 additions & 5 deletions
@@ -162,6 +162,7 @@ _RN12std_a1b2c3d43mem8align_ofVIjEE12foo_a1b2c3d4 (for crate "foo/a1b2c3d4")
 _RN12std_a1b2c3d43mem8align_ofVIjEE12bar_11223344 (for crate "bar/11223344")
 ```

+
 ### Closures and Closure Environments

 The scheme needs to be able to generate symbol names for the function containing the code of a closure, and it needs to be able to refer to the type of a closure if it occurs as a type argument. As closures don't have a name, we need to generate one. The scheme proposes to use the namespace and disambiguation mechanisms already introduced above for this purpose. Closures get their own "namespace" (i.e. they are neither in the type nor the value namespace), and each closure has an empty name with a disambiguation index (like for macro hygiene) identifying it within its parent. The full name of a closure is then constructed as for any other named item:
@@ -180,6 +181,7 @@ mod foo {

 In the above example we have two closures, the one assigned to `a` and the one assigned to `b`. The first one would get the local name `0C` and the second one the name `0Cs_`. The `0` signifies the length of their (empty) name. The `C` is the namespace tag, analogous to the `V` tag for the value namespace. The `s_` for the second closure is the disambiguation index (index `0` is, again, encoded by not appending a suffix). Their full names would then be `N15mycrate_4a3b56d3foo3barV0CE` and `N15mycrate_4a3b56d3foo3barV0Cs_E` respectively.

+
 ### Methods

 Methods are nested within `impl` or `trait` items. As such it would be possible to construct their symbol names as paths like `my_crate::foo::{{impl}}::some_method` where `{{impl}}` somehow identifies the `impl` in question. Since `impl`s don't have names, we'd have to use an indexing scheme like the one used for closures (and indeed, this is what the compiler does internally). Adding in generic arguments too, this would lead to symbol names looking like `my_crate::foo::impl'17::<u32, char>::some_method`.
@@ -265,6 +267,7 @@ impl<T: Default> Foo<T> for Bar<T> {

 Notice that both `MSG` statics have the path `<Bar as Foo>::foo::MSG` if you just leave off the type arguments. However, we also don't have any concrete types to substitute for the arguments. Therefore, we have to disambiguate the `impls`. Since trait specialization is an unstable feature of Rust and the details are in flux, this RFC does not try to provide a mangling based on the `where` clauses of the specialized `impls`. Instead it proposes a scheme that re-uses the numeric disambiguator form already introduced for macro hygiene and closures. Thus, conflicting `impls` would be disambiguated via an implementation-defined suffix, as in `<Bar as Foo>'1::foo::MSG` and `<Bar as Foo>'2::foo::MSG`. This encoding introduces minimal additional syntax and can be replaced with something more human-readable once the definition of trait specialization is final.

+
 ### Unicode Identifiers

 Rust allows Unicode identifiers but our character set is restricted to ASCII alphanumerics and `_`. In order to transcode the former to the latter, we use the same approach as Swift, which is: encode all non-ASCII identifiers via [Punycode][punycode], a standardized and efficient encoding that keeps encoded strings in a rather human-readable format. So for example, the string
@@ -378,9 +381,7 @@ The reference-level explanation consists of three parts:
 2. A specification of the compression scheme.
 3. A mapping of Rust entities to the mangling syntax.

-For implementing a demangler, only the first to sections are needed, that is, a
-demangler only needs to understand syntax and compression of names, but it does
-not have to care how the compiler generates mangled names.
+For implementing a demangler, only the first two sections are of interest, that is, a demangler only needs to understand syntax and compression of names, but it does not have to care about how the compiler generates mangled names.


 ## Syntax Of Mangled Names
@@ -471,11 +472,11 @@ Mangled names conform to the following grammar:
 "u" // Unadjusted
 )

-<disambiguator> = "s" [<hex-digit>] "_"
+<disambiguator> = "s" [<base-62-digit>] "_"

 <generic-arguments> = "I" {<type>} "E"

-<substitution> = "S" [<hex-digit>] "_"
+<substitution> = "S" [<base-62-digit>] "_"

 // We use <path-prefix> here, so that we don't have to add a special rule for
 // compression. In practice, only <identifier> is expected.
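
To illustrate what the switch means for the `<disambiguator>` rule, here is a minimal Rust sketch of a base-62 disambiguator encoder. It is not taken from the RFC or the compiler: the helper names `to_base_62` and `encode_disambiguator`, the digit order (`0-9`, then `a-z`, then `A-Z`), and the use of more than one digit for large indices are all assumptions. The only behavior grounded in the RFC text is that index `0` produces no suffix and the next index produces `s_`; mapping the index after that to `s0_` is likewise an assumption.

```rust
/// Digit alphabet assumed for this sketch: 0-9, then a-z, then A-Z.
const BASE_62_DIGITS: &[u8; 62] =
    b"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

/// Hypothetical helper: renders `value` in base 62, most significant digit first.
fn to_base_62(mut value: u64) -> String {
    if value == 0 {
        return "0".to_string();
    }
    let mut digits = Vec::new();
    while value > 0 {
        digits.push(BASE_62_DIGITS[(value % 62) as usize]);
        value /= 62;
    }
    digits.reverse();
    String::from_utf8(digits).expect("alphabet is pure ASCII")
}

/// Hypothetical helper: encodes a disambiguation index as a suffix.
/// Index 0 gets no suffix and index 1 gets `s_`, as in the closure example.
/// For larger indices this sketch assumes `s` + base-62(index - 2) + `_`.
fn encode_disambiguator(index: u64) -> String {
    match index {
        0 => String::new(),
        1 => "s_".to_string(),
        n => format!("s{}_", to_base_62(n - 2)),
    }
}
```

Under these assumptions, `encode_disambiguator(63)` yields `sZ_`, a single digit where a hex encoding of the same value (61, or `3d`) would need two, which is the kind of size saving the larger alphabet is meant to give.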
