Commit 09862a7

sembr backend/libs-and-metadata.md
1 parent 4ac8f2d commit 09862a7

1 file changed: +77 -69 lines changed

src/backend/libs-and-metadata.md

Lines changed: 77 additions & 69 deletions
@@ -1,36 +1,41 @@
 # Libraries and metadata

 When the compiler sees a reference to an external crate, it needs to load some
-information about that crate. This chapter gives an overview of that process,
+information about that crate.
+This chapter gives an overview of that process,
 and the supported file formats for crate libraries.

 ## Libraries

-A crate dependency can be loaded from an `rlib`, `dylib`, or `rmeta` file. A
-key point of these file formats is that they contain `rustc`-specific
-[*metadata*](#metadata). This metadata allows the compiler to discover enough
+A crate dependency can be loaded from an `rlib`, `dylib`, or `rmeta` file.
+A key point of these file formats is that they contain `rustc`-specific
+[*metadata*](#metadata).
+This metadata allows the compiler to discover enough
 information about the external crate to understand the items it contains,
 which macros it exports, and *much* more.

 ### rlib

-An `rlib` is an [archive file], which is similar to a tar file. This file
-format is specific to `rustc`, and may change over time. This file contains:
+An `rlib` is an [archive file], which is similar to a tar file.
+This file format is specific to `rustc`, and may change over time.
+This file contains:

-* Object code, which is the result of code generation. This is used during
-regular linking. There is a separate `.o` file for each [codegen unit]. The
-codegen step can be skipped with the [`-C
+* Object code, which is the result of code generation.
+This is used during regular linking.
+There is a separate `.o` file for each [codegen unit].
+The codegen step can be skipped with the [`-C
 linker-plugin-lto`][linker-plugin-lto] CLI option, which means each `.o`
 file will only contain LLVM bitcode.
 * [LLVM bitcode], which is a binary representation of LLVM's intermediate
-representation, which is embedded as a section in the `.o` files. This can
-be used for [Link Time Optimization] (LTO). This can be removed with the
+representation, which is embedded as a section in the `.o` files.
+This can be used for [Link Time Optimization] (LTO).
+This can be removed with the
 [`-C embed-bitcode=no`][embed-bitcode] CLI option to improve compile times
 and reduce disk space if LTO is not needed.
 * `rustc` [metadata], in a file named `lib.rmeta`.
 * A symbol table, which is essentially a list of symbols with offsets to the
-object files that contain that symbol. This is pretty standard for archive
-files.
+object files that contain that symbol.
+This is pretty standard for archive files.

 [archive file]: https://en.wikipedia.org/wiki/Ar_(Unix)
 [LLVM bitcode]: https://llvm.org/docs/BitCodeFormat.html
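
To make the "archive file" description of an `rlib` more concrete, here is a minimal sketch that walks the member headers of a Unix `ar` archive held in memory. It is not how `rustc` reads rlibs. It assumes the common layout of an 8-byte `!<arch>\n` magic followed by 60-byte member headers, and it does not decode long-name extensions, which real archives often use. The function names and the member names in the demo (`lib.rmeta/`, `demo.cgu0.o/`) are hypothetical.

```rust
/// Lists `(name, size)` for each member of a Unix `ar` archive.
/// Sketch only: long-name extensions are not decoded, and the symbol
/// table member (named `/`) would show up with an empty name.
fn list_ar_members(data: &[u8]) -> Result<Vec<(String, usize)>, String> {
    const MAGIC: &[u8] = b"!<arch>\n";
    const HEADER_LEN: usize = 60;

    if !data.starts_with(MAGIC) {
        return Err("missing ar magic".to_string());
    }
    let mut members = Vec::new();
    let mut pos = MAGIC.len();
    while pos + HEADER_LEN <= data.len() {
        let header = &data[pos..pos + HEADER_LEN];
        // Every member header ends with a backtick and a newline.
        if &header[58..60] != b"`\n" {
            return Err(format!("bad header terminator at offset {pos}"));
        }
        // Bytes 0..16 hold the name, padded with spaces (GNU style adds a trailing '/').
        let name = String::from_utf8_lossy(&header[0..16])
            .trim_end()
            .trim_end_matches('/')
            .to_string();
        // Bytes 48..58 hold the member size as decimal text.
        let size: usize = String::from_utf8_lossy(&header[48..58])
            .trim()
            .parse()
            .map_err(|e| format!("bad size field: {e}"))?;
        members.push((name, size));
        // Member data is padded so the next header starts at an even offset.
        pos += HEADER_LEN + size + (size % 2);
    }
    Ok(members)
}

/// Builds one archive member (header plus body) for the demo below.
fn ar_member(name: &str, body: &[u8]) -> Vec<u8> {
    // Fields: name, mtime, uid, gid, mode, size, then the terminator.
    let mut out = format!(
        "{:<16}{:<12}{:<6}{:<6}{:<8}{:<10}`\n",
        name, 0, 0, 0, 644, body.len()
    )
    .into_bytes();
    out.extend_from_slice(body);
    if body.len() % 2 == 1 {
        out.push(b'\n');
    }
    out
}

fn main() {
    // Hypothetical member names, chosen to fit the 16-byte name field.
    let mut archive = b"!<arch>\n".to_vec();
    archive.extend(ar_member("lib.rmeta/", b"meta!"));
    archive.extend(ar_member("demo.cgu0.o/", b"obj0"));

    for (name, size) in list_ar_members(&archive).unwrap() {
        println!("{name}: {size} bytes");
    }
}
```

The demo builds its own tiny archive so that the sketch does not depend on any file on disk.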
@@ -41,46 +46,46 @@ format is specific to `rustc`, and may change over time. This file contains:

 ### dylib

-A `dylib` is a platform-specific shared library. It includes the `rustc`
-[metadata] in a special link section called `.rustc`.
+A `dylib` is a platform-specific shared library.
+It includes the `rustc` [metadata] in a special link section called `.rustc`.

 ### rmeta

-An `rmeta` file is a custom binary format that contains the [metadata] for the
-crate. This file can be used for fast "checks" of a project by skipping all code
+An `rmeta` file is a custom binary format that contains the [metadata] for the crate.
+This file can be used for fast "checks" of a project by skipping all code
 generation (as is done with `cargo check`), collecting enough information for
 documentation (as is done with `cargo doc`), or for [pipelining](#pipelining).
 This file is created if the [`--emit=metadata`][emit] CLI option is used.

-`rmeta` files do not support linking, since they do not contain compiled
-object files.
+`rmeta` files do not support linking, since they do not contain compiled object files.

 [emit]: https://doc.rust-lang.org/rustc/command-line-arguments.html#option-emit

 ## Metadata

-The metadata contains a wide swath of different elements. This guide will not go
-into detail about every field it contains. You are encouraged to browse the
+The metadata contains a wide swath of different elements.
+This guide will not go into detail about every field it contains.
+You are encouraged to browse the
 [`CrateRoot`] definition to get a sense of the different elements it contains.
-Everything about metadata encoding and decoding is in the [`rustc_metadata`]
-package.
+Everything about metadata encoding and decoding is in the [`rustc_metadata`] package.

 Here are a few highlights of things it contains:

-* The version of the `rustc` compiler. The compiler will refuse to load files
-from any other version.
-* The [Strict Version Hash](#strict-version-hash) (SVH). This helps ensure the
-correct dependency is loaded.
-* The [Stable Crate Id](#stable-crate-id). This is a hash used
-to identify crates.
-* Information about all the source files in the library. This can be used for
-a variety of things, such as diagnostics pointing to sources in a
+* The version of the `rustc` compiler.
+The compiler will refuse to load files from any other version.
+* The [Strict Version Hash](#strict-version-hash) (SVH).
+This helps ensure the correct dependency is loaded.
+* The [Stable Crate Id](#stable-crate-id).
+This is a hash used to identify crates.
+* Information about all the source files in the library.
+This can be used for a variety of things, such as diagnostics pointing to sources in a
 dependency.
-* Information about exported macros, traits, types, and items. Generally,
-anything that's needed to be known when a path references something inside a
-crate dependency.
-* Encoded [MIR]. This is optional, and only encoded if needed for code
-generation. `cargo check` skips this for performance reasons.
+* Information about exported macros, traits, types, and items.
+Generally,
+anything that's needed to be known when a path references something inside a crate dependency.
+* Encoded [MIR].
+This is optional, and only encoded if needed for code generation.
+`cargo check` skips this for performance reasons.

 [`CrateRoot`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/rmeta/struct.CrateRoot.html
 [`rustc_metadata`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/index.html
@@ -89,10 +94,10 @@ Here are a few highlights of things it contains:
 ### Strict Version Hash

 The Strict Version Hash ([SVH], also known as the "crate hash") is a 64-bit
-hash that is used to ensure that the correct crate dependencies are loaded. It
-is possible for a directory to contain multiple copies of the same dependency
-built with different settings, or built from different sources. The crate
-loader will skip any crates that have the wrong SVH.
+hash that is used to ensure that the correct crate dependencies are loaded.
+It is possible for a directory to contain multiple copies of the same dependency
+built with different settings, or built from different sources.
+The crate loader will skip any crates that have the wrong SVH.

 The SVH is also used for the [incremental compilation] session filename,
 though that usage is mostly historic.
@@ -114,14 +119,15 @@ See [`compute_hir_hash`] for where the hash is actually computed.
 ### Stable Crate Id

 The [`StableCrateId`] is a 64-bit hash used to identify different crates with
-potentially the same name. It is a hash of the crate name and all the
-[`-C metadata`] CLI options computed in [`StableCrateId::new`]. It is
-used in a variety of places, such as symbol name mangling, crate loading, and
+potentially the same name.
+It is a hash of the crate name and all the
+[`-C metadata`] CLI options computed in [`StableCrateId::new`].
+It is used in a variety of places, such as symbol name mangling, crate loading, and
 much more.

 By default, all Rust symbols are mangled and incorporate the stable crate id.
-This allows multiple versions of the same crate to be included together. Cargo
-automatically generates `-C metadata` hashes based on a variety of factors, like
+This allows multiple versions of the same crate to be included together.
+Cargo automatically generates `-C metadata` hashes based on a variety of factors, like
 the package version, source, and target kind (a lib and test can have the same
 crate name, so they need to be disambiguated).
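
The idea of hashing the crate name together with every `-C metadata` value can be sketched as follows. This is an illustration, not the code in `StableCrateId::new`: the function `sketch_crate_id` is hypothetical, the sort-and-deduplicate step is an assumption made so that flag order does not matter, and `DefaultHasher` is a stand-in whose algorithm is not guaranteed to stay the same across Rust releases.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a stable crate id: a 64-bit hash of the
/// crate name and all `-C metadata` values.
fn sketch_crate_id(crate_name: &str, metadata: &[&str]) -> u64 {
    // Assumption: normalize the values so that their order on the
    // command line does not change the result.
    let mut values: Vec<&str> = metadata.to_vec();
    values.sort_unstable();
    values.dedup();

    let mut hasher = DefaultHasher::new();
    crate_name.hash(&mut hasher);
    for value in &values {
        value.hash(&mut hasher);
    }
    hasher.finish()
}

fn main() {
    // Same crate name, different metadata: a lib target and a test target.
    let lib = sketch_crate_id("demo", &["1.0.0", "lib"]);
    let test = sketch_crate_id("demo", &["1.0.0", "test"]);
    println!("lib:  {lib:016x}");
    println!("test: {test:016x}");
}
```

The two ids differ even though the crate name is the same, which is the property that lets a lib and a test with the same name be told apart.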

@@ -131,30 +137,31 @@ crate name, so they need to be disambiguated).

 ## Crate loading

-Crate loading can have quite a few subtle complexities. During [name
-resolution], when an external crate is referenced (via an `extern crate` or
+Crate loading can have quite a few subtle complexities.
+During [name resolution], when an external crate is referenced (via an `extern crate` or
 path), the resolver uses the [`CStore`] which is responsible for finding
-the crate libraries and loading the [metadata] for them. After the dependency
-is loaded, the `CStore` will provide the information the resolver needs
+the crate libraries and loading the [metadata] for them.
+After the dependency is loaded, the `CStore` will provide the information the resolver needs
 to perform its job (such as expanding macros, resolving paths, etc.).

 To load each external crate, the `CStore` uses a [`CrateLocator`] to
-actually find the correct files for one specific crate. There is some great
-documentation in the [`locator`] module that goes into detail on how loading
+actually find the correct files for one specific crate.
+There is some great documentation in the [`locator`] module that goes into detail on how loading
 works, and I strongly suggest reading it to get the full picture.

-The location of a dependency can come from several different places. Direct
-dependencies are usually passed with `--extern` flags, and the loader can look
-at those directly. Direct dependencies often have references to their own
-dependencies, which need to be loaded, too. These are usually found by
+The location of a dependency can come from several different places.
+Direct dependencies are usually passed with `--extern` flags, and the loader can look
+at those directly.
+Direct dependencies often have references to their own dependencies, which need to be loaded, too.
+These are usually found by
 scanning the directories passed with the `-L` flag for any file whose metadata
-contains a matching crate name and [SVH](#strict-version-hash). The loader
-will also look at the [sysroot] to find dependencies.
+contains a matching crate name and [SVH](#strict-version-hash).
+The loader will also look at the [sysroot] to find dependencies.

 As crates are loaded, they are kept in the [`CStore`] with the crate metadata
-wrapped in the [`CrateMetadata`] struct. After resolution and expansion, the
-`CStore` will make its way into the [`GlobalCtxt`] for the rest of the
-compilation.
+wrapped in the [`CrateMetadata`] struct.
+After resolution and expansion, the
+`CStore` will make its way into the [`GlobalCtxt`] for the rest of the compilation.

 [name resolution]: ../name-resolution.md
 [`CrateLocator`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/locator/struct.CrateLocator.html
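
The filtering described in the crate loading section (matching crate name and SVH, and refusing metadata written by another compiler version) can be sketched as below. This is a simplified illustration and not the `CrateLocator` API: the `Candidate` struct, its fields, and the `locate` function are hypothetical, and the candidates are assumed to have already been read from the `-L` directories.

```rust
/// Hypothetical summary of what was read from one candidate file's metadata.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    path: String,
    crate_name: String,
    svh: u64,
    rustc_version: String,
}

/// Returns the first candidate whose metadata matches the requested
/// crate name, SVH, and compiler version. Everything else is skipped.
fn locate<'a>(
    candidates: &'a [Candidate],
    name: &str,
    svh: u64,
    rustc_version: &str,
) -> Option<&'a Candidate> {
    candidates.iter().find(|c| {
        c.crate_name == name && c.rustc_version == rustc_version && c.svh == svh
    })
}

fn main() {
    // Two copies of the same dependency, built with different settings.
    let found_in_search_path = vec![
        Candidate {
            path: "deps/libdemo-aaaa.rmeta".to_string(),
            crate_name: "demo".to_string(),
            svh: 0x1111,
            rustc_version: "rustc-x".to_string(),
        },
        Candidate {
            path: "deps/libdemo-bbbb.rmeta".to_string(),
            crate_name: "demo".to_string(),
            svh: 0x2222,
            rustc_version: "rustc-x".to_string(),
        },
    ];

    match locate(&found_in_search_path, "demo", 0x2222, "rustc-x") {
        Some(c) => println!("loading {}", c.path),
        None => println!("no matching crate found"),
    }
}
```

Returning only the first match is a simplification made for this sketch.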
@@ -167,20 +174,21 @@ compilation.
 ## Pipelining

 One trick to improve compile times is to start building a crate as soon as the
-metadata for its dependencies is available. For a library, there is no need to
-wait for the code generation of dependencies to finish. Cargo implements this
-technique by telling `rustc` to emit an [`rmeta`](#rmeta) file for each
-dependency as well as an [`rlib`](#rlib). As early as it can, `rustc` will
-save the `rmeta` file to disk before it continues to the code generation
-phase. The compiler sends a JSON message to let the build tool know that it
+metadata for its dependencies is available.
+For a library, there is no need to wait for the code generation of dependencies to finish.
+Cargo implements this technique by telling `rustc` to emit an [`rmeta`](#rmeta) file for each
+dependency as well as an [`rlib`](#rlib).
+As early as it can, `rustc` will
+save the `rmeta` file to disk before it continues to the code generation phase.
+The compiler sends a JSON message to let the build tool know that it
 can start building the next crate if possible.

 The [crate loading](#crate-loading) system is smart enough to know when it
-sees an `rmeta` file to use that if the `rlib` is not there (or has only been
-partially written).
+sees an `rmeta` file to use that if the `rlib` is not there (or has only been partially written).

 This pipelining isn't possible for binaries, because the linking phase will
-require the code generation of all its dependencies. In the future, it may be
+require the code generation of all its dependencies.
+In the future, it may be
 possible to further improve this scenario by splitting linking into a separate
 command (see [#64191]).
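
A small worked example may help show why pipelining saves time. The sketch below models a linear chain of library crates, each depending on the previous one, and compares the finish time when a crate must wait for its dependency's code generation against the finish time when it only waits for the dependency's metadata. The `Step` struct, the function name, and all timings are invented for illustration, and the model assumes enough parallel jobs that nothing else limits the build.

```rust
/// Hypothetical cost of building one library crate, in seconds.
struct Step {
    metadata_secs: u32,
    codegen_secs: u32,
}

/// Time at which every crate in a linear dependency chain has finished.
/// With `pipelined == true`, a crate starts as soon as its dependency's
/// metadata is ready. Otherwise it waits for the dependency's codegen too.
fn chain_finish_time(chain: &[Step], pipelined: bool) -> u32 {
    let mut can_start: u32 = 0; // earliest start of the next crate
    let mut finish: u32 = 0; // latest codegen completion seen so far
    for step in chain {
        let metadata_done = can_start + step.metadata_secs;
        let all_done = metadata_done + step.codegen_secs;
        finish = finish.max(all_done);
        can_start = if pipelined { metadata_done } else { all_done };
    }
    finish
}

fn main() {
    // Three crates, each needing 2s to produce metadata and 3s of codegen.
    let chain = [
        Step { metadata_secs: 2, codegen_secs: 3 },
        Step { metadata_secs: 2, codegen_secs: 3 },
        Step { metadata_secs: 2, codegen_secs: 3 },
    ];
    println!("sequential: {}s", chain_finish_time(&chain, false));
    println!("pipelined:  {}s", chain_finish_time(&chain, true));
}
```

With these made-up numbers the sequential build finishes at 15 seconds and the pipelined one at 9 seconds, because each crate's code generation overlaps with the next crate's front-end work. A final binary would not benefit in the same way, since linking needs the code generation of all dependencies.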