Commit ea28b17

Merge pull request #2661 from rust-lang/tshepang/sembr
sembr backend/libs-and-metadata.md
2 parents 4ac8f2d + caafc24 commit ea28b17

1 file changed

+78
-71
lines changed

src/backend/libs-and-metadata.md

Lines changed: 78 additions & 71 deletions
@@ -1,36 +1,40 @@
11
# Libraries and metadata
22

33
When the compiler sees a reference to an external crate, it needs to load some
4-
information about that crate. This chapter gives an overview of that process,
4+
information about that crate.
5+
This chapter gives an overview of that process,
56
and the supported file formats for crate libraries.
67

78
## Libraries
89

9-
A crate dependency can be loaded from an `rlib`, `dylib`, or `rmeta` file. A
10-
key point of these file formats is that they contain `rustc`-specific
11-
[*metadata*](#metadata). This metadata allows the compiler to discover enough
10+
A crate dependency can be loaded from an `rlib`, `dylib`, or `rmeta` file.
11+
A key point of these file formats is that they contain `rustc`-specific
12+
[*metadata*](#metadata).
13+
This metadata allows the compiler to discover enough
1214
information about the external crate to understand the items it contains,
1315
which macros it exports, and *much* more.
1416

1517
### rlib
1618

17-
An `rlib` is an [archive file], which is similar to a tar file. This file
18-
format is specific to `rustc`, and may change over time. This file contains:
19+
An `rlib` is an [archive file], which is similar to a tar file.
20+
This file format is specific to `rustc`, and may change over time.
21+
This file contains:
1922

20-
* Object code, which is the result of code generation. This is used during
21-
regular linking. There is a separate `.o` file for each [codegen unit]. The
22-
codegen step can be skipped with the [`-C
23-
linker-plugin-lto`][linker-plugin-lto] CLI option, which means each `.o`
24-
file will only contain LLVM bitcode.
23+
* Object code, which is the result of code generation.
24+
This is used during regular linking.
25+
There is a separate `.o` file for each [codegen unit].
26+
The codegen step can be skipped with the [`-C linker-plugin-lto`][linker-plugin-lto] CLI option,
27+
which means each `.o` file will only contain LLVM bitcode.
2528
* [LLVM bitcode], which is a binary representation of LLVM's intermediate
26-
representation, which is embedded as a section in the `.o` files. This can
27-
be used for [Link Time Optimization] (LTO). This can be removed with the
29+
representation, which is embedded as a section in the `.o` files.
30+
This can be used for [Link Time Optimization] (LTO).
31+
This can be removed with the
2832
[`-C embed-bitcode=no`][embed-bitcode] CLI option to improve compile times
2933
and reduce disk space if LTO is not needed.
3034
* `rustc` [metadata], in a file named `lib.rmeta`.
3135
* A symbol table, which is essentially a list of symbols with offsets to the
32-
object files that contain that symbol. This is pretty standard for archive
33-
files.
36+
object files that contain that symbol.
37+
This is pretty standard for archive files.
3438

3539
[archive file]: https://en.wikipedia.org/wiki/Ar_(Unix)
3640
[LLVM bitcode]: https://llvm.org/docs/BitCodeFormat.html
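Since the text above describes an `rlib` as an archive file in the Unix `ar` family, a quick way to sanity-check a candidate file is to look for the `ar` global header. The following is a minimal sketch of that check only. The function name `looks_like_ar_archive` is hypothetical, and the sketch does not inspect members such as `lib.rmeta` or the `.o` files.

```rust
/// The 8-byte global header that starts a Unix `ar` archive.
const AR_MAGIC: &[u8; 8] = b"!<arch>\n";

/// Returns true if `bytes` begins with the `ar` global header.
///
/// Hypothetical helper for illustration: it only checks the container
/// format and says nothing about which members the archive holds.
fn looks_like_ar_archive(bytes: &[u8]) -> bool {
    bytes.starts_with(AR_MAGIC)
}
```

In practice one would read the first eight bytes of the file (for example with `std::fs::read`) and pass them to this function before attempting any further parsing.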
@@ -41,46 +45,46 @@ format is specific to `rustc`, and may change over time. This file contains:
4145

4246
### dylib
4347

44-
A `dylib` is a platform-specific shared library. It includes the `rustc`
45-
[metadata] in a special link section called `.rustc`.
48+
A `dylib` is a platform-specific shared library.
49+
It includes the `rustc` [metadata] in a special link section called `.rustc`.
4650

4751
### rmeta
4852

49-
An `rmeta` file is a custom binary format that contains the [metadata] for the
50-
crate. This file can be used for fast "checks" of a project by skipping all code
53+
An `rmeta` file is a custom binary format that contains the [metadata] for the crate.
54+
This file can be used for fast "checks" of a project by skipping all code
5155
generation (as is done with `cargo check`), collecting enough information for
5256
documentation (as is done with `cargo doc`), or for [pipelining](#pipelining).
5357
This file is created if the [`--emit=metadata`][emit] CLI option is used.
5458

55-
`rmeta` files do not support linking, since they do not contain compiled
56-
object files.
59+
`rmeta` files do not support linking, since they do not contain compiled object files.
5760

5861
[emit]: https://doc.rust-lang.org/rustc/command-line-arguments.html#option-emit
5962

6063
## Metadata
6164

62-
The metadata contains a wide swath of different elements. This guide will not go
63-
into detail about every field it contains. You are encouraged to browse the
65+
The metadata contains a wide swath of different elements.
66+
This guide will not go into detail about every field it contains.
67+
You are encouraged to browse the
6468
[`CrateRoot`] definition to get a sense of the different elements it contains.
65-
Everything about metadata encoding and decoding is in the [`rustc_metadata`]
66-
package.
69+
Everything about metadata encoding and decoding is in the [`rustc_metadata`] package.
6770

6871
Here are a few highlights of things it contains:
6972

70-
* The version of the `rustc` compiler. The compiler will refuse to load files
71-
from any other version.
72-
* The [Strict Version Hash](#strict-version-hash) (SVH). This helps ensure the
73-
correct dependency is loaded.
74-
* The [Stable Crate Id](#stable-crate-id). This is a hash used
75-
to identify crates.
76-
* Information about all the source files in the library. This can be used for
77-
a variety of things, such as diagnostics pointing to sources in a
73+
* The version of the `rustc` compiler.
74+
The compiler will refuse to load files from any other version.
75+
* The [Strict Version Hash](#strict-version-hash) (SVH).
76+
This helps ensure the correct dependency is loaded.
77+
* The [Stable Crate Id](#stable-crate-id).
78+
This is a hash used to identify crates.
79+
* Information about all the source files in the library.
80+
This can be used for a variety of things, such as diagnostics pointing to sources in a
7881
dependency.
79-
* Information about exported macros, traits, types, and items. Generally,
80-
anything that's needed to be known when a path references something inside a
81-
crate dependency.
82-
* Encoded [MIR]. This is optional, and only encoded if needed for code
83-
generation. `cargo check` skips this for performance reasons.
82+
* Information about exported macros, traits, types, and items.
83+
Generally,
84+
anything that's needed to be known when a path references something inside a crate dependency.
85+
* Encoded [MIR].
86+
This is optional, and only encoded if needed for code generation.
87+
`cargo check` skips this for performance reasons.
8488

8589
[`CrateRoot`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/rmeta/struct.CrateRoot.html
8690
[`rustc_metadata`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/index.html
@@ -89,10 +93,10 @@ Here are a few highlights of things it contains:
8993
### Strict Version Hash
9094

9195
The Strict Version Hash ([SVH], also known as the "crate hash") is a 64-bit
92-
hash that is used to ensure that the correct crate dependencies are loaded. It
93-
is possible for a directory to contain multiple copies of the same dependency
94-
built with different settings, or built from different sources. The crate
95-
loader will skip any crates that have the wrong SVH.
96+
hash that is used to ensure that the correct crate dependencies are loaded.
97+
It is possible for a directory to contain multiple copies of the same dependency
98+
built with different settings, or built from different sources.
99+
The crate loader will skip any crates that have the wrong SVH.
96100

97101
The SVH is also used for the [incremental compilation] session filename,
98102
though that usage is mostly historic.
@@ -114,14 +118,15 @@ See [`compute_hir_hash`] for where the hash is actually computed.
114118
### Stable Crate Id
115119

116120
The [`StableCrateId`] is a 64-bit hash used to identify different crates with
117-
potentially the same name. It is a hash of the crate name and all the
118-
[`-C metadata`] CLI options computed in [`StableCrateId::new`]. It is
119-
used in a variety of places, such as symbol name mangling, crate loading, and
121+
potentially the same name.
122+
It is a hash of the crate name and all the
123+
[`-C metadata`] CLI options computed in [`StableCrateId::new`].
124+
It is used in a variety of places, such as symbol name mangling, crate loading, and
120125
much more.
121126

122127
By default, all Rust symbols are mangled and incorporate the stable crate id.
123-
This allows multiple versions of the same crate to be included together. Cargo
124-
automatically generates `-C metadata` hashes based on a variety of factors, like
128+
This allows multiple versions of the same crate to be included together.
129+
Cargo automatically generates `-C metadata` hashes based on a variety of factors, like
125130
the package version, source, and target kind (a lib and test can have the same
126131
crate name, so they need to be disambiguated).
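
To make the idea of "a hash of the crate name and all the `-C metadata` options" concrete, here is a minimal sketch. It is not the code in `StableCrateId::new`: the function name `sketch_crate_id` is hypothetical, the standard library's `DefaultHasher` stands in for whatever hasher the compiler uses, and the sort-and-deduplicate step is an assumption made so that flag order does not affect the result.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Illustrative stand-in for a stable crate id: a 64-bit hash of the crate
/// name and its `-C metadata` values. This is NOT rustc's implementation.
fn sketch_crate_id(crate_name: &str, metadata: &[&str]) -> u64 {
    // Assumption: sort and deduplicate so the order of flags is irrelevant.
    let mut values: Vec<&str> = metadata.to_vec();
    values.sort_unstable();
    values.dedup();

    let mut hasher = DefaultHasher::new();
    crate_name.hash(&mut hasher);
    for value in &values {
        value.hash(&mut hasher);
    }
    hasher.finish()
}
```

Two builds of a crate with the same name but different `-C metadata` values would get different ids under this sketch, which is the property that lets multiple versions of one crate coexist in a single link.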
127132

@@ -131,30 +136,31 @@ crate name, so they need to be disambiguated).
131136

132137
## Crate loading
133138

134-
Crate loading can have quite a few subtle complexities. During [name
135-
resolution], when an external crate is referenced (via an `extern crate` or
139+
Crate loading can have quite a few subtle complexities.
140+
During [name resolution], when an external crate is referenced (via an `extern crate` or
136141
path), the resolver uses the [`CStore`] which is responsible for finding
137-
the crate libraries and loading the [metadata] for them. After the dependency
138-
is loaded, the `CStore` will provide the information the resolver needs
142+
the crate libraries and loading the [metadata] for them.
143+
After the dependency is loaded, the `CStore` will provide the information the resolver needs
139144
to perform its job (such as expanding macros, resolving paths, etc.).
140145

141146
To load each external crate, the `CStore` uses a [`CrateLocator`] to
142-
actually find the correct files for one specific crate. There is some great
143-
documentation in the [`locator`] module that goes into detail on how loading
147+
actually find the correct files for one specific crate.
148+
There is some great documentation in the [`locator`] module that goes into detail on how loading
144149
works, and I strongly suggest reading it to get the full picture.
145150

146-
The location of a dependency can come from several different places. Direct
147-
dependencies are usually passed with `--extern` flags, and the loader can look
148-
at those directly. Direct dependencies often have references to their own
149-
dependencies, which need to be loaded, too. These are usually found by
151+
The location of a dependency can come from several different places.
152+
Direct dependencies are usually passed with `--extern` flags, and the loader can look
153+
at those directly.
154+
Direct dependencies often have references to their own dependencies, which need to be loaded, too.
155+
These are usually found by
150156
scanning the directories passed with the `-L` flag for any file whose metadata
151-
contains a matching crate name and [SVH](#strict-version-hash). The loader
152-
will also look at the [sysroot] to find dependencies.
157+
contains a matching crate name and [SVH](#strict-version-hash).
158+
The loader will also look at the [sysroot] to find dependencies.
153159

154160
As crates are loaded, they are kept in the [`CStore`] with the crate metadata
155-
wrapped in the [`CrateMetadata`] struct. After resolution and expansion, the
156-
`CStore` will make its way into the [`GlobalCtxt`] for the rest of the
157-
compilation.
161+
wrapped in the [`CrateMetadata`] struct.
162+
After resolution and expansion, the
163+
`CStore` will make its way into the [`GlobalCtxt`] for the rest of the compilation.
158164

159165
[name resolution]: ../name-resolution.md
160166
[`CrateLocator`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/locator/struct.CrateLocator.html
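
The search for transitive dependencies described above (scan the `-L` directories, keep files whose metadata has a matching crate name and SVH) can be sketched as a simple filter. The `Candidate` struct and `matching_candidates` function are hypothetical names for illustration; the real logic lives in the `locator` module and handles many more cases.

```rust
/// Hypothetical summary of what a loader learns from one file's metadata.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    path: String,
    crate_name: String,
    svh: u64,
}

/// Keeps only the candidates whose crate name and SVH both match.
/// Files with the right name but the wrong SVH are skipped.
fn matching_candidates<'a>(
    found: &'a [Candidate],
    crate_name: &str,
    svh: u64,
) -> Vec<&'a Candidate> {
    found
        .iter()
        .filter(|c| c.crate_name == crate_name && c.svh == svh)
        .collect()
}
```

This shows why a directory may safely hold several copies of the same dependency built with different settings: only the copy with the expected SVH survives the filter.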
@@ -167,20 +173,21 @@ compilation.
167173
## Pipelining
168174

169175
One trick to improve compile times is to start building a crate as soon as the
170-
metadata for its dependencies is available. For a library, there is no need to
171-
wait for the code generation of dependencies to finish. Cargo implements this
172-
technique by telling `rustc` to emit an [`rmeta`](#rmeta) file for each
173-
dependency as well as an [`rlib`](#rlib). As early as it can, `rustc` will
174-
save the `rmeta` file to disk before it continues to the code generation
175-
phase. The compiler sends a JSON message to let the build tool know that it
176+
metadata for its dependencies is available.
177+
For a library, there is no need to wait for the code generation of dependencies to finish.
178+
Cargo implements this technique by telling `rustc` to emit an [`rmeta`](#rmeta) file for each
179+
dependency as well as an [`rlib`](#rlib).
180+
As early as it can, `rustc` will
181+
save the `rmeta` file to disk before it continues to the code generation phase.
182+
The compiler sends a JSON message to let the build tool know that it
176183
can start building the next crate if possible.
177184

178185
The [crate loading](#crate-loading) system is smart enough to know when it
179-
sees an `rmeta` file to use that if the `rlib` is not there (or has only been
180-
partially written).
186+
sees an `rmeta` file to use that if the `rlib` is not there (or has only been partially written).
181187

182188
This pipelining isn't possible for binaries, because the linking phase will
183-
require the code generation of all its dependencies. In the future, it may be
189+
require the code generation of all its dependencies.
190+
In the future, it may be
184191
possible to further improve this scenario by splitting linking into a separate
185192
command (see [#64191]).
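
The scheduling rule behind pipelining can be sketched as a small readiness check. This is a simplified model, not Cargo's scheduler: the `DepState` and `TargetKind` enums and the `can_start` function are hypothetical names, and the sketch only encodes the two rules stated above (a library needs its dependencies' metadata, a binary needs their finished code generation).

```rust
/// Hypothetical progress of one dependency's build.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DepState {
    NotStarted,
    MetadataReady, // the `rmeta` file has been written
    CodegenDone,   // the `rlib` is complete
}

/// Hypothetical kind of the crate we want to start building.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TargetKind {
    Library,
    Binary,
}

/// Returns true if a crate of the given kind may start building.
fn can_start(kind: TargetKind, deps: &[DepState]) -> bool {
    deps.iter().all(|dep| match kind {
        // A library only needs metadata from each dependency.
        TargetKind::Library => {
            matches!(dep, DepState::MetadataReady | DepState::CodegenDone)
        }
        // A binary must link, so every dependency needs finished codegen.
        TargetKind::Binary => *dep == DepState::CodegenDone,
    })
}
```

Under this model a library downstream of a slow-to-codegen dependency can begin as soon as that dependency's `rmeta` exists, while a binary still waits for the full `rlib`.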
186193
