
Soundness fix: respect read_scalar errors in read_from_const_alloc. #353


Merged
merged 2 commits into from
Jul 30, 2025


eddyb
Collaborator

@eddyb eddyb commented Jul 29, 2025

(If you've seen me mention that various work in this area, like #348, or even #350 and #341 which have already landed, is somewhat of a prerequisite for some bug fix, or at least arose while working on it: this is that fix. I decided it is far too important to block on any other improvements, so this PR contains the most minimal form of it I can think of. The benefits of the extra work are mostly diagnostics, but correctness comes first.)

The method read_from_const_alloc (which gains an _at suffix in this PR, i.e. read_from_const_alloc_at) is responsible for reading the components of a SPIR-V constant of some type ty at an offset in some constant alloc (i.e. a miri/CTFE memory allocation, so a mix of plain bytes, symbolic pointers, and uninitialized memory).


The first problem was that it took offset by &mut, and relied on mutating it both for auto-advancing (e.g. between the components of a SPIR-V vector/matrix) and for some ad-hoc checks.

So the first commit in this PR refactors it to:

  • strongly rely on passing down a read-only offset (structs/arrays adding field/element sub-offsets to it)
  • have only 4 groups of cases: primitive (int/float/pointer leaves), structs, array(-like), and unsupported
  • return an overall Size for the constant value that was read, alongside that value
    • for sized types ty, this is guaranteed (and checked) to equal the size of ty
      (i.e. cx.lookup_type(ty).sizeof(cx) == Some(read_size))
    • for unsized types ty, this mimics Rust mem::size_of_val,
      (if ty is, or ends in [T], this will fit as many T elements as possible in alloc.size(),
      after offset, so it'll almost always be the whole alloc, minus at most a gap smaller than T)
  • replace the separate create_const_alloc function (which used to check the final offset w/ an assert_eq!) with an Option-returning try_read_from_const_alloc that checks the read Size against alloc.size()
    • the main reason to check the size is to avoid truncating some &CONST because of pointer casts
      (e.g. if ARRAY[i] is equivalent to *ARRAY.as_ptr().add(i), and .as_ptr() is a &[T; N] -> *T cast,
      you really don't want that to become *(const { &{ARRAY[0]} } as *const T).add(i) and UB for i > 0)
    • the opportunistic read_from_const_alloc in const_bitcast (the main way &CONST gets a type) already
      fits the conditional nature of try_read_from_const_alloc (and other refactors break w/o such a check)
    • only non-&CONST use of create_const_alloc was for the initializer of statics, and that can always
      unwrap the try_read_from_const_alloc (initializer alloc is always the size of the static's type)
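
The shape of that refactor can be illustrated with a toy model. This is only a sketch under stated assumptions: `Ty`, `Value`, `sizeof_ty`, `read_at` and `try_read_whole` are hypothetical stand-ins (plain bytes only, no pointers or uninitialized memory), not the actual rustc_codegen_spirv code. It shows a read-only offset being passed down, the read size being returned alongside the value, and a whole-alloc check that refuses truncating reads.

```rust
// Toy model only: every item here is a hypothetical stand-in, not the
// actual rustc_codegen_spirv code.

#[derive(Debug, Clone, PartialEq)]
enum Ty {
    U8,
    U32,
    /// `(field offset, field type)` pairs, plus the total struct size.
    Struct(Vec<(usize, Ty)>, usize),
    /// Sized array `[T; N]`.
    Array(Box<Ty>, usize),
    /// Unsized slice `[T]`.
    Slice(Box<Ty>),
}

#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(u64),
    Composite(Vec<Value>),
}

/// Size of a sized type, or `None` for unsized ones.
fn sizeof_ty(ty: &Ty) -> Option<usize> {
    match ty {
        Ty::U8 => Some(1),
        Ty::U32 => Some(4),
        Ty::Struct(_, size) => Some(*size),
        Ty::Array(elem, len) => Some(sizeof_ty(elem)? * *len),
        Ty::Slice(_) => None,
    }
}

/// Reads a `ty` at `offset` (which is never mutated), returning the value
/// together with the number of bytes it covers.
fn read_at(alloc: &[u8], offset: usize, ty: &Ty) -> (Value, usize) {
    let (value, size) = match ty {
        // Primitive leaves (little-endian integers in this toy model).
        Ty::U8 | Ty::U32 => {
            let size = sizeof_ty(ty).unwrap();
            let mut buf = [0u8; 8];
            buf[..size].copy_from_slice(&alloc[offset..offset + size]);
            (Value::Int(u64::from_le_bytes(buf)), size)
        }
        // Structs add each field's sub-offset to the incoming offset.
        Ty::Struct(fields, size) => {
            let vals = fields
                .iter()
                .map(|(field_offset, field_ty)| read_at(alloc, offset + *field_offset, field_ty).0)
                .collect();
            (Value::Composite(vals), *size)
        }
        // Array-like: an unsized `[T]` fits as many `T` as remain after `offset`.
        Ty::Array(elem, _) | Ty::Slice(elem) => {
            let stride = sizeof_ty(elem).expect("element types must be sized");
            let len = match ty {
                Ty::Array(_, len) => *len,
                _ => (alloc.len() - offset) / stride,
            };
            let vals = (0..len).map(|i| read_at(alloc, offset + i * stride, elem).0).collect();
            (Value::Composite(vals), len * stride)
        }
    };
    // For sized types, the read size must equal the size of the type.
    if let Some(expected) = sizeof_ty(ty) {
        assert_eq!(size, expected);
    }
    (value, size)
}

/// Returns the value only if reading it covered the entire allocation.
fn try_read_whole(alloc: &[u8], ty: &Ty) -> Option<Value> {
    let (value, size) = read_at(alloc, 0, ty);
    (size == alloc.len()).then_some(value)
}
```

In this model, reading a 12-byte alloc as a single `Ty::U32` yields `None` from `try_read_whole`, which is the analogue of refusing to turn a `&[T; N]` constant into a constant holding only its first element.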

Most of that refactor isn't, strictly speaking, necessary right now (other than making the code less fragile/error-prone), but it's a much cleaner solution than all the workarounds I had previously come up with, downstream of the soundness fix (and e.g. #348 + calling const_bitcast from pointercast in more cases).


The big soundness issue, however, was that read_from_const_alloc, for primitive (int/float/pointer) leaves, would call alloc.read_scalar(offset, ...), and treat Err(_) as "undef value at that location".

But a whole undef value is a very specific case, while the returned AllocError can indicate:

  • some bytes of the value are uninitialized
    • only if every single byte of the value is uninitialized, can the value be undef
  • some bytes of the value are part of a pointer, and either:
    • a non-pointer (int/float) type is being read
    • a pointer type is being read, but the read range only partially covers the pointer
      (i.e. alloc has a pointer that starts just before/after offset, but not at offset exactly)

Unsoundness arises from a spurious undef (OpUndef in SPIR-V) being emitted instead of reporting an error, because undef is designed to be ignored by optimizations (or even routine transformations like control-flow structurization), and treated like it can take on any value (i.e. it makes it UB to care about the exact value).

Even worse, Rust-GPU is prone to attempt to represent constant data as e.g. [u32; N], and if the alloc contains any pointers, reading them as u32 will result in Err(AllocError::ReadPointerAsInt(_)), and before this PR the pointers would silently be ignored and turned into uninitialized memory.

So the second commit in this PR actually handles the AllocError, and only uses a plain undef when all bytes are uninitialized, all other cases being errors - with the caveat that doing more work to produce the correct constant may be possible in some cases, but I haven't put too much effort into it.
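
As a rough illustration of that case analysis, here is a toy model. It is a sketch only: `Byte`, `LeafTy`, `Leaf`, `PTR_SIZE` and `read_leaf` are hypothetical names, the 4-byte pointer size is an assumption, and the real code works with rustc's `Allocation::read_scalar` and `AllocError` rather than a byte slice. The `PtrToInt` case models the one special case described in the next paragraph.

```rust
// Toy model only: all names here are hypothetical, and pointers are assumed
// to be 4 bytes wide. Leaves are assumed to be at most 8 bytes.

const PTR_SIZE: usize = 4;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Byte {
    Uninit,
    Init(u8),
    /// Byte number `index` of the symbolic pointer identified by `ptr`.
    PtrPart { ptr: u32, index: u8 },
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum LeafTy {
    Int,
    Ptr,
}

#[derive(Debug, PartialEq)]
enum Leaf {
    Int(u64),
    /// Only valid when every byte of the value is uninitialized.
    Undef,
    WholePtr(u32),
    /// A whole pointer read at an integer type (a ptr->int bitcast).
    PtrToInt(u32),
    Error(&'static str),
}

/// Little-endian integer value, if every byte is plain initialized data.
fn plain_int(bytes: &[Byte]) -> Option<u64> {
    let mut value = 0u64;
    for (i, b) in bytes.iter().enumerate() {
        match b {
            Byte::Init(x) => value |= (*x as u64) << (8 * i),
            _ => return None,
        }
    }
    Some(value)
}

/// The pointer id, if `bytes` is exactly one whole pointer, in order.
fn whole_ptr(bytes: &[Byte]) -> Option<u32> {
    let Byte::PtrPart { ptr, index: 0 } = *bytes.first()? else {
        return None;
    };
    let whole = bytes.len() == PTR_SIZE
        && bytes
            .iter()
            .enumerate()
            .all(|(i, b)| *b == Byte::PtrPart { ptr, index: i as u8 });
    whole.then_some(ptr)
}

fn read_leaf(bytes: &[Byte], ty: LeafTy) -> Leaf {
    if bytes.iter().all(|b| *b == Byte::Uninit) {
        return Leaf::Undef;
    }
    if bytes.iter().any(|b| *b == Byte::Uninit) {
        return Leaf::Error("some (but not all) bytes are uninitialized");
    }
    if let Some(value) = plain_int(bytes) {
        return Leaf::Int(value);
    }
    match (whole_ptr(bytes), ty) {
        (Some(ptr), LeafTy::Ptr) => Leaf::WholePtr(ptr),
        (Some(ptr), LeafTy::Int) => Leaf::PtrToInt(ptr),
        (None, _) => Leaf::Error("read range only partially covers a pointer"),
    }
}
```

The pre-fix behavior corresponds to mapping every non-`Int`, non-`WholePtr` outcome above to `Leaf::Undef`, which is what silently discarded pointers and partially initialized values.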

For now, the one special case is that it does try to turn "whole pointer attempted to be read as a usize" errors into ptr->int const_bitcasts (of the actual pointers) instead, which doesn't do much in terms of debuggability, just yet, but future work to improve const_bitcast does help here.

In theory, OpSpecConstantOp would let us represent e.g. only some bits being OpUndef/some pointer, by mixing constants using bitwise ops (e.g. (undef << 24) | ((ptr as u32) >> 8)), but it's more likely we'll first get more untyped constant data, than ever need this.
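
To see where an expression of that shape comes from, here is a small worked example using ordinary integers in place of symbolic values. It assumes 32-bit little-endian pointers, and `straddling_read` and `mixed` are hypothetical helper names: a `u32` read that starts one byte into a pointer takes its low three bytes from the pointer's upper three bytes, and its top byte from whatever follows the pointer.

```rust
/// Reads a little-endian `u32` starting one byte into the 4 bytes of `ptr`,
/// so that the top byte comes from the byte that follows the pointer.
fn straddling_read(ptr: u32, next_byte: u8) -> u32 {
    let p = ptr.to_le_bytes();
    u32::from_le_bytes([p[1], p[2], p[3], next_byte])
}

/// The same value, expressed with the bitwise ops from the text above.
fn mixed(ptr: u32, next_byte: u8) -> u32 {
    ((next_byte as u32) << 24) | (ptr >> 8)
}
```

With `ptr = 0xAABBCCDD` and `next_byte = 0x11`, both functions return `0x11AABBCC`. In the constant case, `next_byte` would be the undef part and `ptr` a symbolic pointer, which is why expressing it would need spec-constant ops rather than plain arithmetic.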

@eddyb eddyb enabled auto-merge July 29, 2025 19:59
Comment on lines -293 to +301
-    let value = if offset.bytes() == 0 {
-        base_addr
-    } else {
-        self.tcx
-            .dcx()
-            .fatal("Non-zero scalar_to_backend ptr.offset not supported")
-        // let offset = self.constant_bit64(ptr.offset.bytes());
-        // self.gep(base_addr, once(offset))
-    };
-    if let Primitive::Pointer(_) = layout.primitive() {
-        assert_ty_eq!(self, value.ty, ty);
-        value
-    } else {
-        self.tcx
-            .dcx()
-            .fatal("Non-pointer-typed scalar_to_backend Scalar::Ptr not supported");
-        // unsafe { llvm::LLVMConstPtrToInt(llval, llty) }
-    }
+    self.const_bitcast(self.const_ptr_byte_offset(base_addr, offset), ty)
Collaborator Author


This drive-by change is the minimal subset of backporting this (not yet landed) upstream PR:

I already have later changes to this code that bring it even closer to that version, but I only did this part for the special-case of "reading a Scalar::Ptr as an integer", because the old assert_ty_eq! would fail (while compiling libcore, IIRC?) even though the whole point is to end in a const_bitcast regardless of what ty is.

Collaborator

@LegNeato LegNeato left a comment


More straightforward than the other ones! I agree, a result would be better, but I wouldn't block landing this on it.

/// returning that constant if its size covers the entirety of `alloc`.
//
// FIXME(eddyb) should this use something like `Result<_, PartialRead>`?
pub fn try_read_from_const_alloc(
Collaborator

@LegNeato LegNeato Jul 30, 2025


Yeah, I think this would be better with something like Result<(SpirvValue, Size), ReadConstError>

enum ReadConstError {
    PartialRead {
        read: Size,
        expected: Size,
    },
    Unsupported(&'static str),
    InvalidLayout(String),
    Zombie(String),
}

and having the other fns return it too, giving something like:

match read_from_const_alloc_at(...) {
   Ok((val, size)) if size == alloc.size() => Ok(val),
   Ok((_, size)) => Err(ReadConstError::PartialRead {
       read: size,
       expected: alloc.size(),
   }),
   Err(e) => Err(e),
}

Collaborator Author


There are good reasons for the way this works, for the two kinds of "errors":

  • individual errors read_from_const_alloc_at turns into "zombies"
    • these are like Rust diagnostics (but deferred in the "zombie" way)
    • eventually all of these will go away anyway (worst case showing up as qptr diagnostics)
    • Result is an anti-pattern for diagnostics, because it stops at the first error
      (which can be fine for leaves, but anything recursive runs into "error buffering" needs)
  • None potentially being returned by try_read_from_const_alloc
    • this isn't even an error, maybe I should've named it try_read_whole_const_alloc
    • the size check prevents the opportunistic replacement of &CONST_A w/ &CONST_B
      (if CONST_B is effectively some prefix of CONST_A, i.e. a truncation)
    • if we wanted to e.g. make const_fold_load more flexible, this wouldn't be used
      (instead, read_from_const_alloc_at would be called directly and always succeed)
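
The "error buffering" point can be sketched with a toy recursive reader. This is an illustration under assumptions, not the Rust-GPU zombie mechanism itself: `Node`, `read_result` and `read_deferred` are hypothetical names, and a `None` placeholder plus a `Vec` of messages stands in for zombie values with deferred diagnostics.

```rust
// Toy model only: a tree of leaves that either read fine or fail.
#[derive(Debug)]
enum Node {
    Leaf(Result<u32, &'static str>),
    Composite(Vec<Node>),
}

/// `Result`-style: `?` stops at the first failing leaf, so later errors in
/// the same constant are never reported.
fn read_result(node: &Node) -> Result<Vec<u32>, &'static str> {
    match node {
        Node::Leaf(r) => Ok(vec![(*r)?]),
        Node::Composite(children) => {
            let mut out = Vec::new();
            for child in children {
                out.extend(read_result(child)?);
            }
            Ok(out)
        }
    }
}

/// Deferred style: a failing leaf becomes a placeholder (`None`), its message
/// is recorded, and the traversal continues, so every error is collected.
fn read_deferred(node: &Node, errors: &mut Vec<&'static str>) -> Vec<Option<u32>> {
    match node {
        Node::Leaf(Ok(v)) => vec![Some(*v)],
        Node::Leaf(Err(e)) => {
            errors.push(*e);
            vec![None]
        }
        Node::Composite(children) => {
            let mut out = Vec::new();
            for child in children {
                out.extend(read_deferred(child, errors));
            }
            out
        }
    }
}
```

For a composite with two failing leaves, `read_result` reports only the first one, while `read_deferred` always produces a full-size value and records both messages.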

@eddyb eddyb added this pull request to the merge queue Jul 30, 2025
Merged via the queue into Rust-GPU:main with commit cf59e54 Jul 30, 2025
13 checks passed
@eddyb eddyb deleted the fix-read_scalar-unsoundness branch July 30, 2025 02:22