@Jenya705
Contributor

@Jenya705 Jenya705 commented Nov 30, 2025

Objective

Enables accessing slices from tables directly via Queries.

This PR is a draft to get feedback on the design.

Fixes: #21861

Solution

One new trait:

  • ContiguousQueryData fetches all values from a table at once. The implementation for &T returns a slice of the components in the currently set table; the implementation for &mut T returns a mutable slice of those components together with a struct whose methods set the update ticks (to match the regular fetch implementation).

A new method on QueryIter, as_contiguous_iter, makes it possible to iterate using this trait.
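To make the shape of the trait easier to discuss, here is a minimal, self-contained sketch that does not use Bevy at all. Column, ContiguousTicks, ContiguousFetch and fetch_contiguous are hypothetical names chosen for illustration, and ticks are modeled as plain u32 values; the actual trait in this PR may look different.

```rust
/// Toy stand-in for a table column: component values plus one "changed" tick per row.
struct Column<T> {
    values: Vec<T>,
    changed_ticks: Vec<u32>,
}

/// Hypothetical handle that updates the ticks of a whole slice at once.
struct ContiguousTicks<'a> {
    changed_ticks: &'a mut [u32],
    this_run: u32,
}

impl ContiguousTicks<'_> {
    /// Stamps every row of the column with the current tick.
    fn mark_all_as_updated(&mut self) {
        self.changed_ticks.fill(self.this_run);
    }
}

/// Hypothetical trait: fetch everything a column holds in a single call.
trait ContiguousFetch {
    type Item;
    fn fetch_contiguous(self, this_run: u32) -> Self::Item;
}

/// Shared access yields a plain slice; no ticks are involved.
impl<'a, T> ContiguousFetch for &'a Column<T> {
    type Item = &'a [T];
    fn fetch_contiguous(self, _this_run: u32) -> Self::Item {
        self.values.as_slice()
    }
}

/// Mutable access yields a mutable slice plus the tick handle.
impl<'a, T> ContiguousFetch for &'a mut Column<T> {
    type Item = (&'a mut [T], ContiguousTicks<'a>);
    fn fetch_contiguous(self, this_run: u32) -> Self::Item {
        let ticks = ContiguousTicks {
            changed_ticks: self.changed_ticks.as_mut_slice(),
            this_run,
        };
        (self.values.as_mut_slice(), ticks)
    }
}

/// Adds each velocity to the matching position, then marks all rows as changed.
fn integrate(velocity: &Column<f32>, position: &mut Column<f32>, this_run: u32) {
    let velocities = velocity.fetch_contiguous(this_run);
    let (positions, mut ticks) = position.fetch_contiguous(this_run);
    for (p, v) in positions.iter_mut().zip(velocities) {
        *p += *v;
    }
    ticks.mark_all_as_updated();
}
```

In this sketch the caller decides when to stamp the ticks, which is the main difference from a per-entity fetch that stamps on every mutable dereference.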

Testing

  • sparse_set_contiguous_query test verifies that you can't use next_contiguous with sparse set components
  • test_contiguous_query_data test verifies that returned values are valid
  • base_contiguous benchmark (file is named iter_simple_contiguous.rs)
  • base_no_detection benchmark (file is named iter_simple_no_detection.rs)
  • base_no_detection_contiguous benchmark (file is named iter_simple_no_detection_contiguous.rs)
  • base_contiguous_avx2 benchmark (file is named iter_simple_contiguous_avx2.rs)

Showcase

Example

let mut world = World::new();
let mut query = world.query::<(&Velocity, &mut Position)>();
let mut iter = query.iter_mut(&mut world);
// velocity's type is &[Velocity]
// position's type is &mut [Position]
// ticks's type is ContiguousComponentTicks
for (velocity, (position, mut ticks)) in iter.as_contiguous_iter().unwrap() {
    for (v, p) in velocity.iter().zip(position.iter_mut()) {
        p.0 += v.0;
    }
    // sets ticks
    ticks.mark_all_as_updated();
}

Benchmarks

Code for base benchmark:

use bevy_ecs::prelude::*;
use glam::*;

#[derive(Component, Copy, Clone)]
struct Transform(Mat4);

#[derive(Component, Copy, Clone)]
struct Position(Vec3);

#[derive(Component, Copy, Clone)]
struct Rotation(Vec3);

#[derive(Component, Copy, Clone)]
struct Velocity(Vec3);

pub struct Benchmark<'w>(World, QueryState<(&'w Velocity, &'w mut Position)>);

impl<'w> Benchmark<'w> {
    pub fn new() -> Self {
        let mut world = World::new();

        world.spawn_batch(core::iter::repeat_n(
            (
                Transform(Mat4::from_scale(Vec3::ONE)),
                Position(Vec3::X),
                Rotation(Vec3::X),
                Velocity(Vec3::X),
            ),
            10_000,
        ));

        let query = world.query::<(&Velocity, &mut Position)>();
        Self(world, query)
    }

    #[inline(never)]
    pub fn run(&mut self) {
        for (velocity, mut position) in self.1.iter_mut(&mut self.0) {
            position.0 += velocity.0;
        }
    }
}

Each benchmark iterates over 10,000 entities in a single table and adds the 3-dimensional vector from the Velocity component to the 3-dimensional vector from the Position component.

| Name | Time | Time (AVX2) | Description |
| --- | --- | --- | --- |
| base | 5.5828 µs | 5.5122 µs | Iteration over components |
| base_contiguous | 4.8825 µs | 1.8665 µs | Iteration over contiguous chunks |
| base_contiguous_avx2 | 2.0740 µs | 1.8665 µs | Iteration over contiguous chunks with enforced avx2 optimizations |
| base_no_detection | 4.8065 µs | 4.7723 µs | Iteration over components while bypassing change detection through bypass_change_detection() method |
| base_no_detection_contiguous | 4.3979 µs | 1.5797 µs | Iteration over components without registering update ticks |

Using the contiguous iterator makes the base benchmark somewhat faster (4.8825 µs versus 5.5828 µs), and the slice loop can be vectorized further: with AVX2 it drops to 1.8665 µs.
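For readers wondering what "enforced avx2 optimizations" could mean for a slice loop, the following is a sketch in plain Rust, not the benchmark code from this PR. Vec3 here is a local stand-in for glam's type, and integrate, integrate_avx2 and integrate_dispatch are hypothetical names. It assumes the usual pattern of compiling the loop inside a #[target_feature] function and selecting it at runtime.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct Vec3 {
    x: f32,
    y: f32,
    z: f32,
}

/// Plain slice loop. `inline(always)` lets it be compiled again inside the
/// AVX2-enabled wrapper below, with that wrapper's target features.
#[inline(always)]
fn integrate(positions: &mut [Vec3], velocities: &[Vec3]) {
    for (p, v) in positions.iter_mut().zip(velocities) {
        p.x += v.x;
        p.y += v.y;
        p.z += v.z;
    }
}

/// Same loop, but the compiler is allowed to emit AVX2 instructions here.
/// Safety: the caller must make sure the CPU supports AVX2.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn integrate_avx2(positions: &mut [Vec3], velocities: &[Vec3]) {
    integrate(positions, velocities);
}

/// Picks the AVX2 version when the CPU reports support, else the baseline loop.
fn integrate_dispatch(positions: &mut [Vec3], velocities: &[Vec3]) {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just checked at runtime.
            unsafe { integrate_avx2(positions, velocities) };
            return;
        }
    }
    integrate(positions, velocities);
}
```

Both paths compute the same result; only the instructions the compiler may choose differ. Whether the loop actually vectorizes still depends on the compiler and the data layout.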

Things to think about

  • Whether the offset parameter in ContiguousQueryData is actually needed
  • If it is not needed, would it be more efficient to introduce a table-wide update tick that is set once and lazily propagated to all entities in the same table?
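One way to picture the lazy propagation idea from the second point is a column that keeps a single table-wide tick next to the per-row ticks. This is only a sketch under assumptions: Column, row_changed, all_changed and the methods below are hypothetical, ticks are plain u32 values with no wrap-around handling, and nothing like this exists in the PR.

```rust
/// Toy column with per-row change ticks plus one table-wide tick.
struct Column<T> {
    values: Vec<T>,
    /// Tick of the last per-entity write to each row.
    row_changed: Vec<u32>,
    /// Tick of the last bulk write that covered every row (0 = never).
    all_changed: u32,
}

impl<T> Column<T> {
    fn new(values: Vec<T>) -> Self {
        let len = values.len();
        Self { values, row_changed: vec![0; len], all_changed: 0 }
    }

    /// Per-entity mutable access: stamps only the accessed row.
    fn get_mut(&mut self, row: usize, this_run: u32) -> &mut T {
        self.row_changed[row] = this_run;
        &mut self.values[row]
    }

    /// Bulk mutable access: one tick write instead of one per row.
    fn as_mut_slice(&mut self, this_run: u32) -> &mut [T] {
        self.all_changed = this_run;
        &mut self.values
    }

    /// Effective tick of a row: the newer of its own tick and the table-wide one.
    fn changed_tick(&self, row: usize) -> u32 {
        self.row_changed[row].max(self.all_changed)
    }

    /// True if the row changed after `last_run`.
    fn is_changed(&self, row: usize, last_run: u32) -> bool {
        self.changed_tick(row) > last_run
    }
}
```

The cost moves from the writer (one store per row) to the reader (one extra comparison per row). A real design would also have to decide what happens to the table-wide tick when an entity moves to another table.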

@Jenya705 Jenya705 marked this pull request as draft November 30, 2025 15:04
@Jondolf Jondolf added C-Feature A new feature, making something new possible A-ECS Entities, components, systems, and events C-Performance A change motivated by improving speed, memory usage or compile times S-Needs-Review Needs reviewer attention (from anyone!) to move forward D-Complex Quite challenging from either a design or technical perspective. Ask for help! D-Unsafe Touches with unsafe code in some way labels Nov 30, 2025
@hymm
Contributor

hymm commented Dec 1, 2025

/// - The result of [`ContiguousQueryFilter::filter_fetch_contiguous`] must be the same as
/// The value returned by every call of [`QueryFilter::filter_fetch`] on the same table for every entity
/// (i.e., the value depends on the table not an entity)
pub unsafe trait ContiguousQueryFilter: QueryFilter {

Do we need this? Is there a reason we can't use QueryFilter<IsArchetypal = true>?

Contributor Author

If we rely solely on IsArchetypal = true, a filter may exclude only some entities from a table (because a table can have many archetypes 'attached' to it), which would prevent us from returning a whole slice and effectively make the whole feature pointless.

If we require both IsDense = true and IsArchetypal = true, there might still be a case where QueryFilter::filter_fetch returns false for some tables (which it does not have to exclude by other means). Without a counterpart (ContiguousQueryFilter::filter_fetch_contiguous), the results of the contiguous fetch and the non-contiguous fetch might then differ.

I am not fully sure, though, whether this kind of query filter should exist in the first place. I am also pretty certain that Rust doesn't support a QueryFilter<IsArchetypal = true> bound, so having ContiguousQueryFilter (at least as a marker without any additional methods, like ReadOnlyQueryFilter) is necessary in my opinion.

Contributor Author

@Jenya705 Jenya705 Dec 1, 2025

Yeah, so basically: IS_ARCHETYPAL = true requires an implementer to make QueryFilter::filter_fetch always return true, which makes ContiguousQueryFilter::filter_fetch_contiguous meaningless. And because we check whether the query is dense, ContiguousQueryFilter is effectively the same as ArchetypeFilter: it excludes archetypes, and because the query is dense it more precisely excludes tables, which was the purpose of ContiguousQueryFilter. I therefore removed ContiguousQueryFilter in favor of ArchetypeFilter.
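The reasoning in this thread (a filter that decides per table lets a dense query hand out whole slices) can be illustrated with a small model that is independent of Bevy. Table, TableFilter, With, Without and contiguous_positions are hypothetical stand-ins, and components are identified by name purely for brevity.

```rust
use std::collections::HashSet;

/// Toy table: the set of component names it stores plus one dense column.
struct Table {
    components: HashSet<&'static str>,
    positions: Vec<f32>,
}

/// Hypothetical table-level filter: the answer depends only on the table,
/// never on an individual entity.
trait TableFilter {
    fn matches_table(&self, table: &Table) -> bool;
}

struct With(&'static str);
struct Without(&'static str);

impl TableFilter for With {
    fn matches_table(&self, table: &Table) -> bool {
        table.components.contains(self.0)
    }
}

impl TableFilter for Without {
    fn matches_table(&self, table: &Table) -> bool {
        !table.components.contains(self.0)
    }
}

/// Because the filter accepts or rejects whole tables, every accepted table
/// can be returned as one complete slice.
fn contiguous_positions<'t>(tables: &'t [Table], filter: &impl TableFilter) -> Vec<&'t [f32]> {
    tables
        .iter()
        .filter(|table| filter.matches_table(table))
        .map(|table| table.positions.as_slice())
        .collect()
}
```

A filter that could reject individual rows (for example one based on change ticks) would not fit this model, since the accepted rows of a table would no longer form a single slice.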

@Jenya705
Contributor Author

Jenya705 commented Dec 1, 2025

This PR just enables slices from tables to be returned directly when applicable. It doesn't implement any batching and it doesn't guarantee any specific alignment beyond Rust's own (yet these slices can still be used for SIMD).

  • Am I right in my understanding that some things might not properly vectorize due to alignment issues even if they use as_contiguous_iter?

This PR doesn't deal with alignment, but (as I understand it) you can always take sub-slices that meet your alignment requirements. And, referring to issue #21861, the code gets vectorized even without any specific alignment.

No, the returned slices do not have any specific alignment requirements beyond Rust's own.

@chengts95

The solution looks like a promising way to resolve issue #21861.

If you want to use SIMD instructions explicitly, alignment is something you usually have to manage yourself (with an aligned allocator or a peeled prologue). Auto-vectorization won’t “update” the alignment for you – it just uses whatever alignment it can prove and otherwise emits unaligned loads. From that perspective, a contiguous slice is already sufficient; fully aligned SIMD is a separate concern on top of that.
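As a concrete illustration of the "peeled prologue" mentioned above, here is a sketch in plain Rust using only the standard library. split_aligned and scale_in_place are hypothetical helper names, and 32 bytes is assumed as the target alignment (the width of an AVX2 register). The inner loops are ordinary scalar code; whether the compiler emits aligned vector loads for them is not guaranteed.

```rust
/// Splits a slice at its first 32-byte-aligned element.
/// If no such element lies inside the slice, the second half is empty.
fn split_aligned(data: &mut [f32]) -> (&mut [f32], &mut [f32]) {
    // `align_offset` counts in elements, not bytes; clamp it to the slice length.
    let offset = data.as_ptr().align_offset(32).min(data.len());
    data.split_at_mut(offset)
}

/// Multiplies every element by `factor`, handling the unaligned prologue,
/// the aligned body in blocks of 8 floats (32 bytes), and the leftover tail.
fn scale_in_place(data: &mut [f32], factor: f32) {
    let (prologue, aligned) = split_aligned(data);

    // Peeled prologue: elements before the first aligned address.
    for x in prologue {
        *x *= factor;
    }

    // Aligned body: each chunk starts on a 32-byte boundary.
    let mut chunks = aligned.chunks_exact_mut(8);
    for chunk in &mut chunks {
        for x in chunk {
            *x *= factor;
        }
    }

    // Tail: fewer than 8 elements left over.
    for x in chunks.into_remainder() {
        *x *= factor;
    }
}
```

The same splitting works on a slice returned by a contiguous query, since it only needs a pointer and a length.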

Development

Successfully merging this pull request may close these issues.

Raw table iteration to improve query iteration speed by bypassing change ticks

4 participants