Skip to content

Conversation

@daniel-noland
Copy link
Collaborator

@daniel-noland daniel-noland commented Nov 22, 2025

Partially addresses #1051

This is somewhat messy due to the integration push. I intend to clean it up, but I think it should be merged as soon as possible in order to avoid bit rot on the dpdk integration which follows this.

@daniel-noland daniel-noland force-pushed the pr/daniel-noland/init-handoff branch 4 times, most recently from 2357024 to 67620ec Compare November 22, 2025 06:09
@daniel-noland daniel-noland marked this pull request as ready for review November 22, 2025 06:16
@daniel-noland daniel-noland requested a review from a team as a code owner November 22, 2025 06:16
@daniel-noland daniel-noland requested review from Fredi-raspall and removed request for a team November 22, 2025 06:16
@daniel-noland daniel-noland self-assigned this Nov 22, 2025
@daniel-noland daniel-noland added Integration Branch that should NOT be merged but kept stable as other branches depend on it. and removed Integration Branch that should NOT be merged but kept stable as other branches depend on it. labels Nov 22, 2025
@daniel-noland daniel-noland added this to the GW R1 milestone Nov 22, 2025
@daniel-noland daniel-noland linked an issue Nov 22, 2025 that may be closed by this pull request
@daniel-noland daniel-noland added the enhancement New feature or request label Nov 22, 2025
Copilot finished reviewing on behalf of daniel-noland November 22, 2025 06:22
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements an "init handoff" mechanism to transfer configuration from the dataplane-init process to the dataplane worker process using sealed memory file descriptors (memfd). The changes introduce comprehensive argument parsing, hardware device scanning, and zero-copy configuration serialization using rkyv.

Key changes:

  • Complete rewrite of init/src/main.rs to handle device detection, VFIO binding, and configuration handoff
  • Extensive documentation and type refinement in args/src/lib.rs including NetworkDeviceDescription, LaunchConfiguration, and FinalizedMemFile
  • Addition of ProfilingConfigSection for Pyroscope integration

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
init/src/main.rs Implements hardware scanning, device validation, VFIO binding, and configuration serialization/handoff to dataplane worker
init/Cargo.toml Adds dependencies for args parsing, command-fds, miette, rkyv, serde, and tracing
args/src/lib.rs Adds extensive documentation, renames PortArg to NetworkDeviceDescription, adds ProfilingConfigSection, refines configuration types
dataplane/Cargo.toml Adds rkyv dependency for configuration deserialization
Dockerfile Changes entrypoint from /dataplane to /dataplane-init and copies artifact to both paths
Cargo.lock Updates various dependency versions (clap, cc, hashbrown, indexmap, etc.)

)]
#[rkyv(attr(derive(PartialEq, Eq, Debug)))]
pub struct ProfilingConfigSection {
/// The URL of the pryroscope url
Copy link

Copilot AI Nov 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected spelling of 'pryroscope' to 'pyroscope'.

Suggested change
/// The URL of the pryroscope url
/// The URL of the pyroscope server

Copilot uses AI. Check for mistakes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not fixed

pub tracing: TracingConfigSection,
/// Metrics collection configuration
pub metrics: MetricsConfigSection,
/// Profileing collection configuration
Copy link

Copilot AI Nov 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected spelling of 'Profileing' to 'Profiling'.

Suggested change
/// Profileing collection configuration
/// Profiling collection configuration

Copilot uses AI. Check for mistakes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not fixed

Comment on lines 752 to 673
#[derive(
Debug, Clone, PartialEq, Eq, serde::Serialize, rkyv::Serialize, rkyv::Deserialize, rkyv::Archive,
)]
Copy link

Copilot AI Nov 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ProfilingConfigSection is missing the CheckBytes derive macro. Since it's included in LaunchConfiguration (line 749) which derives CheckBytes, all nested types must also derive CheckBytes for rkyv validation to work correctly. Add CheckBytes to the derive list and add the appropriate use bytecheck::CheckBytes; if not already present.

Copilot uses AI. Check for mistakes.
Dockerfile Outdated
Comment on lines 5 to 6
COPY --link --chown=0:0 "${ARTIFACT}" /dataplane
COPY --link --chown=0:0 "${ARTIFACT}" /dataplane-init
Copy link

Copilot AI Nov 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Dockerfile copies the same ${ARTIFACT} to both /dataplane (line 5) and /dataplane-init (line 6). Based on the context, it appears /dataplane-init should be a separate binary built from the init crate. The build argument should likely be ${ARTIFACT_INIT} or similar to distinguish between the init and dataplane binaries.

Suggested change
COPY --link --chown=0:0 "${ARTIFACT}" /dataplane
COPY --link --chown=0:0 "${ARTIFACT}" /dataplane-init
ARG ARTIFACT_INIT
COPY --link --chown=0:0 "${ARTIFACT}" /dataplane
COPY --link --chown=0:0 "${ARTIFACT_INIT}" /dataplane-init

Copilot uses AI. Check for mistakes.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needed to be punted to the next PR (where it will be fixed)

@daniel-noland daniel-noland force-pushed the pr/daniel-noland/init-handoff branch 3 times, most recently from f6a5cf8 to 33ede97 Compare November 22, 2025 07:25
init/src/main.rs Outdated

let launch_config = launch_config.to_owned_fd();

let io_err = std::process::Command::new("/bin/dataplane")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the path to the dataplane should be configurable via cmd line of init itself.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? It is fixed in the container?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The location of dataplane will be fix in image, yes. Both things (init and dataplane) will be part of the image, so it would be okay to hardcode it. OTH, making it configurable just gives you more flexibility and makes things less implicit. But we can address this later if needed. We just need to make sure that the image builder ensures that dataplane is at /bin/dataplane (or change the init to just "dataplane" and make sure /bin is in the PATH?)

Copy link
Contributor

@Fredi-raspall Fredi-raspall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me Daniel. Thanks!
Please address the small typos detected by copilot.

Copy link
Member

@qmonnet qmonnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please split your first commit. Description says “update documentation”, but it does also a number of other modifications to the code. They look minor, but the size of the commit makes it hard to follow what it does.

I think we should also consider mentioning when some code or docs is AI-generated (it looks like some of you docs are).

Comment on lines +736 to +648
/// Number of dataplane worker threads / cores
pub dataplane_workers: usize,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem to be used in this commit.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal is to fill out the file. Not consume the file. That is for another PR

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, in that case it would be worth mentioning it in the commit description, in my opinion. At the moment the title says docs(args): basic docs for LaunchConfiguration, which implies we only add docs in that commit, and it's not clear at all why we also add new fields to the struct.

/// Metrics collection configuration
pub metrics: MetricsConfigSection,
/// Profileing collection configuration
pub profiling: ProfilingConfigSection,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, I see where you set the field but I can't find where you read it. I may have missed it, which is also a good indicator this commit contains too many things.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't read it in this PR. The goal is to fill out the file, not consume it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same discussion as above #1072 (comment)

@qmonnet qmonnet modified the milestones: GW R1, GW R2 Nov 24, 2025
@qmonnet
Copy link
Member

qmonnet commented Nov 24, 2025

Now that we've done GW R1, I updated target milestone to GW R2.

@daniel-noland daniel-noland force-pushed the pr/daniel-noland/init-handoff branch from 33ede97 to 4d1cab8 Compare November 25, 2025 01:25
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
New config file section to set up and manage pyroscope.

Signed-off-by: Daniel Noland <daniel@githedgehog.com>
This allows a child process to inherit configuration from
a parent process.

Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Fixes for clippy pedantic and general style.

Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
This commit series introduces new functionality which broke the prior tests.
This commit amends the tests to account for that.

Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
@daniel-noland daniel-noland marked this pull request as draft November 25, 2025 04:30
auto-merge was automatically disabled November 25, 2025 04:30

Pull request was converted to draft

@daniel-noland daniel-noland force-pushed the pr/daniel-noland/init-handoff branch from f3ecb35 to 3018f63 Compare November 25, 2025 04:36
@daniel-noland daniel-noland changed the title init handoff IMPOSSIBLE TO BREAK DPDK INTO TWO STEPS Nov 25, 2025
pub fn kernel_num_workers(&self) -> usize {
self.num_workers.into()
}
// backwards-compatible, to deprecate
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably keep that comment along with the new one?

Comment on lines -416 to -431
pub type KiB = NonZero<u64>;

#[derive(
Debug,
Default,
serde::Serialize,
serde::Deserialize,
rkyv::Serialize,
rkyv::Deserialize,
rkyv::Archive,
)]
pub enum WorkerStackSize {
#[default]
Default,
Size(KiB),
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit docs(args): basic docs for DpdkDriverConfigSection - it's not clear at all why this is going away.

@qmonnet
Copy link
Member

qmonnet commented Nov 25, 2025

Thank you for splitting the first commit. I didn't ask for so many commits, it feels a bit like we went from one extreme to the other 😅. I think most docs commits could remain together, as long as they only add docs. But OK, given the choice between the two options I much prefer the 30+ commits than the big one with all the clean-up mixed together. Thanks for this work!

IMPOSSIBLE TO BREAK DPDK INTO TWO STEPS

Noted. Can we come back to a meaningful PR name, maybe? 🙂

@daniel-noland daniel-noland changed the title IMPOSSIBLE TO BREAK DPDK INTO TWO STEPS DPDK attempt 2 Nov 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Ordered recovery of dpdk integration work

4 participants