feat(path): Add probabilistic edge generation with p parameter#90
feat(path): Add probabilistic edge generation with p parameter#90niklas-uhl wants to merge 2 commits intomainfrom
Conversation
niklas-uhl
commented
Feb 3, 2026
- Introduce p parameter for directed path generator, defaulting p=1.0
- Introduce p parameter for directed path generator, defaulting p=1.0
There was a problem hiding this comment.
Pull request overview
Adds support for probabilistic edge generation in the directed path generator via a new edge probability parameter p, intended to default to 1.0 when unspecified.
Changes:
- Extend
PathDirectedto sample each potential path edge with probabilitypusing hash-based seeding. - Add
-p/--probCLI option for thepath-directedsubcommand. - Update existing path generator tests to set
p=1.0explicitly.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/path/general_path_generator_test.cpp | Sets config.p = 1.0 so existing path tests continue to generate full paths. |
| kagen/generators/path/path_directed.h | Adds RNGWrapper member needed for probabilistic edge sampling. |
| kagen/generators/path/path_directed.cpp | Implements per-edge probability sampling and normalizes p. |
| app/KaGen.cpp | Exposes p as a CLI parameter for path-directed. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (config.p == 0.0) { | ||
| config.p = 1.0; | ||
| } | ||
| if (config.p > 1.0) { | ||
| throw ConfigurationError("edge probability p must be in [0, 1]"); | ||
| } |
There was a problem hiding this comment.
NormalizeParameters() forces p==0.0 to 1.0 and only checks the upper bound. This makes it impossible to request a true p=0 (even though the error message claims [0, 1]) and allows negative/NaN values to flow into std::binomial_distribution, which has undefined behavior for out-of-range probabilities. Consider (a) validating p with std::isfinite(p) and 0.0 <= p && p <= 1.0, and (b) setting the default to 1.0 at the CLI layer (or via a dedicated “unset” sentinel) instead of rewriting p==0.0.
| if (is_valid) { | ||
| PushEdge(i, j); | ||
| SInt edge_seed = std::min(i, j) *config_.n + std::max(i, j); | ||
| SInt h = sampling::Spooky::hash(config_.seed + edge_seed); | ||
| if (rng_.GenerateBinomial(h, 1, config_.p)) { | ||
| PushEdge(i, j); | ||
| } |
There was a problem hiding this comment.
GenerateEdgeList() uses config_.p directly when deciding whether to emit each edge. When PathDirected is constructed directly (as in these tests), NormalizeParameters() is not invoked, so the default config.p=0.0 currently means “generate no edges” rather than the PR’s stated default of p=1.0. To preserve the intended default consistently, consider deriving an internal edge_probability_ (e.g., p==0 ? 1 : p) and validating it in the constructor/GenerateEdgeList instead of relying on factory normalization.
| auto* params = cmd->add_option_group("Parameters"); | ||
| add_option_n(params); | ||
| params->require_option(1); | ||
| add_option_p(params); |
There was a problem hiding this comment.
The CLI option -p/--prob is added for path-directed, but the help/default shown to users will still reflect config.p’s initial value (0.0), while runtime behavior normalizes it to 1.0. Consider setting the CLI default explicitly (e.g., default value 1.0 for this subcommand) so --help matches the documented “defaulting p=1.0” behavior and so p=0 can remain a valid, explicit input.
| add_option_p(params); | |
| params | |
| ->add_option( | |
| "-p,--prob", config.p, | |
| "Edge probability p for the directed path (defaults to p=1.0 for this generator).") | |
| ->default_val(1.0); |
| PGeneratorConfig config; | ||
| config.n = GetParam(); | ||
| config.p = 1.0; | ||
|
|
||
| PathDirected generator(config, rank, size); |
There was a problem hiding this comment.
Tests were updated to set config.p = 1.0, but the new probabilistic behavior itself isn’t exercised. To cover the feature, add assertions for at least p=0.0 (no edges) and one intermediate probability with a fixed seed (deterministic subset of edges), so regressions in the hashing/sampling logic are caught.