
Conversation

Member

@ardatan ardatan commented Jul 28, 2025

Reverts #298

@ardatan ardatan requested a review from dotansimha July 28, 2025 12:15
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @ardatan, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request re-introduces previously reverted changes to enhance GraphQL error propagation. It focuses on improving the accuracy and context of errors returned from federated queries by normalizing error paths and enriching error details from downstream services, ensuring a more robust and user-friendly error handling experience.

Highlights

  • Re-introduction of Error Propagation: This pull request re-introduces and refines the error propagation logic within the query plan executor, specifically addressing how errors are handled and presented in federated GraphQL queries.
  • Enhanced Error Path Normalization: A new error_normalization module has been added to accurately map internal _entities error paths (used during federated entity fetches) to their corresponding user-facing GraphQL query paths, providing more meaningful error locations (see the sketch after this list).
  • Improved Downstream Error Context: Errors originating from downstream subgraphs are now automatically enriched with code: "DOWNSTREAM_SERVICE_ERROR" and serviceName extensions, offering clearer insights into the source of the error.
  • Refactored Data Traversal and Merging: The core traverse_and_callback utility function has been updated to track and provide VecDeque<usize> (path indexes) during data traversal. This enables a more precise and direct merging of fetched data into the main result, replacing the previous index-based approach.
  • Comprehensive Testing: New test fixtures and an integration test have been added to thoroughly validate the end-to-end error propagation and normalization behavior in a federated GraphQL environment.
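
To make the path mapping above concrete, here is a minimal sketch of the idea. It is not the executor's actual error_normalization API; the function name, the PathSegment enum, and the argument shapes are assumptions that only mirror the snippets quoted later in this review.

use std::collections::VecDeque;

use serde_json::Value;

// Assumed stand-in for the executor's FlattenNodePathSegment type.
enum PathSegment {
    Field(&'static str),
    List,
}

// Sketch: rebuild the user-facing error path for one representation by
// replacing each list hop in the flatten path with the concrete index
// recorded during traversal, then appending the entity-relative tail of
// the subgraph error path.
fn normalize_entity_error_path(
    subgraph_path: &[Value],              // e.g. ["_entities", 1, "reviews", 0, "body"]
    flatten_path: &[PathSegment],         // e.g. [Field("products"), List]
    mut indexes_in_path: VecDeque<usize>, // e.g. [3] -> the 4th product
) -> Vec<Value> {
    let mut normalized = Vec::new();
    for segment in flatten_path {
        match segment {
            PathSegment::Field(name) => normalized.push(Value::from(*name)),
            PathSegment::List => {
                if let Some(index) = indexes_in_path.pop_front() {
                    normalized.push(Value::from(index));
                }
            }
        }
    }
    // Skip the leading "_entities" segment and the representation index;
    // what remains is relative to the entity itself.
    normalized.extend(subgraph_path.iter().skip(2).cloned());
    normalized
    // For the example inputs: ["products", 3, "reviews", 0, "body"]
}
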
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature              | Command             | Description
Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments.
Help                 | /gemini help        | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request re-introduces error propagation logic by reverting a previous revert. The core of the changes is to correctly propagate error paths from subgraph responses (especially for _entities queries) back to the client in the context of the original query. This is achieved by tracking index paths during data traversal and using this information to normalize error paths.

The implementation is mostly solid, but I've identified one critical bug that would lead to incorrect error paths, one high-severity risk of a panic, and a medium-severity opportunity for code simplification. Addressing these points will improve the robustness and maintainability of the new error propagation mechanism.

Once these issues are addressed, the changes will be in a good state to be merged.

Comment on lines +758 to 778
let mut result = fetch_node
.execute_for_projected_representations(
execution_context,
filtered_representations,
BTreeSet::new(),
vec![],
)
.await;
Contributor

critical

In FlattenNode::execute, execute_for_projected_representations is called with an empty vector (vec![]) for indexes_in_paths. This is a bug. These paths are essential for normalize_errors_for_representations to correctly map error paths from the subgraph back to the original query's structure.

Without passing the collected paths, any errors occurring in this fetch will have incorrect or missing path information in the final GraphQL response. The indexes_in_paths collected during the traverse_and_callback must be passed to this function.

let paths_for_fetch: Vec<VecDeque<usize>> = representations
    .iter()
    .map(|(_, path)| path.clone())
    .collect();
let mut result = fetch_node
    .execute_for_projected_representations(
        execution_context,
        filtered_representations,
        BTreeSet::new(),
        paths_for_fetch,
    )
    .await;

for path_segment in path.iter() {
    match path_segment {
        FlattenNodePathSegment::List => {
            let index = indexes_in_path.pop_front().unwrap();
Contributor

high

Using .unwrap() here poses a risk of panicking the gateway if the invariant (that indexes_in_path has an element for each List segment in the path) is ever broken. While this might be an unlikely scenario, a panic in the gateway is a serious issue.

To make the code more robust and to aid in debugging if this invariant is ever violated, it would be better to use .expect() with a descriptive message. This clearly documents the assumption and provides more context in case of a panic.

Suggested change
let index = indexes_in_path.pop_front().unwrap();
let index = indexes_in_path.pop_front().expect("BUG: Mismatched path segments and indexes");

Comment on lines +42 to +73
match error.extensions {
    Some(ref mut extensions) => {
        if extensions.get("code").is_none() {
            extensions.insert(
                "code".to_string(),
                serde_json::Value::String(
                    "DOWNSTREAM_SERVICE_ERROR".to_string(),
                ),
            );
        }
        if extensions.get("serviceName").is_none() {
            extensions.insert(
                "serviceName".to_string(),
                serde_json::Value::String(subgraph_name.to_string()),
            );
        }
    }
    None => {
        error.extensions = Some(HashMap::from([
            (
                "code".to_string(),
                serde_json::Value::String(
                    "DOWNSTREAM_SERVICE_ERROR".to_string(),
                ),
            ),
            (
                "serviceName".to_string(),
                serde_json::Value::String(subgraph_name.to_string()),
            ),
        ]));
    }
}
Contributor

medium

The logic for adding default extensions to GraphQL errors is a bit verbose and can be simplified using more idiomatic Rust. By using get_or_insert_with and the entry API on HashMap, you can make the code more concise and easier to read, avoiding the explicit match on error.extensions.

let extensions = error.extensions.get_or_insert_with(HashMap::new);
extensions.entry("code".to_string()).or_insert(
    serde_json::Value::String("DOWNSTREAM_SERVICE_ERROR".to_string()),
);
extensions
    .entry("serviceName".to_string())
    .or_insert(serde_json::Value::String(subgraph_name.to_string()));


github-actions bot commented Jul 28, 2025

k6-benchmark results

     ✓ response code was 200
     ✓ no graphql errors
     ✓ valid response structure

     █ setup

     checks.........................: 100.00% ✓ 25566      ✗ 0   
     data_received..................: 751 MB  25 MB/s
     data_sent......................: 10 MB   332 kB/s
     http_req_blocked...............: avg=11.43µs  min=821ns   med=2.12µs   max=4.23ms   p(90)=2.85µs   p(95)=3.32µs  
     http_req_connecting............: avg=8.14µs   min=0s      med=0s       max=4.21ms   p(90)=0s       p(95)=0s      
     http_req_duration..............: avg=173.63ms min=3.61ms  med=173.07ms max=339.14ms p(90)=209.53ms p(95)=227.31ms
       { expected_response:true }...: avg=173.63ms min=3.61ms  med=173.07ms max=339.14ms p(90)=209.53ms p(95)=227.31ms
     http_req_failed................: 0.00%   ✓ 0          ✗ 8542
     http_req_receiving.............: avg=473.97µs min=31.41µs med=68.37µs  max=67.07ms  p(90)=1.56ms   p(95)=2.59ms  
     http_req_sending...............: avg=50.45µs  min=7.22µs  med=12.96µs  max=19.05ms  p(90)=21.99µs  p(95)=34.33µs 
     http_req_tls_handshaking.......: avg=0s       min=0s      med=0s       max=0s       p(90)=0s       p(95)=0s      
     http_req_waiting...............: avg=173.1ms  min=3.53ms  med=172.54ms max=339.06ms p(90)=208.74ms p(95)=226.69ms
     http_reqs......................: 8542    282.753347/s
     iteration_duration.............: avg=176.35ms min=40.77ms med=173.55ms max=659.19ms p(90)=210.13ms p(95)=228.97ms
     iterations.....................: 8522    282.091317/s
     vus............................: 50      min=50       max=50
     vus_max........................: 50      min=50       max=50

let normalized_path = flatten_node.path.as_slice();
filtered_representations.push(b'[');
let mut number_of_indexes = 0;
let normalized_path = flatten_node.path.as_slice();
Member

let number_of_indexes = flatten_node.path.as_slice().iter().filter(|v| matches!(v, FlattenNodePathSegment::List)).count();

Member Author

Hmm, not sure if this is optimal enough. With this usage, it will create a new set of path segments and then count it, no? My version avoids creating a new set/vec and counts them directly in a single iteration.
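
For reference, a small self-contained sketch of the two counting styles under discussion (the enum here is an assumed stand-in for the real FlattenNodePathSegment). The filter/count adapter is lazy, so it also walks the existing slice exactly once without allocating an intermediate collection.

// Assumed stand-in for the executor's path-segment type.
enum FlattenNodePathSegment {
    Field(String),
    List,
}

// Iterator-adapter form from the review suggestion: one pass over the slice,
// no intermediate Vec is built.
fn count_list_segments(path: &[FlattenNodePathSegment]) -> usize {
    path.iter()
        .filter(|segment| matches!(segment, FlattenNodePathSegment::List))
        .count()
}

// Manual form from the current code: bump a counter while walking the slice.
fn count_list_segments_manual(path: &[FlattenNodePathSegment]) -> usize {
    let mut number_of_indexes = 0;
    for segment in path {
        if matches!(segment, FlattenNodePathSegment::List) {
            number_of_indexes += 1;
        }
    }
    number_of_indexes
}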

None,
)
}
.unwrap_or(false);
Member

This unwrap_or hides an error inside?
I mean, if this fails, do we need some logging here? What's the scenario where an error is returned from here, and why is the error not relevant and dropped in favor of a default value?

Member Author

I added a warn! to print the error message when it happens. It shouldn't happen since we write to a string buffer, but it's there just in case it fails.
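
As a rough illustration of that fallback (assuming the tracing crate's warn! macro and a fmt write error; the function name and error type are placeholders, not the exact code in the PR):

use tracing::warn;

// Sketch only: log the unexpected failure instead of silently swallowing it,
// then fall back to `false` as before.
fn matches_requirement(result: Result<bool, std::fmt::Error>) -> bool {
    result.unwrap_or_else(|err| {
        // Writing into an in-memory buffer should not fail, but if it ever
        // does, leave a trace in the gateway logs.
        warn!(error = %err, "failed to project representation; treating it as a non-match");
        false
    })
}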

index_of_entities += 1;
ParallelJob::Flatten(mut result, path) => {
if let Some(entities) = result.entities {
'entity_loop: for (entity, indexes_in_path) in
Member

Confirming this part for now, but with the new refactor I wonder if this section of code could be more readable?

Member Author

I left comments explaining why they are different.

let requires_nodes = fetch_node.requires.as_ref().unwrap();
filtered_representations.push(b'[');
let mut first = true;
let normalized_path = self.path.as_slice();
Member

Same suggestion as before: it can be simplified.

Member Author

I left comments for the one above.

@dotansimha dotansimha changed the title Revert "Revert "Error propagation"" feat(executor, gw): error propagation Jul 29, 2025

github-actions bot commented Jul 29, 2025

Federation Audit Results

177 tests   - 12   177 ✅  - 12   3s ⏱️ ±0s
 41 suites  -  1     0 💤 ± 0 
 42 files   ± 0     0 ❌ ± 0 
  1 errors

For more details on these parsing errors, see this check.

Results for commit 1287c74. ± Comparison against base commit f594210.

This pull request removes 12 tests.
test ‑ union-intersection_0
test ‑ union-intersection_1
test ‑ union-intersection_10
test ‑ union-intersection_11
test ‑ union-intersection_2
test ‑ union-intersection_3
test ‑ union-intersection_4
test ‑ union-intersection_5
test ‑ union-intersection_6
test ‑ union-intersection_7
…

♻️ This comment has been updated with latest results.

@ardatan ardatan force-pushed the revert-298-revert-243-error-propagation branch from e3909e7 to 1287c74 on August 1, 2025 11:15
@dotansimha
Member

Re-done in #339

@dotansimha dotansimha closed this Aug 17, 2025
@dotansimha dotansimha deleted the revert-298-revert-243-error-propagation branch September 1, 2025 12:41