
Conversation

Member

@ardatan ardatan commented Jul 28, 2025

Reverts #298

@ardatan ardatan requested a review from dotansimha July 28, 2025 12:15
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @ardatan, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request re-introduces previously reverted changes to enhance GraphQL error propagation. It focuses on improving the accuracy and context of errors returned from federated queries by normalizing error paths and enriching error details from downstream services, ensuring a more robust and user-friendly error handling experience.

Highlights

  • Re-introduction of Error Propagation: This pull request re-introduces and refines the error propagation logic within the query plan executor, specifically addressing how errors are handled and presented in federated GraphQL queries.
  • Enhanced Error Path Normalization: A new error_normalization module has been added to accurately map internal _entities error paths (used during federated entity fetches) to their corresponding user-facing GraphQL query paths, providing more meaningful error locations (see the sketch after this list).
  • Improved Downstream Error Context: Errors originating from downstream subgraphs are now automatically enriched with code: "DOWNSTREAM_SERVICE_ERROR" and serviceName extensions, offering clearer insights into the source of the error.
  • Refactored Data Traversal and Merging: The core traverse_and_callback utility function has been updated to track and provide VecDeque<usize> (path indexes) during data traversal. This enables a more precise and direct merging of fetched data into the main result, replacing the previous index-based approach.
  • Comprehensive Testing: New test fixtures and an integration test have been added to thoroughly validate the end-to-end error propagation and normalization behavior in a federated GraphQL environment.
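
To make the path mapping above concrete, here is a minimal sketch of the idea. It is not the executor's actual error_normalization API; the function name, the PathSegment enum, and the argument shapes are assumptions that only mirror the snippets quoted later in this review.

use std::collections::VecDeque;

use serde_json::Value;

// Assumed stand-in for the executor's FlattenNodePathSegment type.
enum PathSegment {
    Field(&'static str),
    List,
}

// Sketch: rebuild the user-facing error path for one representation by
// replacing each list hop in the flatten path with the concrete index
// recorded during traversal, then appending the entity-relative tail of
// the subgraph error path.
fn normalize_entity_error_path(
    subgraph_path: &[Value],              // e.g. ["_entities", 1, "reviews", 0, "body"]
    flatten_path: &[PathSegment],         // e.g. [Field("products"), List]
    mut indexes_in_path: VecDeque<usize>, // e.g. [3] -> the 4th product
) -> Vec<Value> {
    let mut normalized = Vec::new();
    for segment in flatten_path {
        match segment {
            PathSegment::Field(name) => normalized.push(Value::from(*name)),
            PathSegment::List => {
                if let Some(index) = indexes_in_path.pop_front() {
                    normalized.push(Value::from(index));
                }
            }
        }
    }
    // Skip the leading "_entities" segment and the representation index;
    // what remains is relative to the entity itself.
    normalized.extend(subgraph_path.iter().skip(2).cloned());
    normalized
    // For the example inputs: ["products", 3, "reviews", 0, "body"]
}
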
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature              | Command             | Description
Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments.
Help                 | /gemini help        | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request re-introduces error propagation logic by reverting a previous revert. The core of the changes is to correctly propagate error paths from subgraph responses (especially for _entities queries) back to the client in the context of the original query. This is achieved by tracking index paths during data traversal and using this information to normalize error paths.

The implementation is mostly solid, but I've identified one critical bug that would lead to incorrect error paths, one high-severity risk of a panic, and a medium-severity opportunity for code simplification. Addressing these points will improve the robustness and maintainability of the new error propagation mechanism.

Once these issues are addressed, the changes will be in a good state to be merged.

Comment on lines +758 to 778
let mut result = fetch_node
.execute_for_projected_representations(
execution_context,
filtered_representations,
BTreeSet::new(),
vec![],
)
.await;
Contributor

critical

In FlattenNode::execute, execute_for_projected_representations is called with an empty vector (vec![]) for indexes_in_paths. This is a bug. These paths are essential for normalize_errors_for_representations to correctly map error paths from the subgraph back to the original query's structure.

Without passing the collected paths, any errors occurring in this fetch will have incorrect or missing path information in the final GraphQL response. The indexes_in_paths collected during the traverse_and_callback must be passed to this function.

let paths_for_fetch: Vec<VecDeque<usize>> = representations
    .iter()
    .map(|(_, path)| path.clone())
    .collect();
let mut result = fetch_node
    .execute_for_projected_representations(
        execution_context,
        filtered_representations,
        BTreeSet::new(),
        paths_for_fetch,
    )
    .await;

for path_segment in path.iter() {
    match path_segment {
        FlattenNodePathSegment::List => {
            let index = indexes_in_path.pop_front().unwrap();
Contributor

high

Using .unwrap() here poses a risk of panicking the gateway if the invariant (that indexes_in_path has an element for each List segment in the path) is ever broken. While this might be an unlikely scenario, a panic in the gateway is a serious issue.

To make the code more robust and to aid in debugging if this invariant is ever violated, it would be better to use .expect() with a descriptive message. This clearly documents the assumption and provides more context in case of a panic.

Suggested change
let index = indexes_in_path.pop_front().unwrap();
let index = indexes_in_path.pop_front().expect("BUG: Mismatched path segments and indexes");

Comment on lines +42 to +73
match error.extensions {
    Some(ref mut extensions) => {
        if extensions.get("code").is_none() {
            extensions.insert(
                "code".to_string(),
                serde_json::Value::String(
                    "DOWNSTREAM_SERVICE_ERROR".to_string(),
                ),
            );
        }
        if extensions.get("serviceName").is_none() {
            extensions.insert(
                "serviceName".to_string(),
                serde_json::Value::String(subgraph_name.to_string()),
            );
        }
    }
    None => {
        error.extensions = Some(HashMap::from([
            (
                "code".to_string(),
                serde_json::Value::String(
                    "DOWNSTREAM_SERVICE_ERROR".to_string(),
                ),
            ),
            (
                "serviceName".to_string(),
                serde_json::Value::String(subgraph_name.to_string()),
            ),
        ]));
    }
}
Contributor

medium

The logic for adding default extensions to GraphQL errors is a bit verbose and can be simplified using more idiomatic Rust. By using get_or_insert_with and the entry API on HashMap, you can make the code more concise and easier to read, avoiding the explicit match on error.extensions.

let extensions = error.extensions.get_or_insert_with(HashMap::new);
extensions.entry("code".to_string()).or_insert(
    serde_json::Value::String("DOWNSTREAM_SERVICE_ERROR".to_string()),
);
extensions
    .entry("serviceName".to_string())
    .or_insert(serde_json::Value::String(subgraph_name.to_string()));


github-actions bot commented Jul 28, 2025

k6-benchmark results

     ✓ response code was 200
     ✓ no graphql errors
     ✓ valid response structure

     █ setup

     checks.........................: 100.00% ✓ 25566      ✗ 0   
     data_received..................: 751 MB  25 MB/s
     data_sent......................: 10 MB   332 kB/s
     http_req_blocked...............: avg=11.43µs  min=821ns   med=2.12µs   max=4.23ms   p(90)=2.85µs   p(95)=3.32µs  
     http_req_connecting............: avg=8.14µs   min=0s      med=0s       max=4.21ms   p(90)=0s       p(95)=0s      
     http_req_duration..............: avg=173.63ms min=3.61ms  med=173.07ms max=339.14ms p(90)=209.53ms p(95)=227.31ms
       { expected_response:true }...: avg=173.63ms min=3.61ms  med=173.07ms max=339.14ms p(90)=209.53ms p(95)=227.31ms
     http_req_failed................: 0.00%   ✓ 0          ✗ 8542
     http_req_receiving.............: avg=473.97µs min=31.41µs med=68.37µs  max=67.07ms  p(90)=1.56ms   p(95)=2.59ms  
     http_req_sending...............: avg=50.45µs  min=7.22µs  med=12.96µs  max=19.05ms  p(90)=21.99µs  p(95)=34.33µs 
     http_req_tls_handshaking.......: avg=0s       min=0s      med=0s       max=0s       p(90)=0s       p(95)=0s      
     http_req_waiting...............: avg=173.1ms  min=3.53ms  med=172.54ms max=339.06ms p(90)=208.74ms p(95)=226.69ms
     http_reqs......................: 8542    282.753347/s
     iteration_duration.............: avg=176.35ms min=40.77ms med=173.55ms max=659.19ms p(90)=210.13ms p(95)=228.97ms
     iterations.....................: 8522    282.091317/s
     vus............................: 50      min=50       max=50
     vus_max........................: 50      min=50       max=50

let normalized_path = flatten_node.path.as_slice();
filtered_representations.push(b'[');
let mut number_of_indexes = 0;
let normalized_path = flatten_node.path.as_slice();
Member

let number_of_indexes = flatten_node.path.as_slice().iter().filter(|v| matches!(v, FlattenNodePathSegment::List)).count();

Member Author

Hmm, not sure if this is optimal enough. With this usage, it will create a new set of path segments and then count it, no? My version avoids creating a new set/vec and counts them directly in a single iteration.
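
For reference, a small self-contained sketch of the two counting styles under discussion (the enum here is an assumed stand-in for the real FlattenNodePathSegment). The filter/count adapter is lazy, so it also walks the existing slice exactly once without allocating an intermediate collection.

// Assumed stand-in for the executor's path-segment type.
enum FlattenNodePathSegment {
    Field(String),
    List,
}

// Iterator-adapter form from the review suggestion: one pass over the slice,
// no intermediate Vec is built.
fn count_list_segments(path: &[FlattenNodePathSegment]) -> usize {
    path.iter()
        .filter(|segment| matches!(segment, FlattenNodePathSegment::List))
        .count()
}

// Manual form from the current code: bump a counter while walking the slice.
fn count_list_segments_manual(path: &[FlattenNodePathSegment]) -> usize {
    let mut number_of_indexes = 0;
    for segment in path {
        if matches!(segment, FlattenNodePathSegment::List) {
            number_of_indexes += 1;
        }
    }
    number_of_indexes
}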

None,
)
}
.unwrap_or(false);
Member

This unwrap_or hides an error inside?
I mean, if this fails, do we need some logging here? What's the scenario where an error is returned from here, and why is the error not relevant and dropped in favor of a default value?

Member Author

I added a warn! to print the error message when it happens. It shouldn't happen since we write to a string buffer, but it's there just in case it fails.
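
As a rough illustration of that fallback (assuming the tracing crate's warn! macro and a fmt write error; the function name and error type are placeholders, not the exact code in the PR):

use tracing::warn;

// Sketch only: log the unexpected failure instead of silently swallowing it,
// then fall back to `false` as before.
fn matches_requirement(result: Result<bool, std::fmt::Error>) -> bool {
    result.unwrap_or_else(|err| {
        // Writing into an in-memory buffer should not fail, but if it ever
        // does, leave a trace in the gateway logs.
        warn!(error = %err, "failed to project representation; treating it as a non-match");
        false
    })
}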

index_of_entities += 1;
ParallelJob::Flatten(mut result, path) => {
if let Some(entities) = result.entities {
'entity_loop: for (entity, indexes_in_path) in
Member

Confirming this part for now, but with the new refactor I wonder if this section of code could be more readable?

Member Author

I left comments explaining why they are different.

let requires_nodes = fetch_node.requires.as_ref().unwrap();
filtered_representations.push(b'[');
let mut first = true;
let normalized_path = self.path.as_slice();
Member

Same suggestion as before: it can be simplified.

Member Author

I left comments for the one above.

@dotansimha dotansimha changed the title Revert "Revert "Error propagation"" feat(executor, gw): error propagation Jul 29, 2025

github-actions bot commented Jul 29, 2025

Federation Audit Results

177 tests   - 12   177 ✅  - 12   3s ⏱️ ±0s
 41 suites  -  1     0 💤 ± 0 
 42 files   ± 0     0 ❌ ± 0 
  1 errors

For more details on these parsing errors, see this check.

Results for commit 1287c74. ± Comparison against base commit f594210.

This pull request removes 12 tests.
test ‑ union-intersection_0
test ‑ union-intersection_1
test ‑ union-intersection_10
test ‑ union-intersection_11
test ‑ union-intersection_2
test ‑ union-intersection_3
test ‑ union-intersection_4
test ‑ union-intersection_5
test ‑ union-intersection_6
test ‑ union-intersection_7
…

♻️ This comment has been updated with latest results.

@ardatan ardatan force-pushed the revert-298-revert-243-error-propagation branch from e3909e7 to 1287c74 on August 1, 2025 11:15
@dotansimha
Member

Re-done in #339

@dotansimha dotansimha closed this Aug 17, 2025
@dotansimha dotansimha deleted the revert-298-revert-243-error-propagation branch September 1, 2025 12:41