kamilkisiela
Contributor

  • Refactors the request fingerprinting logic to use a function rather than a struct with Hash and PartialEq traits. This simplifies the code and improves performance. It also adds header normalization in config to handle case-insensitivity.
  • Passes upstream headers to subgraph executors
  • Logs to stdout in the dev CLI

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @kamilkisiela, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the request fingerprinting mechanism within the router, moving from a struct-based approach to a more efficient functional implementation. This change, coupled with the introduction of header normalization for case-insensitivity, enhances the accuracy and performance of request deduplication. Furthermore, it enables the propagation of upstream headers to subgraph executors, providing richer context for downstream services. A minor but useful change also directs development CLI logs to stdout for improved debugging.

Highlights

  • Refactored Request Fingerprinting: The logic for generating request fingerprints has been transformed from a RequestFingerprint struct with Hash and PartialEq traits into a standalone function, request_fingerprint, improving code simplicity and performance.
  • Header Normalization for Fingerprinting: The configuration for dedupe_fingerprint_headers now includes a deserialization step that normalizes header names to lowercase, ensuring case-insensitive matching during fingerprint generation.
  • Upstream Header Propagation: Upstream HTTP headers are now explicitly passed to subgraph executors, allowing for more comprehensive request fingerprinting and potentially enabling more advanced routing or caching strategies based on these headers.
  • Dev CLI Logging to Stdout: The development command-line interface has been updated to direct its tracing logs to std::io::stdout, making it easier to monitor output during development.
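
For illustration, here is a minimal sketch of what a function-based fingerprint could look like, using only the standard library. The signature, the tuple-based header list, and the use of `DefaultHasher` are assumptions made for this sketch; the PR's actual `request_fingerprint` is not shown in this conversation. It assumes `fingerprint_headers` is already lowercase (which the config normalization is meant to guarantee) and that an empty list means "hash all headers", as the `is_empty()` branch in the diff appears to do.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the PR's `request_fingerprint`.
/// `fingerprint_headers` is assumed to contain lowercase names only.
pub fn request_fingerprint(
    method: &str,
    url: &str,
    req_headers: &[(&str, &str)],
    body: &[u8],
    fingerprint_headers: &[String],
) -> u64 {
    // Lowercase the names and keep them sorted so the hash depends on
    // neither header casing nor header order.
    let mut headers: BTreeMap<String, &str> = BTreeMap::new();
    for &(name, value) in req_headers {
        let name = name.to_ascii_lowercase();
        // Assumption: an empty allow-list selects every header.
        if fingerprint_headers.is_empty() || fingerprint_headers.contains(&name) {
            headers.insert(name, value);
        }
    }

    let mut hasher = DefaultHasher::new();
    method.hash(&mut hasher);
    url.hash(&mut hasher);
    headers.hash(&mut hasher);
    body.hash(&mut hasher);
    hasher.finish()
}
```

With an allow-list of `["authorization"]`, two requests that differ only in header casing, header order, or headers outside the list produce the same `u64`, so no dedicated struct with `Hash` and `PartialEq` impls is needed to compare them.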

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the request fingerprinting logic, moving from a struct to a function, passes upstream headers to subgraph executors, and configures logging to stdout for the dev CLI. The changes simplify the code and correctly plumb the upstream headers. However, I've found a regression in the new fingerprinting logic that makes header name matching case-sensitive when fingerprinting all headers, which could break request deduplication. My review includes a specific comment with a suggested fix for this issue.
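
To make the case-sensitivity concern concrete, here is a small sketch, assuming header names reach the fingerprint as plain strings. A `BTreeMap` gives a stable order, but it does not fold case on its own, so the names have to be lowercased before insertion. The helper name `normalized_headers` is hypothetical and is not the fix suggested in the review.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: collect every header under a lowercase name so that
/// `Authorization` and `authorization` end up as the same key.
pub fn normalized_headers<'a>(raw: &[(&str, &'a str)]) -> BTreeMap<String, &'a str> {
    raw.iter()
        .map(|&(name, value)| (name.to_ascii_lowercase(), value))
        .collect()
}
```

Hashing the map returned here, rather than one keyed by the raw names, keeps the "all headers" branch consistent with the allow-list branch.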


github-actions bot commented Sep 15, 2025

k6-benchmark results

     ✓ response code was 200
     ✓ no graphql errors
     ✓ valid response structure

     █ setup

     checks.........................: 100.00% ✓ 236025      ✗ 0    
     data_received..................: 6.9 GB  230 MB/s
     data_sent......................: 92 MB   3.1 MB/s
     http_req_blocked...............: avg=2.8µs    min=631ns   med=1.74µs  max=3.44ms   p(90)=2.48µs  p(95)=2.87µs  
     http_req_connecting............: avg=82ns     min=0s      med=0s      max=512.75µs p(90)=0s      p(95)=0s      
     http_req_duration..............: avg=18.62ms  min=1.88ms  med=17.7ms  max=166.67ms p(90)=25.65ms p(95)=28.68ms 
       { expected_response:true }...: avg=18.62ms  min=1.88ms  med=17.7ms  max=166.67ms p(90)=25.65ms p(95)=28.68ms 
     http_req_failed................: 0.00%   ✓ 0           ✗ 78695
     http_req_receiving.............: avg=130.85µs min=24.04µs med=39.39µs max=118.52ms p(90)=90.22µs p(95)=371.06µs
     http_req_sending...............: avg=24.97µs  min=5.33µs  med=10.41µs max=22.11ms  p(90)=16.22µs p(95)=30.15µs 
     http_req_tls_handshaking.......: avg=0s       min=0s      med=0s      max=0s       p(90)=0s      p(95)=0s      
     http_req_waiting...............: avg=18.46ms  min=1.83ms  med=17.57ms max=157.54ms p(90)=25.38ms p(95)=28.38ms 
     http_reqs......................: 78695   2618.071665/s
     iteration_duration.............: avg=19.05ms  min=4.54ms  med=18.04ms max=276.24ms p(90)=26.1ms  p(95)=29.18ms 
     iterations.....................: 78675   2617.406293/s
     vus............................: 50      min=50        max=50 
     vus_max........................: 50      min=50        max=50 


github-actions bot commented Sep 15, 2025

🐋 This PR was built and pushed to the following Docker images:

Image Names: ghcr.io/graphql-hive/router

Platforms: linux/amd64,linux/arm64

Image Tags: ghcr.io/graphql-hive/router:pr-422 ghcr.io/graphql-hive/router:sha-1cb1697

Docker metadata
{
  "buildx.build.ref": "builder-1a6dd710-83ee-45b5-87d3-3438cdce0199/builder-1a6dd710-83ee-45b5-87d3-3438cdce01990/s3xwy6mgatbxngzf3o1op3uxc",
  "containerimage.descriptor": {
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "digest": "sha256:857727dc58ccc65606d1804b21c2c4ab77e622e2f54b67fd94845555a76cd595",
    "size": 1609
  },
  "containerimage.digest": "sha256:857727dc58ccc65606d1804b21c2c4ab77e622e2f54b67fd94845555a76cd595",
  "image.name": "ghcr.io/graphql-hive/router:pr-422,ghcr.io/graphql-hive/router:sha-1cb1697"
}

}
for (key, value) in upstream_headers.iter() {
    if let Ok(value_str) = value.to_str() {
        headers.insert(key.as_str(), value_str);
Member

@ardatan ardatan Sep 15, 2025

Why do we include the gateway request headers in the fingerprint?
If a subgraph request doesn't use anything from the gateway request, it can be deduplicated safely, so I think we should only take the subgraph request's headers into account. A subgraph request that uses nothing from its GW request can be shared with another GW request whose subgraph request also uses nothing from the GW request headers, no?

GW request from client A -> subgraph request with nothing propagated from A, but the body etc. are the same as the one below
GW request from client B -> subgraph request with nothing propagated from B, but the body etc. are the same as the one above

Those subgraph requests can be deduplicated. But if we add GW request headers to the fingerprint, it won't be possible to dedupe these two.

Contributor Author

What if the gateway receives a request with Authorization: Bearer xyz, but in the request to the subgraph, you chose to send this information in some other form, like through the extensions or something else?

Member

@ardatan ardatan Sep 15, 2025

If it is in the extensions, then it changes the request body, which is also included in the fingerprint, no?
So the fingerprint should only take into account what is sent to the subgraph. If the subgraph request has anything specific to that GW request, that will already change the subgraph request headers, body, or even the endpoint, so I'm not sure that having GW request headers or anything GW-request-specific in the fingerprint is needed.

Contributor Author

Sure, but what about when you want to deduplicate based on a custom header that is not sent to the subgraph? Like a feature flag or whatever. The subgraph calls will be the same, but for some reason you want to execute them anyway.

If I follow your example, then why do we even need to selectively pick headers, rather than just hash the whole thing?

Deduplicating purely on the subgraph call, and relying only on that, means users have no control over what is deduplicated and what is not based on the upstream call.

Member

Then we can have a configuration option to modify the way of creating the fingerprint instead of adding all of the headers from the GW request to the fingerprint. By default, it can use subgraph request details then we can make the fingerprint elements configurable by the user?

Contributor Author

An option to specifically pass the gw headers? Come on :) Passing them by default has 0 cost.
How can you make the elements configurable? For extensions for example. We would introduce some weird syntax for no reason.

Member

@ardatan ardatan Sep 15, 2025

If the fingerprint is the hash of request body + headers, then why would we need a weird syntax for extensions? Extensions are part of the body. If the subgraph request carries anything from the GW request headers, then that changes the body or the headers, so the fingerprint will be different. If the user needs to send that query without deduplication, they can disable it, or the fingerprint components can be made configurable, but I'm not sure that's needed for 90% of cases. If a subgraph request with a "query" operation is deduplicated on the request body (query, variables, extensions, i.e. the JSON body sent) + headers, then we are good in most cases.
I'm not talking about the cost of adding GW request headers to the fingerprint, but about the efficiency of the deduplication.

Contributor Author

explain to me the efficiency of the fingerprint

Member

@ardatan ardatan Sep 15, 2025

The right phrase is actually the efficiency of the deduplication.
Let's say I have two requests sent to the GW with different auth headers, and both result in a subgraph request. Those subgraph requests are identical in body (query, variables and extensions) and headers (content-type etc., but no auth header).
Adding GW request headers to the fingerprint means these identical in-flight subgraph requests will not be deduplicated, because the GW request headers differ. But if we only take into account what is actually sent (the subgraph request), then they will be deduped, which is more efficient. Or I am fully missing the point.
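
The two positions in this thread can be compared with a small std-only sketch. The `SubgraphRequest` type and both function names are hypothetical, made up for this illustration; they are not the router's types.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical model of what is actually sent to a subgraph.
#[derive(Hash)]
pub struct SubgraphRequest<'a> {
    pub url: &'a str,
    pub headers: Vec<(&'a str, &'a str)>,
    pub body: &'a str,
}

/// Fingerprint built only from the outgoing subgraph request.
pub fn subgraph_only_fingerprint(req: &SubgraphRequest) -> u64 {
    let mut hasher = DefaultHasher::new();
    req.hash(&mut hasher);
    hasher.finish()
}

/// Fingerprint that additionally mixes in selected gateway (upstream) headers.
pub fn fingerprint_with_gateway_headers(
    req: &SubgraphRequest,
    gateway_headers: &[(&str, &str)],
) -> u64 {
    let mut hasher = DefaultHasher::new();
    req.hash(&mut hasher);
    gateway_headers.hash(&mut hasher);
    hasher.finish()
}
```

For two gateway requests that carry different `authorization` values but produce byte-identical subgraph requests, the first function yields one fingerprint (the calls can be deduplicated) and the second yields two (they cannot). The second form is what allows a gateway-only header, such as a feature flag, to opt requests out of sharing.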

Refactors the request fingerprinting logic to use a function rather than
a struct with `Hash` and `PartialEq` traits. This simplifies the code
and improves performance. It also adds header normalization in config to
handle case-insensitivity.
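
As a rough sketch of the normalization step, assuming it amounts to lowercasing the configured names: the helper below is hypothetical and std-only, and shows the kind of function a serde `deserialize_with` hook could call after reading the raw list. Trimming, sorting, and de-duplicating are extra assumptions of this sketch, not something the commit message states.

```rust
/// Hypothetical helper: normalize configured header names so that lookups
/// during fingerprinting can compare against lowercase names only.
pub fn normalize_header_names(raw: Vec<String>) -> Vec<String> {
    let mut names: Vec<String> = raw
        .into_iter()
        .map(|name| name.trim().to_ascii_lowercase())
        .filter(|name| !name.is_empty())
        .collect();
    // Sorting and de-duplicating are assumptions of this sketch; they make
    // `Authorization` and `authorization` collapse into a single entry.
    names.sort();
    names.dedup();
    names
}
```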
@kamilkisiela kamilkisiela force-pushed the kamil-simpler-req-fingerprint branch from 57c1da8 to e32f4aa Compare September 15, 2025 10:37
@dotansimha dotansimha changed the title Refactor request fingerprinting refactor(router, executor): refactor request fingerprinting Sep 16, 2025
@@ -51,3 +54,17 @@ fn default_dedupe_enabled() -> bool {
fn default_dedupe_fingerprint_headers() -> Vec<String> {
    vec!["authorization".to_string()]
Member

Should this also include Cookie? It's fairly common (and more secure) to use it instead of the Authorization header.

    }
} else {
    for header_name in fingerprint_headers.iter() {
        if let Some(value) = req_headers.get(header_name) {
Member

If I understood this one correctly, then it means that it will filter the downstream (to-subgraph) as well, and we talked about now allowing the user to change these? 👀


// BTreeMap to ensure case-insensitivity and consistent order for hashing
let mut headers = BTreeMap::new();
if fingerprint_headers.is_empty() {
Member

I'm wondering what the behavior should be when the list is set to an empty value. Should it be all headers, or no headers at all?
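
One way to sidestep the ambiguity, sketched here purely as an illustration, is to make the three cases explicit instead of overloading the empty list. The `FingerprintHeaders` enum and its variant names are hypothetical and are not part of this PR or of the router's config.

```rust
/// Hypothetical policy type that spells out what an empty list would
/// otherwise have to imply.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum FingerprintHeaders {
    /// Every header takes part in the fingerprint.
    AllHeaders,
    /// Headers are ignored entirely.
    NoHeaders,
    /// Only the listed (lowercase) header names take part.
    Only(Vec<String>),
}

impl FingerprintHeaders {
    /// Returns true if a header with this lowercase name should be hashed.
    pub fn includes(&self, lowercase_name: &str) -> bool {
        match self {
            FingerprintHeaders::AllHeaders => true,
            FingerprintHeaders::NoHeaders => false,
            FingerprintHeaders::Only(names) => names.iter().any(|n| n == lowercase_name),
        }
    }
}
```

With such a type, `Only(vec![])` and `NoHeaders` behave identically, and "all headers" has to be asked for by name.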

#[serde(
    default = "default_dedupe_fingerprint_headers",
    deserialize_with = "deserialize_and_normalize_dedupe_fingerprint_headers"
)]
pub dedupe_fingerprint_headers: Vec<String>,
Member

nit: this could already be HeaderName instead of String, and then you can deserialize it (and validate that the string is an actual valid header name) while parsing
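
A std-only sketch of the "validate while parsing" idea follows. `FingerprintHeaderName` is a hypothetical stand-in used here so the example has no dependencies; in the router the natural choice would be the `http` crate's `HeaderName`, which does its own validation. The character check follows the `token` grammar for field names in RFC 9110.

```rust
use std::str::FromStr;

/// Hypothetical stand-in for a validated, lowercase header name.
#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct FingerprintHeaderName(String);

impl FingerprintHeaderName {
    pub fn as_str(&self) -> &str {
        &self.0
    }
}

impl FromStr for FingerprintHeaderName {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Field names are HTTP tokens: ASCII letters, digits, and a fixed
        // set of punctuation characters.
        let is_token_char =
            |c: char| c.is_ascii_alphanumeric() || "!#$%&'*+-.^_`|~".contains(c);
        if s.is_empty() || !s.chars().all(is_token_char) {
            return Err(format!("invalid header name: {:?}", s));
        }
        Ok(Self(s.to_ascii_lowercase()))
    }
}

/// Parse a whole config list, failing on the first invalid entry.
pub fn parse_fingerprint_headers(raw: &[&str]) -> Result<Vec<FingerprintHeaderName>, String> {
    raw.iter().map(|s| s.parse()).collect()
}
```

Parsing at config-load time means a typo such as `"bad header"` is rejected on startup rather than silently never matching any request header.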
