refactor(router, executor): refactor request fingerprinting #422

kamilkisiela · 2025-09-15T09:58:59Z

Refactors the request fingerprinting logic to use a function rather than a struct with Hash and PartialEq traits. This simplifies the code and improves performance. It also adds header normalization in config to handle case-insensitivity.
Passes upstream headers to subgraph executors
Log to stdout in dev cli

gemini-code-assist

Summary of Changes

Hello @kamilkisiela, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the request fingerprinting mechanism within the router, moving from a struct-based approach to a more efficient functional implementation. This change, coupled with the introduction of header normalization for case-insensitivity, enhances the accuracy and performance of request deduplication. Furthermore, it enables the propagation of upstream headers to subgraph executors, providing richer context for downstream services. A minor but useful change also directs development CLI logs to stdout for improved debugging.

Highlights

Refactored Request Fingerprinting: The logic for generating request fingerprints has been transformed from a RequestFingerprint struct with Hash and PartialEq traits into a standalone function, request_fingerprint, improving code simplicity and performance.
Header Normalization for Fingerprinting: The configuration for dedupe_fingerprint_headers now includes a deserialization step that normalizes header names to lowercase, ensuring case-insensitive matching during fingerprint generation.
Upstream Header Propagation: Upstream HTTP headers are now explicitly passed to subgraph executors, allowing for more comprehensive request fingerprinting and potentially enabling more advanced routing or caching strategies based on these headers.
Dev CLI Logging to Stdout: The development command-line interface has been updated to direct its tracing logs to std::io::stdout, making it easier to monitor output during development.
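
The function-based fingerprint described in the highlights above can be sketched roughly as follows. This is a minimal std-only illustration, not the code from the PR: the parameter list, the use of `DefaultHasher`, and the `u64` return type are all assumptions made for the example. Only the name `request_fingerprint` and the `BTreeMap` ordering idea come from the PR text.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the PR's `request_fingerprint`: hashes the parts
/// of a subgraph request into a single `u64` key (signature is an assumption).
pub fn request_fingerprint(
    method: &str,
    url: &str,
    headers: &[(&str, &str)],
    body: &[u8],
) -> u64 {
    // Lowercase the names and keep them sorted, so that neither the casing
    // nor the order of the headers changes the resulting hash.
    let mut normalized: BTreeMap<String, &str> = BTreeMap::new();
    for (name, value) in headers {
        normalized.insert(name.to_ascii_lowercase(), *value);
    }

    let mut hasher = DefaultHasher::new();
    method.hash(&mut hasher);
    url.hash(&mut hasher);
    normalized.hash(&mut hasher);
    body.hash(&mut hasher);
    hasher.finish()
}
```

Under these assumptions, two requests that differ only in header-name casing or header order map to the same key, while a different body or header value produces a different one.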

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request refactors the request fingerprinting logic, moving from a struct to a function, passes upstream headers to subgraph executors, and configures logging to stdout for the dev CLI. The changes simplify the code and correctly plumb the upstream headers. However, I've found a regression in the new fingerprinting logic that makes header name matching case-sensitive when fingerprinting all headers, which could break request deduplication. My review includes a specific comment with a suggested fix for this issue.

lib/executor/src/executors/dedupe.rs

github-actions · 2025-09-15T10:03:30Z

✅ `k6-benchmark` results

     ✓ response code was 200
     ✓ no graphql errors
     ✓ valid response structure

     █ setup

     checks.........................: 100.00% ✓ 236025      ✗ 0    
     data_received..................: 6.9 GB  230 MB/s
     data_sent......................: 92 MB   3.1 MB/s
     http_req_blocked...............: avg=2.8µs    min=631ns   med=1.74µs  max=3.44ms   p(90)=2.48µs  p(95)=2.87µs  
     http_req_connecting............: avg=82ns     min=0s      med=0s      max=512.75µs p(90)=0s      p(95)=0s      
     http_req_duration..............: avg=18.62ms  min=1.88ms  med=17.7ms  max=166.67ms p(90)=25.65ms p(95)=28.68ms 
       { expected_response:true }...: avg=18.62ms  min=1.88ms  med=17.7ms  max=166.67ms p(90)=25.65ms p(95)=28.68ms 
     http_req_failed................: 0.00%   ✓ 0           ✗ 78695
     http_req_receiving.............: avg=130.85µs min=24.04µs med=39.39µs max=118.52ms p(90)=90.22µs p(95)=371.06µs
     http_req_sending...............: avg=24.97µs  min=5.33µs  med=10.41µs max=22.11ms  p(90)=16.22µs p(95)=30.15µs 
     http_req_tls_handshaking.......: avg=0s       min=0s      med=0s      max=0s       p(90)=0s      p(95)=0s      
     http_req_waiting...............: avg=18.46ms  min=1.83ms  med=17.57ms max=157.54ms p(90)=25.38ms p(95)=28.38ms 
     http_reqs......................: 78695   2618.071665/s
     iteration_duration.............: avg=19.05ms  min=4.54ms  med=18.04ms max=276.24ms p(90)=26.1ms  p(95)=29.18ms 
     iterations.....................: 78675   2617.406293/s
     vus............................: 50      min=50        max=50 
     vus_max........................: 50      min=50        max=50

github-actions · 2025-09-15T10:04:36Z

🐋 This PR was built and pushed to the following Docker images:

Image Names: ghcr.io/graphql-hive/router

Platforms: linux/amd64,linux/arm64

Image Tags: ghcr.io/graphql-hive/router:pr-422 ghcr.io/graphql-hive/router:sha-1cb1697

Docker metadata

{
"buildx.build.ref": "builder-1a6dd710-83ee-45b5-87d3-3438cdce0199/builder-1a6dd710-83ee-45b5-87d3-3438cdce01990/s3xwy6mgatbxngzf3o1op3uxc",
"containerimage.descriptor": {
  "mediaType": "application/vnd.oci.image.index.v1+json",
  "digest": "sha256:857727dc58ccc65606d1804b21c2c4ab77e622e2f54b67fd94845555a76cd595",
  "size": 1609
},
"containerimage.digest": "sha256:857727dc58ccc65606d1804b21c2c4ab77e622e2f54b67fd94845555a76cd595",
"image.name": "ghcr.io/graphql-hive/router:pr-422,ghcr.io/graphql-hive/router:sha-1cb1697"
}

ardatan · 2025-09-15T10:07:11Z

lib/executor/src/executors/dedupe.rs

+        }
+        for (key, value) in upstream_headers.iter() {
+            if let Ok(value_str) = value.to_str() {
+                headers.insert(key.as_str(), value_str);


Why do we respect the gateway request headers in the fingerprint?
If a subgraph request doesn't use anything from the gateway request, then it can be deduplicated safely. So I think we should only respect the subgraph request's headers. A subgraph request that doesn't use anything from the GW request can be shared with another GW request whose subgraph request also doesn't use the GW request headers, no?

GW request from client A -> subgraph request with nothing propagated from A, but the body etc. are the same as the one below
GW request from client B -> subgraph request with nothing propagated from B, but the body etc. are the same as the one above

And those subgraph requests can be deduplicated. But if we add GW request headers to the fingerprint, then it won't be possible to dedupe these two.

What if the gateway receives a request with Authorization: Bearer xyz, but in the request to the subgraph, you chose to send this information in some other form, like through the extensions or something else?

If it is in the extensions then it changes the request body which is also included in the fingerprint, no?
So the fingerprint should only respect what is sent to the subgraph. If it has anything specific for that GW request, then it will already change the subgraph request headers, body or even the endpoint, so not sure if having GW request headers or anything GW request specific in the fingerprint is needed.

Sure, but what about when you want to deduplicate based on a custom header that is not sent to the subgraph? Like a feature flag or whatever. Subgraph calls will be the same, but for some reason you want to execute those anyway.

If I follow your example, then why do we even need to selectively pick headers, and not just hash the whole thing?

Deduplicating purely on the subgraph call, and relying only on that, means users have no control over what is deduplicated and what is not based on the upstream call.

Then we can have a configuration option to modify how the fingerprint is created, instead of adding all of the headers from the GW request to the fingerprint. By default, it can use the subgraph request details, and then we can make the fingerprint elements configurable by the user?

An option to specifically pass the gw headers? Come on :) Passing them by default has 0 cost.
How can you make the elements configurable? For extensions for example. We would introduce some weird syntax for no reason.

If the fingerprint is the hash of request body + headers, then why do we need a weird syntax for extensions? Extensions are part of the body. If the subgraph request has anything from the GW request headers, then it will change the body or the headers, so the fingerprint will be different. If the user needs to send that query without deduplication, they can disable it, or the fingerprint components can be made configurable, but I'm not sure that's needed for 90% of cases. If a subgraph request with a "query" operation is deduplicated on the request body (query, variables, extensions, i.e. the JSON body sent) + headers, then we are good in most cases.
I'm not talking about the cost of adding GW request headers to the fingerprint but the efficiency of the deduplication.

Explain to me the efficiency of the fingerprint.

The right word is actually the efficiency of the deduplication.
Let's say I have two requests sent to the GW with different auth headers, and they both have a subgraph request which is identical in body (query, variables and extensions) and headers (content-type etc., but no auth header).
Then adding GW request headers to the fingerprint will prevent these identical in-flight subgraph requests from being deduplicated, because the GW request headers are in the fingerprint. But if we only respect what's sent (the subgraph request), then these will be deduped, which is more efficient, or I am fully missing the point.
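
The two-client scenario argued over in this thread can be made concrete with a small sketch. Everything here is hypothetical and std-only: `SubgraphRequest`, `fingerprint_subgraph_only`, and `fingerprint_with_upstream` are illustrative names, not types or functions from the router, and `DefaultHasher` is just a convenient stand-in hasher.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical, simplified subgraph request: only what is sent downstream.
pub struct SubgraphRequest<'a> {
    pub url: &'a str,
    pub headers: &'a [(&'a str, &'a str)],
    pub body: &'a str,
}

/// Fingerprint built from the subgraph request alone.
pub fn fingerprint_subgraph_only(req: &SubgraphRequest<'_>) -> u64 {
    let mut hasher = DefaultHasher::new();
    req.url.hash(&mut hasher);
    req.headers.hash(&mut hasher);
    req.body.hash(&mut hasher);
    hasher.finish()
}

/// Fingerprint that also mixes in the gateway (upstream) request headers.
pub fn fingerprint_with_upstream(
    req: &SubgraphRequest<'_>,
    upstream_headers: &[(&str, &str)],
) -> u64 {
    let mut hasher = DefaultHasher::new();
    fingerprint_subgraph_only(req).hash(&mut hasher);
    upstream_headers.hash(&mut hasher);
    hasher.finish()
}
```

With two gateway requests that carry different `authorization` values but produce the same subgraph request, the first function yields one shared key (the in-flight call can be deduplicated), while the second yields two distinct keys (both calls are executed). Which behavior is preferable is exactly the trade-off debated above.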


dotansimha · 2025-09-18T12:02:20Z

lib/router-config/src/traffic_shaping.rs

@@ -51,3 +54,17 @@ fn default_dedupe_enabled() -> bool {
 fn default_dedupe_fingerprint_headers() -> Vec<String> {
    vec!["authorization".to_string()]


Should this also include Cookie? It's fairly common (and more secure) to use it instead of the Authorization header.

dotansimha · 2025-09-18T12:08:49Z

lib/executor/src/executors/dedupe.rs

+        }
+    } else {
+        for header_name in fingerprint_headers.iter() {
+            if let Some(value) = req_headers.get(header_name) {


If I understood this one correctly, then it means that it will filter the downstream (to-subgraph) as well, and we talked about now allowing the user to change these? 👀

dotansimha · 2025-09-18T12:10:22Z

lib/executor/src/executors/dedupe.rs

+
+    // BTreeMap to ensure case-insensitivity and consistent order for hashing
+    let mut headers = BTreeMap::new();
+    if fingerprint_headers.is_empty() {


I'm pondering what the default should be when the list is set to an empty value. Should it be all headers, or no headers at all?
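
To make the question concrete, here is a minimal sketch of the selection step as the diff fragments suggest it behaves: an empty list means every header is fingerprinted. The function name `select_fingerprint_headers` and the plain `(String, String)` header representation are assumptions for illustration, and the configured names are assumed to be lowercased already.

```rust
use std::collections::BTreeMap;

/// Picks the headers that feed the fingerprint (hypothetical helper).
/// Assumption: an empty `fingerprint_headers` list means "use every header",
/// mirroring the `is_empty()` branch visible in the diff.
pub fn select_fingerprint_headers<'a>(
    req_headers: &'a [(String, String)],
    fingerprint_headers: &[String],
) -> BTreeMap<String, &'a str> {
    let mut headers = BTreeMap::new();
    for (name, value) in req_headers {
        // Compare on lowercased names so matching is case-insensitive.
        let name = name.to_ascii_lowercase();
        let wanted = fingerprint_headers.is_empty()
            || fingerprint_headers.iter().any(|h| *h == name);
        if wanted {
            headers.insert(name, value.as_str());
        }
    }
    headers
}
```

The alternative reading ("empty means no headers") would amount to dropping the `is_empty()` check, so that an empty list selects nothing.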

dotansimha · 2025-09-18T12:12:07Z

lib/router-config/src/traffic_shaping.rs

+    #[serde(
+        default = "default_dedupe_fingerprint_headers",
+        deserialize_with = "deserialize_and_normalize_dedupe_fingerprint_headers"
+    )]
    pub dedupe_fingerprint_headers: Vec<String>,


nit: this could already be HeaderName instead of String, and then you can deserialize it (and validate that the string is an actual valid header name) while parsing
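
A rough sketch of what "validate while parsing" could look like, kept std-only so it stands alone. `NormalizedHeaderName` and `parse_fingerprint_headers` are hypothetical names; in the actual code the `http` crate's `HeaderName` would presumably fill this role, with serde wiring that is omitted here.

```rust
/// Hypothetical std-only stand-in for a validated, lowercased header name.
#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct NormalizedHeaderName(String);

impl NormalizedHeaderName {
    /// Accepts only HTTP `token` characters and lowercases the result.
    pub fn parse(raw: &str) -> Result<Self, String> {
        let is_token_char =
            |c: char| c.is_ascii_alphanumeric() || "!#$%&'*+-.^_`|~".contains(c);
        if raw.is_empty() || !raw.chars().all(is_token_char) {
            return Err(format!("invalid header name: {raw:?}"));
        }
        Ok(Self(raw.to_ascii_lowercase()))
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// Normalizes a whole config list, failing on the first invalid entry.
pub fn parse_fingerprint_headers(
    raw: &[&str],
) -> Result<Vec<NormalizedHeaderName>, String> {
    raw.iter()
        .map(|name| NormalizedHeaderName::parse(name))
        .collect()
}
```

The point of the design is that an invalid entry in `dedupe_fingerprint_headers` fails at config load time rather than silently never matching at request time.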

gemini-code-assist bot reviewed Sep 15, 2025


ardatan reviewed Sep 15, 2025


kamilkisiela added 6 commits September 15, 2025 12:37

Refactor request fingerprinting

33e5dbb

Refactors the request fingerprinting logic to use a function rather than a struct with `Hash` and `PartialEq` traits. This simplifies the code and improves performance. It also adds header normalization in config to handle case-insensitivity.

Refactor dedupe executor code comments

efe2ac2

Pass upstream headers to subgraph executors

faa0821

Log to stdout in dev cli

34c55c4

Update ntex-http to 0.1.15

b78ea74

Fix fingerprint usage in HTTP executor

e32f4aa

kamilkisiela force-pushed the kamil-simpler-req-fingerprint branch from 57c1da8 to e32f4aa Compare September 15, 2025 10:37

dotansimha changed the title ~~Refactor request fingerprinting~~ refactor(router, executor): refactor request fingerprinting Sep 16, 2025

dotansimha reviewed Sep 18, 2025
