-
Notifications
You must be signed in to change notification settings - Fork 0
refactor(router, executor): refactor request fingerprinting #422
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary of Changes
Hello @kamilkisiela, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly refactors the request fingerprinting mechanism within the router, moving from a struct-based approach to a more efficient functional implementation. This change, coupled with the introduction of header normalization for case-insensitivity, enhances the accuracy and performance of request deduplication. Furthermore, it enables the propagation of upstream headers to subgraph executors, providing richer context for downstream services. A minor but useful change also directs development CLI logs to stdout for improved debugging.
Highlights
- Refactored Request Fingerprinting: The logic for generating request fingerprints has been transformed from a
RequestFingerprint
struct withHash
andPartialEq
traits into a standalone function,request_fingerprint
, improving code simplicity and performance. - Header Normalization for Fingerprinting: The configuration for
dedupe_fingerprint_headers
now includes a deserialization step that normalizes header names to lowercase, ensuring case-insensitive matching during fingerprint generation. - Upstream Header Propagation: Upstream HTTP headers are now explicitly passed to subgraph executors, allowing for more comprehensive request fingerprinting and potentially enabling more advanced routing or caching strategies based on these headers.
- Dev CLI Logging to Stdout: The development command-line interface has been updated to direct its tracing logs to
std::io::stdout
, making it easier to monitor output during development.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command>
or @gemini-code-assist <command>
. Below is a summary of the supported commands.
Feature | Command | Description |
---|---|---|
Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/
folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request refactors the request fingerprinting logic, moving from a struct to a function, passes upstream headers to subgraph executors, and configures logging to stdout for the dev CLI. The changes simplify the code and correctly plumb the upstream headers. However, I've found a regression in the new fingerprinting logic that makes header name matching case-sensitive when fingerprinting all headers, which could break request deduplication. My review includes a specific comment with a suggested fix for this issue.
✅
|
🐋 This PR was built and pushed to the following Docker images: Image Names: Platforms: Image Tags: Docker metadata{
"buildx.build.ref": "builder-1a6dd710-83ee-45b5-87d3-3438cdce0199/builder-1a6dd710-83ee-45b5-87d3-3438cdce01990/s3xwy6mgatbxngzf3o1op3uxc",
"containerimage.descriptor": {
"mediaType": "application/vnd.oci.image.index.v1+json",
"digest": "sha256:857727dc58ccc65606d1804b21c2c4ab77e622e2f54b67fd94845555a76cd595",
"size": 1609
},
"containerimage.digest": "sha256:857727dc58ccc65606d1804b21c2c4ab77e622e2f54b67fd94845555a76cd595",
"image.name": "ghcr.io/graphql-hive/router:pr-422,ghcr.io/graphql-hive/router:sha-1cb1697"
} |
} | ||
for (key, value) in upstream_headers.iter() { | ||
if let Ok(value_str) = value.to_str() { | ||
headers.insert(key.as_str(), value_str); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we respect the gateway request headers in the fingerprint?
If a subgraph request doesn't use anything from the gateway request, then they can be deduplicated safely. So I think we should only respect subgraph request's headers. A subgraph request that doesn't use anything from GW request can be shared with other GW request that doesn't have a subgraph request uses the GW request headers no?
GW Request from A client -> Subgraph request w/ nothing propagated from A but the body etc are the same with the one below
GW Request from B client -> Subgraph request w/ nothing propagated from B but the body etc are the same with one above
And those subgraph requests can be deduplicated. But if we add GW request headers to the fingerprint then it won't be possible to dedupe these two.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the gateway receives a request with Authorization: Bearer xyz
, but in the request to the subgraph, you chose to send this information in some other form, like through the extensions or something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is in the extensions then it changes the request body which is also included in the fingerprint, no?
So the fingerprint should only respect what is sent to the subgraph. If it has anything specific for that GW request, then it will already change the subgraph request headers, body or even the endpoint, so not sure if having GW request headers or anything GW request specific in the fingerprint is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, but what when you want to deduplicate based on a custom header that is not sent to the subgraph? Like a feature flag or whatever. Subgraph calls will be the same, but for some reason you want to execute those anyway.
If I follow your example, then why we even need to selectively pick headers, and not just hash the whole thing?
Deduplicating purely on a subgraph call and rely only on that, means users have no control over what is deduplicated and what is not based on the upstream call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we can have a configuration option to modify the way of creating the fingerprint instead of adding all of the headers from the GW request to the fingerprint. By default, it can use subgraph request details then we can make the fingerprint elements configurable by the user?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An option to specifically pass the gw headers? Come on :) Passing them by default has 0 cost.
How can you make the elements configurable? For extensions for example. We would introduce some weird syntax for no reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the fingerprint is the hash of = request body + headers, then why we need a weird syntax for extensions? Extensions are part of the body. If the subgraph request has anything from GW request headers, then it will change the body or the headers, so the fingerprint will be different. If the user needs to send that query without deduplication then it can be disabled by them or fingerprint components can be configured but not sure if that's needed for 90% of cases. The subgraph request with a "query" operation will be safely deduplicated if the request body(query, variables, extensions which is the JSON body sent) + headers, then we are good in most cases.
I'm not talking about the cost of adding GW request headers to the fingerprint but the efficiency of the deduplication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
explain to me the efficiency of the fingerprint
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The right word is the efficiency of the deduplication
actually.
Let's say I have two requests sent to the GW with different auth headers, and they both have a subgraph request which are identical per body
(query
, variables
and extensions
) and headers
(content-type etc but no auth header.
Then adding GW request headers to the fingerprint will not deduplicate these identical inflight subgraph requests because you have GW request headers in the fingerprint. But if we only respect what's sent(subgraph request), then these will be deduped which is more efficient or I am fully missing the point.
Refactors the request fingerprinting logic to use a function rather than a struct with `Hash` and `PartialEq` traits. This simplifies the code and improves performance. It also adds header normalization in config to handle case-insensitivity.
57c1da8
to
e32f4aa
Compare
@@ -51,3 +54,17 @@ fn default_dedupe_enabled() -> bool { | |||
fn default_dedupe_fingerprint_headers() -> Vec<String> { | |||
vec!["authorization".to_string()] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this also include Cookie
? It's fairly common (and more secure) to use it instead of Authorization
header
} | ||
} else { | ||
for header_name in fingerprint_headers.iter() { | ||
if let Some(value) = req_headers.get(header_name) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understood this one correctly, then it means that it will filter the downstream (to-subgraph) as well, and we talked about now allowing the user to change these? 👀
|
||
// BTreeMap to ensure case-insensitivity and consistent order for hashing | ||
let mut headers = BTreeMap::new(); | ||
if fingerprint_headers.is_empty() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pondering on what should be the default case when the list is set to empty value. should it be all headers? or no headers at all?
#[serde( | ||
default = "default_dedupe_fingerprint_headers", | ||
deserialize_with = "deserialize_and_normalize_dedupe_fingerprint_headers" | ||
)] | ||
pub dedupe_fingerprint_headers: Vec<String>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this could already be HeaderName
instead of String
, and then you can deserialize it (and validate that the string is an actual valid header name) while parsing
Hash
andPartialEq
traits. This simplifies the code and improves performance. It also adds header normalization in config to handle case-insensitivity.