-
Notifications
You must be signed in to change notification settings - Fork 0
refactor(router, executor): refactor request fingerprinting #422
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
33e5dbb
efe2ac2
faa0821
34c55c4
b78ea74
e32f4aa
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,8 +1,9 @@ | ||
use ahash::AHasher; | ||
use bytes::Bytes; | ||
use http::{HeaderMap, Method, StatusCode, Uri}; | ||
use ntex_http::HeaderMap as NtexHeaderMap; | ||
use std::collections::BTreeMap; | ||
use std::hash::{BuildHasherDefault, Hash, Hasher}; | ||
use std::hash::{BuildHasher, BuildHasherDefault, Hash, Hasher}; | ||
|
||
#[derive(Debug, Clone)] | ||
pub struct SharedResponse { | ||
|
@@ -11,66 +12,52 @@ pub struct SharedResponse { | |
pub body: Bytes, | ||
} | ||
|
||
#[derive(Debug, Clone, Eq)] | ||
pub struct RequestFingerprint { | ||
method: Method, | ||
url: Uri, | ||
/// BTreeMap to ensure case-insensitivity and consistent order for hashing | ||
headers: BTreeMap<String, String>, | ||
body: Vec<u8>, | ||
} | ||
pub fn request_fingerprint( | ||
method: &Method, | ||
url: &Uri, | ||
req_headers: &HeaderMap, | ||
upstream_headers: &NtexHeaderMap, | ||
body_bytes: &[u8], | ||
fingerprint_headers: &[String], | ||
) -> u64 { | ||
let build_hasher = ABuildHasher::default(); | ||
let mut hasher = build_hasher.build_hasher(); | ||
|
||
// BTreeMap to ensure case-insensitivity and consistent order for hashing | ||
let mut headers = BTreeMap::new(); | ||
if fingerprint_headers.is_empty() { | ||
// fingerprint all headers | ||
|
||
impl RequestFingerprint { | ||
pub fn new( | ||
method: &Method, | ||
url: &Uri, | ||
req_headers: &HeaderMap, | ||
body_bytes: &[u8], | ||
fingerprint_headers: &[String], | ||
) -> Self { | ||
let mut headers = BTreeMap::new(); | ||
if fingerprint_headers.is_empty() { | ||
// fingerprint all headers | ||
for (key, value) in req_headers.iter() { | ||
for (key, value) in req_headers.iter() { | ||
if let Ok(value_str) = value.to_str() { | ||
headers.insert(key.as_str(), value_str); | ||
} | ||
} | ||
for (key, value) in upstream_headers.iter() { | ||
if let Ok(value_str) = value.to_str() { | ||
headers.insert(key.as_str(), value_str); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why do we respect the gateway request headers in the fingerprint? GW Request from A client -> Subgraph request w/ nothing propagated from A but the body etc are the same with the one below And those subgraph requests can be deduplicated. But if we add GW request headers to the fingerprint then it won't be possible to dedupe these two. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What if the gateway receives a request with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If it is in the extensions then it changes the request body which is also included in the fingerprint, no? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. sure, but what when you want to deduplicate based on a custom header that is not sent to the subgraph? Like a feature flag or whatever. Subgraph calls will be the same, but for some reason you want to execute those anyway. If I follow your example, then why we even need to selectively pick headers, and not just hash the whole thing? Deduplicating purely on a subgraph call and rely only on that, means users have no control over what is deduplicated and what is not based on the upstream call. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Then we can have a configuration option to modify the way of creating the fingerprint instead of adding all of the headers from the GW request to the fingerprint. By default, it can use subgraph request details then we can make the fingerprint elements configurable by the user? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. An option to specifically pass the gw headers? Come on :) Passing them by default has 0 cost. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If the fingerprint is the hash of = request body + headers, then why we need a weird syntax for extensions? Extensions are part of the body. If the subgraph request has anything from GW request headers, then it will change the body or the headers, so the fingerprint will be different. If the user needs to send that query without deduplication then it can be disabled by them or fingerprint components can be configured but not sure if that's needed for 90% of cases. The subgraph request with a "query" operation will be safely deduplicated if the request body(query, variables, extensions which is the JSON body sent) + headers, then we are good in most cases. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. explain to me the efficiency of the fingerprint There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The right word is |
||
} | ||
} | ||
} else { | ||
for header_name in fingerprint_headers.iter() { | ||
if let Some(value) = req_headers.get(header_name) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If I understood this one correctly, then it means that it will filter the downstream (to-subgraph) as well, and we talked about now allowing the user to change these? 👀 |
||
if let Ok(value_str) = value.to_str() { | ||
headers.insert(key.as_str().to_lowercase(), value_str.to_string()); | ||
headers.insert(header_name, value_str); | ||
} | ||
} | ||
} else { | ||
for header_name in fingerprint_headers.iter() { | ||
if let Some(value) = req_headers.get(header_name) { | ||
if let Ok(value_str) = value.to_str() { | ||
headers.insert(header_name.to_lowercase(), value_str.to_string()); | ||
} | ||
} else if let Some(value) = upstream_headers.get(header_name) { | ||
if let Ok(value_str) = value.to_str() { | ||
headers.insert(header_name, value_str); | ||
} | ||
} | ||
} | ||
|
||
Self { | ||
method: method.clone(), | ||
url: url.clone(), | ||
headers, | ||
body: body_bytes.to_vec(), | ||
} | ||
} | ||
dotansimha marked this conversation as resolved.
Show resolved
Hide resolved
|
||
} | ||
|
||
impl Hash for RequestFingerprint { | ||
fn hash<H: Hasher>(&self, state: &mut H) { | ||
self.method.hash(state); | ||
self.url.hash(state); | ||
self.headers.hash(state); | ||
self.body.hash(state); | ||
} | ||
} | ||
method.hash(&mut hasher); | ||
url.hash(&mut hasher); | ||
headers.hash(&mut hasher); | ||
body_bytes.hash(&mut hasher); | ||
|
||
impl PartialEq for RequestFingerprint { | ||
fn eq(&self, other: &Self) -> bool { | ||
self.method == other.method | ||
&& self.url == other.url | ||
&& self.headers == other.headers | ||
&& self.body == other.body | ||
} | ||
hasher.finish() | ||
} | ||
|
||
pub type ABuildHasher = BuildHasherDefault<AHasher>; |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -21,7 +21,10 @@ pub struct TrafficShapingExecutorConfig { | |
/// A list of headers that should be used to fingerprint requests for deduplication. | ||
/// | ||
/// If not provided, the default is to use the "authorization" header only. | ||
#[serde(default = "default_dedupe_fingerprint_headers")] | ||
#[serde( | ||
default = "default_dedupe_fingerprint_headers", | ||
deserialize_with = "deserialize_and_normalize_dedupe_fingerprint_headers" | ||
)] | ||
pub dedupe_fingerprint_headers: Vec<String>, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: this could already be |
||
} | ||
|
||
|
@@ -51,3 +54,17 @@ fn default_dedupe_enabled() -> bool { | |
fn default_dedupe_fingerprint_headers() -> Vec<String> { | ||
vec!["authorization".to_string()] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should this also include |
||
} | ||
|
||
fn deserialize_and_normalize_dedupe_fingerprint_headers<'de, D>( | ||
deserializer: D, | ||
) -> Result<Vec<String>, D::Error> | ||
where | ||
D: serde::Deserializer<'de>, | ||
{ | ||
let headers: Vec<String> = Deserialize::deserialize(deserializer)?; | ||
Ok(normalize_dedupe_fingerprint_headers(headers)) | ||
} | ||
|
||
fn normalize_dedupe_fingerprint_headers(headers: Vec<String>) -> Vec<String> { | ||
headers.into_iter().map(|h| h.to_lowercase()).collect() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pondering on what should be the default case when the list is set to empty value. should it be all headers? or no headers at all?