-
Notifications
You must be signed in to change notification settings - Fork 1
Implement full Query By Example matching. #87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Implement full Query By Example matching. #87
Conversation
- Creates new queryByExample module with comprehensive field matching logic - Converts VPR examples to JSON pointers for precise credential filtering - Updates presentations.js filter chain to use new matching capabilities - Adds comprehensive unit tests covering all matching scenarios - Replaces basic context-only filtering with full example-based matching Previous implementation only matched @context and specific Open Badge fields. New implementation matches ALL fields specified in Query By Example requests using JSON pointer traversal and flexible value comparison logic.
|
@bparth24 can you test this new code with the Academic Course credential's queries both on |
dlongley
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't do a full review yet, but left some comments so things could get kicked off in parallel.
lib/presentations.js
Outdated
| * the example to match against. | ||
| * | ||
| * @returns {Function} A filter function that returns true if the credential | ||
| * matches the Query By Example specification. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| * the example to match against. | |
| * | |
| * @returns {Function} A filter function that returns true if the credential | |
| * matches the Query By Example specification. | |
| * the example to match against. | |
| * | |
| * @returns {Function} A filter function that returns true if the credential | |
| * matches the Query By Example specification. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That function got removed as not needed in this commit - 4d42d81
lib/presentations.js
Outdated
| return ({record: {content}}) => { | ||
| // Use reusable module to check if this single credential matches | ||
| const matches = matchCredentials([content], credentialQuery); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like matchCredentials should instead just be matchCredential and match against a single VC. Callers could then run it over an array if they desired or not -- and clearly this code doesn't need to be using an array (it's just wasted cycles).
Hmm, but I see below that there's savings in only having to process credentialQuery once within the matchCredentials call, but we're losing all of that benefit here. So, instead, we need to reorganize the code such that we don't waste this effort. There a few ways to do that that come to mind real quick:
- Create an array of each record's
content(VCs), runmatchCredentials()on it, and then run over the records against the results doing an object comparison (shallow or "reference" comparison, not deep) to see which records match. A Map could also be built from content object => record and that could be used in post processing, but I doubt this saves many cycles and is maybe even less efficient that not using one. - Modifying the
matchCredentialsoutput to include which indexes matched and do an index match instead of an object comparison. - Modifying the
matchCredentialsinput to accept an object with thecredentialproperty and return the whole given object in the match results, allowing the object to carry any other arbitrary data for post processing. Post process the results to convert the objects back to records. This seems to leak details about the structure being worked on into thematchCredentialsAPI which is suboptimal / probably not a good idea. - Do something funky with Symbols and Sets or something (probably not unless it's not messy / hard to understand or follow as a maintainer).
Open to other better ideas too (of course), but idea 1 in the list above (vs. any of the others in that list) keeps the interfaces and separation of concerns the cleanest, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like
matchCredentialsshould instead just bematchCredentialand match against a single VC. Callers could then run it over an array if they desired or not -- and clearly this code doesn't need to be using an array (it's just wasted cycles).Hmm, but I see below that there's savings in only having to process
credentialQueryonce within thematchCredentialscall, but we're losing all of that benefit here. So, instead, we need to reorganize the code such that we don't waste this effort. There a few ways to do that that come to mind real quick:
- Create an array of each record's
content(VCs), runmatchCredentials()on it, and then run over the records against the results doing an object comparison (shallow or "reference" comparison, not deep) to see which records match. A Map could also be built from content object => record and that could be used in post processing, but I doubt this saves many cycles and is maybe even less efficient that not using one.- Modifying the
matchCredentialsoutput to include which indexes matched and do an index match instead of an object comparison.- Modifying the
matchCredentialsinput to accept an object with thecredentialproperty and return the whole given object in the match results, allowing the object to carry any other arbitrary data for post processing. Post process the results to convert the objects back to records. This seems to leak details about the structure being worked on into thematchCredentialsAPI which is suboptimal / probably not a good idea.- Do something funky with Symbols and Sets or something (probably not unless it's not messy / hard to understand or follow as a maintainer).
Open to other better ideas too (of course), but idea 1 in the list above (vs. any of the others in that list) keeps the interfaces and separation of concerns the cleanest, I think.
Attempt to address your concern and going with option 1 in this commit - 4d42d81
lib/queryByExample.js
Outdated
| * @returns {Array} Array of credentials that match the Query By Example | ||
| * specification. | ||
| */ | ||
| export function matchCredentials(credentials, queryByExample) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use named params in exported APIs:
| export function matchCredentials(credentials, queryByExample) { | |
| export function matchCredentials({credentials, queryByExample} = {}) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // Convert example to JSON pointers, excluding @context as it's handled | ||
| // separately | ||
| const expectedPointers = _convertExampleToPointers(example); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But @context shouldn't be handled externally (well, this wouldn't be known by this module when we're ready to separate it), so let's make sure it handles @context too. We should treat this function as if it could be separated and put into an external package for reuse by other applications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But
@contextshouldn't be handled externally (well, this wouldn't be known by this module when we're ready to separate it), so let's make sure it handles@contexttoo. We should treat this function as if it could be separated and put into an external package for reuse by other applications.
Attempt to address your concern in this commit - c8fad33
| // Convert to JSON pointer dictionary and extract pointer/value pairs | ||
| try { | ||
| const dict = JsonPointer.dict(exampleWithoutContext); | ||
| for(const [pointer, value] of Object.entries(dict)) { | ||
| // Skip empty objects, arrays, or null/undefined values | ||
| if(_isMatchableValue(value)) { | ||
| pointers.push({ | ||
| pointer, | ||
| expectedValue: value | ||
| }); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have this code over here:
bedrock-web-wallet/lib/presentations.js
Line 238 in 212ac42
| const pointers = _adjustPointers(Object.keys(jsonpointer.dict(object))); |
Can we find a way to expose it as an exported API in this module that this module can also use internally? There might need to be a few different options when converting an object to pointers, but it seems like we could get more reuse here.
There might be something common we can do with this as well:
https://github.com/digitalbazaar/di-sd-primitives/blob/v3.1.0/lib/select.js#L9
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have this code over here:
bedrock-web-wallet/lib/presentations.js
Line 238 in 212ac42
const pointers = _adjustPointers(Object.keys(jsonpointer.dict(object))); Can we find a way to expose it as an exported API in this module that this module can also use internally? There might need to be a few different options when converting an object to pointers, but it seems like we could get more reuse here.
There might be something common we can do with this as well:
https://github.com/digitalbazaar/di-sd-primitives/blob/v3.1.0/lib/select.js#L9
Attempt to address your concern in this commit - c8fad33
| // Skip null, undefined | ||
| if(value == null) { | ||
| return false; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I think we need to be careful here -- perhaps a null value means "value must not be present" -- which is different from "ignore this". It's true that undefined should be treated as if the associated property wasn't even there to begin with (and generally shouldn't ever happen, but good to cover the case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I think we need to be careful here -- perhaps a
nullvalue means "value must not be present" -- which is different from "ignore this". It's true thatundefinedshould be treated as if the associated property wasn't even there to begin with (and generally shouldn't ever happen, but good to cover the case).
Attempt to address your concern in this commit - c8fad33
| // Skip empty arrays | ||
| if(Array.isArray(value) && value.length === 0) { | ||
| return false; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I think empty arrays are wildcards that mean "any array will match". Clearly under-specified in the spec, but I think this was the aim.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if(typeof value === 'object' && !Array.isArray(value) && | ||
| Object.keys(value).length === 0) { | ||
| return false; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think empty objects are considered "wildcards" -- meaning, any value will match. Clearly under-specified in the spec, but I think this was the aim.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // Check if they have the same number of keys | ||
| if(keys1.length !== keys2.length) { | ||
| return false; | ||
| } | ||
|
|
||
| // Check if all keys and values match | ||
| return keys1.every(key => | ||
| keys2.includes(key) && _valuesMatch(obj1[key], obj2[key]) | ||
| ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the object matching is "must match this object exactly", but rather, "must include these fields and values". So I suspect this function (or something else) needs an adjustment for this.
In general, if you think about this query mechanism visually, the "example" is to be placed as an overlay on a possible candidate for a match -- and as long as everything in the example "overlaps" with something in the candidate, then the candidate is considered a match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be clearer if we provided wildcard values? Because the value thing being an actual value...which is then not used in the match...is very confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the object matching is "must match this object exactly", but rather, "must include these fields and values". So I suspect this function (or something else) needs an adjustment for this.
In general, if you think about this query mechanism visually, the "example" is to be placed as an overlay on a possible candidate for a match -- and as long as everything in the example "overlaps" with something in the candidate, then the candidate is considered a match.
Attempt to address your concern in this commit - c8fad33
| const actualValue = JsonPointer.get(credential, pointer); | ||
| const result = _valuesMatch(actualValue, expectedValue); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we might need to be more careful here with arrays? We should make sure there are tests with arrays that have objects nested within them that are also arrays still work.
This code might also benefit from using that other code I linked to where just the "deepest" pointers are considered, I'm not sure.
In other words, it might be best to:
- Generating the deepest pointers for both the example.
- Get the leaf "expected values" (similar to what the code is already doing).
- Do some kind of adjustment for "arrays" (pointer values with numbers) -- perhaps multiplex these or save them for special multiphase processing? For example, each pointer that includes an array is really just another "example" that starts at that array and will need to try to match against any element found in the associated array in any candidate match.
- For each VC, check just these deepest pointers with the appropriate "match" algorithm(s).
- For each array case, see if there are any "candidate matches" to consider... recursively? Anyway, something to consider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we might need to be more careful here with arrays? We should make sure there are tests with arrays that have objects nested within them that are also arrays still work.
This code might also benefit from using that other code I linked to where just the "deepest" pointers are considered, I'm not sure.
In other words, it might be best to:
- Generating the deepest pointers for both the example.
- Get the leaf "expected values" (similar to what the code is already doing).
- Do some kind of adjustment for "arrays" (pointer values with numbers) -- perhaps multiplex these or save them for special multiphase processing? For example, each pointer that includes an array is really just another "example" that starts at that array and will need to try to match against any element found in the associated array in any candidate match.
- For each VC, check just these deepest pointers with the appropriate "match" algorithm(s).
- For each array case, see if there are any "candidate matches" to consider... recursively? Anyway, something to consider.
Attempt to address your concern in this commit - c8fad33
Replaces individual credential filtering with efficient batch processing to address code review feedback on performance. Processes all credentials at once instead of calling matchCredentials() separately for each item. - Updates presentations.js to use batch processing approach - Removes inefficient _matchQueryByExampleFilter function - Extracts all credential contents, processes in single call - Maps results back using reference comparison for O(1) lookup - Adds integration test to verify batch processing functionality - Removes unused profile setup from integration test
- Implements comprehensive QBE matching system with matchCredentials() function supporting semantic match types (mustBeNull, anyArray, anyValue, exactMatch) - Adds selectiveDisclosureUtil.js with JSON-LD selection and pointer utilities including selectJsonLd(), adjustPointers(), and parsePointer() functions - Integrates array containment matching vs positional matching for flexible credential queries supporting overlay matching where credentials can have extra fields beyond the example specification - Handles null value vs missing field distinction with strict semantic rules - Includes string normalization and type coercion for robust value comparison - Integrates with existing presentations.match() system while maintaining backward compatibility with existing query types - Adds debug.js utility to conditionally enable/disable console logs during development with support for environment variables and browser flags - Provides comprehensive test suite with 45+ tests covering semantic features, edge cases, array matching, nested objects, and real-world scenarios
|
@dlongley cc: @BigBlueHat Outputs for newly added tests: |
dlongley
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only a partial review again -- making great progress here!
| export { | ||
| ageCredentialHelpers, capabilities, cryptoSuites, exchanges, | ||
| helpers, inbox, presentations, users, validator, zcap | ||
| helpers, inbox, presentations, queryByExample, users, validator, zcap |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something minor to consider here about this export:
Our long term goal is to move the queryByExample code to a separate module where anyone could import that API (from there) if desired. However, once we do that, we'd have this API surface here (if we continue to export this), that we will have to do backwards compatible support for until we do a major release to remove it. I expect that we're exporting this now for testing purposes.
I think we'd probably export it with an underscore _queryByExample to signal this (and leave a comment that it is exposed for testing purposes only) ... and we wouldn't mention exposing this in API in the changelog. We can make that change now if you'd like, or we can just support it until we do a major change in the future -- it's not that big of a deal, but worth keeping this sort of future planning in mind when exporting new API, so thought I'd mention it.
| // Full query by example matching implemented via queryByExample module | ||
| // Process all credentials at once for efficiency | ||
| if(credentialQuery?.example) { | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| result.matches = result.matches.filter(match => | ||
| matchingContents.includes(match.record.content) | ||
| ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style fix and could be more efficient:
| result.matches = result.matches.filter(match => | |
| matchingContents.includes(match.record.content) | |
| ); | |
| const contentSet = new Set(matchingContents); | |
| result.matches = result.matches.filter( | |
| match => contentSet.has(match.record.content)); |
| } | ||
|
|
||
| result.matches = | ||
| result.matches.filter(_openBadgeFilter({credentialQuery})); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| result.matches.filter(_openBadgeFilter({credentialQuery})); | |
| result.matches.filter(_openBadgeFilter({credentialQuery})); |
| if(!Array.isArray(credentials)) { | ||
| debugLog('Credentials is not an array:', credentials); // DEBUG | ||
| return []; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to me like we'd want this to be a TypeError:
| if(!Array.isArray(credentials)) { | |
| debugLog('Credentials is not an array:', credentials); // DEBUG | |
| return []; | |
| } | |
| if(!Array.isArray(credentials)) { | |
| throw new TypeError('"credentials" must be an array.'); | |
| } |
| debugLog('Input credentials count:', credentials.length); // DEBUG | ||
| debugLog('Input credentials:', credentials.map(c => ({ // DEBUG |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style preference for comments above their related lines:
| debugLog('Input credentials count:', credentials.length); // DEBUG | |
| debugLog('Input credentials:', credentials.map(c => ({ // DEBUG | |
| // DEBUG | |
| debugLog('Input credentials count:', credentials.length); | |
| debugLog('Input credentials:', credentials.map(c => ({ |
| typeof credential === 'object' && | ||
| !Array.isArray(credential) && | ||
| credential.credentialSubject && | ||
| typeof credential.credentialSubject === 'object'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| typeof credential === 'object' && | |
| !Array.isArray(credential) && | |
| credential.credentialSubject && | |
| typeof credential.credentialSubject === 'object'; | |
| typeof credential === 'object' && | |
| !Array.isArray(credential) && | |
| credential.credentialSubject && | |
| typeof credential.credentialSubject === 'object'; |
| const isValid = credential && | ||
| typeof credential === 'object' && | ||
| !Array.isArray(credential) && | ||
| credential.credentialSubject && | ||
| typeof credential.credentialSubject === 'object'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The indent here is off, but I also think we can collapse this to:
| const isValid = credential && | |
| typeof credential === 'object' && | |
| !Array.isArray(credential) && | |
| credential.credentialSubject && | |
| typeof credential.credentialSubject === 'object'; | |
| const isValid = credential?.credentialSubject && | |
| typeof credential?.credentialSubject === 'object'; |
It's true that we wouldn't catch a weird case of credential being an array with a property of credentialSubject that is a non-null object, but I think that's ok (that's a problem the caller should have already caught).
Though, I'm also not entirely sure why we're filtering out some invalid credentials here (even the original code wouldn't catch every possible problem) at all instead of allowing errors to occur if the caller failed to provide good inputs. We should probably allow the errors to happen in this API -- and let the caller ensure they are passing valid credentials. Thoughts?
| * {pointer, expectedValue, matchType} | ||
| * where pointer is a JSON pointer string, expectedValues is the | ||
| * expected value, and matchType describes how to match | ||
| * ('exactMatch', 'anyArray', etc.). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indent should match other indent level (2), starting from prefix of * (the space after the docs asterisk isn't part of the "indent" on any line for consistency):
| * {pointer, expectedValue, matchType} | |
| * where pointer is a JSON pointer string, expectedValues is the | |
| * expected value, and matchType describes how to match | |
| * ('exactMatch', 'anyArray', etc.). | |
| * {pointer, expectedValue, matchType} | |
| * where pointer is a JSON pointer string, expectedValues is the | |
| * expected value, and matchType describes how to match | |
| * ('exactMatch', 'anyArray', etc.). |
| // Empty array - add it | ||
| pointers.push({pointer: currentPath, value}); | ||
| } else if(typeof value === 'object' && value !== null && | ||
| Object.keys(value).length === 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Object.keys(value).length === 0) { | |
| Object.keys(value).length === 0) { |
|
Might be able to address #91 here as well. |
Previous implementation only matched @context and specific Open Badge fields. New implementation matches ALL fields specified in Query By Example requests using JSON pointer traversal and flexible value comparison logic.