-
Notifications
You must be signed in to change notification settings - Fork 211
fix: support high precision timestamp strings on getRows calls #1596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix: support high precision timestamp strings on getRows calls #1596
Conversation
Add 8 system tests to verify BigQuery `getRows` functionality with various combinations of `timestampOutputFormat` and `useInt64Timestamp`. Tests cover all format options: UNSPECIFIED, FLOAT64, INT64, ISO8601_STRING combined with boolean `useInt64Timestamp` values.
Add 8 system tests to verify BigQuery `getRows` functionality with various combinations of `timestampOutputFormat` and `useInt64Timestamp`. Tests cover all format options: UNSPECIFIED, FLOAT64, INT64, ISO8601_STRING combined with boolean `useInt64Timestamp` values. Asserts that timestamps inserted with picosecond precision are retrieved correctly (truncated to microsecond precision as per BigQuery storage).
to the user
This reverts commit b9e81ac.
Add 7 new system test cases to `system-test/timestamp_output_format.ts` to verify the behavior of `Table.getRows` when `formatOptions.timestampOutputFormat` or `formatOptions.useInt64Timestamp` are omitted. These tests confirm that `useInt64Timestamp` defaults to `true`, causing conflicts with incompatible formats like `FLOAT64` unless explicitly disabled.
This change modifies `BigQueryTimestamp` constructor to preserve original string values for timestamps with more than 9 fractional digits (picoseconds etc.), bypassing `PreciseDate` which truncates to nanoseconds. This ensures that high precision timestamps returned by BigQuery are not truncated by the client. - Unskipped test in `system-test/timestamp_output_format.ts` - Added regression test in `test/bigquery.ts` - Implemented logic to check for high precision strings in `src/bigquery.ts` Co-authored-by: danieljbruce <8935272+danieljbruce@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
| 1672574400.123456 | ||
| 2023-01-01T12:00:00.123456789123Z | ||
| */ | ||
| const listParams = options.listParams; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should provide this option to customers, we should default to use the new timestampOutputFormat option under the hood.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand this comment. What does it mean to use the new timestampOutputFormat option under the hood?
When you say I don't think we should provide this option to customers I think you are talking about the extra parameter I added to convertSchemaFieldValue and mergeSchemaWithRows_ methods right? For parsing the value response from the server I needed this extra argument so that we can read timestampOutputFormat and useInt64Timestamp to determine the type of value the server returned.
Alternatively, we could use predicates like Number.isNaN(value) and Number.parseFloat(value) to determine the type of value returned from the server and then method signatures don't change. Do you prefer that option instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the review. I have refactored the code to remove the listParams argument and instead use predicate-based logic (typeof value === 'string' and typeof value === 'number') to determine how to parse the timestamp value, as you suggested. This avoids changing the method signatures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you say
I don't think we should provide this option to customersI think you are talking about the extra parameter I added toconvertSchemaFieldValueandmergeSchemaWithRows_methods right? For parsing the value response from the server I needed this extra argument so that we can readtimestampOutputFormatanduseInt64Timestampto determine the type of value the server returned.
I meant that we should not provide the option in the getRows method, but I forgot that is extends from bigquery.tabledata.IListParams and therefore has the data format options in there. I was mixing with the Go SDK where we don't let customers pass that option directly. Sorry for the confusion.
In any case, I think we can have some improvements to the code and maybe let the service throws an error if both format options is informed. One concern that I have is that we already set the default https://github.com/googleapis/nodejs-bigquery/pull/1596/changes#diff-6400d8602f0855266c743f4fb031e97ac9cf07f9dcddfba03833b6f04d547a5eR1889, so I think we need to do two things here:
- We should default to use the new format option:
formatOptions.timestampOutputFormat': 'ISO8601_STRING' - Handle the case where customer only inform one of then and we need to override that default, so we don't have an error raised by the service.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. That will introduce "Cannot convert 1672574400.123456 to a BigInt" like errors upstream so those will need to be addressed as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alvarowolfx I've explored this a little bit more and I'm not sure we should use the ISO8601_STRING as a default. The reason is that for users with calls set to timestampOutputFormat undefined and useInt64Timestamp set to true they will now see errors that say Error: Cannot specify both use_int64_timestamp and timestamp_output_format instead of a request that works properly. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We clarified this offline, now formatOptions.timestampOutputFormat only defaults to ISO8601_STRING when no format options are provided. I think now we see errors in the fewest number of cases.
| } else if (typeof value === 'string') { | ||
| if (/^\d{4}-\d{1,2}-\d{1,2}/.test(value)) { | ||
| pd = new PreciseDate(value); | ||
| if (value.match(/\.\d{10,}/) && !Number.isNaN(pd.getTime())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is PreciseDate supporting the picosecond resolution ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Precise date only supports nanosecond precision, not picosecond precision. This is why if the date has picoseconds in it then we cannot convert it to PreciseDate and then convert the precise date to a string because it will lose precision this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to evaluate contribute to PreciseDate to support such resolution. Also, maybe we should not instantiate a PreciseDate at the above line since is gonna generate an invalid time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like work is being done on PreciseDate to support this. I'll integrate this change once PreciseDate has support.
Refactored `convertSchemaFieldValue` to use predicate-based logic for timestamp parsing instead of relying on `listParams`. This simplifies the function signature and correctly handles high-precision timestamp strings. - Modified `src/bigquery.ts` to implement predicate-based timestamp parsing. - Updated `src/table.ts` to remove `listParams` argument. - Maintained regression tests in `test/bigquery.ts` and enabled system test in `system-test/timestamp_output_format.ts`. Co-authored-by: danieljbruce <8935272+danieljbruce@users.noreply.github.com>
…stamp" This reverts commit 44e6e7b.
| 1672574400.123456 | ||
| 2023-01-01T12:00:00.123456789123Z | ||
| */ | ||
| const listParams = options.listParams; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you say
I don't think we should provide this option to customersI think you are talking about the extra parameter I added toconvertSchemaFieldValueandmergeSchemaWithRows_methods right? For parsing the value response from the server I needed this extra argument so that we can readtimestampOutputFormatanduseInt64Timestampto determine the type of value the server returned.
I meant that we should not provide the option in the getRows method, but I forgot that is extends from bigquery.tabledata.IListParams and therefore has the data format options in there. I was mixing with the Go SDK where we don't let customers pass that option directly. Sorry for the confusion.
In any case, I think we can have some improvements to the code and maybe let the service throws an error if both format options is informed. One concern that I have is that we already set the default https://github.com/googleapis/nodejs-bigquery/pull/1596/changes#diff-6400d8602f0855266c743f4fb031e97ac9cf07f9dcddfba03833b6f04d547a5eR1889, so I think we need to do two things here:
- We should default to use the new format option:
formatOptions.timestampOutputFormat': 'ISO8601_STRING' - Handle the case where customer only inform one of then and we need to override that default, so we don't have an error raised by the service.
| } else if (typeof value === 'string') { | ||
| if (/^\d{4}-\d{1,2}-\d{1,2}/.test(value)) { | ||
| pd = new PreciseDate(value); | ||
| if (value.match(/\.\d{10,}/) && !Number.isNaN(pd.getTime())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to evaluate contribute to PreciseDate to support such resolution. Also, maybe we should not instantiate a PreciseDate at the above line since is gonna generate an invalid time.
|
|
||
| it('should call getRows with TIMESTAMP_OUTPUT_FORMAT_UNSPECIFIED and useInt64Timestamp undefined', async () => { | ||
| const [rows] = await table.getRows({ | ||
| 'formatOptions.timestampOutputFormat': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this test is sending both formatOptions under the hood, so can I assume the service side is not failing when both are informed ? When useInt64Timestamp is undefined, the code default to set it to true here: https://github.com/googleapis/nodejs-bigquery/pull/1596/changes#diff-6400d8602f0855266c743f4fb031e97ac9cf07f9dcddfba03833b6f04d547a5eR1889,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Not failing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Acknowledged.
| it('should call getRows with FLOAT64 and useInt64Timestamp undefined', async () => { | ||
| try { | ||
| const [rows] = await table.getRows({ | ||
| 'formatOptions.timestampOutputFormat': 'FLOAT64', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, interesting, so here it fails ? Again, it customers informed only one formatOption, I think we need to override the default set at https://github.com/googleapis/nodejs-bigquery/pull/1596/changes#diff-6400d8602f0855266c743f4fb031e97ac9cf07f9dcddfba03833b6f04d547a5eR1889,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We made changes and this test doesn't fail anymore.
| } | ||
| }); | ||
|
|
||
| it('should call getRows with TIMESTAMP_OUTPUT_FORMAT_UNSPECIFIED and useInt64Timestamp=true', async () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we should have only one test here and make it a table-driven test ? I think this became super long for testing a lot of variations, but it can be simplified with some table-driven test and expectation based on the parameter combination.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I see the case for that. Sometimes parameterized tests can be harder to read, but it will make things more concise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've refactored the system tests to use a parameterized approach as discussed. This makes the test suite more concise and maintainable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Acknowledged.
Refactored `system-test/timestamp_output_format.ts` to use parameterized tests for better maintainability and conciseness. - Implemented predicate-based timestamp parsing in `src/bigquery.ts`. - Removed `listParams` from `src/table.ts`. - Updated `system-test/timestamp_output_format.ts` to use a parameterized test loop. - Adjusted unit tests in `test/bigquery.ts` to match implementation changes. Co-authored-by: danieljbruce <8935272+danieljbruce@users.noreply.github.com>
…ttps://github.com/googleapis/nodejs-bigquery into fix-high-precision-timestamps-12704073509468064918 # Conflicts: # system-test/timestamp_output_format.ts
Refactored `system-test/timestamp_output_format.ts` to use parameterized tests for better maintainability and conciseness. - Implemented predicate-based timestamp parsing in `src/bigquery.ts`. - Removed `listParams` from `src/table.ts`. - Updated `system-test/timestamp_output_format.ts` to use a parameterized test loop. - Adjusted unit tests in `test/bigquery.ts` to match implementation changes. Co-authored-by: danieljbruce <8935272+danieljbruce@users.noreply.github.com>
Refactored `system-test/timestamp_output_format.ts` to use parameterized tests for better maintainability and conciseness. - Implemented predicate-based timestamp parsing in `src/bigquery.ts`. - Removed `listParams` from `src/table.ts`. - Updated `system-test/timestamp_output_format.ts` to use a parameterized test loop. - Adjusted unit tests in `test/bigquery.ts` to match implementation changes. Co-authored-by: danieljbruce <8935272+danieljbruce@users.noreply.github.com>
…ttps://github.com/googleapis/nodejs-bigquery into fix-high-precision-timestamps-12704073509468064918 # Conflicts: # system-test/timestamp_output_format.ts
Description
This PR achieves the same objectives as googleapis/java-bigquery#4010, but for Node as requested in the planning sheet. The idea is that for reading rows, the users should be able to access high precision values for timestamps if high precision values are being stored on the backend.
Impact
This PR follows a test driven development approach against the getRows method.
Tests were written to evaluate what happens when getRows receives various input values for timestampOutputFormat and useInt64Timestamp. These tests revealed that not only are high precision values not being delivered to users for ISO8601_STRING return types, but also other bugs exist like calls hang on getRows calls that fail and conversion logic throughs errors for some calls that fetch rows. This PR fixes all the bugs and ensures the values with the right precision are delivered to users.
This chart details the before code changes / after code changes results with impact highlighted in green:
The highlighted green impact shows that conversion logic has been updated to avoid the 'cannot convert' errors and with this new logic changes are also applied to the BigQueryTimestamp class to maintain high precision on timestamp values returned to users.
Testing
New system tests to capture all useInt64Timestamp/timestampOutputFormat combinations for getRows calls.