-
Notifications
You must be signed in to change notification settings - Fork 163
pg qrep: avro schema nullable by default #3613
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
❌ 2 Tests Failed:
View the full list of 2 ❄️ flaky test(s)
To view more test analytics, go to the Test Analytics Dashboard |
282364d to
90b57fe
Compare
despite our efforts, nulled values still frequently leak through
90b57fe to
90b7203
Compare
ilidemi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logic LGTM. Please add test with and without the flag
Current situation: 1. `failed to sync records: failed to write records to S3: failed to write records to OCF writer: failed to write record to OCF: some_column_100: avro: *avro.null is unsupported for Avro long` occurs once a month for one customer 2. The column in question is a nullable integer, all values in it are null, it is also inherited from a parent table 3. Parent table has same name but different schema 4. Parent and child attnums are different, so possibly column deletions were involved, and also child seems to have become inherited later on. 5. I was unable to reproduce the error with that information. It is possible the schema has changed since, so it would be nice to capture the exact data when we have it. This change is a spiritual successor of #3613, but defaults to strict behavior (which has been working fine for everyone else) and puts the lax one under a config. When lax is enabled, it collects all the inputs that go into deciding whether a column would be nullable under strict behavior, then some extra about table inheritance, and logs it later if any mismatch with strict was detected. Tested that the new logic runs and logs if the code is adjusted to under-do nullable, but as-is the test doesn't do much as the issue is not cleanly reproducible yet. Also adding a generic code notification metric that's easy to emit from anywhere, will set up a non-paging alert on it once this goes in. The plan is to enable the setting just for one service, leave it to bake for another month or two, then check back when the notification fires. After the issue is sorted out, all the null tracking can be removed.
despite our efforts, nulled values still frequently leak through. put behind setting