Handle multiple updates to the same record in a transaction #3722
base: main
Conversation
…latest IndexWriter changes

# Conflicts:
#	fdb-record-layer-lucene/src/test/java/com/apple/foundationdb/record/lucene/LuceneIndexMaintenanceTest.java
/**
 * Try to find the document for the given record in the segment index.
 * This method would first try to find the document using teh existing reader. If it can't, it will refresh the reader
- * This method would first try to find the document using teh existing reader. If it can't, it will refresh the reader
+ * This method would first try to find the document using the existing reader. If it can't, it will refresh the reader
Done.
 * writer may cache the changes in NRT and the reader (created earlier) can't see them. Refreshing the reader from the
 * writer can alleviate this. If the index can't find the document with the refresh reader, null is returned.
 * Note that the refresh of the reader will do so at the {@link com.apple.foundationdb.record.lucene.directory.FDBDirectoryWrapper}
 * and so has impact on the entire directory.
Please consider rewording.
Done.
if (newReader != null) {
    // previous reader instantiated but then writer changed
    readersToClose.add(writerReader);
    writerReader = LazyCloseable.supply(() -> newReader);
What happens if this is touched concurrently?
Added a test that updates the readers concurrently.
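A hypothetical sketch (plain Java, no Lucene, none of the PR's actual classes) of the concern raised above: if two threads refresh at the same time, the retire-then-replace pair must stay atomic so that no old reader is retired twice or leaked. Strings stand in for reader instances here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of guarding a concurrent reader swap; not the PR's code.
public class ReaderSwapSketch {
    private String reader = "reader-0";  // stands in for the current reader
    private final List<String> readersToClose = new ArrayList<>();

    // Retire the current reader and install the refreshed one, atomically.
    public synchronized void refresh(String newReader) {
        readersToClose.add(reader);
        reader = newReader;
    }

    public synchronized int retiredCount() {
        return readersToClose.size();
    }

    public static void main(String[] args) throws InterruptedException {
        ReaderSwapSketch s = new ReaderSwapSketch();
        Thread t1 = new Thread(() -> s.refresh("reader-1"));
        Thread t2 = new Thread(() -> s.refresh("reader-2"));
        t1.start(); t2.start();
        t1.join(); t2.join();
        // each superseded reader is retired exactly once, regardless of interleaving
        System.out.println(s.retiredCount()); // prints 2
    }
}
```

Without the synchronization, one thread's retired reader could be overwritten before it lands in the close list, which is the kind of interleaving a concurrent-update test should surface.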
ParameterizedTestUtils.booleans("isGrouped"),
Stream.of(0, 10),
Stream.of(0, 1, 4),
Stream.of(5365));
Should this use RandomizedTestUtils.randomizedSeeds to run extra seeds during nightly?
Done
@ParameterizedTest
@MethodSource("multiUpdate")
void multipleUpdatesInTransaction(boolean isSynthetic, boolean isGrouped, int highWatermark, int updateCount, long seed) throws IOException {
    final int documentCount = 15;
Would it make sense to have this be random, perhaps between 11 and 19?
As it is, you either have 15 documents in the partition, or 10 and 5. Putting some randomness in there might find some interesting edge cases.
Done.
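One way to implement the reviewer's suggestion is to derive the document count from the test's existing seed, so each seed reproducibly picks a count in [11, 19] and exercises different partition splits around a high-watermark of 10. A minimal sketch; the method name is illustrative, not from the PR:

```java
import java.util.Random;

// Hypothetical sketch: seed-derived document count for a parameterized test.
public class RandomDocCount {
    static int documentCount(long seed) {
        // nextInt(9) yields 0..8, so the result spans 11..19 inclusive
        return 11 + new Random(seed).nextInt(9);
    }

    public static void main(String[] args) {
        System.out.println(documentCount(5365L));
    }
}
```

Because the count comes from the seed rather than a fresh `Random`, a failing nightly run can be replayed exactly by passing the same seed.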
The issue at hand is that when running multiple update operations in a single transaction, the partition's document counts and the PK-segment index can get into an inconsistent state.

The root cause is that the first update in the transaction clears the doc from the Lucene index and the PK index. Since the changes are not flushed, the IndexWriter has them cached in the NRT cache. The second record update then fails to find the record in the PK index (because the segment has changed but the IndexReader does not yet reflect that), so the delete is skipped, as is the update of the partition count. Note that it does attempt a delete-by-query that actually removes the doc from the Lucene index, but since we can't know that it did, the partition count is not updated.
The solution is to refresh the DirectoryReader when doing an update, so that any previously written changes show up. The refresh operation uses DirectoryReader.openIfChanged, which is more resource-efficient than a brand-new open call.

Resolves #3704
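The contract the fix relies on can be modeled in plain Java (no Lucene, so this is a hypothetical stand-in, with integer versions in place of real readers): a refresh returns a new reader only when the writer has changes the current reader cannot see, and null when the reader is already current.

```java
// Hypothetical model of the DirectoryReader.openIfChanged contract; not Lucene code.
public class OpenIfChangedSketch {
    // Returns the writer's version as a "new reader" if it differs, else null,
    // mirroring openIfChanged's null-means-already-current behavior.
    static Integer openIfChanged(int readerVersion, int writerVersion) {
        return readerVersion == writerVersion ? null : writerVersion;
    }

    public static void main(String[] args) {
        int reader = 1;
        int writer = 2;                         // writer buffered NRT changes
        Integer refreshed = openIfChanged(reader, writer);
        if (refreshed != null) {
            reader = refreshed;                 // the second update now sees the first delete
        }
        System.out.println(reader);             // prints 2
        System.out.println(openIfChanged(reader, writer)); // prints null (no reopen needed)
    }
}
```

Reusing the existing reader when nothing changed is what makes this cheaper than an unconditional fresh open on every update.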