jjezra (Contributor) commented Aug 4, 2025

Each indexing session, for each index, will create a key-value heartbeat of the format:
[prefix, xid] -> [indexing-type, genesis time, heartbeat time]
Indexing sessions that are expected to be exclusive will throw an exception if another active session exists.
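The heartbeat record and the exclusivity check described above can be sketched as a small in-memory model. This is not the PR's IndexingHeartbeat class: a HashMap stands in for the FoundationDB subspace, one map represents one index (so the prefix is implicit), a UUID stands in for the xid, and a fixed lease (`leaseMillis`) decides whether a session counts as "active". The lease, the method names, and the indexing-type strings are all assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical in-memory model of: [prefix, xid] -> [indexing-type, genesis time, heartbeat time]
// One instance represents one index; a real implementation would read/write FDB keys in a transaction.
final class HeartbeatSketch {

    /** Value half of the pair: [indexing-type, genesis time, heartbeat time]. */
    static final class Heartbeat {
        final String indexingType;
        final long genesisMillis;   // when the session started; never changes
        final long heartbeatMillis; // refreshed periodically while the session runs

        Heartbeat(String indexingType, long genesisMillis, long heartbeatMillis) {
            this.indexingType = indexingType;
            this.genesisMillis = genesisMillis;
            this.heartbeatMillis = heartbeatMillis;
        }
    }

    /** Thrown when an exclusive session finds another active session. */
    static final class ActiveSessionException extends RuntimeException {
        ActiveSessionException(String message) {
            super(message);
        }
    }

    // Assumption: a session is "active" if its last heartbeat is newer than this lease.
    private final long leaseMillis;
    private final Map<UUID, Heartbeat> heartbeats = new HashMap<>();

    HeartbeatSketch(long leaseMillis) {
        this.leaseMillis = leaseMillis;
    }

    /** Registers a session; only exclusive sessions check for other active sessions. */
    void start(UUID xid, String indexingType, boolean exclusive, long nowMillis) {
        if (exclusive) {
            for (Map.Entry<UUID, Heartbeat> entry : heartbeats.entrySet()) {
                if (!entry.getKey().equals(xid) && isActive(entry.getValue(), nowMillis)) {
                    throw new ActiveSessionException("Another active session exists: " + entry.getKey());
                }
            }
        }
        heartbeats.put(xid, new Heartbeat(indexingType, nowMillis, nowMillis));
    }

    /** Refreshes the heartbeat time while keeping the original genesis time. */
    void beat(UUID xid, long nowMillis) {
        Heartbeat old = heartbeats.get(xid);
        if (old != null) {
            heartbeats.put(xid, new Heartbeat(old.indexingType, old.genesisMillis, nowMillis));
        }
    }

    /** Removes this session's heartbeat when the session ends. */
    void end(UUID xid) {
        heartbeats.remove(xid);
    }

    Heartbeat get(UUID xid) {
        return heartbeats.get(xid);
    }

    boolean isActive(Heartbeat heartbeat, long nowMillis) {
        return nowMillis - heartbeat.heartbeatMillis < leaseMillis;
    }

    public static void main(String[] args) {
        HeartbeatSketch index = new HeartbeatSketch(10_000);
        index.start(UUID.randomUUID(), "BY_RECORDS", true, 0);
        try {
            // A second exclusive session one second later is rejected.
            index.start(UUID.randomUUID(), "BY_RECORDS", true, 1_000);
        } catch (ActiveSessionException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Timestamps are passed explicitly to keep the sketch deterministic; a real implementation would presumably keep these values in the database so that sessions on different servers see each other's heartbeats.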

Motivation:
1. Represent the heartbeat in every index during multi-target indexing (currently, only the master index has a sync lock)
2. Keep a heartbeat during mutual indexing, which can allow better automatic decision-making
3. Decide about exclusiveness according to the indexing method (currently, it is determined by user input)

With this change, the equivalent of a sync lock will be determined by the indexing type and cannot be set by users. The index configuration function setUseSynchronizedSession will have no effect on the indexing process.
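To illustrate the policy change, here is a hypothetical sketch in which exclusivity is a property of the indexing type and the old user flag is accepted but ignored. The type names, and which of them are exclusive, are assumptions; the text above only says that exclusiveness now depends on the indexing method.

```java
final class ExclusivityPolicy {

    // Hypothetical indexing types; names and exclusivity flags are illustrative only.
    enum IndexingType {
        BY_RECORDS(true),
        MULTI_TARGET(true),
        MUTUAL(false); // assumption: mutual indexing shares the work among concurrent sessions

        private final boolean exclusive;

        IndexingType(boolean exclusive) {
            this.exclusive = exclusive;
        }

        boolean isExclusive() {
            return exclusive;
        }
    }

    private ExclusivityPolicy() {
    }

    /**
     * Returns whether a session of this type must be the only active session on the index.
     * useSynchronizedSession mirrors the old user setting and is deliberately ignored.
     */
    static boolean requiresExclusiveSession(IndexingType type, boolean useSynchronizedSession) {
        return type.isExclusive();
    }

    public static void main(String[] args) {
        for (IndexingType type : IndexingType.values()) {
            // Even with the user flag set to false, exclusive types still get an exclusive session.
            System.out.println(type + " -> exclusive=" + requiresExclusiveSession(type, false));
        }
    }
}
```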

During a gradual code upgrade across multiple servers, one server may be indexing with a synchronized session lock while another server builds the same index with an exclusive heartbeat "lock". If that happens:
a) There will be no more than two concurrent active sessions (at most one holding the old synchronized lock and one holding the exclusive heartbeat).
b) The indexing sessions will conflict with each other until one of the indexers gives up. While this may not be optimal, the generated index will be valid.

Resolve #3529

jjezra added the breaking change label (Changes that are not backwards compatible) Aug 4, 2025
jjezra force-pushed the indexing_heartbeat branch from 89d0263 to f318f65 August 7, 2025 15:55
jjezra requested a review from ScottDugas August 8, 2025 17:18
jjezra force-pushed the indexing_heartbeat branch from a680857 to 83830c6 August 11, 2025 12:01
jjezra marked this pull request as ready for review August 11, 2025 14:13
jjezra marked this pull request as draft August 11, 2025 15:06
jjezra marked this pull request as ready for review August 11, 2025 20:23
ScottDugas (Collaborator) left a comment

I didn't get a chance to look at all of the code; we can discuss some of these things offline.

@@ -180,7 +178,6 @@ <M extends Message> void singleRebuild(
if (!safeBuild) {
Collaborator

What does safeBuild here mean in this context? Do we need to continue to prevent concurrency?

Contributor Author

As it seems, the only things left under non-safeBuild are clearing the index, marking it as write-only, and setting constraints that, IIUC, should not happen after the previous clearing.

After so many changes to the code, maybe it is time to redesign (and possibly simplify) singleRebuild. Should that be a separate PR?

Collaborator

Changing and cleaning up safeBuild should not be part of this PR.

jjezra force-pushed the indexing_heartbeat branch from 4cf86f4 to 97d02b6 August 15, 2025 13:22
jjezra requested a review from ScottDugas August 15, 2025 13:52
ScottDugas (Collaborator) left a comment

There were also two comments left on my previous review that I want to follow up on. I resolved everything else in that review, or answered it in a (I hope) definitive way, so it should be easy to go back through.

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

public class IndexingHeartbeat {
Collaborator

We discussed this offline and decided on a balanced approach: change the method to info, so that the on-disk format is future-proofed in case we want to make this more generic and use it elsewhere. The code itself would still need some substantial changes for that; for example, it would need to take a subspace instead of a store and an index.

jjezra force-pushed the indexing_heartbeat branch 2 times, most recently from 86ad032 to 2a9f7e5 August 25, 2025 15:35
jjezra requested a review from ScottDugas August 25, 2025 15:36
jjezra force-pushed the indexing_heartbeat branch from 921fd82 to a8eff82 September 4, 2025 12:10
jjezra requested a review from ScottDugas September 4, 2025 12:14
@@ -1259,11 +1214,11 @@ protected static <T> T findException(@Nullable Throwable ex, Class<T> classT) {
return null;
}

-protected static boolean shouldLessenWork(@Nullable FDBException ex) {
+protected static boolean doNotLessenWork(@Nullable FDBException ex) {
Collaborator

Why are you inverting this method? This seems completely unrelated to the change at hand.

Contributor Author

I tried to satisfy an IntelliJ IDEA warning. But you are right, it is unreadable. Converting back...

FDBRecordStoreTestBase.RecordMetaDataHook hook = myHook(sourceIndex, tgtIndex);
openSimpleMetaData(hook);

Semaphore pauseMutualBuildSemaphore = new Semaphore(1);
Collaborator

Doing it in a follow-up PR sounds good. I think you could make it a little better by making a class ConfigLoaderThatPausesAfterOnePass that could then store all of these semaphores, and have some more helpful.
There might be something even better, but we're probably getting diminishing returns here, and trying to clean this up more isn't worth it.

To be clear, looking to see if some other tests could take advantage of pauseAfterOnePass in a follow-up PR sounds good, but it's probably not worth creating something more full-featured.
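A possible shape for the helper suggested above, as a hypothetical sketch. It assumes the indexer invokes a config-loader callback once before each pass and models that callback as a `UnaryOperator<C>`; the actual config-loader type used by these tests is not shown here. The two semaphores that the test currently manages by hand live inside the class: the loader signals when it reaches its second call (that is, after one pass) and then blocks until the test calls `resume()`.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.UnaryOperator;

// Hypothetical test helper: pauses the indexing thread once, right before its second pass.
final class ConfigLoaderThatPausesAfterOnePass<C> implements UnaryOperator<C> {
    private final Semaphore pausedSignal = new Semaphore(0); // released when the indexer reaches the pause point
    private final Semaphore resumeSignal = new Semaphore(0); // released by the test to let the indexer continue
    private final AtomicInteger calls = new AtomicInteger();

    @Override
    public C apply(C config) {
        // Assumption: the loader is called before every pass, so call #2 means "one pass done".
        if (calls.incrementAndGet() == 2) {
            pausedSignal.release();
            resumeSignal.acquireUninterruptibly();
        }
        return config; // the config itself is passed through unchanged
    }

    /** Test side: waits until the indexer is paused; returns false on timeout or interrupt. */
    boolean awaitPaused(long timeout, TimeUnit unit) {
        try {
            return pausedSignal.tryAcquire(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Test side: lets the paused indexer continue. */
    void resume() {
        resumeSignal.release();
    }

    int calls() {
        return calls.get();
    }

    public static void main(String[] args) throws InterruptedException {
        ConfigLoaderThatPausesAfterOnePass<String> loader = new ConfigLoaderThatPausesAfterOnePass<>();
        // Simulated indexer thread running three passes.
        Thread indexer = new Thread(() -> {
            for (int pass = 1; pass <= 3; pass++) {
                loader.apply("config");
            }
        });
        indexer.start();
        System.out.println("paused: " + loader.awaitPaused(5, TimeUnit.SECONDS));
        // A test would inspect intermediate state (e.g. heartbeats) here, then resume.
        loader.resume();
        indexer.join();
        System.out.println("calls: " + loader.calls());
    }
}
```

Keeping the semaphores private means a test only sees `awaitPaused` and `resume`, which matches the reviewer's goal of not building anything more full-featured.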

jjezra requested a review from ScottDugas September 8, 2025 14:11
jjezra added the enhancement label (New feature or request) and removed the breaking change label (Changes that are not backwards compatible) Sep 8, 2025
ScottDugas (Collaborator) left a comment

Teamscale is complaining about two test gaps; it seems easy enough to add coverage for them in the test you already have.

There are also a few test methods that exceed the length or nesting-depth threshold; those are probably worth addressing.

jjezra requested a review from ScottDugas September 10, 2025 11:01
Labels: enhancement (New feature or request)
Development

Successfully merging this pull request may close these issues.

Online Indexer: replace the synchronized runner with a heartbeat
2 participants