BP-63: Force Reschedule Auditor tasks #4025

@wenbingshen

BP

This is the master ticket for tracking BP-63.
Proposal PR: #3964

Motivation

Currently, the Bookie can reschedule Auditor check tasks in several ways. The auditorBookieTask is excluded here because it has its own mechanism for triggering re-execution. This BP covers three tasks: AuditorCheckAllLedgersTask, AuditorPlacementPolicyCheckTask, and AuditorReplicasCheckTask.

1: The Bookie keeps three execution times in ZooKeeper: checkallledgersctime, placementpolicycheckctime, and replicascheckctime. Updating these times dynamically adjusts how often the Auditor tasks run, but triggering task execution still requires restarting the Auditor process or holding a new Auditor election.

2: The ForceAuditorChecksCmd tool builds on the same underlying logic as the first method, so it also requires an Auditor restart or election to trigger task execution.

3: The Decommission and RecoveryBookie tools focus on executing recovery logic, and they only check and recover a specific subset of Bookie services.

These methods are cumbersome and unreliable for rescheduling the Auditor check tasks in a cluster.

Proposal

Therefore, I propose improving the rescheduling of Auditor tasks as follows:

1: The Auditor watches the persistent znode path /ZK_LEDGERS_ROOT_PATH/underreplication/scheduleAuditor.
2: Users update the task ctime with the ForceAuditorChecksCmd tool and pass the force parameter to create the znode above.
3: The Auditor registers a callback on scheduleAuditor that reschedules the three tasks named above.
4: After the Auditor finishes rescheduling the tasks, it deletes the scheduleAuditor node.
5: When the Auditor starts, it deletes any old scheduleAuditor node so that a stale request does not cause confusion.

This way, we can trigger the scheduling and execution of Auditor tasks through an online interface, without relying on a service restart or re-election.
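
The five steps above can be sketched roughly as follows. This is only an illustration of the intended flow, not the actual implementation: FakeZnodeStore is a hypothetical in-memory stand-in for ZooKeeper, the root path /ledgers is an assumed value for ZK_LEDGERS_ROOT_PATH, and resetting the ctime values to 0 is a simplification of what the ForceAuditorChecksCmd tool does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class ScheduleAuditorSketch {
    // Assumption: "/ledgers" stands in for ZK_LEDGERS_ROOT_PATH.
    static final String SCHEDULE_AUDITOR = "/ledgers/underreplication/scheduleAuditor";

    /** Hypothetical in-memory stand-in for ZooKeeper; NOT the real client API. */
    static final class FakeZnodeStore {
        private final Map<String, Long> nodes = new ConcurrentHashMap<>();
        private final Map<String, Runnable> createWatchers = new ConcurrentHashMap<>();

        // Unlike a standard ZooKeeper watch, which fires once and must be
        // re-registered, this fake keeps the watcher for every later create.
        void watchCreate(String path, Runnable onCreate) { createWatchers.put(path, onCreate); }

        boolean exists(String path) { return nodes.containsKey(path); }

        void set(String path, long value) { nodes.put(path, value); }

        void delete(String path) { nodes.remove(path); }

        // Creates the node if absent and notifies the watcher, if any.
        void create(String path) {
            if (nodes.putIfAbsent(path, 0L) == null) {
                Runnable watcher = createWatchers.get(path);
                if (watcher != null) {
                    watcher.run();
                }
            }
        }
    }

    /** Simplified Auditor that only records which tasks it rescheduled. */
    static final class Auditor {
        final List<String> rescheduled = new ArrayList<>();
        private final FakeZnodeStore store;

        Auditor(FakeZnodeStore store) { this.store = store; }

        void start() {
            store.delete(SCHEDULE_AUDITOR);                                 // step 5: drop stale flag
            store.watchCreate(SCHEDULE_AUDITOR, this::onScheduleRequested); // step 1: watch the path
        }

        private void onScheduleRequested() {
            // step 3: reschedule the three check tasks
            rescheduled.add("AuditorCheckAllLedgersTask");
            rescheduled.add("AuditorPlacementPolicyCheckTask");
            rescheduled.add("AuditorReplicasCheckTask");
            store.delete(SCHEDULE_AUDITOR);                                 // step 4: clear the flag
        }
    }

    /** Step 2: stand-in for the tool; resets the ctimes, and with force creates the flag. */
    static void forceAuditorChecks(FakeZnodeStore store, boolean force) {
        store.set("checkallledgersctime", 0L);
        store.set("placementpolicycheckctime", 0L);
        store.set("replicascheckctime", 0L);
        if (force) {
            store.create(SCHEDULE_AUDITOR);
        }
    }

    public static void main(String[] args) {
        FakeZnodeStore store = new FakeZnodeStore();
        Auditor auditor = new Auditor(store);
        auditor.start();
        forceAuditorChecks(store, true);
        System.out.println("Rescheduled: " + auditor.rescheduled);
        System.out.println("Flag still present: " + store.exists(SCHEDULE_AUDITOR));
    }
}
```

In this sketch the flag node acts purely as a one-shot signal: without the force parameter only the ctime values change and nothing is rescheduled until the next restart or election, which mirrors the current behavior described in the Motivation section.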

Compatibility, Deprecation, and Migration Plan

There are no compatibility issues. This BP introduces a new trigger flag that does not affect the original logic and does not involve any changes to other existing public APIs. There is no deprecation or migration plan.
