
Evidence of using static thresholds #18

@ruiming-lu

This issue is NOT about software testing. It collects evidence, taken from each system's master branch, of how systems use static rules (hard-coded thresholds) to handle slow faults.

HBase

CRDB (slow or stalled storage engine, i.e. disk)

Important files:

Logic:

  • The health check interval is set in code: diskHealthCheckInterval := 5 * time.Second.
  • Pebble checks whether the last write has taken longer than diskSlowThreshold (an if-condition). If so, it returns a DiskSlowInfo.
  • diskSlowThreshold is passed explicitly as 5s in the call to WithDiskHealthChecks, so users cannot change it.
  • For every DiskSlowInfo (triggered by makeMetricEtcEventListener), the process is fataled if the slow duration is above maxSyncDuration (maxSyncDurationDefault = 20s) and fatalOnExceeded is set (default: true).
  • Otherwise (between 5s and 20s), an ERROR log is emitted: log.Errorf(ctx, "disk stall detected: %s", info).

CRDB (slow logging)

Important files:

Logic:

The common logic

The logic for handling slow faults is very similar in at least HBase and CRDB. This section needs further organization.
