
add long gaps #313

Draft
nkhristinin wants to merge 1 commit into `main` from `long-gaps`

Conversation

@nkhristinin
Collaborator

No description provided.

nkhristinin added a commit to elastic/kibana that referenced this pull request Feb 27, 2026
…eduler creating excessively large saved objects. (#254788)

## Summary

Fixes OOM crashes and Kibana restarts caused by the gap auto-fill
scheduler creating excessively large saved objects.

### Problem

The backfill client creates one `AdHocRunSO` saved object per rule,
containing a `schedule` array with one entry per rule-interval step
across all gap ranges. There is no upper bound on the number of entries.

For rules with short intervals and long gaps, this array grows to tens
of thousands of entries. Each entry is ~70 bytes serialized, so a single
SO can reach multiple megabytes. When batched into `bulkCreate` requests
(previously chunks of 10, concurrency of 10), the combined payload
exceeds Elasticsearch's `http.max_content_length`, causing:

- `"payload too large"` errors from Elasticsearch `_bulk` requests
- V8 heap exhaustion during JSON serialization of large SO arrays
- Event loop blocking for minutes (preventing task timeout/cancellation)
- Repeated OOM crashes and pod restarts

Additionally, the scheduler previously processed up to 100 rules per
batch (`DEFAULT_RULES_BATCH_SIZE`), building all SOs in memory before
sending any, which amplified peak memory usage.
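
The numbers above can be combined into a back-of-envelope estimate. This sketch only restates figures from this description (~70 bytes per serialized entry, a 50-day gap, a 1-minute rule interval, and the old chunk/concurrency defaults); the variable names are illustrative:

```typescript
// Back-of-envelope payload estimate under the old defaults.
// Assumptions (from the description above): ~70 bytes per serialized
// schedule entry, a 50-day gap, one entry per 1-minute interval step.
const entryBytes = 70;
const entriesPerSO = 50 * 24 * 60; // 72,000 entries for a 50-day gap at 1m
const soBytes = entriesPerSO * entryBytes; // ~5 MB for a single AdHocRunSO
// Old defaults: chunks of 10 SOs per bulkCreate, 10 concurrent requests.
const inFlightBytes = soBytes * 10 * 10; // ~500 MB of serialized payload in flight
```

Roughly half a gigabyte of serialized saved objects in flight at once is consistent with both the `payload too large` errors and the V8 heap exhaustion described above.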

### Changes

**Cap schedule entries per SO (`calculateSchedule`)**

- Introduce `MAX_SCHEDULE_ENTRIES = 10,000`, bounding each SO to ~700 KB
regardless of rule interval.
- `calculateSchedule` now returns `{ schedule, truncated }`. When
truncated, the `end` field on the SO is derived from the last scheduled
entry (not the original range end), keeping the SO self-consistent.
- The unfilled remainder stays on the gap document and is picked up by
the next scheduler run.
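
A minimal sketch of the capped calculation. Only `MAX_SCHEDULE_ENTRIES` and the `{ schedule, truncated }` return shape come from this description; the parameter list and entry shape are illustrative assumptions:

```typescript
const MAX_SCHEDULE_ENTRIES = 10_000;

interface ScheduleEntry {
  runAt: string; // ISO timestamp of the end of one rule-interval step
}

// Hypothetical shape of the capped schedule calculation: walk the gap range
// one interval at a time, stopping at the entry cap.
function calculateSchedule(
  startMs: number,
  endMs: number,
  intervalMs: number
): { schedule: ScheduleEntry[]; truncated: boolean } {
  const schedule: ScheduleEntry[] = [];
  let cursor = startMs;
  while (cursor < endMs) {
    if (schedule.length >= MAX_SCHEDULE_ENTRIES) {
      // Stop here; the SO's `end` is derived from the last entry, and the
      // unfilled remainder stays on the gap document for the next run.
      return { schedule, truncated: true };
    }
    cursor = Math.min(cursor + intervalMs, endMs);
    schedule.push({ runAt: new Date(cursor).toISOString() });
  }
  return { schedule, truncated: false };
}
```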

**Reduce bulk request sizes**

- `bulkCreate` chunk size: `10` → `3` (each SO can be up to ~700 KB, so
a chunk of 3 stays well within payload limits).
- `bulkCreate` concurrency: `10` → `2` (reduces peak memory from
concurrent in-flight requests).
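
The chunking-plus-bounded-concurrency pattern can be sketched as below. This is a hand-rolled stand-in (the actual code likely uses an existing utility such as `pMap`), and `worker` stands in for the `bulkCreate` call:

```typescript
// Split an array into fixed-size chunks (e.g. 3 SOs per bulkCreate request).
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Run `worker` over `tasks` with at most `limit` promises in flight,
// preserving result order.
async function runWithConcurrency<T, R>(
  tasks: T[],
  limit: number,
  worker: (task: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(tasks.length);
  let next = 0;
  async function lane(): Promise<void> {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await worker(tasks[i]);
    }
  }
  await Promise.all(Array.from({ length: Math.min(limit, tasks.length) }, lane));
  return results;
}
```

Under the new defaults this would be invoked roughly as `runWithConcurrency(chunk(adHocRunSOs, 3), 2, (batch) => soClient.bulkCreate(batch))`: at most 2 requests of 3 SOs each in flight at a time.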

**Reduce rules per scheduler batch and gap page size**

- `DEFAULT_RULES_BATCH_SIZE`: `100` → `10`. Fewer rules per iteration
means less memory pressure and more frequent cancellation checkpoints.
- `DEFAULT_GAPS_PER_PAGE`: `5,000` → `DEFAULT_RULES_BATCH_SIZE * 50`
(`500`). Aligned with the smaller batch size to avoid fetching far more
gaps than can be processed in one batch.

**Limit backfill task concurrency**

- Set `maxConcurrency: 3` on the backfill ad-hoc task runner
registration, preventing Task Manager from running too many backfill
tasks in parallel and overwhelming the Kibana process.
- Added `'ad_hoc_run-backfill'` to the Task Manager
`CONCURRENCY_ALLOW_LIST_BY_TASK_TYPE` to enable the concurrency limit.

**Harden error handling**

- Catch blocks in `bulkQueue` now handle non-`Error` objects safely
(`error instanceof Error ? error.message : String(error)`).

### Both code paths covered

The scheduler task and the UI bulk-fill API both converge on
`processGapsBatch` → `scheduleBackfill` → `bulkQueue`, so the schedule
cap and chunk size changes apply to both.

### Performance comparison

Local benchmarks comparing `main` (default) vs this branch, using rules
with 1-minute interval:

| Scenario | Default (main) | This branch |
|---|---|---|
| 100 rules, 1,000 gaps/rule (~1–2 min per gap) | 2s | 22s |
| 100 rules, 1 long gap/rule (50-day duration) | **OOM crash** | 45s |
| 500 rules, 10 gaps/rule | 2s | 23s |

The new branch is slower due to smaller chunk sizes, lower concurrency,
and per-SO schedule caps. But it no longer crashes. The tradeoff is
intentional: safety over throughput.

## What is next

We should consider not storing the schedule array in the SO at all and instead calculating the next run dynamically during backfill execution. That would let us increase the chunk size again, since each SO would be much smaller.

### How to test

To reproduce the crash, we need ~100 rules with a short interval and long gaps.

I created this
[PR](elastic/security-documents-generator#313)
for a utility that generates rules with gaps, including long ones:

1. 100 rules with a 1m interval and 1 gap of 50 days duration:

   `npm run start -- rules --rules 100 -d 50 -i "1m" -c`

2. 100 rules with a 1m interval and 1,000 gaps per rule (~1m each):

   `npm run start -- rules --rules 100 -g 1000 -i "1m" -c`

Then enable the gap auto-fill scheduler.

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
kibanamachine added a commit to kibanamachine/kibana that referenced this pull request Feb 27, 2026
…eduler creating excessively large saved objects. (elastic#254788)

(cherry picked from commit afc555f)
nkhristinin added a commit to elastic/kibana that referenced this pull request Mar 7, 2026
…re than 90 days" (#256507)

## Fix: gap auto-fill scheduler fails with "Backfill cannot look back
more than 90 days"

### Problem

Suppose we have gaps that are 100 days old.

Backfill scheduling enforces a 90-day validation limit. The gap
auto-fill scheduler fetches all gaps that overlap the `now-90d` range,
so these 100-day-old gaps can still be fetched because their interval
overlaps that window. Their interval is then clamped to 90 days.

Later, when `scheduleBackfill` validates the ranges, it computes its own
`now`, which is slightly later because some processing time has elapsed
since the task started. As a result, `now - startDate` can become
greater than 90 days, and the validation rejects the range with:

```text
Backfill cannot look back more than 90 days
```

### Fix

After parsing `gapFillRange`, clamp `startDate` so it stays at least 5
minutes inside the 90-day lookback window. This gives enough buffer for
processing delays and ensures the clamped ranges remain safely within
the validation limit.
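
A sketch of the clamp under the stated assumptions (90-day lookback, 5-minute buffer); the function name and millisecond-based signature are illustrative:

```typescript
const MS_PER_MINUTE = 60 * 1000;
const MS_PER_DAY = 24 * 60 * MS_PER_MINUTE;
const LOOKBACK_MS = 90 * MS_PER_DAY; // backfill validation limit
const BUFFER_MS = 5 * MS_PER_MINUTE; // headroom for processing delays

// Keep startDate at least 5 minutes inside the 90-day lookback window,
// so a later `now` computed during validation still accepts the range.
function clampStartDate(startMs: number, nowMs: number): number {
  const earliestAllowed = nowMs - LOOKBACK_MS + BUFFER_MS;
  return Math.max(startMs, earliestAllowed);
}
```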

### How to test


I created
elastic/security-documents-generator#313, a
utility that generates rules with gaps, including long ones:

100 rules with a 1m interval and 1 gap of 100 days duration:

`npm run start -- rules --rules 100 -d 100 -i "1m" -c`

On `main`, enable the gap auto-fill scheduler and observe that the execution
fails.

With this PR, the task should execute successfully.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Mar 7, 2026
…re than 90 days" (elastic#256507)

(cherry picked from commit c929b3e)
kibanamachine added a commit to elastic/kibana that referenced this pull request Mar 7, 2026
…ack more than 90 days" (#256507) (#256578)

# Backport

This will backport the following commits from `main` to `9.3`:
- [Fix: gap auto-fill scheduler fails with "Backfill cannot look back
more than 90 days"
(#256507)](#256507)


### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sorenlouv/backport)


Co-authored-by: Khristinin Nikita <nikita.khristinin@elastic.co>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
kapral18 pushed a commit to kapral18/kibana that referenced this pull request Mar 9, 2026
…re than 90 days" (elastic#256507)

qn895 pushed a commit to qn895/kibana that referenced this pull request Mar 11, 2026
…eduler creating excessively large saved objects. (elastic#254788)

qn895 pushed a commit to qn895/kibana that referenced this pull request Mar 11, 2026
…re than 90 days" (elastic#256507)
