Skip to content

fetch_and_import_range_from_sequence log exceptions instead of raise Fixes #1234 Raising the exception will kill all Tasks, the sequence and the fetcher because of the linking from Task.async_stream. This leads to when 1 Task times out in a DB transaction, they all are affected, leading to slowdowns when the sequence has to be rebuilt when the fetcher restarts.#5

Open
ajit2903 wants to merge 3 commits intomasterfrom
1129-2

Conversation

@ajit2903
Copy link
Owner

@ajit2903 ajit2903 commented Sep 17, 2025

GitHub keywords to close any associated issues

Motivation

Why we should merge these changes. If using GitHub keywords to close issues, this is optional as the motivation can be read on the issue page.

Changelog

Enhancements

Things you added that don't break anything. Regression tests for Bug Fixes count as Enhancements.

Bug Fixes

Things you changed that fix bugs. If it fixes a bug but, in so doing, adds a new requirement, removes code, or requires a database reset and reindex, the breaking part of the change should also be added to "Incompatible Changes" below.

Incompatible Changes

Things you broke while doing Enhancements and Bug Fixes. Breaking changes include (1) adding new requirements and (2) removing code. Renaming counts as (2) because a rename is a removal followed by an add.

Upgrading

If you have any Incompatible Changes in the above Changelog, outline how users of prior versions can upgrade once this PR lands or when reviewers are testing locally. A common upgrading step is "Database reset and re-index required".

Checklist for your Pull Request (PR)

  • I verified this PR does not break any public APIs, contracts, or interfaces that external consumers depend on.
  • If I added new functionality, I added tests covering it.
  • If I fixed a bug, I added a regression test to prevent the bug from silently reappearing again.
  • I updated documentation if needed:
  • If I modified API endpoints, I updated the Swagger/OpenAPI schemas accordingly and checked that schemas are asserted in tests.
  • If I added new DB indices, I checked, that they are not redundant, with PGHero or other tools.
  • If I added/removed chain type, I modified the Github CI matrix and PR labels accordingly.

Ecto 3.0 gets us easy access to partitions and the windowing function
`lag`.  `lag` is able to keep track of the previous row without a self
join.  Importantly, the new query with `lag` and no more
`generate_series` completes in ~2.5 seconds on Ethereum mainnet
using `:timer.tc(Explorer.Chain, :missing_block_number_ranges, [6889126..0])`
on ETH Mainnet Test DB 40 while the old version doesn't return after 5+
minutes.

I've heavily commented the new version of the query since this took
awhile to figure out and most importantly finding the proper terminology:
"gaps and islands" was key to find the starting query using `lag`.

The query in SQL is as follows:

```sql
SELECT island.last_number + 1,
       island.next_number - 1
FROM (
  SELECT lag(land.number) OVER "w" AS last_number,
         land.number AS next_number
  FROM (
    SELECT blocks.number AS number
    FROM blocks
    WHERE blocks.consensus = TRUE
    UNION ALL (
      SELECT LEAST(min_blocks.number, before_range_min)
      FROM (
        SELECT min(blocks.number) AS number
        FROM blocks
        WHERE blocks.consensus = TRUE)
      ) AS min_blocks
      WHERE min_blocks.number IS NULL OR
            min_blocks.number != before_range_min
    )
    UNION ALL (
      SELECT GREATEST(max_blocks.number, after_range_max)
      FROM (
        SELECT max(blocks.number) AS number
        FROM blocks
        WHERE blocks.consensus = TRUE
       ) AS max_blocks
       WHERE max_blocks.number IS NULL OR
             max_blocks.number != after_range_max
   ) AS land
   WINDOW "w" AS (ORDER BY land.number)
) AS islands
WHERE islands.last_number != islands.next_number - 1
ORDER BY islands.last_number
```
Fixes blockscout#1232

`{:error, :timeout}` should have been caught, but additionally, it
should never have been allowed to propagate out unassigned to a step as
it is not possible to track the timeout to a specific fetch.
Fixes blockscout#1234

Raising the exception will kill all Tasks, the sequence and the fetcher
because of the linking from `Task.async_stream`.  This leads to when 1
Task times out in a DB transaction, they all are affected, leading to
slowdowns when the sequence has to be rebuilt when the fetcher restarts.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants