Refactor apply_block #1853
Description
apply_block is a bit of a mess at the moment. It works (probably), but it's a huge function, instrumentation is missing, and work happens both concurrently and then sequentially again.
As a short summary:
It performs some checks, spawns tasks to store the raw block data and update the sqlite database, and then carries on updating the forests inline. Since we consider the sqlite database the master, that task waits for confirmation that the other portions have succeeded before it commits its transaction.
There is further incidental complexity in synchronizing the in-memory structures with the sqlite commit while minimizing how long we hold the write locks. This is somewhat of a separate problem, and I'll have to dig in further to really understand how we can solve it more coherently.
WIP
This is still undercooked and I need to do some additional thinking/digging, but I want to write it down in the meantime to gather comments and feedback.
Proposal
This is inspired by the deferred proving PR, where we now have multiple database writers. It would be great to follow the Rust model of a single writer only; this simplifies things in terms of understanding and consistency. It's also a practical concern: we often have issues with block writes taking 10s+ (due to bugs, poor performance, a missing index, etc.), and this would cause the proof writers to error out, most likely causing a crash (?).
I think we can solve this, and the complexity of apply_block by slightly shifting what we consider the master. This can be done in several phases, improving ergonomics and reducing footguns as we go.
Core principle
Introduce a new chain_tip: watch::Receiver<BlockHeader> to the store state, which specifies the current chain tip, regardless of what the database contains.
apply_block changes slightly: it now spawns a task for every update part, and then waits for them all to complete. Only if all complete successfully is the chain tip updated. This should simplify each task, and also their instrumentation, since tasks can no longer be missed.
On startup we'll need to find the latest complete block, i.e. check the databases and block store, and ignore the latest block if it's only partial (i.e. apply_block failed to complete). This is used to initialize the chain_tip.
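A minimal, runtime-agnostic sketch of the core principle, using std threads and channels in place of tokio tasks and the watch channel; the task names (`store_raw_block`, etc.) and the `publish_tip` callback are hypothetical stand-ins:

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-ins for the real update parts; each returns Ok on success.
fn store_raw_block() -> Result<(), String> { Ok(()) }
fn update_sqlite() -> Result<(), String> { Ok(()) }
fn update_forests() -> Result<(), String> { Ok(()) }

/// Sketch of the proposed apply_block: run every update part in its own task,
/// wait for all of them, and only then publish the new chain tip.
fn apply_block(publish_tip: impl FnOnce(u64), new_tip: u64) -> Result<(), String> {
    let tasks: Vec<fn() -> Result<(), String>> =
        vec![store_raw_block, update_sqlite, update_forests];

    let (tx, rx) = mpsc::channel();
    for task in tasks {
        let tx = tx.clone();
        thread::spawn(move || tx.send(task()).unwrap());
    }
    drop(tx);

    // Wait for every part; any failure aborts before the tip moves.
    for result in rx {
        result?;
    }

    // All parts succeeded: the chain tip (the real "master") moves forward.
    publish_tip(new_tip);
    Ok(())
}
```

Because the tip is only published after every part succeeds, a crash mid-way leaves the tip pointing at the last complete block, which is exactly what the startup recovery above relies on.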
Proofs & single writer
The idea is that apply_block should also perform the database updates currently done by the proof co-ordinator. At present, this means deleting the proof inputs up to the proven tip.
We use a similar mechanism to indicate the latest proven chain tip. The proof co-ordinator publishes to a proven_tip: watch::Sender<BlockNumber> whenever the proven tip moves, without deleting the proof inputs. apply_block reads the latest proven tip and performs the required database deletions as part of its database update.
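A sketch of this hand-off, using an `AtomicU64` as a stand-in for the proven_tip watch channel; `on_block_proven`, `prune_proof_inputs`, and the `proof_inputs` table name are all illustrative assumptions:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Stand-in for the proven_tip watch channel.
type ProvenTip = Arc<AtomicU64>;

/// The proof co-ordinator only publishes the new proven tip; it no longer
/// writes to the database itself.
fn on_block_proven(proven_tip: &ProvenTip, block_num: u64) {
    proven_tip.store(block_num, Ordering::Release);
}

/// apply_block (the single writer) reads the latest proven tip and issues the
/// deletion as part of its own database transaction. Returns the SQL it would
/// run, purely for illustration.
fn prune_proof_inputs(proven_tip: &ProvenTip) -> String {
    let tip = proven_tip.load(Ordering::Acquire);
    format!("DELETE FROM proof_inputs WHERE block_num <= {tip}")
}
```

The point of the split is that a slow or stuck block write can no longer make the proof writers error out: the co-ordinator's publish is just a pointer update, and the deletion rides along in the single writer's transaction.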
Database readers
Database readers should use these chain tip watch channels as a "snapshot" to get a database-transaction-like view of the state. This can be enforced by capping open-ended SQL queries (i.e. block_to is always capped by the chain tip).
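The capping itself is a one-liner; a sketch, where `capped_block_range` and the `Option<u64>` encoding of "to the end of the chain" are my own assumptions about the query shape:

```rust
/// Clamp a caller-supplied, possibly open-ended block range by the snapshotted
/// chain tip, so reads never observe rows from a partially applied block.
/// `block_to = None` means "to the end of the chain" and becomes the tip.
fn capped_block_range(block_from: u64, block_to: Option<u64>, chain_tip: u64) -> (u64, u64) {
    let to = block_to.map_or(chain_tip, |b| b.min(chain_tip));
    (block_from, to)
}
```

As long as the reader snapshots the tip once and threads it through every query it makes, all of its reads agree on a single chain height even while the writer races ahead.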
We can further make things more ergonomic by making state opaque, e.g.
struct ReadOnlyStore {
...
}
impl ReadOnlyStore {
async fn begin_transaction(&self) -> ReadOnlyTransaction {...}
}
struct ReadOnlyTransaction {
chain_tip: BlockHeader,
proven_tip: BlockNumber,
tx_sql: SqliteTransaction,
forests: Arc<Forest>,
...
}
Unknowns
One remaining concern is in-memory state and pruning. Ideally we would allow concurrent reads alongside a writer, similar to how sqlite uses the WAL file. I'm unsure how things currently stand, but conceptually we could make the in-memory structs reference counted per block. A reader would then hold the block's data, and once the last reader that snapshotted that block ends, the data is automatically cleaned up.
We could change the chain_tip watch channel to include more than just the header. Readers then snapshot this, while the writer simply updates the snapshot pointer.
Unfortunately, I think the current implementation doesn't allow for this; I need to investigate further.
chain_tip: watch::Receiver<Arc<(BlockHeader, History)>>
/// Holds the last `N` blocks of in-memory historical data.
///
/// Each block is also arc'd so it can be shared and re-used.
#[derive(Clone)]
struct History {
blocks: VecDeque<Arc<Block>>,
}
impl History {
fn push(&mut self, block: Arc<Block>) {
self.blocks.push_back(block);
if self.blocks.len() > N {
self.blocks.pop_front();
}
}
}
fn apply_block(block: Block, tx_chain_tip: watch::Sender<Arc<(BlockHeader, History)>>) {
    // ....
    // Get the current history snapshot and append the new block.
    let mut history = tx_chain_tip.borrow().1.clone();
    let header = block.header();
    history.push(Arc::new(block));
    // This doesn't impact existing reader snapshots; only new ones will see this value.
    let _ = tx_chain_tip.send(Arc::new((header, history)));
}