feat: sanity checker for network config #279

Open

AlexanderLukin wants to merge 15 commits into develop from feat/chain-sanity-checker

Contributor

AlexanderLukin commented Jun 6, 2024

Implement a sanity checker that verifies that the information stored in the DB corresponds to the chain the app is run for. If the app is run for one chain and the DB contains data for another chain, the sanity checker prevents the app from starting and throws an error.


          feat: sanity checker for network config

774e97f

Implement a sanity checker that checks that the information stored in
the DB corresponds to the chain for which the app is run. If the app is
run for one chain, and the DB contains data for another chain, the
sanity checker prevents the app from starting and throws an error.

AlexanderLukin requested a review from Amuhar

June 6, 2024 08:49

Amuhar reviewed

src/network-validation/network-validation.constants.ts Outdated

		@@ -0,0 +1,4 @@
		export const MODULE_ADDRESSES_FOR_CHAINS = {

Contributor

Amuhar Jun 6, 2024

It's not clear which module this is. Also, we use contract addresses from lido-nestjs-modules.

Contributor Author

AlexanderLukin Jun 6, 2024

Agree. Will rename it to CURATED_MODULE_ADDRESSES_FOR_CHAINS.

Contributor Author

AlexanderLukin Jun 18, 2024

I decided to get rid of this custom app-level constant and use the REGISTRY_CONTRACT_ADDRESSES constant from "@lido-nestjs/contracts".

Amuhar reviewed

src/network-validation/network-validation.service.ts Outdated

+                  const [dbKey, dbCuratedModule, dbOperator] = await Promise.all([
+                    await this.keyStorageService.find({}, { limit: 1 }),
+                    await this.moduleStorageService.findOneById(1),

Contributor

Amuhar Jun 6, 2024

I think it is better to use a constant.

Contributor Author

AlexanderLukin Jun 18, 2024

As we agreed, I renamed the findOneById method from the SRModuleStorageService to findOneByModuleId. With the new method name, I don't see any problems with passing just "1" as the argument of this method. I think now it should be obvious that it is a module ID.

Amuhar reviewed

src/network-validation/network-validation.service.ts

		@@ -0,0 +1,84 @@
		import { MikroORM, UseRequestContext } from '@mikro-orm/core';

Contributor

Amuhar Jun 6, 2024

Let's write tests for this service?

Contributor Author

AlexanderLukin Jun 6, 2024

OK, good idea. Will do.

Contributor Author

AlexanderLukin Jun 18, 2024

I implemented unit tests for the NetworkValidation service.

Amuhar reviewed

src/network-validation/network-validation.service.ts

+                ) {}
+                @UseRequestContext()
+                public async validate(): Promise<void> {

Contributor

Amuhar Jun 6, 2024

let's break the method into smaller methods

Contributor Author

AlexanderLukin Jun 18, 2024

I moved the logic that checks that the chain ID from the env variables, the EL chain ID, and the CL chain ID match each other into a separate private method. Beyond this change, I feel that it doesn't make much sense to split the logic of the validate method into smaller methods and functions. I think the logic of this method is holistic, and it would be harder to understand if it were artificially split into separate methods and functions.
If you have any particular ideas about what parts of this code should be extracted to a separate method, let's discuss.
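
For illustration, the extracted check could look roughly like the sketch below. It is not the code from this PR: the free-function form, the parameter names, and the plain `Error` are assumptions, and the CL chain ID is treated as optional on the assumption that the CL node is not always queried.

```typescript
// Sketch only: the signature and error messages are hypothetical, not taken from the PR.
function checkChainIdMismatch(configChainId: number, elChainId: number, clChainId?: number): void {
  // The chain ID from the env config must match what the EL node reports.
  if (configChainId !== elChainId) {
    throw new Error(`Chain ID in the config (${configChainId}) does not match the EL chain ID (${elChainId})`);
  }

  // Assumption: clChainId is undefined when the CL node is not queried at all.
  if (clChainId !== undefined && clChainId !== elChainId) {
    throw new Error(`CL chain ID (${clChainId}) does not match the EL chain ID (${elChainId})`);
  }
}
```

Keeping this comparison in one small function leaves the rest of `validate` focused on the DB checks, which matches the split described above.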

Amuhar reviewed

src/network-validation/network-validation.service.ts Outdated

+                  if (appInfo != null) {
+                    if (appInfo.chainId !== configChainId || appInfo.locatorAddress !== this.contract.address) {
+                      throw new Error(

Contributor

Amuhar Jun 6, 2024

Let's customize this error?

Contributor Author

AlexanderLukin Jun 18, 2024

Yes, I agree. I created two new error types: ChainMismatchError and InconsistentDataInDBError. The first one is thrown if the chain ID from the config, the EL chain ID, and the CL chain ID don't match each other. The second one is thrown if the chain ID for which the app is run doesn't match the info stored in the DB, or if the data in the DB is corrupted.
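
A minimal sketch of what two such error types could look like. The class names come from this thread; the properties, the `InconsistencyType` union, and the `describeStartupError` helper are hypothetical and only show how custom errors let callers tell the two failures apart.

```typescript
// Hypothetical shape: the actual properties in the PR may differ.
class ChainMismatchError extends Error {
  readonly configChainId: number;
  readonly elChainId: number;

  constructor(message: string, configChainId: number, elChainId: number) {
    super(message);
    this.name = 'ChainMismatchError';
    this.configChainId = configChainId;
    this.elChainId = elChainId;
    // Keeps instanceof working when compiling to older JS targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Illustrative categories only; not the PR's list of cases.
type InconsistencyType = 'appInfoMismatch' | 'emptyKeys' | 'emptyModules' | 'emptyOperators';

class InconsistentDataInDBError extends Error {
  readonly type: InconsistencyType;

  constructor(message: string, type: InconsistencyType) {
    super(message);
    this.name = 'InconsistentDataInDBError';
    this.type = type;
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical helper: callers can branch on the error class instead of parsing messages.
function describeStartupError(err: unknown): string {
  if (err instanceof ChainMismatchError) {
    return `chain mismatch: config=${err.configChainId}, el=${err.elChainId}`;
  }
  if (err instanceof InconsistentDataInDBError) {
    return `inconsistent DB: ${err.type}`;
  }
  return 'unknown error';
}
```

Typed properties also make test assertions more precise than matching on message strings.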

Amuhar reviewed

src/network-validation/network-validation.service.ts Outdated

+                    }
+                  }
+                  const [dbKey, dbCuratedModule, dbOperator] = await Promise.all([

Contributor

Amuhar Jun 6, 2024

dbKeys, dbOperators?

Contributor Author

AlexanderLukin Jun 18, 2024

Renamed, thanks.

Amuhar reviewed

src/network-validation/network-validation.service.ts Outdated

+              export class NetworkValidationService {
+                constructor(
+                  protected readonly orm: MikroORM,
+                  @Inject(LIDO_LOCATOR_CONTRACT_TOKEN) protected readonly contract: LidoLocator,

Contributor

Amuhar Jun 6, 2024

Let's rename contract to locatorContract?
Also, we have LidoLocatorService; maybe we should get the address from there.

Contributor Author

AlexanderLukin Jun 18, 2024

I renamed the contract to locatorContract.

As for using the LidoLocatorService instead of the LidoLocator from "@lido-nestjs/contracts": it has only one method, getStakingRouter, which returns the staking router address. Internally, this method logs the contract.address from LidoLocator as the locator address.

I'm not sure. Do you think it is better to use the getStakingRouter method to get the locator address instead of contract.address? If so, why?

Amuhar reviewed

src/network-validation/network-validation.service.ts

+                    return;
+                  }
+                  if (dbCuratedModule == null) {

Contributor

Amuhar Jun 6, 2024

I didn't understand why we check only one table.

Contributor Author

AlexanderLukin Jun 6, 2024

Since we didn't enter this if:
if (dbKey.length === 0 && dbCuratedModule == null && dbOperator.length === 0)
the DB has a non-empty keys, modules, or operators table. If dbCuratedModule turns out to be null, the later code that reads dbCuratedModule.stakingModuleAddress will fail with an error. To prevent this, we check that dbCuratedModule is not null before reading its stakingModuleAddress property.

I agree that checking only dbCuratedModule for emptiness, and not dbKey and dbOperator, may be confusing. I will add non-emptiness checks for these variables as well and throw a specific error in each of these cases.

Contributor Author

AlexanderLukin Jun 18, 2024

I've added two extra checks that the keys and operators tables are not empty.
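
The rule discussed in this thread (all three sources empty means a fresh DB, otherwise none of them may be empty) can be sketched as below. The `DbSnapshot` shape and `classifyDb` are hypothetical; the sketch returns a classification instead of throwing, whereas the PR throws a specific error per case.

```typescript
// Hypothetical summary of the three lookups; field names are illustrative.
interface DbSnapshot {
  keysCount: number;
  operatorsCount: number;
  curatedModuleFound: boolean; // whether the curated module row was found
}

type DbState =
  | { kind: 'empty' }
  | { kind: 'populated' }
  | { kind: 'inconsistent'; emptyTables: string[] };

function classifyDb(snapshot: DbSnapshot): DbState {
  const emptyTables: string[] = [];
  if (snapshot.keysCount === 0) emptyTables.push('keys');
  if (!snapshot.curatedModuleFound) emptyTables.push('modules');
  if (snapshot.operatorsCount === 0) emptyTables.push('operators');

  // Nothing stored yet: a fresh DB, so there is nothing to compare against.
  if (emptyTables.length === 3) return { kind: 'empty' };
  // Everything present: the module address can be compared with the chain.
  if (emptyTables.length === 0) return { kind: 'populated' };
  // Some data present and some missing: report exactly which parts are empty.
  return { kind: 'inconsistent', emptyTables };
}
```

Reporting each empty table by name is what makes a per-case error possible, rather than a single null check on the module.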

Amuhar reviewed

src/network-validation/network-validation.service.ts

+                    );
+                  }
+                  this.logger.log('DB is not empty and chain info is not found in DB, write chain info into DB');

Contributor

Amuhar Jun 6, 2024

Let's write a comment explaining in what cases this situation is possible

Contributor Author

AlexanderLukin Jun 18, 2024

OK, I agree. I've added explanatory notes for each important case of the validate method.

AlexanderLukin and others added 14 commits

June 8, 2024 00:46


          refactor: rename MODULE_ADDRESSES_FOR_CHAINS const

69af272


          refactor: rename findOneById method

f6051f0

Rename the `findOneById` method of the `SRModuleStorageService` to
`findOneByModuleId`.


          chore: unit tests for sanity checker service

23c807b


          refactor: replace constants with module addresses

ef7d2d3

Replace the internal `CURATED_MODULE_ADDRESSES_FOR_CHAINS` constant,
which held curated module addresses, with the appropriate
`REGISTRY_CONTRACT_ADDRESSES` constant from `@lido-nestjs` repo.


          refactor: fix lint errors in tests

ed54f90

Remove unnecessary `consensusProviderService` variable and fix lint
errors in unit tests for `NetworkValidation` service.


          refactor: move network ID checking

4a6c3ca

Move checking that the network ID specified in .env config matches the
EL chain ID one level up in the `validate` method of the
`NetworkValidation` service.


          refactor: new checkChainIdMismatch method

529f2f8

Move checking that chain ID from the config, EL chain ID and CL chain ID
match each other to the separate `checkChainIdMismatch` private method
of the `NetworkValidation` service.


          refactor: new specific error types

407dabe

Add two new custom error types (`ChainMismatchError` and
`InconsistentDataInDBError`) with properties that better indicate the
specifics of the error. Update test cases to make sure that correct
error types with correct properties are returned in each test case.


          refactor: rename symbols

7016a00

Rename `dbKey` local constant to `dbKeys`;
Rename `dbOperator` local constant to `dbOperators`.


          refactor: rename contract to locatorContract

9a08698


          chore: add new specific error types

a194a83

Add new specific error types for cases when keys or operators table is
empty. Change error messages to better reflect the essence of the error.
Add new tests for testing these new error types.


          fix: test for chain and EL IDs mismatch

8c04d2f

Fix test that tests the case when chain ID and EL ID don't match each
other. This test should not depend on the value of the
`VALIDATOR_REGISTRY_ENABLE` env variable.


          docs: add notes with code logic explanations

8bf17bf


          feat: added envs to test workflow's step

be0b2d6

Ziars requested a review from a team as a code owner

June 18, 2024 12:23

gatleas17 approved these changes

Contributor

Amuhar commented Jul 9, 2024

I think this solution is mostly good and valid. The code is well-structured and isolated. Changes in the sanity checker will not influence other parts of the code. It contains a lot of checks, but they are straightforward.

It looks valid to check for empty keys, operators, and module tables. Currently, we write data in the table in one transaction, but in future implementations, this can be changed, and checking only the ElMeta table will not be sufficient.

It is important to note that none of the tables can be empty on Holesky or mainnet. But if we run Lido from scratch on a new network or on a devnet, the Keys API may initially have data only in the ElMeta table. In this case, if we stop the Keys API and run it again, the first check of the sanity checker will not be correct, as it doesn't check the ElMeta table. I propose adding the ElMeta check too, or prioritizing the AppInfo values check (would that be valid?). If we prioritize the AppInfo table check, then on the second run we can verify that the chainId and locator address have not changed.

Is it possible to have an empty AppInfo table, empty keys, operators, and modules tables, and a non-empty ElMeta table? Only if we run an old release for a new Lido deployment.
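
One possible ordering of the checks with the AppInfo comparison first is sketched below. It is an assumption about how the proposal could be implemented, not the PR's code: `TableState`, `decideStartupCheck`, and the `StartupDecision` names are hypothetical, and the comparison mirrors the strict `!==` check from the reviewed diff.

```typescript
interface AppInfo {
  chainId: number;
  locatorAddress: string;
}

// Hypothetical summary of which tables contain data.
interface TableState {
  appInfo: AppInfo | null;
  elMetaPresent: boolean;
  keysPresent: boolean;
  modulesPresent: boolean;
  operatorsPresent: boolean;
}

type StartupDecision = 'appInfoMatches' | 'freshDb' | 'checkModuleAddress' | 'unverifiable';

function decideStartupCheck(state: TableState, configChainId: number, locatorAddress: string): StartupDecision {
  // Prioritized check: stored chain info is compared before any table is inspected.
  if (state.appInfo !== null) {
    if (state.appInfo.chainId !== configChainId || state.appInfo.locatorAddress !== locatorAddress) {
      throw new Error('Chain info stored in the DB does not match the chain the app is run for');
    }
    return 'appInfoMatches';
  }

  const anyData = state.elMetaPresent || state.keysPresent || state.modulesPresent || state.operatorsPresent;
  if (!anyData) return 'freshDb';

  // Without AppInfo, the stored module address is the only evidence of the chain.
  if (state.modulesPresent) return 'checkModuleAddress';

  // For example, only ElMeta has data: nothing in the DB identifies the chain.
  return 'unverifiable';
}
```

In this ordering, a restart on a devnet with only ElMeta populated is handled by the AppInfo comparison as long as AppInfo was written during the first run; the `unverifiable` branch corresponds to the old-release case mentioned above.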

Also, we need to pay attention to the test coverage of the sanity checker. Are all possible cases well covered?

Why do we need such a complex solution? If the database already has data, we need some way to check that it is consistent with the chain we want to run the Keys API on. For this case, we check the moduleAddress in the 'key' table.
Also, this solution checks consistency for the case where the database was partially cleaned accidentally.

Contributor Author

AlexanderLukin commented Jul 10, 2024

Hey-hey! Thank you for your close attention to this sanity checker and for such a detailed and comprehensive review! I believe it's very important for good product quality.

I agree that in the case of a scratch deployment of the Lido protocol on a devnet, it is possible that the keys, operators, and modules tables are empty while the ElMeta table is not. In this case, the sanity checker may not work correctly, and this case should be covered.

I also agree with your point regarding test coverage. Currently, all cases of the sanity checker code are covered by unit tests. But the new implementation with the additional ElMeta emptiness check must also be covered by new tests.

> Also, this solution checks consistency for the case where the database was partially cleaned accidentally.

Just to be clear here: this sanity checker is not meant to handle cases where the DB is corrupted by outside interventions (i.e., when other applications and processes, not the Keys API, somehow corrupt data in the DB). You're correct that the sanity checker in its current form can prevent some cases of such corruption, but there are many more edge cases that are not covered (and the code is not designed to cover them).

Labels

None yet