-
Notifications
You must be signed in to change notification settings - Fork 19
feat: sanity checker for network config #279
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
Implement a sanity checker that checks that the information stored in the DB corresponds the chain for which the app is run. If the app is run for one chain, and the DB contains data for another chain, the sanity checker prevents the app from start and throws the error.
| @@ -0,0 +1,4 @@ | |||
| export const MODULE_ADDRESSES_FOR_CHAINS = { | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not clear which module is it . also we use contract addresses from lido-nestjs-modules
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. Will rename it to CURATED_MODULE_ADDRESSES_FOR_CHAINS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to get rid of this custom app-level constant and use the REGISTRY_CONTRACT_ADDRESSES constant from the "@lido-nestjs/contracts".
|
|
||
| const [dbKey, dbCuratedModule, dbOperator] = await Promise.all([ | ||
| await this.keyStorageService.find({}, { limit: 1 }), | ||
| await this.moduleStorageService.findOneById(1), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is better to use constant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we agreed, I renamed the findOneById method from the SRModuleStorageService to findOneByModuleId. With the new method name, I don't see any problems with passing just "1" as the argument of this method. I think now it should be obvious that it is a module ID.
| @@ -0,0 +1,84 @@ | |||
| import { MikroORM, UseRequestContext } from '@mikro-orm/core'; | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets write tests on this service ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, good idea. Will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented unit tests for the NetworkValidation service.
| ) {} | ||
|
|
||
| @UseRequestContext() | ||
| public async validate(): Promise<void> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's break the method into smaller methods
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the logic of checking that chin ID from the env variables, EL chain ID, and CL chain ID match each other to the separate private method. Despite this change, I feel that it doesn't make much sense to split the logic of the validate method into smaller methods and functions. I think that the logic of this method is holistic and it would be harder to understand it if it is artificially split into separate methods and functions.
If you have any particular ideas about what parts of this code should be extracted to a separate method, let's discuss.
|
|
||
| if (appInfo != null) { | ||
| if (appInfo.chainId !== configChainId || appInfo.locatorAddress !== this.contract.address) { | ||
| throw new Error( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets customize this error ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree. I created two new error types: ChainMismatchError and InconsistentDataInDBErrorTypes. The first one is thrown if a chain ID from config, EL chain ID, and CL chain ID don't match each other. The second one is thrown if the chain ID for which the app is run doesn't match the info stored in the DB or data in the DB are corrupted.
| } | ||
| } | ||
|
|
||
| const [dbKey, dbCuratedModule, dbOperator] = await Promise.all([ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dbKeys, dbOperators?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed, thansk.
| export class NetworkValidationService { | ||
| constructor( | ||
| protected readonly orm: MikroORM, | ||
| @Inject(LIDO_LOCATOR_CONTRACT_TOKEN) protected readonly contract: LidoLocator, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets rename contract to locatorContract ?
ALso we have LidoLocatorService , maybe we should get address from there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed the contract to locatorContract.
As far as using the LidoLocatorService instead of the LidoLocator from "@lido-nestjs/contracts". It has only one getStakingRouter method to get the staking router address. And this method internally logs the contract.address from LidoLocator as the locator address.
I'm not sure, do you think that it is better to use the getStakingRouter method to get the locator address instead of the contract.address? If yes, why?
| return; | ||
| } | ||
|
|
||
| if (dbCuratedModule == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
didn't understand why we check only one table
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we didn't get into this if
if (dbKey.length === 0 && dbCuratedModule == null && dbOperator.length === 0)
it means the DB has non-empty keys, modules, or operators table. If the dbCuratedModule turns out null, the later code where we get dbCuratedModule.stakingModuleAddress will fail with an error. To prevent this we check that the dbCuratedModule is not null before getting the stakingModuleAddress key of this object.
I agree that having only the check for the dbCuratedModule emptiness and not checking dbKey and dbOperator emptiness may be confusing. Will add checkings of non-emptiness for these variables as well and throw the specific error in each of these cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added two extra checks that keys and operators tables are not empty.
| ); | ||
| } | ||
|
|
||
| this.logger.log('DB is not empty and chain info is not found in DB, write chain info into DB'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's write a comment explaining in what cases this situation is possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I agree. I've added more explanation notes to each important case of the validate method.
Rename the `findOneById` method of the `SRModuleStorageService` to `findOneByModuleId`.
Replace the internal `CURATED_MODULE_ADDRESSES_FOR_CHAINS` constant with curated module addresses with the appropriate `REGISTRY_CONTRACT_ADDRESSES` constant from `@lido-nestjs` repo.
Remove unnecessary `consensusProviderService` variable and fix lint errors in unit tests for `NetworkValidation` service.
Move checking that the network ID specified in .env config matches the EL chain ID one level up in the `validate` method of the `NetworkValidation` service.
Move checking that chain ID from the config, EL chain ID and CL chain ID match each other to the separate `checkChainIdMismatch` private method of the `NetworkValidation` service.
Add two new custom error types (`ChainMismatchError` and `InconsistentDataInDBError`) with properties that help better indicate specific of the error. Update test cases to make sure that correct error types with correct properties are returned in each test case.
Rename `dbKey` local constant to `dbKeys`; Rename `dbOperator` local constant to `dbOperators`.
Add new specific error types for cases when keys or operators table is empty. Change error messages to better reflect the essence of the error. Add new tests for testing these new error types.
Fix test that tests the case when chain ID and EL ID don't match each other. This test should not depend on the value of the `VALIDATOR_REGISTRY_ENABLE` env variable.
|
I think this solution is mostly good and valid. The code is well-structured and isolated. Changes in the sanity checker will not influence other parts of the code. It contains a lot of checks, but they are straightforward. It looks valid to check for empty keys, operators, and module tables. Currently, we write data in the table in one transaction, but in future implementations, this can be changed, and checking only the ElMeta table will not be sufficient. It is important to note that any of the tables can't be empty on Holesky or mainnet. But if we run Lido from scratch on a new network or on devnet, in the beginning, the keys API may have data only in the ElMeta table. In this case, if we stop the Keys API and run it again, the first check of the sanity checker will not be correct as it doesn't check the ElMeta table. It is proposed to add the ElMeta check too, or prioritize the AppInfo values check (will it be valid?). If we prioritize the AppInfo table check during the second run, we can check that the chainId and locator address were not changed. Is it possible to have an empty AppInfo table and empty keys, operators, modules tables, and a non-empty ElMeta? Only if we run an old release for new Lido deployment. Also, we need to pay attention to the test coverage of the sanity checker. Are all possible cases well covered? Why do we need such a difficult solution? If the database already has data, we somehow need to check if it is consistent with the chain we want to run the Keys API. For this case, we check the moduleAddress in the 'key' table. |
|
Hey-hey! Thank you for your strong attention to this sanity checker and such a detailed and comprehensive review! I believe it's very important for the good product quality. I agree that in case of the scratch deployment of the Lido protocol on a devnet, it is possible that the keys, operators, and modules table might be empty, but the ElMeta table might not. In this case, the sanity checker may not work correctly and this case should be covered. I also agree with your point regarding the test coverage. Currently, all cases of the sanity checker code are covered by unit tests. But new implementation with the additional ElMeta emptiness checking must also be covered by new tests.
Just to be clear here. This sanity checker is not dedicated to handling cases when the DB is corrupted by side interventions (i. e. when other applications and processes, not Keys API, corrupt data in the DB somehow). You're correct that the sanity checker in its current form is able to prevent some cases of such corruption, but there are way more edge cases, that are not covered (and the code is not designed to cover them). |
Implement a sanity checker that checks that the information stored in the DB corresponds the chain for which the app is run. If the app is run for one chain, and the DB contains data for another chain, the sanity checker prevents the app from start and throws the error.