diff --git a/projects/feature-activation/0001-feature-activation-for-blocks.md b/projects/feature-activation/0001-feature-activation-for-blocks.md index 783252a..945a451 100644 --- a/projects/feature-activation/0001-feature-activation-for-blocks.md +++ b/projects/feature-activation/0001-feature-activation-for-blocks.md @@ -114,9 +114,13 @@ With that being said, it is possible that a feature may or may not be activated #### Burying feature activations [Burying feature activations]: #burying-feature-activations -Burying a feature activation means changing code so "history is erased", and it's not clear that a feature was activated using the activation process. +Burying a feature activation means changing code so "history is erased", making it unclear that a feature was activated using the activation process. -It is allowed to change code that is conditional on feature activation, but **NOT** to remove a feature from the definition, as described in the [Feature and Criteria definition] section. +Considering the implementation of Feature Activation for Transactions, burying features is **NOT** allowed. This means that the code that performs a conditional check on feature activation cannot be removed. The reason for this is that feature activation for transactions depends on their timestamps, and it's not possible to differentiate between an old transaction that was just received now and a new transaction that was created with an old timestamp. Read the Transactions RFC for more information. + +We should also **NOT** remove a feature from the definition, as described in the [Feature and Criteria definition] section. + +What _can_ be done is replacing the Feature Activation call with a constant, when a new checkpoint is created after a reasonable number of blocks and amount of time have passed since activation.
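As an illustration of that substitution, here's a minimal sketch. The feature, constants, and function names below are made up for this example and are not taken from the codebase:

```python
# Hypothetical values for illustration only.
OLD_MAX_MERKLE_PATH_LENGTH = 12
NEW_MAX_MERKLE_PATH_LENGTH = 20

# Before burying: behavior is conditional on the Feature Activation state.
def max_merkle_path_length(feature_is_active: bool) -> int:
    if feature_is_active:
        return NEW_MAX_MERKLE_PATH_LENGTH
    return OLD_MAX_MERKLE_PATH_LENGTH

# After a checkpoint is created well past the activation, the conditional
# can be substituted by a constant, since everything before the checkpoint
# is settled and will never be re-verified.
def max_merkle_path_length_buried() -> int:
    return NEW_MAX_MERKLE_PATH_LENGTH
```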
## Feature and Criteria definition [Feature and Criteria definition]: #feature-and-criteria-definition diff --git a/projects/feature-activation/0005-feature-activation-for-transactions.md b/projects/feature-activation/0005-feature-activation-for-transactions.md new file mode 100644 index 0000000..9182e29 --- /dev/null +++ b/projects/feature-activation/0005-feature-activation-for-transactions.md @@ -0,0 +1,367 @@ +- Feature Name: Feature Activation for Transactions +- Start Date: 2023-09-11 +- Initial Document: [Feature Activation](https://docs.google.com/document/d/1IiFTVW1wH6ztSP_MnObYIucinOYd-MsJThprmsCIdDE/edit) +- Author: Gabriel Levcovitz <> + +# Summary +[summary]: #summary + +This document describes a way to use the Feature Activation process with Transactions, complementing the existing implementation for Blocks. The original [Feature Activation for Blocks RFC](https://github.com/HathorNetwork/rfcs/blob/master/projects/feature-activation/0001-feature-activation-for-blocks.md) can be read for more detailed information on what the Feature Activation process is, why it is necessary, and how it works. + +# Motivation +[motivation]: #motivation + +Implementing Feature Activation for Transactions was a requirement from the beginning, but during development of the initial RFC, it was determined that its complexity would be better addressed in a separate document. While the former defines a way to retrieve feature states for blocks, allowing for changes in block processing (block verification, block consensus, etc), the latter defines analogous behavior to retrieve feature states for transactions. This will be necessary for some of the main known use cases of the Feature Activation process, for example eventually releasing nano contracts. 
# Guide-level explanation
[Guide-level explanation]: #guide-level-explanation

## Overview
[Overview]: #overview

The central idea to solve the calculation of feature states for transactions is to actually use the existing block process that handles all general requirements and is already implemented and tested. By doing that, we can define feature states for transactions as simply a "forward" from the feature states of some block.

This problem proved to be harder than initially believed, resulting in multiple completely different ideas being explored. Some of those became RFCs by themselves, now kept just for reference in an [iterations directory](./0005-iterations). Eventually we would find problems in each solution, and only by leveraging those initial iterations did we arrive at the proposal in this document. The other ideas were convoluted and complex, generating multiple edge cases that were hard to keep track of and error-prone. The idea, then, was to go back to the base problem and find a simpler solution. Considering that, some of the text was extracted from the previous documents.

To use block feature states to retrieve feature states for transactions, let's first consider the possible state values. Blocks use multiple states to represent the Feature Activation process, such as `STARTED`, `MUST_SIGNAL`, `LOCKED_IN`, `ACTIVE`, etc, as they're responsible for the whole logic of the system. In the context of transactions, contrary to blocks, not all these states are relevant. In fact, for a transaction, it only matters if a feature is either `ACTIVE` or `INACTIVE`. For brevity, from this point forward when we say that a block is `ACTIVE`, we mean that its state for a certain feature is `ACTIVE`, and when we say that a block is `INACTIVE`, we mean that its state for a certain feature is any state _but_ `ACTIVE`.
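The collapse of block states into this binary view can be sketched as follows. This is an illustration only: the enum lists the state names mentioned in this document, and the helper is hypothetical:

```python
from enum import Enum

class FeatureState(Enum):
    # Block-process states named in the document (the "etc" is elided).
    STARTED = "STARTED"
    MUST_SIGNAL = "MUST_SIGNAL"
    LOCKED_IN = "LOCKED_IN"
    ACTIVE = "ACTIVE"
    FAILED = "FAILED"

def block_is_active(state: FeatureState) -> bool:
    # From a transaction's perspective, only ACTIVE matters;
    # every other block state is simply "INACTIVE".
    return state is FeatureState.ACTIVE
```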
To determine the state of a feature for a Transaction, the most obvious idea is to get the state of the current best block. This is problematic when reorgs happen, as detailed in the previous documents. Another previous idea was adding a two-week delay (one Evaluation Interval) between the process for blocks and transactions, by using the state of the boundary block from the evaluation interval _before_ the current evaluation interval. This creates a "buffer" period that helps make sure a feature is consolidated as `ACTIVE` before enabling it for transactions. This idea is good because the buffer makes it so that only extremely large (and therefore unlikely) reorgs would change the state of a transaction. The problem was how this buffer was defined in [iteration 1](./0005-iterations/iteration1.md), which used a similar idea. There, the reorg problem persisted.

With that in mind and going back all the way to the motivation of BIP 8 vs BIP 9 (as discussed in the original RFC defining Feature Activation for Blocks), we recall that block heights were used instead of block timestamps, mainly to prevent unexpected behavior when the hash rate is too volatile. However, using timestamps makes for a pretty straightforward solution for transactions. Combining that with the buffer idea, we can guarantee that an unwanted scenario is unlikely enough to consider it unrecoverable, forcing manual intervention. This buffer period can be as long as we want: the longer it is, the more unlikely manual intervention becomes, but the longer the delay for activating a feature for transactions. This tradeoff will be explored in sections below.

Let's now define the solution: **for a feature, a transaction is considered `ACTIVE` if its timestamp is greater than one Evaluation Interval after the _first_ `ACTIVE` block's timestamp**. This will be detailed in the Reference-level section. The fact that timestamps are "mutable" and not signed will also be taken into account.
This is essentially very similar to what was defined in the first iteration, which can be rewritten as "a transaction is `ACTIVE` if the boundary block of the previous evaluation interval is `ACTIVE`". However, a small reorg caused problems there, as it could shift the closest boundary block from before to after the transaction (time-wise), making the transaction state retreat from `ACTIVE` to `INACTIVE`. By using the `block height <-> timestamp` duality, this problem is resolved. In other words, a specific block height _can_ be shifted around the timestamp of a transaction, but a specific timestamp _cannot_, so we define the tipping point to activate transactions as a timestamp, rather than a block height.

Similarly, a large reorg could also cause problems if it was large enough that the _first_ `ACTIVE` block for a feature participated in the reorg. Instead of preventing a large reorg in terms of how many blocks were reorged, we will again leverage the `block height <-> timestamp` duality and prevent a large reorg in terms of time. If we've already passed one Evaluation Interval after the _first_ `ACTIVE` boundary block, we will have `ACTIVE` transactions. Therefore, we cannot allow reorgs that could render the feature `INACTIVE`. That would only happen if the reorg included that _first_ `ACTIVE` block.

Summing up: **if we've passed one Evaluation Interval after the _first_ `ACTIVE` boundary block AND the reorg includes that _first_ `ACTIVE` boundary block, we cannot recover**. That would require revalidation of transactions. Instead, since this scenario is extremely unlikely, we error out and exit the full node, requiring manual intervention.

Equipped with only those two definitions, the solution is complete. No other edge cases have to be handled. Also, to be clear, no change in the Feature Activation for Blocks is necessary.
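The two definitions above can be sketched in code. This is a guide-level illustration only, with made-up names; the constants mirror the values used elsewhere in this document (one Evaluation Interval of 40320 blocks at an average of 30 seconds per block):

```python
AVG_TIME_BETWEEN_BLOCKS = 30  # seconds
EVALUATION_INTERVAL = 40320   # blocks, equivalent to two weeks
BUFFER = EVALUATION_INTERVAL * AVG_TIME_BETWEEN_BLOCKS  # seconds

def is_tx_active(tx_timestamp: int, first_active_block_timestamp: int) -> bool:
    # A transaction is ACTIVE if it's more than one Evaluation Interval
    # past the first ACTIVE block's timestamp.
    return tx_timestamp > first_active_block_timestamp + BUFFER

def reorg_is_unrecoverable(now: int, first_active_block_timestamp: int,
                           reorg_includes_first_active_block: bool) -> bool:
    # We cannot recover if ACTIVE transactions may already exist AND the
    # reorg removes the first ACTIVE block they depend on.
    passed_buffer = now > first_active_block_timestamp + BUFFER
    return passed_buffer and reorg_includes_first_active_block
```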
# Reference-level explanation
[Reference-level explanation]: #reference-level-explanation

In this section, technical details are expanded for what was described above. Before detailing the solution, let's formalize the context.

## Overview

### Premises

1. After an `ACTIVE` block, all subsequent blocks will also be `ACTIVE` (considering the same blockchain).
2. Feature states for transactions are only `ACTIVE` or `INACTIVE`.

### Requirements

Using the premises above, we must define a function that returns a state, given a transaction (note: we actually return multiple states for multiple features). That function must satisfy the following requirements:

1. After one `ACTIVE` block in the best chain, all transactions that have a timestamp greater than that block's timestamp plus some constant must also be `ACTIVE`.
2. When all transactions are sorted by timestamp (which also guarantees topological sorting), there must not be an `INACTIVE` transaction after an `ACTIVE` transaction. In other words, as soon as one transaction becomes `ACTIVE`, every future transaction will also be `ACTIVE`.

### Definitions

Repeated here from the Guide-level section.

#### State of a transaction

For a feature, a transaction is considered `ACTIVE` if its timestamp is greater than one Evaluation Interval after the _first_ `ACTIVE` block's timestamp.

#### Dealing with reorgs

If we've passed one Evaluation Interval after the _first_ `ACTIVE` boundary block AND the reorg includes that _first_ `ACTIVE` boundary block, the reorg is considered invalid and the full node halts execution.

## Retrieving Feature States for Transactions

Analogously to the Feature Activation for Blocks, the state for a transaction will be retrieved from a method in the `FeatureService`.
Considering the definition above, a pseudo reference-implementation is provided:

```python
from typing import Optional

class FeatureService:
    def is_feature_active_for_transaction(self, *, transaction: Transaction, feature: Feature) -> bool:
        first_active_block = self._get_first_active_block(feature)

        if not first_active_block:
            return False

        # equivalent to two weeks
        avg_time_between_boundaries = self._feature_settings.evaluation_interval * settings.AVG_TIME_BETWEEN_BLOCKS
        activation_threshold = first_active_block.timestamp + avg_time_between_boundaries
        # We also use the MAX_FUTURE_TIMESTAMP_ALLOWED to take into account that we can receive a tx from the future
        is_active = transaction.timestamp > activation_threshold + settings.MAX_FUTURE_TIMESTAMP_ALLOWED

        return is_active

    def _get_first_active_block(self, feature: Feature) -> Optional[Block]:
        """
        Return the first ever block that became ACTIVE for a specific feature (always a boundary block),
        or None if this feature is not ACTIVE.

        This can be implemented by recursively hopping boundary blocks until
        we find a block that is ACTIVE and has a parent that is LOCKED_IN.
        """
        raise NotImplementedError
```

## Dealing with reorgs

Considering the definition above, a function must be called in the `ConsensusAlgorithm.update()` method that determines whether a reorg is invalid. If it is, the full node will **halt execution**, which means its main process will exit with a non-zero code.

We can consider a reorg invalid if the current time is greater than the common block's timestamp plus one Evaluation Interval. This is a superset of the reorgs covered by the definition, which simplifies the implementation, and such reorgs remain extremely unlikely.
A pseudo reference-implementation is also provided:

```python
def reorg_is_invalid(self, common_block: Block) -> bool:
    now = self.reactor.seconds()
    # equivalent to two weeks
    avg_time_between_boundaries = settings.FEATURE_ACTIVATION.evaluation_interval * settings.AVG_TIME_BETWEEN_BLOCKS
    # We also use the MAX_FUTURE_TIMESTAMP_ALLOWED to take into account that we can receive a tx from the future.
    # This is redundant considering we also use it in is_feature_active_for_transaction(), but we do it here too to restrict reorgs even further.
    is_invalid = now >= common_block.timestamp + avg_time_between_boundaries - settings.MAX_FUTURE_TIMESTAMP_ALLOWED

    return is_invalid
```

This means that if a reorg voids at least two weeks' worth of blocks, it's possible that the reorg includes a first `ACTIVE` block AND the respective transaction activation threshold. This situation cannot be recovered, as there could be `ACTIVE` transactions that would not be re-validated. Therefore, the full node errors and exits, requiring manual intervention.

Full node operators must remove the storage and perform a re-sync from a snapshot before the reorg (or a sync from scratch), so the full node doesn't experience this reorg, and operation resumes normally. This scenario is extremely unlikely (this will be explored further down).

## Discussion and examples

In this section, we will explore some illustrated examples in an effort to improve clarity and try to prove the solution works. We will first explore a similar idea using block heights instead of timestamps, and only after understanding its issue will we introduce timestamps as a solution. Therefore, the first examples will be very similar to the ones in the first iteration document.

### Analogous solution using block heights

Here, the proposed solution would be: "a transaction is `ACTIVE` if the boundary block of the current best block's previous evaluation interval is `ACTIVE`".
Also, the proposed solution for dealing with reorgs: "a reorg is invalid if it reorgs more than 40.320 blocks (one Evaluation Interval)".

We will define a timeline and put some blocks and transactions in it:

```
  E0    E1    BB   tx1
---|-----|-----|-----|---> time
  ac    ac    ac    ac
```

On top of the timeline, we see vertex names. Below it, we see feature states for the respective vertex (`ac` stands for `ACTIVE`). Blocks `E0` and `E1` are evaluation boundaries, meaning that there is one Evaluation Interval between them (40.320 blocks, equivalent to two weeks). Then, `tx1` is a transaction that arrived when `BB` was the current best block. The next evaluation boundary, `E2`, has not yet been reached, so it's not shown.

To determine the feature state for `tx1`, we first get the current best block (`BB`), then its closest evaluation boundary (`E1`), then the _previous_ evaluation boundary (`E0`). This is all easily calculated by using evaluation interval math. Then, it follows that `tx1`'s state is `ACTIVE`.

#### Dealing with large reorgs

What would happen if a reorg of more than one Evaluation Interval occurred? It would be possible that `E0` is included in the reorg, which would mean the new `E0` (let's call it `E0'`) could be `FAILED` instead of `ACTIVE`, for example (let's call it `in` for `INACTIVE`). The new timeline would be:

```
before:
  E0    E1    BB   tx1
---|-----|-----|-----|---> time
  ac    ac    ac    ac

after:
  E0'   E1'   BB'  tx1
---|-----|-----|-----|---> time
  in    in    in    ac
```

Since transactions will NOT be re-validated, `tx1` would remain `ACTIVE` even though it shouldn't anymore. Therefore, this reorg cannot be allowed. It is invalid and would halt full node execution, because it reorged more than 40.320 blocks. For the size of this reorg, it is extremely unlikely, so this halting would be extremely rare.
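The "evaluation interval math" used in this height-based solution can be sketched as follows. This is a hypothetical helper for illustration, not code from the node:

```python
EVALUATION_INTERVAL = 40320  # blocks

def previous_boundary_height(best_block_height: int) -> int:
    # Closest evaluation boundary at or below the best block (E1 in the
    # example above), then one full interval back (E0).
    closest_boundary = (best_block_height // EVALUATION_INTERVAL) * EVALUATION_INTERVAL
    return closest_boundary - EVALUATION_INTERVAL
```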
If the reorg is smaller than that, it's guaranteed that `E0` would NOT participate in the reorg, and then the feature would remain `ACTIVE`. Here's a timeline for such a reorg:

```
before:
  E0    E1    BB   tx1
---|-----|-----|-----|---> time
  ac    ac    ac    ac

after:
  E0    E1'   BB'  tx1
---|-----|-----|-----|---> time
  ac    ac    ac    ac
```

Blocks `E1` and `BB` are changed, but `E0` is not, so the feature remains `ACTIVE`, and so does `tx1`, correctly. From this analysis, it looks like large reorgs that could generate an invalid state are dealt with, and that small reorgs cannot generate an invalid state. Let's examine that below.

#### Dealing with small reorgs

From the previous analysis, it looks like the solution could work. However, we ended up finding a problem with it. Let's suppose a small reorg of 5 blocks, which is plausible. It's possible that the new best chain arranges blocks in such a way that blocks are shifted in time. Below is a perfectly valid small reorg, where we also added a new `tx2` that arrives after the reorg, and `bb` as the new best block observed by `tx1` and `tx2`:

```
before:
  E0    E1    BB   tx1
---|-----|-----|-----|---------------> time
  ac    ac    ac    ac

after:
  E0          bb   tx1   tx2   E1'   BB'
---|-----------|-----|-----|-----|-----|---> time
  ac          ac    ac    in    ac    ac
```

To determine the feature state for `tx2`, we first get its observed best block (`bb`), then its closest evaluation boundary (`E0`), then the _previous_ evaluation boundary, `E(-1)`, which is not shown. `E(-1)` could be `INACTIVE`, rendering `tx2` as `INACTIVE` too. Considering that `tx1`'s state is never recalculated, we would end up with an `ACTIVE` transaction followed by an `INACTIVE` transaction, which is invalid.

Therefore, we conclude that this solution is flawed. Let's try to fix this issue by introducing the use of timestamps.

### Fixing the issue by using timestamps

Using the actual solution defined in this document, let's revisit the examples above.
Here's the initial timeline:

```
  E0   tE1   tx1
---|-----•-----|---> time
  ac          ac
```

Now, instead of using the `E1` block, we use `tE1`, which is just a timestamp and not an actual block. It's calculated as `tE1 = E0.timestamp + 2 weeks (one Evaluation Interval)`, and can be interpreted as the "expected timestamp of `E1`". Per definition, the state of `tx1` is `ACTIVE` if `tx1.timestamp > tE1`, which is true.

Here, we'll not consider the `MAX_FUTURE_TIMESTAMP_ALLOWED` (like in the reference implementation), for simplicity. It's as if `MAX_FUTURE_TIMESTAMP_ALLOWED = 0`, that is, if a transaction exists, it's guaranteed that `current_time >= tx.timestamp`.

#### Dealing with large reorgs

What would happen if a reorg of more than one Evaluation Interval occurred? It would be possible that `E0` is included in the reorg, which would mean the new `E0` could be `FAILED` instead of `ACTIVE`, for example. The new timeline would be:

```
before:
  E0   tE1   tx1
---|-----•-----|---> time
  ac          ac

after:
  E0'  tE1'  tx1
---|-----•-----|---> time
  in          ac
```

Since transactions will NOT be re-validated, `tx1` would remain `ACTIVE` even though it shouldn't anymore. Therefore, this reorg cannot be allowed. It is invalid and would halt full node execution, because it reorged more than two weeks' worth of blocks. For the size of this reorg, it is extremely unlikely, so this halting would be extremely rare. This is all completely analogous to the previous solution, except a reorg is considered invalid if it reorgs more than two weeks (in time), instead of more than 40.320 blocks. In other words, the time between the common reorg block and the current time must be less than two weeks.

If the reorg is smaller than that, it is guaranteed that either:

1. `tE1` has been reached by the current time (so there are `ACTIVE` transactions) BUT `E0` did not participate in the reorg, OR
2.
`E0` participated in the reorg BUT the current time has not reached `tE1` (so there are no `ACTIVE` transactions).

In other words, this holds true because, by definition, transactions can only be `ACTIVE` if `current_time > tE1`, and the time distance between `E0` and `tE1` is two weeks, so it's impossible for a reorg smaller than two weeks to include `E0` if there are `ACTIVE` transactions. Now let's look at the small reorgs issue again.

#### Dealing with small reorgs

Let's simulate the same small reorg that was simulated in the previous solution:

```
before:
  E0   tE1   tx1
---|-----•-----|---------------> time
  ac          ac

after:
  E0   tE1   tx1   tx2   E1'
---|-----•-----|-----|-----|---> time
  ac    ac    ac    ac    ac
```

Now, the actual `E1'` block appeared _after_ the expected `tE1`, but it doesn't affect state calculations.

To determine `tx2`'s state, we check that it satisfies `tx2.timestamp > tE1`, which is true, so `tx2` is `ACTIVE`, unlike the previous solution, which resulted in `INACTIVE`. Therefore, the reorg is completely valid.

### Conclusion

It appears that the solution using timestamps works for small reorgs, while large reorgs that would cause issues are prevented by halting full node execution.

An interpretation to aid intuition when comparing both solutions is to observe that reorgs can "compress" and "dilate" blocks in relation to time, which ruins the solution using only block heights. By leveraging time instead, the activation threshold is pinned to the timeline, such that shifting blocks in relation to time doesn't affect the threshold.

## Likelihood of halting the full node

In this section we'll explore how likely it is for the full node execution to be halted (and require manual intervention), which is a possible but undesired scenario.
The full node will be halted if a reorg affects more than one Evaluation Interval, which is 40.320 blocks, or two weeks considering the average time between blocks of 30 seconds. For reference, in the [wallet-service](https://github.com/HathorNetwork/hathor-wallet-service-sync_daemon/blob/master/src/utils.ts#L67-L73) we have an alert that considers a reorg of more than 30 blocks to be `CRITICAL`, the highest severity possible. Given the sheer difference in scale compared to an already unlikely critical reorg, it's clear that a reorg of a full Evaluation Interval is extremely improbable.

We have to consider that we're using time differences instead of numbers of blocks, but we'll use this as a premise: a reorg of 40.320 blocks is rare enough that it is acceptable to halt full node execution if it happens. Then, considering the time difference of two weeks, we have three possibilities:

1. The block average is respected and we have exactly 40.320 blocks in two weeks. In this case, the two-week reorg is exactly as unlikely as the reorg of 40.320 blocks.
2. The hash rate suddenly increases and we have more than 40.320 blocks in two weeks. This means that the two-week reorg would affect more than 40.320 blocks, which is even more unlikely than the previous case.
3. Lastly, the hash rate suddenly drops and we have less than 40.320 blocks in two weeks. Then, the two-week reorg would affect less than 40.320 blocks, so we'll turn our attention to this case.

If that last case happens, by definition it would also mean that the average time between blocks is higher than 30 seconds. When the number of blocks in two weeks decreases, the likelihood of a reorg happening increases, but the average time between blocks also increases. That average is defined by `avg = 40.320 * 30 / N` (in seconds), where `N` is the number of blocks in the two-week interval.

So if, for example, there are 20.160 blocks in two weeks, the average time between blocks would be 60 seconds.
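That relationship can be computed directly. Here's a small sketch using the document's constants (function name made up for this example):

```python
EVALUATION_INTERVAL = 40320  # blocks in two weeks at the target rate
TARGET_AVG = 30              # target seconds between blocks

def avg_time_between_blocks(n_blocks: int) -> float:
    """Average time between blocks if only `n_blocks` fit in the two-week window."""
    return EVALUATION_INTERVAL * TARGET_AVG / n_blocks

print(avg_time_between_blocks(20160))      # 60.0 seconds
print(avg_time_between_blocks(1000) / 60)  # ~20 minutes
```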
Here's a table with some other examples:

| N      | avg      |
|--------|----------|
| 40.320 | 30 s     |
| 20.160 | 60 s     |
| 10.000 | ~120 s   |
| 1.000  | ~20 min  |
| 100    | ~200 min |
| 10     | ~30 h    |

As `N` decreases, the probability of a reorg increases, but the `avg` also increases. A reorg of 1.000 blocks, which is already extremely unlikely, would only be possible if we had only one block every 20 minutes, which is also very unlikely. For a reorg of 100 blocks, which is creeping into likely territory (though it's still way larger than our alert threshold for critical reorgs), we would only have one block every ~200 minutes, or ~3 hours. This is almost impossible in theory, considering that our DAA would prevent this increase in average block times.

Then, the only real situation where such a reorg would be more or less likely is if we were without miners for most of the two-week interval. At this point, this would be a full-fledged critical incident, and maybe it would even be a good thing that full nodes would halt execution and require manual intervention. In practice, given all that, I expect that a full node halting will never actually happen.

## Example - Releasing new Nano Contracts

To illustrate the usage of Feature Activation for Transactions, we'll demonstrate how it would be used to release a new Nano Contract.

To release a new Nano Contract, a new value would be added to the `TxVersion` enum, allowing deserialization of a different kind of transaction. Since the `FeatureService.is_feature_active_for_transaction()` method requires a `Transaction` instance, it cannot be accessed during deserialization, as the instance hasn't been constructed yet. The deserialization will always succeed, even if the Feature fails to become `ACTIVE`.

Then, a new method must be added in the verification phase of `BaseTransaction`, to assert that the deserialized `TxVersion` is valid.
In that method, `is_feature_active_for_transaction()` would be called, and depending on the state returned, the `TxVersion` representing the new Nano Contract would be accepted or not.

This demonstrates that there are two possible known use cases for Feature Activation for Transactions:

1. Changing behavior in transaction validation. In this case, the usage is straightforward: simply a call to `FeatureService.is_feature_active_for_transaction()` to check whether the new feature is `ACTIVE`.
2. Changing behavior in transaction deserialization. In this case, it may be necessary to create a new transaction validation method that verifies the validity of using a new deserialization feature.

## Mutability of transaction timestamps

One complicating factor that we haven't considered yet is the fact that the timestamp of a transaction can be tampered with, as it's not part of the transaction signature. This means that after a transaction is pushed to the network, a third party can re-push a copy of that transaction, only changing its timestamp and weight. Then, a third party could manipulate a transaction in such a way that the transaction's feature state is toggled (from `ACTIVE` to `INACTIVE`, or vice versa) if the third-party transaction has a higher weight than the original transaction.

Let's consider an example. A feature is created to activate a new Nano Contract, that is, a new `TxVersion` will become allowed during deserialization. At timestamp `t`, that feature becomes active for transactions, and a transaction using the new Nano Contract is pushed to the network, with timestamp `t+1`. It's perfectly valid, and is accepted by the network. Then, a third party copies this transaction, increasing its weight and changing its timestamp to `t-1`, and pushes it to the network.
That copy would win over the original transaction, voiding it due to its higher weight, but since, from the perspective of the copy, the new Nano Contract is not activated yet, that copy would not even be accepted in the first place. Therefore, the original transaction would remain valid and accepted. An analogous example can be created for the opposite situation (an original `INACTIVE` transaction being shifted by a third party to become `ACTIVE`).

In other words, if a third party tries to manipulate a transaction such that it's shifted before or after the activation threshold, the copied transaction will only remain valid if the usage of a feature doesn't affect the validity of that transaction in the first place.

Therefore, the mutability of transaction timestamps is not an issue for the solution proposed in this document.

# Drawbacks
[drawbacks]: #drawbacks

### The two-week delay

A drawback is that feature states for transactions will always have a two-week delay when compared to feature states for blocks, making the activation process for transactions a bit longer. The tradeoff is worth it, as this makes for a very simple implementation that naturally doesn't need to account for reorgs.

Special care should be taken when designing new features that will affect both blocks and transactions together, as the feature may become `ACTIVE` for blocks before it does for transactions. It's not clear at this point if this will ever be a possibility for a new feature and whether it would be a problem. For the foreseen use cases (Merkle Path length, SIGHASH, and Nano Contracts) this is not an issue.

### Relaying transactions from the past

Currently, it is possible to relay new transactions with any timestamp in the past. This means that even after a feature becomes active, enabling new rules, it's possible to create a new transaction that is inactive, that is, one that still uses the old rules.
The restriction is that the new transaction must only spend and confirm older transactions, so they will also be inactive. Therefore, the Feature Activation requirements are respected (the new inactive transaction will not be topologically after an active transaction).

For the known use cases, this is not an issue. The only impact is on the Burying of Feature Activations, which is described in the original RFC for blocks. Considering this, code performing the conditional branching on feature activation for transactions must not be removed, contrary to what was previously described. This will be updated in the original RFC.

Moreover, when Sync-v2 is completed, we may impose rules on the validity of transactions with past timestamps. After that is done, we can update this so it's impossible to relay new transactions using old rules.

# Rationale and alternatives
[Rationale and alternatives]: #rationale-and-alternatives

This RFC went through a lot of completely different idea iterations before arriving at this solution.

The first idea that was discarded right from the beginning was using signal bits for transactions, instead of leveraging the Feature Activation for Blocks. This would mean a completely separate process for transactions, and a lot of the already implemented requirements would have to be reimplemented. It made much more sense to leverage the existing block mechanism.

Then, a whole other group of ideas was studied, trying to find a simple way of choosing an "associated feature block" for each transaction. The feature state of a transaction would simply be the feature state of that associated block. The problem was finding an associated block that makes sense, and that works when reorgs happen, while satisfying the requirement that the state of transactions cannot "retreat" from `ACTIVE` to `INACTIVE` when transactions are ordered by timestamp.
We tried a lot of different options, using best blocks, first blocks, first blocks of parents, introducing a delay of an evaluation interval between a tx and its associated block, etc. After a lot of work analysing those options, we would always find a case where the requirements would be broken, and the solution became convoluted.

Then, we tried to come up with a solution taking inspiration from the existing Reward Lock mechanism. This seemed to work, but given the differences between the Reward Lock and Feature Activation, the solution got too complex, with too many edge cases to handle.

Those solutions are available in the [iterations directory](./0005-iterations), in incomplete form. They're kept there just for reference. After discussing those issues and possible new paths, three new different ideas emerged:

1. Revalidate all transactions after a common reorg block, and revert the reorg if any transaction becomes invalid. This would have performance issues, and would also require a refactor in the consensus so that a failed consensus update could be rolled back (similar to a database transaction commit/rollback).
2. Refactor the full node so vertex metadata would be different per chain. Currently, we have one metadata object per vertex that is the same for all chains. It was possible that this would solve the issues, but it would be a sizeable refactor.
3. Use soft checkpoints generated by code to guarantee that an `ACTIVE` feature never reverts back to `INACTIVE`. This would touch some sensitive parts of the code and is complex enough to be considered error-prone to implement, so it was discarded. An observation is that this could actually be used in the solution of this document, replacing the full node halting, to guarantee that a first `ACTIVE` block never changes. If we conclude that halting the full node is too extreme, we could resort to soft checkpoints instead, but I believe the tradeoff of halting is better.
+ +Finally, using all the knowledge gathered from those explorations, we arrived at the solution proposed here. + +# Prior art +[prior-art]: #prior-art + +I couldn't find any prior art relevant to this RFC, considering Hathor's unique DAG architecture for transactions. While the Feature Activation for blocks is heavily based on the Bitcoin solution, the transactions solution is very specific to Hathor. + +# Task breakdown + +Here's a table of main tasks: + +| Task | Dev days | +|--------------------------------------------|----------| +| Implement state retrieval for transactions | 0.5 | +| Implement handling of large reorgs | 0.5 | +| Implement tests and simulations | 2 | +| **Total** | **3** | diff --git a/projects/feature-activation/0005-iterations/iteration1.md b/projects/feature-activation/0005-iterations/iteration1.md new file mode 100644 index 0000000..e48a85e --- /dev/null +++ b/projects/feature-activation/0005-iterations/iteration1.md @@ -0,0 +1,237 @@ +- Feature Name: Feature Activation for Transactions +- Start Date: 2023-08-09 +- Initial Document: [Feature Activation](https://docs.google.com/document/d/1IiFTVW1wH6ztSP_MnObYIucinOYd-MsJThprmsCIdDE/edit) +- Author: Gabriel Levcovitz <> + +# Summary +[summary]: #summary + +This document describes a way to use the Feature Activation process with Transactions, complementing the existing implementation for Blocks. The original [Feature Activation for Blocks RFC](https://github.com/HathorNetwork/rfcs/blob/master/projects/feature-activation/0001-feature-activation-for-blocks.md#evaluation-interval) can be read for more detailed information on what the Feature Activation process is, why it is necessary, and how it works. + +# Motivation +[motivation]: #motivation + +Implementing Feature Activation for Transactions was a requirement from the beginning, but during development of the initial RFC, it was determined that its complexity would be better addressed in a separate document. 
While the former defines a way to retrieve feature states for blocks, allowing for changes in block processing (block verification, block consensus, etc), the latter defines analogous behavior to retrieve feature states for transactions. This will be necessary for some of the main known use cases of the Feature Activation process, for example eventually releasing nano contracts.
+
+# Guide-level explanation
+[Guide-level explanation]: #guide-level-explanation
+
+## Overview
+[Overview]: #overview
+
+The central idea to solve the calculation of feature states for transactions is to actually use the existing block process that handles all general requirements and is already implemented and tested. By doing that, we can define feature states for transactions as simply a "forward" from the feature states of some block. Then, our problem becomes: which block should we use for each transaction?
+
+To understand this question, let's think about the block process. In general terms, the feature state for a block depends on the previous blocks, or the blocks "behind" it (meaning, in the past). For blocks, height and timestamp can be used interchangeably, as every block has a greater height and a greater timestamp when compared to its parent.
+
+It's also a fact that while the block count advances, a feature state cannot "retreat". For example, the feature cannot be `ACTIVE` on one block and then go back to `STARTED` on the next block, as that state transition would be backwards. This is only true when looking at blocks in the best chain (or more generally, in any single blockchain). If instead we look at the best block over time, feature states can indeed retreat, when reorgs happen. After the reorg is complete, the "always-increasing" property of feature states is restored, as blocks with "retreating" states would be part of the voided blockchain.
+
+For transactions, we have the same requirement.
For every transaction that arrives, the feature state for that transaction cannot "retreat" from the feature state of any previous transaction (meaning, any transaction in the past). The complication here is that while blocks follow a linear pattern, transactions do not, as they're organized in a DAG (in other words, transactions don't have a height property). Therefore, contrary to blocks, transactions also have the requirement that feature states cannot retreat when looking at transactions over time. This means that at any point in time, any transaction must have a feature state that is equal to or "greater" than the feature state of any transaction that arrived before it did.
+
+This poses a complication for dealing with reorgs. When they happen, blocks are voided, but transactions are not — they may only return to the mempool. This would leave some transactions with feature states retrieved from voided blocks. In other words, the transactions would hold the "retreated" state. One possible solution would be to recalculate the feature states of transactions when a reorg happens, that is, we would have to choose another block for the transaction to retrieve its feature states from, because they could be invalidated after the reorg due to their own feature state. This would introduce unnecessary complexity, but there's another option.
+
+Going back to the question in the beginning of this section, we must define some block that will be associated with some transaction, and to retrieve the feature states for that transaction, we'll simply retrieve the feature states for the associated block. It follows that for any new transaction, its associated block's height must be greater than or equal to the associated block's height of any past transaction.
Considering what was described in the previous paragraph, an alternative solution to the reorg problem is adding another requirement for those associated blocks: their state must not change when a reorg occurs. If that is true, the transaction's feature states will always remain valid, even after a reorg.
+
+Then, finding this associated block for each transaction is our main problem to be solved, and it guarantees that feature states will never "retreat".
+
+There are a few obvious candidates for the associated block, like using the best block, the first block, or other related ideas, but they all introduce problems that will be explored in the following sections. The solution we'll arrive at leverages boundary blocks of Feature Activation evaluation intervals.
+
+**The associated block for a transaction will always be the boundary block of the previous evaluation interval**. This introduces a delay of at least two weeks between a transaction and the block used to determine its feature states. Intuitively, that delay guarantees the requirement of "feature states of associated blocks cannot change", as the only thing that could change that block's feature states is a two-week reorg, and the probability of that happening is virtually zero.
+
+---
+THIS IS WRONG
+
+Our question now becomes: how do we retrieve that boundary block? First, we need to determine the boundary block that is closest to our transaction, to the left (in the past). Knowing block heights is an easy way to navigate through boundary blocks, but transactions do not have heights, only timestamps. So we need a way to bridge from the time domain to the height domain. Using the transaction's timestamp, we'll get the best block at that time. Now, with a block in our hands, we can get its height to easily convert between domains. Finally, to get to the boundary block of the previous evaluation interval, only simple height math is necessary.
+ +--- + +# Reference-level explanation +[Reference-level explanation]: #reference-level-explanation + +In this section, technical details are expanded for what was described above. + +## Rationale + +### Definitions + +1. We use the term tx to refer to a `Transaction`, not a vertex. +2. Considering the Feature Activation context, as [defined in the original RFC](https://github.com/HathorNetwork/rfcs/blob/master/projects/feature-activation/0001-feature-activation-for-blocks.md#evaluation-interval), + 1. An Evaluation Interval is a repeating period in the Feature Activation process. It's defined as `40320` blocks, the equivalent of 2 weeks given the average time between blocks. + 2. A Boundary Block is a block that starts an Evaluation Interval, that is, its height is a multiple of `40320`. + 3. Similarly, a Boundary Height is the height of a Boundary Block. +3. Both blocks and transactions can have different states for different features, but for simplicity sometimes we mention "the block's feature state" or "the transaction's feature state", in singular. + +### Premises + +1. We already have feature states for all blocks, which are readily available through the `FeatureService`. +2. We'll leverage that to define feature states for transactions. That is, the feature state for some tx is the same state as some block, so we need some function `f(tx) -> block`. +3. Analogously to blocks, whenever a new tx arrives, its feature state must not "retreat" when compared to the feature state of a tx that arrived before it did. In other words, for a `tx2` that arrives after a `tx1`, that is, `tx2.timestamp >= tx1.timestamp`, we require that `f(tx2).feature_state >= f(tx1).feature_state`. This must hold true even after reorgs. + +Note: in the context of feature states, `>=` means that a state can only occur after another state in the state machine, or it is the same state. 
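To make this ordering concrete, here's a small illustrative sketch of the `>=` relation between feature states. The state names follow the Feature Activation for Blocks RFC, but the transition table below is an assumption for illustration only, not hathor-core's actual implementation:

```python
# Illustrative state machine for one feature. State names follow the
# Feature Activation for Blocks RFC; the exact transition set is an
# assumption made for this sketch.
TRANSITIONS = {
    'DEFINED': {'STARTED'},
    'STARTED': {'MUST_SIGNAL', 'LOCKED_IN', 'FAILED'},
    'MUST_SIGNAL': {'LOCKED_IN'},
    'LOCKED_IN': {'ACTIVE'},
    'ACTIVE': set(),
    'FAILED': set(),
}


def state_gte(a: str, b: str) -> bool:
    """Return True iff `a >= b`: `a` is the same state as `b`, or can only
    occur after `b` in the state machine (i.e. `a` is reachable from `b`)."""
    stack, seen = [b], {b}
    while stack:
        state = stack.pop()
        if state == a:
            return True
        for successor in TRANSITIONS[state]:
            if successor not in seen:
                seen.add(successor)
                stack.append(successor)
    return False


# Premise #3: for a tx2 arriving after a tx1, a "retreat" is forbidden.
assert state_gte('ACTIVE', 'STARTED')      # ACTIVE comes after STARTED: ok
assert not state_gte('STARTED', 'ACTIVE')  # a retreat violates the premise
```

With this, premise #3 reads: `state_gte(f(tx2).feature_state, f(tx1).feature_state)` must hold for any `tx2` arriving after `tx1`.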
+
+### The Problem
+
+Given the definitions and premises above, our problem becomes a matter of finding a function `f(tx)`. We'll iterate over possible solutions, identifying issues and fixing them until we reach a working result.
+
+The first obvious solution would be for `f(tx)` to simply return the best block, using `TransactionStorage.get_best_block()`. There are also obvious issues with this option: as the best block changes over time, the feature state of a tx would become transient.
+
+An option that solves this issue is using `TransactionStorage.get_best_block_tips()` instead. This method receives a `timestamp`, which would be the transaction's timestamp. In other words, we retrieve the best block at the time of the transaction, removing the transient component. Let's call this "best block at the time of the transaction" the "transaction's best block", for brevity.
+
+The issue in this case would be dealing with reorgs. Let's consider the following timeline:
+
+```
+   b1    tx1
+---|-----|---> time
+```
+
+We start with `b1` as the best block, and then `tx1` arrives. During the verification process of `tx1`, some feature state is queried. Using the solution described above, `tx1`'s feature state would be the same as `b1`'s feature state, let's say it is `ACTIVE`.
+
+Then, there's a reorg and `b1` is voided, `b2` replaces it as the new best block, with state `FAILED`. After the reorg, a `tx2` arrives and since `tx2`'s best block is `b2`, its state would also be `FAILED`:
+
+```
+   b2    tx1   tx2
+---|-----|-----|---> time
+```
+
+Since `tx1`'s best block is still `b1`, even though it's voided, its state is still `ACTIVE`. This would mean that we have two consecutive, valid transactions with retreating feature states, equivalent to:
+
+- `tx2.timestamp >= tx1.timestamp` and
+- `not (f(tx2).feature_state >= f(tx1).feature_state)`
+
+Which breaks premise #3.
+
+We then add boundary heights to the solution.
We know that feature states can only change at boundary heights, and then stay the same for the whole evaluation interval. We also know that whenever a new tx arrives it falls in some evaluation interval, between boundary heights `h1` and `h2`. At that time, it's guaranteed that there will be a block at `h1`, and not at `h2`. Also, it holds that `b1.height >= h1.height`, where `b1` is `tx1`'s best block:
+
+```
+   h1    b1    tx1
+---|-----|-----|---> time
+```
+
+Notice that `h2` is omitted, as there's no block at that height yet (it would eventually appear after `tx1`).
+
+
+This time, instead of associating transaction feature states using best blocks, we'll use the boundary blocks. We can easily retrieve the boundary block that is the closest to a transaction's best block (to the left). In the case of `tx1`, that would be `h1`. In other words, `f(tx1) = h1`.
+
+Similarly to the previous example, let's say there's a reorg that voids both `h1` and `b1`, so the new best block is `b3`. We've used the `~` notation to represent a voided block:
+
+```
+  ~h1   ~b1   tx1    b2   tx2   h1'    b3
+---|-----|-----|-----|-----|-----|-----|---> time
+```
+
+This reorg also added the `b2` and `h1'` blocks, which are part of the new blockchain. The height of `h1'` is the same as `h1`, but its timestamp is greater.
+
+To calculate `tx2`'s feature state, we'll get its best block (`b2`) and then its closest boundary block, which has some height lower than `b2`. Let's call it `h0`, as it's the boundary block directly before `h1/h1'`:
+
+```
+   h0   ~h1   ~b1   tx1    b2   tx2   h1'    b3
+---|-----|-----|-----|-----|-----|-----|-----|---> time
+```
+
+From that, it follows that `f(tx2).height < f(tx1).height`, which directly contradicts `f(tx2).feature_state >= f(tx1).feature_state`, breaking premise #3.
+
+Observing that last timeline, we can finally arrive at the working solution. Instead of using the boundary block that is closest to the transaction's best block, we will use the boundary block before that.
Let's reproduce the timeline at the beginning of the example:
+
+```
+   h0    h1    b1    tx1
+---|-----|-----|-----|---> time
+```
+
+Now, we've included `h0`, and we'll define `tx1`'s feature state as the same as `h0`, or `f(tx1) = h0`. Reproducing the reorg:
+
+```
+   h0   ~h1   ~b1   tx1    b2   tx2   h1'    b3
+---|-----|-----|-----|-----|-----|-----|-----|---> time
+```
+
+It now follows that `tx2`'s feature state is also the same as `h0`, making `f(tx2) = f(tx1) = h0`. Using this strategy, premise #3's requirements hold.
+
+## Completing timeline scenarios
+
+In the guide-level section, we said that there are only two possible cases for positioning a transaction and its best block in a timeline. This was in fact incomplete, so let's address this now. Here's a new timeline:
+
+```
+   h0    h1    tx1
+---|-----|-----|---> time
+```
+
+Let's analyze where `tx1`'s best block, `b1`, could be. The following set of intervals covers the whole timeline:
+
+1. `b1 < h0`
+2. `h0 <= b1 < h1`
+3. `h1 <= b1 <= tx1`
+4. `tx1 < b1`
+
+Scenarios #2 and #3 are actually the cases described in the guide-level section, so no need to detail them.
+
+Scenario #1 means that there are no blocks between `h0` and `h1`, which is an evaluation interval. That would mean at least two weeks without any blocks in the network, which is virtually impossible, so this scenario is considered invalid.
+
+Lastly, scenario #4 means that calling `TransactionStorage.get_best_block_tips(tx1.timestamp)` would return a block with `b1.timestamp > tx1.timestamp`. In other words, it would return a block from the future, which does not make sense for this method. Therefore, this scenario is also considered invalid. TODO: Is this true? Should we add an `assert` for that in `get_best_block_tips()`?
+
+## Dealing with reorgs
+
+Let's detail a bit further what could possibly happen when a reorg occurs, and how to deal with it in the context of transactions. There are only two possibilities that affect the Feature Activation process.
+
+### Changes in the best block height
+
+When a reorg occurs, the best block height could change, shifting the transaction's best block between the different timeline intervals described above. The interpretation of what happens in each possible shift follows directly from the previous analysis.
+
+The new best block can only shift FROM scenarios #2 and #3 and TO scenarios #2 and #3, as any other shift combination would result in an invalid scenario.
+
+This means that even though a transaction's best block can change, `f(tx1)` will always return the block at `h0`, the same block it returned before the reorg. It doesn't change over time, and it doesn't change when reorgs occur. In other words, no explicit handling of reorgs is necessary.
+
+### Changes in the feature state of the best block
+
+When a reorg occurs, another possibility is that the feature state of the best block would change, for example if the original best chain signaled support and enabled some feature, and the new best chain does not.
+
+As described before, the best block can only shift TO and FROM scenarios #2 and #3, meaning that it can only affect the feature state at boundary blocks `h1` and a future `h2`. Since `f(tx1)` always returns `h0`, the transaction's feature state is not affected by the reorg. Again, no explicit handling of reorgs is necessary.
+
+### TODO: Am I missing any other reorg consequence that should be considered?
+
+## Retrieving feature states for Transactions
+
+Given the explanations in the sections above, a new method will be added to the `FeatureService`.
Here's its reference implementation:
+
+```python
+def get_state_for_transaction(self, *, tx: Transaction, feature: Feature) -> FeatureState:
+    # Get the best block at the time of the transaction.
+    best_block_tips = self._tx_storage.get_best_block_tips(tx.timestamp)
+    assert len(best_block_tips) >= 1
+
+    best_block_hash = best_block_tips[0]
+    best_block = self._tx_storage.get_transaction(best_block_hash)
+    assert isinstance(best_block, Block)
+
+    # Find the boundary block closest to the best block (to the left),
+    # then go back one more evaluation interval to get the associated block.
+    best_block_height = best_block.get_height()
+    offset_to_boundary = best_block_height % self._feature_settings.evaluation_interval
+    closest_boundary_height = best_block_height - offset_to_boundary
+    state_block_height = closest_boundary_height - self._feature_settings.evaluation_interval
+    state_block = self._tx_storage.get_transaction_by_height(max(state_block_height, 0))
+    assert isinstance(state_block, Block)
+
+    # The transaction's feature state is simply its associated block's state.
+    return self.get_state_for_block(block=state_block, feature=feature)
+```
+
+That's all that is necessary to enable Feature Activation for transactions.
+
+## Drawbacks
+
+### The two-week delay
+
+A drawback is that feature states for transactions will always have a two-week delay when compared to feature states for blocks, making the transaction process a bit longer. The tradeoff is worth it, as this makes for a very simple implementation that naturally doesn't need to account for reorgs.
+
+Special care should be taken when designing new features that will affect both blocks and transactions at the same time, as the feature may become `ACTIVE` for blocks before it does for transactions. It's not clear at this point if this will ever be a possibility for a new feature and if it would be a problem, but if it is, I believe we can solve it by introducing a two-week delay on the feature state for blocks, making their states synchronized with transactions.
+
+# Rationale and alternatives
+[Rationale and alternatives]: #rationale-and-alternatives
+
+The rationale is explained in the guide-level section above.
+ +TODO: add alternatives + +# Prior art +[prior-art]: #prior-art + +I couldn't find any prior art relevant to this RFC, considering Hathor's unique DAG architecture for transactions. While the Feature Activation for blocks is heavily based on the Bitcoin solution, the transactions solution is very specific to Hathor. + +# Task breakdown + +TODO diff --git a/projects/feature-activation/0005-iterations/iteration2.md b/projects/feature-activation/0005-iterations/iteration2.md new file mode 100644 index 0000000..6887407 --- /dev/null +++ b/projects/feature-activation/0005-iterations/iteration2.md @@ -0,0 +1,182 @@ +- Feature Name: Feature Activation for Transactions +- Start Date: 2023-08-09 +- Initial Document: [Feature Activation](https://docs.google.com/document/d/1IiFTVW1wH6ztSP_MnObYIucinOYd-MsJThprmsCIdDE/edit) +- Author: Gabriel Levcovitz <> + +# Summary +[summary]: #summary + +This document describes a way to use the Feature Activation process with Transactions, complementing the existing implementation for Blocks. The original [Feature Activation for Blocks RFC](https://github.com/HathorNetwork/rfcs/blob/master/projects/feature-activation/0001-feature-activation-for-blocks.md#evaluation-interval) can be read for more detailed information on what the Feature Activation process is, why it is necessary, and how it works. + +# Motivation +[motivation]: #motivation + +Implementing Feature Activation for Transactions was a requirement from the beginning, but during development of the initial RFC, it was determined that its complexity would be better addressed in a separate document. While the former defines a way to retrieve feature states for blocks, allowing for changes in block processing (block verification, block consensus, etc), the latter defines analogous behavior to retrieve feature states for transactions. This will be necessary for some of the main known use cases of the Feature Activation process, for example eventually releasing nano contracts. 
+
+# Guide-level explanation
+[Guide-level explanation]: #guide-level-explanation
+
+## Overview
+[Overview]: #overview
+
+The central idea to solve the calculation of feature states for transactions is to actually use the existing block process that handles all general requirements and is already implemented and tested.
+
+To use block feature states to retrieve feature states for transactions, let's first consider the possible state values. Blocks use multiple states to represent the Feature Activation process, such as `STARTED`, `MUST_SIGNAL`, `LOCKED_IN`, `ACTIVE`, etc, as they're responsible for the whole logic of the system. In the context of transactions, contrary to blocks, not all these states are relevant. In fact, for a transaction, it only matters whether a feature is either `ACTIVE` or `INACTIVE`. For brevity, from this point forward when we say that a block is `ACTIVE`, we mean that its state for a certain feature is `ACTIVE`, and when we say that a block is `INACTIVE`, we mean that its state for a certain feature is any state _but_ `ACTIVE`.
+
+Besides Feature Activation for Blocks, another existing system will be used as inspiration for the Feature Activation for Transactions design: the Reward Lock system (called RL from now on). As will be described below, the way this system works can mostly be described as an analogy to the Feature Activation for Transactions (called FATX from now on), with some differences.
+
+In loose terms, the problem RL solves is mostly determining the validity of some transaction in relation to the height of the **current best block**. This has to be calculated and determined in two different contexts: when a transaction is received and is in the mempool, and when a reorg happens. The FATX problem is essentially the same, but the best block's feature states are relevant, instead of its height.
+
+Considering the reorg case, we can find a difference between the two systems: FATX has one extra "dimension" when compared to RL. For RL, a reorg is only relevant if it decreases the height of the best block (which could invalidate a previously valid transaction by re-locking the reward it tries to spend). For FATX, it doesn't matter whether the best block's height changed; it only matters whether its state changed, which could happen even if the best blockchain got larger after the reorg.
+
+In other words, any time a reorg changes the state of the best block either from `INACTIVE` to `ACTIVE` or from `ACTIVE` to `INACTIVE`, some transactions may become invalid. Also, such a change can occur if and only if a boundary block participates in the reorg. Otherwise, by definition in the Feature Activation for Blocks, the state will remain the same.
+
+Another difference between RL and Feature Activation is their "runtime". Considering the lifecycle of a vertex in the full node, it is first received as a struct, or a byte array, and then it is parsed into one of the `BaseTransaction` subclasses, such as `Block` or `Transaction`. Only then does its verification process start, considering different rules for blocks and transactions, and RL validation is calculated.
+
+In other words, the verification (or validation) of RL is only made _after_ the vertex bytes are parsed. For Feature Activation, that is not the case. Feature Activation must also be available _before_ bytes are parsed, as it must support changes such as updating `TxVersion`'s possible values, allowing for example for the release of nano contracts. Since those values are used to determine whether the vertex bytes are even parseable in the first place, Feature Activation must be available then.
In that case, Feature Activation for Blocks must be used to determine the current state of the network for some feature before parsing the new vertex (using the current best block), and then FATX rules (described below) will be applied normally to verify the validity of the parsed `Transaction`. This will be further detailed.
+
+# Reference-level explanation
+[Reference-level explanation]: #reference-level-explanation
+
+In this section, technical details are expanded for what was described above.
+
+## Reward Lock
+
+Before detailing FATX, let's describe how the RL works in general terms. Then, we'll be able to observe the analogy and define FATX. As explained before, there are two main contexts for calculating RL (new vertices are separated into txs and blocks).
+
+#### Dealing with new txs
+
+1. A tx is received, and it spends the reward of a block. It is verified for reward lock. Then, there are two possibilities:
+    1. If the current best block's height IS NOT enough to unlock the reward, the tx is invalid and is rejected.
+    2. If the current best block's height IS enough to unlock the reward, the tx is valid and remains in the mempool.
+
+#### Dealing with reorgs
+
+1. A reorg happens. Then, if the new best height is lower than the best height before the reorg,
+    1. Both txs that were already in the mempool, and confirmed txs that came back to the mempool, may have become invalid. For that, all txs in the mempool are re-verified (only for reward lock). If they're invalid, they're marked as such and removed from the mempool and the storage.
+
+#### Dealing with new blocks
+
+1. When a block is received, if one of the txs it confirms tries to spend a locked reward, the block is invalid and is rejected.
+
+This rule is only for guaranteeing no rewards can be spent too early. In practice, it's impossible for such a tx to be in the mempool, as it would have been invalidated before by the previous rules. TODO: Is this correct?
Why does this rule exist? Is it for compatibility with the previous reward lock mechanism?
+
+## Feature Activation for Transactions
+
+Now, before describing the contexts above for FATX, let's make some definitions.
+
+### Premises
+
+1. After an `ACTIVE` block, all next blocks will also be `ACTIVE` (considering the same blockchain).
+2. Feature states for txs are only `ACTIVE` or `INACTIVE`.
+
+### Requirements
+
+Using the premises above, we must define a function that returns a state, given a transaction (note: we actually return multiple states for multiple features). That function must satisfy the following requirements:
+
+1. All txs that are received after (time-wise) an `ACTIVE` block in the best chain must also be `ACTIVE`.
+2. When all txs are ordered by timestamp, there must not be an `INACTIVE` tx after an `ACTIVE` tx.
+
+### Definitions
+
+We then define the feature state function for transactions:
+
+1. A tx is considered `ACTIVE` if
+    1. It confirms a tx that is `ACTIVE`, OR
+    2. It confirms a tx that has an `ACTIVE` first block.
+
+Otherwise, it is `INACTIVE`. This guarantees that a tx will never be `INACTIVE` if there's an `ACTIVE` tx "before" it (to its left).
+
+### Reward Lock analogy
+
+Equipped with the definitions above, we're ready to extract the analogous contexts from RL to FATX, understanding how the FATX mechanism will work.
+
+#### Dealing with new txs
+
+1. A tx is received. It is verified for FATX, that is, the state function defined above is called, and there are two possibilities:
+    1. If the current best block is `INACTIVE`, the tx is valid and remains in the mempool.
+    2. If the current best block is `ACTIVE`, and
+        1. If the tx is `INACTIVE`, it is invalid and is rejected.
+        2. If the tx is `ACTIVE`, it is valid and remains in the mempool.
+
+For RL, the tx is invalid only if the best block's height is not enough.
For FATX, the tx is invalid only if the best block is `ACTIVE` and the tx is `INACTIVE` (note: this must be verified for all features). + +#### Dealing with reorgs + +1. A reorg happens. Then, if a boundary block participates in the reorg (that is, there was a boundary block in either the previous or the new best chain): + 1. Both txs that were already in the mempool, and confirmed txs that came back to the mempool, may have become invalid. All txs in the mempool must be invalidated and removed from the storage. + +For RL, if the best chain decreases, all txs in the mempool are re-verified for reward lock. For FATX, if the best block's state changes, all txs in the mempool are invalidated and removed. + +Why is the FATX case more extreme? For RL, a simple RL re-verification guarantees that the new best chain conforms to the tx validation requirements. However, as FATX may affect pre-parsing validation, it's possible that a tx becomes invalid even if its FATX verification does not find any errors. Let's look at an example. + +When updating the `TxVersion` through the Feature Activation process, we can either add or remove values. For example, we could add a new value for a new nano contract, but we could also eventually remove it. This creates a symmetry, having the same consequence for both situations: + +- We set up the removal of an existing NC, and then the best block changes to `ACTIVE`, removing support for that NC +- We set up the addition of a new NC, and then the best block changes back to `INACTIVE`, removing support for that NC + +Therefore, in both cases we may end up with txs in the mempool that were parsed with the now removed NC's `TxVersion`. + +If we were to simply re-run FATX verification for all txs in the mempool, like it's done for RL, the tx could still be considered valid for its state, but in fact it shouldn't even exist, as it could not have been parsed considering the new current `TxVersion` possible values. 
We could store the tx's raw bytes, and then reparse and re-verify all of it, but for simplicity we just invalidate and remove all txs from the mempool, so the sync algorithm naturally re-syncs valid txs considering the updated `TxVersion`.
+
+There is indeed a performance penalty for throwing away the whole mempool, but the relevant fact here is that this can only happen when the reorg affects a boundary block, which occurs only once every two weeks. Therefore, it's guaranteed that this mempool throwaway will not happen during the two-week evaluation interval. Even at the boundaries, it will only happen if by chance the boundary block is part of a reorg, AND if the best block state transitions from or to `ACTIVE`. Therefore, it's guaranteed that the mempool throwaway and re-sync will happen **at most once every two weeks**, which seems like a reasonable tradeoff. In fact, it will happen much less, as we'll likely only have a new feature transitioning to (or from) `ACTIVE` every few months.
+
+#### Dealing with new blocks
+
+1. An `ACTIVE` block can only confirm txs that are also `ACTIVE` (except for the first `ACTIVE` block).
+
+This guarantees that no `INACTIVE` txs are accepted after the first `ACTIVE` block is received.
+
+## Retrieving Feature States for Transactions
+
+To support the mechanism described above, a new `feature_states` metadata attribute will be introduced for transactions. It will be calculated according to the rules above and set in `BaseTransaction.update_initial_metadata()`. To calculate the feature state for a new tx, we first get this metadata from its parents. If any of them is `ACTIVE`, we return `ACTIVE`. Otherwise, we get the states of its parents' first blocks. If any of them is `ACTIVE`, we return `ACTIVE`. Otherwise, we return `INACTIVE`. The same function is also used for verification.
+
+### Mutability of the `feature_states` metadata
+
+There are two situations that could require updating this metadata.
Instead, for simplicity, we want a solution where this metadata is immutable.

#### When a reorg happens

In this case, since affected transactions are discarded from the mempool, there's no need to update the metadata. The txs will be re-synced and their metadata will be calculated accordingly, as if they were new txs.

#### When one of our parents is confirmed by an `ACTIVE` block

When a parent tx is confirmed by an `ACTIVE` block, it's possible that our metadata would have to be updated. There are two sub-cases:

1. This block is NOT the first `ACTIVE` block in the network.
    1. In this case, we would have to be `ACTIVE` in the first place, according to the FATX validation rules. Therefore, no metadata update is necessary.
2. This block IS the first `ACTIVE` block in the network.
    1. In this case, our metadata could transition from `INACTIVE` to `ACTIVE`. If we did this, we would have to update the metadata and re-verify all our children. Instead, we'll mimic the reorg case and force a purge and re-sync of the mempool. This is also very rare, as it happens only **once** for each feature.

Other alternatives involve keeping track of mutable metadata states, which introduces complexity. It would also have been necessary to update `HathorManager.get_new_tx_parents()` according to the FATX rules, so `ACTIVE` txs are returned when necessary. Otherwise, we could end up with `INACTIVE` txs in the mempool that are impossible to confirm.

# Drawbacks
[drawbacks]: #drawbacks

The drawback of this solution is the need to purge the mempool in some specific cases. As explained above, these cases are very rare, so this purging is guaranteed to happen no more often than once every two weeks. In fact, it's very likely to happen only once every few months, if it ever happens.
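As a reference for the rules above, the state-retrieval logic described in the "Retrieving Feature States for Transactions" section can be sketched as follows. This is a simplified, runnable illustration under stated assumptions: the `Vertex` stand-in and its attribute names (`feature_states`, `parents`, `first_block`) are hypothetical, not hathor-core's actual classes.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class FeatureState(Enum):
    INACTIVE = 'INACTIVE'
    ACTIVE = 'ACTIVE'


@dataclass
class Vertex:
    """Stand-in for a block or tx; attribute names are illustrative."""
    feature_states: dict = field(default_factory=dict)
    parents: list = field(default_factory=list)
    first_block: Optional['Vertex'] = None


def calculate_tx_feature_state(tx: Vertex, feature: str) -> FeatureState:
    # Rule 1: if any parent tx is ACTIVE, the new tx is ACTIVE.
    for parent in tx.parents:
        if parent.feature_states.get(feature) is FeatureState.ACTIVE:
            return FeatureState.ACTIVE
    # Rule 2: otherwise, if any parent's first block is ACTIVE, it's ACTIVE.
    for parent in tx.parents:
        fb = parent.first_block
        if fb is not None and fb.feature_states.get(feature) is FeatureState.ACTIVE:
            return FeatureState.ACTIVE
    # Rule 3: otherwise, it's INACTIVE.
    return FeatureState.INACTIVE


# Example: an INACTIVE parent confirmed by an ACTIVE block makes the new tx ACTIVE.
active_block = Vertex(feature_states={'NANO': FeatureState.ACTIVE})
parent_tx = Vertex(feature_states={'NANO': FeatureState.INACTIVE}, first_block=active_block)
new_tx = Vertex(parents=[parent_tx])
assert calculate_tx_feature_state(new_tx, 'NANO') is FeatureState.ACTIVE
```

The same function would be reused for verification, matching the immutability decision above: the state is fixed once at metadata-initialization time.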
# Rationale and alternatives
[Rationale and alternatives]: #rationale-and-alternatives

This RFC went through several completely different iterations before arriving at this solution.

The first idea, discarded right from the beginning, was using signal bits for transactions instead of leveraging Feature Activation for Blocks. This would mean a completely separate process for transactions, and many of the already-implemented requirements would have to be reimplemented. It made much more sense to leverage the existing block mechanism.

Then, a whole other group of ideas was studied, trying to find a simple way of choosing an "associated feature block" for each transaction. The feature state of a transaction would simply be the feature state of that associated block. The problem was finding an associated block that makes sense and works when reorgs happen, while satisfying the requirement that the state of transactions cannot "retreat" from `ACTIVE` to `INACTIVE` when transactions are ordered by timestamp. We tried many different options: using best blocks, first blocks, first blocks of parents, introducing a delay of an evaluation interval between a tx and its associated block, etc. After a lot of work analysing those options, we would always find a case where the requirements would be broken, and the solution became convoluted. Finally, taking inspiration from the existing Reward Lock mechanism seemed like the right path.

# Prior art
[prior-art]: #prior-art

I couldn't find any prior art relevant to this RFC, considering Hathor's unique DAG architecture for transactions. While Feature Activation for Blocks is heavily based on the Bitcoin solution, the transactions solution is very specific to Hathor.
# Task breakdown

Here's a table of main tasks:

| Task                                      | Dev days |
|-------------------------------------------|----------|
| Create the new `feature_states` metadata  | 1        |
| Implement verification of new txs         | 2        |
| Implement dealing with reorgs             | 2        |
| Implement dealing with new blocks         | 1        |
| **Total**                                 | **6**    |