Recently, while developing Realm VM live migration, we encountered an instruction_abort issue.
The specific scenario is as follows...
When importing the Realm VM on the destination platform:
- We first used smc_rtt_init_ripas to set the entire RAM area of the Realm VM as unassigned RAM.
- Then, we established the IPA mapping using the smc_data_create interface. (Our current temporary solution ignores efficiency and traverses every gfn in the KVM memslot.) For each gfn, we:
  - delegate the dst_granule;
  - call smc_data_create;
  - if smc_data_create fails with RMI_ERROR_RTT, create the missing RTT and retry.
(I've omitted the parts related to Qemu, describing only the operations in RMM here)
- We implemented a register import interface to load REC information from QemuFile.
- After the RAM and registers are imported, upon entering smc_rec_enter, the vCPU executes the first instruction pointed to by the PC, which results in an instruction_abort.
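To make the RAM import steps above easier to follow, here is a minimal, self-contained C model of the loop (init RIPAS, delegate, data create, retry after creating the missing RTT). It is only a sketch: every `mock_*` name, the status enum, and the table sizes are hypothetical stand-ins and do not reflect the real RMM or KVM signatures; only the name RMI_ERROR_RTT is taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes; MOCK_ERROR_RTT stands in for RMI_ERROR_RTT. */
enum mock_status { MOCK_SUCCESS = 0, MOCK_ERROR_RTT = 1 };

#define MOCK_PAGES_PER_RTT 512u /* assumed: 4 KiB granules, 512 leaf entries */
#define MOCK_MAX_RTTS      8u   /* arbitrary size of this toy address space */

struct mock_realm {
    bool ripas_ram_ready;             /* set by the init-RIPAS step */
    bool rtt_present[MOCK_MAX_RTTS];  /* leaf table exists for a 512-page block */
    bool mapped[MOCK_MAX_RTTS * MOCK_PAGES_PER_RTT];
    unsigned rtt_creates;             /* how many leaf tables were created */
    unsigned delegations;             /* how many granules were delegated */
};

/* Stand-in for smc_rtt_init_ripas over the whole RAM area. */
static void mock_rtt_init_ripas(struct mock_realm *r) { r->ripas_ram_ready = true; }

/* Stand-in for delegating the destination granule. */
static void mock_granule_delegate(struct mock_realm *r) { r->delegations++; }

/* Stand-in for creating the missing leaf RTT that covers gfn. */
static void mock_rtt_create(struct mock_realm *r, uint64_t gfn)
{
    r->rtt_present[gfn / MOCK_PAGES_PER_RTT] = true;
    r->rtt_creates++;
}

/* Stand-in for smc_data_create: fails while the leaf RTT is missing. */
static enum mock_status mock_data_create(struct mock_realm *r, uint64_t gfn)
{
    if (!r->rtt_present[gfn / MOCK_PAGES_PER_RTT])
        return MOCK_ERROR_RTT;
    r->mapped[gfn] = true;
    return MOCK_SUCCESS;
}

/* Walks every gfn in [first_gfn, first_gfn + npages); returns pages mapped. */
size_t import_ram(struct mock_realm *r, uint64_t first_gfn, size_t npages)
{
    size_t done = 0;

    assert(first_gfn + npages <= (uint64_t)MOCK_MAX_RTTS * MOCK_PAGES_PER_RTT);
    mock_rtt_init_ripas(r);

    for (uint64_t gfn = first_gfn; gfn < first_gfn + npages; gfn++) {
        mock_granule_delegate(r);
        enum mock_status st = mock_data_create(r, gfn);
        if (st == MOCK_ERROR_RTT) {
            /* Create the missing table once, then retry the same gfn. */
            mock_rtt_create(r, gfn);
            st = mock_data_create(r, gfn);
        }
        if (st != MOCK_SUCCESS)
            break;
        done++;
    }
    return done;
}
```

In this model, importing a range that straddles two 512-page blocks creates exactly two leaf tables, one per block, on the first gfn that touches each block.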
Environment:
● Simulation platform is FVP, ShrinkWrap cca-3-world.
● All components (QemuVMM, KVM and RMM) in this cca-3-world environment have new code added, but we have kept the original interfaces unchanged.
● This bug may be impractical to reproduce, so I'll do my best to describe it.
ShrinkWrap Log:

Discussions:
The RMM spec describes the cause of instruction abort as follows:

However, for S2TTEs with a valid IPA, the RIPAS and HIPAS states are not checked; see this comment in issue #21:
"We re-use some bits in pte (namely bits 5 and 6) for storing the RIPAS state when the pte is invalid. When the pte is valid, TF-RMM assumes that RIPAS is always RIPAS_RAM and hence we do not refer to these bits for a valid pte."
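To illustrate what that comment describes, here is a small C sketch of reading RIPAS from a stage 2 entry. It assumes bit 0 is the descriptor valid bit (as in Arm translation table descriptors) and that bits 5 and 6 hold RIPAS only for invalid entries, as quoted. The function names and the numeric RIPAS encoding are hypothetical and are not taken from the TF-RMM source.

```c
#include <assert.h>
#include <stdint.h>

#define S2TTE_VALID_BIT   (UINT64_C(1) << 0)  /* descriptor valid bit */
#define S2TTE_RIPAS_SHIFT 5u                  /* bits 5 and 6, per the quote */
#define S2TTE_RIPAS_MASK  (UINT64_C(3) << S2TTE_RIPAS_SHIFT)

/* Hypothetical numeric encoding, for illustration only. */
enum ripas { RIPAS_EMPTY = 0, RIPAS_RAM = 1 };

/* Builds an invalid entry that carries the given RIPAS in bits 5 and 6. */
uint64_t s2tte_make_invalid_sketch(unsigned int ripas)
{
    assert(ripas <= 3u); /* only two bits are available */
    return (uint64_t)ripas << S2TTE_RIPAS_SHIFT;
}

/* Valid entry: RIPAS is assumed to be RAM and bits 5 and 6 are not consulted.
 * Invalid entry: RIPAS is read back from bits 5 and 6. */
unsigned int s2tte_get_ripas_sketch(uint64_t s2tte)
{
    if (s2tte & S2TTE_VALID_BIT)
        return RIPAS_RAM;
    return (unsigned int)((s2tte & S2TTE_RIPAS_MASK) >> S2TTE_RIPAS_SHIFT);
}
```

The point of the sketch is that, for a valid entry, whatever happens to be stored in bits 5 and 6 has no influence on the reported RIPAS.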
By the way, if we don't populate the Realm VM's memory and only load the REC registers before starting it, the Realm VM enters an endless loop because RMM chooses to handle the instruction_abort itself. With the memory populated, however, RMM forwards the instruction_abort to KVM, and the system panics. Therefore, I guess the memory import is at least partially correct...
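One possible reading of these two observations, sketched below as an assumption rather than taken from the TF-RMM source, is that the routing depends on where the faulting IPA lies and on its RIPAS: as far as I understand the RMM specification, an instruction fetch from an Unprotected IPA, or from a Protected IPA whose RIPAS is EMPTY, is reported back to the Realm as an external abort, while other instruction aborts cause a REC exit to the host. All names in this sketch are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; this models the routing decision only. */
enum ripas_state { RIPAS_STATE_EMPTY, RIPAS_STATE_RAM };
enum abort_route { ROUTE_INJECT_TO_REALM, ROUTE_EXIT_TO_HOST };

enum abort_route route_instruction_abort(bool ipa_is_protected,
                                         enum ripas_state ripas)
{
    assert(ripas == RIPAS_STATE_EMPTY || ripas == RIPAS_STATE_RAM);

    /* Unprotected IPA, or Protected IPA with RIPAS EMPTY:
     * the abort is injected into the Realm and RMM does not exit. */
    if (!ipa_is_protected || ripas == RIPAS_STATE_EMPTY)
        return ROUTE_INJECT_TO_REALM;

    /* Protected IPA with RIPAS RAM: the host is expected to resolve it. */
    return ROUTE_EXIT_TO_HOST;
}
```

Under this model, the endless loop without populated memory would correspond to the first branch, and the abort forwarded to KVM would correspond to the second, which would mean the fault is still raised by an invalid or missing stage 2 mapping rather than by a RIPAS check.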
The logic for handling inst_abort in the relevant code:

Conclusion:
We are not concerned about privacy and performance at this stage; we only wish to verify whether the VM can restart successfully on the dst platform after populating all the plaintext-exported guest pages back to their original IPAs.
Our questions can be summarized into two:
- Since RIPAS is not checked, why and how does the CCA hardware trigger the instruction_abort?
- Did we mess up anything in the import of Realm pages and REC registers?
We sincerely appreciate your ongoing assistance. If you need more information or have any suggestions, please let me know.