Live Migration: Instruction_Abort when executing restored VM

Recently, while developing Realm VM live migration, we encountered an `instruction_abort` issue.

## The specific scenario is as follows...  
When importing the Realm VM on the destination platform:
1. We first used `smc_rtt_init_ripas` to set the entire RAM area of the Realm VM to unassigned RAM.
2. Then we established the IPA mapping using the `smc_data_create` interface. (Our temporary solution does not consider efficiency; we traverse all gfns in the KVM memslot.) For each gfn:
   - `delegate` the dst_granule.
   - call `smc_data_create`.
   - if `smc_data_create` fails with `RMI_ERROR_RTT`, create the missing RTT and retry.

   (I've omitted the parts related to QEMU and describe only the operations in RMM here.)
3. We implemented a register import interface to load REC information from `QEMUFile`.
4. After the RAM and registers are imported, upon entering `smc_rec_enter`, the vCPU executes the first instruction pointed to by the PC, which results in an `instruction_abort`.
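
The per-page loop in step 2 can be sketched roughly as below. This is only an illustration of the control flow (delegate, create data, create the missing RTT on `RMI_ERROR_RTT`, retry), not our actual patch: the `mock_*` functions, `struct mock_realm`, and the single-level RTT model are hypothetical stand-ins for the real RMI calls, and the status values are simplified (real RMI return codes also carry an RTT level).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified status codes for this sketch only. */
enum rmi_status { RMI_SUCCESS = 0, RMI_ERROR_INPUT = 1, RMI_ERROR_RTT = 4 };

#define PAGE_SHIFT          12
#define PAGES_PER_L3_TABLE  512
#define MAX_L3_TABLES       8

/* Hypothetical model: only tracks which level-3 RTTs exist. */
struct mock_realm {
    bool l3_present[MAX_L3_TABLES];
    unsigned int data_created;
    unsigned int rtts_created;
};

/* Stand-in for granule delegation; always succeeds in this model. */
static enum rmi_status mock_granule_delegate(uint64_t pa)
{
    (void)pa;
    return RMI_SUCCESS;
}

/* Stand-in for data creation: fails with RMI_ERROR_RTT if the L3 table is missing. */
static enum rmi_status mock_data_create(struct mock_realm *r, uint64_t ipa)
{
    size_t idx = (size_t)((ipa >> PAGE_SHIFT) / PAGES_PER_L3_TABLE);

    if (idx >= MAX_L3_TABLES)
        return RMI_ERROR_INPUT;
    if (!r->l3_present[idx])
        return RMI_ERROR_RTT;
    r->data_created++;
    return RMI_SUCCESS;
}

/* Stand-in for RTT creation: marks the covering L3 table as present. */
static enum rmi_status mock_rtt_create(struct mock_realm *r, uint64_t ipa)
{
    size_t idx = (size_t)((ipa >> PAGE_SHIFT) / PAGES_PER_L3_TABLE);

    if (idx >= MAX_L3_TABLES)
        return RMI_ERROR_INPUT;
    r->l3_present[idx] = true;
    r->rtts_created++;
    return RMI_SUCCESS;
}

/* Import one page; a real implementation would also undelegate on failure. */
static enum rmi_status import_page(struct mock_realm *r, uint64_t ipa, uint64_t dst_pa)
{
    enum rmi_status ret = mock_granule_delegate(dst_pa);

    if (ret != RMI_SUCCESS)
        return ret;

    ret = mock_data_create(r, ipa);
    if (ret == RMI_ERROR_RTT) {
        /* Missing table: create it, then retry the data creation once. */
        ret = mock_rtt_create(r, ipa);
        if (ret != RMI_SUCCESS)
            return ret;
        ret = mock_data_create(r, ipa);
    }
    return ret;
}

/* Walk every page of a range, like traversing all gfns of a memslot. */
static enum rmi_status import_range(struct mock_realm *r, uint64_t base_ipa, size_t npages)
{
    assert((base_ipa & ((1ULL << PAGE_SHIFT) - 1)) == 0); /* granule aligned */

    for (size_t i = 0; i < npages; i++) {
        uint64_t ipa = base_ipa + ((uint64_t)i << PAGE_SHIFT);
        /* The sketch reuses the IPA as the destination PA. */
        enum rmi_status ret = import_page(r, ipa, ipa);

        if (ret != RMI_SUCCESS)
            return ret;
    }
    return RMI_SUCCESS;
}
```

Calling `import_range(&realm, 0, 600)` on a zero-initialized `struct mock_realm` walks 600 pages and creates a table only when the first page of each 512-page region reports `RMI_ERROR_RTT`.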

## Environment:
- The simulation platform is FVP, using ShrinkWrap cca-3-world.
- All components (QemuVMM, KVM and RMM) in this cca-3-world environment have new code added, but we have kept the original interfaces unchanged.
- This bug may be impractical for others to reproduce, so I'll do my best to describe it.

## ShrinkWrap Log:
![image](https://github.com/TF-RMM/tf-rmm/assets/60060864/37c9d0cf-cb00-4631-bd08-c5350d559022)

## Discussions: 

The RMM spec describes the cause of instruction abort as follows:
![image](https://github.com/TF-RMM/tf-rmm/assets/60060864/fb531fe3-0d10-46b6-8ecb-de512bed02c0)

However, for a valid S2TTE, the RIPAS and HIPAS states are not checked; see this comment in issue #21:
> We re-use some bits in pte (namely bits 5 and 6) for storing the RIPAS state when the pte is invalid. When the pte is valid, TF-RMM assumes that RIPAS is always RIPAS_RAM and hence we do not refer to these bits for a valid pte.
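
To make the quoted behaviour concrete, here is a minimal sketch of how we read it. Only the use of bits 5 and 6 for an invalid pte comes from the quote; the macro and function names, the use of bit 0 as the valid bit, and the numeric RIPAS encodings are our assumptions for illustration and may not match the TF-RMM source.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encodings, for illustration only. */
enum ripas { RIPAS_EMPTY = 0, RIPAS_RAM = 1, RIPAS_DESTROYED = 2 };

#define S2TTE_VALID        (1ULL << 0)               /* assumed valid bit */
#define S2TTE_RIPAS_SHIFT  5                          /* bits 5 and 6, per the quote */
#define S2TTE_RIPAS_MASK   (3ULL << S2TTE_RIPAS_SHIFT)

/* Build an invalid entry that records the given RIPAS in bits 5 and 6. */
static uint64_t s2tte_make_invalid(enum ripas ripas)
{
    assert((unsigned int)ripas <= 3u); /* must fit in two bits */
    return ((uint64_t)ripas << S2TTE_RIPAS_SHIFT) & S2TTE_RIPAS_MASK;
}

/* Read RIPAS: a valid entry is assumed to be RIPAS_RAM, so its bits are ignored. */
static enum ripas s2tte_ripas(uint64_t s2tte)
{
    if (s2tte & S2TTE_VALID)
        return RIPAS_RAM;
    return (enum ripas)((s2tte & S2TTE_RIPAS_MASK) >> S2TTE_RIPAS_SHIFT);
}
```

Under this reading, whatever sits in bits 5 and 6 of a valid entry has no effect on the reported RIPAS, which is why we concluded that RIPAS is not checked once `smc_data_create` has installed a valid mapping.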

By the way, if we don't populate the Realm VM's memory, only load the REC registers, and start running, the Realm VM enters an endless loop because RMM chooses to handle the `instruction_abort` itself. However, with the memory populated, RMM forwards the `instruction_abort` to KVM, and the system panics. Therefore, I guess the memory import is at least partially correct...
> The logic for handling the instruction abort in the relevant code:
>  ![image](https://github.com/TF-RMM/tf-rmm/assets/60060864/bf1605db-bc3e-475b-abe1-fd3398c5ec99)
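
For readers who cannot see the screenshot, the routing decision as we understand it can be modelled like this. This is a sketch of our reading, not a copy of the pictured code: `struct fault_info`, `route_instruction_abort`, and the two predicates are hypothetical names, and the rule itself (inject the abort back into the Realm for an unprotected IPA or for RIPAS EMPTY, otherwise exit to the host) is an assumption that should be checked against the RMM spec and the TF-RMM source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum abort_action {
    INJECT_SEA_TO_REALM, /* RMM handles the abort itself and re-enters the Realm */
    EXIT_TO_HOST         /* RMM forwards the abort to KVM through a REC exit */
};

/* Hypothetical summary of the faulting address, filled in by the caller. */
struct fault_info {
    bool ipa_protected; /* fault IPA lies in the Realm's protected range */
    bool ripas_empty;   /* RIPAS of the fault IPA is EMPTY */
};

/* Assumed rule: only a protected, non-EMPTY IPA is reported to the host. */
static enum abort_action route_instruction_abort(const struct fault_info *f)
{
    assert(f != NULL);

    if (!f->ipa_protected || f->ripas_empty)
        return INJECT_SEA_TO_REALM;
    return EXIT_TO_HOST;
}
```

If this model is right, the endless loop without populated memory would correspond to the first branch, and the forwarded abort with populated memory to the second.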

## Conclusion:
We are not concerned about privacy and performance at this stage; we only wish to verify whether the VM can restart successfully on the dst platform after populating all the plaintext-exported guest pages back to their original IPAs.

Our questions can be summarized as two:
1. Since RIPAS is not checked, why and how does the CCA hardware trigger `instruction_abort`?
2. Did we mess up anything in the import of Realm pages and REC registers?


We sincerely appreciate your ongoing assistance. If you need more information or have any suggestions, please let me know.
