Skip to content

Conversation

josephlr
Copy link
Contributor

@josephlr josephlr commented Nov 12, 2019

Depends on #37
Fixes #5

Now you can do the following:

cargo xbuild --target target.json --release --features=rom
sstrip target/target/release/hypervisor-fw
qemu-system-x86_64 \
    -machine type=q35,accel=kvm \
    -cpu host -smp 1 -m 1024 \
    -display none -serial stdio \
    -bios target/target/release/hypervisor-fw \
    -drive id=boot,file=clear-28660-kvm.img,format=raw,if=none \
    -device virtio-blk-pci,drive=boot,disable-legacy=on

And the firmware will go from the real-mode reset vector all the way to Rust.

@josephlr josephlr force-pushed the bios branch 4 times, most recently from ecb4507 to dc97d6b Compare November 13, 2019 10:51
Copy link
Contributor Author

@josephlr josephlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rbradford this code is ready for review (but we may want to merge #23 first). I had some questions (commented inline) that we should resolve before merging this.

src/asm/rom.s Outdated
Comment on lines 79 to 80
reset_vector: # 0xffff_fff0
# This code must be 16 bytes or less, so be careful when adding anyting.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an easy way to check in this file that the reset_vector code is exactly 16 bytes? I had to do objdump to make sure everything was setup correctly.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest placing only the jmp here, and moving the CR0 write after the jump. The interrupt flag is already 0, so CLI is not needed either.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rewrote this to just have jmp rom16 at the reset vector, and then the rom16 code just follows the exact steps in the SDM to enter protected mode.

I kept the CLI, as the SDM says to call it before entering protected mode, but I might be misreading it (or reading the wrong section).

src/asm/ram32.s Outdated
Comment on lines 6 to 8
movw $0x3f8, %dx
movb $'R', %al
outb %al, %dx
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we keep these characters being printed to serial output? I found it useful when debugging/writing the assembly, but it is a little weird to see RPL printed right before the firmware starts up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine for now, but maybe we could look into splitting out output between 0x402 (QEMU firmware I/O debug port) and the serial port. This could be a candidate for that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, it would be nice to have all of the log! statements go to debug output and only have SimpleTextOutputProtocol log to the standard serial port.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed all the serial logs from the assembly. I just have them in a separate branch for local debugging.

@rbradford
Copy link
Member

And the firmware will go from the real-mode reset vector all the way to Rust. Driver bug is preventing boot due to VIRTIO feature negotiation.

Looking good.

The virtio issue is really that the PCI bar address is zero but that zero bar is still the one that the virtio device is still saying to be used. I think understanding why that PCI bar is zero is the next step forward.

Copy link

@bonzini bonzini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest only leaving a jmp at the reset vector.

src/asm/rom.s Outdated
Comment on lines 79 to 80
reset_vector: # 0xffff_fff0
# This code must be 16 bytes or less, so be careful when adding anyting.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest placing only the jmp here, and moving the CR0 write after the jump. The interrupt flag is already 0, so CLI is not needed either.

Copy link
Contributor Author

@josephlr josephlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made some changes to simplify and better document the code.

src/asm/ram32.s Outdated
Comment on lines 6 to 8
movw $0x3f8, %dx
movb $'R', %al
outb %al, %dx
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, it would be nice to have all of the log! statements go to debug output and only have SimpleTextOutputProtocol log to the standard serial port.

src/asm/rom.s Outdated
Comment on lines 79 to 80
reset_vector: # 0xffff_fff0
# This code must be 16 bytes or less, so be careful when adding anyting.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rewrote this to just have jmp rom16 at the reset vector, and then the rom16 code just follows the exact steps in the SDM to enter protected mode.

I kept the CLI, as the SDM says to call it before entering protected mode, but I might be misreading it (or reading the wrong section).

@josephlr
Copy link
Contributor Author

@rbradford @bonzini do the fixes and comments look good? Is there anything else you want me to do here? I could include a high-level doc file explain the multistage boot process (rom16 -> rom32 -> ram32 -> ram64 -> Rust), if you think it's necessary.

@rbradford
Copy link
Member

@josephlr I think what you've got here is great. I do think it's a good idea to boot through to userspace before merging this as there is obviously something missing. We have a good data point now with the fact that PVH mode with Seabios does show up the PCI BAR address.

@josephlr
Copy link
Contributor Author

@josephlr I think what you've got here is great. I do think it's a good idea to boot through to userspace before merging this as there is obviously something missing. We have a good data point now with the fact that PVH mode with Seabios does show up the PCI BAR address.

Seems reasonable, I'll focus on #26, and try to get that booting to user-space first, and then come back to this (once we know why it's broken).

@rbradford
Copy link
Member

@josephlr I was able to dump the registers for QEMU and for CH just before jumping to the kernel. They should be in the same state:

https://gist.github.com/rbradford/7ccc15bc2c55d6423840896a42491ac0
vs
https://gist.github.com/rbradford/b452d5d5b8e3bddb39040996dede9516

I've not done a deep analysis yet but one obvious thing stands out and that is that the CS and CR0 look different.

@josephlr
Copy link
Contributor Author

I've not done a deep analysis yet but one obvious thing stands out and that is that the CS and CR0 look different.

I think the CS segments being different is OK (as most of the bits are ignored), CR0 difference is also interesting. However, manually setting the CR0 and CS bits to match CH didn't seem to help (kernel stopped in the same place).

@rbradford
Copy link
Member

In case you want to reproduce:

Modifying the main.rs just after "Jumping to kernel" to write the 0x80 I/O port. CH has a function that prints out debug messages when that is written to so it wasn't hard to print the KVM registers. For QEMU i modified the firmware to boot infinitely after the same message and used "-serial mon:stdio" and used the monitor (ctrl-a c) to get the registers ("info registers")

@bonzini
Copy link

bonzini commented Nov 15, 2019

Clearing CR0.CD and CR0.NW is a good idea anyway for the firmware, but it's only a matter of tidiness (disabling the cache or writeback doesn't work in VMs).

@josephlr
Copy link
Contributor Author

So setting earlyprintk=serial,keep on the command line is great, you get printing from super early in boot, before Linux even decompresses itself.

So the problem is a kernel panic due to it running out of memory.
Normal Boot with CH
Paniced boot with QEMU

@josephlr
Copy link
Contributor Author

Oh duh, we don't get an e820 map passed in with PVH, so we have to make it ourselves. Right now we're passing in a map with zero memory, and this makes Linux quite unhappy.

@rbradford
Copy link
Member

@josephlr Use the CMOS to get the limits and make one? Although CH currently has a CMOS implementation it might not be sticking around as nothing is currently using it so best use an E820 where available and fall back to CMOS where it's not.

@josephlr
Copy link
Contributor Author

It looks like pvh gives us an e820 map, so we should be able to just use that.

@josephlr
Copy link
Contributor Author

So it looks like regardless of how we run the firmware (QEMU, PVH, CH, Firecracker, etc..) we eventually need to get two pieces of information from the host:

All the other info we get from the host is optional.

We can get the info in the following way:

Boot Method RSDP e820 table
CH/Firecracker acpi_rsdp_addr in boot_params e820_table in boot_params
PVH rsdp_paddr in hvm_start_info memmap_paddr in hvm_start_info
QEMU BIOS etc/e820 fw_cfg file tables passed by QEMU

@rbradford
Copy link
Member

For Q35 rsdp_in_ram is set so I think the RSDP should be findable by scanning the EBDA region for "RSDP" as the spec suggests. For PC however it's only available via QEMU FW_CFG,

As we have a very simple memory model we might be able to get away with hardcoded ranges + the memory output details from the CMOS which might be easier than implementing QEMU FW_CFG.

Our 32-bit GDT just has 1 code and 1 data segment descriptor. They are
both flat (spanning the entire 4G address space) and are placed in
normal static memory (for now).

Signed-off-by: Joe Richey <joerichey@google.com>
The new assembly files handle:
  - reset.s: Jumping from reset
  - rom16.s: Transitioning to 32-bit mode
  - rom32.s: Copying data from ROM to RAM, jumping to PVH entry point

To place this code correctly, we add a new Program Header for the code
and data that expect to be in ROM. See the comments in layout.ld for
more information.

We also place the 32-bit GDT in the ROM. This is mostly for convenience,
as it lets us use the GDT directly from the ROM code without having to
do any complex offset calculations.

As laying out the code for a ROM makes the binary ~45% bigger, we gate
building as a ROM behind an optional feature.

Signed-off-by: Joe Richey <joerichey@google.com>
Without this change, QEMU/CloudHV will attempt to load the ROM into the
memory region below 4GiB. But something (i.e. SeaBIOS) is already there.

We just pick an arbitrary address. It doesn't actually matter where it
gets loaded, as the ROM code isn't used when doing PVH boot.

Signed-off-by: Joe Richey <joerichey@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support building as flat BIOS image

3 participants