instances with read-only boot disk fail to boot #1059

@jordanhendricks

Read-only disks were integrated as a control plane feature in oxidecomputer/omicron#9731. Today in dogfood I attempted to boot a variant of a Debian image we've been using to test RFD 605 work. The instance failed to boot, as did subsequent attempts to boot an instance with a read-only (RO) disk. Each one dropped into the UEFI shell, e.g.:

UEFI Interactive Shell v2.2
EDK II
UEFI v2.70 (EDK II, 0x00010000)
Mapping table
      FS0: Alias(s):HD0p:;BLK3:
          PciRoot(0x0)/Pci(0x10,0x0)/NVMe(0x1,00-00-00-00-00-00-00-00)/HD(15,GPT,6012ACAE-A51D-C14F-BB91-DBBFB6CF0F77,0x2000,0x3E000)
      FS1: Alias(s):F1:;BLK4:
          PciRoot(0x0)/Pci(0x18,0x0)
     BLK0: Alias(s):
          PciRoot(0x0)/Pci(0x10,0x0)/NVMe(0x1,00-00-00-00-00-00-00-00)
     BLK1: Alias(s):
          PciRoot(0x0)/Pci(0x10,0x0)/NVMe(0x1,00-00-00-00-00-00-00-00)/HD(1,GPT,E6842644-BB59-324A-BFF6-F7D01B8B2CB2,0x40000,0x3BFFDF)
     BLK2: Alias(s):
          PciRoot(0x0)/Pci(0x10,0x0)/NVMe(0x1,00-00-00-00-00-00-00-00)/HD(14,GPT,26E893FC-03C9-6049-B821-276C1F0B03A4,0x800,0x1800)

@hawkw, @jmpesp, and I spent some time looking at this today. A collection of observations:

  • booting from a read-only disk worked about a week ago (Eliza tested with this exact image; the known-good propolis SHA is 2aa7f9d0ee84a1c45e821d6444b1d2f0e69b743e)
  • it is possible to boot from the same image as a normal (read-write) disk
  • James was able to reproduce this failure trivially in the canada region
  • the md5sum of the image, as uploaded to the control plane from James' desktop, matches the md5sum of the disk when it is attached as a read-write disk to a working instance (implying crucible has not seen corruption of the disk)
  • UEFI does not seem to believe there is a file system on the read-only disk (e.g., ls FS0 shows no files), but there is valid data in the blocks on disk (e.g., as seen via dblk blk0 1 in the UEFI shell), and that data seems to match the local image file
  • the presence of a NIC on the instance did not change the behavior (i.e., I tried booting the image as a RO disk both with and without a NIC on the instance, and both attempts failed the same way)
  • this image is bootable as a read-only disk on QEMU
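
For anyone who wants to repeat the integrity checks above without the UEFI shell, here is a minimal Python sketch. It is not what we ran today (we used md5sum and dblk); it assumes 512-byte logical blocks and that `dblk blk0 1` was dumping LBA 1, where the GPT header and its `EFI PART` signature normally live. The function names are made up for illustration.

```python
import hashlib

SECTOR_SIZE = 512            # assumption: 512-byte logical blocks; use 4096 if the disk differs
GPT_SIGNATURE = b"EFI PART"  # signature at the start of the GPT header (LBA 1)


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file or block device, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_gpt_header(path, sector_size=SECTOR_SIZE):
    """Return True if LBA 1 of the image starts with the GPT signature."""
    with open(path, "rb") as f:
        f.seek(sector_size)  # skip LBA 0 (protective MBR)
        return f.read(len(GPT_SIGNATURE)) == GPT_SIGNATURE
```

Calling `md5_of` on both the local image file and the attached disk device, and comparing the results, mirrors the md5sum comparison; `has_gpt_header` on either one is a quick sanity check that the partition table the firmware should be parsing is actually present.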
