Cloudflare Resolves Firmware-Induced Boot Bottleneck in Gen12 Server Fleet
A firmware regression in Cloudflare's Gen12 bare-metal fleet caused boot times to balloon from minutes to hours due to inefficient UEFI network interface polling.

Cloudflare recently identified and resolved a significant performance regression affecting its Gen12 bare-metal server fleet, which serves as the backbone for the company's control plane, billing, and analytics infrastructure. Following a routine firmware update, engineers observed that server reboot times increased dramatically, stretching from a standard duration of a few minutes to nearly four hours. This delay created substantial operational friction, as maintenance windows expanded and automated firmware upgrade cycles—which require multiple sequential reboots—became prohibitively slow.
The root cause was traced to an over-eager linear search process within the UEFI firmware. During the boot sequence, the affected servers were configured to probe multiple network boot interfaces to locate the operating system image. The firmware attempted to initialize IPv4 HTTPS and iPXE boot sequences, each of which would hang and time out before the system eventually fell back to the successful IPv6 HTTPS boot interface. Because each failed attempt incurred a multi-minute timeout penalty, the cumulative effect of these sequential failures resulted in the observed four-hour boot duration.
Cloudflare's infrastructure relies heavily on Preboot Execution Environment (PXE) and iPXE for automated, scalable server provisioning. iPXE, an open-source network boot firmware, is typically used to orchestrate programmable workflows that allow for rapid deployment across globally distributed data centers. The regression effectively broke this automation, forcing engineering teams to manually monitor and manage reboots that were previously handled by unattended scripts.
To address the issue, the engineering team had to bypass the firmware's default "search-and-wait" behavior. By restructuring the boot automation workflow, they were able to explicitly declare the correct network boot interface order during the pre-boot PXE stage. This configuration change ensured that the UEFI firmware would immediately target the correct interface, eliminating the unnecessary timeout cycles that had been stalling the boot process.
The incident highlights the complexities of managing large-scale bare-metal environments, where minor firmware quirks can have cascading effects on operational availability. Cloudflare noted that the fix required navigating vendor-specific string formats and overcoming limitations in existing boot automation sequences. By optimizing the UEFI boot order, the team successfully restored the expected boot performance, ensuring that future firmware updates can be deployed without the risk of extended downtime.