[syslinux] EFI & PXE-booting: very slow TFTP performance on a VMWare test setup
Oscar Roozen
oscar.roozen at brightcomputing.com
Wed Jun 17 08:24:28 PDT 2015
Dear people on the Syslinux Mailinglist,
Are there any known problems with the performance of TFTP in (U)EFI
environments in general or maybe just on VMWare?
Right now I'm running tests on two virtual environments using either
VMWare Workstation 11.1.0 on a Fedora 21 system or VMWare Player 7.1.0
on an Ubuntu 14.10 system. Both virtual systems support PXE booting in
UEFI mode. Both systems behave exactly the same.
The test-setup is as follows:
Master:
2 CPU, 2GB RAM,
Network Adapter "Host-only"
Centos 7.1
* syslinux-6.03
* isc-dhcpd-V3.0.5-RedHat
* atftp-0.7.1
Slaves:
2 CPU, 2GB RAM,
Network Adapter "Host-only"
Configured to boot from the network only
The Master is configured to provide a pxelinux.0 and supporting stuff
through tftp. The client boots, gets a DHCP answer, loads pxelinux.0,
presents a menu through menu.c32, defaults to loading the kernel and an
initrd and starts booting from there. No problems here in BIOS mode. The
whole process takes about less than 8 seconds, measured from the moment
I hit enter in the menu.
One alternate choice in our menu is loading a rescue-environment that
has a much bigger initrd (87M instead of 13M) which loads in 12 seconds.
Now I'm testing the same setup in UEFI mode. The client boots, gets an
answer from the DHCPd, loads efi64/syslinux.efi and the supporting stuff
from the efi64-subtree, presents a menu, starts loading the kernel and
initrd and continues booting from there. Sounds good? Yes, but instead
of 8 seconds, this all takes at least a few minutes! And quite often the
system just hangs before even presenting the menu or presents an empty
menu, indicating it failed to load one of the .c32 files or menu
definitions. No errors are shown. Sometimes VMWare pop-ups appear
indicating the machine crashed on invalid code or could not load an OS.
If the boot process finishes successfully, the kernel and initrd go on
into our custom installer and after the slave is up and running, its
network performance (actually, all performance) is equal to the slave
that was booted in BIOS mode.
It turns out to be impossible to load the rescue environment though.
Loading the initrd of 87M takes longer than 15 minutes. After 15 minutes
the VMWare workstation (or is it syslinux?) decides to reboot.
By the way, This is no arp issue or something. I did check to see if
both slaves have different MAC addresses and I think I covered most
other obvious causes. The only obvious thing I have not yet tried is to
run the setup on real machines or another Virtual Environment.
(Actually, I tried VirtualBox first but could not get it to PXE boot in
UEFI mode)
So, the simple question is: Is anyone else experiencing slow performance
while network-booting in UEFI mode? Is it caused by the UEFI API
implementation of VMWare, is there an issue in gnu-efi or is it even the
way syslinux interacts with the EFI-layer that causes this?
This is the question I've been asking myself for the past few days.
I've been debugging the syslinux code by inspection and adding debug
statements everywere. I also removed "git submodule update --init"
from efi/build-gnu-efi.sh so I could rebuild syslinux locally using:
# make spotless
# make bios
# make efi64
What I've found so far:
/* TCPdump/Wireshark analysis - problem 1 */
Running tcpdump from the host on /dev/vmnet1 I found the following.
The file "syslinux.efi" loads from a client that does not understand the
additional options. This is probably the native UEFI firmware in action.
Also, this step never fails. The next file will always at least start to
load, even when the slave decides to crash before (or after?) the next
file finnishes downloading. Even ldlinux can fail to load.
On bigger files, there is an increasing chance that core_udp_recv in
efi/udp.c hits the 15ms (or is it 15 undefined jiffies?) timeout and
returns -1 to its caller in core/fs/pxe/tfpt.c:tftp_get_packet().
This function then goes to ack_again, causing atfptd to resend the packet
that just timed out. Then on the next call the original(!) packet is
received anyway and handled according to plan.
But the next packet to be received will be the same packet again,
causing tftp.c to hit this code:
if (serial != last_pkt) {
/*
* Wrong packet, ACK the packet and try again.
* This is presumably because the ACK got lost,
* so the server just resent the previous packet.
*/
< #if 0
< printf("Wrong packet, wanted %04x, got %04x\n", \
< htons(last_pkt), htons(*(uint16_t *)(data+2)));
---
> #if 1
> printf("Wrong packet, wanted %04x, got %04x\n", \
> last_pkt, serial);
#endif
goto ack_again;
}
So, what happens now is that the next packet will also be sent twice.
>From now on, every packet will be sent twice! Until another one hits the
timeout. From then on each packet will be sent three times.
Unfortunately, this is not *the* problem. It accounts for some delay but
this timeout only happens a few times. Typically 1 to 3 times during a
download. It should still load the kernel and initrd in less than 24s,
not in over 5 minutes. So, there's something else going on, but what?
I tried to figure out why a UDP packet on a virtual network with
unlimited bandwidth and zero packet-loss would hit a 15ms timeout. The
comment says it waits for 15ms, but it does this by counting volatile
uint32_t __jiffies, which gets incremented from efi/main.c by an event
initiated from the UEFI API. According to the comment near the #define
DEFAULT_TIMER_TICK_DURATION, the jiffies should increment every 50 msec.
So a jiffie is not 1 msec but 50 times longer. If this comment is
correct, the actual timeout isn't 15ms but 750ms. Or is it 750us because
a few zero's are missing? I don't have the UEFI spec so I can't verify.
What I did next was just increase the timeout in efi/udp.c:core_udp_recv
to 60 jiffies. This made the double-ACK problem disappear completely.
>From now on, every packet is received within the timeout. All calls to
core_udp_recv return 0. But still, the booting process sometimes crashes
and loading the kernel and initrd takes more than a few minutes. I have
yet to successfully boot the rescue environment through TFTP, because it
still takes more than 15 minutes to load.
This leads me to the conclusion that this double-ACK problem is not the
cause of the bad performance, so I had to investigate further.
/* TCPdump/Wireshark analysis - problem 2 */
After configuring wireshark to display the difference in time between
each DATA and ACK packet, I found another weirdness. Obviously, there is
a delay between sending an ACK after receiving a DATA packet on the
client side and between receiving an ACK and sending the next DATA
packet. To illustrate:
Client:
* receives a DATA pkt
(delay1)
* sends an ACK pkt
(wait...)
Server:
* receives an ACK pkt
(delay2)
* sends the next DATA pkt
(wait...)
Delay1 and delay2 are caused by the time it takes to evaluate the packet
and to compute what packet to send next. No fancy stuff, just some easy
calculations and memory transactions. So they both should be very short
and stay constant. But they don't.
The server does just fine. It typically reacts within 0.000100 seconds.
The client also starts out fine, reacting within about 0.000250 seconds,
but this delay slowly increases.
Look at the beginning of the download of vmlinuz:
No. Time Source Destination Protocol Length Info
15002 0.274814 10.4.0.1 10.4.255.254 TFTP
105 Read Request, File: efi64/images/default-image/initrd, Transfer
type: octet, tsize\000=0\000, blksize\000=1408\000
15003 0.001082 10.4.255.254 10.4.0.1 TFTP
72 Option Acknowledgement, tsize\000=13442875\000,
blksize\000=1408\000
15004 0.003735 10.4.0.1 10.4.255.254 TFTP
60 Acknowledgement, Block: 0
15005 0.000081 10.4.255.254 10.4.0.1 TFTP
1454 Data Packet, Block: 1
15006 0.000236 10.4.0.1 10.4.255.254 TFTP
60 Acknowledgement, Block: 1
15007 0.000076 10.4.255.254 10.4.0.1 TFTP
1454 Data Packet, Block: 2
15008 0.000220 10.4.0.1 10.4.255.254 TFTP
60 Acknowledgement, Block: 2
15009 0.000051 10.4.255.254 10.4.0.1 TFTP
1454 Data Packet, Block: 3
15010 0.000244 10.4.0.1 10.4.255.254 TFTP
60 Acknowledgement, Block: 3
15011 0.000038 10.4.255.254 10.4.0.1 TFTP
1454 Data Packet, Block: 4
The time displayed is the time since the previous displayed packet. To
after receving packet 15005, the client needs 0.000236 seconds to send
the ACK for block 1 in packet 15006. The server then takes 0.000076
seconds to send block 2. And so on...
Now fast forward a few thousand packets and observe the delays:
51117 0.000115 10.141.255.254 10.141.0.1 TFTP
1454 Data Packet, Block: 9120
51118 0.023451 10.141.0.1 10.141.255.254 TFTP
60 Acknowledgement, Block: 9120
51119 0.000098 10.141.255.254 10.141.0.1 TFTP
1454 Data Packet, Block: 9121
51120 0.017573 10.141.0.1 10.141.255.254 TFTP
60 Acknowledgement, Block: 9120
51121 0.000088 10.141.255.254 10.141.0.1 TFTP
1454 Data Packet, Block: 9121
51122 0.011682 10.141.0.1 10.141.255.254 TFTP
60 Acknowledgement, Block: 9121
51123 0.000089 10.141.255.254 10.141.0.1 TFTP
1454 Data Packet, Block: 9122
51124 0.017590 10.141.0.1 10.141.255.254 TFTP
60 Acknowledgement, Block: 9121
51125 0.000088 10.141.255.254 10.141.0.1 TFTP
1454 Data Packet, Block: 9122
51126 0.017732 10.141.0.1 10.141.255.254 TFTP
60 Acknowledgement, Block: 9122
51127 0.000114 10.141.255.254 10.141.0.1 TFTP
1454 Data Packet, Block: 9123
51128 0.011833 10.141.0.1 10.141.255.254 TFTP
60 Acknowledgement, Block: 9122
51129 0.000097 10.141.255.254 10.141.0.1 TFTP
1454 Data Packet, Block: 9123
51130 0.017628 10.141.0.1 10.141.255.254 TFTP
60 Acknowledgement, Block: 9123
51131 0.000088 10.141.255.254 10.141.0.1 TFTP
1454 Data Packet, Block: 9124
51132 0.017785 10.141.0.1 10.141.255.254 TFTP
60 Acknowledgement, Block: 9123
51133 0.000090 10.141.255.254 10.141.0.1 TFTP
1454 Data Packet, Block: 9124
51134 0.017949 10.141.0.1 10.141.255.254 TFTP
60 Acknowledgement, Block: 9124
51135 0.000090 10.141.255.254 10.141.0.1 TFTP
1454 Data Packet, Block: 9125
51136 0.030592 10.141.0.1 10.141.255.254 TFTP
60 Acknowledgement, Block: 9124
51137 0.000097 10.141.255.254 10.141.0.1 TFTP
1454 Data Packet, Block: 9125
51138 0.037680 10.141.0.1 10.141.255.254 TFTP
60 Acknowledgement, Block: 9125
See how each Acknowledgement now takes 100 times as long to send? We're
not in 0.000x territory anymore, we're down to a handful of packets per
second.
On the next transfer, the delays are back to normal again, just to start
increasing and increasing, again slowing down to a crawl.
I've been ploughing through the code for quite some time now, but I
cannot explain why this happens. The main obstacle for me is that I
barely understand how this glue-code to the UEFI API actually works and
that I don't have a copy of the spec.
So, is there anybody on this list willing to give it a shot?
--
Oscar Roozen
Linux Developer
Bright Computing BV
More information about the Syslinux
mailing list