[syslinux] Strange behavior

Gene Cumm gene.cumm at gmail.com
Thu Nov 17 14:24:29 PST 2011


I'm sorry.  I think I've gotten down a path that leaves something to be
desired.  How about a fresh start?  I'm going to try to summarize your
environment and situation. Corrections are greatly appreciated.

You're using VMware vCenter 4.1.0 (unknown update/build level) to control a
VMware ESX/ESXi host (of an unknown version/update/build level). On the
host, you have a VM (of unknown VMHW version and current and initial Guest
OS hint) with a vNIC (of unknown type) connected to a port group on a
vSwitch (classic or distributed) performing a PXE boot (unknown dhcpd,
tftpd).

The dhcpd is configured to hand out a tftpd server (via the sname field or
option 66) and a boot file (via the file field or option 67), namely
gpxelinux.0 from Syslinux-4.04. You've additionally set option 209 to
"menu.pl" and option 210 to "http://10.250.50.72/tftpboot/gpxe/". In the
returned config, you have a SAY directive (which is echoed during config
parsing) and a line (presumably placed correctly) with "UI menu.c32". In
your Apache httpd logs for 10.250.50.72, you never/rarely see the request
for menu.c32 from the VM but you do see at least some of the requests for
the fallback extensions. It takes about 30 seconds for PXELINUX to fail
fetching menu.c32*.
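
To confirm which of those requests actually reach the server, the access log
can be filtered by client address and file name. The following is only a
minimal sketch: it assumes the log is in the Apache common/combined format,
the client address 10.250.50.100 is a hypothetical stand-in for the booting
machine, and the two sample lines are invented for illustration rather than
taken from your logs.

```python
import re

# Leading fields of an Apache common/combined log line:
# client, identd, user, [timestamp], "METHOD path protocol", status
LOG_RE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)


def requests_for(lines, client_ip, name):
    """Return (path, status) for each request from client_ip whose
    file name (last path component) starts with name."""
    hits = []
    for line in lines:
        m = LOG_RE.match(line)
        if not m or m.group("client") != client_ip:
            continue
        path = m.group("path")
        if path.rsplit("/", 1)[-1].startswith(name):
            hits.append((path, int(m.group("status"))))
    return hits


# Invented sample lines; 10.250.50.100 is a hypothetical client address.
SAMPLE = [
    '10.250.50.100 - - [17/Nov/2011:14:00:00 -0800] '
    '"GET /tftpboot/gpxe/menu.pl HTTP/1.0" 200 280',
    '10.250.50.100 - - [17/Nov/2011:14:00:31 -0800] '
    '"GET /tftpboot/gpxe/menu.c32.c32 HTTP/1.0" 404 300',
]

if __name__ == "__main__":
    for path, status in requests_for(SAMPLE, "10.250.50.100", "menu.c32"):
        print(status, path)
```

Against a real log, passing the open file object as `lines` should work the
same way. Seeing only the fallback names and never menu.c32 itself would
match the symptom described above.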

You then used a physical machine to test this PXE boot as well and noticed
similar behavior. You note that creating a symlink from menu.c32.c32 to
menu.c32 ensured that the menu always loaded properly.

I just tested a similar scenario using atftpd on the same subnet as the
target VM, ESXi 4.1.0u2-502767, a fresh VM, guest OS hint RHEL-6 32-bit
(which uses VMHWv7 and VMXNET3), gpxelinux.0 and menu.c32 from
Syslinux-4.04, lighttpd on the same OS instance/interface as atftpd, and a
static config file of 280 bytes with "UI menu.c32" as the first line and
"SAY filename" as the last. In this scenario, I notice virtually zero delay
between the SAY and the menu. I do notice some delay between it attempting
to load the config and reporting that it loaded OK, but this may be an
artifact of a run in which I had dropped the .c32 from the name, or of the
console refresh not being instantaneous.

Based on this summary and tests of my own, I suspect, as I did before, that
there's a communication issue, likely outside of gPXE/PXELINUX. I'd advise
starting with a packet capture on the machine with the Apache httpd and
from another VM on the same host/port group with the security of the port
group set to allow promiscuous mode (which should allow you to capture all
packets in and out of the VM performing the PXE boot, along with all the
other VMs in the port group unfortunately). If you have experience/interest
in packet captures, examining these two captures should shed some light on
what the issue may be.  Of course, I would classify these captures as
private as they may include personal info, especially if not filtered
properly.

Of all of the unknown details above, the ones most likely to be relevant
are the type of vSwitch and the effective security controls of the port
group. However, if my assumptions about your physical machine are correct,
these might not be the source of the issue; rather, the VM may merely
exacerbate the root cause.  I'd lean towards security-related settings such
as firewalls and the security options on your Apache httpd.
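
As a quick complement to the captures, the same URLs can be fetched and timed
from an ordinary machine on the same subnet. This is only a rough sketch: it
assumes the file names are simply appended to the option 210 prefix, the
helper names build_url and timed_fetch are made up for this example, and a
regular HTTP client will not necessarily behave the way gPXE does, so a clean
result here does not rule out a problem on the PXE path.

```python
import time
import urllib.error
import urllib.request


def build_url(prefix, name):
    # Assumption: a relative file name is appended to the option 210 prefix.
    return prefix + name


def timed_fetch(url, timeout=10.0):
    """Return (status, byte_count, seconds).
    status is None if no HTTP response arrived at all."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read()
            return resp.status, len(body), time.monotonic() - start
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 403 or 404.
        return err.code, 0, time.monotonic() - start
    except OSError:
        # Connection refused, timeout, unreachable host, and similar.
        return None, 0, time.monotonic() - start


if __name__ == "__main__":
    prefix = "http://10.250.50.72/tftpboot/gpxe/"
    for name in ("menu.pl", "menu.c32"):
        print(name, timed_fetch(build_url(prefix, name)))
```

A 403 or a long delay for menu.c32 while menu.pl returns promptly would
point towards the httpd configuration or a firewall rather than the vSwitch.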

-- 
-Gene


