[syslinux] PXELinux sporadic hangs searching for config file

Joe Mroczek mr.joem at gmail.com
Sat Jan 28 19:24:59 PST 2006


Very sporadically (1 out of every 40-5000 boots), our blade system will
hang indefinitely while PXE Linux is attempting to locate its configuration
file. This is causing our automated testing to hang and generate failures. I
have put more details below. My concerns are threefold:

1) System hangs
2) PXE Linux bootstrap never seems to retry the transaction
3) PXE Linux bootstrap never seems to reboot the system

Versions of PXE Linux Used:
3.11, 2.04

PXE Agent
Intel Boot Agent for e1000 NICs 1.2.14, 1.2.16 (latest)

NICs
2x e1000 (82544ei integrated into board)

CPUs
2x LV Xeon @ 1.6 GHz or 2.0 GHz

Chipset
LV E7501

Log from PXE Server:
Jan 24 16:34:34 ssh-pad dhcpd: DHCPREQUEST for 192.168.77.254 (192.168.77.77) from 00:0e:0c:52:d0:8b via eth1
Jan 24 16:34:34 ssh-pad dhcpd: DHCPACK on 192.168.77.254 to 00:0e:0c:52:d0:8b via eth1
Jan 25 00:34:34 ssh-pad in.tftpd[7324]: RRQ from 192.168.77.254 filename pxelinux.0
Jan 25 00:34:34 ssh-pad in.tftpd[7324]: tftp: client does not accept options
Jan 25 00:34:34 ssh-pad in.tftpd[7325]: RRQ from 192.168.77.254 filename pxelinux.0
Jan 25 00:34:35 ssh-pad in.tftpd[7326]: RRQ from 192.168.77.254 filename pxelinux.cfg/01-00-0e-0c-52-d0-8b
Jan 25 00:34:35 ssh-pad in.tftpd[7326]: sending NAK (1, File not found) to 192.168.77.254
Jan 25 00:34:35 ssh-pad in.tftpd[7327]: RRQ from 192.168.77.254 filename pxelinux.cfg/C0A84DFE
Jan 25 00:34:35 ssh-pad in.tftpd[7327]: sending NAK (1, File not found) to 192.168.77.254
Jan 25 00:34:35 ssh-pad in.tftpd[7328]: RRQ from 192.168.77.254 filename pxelinux.cfg/C0A84DF
Jan 25 00:34:35 ssh-pad in.tftpd[7328]: sending NAK (1, File not found) to 192.168.77.254
Jan 25 00:34:35 ssh-pad in.tftpd[7329]: RRQ from 192.168.77.254 filename pxelinux.cfg/C0A84D
Jan 25 00:34:35 ssh-pad in.tftpd[7329]: sending NAK (1, File not found) to 192.168.77.254
Jan 25 00:34:35 ssh-pad in.tftpd[7330]: RRQ from 192.168.77.254 filename pxelinux.cfg/C0A84
Jan 25 00:34:35 ssh-pad in.tftpd[7330]: sending NAK (1, File not found) to 192.168.77.254
Jan 25 00:34:35 ssh-pad in.tftpd[7331]: RRQ from 192.168.77.254 filename pxelinux.cfg/C0A8
Jan 25 00:34:35 ssh-pad in.tftpd[7331]: sending NAK (1, File not found) to 192.168.77.254
Jan 25 00:34:35 ssh-pad in.tftpd[7332]: RRQ from 192.168.77.254 filename pxelinux.cfg/C0A
Jan 25 00:34:35 ssh-pad in.tftpd[7332]: sending NAK (1, File not found) to 192.168.77.254
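The requests in the log follow PXELINUX's config file search order. As a rough sketch, assuming the documented order (hardware type 01 plus the MAC address, then the client IP as uppercase hex shortened one digit at a time, then `default`), the candidate names can be generated like this. The function name is hypothetical and is not part of PXELINUX:

```python
def pxelinux_config_candidates(mac: str, ip: str) -> list[str]:
    """Hypothetical helper: list the config file names PXELINUX would try.

    Assumes an Ethernet client (ARP hardware type 1, hence the "01-" prefix)
    and the search order documented for PXELINUX.
    """
    # 1. Hardware type + MAC address, lowercase hex separated by dashes.
    names = ["pxelinux.cfg/01-" + mac.lower().replace(":", "-")]
    # 2. Client IPv4 address as eight uppercase hex digits...
    hex_ip = "".join(f"{int(octet):02X}" for octet in ip.split("."))
    # ...then the same string with one trailing digit dropped per attempt.
    for length in range(len(hex_ip), 0, -1):
        names.append("pxelinux.cfg/" + hex_ip[:length])
    # 3. Finally the catch-all file.
    names.append("pxelinux.cfg/default")
    return names


for name in pxelinux_config_candidates("00:0e:0c:52:d0:8b", "192.168.77.254"):
    print(name)
```

For this client (192.168.77.254 is C0A84DFE in hex) the list matches the log up to `pxelinux.cfg/C0A`; if the assumed order holds, the next request would have been `pxelinux.cfg/C0`, which never shows up in the server log.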

Ethereal shows the last NAK going across the wire. We had Intel come in and
look for issues within their boot agent, and they were able to trace the NAK
all the way up to pxelinux. Furthermore, they could not find any sign of
corruption of the PXE structure, and the RX and TX routines in the boot agent
were still running. For some reason pxelinux was still calling the RX routine
as if it were still expecting something.
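For reference when reading the Ethereal capture, a "NAK (1, File not found)" is a TFTP ERROR packet as defined in RFC 1350: opcode 5, a 16-bit error code, a NUL-terminated message. A minimal sketch that builds an RRQ and an ERROR packet and decodes the latter, assuming octet mode and ignoring RFC 2347 options (the helper names are illustrative only):

```python
import struct


def build_rrq(filename: str, mode: str = "octet") -> bytes:
    """RRQ: opcode 1, filename, NUL, transfer mode, NUL (RFC 1350)."""
    return (struct.pack("!H", 1) + filename.encode("ascii") + b"\x00"
            + mode.encode("ascii") + b"\x00")


def build_error(code: int, message: str) -> bytes:
    """ERROR: opcode 5, 16-bit error code, message, NUL (RFC 1350)."""
    return struct.pack("!HH", 5, code) + message.encode("ascii") + b"\x00"


def parse_error(packet: bytes) -> tuple[int, str]:
    """Decode an ERROR packet into (error_code, message); reject anything else."""
    opcode, code = struct.unpack("!HH", packet[:4])
    if opcode != 5:
        raise ValueError(f"not a TFTP ERROR packet (opcode {opcode})")
    message = packet[4:].split(b"\x00", 1)[0].decode("ascii")
    return code, message


nak = build_error(1, "File not found")
print(nak.hex(" "))       # the bytes that should appear in the capture
print(parse_error(nak))
```

Comparing the last captured NAK byte-for-byte against such a packet is one way to rule out a malformed reply before suspecting the client.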

Any clues on how to either debug the root issue, or at least get PXELinux to
retry the failed transaction?
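To make the second question concrete: the behaviour being asked for is a bounded retransmit loop, where the client resends the request when no reply arrives in time and gives up after a fixed number of attempts so the caller can fall back (next file name, or reboot). This is a hypothetical sketch of that idea with the network I/O passed in as callables; it is not PXELINUX's actual code, and the timeout and attempt counts are arbitrary assumptions:

```python
from typing import Callable, Optional


def request_with_retries(send: Callable[[], None],
                         recv: Callable[[float], Optional[bytes]],
                         timeout: float = 1.0,
                         max_attempts: int = 5) -> Optional[bytes]:
    """Send a request and wait for a reply, retransmitting on timeout.

    `recv(t)` must return a packet, or None if nothing arrived within t
    seconds. Returns the reply, or None once max_attempts are exhausted.
    """
    for attempt in range(max_attempts):
        send()
        # Double the wait on each retry (simple exponential backoff).
        reply = recv(timeout * (2 ** attempt))
        if reply is not None:
            return reply
    return None  # caller decides: try the next file name, or reboot
```

The key property is that every wait is bounded, so a lost or ignored reply costs at most a few seconds instead of an indefinite hang.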

Regards,

Joe M.

Can you provide any details on how


