[syslinux] PXE boot problem.

H. Peter Anvin hpa at zytor.com
Mon Jun 16 14:44:31 PDT 2008


Tony Heaton wrote:
> I've searched all the archives and can't find anything relating to my
> problem.  I have a 128-node cluster.  All NICs are Nvidia.  I have them
> on a network with other machines.  I do PXE installs of Fedora on them.
> These 128 nodes have a problem with the install.  The initial DHCP
> requests get answered, the machine accepts and then the TFTP sequence
> takes place.  The second DHCP request is sent and the server sees it and
> answers.  The offer gets to the machine (I've seen it with a tap on the
> wire).  The machine doesn't respond to the offer.  This happens about
> 99.9% or more of the time on all machines.  If I try to set the address
> statically I get the same behavior.  It appears the NIC is up but I
> can't ping it but I occasionally see packets coming from it.  I have
> another network that is very similar.  If I move the nodes to that
> network the install works properly.  Package/DHCP servers are set up
> identically on both networks.  Each network has similar devices but not
> identical.  The main router on the non-working network is an Extreme
> Aspen.  The main router on the working network is a Cisco 6800.  One
> problem I have is that the 128 nodes are all headless so I can't see any
> logging information on the serial port during the install.  Any
> suggestions are appreciated.

There was a bug in the Nvidia PXE stack at some point that would make 
the NIC unavailable not just to the Linux forcedeth driver, but also to 
the Windows driver.  There seems to have been an update to the driver to 
reset the appropriate state, because I haven't been able to reproduce it 
recently, but depending on your kernel version this might be the problem 
you're seeing.

	-hpa



