[syslinux] iPXE chain to lpxelinux.0 6.03 inconsistencies and failures

Geert Stappers stappers at stappers.nl
Sat Nov 15 13:43:24 PST 2014


On Sat, Nov 15, 2014 at 02:22:03PM -0600, Alexander Perlis wrote:
> On 15 Nov 2014 05:06:52 +0200, Ady wrote:
> >
> >I would start by updating the BIOS.
> 
> Prudent advice. As it turns out, I'm already at the latest version.
> 
> 
> On 15 Nov 2014 07:31:27 +0100, Geert Stappers wrote:
> >
> >And would reduce 'iPXE => pxe.0 => lpxelinux.0 => "vmlinux"'
> >into 'iPXE => "vmlinux"'
> 
> That makes sense generally, but at the moment doesn't make sense for
> my particular circumstance.
> 
> I should clarify: I do not seek a workaround that eliminates iPXE or
> eliminates lpxelinux.0; instead, since I have a test combination
> that exposes a bug somewhere in iPXE or lpxelinux.0 (or both), I'd
> like to use this opportunity to assist the developers in getting
> that fixed.
> 
> Any iPXE or lpxelinux.0 developers who want to make the code more
> robust? What can I do to isolate the bug?

Quoting the original posting:

| I boot to a USB stick with iPXE, which then is told to "dhcp" and then 
| "chain http://xxx.xxx.xxx.xxx/pxe.0". That loads a version of 
| lpxelinux.0 6.03 that is configured (via pxelinux-options) with an 
| appropriate next-server, path-prefix, and config-file.
| 
| This all works great on a lot of different machines.
| 
| But specifically on the Dell Optiplex GX620 and Optiplex 645, which have 
| built-in Broadcom ethernet (the GX620 has 14e4:1677 [1028:01ad], and the 
| 645 has 14e4:167a [1028:01da]), there's a problem: first lpxelinux.0 is 
| correctly transferred, then control is indeed handed to lpxelinux.0 
| because the "PXELINUX 6.03 lwIP 2014-10-06" banner indeed appears, but 
| then the computer appears to be frozen, although it eventually says 
| "Failed to load ldlinux.c32". (At the server end there were no requests 
| to transfer anything.)

And what does a network sniffer (tcpdump, tshark, wireshark) show
at the client?
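
A minimal sketch of such a capture, under some assumptions: the client
hangs before any OS is running, so the sniffer has to run on another
host that sees the client's traffic (a mirror port, a hub, or the boot
server itself). The interface name, the MAC address and the output
file name below are placeholders, not values from this thread.

```shell
#!/bin/sh
# Placeholders: adjust to the local setup.
IFACE="eth0"                     # assumed capture interface (server or mirror port)
CLIENT_MAC="00:11:22:33:44:55"   # placeholder for the Broadcom NIC's MAC address
OUTFILE="ipxe-lpxelinux.pcap"    # hypothetical output file name

# Capture everything to or from the client's MAC, so ARP, DHCP, TFTP
# and HTTP all end up in the same trace.
FILTER="ether host ${CLIENT_MAC}"

capture() {
    # -i: interface, -n: no name resolution, -w: write raw packets to a file
    tcpdump -i "$IFACE" -n -w "$OUTFILE" "$FILTER"
}

# Run as root, start it before booting the client, stop it with Ctrl-C:
# capture
```

The resulting file can be opened in wireshark. Whether any packet
leaves the client after the PXELINUX banner appears is the
interesting part.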

| This can be further isolated to the built-in Broadcom ethernet (as 
| opposed to something else on the GX620 or 645) as follows: if on that 
| same hardware I insert a Linksys PCI card, and move the network cable to 
| that card so that iPXE will DHCP & chain via it, then there is no problem and 
| I end up at the graphical vesamenu.
| 
| 
| Now my question: where more specifically is the bug? What can I do to 
| help a developer isolate this?
| 
| For example, there could be a bug in the iPXE driver for the Broadcom 
| ethernet, a bug that doesn't affect iPXE's ability to load lpxelinux.0, 
| but then *does* affect lpxelinux.0's ability to ask iPXE to load the 
| next component. Or there could be a bug in lpxelinux.0, such as memory 
| management or stack management, which is simply being triggered by say 
| iPXE's Broadcom driver being say a different size than perhaps that of 
| most other drivers. Or who knows. (In case it helps: Back in July, Gene 
| posted that the problem may be related to commit 0c1dff8d.)
| 
| I'm happy to do testing, run a custom debug build and report output, or 
| whatever might help. Just need some pointers as to what to do. Any iPXE 
| or lpxelinux.0 developer is welcome to contact me.

For a "pointer": http://www.syslinux.org/wiki/index.php/Development/Debugging#Syslinux_Dynamic_Debugger


Regards
Geert Stappers
-- 
Live and let live

