[syslinux] iPXE chain to lpxelinux.0 6.03-pre17 inconsistencies and failures

Gene Cumm gene.cumm at gmail.com
Wed Jul 2 02:56:18 PDT 2014


On Wed, Jul 2, 2014 at 12:42 AM, Geert Stappers <stappers at stappers.nl> wrote:
> Op 2014-07-01 om 22:55 schreef Gene Cumm:
>> On Jul 1, 2014 10:37 PM, "Alexander Perlis" wrote:
>> >
>> > I believe I'm seeing a bug in lpxelinux.0 6.03-pre17 but I need some
>> > advice on how to isolate and troubleshoot this. (I can't try pre18
>> > at the moment, but did try 4.07 and 5.10 and saw similar behavior,
>> > also with pxelinux.0, so although I'll give pre18 a try soon, some
>> > isolation/troubleshooting advice will be a good education no matter
>> > what.)
>>
>> Odd. 4.07 should be good but the 4.10/5.1*/6.0* revisions make sense.
>>
>> > To get to our PXE-launched tools from hosts on a subnet without proper
>> > DHCP support (e.g., on a NAT or in a different building), we're trying
>> > to use small iPXE USB thumb drives and/or iPXE CD-ROMs, obtained from
>> > rom-o-matic.eu, which then chainload to lpxelinux.0 off our actual
>> > PXE server. (We used pxelinux-options to put a "-b pxe.ip.address"
>> > into lpxelinux.0, so that it would know the server IP for grabbing
>> > the subsequent libxxx and config files.)
>> >
>> > On some hosts we successfully get all the way to the graphical
>> > vesamenu.c32 under lpxelinux.0, while on other hosts we reach the
>> > initial lpxelinux.0 banner line but then the host hangs (and server
>> > shows no attempt to grab libxxx or config files), while on other
>> > hosts there is a reboot as soon as control is handed to lpxelinux.0
>> > (and unclear whether the banner line is printed, as the reboot blanks
>> > the screen too quickly).
>> >
>> >  Intel Macs: local-ipxe->lpxelinux.0->banner->vesamenu.c32->success
>> >  Dell GX620: local-ipxe->lpxelinux.0->banner->hang
>> >  Dell 780:   local-ipxe->lpxelinux.0->instant-reboot
>> >
>> > I'm not sure how to dig deeper. I'm using the precompiled binaries. Is
>> > it easy to compile a debug version that spits out verbose progress
>> > prior and after the banner and perhaps pauses for user input?
>> >
>> > I'm guessing ipxe is somehow setting the stage in a way that is caught
>> > by something finicky in lpxelinux.0 on certain hardware, or perhaps
>> > there's a bug in how ipxe sets the stage. Just to eliminate that latter
>> > variable, any recommendations for a non-ipxe-way to boot off a CD or
>> > USB to then PXE-boot to a specific server (not via DHCP)?
>>
>> This is probably related to a bisect I did recently. I found the culprit
>> commit in my case but a blind revert feels wrong.
>
> Please post in this thread what was found with `git bisect`.
> If it is in our mailinglist archive, then please reference to it.

What I found when I attempted 'git bisect start syslinux-5.10-pre1
syslinux-5.02-pre3' was a range of 19 commits before 4.10-pre1 that
are painful to deal with because they failed to build for me.  By
bringing lwipopts.h closer to the current version, I made them
buildable and found that commit 0c1dff8d, which is supposed to prevent
some stack issues, appears to cause others.  My educated guess is
that we are hitting a buffer underrun/overrun.
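
For anyone repeating the bisect, unbuildable commits like the ones above
can also be skipped rather than patched.  A minimal sketch, assuming a
hypothetical helper named bisect_verdict and placeholder build/test
commands (none of these names come from this thread); it only maps
results onto the exit codes 'git bisect run' documents: 0 for good, 1 for
bad, 125 for "cannot test, skip".

```shell
#!/bin/sh
# Hypothetical helper for use with `git bisect run`.
# $1 = build command, $2 = test command (both are placeholders that the
# caller supplies; neither is taken from the Syslinux tree).
bisect_verdict() {
    build_cmd=$1
    test_cmd=$2

    # A tree that does not build cannot be judged good or bad, so
    # report 125, which tells `git bisect run` to skip this commit.
    if ! sh -c "$build_cmd" >/dev/null 2>&1; then
        return 125
    fi

    # The tree builds: 0 marks the commit good, 1 marks it bad.
    if sh -c "$test_cmd" >/dev/null 2>&1; then
        return 0
    fi
    return 1
}
```

Since the failure here only shows up on real hardware, the test step is
likely manual; in that case running 'git bisect skip' by hand on each
unbuildable commit has the same effect as the 125 exit code.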

So far this bug has only been observed in combination with gPXE/iPXE
on select hardware.

-- 
-Gene

