[syslinux] lpxelinux hangs under Intel Boot Agent 1.3.81 (2.1 build 089) on Dell Optiplex 990 BIOS A16

Gene Cumm gene.cumm at gmail.com
Sat Jul 12 13:26:40 PDT 2014


On Sat, Jul 12, 2014 at 3:38 PM, Alexander Perlis <aperlis at math.lsu.edu> wrote:
> On 07/12/2014 02:24 PM, Gene Cumm wrote:
>>
>> On Sat, Jul 12, 2014 at 3:15 PM, Alexander Perlis <aperlis at math.lsu.edu>
>> wrote:
>>>
>>> On 07/11/2014 09:39 PM, Gene Cumm wrote:
>>>>
>>>>
>>>> With everything else from 6.03-pre18, try this binary (xzip-compressed):
>>>> http://www.zytor.com/~genec/lpxelinux-6.03p18g3.tgz
>>>
>>>
>>> It works! Thanks!
>>
>>
>> Oh fun.  That's the exact same workaround I used just now for different
>> hardware.
>
>
> I'm curious about the nature of the workaround. I'm guessing a bug in how
> the Dell NIC firmware handles ARP packets, and somehow you work around that?

Far simpler: their UNDI/PXE reports that interrupts should work, but
they never trigger, so we need to force polling.
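
To illustrate the idea (not the actual Syslinux code), here is a minimal
Python sketch of "trust the interrupt claim only if an interrupt is
actually observed, otherwise poll". FakeNic, choose_rx_mode and
receive_all are hypothetical names invented for this example; the real
workaround lives in the PXE/UNDI code of lpxelinux and may look quite
different.

```python
class FakeNic:
    """Hypothetical NIC model: may advertise IRQ support yet never raise one."""

    def __init__(self, frames, claims_irq=True, irq_fires=False):
        self.rx_queue = list(frames)
        self.claims_irq = claims_irq   # what the firmware reports
        self.irq_fires = irq_fires     # what the hardware actually does

    def irq_pending(self):
        # A broken firmware reports IRQ support but never signals one.
        return self.irq_fires and bool(self.rx_queue)

    def poll(self):
        # Ask the device directly whether a frame is waiting.
        return self.rx_queue.pop(0) if self.rx_queue else None


def choose_rx_mode(nic, probe_attempts=3):
    """Return 'irq' only if an interrupt is really observed, else 'poll'."""
    if not nic.claims_irq:
        return "poll"
    for _ in range(probe_attempts):
        if nic.irq_pending():
            return "irq"
    # The device claimed interrupts but none arrived: force polling.
    return "poll"


def receive_all(nic):
    """Drain the receive queue using whichever mode was selected."""
    mode = choose_rx_mode(nic)
    frames = []
    while True:
        if mode == "irq" and not nic.irq_pending():
            break
        frame = nic.poll()
        if frame is None:
            break
        frames.append(frame)
    return mode, frames
```

With a device that claims interrupts but never fires them, receive_all
falls back to polling and still collects every queued frame.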

> I did look at some packet traces with 6.03p18g3, and noticed some more
> unexpected ARP behavior (see below), which may indicate more things to be
> worked around?
>
> This is with "stock" 6.03p18g3 (no pxelinux-options changes): After all the
> TFTP transfers are complete (at 1.3 seconds into the conversation), there is
> mostly silence, but at 5.3s, 6.3s, and 7.3s the PXE server makes an ARP
> request to the Optiplex990 client asking for the MAC (even though it already
> knows it since the ARP request isn't a broadcast but targeted to the
> client's MAC). These three requests are seemingly not answered by the
> client, and there is seemingly no further communication (I waited a few
> minutes).
>
> If I instead use pxelinux-options to set an HTTP prefix in 6.03p18g3, then
> the initial TFTP transfer followed by all the HTTP data transfers are
> complete after 1.1 seconds, then I see some FIN/ACK closing of one HTTP
> connection at 3.5 seconds, another FIN/ACK HTTP closing at 9 seconds,
> another at 20 seconds, and one more at 42 seconds, then at 47, 48, 49 seconds
> I again see the PXE server making those three ARP requests targeted to the
> Optiplex990 client asking it for its MAC, no response by the client, silence
> for a while, then at 85 seconds the client sends another FIN/ACK to the
> server on port 80, and now at 85, 86, 87 seconds the server makes a
> *broadcast* ARP request searching for the MAC of the client, and no one
> answers this and there follows only silence.

Unnecessary repeat traffic and ignoring ARP.  Interesting.
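
For anyone wanting to pick the targeted and broadcast requests apart in
a capture, a rough Python sketch follows. It assumes raw Ethernet II
frames carrying IPv4-over-Ethernet ARP (42 bytes minimum, no VLAN tag);
classify_arp and build_arp_request are hypothetical helpers written for
this example, not part of any capture tool.

```python
import struct

BROADCAST = b"\xff" * 6
ETHERTYPE_ARP = 0x0806
ARP_REQUEST = 1
ARP_REPLY = 2


def classify_arp(frame):
    """Label an Ethernet frame as a unicast/broadcast ARP request or a reply.

    Returns None for anything that is not a recognisable ARP frame.
    """
    if len(frame) < 42:  # 14-byte Ethernet header + 28-byte ARP payload
        return None
    dst, _src, ethertype = struct.unpack("!6s6sH", frame[:14])
    if ethertype != ETHERTYPE_ARP:
        return None
    _htype, _ptype, _hlen, _plen, oper = struct.unpack("!HHBBH", frame[14:22])
    if oper == ARP_REQUEST:
        # The Ethernet destination tells targeted and broadcast requests apart.
        return "broadcast request" if dst == BROADCAST else "unicast request"
    if oper == ARP_REPLY:
        return "reply"
    return None


def build_arp_request(dst_mac, src_mac, src_ip, target_ip):
    """Build a minimal ARP request frame, for testing the classifier only."""
    eth = struct.pack("!6s6sH", dst_mac, src_mac, ETHERTYPE_ARP)
    arp = struct.pack("!HHBBH6s4s6s4s", 1, 0x0800, 6, 4, ARP_REQUEST,
                      src_mac, src_ip, b"\x00" * 6, target_ip)
    return eth + arp
```

Running classify_arp over each frame of a trace would show where the
server switches from requests addressed to the client's MAC to requests
sent to ff:ff:ff:ff:ff:ff.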

> I'm guessing the earlier targeted ARP requests were to update a stale but
> not yet expired ARP entry in the server, in preparation for the server to
> say _something_ to the client, and the latter broadcast ARP requests are
> because the ARP entry is gone but the server still has something it wishes
> to say to the client.
>
> I report this just in case you see something that concerns you and you wish
> to make more changes to your workaround. Certainly if I don't look at packet
> traces, and just wait for my vesamenu to come up, it does indeed come up,
> and I'm happy. :)

Probably no changes to the workaround but I'll try to take a look to
see if I see similar behaviors on other machines that don't need this
workaround.

-- 
-Gene

