[syslinux] lpxelinux hangs under Intel Boot Agent 1.3.81 (2.1 build 089) on Dell Optiplex 990 BIOS A16

Alexander Perlis aperlis at math.lsu.edu
Mon Jul 14 15:15:36 PDT 2014


On 07/12/2014 03:26 PM, Gene Cumm wrote:
> On Sat, Jul 12, 2014 at 3:38 PM, Alexander Perlis <aperlis at math.lsu.edu> wrote:
>>
>> I did look at some packet traces with 6.03p18g3, and noticed some more
>> unexpected ARP behavior [...] [multiple] FIN/ACK HTTP [...] three ARP
>> requests [by server] [...] no response by the client [...]
>
> Unnecessary repeat traffic and ignoring ARP.  interesting.
>
>> [...]
>
> Probably no changes to the workaround but I'll try to take a look to
> see if I see similar behaviors on other machines that don't need this
> workaround.

Here's more info. I see a normal packet trace on:

Optiplex GX620 A11, Broadcom UNDI PXE-2.1 v9.4.4
NIC PCI ID 14e4:1677[1028:01ad] MAC 00:13:72::: Broadcom BCM5751

Optiplex   760 A16, Intel Boot Agent 1.3.81 PXE 2.1 Build 091
NIC PCI ID 8086:10de[1028:027f] MAC 00:23:ae::: Intel 82567LM-3

Optiplex   780 A14, Intel Boot Agent 1.3.81 PXE 2.1 Build 091
NIC PCI ID 8086:10de[1028:0276] MAC 84:2b:2b::: Intel 82567LM-3

Optiplex  9010 A18, Intel Boot Agent 1.5.50 PXE 2.1 Build 092
NIC PCI ID 8086:1502[1028:052c] MAC b8:ca:3a::: Intel 82579LM

Optiplex  9020 A05, Intel Boot Agent 1.5.38 PXE 2.1 Build 092
NIC PCI ID 8086:153a[1028:05a4] MAC f8:b1:56::: Intel I217-LM



But then, on:

Optiplex   990 A16, Intel Boot Agent 1.3.81 PXE 2.1 Build 089
NIC PCI ID 8086:1502[1028:047e] MAC 18:03:73::: Intel 82579LM

This is the client for which I needed your patch, and on which I still 
see a questionable packet trace. When I switch to TFTP-only (still using 
lpxelinux.0), I also see the client ignoring ARP requests after all the 
TFTP transfers are complete. But when I switch to pxelinux.0, the trace 
is normal. Thus the problem seems specific to lpxelinux.0 and specific 
to this hardware.
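
The check I'm doing by eye on each trace can be sketched roughly as below. This is only an illustration: the tuple layout, the function name unanswered_arp, and the 10.0.0.x addresses are all made up for the example and are not a tcpdump or Wireshark export format.

```python
def unanswered_arp(trace, client_ip):
    """Return timestamps of ARP requests for client_ip that never got a reply.

    Each trace entry is a hypothetical (timestamp, kind, sender_ip, target_ip)
    tuple; this layout is an assumption made for the sketch.
    """
    pending = []
    for ts, kind, sender, target in trace:
        if kind == "arp-request" and target == client_ip:
            pending.append(ts)
        elif kind == "arp-reply" and sender == client_ip:
            # A reply from the client answers everything asked so far.
            pending.clear()
    return pending


# Made-up trace: one answered request early on, then three ignored ones.
trace = [
    (0.0, "arp-request", "10.0.0.1", "10.0.0.50"),
    (0.1, "arp-reply", "10.0.0.50", "10.0.0.1"),
    (5.0, "arp-request", "10.0.0.1", "10.0.0.50"),
    (6.0, "arp-request", "10.0.0.1", "10.0.0.50"),
    (7.0, "arp-request", "10.0.0.1", "10.0.0.50"),
]
print(unanswered_arp(trace, "10.0.0.50"))  # [5.0, 6.0, 7.0]
```

A "normal" trace in the sense used above would give an empty list here.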

(When I say "hardware", it may actually be something like "Build 089". 
Observe above there is a similar client, with a normal trace, that has 
the same Intel Boot Agent and same general NIC model, but different 
subsystem numbers, a different MAC prefix, and a different Build number. 
Do any of these things relate to the ServiceFlags value your workaround 
is testing?)

I suspect what I'm seeing is related to those same interrupts your 
workaround is dealing with. Is it possible your workaround is only 
"active" while the higher layers of lpxelinux are waiting on data? Once 
all the transfers are complete, does the polling continue?

Here's what I'm seeing: only after the last TFTP transfer, or last HTTP 
transfer, does the trace turn abnormal. For example, over HTTP, there 
are many files to be transferred, and this happens sequentially. After 
each one, the client sends a TCP FIN/ACK to the HTTP port of the 
server, the server acknowledges that, and then the client starts a new 
TCP connection and makes a new HTTP request. It is only after the *last* 
HTTP transfer (in my case the large "graphics.png" needed for my 
vesamenu) that things go wrong: once the client has received all the 
data, it sends that last HTTP FIN/ACK but doesn't pick up the response 
sent by the server. From the perspective of the higher lpxelinux layers, 
the transfer is done and the code has moved on (displaying the vesamenu 
etc.), but the lower network stack layer still has a half-closed 
connection to dispose of... so a few seconds later the client sends 
another FIN/ACK, doesn't get the response, waits a few seconds, repeats, 
etc., and concurrently seems to be ignoring the server's ARP requests. 
So my guess is that the polling enabled by your workaround stops 
happening the moment lpxelinux has gotten all the data it wanted...
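
To make that guess concrete, here is a toy model of it. It is not lpxelinux code and says nothing about how the real stack is structured: ToyStack, fetch, and the frame strings are all hypothetical names invented for the sketch. The only assumption it encodes is that incoming frames get handled solely inside poll(), and that poll() is called only while the upper layer is waiting on data.

```python
from collections import deque


class ToyStack:
    """Hypothetical stand-in for a polled network stack (not lpxelinux code)."""

    def __init__(self):
        self.rx = deque()       # frames received by the NIC, not yet handled
        self.sent = []          # frames the client has put on the wire
        self.fin_acked = False  # has the server's ACK of our FIN been seen?

    def poll(self):
        """Drain the receive queue; incoming frames are handled only here."""
        while self.rx:
            frame = self.rx.popleft()
            if frame == "ARP who-has client":
                self.sent.append("ARP reply")
            elif frame == "ACK of FIN":
                self.fin_acked = True


def fetch(stack, poll_after_done):
    # While the upper layer waits on data it keeps polling, so ARP is answered.
    stack.rx.append("ARP who-has client")
    stack.poll()
    # Last transfer complete: the client sends FIN/ACK and the caller moves on.
    stack.sent.append("FIN/ACK")
    stack.rx.extend(["ACK of FIN", "ARP who-has client"])
    if poll_after_done:
        stack.poll()
    return stack


broken = fetch(ToyStack(), poll_after_done=False)
fixed = fetch(ToyStack(), poll_after_done=True)
print(broken.fin_acked, broken.sent.count("ARP reply"))  # False 1
print(fixed.fin_acked, fixed.sent.count("ARP reply"))    # True 2
```

In the "broken" run the server's ACK and its later ARP request sit unread in the queue, which is the same shape as the trace: a FIN/ACK that never appears to be acknowledged, plus ignored ARP requests.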

Alex

