[syslinux] lpxelinux hangs under Intel Boot Agent 1.3.81 (2.1 build 089) on Dell Optiplex 990 BIOS A16
Alexander Perlis
aperlis at math.lsu.edu
Mon Jul 14 15:15:36 PDT 2014
On 07/12/2014 03:26 PM, Gene Cumm wrote:
> On Sat, Jul 12, 2014 at 3:38 PM, Alexander Perlis <aperlis at math.lsu.edu> wrote:
>>
>> I did look at some packet traces with 6.03p18g3, and noticed some more
>> unexpected ARP behavior [...] [multiple] FIN/ACK HTTP [...] three ARP
>> requests [by server] [...] no response by the client [...]
>
> Unnecessary repeat traffic and ignoring ARP. Interesting.
>
>> [...]
>
> Probably no changes to the workaround but I'll try to take a look to
> see if I see similar behaviors on other machines that don't need this
> workaround.

Here's more info. I see a normal packet trace on:

  Optiplex GX620 A11, Broadcom UNDI PXE-2.1 v9.4.4
    NIC PCI ID 14e4:1677[1028:01ad] MAC 00:13:72::: Broadcom BCM5751
  Optiplex 760 A16, Intel Boot Agent 1.3.81 PXE 2.1 Build 091
    NIC PCI ID 8086:10de[1028:027f] MAC 00:23:ae::: Intel 82567LM-3
  Optiplex 780 A14, Intel Boot Agent 1.3.81 PXE 2.1 Build 091
    NIC PCI ID 8086:10de[1028:0276] MAC 84:2b:2b::: Intel 82567LM-3
  Optiplex 9010 A18, Intel Boot Agent 1.5.50 PXE 2.1 Build 092
    NIC PCI ID 8086:1502[1028:052c] MAC b8:ca:3a::: Intel 82579LM
  Optiplex 9020 A05, Intel Boot Agent 1.5.38 PXE 2.1 Build 092
    NIC PCI ID 8086:153a[1028:05a4] MAC f8:b1:56::: Intel I217-LM

But then, on:

  Optiplex 990 A16, Intel Boot Agent 1.3.81 PXE 2.1 Build 089
    NIC PCI ID 8086:1502[1028:047e] MAC 18:03:73::: Intel 82579LM

This is the client that needed your patch, and it still shows a
questionable packet trace. When I switch to TFTP-only (still using
lpxelinux.0), I again see the client ignoring ARP requests once all the
TFTP transfers are complete. But when I switch to pxelinux.0, the trace
is normal. So the problem seems specific both to lpxelinux.0 and to
this hardware.

(When I say "hardware", it may actually be something like "Build 089".
Note that the list above includes a similar client with a normal trace
that has the same Intel Boot Agent and the same general NIC model, but
different subsystem IDs, a different MAC prefix, and a different Build
number. Do any of these things relate to the ServiceFlags value your
workaround is testing?)

I suspect what I'm seeing is related to the same interrupts your
workaround deals with. Is it possible that your workaround is only
"active" while the higher layers of lpxelinux are waiting on data? Once
all the transfers are complete, does the polling continue?

Here's what I'm seeing: the trace turns abnormal only after the last
TFTP or HTTP transfer. Over HTTP, for example, many files are
transferred one after another. After each one, the client sends a TCP
FIN/ACK to the server's HTTP port, the server acknowledges it, and the
client then opens a new TCP connection and makes the next HTTP request.
Only after the *last* HTTP transfer (in my case the large "graphics.png"
needed for my vesamenu) does this break: once the client has received
all the data, it sends that final FIN/ACK but never picks up the
server's response.

From the perspective of the higher lpxelinux layers, the transfer is
done and the code has moved on (displaying the vesamenu, etc.), but the
lower network stack layer still has a half-closed connection to tear
down. So a few seconds later the client sends another FIN/ACK, again
misses the response, waits a few seconds, repeats, and so on, all while
apparently ignoring the server's ARP requests. My guess is that the
polling enabled by your workaround stops the moment lpxelinux has
received all the data it wanted...
Alex