[syslinux] Very slow download with pxelinux > 4.07 on specific hardware

H. Peter Anvin hpa at zytor.com
Fri Mar 14 10:40:10 PDT 2014


On 03/13/2014 04:09 AM, Eric PEYREMORTE wrote:
> On 12/03/2014 22:00, H. Peter Anvin wrote:
>> On 03/10/2014 04:15 PM, Gene Cumm wrote:
>>> It's also a balance of time.  While working on 4.10-pre*/5.10-pre*, I
>>> found that some hardware misreports its behavior.  "Sure, interrupts
>>> work" when they don't is just one case that I worked around on
>>> specific hardware.
>>>
>> The odd part is that people are reporting this even using the legacy PXE
>> implementation (not lpxelinux.0)...
>>
>>     -hpa
> If there is a way to get useful debug traces let me know.
> 
> By the way, everything is slow from the moment the following string
> appears:
> 
> PXELINUX 5.10 0x5321850f
> 

I am *assuming* you are seeing the full copyright banner here, not just
the above string (dumb question, I know, but sometimes it really, really
matters.)

> I tried to search through the code and compare different versions to
> understand what's wrong, but I definitely don't have the required
> skills...

This is very challenging.  One of the big problems is that the legacy
network code (pxelinux.0 as opposed to lpxelinux.0) was pulled out and
then pulled back in, and clearly something changed in the process.

I looked over your wire trace and there is a fixed amount of delay --
just under 20 ms -- between each packet, which strongly implies that it
ends up waiting for some kind of timer to expire.  *What* timer that is
is less clear, because the only *architectural* timer is the 55 ms timer
interrupt, which doesn't fit the observed time.  That implies this is a
timer inside the PXE code.  Why that didn't happen before and does now
is the real mystery.
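
The timer arithmetic above can be sketched in a few lines of Python.  This
is only an illustration: it assumes the standard PC interval timer input
clock of 1193182 Hz with the default divisor of 65536 (which is where the
roughly 55 ms tick comes from), the timestamps in `sample` are invented
rather than taken from the actual wire trace, and the helper names are
hypothetical.

```python
from statistics import median

PIT_HZ = 1193182        # PC interval timer input clock, in Hz
PIT_DIVISOR = 65536     # default divisor, about 18.2 ticks per second
BIOS_TICK_MS = PIT_DIVISOR / PIT_HZ * 1000.0   # roughly 54.9 ms


def gaps_ms(timestamps):
    """Return the gaps between consecutive timestamps (seconds) in ms."""
    return [(b - a) * 1000.0 for a, b in zip(timestamps, timestamps[1:])]


def matches_bios_tick(gap_ms, tolerance_ms=5.0):
    """True if the gap is close to a whole number (>= 1) of BIOS ticks."""
    nearest = round(gap_ms / BIOS_TICK_MS)
    return nearest >= 1 and abs(gap_ms - nearest * BIOS_TICK_MS) <= tolerance_ms


# Hypothetical packet arrival times in seconds (NOT from the real trace).
sample = [0.0000, 0.0198, 0.0397, 0.0595, 0.0794]

typical = median(gaps_ms(sample))
print("typical gap: %.2f ms" % typical)
print("BIOS tick:   %.2f ms" % BIOS_TICK_MS)
print("explained by the BIOS tick:", matches_bios_tick(typical))
```

A gap of just under 20 ms is well short of even one 55 ms tick, so under
these assumptions the check fails, which is the reasoning that points at a
timer somewhere other than the architectural one.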

> What I notice from the Wireshark traces is that pxelinux.0 is loaded
> really quickly. Then it fetches ldlinux.c32 very slowly (and the
> following files too).
> 
> For lpxelinux.0, from the trace, everything is slow too, but at some
> point the client seems stuck in a loop, sending an acknowledgement for
> the same packet again and again. The server tries to send the next
> packet but the client keeps sending an ACK for the previous one.

Right... this implies that the receiver stopped functioning so the
machine "went deaf".  That is a fairly common failure mode, but why it
happens here is again the big question.
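
One way to spot this "went deaf" pattern in a capture is to look for a long
run of ACKs for the same block.  The following Python sketch assumes the
transfer in the trace is TFTP (ACK is opcode 4 followed by a 16-bit block
number, per RFC 1350) and that the client's UDP payloads have already been
extracted into a list; the function names and the sample data are
hypothetical.

```python
import struct

TFTP_ACK = 4  # ACK opcode as defined in RFC 1350


def ack_block(payload):
    """Return the block number if payload is a TFTP ACK, else None."""
    if len(payload) < 4:
        return None
    opcode, block = struct.unpack("!HH", payload[:4])
    return block if opcode == TFTP_ACK else None


def longest_repeated_ack(payloads):
    """Return (block, count) for the longest run of consecutive ACKs
    that all acknowledge the same block."""
    best_block, best_run = None, 0
    current, run = None, 0
    for payload in payloads:
        block = ack_block(payload)
        if block is None:
            # Not an ACK: any run in progress is broken.
            current, run = None, 0
            continue
        if block == current:
            run += 1
        else:
            current, run = block, 1
        if run > best_run:
            best_block, best_run = current, run
    return best_block, best_run


# Hypothetical client payloads: ACKs for blocks 1, 2, then 3 four times.
sample = [struct.pack("!HH", TFTP_ACK, n) for n in (1, 2, 3, 3, 3, 3)]
block, count = longest_repeated_ack(sample)
print("block %s was acknowledged %d times in a row" % (block, count))
```

An occasional duplicate ACK is normal when a packet is lost, so only a long
run for a single block while the server keeps retransmitting the next one
would match the behaviour described above.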

Unfortunately I only have my "spare time" to work on Syslinux anymore,
which makes hard problems like this difficult to dig into.  I *really*
appreciate the debugging information you have already given us... it
gives us a starting point at least.  The 20 ms delay is a very important
clue.

	-hpa



