[syslinux] Trouble with ISOLINUX and IDE bus resets.
H. Peter Anvin
hpa at zytor.com
Tue May 11 19:03:49 PDT 2004
Michael_E_Brown at Dell.com wrote:
> Dell ships a CD called Dell OpenManage Server Assistant that,
> starting with version 8.0 released last November, is a Linux-based
> bootable CD. It uses ISOLINUX to load a Linux kernel/initrd combo to
> start the system. From version 8.0 to 8.2 we use isolinux version 1.66.
> Starting with version 8.3 we have upgraded to version 2.08. First of
> all, I'd like to say thanks for your excellent bootloaders. I have
> used all three for various projects, and they have helped immensely on
> this project.
> We have started to see a problem with the 1.66 version of
> isolinux on Dell PowerEdge 6650 server systems. On the newer 2.08
> version, the problem happens less often, but we still get a problem from
> time to time. The cause of the problem seems to be that the CDROM device
> goes offline or has some sort of problem. A device reset is issued, but
> then it looks like when isolinux tries to re-read the sector, the BIOS
> int 13h call destination for the data is invalid and it crashes the
> system. We believe that the root cause is some hardware problem. The
> interesting thing, though, is that the newer isolinux has the problem less
> often, and other OS bootloaders (Windows NT in this case) also see the
> Device Reset, but they retry the read calls and continue going just
> fine. The newer isolinux gives "isolinux: Disk error 01, drive 82. Boot
> failed: press a key to retry" usually when it fails. The 1.66 version
> completely crashes the system with really neat video corruption.
> I have searched through the changelogs but did not see any
> likely matches for this problem.
If you could narrow down the range of versions then I might have a
prayer of finding source changes. I know there was at least one BIOS on
which a chunk of low memory got just plain overwritten in certain
circumstances; something like that would definitely explain the problem!
Other possibilities: INT 13h returns with either the wrong value in
(E)SP, or with one of the segment registers corrupted. Another
possibility is that the DAPA is corrupted.
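
To make those three possibilities concrete, here is a minimal sketch of the comparison one would do on state captured just before and just after the INT 13h call. It is written in Python purely for illustration; the real capture would have to be done in the ISOLINUX assembly. The snapshot layout, the function name and every value below are made up for the example.

```python
# Hypothetical snapshot comparison; nothing here is ISOLINUX code.
# A snapshot is a dict of register values plus the 16 raw DAPA bytes.
WATCHED = ("sp", "ss", "ds", "es", "fs", "gs")

def changed_state(before, after):
    """Return the names of watched registers (and 'dapa') that differ."""
    changed = [reg for reg in WATCHED if before[reg] != after[reg]]
    if before["dapa"] != after["dapa"]:
        changed.append("dapa")
    return changed

# Invented sample values: one read request, and a call that clobbers ES.
before = {
    "sp": 0x7BF0, "ss": 0, "ds": 0, "es": 0, "fs": 0, "gs": 0,
    "dapa": bytes.fromhex("1000010000200000" "1b00000000000000"),
}
after = dict(before, es=0x9FC0)

print(changed_state(before, after))
```

An empty list would mean none of the three suspects changed across the call, which would point back toward something overwriting low memory instead.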
> I have posted an ASCII version of IDE bus traces of the problem
> here: http://www.michaels-house.net/~mebrown/IDE_bus_traces.tgz (4.7MB).
> The raw data for this is from "Bus Doctor", and the raw bus doctor data
> files are available upon request, but you need a Windows system to run
> Bus Doctor on. There is an evaluation version available you can use if
> you want to view the raw data.
Not really useful to me, I'm afraid.
> The BIOS team has stepped through the code and that is how we
> determined that, after the read retry, the destination buffer given is a
> bad address. Unfortunately I do not have any raw data from the BIOS guys
> at this point. If there is something specific that you need I can ask
> them to provide it.
> Have you seen this kind of problem, or is there some other data
> that we can provide that would help provide a software workaround for
> this problem?
What would be useful is a dump of all the INT 13h calls including
segment registers and DAPA (the 16-byte buffer pointed to by DS:SI).
The easiest way to do that is to hack in some code just before the int
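
As an illustration of what one line of such a dump might contain, here is a sketch that decodes a 16-byte DAPA and formats it together with the registers. It assumes the usual EDD disk address packet layout (size byte, reserved byte, block count word, buffer offset word, buffer segment word, 64-bit starting LBA). The function names, the output format and the sample values are hypothetical, and Python is used only for readability.

```python
import struct

def decode_dapa(raw):
    """Decode a 16-byte disk address packet (assumed EDD layout)."""
    if len(raw) != 16:
        raise ValueError("DAPA must be exactly 16 bytes")
    # Little-endian: size, reserved, count, buffer offset, buffer segment, LBA
    size, reserved, count, off, seg, lba = struct.unpack("<BBHHHQ", raw)
    return {
        "size": size, "reserved": reserved, "count": count,
        "buf_off": off, "buf_seg": seg, "lba": lba,
        # Real-mode linear address of the destination buffer
        "buf_linear": (seg << 4) + off,
    }

def format_call(ax, dl, ds, es, ss, sp, raw):
    """Format one INT 13h call as a single trace line (made-up format)."""
    d = decode_dapa(raw)
    return ("AX=%04X DL=%02X DS=%04X ES=%04X SS:SP=%04X:%04X "
            "DAPA: size=%02X count=%d buf=%04X:%04X (linear %05X) lba=%d"
            % (ax, dl, ds, es, ss, sp, d["size"], d["count"],
               d["buf_seg"], d["buf_off"], d["buf_linear"], d["lba"]))

# Invented sample: read 1 sector at LBA 27 into 0000:2000 from drive 82h.
sample = bytes.fromhex("10000100002000001b00000000000000")
print(format_call(0x4200, 0x82, 0, 0, 0, 0x7BF0, sample))
```

Comparing the buffer address in the line for the failing retry against the line for the original read would show directly whether the destination changed between the two calls.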
> * Seen following error once:
> "isolinux: Disk error 01, drive 82
> Boot failed: press a key to retry"
> * Jorge Villaneuva from the RMSD team helped capture traces by hooking
> up an IDE analyzer for both passing and failing cases with DSA 8.2 and a
> passing case with DSA 7.5
> * From the captured traces, there seems to be a difference in the way the NT
> code and the Linux code handle errors. The error handling mechanism of the NT
> code seems more robust than the Linux version, because the error happened in
> both cases but the NT code actually recovers from it.
This could just be pure dumb luck. I've looked at the code in ISOLINUX,
and I'm pretty sure it is correct as written.
Error 01 means "invalid call"; it really could mean anything.
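
For reference, the reset-then-retry pattern discussed in this thread can be sketched as follows. This is a generic illustration in Python, not the ISOLINUX or NT code; the callbacks, the attempt count and the simulated drive are all assumptions. The point relevant here is that each retry re-issues the same request, so the destination buffer has to be just as valid on the retry as on the first attempt.

```python
def read_with_retry(read_sector, reset_drive, lba, attempts=6):
    """Call read_sector(lba); on a nonzero status, reset the drive and retry.

    read_sector returns (status, data), with status 0 meaning success.
    The attempt count is an arbitrary choice for this sketch.
    """
    status = None
    for _ in range(attempts):
        status, data = read_sector(lba)
        if status == 0:
            return data
        reset_drive()
    raise OSError("disk error %02X after %d attempts" % (status, attempts))

class FlakyDrive:
    """Simulated drive that fails a fixed number of reads with status 01."""
    def __init__(self, failures):
        self.failures = failures
        self.resets = 0

    def read(self, lba):
        if self.failures > 0:
            self.failures -= 1
            return 0x01, None
        return 0x00, b"\x00" * 2048  # one 2048-byte CD-ROM sector

    def reset(self):
        self.resets += 1

drive = FlakyDrive(failures=2)
data = read_with_retry(drive.read, drive.reset, lba=27)
print(len(data), drive.resets)
```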