[syslinux] COMBOOT API: Add calls for directory functions; Implement for FAT

Mon Jan 12 13:20:46 PST 2009

On Mon, Jan 12, 2009 at 2:59 PM, H. Peter Anvin <hpa at zytor.com> wrote:
> Gene Cumm wrote:
>>
>> I decided to delve more into the depths and confusion of codepages and
>> UTF-16 (Windows codepages, Windows OEM codepages, etc).  The Unicode
>> "Latin-1" chart, covering U+0080 through U+00FF, has pretty much no
>> correlation to my localized codepage (OEM-437, as that appears to be
>> what is used in my BIOS).  Therefore, in the interest of providing
>> accurate information in the displayed name, I will also skip
>> U+0080 and greater in favor of the short name, as it should be more
>> accurate (and hopefully 100% accurate).  UTF-16 below U+0080, by
>> definition, should be consistent, regardless of the codepage the BIOS
>> uses.  If I am mistaken in any of these assumptions, feedback is
>> welcome.
>>
>
> That's not really the best way to do it.  The right way is to search the
> codepage file for any Unicode we find; in particular, the primary Unicode
> column.  It might be a worthwhile idea to change the format of the codepage
> information so that all the primaries precede all the secondaries; that
> would let us use a simple "repne scasw" to find the entry.
>
> If nothing is found at all, then we flunk the longname and bounce back to
> the shortname.
>

I agree.  That would definitely be a better solution long term.  I was
more looking at trying to move one step at a time.  First, make sure
that what information is provided is accurate.  Converting Unicode
below U+0080 is merely a matter of truncating it to 8 bits.  Above
that, it requires a codepage.  At first, I was just ignoring the high
byte, but I found that to be inaccurate.  That led me to just ignore
anything above U+007F, for now.
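
As a rough illustration of the interim approach described above (not the
actual SYSLINUX code), a C sketch might look like the following.  The
function name longname_to_ascii and its return convention are
hypothetical.  It rejects the whole long name as soon as it sees
anything above U+007F, so the caller can fall back to the short name.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (illustration only): convert a UCS-2 long name
 * to 8-bit characters, accepting only U+0000..U+007F.
 *
 * Returns 0 on success, or -1 if the name does not fit in the output
 * buffer or contains a character above U+007F.  On -1 the caller is
 * expected to fall back to the short (8.3) name.
 *
 * Simply dropping the high byte is not safe: U+0141 would silently
 * become 0x41 ('A').
 */
static int longname_to_ascii(const uint16_t *ucs2, size_t len,
                             char *out, size_t outsz)
{
    size_t i;

    if (outsz < len + 1)
        return -1;              /* no room for the terminating NUL */

    for (i = 0; i < len; i++) {
        if (ucs2[i] > 0x007F)
            return -1;          /* would need a codepage; give up */
        out[i] = (char)ucs2[i]; /* below U+0080: plain truncation */
    }
    out[len] = '\0';
    return 0;
}
```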

The next step is to look at how to ensure that all of the available
character space can be used in the file system, instead of ignoring
the "Extended ASCII" range, which is a codepage-specific translation.
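
The table search suggested in the quoted text could be sketched in C
roughly as follows.  This is only an illustration under assumptions:
the table layout (256 primary UCS-2 values followed by 256 secondary
values, each block indexed by codepage byte) and the names
sketch_ucs2_to_cp and sketch_longname_to_cp are hypothetical and are
not taken from the patch.  The plain loop stands in for the
"repne scasw" scan.

```c
#include <stddef.h>
#include <stdint.h>

#define CP_CHARS 256    /* one entry per 8-bit codepage value */

/*
 * Assumed table layout (not the real codepage file format):
 *   uni[0..255]   primary UCS-2 value for each codepage byte
 *   uni[256..511] secondary UCS-2 value for each codepage byte
 *
 * Because all primaries come before all secondaries, one forward scan
 * finds a primary match first.  Returns the codepage byte (0..255),
 * or -1 if the character is not in the table.
 */
static int sketch_ucs2_to_cp(const uint16_t *uni, uint16_t ch)
{
    size_t i;

    for (i = 0; i < 2 * CP_CHARS; i++) {
        if (uni[i] == ch)
            return (int)(i % CP_CHARS);
    }
    return -1;
}

/*
 * Convert a whole long name.  Returns 0 on success, or -1 if the name
 * does not fit or any character has no codepage equivalent, in which
 * case the caller falls back to the short name.
 */
static int sketch_longname_to_cp(const uint16_t *uni,
                                 const uint16_t *name, size_t len,
                                 unsigned char *out, size_t outsz)
{
    size_t i;

    if (outsz < len + 1)
        return -1;

    for (i = 0; i < len; i++) {
        int c = sketch_ucs2_to_cp(uni, name[i]);
        if (c < 0)
            return -1;          /* flunk the long name */
        out[i] = (unsigned char)c;
    }
    out[len] = 0;
    return 0;
}
```

With a table built for codepage 437, for example, U+00C7 would come
back as 0x80, while a character with no entry at all makes the whole
long name fail over to the short name.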

>
> Included is an untested patch which implements the format change; with this
> patch you should be able to use the ucs2_to_cp routine to convert a UCS-2
> ("Unicode") character.
>

Nice, I'll have to try that out.  Thanks.  Moving in that direction
would have been my next step, but this saves me from having to think
about either creating another table that wastes space or
reimplementing the existing table as you just did.

Aside from the obvious fact that SYSLINUX needs to be compiled for the
local codepage (in order to properly use the long name), that should
make a complete solution.

-Gene