[syslinux] codepage/UnicodeData: tcase() data

Gene Cumm gene.cumm at gmail.com
Mon Jan 19 05:33:00 PST 2009


On Sun, Jan 18, 2009 at 3:15 PM, H. Peter Anvin <hpa at zytor.com> wrote:
> Gene Cumm wrote:
>> Three questions:
>>
>> - Where did the file come from?
>> - Does tcase() stand for toggle case (or otherwise effectively the same thing)?
>> - Should uppercase characters, like the Latin capital A, have tcase()
>> data in addition to the lcase() data?
>>
>
> The file comes from the Unicode Consortium, ftp.unicode.org.  The full
> file is *huge* (over a megabyte), so I use mksubset.pl to cut it
> down to only those bits needed.

Thanks.  I found where they have it.

> tcase stands for "Title Case": UPPER CASE, lower case, Title Case.  It
> matters for a handful of characters like:
>
> U+01C4 LATIN CAPITAL LETTER DZ WITH CARON
> U+01C5 LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON
> U+01C6 LATIN SMALL LETTER DZ WITH CARON
>
> U+01C5 is title case.  I decided title case is so rare (and I'm not even
> sure if we have *any* instances of it in any of the common codepages)
> that adding it would be a waste of space.

That explains it.

UCD.html says: "The simple titlecase may be omitted in the data
file if the titlecase is the same as the uppercase."  So the
consortium answers my third question.  Searching the UnicodeData
file that is in Syslinux, I cannot find any entry where a title case
is given and it differs from the upper case:

  grep ';[0-9A-F]\{4\}$' | grep -v ';\([0-9A-F]\{4\}\);\([0-9A-F]\{4\}\)\?;\1$'
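
For anyone who wants to repeat the check without the regex, here is a
rough Python sketch of the same idea.  It assumes the usual 15-field,
semicolon-separated UnicodeData layout with upper, lower and title case
in the last three fields.  The function names and the sample records
are mine, for illustration only, and are not from the Syslinux tree.
Unlike the grep, it treats an empty uppercase field as "maps to itself"
and an empty title field as "same as uppercase", which is my reading of
the UCD.html sentence quoted above.

```python
# Field positions in a UnicodeData record (0-based, 15 fields in total).
CODE, UPPER, LOWER, TITLE = 0, 12, 13, 14

# Illustrative sample records in the UnicodeData layout (assumed, not
# taken from the Syslinux subset).
SAMPLE = [
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041",
    "01C4;LATIN CAPITAL LETTER DZ WITH CARON;Lu;0;L;<compat> 0044 017D;;;;N;"
    "LATIN CAPITAL LETTER D Z HACEK;;;01C6;01C5",
    "01C5;LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON;Lt;0;L;"
    "<compat> 0044 017E;;;;N;LATIN CAPITAL LETTER D SMALL LETTER Z HACEK;;"
    "01C4;01C6;01C5",
    "01C6;LATIN SMALL LETTER DZ WITH CARON;Ll;0;L;<compat> 0064 017E;;;;N;"
    "LATIN SMALL LETTER D Z HACEK;;01C4;;01C5",
]


def title_differs_from_upper(record):
    """Return True if the record's title case is not its upper case."""
    fields = record.strip().split(";")
    if len(fields) != 15:
        return False  # not a well-formed record; skip it
    code = fields[CODE]
    upper = fields[UPPER] or code   # empty: character is its own upper case
    title = fields[TITLE] or upper  # empty: title case equals upper case
    return title != upper


def find_title_case_entries(records):
    """List the code points whose title case differs from their upper case."""
    return [r.split(";")[0] for r in records if title_differs_from_upper(r)]


if __name__ == "__main__":
    print(find_title_case_entries(SAMPLE))
```

On the sample above only the three DZ-with-caron code points are
reported; plain A and a are not.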

-Gene



