SC22/WG20 N662
Contribution on Cyrillic ordering
Konstantin V. Chuguev
April 15, 1999
Karlsson Kent - keka wrote:
> How are these letters ordered in well-edited, well-known
> dictionaries, lexica, 'alphabetic' book indices (for books
> that have them), and [by-and-large] public telephone books?
>
> How Cyrillic letters are transliterated does not matter at all, IMHO.
>
> One still cannot expect one ordering to be 'just right' for all
> regions where Cyrillic letters are used, just as there is no
> 'just right' ordering for the Latin letters valid for all regions
> where the Latin script is used. Some degree of compromise must
> be admissible. Tailoring can then adjust for regional needs.
>
> So what we need to know is how Cyrillic letters are customarily
> ordered in dictionaries and similar publications (preferably with
> samples from different regions). (I don't know, I haven't got
> even one such dictionary...)
>
I've got some :-)
Here I have attached one of the config files for Yudit (a Unicode editor for
UNIX). The file was originally named Cyrillic.kmap, and it serves as a
table for entering Cyrillic characters on Latin (ASCII) keyboards by means
of transliteration. The method itself is not important here, but the
ordering of the Cyrillic letters is. It seems to be the correct ordering for
all Slavic languages that use Cyrillic. The only change needed is to place
the Ukrainian LETTER GHE WITH UPTURN right after LETTER GHE.
It would also be possible to swap LETTER IO and LETTER UKRAINIAN IE, but
this is not so important, because Ukrainian has no LETTER IO; it is just
that UKRAINIAN IE is a bit closer to IE than IO is.
I have checked the ordering against Ukrainian, Byelorussian and
Serbo-Croatian dictionaries. Unfortunately, I have no Macedonian one, so I
am not quite sure about the letters GJE and KJE. But, taking their
pronunciation into account, it seems right that they come after DE and TE
respectively in the Macedonian alphabet.
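To make the proposal concrete, here is a minimal Python sketch of a sort key built from the letter order described above: the order of the attached table, with GHE WITH UPTURN moved to right after GHE. The names ORDER, RANK and sort_key are made up for this illustration, and sending characters outside the list to the end is an assumption of the sketch, not part of the proposal.

```python
# Lower-case letters in the proposed order (table order, with
# GHE WITH UPTURN placed immediately after GHE).
ORDER = "абвгґдђѓеёєжзѕиіїјйклљмнњопрстћќуўфхцчџшщъыьэюя"
RANK = {ch: i for i, ch in enumerate(ORDER)}


def sort_key(word):
    """Return a key that sorts words by the proposed Cyrillic letter order."""
    primary = []
    for ch in word.lower():
        if ch in RANK:
            primary.append((RANK[ch], 0))
        else:
            # Assumption: unlisted characters sort after all listed
            # letters, ordered among themselves by code point.
            primary.append((len(ORDER), ord(ch)))
    # The original spelling breaks ties between words differing only in case.
    return (primary, word)


words = ["дом", "ґава", "гора", "єнот", "жук"]
print(sorted(words, key=sort_key))
# ['гора', 'ґава', 'дом', 'єнот', 'жук']
```

Sorting the same words by raw code point would put the words starting with UKRAINIAN IE and GHE WITH UPTURN at the very end, which is exactly what the proposed order avoids.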
Please be aware that this message is being sent in UTF-8, 8-bit, and I am
not sure if you will see it correctly. I can try to resend it in some
other encoding, e.g. in UTF-7.
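For illustration only, this small Python sketch shows the difference between the two encodings for a single letter from the table. It relies on Python's built-in "utf-8" and "utf-7" codecs; the variable names are arbitrary.

```python
text = "\u0490"  # CYRILLIC CAPITAL LETTER GHE WITH UPTURN

utf8_bytes = text.encode("utf-8")  # two bytes, both with the high bit set
utf7_bytes = text.encode("utf-7")  # ASCII characters only

print(utf8_bytes)
print(utf7_bytes)

# UTF-7 survives a 7-bit mail path because every byte is below 0x80.
print(all(b < 0x80 for b in utf7_bytes))
print(utf7_bytes.decode("utf-7") == text)
```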
--
Konstantin V. Chuguev.       System administrator of Southern
http://www.urc.ac.ru/~joy/   Ural Regional Center of FREEnet,
mailto:joy@urc.ac.ru         Chelyabinsk, Russia.
// Cyrillic input table following the 1995 edition of international
// standard ISO 9 Transliteration of Cyrillic characters:
// Created with Emacs for Yudit and decorated with Yudit
// © 1998-04-18 Roman Czyborra@cs.tu-berlin.de
// Additions and improvements welcome
// 0. Quotation marks and special symbols popular with Cyrillic
"<<=0x00AB", // « = LEFT-POINTING DOUBLE ANGLE QUOTATION MARK =
">>=0x00BB", // » = RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK =
",,=0x201E", // „ = DOUBLE LOW-9 QUOTATION MARK =
"``=0x201C", // “ = LEFT DOUBLE QUOTATION MARK =
"N.=0x2116", // № = NUMERO SIGN
"C)=0x00A9", // © = COPYRIGHT SIGN
"x)=0x2022", // • = BULLET
":)=0x263A", // ☺ = WHITE SMILING FACE
":(=0x2639", // ☹ = WHITE FROWNING FACE
"C-=0x00A4", // ¤ = CURRENCY SIGN
"E-=0x20AC", // € = EURO SIGN
"L-=0x00A3", // £ = POUND SIGN
// 1. General table for Slavic Cyrillic languages
// The neat ISO 9 transliterations are worth remembering!
// First the full line of Capital letters for clarity:
"A =0x0410", // А = CYRILLIC CAPITAL LETTER A
"B =0x0411", // Б = CYRILLIC CAPITAL LETTER BE
"V =0x0412", // В = CYRILLIC CAPITAL LETTER VE
"G =0x0413", // Г = CYRILLIC CAPITAL LETTER GHE
"D =0x0414", // Д = CYRILLIC CAPITAL LETTER DE
"D-=0x0402", // Ђ = CYRILLIC CAPITAL LETTER DJE
"G'=0x0403", // Ѓ = CYRILLIC CAPITAL LETTER GJE
"E =0x0415", // Е = CYRILLIC CAPITAL LETTER IE
"E:=0x0401", // Ё = CYRILLIC CAPITAL LETTER IO
"E>=0x0404", // Є = CYRILLIC CAPITAL LETTER UKRAINIAN IE
"Z<=0x0416", // Ж = CYRILLIC CAPITAL LETTER ZHE
"Z =0x0417", // З = CYRILLIC CAPITAL LETTER ZE
"Z>=0x0405", // Ѕ = CYRILLIC CAPITAL LETTER DZE
"I =0x0418", // И = CYRILLIC CAPITAL LETTER I
"I`=0x0406", // І = CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
"I:=0x0407", // Ї = CYRILLIC CAPITAL LETTER YI
"J<=0x0408", // Ј = CYRILLIC CAPITAL LETTER JE
"J =0x0419", // Й = CYRILLIC CAPITAL LETTER SHORT I
"K =0x041A", // К = CYRILLIC CAPITAL LETTER KA
"L =0x041B", // Л = CYRILLIC CAPITAL LETTER EL
"L>=0x0409", // Љ = CYRILLIC CAPITAL LETTER LJE
"M =0x041C", // М = CYRILLIC CAPITAL LETTER EM
"N =0x041D", // Н = CYRILLIC CAPITAL LETTER EN
"N>=0x040A", // Њ = CYRILLIC CAPITAL LETTER NJE
"O =0x041E", // О = CYRILLIC CAPITAL LETTER O
"P =0x041F", // П = CYRILLIC CAPITAL LETTER PE
"R =0x0420", // Р = CYRILLIC CAPITAL LETTER ER
"S =0x0421", // С = CYRILLIC CAPITAL LETTER ES
"T =0x0422", // Т = CYRILLIC CAPITAL LETTER TE
"C'=0x040B", // Ћ = CYRILLIC CAPITAL LETTER TSHE
"K'=0x040C", // Ќ = CYRILLIC CAPITAL LETTER KJE
"U =0x0423", // У = CYRILLIC CAPITAL LETTER U
"U<=0x040E", // Ў = CYRILLIC CAPITAL LETTER SHORT U
"F =0x0424", // Ф = CYRILLIC CAPITAL LETTER EF
"H =0x0425", // Х = CYRILLIC CAPITAL LETTER HA
"C =0x0426", // Ц = CYRILLIC CAPITAL LETTER TSE
"C<=0x0427", // Ч = CYRILLIC CAPITAL LETTER CHE
"D>=0x040F", // Џ = CYRILLIC CAPITAL LETTER DZHE
"S<=0x0428", // Ш = CYRILLIC CAPITAL LETTER SHA
"S>=0x0429", // Щ = CYRILLIC CAPITAL LETTER SHCHA
"\"\"=0x042A",//Ъ = CYRILLIC CAPITAL LETTER HARD SIGN
"Y =0x042B", // Ы = CYRILLIC CAPITAL LETTER YERU
"\"=0x042C", // Ь = CYRILLIC CAPITAL LETTER SOFT SIGN
"E`=0x042D", // Э = CYRILLIC CAPITAL LETTER E
"U>=0x042E", // Ю = CYRILLIC CAPITAL LETTER YU
"A>=0x042F", // Я = CYRILLIC CAPITAL LETTER YA
// Then the same thing in lower case:
"a =0x0430", // а = CYRILLIC SMALL LETTER A
"b =0x0431", // б = CYRILLIC SMALL LETTER BE
"v =0x0432", // в = CYRILLIC SMALL LETTER VE
"g =0x0433", // г = CYRILLIC SMALL LETTER GHE
"d =0x0434", // д = CYRILLIC SMALL LETTER DE
"d-=0x0452", // ђ = CYRILLIC SMALL LETTER DJE
"g'=0x0453", // ѓ = CYRILLIC SMALL LETTER GJE
"e =0x0435", // е = CYRILLIC SMALL LETTER IE
"e:=0x0451", // ё = CYRILLIC SMALL LETTER IO
"e>=0x0454", // є = CYRILLIC SMALL LETTER UKRAINIAN IE
"z<=0x0436", // ж = CYRILLIC SMALL LETTER ZHE
"z =0x0437", // з = CYRILLIC SMALL LETTER ZE
"z>=0x0455", // ѕ = CYRILLIC SMALL LETTER DZE
"i =0x0438", // и = CYRILLIC SMALL LETTER I
"i`=0x0456", // і = CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
"i:=0x0457", // ї = CYRILLIC SMALL LETTER YI
"j<=0x0458", // ј = CYRILLIC SMALL LETTER JE
"j =0x0439", // й = CYRILLIC SMALL LETTER SHORT I
"k =0x043A", // к = CYRILLIC SMALL LETTER KA
"l =0x043B", // л = CYRILLIC SMALL LETTER EL
"l>=0x0459", // љ = CYRILLIC SMALL LETTER LJE
"m =0x043C", // м = CYRILLIC SMALL LETTER EM
"n =0x043D", // н = CYRILLIC SMALL LETTER EN
"n>=0x045A", // њ = CYRILLIC SMALL LETTER NJE
"o =0x043E", // о = CYRILLIC SMALL LETTER O
"p =0x043F", // п = CYRILLIC SMALL LETTER PE
"r =0x0440", // р = CYRILLIC SMALL LETTER ER
"s =0x0441", // с = CYRILLIC SMALL LETTER ES
"t =0x0442", // т = CYRILLIC SMALL LETTER TE
"c'=0x045B", // ћ = CYRILLIC SMALL LETTER TSHE
"k'=0x045C", // ќ = CYRILLIC SMALL LETTER KJE
"u =0x0443", // у = CYRILLIC SMALL LETTER U
"u<=0x045E", // ў = CYRILLIC SMALL LETTER SHORT U
"f =0x0444", // ф = CYRILLIC SMALL LETTER EF
"h =0x0445", // х = CYRILLIC SMALL LETTER HA
"c =0x0446", // ц = CYRILLIC SMALL LETTER TSE
"c<=0x0447", // ч = CYRILLIC SMALL LETTER CHE
"d>=0x045F", // џ = CYRILLIC SMALL LETTER DZHE
"s<=0x0448", // ш = CYRILLIC SMALL LETTER SHA
"s>=0x0449", // щ = CYRILLIC SMALL LETTER SHCHA
"''=0x044A", // ъ = CYRILLIC SMALL LETTER HARD SIGN
"y =0x044B", // ы = CYRILLIC SMALL LETTER YERU
"' =0x044C", // ь = CYRILLIC SMALL LETTER SOFT SIGN
"e`=0x044D", // э = CYRILLIC SMALL LETTER E
"u>=0x044E", // ю = CYRILLIC SMALL LETTER YU
"a>=0x044F", // я = CYRILLIC SMALL LETTER YA
// 2. The so-called complementary table for the Slavic Cyrillic
// characters used by some communities established outside the
// boundaries of their native countries (Ghe with upturn is also
// officially used in the Ukraine again) contains
"G`=0x0490", // Ґ = CYRILLIC CAPITAL LETTER GHE WITH UPTURN
"g`=0x0491", // ґ = CYRILLIC SMALL LETTER GHE WITH UPTURN
"E<=0x0462", // Ѣ = CYRILLIC CAPITAL LETTER YAT
"e<=0x0463", // ѣ = CYRILLIC SMALL LETTER YAT
"A<=0x046A", // Ѫ = CYRILLIC CAPITAL LETTER BIG YUS
"a<=0x046B", // ѫ = CYRILLIC SMALL LETTER BIG YUS
"F`=0x0472", // Ѳ = CYRILLIC CAPITAL LETTER FITA
"f`=0x0473", // ѳ = CYRILLIC SMALL LETTER FITA
"Y`=0x0474", // Ѵ = CYRILLIC CAPITAL LETTER IZHITSA
"y`=0x0475", // ѵ = CYRILLIC SMALL LETTER IZHITSA
// 3. Cyrillic characters for non-Slavic languages
// ... haven't gotten to looking them all up yet :(
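For readers who want to extract the letter order from a table like this one, the following Python sketch may help. It assumes that each entry has the form "KEY=0xHHHH", followed by an optional // comment, and that a backslash inside the key escapes the next character. Both assumptions are inferred from the listing, not taken from the Yudit documentation, and the names ENTRY and parse_kmap are hypothetical.

```python
import re

# One kmap entry: a quoted key sequence, '=', a hexadecimal code point.
ENTRY = re.compile(r'^\s*"((?:\\.|[^"\\])+)=0x([0-9A-Fa-f]+)"')


def parse_kmap(lines):
    """Map each key sequence to its character, keeping the file order."""
    table = {}
    for line in lines:
        match = ENTRY.match(line)
        if match is None:
            continue  # comment line, blank line, or something unexpected
        key = re.sub(r"\\(.)", r"\1", match.group(1))  # undo \" escapes
        table[key] = chr(int(match.group(2), 16))
    return table


sample = [
    "// 1. General table for Slavic Cyrillic languages",
    '"A =0x0410", // CYRILLIC CAPITAL LETTER A',
    r'"\"\"=0x042A",// CYRILLIC CAPITAL LETTER HARD SIGN',
    '"G`=0x0490", // CYRILLIC CAPITAL LETTER GHE WITH UPTURN',
]
table = parse_kmap(sample)
print(list(table.items()))
```

Since Python dictionaries preserve insertion order, list(table.values()) yields the characters in the order in which the file lists them, which is the ordering this contribution is about.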