JTC1/SC22
N2466
Date: Tue, 6 May 1997 16:59:37 -0400 (EDT)
From: "william c. rinehuls" <[email protected]>
To: [email protected]
Subject: SC22 N2466 - Vote Summary of LB N2364 - CD 14651
____________________beginning of title page ___________________________
ISO/IEC JTC 1/SC22
Programming languages, their environments and system software interfaces
Secretariat: U.S.A. (ANSI)
ISO/IEC JTC 1/SC22
N2466
May 1997
TITLE: Summary of Voting on CD Approval for CD 14651 -
Information technology - International String
Ordering - Method for Comparing Character Strings
and Description of a Default Tailorable Ordering
SOURCE: Secretariat, ISO/IEC JTC 1/SC22
WORK ITEM: JTC 1.22.30.02.02
STATUS: N/A
CROSS REFERENCE: SC22 N2364
DOCUMENT TYPE: Summary of Voting
ACTION: To SC22 Member Bodies for information.
To WG20 for preparation of a Disposition of Comments
Report and a recommendation on the further processing
of the CD.
Address reply to:
ISO/IEC JTC 1/SC22 Secretariat
William C. Rinehuls
8457 Rushing Creek Court
Springfield, VA 22153 USA
Tel: +1 (703) 912-9680
Fax: +1 (703) 912-2973
email: [email protected]
________________end of title page; beginning of overall summary ________
SUMMARY OF VOTING ON
Letter Ballot Reference No: SC22 N2364
Circulated by: JTC 1/SC22
Circulation Date: 01-20-1997
Closing Date: 04-24-1997
SUBJECT: CD Approval for CD 14651 - Information technology -
International String Ordering - Method for Comparing Character
Strings and Description of a Default Tailorable Ordering
The following responses have been received on the subject of approval:
"P" Members supporting approval
without comment 10
"P" Members supporting approval
with comment 1
"P" Members not supporting approval 4
"P" Members abstaining 2
"P" Members not voting 7
"O" Members supporting approval
without comment 1
"O" Members not supporting approval 1
"O" Members abstaining 1
Secretariat Action:
The comment accompanying the abstention vote from Germany was: "There is
no national WG11 (sic) rapporteur." The comments accompanying the
affirmative vote from Denmark; the comments accompanying the abstention
vote from the United Kingdom; and the comments accompanying the negative
votes from Austria, Israel, Japan, Netherlands, and USA are attached.
WG20 is requested to prepare a Disposition of Comments report and make a
recommendation on the further processing of the CD.
_______________end of overall summary; beginning of detail summary ___
ISO/IEC JTC1/SC22 LETTER BALLOT SUMMARY
PROJECT NO: JTC 1.22.30.02.02
SUBJECT: CD approval for CD 14651 - Information technology - International
String Ordering - Method for Comparing Character Strings and
Description of a Default Tailorable Ordering
Reference Document No: N2364 Ballot Document No: N2364
Circulation Date: 01-20-1997 Closing Date: 04-24-1997
Circulated To: SC22 P, O, L Circulated By: Secretariat
SUMMARY OF VOTING AND COMMENTS RECEIVED
                    Approve  Disapprove  Abstain  Comments  Not Voting
'P' Members
Australia             (X)       ( )        ( )      ( )        ( )
Austria               ( )       (X)        ( )      (X)        ( )
Belgium               ( )       ( )        ( )      ( )        (X)
Brazil                ( )       ( )        ( )      ( )        (X)
Canada                ( )       ( )        ( )      ( )        (X)
China                 ( )       ( )        ( )      ( )        (X)
Czech Republic        (X)       ( )        ( )      ( )        ( )
Denmark               (X)       ( )        ( )      (X)        ( )
Egypt                 ( )       ( )        ( )      ( )        (X)
Finland               (X)       ( )        ( )      ( )        ( )
France                (X)       ( )        ( )      ( )        ( )
Germany               ( )       ( )        (X)      (X)        ( )
Ireland               ( )       ( )        ( )      ( )        (X)
Japan                 ( )       (X)        ( )      (X)        ( )
Netherlands           ( )       (X)        ( )      (X)        ( )
Norway                (X)       ( )        ( )      ( )        ( )
Romania               (X)       ( )        ( )      ( )        ( )
Russian Federation    (X)       ( )        ( )      ( )        ( )
Slovenia              (X)       ( )        ( )      ( )        ( )
Sweden                ( )       ( )        ( )      ( )        (X)
Switzerland           (X)       ( )        ( )      ( )        ( )
UK                    ( )       ( )        (X)      (X)        ( )
Ukraine               (X)       ( )        ( )      ( )        ( )
USA                   ( )       (X)        ( )      (X)        ( )
'O' Members
Argentina             ( )       ( )        ( )      ( )        ( )
Bulgaria              ( )       ( )        ( )      ( )        ( )
Cuba                  ( )       ( )        ( )      ( )        ( )
Greece                ( )       ( )        ( )      ( )        ( )
Hungary               ( )       ( )        ( )      ( )        ( )
Iceland               ( )       ( )        ( )      ( )        ( )
India                 ( )       ( )        ( )      ( )        ( )
Indonesia             ( )       ( )        ( )      ( )        ( )
Israel                ( )       (X)        ( )      (X)        ( )
Italy                 ( )       ( )        ( )      ( )        ( )
Korea Republic        (X)       ( )        ( )      ( )        ( )
New Zealand           ( )       ( )        ( )      ( )        ( )
Poland                ( )       ( )        ( )      ( )        ( )
Portugal              ( )       ( )        (X)      ( )        ( )
Singapore             ( )       ( )        ( )      ( )        ( )
Thailand              ( )       ( )        ( )      ( )        ( )
Turkey                ( )       ( )        ( )      ( )        ( )
Yugoslavia            ( )       ( )        ( )      ( )        ( )
____end of detailed summary; beginning of Danish Comments Accompanying
Affirmative Vote__________________________
>From [email protected] Tue Apr 29 15:57:22 1997
Here is the Danish ballot on CD 14651:
Title: Comments on CD 14651 - International String Ordering
Source: Danish Standards Association
Date: 1997-04-29
Reference: SC22 N2364
The Danish ballot is: Yes, with general and technical comments
The comments are directed towards the English version of the text,
although the same comments apply to the French text.
1. The overall technical content of CD 14651 is sound, and as agreed
by the working group, and thus we can accept the document as a CD.
General comments:
2. There is too much emphasis on the "binary sorting string" concept.
The document should cater throughout for the case of simply comparing two
strings. In some places the functionality can only be reached by sorting on
prepared binary strings. There should also be ample warnings in a number of
places about the binary sorting string concept, as it is culturally
dependent: it depends on the sorting specification used to produce the
binary representation. Storing data in the precompiled binary string
representation should thus be recommended only for monocultural
environments, and those are actually environments that we should advise
against, having internationalization as our goal.
3. A formal description language, such as ISO 11404 or the IDL of ISO 13788
(PCTE), should be used in the specification of the APIs. The description of
the APIs currently lacks a number of specifications, including the
types of the parameters and how to bind to programming
languages, which are inherent in the 11404 and 13788 specification languages.
We are willing to help rewrite the API specifications in light of this
comment.
4. We recommend that a thin binding method be used, as demonstrated
in other API papers of WG20. We can provide text for this, in conjunction
with text to address the problems mentioned in comment 3.
5. The APIs have 3 parameters, that should not occur in the API, because
all localisation should be done via the locale. These are the parameters
order_accents, order_case and sign_espace of the COMPCAR and CARABIN functions.
6. The LC_COLLATE specification in 14652 format should be readily usable
and referenceable, without need for retailoring. The different options,
as expressed by the values of the 3 parameters in our comment 5, should
be available as different LC_COLLATE specifications, each with a well-defined
name.
7. The definitions in section 3 should be numbered and not ordered
alphabetically (in either English or French).
8. The definitions are too centered on the precompiled sorting string
concept. Terminology should also be applicable to comparisons on the
string encoding. Terms that should be usable with plain string comparisons
include: equivalence, ordering key, ordering subkey.
9. The technical specifications should be aligned with 14652, especially
the hexadecimal symbolic ellipses "..".
10. The names of the APIs should be less French-oriented.
11. The tables should use names established from the POSIX locale
work, such as ISO/IEC 9945-2 annex G names or 14652 names from the
repertoiremap, especially when not using <Uxxxx> names.
12. A number of scripts have not been ordered properly, such as Hiragana,
Katakana and Thai.
13. A reversibility function from binary sort strings to character strings
seems to be missing.
14. There are some spelling errors, and we suggest a spell-checker be used
for production of further documents.
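The concern in comment 2 can be illustrated with a minimal sketch. The two weight tables and the `sort_key` helper below are hypothetical and are not taken from CD 14651; they only show that a precompiled key is meaningful solely relative to the sorting specification that produced it, so keys built under different specifications cannot be mixed.

```python
# Hypothetical single-level weight tables; not the CD 14651 default table.
# TABLE_A treats "aa" as two ordinary letters. TABLE_B treats "aa" as one
# collating element sorting after "z", as in traditional Danish ordering.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
TABLE_A = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
TABLE_B = dict(TABLE_A)
TABLE_B["aa"] = len(ALPHABET) + 1

def sort_key(text, table):
    """Build a precompiled key (a tuple of weights) under the given table."""
    key, i = [], 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in table:              # two-character collating element
            key.append(table[pair])
            i += 2
        else:
            key.append(table[text[i]])
            i += 1
    return tuple(key)

words = ["aalborg", "zebra", "abel"]
order_a = sorted(words, key=lambda w: sort_key(w, TABLE_A))
order_b = sorted(words, key=lambda w: sort_key(w, TABLE_B))
```

Under these assumptions the same word gets a different stored key under each table, and the two sorted lists differ, which is why a stored key is only usable together with the specification it was built from.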
Technical comments:
15. Page 5, first paragraph: It is not always required to transform, for
example, "4" into a number of strings; sometimes it is only necessary to
transform it into one string. Thus change "requires" to "may require" and
"is hence" to "may thus be".
16. Page 5, last paragraph and following paragraphs: Too much emphasis on the
precompiled sorted character string data type. This is not a general type,
as noted in our comment 2.
17. Page 8: Add "and several other" after "Scandinavian". This includes
languages like Polish, Finnish, Hungarian, Turkish, and many others.
18. Page 14: "subprogramme" - rather use the word "function". All APIs in this
standard are functions. All references to "subprogrammes" should be
changed to "functions" in the standard.
19. page 15, first paragraph: we recommend that only uppercase characters be
used in hexadecimal numbers, and this is also the specification in CD 14652.
20. Page 15, last paragraph: it seems to be a requirement that an LC_COLLATE
specification, like the default, can be tailored on the fly. This is not
recommendable, as it would take quite some processing time and thus delay
the processing considerably. On-the-fly tailoring should thus not be a
requirement.
21. Page 16, 5.1.1 last paragraph: use the name of the API (COMPCAR)
instead of the number "API 1".
22. Page 17, last paragraph: the names of the functions should be used for
the binding. Of course the names of the functions may vary for the
different programming languages, but the names are more than "only
indicative".
23. Page 20: The COMPCAR function seems to lack a result value indicating
whether the first string was lexicographically less than, equal to or greater
than the second string. We propose the values -1, 0 and 1 for the three
possibilities, in line with current C practice. Return values also seem to
be missing for the other functions.
24. Page 21: It should not be normatively required that COMPCAR be equivalent
to CARABIN and COMPBIN. CARABIN produces output that is not needed for
some uses of COMPCAR.
25. Page 21, last paragraph: It should not be prescribed that there be
binary strings used for comparisons, in the COMPCAR function. Also the
"default" table mentioned here is the global locale, and not the 14651 default.
This should be clarified, maybe using "global" instead of "default".
26. Page 22: all parameters should be spelled out, and references to other
APIs when defining the parameters should be avoided.
27. Page 25 second paragraph: the default table cannot be used per se, as it
needs tailoring. See our comment 6 on how to solve this.
28. Page 27, first paragraph: this description is very oriented towards
the binary sort string. Descriptions also valid for the COMPCAR method
without binary sort strings should be present. We would request a separate
description of how COMPCAR can be implemented, especially pointing out that
comparison of only the first (few) characters is necessary in many cases, and
that generating binary sort strings is typically not necessary.
29. Page 27: level 1: Some non-letters, for example Kana, may have more than
one character at the first level.
30. Page 27: note of 5.3.2.1: Combining accents may have ignore at level 1,
and then values at level 2. Should that not lead to full predictability?
31. Page 29: level one: Use the API names instead of "SUBPROGRAMME"
32. Page 29: what is the difference between level 2 and 4? In traditional
locale invocation there is not that difference, but some other
difference. Maybe level 4 should always be required.
33. Page 31: COLL_WEIGHT_MAX is not a directive of 14652.
34. Page 31: Some scripts are not (yet) in IS 10646, for example the Yi and
Canadian syllable scripts.
35. Page 31: We should ensure that comments are allowed by 14652 in all the
places where they are used here, and possibly change 14652 to allow them.
36. Page 41-51: a number of the symbols defined here are also defined later.
Example <a8> defined on page 46 and page 79. This is not allowed according
to 14652 (giving a symbol two weights).
37. Page 111: (4) There needs to be a strong warning that binary strings stored
cannot be used internationally for culturally correct sorting,
as they are stored in a localized form. Or we should simply advise against it.
38. Page 112: the text seems obsolete, as these concepts have been proven.
39. Page 115: Also list ISO/IEC 9945-2 POSIX shell and utilities, especially
annex G, as a source.
40. Page 118, paragraph 7: There is only a need for 4 levels, not 5.
41. Page 118, paragraph 7:
Is it necessary to have an extra level for 10646 conformance level 3? Maybe
in some cases but not generally. When sorting the combining characters
per se, there is no need for a further level.
42. Page 119, paragraph 9: We thought this was proven not to be true. Or is this
some implementation guideline (which then should be noted as such).
43. Page 120: Annex I should be explained further, especially how it fits into
the internationalization model.
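Comments 23 and 28 can be read together in a small sketch: a comparison that returns -1, 0 or 1 and stops at the first first-level difference, without building binary sort strings. The two-level weight table and the name `compare_strings` are hypothetical and are not the COMPCAR interface of the CD.

```python
from itertools import zip_longest

# Hypothetical two-level weights (base letter, accent); not the CD 14651 table.
WEIGHTS = {
    "a": (1, 0), "á": (1, 1), "b": (2, 0), "c": (3, 0),
    "e": (5, 0), "é": (5, 1),
}

def compare_strings(s1, s2, weights=WEIGHTS):
    """Return -1, 0 or 1 without building complete sort keys."""
    tie_break = 0
    for c1, c2 in zip_longest(s1, s2):
        if c1 is None:      # s1 is a proper prefix of s2 at the first level
            return -1
        if c2 is None:      # s2 is a proper prefix of s1 at the first level
            return 1
        (base1, accent1), (base2, accent2) = weights[c1], weights[c2]
        if base1 != base2:
            # First-level difference: decided after reading only a prefix.
            return -1 if base1 < base2 else 1
        if tie_break == 0 and accent1 != accent2:
            # Remember the first second-level difference for later.
            tie_break = -1 if accent1 < accent2 else 1
    return tie_break
```

In this sketch the second level is consulted only when the strings are equal at the first level, which is the usual multi-level rule; the direction in which accents are scanned is a tailoring choice that the sketch fixes as forward.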
_________________________end of Danish Comments _______________________
____________beginning of UK comments accompanying abstention vote ______
> N 2364 ISO/IEC CD 14651
>
> The UK ABSTAINS on this ballot, due to lack of participation in this area.
> The UK would however like to bring the following issues to the attention of
> SC 22 :
>
> - a tutorial on problems solved is inappropriate for an IS; either the
> document should be a TR or the tutorial moved to an appendix.
>
> - the statement on page 10 about information being obtainable from
> Alain LaBonte' is also inappropriate for a formal document.
>
>
> There are also a number of minor points:
>
> - there are a disturbingly high number of elementary typographical
> errors (e.g. p 18 'starings' (strings); 'compariosn', 'aat'; also mixed
> languages in chbin1, chbin2 heading). On page 19 there are French
> quotation marks rather than English ones.
>
> - p 25 there is a reference to section 5.8, which does not exist.
>
> - subprogramme is consistently spelled thus, although `subprogram' is
> the correct form in both US and UK (don't know about Canada, Australia
> etc).
______________________________end of UK comments ______________________
_____________beginning of Austrian comments accompanying negative _____
ON (the Austrian NB) votes NO on CD Ballot SC22 N2364
(CD 14651 - Information technology - International String
Ordering - Method for Comparing Character Strings and
Description of a Default Tailorable Ordering) with the
following comments:
(1) It seems doubtful (to say the least) that a reasonable
Default Ordering for all -- or even most -- of the languages
of the world can be found. Consequently, there is reason to
doubt the usefulness of the proposed International Standard.
(2) The "Tutorial" contained in the Introduction should be
moved to an informative annex; it should not remain in the
main part of the document which would have to be considered
normative.
(3) Even though there is a "Tutorial", the proposed methods
do not seem to be well explained. It could at least be
expected that one should be able to read and understand the
tables in Annex 1 without having to consult other sources.
For an example, see page 51 where a rather poor comment, in
itself encoded, supposedly explains the structure of the
following tables by cryptically stating:
"% <Uxxxx> <Base>;<Accent>;<Case>;<Special>"
The sudden change of typeface on the same page seems equally
confusing and unmotivated (except possibly by line length).
Also, it seems that a more detailed description of a
possible practical implementation could prove helpful.
(4) The "Benchmark" in Annex 2 adds to the general confusion
by showing the "sorted" version to be (in excerpt):
"vice-president's"
"offices"
"vice-presidents'"
"offices"
The problem obviously lies in automatic line breaks and can
easily be corrected, but seems to raise the question whether
similar errors have been introduced in areas which are very
difficult -- if not impossible -- to check. To mention the
most prominent example, some errors in Annex 1 might never
be found because this part of the document can hardly be
checked exhaustively.
(5) It is rather difficult to determine the necessity of
text that is not present. ON does therefore not feel able
to decide on Annexes F, G, and H.
(6) The document has obviously been translated from French
to English, which would not be a problem if the process had
been completed. For a counterexample see the description
of procedures chbin1 and chbin2 on page 18. Also, the name
of procedure sign_espace (on page 19) seems to be partially
French.
(7) The document does not appear to have been spell-checked.
Some examples:
p. 19: "precedenceof" should be "precedence of"
p.109: "deafult" should be "default"
p.114: "standaredized" should be "standardized"
(8) Anticipating the answer that ON experts should actively
participate in the process of correction and development of
the document in question, ON states that expert resources
in this area are too limited at this time. However, this
does not imply that any document can be accepted. Sorry.
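As an aid to reading comment (3), the following is a minimal sketch of how one table entry of the quoted shape "<Uxxxx> <Base>;<Accent>;<Case>;<Special>" could be split into its parts. The function name `parse_entry` and the field labels are illustrative assumptions; the labels simply name the four positions in the quoted comment line, and the sample entry is taken from the Japanese attachment later in this document.

```python
import re

# Labels follow the quoted comment line; they only name the four positions.
LEVELS = ("base", "accent", "case", "special")
ENTRY = re.compile(r"^<U([0-9A-Fa-f]{4,})>\s+(\S+)")

def parse_entry(line):
    """Split one table entry into its character and four level weights."""
    body = line.split("%", 1)[0].strip()    # drop the trailing comment
    match = ENTRY.match(body)
    if match is None:
        return None                         # comment-only or blank line
    weights = match.group(2).split(";")
    if len(weights) != len(LEVELS):
        raise ValueError(f"expected {len(LEVELS)} weights: {line!r}")
    entry = {"char": chr(int(match.group(1), 16))}
    entry.update(zip(LEVELS, weights))
    return entry

entry = parse_entry("<U3042> <kn-a>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn A")
```

A line that consists only of a comment, such as the quoted header itself, yields `None` in this sketch.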
___________________ end of Austrian comments _______________________
________beginning of Japanese comments accompanying negative _______
Japan disapproves CD 14651 proposed in SC22 N2364.
The CD is not mature enough to proceed to DIS from the viewpoint of
completeness as a JTC1 standard, for the following reasons:
- it is not yet tuned precisely enough from a technical viewpoint,
- there is still no consensus on the expected ordering result,
- it depends heavily on ISO/IEC 14652, which is not yet at CD stage, and
- the style of the document does not meet the JTC1 requirements.
Therefore, because of high dependency of this CD on ISO/IEC 14652, Japan
requests to wait and synchronize the review and ballot of CD 14651 until
CD 14652 is registered, or to change the scope of the standard to
"ordering result" only and move API part to i18n API project.
Thus, Japan sees absolutely no reason why we need to proceed to DIS now.
Comment detail.
1. Style (major editorial)
The CD is very different from what the ISO/JTC1 directives require (and
also different from the template provided by ITTF and from many JTC1
standards). For example, there is a very high dependency on font selection
(use of bold, slant, point size variation and/or unnecessary typeface
mixture is prohibited). The Definitions clause needs a sub-clause
for each term, and the annexes need two groups -- one normative and one
informative. Review and rewrite all text according to the ISO/JTC1 directives
and the template supplied by ITTF.
2. Relation with ISO/IEC 14652. (General process)
The syntax and semantics of Annex 1 are not defined in this draft and
depend on ISO/IEC 14652, which is not available yet. Synchronize the
project with ISO/IEC 14652 development -- wait for a decision at least until
CD 14652 is available, or, if that is not accepted, move the related part of
ISO/IEC 14652 into this CD.
3. Tutorial (major editorial)
A heavy tutorial clause at the beginning is not appropriate; move it to an
appropriate place and rewrite it to fit the new place. In addition,
there is much "information only" text in the main clauses (such as clause
5.3). Remove it from the main (and mostly normative) part of the
standard, and place it (if really necessary) in an appropriate related
place.
4. Scope (major technical)
Describe what this standard defines clearly and in a straightforward way.
For example, change the word "a method" to a much clearer, specific word (which
is API). Once the above change is made, it may affect the title of the
standard. Also the phrase "Default Tailorable Ordering" does not have a
logical meaning. One possibility for the new title would be "API with
default order for International string ordering".
The last part of the 2nd bullet (on an order which is culturally---of that
script) should be removed because an "order which is acceptable culturally"
is not in the scope of this standard. This part should be rewritten as
something like
"The default order is aiming for easy understanding of non-casual user of
the script, cultural correctness/acceptance is not a purpose of the
default order. The correctness/acceptance by the casual (or native) user
to be provided by tailoring by the user or as a country profile".
Rationale: The above has been the agreement on the project scope from the
beginning. There were many discussions of the impracticality of having a
single default order that could satisfy all cultures. The conclusion
was that it is not practical to have such an ideal default order, and it
was said that "this is why tailoring is needed". Japan, then, did not
request cultural correctness for ordering. The same holds for French: since
French ordering is so sophisticated that no outsider understands it easily,
it is not practical to use the true French order as the international
default order, as it may cause misunderstanding among people of other
cultures. Such sophisticated ordering (such as French) can be
satisfactorily supported by tailoring anyway. (See clause 4.2.7 of DTR
11017. This IS is not i18n per 4.2.6 nor 4.2.4. This IS is aiming at 4.2.7.)
5. Definitions (major technical)
5-1. Each definition should have a separate sub-clause number.
5-2. API: The initial text of "for purpose of..... standard" is not
necessary.
5-3. equivalence: Too long; reduce it to about 1/3 by eliminating the
"informative" text within this definition (for example, the last 4 lines).
5-4. field, first order token, fourth order token level, level, second
order token, transformation, third order token: Eliminate "informative"
explanations.
5-5. posthandling, prehandling: These definitions should be moved to the
related clause.
5-6 telephone-book-type transformation: This term need not be defined
in Definitions because it appears only once in Introduction (5th para.,
Page 5). Although Japan considers that the paragraph is understandable
in itself, we propose to change the first sentence to:
More generally, specific requirements exist for a kind of complex
transformation
-- e.g. phonetic transformation adopted in some telephone-book systems
because the meaning of telephone-book ordering differs from culture to
culture, so the current wording may confuse the user.
6. Conformance (major technical)
6-1. The conformance clause(s) should come after the scope clause; it should
not come after the requirements clause. The current location of the
conformance clause makes it difficult to understand each conformance level
clearly.
Reason (rationale) why the conformance clause should be clause 2:
If the requirements are simple and no levels are employed, the conformance
clause can be in any place in theory. Note that ISO/IEC directives part 3
does not even require a "conformance clause". However, in the case of ISO/IEC
CD 14651, the condition is different: it should be clause 2.
14651 is a very complicated multilevel standard, so the scope clause
cannot cover all that a "scope" clause should say. The conformance clause, in
particular the clean and clear "levels" descriptions, acts in
reality as a sub-scope clause as well as a real conformance description.
If it does not come after the "scope" clause, it is almost impossible for the
user of the standard to understand "what is defined in this standard and
how to read the standard efficiently and accurately".
6-2. The conformance clause should have exact pointer(s) to the conformance
requirements (clause and sub-clause numbers). Umbrella conformance for
requirements buried within the main clauses (as in this CD) should not be
used. (The current CD is too unkind to the reader.)
6-3. In the case of leveled conformance, provide a sub-clause to explain
what those levels are in a much more straightforward way. (There are too
many indirect explanations now.)
6-3-1. Conformance level-1 should be defined as "Generic API only", and
should not make some of the parameters an "option". The option causes
incompatibility problems between conforming level-1 APIs. Further,
define two options (not parameter options), one for COMPCAR and another
for COMPBIN + CARABIN.
6-3-2 Conformance level-2 should be defined and stated as "Generic API
and table format".
6-3-3 Conformance level-3: Change prehandling to a normative requirement for
string input. Thus prehandling is out of the scope of this standard
(remove 5.1.2 at least). Then change the description of this
conformance level accordingly. By the way, in the current text, a normative
clause (5.1.2) refers to an informative annex. This is a prohibited practice.
6-3-4 Conformance level-4: Remove the word "possibility". The
result might then be "Add API an access method for specific table".
6-4. Add a concept of conformance for "ordering result only".
6-5 Add a method to specify partial conformance of the ordering result. For
example, a method to state "everything but the Japanese repertoire
conforms to this default order and the Japanese repertoire is per JIS" would
be a real-life use of this standard (as one subset of the ordering
result only conformance).
6-6. Add a method to swap the order of the scripts, while the order within
each script still conforms to the default order.
6-7. Add a method to state that only selected scripts in comment 6-6
conform to the default order.
6-8. Maintain compatibility with POSIX and C. Providing an independent
conformance level may be one of the choices for responding to this comment.
6-9. Remove all "best guess" dependency. Write exactly what is needed. For
example, there is no description of what the "default order" is. There is a
default table and APIs (and conformance levels), so the best guess may be to
use the "default table" with the APIs.
7. Requirements (major technical)
7-1. There are many options within one conformance level; those should be
separate levels of conformance if they are really necessary.
7-2. The "Toggle" mechanism, which is realized by the parameters
"order_accent", "order_case" and "sign_espace", should be removed
because:
1) it contradicts the concept of the locale mechanism -- it allows an
ordering regardless of the ordering table defined as a locale,
2) the concepts of "case" and "accents" are specific to some scripts and
they are not defined in this draft, where these script-dependent concepts
have been resolved into universal rules in tables.
Instead of the current "Toggle" mechanism, Japan proposes to reconsider
the specification of ordering tables, which will be defined in ISO/IEC
14652, so as to enable variants of the default table to be defined more
flexibly -- for example, by introducing some preprocessing elements
#define ...
#ifdef ...
#include ...
etc.
7-3. table
Specifying the name of an ordering table in COMPCAR and CARABIN as a
parameter "table" will put a heavy burden on implementations. At run time,
on every call the processes COMPCAR and CARABIN would have to check whether
the table has changed from that of the previous call and/or whether the
table needs to be compiled.
There are two alternatives to this problem:
1) to remove the parameter "table" from the two processes and define a
new process "set_collating_table" which has a parameter "table",
2) to define a new process "open_table" which has an input parameter
"table" and returns a pointer to a protected structure derived from that
"table", while the parameter "table" in the two processes is changed to
"table_pointer".
7-4 "chbin1" and "chbin2" in COMPCAR are not necessary. Furthermore,
options within an API specification do not make any sense at all.
7-5. The whole contents of 5.3 should be removed or put into an
informative annex because those contents are to be defined in ISO/IEC
14652 in the current framework.
7-6. Add text for the case where characters are not encoded in ISO/IEC
10646. Some character sets, e.g. ISO 6937, are not in ISO/IEC 10646, and
some do not (yet) have a conversion table to (or the same character names
as) ISO/IEC 10646.
8. Data table (such as Annex A) (major technical)
8-1. Japan confirms the principles of the default order table as follows:
- The default order is friendly to non-native users (easy to understand,
simple rules, few exceptions).
- Cultural correctness for the native user of the script should be achieved
by tailoring. APIs and the data format should have enough room for the
necessary tailoring.
- Therefore, cultural correctness of the default order is not a goal of
this standard. Based on the principle above, the Japanese proposal on
Japanese scripts is not correct from the Japanese point of view; however, it
is easy for people who are not familiar with Japanese scripts.
8-2 Collation for HIRAGANA and KATAKANA
Japan proposes to add the attached set of collating rules for HIRAGANA and
KATAKANA.
The order defined in the Attachment differs from the one defined in JIS X
4061, which was published in February 1997. The main difference is in the
handling of the prolonged sound mark <U30FC>. Roughly speaking, JIS X 4061
replaces the prolonged sound mark with the vowel of the most recent
letter, while the Attachment at first neglects the prolonged sound mark, in
the same way as a hyphen.
The second difference is the handling of the iteration marks <U309D>, <U309E>,
<U30FD>, <U30FE>. Roughly speaking, JIS X 4061 replaces the iteration
marks with the most recent KANA letter, while the Attachment handles the
iteration marks as they are.
The reasons for proposing the Attachment are as follows:
1) JIS X 4061 cannot be realized by an LC_COLLATE representation
unless some rules using regular expressions, which would put a heavy
burden on implementations, are introduced,
2) the ordering results of JIS X 4061 are hard to understand for
foreigners without knowledge of how letter sequences are
pronounced -- it is not cross-culture friendly,
3) the ordering results of the Attachment are easy to understand for
foreigners without knowledge of the pronunciation of letter sequences,
and even in Japan a number of encyclopedias order their items in
the same way as the Attachment does -- it is cross-culture friendly.
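The two treatments contrasted in 8-2 can be sketched roughly as follows. This is a simplified illustration of the "roughly speaking" descriptions above, not an implementation of JIS X 4061 or of the Attachment: the function names are hypothetical, only the unvoiced iteration marks <U309D> and <U30FD> are handled, and the vowel of a kana letter is read off the last letter of its UCS character name.

```python
import unicodedata

PROLONGED = "\u30FC"                # KATAKANA-HIRAGANA PROLONGED SOUND MARK
ITERATION = {"\u309D", "\u30FD"}    # unvoiced iteration marks only
KANA_PREFIXES = ("HIRAGANA LETTER", "KATAKANA LETTER")

def expand_marks(text):
    """Rewrite marks from the preceding kana (rough JIS X 4061-style view)."""
    out = []
    for ch in text:
        prev = out[-1] if out else ""
        name = unicodedata.name(prev, "") if prev else ""
        is_kana = name.startswith(KANA_PREFIXES)
        if ch == PROLONGED and is_kana and name[-1] in "AIUEO":
            script = name.split()[0]          # "HIRAGANA" or "KATAKANA"
            out.append(unicodedata.lookup(f"{script} LETTER {name[-1]}"))
        elif ch in ITERATION and is_kana:
            out.append(prev)                  # repeat the most recent kana
        else:
            out.append(ch)
    return "".join(out)

def ignore_prolonged(text):
    """Attachment-style first pass: the prolonged sound mark is skipped."""
    return text.replace(PROLONGED, "")
```

The first function needs to look at the preceding letter, which is the context dependence that reason 1) above says an LC_COLLATE representation cannot express directly; the second is a plain per-character rule.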
8-3 Consideration on Compatibility characters of ISO/IEC 10646.
Consideration of the compatibility characters is missing. At least the
following are needed.
8-3-1 UFF00-FF9F, FFE0-FFE8
Handle those characters the same as the equivalent characters in the A-zone.
8-3-2 F900-FA0D, FA10, FA12, FA15-FA1E, FA20, FA22, FA25, FA26,
FA2A-FA2D of ISO/IEC 10646-1
Handle those characters the same as the equivalent characters in the I-zone.
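One way to approximate the handling requested in 8-3 is to fold compatibility characters to their ordinary equivalents before weights are looked up. The sketch below uses Unicode compatibility normalization (NFKC) from Python's standard `unicodedata` module; this technique is an assumption of the example and is not a mechanism described in CD 14651, and the helper name `fold_compatibility` is hypothetical.

```python
import unicodedata

def fold_compatibility(text):
    """Replace compatibility characters by their ordinary equivalents (NFKC)."""
    return unicodedata.normalize("NFKC", text)

halfwidth_ka = "\uFF76"         # HALFWIDTH KATAKANA LETTER KA
halfwidth_ga = "\uFF76\uFF9E"   # halfwidth KA followed by halfwidth voiced mark
fullwidth_a = "\uFF21"          # FULLWIDTH LATIN CAPITAL LETTER A
compat_ideograph = "\uF900"     # CJK COMPATIBILITY IDEOGRAPH-F900

folded = [fold_compatibility(s) for s in
          (halfwidth_ka, halfwidth_ga, fullwidth_a, compat_ideograph)]
```

After folding, a single set of weights for the ordinary characters covers the compatibility forms as well, at the cost of losing the distinction unless it is recorded at a later level.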
8-4 FA0E, FA0F, FA11, FA13, FA14, FA1F, FA21, FA23, FA24, FA27-FA29 of
ISO/IEC 10646-1 and future additions of CJK ideographs (ext-A and B).
Merge them with the I-zone characters according to a defined rule. Provide an
informative annex which describes the rule (radical, number of strokes and
so on).
8-5 Character combination type symbols.
Characters which are made up of a combination of two or more
Japanese characters, such as 3300-336F, should be handled as if they were
strings of independent characters.
8-6. Symbols of character(s) and symbol(s)
Symbols with character(s) should be handled by one of the following methods.
a) Character(s) and symbol(s) that are a "short form" of normal writing, such
as 2480, which looks like "( 13 )": split the symbol as if it were a
normal string.
b) Character(s) and symbol(s) that cannot be split into one unambiguous
sequence, such as 2470, where the circle can be either before or after the
character(s) 17: handle as if it were a special form of the character(s)
part of the symbol.
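The distinction drawn in 8-5 and 8-6 can be inspected with the compatibility decomposition data that ships with Python's `unicodedata` module. The sketch below is an illustration only: the helper name `split_symbol` is hypothetical, and using the decomposition tag to tell case a) from case b) is an assumption of this example, not a rule of the CD.

```python
import unicodedata

def split_symbol(ch):
    """Return (decomposition tag, decomposed string) for one character."""
    raw = unicodedata.decomposition(ch)     # e.g. "<compat> 0028 0031 ..."
    if not raw.startswith("<"):
        return None, ch                     # no compatibility decomposition
    tag, *codes = raw.split()
    return tag, "".join(chr(int(code, 16)) for code in codes)

parenthesized_13 = split_symbol("\u2480")   # case a): splits into a string
circled_17 = split_symbol("\u2470")         # case b): special form of "17"
square_apaato = split_symbol("\u3300")      # 8-5: string of kana letters
```

In this data the parenthesized form carries its parentheses in the decomposition, while the circled form decomposes to the digits alone with a tag saying how they were styled, which matches the split between methods a) and b).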
8-7. Symbols for making combining sequences, such as 20E0.
Follow the rule proposed in 8-6 above; the process might be different
from the method for combining sequences.
8-8. Japan expects many countries to have the same kinds of comments as
above. Japan requests, therefore, that a confirmation specific to the data
table be circulated to all JTC1 member countries (not only SC22 P-members)
for review.
9. Other comments
Japan recognizes many editorial issues as well as technical issues which
are not in this ballot comment; too many major technical comments (and
maybe more to come) have not left us time to scan all of them.
Japan thinks minor editorial comments are unnecessary in
this ballot comment because of the immaturity of CD 14651.
In any case, the text should be totally rewritten for full acceptance of the
technical comments.
------- ATTACHMENT ---------
%level 1
<kn-dot>
<kn-prolong>
<kn-a>
<kn-i>
<kn-u>
<kn-e>
<kn-o>
<kn-ka>
<kn-ki>
<kn-ku>
<kn-ke>
<kn-ko>
<kn-sa>
<kn-si>
<kn-su>
<kn-se>
<kn-so>
<kn-ta>
<kn-ti>
<kn-tu>
<kn-te>
<kn-to>
<kn-na>
<kn-ni>
<kn-nu>
<kn-ne>
<kn-no>
<kn-ha>
<kn-hi>
<kn-hu>
<kn-he>
<kn-ho>
<kn-ma>
<kn-mi>
<kn-mu>
<kn-me>
<kn-mo>
<kn-ya>
<kn-yu>
<kn-yo>
<kn-ra>
<kn-ri>
<kn-ru>
<kn-re>
<kn-ro>
<kn-wa>
<kn-wi>
<kn-we>
<kn-wo>
<kn-n>
<kn-cmb-voice>
<kn-cmb-semivoice>
<kn-voice>
<kn-semivoice>
<kn-iter>
%level 2
<kn-HIRA>
<kn-KATA>
%level 3
<kn-SMALL>
<kn-SMALL-HF> % for Table 122
<kn-NORMAL>
<kn-NORMAL-HF> % for Table 122
<kn-VOICED>
<kn-SEMIVOICED>
% abbreviations in comments -- UCS Name:
% HIRAkn -- HIRAGANA LETTER
% KATAkn -- KATAKANA LETTER
% hfwd -- HALFWIDTH
% voice -- KATAKANA-HIRAGANA VOICED SOUND MARK
% semi-voice -- KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK
% iter -- ITERATION
% comb -- COMBINING
% prolong -- KATAKANA-HIRAGANA PROLONGED SOUND MARK
%
<U3041> <kn-a>;<kn-HIRA>;<kn-SMALL>;<IGNORE> % HIRAkn SMALL A
<U3042> <kn-a>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn A
<U30A1> <kn-a>;<kn-KATA>;<kn-SMALL>;<IGNORE> % KATAkn SMALL A
<UFF67> <kn-a>;<kn-KATA>;<kn-SMALL-HF>;<IGNORE> % hfwd KATAkn SMALL A
<U30A2> <kn-a>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn A
<UFF71> <kn-a>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn A
%
<U3043> <kn-i>;<kn-HIRA>;<kn-SMALL>;<IGNORE> % HIRAkn SMALL I
<U3044> <kn-i>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn I
<U30A3> <kn-i>;<kn-KATA>;<kn-SMALL>;<IGNORE> % KATAkn SMALL I
<UFF68> <kn-i>;<kn-KATA>;<kn-SMALL-HF>;<IGNORE> % hfwd KATAkn SMALL I
<U30A4> <kn-i>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn I
<UFF72> <kn-i>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn I
%
<U3045> <kn-u>;<kn-HIRA>;<kn-SMALL>;<IGNORE> % HIRAkn SMALL U
<U3046> <kn-u>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn U
<U3094> <kn-u>;<kn-HIRA>;<kn-VOICED>;<IGNORE> % HIRAkn VU
<U30A5> <kn-u>;<kn-KATA>;<kn-SMALL>;<IGNORE> % KATAkn SMALL U
<UFF69> <kn-u>;<kn-KATA>;<kn-SMALL-HF>;<IGNORE> % hfwd KATAkn SMALL U
<U30A6> <kn-u>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn U
<UFF73> <kn-u>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn U
<U30F4> <kn-u>;<kn-KATA>;<kn-VOICED>;<IGNORE> % KATAkn VU
%
<U3047> <kn-e>;<kn-HIRA>;<kn-SMALL>;<IGNORE> % HIRAkn SMALL E
<U3048> <kn-e>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn E
<U30A7> <kn-e>;<kn-KATA>;<kn-SMALL>;<IGNORE> % KATAkn SMALL E
<UFF6A> <kn-e>;<kn-KATA>;<kn-SMALL-HF>;<IGNORE> % hfwd KATAkn SMALL E
<U30A8> <kn-e>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn E
<UFF74> <kn-e>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn E
%
<U3049> <kn-o>;<kn-HIRA>;<kn-SMALL>;<IGNORE> % HIRAkn SMALL O
<U304A> <kn-o>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn O
<U30A9> <kn-o>;<kn-KATA>;<kn-SMALL>;<IGNORE> % KATAkn SMALL O
<UFF6B> <kn-o>;<kn-KATA>;<kn-SMALL-HF>;<IGNORE> % hfwd KATAkn SMALL O
<U30AA> <kn-o>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn O
<UFF75> <kn-o>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn O
%
<U304B> <kn-ka>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn KA
<U304C> <kn-ka>;<kn-HIRA>;<kn-VOICED>;<IGNORE> % HIRAkn GA
<U30F5> <kn-ka>;<kn-KATA>;<kn-SMALL>;<IGNORE> % KATAkn SMALL KA
<U30AB> <kn-ka>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn KA
<UFF76> <kn-ka>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn KA
<U30AC> <kn-ka>;<kn-KATA>;<kn-VOICED>;<IGNORE> % KATAkn GA
%
<U304D> <kn-ki>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn KI
<U304E> <kn-ki>;<kn-HIRA>;<kn-VOICED>;<IGNORE> % HIRAkn GI
<U30AD> <kn-ki>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn KI
<UFF77> <kn-ki>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn KI
<U30AE> <kn-ki>;<kn-KATA>;<kn-VOICED>;<IGNORE> % KATAkn GI
%
<U304F> <kn-ku>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn KU
<U3050> <kn-ku>;<kn-HIRA>;<kn-VOICED>;<IGNORE> % HIRAkn GU
<U30AF> <kn-ku>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn KU
<UFF78> <kn-ku>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn KU
<U30B0> <kn-ku>;<kn-KATA>;<kn-VOICED>;<IGNORE> % KATAkn GU
%
<U3051> <kn-ke>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn KE
<U3052> <kn-ke>;<kn-HIRA>;<kn-VOICED>;<IGNORE> % HIRAkn GE
<U30F6> <kn-ke>;<kn-KATA>;<kn-SMALL>;<IGNORE> % KATAkn SMALL KE
<U30B1> <kn-ke>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn KE
<UFF79> <kn-ke>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn KE
<U30B2> <kn-ke>;<kn-KATA>;<kn-VOICED>;<IGNORE> % KATAkn GE
%
<U3053> <kn-ko>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn KO
<U3054> <kn-ko>;<kn-HIRA>;<kn-VOICED>;<IGNORE> % HIRAkn GO
<U30B3> <kn-ko>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn KO
<UFF7A> <kn-ko>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn KO
<U30B4> <kn-ko>;<kn-KATA>;<kn-VOICED>;<IGNORE> % KATAkn GO
%
<U3055> <kn-sa>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn SA
<U3056> <kn-sa>;<kn-HIRA>;<kn-VOICED>;<IGNORE> % HIRAkn ZA
<U30B5> <kn-sa>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn SA
<UFF7B> <kn-sa>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn SA
<U30B6> <kn-sa>;<kn-KATA>;<kn-VOICED>;<IGNORE> % KATAkn ZA
%
<U3057> <kn-si>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn SI
<U3058> <kn-si>;<kn-HIRA>;<kn-VOICED>;<IGNORE> % HIRAkn ZI
<U30B7> <kn-si>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn SI
<UFF7C> <kn-si>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn SI
<U30B8> <kn-si>;<kn-KATA>;<kn-VOICED>;<IGNORE> % KATAkn ZI
%
<U3059> <kn-su>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn SU
<U305A> <kn-su>;<kn-HIRA>;<kn-VOICED>;<IGNORE> % HIRAkn ZU
<U30B9> <kn-su>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn SU
<UFF7D> <kn-su>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn SU
<U30BA> <kn-su>;<kn-KATA>;<kn-VOICED>;<IGNORE> % KATAkn ZU
%
<U305B> <kn-se>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn SE
<U305C> <kn-se>;<kn-HIRA>;<kn-VOICED>;<IGNORE> % HIRAkn ZE
<U30BB> <kn-se>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn SE
<UFF7E> <kn-se>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn SE
<U30BC> <kn-se>;<kn-KATA>;<kn-VOICED>;<IGNORE> % KATAkn ZE
%
<U305D> <kn-so>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn SO
<U305E> <kn-so>;<kn-HIRA>;<kn-VOICED>;<IGNORE> % HIRAkn ZO
<U30BD> <kn-so>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn SO
<UFF7F> <kn-so>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn SO
<U30BE> <kn-so>;<kn-KATA>;<kn-VOICED>;<IGNORE> % KATAkn ZO
%
<U305F> <kn-ta>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn TA
<U3060> <kn-ta>;<kn-HIRA>;<kn-VOICED>;<IGNORE> % HIRAkn DA
<U30BF> <kn-ta>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn TA
<UFF80> <kn-ta>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn TA
<U30C0> <kn-ta>;<kn-KATA>;<kn-VOICED>;<IGNORE> % KATAkn DA
%
<U3061> <kn-ti>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn TI
<U3062> <kn-ti>;<kn-HIRA>;<kn-VOICED>;<IGNORE> % HIRAkn DI
<U30C1> <kn-ti>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn TI
<UFF81> <kn-ti>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn TI
<U30C2> <kn-ti>;<kn-KATA>;<kn-VOICED>;<IGNORE> % KATAkn DI
%
<U3063> <kn-tu>;<kn-HIRA>;<kn-SMALL>;<IGNORE> % HIRAkn SMALL TU
<U3064> <kn-tu>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn TU
<U3065> <kn-tu>;<kn-HIRA>;<kn-VOICED>;<IGNORE> % HIRAkn DU
<U30C3> <kn-tu>;<kn-KATA>;<kn-SMALL>;<IGNORE> % KATAkn SMALL TU
<UFF6F> <kn-tu>;<kn-KATA>;<kn-SMALL-HF>;<IGNORE> % hfwd KATAkn SMALL TU
<U30C4> <kn-tu>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn TU
<UFF82> <kn-tu>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn TU
<U30C5> <kn-tu>;<kn-KATA>;<kn-VOICED>;<IGNORE> % KATAkn DU
%
<U3066> <kn-te>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn TE
<U3067> <kn-te>;<kn-HIRA>;<kn-VOICED>;<IGNORE> % HIRAkn DE
<U30C6> <kn-te>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn TE
<UFF83> <kn-te>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn TE
<U30C7> <kn-te>;<kn-KATA>;<kn-VOICED>;<IGNORE> % KATAkn DE
%
<U3068> <kn-to>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn TO
<U3069> <kn-to>;<kn-HIRA>;<kn-VOICED>;<IGNORE> % HIRAkn DO
<U30C8> <kn-to>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn TO
<UFF84> <kn-to>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn TO
<U30C9> <kn-to>;<kn-KATA>;<kn-VOICED>;<IGNORE> % KATAkn DO
%
<U306A> <kn-na>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn NA
<U30CA> <kn-na>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn NA
<UFF85> <kn-na>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn NA
%
<U306B> <kn-ni>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn NI
<U30CB> <kn-ni>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn NI
<UFF86> <kn-ni>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn NI
%
<U306C> <kn-nu>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn NU
<U30CC> <kn-nu>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn NU
<UFF87> <kn-nu>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn NU
%
<U306D> <kn-ne>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn NE
<U30CD> <kn-ne>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn NE
<UFF88> <kn-ne>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn NE
%
<U306E> <kn-no>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn NO
<U30CE> <kn-no>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn NO
<UFF89> <kn-no>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn NO
%
<U306F> <kn-ha>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn HA
<U3070> <kn-ha>;<kn-HIRA>;<kn-VOICED>;<IGNORE> % HIRAkn BA
<U3071> <kn-ha>;<kn-HIRA>;<kn-SEMIVOICED>;<IGNORE> % HIRAkn PA
<U30CF> <kn-ha>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn HA
<UFF8A> <kn-ha>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn HA
<U30D0> <kn-ha>;<kn-KATA>;<kn-VOICED>;<IGNORE> % KATAkn BA
<U30D1> <kn-ha>;<kn-KATA>;<kn-SEMIVOICED>;<IGNORE> % KATAkn PA
%
<U3072> <kn-hi>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn HI
<U3073> <kn-hi>;<kn-HIRA>;<kn-VOICED>;<IGNORE> % HIRAkn BI
<U3074> <kn-hi>;<kn-HIRA>;<kn-SEMIVOICED>;<IGNORE> % HIRAkn PI
<U30D2> <kn-hi>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn HI
<UFF8B> <kn-hi>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn HI
<U30D3> <kn-hi>;<kn-KATA>;<kn-VOICED>;<IGNORE> % KATAkn BI
<U30D4> <kn-hi>;<kn-KATA>;<kn-SEMIVOICED>;<IGNORE> % KATAkn PI
%
<U3075> <kn-hu>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn HU
<U3076> <kn-hu>;<kn-HIRA>;<kn-VOICED>;<IGNORE> % HIRAkn BU
<U3077> <kn-hu>;<kn-HIRA>;<kn-SEMIVOICED>;<IGNORE> % HIRAkn PU
<U30D5> <kn-hu>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn HU
<UFF8C> <kn-hu>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn HU
<U30D6> <kn-hu>;<kn-KATA>;<kn-VOICED>;<IGNORE> % KATAkn BU
<U30D7> <kn-hu>;<kn-KATA>;<kn-SEMIVOICED>;<IGNORE> % KATAkn PU
%
<U3078> <kn-he>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn HE
<U3079> <kn-he>;<kn-HIRA>;<kn-VOICED>;<IGNORE> % HIRAkn BE
<U307A> <kn-he>;<kn-HIRA>;<kn-SEMIVOICED>;<IGNORE> % HIRAkn PE
<U30D8> <kn-he>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn HE
<UFF8D> <kn-he>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn HE
<U30D9> <kn-he>;<kn-KATA>;<kn-VOICED>;<IGNORE> % KATAkn BE
<U30DA> <kn-he>;<kn-KATA>;<kn-SEMIVOICED>;<IGNORE> % KATAkn PE
%
<U307B> <kn-ho>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn HO
<U307C> <kn-ho>;<kn-HIRA>;<kn-VOICED>;<IGNORE> % HIRAkn BO
<U307D> <kn-ho>;<kn-HIRA>;<kn-SEMIVOICED>;<IGNORE> % HIRAkn PO
<U30DB> <kn-ho>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn HO
<UFF8E> <kn-ho>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn HO
<U30DC> <kn-ho>;<kn-KATA>;<kn-VOICED>;<IGNORE> % KATAkn BO
<U30DD> <kn-ho>;<kn-KATA>;<kn-SEMIVOICED>;<IGNORE> % KATAkn PO
%
<U307E> <kn-ma>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn MA
<U30DE> <kn-ma>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn MA
<UFF8F> <kn-ma>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn MA
%
<U307F> <kn-mi>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn MI
<U30DF> <kn-mi>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn MI
<UFF90> <kn-mi>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn MI
%
<U3080> <kn-mu>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn MU
<U30E0> <kn-mu>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn MU
<UFF91> <kn-mu>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn MU
%
<U3081> <kn-me>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn ME
<U30E1> <kn-me>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn ME
<UFF92> <kn-me>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn ME
%
<U3082> <kn-mo>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn MO
<U30E2> <kn-mo>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn MO
<UFF93> <kn-mo>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn MO
%
<U3083> <kn-ya>;<kn-HIRA>;<kn-SMALL>;<IGNORE> % HIRAkn SMALL YA
<U3084> <kn-ya>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn YA
<UFF6C> <kn-ya>;<kn-KATA>;<kn-SMALL-HF>;<IGNORE> % hfwd KATAkn SMALL YA
<U30E3> <kn-ya>;<kn-KATA>;<kn-SMALL>;<IGNORE> % KATAkn SMALL YA
<U30E4> <kn-ya>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn YA
<UFF94> <kn-ya>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn YA
%
<U3085> <kn-yu>;<kn-HIRA>;<kn-SMALL>;<IGNORE> % HIRAkn SMALL YU
<U3086> <kn-yu>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn YU
<U30E5> <kn-yu>;<kn-KATA>;<kn-SMALL>;<IGNORE> % KATAkn SMALL YU
<UFF6D> <kn-yu>;<kn-KATA>;<kn-SMALL-HF>;<IGNORE> % hfwd KATAkn SMALL YU
<U30E6> <kn-yu>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn YU
<UFF95> <kn-yu>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn YU
%
<U3087> <kn-yo>;<kn-HIRA>;<kn-SMALL>;<IGNORE> % HIRAkn SMALL YO
<U3088> <kn-yo>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn YO
<U30E7> <kn-yo>;<kn-KATA>;<kn-SMALL>;<IGNORE> % KATAkn SMALL YO
<UFF6E> <kn-yo>;<kn-KATA>;<kn-SMALL-HF>;<IGNORE> % hfwd KATAkn SMALL YO
<U30E8> <kn-yo>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn YO
<UFF96> <kn-yo>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn YO
%
<U3089> <kn-ra>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn RA
<U30E9> <kn-ra>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn RA
<UFF97> <kn-ra>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn RA
%
<U308A> <kn-ri>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn RI
<U30EA> <kn-ri>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn RI
<UFF98> <kn-ri>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn RI
%
<U308B> <kn-ru>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn RU
<U30EB> <kn-ru>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn RU
<UFF99> <kn-ru>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn RU
%
<U308C> <kn-re>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn RE
<U30EC> <kn-re>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn RE
<UFF9A> <kn-re>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn RE
%
<U308D> <kn-ro>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn RO
<U30ED> <kn-ro>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn RO
<UFF9B> <kn-ro>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn RO
%
<U308E> <kn-wa>;<kn-HIRA>;<kn-SMALL>;<IGNORE> % HIRAkn SMALL WA
<U308F> <kn-wa>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn WA
<U30EE> <kn-wa>;<kn-KATA>;<kn-SMALL>;<IGNORE> % KATAkn SMALL WA
<U30EF> <kn-wa>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn WA
<UFF9C> <kn-wa>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn WA
<U30F7> <kn-wa>;<kn-KATA>;<kn-VOICED>;<IGNORE> % KATAkn VA
%
<U3090> <kn-wi>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn WI
<U30F0> <kn-wi>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn WI
<U30F8> <kn-wi>;<kn-KATA>;<kn-VOICED>;<IGNORE> % KATAkn VI
%
<U3091> <kn-we>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn WE
<U30F1> <kn-we>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn WE
<U30F9> <kn-we>;<kn-KATA>;<kn-VOICED>;<IGNORE> % KATAkn VE
%
<U3092> <kn-wo>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn WO
<U30F2> <kn-wo>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn WO
<UFF66> <kn-wo>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn WO
<U30FA> <kn-wo>;<kn-KATA>;<kn-VOICED>;<IGNORE> % KATAkn VO
%
<U3093> <kn-n>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn N
<U30F3> <kn-n>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn N
<UFF9D> <kn-n>;<kn-KATA>;<kn-NORMAL-HF>;<IGNORE> % hfwd KATAkn N
%
<U3099> <kn-cmb-voice>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % comb voice
<U309A> <kn-cmb-semivoice>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % comb semi-voice
<U309B> <kn-voice>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % voice
<UFF9E> <kn-voice>;<kn-HIRA>;<kn-NORMAL-HF>;<IGNORE> % hfwd voice
<U309C> <kn-semivoice>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % semi-voice
<UFF9F> <kn-semivoice>;<kn-HIRA>;<kn-NORMAL-HF>;<IGNORE> % hfwd semi-voice
<U309D> <kn-iter>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAGANA iter MARK
<U309E> <kn-iter>;<kn-HIRA>;<kn-VOICED>;<IGNORE> % HIRAGANA VOICED iter MARK
<U30FD> <kn-iter>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAKANA iter MARK
<U30FE> <kn-iter>;<kn-KATA>;<kn-VOICED>;<IGNORE> % KATAKANA VOICED iter MARK
%
<U30FB> <IGNORE>;<IGNORE>;<IGNORE>;<U30FB> % KATAKANA MIDDLE DOT
<UFF65> <IGNORE>;<IGNORE>;<IGNORE>;<UFF65> % hfwd KATAKANA MIDDLE DOT
<U30FC> <IGNORE>;<IGNORE>;<IGNORE>;<U30FC> % prolong
<UFF70> <IGNORE>;<IGNORE>;<IGNORE>;<UFF70> % hfwd prolong
<UFF61> % hfwd IDEOGRAPHIC FULL STOP -- to be handled with <U3002>
<UFF62> % hfwd LEFT CORNER BRACKET -- to be handled with <U300C>
<UFF63> % hfwd RIGHT CORNER BRACKET -- to be handled with <U300D>
<UFF64> % hfwd IDEOGRAPHIC COMMA -- to be handled with <U3001>
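To show how entries of this kind could be consumed, the following sketch
parses a five-line excerpt of the table and builds a multi-level sort key.
It is an illustration under stated assumptions, not part of the Japanese
proposal: the names parse and sort_key are hypothetical, weights are taken
from the order in which the symbols are declared under each %level line,
and the fourth level (IGNORE in every excerpted entry) is left out.

```python
# Excerpts of the symbol declarations, in declaration order.
LEVEL1 = ["kn-a", "kn-i", "kn-u", "kn-e", "kn-o", "kn-ka"]
LEVEL2 = ["kn-HIRA", "kn-KATA"]
LEVEL3 = ["kn-SMALL", "kn-SMALL-HF", "kn-NORMAL",
          "kn-NORMAL-HF", "kn-VOICED", "kn-SEMIVOICED"]

TABLE = """
<U3042> <kn-a>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn A
<U30A2> <kn-a>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn A
<U304B> <kn-ka>;<kn-HIRA>;<kn-NORMAL>;<IGNORE> % HIRAkn KA
<U304C> <kn-ka>;<kn-HIRA>;<kn-VOICED>;<IGNORE> % HIRAkn GA
<U30AB> <kn-ka>;<kn-KATA>;<kn-NORMAL>;<IGNORE> % KATAkn KA
"""

def parse(table: str) -> dict:
    """Map each character to its (level 1, level 2, level 3) weights."""
    weights = {}
    for line in table.strip().splitlines():
        line = line.split("%", 1)[0].strip()        # drop the comment
        code, rest = line.split(None, 1)
        l1, l2, l3, _l4 = (w.strip("<>") for w in rest.split(";"))
        char = chr(int(code.strip("<>")[1:], 16))   # "<U3042>" -> U+3042
        weights[char] = (LEVEL1.index(l1), LEVEL2.index(l2), LEVEL3.index(l3))
    return weights

WEIGHTS = parse(TABLE)

def sort_key(s: str) -> tuple:
    """Compare all level 1 weights first, then level 2, then level 3."""
    per_char = [WEIGHTS[c] for c in s]
    return tuple(tuple(w[level] for w in per_char) for level in range(3))

ordered = sorted(["\u30ab", "\u304c", "\u304b", "\u30a2", "\u3042"], key=sort_key)
```

Under these assumptions the five letters come out as HIRAGANA A, KATAKANA A,
HIRAGANA KA, HIRAGANA GA, KATAKANA KA, because the script distinction
(level 2) is examined before voicing (level 3).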
________________________end of Japan comments ___________________________
__________beginning of Netherlands comments accompanying negative _____
From: John Bijlsma <[email protected]>
JTC1 SC22 N2364, ISO/IEC CD 14651
IT - International String Ordering - Method for
Comparing Character Strings and Description of a
Default Tailorable Ordering
97-04-24, DISAPPROVAL WITH COMMENT
......................................................
The Netherlands votes negative on CD 14651. To turn our vote into a positive
one, modifications shall be made in accordance with our comments. We reserve
our final position regarding the CD until we have seen the Final CD.
Technical comments:
1. Remove Annex 1 and all references to an International Default Order.
-- SC22 has no expertise in this field, and cannot check for correctness.
Most NBs in SC22 are not able to check whether a proposed ordering
for a certain unfamiliar script agrees with actual practice
far from home. The NBs that are familiar with it are not represented in
SC22, nor have they been asked for comment.
-- Default order is an instrument of cultural imperialism.
In several countries more than one ordering rule is in use without
any agreed preference. Calling one of these the "default" is
imposing an extraneous pressure, and will involve interference with
national habits.
-- No need for a default.
No country always uses all characters from 10646. They should not be
burdened with unwanted features. A method for supplying ordering
information for a given restricted character set to an API should be
contained in 14651 itself, without reference to 14652.
2. Remove all references to 14652.
-- Needless complexity should be avoided.
An ISO standard should be as independent as possible of other ISO
standards. If ordering information can only be supplied by way
of a complete set of cultural conventions, as specified in 14652,
an enormous overhead is involved, as is an obligation on NBs
to also specify non-ordering information which is irrelevant
to 14651, but nevertheless required in this CD.
Editorial comments:
The text of this document leaves much to be desired regarding
precision of definition, clarity of presentation and conformance to
the ISO Directives, Part 3.
The NNI cannot give detailed comments here, nor offer replacement text, as
doing so would require rewriting more than half of the document, for which
we have no resources available. The NNI already gave some directions with
its vote on CD registration, but found almost no improvement in this CD.
__________________________end of Netherlands comments ________________
_________beginning of USA comments accompanying negative _____________
The US National Body votes to Disapprove ISO/IEC CD 14651 with the following
comments:
These are the U.S. comments for the first CD ballot for ISO/IEC CD 14651,
International String Ordering (SC22 N2364).
No alternative text is supplied as part of this response because a lot of it
would have to be written. Here are the concerns:
AF-1
The specification of the sorting algorithm must be made independently of a
programming model.
Sorting is a process that is used in an incredible variety of circumstances
and on widely different systems, including object-oriented systems. Care
should be taken in preparing the normative specifications for CD 14651 that
they are usable independent of a particular programming model, programming
language, or environment.
In particular, the descriptions of the sorting operations should be
expressed in an abstract form, specifying IN, OUT and RETURN parameters but
"without" language binding. Also, no parameters needed for the sorting
operation may be presumed to hide in some semi-opaque state, but rather they
should always be specified explicitly in the description of the operation.
If it is desired to show how the standard might be implemented in a POSIX
environment, that could be the subject of an informative annex. Function
bindings for POSIX could assume transparent access to locale data from the
POSIX locale model, if that is desired. The annex would specify how the
proposed POSIX functions make use of the abstract operations defined in the
normative part of the standard, and how their parameters are set either
explicitly or implicitly.
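One way to picture the abstract operation AF-1 asks for is a comparison
whose inputs are all explicit parameters. The sketch below is illustrative
only: compare, toy_weights and the two-level weight scheme are hypothetical
names and assumptions, not text from the CD or from the U.S. comment.

```python
from typing import Callable, Sequence, Tuple

Weights = Sequence[Tuple[int, ...]]   # one tuple of level weights per character

def compare(a: str, b: str,
            weights: Callable[[str], Weights],
            max_level: int) -> int:
    """IN: a, b, weights, max_level.  RETURN: -1, 0 or 1.

    Nothing is read from hidden state such as a process-wide locale;
    the ordering data and the depth of comparison are both passed in.
    """
    wa, wb = weights(a), weights(b)
    for level in range(max_level):
        ka = [w[level] for w in wa]
        kb = [w[level] for w in wb]
        if ka != kb:
            return -1 if ka < kb else 1
    return 0

def toy_weights(s: str) -> Weights:
    """Toy ordering data (assumed): level 1 = letter, level 2 = case."""
    return [(ord(c.lower()), 0 if c.islower() else 1) for c in s]

level1_result = compare("a", "A", toy_weights, 1)   # equal at level 1
level2_result = compare("a", "A", toy_weights, 2)   # case decides at level 2
```

A POSIX binding, as suggested for the informative annex, could then be a thin
wrapper that obtains the weights argument from the current locale.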
RLG 1:
The body of the standard includes material which belongs in an informative
annex, specifically the "Tutorial on problems solved by this standard."
RLG 2:
The order specified for two Cyrillic characters (p. 95-100 of the CD)
conflicts with the order in Table 2 of ISO/R9 and other sources (cited
below).
The characters in question are these two case pairs: CYRILLIC CAPITAL
LETTER TSHE/CYRILLIC SMALL LETTER TSHE and CYRILLIC CAPITAL LETTER
DZE/CYRILLIC SMALL LETTER DZE.
Cyrillic letter TSHE:
In the CD, TSHE follows KA WITH HOOK and precedes EL.
In ISO/R9 and other sources, TSHE follows TE and precedes U.
Cyrillic letter DZE:
In the CD, DZE follows KOPPA and precedes CHE.
In ISO/R9 and other sources, DZE follows ZE and precedes I.
Other differences in the order of Cyrillic characters between the CD and
Table 2 of ISO/R9 are either not supported by the other sources or are
arbitrary.
RLG 3:
The order of scripts on p. 31 differs slightly from the order in ISO/IEC
10646. Specifically:
- Georgian follows Cyrillic; in ISO/IEC 10646, it follows Tibetan (pDAM-6)
- Hebrew follows Arabic; in ISO/IEC 10646, it follows Armenian (and
precedes Arabic).
These differences are not explained.
RLG 4:
Hangul is positioned between Tibetan and Cherokee (i.e., consistent with the
location of Hangul Jamo in ISO/IEC 10646). There is no explanation as to
why this position was chosen, rather than that of Hangul Syllables. Since
Korean may be written with a mixture of ideographs and Hangul syllables,
the Hangul Syllables position established by pDAM-5, immediately after the
CJK Unified Ideographs, might be preferable.
HP 1
The outline of the document does not follow the well defined and established
method already used in other JTC1 standards. For example, the Introduction
is too big and the reader gets lost and might decide not to continue to
read the document. Usually such information belongs in an informative
annex; otherwise it becomes normative.
HP 2
The structure of the document has the "Scope" clause on page 11. This
clause should come immediately after a newly written short Introduction
clause. In addition, this clause needs clarification. For example, does
it describe the APIs needed by applications to specify character string
ordering? It is also not clear what is meant by the phrase "full
repertoire of ISO/IEC 10646 (independently of coding)"; the unclear part
is the one in parentheses. In addition, the "Scope" clause talks about a
specific default ordering, but it is not clear where in the CD its
derivation is described or how it is related to the APIs.
HP 3
The "Conformance" clause should follow immediately the "Scope" clause. It
should be combined with the "Requirements" clause. It should be rewritten
to make it easy to understand how to conform without having to go through
the syntax and content complexity of the "Requirements" clause.
Conformance is difficult to determine from the document; the document
requires a table of precisely which features are required. Moreover, the
function levels are, in general, independent of the previous level; there
is little reason to force all features of one level before the next higher
is reached. Post handling is informative, and has no place in
conformance.
HP 4
In the clause "Tailoring Mechanism", it is not at all clear what an
application developer needs to do to override the default ordering that is
specified in Annex 1.
HP 5
Maybe it would be better to have this CD become a Technical Report rather
than a standard, since it allows users to override the default ordering
proposed, and there might be more users overriding the default, with an
undefined and nowhere described mechanism, than using what the CD proposes.
HP 6
The dependency on an unpublished standard, 14652, Cultural Conventions
Specification, is too high. Currently, 14652 is still in the CD stage as
mentioned in clause 2, Normative References, of this CD (14651).
In summary, there is a lot of structural and technical fine tuning that is
necessary to make this document complete. If such an effort takes too much
time, maybe the industry could be served better if the proposal is modified
for publication as a TR rather than an ISO standard. This work can later be
converted to an ISO publication when CD 14652, Cultural Conventions
Specification, is accepted and is published as an ISO standard.
TG 1
The organization and nomenclature (e.g. COMPCAR) is unnecessarily obscure.
Names should be spelled out completely for clarity.
TG 2
The requirement that the original string be recoverable is unnecessary; many
applications, such as databases, will have a sort key be an alternate field
in the record. They may only need to have a level 1 sort for their
application. In that case, storing the original string twice or requiring
internal structure that enables reconstruction is unnecessary and only
increases storage to no purpose.
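The database scenario in TG 2 can be pictured with a record that carries a
level 1 key as an alternate field. The sketch is hedged: level1_key is a
hypothetical stand-in that folds accents and case with Python's standard
library; it is not the key construction of CD 14651.

```python
import unicodedata

def level1_key(s: str) -> bytes:
    """Stand-in level 1 key (assumed): accents and case are folded away,
    so the original string cannot be reconstructed from the key."""
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return base.casefold().encode("utf-8")

# Each record stores the original once and the compact key beside it.
names = ["Zo\u00eb", "abel", "Zoe", "Abel"]
records = [{"name": n, "key": level1_key(n)} for n in names]
records.sort(key=lambda r: r["key"])
```

The original string already lives in the record, so a key that also had to
allow its reconstruction would store the same information twice.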
TG 3
Use of NBSP is in practice an unacceptable overload of its primary function.
Being able to functionally tailor just space and nbsp is in practice not
useful; in general a whole host of similar characters, punctuation and
symbols, behave the same way.
TG 4
The algorithm for comparison must be stated in terms of results, NOT a
specific mechanism.
TG 5
The format in Annex 1 is unnecessarily complex. It is impossible to assess
and recommend this standard where we cannot clearly determine the result
of the default sorting order rules in this annex. It forces use of a
whole separate notation for characters. To correct this, characters must
always be referred to by their full 10646 name for clarity, rather than
arbitrary notations such as AYEHS, AIGUT, POINN, QARNP, or many other
examples. Script names should always be the 10646 block name.
TG 6
The equivalencies of composed characters vs. composite character sequences
(e.g. a + umlaut and a-umlaut) can be stated much more succinctly.
TG 7
The relative ordering of characters cannot be determined from the character
lists, since they are not even remotely in the resulting order. To correct
this, the ordering of characters within a script must be presented in the
resulting order as much as possible. Example:
<U0000> IGNORE;IGNORE;IGNORE;<U0000> % NULL
<U2400> IGNORE;IGNORE;IGNORE;<U2400> % SYMBOL FOR NULL
<U0001> IGNORE;IGNORE;IGNORE;<U0001> % START OF HEADING
<U2401> IGNORE;IGNORE;IGNORE;<U2401> % SYMBOL FOR START OF HEADING
<U0002> IGNORE;IGNORE;IGNORE;<U0002> % START OF TEXT
<U2402> IGNORE;IGNORE;IGNORE;<U2402> % SYMBOL FOR START OF TEXT
<U0003> IGNORE;IGNORE;IGNORE;<U0003> % END OF TEXT
<U2403> IGNORE;IGNORE;IGNORE;<U2403> % SYMBOL FOR END OF TEXT
...
The fourth column (in this case) determines the final ordering of the
characters, which is NOT the order presented. It must be presented as:
<U0000> IGNORE;IGNORE;IGNORE;<U0000> % NULL
<U0001> IGNORE;IGNORE;IGNORE;<U0001> % START OF HEADING
<U0002> IGNORE;IGNORE;IGNORE;<U0002> % START OF TEXT
<U0003> IGNORE;IGNORE;IGNORE;<U0003> % END OF TEXT
...
<U2400> IGNORE;IGNORE;IGNORE;<U2400> % SYMBOL FOR NULL
<U2401> IGNORE;IGNORE;IGNORE;<U2401> % SYMBOL FOR START OF HEADING
<U2402> IGNORE;IGNORE;IGNORE;<U2402> % SYMBOL FOR START OF TEXT
<U2403> IGNORE;IGNORE;IGNORE;<U2403> % SYMBOL FOR END OF TEXT
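The reordering requested in TG 7 can be produced mechanically. The sketch
below sorts table entries by their fourth-level symbol; it assumes, purely
for illustration, that fourth-level symbols of the form <Uxxxx> rank in
code order, and the name fourth_weight is hypothetical.

```python
ENTRIES = [
    "<U0000> IGNORE;IGNORE;IGNORE;<U0000> % NULL",
    "<U2400> IGNORE;IGNORE;IGNORE;<U2400> % SYMBOL FOR NULL",
    "<U0001> IGNORE;IGNORE;IGNORE;<U0001> % START OF HEADING",
    "<U2401> IGNORE;IGNORE;IGNORE;<U2401> % SYMBOL FOR START OF HEADING",
]

def fourth_weight(entry: str) -> int:
    """Return the numeric value of the fourth-level symbol of an entry."""
    weights = entry.split("%", 1)[0].split()[1]   # "IGNORE;...;<U2400>"
    symbol = weights.split(";")[3]                # "<U2400>"
    return int(symbol.strip("<>")[1:], 16)        # assumed: code order

in_result_order = sorted(ENTRIES, key=fourth_weight)
```

Printing in_result_order yields the control characters first and their
SYMBOL FOR counterparts afterwards, as in the second listing above.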
TG 8
The Annex also does not make clear that the vast majority of its characters
are sorted in character code order. This requires the reader to visually
inspect every line to no purpose. These should be replaced by one statement:
"Except where otherwise noted, all symbols are sorted as:
<Uxxxx> IGNORE;IGNORE;IGNORE;<Uxxxx>"
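The single default statement proposed in TG 8 amounts to a rule plus a list
of exceptions. A minimal sketch follows; the names default_weights,
EXCEPTIONS and weights_for are hypothetical, and the one exception entry
uses invented symbol names solely to show the mechanism.

```python
IGNORE = None   # stands for the IGNORE keyword of the table

def default_weights(cp: int) -> tuple:
    """<Uxxxx> IGNORE;IGNORE;IGNORE;<Uxxxx> for any code point."""
    return (IGNORE, IGNORE, IGNORE, f"<U{cp:04X}>")

# Only characters that deviate from the default need to be listed.
# The symbol names in this entry are invented for the example.
EXCEPTIONS = {
    0x0041: ("<a>", "<base>", "<capital>", IGNORE),
}

def weights_for(cp: int) -> tuple:
    """Look up an exception first, else fall back to the default rule."""
    return EXCEPTIONS.get(cp, default_weights(cp))

symbol_for_null = weights_for(0x2400)
```

A reader then only has to inspect the exception list, which is the saving
the comment is after.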
TG 9
Annex 2
List #1 is superfluous. The statement should be that the words in List #2,
in any initial order, will result in List #2 when sorted.
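The statement suggested in TG 9 is directly testable. The sketch below
shuffles an expected list several times and checks that sorting restores
it; sorts_back is a hypothetical helper, and the check assumes that no two
words in the list compare as equal.

```python
import random

def sorts_back(expected: list, key, trials: int = 20, seed: int = 0) -> bool:
    """True if every shuffled copy of `expected` sorts back to `expected`."""
    rng = random.Random(seed)          # fixed seed keeps the check repeatable
    for _ in range(trials):
        shuffled = expected[:]
        rng.shuffle(shuffled)
        if sorted(shuffled, key=key) != expected:
            return False
    return True

# Example with a simple case-insensitive key standing in for the real one.
ok = sorts_back(["apple", "Banana", "cherry"], key=str.casefold)
```

In a conformance test, key would be the implementation's own key function
and expected would be List #2 of Annex 2.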
______________________ end of USA comments __________________________
_________beginning of Israel comments accompanying negative ________________
THE STANDARDS INSTITUTE OF ISRAEL (SII)
Comments on ISO/IEC CD 14651 (ISO/IEC JTC 1/SC22/WG20 N471en)
The SII votes NO on CD 14651. If items 1, 2 and 3 were to be accepted,
our vote would become YES.
1. Hebrew Accents
The Hebrew accents (U0591 to U05AF), Meteg (U05BD) and Upper Dot (U05C4)
do not participate in the string ordering process. They relate, in fact,
to the whole word, rather than to the letter to which they are attached,
and are never used in the lexicographic order or in any other ordering of
Hebrew texts.
- The Hebrew accents should be removed from the list of collating
symbols, page 35, and from page 45.
- On page 56 they should all be defined as:
- IGNORE; IGNORE; IGNORE; IGNORE;
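The effect of giving these characters IGNORE at every level can be
sketched by removing them before comparison. The helper names below are
hypothetical; the code point ranges are the ones listed in item 1.

```python
def is_ignored_hebrew_mark(ch: str) -> bool:
    """Accents U0591-U05AF, Meteg U05BD and Upper Dot U05C4."""
    cp = ord(ch)
    return 0x0591 <= cp <= 0x05AF or cp in (0x05BD, 0x05C4)

def strip_ignored(s: str) -> str:
    """Drop the marks that take no part in string ordering."""
    return "".join(c for c in s if not is_ignored_hebrew_mark(c))

# ALEF + accent + BET + Meteg compares like plain ALEF + BET.
accented = "\u05d0\u0591\u05d1\u05bd"
plain = "\u05d0\u05d1"
same_for_ordering = strip_ignored(accented) == strip_ignored(plain)
```

Hebrew points outside the listed ranges, such as U05B0, are left in place
by this sketch.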
2. Composite characters and combining characters.
It seems that combining character sequences do not sort and compare as
equivalent to their precomposed encodings. For instance, the two strings
"Gu:nther" and "Gu:nther", the first coded with U00FC, the second with
U0075 followed by U0308, are equivalent and should not be distinguished,
but are not equivalent in the CD. The particular coding used is an
artifact, possibly not under the control of the user, and is normally
meaningless.
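The two codings mentioned in item 2 can be reproduced with Python's
standard unicodedata module. Normalizing both strings before comparison is
shown here as one possible remedy; it is an assumption of this sketch, not
a mechanism described in the CD.

```python
import unicodedata

precomposed = "G\u00fcnther"    # U00FC
combining = "Gu\u0308nther"     # U0075 followed by U0308

def canonical(s: str) -> str:
    """Bring both codings to one canonical (decomposed) form."""
    return unicodedata.normalize("NFD", s)

differ_as_coded = precomposed != combining
equal_after_normalizing = canonical(precomposed) == canonical(combining)
```

Both flags are true: the code sequences differ, yet they denote the same
text once the coding artifact is removed.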
3. Introduction, page 6, last paragraph: "If two equivalent strings are
not absolutely identical, then the tie must be broken."
This sentence is not acceptable. If two strings are equivalent they
should be treated as such. For example, Hebrew strings that are
equivalent but have different accents.
4. Introduction, page 4 (Editorial):
The introduction begins with a negative statement and continues with a
criticism of past practices. The SII suggests it would be preferable to
begin with a positive statement describing what the standard is and what
are its benefits.
5. Tutorial, page 7 (Editorial).
The tutorial would be better placed in an informative appendix.
6. Page 35 (Editorial).
The comment should be qubuts (the s is missing).
________________end of Israel comments; end of document SC22 N2466 ____