CollateDemo
public class CollateDemo extends DemoApplet
Concrete class for demonstrating language-sensitive collation.
The following instructions explain how to run the collation demo.
===================
Customization
You can produce a new collation by adding to or changing an existing
one.
To show...
You can modify an existing collation to show how this works.
By adding items at the end of a collation you override earlier
information.
Watch how you can make the letter P sort at the end of the
alphabet.
Do...
1. Scroll to the end of the Sequence field. After the Z, type
"< p , P". This will put the letter P (with both of its case variants)
at the end of the alphabet. Hit the Set Rule button. This creates
a new collation with the name "Custom-1" (you could give it a
different name by typing in the Collator Name field). When you now look
at the Text field, you will see that you have changed the sequence to put
Pat at the end. (If you did not have Sort Ascending on, click it now.)
Making P sort at the end may not seem terribly useful, but the same
technique is used to adjust the sorting sequence for different languages.
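Outside the applet, the same rule can be tried with java.text.RuleBasedCollator. The following is a minimal sketch, not the demo's own code: the class name MovePToEndSketch and the helper baseRules() are hypothetical, and the base sequence is a plain a-z alphabet standing in for whatever the Sequence field actually contains.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Arrays;

class MovePToEndSketch {
    // Assumed stand-in for the Sequence field: "< a , A < b , B ... < z , Z".
    static String baseRules() {
        StringBuilder sb = new StringBuilder();
        for (char c = 'a'; c <= 'z'; c++) {
            sb.append("< ").append(c).append(" , ")
              .append(Character.toUpperCase(c)).append(' ');
        }
        return sb.toString();
    }

    // Appends the rule from the text; the later rule overrides P's earlier position.
    static RuleBasedCollator pLast() {
        try {
            return new RuleBasedCollator(baseRules() + "< p , P");
        } catch (ParseException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Returns a sorted copy of the given words, using the modified collation.
    static String[] sorted(String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy, pLast());
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sorted("Pat", "Zoe", "Al")));
    }
}
```

With this assumed base sequence, words starting with P or p should come out after words starting with Z.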
To show...
For example, you can add CH as a single letter after C, as in
traditional Spanish sorting.
Do...
Enter the following after Z: "& C < ch , cH, Ch, CH".
Hit the Set Rule button, type test words in the Text field (such as
"czar", "churo" and "darn"), and select Sort Ascending to see
the resulting sort order.
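The CH rule can be sketched the same way with java.text.RuleBasedCollator. This is an illustration under assumptions, not the demo's implementation: SpanishChSketch and baseRules() are hypothetical names, and the a-z base sequence is a simplified stand-in for the real Sequence field.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Arrays;

class SpanishChSketch {
    // Assumed stand-in for the Sequence field: "< a , A < b , B ... < z , Z".
    static String baseRules() {
        StringBuilder sb = new StringBuilder();
        for (char c = 'a'; c <= 'z'; c++) {
            sb.append("< ").append(c).append(" , ")
              .append(Character.toUpperCase(c)).append(' ');
        }
        return sb.toString();
    }

    // Resets to C, then inserts "ch" (and its case variants) as one letter after it.
    static RuleBasedCollator traditionalSpanish() {
        try {
            return new RuleBasedCollator(baseRules() + "& C < ch , cH, Ch, CH");
        } catch (ParseException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Returns a sorted copy of the given words, using the modified collation.
    static String[] sorted(String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy, traditionalSpanish());
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sorted("darn", "churo", "czar")));
    }
}
```

Because "ch" is treated as a letter of its own between c and d, "churo" should land after "czar" and before "darn".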
To show...
You can also add other sequences to the collation rules,
such as sorting symbols with their alphabetic equivalents.
Do...
1. Scroll to the end of the Sequence field. After the last rule,
type the following list (you can just select this text in your browser and
paste it in, to avoid typing). Now type lines in the Text field with these
symbols on them, and select Sort Ascending to see the resulting sort
order.
- & Asterisk ; *
- & Question-mark ; ?
- & Hash-mark ; #
- & Exclamation-mark ; !
- & Dollar-sign ; $
- & Ampersand ; '&';
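Each rule in this list resets to a spelled-out name and then places the symbol at a secondary difference from it. A minimal sketch of one such rule with java.text.RuleBasedCollator follows. Assumptions: the class name SymbolRuleSketch and the a-z base sequence are hypothetical, and the asterisk is written in single quotes because the java.text documentation asks for punctuation inside rules to be quoted, whereas the list above shows it unquoted.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

class SymbolRuleSketch {
    // Assumed stand-in for the Sequence field: "< a , A < b , B ... < z , Z".
    static String baseRules() {
        StringBuilder sb = new StringBuilder();
        for (char c = 'a'; c <= 'z'; c++) {
            sb.append("< ").append(c).append(" , ")
              .append(Character.toUpperCase(c)).append(' ');
        }
        return sb.toString();
    }

    // Sorts "*" next to the word "Asterisk", differing only at the secondary level.
    static RuleBasedCollator withAsterisk() {
        try {
            return new RuleBasedCollator(baseRules() + "& Asterisk ; '*'");
        } catch (ParseException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        RuleBasedCollator collator = withAsterisk();
        System.out.println(Integer.signum(collator.compare("*", "Asterisk")));
        System.out.println(Integer.signum(collator.compare("*", "b")));
    }
}
```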
Details
If you are an advanced user and interested in trying out more rules,
here is a brief explanation of how they work. The sequence is a list of
rules. Each rule takes one of three forms:
- <modifier>
- <relation> <text-argument>
- <reset> <text-argument>
Modifier
@ Indicates that accents are sorted backwards, as in French
Text-argument
The text can be any number of characters (if you want to include special
characters, such as space, use single-quotes around them).
Relation
The relations are the following:
- < Greater, as a letter difference (primary)
- ; Greater, as an accent difference (secondary)
- , Greater, as a case difference (tertiary)
- = Equal
- & Reset previous comparison.
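The three levels of difference can be observed by changing the collator's strength. The sketch below uses java.text.RuleBasedCollator with a made-up toy sequence, "< a ; b , c < d", chosen only so that each relation appears once; the class name RelationStrengthSketch is likewise hypothetical.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

class RelationStrengthSketch {
    // Toy rules: b differs from a at the secondary level, c from b at the
    // tertiary level, and d from all of them at the primary level.
    static final String RULES = "< a ; b , c < d";

    // Compares two strings while ignoring differences weaker than the given strength.
    static int compareAt(int strength, String left, String right) {
        try {
            RuleBasedCollator collator = new RuleBasedCollator(RULES);
            collator.setStrength(strength);
            return Integer.signum(collator.compare(left, right));
        } catch (ParseException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(compareAt(Collator.PRIMARY, "a", "b"));
        System.out.println(compareAt(Collator.SECONDARY, "a", "b"));
        System.out.println(compareAt(Collator.SECONDARY, "b", "c"));
        System.out.println(compareAt(Collator.TERTIARY, "b", "c"));
    }
}
```

At PRIMARY strength only the "<" relation separates strings; SECONDARY also honours ";", and TERTIARY also honours ",".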
Reset
The "&" is special in that it does not put the text-argument
into the sorting sequence; instead, it indicates that the next
rule is with respect to where the text-argument would be sorted.
This sounds more complicated than it is in practice. For example, the
following are equivalent ways of expressing the same thing:
- a < b < c
- a < b & b < c
- a < c & a < b
Notice that the order is important, since the subsequent item goes
immediately after the text-argument. The following are not equivalent:
- a < b & a < c
- a < c & a < b
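The effect of reset order can be checked with java.text.RuleBasedCollator. In this sketch the class name ResetOrderSketch and the helper bVersusC() are hypothetical, and each fragment is given a leading "<" so that it forms a complete rule string.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

class ResetOrderSketch {
    // Returns -1 if "b" sorts before "c" under the given rules, 1 if after, 0 if equal.
    static int bVersusC(String rules) {
        try {
            return Integer.signum(new RuleBasedCollator(rules).compare("b", "c"));
        } catch (ParseException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Three equivalent spellings of a < b < c.
        System.out.println(bVersusC("< a < b < c"));
        System.out.println(bVersusC("< a < b & b < c"));
        System.out.println(bVersusC("< a < c & a < b"));
        // Here c is inserted immediately after a, ahead of b.
        System.out.println(bVersusC("< a < b & a < c"));
    }
}
```

The first three rule strings should place b before c; the last one places c directly after a, so b ends up after c.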
The text-argument must already be present in the sequence, or some
initial substring of the text-argument must be present (e.g.
"a < b & ae < e" is valid since "a" is present in the sequence
before "ae" is reset). In this latter case, "ae" is not entered and
treated as a single character; instead, "e" is sorted as if it were
expanded to two characters: "a" followed by an "e".
This difference appears in natural languages: in traditional Spanish
"ch" is treated as though it contracts to a single character (expressed
as "c < ch < d"), while in traditional German "ä" (a-umlaut) is treated
as though it expands to two characters (expressed as "a & ae ; ä < b").
Ignorable Characters
The first rule must start with a relation (the examples we have used are
fragments; "a < b" really should be "< a < b").
If, however, the first relation is not "<", then all the
text-arguments up to the first "<" are ignorable. For example,
", - < a < b" makes "-" an ignorable character,
as we saw earlier in the word "black-birds".
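A small sketch of an ignorable hyphen with java.text.RuleBasedCollator follows. Assumptions: the names IgnorableHyphenSketch, letters() and primaryCompare() are hypothetical, the a-z sequence is a simplified stand-in for a real collation, and the hyphen is quoted because the java.text documentation asks for punctuation inside rules to be quoted.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

class IgnorableHyphenSketch {
    // Assumed letter sequence: "< a , A < b , B ... < z , Z".
    static String letters() {
        StringBuilder sb = new StringBuilder();
        for (char c = 'a'; c <= 'z'; c++) {
            sb.append("< ").append(c).append(" , ")
              .append(Character.toUpperCase(c)).append(' ');
        }
        return sb.toString();
    }

    // The hyphen appears before the first "<", which makes it ignorable.
    static String withIgnorableHyphen() {
        return ", '-' " + letters();
    }

    // Compares at PRIMARY strength, where only letter differences count.
    static int primaryCompare(String rules, String left, String right) {
        try {
            RuleBasedCollator collator = new RuleBasedCollator(rules);
            collator.setStrength(Collator.PRIMARY);
            return Integer.signum(collator.compare(left, right));
        } catch (ParseException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(primaryCompare(withIgnorableHyphen(), "black-birds", "blackbirds"));
        System.out.println(primaryCompare(letters(), "black-birds", "blackbirds"));
    }
}
```

With the ignorable rule in place the two spellings should compare as equal at PRIMARY strength; without it, the hyphen is an ordinary unlisted character and the strings differ.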
Accents
The Collator automatically normalizes text internally to separate accents
from base characters where possible. So, if you type in an "ä"
(a-umlaut), after you reset the collation you will see "a\u0308"
in the sequence, where \u0308 is the Java syntax for umlaut. The
demonstration program uses this syntax instead of just showing the umlaut
since many browsers are unable to display the umlaut yet.
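The separation of accent and base character can be illustrated with java.text.Normalizer. This sketch only shows the canonical decomposition itself; it does not claim to reproduce the Collator's internal code path, and the names UmlautDecompositionSketch, decompose() and escape() are hypothetical.

```java
import java.text.Normalizer;

class UmlautDecompositionSketch {
    // Canonical decomposition: splits a precomposed letter into base + combining mark.
    static String decompose(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFD);
    }

    // Renders non-ASCII characters as Java-style Unicode escapes for display.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c < 128) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00E4 is the precomposed a-umlaut.
        System.out.println(escape(decompose("\u00E4")));
    }
}
```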
Errors
The following are errors:
- Two relations in a row (e.g. "a < , b")
- Two text arguments in a row (e.g. "a < b c < d")
- A reset where the text-argument is not already in the sequence
(e.g. "a < b & e < f")
If you produce one of these errors, then the demonstration will beep at
you, and select the offending text (note: on some browsers, the
selection will not appear correctly).
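In code, a malformed rule string surfaces as an exception rather than a beep. The sketch below assumes java.text.RuleBasedCollator, whose constructor is declared to throw java.text.ParseException; the class name RuleErrorSketch and the helper parses() are hypothetical. Only the first and third error cases are exercised, since whitespace between text-arguments may be handled differently by the demo than by java.text.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

class RuleErrorSketch {
    // Returns true if the rule string builds a collator, false if it is rejected.
    static boolean parses(String rules) {
        try {
            new RuleBasedCollator(rules);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("< a < b"));          // well-formed
        System.out.println(parses("< a < , b"));        // two relations in a row
        System.out.println(parses("< a < b & e < f"));  // reset to text not in the sequence
    }
}
```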
Methods Summary

| Modifier and type | Method and description |
|---|---|
| public java.awt.Frame | createDemoFrame(DemoApplet applet): This creates a CollateFrame for the demo applet. Body: return new CollateFrame(applet); |
| public static void | main(java.lang.String[] argv): The main function which defines the behavior of the CollateDemo applet when an applet is started. Body: DemoApplet.showDemo(new CollateFrame(null)); |