Thursday, September 19, 2013

Locale character sorting - Java

I am currently working on a multilingual site. Yesterday I ran into an interesting requirement: sorting text that contains locale-specific characters.

Assuming English-only text, we had written the sorting code on top of a TreeMap:
Map<String, String> data = new TreeMap<>();
data.put("sarav", "000");
data.put("rohit", "002");
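Once i18n was involved, every key that starts with a locale-specific character ended up at the end of the sort order. A TreeMap without a Comparator orders String keys by their Unicode code unit values, and "ä" has a higher value than any ASCII letter.
data.put("ärav", "003");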
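
To make the problem concrete, here is a minimal sketch that puts the three keys from above into a plain TreeMap and returns them in iteration order. The class and method names are my own, picked only for illustration, and the "ä" is written as the escape \u00e4 so the result does not depend on the source file encoding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class NaturalOrderDemo {

    // Returns the keys in the order a TreeMap without a Comparator keeps them.
    static List<String> keysInNaturalOrder() {
        Map<String, String> data = new TreeMap<>();
        data.put("sarav", "000");
        data.put("rohit", "002");
        data.put("\u00e4rav", "003"); // \u00e4 is the a-umlaut character
        return new ArrayList<>(data.keySet());
    }

    public static void main(String[] args) {
        // String.compareTo compares UTF-16 code units, and U+00E4 is larger
        // than 's' (U+0073), so the a-umlaut key comes out last:
        // rohit, sarav, then the a-umlaut key.
        System.out.println(keysInNaturalOrder());
    }
}
```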

Interestingly, TreeMap has a constructor that takes a Comparator, and Collator implements Comparator:

Map<String, String> data = new TreeMap<>(Collator.getInstance(Locale.GERMAN));
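
Here is the same data again as a small runnable sketch, this time with the German Collator passed to the TreeMap constructor. The class and method names are again only illustrative, and Locale.GERMAN is simply the locale I happened to need.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class CollatorOrderDemo {

    // Returns the keys in the order a Collator-backed TreeMap keeps them.
    static List<String> keysInGermanOrder() {
        // Collator implements Comparator<Object>, so it can order String keys.
        Map<String, String> data = new TreeMap<>(Collator.getInstance(Locale.GERMAN));
        data.put("sarav", "000");
        data.put("rohit", "002");
        data.put("\u00e4rav", "003"); // \u00e4 is the a-umlaut character
        return new ArrayList<>(data.keySet());
    }

    public static void main(String[] args) {
        // The a-umlaut now sorts next to "a", so its key comes first,
        // followed by rohit and sarav.
        System.out.println(keysInGermanOrder());
    }
}
```

One thing to keep in mind with this approach: the TreeMap treats two keys as the same key whenever the Collator reports them as equal, so lowering the Collator's strength can merge entries.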

The Collator class performs locale-sensitive String comparison. By using this Collator, we will be able to build searching and sorting routines for natural language text.

Collator is an abstract base class. Subclasses implement specific collation strategies. One subclass, RuleBasedCollator, is currently provided with the Java 2 platform and is applicable to a wide set of languages.
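
I did not need custom rules myself, but as a rough sketch of what a rule-based strategy looks like, RuleBasedCollator can also be built directly from a rule string. The rule "< c < b < a" below is a made-up example that deliberately reverses three letters, and the class and method names are hypothetical.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class CustomRulesDemo {

    // Sorts a copy of the input with a collator whose rules say c < b < a.
    static List<String> sortWithReversedRules(List<String> input) {
        try {
            RuleBasedCollator reversed = new RuleBasedCollator("< c < b < a");
            List<String> sorted = new ArrayList<>(input);
            Collections.sort(sorted, reversed);
            return sorted;
        } catch (ParseException e) {
            // Thrown when the rule string is malformed.
            throw new IllegalStateException("Invalid collation rules", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sortWithReversedRules(Arrays.asList("a", "b", "c"))); // [c, b, a]
    }
}
```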

As with other locale-sensitive classes, you can use the static factory method, getInstance, to obtain the appropriate Collator object for a given locale, as I did above.
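
The Collator returned by getInstance is not tied to TreeMap. A minimal sketch of sorting a plain List with it could look like this. The word list is made up for illustration, and the class and method names are hypothetical.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class ListSortDemo {

    // Sorts a copy of the words using the collation rules of the given locale.
    static List<String> sortForLocale(List<String> words, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        List<String> sorted = new ArrayList<>(words);
        Collections.sort(sorted, collator);
        return sorted;
    }

    public static void main(String[] args) {
        // \u00e4 is the a-umlaut character
        List<String> words = Arrays.asList("zebra", "\u00e4pfel", "birne", "apfel");
        // Expected order: apfel, then the a-umlaut word, then birne, zebra.
        System.out.println(sortForLocale(words, Locale.GERMAN));
    }
}
```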

We can set a Collator's strength property to determine the level of difference considered significant in comparisons. Four strengths are provided: PRIMARY, SECONDARY, TERTIARY, and IDENTICAL. The exact assignment of strengths to language features is locale dependent. For example, in Czech, "e" and "f" are considered primary differences, while "e" and "ê" are secondary differences, "e" and "E" are tertiary differences and "e" and "e" are identical.
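
I did not need this for my requirement, but here is a small sketch of how strength changes a comparison. It uses Locale.ENGLISH rather than Czech as an assumption to keep the example simple, and the helper name sameAt is my own.

```java
import java.text.Collator;
import java.util.Locale;

class StrengthDemo {

    // True when the two strings compare as equal at the given strength.
    static boolean sameAt(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(strength);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        // Case is a tertiary difference: ignored at PRIMARY, significant at TERTIARY.
        System.out.println(sameAt(Collator.PRIMARY, "e", "E"));  // true
        System.out.println(sameAt(Collator.TERTIARY, "e", "E")); // false
        // Different base letters are a primary difference at every strength.
        System.out.println(sameAt(Collator.PRIMARY, "e", "f"));  // false
    }
}
```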

I didn't set any strength for my requirement; it worked as-is. I found a sample for this at the URL below:

http://stackoverflow.com/questions/7502642/sort-a-list-of-hungarian-strings-in-the-hungarian-alphabetical-order