Khmer Localization
Khmer Language
As in the case of Thai and Lao, Khmer script originates from the Grantha script, the south Indian form of the ancient Indian Brahmi writing system.
Khmer script follows complex layout rules in which a consonant may take two different forms: a full form and a small (subscript) form, which is written below the line when the consonant immediately follows another consonant. Spaces are used not to separate words but to indicate a pause in reading (much like a comma in English). A vowel pronounced after a consonant may be written before, after, above or below it; on both sides (formed by two glyphs); before and above; below and above; or below and after the consonant.
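In Unicode, these visual arrangements are left to the rendering engine: Khmer text is stored in logical (pronunciation) order, and a subscript consonant is encoded as KHMER SIGN COENG (U+17D2) followed by the ordinary consonant. The Python sketch below, assuming only the standard `unicodedata` module, groups a string into rough orthographic clusters (a base character plus its subscript consonants and combining vowel signs), the unit a layout engine has to position as a whole. The helper name `khmer_clusters` is hypothetical, and the logic is a simplification, not the full Unicode grapheme-cluster algorithm (UAX #29).

```python
import unicodedata

KHMER_SIGN_COENG = "\u17D2"  # marks the following consonant as subscript


def khmer_clusters(text: str) -> list[str]:
    """Split text into rough orthographic clusters (hypothetical helper).

    A cluster is one base character followed by any COENG + consonant
    pairs (subscript consonants) and any combining marks (dependent
    vowels and signs, Unicode categories Mn and Mc).
    Simplified sketch, not UAX #29.
    """
    clusters = []
    i = 0
    while i < len(text):
        start = i
        i += 1  # consume the base character (usually a consonant)
        while i < len(text):
            ch = text[i]
            if ch == KHMER_SIGN_COENG and i + 1 < len(text):
                # COENG + consonant: the consonant is drawn in its small
                # subscript form below the base
                i += 2
            elif unicodedata.category(ch) in ("Mn", "Mc"):
                # Dependent vowel or sign: always stored after the consonant
                # in logical order, whichever side it is drawn on
                i += 1
            else:
                break  # anything else starts a new cluster
        clusters.append(text[start:i])
    return clusters


if __name__ == "__main__":
    word = "\u1781\u17D2\u1798\u17C2\u179A"  # the word "Khmer"
    for cluster in khmer_clusters(word):
        print(cluster, [f"U+{ord(c):04X}" for c in cluster])
```

For the word ខ្មែរ ("Khmer"), the sketch yields two clusters, ខ្មែ and រ. The vowel sign AE (U+17C2) is stored after KHA + COENG + MO even though it is drawn to the left of the cluster, which is why an engine such as Uniscribe must reorder glyphs at display time.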
At present, the language is so poorly standardized that even the number of its vowels is unclear: the official reference (the only available dictionary) lists a different number of vowels from the number taught in schools. The reference dictionary is sorted phonetically, so no systematic collation algorithm can reproduce its order; words starting with the same consonant may be filed under different entries depending on how that consonant is pronounced in each word. As in the Lao localization project, no English/Khmer technical dictionary is available, and its absence severely hampers efforts to translate software into Khmer.
Obstacles and Successes
When the KhmerOS project was first being considered, the technical situation was as follows:
- Khmer had already been included in Unicode. Its encoding was defined in 1996 by a team that had no contact with the Cambodian government; it was later disputed, but to no avail. The Unicode Consortium refused to change anything, including adding necessary Khmer vowels, on the grounds that these could be formed by combining existing Khmer characters. The Consortium only permitted comments to be added to the existing standard. The Khmer government now considers the standard fixed (as of Unicode 4.0).
- Microsoft had published OpenType specifications for Khmer and included the script in its Uniscribe complex text layout engine. Microsoft Publisher worked very well in Khmer, but Microsoft software still handled neither line-breaking nor sorting, and Microsoft Word crashed quite often when used with Khmer text.
- Some OpenType Khmer fonts already existed, though none in the public domain.
- No FOSS programs had been localized into Khmer in the GNU/Linux environment, but some FOSS applications (such as Mozilla) worked well in Khmer under Windows, relying on Microsoft's Uniscribe engine.
- Some people had been considering FOSS in Khmer, but the idea had not gone beyond mailing list discussions.
- There had been a remarkable proliferation of legacy (non-Unicode) fonts, with up to 26 different font encodings defined. They worked well enough under MS Word by modifying
