Annex B. Localization : Technical Aspects (Fonts)
Taran — Tue, 06/14/2005 - 21:41
Font Development Tools
Some FOSS tools for developing fonts are available. There are fewer of them than of their proprietary counterparts, but they are adequate to get the job done and are continuously being improved. Some interesting examples are:
- XmBDFEd [13]. Developed by Mark Leisher, XmBDFEd is a Motif-based tool for developing BDF fonts. It allows one to edit the bitmap glyphs of a font, apply simple transformations to the glyphs, transfer information between fonts, and so on.
- FontForge [14] (formerly PfaEdit [15]). Developed by George Williams, FontForge is a tool for developing outline fonts, including PostScript Type1, TrueType, and OpenType. Scanned images of letters can be imported and their outline vectors traced automatically. The splines can be edited, and transformations such as skewing, scaling, rotating and thickening can be applied, among much else. It provides sufficient functionality for editing the properties of Type1 and TrueType fonts, and recent versions can also edit OpenType tables. One weak point, however, is hinting: FontForge ensures the quality of Type1 hints, but not of TrueType hints.
- TTX/FontTools [16]. Just van Rossum's TTX/FontTools is a tool to convert OpenType and TrueType fonts to and from XML. FontTools is a library for manipulating fonts, written in Python. It supports TrueType, OpenType, AFM and, to a certain extent, Type 1 and some Mac-specific formats. It allows one to dump OpenType tables, examine and edit them with an XML or plain text editor, and merge them back into the font.
Font Configuration
Several font configuration systems have been available on GNU/Linux desktops. The most fundamental one is the X Window font system itself. More recently, another font configuration system, fontconfig, has been developed to serve some specific requirements of modern desktops. Both are discussed briefly here.
First, however, let us briefly look at the X Window architecture, which the font systems build on. X Window is a client-server system. X servers are the agents that control hardware devices, such as video cards, monitors, keyboards, mice or tablets, and pass user input events from those devices to the clients. X clients are GUI application programs that ask the X server to draw graphical objects on the screen, and that accept user input via the events fed to them by the X server. Note that with this architecture, the X client and the X server can be on different machines in the network. In that case, the X server runs on the machine that the user operates, while the X client can be a process running on the same machine or on a remote machine in the network.
In this client-server architecture, fonts are provided on the server side. Thus, installing fonts means configuring the X server by installing the font files and registering them in its font path.
However, X servers are sometimes used to provide thin-client access, where the X server may run on cheap PCs booted from floppy, across the network, or even from ROM, so installing fonts on each X server is not always appropriate. Font service has therefore been delegated to a separate service called the X Font Server (XFS). A machine in the network can be dedicated to font service so that all X servers can request font information from it. With this structure, an X server may be configured to manage fonts by itself, to use fonts from the font server, or both.
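To make the "by itself, from the font server, or both" configuration concrete, here is a minimal sketch in Python that splits a font path into local directories and font-server entries. It assumes the usual transport/host:port notation for font-server entries (such as unix/:7100 for an XFS on the same machine). The directory and host names are made-up examples, and classify_font_path is a hypothetical helper, not part of any X tool.

```python
def classify_font_path(entries):
    """Split X font path entries into local directories and font servers."""
    local, servers = [], []
    for entry in entries:
        # Font server entries look like "unix/:7100" or "tcp/host:7100";
        # anything else is treated as a directory on the X server machine.
        transport, sep, rest = entry.partition("/")
        if sep and transport in ("unix", "tcp") and ":" in rest:
            servers.append(entry)
        else:
            local.append(entry)
    return local, servers


font_path = [
    "/usr/X11R6/lib/X11/fonts/misc",  # example local directory (assumed)
    "unix/:7100",                     # font server on the same machine
    "tcp/fonts.example.org:7100",     # hypothetical remote font server
]
local_dirs, font_servers = classify_font_path(font_path)
print("managed by the X server itself:", local_dirs)
print("requested from font servers:", font_servers)
```

A font path with entries in both lists corresponds to the "both" case described above.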
Nevertheless, recent changes in XFree86 have addressed some requirements to manage fonts on the client side. The Xft extension renders anti-aliased glyph images from font information provided by the X client. In its first version, Xft also provided font management functionality to X clients. In Xft2 this functionality was split out into a separate library called fontconfig. fontconfig is a font management system independent of X, which means it can also be used by non-GUI applications such as printing services. Modern desktops, including KDE 3 and GNOME 2, have adopted fontconfig as their font management system, and have benefited from closer integration that makes font installation easy. Moreover, client-side fonts allow applications to do all glyph manipulation themselves, such as applying special effects, while keeping a consistent appearance on screen and in printed output.
Separating X clients from the X server is not standard practice on stand-alone desktops. However, it is important to always keep the split in mind in order to enable particular features.
Output Methods
Since the usefulness of XOM is still being questioned, we shall discuss only the output methods already implemented in the two major toolkits: Pango of GTK+ 2 and Qt 3.
Pango Text Layout Engines
Pango ["Pan" means "all" in Greek and "go" means "language" in Japanese] [18] is a multilingual text layout engine designed for quality text typesetting. Although it is the text drawing engine of GTK+, it can also be used outside GTK+ for other purposes, such as printing [19]. This section gives localizers a bird's eye view of Pango. The Pango reference manual [20] should be consulted for more detail.
PangoLayout
At a high level, Pango provides the PangoLayout class that takes care of typesetting text in a column of given width, as well as other information necessary for editing, such as cursor positions. Its features may be summarized as follows:
- Paragraph Properties
  - indent
  - justification
  - spacing
  - word/character wrapping modes
  - alignment
  - tabs
- Text Elements
  - get lines and their extents
  - character logical attributes (is line break, is cursor position, etc.)
  - get runs and their extents
  - cursor movements
  - character search at (x, y) position
- Text Contents
  - plain text
  - markup text
Middle-level Processing
Pango also provides access to some middle-level text processing functions, although most clients in general do not use them directly. To gain a brief understanding of Pango internals, some highlights are discussed here. There are three major steps for text processing in Pango [21]:
- Itemize. Breaks input text into chunks (items) of consistent direction and shaping engine. This usually means chunks of text of the same language with the same font. Corresponding shaping and language engines are also associated with the items.
- Break. Determines possible line, word and character breaks within the given text item. It calls the language engine of the item (or the default engine based on Unicode data if no language engine exists) to analyze the logical attributes of the characters (is-line-break, is-char-break, etc.).
- Shape. Converts the text item into glyphs, with proper positioning. It calls the shaping engine of the item (or the default shaping engine that is currently suitable for European languages) to obtain a glyph string that provides the information required to render the glyphs (code point, width, offsets, etc.).
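The three steps above can be sketched as a toy pipeline in Python. This is not Pango's API: the function names, the script guess based on Unicode character names, the "break after a space" rule and the fixed glyph advance are all simplifying assumptions, chosen only to show how the output of one step feeds the next.

```python
import unicodedata


def script_of(ch):
    """Very rough script guess taken from the Unicode character name."""
    name = unicodedata.name(ch, "")
    for script in ("THAI", "HEBREW", "ARABIC", "LATIN"):
        if name.startswith(script):
            return script
    return "COMMON"  # spaces, digits, punctuation


def itemize(text):
    """Step 1: split text into (start, substring, script) items.
    Characters without a script of their own stay in the current item."""
    items, start, current = [], 0, None
    for i, ch in enumerate(text):
        script = script_of(ch)
        if script == "COMMON":
            continue
        if current is None:
            current = script
        elif script != current:
            items.append((start, text[start:i], current))
            start, current = i, script
    if text:
        items.append((start, text[start:], current or "COMMON"))
    return items


def find_breaks(item_text):
    """Step 2 (toy default engine): a line may break after each space."""
    return [i + 1 for i, ch in enumerate(item_text) if ch == " "]


def shape(item_text):
    """Step 3 (toy default engine): one glyph per character.
    Each tuple is (glyph id, advance width, index of the source character)."""
    return [(ord(ch), 10, i) for i, ch in enumerate(item_text)]


text = "ab \u0e01\u0e02"  # Latin "ab", a space, then two Thai consonants
for start, chunk, script in itemize(text):
    print(start, script, find_breaks(chunk), shape(chunk))
```

A real language engine for Thai would find word boundaries without relying on spaces, which is exactly the kind of behaviour a per-language module replaces the default with.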
Pango Engines
Pango engines are implemented in loadable modules that provide entry functions for querying and creating the desired engine. During initialization, Pango queries the list of all installed engines and keeps it in memory. Then, when it itemizes input text, it searches the list for the language and shaping engines available for the script of each item, creates them, and associates them with the relevant text item.
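The lookup described here can be pictured as a small registry keyed by script. The sketch below is a Python illustration only. The class, its methods and the engine names are hypothetical and do not correspond to Pango's module interface.

```python
class EngineRegistry:
    """Toy registry mapping a script name to the engine that handles it."""

    def __init__(self, default_engine="default"):
        self.default_engine = default_engine
        self._by_script = {}

    def register(self, engine_name, scripts):
        # A module announces which scripts its engine covers;
        # the first engine registered for a script wins.
        for script in scripts:
            self._by_script.setdefault(script, engine_name)

    def engine_for(self, script):
        # Fall back to the default engine when no module claims the script.
        return self._by_script.get(script, self.default_engine)


registry = EngineRegistry()
registry.register("thai-shaper", ["THAI"])      # hypothetical module
registry.register("arabic-shaper", ["ARABIC"])  # hypothetical module
print(registry.engine_for("THAI"))
print(registry.engine_for("LATIN"))
```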
Pango Language Engines
As discussed above, the Pango language engine is called to determine possible break positions in a text item of a certain language. It provides a method to analyze the logical attributes of every character in the text as listed in Table 3.
Pango Shaping Engines
As discussed above, the Pango shaping engine converts characters in a text item in a certain language into glyphs, and positions them according to the script constraints. It provides a method to convert a given text string into a sequence of glyph information (glyph code, width and positioning) and a logical map that maps the glyphs back to character positions in the original text. With this information, the text can be rendered properly on output devices and navigated with the cursor, despite the difference between logical and rendering order in scripts such as Indic, Hebrew and Arabic.
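To see why the logical map matters, consider the Devanagari vowel sign I (U+093F), which is stored after its consonant but drawn to its left. The following Python sketch is a deliberately tiny stand-in for a shaping engine, not Pango code: it handles only this one reordering rule and uses the characters themselves as "glyphs".

```python
VOWEL_SIGN_I = "\u093f"  # stored after the consonant, drawn before it


def shape_with_logical_map(text):
    """Return (glyphs, logical_map). Glyphs are in visual order, and
    logical_map[g] is the index in `text` of the character behind glyph g."""
    order = list(range(len(text)))
    for i in range(1, len(text)):
        if text[i] == VOWEL_SIGN_I:
            # Draw the vowel sign before the consonant that precedes it.
            order[i - 1], order[i] = order[i], order[i - 1]
    glyphs = [text[i] for i in order]
    return glyphs, order


def glyph_index_for_char(logical_map, char_index):
    """Find where a character ended up on screen, e.g. to place the cursor."""
    return logical_map.index(char_index)


glyphs, logical_map = shape_with_logical_map("\u0915\u093f")  # KA + sign I
print(logical_map)                           # visual position -> text index
print(glyph_index_for_char(logical_map, 0))  # where the consonant is drawn
```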
Qt Text Layout
Qt 3 text rendering differs from that of GTK+/Pango. Instead of being modular, it handles all complex text rendering in a single class, called QComplexText, which is mostly based on the Unicode character database. This is equivalent to the default routines provided by Pango. Because the Unicode database is incomplete, this class sometimes needs extra workarounds to override some values. Developers should examine this class if a script is not rendered properly.
Although relying on the Unicode database appears to be a straightforward way to render Unicode text, it makes the class rigid and error prone. It is advisable to check the Qt Web site regularly for bugs reported against the latest versions. A big change has been planned for Qt 4, however: the Scribe text layout engine, which is similar to Pango for GTK+.
Input Methods
The need for keyboard maps and input methods has been discussed on page 37. This section discusses how to implement them, beginning with keyboard layouts. Pages 37-38 also mention that XIM is the current basic input method framework for X Window. Of the two toolkits, only Qt 3 relies on it, while GTK+ 2 defines its own input method framework. Both XIM and GTK+ IM are discussed here.
Keyboard Layouts
The first step to providing text input for a particular language is to prepare the keyboard map. X Window handles the keyboard map using the X Keyboard (XKB) extension. When you start an X server on GNU/Linux, a virtual terminal is attached to it in raw mode, so that keyboard events are sent from the kernel without any translation.
The raw scan code of the key is then translated into a keycode according to the keyboard model. For XFree86 on PC, the keycode map is usually "xfree86", kept under the /etc/X11/xkb/keycodes directory. The keycodes just represent the key positions in symbolic form, for further referencing.
The keycode is then translated into a keyboard symbol (keysym) according to the specified layout, such as qwerty, dvorak, or a layout for a specific language, chosen from the data under the /etc/X11/xkb/symbols directory. A keysym does not represent a character yet. It requires an input method to translate sequences of key events into characters, which will be described later. For XFree86, all of the above setup is done via the setxkbmap command. (Setting up values in /etc/X11/XF86Config means setting parameters for setxkbmap at initial X server startup.) There are many ways of describing the configuration, as explained in Ivan Pascal's XKB explanation [22]. The default method for XFree86 4.x is the "xfree86" rule (XKB rules are kept under /etc/X11/xkb/rules), with additional parameters:
- model: pc104, pc105, microsoft, microsoftplus, ...
- layout: us, dk, ja, lo, th, ... (For XFree86 4.0+, up to 64 groups can be provided as part of layout definition)
- variant: nodeadkeys (mostly for Latin layouts)
- option: group switching key, swap caps, LED indicator, etc. (See /etc/X11/xkb/rules/xfree86 for all available options.)
For example:
$ setxkbmap us,th -option grp:alt_shift_toggle,grp_led:scroll
This sets the layout with US symbols as the first group and Thai symbols as the second group. The Alt-Shift combination toggles between the two groups. The Scroll Lock LED is the group indicator: it is on when the current group is not the first group, that is, on for Thai and off for US. You can even mix more than two languages:
$ setxkbmap us,th,lo -option grp:alt_shift_toggle,grp_led:scroll
This loads a trilingual layout. Alt-Shift is used to rotate among the three groups; that is, Alt-RightShift chooses the next group and Alt-LeftShift chooses the previous group. The Scroll Lock LED will be on when the Thai or Lao group is active.
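The group rotation and LED behaviour just described can be modelled in a few lines of Python. This is only a mental model of what the grp:alt_shift_toggle and grp_led:scroll options do, not how XKB is implemented, and the class and method names are invented for the illustration.

```python
class KeyboardGroups:
    """Toy model of XKB group switching for a multi-layout setup."""

    def __init__(self, layouts):
        self.layouts = list(layouts)
        self.current = 0  # index of the active group; 0 is the first group

    def next_group(self):
        # Alt-RightShift in the description above.
        self.current = (self.current + 1) % len(self.layouts)

    def prev_group(self):
        # Alt-LeftShift in the description above.
        self.current = (self.current - 1) % len(self.layouts)

    @property
    def active_layout(self):
        return self.layouts[self.current]

    @property
    def led_on(self):
        # The indicator is lit whenever the active group is not the first.
        return self.current != 0


kb = KeyboardGroups(["us", "th", "lo"])
for _ in range(3):
    kb.next_group()
    print(kb.active_layout, "LED on" if kb.led_on else "LED off")
```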
The arguments for setxkbmap can be specified in /etc/X11/XF86Config for initialization on X server startup by describing the "InputDevice" section for the keyboard, for example:
Section "InputDevice"
    Identifier "Generic Keyboard"
    Driver "keyboard"
    Option "CoreKeyboard"
    Option "XkbRules" "xfree86"
    Option "XkbModel" "microsoftplus"
    Option "XkbLayout" "us,th_tis"
    Option "XkbOptions" "grp:alt_shift_toggle,lv3:switch,grp_led:scroll"
EndSection
Notice the last four option lines. They tell setxkbmap to use the "xfree86" rule with the "microsoftplus" model (with Internet keys), a mixed layout of US and Thai TIS-820.2538, and some further options for the group toggle key and LED indicator. The "lv3:switch" option is only for keyboard layouts that require a 3rd level of shift (that is, one more than the normal shift keys). In this case, for "th_tis" in XFree86 4.4.0, this option sets RightCtrl as the 3rd-level shift.
Providing a Keyboard Map
If the keyboard map for a language is not available, one needs to prepare a new one. In XKB terms, one needs to prepare a symbols map, associating keysyms to the available keycodes.
The quickest way to start is to read the available symbols files under the /etc/X11/xkb/symbols directory. In particular, the files used by default rules of XFree86 4.3.0 are under the pc/ subdirectory. Here, only one group is defined per file, unlike the old files in its parent directory, in which groups are pre-combined. This is because XFree86 4.3.0 provides a flexible method for mixing keyboard layouts.
Therefore, unless you need to support the old versions of XFree86, all you need to do is to prepare a single-group symbols file under the pc/ subdirectory.
Here is an excerpt from the th_tis symbols file:
partial default alphanumeric_keys
xkb_symbols "basic" {
    name[Group1]= "Thai (TIS-820.2538)";
    // The Thai layout defines a second keyboard group and changes
    // the behavior of a few modifier keys.
    key { [ 0x1000e4f, 0x1000e5b ] };
    key { [ Thai_baht, Thai_lakkhangyao] };
    key { [ slash, Thai_leknung ] };
    key { [ minus, Thai_leksong ] };
    key { [ Thai_phosamphao, Thai_leksam ] };
    ...
};
Each element in the xkb_symbols data, except the first one, is the association of keysyms to the keycode for unshift and shift versions, respectively. Here, some keysyms are predefined in Xlib. You can find the complete list in
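As a rough illustration of the unshifted/shifted pairs, the Python sketch below looks up a keysym by shift level and decodes the numeric keysyms that appear in the first row of the excerpt. It relies on the X convention that a Unicode keysym is 0x1000000 plus the code point. The table keys ("key_a" and so on) are placeholders invented for this sketch, not real XKB key names.

```python
UNICODE_KEYSYM_BASE = 0x01000000

# (unshifted, shifted) keysyms copied from the first rows of the excerpt.
# "key_a", "key_b", "key_c" are placeholder labels, not XKB key names.
SYMBOLS = {
    "key_a": (0x1000E4F, 0x1000E5B),
    "key_b": ("Thai_baht", "Thai_lakkhangyao"),
    "key_c": ("slash", "Thai_leknung"),
}


def keysym_for(key, shift=False):
    """Pick the keysym for a key at the unshifted or shifted level."""
    unshifted, shifted = SYMBOLS[key]
    return shifted if shift else unshifted


def unicode_keysym_to_char(keysym):
    """Decode a numeric Unicode keysym; return None for named keysyms."""
    if isinstance(keysym, int) and keysym & 0xFF000000 == UNICODE_KEYSYM_BASE:
        return chr(keysym - UNICODE_KEYSYM_BASE)
    return None


for shift in (False, True):
    sym = keysym_for("key_a", shift)
    print(hex(sym), "U+%04X" % ord(unicode_keysym_to_char(sym)))
print(keysym_for("key_b", shift=True))
```

Named keysyms such as Thai_baht still need a lookup table to become characters, which is why the sketch returns None for them.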
For more details of the file format, see Ivan Pascal's XKB explanation [23]. When finished, the symbols.dir file should be regenerated so that the symbols file is listed:
# cd /etc/X11/xkb/symbols
# xkbcomp -lhlpR '*' -o ../symbols.dir
Then, the new layout may be tested as described in the previous section.
Additionally, entries may be added to /etc/X11/xkb/rules/xfree86.lst so that some GUI keyboard configuration tools can see the layout.
Once the new keyboard map is completed, it may also be included in XFree86 source where the data for XKB are kept under the xc/programs/xkbcomp subdirectory.
XIM - X Input Method
For some languages, such as English, text input is as straightforward as a one-to-one mapping from keysyms to characters. For European languages, this is a little more complicated because of accents. But for Chinese, Japanese and Korean (CJK), one-to-one mapping is impossible: a series of keystrokes must be interpreted to obtain each character.
X Input Method (XIM) is a locale-based framework designed to address the requirements of text input for any language. It is a separate service for handling input events as requested by X clients. Any text entry in X clients is represented by X Input Context (XIC). All the keyboard events will be propagated to the XIM, which determines the appropriate action for the events based on the current state of the XIC, and passes back the resulting characters.
Internally, a common step in every XIM is to translate the keyboard scan code into a keycode and then into a keysym by calling XKB, as described in the previous sections. The subsequent steps that convert keysyms into characters differ from locale to locale.
XIM is usually implemented using the client-server model. A more detailed discussion of XIM implementation is beyond the scope of this document. Please see Section 13.5 of the Xlib document [24] and the XIM protocol [25] for more information.
In general, users can choose their favourite XIM server by setting the XMODIFIERS environment variable, like this:
$ export LANG=th_TH.TIS-620
$ export XMODIFIERS="@im=Strict"
This selects the Strict input method for the Thai locale.
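For scripts that need to check which input method a session asks for, the value of XMODIFIERS can be picked apart as a list of @category=value modifiers. The helper below is a hypothetical Python sketch written for this purpose. It reads a string in that form and does not talk to any XIM server.

```python
def parse_xmodifiers(value):
    """Return the input method named in an XMODIFIERS string, or None.

    The string is a run of "@category=value" modifiers; only the "im"
    category is of interest here.
    """
    for part in value.split("@"):
        category, sep, name = part.partition("=")
        if sep and category == "im":
            return name
    return None


print(parse_xmodifiers("@im=Strict"))
print(parse_xmodifiers(""))
```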
GTK+ IM
As a cross-platform toolkit, GTK+ 2 defines its own framework using pure GTK+ APIs, instead of relying on the input methods of each operating system. This provides a high level of abstraction, making input method development a lot easier than writing XIM servers. GTK+ can still use the existing XIM servers through the imxim bridging module. Moreover, the input methods developed become immediately available to GTK+ on all platforms it supports, including XFree86, Windows, and the GNU/Linux framebuffer console. The only drawback is that the input methods cannot be shared with non-GTK+ applications.
Client Side
A normal GTK+-based text entry widget will provide an "Input Methods" context menu that can be opened by right clicking within the text area. This menu provides the list of all installed GTK+ IM modules, which the user can choose from. The menu is initialized by querying all installed modules for the engines they provide.
From the client's point of view, each text entry is represented by an IM context, which communicates with the IM module after every key press event by calling a key filter function provided by the module. This allows the IM to intercept the key presses and translate them into characters. Non-character keys, such as function keys or control keys, are not usually intercepted. This allows the client to handle special keys, such as shortcuts.
There are also interfaces for the other direction. The IM can also call the client for some actions by emitting GLib signals, for which the handlers may be provided by the client by connecting callbacks to the signals:
- "preedit_changed"
  The uncommitted (pre-edit) string has changed. The client may update the display, but not the input buffer, to let the user see the keystrokes.
- "commit"
  Some characters are committed from the IM. The committed string is also passed so that the client can take it into its input buffer.
- "retrieve_surrounding"
  The IM wants to retrieve some text around the cursor.
- "delete_surrounding"
  The IM wants to delete the text around the cursor. The client should delete the text portion around the cursor as requested.
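The two directions of communication (the client calling the key filter, and the IM calling back through signals) can be mimicked with plain Python objects. The classes below are stand-ins written for this illustration and are not the GTK+ API. The signal arguments are simplified: for instance, "delete_surrounding" here carries only a character count and always deletes before the cursor.

```python
class ToyIMContext:
    """Stand-in for an IM context; it only mimics the shape of the protocol."""

    def __init__(self):
        self._handlers = {}

    def connect(self, signal, handler):
        self._handlers.setdefault(signal, []).append(handler)

    def emit(self, signal, *args):
        for handler in self._handlers.get(signal, []):
            handler(*args)

    def filter_keypress(self, key):
        # Keys with multi-character names ("F1", "Control_L") count as
        # non-character keys and are handed back to the client.
        if len(key) != 1:
            return False
        self.emit("commit", key)
        return True


class ToyEntry:
    """Stand-in for a text entry widget that owns the input buffer."""

    def __init__(self, im_context):
        self.buffer = ""
        self.unhandled = []
        self.im_context = im_context
        im_context.connect("commit", self.on_commit)
        im_context.connect("delete_surrounding", self.on_delete_surrounding)

    def on_commit(self, text):
        self.buffer += text

    def on_delete_surrounding(self, n_chars):
        # Simplified: the cursor is always at the end of the buffer.
        if n_chars > 0:
            self.buffer = self.buffer[:-n_chars]

    def key_press(self, key):
        if not self.im_context.filter_keypress(key):
            self.unhandled.append(key)  # e.g. process it as a shortcut


entry = ToyEntry(ToyIMContext())
for key in ["h", "i", "F1", "!"]:
    entry.key_press(key)
entry.im_context.emit("delete_surrounding", 1)
print(entry.buffer, entry.unhandled)
```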
IM Modules
GTK+ input methods are implemented using loadable modules that provide entry functions for querying and creating the desired IM context. These are used as the interface with the "Input Methods" context menu in text entry areas.
The IM module defines a new IM context class or classes and provides filter functions to be called by the client upon key press events. It can determine the proper action for the key and return TRUE if it means to intercept the event, or FALSE to pass the event back to the client.
Some IMs (e.g., CJK and European) perform a stateful conversion, incrementally matching the input string against predefined patterns until a unique pattern is matched, and only then committing the converted string. During the partial matching, the IM emits the "preedit_changed" signal to the client for every change, so that the client can update the pre-edit string on the display. Finally, to commit characters, the IM emits the "commit" signal, with the converted string as the argument, to the IM context. Some IMs (e.g., Thai) are context-sensitive: they need to retrieve the text around the cursor to determine the appropriate action. This can be done through the "retrieve_surrounding" signal.
In addition, the IM may request the deletion of some text from the client's input buffer, as required by the Thai advanced IM. This is also used to correct illegal sequences. It can be done via the "delete_surrounding" signal.
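Both behaviours can be sketched in Python with ordinary callbacks standing in for the signals. Nothing below is GTK+ code. The compose table is a made-up three-entry example, and the Thai rule (a combining mark is accepted only directly after a character in the range U+0E01 to U+0E4E) is a drastic simplification of real Thai input checking.

```python
import unicodedata

COMPOSE = {"'e": "\u00e9", "`e": "\u00e8", "'a": "\u00e1"}  # example table


class ComposeIM:
    """Stateful IM: matches key sequences against COMPOSE incrementally."""

    def __init__(self, on_preedit_changed, on_commit):
        self.pending = ""
        self.on_preedit_changed = on_preedit_changed
        self.on_commit = on_commit

    def filter_keypress(self, ch):
        candidate = self.pending + ch
        if candidate in COMPOSE:
            result = COMPOSE[candidate]      # full match: convert
        elif any(seq.startswith(candidate) for seq in COMPOSE):
            self.pending = candidate         # partial match: keep waiting
            self.on_preedit_changed(self.pending)
            return True
        else:
            result = candidate               # no match: pass keys through
        self.pending = ""
        self.on_preedit_changed("")
        self.on_commit(result)
        return True


def thai_filter(ch, retrieve_surrounding, commit):
    """Context-sensitive IM: drop a combining mark with nothing to sit on."""
    before = retrieve_surrounding()          # text before the cursor
    is_mark = unicodedata.category(ch) == "Mn"
    prev_is_thai = bool(before) and "\u0e01" <= before[-1] <= "\u0e4e"
    if is_mark and not prev_is_thai:
        return True                          # intercepted, nothing committed
    commit(ch)
    return True


preedits, committed = [], []
im = ComposeIM(preedits.append, committed.append)
for ch in "'ex":
    im.filter_keypress(ch)
print(preedits, committed)
```

In this model the pre-edit string is shown while a sequence is still ambiguous and cleared as soon as something is committed, which mirrors the order of the two signals described above.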
[13] Leisher, M., "The XmBDFEd Font Editor"; available from crl.nmsu.edu/~mleisher/xmbdfed.html.
[15] Williams, G., "PfaEdit"; available from pfaedit.sourceforge.net.
[16] van Rossum, J., "TTX/FontTools"; available from fonttools.sourceforge.net/.
[17] Note the difference from Microsoft's "Windows" trademark: X Window is written without the "s".
[18] Taylor, O., "Pango"; available from www.pango.org.
[19] Taylor, O., "Pango - Design"; available from www.pango.org/design.shtml.
[20] GNOME Development Site, "Pango Reference Manual"; available from developer.gnome.org/doc/API/2.0/pango/.
[21] This is a very rough classification. Obviously, there are further steps, such as line breaking, alignment and justification. They need not be discussed here, as they go beyond localization.
[22] Pascal, I., X Keyboard Extension; available from pascal.tsu.ru/en/xkb/.
[23] Pascal, I., X Keyboard Extension; available from pascal.tsu.ru/en/xkb/.
[24] Gettys, J., Scheifler, R.W., "Xlib - C Language X Interface", X Consortium Standard, X Version 11 Release 6.4.
[25] Narita, M., Hiura, H., The Input Method Protocol Version 1.0. X Consortium Standard, X Version 11 Release 6.4.
[26] OpenI18N.org. OpenI18N Locale Name Guideline, Version 1.1 - 2003-03-11; available from www.openi18n.org/docs/text/LocNameGuideV11.txt.

