[ubuntu-in] SCIM all Indian Languages Baraha system
Gora Mohanty
gora at sarai.net
Wed Aug 15 23:20:30 BST 2007
On Tue, 2007-08-14 at 07:56 -0400, Dinbandhu wrote:
[...]
> One further question: in Baraha, there is a facility for interconverting
> text between languages. For example, a particular text which is written
> using Devanagari script can, with a single command, be converted into
> Bengali script. Would there be such a facility in SCIM as well?
[...]
If by conversion, you mean simple transliteration, i.e., a character at
a certain position in the Devanagari Unicode block getting converted to
the character at the same position in the Unicode block for the other
language, this is possible through a Perl script that I wrote for
transliterating the keymaps. Thus, for example, the Devanagari letter
"ka" (U+0915, at offset 0x15, i.e., 21, counting from U+0900, the start
of the Devanagari block) would get transliterated to the Bengali letter
"ka" (U+0995, also at offset 0x15 from the start of the Bengali block at
U+0980). This works in a crude sense, but runs into obvious problems
when a character in one language has no equivalent in the other.
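To make the offset idea concrete, here is a minimal sketch in Python. It is
not the Perl script itself, only an illustration of the same arithmetic; the
names BLOCK_START, BLOCK_SIZE and transliterate are made up for this example,
and only two scripts are listed.

```python
# Illustrative sketch only: not the actual remap_lang code.
# Block start code points, taken from the Unicode charts.
BLOCK_START = {"Devanagari": 0x0900, "Bengali": 0x0980}
BLOCK_SIZE = 0x80  # each of these two blocks spans 128 code points


def transliterate(text, src, dst):
    """Shift every character of the src block to the same offset in dst."""
    src_start = BLOCK_START[src]
    shift = BLOCK_START[dst] - src_start
    out = []
    for ch in text:
        cp = ord(ch)
        if src_start <= cp < src_start + BLOCK_SIZE:
            # Same offset from the block start, different block.
            out.append(chr(cp + shift))
        else:
            # Anything outside the source block passes through unchanged.
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # Devanagari "ka" (U+0915) becomes Bengali "ka" (U+0995).
    print(transliterate("\u0915", "Devanagari", "Bengali"))
```

Since the shift is blind, a source character whose slot is empty in the target
block comes out as an unassigned code point, which is the "obvious problem"
mentioned above.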
You will need to check out the baraha-maps distribution via CVS from
code.indlinux.net. Install cvs with "sudo apt-get install cvs" if you
do not already have it, and then do the following:
1. Check out the code (in case your mail client wraps long lines, each
command below should be typed in a terminal on a single line):
cvs -d :pserver:anonymous@cvs.code.indlinux.net:/cvsroot/baraha-maps login
When prompted for a password, just hit return.
cvs -d :pserver:anonymous@cvs.code.indlinux.net:/cvsroot/baraha-maps checkout baraha-maps
This will create a sub-directory baraha-maps, and start checking out
the code, giving you some messages. The server occasionally has
problems, so if you have trouble connecting, please retry a few
times. Likewise, the connection might not be closed properly at
the end, so if it downloads a bunch of files and then seems to hang
for a while, just hit Control-C to return to the prompt. The checkout
should be complete if it seems to have hung for, say, 5 minutes
without doing anything.
2. The code will include the Baraha keymap for Hindi, from which the
keymaps for other languages will be generated. Compile these with
cd baraha-maps
make
sudo make install
You will need to have Perl installed, but it should be there by
default. This will install the various xx-baraha.mim keymaps in
/usr/share/m17n, some Perl modules in /usr/local/lib/site_perl, and
a script called remap_lang in /usr/local/bin. Thus, you will need to
have /usr/local/bin in your path, or call the script with the full
pathname, i.e., /usr/local/bin/remap_lang. Should you wish to
uninstall things, do
sudo make uninstall
3. Here are some examples of using remap_lang:
remap_lang -i Devanagari -o Bengali < infile > outfile
transliterates Devanagari text in "infile" to Bengali text in
"outfile". Non-Devanagari text in "infile" is passed through
unchanged. Any Indian script in Unicode can be used as input, or
output. Try
remap_lang -i help
for a list of known scripts.
Normally, a check is made that both the input character and the
output character are assigned in Unicode, and unassigned characters
are silently dropped. You can turn this check off, so that all
characters are transliterated, with
remap_lang -i Devanagari -o Bengali -c 0 < infile > outfile
"remap_lang -h" gives a short usage message, and "remap_lang -m" a
detailed manual.
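The effect of the assigned-character check can be sketched in Python as
follows. This is an assumption about the behaviour described above, not the
remap_lang source: the function deva_to_bengali and its check parameter are
hypothetical, and "assigned" is taken to mean "not in Unicode general
category Cn" according to the Unicode data bundled with Python.

```python
import unicodedata

# Illustrative sketch only: mimics the described check, not remap_lang itself.
DEVANAGARI_START = 0x0900
BENGALI_START = 0x0980
BLOCK_SIZE = 0x80  # both blocks span 128 code points


def is_assigned(ch):
    # Category "Cn" marks code points that are unassigned in the Unicode
    # version shipped with this Python.
    return unicodedata.category(ch) != "Cn"


def deva_to_bengali(text, check=True):
    """Map Devanagari to Bengali by offset, optionally dropping bad slots."""
    out = []
    for ch in text:
        cp = ord(ch)
        if not (DEVANAGARI_START <= cp < DEVANAGARI_START + BLOCK_SIZE):
            out.append(ch)  # non-Devanagari text passes through unchanged
            continue
        mapped = chr(cp - DEVANAGARI_START + BENGALI_START)
        if check and not (is_assigned(ch) and is_assigned(mapped)):
            continue  # silently drop, as with the default check
        out.append(mapped)
    return "".join(out)
```

For instance, Devanagari U+0934 maps to U+09B4, which is unassigned in the
Bengali block. With check=True that character is dropped from the output,
while check=False keeps the unassigned code point, roughly what "-c 0" is
described as doing.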
I had not considered such transliteration an important issue, so
feedback on this script will be appreciated.
Regards,
Gora