What is the policy on UTF-8 non single byte characters in the master .xml files?

Doug Smythies dsmythies at telus.net
Sun Mar 31 21:52:46 UTC 2013


In the absence of any feedback, a policy of no multi-byte UTF-8 characters
in master .xml files has been set for the serverguide project.
I have removed all such characters from the master serverguide .xml files
(raring branch), along with some nonsense bytes in a few locations.
If contributors want open and closing quote characters, as different from
the ASCII generic quote character, they should use the DocBook <quote>
</quote> tags.
Note: opening and closing quote characters were the dominant use of
multi-byte UTF-8 characters in the .xml source files, wherever <quote>
</quote> tags were not used.
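
The substitution described above can be sketched in a few lines of Python 3.
This is only an illustration, not the procedure actually used on the
serverguide files. The function name curly_quotes_to_docbook is hypothetical,
and the sketch assumes every opening quote (U+201C) has a matching closing
quote (U+201D).

```python
def curly_quotes_to_docbook(text):
    """Replace curly double quotes with DocBook <quote> tags.

    Assumption: opening (U+201C) and closing (U+201D) quotes are balanced,
    so the resulting tags stay well-formed.
    """
    return text.replace("\u201c", "<quote>").replace("\u201d", "</quote>")


before = "<para>The \u201cmaster\u201d files.</para>"
after = curly_quotes_to_docbook(before)
print(after)

# Raises UnicodeEncodeError if any non-ASCII character is still present.
after.encode("ascii")
```

Unbalanced quotes would yield broken markup, so the result should still be
checked by hand or with an XML validator.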

Note that even when using the quote tags, there is still an issue with at
least the Greek PDF (LP: #922251), and likely other languages.

... Doug

-----Original Message-----
From: Doug Smythies [mailto:dsmythies at telus.net] 
Sent: March-25-2013 15:04
To: ubuntu-doc at lists.ubuntu.com
Cc: 'Doug Smythies'
Subject: What is the policy on UTF-8 non single byte characters in the
master .xml files? 

Is there a doc group policy statement or guideline on the use of any non
single byte characters in the master document .xml files? (I.e., the use of
any non-ASCII characters.)

If yes, please point me to it. If no, can one be made?

My input on the subject is that they should not be allowed, however I will
defer to any existing standards.

Currently, there are not very many such characters in the Ubuntu serverguide
master files (say, about 60), and they would be relatively easy to
eliminate.
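
A minimal sketch of how such characters could be located, assuming Python 3.
The helper name find_non_ascii and the sample text are hypothetical, and this
is not necessarily how the count above was obtained. It relies on the fact
that any character above code point 127 takes more than one byte in UTF-8.

```python
def find_non_ascii(text):
    """Return (line_number, column, character) for each non-ASCII character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:  # outside 7-bit ASCII, so multi-byte in UTF-8
                hits.append((line_no, col, ch))
    return hits


# Hypothetical sample; a real check would read each master .xml file
# with open(path, encoding="utf-8").
sample = "<para>A \u201cquoted\u201d word.</para>\n<para>Plain ASCII.</para>\n"
for line_no, col, ch in find_non_ascii(sample):
    print(f"line {line_no}, column {col}: U+{ord(ch):04X}")
```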

Issues:
. Do not always translate properly.
. Do not always compile into PDF properly.
   . some PDF compile time warnings.
. Mess up the "bzr diff" command.
. Do not display properly in some editors.
. Not easily editable in some editors.
. Masks any real source file odd character searches.

Of course, it is understood that multi-byte UTF-8 characters are essential
for the translated documents.

Also, it is acknowledged that every file starts with:
<?xml version="1.0" encoding="UTF-8"?>
suggesting that use of any UTF-8 characters should be O.K.

... Doug Smythies




