[Bug 1646260] [NEW] Locale names should always include the codeset component

Gunnar Hjalmarsson 1646260 at bugs.launchpad.net
Wed Nov 30 22:06:26 UTC 2016


Public bug reported:

If you install Ubuntu in English with Tel Aviv as the timezone location,
the installer figures out that the applicable locale is en_IL and adds
the line

LANG="en_IL"

to /etc/default/locale.

en_IL is a perfectly fine locale name; in fact it's *the* correct name
of the English/Israel locale for UTF-8 according to glibc's SUPPORTED
file. However, Python does not agree. Python generally seems to
presuppose that locale names include the codeset component, although it
accepts names without a codeset if they are listed in the hard-coded
dictionary locale_alias in /usr/lib/python3.5/locale.py. en_IL is a
relatively new locale and is not (yet) included in locale_alias, so
setlocale() accepts the name but getlocale() then fails:

gunnar@gunnar-ubuntu-current:~$ python3
Python 3.5.2+ (default, Sep 22 2016, 12:18:14) 
[GCC 6.2.0 20160927] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.setlocale(locale.LC_CTYPE, 'en_IL')
'en_IL'
>>> mylocale = locale.getlocale(locale.LC_CTYPE)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.5/locale.py", line 577, in getlocale
    return _parse_localename(localename)
  File "/usr/lib/python3.5/locale.py", line 486, in _parse_localename
    raise ValueError('unknown locale: %s' % localename)
ValueError: unknown locale: en_IL
>>> quit()

I learned of this issue via <http://askubuntu.com/q/854950>. The
problem is not limited to en_IL: new locales in glibc tend to be
UTF-8-only locales whose names in SUPPORTED lack the codeset, and glibc
and Python will probably never be in sync.

One way to deal with this issue is to always add '.UTF-8' to such locale
names. For instance, 'en_IL.UTF-8' is understood by both glibc and
Python.

Probably this should be fixed in localechooser. Basically I'd like to
see a code snippet along these lines:

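# Only touch names that have no codeset yet (no "." in the name).
if [ "$LOCALE" = "${LOCALE%.*}" ]; then
    # Append .UTF-8 before any @modifier: sr_RS@latin -> sr_RS.UTF-8@latin
    LOCALE=$( printf '%s\n' "$LOCALE" | sed -r 's/([^@]+)/\1.UTF-8/' )
fi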
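
To make the intended behaviour concrete, here is a minimal sketch of the
same idea as a standalone POSIX shell function. The function name
add_utf8_codeset is made up for illustration and is not existing
localechooser code; the sketch also uses parameter expansion instead of
sed, which is my own substitution. Like the snippet above, it does not
special-case C or POSIX.

```shell
#!/bin/sh
# Hypothetical helper (not part of localechooser): print the locale name
# given as $1 with ".UTF-8" added when it carries no codeset component.
add_utf8_codeset() {
    case "$1" in
        '')  ;;                                          # empty name: print nothing
        *.*) printf '%s\n' "$1" ;;                       # codeset already present
        *@*) printf '%s\n' "${1%%@*}.UTF-8@${1#*@}" ;;   # keep the @modifier last
        *)   printf '%s\n' "$1.UTF-8" ;;                 # plain language_TERRITORY
    esac
}

# Show the mapping for a few sample names.
for name in en_IL en_US.UTF-8 sr_RS@latin; do
    printf '%s -> %s\n' "$name" "$(add_utf8_codeset "$name")"
done
```

Names that already include a codeset pass through unchanged, so running
the function on an already fixed value is harmless.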

I haven't prepared a patch, since I don't know where exactly it should
be inserted without breaking anything else. (Don't know how to test it
either.) Still hoping that somebody finds it important enough to fix.

** Affects: localechooser (Ubuntu)
     Importance: Undecided
         Status: New

** Affects: ubiquity (Ubuntu)
     Importance: Undecided
         Status: New


-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to localechooser in Ubuntu.
https://bugs.launchpad.net/bugs/1646260
