[Bug 272290] Re: Man pages show wrong Unicode characters instead of ASCII

Olivier Duclos 272290 at bugs.launchpad.net
Wed Nov 8 12:47:40 UTC 2023


Fixed in Debian with groff 1.23.0-3:
https://salsa.debian.org/debian/groff/-/commit/d5394c68d70e6c5199b01d2522e094c8fd52e64e

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to groff in Ubuntu.
https://bugs.launchpad.net/bugs/272290

Title:
  Man pages show wrong Unicode characters instead of ASCII

Status in groff package in Ubuntu:
  Confirmed

Bug description:
  Binary package hint: groff-base

  The "man" command displays man pages with cool-looking Unicode
  quotation marks, hyphens and more.  This very often leads to incorrect
  content, when the page actually tries to explain the meaning of an
  ASCII symbol.

  A few examples, each giving the manual page name, then the line
  number (assuming 80 columns) and a description:

  bash L1428 and many more places:   ‘command‘ (U+2018) instead of
  `command` (U+0060) demonstrates command substitution.

  bash L274 and many other places use │ (U+2502) instead of the standard
  pipe symbol: | .

  gawk L605, L608:  \‘ instead of \`, \’ instead of \' as possible
  escapes for regexps.  L1348 and others use ’ (U+2019) instead of '
  making the examples wrong.

  links L33 talks about the ‐‐enable‐graphic option to ./configure, I'm
  pretty sure the configure script wouldn't understand those U+2010
  dashes.
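
  The substitutions listed in these examples can be summarized as a
  small lookup table.  The following Python sketch is only an
  illustration and is not the groff patch mentioned in this report; the
  names ASCII_FALLBACK, to_ascii and find_suspects are hypothetical.  It
  assumes every U+2018 stands for a backtick, as in the bash and gawk
  examples, which would be wrong for text that really wants a
  typographic quote.

```python
# Illustrative sketch only: maps the code points named in the report back
# to the ASCII characters the manual pages were describing.
# Assumption: U+2018 always stands in for a backtick (true for the bash
# and gawk examples above, not for genuine typographic quotes).
ASCII_FALLBACK = {
    "\u2018": "`",  # LEFT SINGLE QUOTATION MARK, shown instead of a backtick
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK, shown instead of an apostrophe
    "\u2010": "-",  # HYPHEN, shown instead of the ASCII hyphen-minus
    "\u2502": "|",  # BOX DRAWINGS LIGHT VERTICAL, shown instead of a pipe
}

_TABLE = str.maketrans(ASCII_FALLBACK)


def to_ascii(rendered: str) -> str:
    """Return the text with the four code points replaced by ASCII."""
    return rendered.translate(_TABLE)


def find_suspects(rendered: str):
    """Yield (line_number, column, character) for each suspect code point.

    Line and column numbers are 1-based.
    """
    for lineno, line in enumerate(rendered.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in ASCII_FALLBACK:
                yield lineno, col, ch
```

  Running find_suspects over the rendered text of a page gives a quick
  list of places to check by hand; to_ascii shows what a copy-pasteable
  version would look like.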

  There are a *lot* more man pages suffering from these kinds of problems.

  I haven't checked the specification of the man page format, so I don't
  know whether these particular man pages are buggy or the rendering
  software is.  Oh, by the way, this one is my favorite:

  groff L503 yet again uses ‘ (U+2018) instead of the old-fashioned
  backtick.  This means that groff itself fails to properly render its
  own manual page.  Sigh...

  
  These bugs make these manual pages
  - incorrect;
  - misleading;
  - not suitable for copy-pasting;
  - not searchable for these particular special characters;
  - even more incorrect if the terminal has limited font displaying capabilities (such as the Linux console with a font that completely lacks these Unicode symbols).

  
  One possible solution would be to fix all these man pages (my guess is that there are some hundreds of them).  I don't think this approach is feasible.

  Another possible solution is to patch groff to be less eager to use Unicode stuff.  We've chosen this approach in the distribution I used to be a maintainer of, and we've come up with this patch, which you might want to consider applying:
  https://svn.uhulinux.hu/packages/2.1/groff/patches/02-sane-ascii-characters.patch

  
  Note that there's one more problem with the handling of all this UTF-8 stuff:  If one of these symbols is bold or underlined, and you redirect the output of "man" into a file, then you get some garbage (invalid UTF-8) there instead of the simple non-highlighted version.
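
  For reference, the "simple non-highlighted version" can be sketched in
  Python as follows.  This assumes the highlighting uses nroff-style
  overstriking (bold as a character, a backspace, and the same character
  again; underline as an underscore, a backspace, and the character);
  the report does not say which mechanism produced the garbage, and the
  name strip_overstrike is hypothetical.  The function works on text
  that has already been decoded, so it does not repair invalid UTF-8
  bytes; it only shows the intended result when the overstrike sequences
  are well-formed.

```python
import re

# Assumption: highlighting is encoded as nroff-style overstrike sequences,
# i.e. "X\bX" for bold and "_\bX" for underline, where \b is a backspace.
_OVERSTRIKE = re.compile(r".\x08")


def strip_overstrike(text: str) -> str:
    """Remove each "character + backspace" pair, keeping what follows it."""
    return _OVERSTRIKE.sub("", text)
```

  This is similar in spirit to filtering the output through "col -b".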

  
  Don't get me wrong: I'm a great fan of proper typesetting as well as Unicode and always try to use the proper quotation marks, proper hyphens and so on.  I just think that there are places where this is not so necessary.  Manual pages formatted in terminals are usually for slightly more advanced users, not for those who only use some fancy graphical apps.  Here getting the quote marks and hyphens typographically incorrect is not such a big issue; it's much more important that the characters displayed are actually those the man pages are talking about.  UI strings of Gnome, KDE, OpenOffice.org and so on are proper places for all these fancy Unicode characters, but I just think they are shamelessly not used properly there, I wonder why...  For manual pages they are simply not important at all IMHO.

  
  I'm using Hardy 8.04.1, including groff-base 1.18.1.1-16 and man-db 2.5.1-3.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/groff/+bug/272290/+subscriptions



