[Bug 1677244] [NEW] "UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc4' in position 69: surrogates not allowed" with mime.file() on path from os.walk
Jamie Strandboge
jamie at ubuntu.com
Wed Mar 29 13:24:23 UTC 2017
Public bug reported:
The following script works fine on 16.04 LTS:
#!/usr/bin/python3
import magic
import os
dir = "/usr/share/ca-certificates/mozilla"
mime = magic.open(magic.MAGIC_MIME)
mime.load()
for root, dirnames, filenames in os.walk(dir):
    for f in filenames:
        fn = os.path.join(root, f)
        print("%s: %s" % (fn, mime.file(fn)))
E.g.:
$ python3 /tmp/test.py
/usr/share/ca-certificates/mozilla/TWCA_Root_Certification_Authority.crt: text/plain; charset=us-ascii
/usr/share/ca-certificates/mozilla/Baltimore_CyberTrust_Root.crt: text/plain; charset=us-ascii
/usr/share/ca-certificates/mozilla/Comodo_AAA_Services_root.crt: text/plain; charset=us-ascii
/usr/share/ca-certificates/mozilla/Hellenic_Academic_and_Research_Institutions_RootCA_2011.crt: text/plain; charset=us-ascii
/usr/share/ca-certificates/mozilla/TC_TrustCenter_Class_3_CA_II.crt: text/plain; charset=us-ascii
/usr/share/ca-certificates/mozilla/Security_Communication_RootCA2.crt: text/plain; charset=us-ascii
/usr/share/ca-certificates/mozilla/EBG_Elektronik_Sertifika_Hizmet_Sağlayıcısı.crt: text/plain; charset=us-ascii
...
(Notice the last filename before the ellipsis: it contains non-ASCII characters.)
But on 17.04, this happens:
$ python3 /tmp/test.py
/usr/share/ca-certificates/mozilla/TWCA_Root_Certification_Authority.crt: text/plain; charset=us-ascii
/usr/share/ca-certificates/mozilla/Baltimore_CyberTrust_Root.crt: text/plain; charset=us-ascii
/usr/share/ca-certificates/mozilla/Comodo_AAA_Services_root.crt: text/plain; charset=us-ascii
/usr/share/ca-certificates/mozilla/Hellenic_Academic_and_Research_Institutions_RootCA_2011.crt: text/plain; charset=us-ascii
/usr/share/ca-certificates/mozilla/TC_TrustCenter_Class_3_CA_II.crt: text/plain; charset=us-ascii
/usr/share/ca-certificates/mozilla/Security_Communication_RootCA2.crt: text/plain; charset=us-ascii
Traceback (most recent call last):
  File "/home/ubuntu/test.py", line 15, in <module>
    print("%s: %s" % (fn, mime.file(fn)))
  File "/usr/lib/python3/dist-packages/magic.py", line 130, in file
    bi = bytes(filename, 'utf-8')
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc4' in position 69: surrogates not allowed
I'm guessing this is a change in python3 that python3-magic hasn't
accounted for, but I'm not sure. Adding python3 task just in case.
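
For illustration only, here is a minimal sketch of one way this exact error can arise. It is an assumption, not something confirmed in this report: if the filename bytes are decoded under a non-UTF-8 (e.g. ASCII) filesystem encoding, Python's surrogateescape handler maps the undecodable byte 0xC4 (the first UTF-8 byte of "ğ") to the lone surrogate '\udcc4', and a strict UTF-8 encode like the one in magic.py then fails. Position 69 in the traceback is consistent with the "ğ" in the EBG_Elektronik filename. The short filename below is made up for the demonstration.

```python
# Raw on-disk bytes of a filename containing "ğ" (UTF-8: 0xC4 0x9F).
# "Sağ.crt" is a hypothetical stand-in for the real certificate name.
raw = "Sağ.crt".encode("utf-8")

# Simulate decoding the name under an ASCII filesystem encoding:
# undecodable bytes become lone surrogates via surrogateescape.
name = raw.decode("ascii", errors="surrogateescape")
print(ascii(name))

# A strict UTF-8 encode, like bytes(filename, 'utf-8') in the traceback,
# rejects the lone surrogates.
try:
    bytes(name, "utf-8")
    strict_error = None
except UnicodeEncodeError as exc:
    strict_error = exc.reason
    print("strict encode failed:", strict_error)

# Encoding with the same error handler round-trips to the original bytes.
restored = name.encode("ascii", errors="surrogateescape")
print(restored == raw)
```

In real code, os.fsencode() performs this kind of round trip using the current filesystem encoding. Whether that is the right fix for python3-magic is not established here.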
** Affects: file (Ubuntu)
Importance: Undecided
Status: New
** Affects: python3.5 (Ubuntu)
Importance: Undecided
Status: New
** Also affects: python3.5 (Ubuntu)
Importance: Undecided
Status: New
** Description changed:
(Whitespace-only change: the indentation of the script's loop body and of the traceback frames was adjusted. The text is otherwise identical to the description above.)
https://bugs.launchpad.net/bugs/1677244
Title:
"UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc4' in
position 69: surrogates not allowed" with mime.file() on path from
os.walk
Status in file package in Ubuntu:
New
Status in python3.5 package in Ubuntu:
New