[Bug 1683428] Re: read_csv on bzip2 file unzips only the first block
Darko Veberic
1683428 at bugs.launchpad.net
Mon Apr 17 17:45:54 UTC 2017
Furthermore, according to https://bugs.python.org/issue20781 this is in
their opinion "not a bug", i.e. won't-fix. However, the bz2 container
clearly allows multiple concatenated streams (blocks), so in my
opinion this is a bug: a validly formatted bz2 file is not read
correctly and is truncated after the first block.
** Bug watch added: Python Roundup #20781
http://bugs.python.org/issue20781
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to python2.7 in Ubuntu.
https://bugs.launchpad.net/bugs/1683428
Title:
read_csv on bzip2 file unzips only the first block
Status in pandas package in Ubuntu:
New
Status in python2.7 package in Ubuntu:
New
Bug description:
It seems that read_csv() suffers from the same problem as, e.g., the
early boost implementations; see
https://svn.boost.org/trac/boost/ticket/3853 for details. A bz2
file can be composed of many concatenated bz2 blocks, which
have to be treated as one continuous stream.
How to test: create a large CSV file, much larger than 900k. Compress
it with pbzip2 (each process creates one bz2 block). Alternatively,
create many such CSV files, bzip2 them individually, and then run
cat *.bz2 > joined.bz2
read_csv() will uncompress and read only the first block.
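The concatenation case above can be reproduced in memory with a small sketch. It assumes Python 3.3 or later (it relies on the eof attribute of bz2.BZ2Decompressor, which python2.7 does not have); the helper name decompress_multistream and the sample CSV rows are made up for illustration and are not part of pandas or of this report. A single decompressor object stops at the end of the first stream and leaves the rest in unused_data, which matches the truncation described here; looping over unused_data recovers all of the data.

```python
import bz2


def decompress_multistream(data: bytes) -> bytes:
    """Decompress every concatenated bz2 stream found in `data`.

    Hypothetical helper for illustration only.
    """
    chunks = []
    while data:
        # One decompressor object handles exactly one bz2 stream.
        decomp = bz2.BZ2Decompressor()
        chunks.append(decomp.decompress(data))
        if not decomp.eof:
            raise ValueError("truncated bz2 stream")
        # Bytes after the end of this stream belong to the next one.
        data = decomp.unused_data
    return b"".join(chunks)


# Equivalent of: bzip2 part1.csv part2.csv; cat *.bz2 > joined.bz2
part1 = b"a,b\n1,2\n"
part2 = b"3,4\n"
joined = bz2.compress(part1) + bz2.compress(part2)

# A single decompressor stops after the first stream.
first_only = bz2.BZ2Decompressor().decompress(joined)

# Looping over the leftover bytes recovers everything.
everything = decompress_multistream(joined)
```

Here first_only holds only the rows from part1, while everything holds the rows from both parts. As a possible workaround, the fully decompressed bytes could be wrapped in io.BytesIO and handed to read_csv() instead of the .bz2 path.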
Note that this is a severe bug, since parallel bzip2 is becoming
increasingly common on multi-core systems.
ProblemType: Bug
DistroRelease: Ubuntu 16.10
Package: python-pandas 0.17.1-3ubuntu2
ProcVersionSignature: Ubuntu 4.8.0-42.45-generic 4.8.17
Uname: Linux 4.8.0-42-generic x86_64
ApportVersion: 2.20.3-0ubuntu8.2
Architecture: amd64
CurrentDesktop: XFCE
Date: Mon Apr 17 18:42:52 2017
InstallationDate: Installed on 2014-10-21 (909 days ago)
InstallationMedia: Ubuntu 14.04.1 LTS "Trusty Tahr" - Release amd64 (20140722.2)
PackageArchitecture: all
SourcePackage: pandas
UpgradeStatus: Upgraded to yakkety on 2016-10-20 (179 days ago)
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/pandas/+bug/1683428/+subscriptions