[Bug 1684273] Re: Missing tail in iterparse
scoder
1684273 at bugs.launchpad.net
Sat Apr 22 06:50:52 UTC 2017
I agree that this is unexpected. It can be fixed by internally passing
more data into the parser before generating the "end" parse event, i.e.
by waiting for the tail text to end before yielding the element that
owns it.
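Until something like that is done inside lxml, a caller-side workaround might look like the sketch below. It is not the proposed lxml fix. It only delays each parse event by one step, on the assumption that by the time the following "end" event has been produced, the parser has already consumed the tail text of the previous element. The helper name `delay_events` is hypothetical and is not part of lxml.

```python
_NOTHING = object()  # sentinel meaning "no event is being held back yet"


def delay_events(events):
    """Yield each (event, element) pair one step late.

    Hypothetical helper, not an lxml API. A pair is only handed to the
    caller after the next pair has been pulled from the underlying
    iterator, so the parser has had a chance to read past the element's
    end tag and attach its tail text.
    """
    pending = _NOTHING
    for item in events:
        if pending is not _NOTHING:
            yield pending
        pending = item
    # The last pair has nothing after it; flush it once parsing is done.
    if pending is not _NOTHING:
        yield pending


# Assumed usage with the script from the bug description:
# for _, element in delay_events(etree.iterparse(sys.argv[1], html=True)):
#     print((element.tag, element.attrib, element.text, element.tail))
```

The cost is that the caller always sees an element one event later than the parser produced it, which matters if elements are cleared eagerly to save memory.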
** Changed in: lxml
Importance: Undecided => Medium
** Changed in: lxml
Status: New => Confirmed
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to lxml in Ubuntu.
https://bugs.launchpad.net/bugs/1684273
Title:
Missing tail in iterparse
Status in lxml:
Confirmed
Status in lxml package in Ubuntu:
New
Bug description:
Given a minimal parser (below) and a particular input file (attached),
iterparse is not returning the `tail` of the last `<span>` tag.
I am listening for the `end` event, which is the default, instead of
the `start` event.
Changing the input, for example by deleting unrelated tags such as the
`<link>` tag in the `<head>`, causes the missing text to reappear.
This makes it hard to produce a minified input! I was able to remove
everything /after/ the element with the missing tail, which doesn't
affect the bug, so that is what I attached.
I took the silence on the mailing list to mean that I did not have any
obvious problems with the way I was using iterparse. :)
https://mailman-mail5.webfaction.com/pipermail/lxml/2017-April/007882.html
---
```python
#!/usr/bin/env python3
import sys
from lxml import etree
for _, element in etree.iterparse(sys.argv[1], html=True):
    print((
        element.tag,
        element.attrib,
        element.text,
        element.tail,
    ))
```
Invoke by:
```sh
$ ./bug.py bug.html | grep "splays their blue cards left"
```
Expected output:
```
('span', {'class': 'age e'}, '4', '.\n... Nnastya splays their blue cards left.\n')
```
Actual output: none, and return code 1.
---
Python : sys.version_info(major=3, minor=5, micro=2, releaselevel='final', serial=0)
lxml.etree : (3, 7, 3, 0)
libxml used : (2, 9, 3)
libxml compiled : (2, 9, 3)
libxslt used : (1, 1, 29)
libxslt compiled : (1, 1, 29)
To manage notifications about this bug go to:
https://bugs.launchpad.net/lxml/+bug/1684273/+subscriptions