[Bug 1684273] Re: Missing tail in iterparse

scoder 1684273 at bugs.launchpad.net
Sat Apr 22 06:50:52 UTC 2017


I agree that this is unexpected. It can be fixed by internally passing
more data into the parser before generating the "end" parse event, i.e.
by waiting for the tail text to end before yielding the element that
owns it.
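
Until that is fixed inside lxml, the same idea can be approximated on
the caller's side. The sketch below is not the lxml fix itself: it
delays each "end" event by one step, so an element is only handed out
once a later event shows that the parser has moved past its tail. The
helper name `delay_end_events` and the sample document are made up for
illustration, and the standard library's `xml.etree.ElementTree` stands
in for lxml so that the snippet runs without extra packages;
`lxml.etree.iterparse` yields the same kind of `(event, element)` pairs.

```python
import io
import xml.etree.ElementTree as ET  # stand-in for lxml.etree (assumption)


def delay_end_events(events):
    """Yield each (event, element) pair one step late.

    Hypothetical helper: when the *next* "end" event arrives, the parser
    has already read past the previous element's tail text, so that tail
    is complete by the time the element is yielded.
    """
    pending = None
    have_pending = False
    for item in events:
        if have_pending:
            yield pending
        pending, have_pending = item, True
    if have_pending:
        # The last element (the root) has no later event to wait for.
        yield pending


sample = b"<root><a>1</a>tail-a<b>2</b>tail-b</root>"
result = [
    (elem.tag, elem.tail)
    for _, elem in delay_end_events(
        ET.iterparse(io.BytesIO(sample), events=("end",))
    )
]
print(result)  # [('a', 'tail-a'), ('b', 'tail-b'), ('root', None)]
```

This only makes sense when listening for "end" events alone, and any
code that clears elements to save memory would have to do so after the
delayed yield rather than before it.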

** Changed in: lxml
   Importance: Undecided => Medium

** Changed in: lxml
       Status: New => Confirmed

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to lxml in Ubuntu.
https://bugs.launchpad.net/bugs/1684273

Title:
  Missing tail in iterparse

Status in lxml:
  Confirmed
Status in lxml package in Ubuntu:
  New

Bug description:
  Given a minimal parser (below) and a particular input file (attached),
  iterparse is not returning the `tail` of the last `<span>` tag.

  I am listening for the `end` event, which is the default, instead of
  the `start` event.

  Changing the input, for example by deleting unrelated tags such as the
  `<link>` tag in the `<head>`, causes the missing text to reappear.
  This makes it hard to produce a minified input! I was able to remove
  everything /after/ the element with the missing tail, which doesn't
  affect the bug, so that is what I attached.

  I took the silence on the mailing list to mean that I did not have any
  obvious problems with the way I was using iterparse. :)
  https://mailman-mail5.webfaction.com/pipermail/lxml/2017-April/007882.html

  ---

  ```python
  #!/usr/bin/env python3

  import sys
  from lxml import etree

  for _, element in etree.iterparse(sys.argv[1], html=True):
      print((
          element.tag,
          element.attrib,
          element.text,
          element.tail,
      ))
  ```

  Invoke by:
  ```sh
  $ ./bug.py bug.html | grep "splays their blue cards left"
  ```

  Expected output:
  ```
  ('span', {'class': 'age e'}, '4', '.\n... Nnastya splays their blue cards left.\n')
  ```

  Actual output: none, and return code 1.

  ---

  Python              : sys.version_info(major=3, minor=5, micro=2, releaselevel='final', serial=0)
  lxml.etree          : (3, 7, 3, 0)
  libxml used         : (2, 9, 3)
  libxml compiled     : (2, 9, 3)
  libxslt used        : (1, 1, 29)
  libxslt compiled    : (1, 1, 29)

To manage notifications about this bug go to:
https://bugs.launchpad.net/lxml/+bug/1684273/+subscriptions


