[MERGE][BUG #77533][RFC] ignore files with invalid filenames

Wed Aug 29 08:32:58 BST 2007

Fabio Machado de Oliveira wrote:
> Aaron Bentley escreveu:
>> Any reason not to use unicode_path_info directly instead of decoding
>> it again?

> There was no reason, fixed that.

Fabio,

Thanks for your efforts to date on this. As the month long email thread
shows, this is a non-trivial problem given the desire to Do The Right
Thing without impacting performance unnecessarily.

This is really close now and right on the resubmit/tweak borderline. In
the end, I've voted for resubmit because the test is less than I was
expecting given your earlier response to a review from Martin.

bb: resubmit

Firstly, don't forget to update NEWS with an item explaining this fix.
I've seen several people complain about this bug in the past so noting
it as fixed, and giving yourself credit, is important.

Other feedback/questions below.

> +    def test_invalid_filenames(self):
> +        """Test that bzr status wont break with an invalid filename."""
> +        wt = self.make_branch_and_tree('.')
> +        invalidfilename = 'foo\xe7\xff\xff'
> +        validfilename = 'foo'
> +        cant_test = True
> +        try:
> +            invalidfilename.decode(bzrlib.osutils._fs_enc)
> +        except UnicodeDecodeError:
> +            cant_test = False
> +        if cant_test:
> +            raise bzrlib.tests.TestSkipped("The invalid filename is valid for this filesystem.")
> +        file = open(invalidfilename, 'wb')
> +        file.write('foo')
> +        file.close()
> +        os.mkdir('dir\xe7\xff\xff')
> +
> +        """test with an empty tree"""
> +        wt.add([])
> +        self.assertEqual([], self.get_tree_paths(wt))
> +
> +        """ self.run_bzr('add') """
> +        self.run_bzr('status')
> +
> +        """test with a tree with a file"""
> +        self.build_tree([validfilename])
> +        self.run_bzr('add')
> +        self.run_bzr('ci -m "blabla"')
> +        self.assertEqual([validfilename], self.get_tree_paths(wt))
> +       
> +        self.run_bzr('status')

As Martin suggested previously, please confirm that the calls to 'add'
and 'status' worked. You can do this by trapping the output and/or
checking error codes as illustrated in several places in the (blackbox)
test suite.

Two other suggestions here:

* Break the above into two or three separate tests, each calling a
  common setup/check method.

* Change the 'cant_test' logic into 'can_test' logic. For many readers,
  "can_test = True" is easier to understand than "cant_test = False".

> +def filter_invalid_filenames(listdir):
> +    """Filter invalid filenames from a list.
> +
> +    The param is usually the return of an os.listdir call that can
> +    return a mix of unicode and byte strings.
> +
> +    :param listdir: A list of paths usually returned by os.listdir() that
> +    can be a mix of unicode and byte strings.
> +    :return: (list of unicode valid filenames, list of bytestrings containing
> +    invalid filenames)"""

The help for this method after the summary is redundant given the
explanation given for :param listdir:. Just the one line summary
followed by the param and return bits is enough IMO. I'd also suggest
making this a private subroutine by adding a leading underscore to the
name: list_valid_filenames() and sorted_list_valid_filenames() seems
enough of a public interface to this functionality to me.

Two final points:

1. It's not obvious to me whether the os.listdir() call on line 1115 of
   workingtree.py is always safe or not. If not, it needs to be
   converted to sorted_list_valid_filenames(). If so, a comment
   explaining why it is safe to use there would be useful.

2. There was a bit of discussion in the email thread re whether
   files skipped over ought to be reported or not. I think we
   ought to do this, at least when 'add' is used in verbose mode.
   You don't need to do this now as it can rightly be delivered
   as a separate patch. I do think however that a 'TODO' comment
   immediately after each call to (sorted_)list_valid_filenames
   would be good to add now.

When we do tackle the reporting of skipping over invalid filenames, I'd
suggest we do it by passing a callback (taking a list of invalid
filenames) to workingtree.list_files(). But as I said above, another day
for that. Let's get this one landed first.

Ian C.