[apparmor] [patch] update aa.py is_skippable_file() according to libapparmor

Christian Boltz apparmor at cboltz.de
Thu Dec 25 00:32:31 UTC 2014


Hello,

Am Mittwoch, 24. Dezember 2014 schrieb Kshitij Gupta:
> On Sun, Dec 7, 2014 at 2:49 AM, Christian Boltz wrote:
> > this patch updates is_skippable_file() to match all extensions that
> > are listed in  libapparmor _aa_is_blacklisted() - some extensions
> > were missing in the python code.
> > 
> > Also make the code more readable - even if it merges 4 re.search()
> > into one, and add some testcases.
> > 
> > Notes:
> > - the original code additionally ignored *.swp. I didn't include
> > that -> 
> >   *.swp looks like vim swap files which are also dot files
> 
> I'm not sure do we want to keep this or not. As stated parser_misc
> doesn't ignore this.
> A abc.txt.swp (open by vim) wont be covered under the dot file?
> Also there are other ways a .swp file maybe created.

vim creates a dot file - "vi abc.txt" creates .abc.txt.swp, and that is 
already covered by the "filename starts with a dot" rule.

> We'll get a bug report if it is meant to be there.

;-)

> A user says even mv created a .swp file:
> http://unix.stackexchange.com/questions/27923/what-causes-swap-files-t
> o-be-created

Maybe (I never checked), but moving around profiles in GB size range is 
a very rare event ;-) and for smaller files it's unlikely that the tools 
see that file in the second it exists.

> - the python code ignores README files, but the C code doesn't
> 
> >   (do we need to add README in the C code?)
> 
> Somebody else can answer this?

John? Tyler? Steve?

> > [ is_skippable_file.diff ]
> > 
> > === modified file 'utils/apparmor/aa.py'
> > --- utils/apparmor/aa.py        2014-11-29 12:40:10 +0000
> > +++ utils/apparmor/aa.py        2014-12-06 21:12:49 +0000
> > @@ -2546,13 +2546,16 @@
> > 
> >          else:
> >              return False
> > 
> > -# rpm backup files, dotfiles, emacs backup files should not be
> > processed -# The skippable files type needs be synced with apparmor
> > initscript> 
> >  def is_skippable_file(path):
> > -    """Returns True if filename matches something to be skipped"""
> > -    if (re.search('(^|/)\.[^/]*$', path) or
> > re.search('\.rpm(save|new)$', path)
> > -            or re.search('\.dpkg-(old|new)$', path) or
> > re.search('\.swp$', path)
> > -            or path[-1] == '~' or path == 'README'):
> > +    """Returns True if filename matches something to be skipped
> > (rpm or dpkg backup files, hidden files etc.)
> > +        The list of skippable files needs to be synced with
> > apparmor
> > initscript and libapparmor _aa_is_blacklisted()
> > +        path: filename (without directory)"""
> > +
> > +    skippable_suffix =
> > '(\.dpkg-new|\.dpkg-old|\.dpkg-dist|\.dpkg-bak|\.rpmnew|\.rpmsave|\.
> > orig|\.rej|~)$' 
> > +    skippable_files  = '^(.*/)?(README|\.[^/]*|)$' 
> > # README, dot files and empty filename
> 
> I suppose the number of calls to this function will be many as a large
> number of files are parsed/ignored.

Yes, basically number of files in /etc/apparmor.d/ (and subdirectories?) 

That are quite some files, but we have places that are called more often 
(for example parsing each line in the profile ;-)

> So, might I suggest that we (over?)optimise this?
> 
> I can think of two ways:
> 1. Use the regex and use re.compile for all the three regex's outside
> the function call (It shows considerable speedup as tested by
> cProfile in Python2.7)

Indeed, re.compile is a good idea.

Note that the first two are only strings that are then merged into 
skippable_files (to keep the regex parts easier readable).
Only skippable_files is used as regex.

> 2. Use simple string methods and leverage os.path submodule. This
> gives probably performance better than uncompiled regex and only a
> bit better than compiled versions.
> 
> An alternative version string based:
> 
> def is_skippable_file(path):
>     basename = os.path.basename(path)
>     if not basename or basename[-1] == '~' or basename[0] == '.' or
> basename == 'README':
>         return True
>     skippable_extension = ['.dpkg-new', '.dpkg-old', '.dpkg-dist',
> '.dpkg-bak', '.rpmnew', '.rpmsave', '.orig', '.rej']
>     extension = os.path.splitext(basename)[1]
>     if extension in skippable_extension:
>         return True
>     return False

That gives slightly less readable code IMHO - and the os.path.* calls 
also don't come for free. If it doesn't bring a big performance boost, 
I'd prefer the regex-based way.

BTW: using basename.endswith() could also be an option, and the python 
docs say that it accepts tuplis as suffix since py 2.5.

> Letme know if you want to have a look at the script I used to compare
> the performance.

Yes, that would be interesting.

I'd also be interested in some numbers - how long do the following 
variants take?

a) the "old" code we currently have in bzr (hint: it does 4 re.search)
b) with my patch applied (only one re.search, so I'd expect it to be 
   faster)
c) with my patch applied + using re.compile for skippable_files
   c1) with re.compile inside the function (= one re.compile per file)
   c2) with re.compile outside the function (= only one re.compile in 
       total)
d) with the code using os.path.* you proposed

(If you send me the test script, I can do the tests myself - but I won't 
object if you do them and send the result ;-)

> +    skippable_all    = '%s|%s' % (skippable_suffix, skippable_files)
> 
> > +
> > 
> > +    if re.search(skippable_all, path):
> >          return True
> > 
> > Also maybe add a "return False" at the end ;-)

Python returns None by default, which is "good enough" ;-)

OTOH, it never hurts if the code explicitely states what it does, so 
I'll add it in the next patch version.

> >  def is_skippable_dir(path):
> > === modified file 'utils/test/test-aa.py'
> > --- utils/test/test-aa.py       2014-11-27 22:20:26 +0000
> > +++ utils/test/test-aa.py       2014-12-06 21:06:15 +0000
...
> The tests look good. I'll hold-back the ACK for the modified
> version.
> 
> Please have a look at the suggestions.
> 
> Merry Christmas. (Apparently you were expecting an ACK under the
> christmas tree ;-) )

A review with comments is also ok ;-)


Regards,

Christian Boltz
-- 
[GUI vs. Command-Line] Einen ähnlichen Streit wird es in 20 Jahren
auch geben, wenn die "2D-Screenfanatiker" auf die "VR Fans" losgehen
und wieder ein Streit vom Zaun bricht der an Sinnfreiheit kaum zu
überbieten ist.   [Phillip Richdale in suse-linux]




More information about the AppArmor mailing list