[E/F/Unstable][PATCH 0/1] crypto: fix regression/use-after-free in af_alg_accept()
Mauricio Faria de Oliveira
mfo at canonical.com
Wed Jul 1 12:28:23 UTC 2020
Hi Stefan,
On Wed, Jul 1, 2020 at 4:35 AM Stefan Bader <stefan.bader at canonical.com> wrote:
>
> On 01.07.20 01:09, Mauricio Faria de Oliveira wrote:
> > Regarding the submission policies for this kernel SRU cycle:
> > this patch does not necessarily have to be applied for now;
> > just review/ack for B/E would be useful if at all possible.
> >
> > It has only been merged on Linus' tree yesterday.
> >
> > The patch applies cleanly on Unstable/Focal/Eoan,
> > and is a trivial backport on Disco/Bionic/Xenial.
> > (it's on all series because it's a fix to stable.)
> >
> > [Impact]
> >
> > * Users of the Linux kernel's crypto userspace API
> > reported BUG() / kernel NULL pointer dereference
> > errors after kernel upgrades.
> >
> > * The stack trace signature is an accept() syscall
> > going through af_alg_accept() and hitting errors
> > usually in one of:
> > - apparmor_sk_clone_security()
> > - apparmor_sock_graft()
> > - release_sock()
> >
> > [Fix]
> >
> > * This is a regression introduced by upstream commit
> > 37f96694cf73 ("crypto: af_alg - Use bh_lock_sock
> > in sk_destruct") which made its way through stable.
>
> Just for my understanding, what is the motivation to submit this separately and
> not via stable? I guess it is to get things in more quickly.
>
You're right -- particularly the review/ack, as it's required on UA
support cases
in order to provide temporary, supported kernel packages (hotfix packages)
until the official packages land.
> Was submitting e/f with cover and b/d/x seperatly without one intentional? It
> would be much better to group all the series patches under one cover as that
> keeps related things close together in the thread view.
>
No, I just didn't know it could be done for different series (say,
from a process perspective.)
I actually wondered a bit about it after the fact. Thanks for
mentioning; I will do it in the future.
cheers,
Mauricio
> -Stefan
>
> >
> > * The offending patch allows the critical regions
> > of af_alg_accept() and af_alg_release_parent() to
> > run concurrently; now with the "right" events on 2
> > CPUs it might drop the non-atomic reference counter
> > of the alg_sock then the sock, thus release a sock
> > that is still in use.
> >
> > * The fix is upstream commit 34c86f4c4a7b ("crypto:
> > af_alg - fix use-after-free in af_alg_accept() due
> > to bh_lock_sock()") [1]. It changes alg_sock's ref
> > counter to atomic, which addresses the root cause.
> >
> > [Test Case]
> >
> > * There is a synthetic test case available, which
> > uses a kprobes kernel module to synchronize the
> > concurrent CPUs on the instructions responsible
> > for the problem; and a userspace part to run it.
> >
> > * The organic reproducer is the Varnish Cache Plus
> > software with the Crypto vmod (which uses kernel
> > crypto userspace API) under long, very high load.
> >
> > * The patch has been verified on both reproducers
> > with the 4.15 and 5.7 kernels.
> >
> > * More tests performed with 'stress-ng --af-alg'
> > with 11 CPUs on Xenial/Bionic/Disco/Eoan/Focal
> > (all on same version of stress-ng, V0.11.14)
> >
> > No regressions observed from original kernel.
> > (the af-alg stressor can exercise almost all
> > kernel crypto modules shipped with the kernel;
> > so it checks more paths/crypto alg interfaces.)
> >
> > [Regression Potential]
> >
> > * The fix patch does a fundamental change in how
> > alg_sock reference counters work, plus another
> > change to the 'nokey' counting. This of course
> > *has* a risk of regression.
> >
> > * Regressions theoretically could manifest as use
> > after free errors (in case of undercounting) in
> > the af_alg functions or silent memory leaks (in
> > case of overcounting), but also other behaviors
> > since reference counting is key to many things.
> >
> > * FWIW, this patch has been written by the crypto
> > subsystem maintainer, who certainly knows a lot
> > of the normal and corner cases, thus giving the
> > patch more credit.
> >
> > * Testing with the organic reproducer ran as long
> > as 5 days, without issues, so it does look good.
> >
> > [Other Info]
> >
> > * Not sending for Groovy (should get via Unstable).
> >
> > * [1] Patch: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=34c86f4c4a7be3b3e35aa48bd18299d4c756064d
> >
> > [Stack Trace Examples]
> >
> > Examples:
> >
> > BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
> > ...
> > RIP: 0010:apparmor_sk_clone_security+0x26/0x70
> > ...
> > Call Trace:
> > security_sk_clone+0x33/0x50
> > af_alg_accept+0x81/0x1c0 [af_alg]
> > alg_accept+0x15/0x20 [af_alg]
> > SYSC_accept4+0xff/0x210
> > SyS_accept+0x10/0x20
> > do_syscall_64+0x73/0x130
> > entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> >
> > general protection fault: 0000 [#1] SMP PTI
> > ...
> > RIP: 0010:__release_sock+0x54/0xe0
> > ...
> > Call Trace:
> > release_sock+0x30/0xa0
> > af_alg_accept+0x122/0x1c0 [af_alg]
> > alg_accept+0x15/0x20 [af_alg]
> > SYSC_accept4+0xff/0x210
> > SyS_accept+0x10/0x20
> > do_syscall_64+0x73/0x130
> > entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> >
> > Herbert Xu (1):
> > crypto: af_alg - fix use-after-free in af_alg_accept() due to
> > bh_lock_sock()
> >
> > crypto/af_alg.c | 26 +++++++++++---------------
> > crypto/algif_aead.c | 9 +++------
> > crypto/algif_hash.c | 9 +++------
> > crypto/algif_skcipher.c | 9 +++------
> > include/crypto/if_alg.h | 4 ++--
> > 5 files changed, 22 insertions(+), 35 deletions(-)
> >
>
>
--
Mauricio Faria de Oliveira
More information about the kernel-team
mailing list