Cmnt: [SRU][F][PATCH v2 0/3] CVE-2023-21400

Tue Dec 10 07:33:05 UTC 2024

On Tue, Dec 10, 2024 at 02:20:51PM +0800, Chengen Du wrote:
> On Tue, Dec 10, 2024 at 1:46 PM Koichiro Den <koichiro.den at canonical.com> wrote:
> >
> > On Tue, Nov 26, 2024 at 12:14:25PM +0800, Chengen Du wrote:
> > > CVE-2023-21400
> > >
> > > BugLink: https://bugs.launchpad.net/bugs/2078659
> > >
> > > SRU Justification:
> > >
> > > [Impact]
> > > io_commit_cqring() writes the CQ ring tail to make it visible and also triggers any deferred work.
> > > When a ring is set up with IOPOLL, it doesn't require locking around the CQ ring updates.
> > > However, if there is deferred work that needs processing, io_queue_deferred() assumes that the completion_lock is held.
> > > The io_uring subsystem does not properly handle locking for rings with IOPOLL, leading to a double-free vulnerability, which can be exploited as CVE-2023-21400.
> > >
> > > [Fix]
> > > There is a commit that fixed this issue.
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fb348857e7b67eefe365052f1423427b66dedbf3
> > >
> > > The upstream commit does not apply directly, so the patch needs to be reworked to apply to version 5.4.
> > >
> > > [Test Plan]
> > > This is a timing issue that can be triggered by using the liburing library to implement a test program.
> > > First, set the io_uring_params flag to IORING_SETUP_IOPOLL and open an XFS file with the O_RDWR | O_DIRECT flags, as the XFS filesystem implements the iopoll function hook.
> > > After setting up io_uring, create two threads in the process: one thread will wait for completion queue events, and the other will continuously send readv and writev requests in sequence.
> > > The writev requests should include the IOSQE_IO_DRAIN flag to ensure that previous submission queue events are completed first.
> > >
> > > The issue arises when writev requests add entries into the defer_list, but the io_iopoll_complete function consumes entries from defer_list without holding the appropriate lock.
> > >
> > > [Where problems could occur]
> > > The problematic call path can be triggered under specific usage scenarios and only affects io_uring functionality.
> > > If the patch contains any issues, it may lead to a deadlock.
> > >
> > > Jens Axboe (1):
> > >   io_uring: ensure IOPOLL locks around deferred work
> > >
> > > Pavel Begunkov (2):
> > >   io_uring: remove extra check in __io_commit_cqring
> > >   io_uring: dont kill fasync under completion_lock
> >
> > [1/3] and [2/3] seem to be not only unrelated to CVE-2023-21400 but also
> > unhelpful in reducing context conflicts for [3/3]. If there is still a
> > compelling reason to include those two for CVE-2023-21400, could you let me
> > know?
> 
> Here is my initial version of the fix for this CVE:
> https://lists.ubuntu.com/archives/kernel-team/2024-November/155144.html
> 
> To address the issue within the current code structure, I have wrapped
> io_commit_cqring in the io_iopoll_complete function with
> completion_lock.
> However, this approach has surfaced an existing issue related to
> invoking kill_fasync under completion_lock as highlighted in commit
> 4aa84f2ffa81 (io_uring: don't kill fasync under completion_lock).
> 
> While this existing issue is not directly related to the CVE, I am
> concerned that the fix could increase the likelihood of encountering
> it, especially since io_iopoll_complete may be triggered more
> frequently.

           CPU0                    CPU1
           ----                    ----
      lock(&new->fa_lock);
                                   local_irq_disable();
                                   lock(&ctx->completion_lock);
                                   lock(&new->fa_lock);
      <Interrupt>
        lock(&ctx->completion_lock);

     *** DEADLOCK ***

I think the existence of io_timeout_fn() can induce this potential lock
inversion situation. I'm not sure why io_iopoll_complete()'s additional
lock(&completion_lock) can exacerbate it, when we wrap only
__io_commit_cqring_flush() inside the completion lock.

Could you elaborate on a possible scenario? Or is it more of a safety
net, in case future backports increase the likelihood of hitting the
inversion?
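
To make the question concrete, here is a rough userspace model of the narrow
locking described above. It is only a sketch under stated assumptions: every
model_* name is hypothetical and not a kernel function, a bool stands in for
ctx->completion_lock, and it assumes the fasync notification stays outside
the critical section. It is not the actual 5.4 code or the proposed patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_ctx {
	bool completion_lock_held; /* stands in for ctx->completion_lock */
	int deferred;              /* entries waiting on the defer list */
	int issued;                /* deferred entries handed on for execution */
	int lock_violations;       /* deferred work run without the lock */
	int signals;               /* notifications sent */
	int signals_under_lock;    /* notifications sent with the lock held */
};

static void model_lock(struct model_ctx *ctx)
{
	assert(!ctx->completion_lock_held); /* non-recursive, like a spinlock */
	ctx->completion_lock_held = true;
}

static void model_unlock(struct model_ctx *ctx)
{
	assert(ctx->completion_lock_held);
	ctx->completion_lock_held = false;
}

/* Stand-in for io_queue_deferred(): expects the lock to be held. */
static void model_queue_deferred(struct model_ctx *ctx)
{
	if (!ctx->completion_lock_held)
		ctx->lock_violations++;
	ctx->issued += ctx->deferred;
	ctx->deferred = 0;
}

/* Stand-in for kill_fasync(): it takes fa_lock, so it should run unlocked. */
static void model_notify(struct model_ctx *ctx)
{
	if (ctx->completion_lock_held)
		ctx->signals_under_lock++;
	ctx->signals++;
}

/* IOPOLL completion with no locking: the pattern behind the CVE. */
static void model_iopoll_complete_unlocked(struct model_ctx *ctx)
{
	model_queue_deferred(ctx);
	model_notify(ctx);
}

/* Lock only around the deferred flush, then notify after unlocking. */
static void model_iopoll_complete_narrow(struct model_ctx *ctx)
{
	model_lock(ctx);
	model_queue_deferred(ctx);
	model_unlock(ctx);
	model_notify(ctx);
}
```

In this model, model_iopoll_complete_unlocked() records a lock violation as
soon as the defer list is non-empty, while model_iopoll_complete_narrow()
drains the list under the lock and never notifies with the lock held. If the
backport keeps that shape, the extra lock in io_iopoll_complete() would not
add a completion_lock -> fa_lock ordering of its own.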

> 
> To avoid exacerbating the situation, I recommend addressing the
> kill_fasync logic separately before implementing the fix for the CVE,
> as well as backporting the prerequisite commit 0791015837f1 (io_uring:
> remove extra check in __io_commit_cqring).