[SRU][J/I/H/F][PATCH 1/2] ipmi: Move remove_work to dedicated workqueue

Thadeu Lima de Souza Cascardo cascardo at canonical.com
Thu Dec 16 13:52:22 UTC 2021


On Thu, Dec 16, 2021 at 12:21:44PM +0200, Ioanna Alifieraki wrote:
> BugLink: https://bugs.launchpad.net/bugs/1950666
> 
> Currently when removing an ipmi_user the removal is deferred as a work on
> the system's workqueue. Although this guarantees the free operation will
> occur in non atomic context, it can race with the ipmi_msghandler module
> removal (see [1]). If a remove_user work is scheduled for removal
> and shortly after ipmi_msghandler module is removed we can end up in a
> situation where the module is removed first and when the work is executed
> the system crashes with:
> BUG: unable to handle page fault for address: ffffffffc05c3450
> PF: supervisor instruction fetch in kernel mode
> PF: error_code(0x0010) - not-present page
> because the pages of the module are gone. In cleanup_ipmi() there is no
> easy way to detect if there are any pending works to flush them before
> removing the module. This patch creates a separate workqueue and schedules
> the remove_work works on it. When removing the module the workqueue is
> drained when destroyed to avoid the race.
> 
> [1] https://bugs.launchpad.net/bugs/1950666
> 
> Cc: stable at vger.kernel.org # 5.1
> Fixes: 3b9a907223d7 (ipmi: fix sleep-in-atomic in free_user at cleanup SRCU user->release_barrier)
> Signed-off-by: Ioanna Alifieraki <ioanna-maria.alifieraki at canonical.com>
> Message-Id: <20211115131645.25116-1-ioanna-maria.alifieraki at canonical.com>
> Signed-off-by: Corey Minyard <cminyard at mvista.com>
> (cherry picked from commit 1d49eb91e86e8c1c1614c72e3e958b6b7e2472a9)
> Signed-off-by: Ioanna Alifieraki <ioanna-maria.alifieraki at canonical.com>
> ---
>  drivers/char/ipmi/ipmi_msghandler.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c
> index a08f53f208bf..f3a2f228f648 100644
> --- a/drivers/char/ipmi/ipmi_msghandler.c
> +++ b/drivers/char/ipmi/ipmi_msghandler.c
> @@ -191,6 +191,8 @@ struct ipmi_user {
>  	struct work_struct remove_work;
>  };
>  
> +struct workqueue_struct *remove_work_wq;
> +
>  static struct ipmi_user *acquire_ipmi_user(struct ipmi_user *user, int *index)
>  	__acquires(user->release_barrier)
>  {
> @@ -1261,7 +1263,7 @@ static void free_user(struct kref *ref)
>  	struct ipmi_user *user = container_of(ref, struct ipmi_user, refcount);
>  
>  	/* SRCU cleanup must happen in task context. */
> -	schedule_work(&user->remove_work);
> +	queue_work(remove_work_wq, &user->remove_work);
>  }
>  
>  static void _ipmi_destroy_user(struct ipmi_user *user)
> @@ -5153,6 +5155,13 @@ static int ipmi_init_msghandler(void)
>  
>  	atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
>  
> +	remove_work_wq = create_singlethread_workqueue("ipmi-msghandler-remove-wq");
> +	if (!remove_work_wq) {
> +		pr_err("unable to create ipmi-msghandler-remove-wq workqueue");
> +		rv = -ENOMEM;
> +		goto out;
> +	}
> +

This is not easy to trigger, but if workqueue creation fails here, initialized
stays false even though the timer has already been set up and the panic
notifier has already been registered. cleanup_ipmi() does its teardown only
under if (initialized), so when the module is unloaded neither of them gets
undone. The error path in ipmi_init_msghandler should undo them, or rather,
the workqueue should be created first, before the timer and the notifier.
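
A rough user-space sketch of that second option, creating the workqueue before
anything else is set up so that a failure leaves nothing to undo. Everything
here is a hypothetical stand-in (the flags, fake_create_workqueue(),
fake_destroy_workqueue(), and the *_sketch functions); it only models the
ordering and is not the driver's code:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for driver state; not the real kernel objects. */
static bool timer_armed;          /* models the timer having been set up */
static bool notifier_registered;  /* models the panic notifier registration */
static bool initialized;
static void *remove_work_wq;
static bool fail_wq_alloc;        /* knob to simulate allocation failure */

/* Hypothetical replacement for create_singlethread_workqueue(). */
static void *fake_create_workqueue(void)
{
	return fail_wq_alloc ? NULL : malloc(1);
}

/* Hypothetical replacement for destroy_workqueue(). */
static void fake_destroy_workqueue(void *wq)
{
	free(wq);
}

/* Allocate first: if this fails, nothing else has been set up yet. */
static int init_msghandler_sketch(void)
{
	remove_work_wq = fake_create_workqueue();
	if (!remove_work_wq)
		return -ENOMEM;

	timer_armed = true;
	notifier_registered = true;
	initialized = true;
	return 0;
}

/* Mirrors the if (initialized) guard in cleanup_ipmi(). */
static void cleanup_sketch(void)
{
	if (!initialized)
		return;

	fake_destroy_workqueue(remove_work_wq);
	remove_work_wq = NULL;
	notifier_registered = false;
	timer_armed = false;
	initialized = false;
}
```

With this ordering, a failed allocation returns before the timer or notifier
stand-ins are touched, so the guarded cleanup has nothing it needs to undo.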

Cascardo.

>  	initialized = true;
>  
>  out:
> @@ -5178,6 +5187,8 @@ static void __exit cleanup_ipmi(void)
>  	int count;
>  
>  	if (initialized) {
> +		destroy_workqueue(remove_work_wq);
> +
>  		atomic_notifier_chain_unregister(&panic_notifier_list,
>  						 &panic_block);
>  
> -- 
> 2.17.1
> 
> 
> -- 
> kernel-team mailing list
> kernel-team at lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team


