ACK/cmnt: [SRU X] [PATCH 0/5] Line discipline buffer flush/tty_reopen() race fix
Stefan Bader
stefan.bader at canonical.com
Wed Jan 9 10:10:55 UTC 2019
On 09.01.19 10:57, Kleber Souza wrote:
> On 1/8/19 9:18 PM, Guilherme G. Piccoli wrote:
>> BugLink: https://bugs.launchpad.net/bugs/1791758
>>
>> [Impact]
>>
>> * Line discipline code is racy when we have buffer being flush while the
>> tty is being initialized or reinitialized. For the first problem, we have
>> an upstream patch since January 2018:
>> b027e2298bd5 ("tty: fix data race between tty_init_dev and flush of buf");
>> although it is not in Ubuntu kernel 4.4, only in kernels 4.15 and subsequent
>> ones.
>>
>> * For the race between the buffer flush while tty is being reopened, we
>> have a patch that addresses this issue recently merged for 5.0-rc1:
>> 83d817f41070 ("tty: Hold tty_ldisc_lock() during tty_reopen()").
>> No Ubuntu kernel currently contains this patch, hence we're hereby
>> submitting the SRU request. The upstream complete patch series for
>> this is in [0].
>>
>> * The approach of both patches are similar - they rely in locking/semaphore
>> to prevent race conditions. Some additional patches are necessary to prevent
>> correlated issues, like preventing a potential deadlock due to bad
>> prioritization in servicing I/O over releasing tty_ldisc_lock() - refer to
>> c96cf923a98d ("tty: Don't block on IO when ldisc change is pending").
>> All the necessary fixes are grouped here in this SRU request.
>>
>> * The symptom of the race condition between the buffer flush and the
>> tty reopen routine is a kernel crash with the following trace:
>>
>>
>> BUG: unable to handle kernel paging request at 0000000000002268
>> IP: [<addr>] n_tty_receive_buf_common+0x6a/0xae0
>> [...]
>> Call Trace:
>> [<addr>] ? kvm_sched_clock_read+0x1e/0x30
>> [<addr>] n_tty_receive_buf2+0x14/0x20
>> [<addr>] flush_to_ldisc+0xd5/0x120
>> [<addr>] process_one_work+0x156/0x400
>> [<addr>] worker_thread+0x11a/0x480
>> [...]
>>
>>
>> * A kernel crash was collected from an user, analysis is present in
>> comment #4 in LP #1791758.
>>
>>
>> [Test Case]
>>
>> * It is not trivial to trigger this fault, but the usual recipe is to keep
>> accessing a machine through SSH (or keep killing getty when in IPMI serial
>> console) and in some way run commands before the terminal is ready in that
>> machine (like hacking some echo into ttySx or pts in an infinite loop).
>>
>> * We have reports of users that could reproduce this issue in their
>> production environment, and with the patches present in this SRU request
>> the problem was fixed.
>>
>>
>> [Regression Potential]
>>
>> * tty subsystem is highly central and patches in that area are always
>> delicate. For example, the upstream series [0] is a re-spin (V6) due to
>> a hard to reproduce issue reported in the PA-RISC architecture, which was
>> found in the V5 iteration [1] but was fixed by the patch c96cf923a98d,
>> present in this SRU request.
>>
>> * The patchset [0] is present in tty-next tree since mid-November, and the
>> patch b027e2298bd5 is available upstream since January/2018 (it's available
>> in both Ubuntu kernels 4.15 and 4.18), so the overall likelihood of
>> regressions is low.
>>
>> * These patches were sniff-tested for the 3 versions (4.4, 4.15 and 4.18)
>> and didn't show any issues.
>>
>>
>> [0] https://marc.info/?l=linux-kernel&m=154103190111795
>> [1] https://marc.info/?l=linux-kernel&m=153737852618183
>>
>>
>> Dmitry Safonov (4):
>> tty: Drop tty->count on tty_reopen() failure
>> tty: Hold tty_ldisc_lock() during tty_reopen()
>> tty: Don't block on IO when ldisc change is pending
>> tty: Simplify tty->count math in tty_reopen()
>>
>> Gaurav Kohli (1):
>> tty: fix data race between tty_init_dev and flush of buf
>>
>> drivers/tty/n_hdlc.c | 4 ++--
>> drivers/tty/n_r3964.c | 2 +-
>> drivers/tty/n_tty.c | 8 ++++----
>> drivers/tty/tty_io.c | 20 +++++++++++++++++---
>> drivers/tty/tty_ldisc.c | 11 +++++++++--
>> include/linux/tty.h | 9 +++++++++
>> 6 files changed, 42 insertions(+), 12 deletions(-)
>>
> Those are indeed some sensitive changes and a bit intrusive, but they
> are needed to fix the issue and most of them have been around on
> mainline for some time now. I guess we'll also detect issues with
> regression tests if they can be easily broken.
>
> Please note that the required format for the provenance of the patch is:
>
> (backported from commit <sha1> <upstream repo name>)
> or
> (cherry-pick from commit <sha1> <upstream repo name>)
no, "cherry picked from commit ..." without the - and with an "ed" ;)
>
> so the "commit" part is missing on those patches. The <upstream repo
> name> can be omitted if they come from upstream. This can be fixed when
> applying, but please keep it in mind for the next submissions :-).
>
>
> Acked-by: Kleber Sacilotto de Souza <kleber.souza at canonical.com>
>
>
>
More information about the kernel-team
mailing list