[PATCH] opal: prd_info: Add resilience to service check

Alex Hung alex.hung at canonical.com
Wed May 2 21:34:08 UTC 2018


On Wed, May 2, 2018 at 2:28 PM, Deb McLemore <debmc at linux.vnet.ibm.com> wrote:
> Hi Alex, the patch is good. There was the fwts_pipeio regression patch,
> which fixed the issue that surfaced this, but I think the resilience
> is good anyway.
>
> https://lists.ubuntu.com/archives/fwts-devel/2018-April/010348.html

Thanks Deb. I will ask other reviewers to check/ack it.

>
>
> On 05/02/2018 02:38 PM, Alex Hung wrote:
>> On Mon, Apr 9, 2018 at 6:07 AM, Deb McLemore <debmc at linux.vnet.ibm.com> wrote:
>>> Just an update on this: I am narrowing it down, and the host OS (Ubuntu 16.04)
>>> has different levels of the opal-prd daemon. So far it seems that some
>>> changes to fwts_pipe_readwrite mean it no longer returns some socket info that it used to,
>>> so the code may take different paths. There is a fix we can make to
>>> only look at the return code from the child exit process (fwts_pipe_close2) in the case
>>> where no socket data comes back from the systemctl stop command, rather than at the
>>> output buffer of the socket handling, but I really need to look deeper to
>>> see the underlying issue more clearly. I wanted to update the mailing
>>> list.
>> Hi Deb,
>>
>> Are we expecting an updated patch for this, or do you think this patch
>> is in good shape?
>>
>> There was no FWTS 18.04.00 but there will be 18.05.00 in two weeks
>> (hopefully). If everybody agrees, this should be included in 18.05.00.
>>
>>>
>>> $ opal-prd --version
>>> opal-prd opal-prd-5.1.13
>>>
>>>
>>> $ opal-prd --version
>>> opal-prd opal-prd-5.4.3
>>>
>>>
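Deb's proposed fix is to judge the stop by the child's exit code rather than by whether anything came back on the pipe. The sketch below illustrates that idea with plain POSIX popen()/pclose() and the <sys/wait.h> macros; it is not fwts's own fwts_pipe_close2() or fwts_exec2(), and the helper name run_exit_status() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: run a shell command, drain and discard its
 * output, and report the result based only on the child's exit status,
 * not on whether any output came back.
 * Returns the exit code (0..255), or -1 if the command could not be
 * started or did not exit normally (e.g. it was killed by a signal).
 */
static int run_exit_status(const char *command)
{
	char buf[256];
	FILE *fp = popen(command, "r");

	if (fp == NULL)
		return -1;

	/* Read everything so the child never blocks on a full pipe. */
	while (fgets(buf, sizeof(buf), fp) != NULL)
		; /* output intentionally ignored */

	/* pclose() returns the child's wait status, or -1 on error. */
	int wstatus = pclose(fp);
	if (wstatus == -1 || !WIFEXITED(wstatus))
		return -1;

	return WEXITSTATUS(wstatus);
}
```

For example, run_exit_status("exit 3") returns 3 and run_exit_status("true") returns 0 even though neither prints anything, so a silent command is judged by its exit code alone.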
>>> On 04/07/2018 01:41 PM, Deborah McLemore wrote:
>>>> The case I reproduced was manually running "fwts prd_info", and all it does
>>>> is a status query of opal-prd.service, then, if it is 'running', a 'systemctl stop'.
>>>> The 'systemctl stop' fails with -1.
>>>> It works on some levels of Ubuntu but not on others. I will investigate
>>>> further to find the root differences, but the proposed enhancement of
>>>> ignoring the 'systemctl stop' exit status is a good one, since we already got
>>>> a successful status of 'running' from the status query.
>>>> The 'systemctl stop' works functionally (the service is stopped); it is only the
>>>> exit status from 'systemctl stop' that is -1 on some OSes. We should be
>>>> more resilient. We only attempt a 'systemctl start' after the test runs if we
>>>> had determined that the service was 'running' and tried the 'systemctl stop', so
>>>> the requests are not issued that quickly, but it is possible.
>>>> =====================================
>>>> Deb McLemore
>>>> IBM OpenPower - IBM Systems
>>>> (512) 286 9980
>>>>
>>>> debmc at us.ibm.com
>>>> debmc at linux.vnet.ibm.com - (plain text)
>>>> =====================================
>>>>
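The flow Deb describes (query the service, stop it if running, ignore the stop's exit status, and restart only what was stopped) can be sketched as follows. The command runner is injected so the logic can be exercised without systemd. service_check(), cmd_runner, the CHECK_* codes and the stub runners are hypothetical names, and using "systemctl is-active --quiet" as the status query is an assumption, not necessarily what prd_info.c runs.

```c
#include <string.h>

/* Hypothetical: runs a command and returns its exit status, or -1 if it
 * could not be run (the role fwts_exec2() plays in prd_info.c). */
typedef int (*cmd_runner)(const char *command);

enum { CHECK_OK = 0, CHECK_ERROR = -1 };

/*
 * Sketch of the patched behaviour: if the unit is active, stop it and
 * record that a restart is owed, without failing on the stop's exit
 * status (reported as -1 on some hosts even though the stop works).
 */
static int service_check(cmd_runner run, int *restart)
{
	*restart = 0;

	/* Assumed query: "is-active --quiet" exits 0 only for an active unit. */
	int status = run("systemctl is-active --quiet opal-prd.service");

	switch (status) {
	case -1:
		return CHECK_ERROR;	/* the query itself could not run */
	case 0:				/* "running" */
		(void)run("systemctl stop opal-prd.service 2>&1");
		*restart = 1;		/* only restart what we stopped */
		break;
	default:			/* not running: nothing to stop */
		break;
	}
	return CHECK_OK;
}

/* Hypothetical test double: unit active, but "stop" reports -1. */
static int stub_active_stop_fails(const char *command)
{
	if (strstr(command, "is-active") != NULL)
		return 0;
	return -1;
}

/* Hypothetical test double: unit not active (any non-zero exit). */
static int stub_inactive(const char *command)
{
	(void)command;
	return 3;
}
```

With stub_active_stop_fails the check still succeeds and sets *restart to 1; with stub_inactive it succeeds and leaves *restart at 0, so no stop or later start is requested.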
>>>>     ----- Original message -----
>>>>     From: ppaidipe <ppaidipe at linux.vnet.ibm.com>
>>>>     To: Deborah McLemore/Austin/IBM at IBMUS
>>>>     Cc: Vasant Hegde <hegdevasant at linux.vnet.ibm.com>, Deb McLemore
>>>>     <debmc at linux.vnet.ibm.com>, fwts-devel at lists.ubuntu.com
>>>>     Subject: Re: [PATCH] opal: prd_info: Add resilience to service check
>>>>     Date: Sat, Apr 7, 2018 1:16 PM
>>>>     On 2018-04-07 20:50, Deborah McLemore wrote:
>>>>      > We are getting -1 back. What is the expected exit status from
>>>>      > systemctl stop?
>>>>      >
>>>>
>>>>     From the test execution, what I understand is that we are requesting
>>>>     start/stop of the service too quickly, which made the test fail.
>>>>
>>>>     Apr 07 13:11:18 xxxxxxxxxxx systemd[1]: opal-prd.service: Start request repeated too quickly.
>>>>     Apr 07 13:11:18 xxxxxxxxxxx systemd[1]: opal-prd.service: Failed with result 'start-limit-hit'.
>>>>     Apr 07 13:11:18 xxxxxxxxxxx systemd[1]: Failed to start OPAL PRD daemon.
>>>>
>>>>     So we need to request a start/restart only once the stop has completed,
>>>>     and request a stop only when the daemon is already started.
>>>>
>>>>
>>>>     Thanks
>>>>     Pridhiviraj
>>>>
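Pridhiviraj's suggested ordering, issuing a start only once the stop has actually completed, could look roughly like the sketch below. It polls "systemctl is-active --quiet" until the unit reports inactive (bounded by max_polls) before running "systemctl start". All function names, the polling parameters and the stub runners are hypothetical. Waiting for the unit to go inactive enforces ordering only; it does not by itself guarantee that systemd's start rate limit will not be hit.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <time.h>

/* Hypothetical: runs a command and returns its exit status, or -1. */
typedef int (*cmd_runner)(const char *command);

static void sleep_ms(long ms)
{
	struct timespec ts = {
		.tv_sec = ms / 1000,
		.tv_nsec = (ms % 1000) * 1000000L,
	};
	nanosleep(&ts, NULL);
}

/* Returns 1 once the unit is no longer active, 0 on error or timeout. */
static int wait_until_inactive(cmd_runner run, int max_polls, long delay_ms)
{
	for (int i = 0; i < max_polls; i++) {
		int s = run("systemctl is-active --quiet opal-prd.service");

		if (s < 0)
			return 0;	/* query failed: state unknown */
		if (s > 0)
			return 1;	/* non-zero exit: unit not active */
		sleep_ms(delay_ms);	/* still active: wait, then poll again */
	}
	return 0;			/* still active after max_polls */
}

/* Start the unit only after the earlier stop has taken effect. */
static int restart_when_stopped(cmd_runner run, int max_polls, long delay_ms)
{
	if (!wait_until_inactive(run, max_polls, delay_ms))
		return -1;		/* do not pile on another request */
	return run("systemctl start opal-prd.service");
}

/* Hypothetical test double: active for two polls, then inactive. */
static int polls_seen;

static int stub_stops_after_two_polls(const char *command)
{
	if (strstr(command, "is-active") != NULL)
		return (++polls_seen <= 2) ? 0 : 3;
	return 0;			/* "start" succeeds */
}

/* Hypothetical test double: the unit never leaves the active state. */
static int stub_never_stops(const char *command)
{
	(void)command;
	return 0;
}
```

The bounded poll keeps the test from hanging if the daemon never stops; in that case no start is issued and the caller gets -1.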
>>>>      >> On Apr 7, 2018, at 9:23 AM, Vasant Hegde <hegdevasant at linux.vnet.ibm.com> wrote:
>>>>      >>
>>>>      >>> On 04/07/2018 07:40 PM, Deb McLemore wrote:
>>>>      >>> When opal-prd.service is running and an attempt to stop it is
>>>>      >>> made, ignore the exit status and continue.
>>>>      >>
>>>>      >> Deb,
>>>>      >>
>>>>      >> Can you please explain why you want to ignore the exit status here?
>>>>      >> Are there any issues?
>>>>      >>
>>>>      >> -Vasant
>>>>      >>
>>>>      >>
>>>>      >>
>>>>      >>>
>>>>      >>> Signed-off-by: Deb McLemore <debmc at linux.vnet.ibm.com>
>>>>      >>> ---
>>>>      >>> src/opal/prd_info.c | 20 ++++----------------
>>>>      >>> 1 file changed, 4 insertions(+), 16 deletions(-)
>>>>      >>>
>>>>      >>> diff --git a/src/opal/prd_info.c b/src/opal/prd_info.c
>>>>      >>> index 4082a18..2db9413 100644
>>>>      >>> --- a/src/opal/prd_info.c
>>>>      >>> +++ b/src/opal/prd_info.c
>>>>      >>> @@ -73,7 +73,7 @@ static int prd_dev_query(fwts_framework *fw)
>>>>      >>>
>>>>      >>>  static int prd_service_check(fwts_framework *fw, int *restart)
>>>>      >>>  {
>>>>      >>> -	int rc = FWTS_OK, status = 0, stop_status = 0;
>>>>      >>> +	int rc = FWTS_OK, status = 0;
>>>>      >>>  	char *command;
>>>>      >>>  	char *output = NULL;
>>>>      >>>
>>>>      >>> @@ -97,25 +97,13 @@ static int prd_service_check(fwts_framework *fw, int *restart)
>>>>      >>>  		goto out;
>>>>      >>>  	case 0: /* "running" */
>>>>      >>>  		command = "systemctl stop opal-prd.service 2>&1";
>>>>      >>> -		stop_status = fwts_exec2(command, &output);
>>>>      >>> +		fwts_exec2(command, &output);
>>>>      >>>
>>>>      >>>  		if (output)
>>>>      >>>  			free(output);
>>>>      >>>
>>>>      >>> -		switch (stop_status) {
>>>>      >>> -		case 0:
>>>>      >>> -			*restart = 1;
>>>>      >>> -			break;
>>>>      >>> -		default:
>>>>      >>> -			fwts_failed(fw, LOG_LEVEL_HIGH, "OPAL PRD Info",
>>>>      >>> -				"Attempt was made to stop the "
>>>>      >>> -				"opal-prd.service but was not "
>>>>      >>> -				"successful. Try to "
>>>>      >>> -				"\"sudo systemctl stop "
>>>>      >>> -				"opal-prd.service\" and retry.");
>>>>      >>> -			rc = FWTS_ERROR;
>>>>      >>> -			goto out;
>>>>      >>> -		}
>>>>      >>> +		*restart = 1;
>>>>      >>> +		break;
>>>>      >>>  	default:
>>>>      >>>  		break;
>>>>      >>>  	}
>>>>      >>>
>>>>      >>
>>>>      >>
>>>>      >> --
>>>>      >> fwts-devel mailing list
>>>>      >> fwts-devel at lists.ubuntu.com
>>>>      >> Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/fwts-devel
>>>>
>>>>
>>>
>>
>>
>



-- 
Cheers,
Alex Hung


