[PATCH] opal: prd_info: Add resilience to service check
Alex Hung
alex.hung at canonical.com
Wed May 2 21:34:08 UTC 2018
On Wed, May 2, 2018 at 2:28 PM, Deb McLemore <debmc at linux.vnet.ibm.com> wrote:
> Hi Alex, the patch is good. There was the fwts_pipeio regression patch
> which fixed the issue that surfaced this, but I think the resilience
> is good anyway.
>
> https://lists.ubuntu.com/archives/fwts-devel/2018-April/010348.html
Thanks Deb. I will ask other reviewers to check/ack it.
>
>
> On 05/02/2018 02:38 PM, Alex Hung wrote:
>> On Mon, Apr 9, 2018 at 6:07 AM, Deb McLemore <debmc at linux.vnet.ibm.com> wrote:
>>> Just an update on this: I have narrowed it down to the Host OS (Ubuntu 16.04)
>>> having different levels of the opal-prd daemon. So far it seems that, after some
>>> changes, fwts_pipe_readwrite no longer returns some socket info that it used to,
>>> so we may be taking different paths. There is a fix we can do: in the case
>>> where no socket data comes back from the systemctl stop command, look only
>>> at the return code from the child exit process (fwts_pipe_close2) and not at the
>>> output buffer of the socket handling. We really need to look deeper to
>>> see the underlying issue more clearly, but I wanted to update the mailing
>>> list.
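For illustration, the idea of judging a command by the exit status of the child rather than by whatever output comes back can be sketched with plain POSIX calls. This is a minimal sketch and not the fwts implementation: it uses popen()/pclose() in place of fwts_pipe_readwrite()/fwts_pipe_close2(), and the helper name run_and_get_exit_status is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/wait.h>

/*
 * Hypothetical helper (not part of fwts): run a shell command, drain and
 * discard anything it prints, and report only the child's exit status.
 * Returns the exit code (0-255), or -1 if the command could not be run
 * or did not exit normally.
 */
static int run_and_get_exit_status(const char *cmd)
{
	char buf[256];
	FILE *fp;
	int status;

	assert(cmd != NULL);

	fp = popen(cmd, "r");
	if (fp == NULL)
		return -1;

	/* The output may be empty; that alone is not treated as an error. */
	while (fgets(buf, sizeof(buf), fp) != NULL)
		;

	status = pclose(fp);
	if (status == -1 || !WIFEXITED(status))
		return -1;

	return WEXITSTATUS(status);
}
```

With a helper like this, a command that prints nothing but exits with 0 is still reported as successful, which is the behaviour the fix above is aiming for.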
>> Hi Deb,
>>
>> Are we expecting an updated patch for this or do you think this patch
>> is in a good shape?
>>
>> There was no FWTS 18.04.00 but there will be 18.05.00 in two weeks
>> (hopefully). If everybody agrees, this should be included in 18.05.00.
>>
>>>
>>> $ opal-prd --version
>>> opal-prd opal-prd-5.1.13
>>>
>>>
>>> $ opal-prd --version
>>> opal-prd opal-prd-5.4.3
>>>
>>>
>>> On 04/07/2018 01:41 PM, Deborah McLemore wrote:
>>>> The case I reproduced was manually running "fwts prd_info"; all it does
>>>> is a 'systemctl status' and then, if the service is 'running', a 'systemctl stop'.
>>>> The 'systemctl stop' fails with -1.
>>>> It works OK on some levels of Ubuntu and not on others. I will do more
>>>> investigation to find the root differences, but the proposed enhancement
>>>> to ignore the 'systemctl stop' exit status is a good one, since we did get a
>>>> successful status of 'running' from the 'systemctl status' query.
>>>> The 'systemctl stop' functionally works (the service is stopped); it's just the
>>>> exit status from the 'systemctl stop' which is -1 on some OSes. We should be
>>>> more resilient. We only attempt a 'systemctl start' after the test runs if we
>>>> had determined that the service was 'running' and tried the 'systemctl stop', so
>>>> the requests are probably not too quick, though it is possible.
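The flow described above (query the status, stop only if running, ignore the stop exit status, remember to restart) can be sketched as follows. This is a sketch under stated assumptions, not the code in prd_info.c: the run_cmd_fn callback stands in for fwts_exec2(), the exact status command string and the handling of -1 are assumed, and the fake runners exist only so the logic can be exercised without a real systemctl.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical command runner: returns the command's exit status. */
typedef int (*run_cmd_fn)(const char *command);

/*
 * Sketch of the service check: query the service and, if it is running,
 * stop it and remember to restart it later. The exit status of the stop
 * command is deliberately ignored.
 * Returns 0 on success, -1 if the status query itself could not be run.
 */
static int service_check_sketch(run_cmd_fn run, int *restart)
{
	int status;

	assert(run != NULL && restart != NULL);
	*restart = 0;

	/* Assumed command string; the stop command matches the patch. */
	status = run("systemctl status opal-prd.service 2>&1");
	switch (status) {
	case -1:	/* assumed: the query could not be run at all */
		return -1;
	case 0:		/* "running" */
		(void)run("systemctl stop opal-prd.service 2>&1");
		*restart = 1;
		break;
	default:	/* not running: nothing to stop or restart */
		break;
	}
	return 0;
}

/* Fake runners for demonstration only; no real systemctl is invoked. */
static int fake_running_stop_fails(const char *command)
{
	/* "status" reports running (0); "stop" reports failure (-1). */
	return strstr(command, " stop ") != NULL ? -1 : 0;
}

static int fake_not_running(const char *command)
{
	(void)command;
	return 3;	/* a non-zero status, treated here as not running */
}
```

Calling service_check_sketch(fake_running_stop_fails, &restart) still sets restart to 1 even though the simulated stop returned -1, which is the resilience the patch is after.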
>>>> =====================================
>>>> Deb McLemore
>>>> IBM OpenPower - IBM Systems
>>>> (512) 286 9980
>>>>
>>>> debmc at us.ibm.com
>>>> debmc at linux.vnet.ibm.com - (plain text)
>>>> =====================================
>>>>
>>>> ----- Original message -----
>>>> From: ppaidipe <ppaidipe at linux.vnet.ibm.com>
>>>> To: Deborah McLemore/Austin/IBM at IBMUS
>>>> Cc: Vasant Hegde <hegdevasant at linux.vnet.ibm.com>, Deb McLemore
>>>> <debmc at linux.vnet.ibm.com>, fwts-devel at lists.ubuntu.com
>>>> Subject: Re: [PATCH] opal: prd_info: Add resilience to service check
>>>> Date: Sat, Apr 7, 2018 1:16 PM
>>>> On 2018-04-07 20:50, Deborah McLemore wrote:
>>>> > We are getting -1 back. What is the expected exit status from
>>>> > 'systemctl stop'?
>>>> >
>>>>
>>>> From the execution of the test, what I understand is that we are requesting
>>>> start/stop of the service too quickly, which made the test fail.
>>>>
>>>> Apr 07 13:11:18 xxxxxxxxxxx systemd[1]: opal-prd.service: Start request repeated too quickly.
>>>> Apr 07 13:11:18 xxxxxxxxxxx systemd[1]: opal-prd.service: Failed with result 'start-limit-hit'.
>>>> Apr 07 13:11:18 xxxxxxxxxxx systemd[1]: Failed to start OPAL PRD daemon.
>>>>
>>>> So we need to request start/restart only once the stop has completed, and
>>>> request a stop only when the daemon is already started.
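One way to picture "request start only once the stop has completed" is a bounded polling loop that waits for the service to become inactive before any start is issued. This is only a sketch: the is_active_fn probe and wait_until_stopped are hypothetical names, not fwts functions, and the fake probe below simulates a service that stays active for two polls. A real probe could, for example, be built around `systemctl is-active --quiet opal-prd.service`. Whether this alone avoids the 'start-limit-hit' result is not established in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical probe: returns non-zero while the service is still active. */
typedef int (*is_active_fn)(void);

/*
 * Poll until the probe reports the service inactive, sleeping delay_ms
 * between attempts. Returns the number of polls used on success, or -1
 * if the service was still active after max_polls attempts.
 */
static int wait_until_stopped(is_active_fn is_active, int max_polls,
			      long delay_ms)
{
	struct timespec delay;
	int i;

	assert(is_active != NULL && max_polls > 0 && delay_ms >= 0);
	delay.tv_sec = delay_ms / 1000;
	delay.tv_nsec = (delay_ms % 1000) * 1000000L;

	for (i = 1; i <= max_polls; i++) {
		if (!is_active())
			return i;
		nanosleep(&delay, NULL);
	}
	return -1;
}

/* Fake probe for demonstration: active for the first two polls only. */
static int fake_polls;

static int fake_is_active(void)
{
	return ++fake_polls <= 2;
}
```

A caller would issue the start request only when wait_until_stopped() returns a positive poll count, and skip it (or report an error) on -1.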
>>>>
>>>>
>>>> Thanks
>>>> Pridhiviraj
>>>>
>>>> > Sent from my iPhone
>>>> >
>>>> >> On Apr 7, 2018, at 9:23 AM, Vasant Hegde
>>>> > <hegdevasant at linux.vnet.ibm.com> wrote:
>>>> >>
>>>> >>> On 04/07/2018 07:40 PM, Deb McLemore wrote:
>>>> >>> When the opal-prd.service is running and an attempt to stop it is
>>>> >>> performed, ignore the exit status and continue.
>>>> >>
>>>> >> Deb,
>>>> >>
>>>> >> Can you please explain why you want to ignore the exit status here?
>>>> >> Are there any issues?
>>>> >>
>>>> >> -Vasant
>>>> >>
>>>> >>
>>>> >>
>>>> >>>
>>>> >>> Signed-off-by: Deb McLemore <debmc at linux.vnet.ibm.com>
>>>> >>> ---
>>>> >>> src/opal/prd_info.c | 20 ++++----------------
>>>> >>> 1 file changed, 4 insertions(+), 16 deletions(-)
>>>> >>>
>>>> >>> diff --git a/src/opal/prd_info.c b/src/opal/prd_info.c
>>>> >>> index 4082a18..2db9413 100644
>>>> >>> --- a/src/opal/prd_info.c
>>>> >>> +++ b/src/opal/prd_info.c
>>>> >>> @@ -73,7 +73,7 @@ static int prd_dev_query(fwts_framework *fw)
>>>> >>>
>>>> >>> static int prd_service_check(fwts_framework *fw, int *restart)
>>>> >>> {
>>>> >>> - int rc = FWTS_OK, status = 0, stop_status = 0;
>>>> >>> + int rc = FWTS_OK, status = 0;
>>>> >>> char *command;
>>>> >>> char *output = NULL;
>>>> >>>
>>>> >>> @@ -97,25 +97,13 @@ static int prd_service_check(fwts_framework *fw, int *restart)
>>>> >>> goto out;
>>>> >>> case 0: /* "running" */
>>>> >>> command = "systemctl stop opal-prd.service 2>&1";
>>>> >>> - stop_status = fwts_exec2(command, &output);
>>>> >>> + fwts_exec2(command, &output);
>>>> >>>
>>>> >>> if (output)
>>>> >>> free(output);
>>>> >>>
>>>> >>> - switch (stop_status) {
>>>> >>> - case 0:
>>>> >>> - *restart = 1;
>>>> >>> - break;
>>>> >>> - default:
>>>> >>> - fwts_failed(fw, LOG_LEVEL_HIGH, "OPAL PRD Info",
>>>> >>> - "Attempt was made to stop the "
>>>> >>> - "opal-prd.service but was not "
>>>> >>> - "successful. Try to "
>>>> >>> - "\"sudo systemctl stop "
>>>> >>> - "opal-prd.service\" and retry.");
>>>> >>> - rc = FWTS_ERROR;
>>>> >>> - goto out;
>>>> >>> - }
>>>> >>> + *restart = 1;
>>>> >>> + break;
>>>> >>> default:
>>>> >>> break;
>>>> >>> }
>>>> >>>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> fwts-devel mailing list
>>>> >> fwts-devel at lists.ubuntu.com
>>>> >> Modify settings or unsubscribe at:
>>>> >
>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.ubuntu.com_mailman_listinfo_fwts-2Ddevel&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=V3KRDPsp3yMosW9R4elWYg&m=Sy-O20yWd_N3piZoJOEzigB1XzmLV4OUCfEyl3ENAcc&s=oPh1ACx1NGTgif-0V5BIQffXXqjymI8QC_bagI2jZsA&e=
>>>>
>>>>
>>>
>>
>>
>
--
Cheers,
Alex Hung