[Flight Delay Bundle] Fixing the namenode/compute-nodes relation error.

Samuel Cozannet samuel.cozannet at canonical.com
Wed Feb 25 09:51:23 UTC 2015


Note that, in addition, with the latest charms another error pops up
afterward, caused by an attempt to format a partition.

See https://bugs.launchpad.net/charms/+source/hdp-hadoop/+bug/1425456

Resolution:
On the YARN Master, find the process that is attempting to reformat the
partition and hard kill it. When Juju goes into an error state because of
that, tell it the issue is resolved.
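
A rough sketch of the "find it and hard kill it" step, in Python. It only
parses the text printed by `ps -eo pid,args` and builds the kill commands.
The function names are made up for illustration, and the pattern is an
assumption: I am guessing that the stuck process shows "namenode -format" on
its command line, so check the real `ps` output before trusting it.

```python
import re

# ASSUMPTION: the process stuck on the reformat has "namenode -format"
# (any letter case) in its command line. Adjust after checking `ps`.
FORMAT_PATTERN = re.compile(r"namenode\s+-format", re.IGNORECASE)


def find_format_pids(ps_output, pattern=FORMAT_PATTERN):
    """Return the PIDs whose command line matches `pattern`.

    `ps_output` is the text printed by `ps -eo pid,args`.
    """
    pids = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skips the header line and blank lines
        pid, args = parts
        if pattern.search(args):
            pids.append(int(pid))
    return pids


def kill_commands(pids):
    """Build the hard-kill (SIGKILL) commands to run on the YARN master."""
    return ["sudo kill -9 %d" % pid for pid in pids]
```

Feed it the `ps` output from the YARN master, run the commands it returns
there, then run `juju resolved yarn-master/0` from the client once the unit
shows the error.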

Sam



Best,
Samuel

--
Samuel Cozannet
Cloud, Big Data and IoT Strategy Team
Business Development - Cloud and ISV Ecosystem
Changing the Future of Cloud
Ubuntu <http://ubuntu.com>  / Canonical UK LTD <http://canonical.com> / Juju
<https://jujucharms.com>
samuel.cozannet at canonical.com
mob: +33 616 702 389
skype: samnco
Twitter: @SaMnCo_23

On Wed, Feb 25, 2015 at 10:31 AM, Samuel Cozannet <
samuel.cozannet at canonical.com> wrote:

> Actually the bug was already filed:
> https://bugs.launchpad.net/charms/+source/hdp-hadoop/+bug/1414080
>
> but now that this email describes the solution, it should get resolved quickly.
>
> Thanks,
> Sam
>
> Best,
> Samuel
>
> On Wed, Feb 25, 2015 at 10:24 AM, Samuel Cozannet <
> samuel.cozannet at canonical.com> wrote:
>
>> Hi,
>>
>> @juju mailing list members: sorry for adding you to the thread only now.
>> This is a discussion about getting the Flight Delay demo to work. The demo
>> is a bundle comprising a few HDP nodes (compute, YARN) and an IPython
>> notebook customized to run Scala code.
>>
>> @Andrew: I figured out what is going wrong. See below.
>>
>> @All:
>> So this is the story of a YARN node (based on charm hdp-hadoop-7) and 4
>> compute nodes (same charm).
>> If you deploy it with multiple compute nodes at once, the namenode
>> relation fails on the yarn-master side:
>>
>> *unit-yarn-master-0[28041]: 2015-02-25 09:09:40 INFO
>> unit.yarn-master/0.namenode-relation-joined logger.go:40
>> subprocess.CalledProcessError: Command '['su', 'hdfs', '-c',
>> '/usr/lib/hadoop/sbin/hadoop-daemon.sh --config /etc/hadoop/conf start
>> namenode']' returned non-zero exit status 1*
>> *unit-yarn-master-0[28041]: 2015-02-25 09:09:40 ERROR juju.worker.uniter
>> uniter.go:608 hook "namenode-relation-joined" failed: exit status 1*
>>
>> So I connected to yarn-master/0 and tried:
>>
>> *ubuntu@ip-172-31-42-86:~$ sudo su hdfs*
>> *hdfs@ip-172-31-42-86:/home/ubuntu$ /usr/lib/hadoop/sbin/hadoop-daemon.sh
>> --config /etc/hadoop/conf start namenode*
>> *namenode running as process 9270. Stop it first.*
>>
>> So I stopped it:
>> *hdfs@ip-172-31-42-86:/home/ubuntu$ /usr/lib/hadoop/sbin/hadoop-daemon.sh
>> --config /etc/hadoop/conf stop namenode*
>> *stopping namenode*
>>
>> But then when running:
>>
>> *juju resolved -r yarn-master/0*
>>
>> I would still run into the same issue. The trick is to remove *-r*. What
>> happens is that:
>> * The hook is run as many times as there are compute nodes.
>> * The error comes from the hook not testing whether the namenode service is
>> already started, and trying to start it anyway instead of restarting it.
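
To illustrate the second point, here is a minimal sketch of what a "stop
then start" step could look like. This is not the charm's actual hook code.
The function names are hypothetical, and the only parts taken from the log
above are the `su hdfs -c` form and the hadoop-daemon.sh command line.

```python
import subprocess

# Command prefix as it appears in the hook's error message.
DAEMON = "/usr/lib/hadoop/sbin/hadoop-daemon.sh --config /etc/hadoop/conf"


def hdfs_command(action):
    """Build the `su hdfs -c ...` argv for a namenode action."""
    return ["su", "hdfs", "-c", "%s %s namenode" % (DAEMON, action)]


def restart_namenode(run=subprocess.call):
    """Stop the namenode (if any), then start it.

    `run` takes an argv list and returns the exit status. It is injectable
    so the sequence can be checked without a Hadoop install.
    """
    run(hdfs_command("stop"))          # exit status deliberately ignored
    return run(hdfs_command("start"))  # 0 means the start command succeeded
```

Because the stop result is ignored, the same call should work whether or not
a namenode was already running, which is the case the hook trips over when it
is run once per compute node.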
>>
>> So the fix is to alternate between stopping the namenode service and
>> resolving the error from the juju client side:
>>
>> On YARN side:
>> *hdfs@ip-172-31-42-86:/home/ubuntu$ /usr/lib/hadoop/sbin/hadoop-daemon.sh
>> --config /etc/hadoop/conf stop namenode*
>> *stopping namenode*
>>
>> Then (on client side):
>> *juju resolved yarn-master/0*
>>
>> Then on YARN side again:
>> *hdfs@ip-172-31-42-86:/home/ubuntu$ /usr/lib/hadoop/sbin/hadoop-daemon.sh
>> --config /etc/hadoop/conf stop namenode*
>> *stopping namenode*
>>
>> Then (on client side; note there is no -r, so no retry):
>> *juju resolved yarn-master/0*
>>
>> Do that as many times as you have compute nodes (minus one, because the
>> last time the namenode will actually start) and you'll be OK.
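
The loop above can be written down as a small helper that just lists the
commands in order, for a given number of compute nodes. It is a sketch under
assumptions: the function name is hypothetical, and wrapping the stop command
in `juju ssh` is my shortcut for "log in to the YARN master and run it as
hdfs". The round count (compute nodes minus one) is the one stated above.

```python
STOP_NAMENODE = (
    "/usr/lib/hadoop/sbin/hadoop-daemon.sh"
    " --config /etc/hadoop/conf stop namenode"
)


def workaround_commands(compute_nodes, unit="yarn-master/0"):
    """List the commands for the stop/resolve loop, in order."""
    rounds = max(compute_nodes - 1, 0)
    commands = []
    for _ in range(rounds):
        # On the YARN master: stop the namenode as the hdfs user.
        # ASSUMPTION: run remotely through `juju ssh` instead of a login.
        commands.append(
            'juju ssh %s "sudo su hdfs -c \'%s\'"' % (unit, STOP_NAMENODE)
        )
        # On the client: mark the hook error resolved, without -r (no retry).
        commands.append("juju resolved %s" % unit)
    return commands
```

With 4 compute nodes, `workaround_commands(4)` gives three stop/resolve
pairs. Wait for the unit to go back into error before running each pair.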
>>
>> @Andrew: I'll file a bug for that issue on Launchpad. There is no
>> "restart" command for namenode, so that needs to be a stop then a start.
>> Thanks for finding it.
>>
>> Best,
>> Samuel
>>
>> On Mon, Feb 23, 2015 at 5:43 PM, Samuel Cozannet <
>> samuel.cozannet at canonical.com> wrote:
>>
>>> I know we had an issue lately with some kind of upstream change in
>>> Java, so that would be one that we can't fix until the charmers team fixes
>>> it.
>>>
>>> I actually need to deploy it for demos @MWC so I'll have a look, but
>>> only later this week. I'll keep you posted when I do. Before that,
>>> you can also use the juju mailing list or the IRC channel on freenode, as
>>> others can answer too. Some of our solution architects have played with
>>> it, so they may be able to help.
>>>
>>> Best,
>>> Sam
>>>
>>> Best,
>>> Samuel
>>>
>>> On Mon, Feb 23, 2015 at 5:35 PM, Andrew Brookes <andrew at theasi.co>
>>> wrote:
>>>
>>>> Hi Sam,
>>>>
>>>> No luck I'm afraid. I wasn't able to get past the race condition. I
>>>> waited about an hour before adding the relations.
>>>>
>>>> Any ideas?
>>>>
>>>> Thanks,
>>>>
>>>> Andy.
>>>>
>>>>
>>>> On 20 February 2015 at 14:36, Andrew Brookes <andrew at theasi.co> wrote:
>>>>
>>>>> Thanks. I'll give it a go. Have a good flight.
>>>>>
>>>>> On 20 February 2015 at 14:35, Samuel Cozannet <
>>>>> samuel.cozannet at canonical.com> wrote:
>>>>>
>>>>>> You should see them in the notebook GUI if everything went well.
>>>>>>
>>>>>> However, with the relations red, this is not good. That's the error
>>>>>> from the race condition, and I never found out how to resolve it.
>>>>>> The consequence is that YARN doesn't talk to the nodes and pig, which
>>>>>> means the cluster is useless.
>>>>>> The only way to recover that I have found so far is to kill all services
>>>>>> (pig, yarn, compute) and restart from scratch. First connect yarn &
>>>>>> compute. Tail the logs until they stop moving (a green relation doesn't
>>>>>> mean that it has finished everything). Then connect pig.
>>>>>>
>>>>>> I have to run and take a plane. If you look at the github for that
>>>>>> bundle, the deploy script I made is mostly manual and prevents errors such
>>>>>> as this one, but you need to follow its questions one after the other. (see
>>>>>> https://github.com/SaMnCo/bundle-flight-delay-demo/blob/master/00-deploy
>>>>>> )
>>>>>>
>>>>>> Best,
>>>>>> Sam
>>>>>>
>>>>>>
>>>>>>
>>>>>> Best,
>>>>>> Samuel
>>>>>>
>>>>>> On Fri, Feb 20, 2015 at 3:19 PM, Andrew Brookes <andrew at theasi.co>
>>>>>> wrote:
>>>>>>
>>>>>>> It's deployed. I can connect to juju and the ipython notebook.
>>>>>>>
>>>>>>> I see this error though:
>>>>>>> [image: Inline images 1]
>>>>>>>
>>>>>>> Also, I'm not sure where the notebooks are stored.
>>>>>>>
>>>>>>> On 20 February 2015 at 13:18, Andrew Brookes <andrew at theasi.co>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks. I'll try it now.
>>>>>>>>
>>>>>>>> On 20 February 2015 at 12:15, Samuel Cozannet <
>>>>>>>> samuel.cozannet at canonical.com> wrote:
>>>>>>>>
>>>>>>>>> Hey!
>>>>>>>>>
>>>>>>>>> The airline demo is a nasty beast :/ because of a race condition
>>>>>>>>> between the datanodes and the notebook, which also runs some Hadoop
>>>>>>>>> components.
>>>>>>>>>
>>>>>>>>> Can you try with this deployment script:
>>>>>>>>> http://bazaar.launchpad.net/~mmenkhof/orange-box-examples/orange-box-examples-new-demo-flight-delay/view/head:/hadoop/flight-delay-demo/01-deploy.sh
>>>>>>>>>
>>>>>>>>> and let me know if it works?
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Samuel
>>>>>>>>>
>>>>>>>>> On Fri, Feb 20, 2015 at 12:41 PM, Andrew Brookes <andrew at theasi.co
>>>>>>>>> > wrote:
>>>>>>>>>
>>>>>>>>>> Hi Sam,
>>>>>>>>>>
>>>>>>>>>> We're having some real trouble configuring a hadoop instance for
>>>>>>>>>> the airline delay prediction. We've tried the Juju charm and also deploying
>>>>>>>>>> manually.
>>>>>>>>>>
>>>>>>>>>> When deploying the juju charm, the data nodes did not seem to be
>>>>>>>>>> communicating with the NameNode.
>>>>>>>>>>
>>>>>>>>>> Any help on this would be appreciated.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Andy.
>>>>>>>>>>
>>>>>>>>>> On 26 January 2015 at 13:11, Samuel Cozannet <
>>>>>>>>>> samuel.cozannet at canonical.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hey
>>>>>>>>>>>
>>>>>>>>>>> As I am doing all the testing, I actually have one up & running:
>>>>>>>>>>> https://ec2-54-149-158-178.us-west-2.compute.amazonaws.com
>>>>>>>>>>>
>>>>>>>>>>> password: secret
>>>>>>>>>>>
>>>>>>>>>>> Open airline/demo/notebook python
>>>>>>>>>>>
>>>>>>>>>>> I can also activate the spark one if you need...
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Samuel
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Jan 25, 2015 at 8:39 PM, Angie Ma <angie at theasi.co>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks for this! That's awesome and very interesting! Will have
>>>>>>>>>>>> a look through the datasets. I think we'll design a mini project for the
>>>>>>>>>>>> fellowship and maybe combine it with some flight crash data we've got.
>>>>>>>>>>>> We could extend it into a hackathon as well. Will keep you posted.
>>>>>>>>>>>>
>>>>>>>>>>>> On 23 January 2015 at 09:38, Samuel Cozannet <
>>>>>>>>>>>> samuel.cozannet at canonical.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hey!!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Just to let you know I have been working on Airline Delay
>>>>>>>>>>>>> Prediction
>>>>>>>>>>>>> <http://hortonworks.com/blog/data-science-apacheh-hadoop-predicting-airline-delays/> in
>>>>>>>>>>>>> Python and also on the version in Scala
>>>>>>>>>>>>> <http://hortonworks.com/blog/data-science-hadoop-spark-scala-part-2/>.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It's now possible to deploy the architecture and notebooks
>>>>>>>>>>>>> involved directly as a bundle in Juju:
>>>>>>>>>>>>> https://demo.jujucharms.com/~samuel-cozannet/trusty/flight-delay-demo-2/?text=flight#readme
>>>>>>>>>>>>>
>>>>>>>>>>>>> You can browse the code on:
>>>>>>>>>>>>> * https://github.com/SaMnCo/bundle-flight-delay-demo
>>>>>>>>>>>>> * https://github.com/SaMnCo/charm-flight-delay-demo
>>>>>>>>>>>>>
>>>>>>>>>>>>> Let me know if that is a use case you'd like to use for
>>>>>>>>>>>>> hackathons. I can see if it's possible to build a smaller / less expensive
>>>>>>>>>>>>> version (this one has 5 quad-core/16GB RAM units)...
>>>>>>>>>>>>>
>>>>>>>>>>>>> Enjoy :)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Samuel
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> * <http://www.theasi.co>*
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *  Andrew Brookes   |   CTO, ASI   e: andrew at theasi.co ・ *
>>>>>>>>>> * m: +44 (0) 7888 675 230 ・  skype: brookesey
>>>>>>>>>> <https://twitter.com/advskills> <https://www.facebook.com/theasi.co>
>>>>>>>>>> <http://www.pinterest.com/advskills/>
>>>>>>>>>> <https://www.linkedin.com/company/advanced-skills-initiative>*
>>>>>>>>>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 177871 bytes
Desc: not available
URL: <https://lists.ubuntu.com/archives/juju/attachments/20150225/c530575a/attachment.png>

