[Flight Delay Bundle] Fixing the namenode/compute-nodes relation error.
Samuel Cozannet
samuel.cozannet at canonical.com
Wed Feb 25 09:24:38 UTC 2015
Hi,
@juju mailing list members: sorry for adding you to the thread only now.
This is a discussion about getting the Flight Delay demo to work. The demo
is a bundle comprising a few HDP nodes (compute, YARN) and an IPython
Notebook customized to run Scala code.
@Andrew: I figured out what is going wrong; see below.
@All:
So this is the story of a YARN node (based on the hdp-hadoop-7 charm) and 4
compute nodes (same charm).
If you deploy it with multiple compute nodes at once, the namenode relation
fails on the yarn-master side:
*unit-yarn-master-0[28041]: 2015-02-25 09:09:40 INFO unit.yarn-master/0.namenode-relation-joined logger.go:40 subprocess.CalledProcessError: Command '['su', 'hdfs', '-c', '/usr/lib/hadoop/sbin/hadoop-daemon.sh --config /etc/hadoop/conf start namenode']' returned non-zero exit status 1*
*unit-yarn-master-0[28041]: 2015-02-25 09:09:40 ERROR juju.worker.uniter uniter.go:608 hook "namenode-relation-joined" failed: exit status 1*
So I connected to yarn-master/0 and tried:
*ubuntu@ip-172-31-42-86:~$ sudo su hdfs*
*hdfs@ip-172-31-42-86:/home/ubuntu$ /usr/lib/hadoop/sbin/hadoop-daemon.sh --config /etc/hadoop/conf start namenode*
*namenode running as process 9270. Stop it first.*
So I stopped it:
*hdfs@ip-172-31-42-86:/home/ubuntu$ /usr/lib/hadoop/sbin/hadoop-daemon.sh --config /etc/hadoop/conf stop namenode*
*stopping namenode*
But then, when running:
*juju resolved -r yarn-master/0*
I would still run into the same issue. The trick is to drop *-r* (retry).
What happens is that:
* the hook is run as many times as there are compute nodes;
* the hook does not test whether the namenode service is already running,
and tries to start it anyway instead of restarting it, so hadoop-daemon.sh
refuses and exits with status 1.
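A minimal sketch of the missing check, written in shell for illustration (judging by the traceback, the hook itself is Python). hadoop-daemon.sh decides that the namenode is already running by reading its PID file and probing that process with `kill -0`, which is what produced the "Stop it first." message above; the helper below repeats that test. The function names and the PID file path are assumptions, not taken from the hdp-hadoop charm: the real path depends on `HADOOP_PID_DIR` in the charm's Hadoop environment.

```shell
#!/bin/sh
# Assumed HDP-style default ($HADOOP_PID_DIR/hadoop-$USER-namenode.pid);
# check the charm's hadoop-env.sh for the real location.
NAMENODE_PID_FILE="${NAMENODE_PID_FILE:-/var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid}"

# Hypothetical helper: succeed (status 0) if the PID file exists and the
# process it names is still alive. This mirrors the test hadoop-daemon.sh
# runs before printing "namenode running as process N. Stop it first."
# Run it as root or as hdfs: kill -0 fails for processes you cannot signal.
namenode_running() {
  pid_file="${1:-$NAMENODE_PID_FILE}"
  [ -f "$pid_file" ] && kill -0 "$(cat "$pid_file")" 2>/dev/null
}

# Hypothetical hook step (run as root, like Juju hooks): only start the
# namenode when it is not running, so the 2nd, 3rd, ... runs of
# namenode-relation-joined no longer fail.
ensure_namenode_started() {
  if namenode_running; then
    echo "namenode already running, skipping start"
  else
    su hdfs -c '/usr/lib/hadoop/sbin/hadoop-daemon.sh --config /etc/hadoop/conf start namenode'
  fi
}
```

With a guard like this, the first relation-joined run starts the daemon and later runs become no-ops instead of exiting with status 1.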
So the workaround is to alternate between stopping the namenode service and
resolving the error on the Juju client side.
On the YARN side:
*hdfs@ip-172-31-42-86:/home/ubuntu$ /usr/lib/hadoop/sbin/hadoop-daemon.sh --config /etc/hadoop/conf stop namenode*
*stopping namenode*
Then, on the client side:
*juju resolved yarn-master/0*
Then, on the YARN side again:
*hdfs@ip-172-31-42-86:/home/ubuntu$ /usr/lib/hadoop/sbin/hadoop-daemon.sh --config /etc/hadoop/conf stop namenode*
*stopping namenode*
Then, on the client side (!!! there is no retry !!!):
*juju resolved yarn-master/0*
Repeat this as many times as you have compute nodes, minus one (the last
time, the namenode will actually start), and you'll be OK.
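The manual loop above can be scripted from the Juju client. This is a sketch under stated assumptions, not a tested tool: the unit is yarn-master/0, `juju ssh` can run commands on it, and a fixed `WAIT_SECONDS` pause is long enough for the next queued namenode-relation-joined hook to run and fail. The names `unstick_namenode`, `run`, `UNIT`, `WAIT_SECONDS` and `DRY_RUN` are made up for this example; `DRY_RUN=1` only prints the commands.

```shell
#!/bin/sh
# Assumptions (not from the charm): unit name, that `juju ssh` reaches it,
# and that WAIT_SECONDS lets the next queued hook run (and fail).
UNIT="${UNIT:-yarn-master/0}"
WAIT_SECONDS="${WAIT_SECONDS:-60}"
STOP_CMD="/usr/lib/hadoop/sbin/hadoop-daemon.sh --config /etc/hadoop/conf stop namenode"

# Execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: repeat "stop namenode, then juju resolved" once per
# compute node minus one, following the count given in the message.
unstick_namenode() {
  compute_nodes="$1"
  i=1
  while [ "$i" -lt "$compute_nodes" ]; do
    run juju ssh "$UNIT" "sudo su hdfs -c '$STOP_CMD'"
    # No -r/--retry: mark the failed hook resolved instead of re-running it.
    run juju resolved "$UNIT"
    if [ "${DRY_RUN:-0}" != "1" ]; then
      sleep "$WAIT_SECONDS"
    fi
    i=$((i + 1))
  done
}
```

For the 4 compute nodes in this bundle, `DRY_RUN=1 unstick_namenode 4` previews the three stop/resolve rounds and `unstick_namenode 4` runs them. Polling `juju status` for the unit's error state would be more robust than a fixed sleep, at the cost of parsing its output.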
@Andrew: I'll file a bug for this issue on Launchpad. There is no "restart"
command for the namenode, so that needs to be a stop followed by a start.
Thanks for spotting it.
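Because hadoop-daemon.sh only accepts start and stop, the restart mentioned above has to be emulated. A minimal sketch: `restart_namenode` is a hypothetical helper, and `HADOOP_DAEMON`/`HADOOP_CONF` are overridable only so the stop-then-start order can be checked without a cluster. In a real hook it would run as the hdfs user, the way the charm already wraps its call in `su hdfs -c`.

```shell
#!/bin/sh
# Defaults match the paths used in this thread; override for testing.
HADOOP_DAEMON="${HADOOP_DAEMON:-/usr/lib/hadoop/sbin/hadoop-daemon.sh}"
HADOOP_CONF="${HADOOP_CONF:-/etc/hadoop/conf}"

# Hypothetical helper: emulate "restart" with stop then start, since
# hadoop-daemon.sh only accepts start|stop. Returns the status of start.
restart_namenode() {
  # If nothing is running, stop just reports it; don't let that abort us.
  "$HADOOP_DAEMON" --config "$HADOOP_CONF" stop namenode || true
  "$HADOOP_DAEMON" --config "$HADOOP_CONF" start namenode
}
```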
Best,
Samuel
--
Samuel Cozannet
Cloud, Big Data and IoT Strategy Team
Business Development - Cloud and ISV Ecosystem
Changing the Future of Cloud
Ubuntu <http://ubuntu.com> / Canonical UK LTD <http://canonical.com> / Juju
<https://jujucharms.com>
samuel.cozannet at canonical.com
mob: +33 616 702 389
skype: samnco
Twitter: @SaMnCo_23
On Mon, Feb 23, 2015 at 5:43 PM, Samuel Cozannet <
samuel.cozannet at canonical.com> wrote:
> I know we recently had an issue caused by some kind of upstream change in
> Java, so that would be one we can't fix until the charmers team fixes it.
>
> I actually need to deploy it for demos at MWC, so I'll have a look, but only
> later this week. I'll keep you posted when I do. In the meantime, feel free
> to use the juju mailing list or the IRC channel on freenode, as others can
> also answer. Some of our solution architects have played with it; they may
> be able to help.
>
> Best,
> Sam
>
> On Mon, Feb 23, 2015 at 5:35 PM, Andrew Brookes <andrew at theasi.co> wrote:
>
>> Hi Sam,
>>
>> No luck I'm afraid. I wasn't able to get past the race condition. I
>> waited about an hour before adding the relations.
>>
>> Any ideas?
>>
>> Thanks,
>>
>> Andy.
>>
>>
>> On 20 February 2015 at 14:36, Andrew Brookes <andrew at theasi.co> wrote:
>>
>>> Thanks. I'll give it a go. Have a good flight.
>>>
>>> On 20 February 2015 at 14:35, Samuel Cozannet <
>>> samuel.cozannet at canonical.com> wrote:
>>>
>>>> You should see them in the notebook GUI if everything went well.
>>>>
>>>> However, with the relations red, this is not good. That's the race
>>>> condition error, and I never found a way to resolve it.
>>>> The consequence is that YARN doesn't talk to the nodes and pig, which
>>>> means the cluster is useless.
>>>> The only way to recover that I have found so far is to kill all services
>>>> (pig, yarn, compute) and restart from scratch. First connect yarn &
>>>> compute. Tail the logs until they stop moving (a green relation doesn't
>>>> mean that everything has finished). Then connect pig.
>>>>
>>>> I have to run and catch a plane. If you look at the GitHub repository for
>>>> that bundle, the deploy script I made is mostly manual and prevents errors
>>>> such as this one, but you need to answer its questions one after the other
>>>> (see https://github.com/SaMnCo/bundle-flight-delay-demo/blob/master/00-deploy).
>>>>
>>>> Best,
>>>> Sam
>>>>
>>>> On Fri, Feb 20, 2015 at 3:19 PM, Andrew Brookes <andrew at theasi.co>
>>>> wrote:
>>>>
>>>>> It's deployed. I can connect to juju and the ipython notebook.
>>>>>
>>>>> I see this error though:
>>>>> [image: Inline images 1]
>>>>>
>>>>> Also, I'm not sure where the notebooks are stored.
>>>>>
>>>>> On 20 February 2015 at 13:18, Andrew Brookes <andrew at theasi.co> wrote:
>>>>>
>>>>>> Thanks. I'll try it now.
>>>>>>
>>>>>> On 20 February 2015 at 12:15, Samuel Cozannet <
>>>>>> samuel.cozannet at canonical.com> wrote:
>>>>>>
>>>>>>> Hey!
>>>>>>>
>>>>>>> The airline demo is a nasty beast :/ because of a race condition between
>>>>>>> the datanodes and the notebook, which also runs some Hadoop components.
>>>>>>>
>>>>>>> Can you try with this deployment script:
>>>>>>> http://bazaar.launchpad.net/~mmenkhof/orange-box-examples/orange-box-examples-new-demo-flight-delay/view/head:/hadoop/flight-delay-demo/01-deploy.sh
>>>>>>>
>>>>>>> and let me know if it works?
>>>>>>>
>>>>>>> Best,
>>>>>>> Samuel
>>>>>>>
>>>>>>> On Fri, Feb 20, 2015 at 12:41 PM, Andrew Brookes <andrew at theasi.co>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Sam,
>>>>>>>>
>>>>>>>> We're having some real trouble configuring a Hadoop instance for
>>>>>>>> the airline delay prediction. We've tried the Juju charm and also deploying
>>>>>>>> manually.
>>>>>>>>
>>>>>>>> Apparently, when deploying the Juju charm, the data nodes did not
>>>>>>>> seem to be communicating with the NameNode.
>>>>>>>>
>>>>>>>> Any help on this would be appreciated.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Andy.
>>>>>>>>
>>>>>>>> On 26 January 2015 at 13:11, Samuel Cozannet <
>>>>>>>> samuel.cozannet at canonical.com> wrote:
>>>>>>>>
>>>>>>>>> Hey
>>>>>>>>>
>>>>>>>>> As I am doing all the testing, I actually have one up & running:
>>>>>>>>> https://ec2-54-149-158-178.us-west-2.compute.amazonaws.com
>>>>>>>>>
>>>>>>>>> password: secret
>>>>>>>>>
>>>>>>>>> Open airline/demo/notebook python
>>>>>>>>>
>>>>>>>>> I can also activate the spark one if you need...
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Samuel
>>>>>>>>>
>>>>>>>>> On Sun, Jan 25, 2015 at 8:39 PM, Angie Ma <angie at theasi.co> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks for this! That's awesome and very interesting! We'll have a
>>>>>>>>>> look through the datasets. I think we'll design a mini project for the
>>>>>>>>>> fellowship and maybe combine it with some flight crash data we've got,
>>>>>>>>>> and extend it into a hackathon as well. Will keep you posted.
>>>>>>>>>>
>>>>>>>>>> On 23 January 2015 at 09:38, Samuel Cozannet <
>>>>>>>>>> samuel.cozannet at canonical.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hey!!
>>>>>>>>>>>
>>>>>>>>>>> Just to let you know, I have been working on Airline Delay
>>>>>>>>>>> Prediction
>>>>>>>>>>> <http://hortonworks.com/blog/data-science-apacheh-hadoop-predicting-airline-delays/> in
>>>>>>>>>>> Python and also on the version in Scala
>>>>>>>>>>> <http://hortonworks.com/blog/data-science-hadoop-spark-scala-part-2/>.
>>>>>>>>>>>
>>>>>>>>>>> It's now possible to deploy the architecture and notebooks
>>>>>>>>>>> involved directly as a bundle in Juju:
>>>>>>>>>>> https://demo.jujucharms.com/~samuel-cozannet/trusty/flight-delay-demo-2/?text=flight#readme
>>>>>>>>>>>
>>>>>>>>>>> You can browse the code on:
>>>>>>>>>>> * https://github.com/SaMnCo/bundle-flight-delay-demo
>>>>>>>>>>> * https://github.com/SaMnCo/charm-flight-delay-demo
>>>>>>>>>>>
>>>>>>>>>>> Let me know if that is a use case you'd like to use for
>>>>>>>>>>> hackathons; I can see if it's possible to build a smaller / less expensive
>>>>>>>>>>> version (that one has 5 quad-core/16GB RAM units)...
>>>>>>>>>>>
>>>>>>>>>>> Enjoy :)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Samuel
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> * <http://www.theasi.co>*
>>>>>>>>
>>>>>>>>
>>>>>>>> * Andrew Brookes | CTO, ASI e: andrew at theasi.co ・ *
>>>>>>>> * m: +44 (0) 7888 675 230 ・ skype: brookesey
>>>>>>>> <https://twitter.com/advskills> <https://www.facebook.com/theasi.co>
>>>>>>>> <http://www.pinterest.com/advskills/>
>>>>>>>> <https://www.linkedin.com/company/advanced-skills-initiative>*
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.ubuntu.com/archives/juju/attachments/20150225/df3047ce/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 177871 bytes
Desc: not available
URL: <https://lists.ubuntu.com/archives/juju/attachments/20150225/df3047ce/attachment.png>