How best to install Spark?

Samuel Cozannet samuel.cozannet at canonical.com
Fri Jan 30 18:09:06 UTC 2015


I'll have a look asap, but probably not before Tuesday.

This may be "my guts tell me that" but, if you have the time, try to
collocate YARN and Spark, that will guarantee you have the YARN_CONF_DIR
set. I am 90% sure it will fix your problem.

YARN itself will not eat much resources, you should be alright and it may
allow you to move forward instead of being stuck.
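In the meantime, a stopgap you could try (untested on my side, and the /etc/hadoop/conf path below is an assumption, so point it at whichever directory actually holds yarn-site.xml on your spark-master unit):

```shell
# Stopgap sketch: export the config dir before calling spark-submit.
# /etc/hadoop/conf is an assumed location; replace it with the directory
# that actually contains yarn-site.xml on your unit.
export HADOOP_CONF_DIR=/etc/hadoop/conf
export YARN_CONF_DIR="$HADOOP_CONF_DIR"
echo "$YARN_CONF_DIR"
```

spark-submit reads both variables from the environment, so exporting them in the same shell (or from a file under /etc/profile.d/) should get you past the "must be set in the environment" check.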

Best,
Sam


--
Samuel Cozannet
Cloud, Big Data and IoT Strategy Team
Business Development - Cloud and ISV Ecosystem
Changing the Future of Cloud
Ubuntu <http://ubuntu.com>  / Canonical UK LTD <http://canonical.com> / Juju
<https://jujucharms.com>
samuel.cozannet at canonical.com
mob: +33 616 702 389
skype: samnco
Twitter: @SaMnCo_23

On Fri, Jan 30, 2015 at 7:01 PM, Ken Williams <ken.w at theasi.co> wrote:

> Hi Sam,
>
>     Attached is my bundles.yaml file.
>
>     Also, there is no file 'directories.sh' on my spark-master/0 machine
> (see below),
>
> ubuntu at ip-172-31-54-245:~$ ls -l /etc/profile.d/
> total 12
> -rw-r--r-- 1 root root 1559 Jul 29  2014 Z97-byobu.sh
> -rwxr-xr-x 1 root root 2691 Oct  6 13:19 Z99-cloud-locale-test.sh
> -rw-r--r-- 1 root root  663 Apr  7  2014 bash_completion.sh
> ubuntu at ip-172-31-54-245:~$
>
>
>     Many thanks again for your help,
>
> Ken
>
>
> On 30 January 2015 at 15:45, Samuel Cozannet <
> samuel.cozannet at canonical.com> wrote:
>
>> Hey,
>>
>> Can you send the bundle you're using? In the GUI, bottom right, the "export"
>> button should give you a bundles.yaml file. Please send that to me so I
>> can bootstrap the same environment you are playing with.
>>
>> Also:
>> * can you let me know if you have a file /etc/profile.d/directories.sh?
>> * if yes, can you source it from your command line, then run the spark
>> command again, and let me know?
>>
>> Thx,
>> Sam
>>
>>
>>
>> --
>> Samuel Cozannet
>> Cloud, Big Data and IoT Strategy Team
>> Business Development - Cloud and ISV Ecosystem
>> Changing the Future of Cloud
>> Ubuntu <http://ubuntu.com>  / Canonical UK LTD <http://canonical.com> /
>> Juju <https://jujucharms.com>
>> samuel.cozannet at canonical.com
>> mob: +33 616 702 389
>> skype: samnco
>> Twitter: @SaMnCo_23
>>
>> On Fri, Jan 30, 2015 at 3:46 PM, Ken Williams <ken.w at theasi.co> wrote:
>>
>>> Ok - I have been able to add the relation using this,
>>>
>>>                 juju add-relation yarn-hdfs-master:resourcemanager
>>> spark-master
>>>
>>> But I still cannot see a /etc/hadoop/conf directory on the spark-master
>>> machine
>>> so I still get the same error about HADOOP_CONF_DIR and YARN_CONF_DIR
>>> (below),
>>>
>>>
>>> root at ip-172-31-60-53:~# spark-submit --class
>>> org.apache.spark.examples.SparkPi     --master yarn-client
>>> --num-executors 3     --driver-memory 1g     --executor-memory 1g
>>> --executor-cores 1     --queue thequeue     lib/spark-examples*.jar     10
>>> Spark assembly has been built with Hive, including Datanucleus jars on
>>> classpath
>>> Exception in thread "main" java.lang.Exception: When running with master
>>> 'yarn-client' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the
>>> environment.
>>> at
>>> org.apache.spark.deploy.SparkSubmitArguments.checkRequiredArguments(SparkSubmitArguments.scala:177)
>>> at
>>> org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:81)
>>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:70)
>>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>> root at ip-172-31-60-53:~#
>>>
>>> Should there be a /etc/hadoop/conf directory?
>>>
>>> Thanks for any help,
>>>
>>> Ken
>>>
>>>
>>> On 30 January 2015 at 12:59, Samuel Cozannet <
>>> samuel.cozannet at canonical.com> wrote:
>>>
>>>> Have you tried without ':master':
>>>>
>>>> juju add-relation yarn-hdfs-master:resourcemanager spark-master
>>>>
>>>> I think Spark master consumes the relationship but doesn't have to
>>>> expose its master relationship.
>>>>
>>>> Rule of thumb: when a relation is unambiguous on one of its ends, there
>>>> is no need to specify that end when adding it.
>>>>
>>>> Another option if this doesn't work is to use the GUI to create the
>>>> relation. It will give you a dropdown of available relationships between
>>>> entities.
>>>>
>>>> Let me know how it goes,
>>>> Thx,
>>>> Sam
>>>>
>>>>
>>>> --
>>>> Samuel Cozannet
>>>> Cloud, Big Data and IoT Strategy Team
>>>> Business Development - Cloud and ISV Ecosystem
>>>> Changing the Future of Cloud
>>>> Ubuntu <http://ubuntu.com>  / Canonical UK LTD <http://canonical.com> /
>>>> Juju <https://jujucharms.com>
>>>> samuel.cozannet at canonical.com
>>>> mob: +33 616 702 389
>>>> skype: samnco
>>>> Twitter: @SaMnCo_23
>>>>
>>>> On Fri, Jan 30, 2015 at 1:09 PM, Ken Williams <ken.w at theasi.co> wrote:
>>>>
>>>>> Hi Sam,
>>>>>
>>>>>     I understand what you are saying, but when I try to add the two
>>>>> relations I get these errors:
>>>>>
>>>>> root at adminuser-VirtualBox:~# juju add-relation
>>>>> yarn-hdfs-master:resourcemanager spark-master:master
>>>>> ERROR no relations found
>>>>> root at adminuser-VirtualBox:~# juju add-relation
>>>>> yarn-hdfs-master:namenode spark-master:master
>>>>> ERROR no relations found
>>>>>
>>>>>   Am I adding the relations correctly?
>>>>>
>>>>>   Attached is my 'juju status' file.
>>>>>
>>>>>   Thanks for all your help,
>>>>>
>>>>> Ken
>>>>>
>>>>> On 30 January 2015 at 11:16, Samuel Cozannet <
>>>>> samuel.cozannet at canonical.com> wrote:
>>>>>
>>>>>> Hey Ken,
>>>>>>
>>>>>> Yes, you need to create the relationship between the 2 entities so
>>>>>> they know about each other.
>>>>>>
>>>>>> Looking at the list of hooks for the charm
>>>>>> <https://github.com/Archethought/spark-charm/tree/master/hooks> you
>>>>>> can see there are 2 hooks named namenode-relation-changed
>>>>>> <https://github.com/Archethought/spark-charm/blob/master/hooks/namenode-relation-changed>
>>>>>>  and resourcemanager-relation-changed
>>>>>> <https://github.com/Archethought/spark-charm/blob/master/hooks/resourcemanager-relation-changed> which
>>>>>> are related to YARN/Hadoop.
>>>>>> Looking deeper into the code, you'll notice they reference a function
>>>>>> found in bdutils.py called "setHadoopEnvVar()", which, based on its name,
>>>>>> should set HADOOP_CONF_DIR.
>>>>>>
>>>>>> There are 2 relations, so add both of them.
>>>>>>
>>>>>> Note that I didn't test this myself, but I expect it should fix the
>>>>>> problem. If it doesn't, please come back to us...
>>>>>>
>>>>>> Thanks!
>>>>>> Sam
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Samuel Cozannet
>>>>>> Cloud, Big Data and IoT Strategy Team
>>>>>> Business Development - Cloud and ISV Ecosystem
>>>>>> Changing the Future of Cloud
>>>>>> Ubuntu <http://ubuntu.com>  / Canonical UK LTD <http://canonical.com> /
>>>>>> Juju <https://jujucharms.com>
>>>>>> samuel.cozannet at canonical.com
>>>>>> mob: +33 616 702 389
>>>>>> skype: samnco
>>>>>> Twitter: @SaMnCo_23
>>>>>>
>>>>>> On Fri, Jan 30, 2015 at 11:51 AM, Ken Williams <ken.w at theasi.co>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> Thanks, Kapil - this works :-)
>>>>>>>
>>>>>>> I can now run the SparkPi example successfully.
>>>>>>> root at ip-172-31-60-53:~# spark-submit --class
>>>>>>> org.apache.spark.examples.SparkPi /tmp/spark-examples-1.2.0-hadoop2.4.0.jar
>>>>>>> Spark assembly has been built with Hive, including Datanucleus jars
>>>>>>> on classpath
>>>>>>> 15/01/30 10:29:33 WARN NativeCodeLoader: Unable to load
>>>>>>> native-hadoop library for your platform... using builtin-java classes where
>>>>>>> applicable
>>>>>>> Pi is roughly 3.14318
>>>>>>>
>>>>>>> root at ip-172-31-60-53:~#
>>>>>>>
>>>>>>> I'm now trying to run the same example with the spark-submit
>>>>>>> '--master' option set to either 'yarn-cluster' or 'yarn-client'
>>>>>>> but I keep getting the same error:
>>>>>>>
>>>>>>> root at ip-172-31-60-53:~# spark-submit --class
>>>>>>> org.apache.spark.examples.SparkPi     --master yarn-client
>>>>>>> --num-executors 3     --driver-memory 1g     --executor-memory 1g
>>>>>>> --executor-cores 1     --queue thequeue     lib/spark-examples*.jar     10
>>>>>>> Spark assembly has been built with Hive, including Datanucleus jars
>>>>>>> on classpath
>>>>>>> Exception in thread "main" java.lang.Exception: When running with
>>>>>>> master 'yarn-client' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in
>>>>>>> the environment.
>>>>>>>
>>>>>>> But on my spark-master/0 machine there is no /etc/hadoop/conf
>>>>>>> directory.
>>>>>>> So what should the HADOOP_CONF_DIR or YARN_CONF_DIR value be?
>>>>>>> Do I need to add a juju relation between spark-master and ...
>>>>>>> yarn-hdfs-master to make them aware of each other ?
>>>>>>>
>>>>>>> Thanks for any help,
>>>>>>>
>>>>>>> Ken
>>>>>>>
>>>>>>> On 28 January 2015 at 19:32, Kapil Thangavelu <
>>>>>>> kapil.thangavelu at canonical.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Jan 28, 2015 at 1:54 PM, Ken Williams <ken.w at theasi.co>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Sam/Amir,
>>>>>>>>>
>>>>>>>>>     I've been able to 'juju ssh spark-master/0' and I successfully
>>>>>>>>> ran the two
>>>>>>>>> simple examples for pyspark and spark-shell,
>>>>>>>>>
>>>>>>>>>     ./bin/pyspark
>>>>>>>>>     >>> sc.parallelize(range(1000)).count()
>>>>>>>>>     1000
>>>>>>>>>
>>>>>>>>>     ./bin/spark-shell
>>>>>>>>>      scala> sc.parallelize(1 to 1000).count()
>>>>>>>>>     1000
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Now I want to run some of the Spark examples in the
>>>>>>>>> spark-examples*.jar
>>>>>>>>> file, which I have on my local machine. How do I copy the jar file
>>>>>>>>> from
>>>>>>>>> my local machine to the AWS machine ?
>>>>>>>>>
>>>>>>>>> I have tried 'scp' and 'juju scp' from the local command-line but
>>>>>>>>> both fail (below),
>>>>>>>>>
>>>>>>>>> root at adminuser:~# scp /tmp/spark-examples-1.2.0-hadoop2.4.0.jar
>>>>>>>>> ubuntu at ip-172-31-59:/tmp
>>>>>>>>> ssh: Could not resolve hostname ip-172-31-59: Name or service not
>>>>>>>>> known
>>>>>>>>> lost connection
>>>>>>>>> root at adminuser:~# juju scp
>>>>>>>>> /tmp/spark-examples-1.2.0-hadoop2.4.0.jar ubuntu at ip-172-31-59:/tmp
>>>>>>>>> ERROR exit status 1 (nc: getaddrinfo: Name or service not known)
>>>>>>>>>
>>>>>>>>> Any ideas ?
>>>>>>>>>
>>>>>>>>
>>>>>>>> juju scp /tmp/spark-examples-1.2.0-hadoop2.4.0.jar
>>>>>>>> spark-master/0:/tmp
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Ken
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 28 January 2015 at 17:29, Samuel Cozannet <
>>>>>>>>> samuel.cozannet at canonical.com> wrote:
>>>>>>>>>
>>>>>>>>>> Glad it worked!
>>>>>>>>>>
>>>>>>>>>> I'll make a merge request to the upstream so that it works
>>>>>>>>>> natively from the store asap.
>>>>>>>>>>
>>>>>>>>>> Thanks for catching that!
>>>>>>>>>> Samuel
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Samuel
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Samuel Cozannet
>>>>>>>>>> Cloud, Big Data and IoT Strategy Team
>>>>>>>>>> Business Development - Cloud and ISV Ecosystem
>>>>>>>>>> Changing the Future of Cloud
>>>>>>>>>> Ubuntu <http://ubuntu.com>  / Canonical UK LTD
>>>>>>>>>> <http://canonical.com> / Juju <https://jujucharms.com>
>>>>>>>>>> samuel.cozannet at canonical.com
>>>>>>>>>> mob: +33 616 702 389
>>>>>>>>>> skype: samnco
>>>>>>>>>> Twitter: @SaMnCo_23
>>>>>>>>>>
>>>>>>>>>> On Wed, Jan 28, 2015 at 6:15 PM, Ken Williams <ken.w at theasi.co>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi Sam (and Maarten),
>>>>>>>>>>>
>>>>>>>>>>>     Cloning Spark 1.2.0 from github seems to have worked!
>>>>>>>>>>>     I can install the Spark examples afterwards.
>>>>>>>>>>>
>>>>>>>>>>>     Thanks for all your help!
>>>>>>>>>>>
>>>>>>>>>>>     Yes - Andrew and Angie both say 'hi'  :-)
>>>>>>>>>>>
>>>>>>>>>>>     Best Regards,
>>>>>>>>>>>
>>>>>>>>>>> Ken
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 28 January 2015 at 16:43, Samuel Cozannet <
>>>>>>>>>>> samuel.cozannet at canonical.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hey Ken,
>>>>>>>>>>>>
>>>>>>>>>>>> So I had a closer look at your Spark problem and found out what
>>>>>>>>>>>> went wrong.
>>>>>>>>>>>>
>>>>>>>>>>>> The charm available on the charmstore is trying to download
>>>>>>>>>>>> Spark 1.0.2, and the versions available on the Apache website are 1.1.0,
>>>>>>>>>>>> 1.1.1 and 1.2.0.
>>>>>>>>>>>>
>>>>>>>>>>>> There is another version of the charm available on GitHub that
>>>>>>>>>>>> actually will deploy 1.2.0
>>>>>>>>>>>>
>>>>>>>>>>>> 1. On your computer, create the folders below and cd into them:
>>>>>>>>>>>>
>>>>>>>>>>>> cd ~
>>>>>>>>>>>> mkdir charms
>>>>>>>>>>>> mkdir charms/trusty
>>>>>>>>>>>> cd charms/trusty
>>>>>>>>>>>>
>>>>>>>>>>>> 2. Clone the Spark charm:
>>>>>>>>>>>>
>>>>>>>>>>>> git clone https://github.com/Archethought/spark-charm spark
>>>>>>>>>>>>
>>>>>>>>>>>> 3. Deploy Spark from local repository
>>>>>>>>>>>>
>>>>>>>>>>>> juju deploy --repository=~/charms local:trusty/spark spark-master
>>>>>>>>>>>> juju deploy --repository=~/charms local:trusty/spark spark-slave
>>>>>>>>>>>> juju add-relation spark-master:master spark-slave:slave
>>>>>>>>>>>>
>>>>>>>>>>>> Worked on AWS for me just minutes ago. Let me know how it goes
>>>>>>>>>>>> for you. Note that this version of the charm does NOT install the Spark
>>>>>>>>>>>> examples. The files are present though, so you'll find them in
>>>>>>>>>>>> /var/lib/juju/agents/unit-spark-master-0/charm/files/archive
>>>>>>>>>>>>
>>>>>>>>>>>> Hope that helps...
>>>>>>>>>>>> Let me know if it works for you!
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Sam
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Samuel Cozannet
>>>>>>>>>>>> Cloud, Big Data and IoT Strategy Team
>>>>>>>>>>>> Business Development - Cloud and ISV Ecosystem
>>>>>>>>>>>> Changing the Future of Cloud
>>>>>>>>>>>> Ubuntu <http://ubuntu.com>  / Canonical UK LTD
>>>>>>>>>>>> <http://canonical.com> / Juju <https://jujucharms.com>
>>>>>>>>>>>> samuel.cozannet at canonical.com
>>>>>>>>>>>> mob: +33 616 702 389
>>>>>>>>>>>> skype: samnco
>>>>>>>>>>>> Twitter: @SaMnCo_23
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jan 28, 2015 at 4:44 PM, Ken Williams <ken.w at theasi.co>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi folks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm completely new to juju so any help is appreciated.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm trying to create a hadoop/analytics-type platform.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I've managed to install the 'data-analytics-with-sql-like'
>>>>>>>>>>>>> bundle
>>>>>>>>>>>>> (using this command)
>>>>>>>>>>>>>
>>>>>>>>>>>>>     juju quickstart
>>>>>>>>>>>>> bundle:data-analytics-with-sql-like/data-analytics-with-sql-like
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is very impressive, and gives me virtually everything
>>>>>>>>>>>>> that I want
>>>>>>>>>>>>> (hadoop, hive, etc) - but I also need Spark.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The Spark charm (
>>>>>>>>>>>>> http://manage.jujucharms.com/~asanjar/trusty/spark)
>>>>>>>>>>>>> and bundle (
>>>>>>>>>>>>> http://manage.jujucharms.com/bundle/~asanjar/spark/spark-cluster
>>>>>>>>>>>>> )
>>>>>>>>>>>>> however do not seem stable or available and I can't figure out
>>>>>>>>>>>>> how to install them.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Should I just download and install the Spark tarball on the nodes
>>>>>>>>>>>>> in my AWS cluster, or is there a better way to do this?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ken
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Juju mailing list
>>>>>>>>>>>>> Juju at lists.ubuntu.com
>>>>>>>>>>>>> Modify settings or unsubscribe at:
>>>>>>>>>>>>> https://lists.ubuntu.com/mailman/listinfo/juju
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Juju mailing list
>>>>>>>>> Juju at lists.ubuntu.com
>>>>>>>>> Modify settings or unsubscribe at:
>>>>>>>>> https://lists.ubuntu.com/mailman/listinfo/juju
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

