How best to install Spark?
Ken Williams
ken.w at theasi.co
Fri Jan 30 18:01:02 UTC 2015
Hi Sam,
Attached is my bundles.yaml file.
Also, there is no file 'directories.sh' on my spark-master/0 machine
(see below):
ubuntu at ip-172-31-54-245:~$ ls -l /etc/profile.d/
total 12
-rw-r--r-- 1 root root 1559 Jul 29 2014 Z97-byobu.sh
-rwxr-xr-x 1 root root 2691 Oct 6 13:19 Z99-cloud-locale-test.sh
-rw-r--r-- 1 root root 663 Apr 7 2014 bash_completion.sh
ubuntu at ip-172-31-54-245:~$
Many thanks again for your help,
Ken
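(If it turns out directories.sh is simply never created, a manual export along
these lines might unblock spark-submit in the meantime. This is a sketch only;
the /etc/hadoop/conf path is an assumption, so point it at wherever the Hadoop
charm actually writes its config.)

```shell
# Hypothetical stand-in for the missing /etc/profile.d/directories.sh.
# The config path is an assumed default, not confirmed anywhere in this thread.
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/etc/hadoop/conf}"
YARN_CONF_DIR="${YARN_CONF_DIR:-$HADOOP_CONF_DIR}"
export HADOOP_CONF_DIR YARN_CONF_DIR
echo "HADOOP_CONF_DIR=$HADOOP_CONF_DIR"
echo "YARN_CONF_DIR=$YARN_CONF_DIR"
```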
On 30 January 2015 at 15:45, Samuel Cozannet <samuel.cozannet at canonical.com>
wrote:
> Hey,
>
> Can you send the bundle you're using? (In the GUI, bottom right, the
> "export" button should give you a bundles.yaml file.) Please send that to
> me, so I can bootstrap the same environment as you are playing with.
>
> Also:
> * can you let me know if you have a file /etc/profile.d/directories.sh?
> * if yes, can you execute it from your command line, then run the spark
> command again, and let me know?
>
> Thx,
> Sam
>
> Best,
> Samuel
>
> --
> Samuel Cozannet
> Cloud, Big Data and IoT Strategy Team
> Business Development - Cloud and ISV Ecosystem
> Changing the Future of Cloud
> Ubuntu <http://ubuntu.com> / Canonical UK LTD <http://canonical.com> /
> Juju <https://jujucharms.com>
> samuel.cozannet at canonical.com
> mob: +33 616 702 389
> skype: samnco
> Twitter: @SaMnCo_23
>
> On Fri, Jan 30, 2015 at 3:46 PM, Ken Williams <ken.w at theasi.co> wrote:
>
>> OK - I have been able to add the relation using this:
>>
>> juju add-relation yarn-hdfs-master:resourcemanager
>> spark-master
>>
>> But I still cannot see a /etc/hadoop/conf directory on the spark-master
>> machine, so I still get the same error about HADOOP_CONF_DIR and
>> YARN_CONF_DIR (below):
>>
>>
>> root at ip-172-31-60-53:~# spark-submit --class
>> org.apache.spark.examples.SparkPi --master yarn-client
>> --num-executors 3 --driver-memory 1g --executor-memory 1g
>> --executor-cores 1 --queue thequeue lib/spark-examples*.jar 10
>> Spark assembly has been built with Hive, including Datanucleus jars on
>> classpath
>> Exception in thread "main" java.lang.Exception: When running with master
>> 'yarn-client' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the
>> environment.
>> at
>> org.apache.spark.deploy.SparkSubmitArguments.checkRequiredArguments(SparkSubmitArguments.scala:177)
>> at
>> org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:81)
>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:70)
>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>> root at ip-172-31-60-53:~#
>>
>> Should there be a /etc/hadoop/conf directory?
>>
>> Thanks for any help,
>>
>> Ken
>>
>>
>> On 30 January 2015 at 12:59, Samuel Cozannet <
>> samuel.cozannet at canonical.com> wrote:
>>
>>> Have you tried without ':master':
>>>
>>> juju add-relation yarn-hdfs-master:resourcemanager spark-master
>>>
>>> I think Spark master consumes the relationship but doesn't have to
>>> expose its master relationship.
>>>
>>> Rule of thumb: when a relation is unambiguous on one of its ends,
>>> there is no need to specify that endpoint when adding the relation.
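A sketch of that rule with the names from this thread (endpoint names assumed
from the charm; nothing here talks to a live environment, the command is only
built as a string):

```shell
# The explicit form names both endpoints; the shorthand drops the
# unambiguous ':master' endpoint on the spark-master side.
explicit="juju add-relation yarn-hdfs-master:resourcemanager spark-master:master"
shorthand="${explicit%:master}"
echo "$shorthand"
```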
>>>
>>> Another option if this doesn't work is to use the GUI to create the
>>> relation. It will give you a dropdown of available relationships between
>>> entities.
>>>
>>> Let me know how it goes,
>>> Thx,
>>> Sam
>>>
>>>
>>> Best,
>>> Samuel
>>>
>>> --
>>> Samuel Cozannet
>>> Cloud, Big Data and IoT Strategy Team
>>> Business Development - Cloud and ISV Ecosystem
>>> Changing the Future of Cloud
>>> Ubuntu <http://ubuntu.com> / Canonical UK LTD <http://canonical.com> /
>>> Juju <https://jujucharms.com>
>>> samuel.cozannet at canonical.com
>>> mob: +33 616 702 389
>>> skype: samnco
>>> Twitter: @SaMnCo_23
>>>
>>> On Fri, Jan 30, 2015 at 1:09 PM, Ken Williams <ken.w at theasi.co> wrote:
>>>
>>>> Hi Sam,
>>>>
>>>> I understand what you are saying but when I try to add the 2
>>>> relations I get this error,
>>>>
>>>> root at adminuser-VirtualBox:~# juju add-relation
>>>> yarn-hdfs-master:resourcemanager spark-master:master
>>>> ERROR no relations found
>>>> root at adminuser-VirtualBox:~# juju add-relation
>>>> yarn-hdfs-master:namenode spark-master:master
>>>> ERROR no relations found
>>>>
>>>> Am I adding the relations correctly?
>>>>
>>>> Attached is my 'juju status' file.
>>>>
>>>> Thanks for all your help,
>>>>
>>>> Ken
>>>>
>>>> On 30 January 2015 at 11:16, Samuel Cozannet <
>>>> samuel.cozannet at canonical.com> wrote:
>>>>
>>>>> Hey Ken,
>>>>>
>>>>> Yes, you need to create the relationship between the 2 entities so
>>>>> they know about each other.
>>>>>
>>>>> Looking at the list of hooks for the charm
>>>>> <https://github.com/Archethought/spark-charm/tree/master/hooks> you
>>>>> can see there are 2 hooks named namenode-relation-changed
>>>>> <https://github.com/Archethought/spark-charm/blob/master/hooks/namenode-relation-changed>
>>>>> and resourcemanager-relation-changed
>>>>> <https://github.com/Archethought/spark-charm/blob/master/hooks/resourcemanager-relation-changed> which
>>>>> are related to YARN/Hadoop.
>>>>> Looking deeper into the code, you'll notice they reference a function
>>>>> found in bdutils.py called "setHadoopEnvVar()", which, based on its
>>>>> name, should set the HADOOP_CONF_DIR.
>>>>>
>>>>> There are 2 relations, so add both of them.
>>>>>
>>>>> Note that I didn't test this myself, but I expect this should fix the
>>>>> problem. If it doesn't please come back to us...
>>>>>
>>>>> Thanks!
>>>>> Sam
>>>>>
>>>>>
>>>>> Best,
>>>>> Samuel
>>>>>
>>>>> --
>>>>> Samuel Cozannet
>>>>> Cloud, Big Data and IoT Strategy Team
>>>>> Business Development - Cloud and ISV Ecosystem
>>>>> Changing the Future of Cloud
>>>>> Ubuntu <http://ubuntu.com> / Canonical UK LTD <http://canonical.com> /
>>>>> Juju <https://jujucharms.com>
>>>>> samuel.cozannet at canonical.com
>>>>> mob: +33 616 702 389
>>>>> skype: samnco
>>>>> Twitter: @SaMnCo_23
>>>>>
>>>>> On Fri, Jan 30, 2015 at 11:51 AM, Ken Williams <ken.w at theasi.co>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> Thanks, Kapil - this works :-)
>>>>>>
>>>>>> I can now run the SparkPi example successfully.
>>>>>> root at ip-172-31-60-53:~# spark-submit --class
>>>>>> org.apache.spark.examples.SparkPi /tmp/spark-examples-1.2.0-hadoop2.4.0.jar
>>>>>> Spark assembly has been built with Hive, including Datanucleus jars
>>>>>> on classpath
>>>>>> 15/01/30 10:29:33 WARN NativeCodeLoader: Unable to load native-hadoop
>>>>>> library for your platform... using builtin-java classes where applicable
>>>>>> Pi is roughly 3.14318
>>>>>>
>>>>>> root at ip-172-31-60-53:~#
>>>>>>
>>>>>> I'm now trying to run the same example with the spark-submit
>>>>>> '--master' option set to either 'yarn-cluster' or 'yarn-client'
>>>>>> but I keep getting the same error :
>>>>>>
>>>>>> root at ip-172-31-60-53:~# spark-submit --class
>>>>>> org.apache.spark.examples.SparkPi --master yarn-client
>>>>>> --num-executors 3 --driver-memory 1g --executor-memory 1g
>>>>>> --executor-cores 1 --queue thequeue lib/spark-examples*.jar 10
>>>>>> Spark assembly has been built with Hive, including Datanucleus jars
>>>>>> on classpath
>>>>>> Exception in thread "main" java.lang.Exception: When running with
>>>>>> master 'yarn-client' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in
>>>>>> the environment.
>>>>>>
>>>>>> But on my spark-master/0 machine there is no /etc/hadoop/conf
>>>>>> directory.
>>>>>> So what should the HADOOP_CONF_DIR or YARN_CONF_DIR value be?
>>>>>> Do I need to add a juju relation between spark-master and ...
>>>>>> yarn-hdfs-master to make them aware of each other?
>>>>>>
>>>>>> Thanks for any help,
>>>>>>
>>>>>> Ken
>>>>>>
>>>>>>
>>>>>> On 28 January 2015 at 19:32, Kapil Thangavelu <
>>>>>> kapil.thangavelu at canonical.com> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jan 28, 2015 at 1:54 PM, Ken Williams <ken.w at theasi.co>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Hi Sam/Amir,
>>>>>>>>
>>>>>>>> I've been able to 'juju ssh spark-master/0' and I successfully
>>>>>>>> ran the two
>>>>>>>> simple examples for pyspark and spark-shell,
>>>>>>>>
>>>>>>>> ./bin/pyspark
>>>>>>>> >>> sc.parallelize(range(1000)).count()
>>>>>>>> 1000
>>>>>>>>
>>>>>>>> ./bin/spark-shell
>>>>>>>> scala> sc.parallelize(1 to 1000).count()
>>>>>>>> 1000
>>>>>>>>
>>>>>>>>
>>>>>>>> Now I want to run some of the spark examples in the
>>>>>>>> spark-examples*.jar
>>>>>>>> file, which I have on my local machine. How do I copy the jar file
>>>>>>>> from
>>>>>>>> my local machine to the AWS machine?
>>>>>>>>
>>>>>>>> I have tried 'scp' and 'juju scp' from the local command-line but
>>>>>>>> both fail (below),
>>>>>>>>
>>>>>>>> root at adminuser:~# scp /tmp/spark-examples-1.2.0-hadoop2.4.0.jar
>>>>>>>> ubuntu at ip-172-31-59:/tmp
>>>>>>>> ssh: Could not resolve hostname ip-172-31-59: Name or service not
>>>>>>>> known
>>>>>>>> lost connection
>>>>>>>> root at adminuser:~# juju scp
>>>>>>>> /tmp/spark-examples-1.2.0-hadoop2.4.0.jar ubuntu at ip-172-31-59:/tmp
>>>>>>>> ERROR exit status 1 (nc: getaddrinfo: Name or service not known)
>>>>>>>>
>>>>>>>> Any ideas?
>>>>>>>>
>>>>>>>
>>>>>>> juju scp /tmp/spark-examples-1.2.0-hadoop2.4.0.jar
>>>>>>> spark-master/0:/tmp
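The key point, as I understand it, is that juju scp addresses units by name,
so the laptop never has to resolve the instance's private EC2 hostname; plain
scp would need the public DNS name instead. A string-only sketch of the
target (unit name taken from the thread):

```shell
# 'service/unit-number:path' is the form juju scp resolves by itself.
unit="spark-master/0"
target="$unit:/tmp"
echo "$target"
```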
>>>>>>>
>>>>>>>>
>>>>>>>> Ken
>>>>>>>>
>>>>>>>> On 28 January 2015 at 17:29, Samuel Cozannet <
>>>>>>>> samuel.cozannet at canonical.com> wrote:
>>>>>>>>
>>>>>>>>> Glad it worked!
>>>>>>>>>
>>>>>>>>> I'll make a merge request to the upstream so that it works
>>>>>>>>> natively from the store asap.
>>>>>>>>>
>>>>>>>>> Thanks for catching that!
>>>>>>>>> Samuel
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Samuel
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Samuel Cozannet
>>>>>>>>> Cloud, Big Data and IoT Strategy Team
>>>>>>>>> Business Development - Cloud and ISV Ecosystem
>>>>>>>>> Changing the Future of Cloud
>>>>>>>>> Ubuntu <http://ubuntu.com> / Canonical UK LTD
>>>>>>>>> <http://canonical.com> / Juju <https://jujucharms.com>
>>>>>>>>> samuel.cozannet at canonical.com
>>>>>>>>> mob: +33 616 702 389
>>>>>>>>> skype: samnco
>>>>>>>>> Twitter: @SaMnCo_23
>>>>>>>>>
>>>>>>>>> On Wed, Jan 28, 2015 at 6:15 PM, Ken Williams <ken.w at theasi.co>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi Sam (and Maarten),
>>>>>>>>>>
>>>>>>>>>> Cloning Spark 1.2.0 from github seems to have worked!
>>>>>>>>>> I can install the Spark examples afterwards.
>>>>>>>>>>
>>>>>>>>>> Thanks for all your help!
>>>>>>>>>>
>>>>>>>>>> Yes - Andrew and Angie both say 'hi' :-)
>>>>>>>>>>
>>>>>>>>>> Best Regards,
>>>>>>>>>>
>>>>>>>>>> Ken
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 28 January 2015 at 16:43, Samuel Cozannet <
>>>>>>>>>> samuel.cozannet at canonical.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hey Ken,
>>>>>>>>>>>
>>>>>>>>>>> So I had a closer look to your Spark problem and found out what
>>>>>>>>>>> went wrong.
>>>>>>>>>>>
>>>>>>>>>>> The charm available on the charmstore is trying to download
>>>>>>>>>>> Spark 1.0.2, and the versions available on the Apache website are 1.1.0,
>>>>>>>>>>> 1.1.1 and 1.2.0.
>>>>>>>>>>>
>>>>>>>>>>> There is another version of the charm available on GitHub that
>>>>>>>>>>> actually will deploy 1.2.0
>>>>>>>>>>>
>>>>>>>>>>> 1. On your computer, create the folders below and go there:
>>>>>>>>>>>
>>>>>>>>>>> cd ~
>>>>>>>>>>> mkdir charms
>>>>>>>>>>> mkdir charms/trusty
>>>>>>>>>>> cd charms/trusty
>>>>>>>>>>>
>>>>>>>>>>> 2. Clone the Spark charm.
>>>>>>>>>>>
>>>>>>>>>>> git clone https://github.com/Archethought/spark-charm spark
>>>>>>>>>>>
>>>>>>>>>>> 3. Deploy Spark from local repository
>>>>>>>>>>>
>>>>>>>>>>> juju deploy --repository=~/charms local:trusty/spark spark-master
>>>>>>>>>>> juju deploy --repository=~/charms local:trusty/spark spark-slave
>>>>>>>>>>> juju add-relation spark-master:master spark-slave:slave
>>>>>>>>>>>
>>>>>>>>>>> Worked on AWS for me just minutes ago. Let me know how it goes
>>>>>>>>>>> for you. Note that this version of the charm does NOT install the Spark
>>>>>>>>>>> examples. The files are present though, so you'll find them in
>>>>>>>>>>> /var/lib/juju/agents/unit-spark-master-0/charm/files/archive
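A quick way to check for that archive directory might look like the following
(path copied from above; on a machine without the charm it just reports the
miss):

```shell
# Look for the bundled Spark examples jar under the charm's archive dir.
src=/var/lib/juju/agents/unit-spark-master-0/charm/files/archive
if ls "$src"/spark-examples*.jar >/dev/null 2>&1; then
    echo "examples jar present under $src"
else
    echo "no examples jar under $src"
fi
```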
>>>>>>>>>>>
>>>>>>>>>>> Hope that helps...
>>>>>>>>>>> Let me know if it works for you!
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Sam
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Samuel
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Samuel Cozannet
>>>>>>>>>>> Cloud, Big Data and IoT Strategy Team
>>>>>>>>>>> Business Development - Cloud and ISV Ecosystem
>>>>>>>>>>> Changing the Future of Cloud
>>>>>>>>>>> Ubuntu <http://ubuntu.com> / Canonical UK LTD
>>>>>>>>>>> <http://canonical.com> / Juju <https://jujucharms.com>
>>>>>>>>>>> samuel.cozannet at canonical.com
>>>>>>>>>>> mob: +33 616 702 389
>>>>>>>>>>> skype: samnco
>>>>>>>>>>> Twitter: @SaMnCo_23
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jan 28, 2015 at 4:44 PM, Ken Williams <ken.w at theasi.co>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Hi folks,
>>>>>>>>>>>>
>>>>>>>>>>>> I'm completely new to juju so any help is appreciated.
>>>>>>>>>>>>
>>>>>>>>>>>> I'm trying to create a hadoop/analytics-type platform.
>>>>>>>>>>>>
>>>>>>>>>>>> I've managed to install the 'data-analytics-with-sql-like'
>>>>>>>>>>>> bundle
>>>>>>>>>>>> (using this command)
>>>>>>>>>>>>
>>>>>>>>>>>> juju quickstart
>>>>>>>>>>>> bundle:data-analytics-with-sql-like/data-analytics-with-sql-like
>>>>>>>>>>>>
>>>>>>>>>>>> This is very impressive, and gives me virtually everything that
>>>>>>>>>>>> I want
>>>>>>>>>>>> (hadoop, hive, etc) - but I also need Spark.
>>>>>>>>>>>>
>>>>>>>>>>>> The Spark charm (
>>>>>>>>>>>> http://manage.jujucharms.com/~asanjar/trusty/spark)
>>>>>>>>>>>> and bundle (
>>>>>>>>>>>> http://manage.jujucharms.com/bundle/~asanjar/spark/spark-cluster
>>>>>>>>>>>> )
>>>>>>>>>>>> however do not seem stable or available, and I can't figure out
>>>>>>>>>>>> how to install them.
>>>>>>>>>>>>
>>>>>>>>>>>> Should I just download and install the Spark tar-ball on the
>>>>>>>>>>>> nodes
>>>>>>>>>>>> in my AWS cluster, or is there a better way to do this?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>
>>>>>>>>>>>> Ken
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Juju mailing list
>>>>>>>>>>>> Juju at lists.ubuntu.com
>>>>>>>>>>>> Modify settings or unsubscribe at:
>>>>>>>>>>>> https://lists.ubuntu.com/mailman/listinfo/juju
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>