How best to install Spark?
Ken Williams
ken.w at theasi.co
Fri Jan 30 12:09:04 UTC 2015
Hi Sam,
I understand what you are saying, but when I try to add the two relations
I get this error:
root at adminuser-VirtualBox:~# juju add-relation
yarn-hdfs-master:resourcemanager spark-master:master
ERROR no relations found
root at adminuser-VirtualBox:~# juju add-relation yarn-hdfs-master:namenode
spark-master:master
ERROR no relations found
Am I adding the relations correctly?
Attached is my 'juju status' file.
Thanks for all your help,
Ken
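For anyone hitting the same "ERROR no relations found": a likely cause is that the endpoint names given to `juju add-relation` do not match the endpoints declared in the charm's metadata.yaml. A minimal sketch of that check, using a hypothetical metadata fragment modeled on the spark charm's hook names (the interface names below are assumptions, not taken from the real charm):

```python
# Hypothetical metadata.yaml fragment; endpoint names mirror the charm's
# hooks (namenode-relation-changed, resourcemanager-relation-changed).
SAMPLE_METADATA = """\
requires:
  namenode:
    interface: dfs
  resourcemanager:
    interface: mapred
"""

def required_endpoints(metadata_text):
    """Return endpoint names listed under the top-level 'requires:' key."""
    endpoints, in_requires = [], False
    for line in metadata_text.splitlines():
        if line.startswith("requires:"):
            in_requires = True
        elif line and not line.startswith(" "):
            # another top-level key ends the requires block
            in_requires = False
        elif in_requires and line.startswith("  ") and not line.startswith("    "):
            # two-space indent = an endpoint name; deeper lines are details
            endpoints.append(line.strip().rstrip(":"))
    return endpoints

print(required_endpoints(SAMPLE_METADATA))  # ['namenode', 'resourcemanager']
```

If the spark charm only declares endpoints such as namenode and resourcemanager, then `spark-master:master` on the right-hand side of add-relation will not match anything, and juju reports "no relations found".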
On 30 January 2015 at 11:16, Samuel Cozannet <samuel.cozannet at canonical.com>
wrote:
> Hey Ken,
>
> Yes, you need to create the relationship between the two entities so they
> know about each other.
>
> Looking at the list of hooks for the charm
> <https://github.com/Archethought/spark-charm/tree/master/hooks> you can
> see there are 2 hooks named namenode-relation-changed
> <https://github.com/Archethought/spark-charm/blob/master/hooks/namenode-relation-changed>
> and resourcemanager-relation-changed
> <https://github.com/Archethought/spark-charm/blob/master/hooks/resourcemanager-relation-changed> which
> are related to YARN/Hadoop.
> Looking deeper in the code, you'll notice they reference a function found
> in bdutils.py called "setHadoopEnvVar()", which based on its name should
> set the HADOOP_CONF_DIR.
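A hedged guess at what a setHadoopEnvVar()-style helper might do (the function name comes from bdutils.py mentioned above; the body and the default path are assumptions, not the charm's actual code):

```python
import os

def set_hadoop_env_var(conf_dir="/etc/hadoop/conf"):
    """Point Spark/YARN clients at the Hadoop client configuration.

    Sketch only: the real bdutils.py implementation may differ, and the
    default path is an assumption.
    """
    os.environ["HADOOP_CONF_DIR"] = conf_dir
    os.environ["YARN_CONF_DIR"] = conf_dir
    return conf_dir
```

Spark's yarn-client/yarn-cluster modes refuse to start unless one of these two variables points at a directory holding the Hadoop client configs, which matches the exception Ken saw earlier in the thread.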
>
> There are 2 relations, so add both of them.
>
> Note that I didn't test this myself, but I expect it to fix the
> problem. If it doesn't, please come back to us...
>
> Thanks!
> Sam
>
>
> Best,
> Samuel
>
> --
> Samuel Cozannet
> Cloud, Big Data and IoT Strategy Team
> Business Development - Cloud and ISV Ecosystem
> Changing the Future of Cloud
> Ubuntu <http://ubuntu.com> / Canonical UK LTD <http://canonical.com> /
> Juju <https://jujucharms.com>
> samuel.cozannet at canonical.com
> mob: +33 616 702 389
> skype: samnco
> Twitter: @SaMnCo_23
>
> On Fri, Jan 30, 2015 at 11:51 AM, Ken Williams <ken.w at theasi.co> wrote:
>
>>
>> Thanks, Kapil - this works :-)
>>
>> I can now run the SparkPi example successfully.
>> root at ip-172-31-60-53:~# spark-submit --class
>> org.apache.spark.examples.SparkPi /tmp/spark-examples-1.2.0-hadoop2.4.0.jar
>> Spark assembly has been built with Hive, including Datanucleus jars on
>> classpath
>> 15/01/30 10:29:33 WARN NativeCodeLoader: Unable to load native-hadoop
>> library for your platform... using builtin-java classes where applicable
>> Pi is roughly 3.14318
>>
>> root at ip-172-31-60-53:~#
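For context, SparkPi estimates π by Monte Carlo sampling (random points in the unit square, counting how many land inside the quarter circle), which is why the output above is only "roughly" 3.14. A minimal plain-Python sketch of the same computation, no Spark required:

```python
import random

def estimate_pi(samples=100_000, seed=42):
    """Monte Carlo estimate of pi, mirroring what SparkPi distributes."""
    rng = random.Random(seed)
    inside = sum(
        1 for _ in range(samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4.0 * inside / samples

print(estimate_pi())  # close to 3.14; exact value depends on seed/samples
```

SparkPi does the same thing, but splits the sampling across executors, which is why its answer also drifts slightly from run to run.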
>>
>> I'm now trying to run the same example with the spark-submit '--master'
>> option set to either 'yarn-cluster' or 'yarn-client'
>> but I keep getting the same error:
>>
>> root at ip-172-31-60-53:~# spark-submit --class
>> org.apache.spark.examples.SparkPi --master yarn-client
>> --num-executors 3 --driver-memory 1g --executor-memory 1g
>> --executor-cores 1 --queue thequeue lib/spark-examples*.jar 10
>> Spark assembly has been built with Hive, including Datanucleus jars on
>> classpath
>> Exception in thread "main" java.lang.Exception: When running with master
>> 'yarn-client' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the
>> environment.
>>
>> But on my spark-master/0 machine there is no /etc/hadoop/conf directory.
>> So what should the HADOOP_CONF_DIR or YARN_CONF_DIR value be?
>> Do I need to add a juju relation between spark-master and ...
>> yarn-hdfs-master to make them aware of each other?
>>
>> Thanks for any help,
>>
>> Ken
>>
>>
>>
>>
>>
>> On 28 January 2015 at 19:32, Kapil Thangavelu <
>> kapil.thangavelu at canonical.com> wrote:
>>
>>>
>>>
>>> On Wed, Jan 28, 2015 at 1:54 PM, Ken Williams <ken.w at theasi.co> wrote:
>>>
>>>>
>>>> Hi Sam/Amir,
>>>>
>>>> I've been able to 'juju ssh spark-master/0' and I successfully ran
>>>> the two
>>>> simple examples for pyspark and spark-shell,
>>>>
>>>> ./bin/pyspark
>>>> >>> sc.parallelize(range(1000)).count()
>>>> 1000
>>>>
>>>> ./bin/spark-shell
>>>> scala> sc.parallelize(1 to 1000).count()
>>>> 1000
>>>>
>>>>
>>>> Now I want to run some of the spark examples in the spark-examples*.jar
>>>> file, which I have on my local machine. How do I copy the jar file from
>>>> my local machine to the AWS machine?
>>>>
>>>> I have tried 'scp' and 'juju scp' from the local command-line but both
>>>> fail (below),
>>>>
>>>> root at adminuser:~# scp /tmp/spark-examples-1.2.0-hadoop2.4.0.jar
>>>> ubuntu at ip-172-31-59:/tmp
>>>> ssh: Could not resolve hostname ip-172-31-59: Name or service not known
>>>> lost connection
>>>> root at adminuser:~# juju scp /tmp/spark-examples-1.2.0-hadoop2.4.0.jar
>>>> ubuntu at ip-172-31-59:/tmp
>>>> ERROR exit status 1 (nc: getaddrinfo: Name or service not known)
>>>>
>>>> Any ideas?
>>>>
>>>
>>> juju scp /tmp/spark-examples-1.2.0-hadoop2.4.0.jar spark-master/0:/tmp
>>>
>>>>
>>>> Ken
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 28 January 2015 at 17:29, Samuel Cozannet <
>>>> samuel.cozannet at canonical.com> wrote:
>>>>
>>>>> Glad it worked!
>>>>>
>>>>> I'll make a merge request to the upstream so that it works natively
>>>>> from the store asap.
>>>>>
>>>>> Thanks for catching that!
>>>>> Samuel
>>>>>
>>>>> Best,
>>>>> Samuel
>>>>>
>>>>> --
>>>>> Samuel Cozannet
>>>>> Cloud, Big Data and IoT Strategy Team
>>>>> Business Development - Cloud and ISV Ecosystem
>>>>> Changing the Future of Cloud
>>>>> Ubuntu <http://ubuntu.com> / Canonical UK LTD <http://canonical.com> /
>>>>> Juju <https://jujucharms.com>
>>>>> samuel.cozannet at canonical.com
>>>>> mob: +33 616 702 389
>>>>> skype: samnco
>>>>> Twitter: @SaMnCo_23
>>>>>
>>>>> On Wed, Jan 28, 2015 at 6:15 PM, Ken Williams <ken.w at theasi.co> wrote:
>>>>>
>>>>>>
>>>>>> Hi Sam (and Maarten),
>>>>>>
>>>>>> Cloning Spark 1.2.0 from github seems to have worked!
>>>>>> I can install the Spark examples afterwards.
>>>>>>
>>>>>> Thanks for all your help!
>>>>>>
>>>>>> Yes - Andrew and Angie both say 'hi' :-)
>>>>>>
>>>>>> Best Regards,
>>>>>>
>>>>>> Ken
>>>>>>
>>>>>>
>>>>>> On 28 January 2015 at 16:43, Samuel Cozannet <
>>>>>> samuel.cozannet at canonical.com> wrote:
>>>>>>
>>>>>>> Hey Ken,
>>>>>>>
>>>>>>> So I had a closer look to your Spark problem and found out what went
>>>>>>> wrong.
>>>>>>>
>>>>>>> The charm available on the charmstore is trying to download Spark
>>>>>>> 1.0.2, and the versions available on the Apache website are 1.1.0, 1.1.1
>>>>>>> and 1.2.0.
>>>>>>>
>>>>>>> There is another version of the charm available on GitHub that
>>>>>>> actually will deploy 1.2.0
>>>>>>>
>>>>>>> 1. On your computer, create the folders below and change into them:
>>>>>>>
>>>>>>> cd ~
>>>>>>> mkdir charms
>>>>>>> mkdir charms/trusty
>>>>>>> cd charms/trusty
>>>>>>>
>>>>>>> 2. Clone the Spark charm:
>>>>>>>
>>>>>>> git clone https://github.com/Archethought/spark-charm spark
>>>>>>>
>>>>>>> 3. Deploy Spark from the local repository:
>>>>>>>
>>>>>>> juju deploy --repository=~/charms local:trusty/spark spark-master
>>>>>>> juju deploy --repository=~/charms local:trusty/spark spark-slave
>>>>>>> juju add-relation spark-master:master spark-slave:slave
>>>>>>>
>>>>>>> Worked on AWS for me just minutes ago. Let me know how it goes for
>>>>>>> you. Note that this version of the charm does NOT install the Spark
>>>>>>> examples. The files are present though, so you'll find them in
>>>>>>> /var/lib/juju/agents/unit-spark-master-0/charm/files/archive
>>>>>>>
>>>>>>> Hope that helps...
>>>>>>> Let me know if it works for you!
>>>>>>>
>>>>>>> Best,
>>>>>>> Sam
>>>>>>>
>>>>>>>
>>>>>>> Best,
>>>>>>> Samuel
>>>>>>>
>>>>>>> --
>>>>>>> Samuel Cozannet
>>>>>>> Cloud, Big Data and IoT Strategy Team
>>>>>>> Business Development - Cloud and ISV Ecosystem
>>>>>>> Changing the Future of Cloud
>>>>>>> Ubuntu <http://ubuntu.com> / Canonical UK LTD
>>>>>>> <http://canonical.com> / Juju <https://jujucharms.com>
>>>>>>> samuel.cozannet at canonical.com
>>>>>>> mob: +33 616 702 389
>>>>>>> skype: samnco
>>>>>>> Twitter: @SaMnCo_23
>>>>>>>
>>>>>>> On Wed, Jan 28, 2015 at 4:44 PM, Ken Williams <ken.w at theasi.co>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Hi folks,
>>>>>>>>
>>>>>>>> I'm completely new to juju so any help is appreciated.
>>>>>>>>
>>>>>>>> I'm trying to create a hadoop/analytics-type platform.
>>>>>>>>
>>>>>>>> I've managed to install the 'data-analytics-with-sql-like' bundle
>>>>>>>> (using this command)
>>>>>>>>
>>>>>>>> juju quickstart
>>>>>>>> bundle:data-analytics-with-sql-like/data-analytics-with-sql-like
>>>>>>>>
>>>>>>>> This is very impressive, and gives me virtually everything that I
>>>>>>>> want
>>>>>>>> (hadoop, hive, etc) - but I also need Spark.
>>>>>>>>
>>>>>>>> The Spark charm (http://manage.jujucharms.com/~asanjar/trusty/spark
>>>>>>>> )
>>>>>>>> and bundle (
>>>>>>>> http://manage.jujucharms.com/bundle/~asanjar/spark/spark-cluster)
>>>>>>>> however do not seem stable or available and I can't figure out how
>>>>>>>> to install them.
>>>>>>>>
>>>>>>>> Should I just download and install the Spark tar-ball on the nodes
>>>>>>>> in my AWS cluster, or is there a better way to do this?
>>>>>>>>
>>>>>>>> Thanks in advance,
>>>>>>>>
>>>>>>>> Ken
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Juju mailing list
>>>>>>>> Juju at lists.ubuntu.com
>>>>>>>> Modify settings or unsubscribe at:
>>>>>>>> https://lists.ubuntu.com/mailman/listinfo/juju
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Juju mailing list
>>>> Juju at lists.ubuntu.com
>>>> Modify settings or unsubscribe at:
>>>> https://lists.ubuntu.com/mailman/listinfo/juju
>>>>
>>>>
>>>
>>
>
-------------- next part --------------
root at adminuser-VirtualBox:~# juju status
environment: amazon
machines:
  "0":
    agent-state: started
    agent-version: 1.21.1
    dns-name: 54.152.65.119
    instance-id: i-35618fcf
    instance-state: running
    series: trusty
    hardware: arch=amd64 cpu-cores=1 cpu-power=100 mem=1740M root-disk=8192M
    state-server-member-status: has-vote
  "1":
    agent-state: started
    agent-version: 1.21.1
    dns-name: 54.152.169.101
    instance-id: i-548675bb
    instance-state: running
    series: trusty
    hardware: arch=amd64 cpu-cores=1 cpu-power=100 mem=1740M root-disk=8192M
  "2":
    agent-state: started
    agent-version: 1.21.1
    dns-name: 54.152.218.10
    instance-id: i-8f7aed7e
    instance-state: running
    series: trusty
    hardware: arch=amd64 cpu-cores=1 cpu-power=100 mem=1740M root-disk=8192M
  "3":
    agent-state: started
    agent-version: 1.21.1
    dns-name: 54.152.218.70
    instance-id: i-69789693
    instance-state: running
    series: trusty
    hardware: arch=amd64 cpu-cores=1 cpu-power=100 mem=1740M root-disk=8192M
  "4":
    agent-state: started
    agent-version: 1.21.1
    dns-name: 54.152.35.98
    instance-id: i-478675a8
    instance-state: running
    series: trusty
    hardware: arch=amd64 cpu-cores=1 cpu-power=100 mem=1740M root-disk=8192M
  "5":
    agent-state: started
    agent-version: 1.21.1
    dns-name: 54.152.0.48
    instance-id: i-2163f4d0
    instance-state: running
    series: trusty
    hardware: arch=amd64 cpu-cores=1 cpu-power=100 mem=1740M root-disk=8192M
  "6":
    agent-state: started
    agent-version: 1.21.1
    dns-name: 54.152.95.64
    instance-id: i-ca759b30
    instance-state: running
    series: trusty
    hardware: arch=amd64 cpu-cores=1 cpu-power=100 mem=1740M root-disk=8192M
services:
  compute-node:
    charm: cs:trusty/hdp-hadoop-4
    exposed: false
    relations:
      datanode:
      - yarn-hdfs-master
      nodemanager:
      - yarn-hdfs-master
    units:
      compute-node/0:
        agent-state: started
        agent-version: 1.21.1
        machine: "1"
        public-address: 54.152.169.101
  hdphive:
    charm: cs:trusty/hdp-hive-2
    exposed: false
    relations:
      db:
      - mysql
      namenode:
      - yarn-hdfs-master
      resourcemanager:
      - yarn-hdfs-master
    units:
      hdphive/0:
        agent-state: started
        agent-version: 1.21.1
        machine: "2"
        open-ports:
        - 10000/tcp
        public-address: 54.152.218.10
  juju-gui:
    charm: cs:trusty/juju-gui-17
    exposed: true
    units:
      juju-gui/0:
        agent-state: started
        agent-version: 1.21.1
        machine: "0"
        open-ports:
        - 80/tcp
        - 443/tcp
        public-address: 54.152.65.119
  mysql:
    charm: cs:trusty/mysql-4
    exposed: false
    relations:
      cluster:
      - mysql
      db:
      - hdphive
    units:
      mysql/0:
        agent-state: started
        agent-version: 1.21.1
        machine: "3"
        public-address: 54.152.218.70
  spark-master:
    charm: local:trusty/spark-0
    exposed: false
    relations:
      master:
      - spark-slave
    units:
      spark-master/0:
        agent-state: started
        agent-version: 1.21.1
        machine: "5"
        open-ports:
        - 4040/tcp
        - 7077/tcp
        - 8080/tcp
        - 18080/tcp
        public-address: 54.152.0.48
  spark-slave:
    charm: local:trusty/spark-1
    exposed: false
    relations:
      slave:
      - spark-master
    units:
      spark-slave/0:
        agent-state: started
        agent-version: 1.21.1
        machine: "6"
        open-ports:
        - 8081/tcp
        public-address: 54.152.95.64
  yarn-hdfs-master:
    charm: cs:trusty/hdp-hadoop-4
    exposed: false
    relations:
      namenode:
      - compute-node
      - hdphive
      resourcemanager:
      - compute-node
      - hdphive
    units:
      yarn-hdfs-master/0:
        agent-state: error
        agent-state-info: 'hook failed: "namenode-relation-joined" for compute-node:datanode'
        agent-version: 1.21.1
        machine: "4"
        open-ports:
        - 8010/tcp
        - 8020/tcp
        - 8480/tcp
        - 50070/tcp
        - 50075/tcp
        - 50470/tcp
        public-address: 54.152.35.98