%pyspark in Zeppelin: No module named pyspark error
Konstantinos Tsakalozos
kos.tsakalozos at canonical.com
Wed Jul 13 17:47:34 UTC 2016
Hi Gregory,
Here is what I have so far.
In yarn-client mode, pyspark jobs fail with "pyspark module not
present": http://pastebin.ubuntu.com/19266710/
Most probably this is because the execution end-nodes are not Spark nodes;
they are plain Hadoop nodes without pyspark installed.
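A quick way to confirm that diagnosis (a generic check, not specific to these
charms, that you could run on one of the YARN execution nodes) is:

```shell
# Report whether the pyspark module is importable on this node.
# python3 is assumed here; the trusty-era charms may ship python2 as "python".
if python3 -c "import pyspark" 2>/dev/null; then
    echo "pyspark present"
else
    echo "pyspark missing"
fi
```

If this prints "pyspark missing" on the nodes where YARN schedules the
executors, that matches the failure above.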
You will need to run your job on a Spark cluster set up in standalone
execution mode, scaled to match your needs.
Relating Spark to the hadoop-plugin will give you access to HDFS.
In this setup you will need to manually add the following line to
/etc/spark/conf/spark-defaults.conf:
"spark.driver.extraClassPath
/usr/lib/hadoop/share/hadoop/common/lib/hadoop-lzo-0.4.20-SNAPSHOT.jar"
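That manual step could be sketched as an idempotent append. The demo file path
below is a stand-in for /etc/spark/conf/spark-defaults.conf on the spark unit
(where you would also need sudo), and the SNAPSHOT jar version may differ on
your deployment:

```shell
# Append the extra driver classpath line to spark-defaults.conf, once.
CONF=./spark-defaults.conf.demo   # on a real unit: /etc/spark/conf/spark-defaults.conf
LINE='spark.driver.extraClassPath /usr/lib/hadoop/share/hadoop/common/lib/hadoop-lzo-0.4.20-SNAPSHOT.jar'
touch "$CONF"
# Only add the line if an identical line is not already present.
grep -qxF "$LINE" "$CONF" || echo "$LINE" >> "$CONF"
grep -c 'extraClassPath' "$CONF"   # prints 1, even after repeated runs
```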
We are working on a patch to remove this extra manual step.
A couple of requests from our side:
- Would it be possible to share with us the job you are running, so that we
can verify we have addressed your use case?
- You mentioned problems with using the spark charm that is based on Apache
Bigtop. Would it be possible to provide us with more info on what is not
working there?
We would like to thank you for your feedback as it allows us to improve our
work.
Thanks,
Konstantinos
On Tue, Jul 12, 2016 at 9:55 PM, Kevin Monroe <kevin.monroe at canonical.com>
wrote:
>
> I think I accidentally discarded Kostas' message. Sorry about that!
>
> Gregory, Kostas is working on reproducing your env. We should know more
in the next day or so.
>
> ---------- Forwarded message ----------
> From: Konstantinos Tsakalozos <kos.tsakalozos at canonical.com>
> Date: Tue, Jul 12, 2016 at 10:39 AM
> Subject: Re: %pyspark in Zeppelin: No module named pyspark error
> To: Gregory Van Seghbroeck <gregory.vanseghbroeck at intec.ugent.be>
> Cc: Kevin Monroe <kevin.monroe at canonical.com>, bigdata at lists.ubuntu.com
>
>
> Hi Gregory,
>
> Thank you for the info you provided. I will need some time to set up the
deployment you just described and try to reproduce the error. I guess any
pyspark job should have the same effect.
>
> Thanks,
> Konstantinos
>
> On Tue, Jul 12, 2016 at 11:31 AM, Gregory Van Seghbroeck <
gregory.vanseghbroeck at intec.ugent.be> wrote:
>>
>> Hi Kevin,
>>
>>
>>
>> Thanks for the response! We really like the Juju and Canonical community.
>>
>>
>>
>> I can tell you the Juju version: it is 1.25.3.
>>
>> The status will be a problem, since I removed most of the services. That
said, I don’t think we were using the Bigtop spark charms yet, so this
might be the problem. Here is a list of the services I deployed before:
>>
>> - cs:trusty/apache-hadoop-namenode-2
>>
>> - cs:trusty/apache-hadoop-resourcemanager-3
>>
>> - cs:trusty/apache-hadoop-slave-2
>>
>> - cs:trusty/apache-hadoop-plugin-14
>>
>> - cs:trusty/apache-spark-9
>>
>> - cs:trusty/apache-zeppelin-7
>>
>>
>>
>> The reason we don’t use the Bigtop charms yet is that we see problems
with the hostnames on the containers. Some of the relations use hostnames,
but these cannot be resolved, so I have to add the mapping between IPs and
hostnames manually to the /etc/hosts file.
>>
>>
>>
>> The image I pasted in, showing our environment, was a screenshot of the
Zeppelin environment. These parameters looked OK from what I could find
online.
>>
>>
>>
>> Kind Regards,
>>
>> Gregory
>>
>>
>>
>>
>>
>> From: Kevin Monroe [mailto:kevin.monroe at canonical.com]
>> Sent: Monday, July 11, 2016 7:20 PM
>> To: Gregory Van Seghbroeck <gregory.vanseghbroeck at intec.ugent.be>
>> Cc: bigdata at lists.ubuntu.com
>> Subject: Re: %pyspark in Zeppelin: No module named pyspark error
>>
>>
>>
>> Hi Gregory,
>>
>>
>>
>> I wasn't able to see your data after "Our environment is set up as
follows:"
>>
>>
>>
>> <big black box for me>
>>
>>
>>
>> Will you reply with the output (or a pastebin link) of the following
commands:
>>
>>
>>
>> juju version
>>
>> juju status --format=tabular
>>
>>
>>
>> Kostas has found a potential Zeppelin issue in the Bigtop charms, where
the Bigtop spark offering may be too old. Knowing your Juju and charm
versions will help me determine whether your issue is related.
>>
>>
>>
>> Thanks!
>>
>> -Kevin
>>
>>
>>
>> On Mon, Jul 11, 2016 at 7:36 AM, Gregory Van Seghbroeck <
gregory.vanseghbroeck at intec.ugent.be> wrote:
>>
>> Dear,
>>
>>
>>
>> We have deployed Zeppelin with Juju and connected it to Spark. According
to Juju everything went well, and we can see this is indeed the case: when
we try to execute one of the Zeppelin tutorials we see some nice graphs.
However, if we try to use the Python interpreter (%pyspark) we always get
an error.
>>
>>
>> Kind Regards,
>>
>> Gregory
>>
>>
>> --
>> Bigdata mailing list
>> Bigdata at lists.ubuntu.com
>> Modify settings or unsubscribe at:
https://lists.ubuntu.com/mailman/listinfo/bigdata
>>
>>
>>
>>
>>
>
>
>
>