%pyspark in Zeppelin: No module named pyspark error

Konstantinos Tsakalozos kos.tsakalozos at canonical.com
Sat Jul 16 20:57:46 UTC 2016


Thanks

On Sat, Jul 16, 2016 at 6:50 PM, Merlijn Sebrechts <
merlijn.sebrechts at gmail.com> wrote:

> The issue with the Bigtop Charms is tracked here:
> https://github.com/juju-solutions/layer-apache-bigtop-base/issues/14
>
> @Cory was looking into a possible fix. Any update on this?
>
> 2016-07-14 17:35 GMT+02:00 Konstantinos Tsakalozos <
> kos.tsakalozos at canonical.com>:
>
>> Hi Gregory,
>>
>> Thank you for your feedback. We look forward to any further
>> information you can give us.
>>
>> In the meantime, you can use revision 80 of the apache-spark charm on
>> bigdata-dev to verify that the patch for pyspark does indeed address
>> the issue you are seeing. You can deploy revision 80 with:
>> "juju deploy cs:~bigdata-dev/apache-spark-80"
>>
>> Thanks,
>> Konstantinos
>>
>>
>> On Thu, Jul 14, 2016 at 5:33 PM, Gregory Van Seghbroeck
>> <gregory.vanseghbroeck at intec.ugent.be> wrote:
>> > Hi Konstantinos,
>> >
>> > Thanks a lot!! I'll give it a try after my holidays.
>> >
>> > I still have to answer your question about the Bigtop charms. Here it
>> goes ... my apologies for being vague with versions and such; it's from a
>> while back.
>> > What I did was deploy a small HDFS setup using the Bigtop charms.
>> We always set things up in LXC containers on bare-metal servers. Management
>> of these bare-metal servers is out of our hands; it is provided by our
>> Emulab system.
>> > Everything seemed to go fine, except for the relations. The resource
>> manager needs FQDNs to set things up properly. Unfortunately, resolving the
>> FQDNs fails. It has to do with how the physical system is set up and how
>> the networking is handled between the LXC containers; this networking setup
>> is something one of my colleagues (Merlijn Sebrechts, probably no stranger
>> to you, or at least not to the community) created for us. My workaround at
>> the time was to manually add all the FQDNs to the /etc/hosts file.
>> Sufficient then, but not workable in the long run. So I asked my colleague
>> if he could simply add this to the charms that provide the networking, but
>> he responded that something like this should actually be handled in the
>> Bigtop charms, since the failing relation is part of that charm.
>> > I probably should have gone directly to the authors of the Bigtop
>> charms, but you were kind enough to respond to this issue as well.
>> >
>> > If things are not clear, I'll try to reproduce this issue on our system
>> and will come back to you in a week or so.
>> >
>> > Kind regards and thanks again for your help.
>> > Gregory
>> >
>> > -----Original Message-----
>> > From: Konstantinos Tsakalozos [mailto:kos.tsakalozos at canonical.com]
>> > Sent: Thursday, July 14, 2016 3:40 PM
>> > To: Gregory Van Seghbroeck <gregory.vanseghbroeck at intec.ugent.be>
>> > Cc: bigdata at lists.ubuntu.com; Kevin Monroe <kevin.monroe at canonical.com>
>> > Subject: Re: %pyspark in Zeppelin: No module named pyspark error
>> >
>> > Hi Gregory,
>> >
>> > Done some more testing today and submitted a patch for review.
>> >
>> > The line:
>> > "spark.driver.extraClassPath
>> > /usr/lib/hadoop/share/hadoop/common/lib/hadoop-lzo-0.4.20-SNAPSHOT.jar"
>> > will fix only spark-shell.
>> > For pyspark, the line to be added to spark-defaults.conf is slightly
>> different:
>> > "spark.jars
>> /usr/lib/hadoop/share/hadoop/common/lib/hadoop-lzo-0.4.20-SNAPSHOT.jar"
>> >
>> > We have a patch under review
>> > (https://github.com/juju-solutions/layer-apache-spark/pull/25) so that
>> you will not have to do any editing.
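[Editor's note: until that patch lands, the manual edit above can be scripted. The following is a minimal sketch that appends both quoted lines; it writes to a local copy of the file so it is safe to run anywhere, whereas on a real spark unit the file is /etc/spark/conf/spark-defaults.conf as noted in the thread.]

```shell
#!/bin/sh
# Sketch of the manual workaround: append both config lines, skipping
# any that are already present. CONF points at a local demo file here;
# on the spark unit it would be /etc/spark/conf/spark-defaults.conf.
CONF=./spark-defaults.conf
JAR=/usr/lib/hadoop/share/hadoop/common/lib/hadoop-lzo-0.4.20-SNAPSHOT.jar

touch "$CONF"
# spark-shell picks the jar up from the driver classpath:
grep -q '^spark.driver.extraClassPath ' "$CONF" || \
  echo "spark.driver.extraClassPath $JAR" >> "$CONF"
# pyspark needs the jar distributed via spark.jars instead:
grep -q '^spark.jars ' "$CONF" || \
  echo "spark.jars $JAR" >> "$CONF"

cat "$CONF"
```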
>> >
>> > Thanks,
>> > Konstantinos
>> >
>> >
>> >
>> > On Wed, Jul 13, 2016 at 8:47 PM, Konstantinos Tsakalozos <
>> kos.tsakalozos at canonical.com> wrote:
>> >> Hi Gregory,
>> >>
>> >> Here is what I have so far.
>> >>
>> >> When in yarn-client mode, pyspark jobs fail with "pyspark module not
>> >> present": http://pastebin.ubuntu.com/19266710/
>> >> Most probably this is because the execution end-nodes are not spark
>> >> nodes; they are just hadoop nodes without pyspark installed.
>> >> You will need to run your job on a spark cluster set up in
>> >> standalone execution mode, scaled to match your needs.
>> >> Relating spark to the hadoop-plugin will give you access to HDFS.
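[Editor's note: a standalone deployment along those lines might be driven roughly as follows with juju 1.x; this is a sketch with illustrative service names, not a command sequence taken from the thread.]

```shell
juju deploy cs:trusty/apache-spark spark
juju add-unit -n 2 spark        # scale the standalone cluster to your needs
juju deploy cs:trusty/apache-hadoop-plugin plugin
juju add-relation spark plugin  # relate spark to the plugin for HDFS access
```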
>> >>
>> >> In this setup you will need to manually add the following line
>> >> inside /etc/spark/conf/spark-defaults.conf:
>> >> "spark.driver.extraClassPath
>> >> /usr/lib/hadoop/share/hadoop/common/lib/hadoop-lzo-0.4.20-SNAPSHOT.jar"
>> >> We are working on a patch to remove this extra manual step.
>> >>
>> >> A couple of asks from our side:
>> >> - Would it be possible to share with us the job you are running, so
>> >> that we can verify we have addressed your use case?
>> >> - You mentioned problems with using the spark charm that is based on
>> >> Apache Bigtop. Would it be possible to provide us with more info on
>> >> what is not working there?
>> >>
>> >> We would like to thank you for your feedback as it allows us to
>> >> improve our work.
>> >>
>> >> Thanks,
>> >> Konstantinos
>> >>
>> >>
>> >> On Tue, Jul 12, 2016 at 9:55 PM, Kevin Monroe
>> >> <kevin.monroe at canonical.com>
>> >> wrote:
>> >>>
>> >>> I think I accidentally discarded Kostas' message. Sorry about that!
>> >>>
>> >>> Gregory, Kostas is working on reproducing your env. We should know
>> >>> more in the next day or so.
>> >>>
>> >>> ---------- Forwarded message ----------
>> >>> From: Konstantinos Tsakalozos <kos.tsakalozos at canonical.com>
>> >>> Date: Tue, Jul 12, 2016 at 10:39 AM
>> >>> Subject: Re: %pyspark in Zeppelin: No module named pyspark error
>> >>> To: Gregory Van Seghbroeck <gregory.vanseghbroeck at intec.ugent.be>
>> >>> Cc: Kevin Monroe <kevin.monroe at canonical.com>,
>> >>> bigdata at lists.ubuntu.com
>> >>>
>> >>>
>> >>> Hi Gregory,
>> >>>
>> >>> Thank you for the info you provided. I will need some time to set up
>> >>> the deployment you just described and try to reproduce the error. I
>> >>> guess any pyspark job should have the same effect.
>> >>>
>> >>> Thanks,
>> >>> Konstantinos
>> >>>
>> >>> On Tue, Jul 12, 2016 at 11:31 AM, Gregory Van Seghbroeck
>> >>> <gregory.vanseghbroeck at intec.ugent.be> wrote:
>> >>>>
>> >>>> Hi Kevin,
>> >>>>
>> >>>>
>> >>>>
>> >>>> Thanks for the response! We really like the juju and Canonical
>> community.
>> >>>>
>> >>>>
>> >>>>
>> >>>> I can tell you the juju version: it is 1.25.3.
>> >>>>
>> >>>> The status will be a problem, since I removed most of the services.
>> >>>> That being said, I don't think we are using the bigtop spark
>> >>>> charms yet, so this might be the problem. Here is a list of the
>> >>>> services I deployed before:
>> >>>>
>> >>>> - cs:trusty/apache-hadoop-namenode-2
>> >>>> - cs:trusty/apache-hadoop-resourcemanager-3
>> >>>> - cs:trusty/apache-hadoop-slave-2
>> >>>> - cs:trusty/apache-hadoop-plugin-14
>> >>>> - cs:trusty/apache-spark-9
>> >>>> - cs:trusty/apache-zeppelin-7
>> >>>>
>> >>>>
>> >>>>
>> >>>> The reason we don't use the Bigtop charms yet is that we see
>> >>>> problems with the hostnames on the containers. Some of the relations
>> >>>> use hostnames, but these cannot be resolved, so I have to add the
>> >>>> mappings between IPs and hostnames manually to the /etc/hosts file.
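[Editor's note: a minimal sketch of that manual workaround. The container IPs and FQDNs below are made-up placeholders, and HOSTS defaults to a local demo file so the sketch does not touch the real /etc/hosts; on the actual units it would be run against /etc/hosts with root privileges.]

```shell
#!/bin/sh
# Sketch of the manual FQDN workaround: add IP-to-hostname mappings so
# that relations (e.g. to the resourcemanager) can resolve container FQDNs.
HOSTS=${HOSTS:-./hosts.demo}

add_mapping() {
  # Append "IP FQDN" only if the FQDN is not already present.
  grep -qs " $2\$" "$HOSTS" || echo "$1 $2" >> "$HOSTS"
}

# Hypothetical container addresses and names; replace with your own:
add_mapping 10.0.3.101 namenode-0.lxc
add_mapping 10.0.3.102 resourcemanager-0.lxc
add_mapping 10.0.3.103 slave-0.lxc

cat "$HOSTS"
```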
>> >>>>
>> >>>>
>> >>>>
>> >>>> The image I pasted in, showing our environment, was a screenshot of
>> >>>> the Zeppelin settings. These parameters looked OK from what I
>> >>>> could find online.
>> >>>>
>> >>>>
>> >>>>
>> >>>> Kind Regards,
>> >>>>
>> >>>> Gregory
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> From: Kevin Monroe [mailto:kevin.monroe at canonical.com]
>> >>>> Sent: Monday, July 11, 2016 7:20 PM
>> >>>> To: Gregory Van Seghbroeck <gregory.vanseghbroeck at intec.ugent.be>
>> >>>> Cc: bigdata at lists.ubuntu.com
>> >>>> Subject: Re: %pyspark in Zeppelin: No module named pyspark error
>> >>>>
>> >>>>
>> >>>>
>> >>>> Hi Gregory,
>> >>>>
>> >>>>
>> >>>>
>> >>>> I wasn't able to see your data after "Our environment is set up as
>> >>>> follows:"
>> >>>>
>> >>>>
>> >>>>
>> >>>> <big black box for me>
>> >>>>
>> >>>>
>> >>>>
>> >>>> Will you reply with the output (or a pastebin link) of the
>> following commands:
>> >>>>
>> >>>>
>> >>>>
>> >>>> juju version
>> >>>>
>> >>>> juju status --format=tabular
>> >>>>
>> >>>>
>> >>>>
>> >>>> Kostas has found a potential Zeppelin issue in the Bigtop charms
>> >>>> where the Bigtop spark offering may be too old. Knowing your juju
>> >>>> and charm versions will help me determine whether your issue is
>> >>>> related.
>> >>>>
>> >>>>
>> >>>>
>> >>>> Thanks!
>> >>>>
>> >>>> -Kevin
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Mon, Jul 11, 2016 at 7:36 AM, Gregory Van Seghbroeck
>> >>>> <gregory.vanseghbroeck at intec.ugent.be> wrote:
>> >>>>
>> >>>> Dear,
>> >>>>
>> >>>>
>> >>>>
>> >>>> We have deployed Zeppelin with juju and connected it to Spark.
>> >>>> According to juju, everything went well. We can see this is indeed
>> >>>> the case: when we try to execute one of the Zeppelin tutorials, we
>> >>>> see some nice graphs.
>> >>>> However, if we try to use the Python interpreter (%pyspark), we
>> >>>> always get an error.
>> >>>>
>> >>>>
>> >>>> Kind Regards,
>> >>>>
>> >>>> Gregory
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Bigdata mailing list
>> >>>> Bigdata at lists.ubuntu.com
>> >>>> Modify settings or unsubscribe at:
>> >>>> https://lists.ubuntu.com/mailman/listinfo/bigdata
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >
>>
>>
>
>