Spark Production Cluster
Kevin Monroe
kevin.monroe at canonical.com
Fri Jan 13 21:56:42 UTC 2017
On Thu, Jan 12, 2017 at 11:23 AM, Paddy <ravingbonkerss at gmail.com> wrote:
> Hi all,
> I am trying to create a spark production cluster with juju.
> I am using this bundle
>
> https://jujucharms.com/spark/xenial/15
>
>
> But this bundle has spark version 1.5. Is there any way to update it to the
> latest version?
>
> please let me know.
>
> -pR
>
Hey pR,
That spark charm is based on the latest package from the stable Apache
Bigtop release (bigtop-1.1.0), which is spark 1.5.1. We really like Bigtop
because they do a lot to verify interoperability of big data components
(i.e., bigtop tests give us confidence that the spark package works with
the hadoop, zeppelin, etc. packages).
That said, I understand the desire for more recent application versions. I
can offer 3 suggestions:
First, know that the bigtop-1.2.0 release should be happening very soon --
the release manager has been decided, so we'll see 1.2 as soon as
https://issues.apache.org/jira/browse/BIGTOP-2282 is resolved. This will
bring the Bigtop spark offering up to spark-2.1 [1]. I would expect charms
to be refreshed within about a week of the bigtop-1.2 release.
Second, if you can't wait for the release, you can try deploying the spark
charm, transferring the latest spark-*.deb from [1] to that unit, and
running 'sudo dpkg -i spark-*.deb' to upgrade spark on that unit to 2.1.x. I have not
tried this myself, but I'm happy to do so and/or help debug if you want to
go this route.
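Roughly, that would look like the following (a sketch only -- I haven't run this; the unit name, destination path, and .deb filename are placeholders, so substitute the actual package names you pull from the Bigtop CI link below):

```shell
# Deploy the spark charm as usual.
juju deploy spark

# Copy the newer Bigtop-built package(s) to the unit.
# "spark-2.1.0.deb" is a placeholder filename -- use the real artifacts from [1].
juju scp ./spark-2.1.0.deb spark/0:/home/ubuntu/

# Install over the existing package on that unit.
juju ssh spark/0 'sudo dpkg -i /home/ubuntu/spark-*.deb'
```

If dpkg complains about dependencies, 'sudo apt-get -f install' on the unit may sort them out; again, happy to help debug.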
Finally, we have an apache-spark charm [2] which was built to work with
upstream tarballs instead of a packaged version. By default, that charm
will pull spark-1.6.1 from our s3 bucket, but with a quick change to the
resources.yaml [3], we could make it pull a newer version. Again, I'd be
happy to help try this or assist you if this sounds good to you.
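The main thing you'd need for that resources.yaml change is the URL and checksum of the newer tarball. A sketch of gathering those (the 2.1.0/hadoop2.7 tarball here is just an example release, not a recommendation):

```shell
# Fetch a newer upstream Spark tarball and compute the hash that
# would go into the charm's resources.yaml entry.
wget https://archive.apache.org/dist/spark/spark-2.1.0/spark-2.1.0-bin-hadoop2.7.tgz
sha256sum spark-2.1.0-bin-hadoop2.7.tgz
```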
Whatever you decide, it would be helpful if you could let us know what
you're doing with this cluster. Are you running spark standalone, or will
you run spark in yarn-[client|cluster] mode alongside a hadoop cluster (if
in yarn mode, what hadoop version)? Do you require HA (and therefore
zookeeper)? What does your infra look like (bare-metal, containers,
openstack, cloud? intel, ppc, arm?)?
Thanks!
-Kevin
[1]:
https://ci.bigtop.apache.org/view/Packages/job/Bigtop-trunk-packages/COMPONENTS=spark,OS=ubuntu-16.04/
[2]: https://jujucharms.com/apache-spark/
[3]:
https://api.jujucharms.com/charmstore/v5/apache-spark/archive/resources.yaml