Cloudera provides a parcel of the latest Apache Spark (0.9) for Cloudera Manager, which is incompatible with older versions of Shark (0.8.1 and earlier). An official release or pre-release of Shark 0.9.0 for CDH 4 is not yet available for download, so building from source is an option.
Shark’s wiki: Build Shark From Source Code
This page is a little bit old but still useful.
1 Install CDH 4 + Spark
Install from parcels, following Install Spark/Shark on CDH 4.
2 Install git
yum install -y git
3 Clone source
Check out the branch named branch-0.9 into /opt/shark/shark-0.9.1-bin-cdh4:
git clone https://github.com/amplab/shark -b branch-0.9 /opt/shark/shark-0.9.1-bin-cdh4
4 Install JDK
I’m using JDK from CDH 4, which is 1.6.0_31.
If the java command is not picked up correctly, set JAVA_HOME explicitly:
export JAVA_HOME=/usr/java/1.6.0_31
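Before building, it can help to confirm that JAVA_HOME really points at a JDK. The snippet below is only a minimal sketch and not part of the original steps: the function name check_java_home is made up for illustration, and it assumes the usual JDK layout with bin/java and bin/javac.

```shell
# Hypothetical helper: succeed only if the given directory looks like a JDK,
# i.e. it contains executable bin/java and bin/javac (assumed standard layout).
check_java_home() {
    [ -n "$1" ] && [ -x "$1/bin/java" ] && [ -x "$1/bin/javac" ]
}
```

It could be used as `check_java_home "$JAVA_HOME" || echo "JAVA_HOME does not look like a JDK" >&2` right after the export.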
5 Prepare Patched Hive
This can be obtained from https://github.com/amplab/shark/releases and https://github.com/amplab/shark/wiki/Hive-Patches.
I've put a copy at http://cloudera.rst.im/shark/hive-0.11.0-bin.tgz; if you have my Cloudera Mirror synced, you can find it in $PATH_TO_MIRROR/shark.
Extract it to /opt/shark/hive-0.11.0-bin/.
The release notes of the 0.9.0 pre-release say:
AMPLab’s Hive 0.11 distribution – binaries for this have now been uploaded to Maven Central (see below) and are provided in the hive-0.11.0-bin.tgz shipped with this release.
So we don’t need to download/compile the patched hive manually anymore.
6 Build
cd /opt/shark/shark-0.9.1-bin-cdh4/
export SCALA_HOME=/opt/cloudera/parcels/SPARK/lib/spark
SHARK_HADOOP_VERSION=2.0.0-cdh4.5.0 ./sbt/sbt package
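The SHARK_HADOOP_VERSION value above is the Hadoop version of my CDH 4.5.0 cluster. As a rough sketch, it could be read from the first line of `hadoop version` output instead of being typed by hand. This assumes the `hadoop` command is on the PATH and that its first output line has the form `Hadoop <version>`; the function name parse_hadoop_version is hypothetical.

```shell
# Hypothetical helper: read `hadoop version` output on stdin and print the
# version string from the first line, assumed to look like "Hadoop <version>".
# Prints nothing if the first line does not match that form.
parse_hadoop_version() {
    awk 'NR == 1 && $1 == "Hadoop" { print $2 }'
}

# Example (not executed here):
#   SHARK_HADOOP_VERSION=$(hadoop version | parse_hadoop_version) ./sbt/sbt package
```

If the function prints nothing, it is safer to fall back to setting the version by hand as shown above.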
7 Package
cd ..
tar zcvf shark-0.9.1-bin-cdh4.tgz shark-0.9.1-bin-cdh4
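To make sure the tarball is usable before copying it to other nodes, the packaging step can be followed by a read-back check. This is a small sketch under my own assumptions, not something the Shark build requires; package_dir is a hypothetical name.

```shell
# Hypothetical helper: pack the directory given as $1 into "$1.tgz" next to it,
# then list the archive to confirm it can be read back.
package_dir() {
    parent=$(dirname "$1")
    name=$(basename "$1")
    tar -czf "$parent/$name.tgz" -C "$parent" "$name" &&
        tar -tzf "$parent/$name.tgz" > /dev/null
}
```

For this build it would be called as `package_dir /opt/shark/shark-0.9.1-bin-cdh4`, which leaves shark-0.9.1-bin-cdh4.tgz in /opt/shark.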
8 Run
If you try to run ./bin/shark-withinfo, you'll find that you have to create symbolic links, just as I did in Install Spark/Shark on CDH 4 :P
The rest?
It's all in Install Spark/Shark on CDH 4.
Download
Shark-0.9.1 for CDH 4.5.0 (Pre-release): http://cloudera.rst.im/shark/shark-0.9.1-bin-cdh4.tgz
Scala-2.10.3: http://cloudera.rst.im/shark/scala-2.10.3.tgz