Build Shark 0.9 for CDH 4

Cloudera provides a parcel of the latest Apache Spark (0.9) for Cloudera Manager, but it is incompatible with older versions of Shark (0.8.1 and earlier). No official release or pre-release of Shark 0.9.0 for CDH 4 is available for download yet, so building from source is one option.

Shark’s wiki: Build Shark From Source Code
This page is a little dated but still useful.

1 Install CDH 4 + Spark

Install from parcels, following Install Spark/Shark on CDH 4.

2 Install git
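Git is only needed to fetch the Shark source. A minimal sketch, assuming a yum-based host such as RHEL/CentOS (on other distributions, substitute the native package manager); `install_git` is a hypothetical helper name, not something from the Shark docs:

```shell
# Hypothetical helper: install git only if it is not already on PATH.
# Assumes a yum-based distribution (RHEL/CentOS) and root privileges.
install_git() {
    if command -v git >/dev/null 2>&1; then
        echo "git already installed"
    else
        yum install -y git
    fi
}
```

Run `install_git` as root; it does nothing if git is already present.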

3 Clone source

Check out the branch named ‘branch-0.9’ to /opt/shark/shark-0.9.1-bin-cdh4.
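The clone and checkout can be done in one command. This is a sketch: the repository URL is derived from the GitHub links in this post, and `clone_shark` is a hypothetical helper name.

```shell
# Hypothetical helper: clone Shark and check out branch-0.9 in one step.
# The default destination is the directory named in this post.
clone_shark() {
    dest="${1:-/opt/shark/shark-0.9.1-bin-cdh4}"
    git clone --branch branch-0.9 https://github.com/amplab/shark.git "$dest"
}
```

Calling `clone_shark` with no argument uses the default path; pass another directory to clone elsewhere.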

4 Install JDK

I’m using the JDK from CDH 4, which is 1.6.0_31.
If java does not work well:
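One common remedy is to point JAVA_HOME at the JDK explicitly and put its bin directory first on PATH. The path below is an assumption (a typical location for JDK 1.6.0_31), so adjust it to match your host:

```shell
# Assumed JDK location; adjust to wherever JDK 1.6.0_31 lives on your host.
export JAVA_HOME=/usr/java/jdk1.6.0_31
# Put that JDK's binaries ahead of any other java on PATH.
export PATH="$JAVA_HOME/bin:$PATH"
```

Afterwards, `java -version` should report 1.6.0_31 if the path is right.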

5 Prepare Patched Hive

It can be obtained from https://github.com/amplab/shark/releases and https://github.com/amplab/shark/wiki/Hive-Patches.
I’ve saved a copy at http://cloudera.rst.im/shark/hive-0.11.0-bin.tgz; if you have my Cloudera Mirror synced, you can find it in $PATH_TO_MIRROR/shark.
Extract it to /opt/shark/hive-0.11.0-bin/.
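A sketch of the extraction step, assuming the tarball has already been downloaded and contains a top-level hive-0.11.0-bin/ directory; `extract_hive` is a hypothetical helper name:

```shell
# Hypothetical helper: unpack a Hive tarball under a parent directory.
# Assumes the archive has a top-level hive-0.11.0-bin/ directory.
extract_hive() {
    tarball="$1"
    parent="${2:-/opt/shark}"
    mkdir -p "$parent" && tar -xzf "$tarball" -C "$parent"
}
```

For example, `extract_hive hive-0.11.0-bin.tgz` would produce /opt/shark/hive-0.11.0-bin/ under that assumption.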

The release notes of the 0.9.0 pre-release say:

AMPLab’s Hive 0.11 distribution – binaries for this have now been uploaded to Maven Central (see below) and are provided in the hive-0.11.0-bin.tgz shipped with this release.

So we no longer need to download or compile the patched Hive manually.

6 Build
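The exact build commands are not shown here, so the following is only a sketch. It assumes the source tree ships an `sbt/sbt` launcher and that the Hadoop version is selected through a `SHARK_HADOOP_VERSION` environment variable, as Shark's build instructions of this era describe. The version string `2.0.0-mr1-cdh4.5.0` is my guess for CDH 4.5.0, and `build_shark` is a hypothetical helper name. Check both against the Shark wiki before relying on them.

```shell
# Hypothetical helper: build Shark with the bundled sbt launcher.
# SHARK_HADOOP_VERSION and its default value are assumptions; verify them
# against the Shark wiki for your CDH release.
build_shark() {
    src_dir="${1:-/opt/shark/shark-0.9.1-bin-cdh4}"
    hadoop_version="${2:-2.0.0-mr1-cdh4.5.0}"
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$src_dir" && SHARK_HADOOP_VERSION="$hadoop_version" sbt/sbt package )
}
```

The first run downloads dependencies, so expect it to take a while.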

7 Package
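I'm assuming here that packaging simply means archiving the built directory into a tarball named like the one in the Download section. A sketch under that assumption, with `package_shark` as a hypothetical helper name:

```shell
# Hypothetical helper: archive a built Shark directory as a .tgz.
# The archive is written next to the directory, not inside it.
package_shark() {
    src_dir="${1:-/opt/shark/shark-0.9.1-bin-cdh4}"
    out="${2:-$src_dir.tgz}"
    tar -czf "$out" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
}
```

With the defaults this would write /opt/shark/shark-0.9.1-bin-cdh4.tgz, containing a single top-level shark-0.9.1-bin-cdh4/ directory.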

8 Run

If you try to run ./bin/shark-withinfo, you’ll find that you have to create symbolic links, just as I did in Install Spark/Shark on CDH 4 :P
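Once those links are in place, launching looks roughly like this. It is a sketch assuming the build directory used in this post; `run_shark` is a hypothetical helper name:

```shell
# Hypothetical helper: start the Shark CLI from the build directory.
run_shark() {
    shark_home="${1:-/opt/shark/shark-0.9.1-bin-cdh4}"
    # Subshell keeps the caller's working directory unchanged.
    ( cd "$shark_home" && ./bin/shark-withinfo )
}
```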

Rest?

Everything else is covered in Install Spark/Shark on CDH 4.

Download

Shark-0.9.1 for CDH 4.5.0 (Pre-release): http://cloudera.rst.im/shark/shark-0.9.1-bin-cdh4.tgz
Scala-2.10.3: http://cloudera.rst.im/shark/scala-2.10.3.tgz

Build Shark 0.9 for CDH 4 by @sskaje: https://sskaje.me/2014/02/build-shark-for-cdh-4/