Shark 0.9.1 Snapshot Working with CDH 4.6.0 Now

The Shark 0.9.x releases (0.9.0, 0.9.1) are still pre-releases: https://github.com/amplab/shark/releases

The previously supplied 0.9.0 prebuilt package for Hadoop 2 / CDH 4.5.0, shark-0.9.0-hadoop2-bin, does not really work with CDH 4.5.0, so I compiled my own. See my earlier posts: Build Shark 0.9 for CDH 4, and Install Spark/Shark on CDH 4.

Some days later, 0.9.1 appeared on branch-0.9 (https://github.com/amplab/shark/tree/branch-0.9). The patched Hive has been uploaded to the Maven repo and can now be put in lib_managed.

I have 5 nodes for my CDH 4 test, hadoop1 – hadoop5, with HDFS NameNode HA on hadoop5 + hadoop4 under the nameservice nameservice1.
The Spark master is located on hadoop5, with workers on all nodes.

Problem

I tried to compile with:

This worked for my first build of Shark 0.9.0 with CDH 4.5.0. With 0.9.1 or the master branch, however, nameservice1 gets resolved through DNS and hits a wildcard record, which makes the NameNode inaccessible to Shark. Metadata access is OK.
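An HA nameservice such as nameservice1 is a logical name from the HDFS client configuration, not a real host, so it should normally not resolve in DNS at all. A minimal Python sketch to check whether a wildcard record is answering for it could look like this. The function name `resolves_as_host` and the injectable `resolver` parameter are my own additions for illustration, not part of Shark or Hadoop.

```python
import socket


def resolves_as_host(name, resolver=socket.gethostbyname):
    """Return the address that `name` resolves to, or None if it does not resolve.

    `resolver` is injectable (an assumption made for testing); by default the
    system resolver is used through socket.gethostbyname.
    """
    try:
        return resolver(name)
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError.
        return None


if __name__ == "__main__":
    # "nameservice1" is the logical nameservice from this post.
    address = resolves_as_host("nameservice1")
    if address is None:
        print("nameservice1 does not resolve in DNS (expected for a logical name)")
    else:
        print("nameservice1 resolves to %s, probably a wildcard record" % address)
```

If the name does resolve, a client that is missing the HA settings will treat it as a plain hostname and try to connect to whatever the wildcard record points at.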

Errors:

After several tests based on different branches/commits, I tried to minimize the configuration:
1 Change hive-site.xml: set fs.default.name and fs.defaultFS to hadoop5
2 Edit shark-env.sh and comment out the line

This time Shark runs in standalone mode and talks directly to the NameNode, so any error that still shows up might be the real cause of the stupid HA problem.

A new error came up:

Another case of mismatched Hadoop IPC server/client versions!

The last time I saw this was in Install Spark/Shark on CDH 4.
I searched for all jars named like *hadoop* under shark/lib_managed/ and found this:

I don’t understand why there’s a ‘hadoop-core-1.0.4.jar’.
I deleted this stupid file and copied ‘/opt/cloudera/parcels/CDH/lib/hadoop/client-0.20/hadoop-core-2.0.0-mr1-cdh4.6.0.jar’ to ./lib_managed/jars/org.apache.hadoop/hadoop-core.
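The jar hunt above can be sketched in Python for anyone who wants to repeat it on another build. This is only an illustration under my own assumptions: the helper names `find_hadoop_jars` and `mismatched_jars` are made up, and treating "cdh4.6.0" in the file name as the marker of a correct jar is a heuristic, since some jars with "hadoop" in their name are not Hadoop client jars at all.

```python
import os


def find_hadoop_jars(root):
    """Return the sorted paths of all *.jar files under root whose name contains 'hadoop'."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".jar") and "hadoop" in name:
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


def mismatched_jars(jar_paths, expected_marker="cdh4.6.0"):
    """Return the jars whose file name lacks the expected distribution marker.

    The marker is an assumption: it only flags candidates to inspect by hand.
    """
    return [p for p in jar_paths if expected_marker not in os.path.basename(p)]


if __name__ == "__main__":
    # "lib_managed" is relative to the Shark checkout; adjust as needed.
    for path in mismatched_jars(find_hadoop_jars("lib_managed")):
        print("suspicious:", path)
```

On my build, a check like this would have pointed at hadoop-core-1.0.4.jar, the file I replaced by hand.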

The second test, still in standalone mode: it works!
With all configurations brought back: it works again!

DO NOT FORGET TO SYNC LATEST SHARK BUILD TO ALL SERVERS!

Why?

Back to the compile issue.
I’ve set ‘SHARK_HADOOP_VERSION=2.0.0-cdh4.6.0’, so why is a hadoop-core-1.0.4.jar downloaded and put into lib_managed?
I tried to grep ‘1\.0\.4’ in the shark folder and got many lines, mostly under target/; after sbt clean and package they are still there.
I changed project/ScalaBuild and bin/dev/run-tests-from-scratch, but nothing changed.
Why?
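The search for the stray version string described above produces a lot of noise from build output. A rough Python sketch that skips target/ and reports only source and build files might look like the following. The function name `grep_version` and the default list of skipped directories are my own choices, not something from the Shark build.

```python
import os


def grep_version(root, needle="1.0.4", skip_dirs=("target", ".git")):
    """Yield (path, line_number, line) for every line under root containing needle.

    The match is a literal substring match, not a regular expression.
    Directories named in skip_dirs are pruned so build output is ignored.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", encoding="utf-8", errors="ignore") as fh:
                    for number, line in enumerate(fh, start=1):
                        if needle in line:
                            yield path, number, line.rstrip("\n")
            except OSError:
                # Unreadable file: skip it and keep going.
                continue


if __name__ == "__main__":
    # "." is assumed to be the root of the Shark checkout.
    for path, number, line in grep_version("."):
        print("%s:%d: %s" % (path, number, line))
```

Whatever remains after pruning target/ is where a default Hadoop version could still be hard-coded.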

http://spark.incubator.apache.org/docs/latest/hadoop-third-party-distributions.html

Download

I’ve uploaded a snapshot build (commit fe75a886) for CDH 4.6.0 that works for me: http://cloudera.rst.im/shark/.

Shark 0.9.1 Snapshot Working with CDH 4.6.0 Now by @sskaje: https://sskaje.me/2014/03/shark-0-9-1-snapshot-working-cdh-4-6-0/
