Manually Upgrade CDH 5.2 in CM 5

I was interrupted again when upgrading CDH.

HDFS

This time, the NameNodes were not started; I had to bring them up and resume the upgrade.
I didn't save any logs of the NameNode errors. I stopped all HDFS components, ran 'Upgrade HDFS Metadata', then started HDFS.
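
For reference, outside of CM the rough equivalent of that action is starting the NameNode with the upgrade flag and finalizing once the cluster checks out; this is a sketch, not what CM runs verbatim:

# with HDFS stopped, start the NameNode once with the upgrade flag
$ sudo -u hdfs hdfs namenode -upgrade
# after verifying the cluster is healthy, finalize the metadata upgrade
$ sudo -u hdfs hdfs dfsadmin -finalizeUpgrade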

YARN

Next, YARN.
I started YARN and then all the other services. Hive went down, then YARN.
I checked CM’s monitor:

I found both ResourceManager instances were in 'Standby' state.

Here is what I found from /var/log/hadoop-yarn/hadoop-cmf-yarn-RESOURCEMANAGER-hadoop4.xxx.com.log.out

Google helped a lot: http://community.cloudera.com/t5/Cloudera-Manager-Installation/CDH-5-YARN-Resource-Manager-HA-deadlock-in-Kerberos-cluster/td-p/14396

Using /opt/cloudera/parcels/CDH/lib/zookeeper/bin/zkCli.sh, delete the stale znodes one by one, because zkCli.sh does not have wildcard support.
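
For illustration only, the deletions look roughly like this; the znode path is the default RM state store location, the application IDs are made up, and on a Kerberos cluster you may also need to authenticate to ZooKeeper first:

# list the stale application znodes under the RM state store
$ /opt/cloudera/parcels/CDH/lib/zookeeper/bin/zkCli.sh -server zk-host:2181 ls /rmstore/ZKRMStateRoot/RMAppRoot
# then remove them one at a time (no wildcards)
$ /opt/cloudera/parcels/CDH/lib/zookeeper/bin/zkCli.sh -server zk-host:2181 rmr /rmstore/ZKRMStateRoot/RMAppRoot/application_1412345678901_0001
$ /opt/cloudera/parcels/CDH/lib/zookeeper/bin/zkCli.sh -server zk-host:2181 rmr /rmstore/ZKRMStateRoot/RMAppRoot/application_1412345678901_0002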

Hive

I had just guessed that Hive didn't work because of YARN, but I was wrong.
I checked all Hive-related commands executed by CM:

So I stopped Hive and ran 'Update Hive Metastore NameNodes' and 'Upgrade Hive Metastore Database Schema'; neither worked, both failing with the error message above.
I got more from logs:

The schemaTool output reminded me that I had manually upgraded the Hive metastore in February: Hive MetaStore Schema Upgrade Failed When Upgrading CDH5.
But this time the dbType should be postgres instead of derby. (Derby is not supported by Impala, which is why I switched to the PostgreSQL embedded in Cloudera Manager.)
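
A quick way to see which database schemaTool actually points at (a sketch; the parcel path is the usual CDH location):

# print the metastore connection info and schema version
$ /opt/cloudera/parcels/CDH/lib/hive/bin/schematool -dbType postgres -info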

I can't find the terminal output, but when I ran:

I got output similar (at least the first few lines) to the blog post above, showing schemaTool trying to connect to Derby.

I re-deployed Hive's client configuration, checked /etc/hive/conf/hive-site.xml, and compared it with /var/run/cloudera-scm-agent/process/4525-hive-HIVEMETASTORE/hive-site.xml.
The XML under /etc points to the Hive metastore's Thrift server, while the one under CM's process folder specifies the exact database connection; schemaTool uses the /etc one.
So I replaced the /etc copy with CM's and re-ran upgradeSchema:
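
Roughly like this (the process directory number 4525 is from my host; yours will differ):

# keep a backup of the client config, then borrow the metastore's own hive-site.xml
$ sudo cp /etc/hive/conf/hive-site.xml /etc/hive/conf/hive-site.xml.bak
$ sudo cp /var/run/cloudera-scm-agent/process/4525-hive-HIVEMETASTORE/hive-site.xml /etc/hive/conf/hive-site.xml
# re-run the schema upgrade, now against the real PostgreSQL metastore
$ /opt/cloudera/parcels/CDH/lib/hive/bin/schematool -dbType postgres -upgradeSchema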

Same error as I saw in CM's log: plpgsql does not exist. Fix this by:
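
That means creating the plpgsql language in the metastore database; a minimal sketch, assuming the embedded PostgreSQL on port 7432 and a metastore database/user both named hive (adjust to your setup):

$ psql -h localhost -p 7432 -U hive -d hive -c 'CREATE LANGUAGE plpgsql;'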

You can find the password in the hive-site.xml mentioned above, or in a file like:

If you hit an error message saying OWNER_NAME or OWNER_TYPE already exists in table DBS, open /opt/cloudera/parcels/CDH/lib/hive/scripts/metastore/upgrade/postgres/016-HIVE-6386.postgres.sql and comment out or delete the two ALTER TABLE lines.
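
If you would rather not edit the file by hand, something along these lines comments those statements out (check the file afterwards; the pattern match is an assumption about how the script is written):

# comment out the already-applied OWNER_NAME / OWNER_TYPE ALTER TABLE statements
$ sudo sed -i.bak -e '/OWNER_NAME/s/^/-- /' -e '/OWNER_TYPE/s/^/-- /' \
    /opt/cloudera/parcels/CDH/lib/hive/scripts/metastore/upgrade/postgres/016-HIVE-6386.postgres.sql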

Manually Upgrade CDH 5.2 in CM 5 by @sskaje: https://sskaje.me/2014/10/manually-upgrade-cdh-5-2-cm-5/

Newer Documentation for HttpFS (Hadoop HDFS over HTTP)

I was trying to make rsyslog v8 talk to Hadoop HDFS directly via omhdfs, but failed: it is officially stated that omhdfs does not work with rsyslog v8 for now.

UPDATE: OmHTTPFS, Another Rsyslog HDFS Output Plugin, https://github.com/rsyslog/rsyslog/tree/master/contrib/omhttpfs

I was recommended HttpFS when setting up Hue in CDH. HttpFS is an HTTP gateway for HDFS, originally developed by Cloudera as Hoop and later contributed to the Apache foundation as a component of HDFS.

Officially 'Hadoop HDFS over HTTP', HttpFS is a server that provides a REST HTTP gateway supporting all HDFS file system operations (read and write), and it is interoperable with the webhdfs REST HTTP API. The latest doc can be found here.
Examples with cURL are given on that doc page:

$ curl http://httpfs-host:14000/webhdfs/v1/user/foo/README.txt returns the contents of the HDFS /user/foo/README.txt file.
$ curl http://httpfs-host:14000/webhdfs/v1/user/foo?op=list returns the contents of the HDFS /user/foo directory in JSON format.
$ curl -X POST http://httpfs-host:14000/webhdfs/v1/user/foo/bar?op=mkdirs creates the HDFS /user/foo.bar directory.

Try those, and the only thing you get is an HTTP 401.
You have to add at least a 'user.name=xxx' parameter to identify yourself, then:
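
Roughly like this, with a placeholder host and user:

$ curl "http://httpfs-host:14000/webhdfs/v1/user/foo/README.txt?user.name=foo"
$ curl "http://httpfs-host:14000/webhdfs/v1/user/foo?op=list&user.name=foo"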

YOU ARE FOOLED!

The latest source of HttpFS can be found in the Apache foundation's SVN: http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/.

Newer Documentation for HttpFS(Hadoop HDFS over HTTP) by @sskaje: https://sskaje.me/2014/08/doc-for-httpfs/

PHP ODBC Connect Cloudera Impala and Hive

Environment

CentOS 5.5
PHP 5.3.10
(This article also works for PHP 5.3.3 on CentOS 6).

Dependencies

UnixODBC

UnixODBC can be installed from the yum repo.

I built unixODBC 2.3.2 from source and installed it to /usr/local/unixODBC.
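
Either route is fine; a sketch of both (the yum package names are the stock CentOS ones, and the source build mirrors what I did):

# from the distro repo
$ sudo yum install unixODBC unixODBC-devel
# or build 2.3.2 from source into /usr/local/unixODBC
$ tar zxf unixODBC-2.3.2.tar.gz && cd unixODBC-2.3.2
$ ./configure --prefix=/usr/local/unixODBC && make && sudo make install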

ODBC Connectors

Cloudera offers ODBC libs for both Hive and Impala:
http://www.cloudera.com/content/support/en/downloads/connectors/impala/impala-odbc-v2-5-15.html
http://www.cloudera.com/content/support/en/downloads/connectors/hive/hive-odbc-v2-5-9.html

Follow the install guides at the URLs above; only wget and yum --nogpgcheck localinstall xxx.rpm are required.
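
Something like this, where the RPM name is a placeholder for whatever the download page gives you:

# download the connector RPM, then install it, skipping the GPG check
$ wget <connector-rpm-url>
$ sudo yum --nogpgcheck localinstall <connector>.rpm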

PHP ODBC Connect Cloudera Impala and Hive by @sskaje: https://sskaje.me/2014/07/php-odbc-connect-cloudera-impala-hive/

Delete Unexpected 127.0.0.1 from Cloudera Manager

I had Cloudera Manager 5.0.0 installed on my small cluster, tried to delete some nodes, and then found Cloudera Manager not working: exceptions on the landing page, a 500 on the hosts page, null pointer exceptions almost everywhere.
The next time I restarted and logged into CM, the agent upgrade wizard started again, and a 127.0.0.1 entry appeared in the host list that could not be deleted.
So I tried to delete the data from PostgreSQL directly.

1. Read database config

The password can be found as described in Cloudera Manager Drop Database/User on Embedded Postgresql.
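
A sketch of that step, using the standard CM server config path and the embedded PostgreSQL defaults (database and user scm, port 7432); adjust if yours differ:

# the CM server's database settings, including the password
$ sudo cat /etc/cloudera-scm-server/db.properties
# connect to the embedded PostgreSQL
$ psql -h localhost -p 7432 -U scm -d scm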

2. Check table

3. Find data

The row with host_id=13 had an empty cluster_id and neither an IP address nor a name filled in.
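
A query along these lines finds the row; the column names are from memory of the CM schema, so describe the table first if unsure:

# list the hosts table and spot the row with empty name / ip_address
$ psql -h localhost -p 7432 -U scm -d scm -c 'SELECT host_id, name, ip_address, cluster_id FROM hosts;'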

4. Delete data

First figure out which tables reference the host via foreign keys, so those rows can be deleted too.

So delete like:
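
A hedged sketch of the cleanup, assuming host_id 13 and that only a couple of tables (roles is the usual suspect) still reference it; the table names here are assumptions, so check the actual foreign keys first:

# delete the referencing rows, then the bogus host itself
$ psql -h localhost -p 7432 -U scm -d scm -c 'DELETE FROM roles WHERE host_id = 13;'
$ psql -h localhost -p 7432 -U scm -d scm -c 'DELETE FROM hosts WHERE host_id = 13;'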

5. Restart

Delete Unexpected 127.0.0.1 from Cloudera Manager by @sskaje: https://sskaje.me/2014/04/delete-unexpected-127-0-0-1-cloudera-manager/

Fix Alternatives for Cloudera Manager + CDH

Earlier post: Fix Hadoop Conf Alternatives for CDH5

I tried upgrading from CM+CDH 4 to Cloudera Manager + CDH 5.0.0 beta 1 and beta 2, then downgraded and removed them, and found many alternatives left on my small cluster. They broke my later installations of CM+CDH 4 and CM+CDH 5, all because of the dirty uninstallation of the CM+CDH 5 betas.

To fix these alternatives, I wrote a Python script that reads the default alternative configurations, checks all currently installed alternatives, deletes broken links, and installs the defaults with lowered priority, so that 'Deploy Client Configuration' in CM can set up the correct ones.
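
The script itself is in the repository below; as a minimal illustration of the core idea (not the script), this is how you can spot alternatives whose targets no longer exist:

#!/bin/sh
# flag alternatives whose link targets are gone (broken links left by the uninstall)
for alt in /etc/alternatives/*; do
    target=$(readlink -f "$alt")
    [ -e "$target" ] || echo "broken: $alt -> $(readlink "$alt")"
done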

Repository: https://github.com/sskaje/cm_fix_alternatives

Tested only under CentOS 6.

Fix Alternatives for Cloudera Manager + CDH by @sskaje: https://sskaje.me/2014/04/fix-alternatives-cloudera-manager-cdh/