OmHTTPFS, Another Rsyslog HDFS Output Plugin

Still in beta, but already merged into rsyslog’s official repo.

Known issues and features yet to be supported:
Error codes returned;
Connection detection;
HTTP keep-alive.

Code here: https://github.com/rsyslog/rsyslog/tree/master/contrib/omhttpfs

OmHTTPFS, Another Rsyslog HDFS Output Plugin by @sskaje: https://sskaje.me/2014/12/omhttpfs-rsyslog-hdfs-output-plugin/

YARN NodeManager Failed to Start

After I upgraded CDH, one of my NodeManagers could not be brought up.
NullPointerExceptions were found in the error log:

I tried deleting all ZooKeeper-related configs (you can find them in Manually Upgrade CDH 5.2 in CM 5, specifically the YARN part), but that did not work.
I deleted the NodeManager instance and reinstalled it, with the same result.

Many ‘Recovering application’ and ‘Recovering localized resource’ messages were found in that log file:

I deleted those, and it still failed.

And, in the start-up message part,

‘/tmp/hadoop-yarn/’ was read on every start.
The solution: stop the instance, delete ‘/tmp/hadoop-yarn/’ from the local filesystem, then start the instance.
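A cautious variant of the delete step, as a minimal sketch: instead of removing the directory outright, rename it so it can be restored if needed. The helper name move_aside and the ".bak-<timestamp>" suffix are my own choices, not anything CDH provides, and stopping and starting the NodeManager still happens outside this script (for example in Cloudera Manager).

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def move_aside(state_dir: str) -> Optional[str]:
    """Rename state_dir to a timestamped sibling directory.

    Returns the new path, or None if state_dir does not exist.
    Hypothetical helper: run it only while the NodeManager is stopped.
    """
    src = Path(state_dir)
    if not src.exists():
        return None
    # e.g. /tmp/hadoop-yarn -> /tmp/hadoop-yarn.bak-1417000000
    backup = src.with_name(f"{src.name}.bak-{int(time.time())}")
    shutil.move(str(src), str(backup))
    return str(backup)


# Usage on the affected host (not executed here):
#   move_aside("/tmp/hadoop-yarn/")
```

Once the NodeManager starts cleanly, the renamed copy can be deleted.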

YARN NodeManager Failed to Start by @sskaje: https://sskaje.me/2014/11/yarn-nodemanager-failed-start/

Manually Upgrade CDH 5.2 in CM 5

I was interrupted again when upgrading CDH.

HDFS

This time the NameNodes were not started, so I had to bring them up and resume the upgrade process.
I didn’t save any logs of the NN error. I stopped all HDFS components, ran ‘Upgrade HDFS Metadata’, then started HDFS.

YARN

Next, YARN.
I started YARN, and then all other services. Hive went down, then YARN.
I checked CM’s monitor:

I found that both ResourceManager instances were ‘Standby’.

Here is what I found from /var/log/hadoop-yarn/hadoop-cmf-yarn-RESOURCEMANAGER-hadoop4.xxx.com.log.out

Google helps a lot: http://community.cloudera.com/t5/Cloudera-Manager-Installation/CDH-5-YARN-Resource-Manager-HA-deadlock-in-Kerberos-cluster/td-p/14396

In /opt/cloudera/parcels/CDH/lib/zookeeper/bin/zkCli.sh, run

one by one, because zkCli.sh does not have wildcard support.
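Since there is no wildcard, one way to avoid typing each command by hand is to generate them. This is only a sketch, assuming the commands in question are per-node deletions with zkCli's rmr; the parent path and child names below are hypothetical placeholders, not the real znodes from my cluster. The child names would come from running ls on the parent node in zkCli.sh.

```python
from typing import Iterable, List


def rmr_commands(parent: str, children: Iterable[str], prefix: str = "") -> List[str]:
    """Build one zkCli 'rmr' command per child znode whose name starts with prefix.

    parent and children are supplied by the caller; nothing here talks to ZooKeeper.
    """
    parent = parent.rstrip("/")
    return [f"rmr {parent}/{child}" for child in children if child.startswith(prefix)]


# Hypothetical example: paste the printed lines into zkCli.sh.
for line in rmr_commands("/example/parent", ["app_1", "app_2", "other"], prefix="app_"):
    print(line)
```

Filtering by prefix is a poor man's wildcard: it is applied on the client side before any command is sent.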

Hive

I had guessed that Hive didn’t work because of YARN, but I was wrong.
I checked all Hive-related commands executed by CM:

So I stopped Hive and ran Update Hive Metastore NameNodes and Upgrade Hive Metastore Database Schema. Neither worked; both failed with the error message above.
I got more from the logs:

The schemaTool output reminded me that I had manually upgraded the Hive metastore in February: Hive MetaStore Schema Upgrade Failed When Upgrading CDH5.
But this time, dbType should be postgres instead of derby. (Derby is not supported by Impala, which is why I switched to the PostgreSQL embedded in Cloudera Manager.)

I can’t find the terminal output, but when I ran:

I found output similar to that in the blog post above (only the first few lines), saying schemaTool was trying to connect to Derby.

I redeployed Hive’s client configuration, checked /etc/hive/conf/hive-site.xml, and compared it with /var/run/cloudera-scm-agent/process/4525-hive-HIVEMETASTORE/hive-site.xml.
The XML under /etc uses the Hive metastore’s Thrift server, while the one under CM’s running folder specifies the exact database connection. schemaTool uses the /etc one.
So I replaced the /etc one with CM’s, and then reran upgradeSchema:

The same error as I saw in CM’s log: plpgsql does not exist. Fix this by:

You can find the password in the XML file mentioned above, or in a file like

If you get an error message saying OWNER_NAME or OWNER_TYPE already exists in table DBS, open /opt/cloudera/parcels/CDH/lib/hive/scripts/metastore/upgrade/postgres/016-HIVE-6386.postgres.sql and comment out or delete the two ALTER TABLE lines.
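The difference between the two hive-site.xml files described earlier can be checked quickly before running schemaTool. A minimal sketch: it reads the standard Hive properties javax.jdo.option.ConnectionURL and hive.metastore.uris from a Hadoop-style XML config. The function names are my own, and the fallback branch reflects Hive's default of an embedded Derby database when no connection URL is set.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple


def read_conf(xml_text: str) -> Dict[str, str]:
    """Parse <property><name>/<value> pairs from a Hadoop-style XML config."""
    props = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def metastore_target(props: Dict[str, str]) -> Tuple[str, str]:
    """Report whether the config points at a database or at a Thrift metastore."""
    url = props.get("javax.jdo.option.ConnectionURL")
    if url:
        return ("database", url)
    uris = props.get("hive.metastore.uris")
    if uris:
        return ("thrift", uris)
    return ("default", "embedded Derby")


# Usage (not executed here):
#   with open("/etc/hive/conf/hive-site.xml") as f:
#       print(metastore_target(read_conf(f.read())))
```

If the file schemaTool reads reports "thrift" or "default", it has no direct database connection to work with, which matches the Derby attempt I saw.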

Manually Upgrade CDH 5.2 in CM 5 by @sskaje: https://sskaje.me/2014/10/manually-upgrade-cdh-5-2-cm-5/

Newer Documentation for HttpFS(Hadoop HDFS over HTTP)

I was trying to make rsyslog v8 communicate with Hadoop HDFS directly via omhdfs, but failed: officially, omhdfs does not work with rsyslog v8 for now.

UPDATE: OmHTTPFS, Another Rsyslog HDFS Output Plugin, https://github.com/rsyslog/rsyslog/tree/master/contrib/omhttpfs

I was advised to use HttpFS when setting up Hue in CDH. HttpFS is an HTTP gateway for HDFS, originally developed by Cloudera as Hoop and then contributed to the Apache Software Foundation as a component of HDFS.

Its full name is ‘Hadoop HDFS over HTTP’. HttpFS is a server that provides a REST HTTP gateway supporting all HDFS File System operations (read and write), and it is interoperable with the webhdfs REST HTTP API. The latest doc can be found here.
Examples with cURL are given on that doc page:

$ curl http://httpfs-host:14000/webhdfs/v1/user/foo/README.txt returns the contents of the HDFS /user/foo/README.txt file.
$ curl http://httpfs-host:14000/webhdfs/v1/user/foo?op=list returns the contents of the HDFS /user/foo directory in JSON format.
$ curl -X POST http://httpfs-host:14000/webhdfs/v1/user/foo/bar?op=mkdirs creates the HDFS /user/foo.bar directory.

Try those, and the only thing you get is an HTTP 401.
You have to add at least a ‘user.name=xxx’ parameter to identify yourself, then:

YOU ARE FOOLED!

The latest HttpFS source can be found in the Apache Foundation’s SVN: http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/.
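For comparison with the cURL lines quoted from the doc page, here is a minimal sketch of how the same three requests look when built with the operation names and HTTP methods of the WebHDFS REST API (OPEN, LISTSTATUS, and MKDIRS via PUT), plus the user.name parameter. I am assuming the quoted examples fail partly because their op names and methods differ from that API. The helper name httpfs_request and the short action names are hypothetical; the host and port are the ones from the quoted examples.

```python
from urllib.parse import quote, urlencode

# Action -> (HTTP method, op) as named in the WebHDFS REST API.
OPERATIONS = {
    "read": ("GET", "OPEN"),
    "list": ("GET", "LISTSTATUS"),
    "mkdir": ("PUT", "MKDIRS"),
}


def httpfs_request(action, path, user, host="httpfs-host", port=14000):
    """Return (method, url) for an HttpFS call; does not send anything."""
    method, op = OPERATIONS[action]
    query = urlencode({"op": op, "user.name": user})
    url = f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"
    return method, url


for action, path in [("read", "/user/foo/README.txt"),
                     ("list", "/user/foo"),
                     ("mkdir", "/user/foo/bar")]:
    method, url = httpfs_request(action, path, user="foo")
    print(f"curl -X {method} '{url}'")
```

The printed lines can be pasted into a shell; the URL is quoted because it contains an ampersand.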

Newer Documentation for HttpFS(Hadoop HDFS over HTTP) by @sskaje: https://sskaje.me/2014/08/doc-for-httpfs/

Build omhdfs for Rsyslog

omhttpfs has been contributed to the official rsyslog repo as an alternative log-to-HDFS solution.

omhdfs is not provided as an RPM in rsyslog’s official yum repo, so I’m trying to build it on a machine with Hadoop installed.
My server runs CentOS 6.5, with CDH 5.1 installed.

Download source from: http://www.rsyslog.com/files/download/rsyslog/rsyslog-8.2.2.tar.gz

Install dependencies:

Then I tried to:

It failed on make.
Error message:


Build omhdfs for Rsyslog by @sskaje: https://sskaje.me/2014/08/build-omhdfs-rsyslog/