Newer Documentation for HttpFS (Hadoop HDFS over HTTP)

Fix Hadoop Conf Alternatives for CDH5

I’m using CDH5; the upgrade from CDH4 failed, so I reinstalled it directly. /etc/hadoop/conf is linked to /etc/hadoop/conf/conf.cloudera.mapreduce1/, and Deploy Client Configuration does not correct it. The fix is to manually set a new path and remove the old one, like:

But the next time you run Deploy Client Configuration, it will corrupt the link again.

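Infobright Enterprise Edition: Data Import and Data Rewrite Experiments

I got a trial license for IEE and tried it out as a log storage and computation solution. There is no need to test statistical queries: a quick try of ICE is enough to get a feel for them, and it is in any case a good deal faster than Hive. What I mainly wanted to test here is INSERT / UPDATE / DELETE. The log system in the test environment is rsyslog -> flume-ng -> IEE/HDFS. Writing to HDFS with the HDFS Sink that ships with Flume-ng has always been stable: directories are split by day, a script pre-creates the directories and adds the Hive partitions, and Hive is used for analysis. However, because statistics may be needed on the current day's data, hdfs.rollInterval is set fairly low, currently 2 minutes, so a large number of small files is produced every day and Hive processes them very slowly. I had someone write a simple Flume-ng plugin that loads data into MySQL. It adds a separate queue, splits the log files, and sends them to MySQL column by column; the plugin requires database inserts to be batched with prepared statements.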
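
The batched, prepared-statement insert described above can be sketched roughly as follows. This is only an illustration, not the plugin's actual code: the real plugin runs inside Flume-ng and talks to MySQL, while this sketch uses Python's built-in sqlite3 module so that it runs on its own. The table name `access_log`, its three columns, the tab separator, and the batch size are all assumptions made for the example.

```python
import sqlite3


def split_log_line(line, sep="\t"):
    """Split one raw log line into columns (the tab separator is an assumption)."""
    return tuple(line.rstrip("\n").split(sep))


def batch_insert(conn, rows, batch_size=500):
    """Insert rows in batches, reusing one parameterized INSERT statement."""
    # Hypothetical table and columns, used only for this sketch.
    sql = "INSERT INTO access_log (ts, host, message) VALUES (?, ?, ?)"
    inserted = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        conn.executemany(sql, batch)  # one statement, many parameter sets
        conn.commit()                 # one commit per batch, not per row
        inserted += len(batch)
    return inserted


def demo():
    """Load ten synthetic log lines into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE access_log (ts TEXT, host TEXT, message TEXT)")
    lines = [
        f"2014-10-01 00:00:{i:02d}\tweb{i % 3}\trequest {i}\n" for i in range(10)
    ]
    rows = [split_log_line(line) for line in lines]
    inserted = batch_insert(conn, rows, batch_size=4)
    stored = conn.execute("SELECT COUNT(*) FROM access_log").fetchone()[0]
    conn.close()
    return inserted, stored
```

Committing once per batch instead of once per row is the point of the pattern: it cuts round trips and transaction overhead, which is usually what matters for a columnar store fed by a log pipeline.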

YARN NodeManager Failed to Start

After I upgraded CDH, one of my NodeManagers could not be brought up. NullPointerExceptions were found in the error log:

Manually Upgrade CDH 5.2 in CM 5

I was interrupted again while upgrading CDH. This time the HDFS NameNode did not start, so I had to bring it up and resume the upgrade. I didn’t save any logs of the NN error; I stopped all HDFS components, ran ‘Upgrade HDFS Metadata’, and then started HDFS.

