OmHTTPFS, Another Rsyslog HDFS Output Plugin

Still in beta, but merged into rsyslog’s official repo. Known issues and features still to be supported: returned error codes; connection detection; HTTP keep-alive; … Code here:

Build omhdfs for Rsyslog

omhttpfs has been contributed to rsyslog’s official repo as an alternative log-to-HDFS solution. omhdfs is not provided as an RPM in rsyslog’s official yum repo, so I’m trying to build it on a machine with Hadoop installed. My server runs CentOS 6.5 with CDH 5.1 installed. Download source from: Install dependencies:

Then I tried to: … Continue reading “Build omhdfs for Rsyslog”

Newer Documentation for HttpFS(Hadoop HDFS over HTTP)

I was trying to make rsyslog v8 communicate with Hadoop HDFS directly via omhdfs, but failed: officially, omhdfs does not work with rsyslog v8 for now. UPDATE: see OmHTTPFS, Another Rsyslog HDFS Output Plugin. I was advised to use HttpFS when setting up Hue in CDH. HttpFS is an HTTP gateway … Continue reading “Newer Documentation for HttpFS(Hadoop HDFS over HTTP)”
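Since HttpFS exposes HDFS through the WebHDFS REST API, a request is essentially a URL of the form `/webhdfs/v1/<path>?op=...`. The following is a minimal sketch of building such a URL, not code from the post. The host name, file path, and user are hypothetical, and port 14000 is assumed to be the HttpFS default; as far as I know, HttpFS also expects `data=true` on upload requests, but check your version's documentation.

```python
from urllib.parse import quote, urlencode


def httpfs_url(host, path, op, user, port=14000, **params):
    """Build a WebHDFS-style REST URL for an HttpFS gateway.

    host, user and any extra params are illustrative; port 14000 is
    assumed to be the HttpFS default.
    """
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    # op and user.name come first, followed by any operation-specific params.
    query = urlencode({"op": op, "user.name": user, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Hypothetical example: URL for appending to a log file.
url = httpfs_url("httpfs.example.com", "/logs/2014-08-01/app.log",
                 "APPEND", "rsyslog", data="true")
print(url)
```

The actual request would then be an HTTP POST (for `APPEND`) or PUT (for `CREATE`) against this URL with the log data as the body.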

Infobright Enterprise Edition: Data Import and Update/Delete Experiments

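I got a trial license for IEE and tried it out as a log storage and computation solution. There is no need to test statistical queries: trying ICE is enough to get a feel for them, and it is a good deal faster than Hive anyway. What I mainly want to test here is INSERT / UPDATE / DELETE. The log system in the test environment is rsyslog -> flume-ng -> IEE/HDFS. Writing to HDFS with Flume-ng's built-in HDFS Sink has always been stable: directories are split by day, a script pre-creates the directories and adds the Hive partitions, and Hive is used for analysis. However, because statistics may be needed on the current day's data, hdfs.rollInterval is set fairly low, currently 2 minutes, so a large number of small files is produced every day and Hive processes them very slowly. For Flume-ng, I had someone write a simple plugin that inserts into MySQL: it adds a separate queue, splits each log entry, and sends it into MySQL column by column. The plugin requires database inserts to be batched using prepared statements.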
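The batching idea (split each log line into columns, then insert many rows through one prepared statement) can be sketched as below. This is an illustration only, not the Flume-ng plugin described above: it uses Python's built-in `sqlite3` as a stand-in for MySQL, and the table name, column names, and tab separator are all hypothetical.

```python
import sqlite3


def parse_line(line, sep="\t"):
    """Split one log line into column values (separator is an assumption)."""
    return tuple(line.rstrip("\n").split(sep))


def insert_batch(conn, rows):
    """Insert all rows through one parameterized statement, in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO logs (ts, host, msg) VALUES (?, ?, ?)", rows
        )


# sqlite3 stands in for MySQL here; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (ts TEXT, host TEXT, msg TEXT)")

lines = [
    "2014-08-01 10:00:00\tweb01\tGET /index.html\n",
    "2014-08-01 10:00:01\tweb02\tGET /about.html\n",
]
insert_batch(conn, [parse_line(line) for line in lines])
count = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]
print(count)
```

Batching this way avoids one round trip and one statement parse per row, which is presumably why the plugin insists on it.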