OmHTTPFS, Another Rsyslog HDFS Output Plugin

Still in beta, but already merged into rsyslog’s official repo.

Known issues and features still to be supported:
Error codes returned;
Connection detection;
HTTP keep-alive.

Code here: https://github.com/rsyslog/rsyslog/tree/master/contrib/omhttpfs

OmHTTPFS, Another Rsyslog HDFS Output Plugin by @sskaje: https://sskaje.me/2014/12/omhttpfs-rsyslog-hdfs-output-plugin/

Newer Documentation for HttpFS(Hadoop HDFS over HTTP)

I was trying to make rsyslog v8 communicate with Hadoop HDFS directly via omhdfs, but failed: it is officially stated that omhdfs does not work with rsyslog v8 for now.

UPDATE: OmHTTPFS, Another Rsyslog HDFS Output Plugin, https://github.com/rsyslog/rsyslog/tree/master/contrib/omhttpfs

I was advised to use HttpFS when setting up Hue in CDH. HttpFS is an HTTP gateway for HDFS, originally developed by Cloudera as Hoop and then contributed to the Apache Software Foundation as a component of HDFS.

Its full name is ‘Hadoop HDFS over HTTP’. HttpFS is a server that provides a REST HTTP gateway supporting all HDFS File System operations (read and write), and it is interoperable with the webhdfs REST HTTP API. The latest doc can be found here.
Examples with cURL are given on that doc page:

$ curl http://httpfs-host:14000/webhdfs/v1/user/foo/README.txt returns the contents of the HDFS /user/foo/README.txt file.
$ curl http://httpfs-host:14000/webhdfs/v1/user/foo?op=list returns the contents of the HDFS /user/foo directory in JSON format.
$ curl -X POST http://httpfs-host:14000/webhdfs/v1/user/foo/bar?op=mkdirs creates the HDFS /user/foo.bar directory.

Try those, and the only thing you get is an HTTP 401.
You have to add, at least, a ‘user.name=xxx’ parameter to identify yourself, then:

YOU ARE FOOLED!

The latest HttpFS source can be found in the Apache Software Foundation’s svn: http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/.
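
For comparison, the WebHDFS REST API, which HttpFS is meant to be interoperable with, names these operations OPEN, LISTSTATUS and MKDIRS (the last one sent with PUT rather than POST). Here is a minimal sketch of how the three requests could be built with user.name included. The webhdfs_url helper is hypothetical, and the host httpfs-host, port 14000 and user foo are placeholders taken from the examples above. It only builds and prints the URLs, it does not send anything.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, path, op, user, port=14000, **params):
    """Build an HttpFS/WebHDFS REST URL for `path` (hypothetical helper)."""
    # user.name identifies the caller when pseudo authentication is in use.
    query = urlencode({"op": op, "user.name": user, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# (HTTP method, URL) pairs for the three operations from the doc examples.
requests_to_send = [
    ("GET", webhdfs_url("httpfs-host", "/user/foo/README.txt", "OPEN", "foo")),
    ("GET", webhdfs_url("httpfs-host", "/user/foo", "LISTSTATUS", "foo")),
    ("PUT", webhdfs_url("httpfs-host", "/user/foo/bar", "MKDIRS", "foo")),
]

for method, url in requests_to_send:
    print(method, url)
```

Each printed pair can be passed to curl as `curl -X METHOD "URL"`. The URL needs quoting in the shell because of the `&`.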

Newer Documentation for HttpFS(Hadoop HDFS over HTTP) by @sskaje: https://sskaje.me/2014/08/doc-for-httpfs/

Build omhdfs for Rsyslog

UPDATE: omhttpfs has been contributed to rsyslog’s official repo as an alternative log-to-HDFS solution.

Omhdfs is not provided as an RPM in rsyslog’s official yum repo, so I tried to build it on a machine with Hadoop installed.
My server is CentOS 6.5, with CDH 5.1 installed.

Download source from: http://www.rsyslog.com/files/download/rsyslog/rsyslog-8.2.2.tar.gz

Install dependencies:

Then I tried to:

Failed on make.
Error message:


Build omhdfs for Rsyslog by @sskaje: https://sskaje.me/2014/08/build-omhdfs-rsyslog/

Project: Merge small files on HDFS for Hive table

Introduction

Github: https://github.com/sskaje/hive_merge

This is a solution to the small-files problem on HDFS, but for Hive tables only.

Here is why I wrote this project: Solving Small Files Problem on CDH4.

This script simply INSERTs the requested table/partition into a new table, lets Hive merge the data itself, then INSERTs it back with compression.
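
The round trip described above can be sketched as a list of HiveQL statements. This is only an illustration under assumptions, not the hive_merge code itself: merge_statements and the table names are hypothetical, it covers an unpartitioned table only (a single partition would need a PARTITION clause and an explicit column list), and it relies on the standard Hive settings hive.merge.mapfiles, hive.merge.mapredfiles and hive.exec.compress.output.

```python
from typing import List


def merge_statements(table: str, tmp_table: str) -> List[str]:
    """Return HiveQL that rewrites `table` through `tmp_table` (sketch only)."""
    return [
        # Ask Hive to merge small output files at the end of each job.
        "SET hive.merge.mapfiles=true",
        "SET hive.merge.mapredfiles=true",
        f"CREATE TABLE {tmp_table} LIKE {table}",
        # Step 1: copy the data out; Hive merges the files it writes.
        f"INSERT OVERWRITE TABLE {tmp_table} SELECT * FROM {table}",
        # Step 2: write the data back with output compression enabled.
        "SET hive.exec.compress.output=true",
        f"INSERT OVERWRITE TABLE {table} SELECT * FROM {tmp_table}",
        f"DROP TABLE {tmp_table}",
    ]


# Print a script that could be fed to the hive CLI.
for statement in merge_statements("logs", "logs_merge_tmp"):
    print(statement + ";")
```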


Project: Merge small files on HDFS for Hive table by @sskaje: https://sskaje.me/2013/12/project-merge-small-files-hdfs-hive-table/
