
Project: Merge small files on HDFS for Hive table

This is a solution to the small files problem on HDFS, but it works for Hive tables only.

Here is why I wrote this project: Solving Small Files Problem on CDH4.

This script simply INSERTs the requested table/partition into a new table, lets Hive merge the data itself, then INSERTs it back with compression.
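A minimal sketch of that flow, assuming a small Python helper that only prints HiveQL for you to run yourself (for example with `hive -f`). The function name, the temporary table name and the exact Hive settings are illustrative assumptions, not taken from the actual script: `hive.merge.mapfiles` / `hive.merge.mapredfiles` ask Hive to merge small output files at the end of a job, and `hive.exec.compress.output` enables compressed output for the final INSERT. `columns` should list the table's non-partition columns, because an INSERT into a static partition must not select the partition columns.

```python
def build_merge_statements(table, tmp_table, columns, partition=None):
    """Hypothetical helper: return HiveQL that copies a table/partition into a
    temporary table and INSERTs it back, letting Hive merge small files."""
    cols = ", ".join(columns)
    where = ""
    part_clause = ""
    if partition:
        # Naive quoting: assumes partition values contain no single quotes.
        conds = " AND ".join(f"{k}='{v}'" for k, v in partition.items())
        spec = ", ".join(f"{k}='{v}'" for k, v in partition.items())
        where = f" WHERE {conds}"
        part_clause = f" PARTITION ({spec})"
    return [
        # Ask Hive to merge small output files of map-only and map-reduce jobs.
        "SET hive.merge.mapfiles=true",
        "SET hive.merge.mapredfiles=true",
        f"DROP TABLE IF EXISTS {tmp_table}",
        # Step 1: copy the requested data into a new (temporary) table.
        f"CREATE TABLE {tmp_table} AS SELECT {cols} FROM {table}{where}",
        # Step 2: write it back with compressed output.
        "SET hive.exec.compress.output=true",
        f"INSERT OVERWRITE TABLE {table}{part_clause} SELECT {cols} FROM {tmp_table}",
        f"DROP TABLE {tmp_table}",
    ]


if __name__ == "__main__":
    # Hypothetical table and partition names, for illustration only.
    stmts = build_merge_statements("logs", "logs_merge_tmp", ["host", "msg"], {"dt": "2013-05-01"})
    print(";\n".join(stmts) + ";")
```

Generating the statements instead of executing them keeps the sketch free of any particular Hive client library; the output can be reviewed before it overwrites anything.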

Solving Small Files Problem on CDH4

This morning when I opened Cloudera Manager, it showed the NameNode server as ‘Concerning’ with a message like ‘The DataNode has xxx blocks. Warning threshold: 200,000 block(s).’.
I googled it, and the answers said there might be too many files on HDFS: the default block size on my CDH4 is 128MB, and every non-empty file occupies at least one block, so even a 1-byte file counts as a whole block (although it only uses 1 byte of actual disk space).
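A quick back-of-the-envelope check of why small files inflate the block count. This is a sketch assuming the 128MB block size mentioned above and the usual HDFS rule that a file uses ceil(size / block size) blocks; the helper names are hypothetical.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128MB, the block size mentioned above
MB = 1024 * 1024


def blocks_for_file(size_bytes, block_size=BLOCK_SIZE):
    # Ceiling division: a 1-byte file still needs one block; an empty file needs none.
    return -(-size_bytes // block_size)


def total_blocks(file_sizes, block_size=BLOCK_SIZE):
    # Total number of blocks the NameNode has to track for these files.
    return sum(blocks_for_file(s, block_size) for s in file_sizes)


if __name__ == "__main__":
    # The same 1000MB of data, stored as 1000 small files vs. one large file.
    print(total_blocks([1 * MB] * 1000))  # 1000 blocks
    print(total_blocks([1000 * MB]))      # 8 blocks
```

With data this size, merging small files cuts the block count by more than two orders of magnitude, which is what brings the NameNode back under the warning threshold.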

Then I used hdfs dfs -count to find the number of files in each directory on HDFS: about 70k files under /user/hdfs/.staging and 170k under a folder used by Flume-NG.
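To rank directories by file count, the text output of `hdfs dfs -count` (columns DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME) can be sorted with a few lines of Python. This is a sketch: the function name is hypothetical, and it assumes the plain output format without `-q` quota columns; header or malformed lines are skipped.

```python
def parse_count_output(text):
    """Parse `hdfs dfs -count` output and return
    (path, dir_count, file_count, content_size) tuples, most files first."""
    rows = []
    for line in text.splitlines():
        # At most 3 splits so a path containing spaces stays in one piece.
        parts = line.strip().split(None, 3)
        if len(parts) != 4:
            continue
        dirs, files, size, path = parts
        try:
            rows.append((path, int(dirs), int(files), int(size)))
        except ValueError:
            continue  # e.g. a header line
    return sorted(rows, key=lambda r: r[2], reverse=True)


if __name__ == "__main__":
    import sys
    # Usage (hypothetical): hdfs dfs -count '/user/hdfs/*' | python count_files.py
    for path, dirs, files, size in parse_count_output(sys.stdin.read())[:20]:
        print(f"{files:>10} files  {path}")
```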

I’m collecting logs from syslog with Flume-NG on CDH4, sinking them to HDFS and MySQL (Infobright), and analysing them with Hive. The HDFS part of the configuration looks like:

Collections: Integer factorization

Integer factorization:

In number theory, integer factorization or prime factorization is the decomposition of a composite number into smaller non-trivial divisors, which when multiplied together equal the original integer.
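To make the definition concrete, here is trial division, the simplest factoring method. It is only practical for small numbers and is not what msieve uses for large integers; it is shown purely as an illustration of what "decomposition into divisors whose product is the original integer" means.

```python
from math import prod


def prime_factors(n):
    """Return the prime factors of n (with multiplicity) by trial division."""
    if n < 2:
        raise ValueError("n must be >= 2")
    factors = []
    d = 2
    # Any composite n has a prime factor no larger than sqrt(n).
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1 if d == 2 else 2  # after 2, only try odd divisors
    if n > 1:
        factors.append(n)  # what remains is itself prime
    return factors


if __name__ == "__main__":
    f = prime_factors(360)
    print(f)             # [2, 2, 2, 3, 3, 5]
    print(prod(f))       # 360, the product recovers the original integer
```

Trial division needs up to about sqrt(n) steps, which is why large integers call for sub-exponential methods such as the quadratic and number field sieves.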


Msieve is a C library implementing a suite of algorithms to factor large integers. It contains implementations of the self-initializing quadratic sieve (SIQS) and the general number field sieve (GNFS); the latter has helped complete some of the largest public factorizations known.

msieve also has CUDA support!

Virtualized ARM on Ubuntu

I was looking for articles/wikis on how to emulate ARM Linux (armel) on CentOS/Ubuntu, and then I found this one from MDN:

This article uses an old Linaro release based on Ubuntu Natty, which can no longer be found on
As Ubuntu has announced, armel is no longer supported, which is why the last Ubuntu release supporting armel has a code name beginning with ‘Q’.

I found another server release and a newer nano image, and tried them with similar commands. My notes are below:
