Project: Merge small files on HDFS for Hive table
Introduction
Github: https://github.com/sskaje/hive_merge
This is a solution for the small files problem on HDFS, but it works for Hive tables only.
Here is why I wrote this project: Solving Small Files Problem on CDH4.
This script simply INSERTs the requested table/partition into a new table, lets Hive merge the data itself, then INSERTs it back with compression.
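To make the round trip concrete, here is a minimal sketch in Python that only builds the HiveQL statements for it. It is not the project's actual code: the function name `build_merge_statements`, the `_merge_tmp` suffix, the use of CREATE TABLE ... AS SELECT for the new table, and the choice of GzipCodec are all assumptions made for illustration. The `hive.merge.*` and compression properties are standard Hive/Hadoop settings, but the values you need may differ on your cluster.

```python
def build_merge_statements(table, columns, partition=None, tmp_suffix="_merge_tmp"):
    """Return, in order, the HiveQL statements for one merge round trip.

    table     -- name of the Hive table to compact
    columns   -- non-partition columns to copy
    partition -- optional dict of partition column -> string value
    All names here are hypothetical; adjust them to your own schema.
    """
    tmp = table + tmp_suffix
    cols = ", ".join(columns)
    where = ""
    part_clause = ""
    if partition:
        # Assumes string-valued partition columns, hence the quoting.
        pairs = ["{0}='{1}'".format(k, v) for k, v in partition.items()]
        where = " WHERE " + " AND ".join(pairs)
        part_clause = " PARTITION (" + ", ".join(pairs) + ")"
    return [
        # Ask Hive to merge small output files at the end of each job.
        "SET hive.merge.mapfiles=true",
        "SET hive.merge.mapredfiles=true",
        # Step 1: copy the requested table/partition into a new table.
        "CREATE TABLE {0} AS SELECT {1} FROM {2}{3}".format(tmp, cols, table, where),
        # Step 2: write the merged data back with compression (codec is an assumption).
        "SET hive.exec.compress.output=true",
        "SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec",
        "INSERT OVERWRITE TABLE {0}{1} SELECT {2} FROM {3}".format(table, part_clause, cols, tmp),
        # Step 3: clean up the temporary table.
        "DROP TABLE {0}".format(tmp),
    ]


if __name__ == "__main__":
    for stmt in build_merge_statements("logs", ["ip", "url"], {"dt": "2013-01-01"}):
        print(stmt + ";")
```

The printed statements could be saved to a file and run with `hive -f`, after checking them against a test table first, since INSERT OVERWRITE replaces the original data.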