Project: Merge small files on HDFS for Hive table
Introduction
Github: https://github.com/sskaje/hive_merge
This is a solution to the small files problem on HDFS, but it works for Hive tables only.
Here is why I wrote this project: Solving Small Files Problem on CDH4.
This script simply INSERTs the requested table/partition into a new table, lets Hive merge the data itself, then INSERTs it back with compression.
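The round trip described above can be sketched roughly as follows. This is not the code from the repository, only a minimal illustration in Python that builds the HiveQL statements for one merge. The function name `build_merge_statements`, the `_merge_tmp` suffix and the `dt` partition column are assumptions made for the example. It also uses CREATE TABLE ... AS SELECT for the copy step and dynamic partitioning for the copy back, which may differ from what the script actually does. The `hive.merge.*`, `hive.exec.compress.output` and `hive.exec.dynamic.partition*` settings are standard Hive properties, but the right values depend on your cluster.

```python
def build_merge_statements(table, partition=None, tmp_table=None):
    """Return the HiveQL statements for one merge round trip.

    partition is an optional dict such as {"dt": "2013-12-01"} (a
    hypothetical column name); None means a whole unpartitioned table.
    Values are quoted as plain strings and are not escaped.
    """
    tmp_table = tmp_table or table + "_merge_tmp"  # assumed naming scheme
    where = ""
    target = table
    if partition:
        conds = " AND ".join(
            "{}='{}'".format(col, val) for col, val in partition.items()
        )
        where = " WHERE " + conds
        # Dynamic partition insert: partition columns come last in SELECT *.
        target = "{} PARTITION ({})".format(table, ", ".join(partition))

    stmts = [
        # Ask Hive to merge small output files at the end of each job.
        "SET hive.merge.mapfiles=true",
        "SET hive.merge.mapredfiles=true",
        "DROP TABLE IF EXISTS " + tmp_table,
        # Step 1: copy the requested data into a new table.
        "CREATE TABLE {} AS SELECT * FROM {}{}".format(tmp_table, table, where),
        # Step 2: write the data back compressed.
        "SET hive.exec.compress.output=true",
    ]
    if partition:
        stmts += [
            "SET hive.exec.dynamic.partition=true",
            "SET hive.exec.dynamic.partition.mode=nonstrict",
        ]
    stmts += [
        "INSERT OVERWRITE TABLE {} SELECT * FROM {}".format(target, tmp_table),
        "DROP TABLE IF EXISTS " + tmp_table,
    ]
    return stmts


if __name__ == "__main__":
    # Print a script that could be reviewed and then passed to `hive -f`.
    print(";\n".join(build_merge_statements("logs", {"dt": "2013-12-01"})) + ";")
```

Generating the statements as text first makes it easy to inspect them before running anything, which matters here because INSERT OVERWRITE replaces the original data.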
Project: Merge small files on HDFS for Hive table by @sskaje: https://sskaje.me/2013/12/project-merge-small-files-hdfs-hive-table/