Currently available for CentOS/RHEL 6 x86_64.
Cloudera Manager 5.0.2
Cloudera Manager 4.8.3
Cloudera Distribution of Hadoop
CDH’s official product page: http://www.cloudera.com/content/cloudera/en/products-and-services/cdh.html
Parcels are my preferred way to install CDH from CM.
To add a new parcel repository, go to Cloudera Manager Admin Console => Administration => Settings => Parcels
and add one of the URLs below to ‘Remote Parcel Repository URLs’.
CDH 5: http://cloudera.rst.im/cdh5/parcels/latest/ View files
CDH 4: http://cloudera.rst.im/cdh4/parcels/latest/ View files
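Before adding one of these URLs to Cloudera Manager, it can help to confirm that the repository is reachable from the CM host. The sketch below assumes the repository follows the usual parcel layout, with a manifest.json at its root, and that curl is installed. The function names are my own and are not part of any Cloudera tool.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Cloudera Manager.

# Print the manifest URL for a parcel repository base URL.
# A trailing slash on the base URL is tolerated.
parcel_manifest_url() {
    printf '%s/manifest.json\n' "${1%/}"
}

# Return success if the manifest can be fetched, failure otherwise.
# Assumes curl is available on this host.
check_parcel_repo() {
    curl -fsS -o /dev/null "$(parcel_manifest_url "$1")"
}
```

For example, check_parcel_repo http://cloudera.rst.im/cdh5/parcels/latest/ && echo reachable should print "reachable" when the mirror can be fetched from that machine.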
Currently only LZO-related packages are provided.
Cloudera does not provide a ‘latest’ link for GPL Extras, so I maintain one myself.
View files for CDH5
View files for CDH4
Cloudera Impala is the industry’s leading massively parallel processing (MPP) SQL query engine that runs natively in Apache Hadoop™. The Apache-licensed, open source Impala project combines modern, scalable parallel database technology with the power of Hadoop, enabling users to directly query data stored in HDFS and Apache HBase™ without requiring data movement or transformation. Impala is designed from the ground up as part of the Hadoop ecosystem and shares the same flexible file and data formats, metadata, security and resource management frameworks used by MapReduce, Apache Hive™, Apache Pig™ and other components of the Hadoop stack.
Impala is included in CDH5. Use this parcel repo only if you are using CDH4.
Cloudera Search brings full-text, interactive search and scalable indexing to Apache Hadoop. Powered by Apache Solr, the enterprise standard for open source search, Cloudera Search brings scale and reliability for a new generation of big data search. And because it is integrated with CDH, Cloudera Search gains the same fault tolerance, scale, visibility, and flexibility provided to other Hadoop workloads.
Solr/Search is included in CDH5. Use this parcel repo only if you are using CDH4.
Apache Sentry (incubating) is the next step in enterprise-grade big data security and delivers fine-grained authorization to data stored in Apache Hadoop. An independent security module that integrates with open source SQL query engines Apache Hive and Cloudera Impala, Sentry delivers advanced authorization controls to enable multi-user applications and cross-functional processes for enterprise data sets.
Apache Spark is an open source, parallel data processing framework that complements Apache Hadoop to make it easy to develop fast, unified Big Data applications combining batch, streaming, and interactive analytics on all your data. In collaboration with Databricks – the company leading the development of Spark – Cloudera offers commercial support for Spark with Cloudera Enterprise.
If you’re trying to install Shark on CDH4, you may need this: Install Spark/Shark on CDH 4.
Apache Spark is included in CDH5. Use this parcel repo only if you are using CDH4.
The Apache Accumulo™ sorted, distributed key/value store is a robust, scalable, high performance data storage and retrieval system. Apache Accumulo is based on Google’s BigTable design and is built on top of Apache Hadoop, Zookeeper, and Thrift. Apache Accumulo features a few novel improvements on the BigTable design in the form of cell-based access control and a server-side programming mechanism that can modify key/value pairs at various points in the data management process. Other notable improvements and features are outlined here.
More about installation:
If you decide to install with the installer above, you can add a line to your /etc/hosts
It may be better to override this hostname on your private DNS forwarder instead.
If you are going to install with yum, run
wget http://cloudera.rst.im/cm5.repo -O /etc/yum.repos.d/cloudera-manager.repo for CM5, or
wget http://cloudera.rst.im/cm4.repo -O /etc/yum.repos.d/cloudera-manager.repo for CM4.
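The two commands differ only in the major version, so they can be wrapped in a small helper. This is a minimal sketch: the function names are hypothetical, only the two repo URLs come from the text above, and the target directory is assumed to be the standard yum location /etc/yum.repos.d.

```shell
#!/bin/sh
# Hypothetical helpers; only the two repo URLs come from this page.

# Print the repo file URL for a Cloudera Manager major version (4 or 5).
cm_repo_url() {
    case "$1" in
        4|5) printf 'http://cloudera.rst.im/cm%s.repo\n' "$1" ;;
        *)   echo "unsupported CM major version: $1" >&2; return 1 ;;
    esac
}

# Download the repo file into a yum configuration directory.
# $1 = CM major version, $2 = target directory (default /etc/yum.repos.d).
install_cm_repo() {
    url=$(cm_repo_url "$1") || return 1
    wget "$url" -O "${2:-/etc/yum.repos.d}/cloudera-manager.repo"
}
```

Run install_cm_repo 5 as root for CM5, or install_cm_repo 4 for CM4. Passing a second argument such as /tmp lets you inspect the downloaded file before putting it in place.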
It’s always best to sync the mirror to your local server, serve it with nginx, and resolve archive.cloudera.com to your local yum server.
Scripts are provided on github: https://github.com/sskaje/cloudera_mirror.
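If you resolve archive.cloudera.com through /etc/hosts rather than DNS, the entry can be added idempotently. The sketch below is an assumption-laden example: 192.0.2.10 is a documentation-only placeholder for your own server's address, and the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers. 192.0.2.10 is a placeholder; use your own mirror's IP.
MIRROR_IP="${MIRROR_IP:-192.0.2.10}"
# Override HOSTS_FILE to try this on a scratch copy first.
HOSTS_FILE="${HOSTS_FILE:-/etc/hosts}"

# Print the hosts line that points archive.cloudera.com at the given IP.
hosts_entry() {
    printf '%s archive.cloudera.com\n' "$1"
}

# Append the line to file $2 unless the hostname is already mentioned there.
add_hosts_entry() {
    if grep -q 'archive\.cloudera\.com' "$2" 2>/dev/null; then
        return 0
    fi
    hosts_entry "$1" >> "$2"
}
```

As root, add_hosts_entry "$MIRROR_IP" "$HOSTS_FILE" appends the line once; running it again leaves the file unchanged. An existing entry with a different address is not rewritten, so check the file by hand if the mirror moves.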