Cloudera Mirror

Currently available for CentOS/RHEL 6 x86_64.
Address:
http://cloudera.rst.im/
rsync://cloudera.rst.im/cloudera/

Cloudera Manager

Cloudera Manager 5.0.2

Cloudera Manager 5: installer for CM5, yum repo file for CM5

Cloudera Manager 4.8.3

Cloudera Manager 4: installer for CM4, yum repo file for CM4

Cloudera Distribution of Hadoop

CDH’s official product page: http://www.cloudera.com/content/cloudera/en/products-and-services/cdh.html
Parcels are my preferred way to install CDH from Cloudera Manager.
To add a new parcel repository, go to Cloudera Manager Admin Console => Administration => Settings => Parcels,
and add one of the URLs below to ‘Remote Parcel Repository URLs’.
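
Before adding a URL, it can help to confirm that the repository is reachable. The sketch below assumes the repository serves a manifest.json file next to its parcels (the usual layout for a Cloudera parcel repository) and that curl is installed. The function names are made up for this example.

```shell
#!/bin/sh
# Build the manifest URL for a parcel repository URL (trailing slash optional).
manifest_url() {
    printf '%s/manifest.json\n' "${1%/}"
}

# Return 0 if the repository's manifest.json can be fetched.
# Assumption: the repo publishes manifest.json at its top level.
check_parcel_repo() {
    curl -fsS -o /dev/null "$(manifest_url "$1")"
}

# Usage (needs network access):
#   check_parcel_repo http://cloudera.rst.im/cdh5/parcels/latest/ && echo "repo OK"
```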

CDH 5

CDH 5: http://cloudera.rst.im/cdh5/parcels/latest/

CDH 4

CDH 4: http://cloudera.rst.im/cdh4/parcels/latest/

GPL Extras

Currently only LZO-related packages.
Cloudera does not provide a ‘latest’ link for GPL Extras, so I maintain it myself.
URL for CDH5: http://cloudera.rst.im/gplextras5/parcels/latest/
URL for CDH4: http://cloudera.rst.im/gplextras/parcels/latest/

Cloudera Impala

Cloudera Impala is the industry’s leading massively parallel processing (MPP) SQL query engine that runs natively in Apache Hadoop™. The Apache-licensed, open source Impala project combines modern, scalable parallel database technology with the power of Hadoop, enabling users to directly query data stored in HDFS and Apache HBase™ without requiring data movement or transformation. Impala is designed from the ground up as part of the Hadoop ecosystem and shares the same flexible file and data formats, metadata, security and resource management frameworks used by MapReduce, Apache Hive™, Apache Pig™ and other components of the Hadoop stack.

URL: http://cloudera.rst.im/impala/parcels/latest/
Impala is included in CDH5. Use this parcel repo only if you are using CDH4.

Cloudera Search

Cloudera Search brings full-text, interactive search and scalable indexing to Apache Hadoop. Powered by Apache Solr, the enterprise standard for open source search, Cloudera Search brings scale and reliability for a new generation of big data search. And because it is integrated with CDH, Cloudera Search gains the same fault tolerance, scale, visibility, and flexibility provided to other Hadoop workloads.

URL: http://cloudera.rst.im/search/parcels/latest/
Solr/Search is included in CDH5. Use this parcel repo only if you are using CDH4.

Apache Sentry

Apache Sentry (incubating) is the next step in enterprise-grade big data security and delivers fine-grained authorization to data stored in Apache Hadoop. An independent security module that integrates with open source SQL query engines Apache Hive and Cloudera Impala, Sentry delivers advanced authorization controls to enable multi-user applications and cross-functional processes for enterprise data sets.

URL: http://cloudera.rst.im/sentry/parcels/latest/

Apache Spark

Apache Spark is an open source, parallel data processing framework that complements Apache Hadoop to make it easy to develop fast, unified Big Data applications combining batch, streaming, and interactive analytics on all your data. In collaboration with Databricks – the company leading the development of Spark – Cloudera offers commercial support for Spark with Cloudera Enterprise.

URL: http://cloudera.rst.im/spark/parcels/latest/
If you’re trying to install Shark on CDH4, you may need this: Install Spark/Shark on CDH 4.
Apache Spark is included in CDH5. Use this parcel repo only if you are using CDH4.

Apache Accumulo

The Apache Accumulo™ sorted, distributed key/value store is a robust, scalable, high performance data storage and retrieval system. Apache Accumulo is based on Google’s BigTable design and is built on top of Apache Hadoop, Zookeeper, and Thrift. Apache Accumulo features a few novel improvements on the BigTable design in the form of cell-based access control and a server-side programming mechanism that can modify key/value pairs at various points in the data management process. Other notable improvements and features are outlined here.

URL: http://cloudera.rst.im/accumulo/parcels/latest/

Install/Deploy

A few more notes on installation:
If you decide to install with one of the installers above, add this line to your /etc/hosts so that archive.cloudera.com resolves to the mirror:
106.186.27.96 archive.cloudera.com
Alternatively, it may be better to override this name on your private DNS forwarder.
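
For a scripted setup, the hosts entry can be added idempotently. This is a minimal sketch, not part of the mirror's own scripts. add_host_entry is a hypothetical helper, and its optional third argument (an alternative hosts file) exists only so it can be tried without touching /etc/hosts.

```shell
#!/bin/sh
# Append "IP NAME" to a hosts file unless NAME is already mapped there.
# The third argument defaults to /etc/hosts; pass another file for a dry run.
add_host_entry() {
    ip=$1
    name=$2
    file=${3:-/etc/hosts}
    # Look for NAME as a whole field (column 2 onward) on any non-comment line.
    if awk -v n="$name" '
        !/^[ \t]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$file"; then
        return 0    # already present, nothing to do
    fi
    printf '%s %s\n' "$ip" "$name" >> "$file"
}

# Usage (as root):
#   add_host_entry 106.186.27.96 archive.cloudera.com
```

Running it twice leaves a single entry, which makes it safe to call from a provisioning script.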

If you are installing with yum, run
wget http://cloudera.rst.im/cm5.repo -O /etc/yum.repos.d/cloudera-manager.repo for CM5, or
wget http://cloudera.rst.im/cm4.repo -O /etc/yum.repos.d/cloudera-manager.repo for CM4.
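
The same step can be wrapped in a small script. This is only a sketch: the function names are hypothetical, and it writes to /etc/yum.repos.d/, the directory yum reads repo files from on CentOS/RHEL.

```shell
#!/bin/sh
# Map a Cloudera Manager major version (4 or 5) to the mirror's repo file URL.
cm_repo_url() {
    case $1 in
        4|5) printf 'http://cloudera.rst.im/cm%s.repo\n' "$1" ;;
        *)   echo "unsupported CM version: $1" >&2; return 1 ;;
    esac
}

# Download the repo file into yum's configuration directory.
install_cm_repo() {
    url=$(cm_repo_url "$1") || return 1
    wget "$url" -O /etc/yum.repos.d/cloudera-manager.repo
}

# Usage (as root):
#   install_cm_repo 5 && yum clean all && yum makecache
```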

The best approach is to sync the mirror to a local server, serve it with nginx, and resolve archive.cloudera.com to that local yum server.
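
A local copy can be pulled from the rsync address given at the top of this page. The sketch below assumes /var/www/cloudera as the local directory (a placeholder; use whatever directory your nginx serves), and the DRY_RUN variable is an invention of this example. Note that --delete removes local files that are no longer on the mirror, so point it only at a dedicated directory.

```shell
#!/bin/sh
# Sync the mirror into a local directory (default: /var/www/cloudera, a placeholder).
# With DRY_RUN=1 the rsync command is printed instead of executed.
sync_mirror() {
    dest=${1:-/var/www/cloudera}
    # Reuse the positional parameters to hold the command and its arguments.
    set -- rsync -av --delete rsync://cloudera.rst.im/cloudera/ "$dest/"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"
    else
        mkdir -p "$dest" && "$@"
    fi
}

# Usage:
#   DRY_RUN=1 sync_mirror /var/www/cloudera   # show the command only
#   sync_mirror /var/www/cloudera             # run the sync
```

Once the copy is in place and served over HTTP, the /etc/hosts or DNS override described earlier can point archive.cloudera.com at the local server instead of the public mirror.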

Source code

Scripts are available on GitHub: https://github.com/sskaje/cloudera_mirror.

More

Cloudera Archive Mirror Updated for CM5 & CDH5
Cloudera Archive Mirror for RHEL/CentOS 6

Cloudera Mirror by @sskaje: https://sskaje.me/cloudera-mirror/