Cluster management tools – Big Data

Big Data Application and Resource Managers

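  • Hadoop MapReduce is a distributed data-processing framework that, in Hadoop 1, also acted as the cluster's resource manager. It provides a scheduling infrastructure and a programming model (map and reduce functions) for performing distributed computations.
  • YARN (Yet Another Resource Negotiator) is Hadoop's distributed resource manager, often described as a data operating system. It is an evolution of MapReduce: it separates resource management from data processing so that other engines can share the cluster. It runs on Linux and Windows.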
  • Standalone is the simple distributed resource manager bundled with Apache Spark. It runs on Linux, Mac, and Windows.
  • Apache Mesos is a distributed resource manager that pools a cluster's CPU, memory, and storage for the frameworks running on it. It runs on Linux and Mac.
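
To make the MapReduce model concrete, here is a minimal single-process sketch of a word count in Python. It only imitates the map, shuffle, and reduce phases in memory; the function names are illustrative assumptions rather than Hadoop's Java API, and in a real job the framework would distribute these phases across the cluster through the resource manager.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one input record.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group intermediate values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine all values seen for one key.
    return key, sum(values)


def word_count(lines):
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())
```

Calling `word_count(["the cat", "The dog"])` returns `{'the': 2, 'cat': 1, 'dog': 1}`.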

Big Data Cluster Access Interfaces

  • Apache Ambari is the cluster access interface for Hortonworks, IBM, Azure, and Pivotal.
  • Ganglia is the cluster access interface for the Amazon, IBM, and Pivotal platforms.
  • Nagios is the cluster access interface for IBM and Pivotal.
  • Cloudera Manager is the Cloudera cluster access interface.
  • Apache Hue provides a graphical, browser-based interface for running your Hive work in a simple way.
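
Ambari also exposes a REST API behind its web interface. The Python sketch below assumes an Ambari server at a hypothetical host on the common default port 8080 with the default `admin`/`admin` credentials; `ambari_request` and `list_clusters` are illustrative helpers, and the JSON shape read in `list_clusters` is an assumption to check against your Ambari version.

```python
import base64
import json
import urllib.request


def ambari_request(host, path="/api/v1/clusters", port=8080, user="admin", password="admin"):
    """Build (but do not send) an authenticated GET request for the Ambari REST API."""
    url = f"http://{host}:{port}{path}"
    request = urllib.request.Request(url, method="GET")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    # Ambari expects this header on modifying calls; it is harmless on a GET.
    request.add_header("X-Requested-By", "ambari")
    return request


def list_clusters(host, **kwargs):
    """Return cluster names; needs a reachable Ambari server (not exercised here)."""
    with urllib.request.urlopen(ambari_request(host, **kwargs), timeout=10) as response:
        body = json.load(response)
    # Assumed response shape: {"items": [{"Clusters": {"cluster_name": ...}}, ...]}
    return [item["Clusters"]["cluster_name"] for item in body.get("items", [])]
```

Separating request construction from sending keeps the URL and authentication logic easy to inspect without a live server.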

Big Data Cluster Management

  • Apache ZooKeeper is a coordination service for the cluster, handling synchronization between distributed processes along with configuration and naming.
  • Apache Whirr provisions Hadoop in the cloud: you can launch a cluster in a few minutes with a very simple configuration file.
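
ZooKeeper's synchronization is usually built from small recipes over its tree of znodes. The sketch below simulates the well-known lock recipe in memory: each client creates an ephemeral sequential znode under a lock path, the client owning the lowest sequence number holds the lock, and a client's znodes disappear when its session ends. `ToyZNodeTree` and `lock_holder` are hypothetical stand-ins, not the ZooKeeper client API, and the toy omits watches and networking.

```python
class ToyZNodeTree:
    """In-memory stand-in for a ZooKeeper ensemble: no networking, no watches."""

    def __init__(self):
        self._next_seq = {}  # parent path -> next sequence number
        self._owners = {}    # parent path -> {child name: owning session id}

    def create_ephemeral_sequential(self, parent, prefix, session):
        seq = self._next_seq.get(parent, 0)
        self._next_seq[parent] = seq + 1
        name = f"{prefix}{seq:010d}"  # ZooKeeper appends a zero-padded 10-digit counter
        self._owners.setdefault(parent, {})[name] = session
        return f"{parent}/{name}"

    def get_children(self, parent):
        return sorted(self._owners.get(parent, {}))

    def owner(self, parent, name):
        return self._owners[parent][name]

    def close_session(self, session):
        # Ephemeral znodes are deleted when the session that created them ends.
        for children in self._owners.values():
            for name in [n for n, s in children.items() if s == session]:
                del children[name]


def lock_holder(tree, parent):
    """Lock recipe: whoever owns the lowest-numbered child holds the lock."""
    children = tree.get_children(parent)
    return tree.owner(parent, children[0]) if children else None
```

Because the lock znode is ephemeral, a crashed client releases the lock automatically when its session expires, which is why the recipe relies on ephemeral rather than persistent znodes.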

Big Data Workflows

  • Oozie is a workflow manager that lets you define when MapReduce jobs run: on a programmed schedule or when new data becomes available.
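
The two triggers Oozie combines, a schedule and the arrival of input data, can be sketched as a toy Python coordinator. This is not Oozie's configuration (real coordinators are declared in XML); it assumes hypothetical hourly input partitions laid out as `<root>/YYYY/MM/DD/HH` and treats a partition as complete once a marker file such as `_SUCCESS` exists. The layout and function names are illustrative assumptions.

```python
import os
import tempfile
from datetime import datetime, timedelta


def nominal_times(start, end, frequency):
    """Yield the coordinator's scheduled (nominal) times from start to end inclusive."""
    t = start
    while t <= end:
        yield t
        t += frequency


def due_actions(start, now, frequency, input_root, done_flag="_SUCCESS", already_run=()):
    """Return nominal times that have arrived and whose input partition is complete."""
    due = []
    for t in nominal_times(start, now, frequency):
        if t in already_run:
            continue
        # Hypothetical layout: one directory per hour, e.g. <root>/2024/01/01/00
        partition = os.path.join(input_root, t.strftime("%Y/%m/%d/%H"))
        if os.path.exists(os.path.join(partition, done_flag)):
            due.append(t)
    return due


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for hour in (0, 2):  # hours 0 and 2 are complete; hour 1 has not arrived
            partition = os.path.join(root, f"2024/01/01/{hour:02d}")
            os.makedirs(partition)
            open(os.path.join(partition, "_SUCCESS"), "w").close()
        print(due_actions(datetime(2024, 1, 1, 0), datetime(2024, 1, 1, 3),
                          timedelta(hours=1), root))
```

Running the script reports only the nominal times for hours 0 and 2 as due; hour 1 waits until its data arrives, and hour 3 has no input yet.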

Cascading creates and executes data-processing workflows on Hadoop clusters using any JVM-based language (JVM: Java Virtual Machine). Its goal is to hide the complexity of writing MapReduce jobs directly. It is widely used in complex domains such as bioinformatics, machine learning algorithms, predictive analytics, web mining, and ETL tools.
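
Cascading describes a workflow as a pipe assembly between a source and a sink, built from pipes such as Each (per-tuple functions or filters), GroupBy, and Every (per-group aggregators). The Python sketch below only mimics that shape with generators; `each`, `group_by`, `every`, `parse_error`, and the log format are hypothetical, and real Cascading flows are written against its Java API and planned into MapReduce jobs.

```python
from itertools import groupby


def each(tuples, fn):
    """Like an Each pipe: apply fn to every tuple; returning None drops the tuple."""
    for t in tuples:
        result = fn(t)
        if result is not None:
            yield result


def group_by(tuples, key):
    """Like GroupBy: sort on the key, then emit (key, group) pairs."""
    for k, group in groupby(sorted(tuples, key=key), key=key):
        yield k, list(group)


def every(groups, aggregate):
    """Like Every: apply an aggregator to each group."""
    for k, group in groups:
        yield k, aggregate(group)


def parse_error(line):
    # Hypothetical log format "<method> <path> <status>"; keep only 4xx/5xx responses.
    _method, path, status = line.split()
    return (path, int(status)) if int(status) >= 400 else None


def errors_per_path(lines):
    # Source -> Each(parse and filter) -> GroupBy(path) -> Every(count) -> sink (a dict here)
    pipe = each(lines, parse_error)
    pipe = group_by(pipe, key=lambda t: t[0])
    return dict(every(pipe, len))
```

For example, `errors_per_path(["GET /a 200", "GET /b 404", "POST /b 500", "GET /c 503"])` returns `{'/b': 2, '/c': 1}`. Composing small, reusable pipes this way is what lets a workflow grow without hand-writing each underlying MapReduce job.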