Big data processing frameworks

The data processing frameworks of the Big Data ecosystem are classified into the following blocks. Batch processing: Hadoop MapReduce, a batch processing engine. Real-time processing: Apache Storm, Apache Samza, IBM InfoSphere, Apache S4 (Yahoo), Apache complexion. Hybrid...
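To make the batch model concrete, the sketch below runs the map, shuffle and reduce steps of a word count inside a single Python process. It does not use Hadoop's API. The function names are hypothetical, and Hadoop would distribute these same steps across a cluster and read its input from HDFS. What the sketch shows is that a batch job returns results only after its whole, bounded input has been processed.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(record: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group every intermediate value under its key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values collected for a single key."""
    return key, sum(values)


def run_batch_job(records: list[str]) -> dict[str, int]:
    """Run map -> shuffle -> reduce over a complete, bounded input."""
    intermediate = (pair for record in records for pair in map_phase(record))
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_batch_job(["big data batch", "batch processing of big data"]))
    # {'big': 2, 'data': 2, 'batch': 2, 'processing': 1, 'of': 1}
```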

Apache Storm

Storm definition

Apache Storm is a low-latency, high-availability, real-time distributed computing system based on a master-slave architecture. Storm is ideal for data that must be analyzed in real time, where latency has to be taken into account, an...
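Storm applications are organized as topologies. A topology is a graph of spouts, which emit streams of tuples, and bolts, which process those tuples. Storm topologies are normally written against its Java API, so the following is only a minimal Python sketch of that dataflow built from generators. `sensor_spout`, `threshold_bolt`, `latency_bolt` and `run_topology` are hypothetical names. The master-slave cluster, parallelism and fault tolerance are not modeled. Unlike a batch job, this pipeline releases each tuple as soon as it has been processed, and that is what keeps latency low.

```python
import time
from typing import Iterable, Iterator


def sensor_spout(readings: Iterable[float]) -> Iterator[dict]:
    """Hypothetical spout: turns a (possibly unbounded) source into tuples."""
    for value in readings:
        yield {"value": value, "emitted_at": time.monotonic()}


def threshold_bolt(stream: Iterator[dict], limit: float) -> Iterator[dict]:
    """Hypothetical bolt: flags each tuple as soon as it arrives."""
    for tup in stream:
        tup["alert"] = tup["value"] > limit
        yield tup


def latency_bolt(stream: Iterator[dict]) -> Iterator[dict]:
    """Hypothetical bolt: records how long each tuple spent in the pipeline."""
    for tup in stream:
        tup["latency_s"] = time.monotonic() - tup["emitted_at"]
        yield tup


def run_topology(readings: Iterable[float], limit: float) -> Iterator[dict]:
    """Wire spout -> bolt -> bolt; results come out one tuple at a time."""
    return latency_bolt(threshold_bolt(sensor_spout(readings), limit))


if __name__ == "__main__":
    for result in run_topology([12.0, 48.5, 30.1], limit=40.0):
        print(result["value"], result["alert"], f"{result['latency_s']:.6f}s")
```

The stages are lazy generators, so the pipeline also works on an unbounded source. `next(run_topology(source, limit))` returns the first result without waiting for the rest of the stream.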

RabbitMQ

RabbitMQ definition

RabbitMQ is a message queuing (MQ) system that lets a multitude of actors communicate in a fast, secure, asynchronous and reliable way. RabbitMQ acts as middleware between message producers and consumers.

Features

Guarantees the...
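A real RabbitMQ application reaches a broker over the network through a client library, for example pika in Python. The sketch below replaces the broker with Python's in-process `queue.Queue`, purely to illustrate the decoupling described above. The producer publishes without knowing who will consume, and consumer threads pick messages up asynchronously. The sketch has none of RabbitMQ's durability, routing or acknowledgement guarantees. The function names `producer`, `consumer` and `run_demo` are hypothetical.

```python
import queue
import threading


def producer(broker: queue.Queue, messages: list[str]) -> None:
    """Publish messages without knowing which consumer will handle them."""
    for body in messages:
        broker.put(body)


def consumer(broker: queue.Queue, received: list[str], name: str) -> None:
    """Handle messages asynchronously until a None sentinel arrives."""
    while True:
        body = broker.get()  # blocks until a message is available
        if body is None:     # sentinel: no more work for this consumer
            break
        received.append(f"{name}:{body}")


def run_demo(messages: list[str], n_consumers: int = 2) -> list[str]:
    """Start consumers, publish messages, then shut the consumers down."""
    broker: queue.Queue = queue.Queue()  # in-process stand-in, NOT a RabbitMQ broker
    received: list[str] = []
    workers = [
        threading.Thread(target=consumer, args=(broker, received, f"c{i}"))
        for i in range(n_consumers)
    ]
    for worker in workers:
        worker.start()
    producer(broker, messages)
    for _ in workers:
        broker.put(None)  # one sentinel per consumer, queued after all messages
    for worker in workers:
        worker.join()
    return received


if __name__ == "__main__":
    for line in run_demo(["order-1", "order-2", "order-3"]):
        print(line)
```

Which consumer receives each message depends on thread scheduling, so the `c0`/`c1` prefixes can change from one run to the next.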

Apache Flume

Flume definition

Apache Flume is a distributed service that reliably and efficiently moves large amounts of data, especially logs. It is ideal for online analytics applications in Hadoop environments. Flume has a simple and flexible architecture based on streaming data,...
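A Flume agent is built from three parts: a source, a channel that buffers events, and a sink that delivers them. The agent is configured through a properties file rather than written as code. The following minimal Python sketch only mimics that event flow, under simplifying assumptions. `BufferChannel`, `BatchSink`, `log_source` and `run_agent` are hypothetical names, not Flume classes. Everything runs in one process, and Flume's transactional hand-off between channel and sink is not modeled.

```python
from collections import deque
from typing import Iterable, Iterator, Optional


class BufferChannel:
    """Bounded FIFO buffer between source and sink (stand-in for a Flume channel)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._events: deque = deque()

    def __len__(self) -> int:
        return len(self._events)

    def put(self, event: str) -> bool:
        """Accept an event, or refuse it (return False) when the buffer is full."""
        if len(self._events) >= self.capacity:
            return False
        self._events.append(event)
        return True

    def take(self) -> Optional[str]:
        """Remove and return the oldest event, or None when the buffer is empty."""
        return self._events.popleft() if self._events else None


def log_source(lines: Iterable[str]) -> Iterator[str]:
    """Source: turn raw log lines into events, skipping blank lines."""
    for line in lines:
        if line.strip():
            yield line.strip()


class BatchSink:
    """Sink: drains the channel and 'writes' events in small batches."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        self.written: list[list[str]] = []  # stands in for files in HDFS

    def process(self, channel: BufferChannel) -> None:
        """Take up to batch_size events from the channel and write them as one batch."""
        batch: list[str] = []
        while len(batch) < self.batch_size:
            event = channel.take()
            if event is None:
                break
            batch.append(event)
        if batch:
            self.written.append(batch)


def run_agent(lines: Iterable[str], capacity: int = 10, batch_size: int = 2) -> list[list[str]]:
    """Stream events source -> channel -> sink (capacity and batch_size must be >= 1)."""
    channel = BufferChannel(capacity)
    sink = BatchSink(batch_size)
    for event in log_source(lines):
        if not channel.put(event):
            sink.process(channel)  # channel full: let the sink free up space
            channel.put(event)
        if len(channel) >= batch_size:
            sink.process(channel)
    while len(channel):  # flush whatever is still buffered
        sink.process(channel)
    return sink.written


if __name__ == "__main__":
    print(run_agent(["GET /a", "", "GET /b", "GET /c"]))
```

The channel decouples the rate at which the source produces events from the rate at which the sink writes them. When the channel fills up, the source has to wait for the sink.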

Temporal Evolution of Big Data

Timeline

2003 – Google File System
2004 – MapReduce: Simplified Data Processing on Large Clusters
2005 – Doug Cutting starts developing Hadoop
2006 – Yahoo starts working on Hadoop
2008 – Hadoop...

Apache HBase

HBase definition

HBase is a column-oriented database management system that runs on top of HDFS and is typically used for sparse data sets. Unlike relational database management systems, HBase does not support a structured query language such as SQL. The system provides...
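HBase is accessed by row key rather than through SQL. Its data model is easiest to picture as nested maps: row key, then column family, then column qualifier, then value. The toy class below is a hypothetical in-memory sketch of that model, not the HBase client API. Its method names echo the HBase shell's `get`, `put` and `scan` commands, but the signatures are simplified, and cell versions/timestamps, regions and HDFS storage are ignored. It shows why sparse data fits well. A row stores only the cells that were written, and a scan returns rows in sorted row-key order, with an inclusive start key and an exclusive stop key.

```python
from typing import Iterator, Optional


class ToyColumnStore:
    """Toy model of HBase's data model: row key -> column family -> qualifier -> value.

    Only cells that were actually written are stored, so a sparse row pays
    nothing for the columns it does not have. Not the HBase client API.
    """

    def __init__(self, families: set[str]) -> None:
        self.families = families  # column families are declared up front
        self._rows: dict[str, dict[str, dict[str, str]]] = {}

    def put(self, row: str, family: str, qualifier: str, value: str) -> None:
        """Write one cell; qualifiers may be new, column families may not."""
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self._rows.setdefault(row, {}).setdefault(family, {})[qualifier] = value

    def get(self, row: str, family: str, qualifier: str) -> Optional[str]:
        """Look up a single cell by row key; missing cells return None."""
        return self._rows.get(row, {}).get(family, {}).get(qualifier)

    def scan(self, start_row: str, stop_row: str) -> Iterator[tuple[str, dict]]:
        """Yield rows in sorted row-key order, start inclusive, stop exclusive."""
        for key in sorted(self._rows):
            if start_row <= key < stop_row:
                yield key, self._rows[key]

    def stored_cells(self) -> int:
        """Count the cells actually stored (absent columns take no space)."""
        return sum(len(cells) for fams in self._rows.values() for cells in fams.values())


if __name__ == "__main__":
    users = ToyColumnStore({"info", "stats"})
    users.put("user#001", "info", "name", "Alan")
    users.put("user#001", "stats", "logins", "7")
    users.put("user#002", "info", "name", "Ada")  # no stats cells at all
    print(users.get("user#002", "stats", "logins"))  # None
    print(users.stored_cells())  # 3
```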