Apache Hadoop YARN

Yarn definition Yarn (Yet Another Resource negotiator) is a data operating system and distributed Resource Manager, also known as Hadoop 2 as it is the evolution of Hadoop Map-Reduce. The most significant changes of Hadoop 2 over Hadoop 1 is that the thread technology is included, this technology provides an effective allocation of resources, for…

Read More »

Kerberos

Kerberos definition Kerberos is an authentication protocol that allows two computers to demonstrate their identity mutually in a secure way. Implemented on a client server architecture and works on the basis of tickets that serve to demonstrate the identity of the users. Authentication between two computers is carried out using a trusted third party called…

Read More »

Apache Flink (batch & streaming processing)

Flink definition Apache Flink is a native low-latency data flow processing engine that provides communication and fault tolerance data distribution capabilities. Flink was developed in Java and Scala by the Technical University of Berlin and is currently the start-up Data artisans which is responsible for supporting and making improvements. The applications that obtain the best…

Read More »

Big data processing frameworks

The Big data ecosystems data processing frameworks are classified in the following blocks: Batch Processing Hadoop Map-Reduce: Batch or batch processing engine. Real-time processing Apache Storm Apache Samza IBM InfoSphere Apache S4 (Yahoo) Apache complexion Hybrid processing Apache Spark streaming: Batch processing engine with streaming functions via micro-batches. Uses a lambda architecture  Apache Flink: Streaming…

Read More »

RabbitMQ

RabbitMQ definition RabbitMQ is an MQ Message Queuing system that allows you to communicate to a multitude of actors in a fast, secure, asynchronous and reliable way. RabbitMQ acts as a middleware between producers and consumers of messages. Features Guarantees the delivery and order of the messages that are consumed, respecting the order of arrival…

Read More »

Apache Hadoop

Hadoop definition Apache Hadoop is a distributed system that allows to carry out processing of large volumes of data through cluster, easy to scale. Broadly speaking, it can be said that Hadoop is composed by two parts: Data storage of different types (HDFS) Performs data processing tasks in a distributed way (MapReduce). Hadoop is based…

Read More »