RDD definition

by | Jun 27, 2018 | Apache Spark | 0 comments

An RDD (Resilient Distributed Dataset) represents an immutable, partitioned collection of elements that can be operated on in parallel.

An RDD can be created either by parallelizing an existing collection of data (a list, a dictionary, ...) or by loading it from an external storage system, such as a shared file system, HDFS, HBase, or any data source that offers a Hadoop InputFormat.

