Scala Dataset

Creating datasets

Simple RDD to Dataset

Example of creating a Dataset from an RDD

import spark.implicits._  // supplies the Encoder[Int] that createDataset needs

val rdd = sc.parallelize(List(1, 2, 3, 4, 5))
val ds = spark.createDataset(rdd)
ds.show()
+-----+
|value|
+-----+
|    1|
|    2|
|    3|
|    4|
|    5|
+-----+
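
The snippet above relies on `spark` and `sc` already being defined, as they are in spark-shell or a notebook. Outside those environments they have to be created first. The following is a minimal sketch of that setup, assuming local mode; the application name is a hypothetical placeholder, and the code is meant to be pasted into a Scala script or REPL with Spark on the classpath.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: local mode. In spark-shell or a notebook, `spark` and `sc`
// already exist and this setup step is not needed.
val spark = SparkSession.builder()
  .appName("dataset-examples")  // hypothetical name, pick your own
  .master("local[*]")
  .getOrCreate()
val sc = spark.sparkContext

import spark.implicits._  // encoders used by createDataset and toDS

val ds = spark.createDataset(sc.parallelize(List(1, 2, 3, 4, 5)))
ds.printSchema()  // the schema has a single column named "value"
```

A Dataset of a primitive type always gets a single column called `value`, which is why the table printed by `show()` has that header.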

Classes to Dataset

An example of creating a Dataset from instances of a case class that holds the data.

import spark.implicits._

case class Person(name: String, surname: String, age: Integer, salary: Integer)

val person1 = Person("Peter","Garcia",24,24000)
val person2 = Person("Juan","Garcia",26,27000)
val person3 = Person("Lola","Martin",29,31000)
val person4 = Person("Sara","Garcia",35,34000)

val data = Seq(person1,person2,person3,person4)

val ds = spark.createDataset(data)
ds.show()
+-----+-------+---+------+
| name|surname|age|salary|
+-----+-------+---+------+
|Peter| Garcia| 24| 24000|
| Juan| Garcia| 26| 27000|
| Lola| Martin| 29| 31000|
| Sara| Garcia| 35| 34000|
+-----+-------+---+------+
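
Because the Dataset is typed as `Dataset[Person]`, its rows can be processed with ordinary Scala lambdas over the case class fields. The sketch below illustrates this under a few assumptions: it redeclares `Person` and builds its own local `SparkSession` so that it stands alone, the application name is hypothetical, and it uses `toDS()` on the `Seq` as an alternative to `spark.createDataset`.

```scala
import org.apache.spark.sql.SparkSession

// Same case class as in the example above, repeated so the sketch stands alone.
case class Person(name: String, surname: String, age: Integer, salary: Integer)

// Assumption: local mode with a hypothetical application name.
val spark = SparkSession.builder().appName("typed-ops").master("local[*]").getOrCreate()
import spark.implicits._

val people = Seq(
  Person("Peter", "Garcia", 24, 24000),
  Person("Juan", "Garcia", 26, 27000),
  Person("Lola", "Martin", 29, 31000),
  Person("Sara", "Garcia", 35, 34000)
).toDS()  // alternative to spark.createDataset(...)

// Fields are accessed as plain Scala members and checked at compile time.
val garciaNames = people.filter(_.surname == "Garcia").map(_.name)
garciaNames.show()
```

A misspelled field name in the lambda is rejected by the compiler, whereas a misspelled column name in a string expression only fails when the query is analyzed at run time.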

 

Transforming RDD to Dataset

Example of converting an RDD of tuples to a Dataset with toDS()

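import spark.implicits._  // adds toDS() to RDDs

val rdd = sc.parallelize(Seq(("Paco","Garcia",24,24000),("Juan","Garcia",26,27000),("Lola","Martin",29,31000),("Sara","Garcia",35,34000)))
val ds = rdd.toDS()  // tuple fields become columns _1, _2, _3, _4
display(ds)  // Databricks notebook helper; use ds.show() elsewhere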
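
A Dataset built from tuples has the generic column names `_1` to `_4`. The sketch below shows two possible ways to end up with named, typed columns instead. It assumes the `Person` case class from the earlier section (redeclared here so the code stands alone), a local `SparkSession` with a hypothetical application name, and a shortened input list.

```scala
import org.apache.spark.sql.SparkSession

// Case class from the earlier section, repeated so the sketch stands alone.
case class Person(name: String, surname: String, age: Integer, salary: Integer)

// Assumption: local mode with a hypothetical application name.
val spark = SparkSession.builder().appName("rdd-to-typed-ds").master("local[*]").getOrCreate()
import spark.implicits._

val rdd = spark.sparkContext.parallelize(Seq(
  ("Paco", "Garcia", 24, 24000),
  ("Juan", "Garcia", 26, 27000)
))

// Option 1: name the tuple columns, then bind them to the case class.
val viaColumns = rdd.toDF("name", "surname", "age", "salary").as[Person]

// Option 2: map each tuple to a Person before converting.
val viaMap = rdd.map { case (n, s, a, sal) => Person(n, s, a, sal) }.toDS()

viaColumns.printSchema()
```

Both options produce a `Dataset[Person]`. The first keeps the conversion inside Spark SQL, while the second does it in plain Scala on the RDD.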