Creating datasets
Simple RDD to Dataset
Example of creating a Dataset from an RDD of integers.
import spark.implicits._ // provides the Encoder[Int] that createDataset needs
val rdd = sc.parallelize(List(1, 2, 3, 4, 5))
val ds = spark.createDataset(rdd)
ds.show()
+-----+
|value|
+-----+
|    1|
|    2|
|    3|
|    4|
|    5|
+-----+
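The snippet above relies on the sc and spark values that the Spark shell and most notebooks predefine. Outside such an environment, a minimal standalone sketch might look like the following. The object name, the helper intDataset, the application name and the local[*] master are assumptions made for illustration only.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical object name, chosen only for this sketch.
object RddToDatasetSketch {

  // Builds the same Dataset[Int] as the example above from an explicit session.
  def intDataset(spark: SparkSession): Dataset[Int] = {
    // Brings the implicit Encoder[Int] into scope; createDataset needs it.
    import spark.implicits._
    val rdd = spark.sparkContext.parallelize(List(1, 2, 3, 4, 5))
    spark.createDataset(rdd)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session that uses all available cores.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("rdd-to-dataset-sketch")
      .getOrCreate()

    val ds = intDataset(spark)
    ds.printSchema() // a single column named "value"
    ds.show()

    spark.stop()
  }
}
```

Keeping the session as a parameter of the helper makes it easy to reuse the same code from a notebook, where a session already exists.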
Classes to Dataset
An example of creating a Dataset from instances of a case class that holds the data.
import spark.implicits._
case class Person(name: String, surname: String, age: Int, salary: Int)
val person1 = Person("Peter","Garcia",24,24000)
val person2 = Person("Juan","Garcia",26,27000)
val person3 = Person("Lola","Martin",29,31000)
val person4 = Person("Sara","Garcia",35,34000)
val data = Seq(person1,person2,person3,person4)
val ds = spark.createDataset(data)
ds.show()
+-----+-------+---+------+
| name|surname|age|salary|
+-----+-------+---+------+
|Peter| Garcia| 24| 24000|
| Juan| Garcia| 26| 27000|
| Lola| Martin| 29| 31000|
| Sara| Garcia| 35| 34000|
+-----+-------+---+------+
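Because the Dataset above is typed with a case class, its rows can be processed with ordinary Scala lambdas that access fields by name. The following is a minimal sketch of that idea and not part of the original example: the Worker case class (a stand-in for Person), the object name and the helper salariesFor are hypothetical, and the session is assumed to run locally.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical stand-in for the Person class above.
case class Worker(name: String, surname: String, age: Int, salary: Int)

object TypedDatasetSketch {

  // Returns the salaries of every worker with the given surname.
  def salariesFor(spark: SparkSession, surname: String): Array[Int] = {
    import spark.implicits._ // encoders for Worker and Int
    val ds: Dataset[Worker] = spark.createDataset(Seq(
      Worker("Peter", "Garcia", 24, 24000),
      Worker("Juan", "Garcia", 26, 27000),
      Worker("Lola", "Martin", 29, 31000),
      Worker("Sara", "Garcia", 35, 34000)
    ))
    // Field access is checked by the compiler: a misspelled field would not compile.
    ds.filter(_.surname == surname).map(_.salary).collect()
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session that uses all available cores.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("typed-dataset-sketch")
      .getOrCreate()

    println(salariesFor(spark, "Garcia").mkString(", "))

    spark.stop()
  }
}
```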
Transforming RDD to Dataset
Example of converting an RDD of tuples to a Dataset with toDS(), which requires spark.implicits._ to be in scope.
val rdd = sc.parallelize(Seq(
  ("Paco", "Garcia", 24, 24000),
  ("Juan", "Garcia", 26, 27000),
  ("Lola", "Martin", 29, 31000),
  ("Sara", "Garcia", 35, 34000)
))
val ds = rdd.toDS()
display(ds) // display() is a notebook helper (for example in Databricks); elsewhere use ds.show()
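A Dataset built from an RDD of tuples gets the positional column names _1, _2, _3 and _4 rather than descriptive ones. One way to recover named, typed fields is to map each tuple to a case class before calling toDS(). The sketch below illustrates this under stated assumptions: the Employee case class, the object name and the helpers tupleRdd and asEmployees are hypothetical, and the session is assumed to run locally.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical case class that gives each tuple position a name.
case class Employee(name: String, surname: String, age: Int, salary: Int)

object TupleRddSketch {

  // The same tuple data as in the example above.
  def tupleRdd(spark: SparkSession): RDD[(String, String, Int, Int)] =
    spark.sparkContext.parallelize(Seq(
      ("Paco", "Garcia", 24, 24000),
      ("Juan", "Garcia", 26, 27000),
      ("Lola", "Martin", 29, 31000),
      ("Sara", "Garcia", 35, 34000)
    ))

  // Maps every tuple to an Employee, then converts the RDD to a Dataset.
  def asEmployees(spark: SparkSession): Dataset[Employee] = {
    import spark.implicits._ // adds toDS() to RDDs and supplies the encoder
    tupleRdd(spark)
      .map { case (name, surname, age, salary) => Employee(name, surname, age, salary) }
      .toDS()
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session that uses all available cores.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("tuple-rdd-sketch")
      .getOrCreate()
    import spark.implicits._

    tupleRdd(spark).toDS().printSchema() // columns _1, _2, _3, _4
    asEmployees(spark).printSchema()     // columns name, surname, age, salary

    spark.stop()
  }
}
```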





