Create DataFrames
Example of how to create a dataframe in Scala.
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType};
val data = List(
Row("Peter","Garcia",24,24000),
Row("Juan","Garcia",26,27000),
Row("Lola","Martin",29,31000),
Row("Sara","Garcia",35,34000)
)
val rdd = sc.parallelize(data)
val schema = StructType(
List(
StructField("name", StringType, nullable=false),
StructField("surname", StringType, nullable=false),
StructField("age", IntegerType),
StructField("salary", IntegerType)
)
)
val df = spark.createDataFrame(rdd,schema)
df.printSchema()
df.show()root |-- name: string (nullable = false) |-- surname: string (nullable = false) |-- age: integer (nullable = true) |-- salary: integer (nullable = true)
+------+--------+----+-------+ | name | surname| age| salary| +------+--------+----+-------+ | Peter| Garcia | 24 | 24000 | | Juan | Garcia | 26 | 27000 | | Lola | Martin | 29 | 31000 | | Sara | Garcia | 35 | 34000 | +------+--------+----+-------+
Creating Dataframe with Random data
import scala.util.Random
val df = sc.parallelize(
Seq.fill(5){(Math.abs(Random.nextLong % 100000L),Math.abs(Random.nextLong % 100L))}
).toDF("salary" , "age")
df.show()+------+----+ |salary| age| +------+----+ | 41772| 17 | | 74772| 66 | | 6326 | 60 | | 72581| 70 | | 53037| 0 | +-------+---+
Transforming RDD to Dataframe
val name_cols=Array("id", "name", "values")
val df=sc.parallelize(Seq(
(1,"Mario", Seq(0,2,5)),
(2,"Sonia", Seq(1,20,5)))).toDF(name_cols: _*)
df.show()+---+------+----------+ | id| name | values | +---+------+----------+ | 1| Mario | [0, 2, 5]| | 2| Sonia |[1, 20, 5]| +---+------+----------+
Transforming Dataset to Dataframe
import org.apache.spark.sql.functions._
val wordsDataset = sc.parallelize(
Seq("Hello world hello world",
"no hello no world no more",
"count words"))
.toDS()
val result = wordsDataset
.flatMap(_.split(" ")) // Split sentences into words
.filter(_ != "") // Filter entry words
.map(_.toLowerCase())
.toDF() // Convert to DF to add and order
.groupBy($"value") // Count occurrences of words
.agg(count("*") as "ocurrences")
.orderBy($"ocurrences" desc) // Show occurrences per word
result.show()+---------+------------+ | value | ocurrences | +---------+------------+ | count | 1 | | words | 1 | | more | 1 | | no | 3 | | hello | 3 | | world | 3 | +---------+------------+
Transforming lists to Dataframe
val A = List("Paco","Sara","Tom","Rosa")
val B = List(1,2,3,4)
val C = List(5,6,7,8)
val zip = A.zip(B).zip(C)
val tup = zip.map{case ((w,x),y)=>(w,x,y)}
val df = tup.toDF("A","B","C")
df.show()+----+---+---+ | A| B| C| +----+---+---+ |Paco| 1| 5| |Sara| 2| 6| |Tom | 3| 7| |Rosa| 4| 8| +----+---+---+





0 Comments