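Creating DataFrames
Example of how to create a DataFrame in Scala from an RDD of Row objects and an explicit schema. The snippets assume spark-shell, where sc, spark and spark.implicits._ are already available.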
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}
val data = List(
Row("Paco","Garcia",24,24000),
Row("Juan","Garcia",26,27000),
Row("Lola","Martin",29,31000),
Row("Sara","Garcia",35,34000)
)
val rdd = sc.parallelize(data)
val schema = StructType(
List(
StructField("nombre", StringType, nullable=false),
StructField("apellido", StringType, nullable=false),
StructField("edad", IntegerType),
StructField("salario", IntegerType)
)
)
val df = spark.createDataFrame(rdd,schema)
df.printSchema()
df.show()
root
 |-- nombre: string (nullable = false)
 |-- apellido: string (nullable = false)
 |-- edad: integer (nullable = true)
 |-- salario: integer (nullable = true)

+------+--------+----+-------+
|nombre|apellido|edad|salario|
+------+--------+----+-------+
|  Paco|  Garcia|  24|  24000|
|  Juan|  Garcia|  26|  27000|
|  Lola|  Martin|  29|  31000|
|  Sara|  Garcia|  35|  34000|
+------+--------+----+-------+
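As an alternative to building Row objects and a StructType by hand, Spark can infer the schema from a Scala case class. The sketch below assumes a hypothetical case class Empleado whose fields mirror the columns above, and creates a local SparkSession so it can run outside spark-shell (inside the shell, reuse the existing spark session instead).

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical case class mirroring the columns above (not part of the original example)
case class Empleado(nombre: String, apellido: String, edad: Int, salario: Int)

// Local session for a standalone run; in spark-shell `spark` already exists
val spark = SparkSession.builder().master("local[*]").appName("crear-df").getOrCreate()
import spark.implicits._

// The schema (column names and types) is derived from the case class fields
val empleados = Seq(
  Empleado("Paco", "Garcia", 24, 24000),
  Empleado("Juan", "Garcia", 26, 27000),
  Empleado("Lola", "Martin", 29, 31000),
  Empleado("Sara", "Garcia", 35, 34000)
).toDF()

empleados.printSchema()
empleados.show()
```

With this approach Spark derives nullability from the field types: String fields become nullable and Int fields non-nullable, the reverse of the explicit schema above. If nullability matters, the explicit StructType gives full control.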
Creating a DataFrame with random data
The values change on every run because the random generator is not seeded.
import scala.util.Random
// Five (salario, edad) pairs: salario in [0, 100000), edad in [0, 100)
val df = sc.parallelize(
  Seq.fill(5) { (math.abs(Random.nextLong() % 100000L), math.abs(Random.nextLong() % 100L)) }
).toDF("salario", "edad")
df.show()
+-------+----+
|salario|edad|
+-------+----+
|  41772|  17|
|  74772|  66|
|   6326|  60|
|  72581|  70|
|  53037|   0|
+-------+----+
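If the random data should be reproducible, one option is to let Spark generate it with the seeded rand function from org.apache.spark.sql.functions instead of scala.util.Random on the driver. This is a sketch under that assumption; the seeds 42 and 7 and the app name are arbitrary choices, and the same seed only yields the same values when the partitioning is also the same.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{floor, rand}

// Local session for a standalone run; in spark-shell `spark` already exists
val spark = SparkSession.builder().master("local[*]").appName("df-aleatorio").getOrCreate()

// spark.range(5) gives five rows; each row gets two seeded random columns
val dfAleatorio = spark.range(5)
  .select(
    floor(rand(42L) * 100000).cast("long").as("salario"), // uniform integer in [0, 100000)
    floor(rand(7L) * 100).cast("long").as("edad")          // uniform integer in [0, 100)
  )

dfAleatorio.show()
```

Generating the values inside Spark also avoids building the whole collection on the driver, which matters once the number of rows grows.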
Converting an RDD to a DataFrame
val nombre_cols = Array("id", "nombre", "valores")
// toDF takes the column names as varargs, so the array is expanded with : _*
val df = sc.parallelize(Seq(
  (1, "Mario", Seq(0, 2, 5)),
  (2, "Sonia", Seq(1, 20, 5)))).toDF(nombre_cols: _*)
df.show()
+---+------+----------+
| id|nombre|   valores|
+---+------+----------+
|  1| Mario| [0, 2, 5]|
|  2| Sonia|[1, 20, 5]|
+---+------+----------+
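When the RDD holds case class instances instead of tuples, toDF() can take the column names from the field names, so no separate name list is needed. A minimal sketch, assuming a hypothetical case class Registro and a local SparkSession; it also shows the reverse direction, since every DataFrame exposes its rows as an RDD[Row] through .rdd.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical case class; its field names become the column names
case class Registro(id: Int, nombre: String, valores: Seq[Int])

// Local session for a standalone run; in spark-shell `spark` already exists
val spark = SparkSession.builder().master("local[*]").appName("rdd-a-df").getOrCreate()
import spark.implicits._

val rdd = spark.sparkContext.parallelize(Seq(
  Registro(1, "Mario", Seq(0, 2, 5)),
  Registro(2, "Sonia", Seq(1, 20, 5))
))

// Column names id, nombre, valores are inferred from Registro
val dfRegistros = rdd.toDF()
dfRegistros.printSchema()

// The other direction: a DataFrame back to an RDD of generic Row objects
val filas = dfRegistros.rdd
```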
Converting a Dataset to a DataFrame
import org.apache.spark.sql.functions._
val wordsDataset = sc.parallelize(
  Seq("Hola mundo hola mundo",
      "ni hola ni mundo ni nada",
      "cuenta palabras"))
  .toDS()
val result = wordsDataset
  .flatMap(_.split(" "))             // Split the sentences into words
  .filter(_ != "")                   // Drop empty words
  .map(_.toLowerCase())              // Lowercase so "Hola" and "hola" count as the same word
  .toDF()                            // Convert to a DataFrame (single column "value") to aggregate and sort
  .groupBy($"value")                 // Group identical words
  .agg(count("*").as("ocurrencias")) // Count the occurrences of each word
  .orderBy($"ocurrencias".desc)      // Most frequent words first
result.show()
Output (rows with the same count may appear in a different order):
+--------+-----------+
|   value|ocurrencias|
+--------+-----------+
|      ni|          3|
|    hola|          3|
|   mundo|          3|
|    nada|          1|
|palabras|          1|
|  cuenta|          1|
+--------+-----------+
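The same word count can also be written entirely with DataFrame operations, without going through typed Dataset lambdas. This sketch uses the built-in split, explode and lower functions; the column names frase and palabra are hypothetical, and a secondary sort on the word is added so that rows with equal counts come out in a deterministic order.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, count, explode, lower, split}

// Local session for a standalone run; in spark-shell `spark` already exists
val spark = SparkSession.builder().master("local[*]").appName("contar-palabras").getOrCreate()
import spark.implicits._

val frases = Seq(
  "Hola mundo hola mundo",
  "ni hola ni mundo ni nada",
  "cuenta palabras"
).toDF("frase")

val conteo = frases
  .select(explode(split(lower(col("frase")), " ")).as("palabra")) // one row per lowercased word
  .filter(col("palabra") =!= "")                                  // drop empty tokens
  .groupBy("palabra")
  .agg(count("*").as("ocurrencias"))
  .orderBy(col("ocurrencias").desc, col("palabra"))               // ties broken alphabetically

conteo.show()
```

With the tie-breaker, the rows should come out as hola, mundo and ni (3 each) followed by cuenta, nada and palabras (1 each).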
Converting Lists to a DataFrame
val A = List("Paco","Sara","Flor","Rosa")
val B = List(1,2,3,4)
val C = List(5,6,7,8)
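val zip = A.zip(B).zip(C)                           // List(((Paco,1),5), ((Sara,2),6), ...)
val tup = zip.map { case ((w, x), y) => (w, x, y) } // Flatten to List((Paco,1,5), (Sara,2,6), ...)
val df = tup.toDF("A", "B", "C")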
df.show()
+----+---+---+
|   A|  B|  C|
+----+---+---+
|Paco|  1|  5|
|Sara|  2|  6|
|Flor|  3|  7|
|Rosa|  4|  8|
+----+---+---+
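The nested zip followed by a pattern match can be avoided with lazyZip, which walks the three lists in lockstep. This assumes Scala 2.13 or later (Spark builds for Scala 2.12 would use (A, B, C).zipped instead); only the tuple-building step is shown, in plain Scala.

```scala
// Same lists as in the example above
val A = List("Paco", "Sara", "Flor", "Rosa")
val B = List(1, 2, 3, 4)
val C = List(5, 6, 7, 8)

// Scala 2.13+: lazyZip combines the three lists element by element,
// so the tuples are built directly without nested ((a, b), c) pairs
val tuplas: List[(String, Int, Int)] =
  A.lazyZip(B).lazyZip(C).map((nombre, b, c) => (nombre, b, c))
```

The resulting list can be passed to toDF("A", "B", "C") exactly as tup is above.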




