Scala Filter DataFrame

Aug 27, 2018 | Apache Spark, Big Data, Scala-example

Filter data with like

This filter selects the people whose surname starts with “Garc” and whose age is under 30.

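import org.apache.spark.sql.functions.unix_timestamp
import org.apache.spark.sql.types.TimestampType
import spark.implicits._ // already in scope in spark-shell; provides toDF and $"..."

// Sample data: name, surname, age, salary and registration date (as a string)
val df = sc.parallelize(Seq(
("Paco","Garcia",24,24000,"2018-08-06 00:00:00"),
("Juan","Garcia",26,27000,"2018-08-07 00:00:00"),
("Ana", "Martin",28,28000,"2018-08-14 00:00:00"),
("Lola","Martin",29,31000,"2018-08-18 00:00:00"),
("Sara","Garcia",35,34000,"2018-08-20 00:00:00")
)).toDF("name","surname","age","salary","reg_date")

// Parse the reg_date string into a real timestamp column
val type_df = df.select($"name",$"surname",$"age",$"salary", unix_timestamp($"reg_date", "yyyy-MM-dd HH:mm:ss").cast(TimestampType).as("timestamp"))
type_df.show()

// SQL-style predicate: surname starts with "Garc" and age is below 30
val filter_df = type_df.filter("surname like 'Garc%' AND age < 30")
filter_df.show()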
+------+--------+----+-------+-------------------+
|  name| surname| age| salary|          timestamp|
+------+--------+----+-------+-------------------+
|  Paco|  Garcia|  24|  24000|2018-08-06 00:00:00|
|  Juan|  Garcia|  26|  27000|2018-08-07 00:00:00|
|   Ana|  Martin|  28|  28000|2018-08-14 00:00:00|
|  Lola|  Martin|  29|  31000|2018-08-18 00:00:00|
|  Sara|  Garcia|  35|  34000|2018-08-20 00:00:00|
+------+--------+----+-------+-------------------+

+------+--------+----+-------+-------------------+
|  name| surname| age| salary|          timestamp|
+------+--------+----+-------+-------------------+
|  Paco|  Garcia|  24|  24000|2018-08-06 00:00:00|
|  Juan|  Garcia|  26|  27000|2018-08-07 00:00:00|
+------+--------+----+-------+-------------------+
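The same predicate can also be written with Column expressions instead of a SQL string. The following is a minimal sketch, assuming a local SparkSession and a reduced copy of the sample data without the date column; the names `people`, `likeDf`, `startsDf` and `containsDf` are made up for this illustration. It also shows the difference between "starts with" (`like("Garc%")` or `startsWith`) and "contains" (`contains`), which happen to return the same rows for this data.

```scala
import org.apache.spark.sql.SparkSession

// Local session for experimentation; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("filter-like-sketch")
  .getOrCreate()
import spark.implicits._

// Reduced copy of the sample data (illustrative name: people)
val people = Seq(
  ("Paco", "Garcia", 24, 24000),
  ("Juan", "Garcia", 26, 27000),
  ("Ana", "Martin", 28, 28000),
  ("Lola", "Martin", 29, 31000),
  ("Sara", "Garcia", 35, 34000)
).toDF("name", "surname", "age", "salary")

// Same condition as "surname like 'Garc%' AND age < 30", built from Columns
val likeDf = people.filter($"surname".like("Garc%") && $"age" < 30)

// startsWith matches a prefix; contains matches the text anywhere in the value
val startsDf = people.filter($"surname".startsWith("Garc") && $"age" < 30)
val containsDf = people.filter($"surname".contains("Garc") && $"age" < 30)

likeDf.show()
```

Column expressions are checked when the query is built, so a misspelled method fails at compile time, whereas a typo inside a SQL string only surfaces when the filter is analyzed.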

Filtering data by matching item

This filter selects the people whose surname is exactly “Garcia”.

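df.filter("surname == 'Garcia'").show()
+------+--------+----+-------+-------------------+
|  name| surname| age| salary|           reg_date|
+------+--------+----+-------+-------------------+
|  Paco|  Garcia|  24|  24000|2018-08-06 00:00:00|
|  Juan|  Garcia|  26|  27000|2018-08-07 00:00:00|
|  Sara|  Garcia|  35|  34000|2018-08-20 00:00:00|
+------+--------+----+-------+-------------------+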
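An exact match can likewise be expressed with the Column operators `===` (equal) and `=!=` (not equal). This is a small sketch under the same assumptions as before: a local SparkSession and a reduced copy of the sample data, with `people`, `garcias` and `others` being names chosen only for this example.

```scala
import org.apache.spark.sql.SparkSession

// Local session for experimentation; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("filter-equals-sketch")
  .getOrCreate()
import spark.implicits._

// Reduced copy of the sample data (illustrative name: people)
val people = Seq(
  ("Paco", "Garcia", 24, 24000),
  ("Juan", "Garcia", 26, 27000),
  ("Ana", "Martin", 28, 28000),
  ("Lola", "Martin", 29, 31000),
  ("Sara", "Garcia", 35, 34000)
).toDF("name", "surname", "age", "salary")

// === builds an equality expression on the column (plain == would compare objects)
val garcias = people.filter($"surname" === "Garcia")

// where is an alias of filter; =!= is the inequality counterpart of ===
val others = people.where($"surname" =!= "Garcia")

garcias.show()
others.show()
```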

Filtering data from the result of a grouping

This filter selects the surnames that appear more than twice.

df.groupBy("surname").count().filter("count > 2").show()
+--------+-----+
| surname|count|
+--------+-----+
|  Garcia|    3|
+--------+-----+
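Filtering after a `groupBy` plays the role of a SQL `HAVING` clause. The sketch below assumes a local SparkSession and a reduced copy of the sample data; `people`, `repeated` and the column alias `occurrences` are illustrative names, not part of the original example. Giving the aggregate an explicit alias avoids depending on the default column name `count`.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.count

// Local session for experimentation; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("filter-grouped-sketch")
  .getOrCreate()
import spark.implicits._

// Reduced copy of the sample data (illustrative name: people)
val people = Seq(
  ("Paco", "Garcia", 24, 24000),
  ("Juan", "Garcia", 26, 27000),
  ("Ana", "Martin", 28, 28000),
  ("Lola", "Martin", 29, 31000),
  ("Sara", "Garcia", 35, 34000)
).toDF("name", "surname", "age", "salary")

// Count rows per surname under an explicit alias, then keep groups above 2
val repeated = people
  .groupBy("surname")
  .agg(count("*").as("occurrences"))
  .filter($"occurrences" > 2)

repeated.show()
```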
