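Reading JSON in Scala

by Diego Calvo | Aug 27, 2018 | Big data, Scala, Spark | 1 Comment

Reading JSON from a text string

A simple example of reading JSON from a text string. The whole array is passed as a single string element, and Spark turns each object of the top-level array into one row: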

val events = sc.parallelize(
"""
[{"accion":"create","tiempo":"2018-08-07T00:01:17Z"},
 {"accion":"create","tiempo":"2018-08-07T00:01:17Z"}]
""" :: Nil)

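// In spark-shell, sc, spark and spark.implicits._ are predefined.
// .toDS() avoids the json(RDD[String]) overload, deprecated since Spark 2.2
val df = spark.read.json(events.toDS())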
df.printSchema()
df.show()

root
|-- accion: string (nullable = true)
|-- tiempo: string (nullable = true)

+------+--------------------+
|accion|              tiempo|
+------+--------------------+
|create|2018-08-07T00:01:17Z|
|create|2018-08-07T00:01:17Z|
+------+--------------------+
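The example above relies on the objects that spark-shell predefines (`sc`, `spark`). As a minimal sketch of the same read in a standalone application, assuming Spark 2.2 or later (where `DataFrameReader.json(Dataset[String])` exists), the session is created explicitly and the string is wrapped in a `Dataset[String]`. The object name `JsonFromStringExample` and the `local[*]` master are illustrative choices, not part of the original post.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical standalone version of the spark-shell example (names are illustrative).
object JsonFromStringExample {

  // The same two events as in the article, as one multi-line string.
  val rawJson: String =
    """[{"accion":"create","tiempo":"2018-08-07T00:01:17Z"},
      | {"accion":"create","tiempo":"2018-08-07T00:01:17Z"}]""".stripMargin

  // Each Dataset element is one JSON document; the top-level array
  // is expanded into one row per object.
  def readEvents(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val events: Dataset[String] = Seq(rawJson).toDS()
    spark.read.json(events)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("JsonFromStringExample")
      .master("local[*]") // assumption: local mode for trying the example
      .getOrCreate()
    try {
      val df = readEvents(spark)
      df.printSchema()
      df.show(false)
    } finally spark.stop()
  }
}
```

Keeping the parsing in `readEvents` separates it from session setup, so the same function can be called from spark-shell by passing the predefined `spark`.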

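Reading JSON from a text string with an inferred schema

A simple example of reading JSON from a text string by inferring its schema first: Spark infers the structure, the schema is serialized to a JSON string, rebuilt as a StructType, and then used to read the data: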

val events = sc.parallelize(
"""
[{"accion":"create","tiempo":"2018-08-07T00:01:17Z"},
{"accion":"create","tiempo":"2018-08-07T00:01:17Z"}]
""" :: Nil)

import org.apache.spark.sql.types.{DataType, StructType}

// Infer the schema from the data and serialize it as a JSON string
val schema_json = spark.read.json(events.toDS()).schema.json

// Rebuild a StructType from that JSON string
val schema = DataType.fromJson(schema_json).asInstanceOf[StructType]

// Read the JSON using the rebuilt schema
val df = spark.read.schema(schema).json(events.toDS())

df.printSchema()
df.show()
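The round trip through `schema.json` is mainly useful when the schema string is kept between runs. The sketch below assumes that scenario: the schema is inferred once, kept as a string (which could be stored in a file or config value; the storage itself is not shown), and later turned back into a `StructType` with `DataType.fromJson`. When a schema is supplied, Spark does not need an inference pass over the data. The object and method names are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{DataType, StructType}

// Hypothetical helper: infer the schema once, reuse its JSON form later.
object SchemaRoundTripExample {

  val rawJson: String =
    """[{"accion":"create","tiempo":"2018-08-07T00:01:17Z"},
      | {"accion":"create","tiempo":"2018-08-07T00:01:17Z"}]""".stripMargin

  // Infer the schema from the data and serialize it to a JSON string.
  def inferSchemaJson(spark: SparkSession): String = {
    import spark.implicits._
    spark.read.json(Seq(rawJson).toDS()).schema.json
  }

  // Rebuild the StructType from the stored string and read with it.
  def readWithSchemaJson(spark: SparkSession, schemaJson: String): DataFrame = {
    import spark.implicits._
    val schema = DataType.fromJson(schemaJson).asInstanceOf[StructType]
    spark.read.schema(schema).json(Seq(rawJson).toDS())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SchemaRoundTripExample")
      .master("local[*]")
      .getOrCreate()
    try {
      val schemaJson = inferSchemaJson(spark)
      println(schemaJson)
      readWithSchemaJson(spark, schemaJson).show(false)
    } finally spark.stop()
  }
}
```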

Reading JSON from a text string with an explicit schema

A simple example of reading nested JSON from a text string with an explicitly defined schema:

import org.apache.spark.sql.types._

val events = sc.parallelize(
"""
[{"accion":"create","evento":{"tipo":1,"tiempo":"2018-08-07T00:01:17Z"}},
 {"accion":"create","evento":{"tipo":1,"tiempo":"2018-08-07T00:01:17Z"}}]
""" :: Nil)

val schema = (new StructType)
  .add("accion", StringType)
  .add("evento", (new StructType)
    .add("tipo", LongType)
    .add("tiempo", StringType))

// Read the JSON using the explicit schema
val df = spark.read.schema(schema).json(events.toDS())

df.printSchema()
df.show()
df.select($"evento.*").show()

root
 |-- accion: string (nullable = true)
 |-- evento: struct (nullable = true)
 |    |-- tipo: long (nullable = true)
 |    |-- tiempo: string (nullable = true)

+------+--------------------+
|accion|              evento|
+------+--------------------+
|create|[1, 2018-08-07T00...|
|create|[1, 2018-08-07T00...|
+------+--------------------+

+----+--------------------+
|tipo|              tiempo|
+----+--------------------+
|   1|2018-08-07T00:01:17Z|
|   1|2018-08-07T00:01:17Z|
+----+--------------------+
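The same structure can also be written as a DDL string, which `DataFrameReader.schema(String)` accepts from Spark 2.3 onward. The sketch below assumes that version and also shows one way to flatten the nested `evento` struct into top-level columns by selecting its fields by path. The object name `DdlSchemaExample` is hypothetical; the DDL string is meant to mirror the `StructType` defined above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical example: DDL schema instead of StructType, plus flattening.
object DdlSchemaExample {

  val rawJson: String =
    """[{"accion":"create","evento":{"tipo":1,"tiempo":"2018-08-07T00:01:17Z"}},
      | {"accion":"create","evento":{"tipo":1,"tiempo":"2018-08-07T00:01:17Z"}}]""".stripMargin

  // Same structure as the StructType above: BIGINT maps to LongType.
  val ddlSchema: String = "accion STRING, evento STRUCT<tipo: BIGINT, tiempo: STRING>"

  // Read with the DDL schema, then pull the nested fields up to top level.
  def readFlat(spark: SparkSession): DataFrame = {
    import spark.implicits._
    spark.read
      .schema(ddlSchema)
      .json(Seq(rawJson).toDS())
      .select(
        col("accion"),
        col("evento.tipo").as("tipo"),
        col("evento.tiempo").as("tiempo"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DdlSchemaExample")
      .master("local[*]")
      .getOrCreate()
    try readFlat(spark).show(false)
    finally spark.stop()
  }
}
```

Unlike `$"evento.*"`, the explicit select keeps `accion` alongside the nested fields and fixes the column order.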

Reading JSON from HDFS

The file prueba1.json contains exactly the text between the triple quotes in the previous example, that is:

prueba1.json

[{"accion":"create","evento":{"tipo":1,"tiempo":"2018-08-07T00:01:17Z"}},
 {"accion":"create","evento":{"tipo":1,"tiempo":"2018-08-07T00:01:17Z"}}]

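With that content in place, the file is read from HDFS as follows. wholeTextFiles returns one (path, content) pair per file, so .values passes the whole multi-line array to the JSON reader as a single record:

val df = spark.read.json(spark.sparkContext.wholeTextFiles("/pruebas/prueba1.json").values.toDS())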
df.printSchema()
df.select($"evento.*").show()

root
 |-- accion: string (nullable = true)
 |-- evento: struct (nullable = true)
 |    |-- tipo: long (nullable = true)
 |    |-- tiempo: string (nullable = true)

+----+--------------------+
|tipo|              tiempo|
+----+--------------------+
|   1|2018-08-07T00:01:17Z|
|   1|2018-08-07T00:01:17Z|
+----+--------------------+
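An alternative to `wholeTextFiles`, assuming Spark 2.2 or later, is the JSON source's `multiLine` option, which makes Spark parse each file as one JSON document instead of expecting one record per line. In this sketch a temporary local file stands in for `/pruebas/prueba1.json` on HDFS (an assumption so it runs without a cluster); with HDFS as the default filesystem, the original path would be passed instead. The object and method names are hypothetical.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical example: read a multi-line JSON file with multiLine = true.
object MultiLineFileExample {

  val fileContent: String =
    """[{"accion":"create","evento":{"tipo":1,"tiempo":"2018-08-07T00:01:17Z"}},
      | {"accion":"create","evento":{"tipo":1,"tiempo":"2018-08-07T00:01:17Z"}}]""".stripMargin

  // Write the sample to a local temp file (stand-in for the HDFS file).
  def writeSample(): Path = {
    val path = Files.createTempFile("prueba1", ".json")
    Files.write(path, fileContent.getBytes(StandardCharsets.UTF_8))
    path
  }

  // multiLine = true: each file is parsed as a single JSON document.
  def readMultiLine(spark: SparkSession, path: String): DataFrame =
    spark.read.option("multiLine", "true").json(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("MultiLineFileExample")
      .master("local[*]")
      .getOrCreate()
    try {
      // toUri gives an explicit file:// path for the local temp file
      val df = readMultiLine(spark, writeSample().toUri.toString)
      df.printSchema()
      df.select("evento.*").show(false)
    } finally spark.stop()
  }
}
```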

