Read & write JSON in Python with Spark

Generate data to read & write JSON

Random example data used in the following sections:

import random

data = []
for x in range(5):
    data.append((random.randint(0, 9), random.randint(0, 9)))
df = spark.createDataFrame(data, ("label", "data"))

df.show()
+-----+----+
|label|data|
+-----+----+
|    4|   0|
|    7|   0|
|    1|   1|
|    3|   8|
|    3|   5|
+-----+----+
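The same pairs can be generated in pure Python without Spark; a minimal sketch, where the seed is illustrative and added only to make the output reproducible:

```python
import random

# Pure-Python equivalent of the data generation above (no Spark needed).
# The seed is a hypothetical addition for reproducibility.
random.seed(0)
data = [(random.randint(0, 9), random.randint(0, 9)) for _ in range(5)]
print(data)  # five (label, data) pairs with values in 0..9
```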

Write data in JSON format

path_json = "/prueba.json" # Path on HDFS
path_json = "D:/prueba.json" # Path on the local filesystem

df.write \
    .mode("overwrite") \
    .format("json") \
    .save(path_json)
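Note that df.write with format("json") produces a directory of part files, each in JSON Lines format (one object per line), not a single .json file. A stdlib sketch of that layout, with illustrative row values and file name:

```python
import json
import os
import tempfile

# Rows mirroring the DataFrame above (values are illustrative)
rows = [{"label": 4, "data": 0}, {"label": 7, "data": 0}]

# Spark writes a directory of part files; emulate a single part file
out_dir = tempfile.mkdtemp()
part_file = os.path.join(out_dir, "part-00000.json")
with open(part_file, "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")  # one JSON object per line
```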

Read data in JSON format

# Spark wrote JSON Lines (one object per line); reading with
# multiLine=true would misparse these files, so it is omitted.
df2 = spark\
    .read\
    .json(path_json)

df2.show()
+-----+----+
|label|data|
+-----+----+
|    4|   0|
|    7|   0|
|    1|   1|
|    3|   8|
|    3|   5|
+-----+----+
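Each line of a part file is an independent JSON object, which is why the default reader handles it line by line. The same parsing can be sketched with only the standard library (the sample text is illustrative):

```python
import json

# JSON Lines text as Spark writes it (sample values are illustrative)
text = '{"label":4,"data":0}\n{"label":7,"data":0}\n'

# Parse one object per non-empty line, as spark.read.json does by default
rows = [json.loads(line) for line in text.splitlines() if line.strip()]
print(rows)  # [{'label': 4, 'data': 0}, {'label': 7, 'data': 0}]
```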

Write gzip compressed data in JSON format

path_json_gzip = "/prueba_gzip.json" # Path on HDFS
path_json_gzip = "D:/prueba_gzip.json" # Path on the local filesystem

df.write\
    .mode("overwrite")\
    .format("json")\
    .option("compression", "gzip")\
    .save(path_json_gzip)
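The compression option only changes the codec applied to each part file; the content is still the same JSON Lines text. A stdlib sketch of the gzip round trip, with illustrative rows:

```python
import gzip
import json

# Illustrative rows; Spark applies the same idea per part file
rows = [{"label": 4, "data": 0}, {"label": 7, "data": 0}]
payload = "\n".join(json.dumps(r) for r in rows).encode("utf-8")

compressed = gzip.compress(payload)     # what a .json.gz part file holds
restored = gzip.decompress(compressed)  # what the reader decodes
assert restored == payload
```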

Read gzip compressed data in JSON format

df2 = spark\
    .read\
    .json(path_json_gzip)

df2.show()
+-----+----+
|label|data|
+-----+----+
|    4|   0|
|    7|   0|
|    1|   1|
|    3|   8|
|    3|   5|
+-----+----+

Write deflate compressed data in JSON format

path_json_deflate = "/prueba_deflate.json" # Path on HDFS
path_json_deflate = "D:/prueba_deflate.json" # Path on the local filesystem

df.write\
    .mode("overwrite")\
    .format("json")\
    .option("compression", "deflate")\
    .save(path_json_deflate)
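The deflate codec can be sketched the same way with the zlib module (zlib uses the deflate algorithm with a small wrapper; rows are illustrative):

```python
import json
import zlib

# Illustrative rows; the part file content is still JSON Lines
rows = [{"label": 4, "data": 0}, {"label": 7, "data": 0}]
payload = "\n".join(json.dumps(r) for r in rows).encode("utf-8")

compressed = zlib.compress(payload)  # deflate-compressed bytes
assert zlib.decompress(compressed) == payload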

Read deflate compressed data in JSON format

df2 = spark\
    .read\
    .json(path_json_deflate)

df2.show()
+-----+----+
|label|data|
+-----+----+
|    4|   0|
|    7|   0|
|    1|   1|
|    3|   8|
|    3|   5|
+-----+----+

Write bzip2 compressed data in JSON format

path_json_bzip2 = "/prueba_bzip2.json" # Path on HDFS
path_json_bzip2 = "D:/prueba_bzip2.json" # Path on the local filesystem

df.write\
    .mode("overwrite")\
    .format("json")\
    .option("compression", "bzip2")\
    .save(path_json_bzip2)
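Likewise, the bzip2 round trip can be sketched with the bz2 module (rows are illustrative):

```python
import bz2
import json

# Illustrative rows; the part file content is still JSON Lines
rows = [{"label": 4, "data": 0}, {"label": 7, "data": 0}]
payload = "\n".join(json.dumps(r) for r in rows).encode("utf-8")

compressed = bz2.compress(payload)  # what a .json.bz2 part file holds
assert bz2.decompress(compressed) == payload
```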

Read bzip2 compressed data in JSON format

df2 = spark\
    .read\
    .json(path_json_bzip2)

df2.show()
+-----+----+
|label|data|
+-----+----+
|    4|   0|
|    7|   0|
|    1|   1|
|    3|   8|
|    3|   5|
+-----+----+