Apache Spark libraries and installation in Python

Prerequisites

  • Java 6 or higher
  • Python Interpreter 2.6 or higher

Installation

Installation is very simple: just download the latest version of Spark and unpack it:

wget http://apache.rediris.es/spark/spark-1.5.0/spark-1.5.0-bin-hadoop2.6.tgz
tar -xf spark-1.5.0-bin-hadoop2.6.tgz

Interpreter execution

Spark can be run interactively through the PySpark interpreter, or by submitting a .py file. Note that when the interactive shell starts, a SparkContext is already available as the variable sc; in a standalone script it must be created explicitly:

./spark-1.5.0-bin-hadoop2.6/bin/pyspark 

from pyspark import SparkConf, SparkContext 
sc = SparkContext()

Direct execution

./spark-1.5.0-bin-hadoop2.6/bin/spark-submit file.py

Use without installation

It is recommended to use the cloud services of Databricks. To do so, register for free on their platform as a user of the "Community Edition".

For use:

  1. Upload or create a file to run
  2. Assign a cluster for execution by clicking on the "detached" icon and creating a new cluster. Using a lower Spark version is recommended to ensure compatibility.

Common libraries

#!/usr/bin/env python
from pyspark import SparkConf, SparkContext

sc = SparkContext()