Apache Spark libraries and installation in Python

Prerequisites

  • Java 6 or higher
  • Python Interpreter 2.6 or higher

Installation

Installation is very simple: just download the latest version of Spark and unpack it:

wget http://apache.rediris.es/spark/spark-1.5.0/spark-1.5.0-bin-hadoop2.6.tgz
tar -xf spark-1.5.0-bin-hadoop2.6.tgz

Interpreter execution

Spark can be run interactively through the PySpark interpreter, or by submitting a .py file. Note that when the interactive shell starts, a SparkContext is already available as the variable sc; in a standalone script it must be created explicitly:

./spark-1.5.0-bin-hadoop2.6/bin/pyspark 

from pyspark import SparkConf, SparkContext 
sc = SparkContext()

Direct execution

./spark-1.5.0-bin-hadoop2.6/bin/spark-submit file.py

Use without installation

It is recommended to use the cloud services of Databricks. To do so, register for free on their platform as a user of the "Community Edition".

For use:

  1. Upload or create a file to run
  2. Assign a cluster for execution by clicking on the "detached" icon and creating a new cluster. Using a lower Spark version is recommended to ensure compatibility.

Common libraries

#!/usr/bin/env python
from pyspark import SparkConf, SparkContext

sc = SparkContext()