
Kipling edited this page Mar 5, 2021 · 7 revisions

Tute 1 - Installation

Pre-installation (Debian/Ubuntu)

The fsm-python package is built on Apache Spark (PySpark) and is designed to process big data using parallel computing. Therefore, you will need to install Spark to use the fsm tools.

You can download Apache Spark from the official download page:

curl -O https://downloads.apache.org/spark/spark-3.1.1/spark-3.1.1-bin-hadoop2.7.tgz

Extract the archive and move it to /opt/spark:

tar xvf spark-3.1.1-bin-hadoop2.7.tgz
sudo mv spark-3.1.1-bin-hadoop2.7 /opt/spark
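To confirm the files landed where expected, you can check for the key entry points (a minimal sketch; `spark_install_ok` is a hypothetical helper, not part of fsm-python):

```python
from pathlib import Path

# Hypothetical helper: confirm the extracted Spark layout is where we expect it
def spark_install_ok(root: str = "/opt/spark") -> bool:
    root = Path(root)
    return (root / "bin" / "pyspark").is_file() and \
           (root / "sbin" / "start-master.sh").is_file()

print(spark_install_ok())  # True once Spark has been moved to /opt/spark
```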

Add Spark to your Environment variables:

E.g. for Linux:

echo '#Add Spark to PATH' >> $HOME/.bashrc && \
echo 'export SPARK_HOME=/opt/spark' >> $HOME/.bashrc && \
echo 'export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin' >> $HOME/.bashrc && \
echo 'export PYSPARK_PYTHON=python3' >> $HOME/.bashrc && \
source ~/.bashrc
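You can sanity-check that the new variables are visible from Python (a sketch; `check_spark_env` is a hypothetical helper, not part of fsm-python):

```python
import os

# Hypothetical helper: report which of the expected Spark variables are missing
def check_spark_env(environ=os.environ):
    expected = ("SPARK_HOME", "PYSPARK_PYTHON")
    return [var for var in expected if var not in environ]

# After `source ~/.bashrc`, this should print an empty list
print(check_spark_env())
```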

You can check that PySpark is installed by running:

$ pyspark
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.1.1
      /_/

Using Python version 3.8.5 (default, Jul 28 2020 12:59:40)
SparkSession available as 'spark'.
>>> sc
<SparkContext master=local[*] appName=PySparkShell>

Install

You can install fsm-python easily with pip.

Make sure your public ssh key has been added to this repo, then:

Note: to add your key, contact kip.crossing@gmail.com

pip install git+ssh://git@github.com/soiltechproject/fsm-python.git

Upgrade

To get the latest version:

pip install --upgrade git+ssh://git@github.com/soiltechproject/fsm-python.git

Setting up nodes (optional)

Start a standalone master server:

 start-master.sh 

This will print the location of a log file; check it to find the URL of your master server. Then start a worker, pointing it at that URL:

start-slave.sh spark://ubuntu:7077

Tip: you can stop the running Spark master and slave via:

$ $SPARK_HOME/sbin/stop-slave.sh
$ $SPARK_HOME/sbin/stop-master.sh

You can then use these nodes when configuring your Spark application's SparkContext.
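The master URL passed to start-slave.sh (and later to setMaster) follows the spark://host:port scheme, with 7077 as the default standalone port. A small sketch (the helper is hypothetical):

```python
# Hypothetical helper: build a standalone master URL of the form spark://host:port
def spark_master_url(host: str, port: int = 7077) -> str:
    return f"spark://{host}:{port}"

print(spark_master_url("ubuntu"))  # spark://ubuntu:7077
```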

Configure your Spark application

When starting your new FSM-based application, you should configure your Spark context. This tells the application where and how to deploy.

For example, to run locally:

import fsm
from pyspark import SparkContext, SparkConf
from pyspark.serializers import MarshalSerializer

conf = SparkConf().setMaster("local[*]") \
                  .setAppName("FarmSoilMapping") \
                  .set("spark.scheduler.mode", "FAIR")

# The serializer is passed to the SparkContext itself, not set on the conf
sc = SparkContext(conf=conf, serializer=MarshalSerializer())

print(sc.uiWebUrl)

To learn more about how to configure your Spark context, see:

https://spark.apache.org/docs/latest/configuration.html

Deploy

When it's time to deploy to the cloud, you can use a tool such as flintrock.
