Spark on YARN Cluster Installation

Scala Installation (optional)

Download a suitable Scala version

wget http://scala-lang.org/files/archive/scala-2.11.8.tgz

Extract and install it, then set the environment variables

tar -zxvf scala-2.11.8.tgz -C /usr/local/
ln -s /usr/local/scala-2.11.8 /usr/local/scala   # the tarball extracts to scala-2.11.8; SCALA_HOME below expects /usr/local/scala
vim /etc/profile
export SCALA_HOME=/usr/local/scala
export PATH=$SCALA_HOME/bin:$PATH
source /etc/profile

Test

[root@dev-test ~]# scala -version
Scala code runner version 2.11.8 -- Copyright 2002-2016, LAMP/EPFL

Spark Installation

Since Hadoop is already installed, Spark is installed directly on the existing Hadoop cluster.
Download a suitable Spark version.

  • Extract and install it, then modify the configuration files

    [hadoop@dev-test ~]$ tar -zxvf spark-1.6.1-bin-hadoop2.6.tgz
    [hadoop@dev-test ~]$ mv spark-1.6.1-bin-hadoop2.6 spark
    [hadoop@dev-test spark]$ cd conf/
    [hadoop@dev-test conf]$ cp spark-env.sh.template spark-env.sh
  • Edit spark-env.sh

    # JDK and Hadoop installation paths
    JAVA_HOME=/usr/local/java
    HADOOP_HOME=/home/hadoop/hadoop
    # Pin the driver port (optional)
    SPARK_JAVA_OPTS=-Dspark.driver.port=53411
    # Point Spark at the Hadoop/YARN configuration directories
    HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
    # Hostname of the standalone master
    SPARK_MASTER_IP=HADOOP-MASTER-153
  • Copy spark-defaults.conf.template to spark-defaults.conf and edit it (a filled-in example for this cluster is shown after the list):

    spark.master            spark://<HOSTNAME OF YOUR MASTER NODE>:7077
    spark.serializer        org.apache.spark.serializer.KryoSerializer
  • Edit the slaves file (also covered in the example after the list).

    <HOSTNAME OF YOUR MASTER NODE>
    <HOSTNAME OF YOUR SLAVE NODE 1>
    ...
    ...
    <HOSTNAME OF YOUR SLAVE NODE n>
  • Distribute the Spark installation directory to every node

    scp -r ./spark HADOOP-SLAVE-154:/home/hadoop
    scp -r ./spark HADOOP-SLAVE-146:/home/hadoop
    scp -r ./spark HADOOP-SLAVE-147:/home/hadoop
    scp -r ./spark HADOOP-SLAVE-148:/home/hadoop
  • Start and stop Spark (a quick check that the daemons came up follows after the list)

    $ sh /home/hadoop/spark/sbin/start-all.sh
    $ sh /home/hadoop/spark/sbin/stop-all.sh
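
As a concrete reference, the sketch below shows what the two edited configuration files might contain on the cluster used in this guide, taking HADOOP-MASTER-153 as the master and the four HADOOP-SLAVE-* hosts from the scp step as workers. The heredocs are just a convenient way to write the files; editing them by hand gives the same result.

# Run in /home/hadoop/spark/conf on the master, before distributing the package.
# The hostnames are the ones used elsewhere in this guide; replace them with your own.
cat > slaves <<'EOF'
HADOOP-MASTER-153
HADOOP-SLAVE-154
HADOOP-SLAVE-146
HADOOP-SLAVE-147
HADOOP-SLAVE-148
EOF

cat >> spark-defaults.conf <<'EOF'
spark.master            spark://HADOOP-MASTER-153:7077
spark.serializer        org.apache.spark.serializer.KryoSerializer
EOF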
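
After start-all.sh finishes, jps is a quick way to confirm the daemons came up: the master node should list a Master process and each slave node a Worker process (the other Java processes shown will vary).

$ jps                          # on the master: a "Master" process should be listed
$ ssh HADOOP-SLAVE-154 jps     # on a worker node: a "Worker" process should be listed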

Testing Spark

Visit http://hadoop-master-153:8080/ (the Spark standalone master web UI).
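
If the machine you are working from has no browser, a quick curl against the same address confirms that the master UI is responding:

$ curl -sI http://hadoop-master-153:8080/ | head -n 1   # an HTTP 200 status line means the UI is up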

Test on YARN:

[hadoop@HADOOP-MASTER-153 ~]$ spark/bin/spark-shell --master yarn-client
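
A non-interactive way to verify the YARN integration is to submit the bundled SparkPi example with spark-submit. This is a sketch: the exact name of the examples jar under spark/lib/ depends on the build you downloaded, so adjust the path if it differs.

[hadoop@HADOOP-MASTER-153 ~]$ spark/bin/spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master yarn-client \
    spark/lib/spark-examples-1.6.1-hadoop2.6.0.jar 10
# On success the driver output contains a line like "Pi is roughly 3.14...",
# and the job also shows up in the YARN ResourceManager UI.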

Local mode:

[hadoop@HADOOP-MASTER-153 ~]$ spark/bin/spark-shell --master local[2]
# the 2 means two local worker threads

References