Spark (1): Distributed Cluster Installation, Deployment, and Verification
I. Preparation
1. Prepare three servers (virtual machines):
weekend110    192.168.2.100
weekend01     192.168.2.101
weekend02     192.168.2.102
2. Hadoop is already installed and starts normally.
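A quick way to confirm this prerequisite is met (a minimal sketch; the exact set of daemons depends on your Hadoop layout):
[hadoop@weekend110 ~]$ jps
# NameNode, DataNode, ResourceManager, NodeManager etc. should be listed
[hadoop@weekend110 ~]$ hdfs dfsadmin -report
# a report with live DataNodes confirms HDFS is serving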
II. Installation and Deployment
1. First install Scala and Spark on one machine (weekend110)
Install Scala:
Download the package from the official site, upload it to the VM, and extract it: tar -zxvf soft/scala-2.11.8.tgz -C /home/hadoop/app
Add it to the environment variables:
export SCALA_HOME=/home/hadoop/app/scala-2.11.8
export PATH=$PATH:$SCALA_HOME/bin
# Spark ships with its own Scala, but installing Scala separately is still recommended
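A quick sanity check (assuming the exports above were added to ~/.bash_profile, the file distributed to the workers later in this guide):
[hadoop@weekend110 ~]$ source /home/hadoop/.bash_profile
[hadoop@weekend110 ~]$ scala -version
# should report the 2.11.8 version string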
Install Spark:
Download the package from the official site, upload it to the VM, and extract it: tar -zxvf soft/spark-2.4.0-bin-hadoop2.7.tgz -C /home/hadoop/app
Add it to the environment variables:
export SPARK_HOME=/home/hadoop/app/spark-2.4.0-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
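A quick check that the Spark binaries are on the PATH (a minimal sketch; spark-submit --version only needs a local JVM, no cluster):
[hadoop@weekend110 ~]$ source /home/hadoop/.bash_profile
[hadoop@weekend110 ~]$ spark-submit --version
# should report Spark 2.4.0 built for Scala 2.11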
Edit the configuration files:
1) Edit spark-env.sh
[hadoop@weekend110 ~]$ cd /home/hadoop/app/spark-2.4.0-bin-hadoop2.7/conf
[hadoop@weekend110 conf]$ cp spark-env.sh.template spark-env.sh
[hadoop@weekend110 conf]$ vi spark-env.sh
[hadoop@weekend110 conf]$ grep -Ev "^$|#" /home/hadoop/app/spark-2.4.0-bin-hadoop2.7/conf/spark-env.sh
HADOOP_CONF_DIR=/home/hadoop/app/hadoop-2.7.0/etc/hadoop/
JAVA_HOME=/home/hadoop/app/jdk1.8.0_231
SCALA_HOME=/home/hadoop/app/scala-2.11.8
SPARK_MASTER_HOST=weekend110
SPARK_MASTER_PORT=7077
SPARK_MASTER_WEBUI_PORT=8080
SPARK_WORKER_CORES=1
SPARK_WORKER_MEMORY=1g
SPARK_WORKER_PORT=7078
SPARK_WORKER_WEBUI_PORT=8081
YARN_CONF_DIR=/home/hadoop/app/hadoop-2.7.0/etc/hadoop
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080
-Dspark.history.retainedApplications=30
-Dspark.history.fs.logDirectory=hdfs://weekend110:9000/test"
2) Edit the slaves configuration file
[hadoop@weekend110 conf]$ vi slaves
[hadoop@weekend110 spark-2.4.0-bin-hadoop2.7]$ grep -Ev "^$|#" /home/hadoop/app/spark-2.4.0-bin-hadoop2.7/conf/slaves
weekend01
weekend02
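The start-all-style scripts SSH into each host listed in slaves, so passwordless SSH from weekend110 to the workers should already be in place (a quick check, assuming the keys were set up during the Hadoop installation):
[hadoop@weekend110 conf]$ ssh weekend01 hostname
[hadoop@weekend110 conf]$ ssh weekend02 hostname
# each should print the worker's hostname without prompting for a password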
3) Configure the JobHistoryServer
[hadoop@weekend110 conf]$ cp spark-defaults.conf.template spark-defaults.conf
[hadoop@weekend110 conf]$ vi spark-defaults.conf
[hadoop@weekend110 conf]$ grep -Ev "^$|#" /home/hadoop/app/spark-2.4.0-bin-hadoop2.7/conf/spark-defaults.conf
spark.eventLog.enabled              true
spark.eventLog.dir                  hdfs://weekend110:9000/test
spark.yarn.historyServer.address    weekend110:18080
spark.history.ui.port               18080
Note: the directory on HDFS must be created in advance.
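A single command covers the path configured above (assuming HDFS is listening on weekend110:9000, as in spark-defaults.conf):
[hadoop@weekend110 ~]$ hdfs dfs -mkdir -p /test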
Parameter descriptions:
spark.eventLog.dir: all information produced while an application runs is recorded under the path specified by this property;
spark.history.ui.port=18080: the web UI is served on port 18080;
spark.history.fs.logDirectory=hdfs://weekend110:9000/test: with this property set, start-history-server.sh no longer needs the path passed explicitly; the Spark History Server page only shows applications logged under this path;
spark.history.retainedApplications=30: the number of applications whose history is retained; once exceeded, the oldest application data is evicted. This limits the applications held in memory, not the number shown on the page.
4) Distribute to the weekend01 and weekend02 machines:
[hadoop@weekend110 app]$ scp -r /home/hadoop/app/scala-2.11.8 hadoop@weekend01:/home/hadoop/app
[hadoop@weekend110 app]$ scp -r /home/hadoop/app/scala-2.11.8 hadoop@weekend02:/home/hadoop/app
[hadoop@weekend110 app]$ scp -r /home/hadoop/app/spark-2.4.0-bin-hadoop2.7 hadoop@weekend01:/home/hadoop/app
[hadoop@weekend110 app]$ scp -r /home/hadoop/app/spark-2.4.0-bin-hadoop2.7 hadoop@weekend02:/home/hadoop/app
5) Load the environment variables on weekend01 and weekend02 and source them to take effect:
[hadoop@weekend110 app]$ scp /home/hadoop/.bash_profile hadoop@weekend01:/home/hadoop
[hadoop@weekend110 app]$ scp /home/hadoop/.bash_profile hadoop@weekend02:/home/hadoop
[hadoop@weekend01 app]$ source /home/hadoop/.bash_profile
[hadoop@weekend02 app]$ source /home/hadoop/.bash_profile
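A quick check that each worker now sees the same versions (a minimal sketch run from the master; any equivalent per-host check works):
[hadoop@weekend110 app]$ ssh weekend01 'source ~/.bash_profile && scala -version && echo $SPARK_HOME'
[hadoop@weekend110 app]$ ssh weekend02 'source ~/.bash_profile && scala -version && echo $SPARK_HOME'
# each should report Scala 2.11.8 and the distributed Spark path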
III. Starting the Services and Verification
1. Script renaming:
To avoid a conflict with Hadoop's start/stop-all.sh scripts, rename the start/stop-all.sh scripts under spark/sbin/:
[hadoop@weekend110 sbin]$ mv start-all.sh start-spark-all.sh
[hadoop@weekend110 sbin]$ mv stop-all.sh stop-spark-all.sh
2. Start Spark; before that, start the Hadoop services (HDFS + YARN):
[hadoop@weekend110 spark-2.4.0-bin-hadoop2.7]$ sbin/start-spark-all.sh
After Spark has started:
a Master process is launched on the configured master node, weekend110;
a Worker process is launched on the configured worker node weekend01;
a Worker process is launched on the configured worker node weekend02.
3. Start the history server:
[hadoop@weekend110 spark-2.4.0-bin-hadoop2.7]$ sbin/start-history-server.sh
4. All the processes are now running:
[hadoop@weekend110 spark-2.4.0-bin-hadoop2.7]$ jps
3920 ResourceManager
4034 NodeManager
3763 SecondaryNameNode
5187 Master
3588 DataNode
5269 HistoryServer
3452 NameNode
7375 Jps
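Besides jps, the web UIs can be checked from a browser or with curl (assuming the ports configured above: 8080 for the Master UI and 18080 for the History Server):
[hadoop@weekend110 ~]$ curl -s -o /dev/null -w "%{http_code}\n" http://weekend110:8080
[hadoop@weekend110 ~]$ curl -s -o /dev/null -w "%{http_code}\n" http://weekend110:18080
# a 200 from each indicates the UI is reachable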
5. Run the test job bundled with Spark
[hadoop@weekend110 spark-2.4.0-bin-hadoop2.7]$ bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode client \
./examples/jars/spark-examples_2.11-2.4.0.jar \
100
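The same example could also be submitted to the standalone master started above instead of YARN (a minimal sketch, assuming the master RPC port 7077 configured in spark-env.sh):
[hadoop@weekend110 spark-2.4.0-bin-hadoop2.7]$ bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://weekend110:7077 \
./examples/jars/spark-examples_2.11-2.4.0.jar \
100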
6. Check the result
[hadoop@weekend110 spark-2.4.0-bin-hadoop2.7]$ bin/spark-submit \
> --class org.apache.spark.examples.SparkPi \
> --master yarn \
> --deploy-mode client \
> ./examples/jars/spark-examples_2.11-2.4.0.jar \
> 100
20/07/31 02:16:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/07/31 02:16:24 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
Pi is roughly 3.1418083141808313
