Spark on YARN troubleshooting notes -- 688IT编程网


Problem 1:

18/03/15 07:59:23 INFO yarn.Client:

client token: N/A

diagnostics: Application application_1521099425266_0002 failed 2 times due to AM Container for appattempt_1521099425266_0002_000002 exited with exitCode: 1

For more detailed output, check application tracking page:spark1:8088/proxy/application_1521099425266_0002/Then, click on links to logs of each attempt.

Diagnostics: Exception from container-launch.

Container id: container_1521099425266_0002_02_000001

Exit code: 1

Stack trace: ExitCodeException exitCode=1:

at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)

at org.apache.hadoop.util.Shell.run(Shell.java:455)

at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)

at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)

at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)

at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

at java.lang.Thread.run(Thread.java:748)

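This error is usually memory-related: increase the memory allocated to the containers.

Then disable the NodeManager's physical and virtual memory checks in yarn-site.xml:

<property>

<name>yarn.nodemanager.pmem-check-enabled</name>

<value>false</value>

</property>

<property>

<name>yarn.nodemanager.vmem-check-enabled</name>

<value>false</value>

</property>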
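To confirm that both settings actually made it into the configuration on a node, something like the following sketch could help. It only parses an XML string and reports the two values; the SAMPLE document and the function name are my own illustration, not taken from a real cluster file.

```python
import xml.etree.ElementTree as ET

# The two NodeManager memory-check properties discussed above.
MEMORY_CHECKS = (
    "yarn.nodemanager.pmem-check-enabled",
    "yarn.nodemanager.vmem-check-enabled",
)


def memory_check_settings(yarn_site_xml: str) -> dict:
    """Return {property name: value, or None if the property is absent}."""
    root = ET.fromstring(yarn_site_xml)
    found = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name in MEMORY_CHECKS:
            found[name] = (prop.findtext("value") or "").strip()
    return {name: found.get(name) for name in MEMORY_CHECKS}


# Hypothetical sample document used only for illustration.
SAMPLE = """<configuration>
  <property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    for name, value in memory_check_settings(SAMPLE).items():
        print(name, "=", value)
```

A value of None for either property means it is not set in that file, so the cluster default applies on that node.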

Problem 2:

Container exited with a non-zero exit code 1

Failing this attempt. Failing the application.

ApplicationMaster host: N/A

ApplicationMaster RPC port: -1

queue: root.kfk

start time: 1521115132862

final status: FAILED

tracking URL: spark1:8088/cluster/app/application_1521099425266_0002

user: kfk

Exception in thread "main" org.apache.spark.SparkException: Application application_1521099425266_0002 finished with failed status

at org.apache.spark.deploy.yarn.Client.run(Client.scala:1104)

at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1150)

at org.apache.spark.deploy.yarn.Client.main(Client.scala)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:498)

at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)

at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)

at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)

at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)

at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

18/03/15 07:59:23 INFO util.ShutdownHookManager: Shutdown hook called

18/03/15 07:59:23 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-edf48e42-1bda-41b6-8a1b-7f9e176da728

This error is usually caused by cluster misconfiguration: check the JDK installation and the YARN configuration files.

Problem 3:

diagnostics: Application application_1521099425266_0004 failed 2 times due to Error launching appattempt_1521099425266_0004_000002. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container. This token is expired. current time is 1521213771615 found 1521138303131

Note: System times on machines may be out of sync. Check system time and time zones.

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)

at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

at java.lang.reflect.Constructor.newInstance(Constructor.java:423)

at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)

at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)

at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:123)

at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

at java.lang.Thread.run(Thread.java:748)

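Synchronizing the clocks across the cluster fixes this. My cluster had been clock-synchronized all along, but for some unknown reason the third node's clock suddenly drifted, and its JDK version got mixed up as well.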
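The two numbers in the "token is expired" message are epoch timestamps in milliseconds, so the size of the clock gap can be read straight from the log. A minimal sketch, assuming the message has the same wording as in the log above (the function names are my own):

```python
import re
from datetime import datetime, timezone

# Matches the two epoch-millisecond values in the YARN diagnostic message.
TOKEN_EXPIRED = re.compile(r"current time is (\d+) found (\d+)")


def token_time_gap_ms(diagnostics: str):
    """Return (current_ms, found_ms, gap_ms), or None if the message is absent."""
    match = TOKEN_EXPIRED.search(diagnostics)
    if match is None:
        return None
    current_ms, found_ms = int(match.group(1)), int(match.group(2))
    return current_ms, found_ms, current_ms - found_ms


def as_utc(epoch_ms: int) -> datetime:
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)


# The message text copied from the diagnostics above.
MESSAGE = "This token is expired. current time is 1521213771615 found 1521138303131"

if __name__ == "__main__":
    current_ms, found_ms, gap_ms = token_time_gap_ms(MESSAGE)
    print("current:", as_utc(current_ms).isoformat())
    print("found:  ", as_utc(found_ms).isoformat())
    print("gap in hours:", round(gap_ms / 3_600_000, 2))
```

For the values in this log the gap is 75468484 ms, roughly 21 hours: the node's clock reads that much later than the timestamp carried by the token. A gap that large points to a wrong clock on one machine rather than a token that simply timed out.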

Problem 4:

Container exited with a non-zero exit code 15

Failing this attempt. Failing the application.

2018-03-16 11:59:29,345 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1521214648009_0003 State change from FINAL_SAVING to FAILED

2018-03-16 11:59:29,346 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger:

USER=kfk OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAIL For more detailed output, check application tracking page:spark2:8088/proxy/application_1521214648009_0003/Then, click on links to logs of each attempt.

Diagnostics: Exception from container-launch.

Container id: container_1521214648009_0003_02_000001

Exit code: 15

Stack trace: ExitCodeException exitCode=15:

at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)

at org.apache.hadoop.util.Shell.run(Shell.java:455)

at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)

at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)

at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)

at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

at java.lang.Thread.run(Thread.java:748)

Container exited with a non-zero exit code 15

Failing this attempt. Failing the application. APPID=application_1521214648009_0003

2018-03-16 11:59:29,346 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary: appId=application_1521214648009_0003,name=st.MyScalaWordCout,user=kfk,queue=root.kfk,state=FAILED,t

2018-03-16 11:59:30,164 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null

2018-03-16 12:00:15,892 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 6667ms for sessionid 0x3622d0b65080001, closing socket connection and attempting reconnect

2018-03-16 12:00:15,996 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Watcher event type: None with state:Disconnected for path:null for Service org.apache.hadoop.yarn.server.resourcemanager.recovery.R

2018-03-16 12:00:15,996 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session disconnected

2018-03-16 12:00:16,123 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server spark1/192.168.208.151:2181. Will not attempt to authenticate using SASL (unknown error)

2018-03-16 12:00:17,199 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 6670ms for sessionid 0x1622882ae9c0001, closing socket connection and attempting reconnect

2018-03-16 12:00:17,301 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session disconnected. Entering

2018-03-16 12:00:17,838 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server spark3/192.168.208.153:2181. Will not attempt to authenticate using SASL (unknown error)

2018-03-16 12:00:18,838 INFO org.apache.zookeeper.ClientCnxn: Socket connection established, initiating session, client: /192.168.208.152:35089, server: spark3/192.168.208.153:2181

2018-03-16 12:00:18,843 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server spark3/192.168.208.153:2181, sessionid = 0x1622882ae9c0001, negotiated timeout = 10000

2018-03-16 12:00:18,844 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session connected.

2018-03-16 12:00:18,858 INFO org.apache.hadoop.ha.ActiveStandbyElector: Checking for any old active which needs to

2018-03-16 12:00:18,862 INFO org.apache.hadoop.ha.ActiveStandbyElector: Old node exists: 0a0272731203726d32

2018-03-16 12:00:18,862 INFO org.apache.hadoop.ha.ActiveStandbyElector: But old node has our own data, so don't need to fence it.

2018-03-16 12:00:18,862 INFO org.apache.hadoop.ha.ActiveStandbyElector: Writing znode /yarn-leader-election/rs/ActiveBreadCrumb to indicate that the local node is the most

2018-03-16 12:00:19,127 INFO org.apache.zookeeper.ClientCnxn: Socket connection established, initiating session, client: /192.168.208.152:50168, server: spark1/192.168.208.151:2181

2018-03-16 12:00:21,384 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server spark1/192.168.208.151:2181, sessionid = 0x3622d0b65080001, negotiated timeout = 10000

2018-03-16 12:00:21,386 INFO org.apache.hadoop.conf.Configuration: found l at file:/opt/modules/hadoop-2.6.0-cdh5.4.5/etc/l

2018-03-16 12:00:21,387 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Watcher event type: None with state:SyncConnected for path:null for Service org.apache.hadoop.yarn.server.resourcemanager.recovery

2018-03-16 12:00:21,387 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session connected

2018-03-16 12:00:21,387 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session restored

2018-03-16 12:00:21,406 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=kfk OPERATION=refreshAdminAcls TARGET=AdminService RESULT=SUCCESS

2018-03-16 12:00:21,407 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Already in active state

2018-03-16 12:00:21,407 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=kfk OPERATION=refreshQueues TARGET=AdminService RESULT=SUCCESS

2018-03-16 12:00:21,408 INFO org.apache.hadoop.conf.Configuration: found l at file:/opt/modules/hadoop-2.6.0-cdh5.4.5/etc/l

2018-03-16 12:00:21,426 INFO org.apache.hadoop.util.HostsFileReader: Setting the includes file to

2018-03-16 12:00:21,426 INFO org.apache.hadoop.util.HostsFileReader: Setting the excludes file to

2018-03-16 12:00:21,426 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list

2018-03-16 12:00:21,431 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=kfk OPERATION=refreshNodes TARGET=AdminService RESULT=SUCCESS

2018-03-16 12:00:21,432 INFO org.apache.hadoop.conf.Configuration: found l at file:/opt/modules/hadoop-2.6.0-cdh5.4.5/etc/l

2018-03-16 12:00:21,450 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=kfk OPERATION=refreshSuperUserGroupsConfiguration TARGET=AdminService RESULT=SUCCESS

2018-03-16 12:00:21,450 INFO org.apache.hadoop.conf.Configuration: found l at file:/opt/modules/hadoop-2.6.0-cdh5.4.5/etc/l

2018-03-16 12:00:21,451 INFO org.apache.hadoop.security.Groups: clearing userToGroupsMap cache

2018-03-16 12:00:21,451 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=kfk OPERATION=refreshUserToGroupsMappings TARGET=AdminService RESULT=SUCCESS

2018-03-16 12:00:21,451 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=kfk OPERATION=transitionToActive TARGET=RMHAProtocolService RESULT=SUCCESS

Problems like this usually cannot be diagnosed from the surface output; the detailed cause has to be looked up in the YARN logs.
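One way to get at those logs is the `yarn logs -applicationId <id>` command, which prints the container logs of a finished application when log aggregation is enabled on the cluster. The sketch below only pulls the application ID out of a log line and builds that command as an argument list; the function name is my own, and whether aggregation is enabled on a given cluster is an assumption to verify.

```python
import re

# YARN application IDs have the form application_<cluster timestamp>_<sequence>.
APP_ID = re.compile(r"application_\d+_\d+")


def yarn_logs_command(text: str):
    """Build the `yarn logs` argument list for the first application ID in text.

    Returns None when the text contains no application ID.
    """
    match = APP_ID.search(text)
    if match is None:
        return None
    return ["yarn", "logs", "-applicationId", match.group(0)]


# A line copied from the ResourceManager log above.
LOG_LINE = "Failing this attempt. Failing the application. APPID=application_1521214648009_0003"

if __name__ == "__main__":
    print(" ".join(yarn_logs_command(LOG_LINE)))
```

The returned list can be passed to `subprocess.run` on a machine that has the Hadoop client installed. If aggregation is off, the container logs stay in the NodeManager's local log directories on the node that ran the container.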

Problem 5:

Exception in thread "main" org.apache.spark.SparkException: Application application_1521293577934_0006 finished with failed status

at org.apache.spark.deploy.yarn.Client.run(Client.scala:1104)

at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1150)

at org.apache.spark.deploy.yarn.Client.main(Client.scala)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:498)

at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)

at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)

at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)

at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)

at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

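This is only the surface error. To find the real cause, open the failed application in the resource scheduler's application list and click into it; the actual error is shown there: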

Diagnostics: User class threw exception: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://ns/opt/
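Since the real cause here is a missing input path, a check before submitting the job can catch it early. The sketch below wraps `hdfs dfs -test -e <path>`, which exits with status 0 when the path exists. The function name and the injectable `run` parameter are my own additions for illustration, and the path in the usage example is hypothetical.

```python
import subprocess
from typing import Callable, Optional, Sequence


def hdfs_path_exists(
    path: str,
    run: Optional[Callable[[Sequence[str]], int]] = None,
) -> bool:
    """Return True when `hdfs dfs -test -e <path>` exits with status 0.

    `run` takes the argument list and returns an exit code. It defaults to
    executing the command, which requires the Hadoop client on PATH.
    """
    command = ["hdfs", "dfs", "-test", "-e", path]
    if run is None:
        run = lambda cmd: subprocess.run(cmd).returncode
    return run(command) == 0


if __name__ == "__main__":
    # Hypothetical path: replace it with the job's real input directory.
    input_path = "hdfs://ns/path/to/input"
    # A stand-in runner so the example works without a cluster.
    fake_run = lambda cmd: 1
    if not hdfs_path_exists(input_path, run=fake_run):
        print("Input path does not exist:", input_path)
```

Passing `run` explicitly keeps the check testable on a machine without Hadoop; on a cluster node, omit it so the real command is executed.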
