Cloudera Manager Server and agent lose contact
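# This host is not in contact with the Cloudera Manager Server
The monitor services on the server side are running normally, but the agent cannot connect to them.
# This host is in contact with the Cloudera Manager Server. This host is not in contact with the Host Monitor.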
[20/Feb/2020 16:51:51 +0000] 22086 MonitorDaemon-Reporter firehoses INFO Creating a connection to the ACTIVITYMONITOR.
[20/Feb/2020 16:51:51 +0000] 22086 MonitorDaemon-Reporter firehoses INFO Creating a connection to the SERVICEMONITOR.
[20/Feb/2020 16:51:51 +0000] 22086 MonitorDaemon-Reporter firehoses INFO Creating a connection to the HOSTMONITOR.
[20/Feb/2020 16:51:51 +0000] 22086 MonitorDaemon-Reporter throttling_logger ERROR Error sending messages to firehose: mgmt-HOSTMONITOR-d592ed6aea0516a09027c2cf834d8979
Traceback (most recent call last):
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/monitor/firehose.py", line 121, in _send
self._port)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 469, in __init__
File "/usr/lib64/python2.7/httplib.py", line 833, in connect
self.timeout, self.source_address)
File "/usr/lib64/python2.7/socket.py", line 571, in create_connection
raise err
error: [Errno 111] Connection refused
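The `[Errno 111] Connection refused` above means that nothing accepted the TCP connection on the Host Monitor's address and port. A minimal Python sketch for probing that from the agent host might look like the following. The function name `port_open` is illustrative and not part of Cloudera Manager, and the host and port in the usage comment are placeholders that would need to come from your own Cloudera Management Service configuration.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # Same standard-library call that raises in the traceback above.
        conn = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
    conn.close()
    return True


# Usage (placeholder values, not taken from the logs above):
#   port_open("<host-monitor-host>", <host-monitor-listen-port>)
```

If this returns False from the agent host but True on the Host Monitor host itself, the service is up and the problem is more likely name resolution, routing, or a firewall than the monitor process.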
For reference:
In the server log:
2020-02-20 17:25:06,371 WARN New I/O boss #388:com.f.log.AgentResponseAsyncHandler: (2 skipped) Exception thrown while trying to get log search results from agent on host: creative
java.ConnectException: Connection timed out: creative/172.19.40.203:9000
...
2020-02-20 17:35:17,209 ERROR ParcelUpdateService:com.cloudera.parcelponents.ParcelDownloaderImpl: (10 skipped) Unable to retrieve remote parcel repository manifest
urrent.ExecutionException: java.UnknownHostException: archive.cloudera: Name or service not known
cloudera agent monitor firehose error: [Errno 111] Connection refused
# Re-adding the host
2020-02-20 20:19:57,879 ERROR scm-web-4143:f.model.DbCommand: Command null(DeployClusterClientConfig) has completed. finalstate:FINISHED, success:false, msg:Command Deploy Client Configuration is not currently a ...
2020-02-20 20:19:57,894 INFO scm-web-4143:prise.JavaMelodyFacade: Exiting HTTP Operation: Method:POST, Path:/v7/clusters/LogServerClu/commands/deployClientConfig, Status:200
2020-02-20 20:19:57,978 WARN scm-web-4105:fmand.flow.SeqFlowCmd: Invalid command state json
prise.JsonUtil2$JsonRuntimeException: com.fasterxml.MismatchedInputException: No content to map due to end-of-input
at [Source: (String)""; line: 1, column: 0]
at prise.JsonUtil2.valueFromString(JsonUtil2.java:193)
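The JDK is not the cause!
After a full day of trying, the last-resort fix:
Stop the agents on all four hosts (170, 171, 172, 221) and stop the server on 170; then start the server again, followed by the four agents.
# on all four hosts
systemctl stop cloudera-scm-agent
# on 170
systemctl stop cloudera-scm-server
systemctl start cloudera-scm-server
# on all four hosts
systemctl start cloudera-scm-agent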
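The order matters here: agents are stopped before the server, and the server is started before the agents. As a rough sketch, the sequence can be written down as an ordered plan and printed for review before running anything by hand. The helper `restart_plan` is hypothetical, and the host labels are just the short names used in these notes, standing in for real hostnames.

```python
AGENT_HOSTS = ["170", "171", "172", "221"]  # short labels from the notes, not real hostnames
SERVER_HOST = "170"


def restart_plan(agent_hosts, server_host):
    """Return (host, command) pairs in the order described above."""
    plan = [(h, "systemctl stop cloudera-scm-agent") for h in agent_hosts]
    plan.append((server_host, "systemctl stop cloudera-scm-server"))
    plan.append((server_host, "systemctl start cloudera-scm-server"))
    plan += [(h, "systemctl start cloudera-scm-agent") for h in agent_hosts]
    return plan


# Only prints the plan; nothing is executed on any host.
for host, command in restart_plan(AGENT_HOSTS, SERVER_HOST):
    print("%s: %s" % (host, command))
```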
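This still did not fix node 221 (which was mapped to its internal IP). I removed it from the cluster in Cloudera Manager, configured the hosts mapping on all four nodes to use 221's public IP, and then re-added it to the cluster.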
#scm-status.log
[20/Feb/2020 21:56:44 +0000] 5440 MainThread _cplogging  INFO    [20/Feb/2020:21:56:44] ENGINE Started monitor thread 'Autoreloader'.
[20/Feb/2020 21:56:44 +0000] 5440 MainThread _cplogging  INFO    [20/Feb/2020:21:56:44] ENGINE Started monitor thread '_TimeoutMonitor'.
[20/Feb/2020 21:56:44 +0000] 5440 HTTPServer Thread-3 _cplogging  ERROR    [20/Feb/2020:21:56:44] ENGINE Error in HTTP server: shutting down
Traceback (most recent call last):
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cherrypy/process/servers.py", line 225, in _start_http_thread
self.httpserver.start()
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cheroot/server.py", line 1326, in start
(msg)
error: No socket could be created -- (('47.103.112.221', 9000): [Errno 99] Cannot assign requested address)
[20/Feb/2020 21:56:44 +0000] 5440 HTTPServer Thread-3 _cplogging  INFO    [20/Feb/2020:21:56:44] ENGINE Bus STOPPING
[20/Feb/2020 21:56:44 +0000] 5440 HTTPServer Thread-3 _cplogging  INFO    [20/Feb/2020:21:56:44] ENGINE HTTP Server cherrypy._cpwsgi_server.CPWSGIServer(('creative', 9000)) already shut down
[20/Feb/2020 21:56:44 +0000] 5440 HTTPServer Thread-3 _cplogging  INFO    [20/Feb/2020:21:56:44] ENGINE Stopped thread '_TimeoutMonitor'.
[20/Feb/2020 21:56:44 +0000] 5440 HTTPServer Thread-3 _cplogging  INFO    [20/Feb/2020:21:56:44] ENGINE Stopped thread 'Autoreloader'.
[20/Feb/2020 21:56:44 +0000] 5440 HTTPServer Thread-3 _cplogging  INFO    [20/Feb/2020:21:56:44] ENGINE Bus STOPPED
[20/Feb/2020 21:56:44 +0000] 5440 HTTPServer Thread-3 _cplogging  INFO    [20/Feb/2020:21:56:44] ENGINE Bus EXITING
[20/Feb/2020 21:56:44 +0000] 5440 HTTPServer Thread-3 _cplogging  INFO    [20/Feb/2020:21:56:44] ENGINE Bus EXITED
#scm-agent.log
[20/Feb/2020 21:56:35 +0000] 5322 MainThread _cplogging  INFO    [20/Feb/2020:21:56:35] ENGINE Serving on 127.0.0.1:9001
[20/Feb/2020 21:56:35 +0000] 5322 MainThread _cplogging  INFO    [20/Feb/2020:21:56:35] ENGINE Bus STARTED
[20/Feb/2020 21:56:37 +0000] 5322 MainThread main        ERROR    Top-level exception: <Fault 40: 'ABNORMAL_TERMINATION: status_server'>
Traceback (most recent call last):
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/main.py", line 107, in main_impl
ag.start(legacy_supervisor)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 839, in start
self.supervisor_client.start_process(STATUS_SERVER_PROC)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/util/__init__.py", line 531, in new_fn
return fn(self, *args, **kwargs)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/supervisor.py", line 406, in start_process
raise RetryableProcessException(fault)
RetryableProcessException: <Fault 40: 'ABNORMAL_TERMINATION: status_server'>
### Check how the hostname maps to an IP
[root@creative cloudera-scm-agent]# python -c 'import socket; print socket.getfqdn(), socket.gethostbyname(socket.getfqdn())'
creative 47.103.112.221
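The `[Errno 99] Cannot assign requested address` in the status server log is the error Linux returns when a process tries to bind to an IP address that is not configured on any local interface. Here the hostname `creative` resolves to 47.103.112.221, and the agent's status server then fails to bind to that address on port 9000. A small sketch for checking whether an address is bindable on the current machine follows; `can_bind` is an illustrative helper, not part of the agent.

```python
import errno
import socket


def can_bind(ip, port=0):
    """Return True if this machine can bind a TCP socket to ip.

    Port 0 lets the OS pick a free port, so the check does not
    collide with an agent already listening on 9000.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((ip, port))
        return True
    except socket.error as exc:
        if exc.errno == errno.EADDRNOTAVAIL:  # Errno 99 on Linux
            return False
        raise
    finally:
        sock.close()


# Usage on the affected host:
#   can_bind(socket.gethostbyname(socket.getfqdn()))
```

A False result for the address the hostname resolves to would be consistent with the failure in the log: the name points at an IP that the host itself does not own.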
In the end, I removed the agent, reinstalled it, and mapped the hostname to the public IP in the hosts file.
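Because the fix depends on which IP each node's hosts file assigns to a hostname, it can help to compare the mappings across nodes. The sketch below parses hosts-file text into a name-to-IP dictionary. `parse_hosts` is a hypothetical helper, the sample text is illustrative rather than copied from the cluster, and keeping the first entry for a repeated name is an assumption about typical resolver behaviour.

```python
def parse_hosts(text):
    """Map each hostname or alias to the first IP listed for it."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            mapping.setdefault(name, ip)  # keep the first entry for a name
    return mapping


# Illustrative content; on a real node, read /etc/hosts instead.
sample = """
127.0.0.1        localhost
47.103.112.221   creative
"""
print(parse_hosts(sample)["creative"])
```

Running the same check on every node and comparing the value for `creative` shows quickly whether one host still carries a different mapping.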
creative: IOException thrown while collecting data from host: Connection refused (Connection refused)
#agent.log
[20/Feb/2020 22:48:42 +0000] 11398 MonitorDaemon-Reporter throttling_logger ERROR (10 skipped) Error sending messages to firehose: mgmt-HOSTMONITOR-d592ed6aea0516a09027c2cf834d8979
Traceback (most recent call last):
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/monitor/firehose.py", line 121, in _send
self._port)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 469, in __init__
File "/usr/lib64/python2.7/httplib.py", line 833, in connect
self.timeout, self.source_address)
File "/usr/lib64/python2.7/socket.py", line 571, in create_connection
raise err
error: [Errno 111] Connection refused
# /var/log/cloudera-scm-firehose
# Activity Monitor log
2020-02-20 21:01:43,753 WARN f.BasicScmProxy: Exception while getting current fragments hashes
java.ConnectException: Connection refused (Connection refused)
...
2020-02-20 21:02:40,203 INFO firehose.Main: Starting Firehose. JVM Args: [-XX:+UseConcMarkSweepGC, -XX:+UseParNewGC, -Dmgmt.log.file=mgmt-cmf-mgmt-ACTIVITYMONITOR-hz-seeing-bg-01.log.out, -Djava.awt.he ...
# Host Monitor log
2020-02-20 21:02:45,838 WARN firehose.HMONToSMONHostSubjectRecordPublisher: Failed to send messages to SMON.
flect.UndeclaredThrowableException
at com.sun.proxy.$Proxy23.writeStatusRecords(Unknown Source)
at firehose.BasicFirehoseClient.writeStatusRecords(BasicFirehoseClient.java:75)
at firehose.HMONToSMONHostSubjectRecordPublisher.processRecords(HMONToSMONHostSubjectRecordPublisher.java:107)
at store.leveldb.LDBSubjectRecordStore.write(LDBSubjectRecordStore.java:399)
at kaiser.HMONTestRunner.runHostTestsForSession(HMONTestRunner.java:86)
at kaiser.HMONTestRunner.runTestsForSession(HMONTestRunner.java:66)
at kaiser.BaseTestRunner.runTestsOnAllSubjects(BaseTestRunner.java:143)
at kaiser.KaiserService$KaiserServiceRunner.run(KaiserService.java:138)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.avro.AvroRemoteException: java.ConnectException: Connection refused (Connection refused)