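DataNode fails to start after formatting the NameNode in Hadoop
1 Overview
This post resolves a problem where the DataNode fails to start when HDFS is started in Hadoop. The error is: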
java.io.IOException: Incompatible clusterIDs in /home/lxh/hadoop/hdfs/data: namenode clusterID = CID-a3938a0b-57b5-458d-841c-d096e2b7a71c; datanode clusterID = CID-200e6206-98b5-44b2-9e48-262871884eeb
2 Problem description
After running start-dfs.sh, the printed log shows that both the NameNode and the DataNode startup steps were executed.
Starting namenodes on [localhost]
localhost: starting namenode, logging to /home/lxh/hadoop/hadoop-2.4.1/logs/hadoop-lxh-namenode-ubuntu.out
localhost: starting datanode, logging to /home/lxh/hadoop/hadoop-2.4.1/logs/hadoop-lxh-datanode-ubuntu.out
However, running jps to check the result shows that the DataNode did not actually start.
10256 ResourceManager
29634 NameNode
29939 SecondaryNameNode
30054 Jps
10399 NodeManager
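Spotting a missing daemon in jps output by eye is easy to get wrong, so the check can be automated. The following is a minimal sketch, not part of the original troubleshooting: the function name missing_daemons and the set of expected daemons are assumptions chosen for illustration, and the sample text is the jps output shown above.

```python
# Hypothetical helper: report which expected HDFS daemons are absent from jps output.
EXPECTED_HDFS_DAEMONS = {"NameNode", "DataNode", "SecondaryNameNode"}


def missing_daemons(jps_output, expected=EXPECTED_HDFS_DAEMONS):
    """Return the sorted list of expected daemon names not found in jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # Each useful jps line has the form "<pid> <main class name>".
        if len(parts) >= 2 and parts[0].isdigit():
            running.add(parts[1])
    return sorted(set(expected) - running)


sample = """\
10256 ResourceManager
29634 NameNode
29939 SecondaryNameNode
30054 Jps
10399 NodeManager
"""

print(missing_daemons(sample))  # ['DataNode']
```

On a live machine the text could come from running jps through subprocess.run(["jps"], capture_output=True, text=True), assuming jps is on the PATH.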
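3 Investigating the problem
This was puzzling: HDFS had been running normally a moment earlier, and I had even run the wordcount test program. Thinking back over what I had just done, I had formatted dfs (hdfs namenode -format and hdfs datanode -format) and then restarted, after which this problem appeared. Only the first of these commands actually formats anything, since hdfs datanode has no -format option. Could the problem be related to the formatting? To find out, I checked the DataNode log: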
2014-08-08 00:32:08,787 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid unassigned) service to localhost/127.0.0.1:9000. Exiting.
java.io.IOException: Incompatible clusterIDs in /home/lxh/hadoop/hdfs/data: namenode clusterID = CID-a3938a0b-57b5-458d-841c-d096e2b7a71c; datanode clusterID = CID-200e6206-98b5-44b2-9e48-262871884eeb
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:477)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:226)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:254)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:974)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:945)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:278)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:220)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
    at java.lang.Thread.run(Thread.java:745)
2014-08-08 00:32:08,790 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid unassigned) service to localhost/127.0.0.1:9000
2014-08-08 00:32:08,791 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool <registering> (Datanode Uuid unassigned)
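The key line in this log is the "Incompatible clusterIDs" exception, which names the storage directory and both IDs. As a rough sketch (the pattern and the function name find_cluster_id_mismatch are illustrative assumptions based only on the message format shown here, not a Hadoop API), those three values can be extracted from a log with a regular expression:

```python
import re

# Pattern derived from the exception message shown in the log above.
CLUSTER_ID_ERROR = re.compile(
    r"Incompatible clusterIDs in (?P<dir>\S+): "
    r"namenode clusterID = (?P<namenode>[\w-]+); "
    r"datanode clusterID = (?P<datanode>[\w-]+)"
)


def find_cluster_id_mismatch(log_text):
    """Return a dict with 'dir', 'namenode' and 'datanode', or None if absent."""
    match = CLUSTER_ID_ERROR.search(log_text)
    return match.groupdict() if match else None


log_line = (
    "java.io.IOException: Incompatible clusterIDs in /home/lxh/hadoop/hdfs/data: "
    "namenode clusterID = CID-a3938a0b-57b5-458d-841c-d096e2b7a71c; "
    "datanode clusterID = CID-200e6206-98b5-44b2-9e48-262871884eeb"
)

info = find_cluster_id_mismatch(log_line)
print(info["dir"])       # /home/lxh/hadoop/hdfs/data
print(info["namenode"])  # CID-a3938a0b-57b5-458d-841c-d096e2b7a71c
print(info["datanode"])  # CID-200e6206-98b5-44b2-9e48-262871884eeb
```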
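According to the log, the cause is that the DataNode's clusterID does not match the NameNode's clusterID.
With the cause identified, the next step is to check whether things really are as the log describes.
Locate the directories configured for the DataNode and the NameNode in the HDFS configuration (hdfs-site.xml), open the current/VERSION file in each, and compare them.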
${datanode}/current/VERSION:
storageID=DS-be8dfa2b-17b1-4c9f-bbfe-4898956a39ed
clusterID=CID-200e6206-98b5-44b2-9e48-262871884eeb
cTime=0
datanodeUuid=406b6d6a-0cb1-453d-b689-9ee62433b15d
storageType=DATA_NODE
layoutVersion=-55
${namenode}/current/VERSION:
namespaceID=670379
clusterID=CID-a3938a0b-57b5-458d-841c-d096e2b7a71c
cTime=0
storageType=NAME_NODE
blockpoolID=BP-325596647-127.0.1.1-1407429078192
layoutVersion=-56
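The comparison of the two VERSION files can also be done in code. The sketch below treats each file as simple key=value text, which is how the listings above look; parse_version and cluster_ids_match are hypothetical helper names, and the two sample strings are copied from the listings rather than read from disk.

```python
def parse_version(text):
    """Parse key=value lines of a VERSION file into a dict, skipping comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def cluster_ids_match(namenode_version, datanode_version):
    """Return True only if both texts define the same clusterID."""
    nn_id = parse_version(namenode_version).get("clusterID")
    dn_id = parse_version(datanode_version).get("clusterID")
    return nn_id is not None and nn_id == dn_id


datanode_version = """\
storageID=DS-be8dfa2b-17b1-4c9f-bbfe-4898956a39ed
clusterID=CID-200e6206-98b5-44b2-9e48-262871884eeb
cTime=0
datanodeUuid=406b6d6a-0cb1-453d-b689-9ee62433b15d
storageType=DATA_NODE
layoutVersion=-55
"""

namenode_version = """\
namespaceID=670379
clusterID=CID-a3938a0b-57b5-458d-841c-d096e2b7a71c
cTime=0
storageType=NAME_NODE
blockpoolID=BP-325596647-127.0.1.1-1407429078192
layoutVersion=-56
"""

print(cluster_ids_match(namenode_version, datanode_version))  # False
```

To run this against real files, the two strings could be loaded with pathlib.Path(...).read_text(), using whatever directories the local configuration points to.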
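Sure enough, the files match what the log recorded. I therefore changed the clusterID in the DataNode's VERSION file to the NameNode's value, started dfs again (by running start-dfs.sh), and ran jps to check the result. This time everything started normally.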
10256 ResourceManager
30614 NameNode
30759 DataNode
30935 SecondaryNameNode
31038 Jps
10399 NodeManager
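4 Root cause analysis
When hdfs namenode -format is run, the NameNode's current directory is deleted and regenerated, and the clusterID in its VERSION file changes with it. The clusterID in the DataNode's VERSION file stays the same, so the two clusterIDs no longer match.
To avoid this situation, after formatting the NameNode either delete the DataNode's current directory (which also discards the blocks stored on that DataNode) or change the clusterID in the DataNode's VERSION file to match the clusterID in the NameNode's VERSION file, and then restart dfs.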
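The second option, editing the clusterID by hand, can be sketched as a small text transformation. This is only an illustration under stated assumptions: sync_cluster_id is a hypothetical helper, it works on the file contents as a string, and it assumes HDFS is stopped and the original VERSION file has been backed up before anything is written back.

```python
import re


def sync_cluster_id(datanode_version_text, namenode_cluster_id):
    """Return DataNode VERSION text with its clusterID set to the NameNode's ID."""
    new_text, count = re.subn(
        r"^clusterID=.*$",
        # A function replacement avoids any escape handling in the new value.
        lambda match: "clusterID=" + namenode_cluster_id,
        datanode_version_text,
        flags=re.MULTILINE,
    )
    if count != 1:
        raise ValueError("expected exactly one clusterID line, found %d" % count)
    return new_text


old_text = """\
storageID=DS-be8dfa2b-17b1-4c9f-bbfe-4898956a39ed
clusterID=CID-200e6206-98b5-44b2-9e48-262871884eeb
cTime=0
storageType=DATA_NODE
"""

fixed = sync_cluster_id(old_text, "CID-a3938a0b-57b5-458d-841c-d096e2b7a71c")
print(fixed.splitlines()[1])  # clusterID=CID-a3938a0b-57b5-458d-841c-d096e2b7a71c
```

Keeping the function free of file access makes it easy to check on sample text first. Writing the result back to ${datanode}/current/VERSION is left as a separate, deliberate step.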