I upgraded my CDH, and afterwards one of my NodeManagers could not be brought up.
NullPointerExceptions were found in the error log:
2014-11-03 16:51:10,042 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl failed in state INITED; cause: java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverContainer(ContainerManagerImpl.java:289)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:252)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:235)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:250)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:445)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:492)
2014-11-03 16:51:10,053 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Applications still running : [application_1413440084214_10329]
2014-11-03 16:51:10,053 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService: org.apache.hadoop.yarn.server.nodemanager.containermanager.logagg
2014-11-03 16:51:10,054 INFO org.apache.hadoop.service.AbstractService: Service NodeManager failed in state INITED; cause: java.lang.NullPointerException
java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverContainer(ContainerManagerImpl.java:289)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:252)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:235)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:250)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:445)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:492)
2014-11-03 16:51:10,055 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NodeManager metrics system...
2014-11-03 16:51:10,055 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system stopped.
2014-11-03 16:51:10,055 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system shutdown complete.
2014-11-03 16:51:10,056 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverContainer(ContainerManagerImpl.java:289)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:252)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:235)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:250)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:445)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:492)
2014-11-03 16:51:10,058 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NodeManager at a1.hadoop.xxx.com/172.16.3.131
************************************************************/
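With a log this noisy, it can help to pull out just the "failed in state" lines to see which service broke first. The following is a minimal sketch, not part of the original troubleshooting; the function name `find_service_failures` and the regular expression are my own assumptions, based only on the message format shown in the log above.

```python
import re

# Assumed pattern, modelled on the AbstractService lines in the log above.
FAILURE_RE = re.compile(
    r"Service (?P<service>\S+) failed in state (?P<state>\w+); cause: (?P<cause>\S+)"
)


def find_service_failures(log_text):
    """Return (service, state, cause) tuples in the order they appear in the log."""
    return [
        (m.group("service"), m.group("state"), m.group("cause"))
        for m in FAILURE_RE.finditer(log_text)
    ]


# Two header lines copied from the log above (stack frames omitted).
sample = (
    "2014-11-03 16:51:10,042 INFO org.apache.hadoop.service.AbstractService: "
    "Service org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl "
    "failed in state INITED; cause: java.lang.NullPointerException\n"
    "2014-11-03 16:51:10,054 INFO org.apache.hadoop.service.AbstractService: "
    "Service NodeManager failed in state INITED; cause: java.lang.NullPointerException\n"
)

failures = find_service_failures(sample)
for service, state, cause in failures:
    print(service, state, cause)
```

Here the first hit is ContainerManagerImpl, and the NodeManager failure simply follows from it, which matches the recoverContainer frames at the top of every stack trace.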
I tried deleting all the ZooKeeper-related configs (described in Manually Upgrade CDH 5.2 in CM 5, in the YARN part), but that did not work.
I deleted the NodeManager instance and reinstalled it, with the same result.
Many ‘Recovering application’ and ‘Recovering localized resource’ messages appeared in that log file:
2014-11-03 16:51:09,965 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Recovering localized resource { hdfs://nameservice1/user/xxx/.staging/job_1413440084214_10329/job.jar, 1414674773046, PATTERN, (?:classes/|lib/).* } at /hadoop/yarn/nm/usercache/xxx/appcache/application_1413440084214_10329/filecache/14/job.jar
2014-11-03 16:51:09,965 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://nameservice1/user/xxx/.staging/job_1413440084214_10329/job.jar(->/hadoop/yarn/nm/usercache/xxx/appcache/application_1413440084214_10329/filecache/14/job.jar) transitioned from INIT to LOCALIZED
2014-11-03 16:51:09,965 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Recovering localized resource { hdfs://nameservice1/user/xxx/.staging/job_1413440084214_10329/job.xml, 1414674774452, FILE, null } at /hadoop/yarn/nm/usercache/xxx/appcache/application_1413440084214_10329/filecache/15/job.xml
2014-11-03 16:51:09,966 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://nameservice1/user/xxx/.staging/job_1413440084214_10329/job.xml(->/hadoop/yarn/nm/usercache/xxx/appcache/application_1413440084214_10329/filecache/15/job.xml) transitioned from INIT to LOCALIZED
2014-11-03 16:51:09,986 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Recovering application application_1413440084214_10329
2014-11-03 16:51:10,005 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1413440084214_10329 transitioned from NEW to INITING
I deleted those too, but the NodeManager still failed to start.
Then, in the start-up messages of the log, I found this:
2014-10-30 21:23:07,141 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: registered UNIX signal handlers for [TERM, HUP, INT]
2014-10-30 21:23:08,259 INFO org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService: Using state database at /tmp/hadoop-yarn/yarn-nm-recovery/yarn-nm-state for recover
2014-10-30 21:23:08,291 INFO org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService$LeveldbLogger: Recovering log #432
2014-10-30 21:23:08,309 INFO org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService$LeveldbLogger: Delete type=0 #432
2014-10-30 21:23:08,309 INFO org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService$LeveldbLogger: Delete type=3 #431
2014-10-30 21:23:08,321 INFO org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService: Loaded NM state version info 1.0
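The recovery state under ‘/tmp/hadoop-yarn/’ is read on every start, so the NodeManager keeps trying to recover the same stale application and containers.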
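If you are not sure where your own NodeManager keeps its recovery state, the path can be read straight from the start-up log. This is a small sketch under the assumption that the message looks like the "Using state database at" line above; the helper name `find_state_db_paths` is hypothetical.

```python
import re

# Assumed pattern, modelled on the NMLeveldbStateStoreService line in the log above.
STATE_DB_RE = re.compile(
    r"NMLeveldbStateStoreService: Using state database at (\S+)"
)


def find_state_db_paths(log_text):
    """Return the distinct state database paths, in first-seen order."""
    seen = []
    for path in STATE_DB_RE.findall(log_text):
        if path not in seen:
            seen.append(path)
    return seen


# Line copied from the start-up log above.
sample = (
    "2014-10-30 21:23:08,259 INFO org.apache.hadoop.yarn.server.nodemanager.recovery."
    "NMLeveldbStateStoreService: Using state database at "
    "/tmp/hadoop-yarn/yarn-nm-recovery/yarn-nm-state for recover\n"
)

paths = find_state_db_paths(sample)
print(paths)
```

To run it against a real log, read the file into a string first and pass that in; the log file location depends on your installation.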
The solution: stop the instance, delete ‘/tmp/hadoop-yarn/’ from the local filesystem, then start the instance again.
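The middle step can be sketched in Python as below. Note that this is a more cautious variant of what I did: it renames the directory to a timestamped backup instead of deleting it, so it can be restored if needed. The function name `set_aside_recovery_dir` and the `.bak-<timestamp>` suffix are my own assumptions, and stopping and starting the NodeManager instance still has to be done separately (in Cloudera Manager or however you manage the role).

```python
import shutil
import time
from pathlib import Path


def set_aside_recovery_dir(recovery_dir, dry_run=True):
    """Rename recovery_dir to a timestamped sibling directory.

    Returns the backup path, or None if recovery_dir does not exist.
    With dry_run=True nothing is moved; the planned move is only printed.
    Run this only while the NodeManager instance is stopped.
    """
    src = Path(recovery_dir)
    if not src.is_dir():
        return None
    dest = src.with_name(f"{src.name}.bak-{int(time.time())}")
    if dry_run:
        print(f"would move {src} -> {dest}")
    else:
        shutil.move(str(src), str(dest))
    return dest


if __name__ == "__main__":
    # Path taken from the start-up log above; dry run, so nothing is changed.
    set_aside_recovery_dir("/tmp/hadoop-yarn", dry_run=True)
```

Once the NodeManager has started cleanly, the backup directory can be removed by hand.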
YARN NodeManager Failed to Start by @sskaje: https://sskaje.me/2014/11/yarn-nodemanager-failed-start/