I was interrupted again when upgrading CDH.
HDFS
This time the NameNode did not start, so I had to bring it up and resume the upgrade.
I didn’t save any log of the NameNode’s error at the time. I stopped all HDFS components, ran ‘Upgrade HDFS Metadata’, and then started HDFS.
2014-10-15 11:26:26,618 INFO org.apache.hadoop.hdfs.server.namenode.NNConf: Maximum size of an xattr: 16384
2014-10-15 11:26:26,627 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /hadoop/dfs/nn/in_use.lock acquired by nodename 30535@hadoop4.xxx.com
2014-10-15 11:26:26,717 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage
java.io.IOException:
File system image contains an old layout version -55.
An upgrade to version -59 is required.
Please restart NameNode with the "-rollingUpgrade started" option if a rolling upgrade is already started; or restart NameNode with the "-upgrade" option to start a new upgrade.
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:231)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:994)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:726)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:529)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:585)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:751)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:735)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1410)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1476)
2014-10-15 11:26:26,728 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@hadoop4.xxx.com:50070
2014-10-15 11:26:26,829 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
2014-10-15 11:26:26,829 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
2014-10-15 11:26:26,830 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2014-10-15 11:26:26,830 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
java.io.IOException:
File system image contains an old layout version -55.
An upgrade to version -59 is required.
Please restart NameNode with the "-rollingUpgrade started" option if a rolling upgrade is already started; or restart NameNode with the "-upgrade" option to start a new upgrade.
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:231)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:994)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:726)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:529)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:585)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:751)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:735)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1410)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1476)
2014-10-15 11:26:26,832 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2014-10-15 11:26:26,836 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop4.xxx.com/172.16.3.19
************************************************************/
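To confirm which layout version is actually on disk before restarting the NameNode, the VERSION file in the NameNode storage directory can be inspected. This is a minimal sketch, assuming the storage directory is /hadoop/dfs/nn (taken from the in_use.lock path in the log) and that the file carries a layoutVersion=... line; the function name nn_layout_version is hypothetical.

```shell
# Print the layoutVersion recorded in a NameNode VERSION file.
# Assumption: the file contains a line of the form "layoutVersion=-55".
nn_layout_version() {
    # $1: path to the VERSION file
    sed -n 's/^layoutVersion=//p' "$1"
}

# Example call (path assumed from the lock file location in the log):
#   nn_layout_version /hadoop/dfs/nn/current/VERSION
```

If the printed value is still the old one (-55 here) after the upgrade step, the metadata upgrade has not been applied yet.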
YARN
Next, YARN.
I started YARN, and then all other services. Hive went down, then YARN.
I checked CM’s monitor:
The health test result for YARN_RESOURCEMANAGERS_HEALTH has become bad: ResourceManager summary: hadoop4.xxx.com (Availability: Stopped, Health: Good), hadoop5.xxx.com (Availability: Stopped, Health: Good). This health test is bad because the Service Monitor did not find an active ResourceManager.
I found both instances of ResourceManager were ‘Standby‘.
Here is what I found from /var/log/hadoop-yarn/hadoop-cmf-yarn-RESOURCEMANAGER-hadoop4.xxx.com.log.out
2014-10-15 11:37:08,098 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
2014-10-15 11:37:08,102 INFO org.apache.zookeeper.ZooKeeper: Session: 0x34911de9796000f closed
2014-10-15 11:37:09,102 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=a1.hadoop.xxx.com:2181,hadoop5.xxx.com:2181,hadoop4.xxx.com:2181 sessionTimeout=10000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@28b35e23
2014-10-15 11:37:09,104 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hadoop4.xxx.com/172.16.3.19:2181. Will not attempt to authenticate using SASL (unknown error)
2014-10-15 11:37:09,104 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hadoop4.xxx.com/172.16.3.19:2181, initiating session
2014-10-15 11:37:09,107 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hadoop4.xxx.com/172.16.3.19:2181, sessionid = 0x34911de97960012, negotiated timeout = 10000
2014-10-15 11:37:09,108 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2014-10-15 11:37:09,109 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session connected.
2014-10-15 11:37:09,110 INFO org.apache.hadoop.conf.Configuration: found resource yarn-site.xml at file:/var/run/cloudera-scm-agent/process/4352-yarn-RESOURCEMANAGER/yarn-site.xml
2014-10-15 11:37:09,115 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yarn OPERATION=refreshAdminAcls TARGET=AdminService RESULT=SUCCESS
2014-10-15 11:37:09,115 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Already in standby state
2014-10-15 11:37:09,115 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yarn OPERATION=transitionToStandby TARGET=RMHAProtocolService RESULT=SUCCESS
2014-10-15 11:37:56,578 INFO org.apache.hadoop.ha.ActiveStandbyElector: Checking for any old active which needs to be fenced...
2014-10-15 11:37:56,586 INFO org.apache.hadoop.ha.ActiveStandbyElector: Old node exists: 0a067961726e524d1205726d323038
2014-10-15 11:37:56,586 INFO org.apache.hadoop.ha.ActiveStandbyElector: Writing znode /yarn-leader-election/yarnRM/ActiveBreadCrumb to indicate that the local node is the most recent active...
2014-10-15 11:37:56,589 INFO org.apache.hadoop.conf.Configuration: found resource yarn-site.xml at file:/var/run/cloudera-scm-agent/process/4352-yarn-RESOURCEMANAGER/yarn-site.xml
2014-10-15 11:37:56,596 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yarn OPERATION=refreshAdminAcls TARGET=AdminService RESULT=SUCCESS
2014-10-15 11:37:56,596 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning to active state
2014-10-15 11:37:56,597 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yarn OPERATION=transitionToActive TARGET=RMHAProtocolService RESULT=FAILURE DESCRIPTION=Exception transitioning to active PERMISSIONS=All users are allowed
2014-10-15 11:37:56,597 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
    at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:118)
    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:804)
    at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:291)
    at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:116)
    ... 4 more
Caused by: org.apache.hadoop.service.ServiceStateException: RMActiveServices cannot enter state STARTED from state STOPPED
    at org.apache.hadoop.service.ServiceStateModel.checkStateTransition(ServiceStateModel.java:129)
    at org.apache.hadoop.service.ServiceStateModel.enterState(ServiceStateModel.java:111)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:190)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:928)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:968)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:965)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:965)
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:282)
    ... 5 more
Google helps a lot: http://community.cloudera.com/t5/Cloudera-Manager-Installation/CDH-5-YARN-Resource-Manager-HA-deadlock-in-Kerberos-cluster/td-p/14396
In /opt/cloudera/parcels/CDH/lib/zookeeper/bin/zkCli.sh, run
rmr /rmstore/ZKRMStateRoot/RMAppRoot/application_xxxxxx
for each application znode, one by one, because zkCli.sh does not support wildcards.
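Since the znodes have to be removed one at a time, the rmr lines can be generated from the child list instead of being typed by hand. This is a minimal sketch, assuming that `ls` in zkCli.sh prints the children as a single bracketed, comma-separated line; the function name rmr_commands is hypothetical, and it only prints commands for review without deleting anything.

```shell
# Turn the bracketed child list printed by zkCli.sh "ls" into one rmr
# command per application znode. Reads the ls output on stdin.
rmr_commands() {
    # $1: parent znode, e.g. /rmstore/ZKRMStateRoot/RMAppRoot
    grep '^\[' |                      # keep only the "[a, b, c]" line
    tr ',' '\n' |                     # one child per line
    sed 's/[^A-Za-z0-9_]//g' |        # drop brackets and spaces
    grep '^application_' |            # ignore anything that is not an app
    while read -r child; do
        printf 'rmr %s/%s\n' "$1" "$child"
    done
}

# Example call (assumes zkCli.sh accepts a command after -server):
#   zkCli.sh -server hadoop4.xxx.com:2181 ls /rmstore/ZKRMStateRoot/RMAppRoot \
#       | rmr_commands /rmstore/ZKRMStateRoot/RMAppRoot
```

The printed lines can then be checked and pasted into a zkCli.sh session.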
Hive
I had guessed that Hive went down because of YARN, but I was wrong.
I checked all the Hive-related commands executed by CM:
Exception in thread "main" MetaException(message:Hive Schema version 0.13.0 does not match metastore's schema version 0.12.0 Metastore is not upgraded or corrupt)
    at org.apache.hadoop.hive.metastore.ObjectStore.checkSchema(ObjectStore.java:6311)
    at org.apache.hadoop.hive.metastore.ObjectStore.verifySchema(ObjectStore.java:6282)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:108)
    at com.sun.proxy.$Proxy0.verifySchema(Unknown Source)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:485)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:532)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:406)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:365)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:55)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:60)
    at org.apache.hadoop.hive.metastore.HiveMetaStore.newHMSHandler(HiveMetaStore.java:4953)
    at org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:5173)
    at org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:5093)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
So I stopped Hive and ran ‘Update Hive Metastore NameNodes’ and ‘Upgrade Hive Metastore Database Schema’. Neither worked; both failed with the error message above.
The logs gave more detail:
Error: ERROR: language "plpgsql" does not exist
Hint: Use CREATE LANGUAGE to load the language into the database. (state=42704,code=0)
org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore state would be inconsistent !!
*** schemaTool failed ***
The schemaTool output reminded me that I had manually upgraded the Hive metastore in February: Hive MetaStore Schema Upgrade Failed When Upgrading CDH5.
But this time, dbType should be postgres instead of derby. (Derby is not supported by Impala, which is why I switched to the PostgreSQL embedded in Cloudera Manager.)
I can’t find the terminal output any more, but when I ran:
/opt/cloudera/parcels/CDH/lib/hive/bin/schematool -dbType postgres -info -verbose
I found output similar to that in the blog post above (only the first few lines), saying that schemaTool was trying to connect to Derby:
Metastore connection URL: jdbc:derby:;databaseName=metastore_db;create=true
I re-deployed Hive’s client configuration, checked /etc/hive/conf/hive-site.xml, and compared it with /var/run/cloudera-scm-agent/process/4525-hive-HIVEMETASTORE/hive-site.xml.
The XML under /etc points to the Hive metastore’s Thrift server, while the one under CM’s process directory specifies the exact database connection. schemaTool uses the one under /etc.
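The difference between the two files can be checked quickly by pulling single properties out of each. This is a minimal sketch, assuming the usual hive-site.xml layout with each `<value>` on one line at or after its `<name>`; the helper name hive_site_value is hypothetical, while javax.jdo.option.ConnectionURL and hive.metastore.uris are the standard Hive property names.

```shell
# Print the value of one property from a hive-site.xml style file.
# Assumption: <value>...</value> sits on a single line, on or after the
# line that holds the matching <name>...</name>.
hive_site_value() {
    # $1: path to the XML file, $2: property name
    awk -v name="$2" '
        index($0, "<name>" name "</name>") { found = 1 }
        found && match($0, /<value>.*<\/value>/) {
            # strip the 7-char opening and 8-char closing tag
            print substr($0, RSTART + 7, RLENGTH - 15)
            exit
        }
    ' "$1"
}

# Example calls with the paths from this post:
#   hive_site_value /etc/hive/conf/hive-site.xml javax.jdo.option.ConnectionURL
#   hive_site_value /var/run/cloudera-scm-agent/process/4525-hive-HIVEMETASTORE/hive-site.xml javax.jdo.option.ConnectionURL
```

An empty result for the connection URL in the /etc copy would be consistent with schemaTool falling back to the embedded Derby URL.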
So I replaced the /etc one with CM’s and reran upgradeSchema:
[root@hadoop4 ~]# /opt/cloudera/parcels/CDH/lib/hive/bin/schematool -dbType postgres -verbose -upgradeSchemaFrom 0.12.0
14/10/15 13:43:31 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
Metastore connection URL: jdbc:postgresql://hadoop4.xxx.com:7432/hive
Metastore Connection Driver : org.postgresql.Driver
Metastore connection User: hive
Starting upgrade metastore schema from version 0.12.0 to 0.13.0
Upgrade script upgrade-0.12.0-to-0.13.0.postgres.sql
Connecting to jdbc:postgresql://hadoop4.xxx.com:7432/hive
Connected to: PostgreSQL (version 8.4.20)
Driver: PostgreSQL Native Driver (version PostgreSQL 9.0 JDBC4 (build 801))
Transaction isolation: TRANSACTION_READ_COMMITTED
0: jdbc:postgresql://hadoop4.xxx.com:7432/h> !autocommit on
Autocommit status: true
0: jdbc:postgresql://hadoop4.xxx.com:7432/h> !closeall
Closing: 0: jdbc:postgresql://hadoop4.xxx.com:7432/hive
beeline> Completed pre-upgrade-0.12.0-to-0.13.0.postgres.sql
Connecting to jdbc:postgresql://hadoop4.xxx.com:7432/hive
Connected to: PostgreSQL (version 8.4.20)
Driver: PostgreSQL Native Driver (version PostgreSQL 9.0 JDBC4 (build 801))
Transaction isolation: TRANSACTION_READ_COMMITTED
0: jdbc:postgresql://hadoop4.xxx.com:7432/h> !autocommit on
Autocommit status: true
0: jdbc:postgresql://hadoop4.xxx.com:7432/h> SELECT 'Upgrading MetaStore schema from 0.12.0 to 0.13.0'
+---------------------------------------------------+--+
| ?column? |
+---------------------------------------------------+--+
| Upgrading MetaStore schema from 0.12.0 to 0.13.0 |
+---------------------------------------------------+--+
1 row selected (0.011 seconds)
0: jdbc:postgresql://hadoop4.xxx.com:7432/h> SELECT '< HIVE-5700 enforce single date format for partition column storage >'
+------------------------------------------------------------------------+--+
| ?column? |
+------------------------------------------------------------------------+--+
| < HIVE-5700 enforce single date format for partition column storage > |
+------------------------------------------------------------------------+--+
1 row selected (0.002 seconds)
0: jdbc:postgresql://hadoop4.xxx.com:7432/h> CREATE FUNCTION hive13_to_date(date_str text) RETURNS DATE AS $$ DECLARE dt DATE; BEGIN dt := date_str::DATE; RETURN dt; EXCEPTION WHEN others THEN RETURN null; END; $$ LANGUAGE plpgsql
Error: ERROR: language "plpgsql" does not exist
Hint: Use CREATE LANGUAGE to load the language into the database. (state=42704,code=0)
Closing: 0: jdbc:postgresql://hadoop4.xxx.com:7432/hive
org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore state would be inconsistent !!
org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore state would be inconsistent !!
    at org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:252)
    at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:533)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.io.IOException: Schema script failed, errorcode 2
    at org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:410)
    at org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:383)
    at org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:247)
    ... 6 more
*** schemaTool failed ***
[root@hadoop4 ~]#
This is the same error I saw in CM’s log: plpgsql does not exist. Fix it with:
[root@hadoop4 conf]# sudo -u hive createlang plpgsql hive -p 7432
Password:
You can find the password in the XML file I mentioned above, or in a file like:
[root@hadoop4 conf]# cat /var/run/cloudera-scm-agent/process/4507-hive-metastore-upgrade/metastore_db_py.properties
#metastore database properties
#Wed Oct 15 12:39:05 CST 2014
dbtype=postgresql
user=hive
port=7432
pw=xxxxxx
host=hadoop4.xxx.com
dbname=hive
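The connection details in that properties file are enough to assemble the createlang call. This is a minimal sketch, assuming the key names shown above (host, port, user, dbname); the function name createlang_command is hypothetical. It only prints the command, and the password is deliberately left out so that createlang prompts for it.

```shell
# Build a createlang command line from a metastore_db_py.properties file.
# Runs in a subshell so the helper variables do not leak into the caller.
createlang_command() (
    f="$1"    # path to the properties file
    host="$(sed -n 's/^host=//p' "$f")"
    port="$(sed -n 's/^port=//p' "$f")"
    user="$(sed -n 's/^user=//p' "$f")"
    dbname="$(sed -n 's/^dbname=//p' "$f")"
    # -h, -p and -U are createlang's connection options
    printf 'createlang -h %s -p %s -U %s plpgsql %s\n' \
        "$host" "$port" "$user" "$dbname"
)

# Example call with the path from this post:
#   createlang_command /var/run/cloudera-scm-agent/process/4507-hive-metastore-upgrade/metastore_db_py.properties
```

With the values above this prints a command equivalent to the one I ran, with the host and user spelled out explicitly.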
If you get an error message saying that OWNER_NAME or OWNER_TYPE already exists in table DBS, open /opt/cloudera/parcels/CDH/lib/hive/scripts/metastore/upgrade/postgres/016-HIVE-6386.postgres.sql and comment out or delete the two ALTER TABLE lines.
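Commenting out those two lines can also be done with sed instead of an editor. This is a minimal sketch, assuming the two statements each start a line with ALTER TABLE and mention OWNER_NAME or OWNER_TYPE (I have not reproduced the exact file contents here); the function name comment_out_owner_columns is hypothetical, and it writes the patched script to stdout rather than editing the file in place.

```shell
# Prefix the ALTER TABLE lines that add OWNER_NAME / OWNER_TYPE with "-- "
# so PostgreSQL treats them as comments. The original file is not modified.
comment_out_owner_columns() {
    # $1: path to the upgrade script
    sed -e '/^ALTER TABLE.*OWNER_NAME/s/^/-- /' \
        -e '/^ALTER TABLE.*OWNER_TYPE/s/^/-- /' "$1"
}

# Example call: review the output, then replace the original with it.
#   comment_out_owner_columns 016-HIVE-6386.postgres.sql > /tmp/016-HIVE-6386.postgres.sql
```

Keeping a copy of the original script makes it easy to restore after the upgrade has gone through.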