This week I’m working with a client to help them get an Oracle Applications environment up and running on RAC. We started with Metalink note 362135.1 as a general guide for the process. Although I am much more familiar with RMAN, this note recommended using rconfig to convert the database to RAC+ASM. (rconfig is the tool that OEM uses on the backend to do conversions, and it is documented in Appendix D of the Clusterware install manual.)
rconfig is a pretty cool utility. Although it does not give you much control over the conversion process, it is very slick and easy to use. However, we did run into one rather odd problem…
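For context, rconfig is driven by an XML input file; Oracle ships sample templates under the database home. A rough sketch of the invocation, assuming a 10.2 layout (the copied filename and edits are ours, and the verify="ONLY" dry-run attribute is from the same appendix that documents rconfig):

```shell
# Sample XML templates ship under the 10.2 database home
cd $ORACLE_HOME/assistants/rconfig/sampleXMLs

# Copy a template and edit it with your SID, target home, and storage details
cp ConvertToRAC.xml /tmp/convert_dbtest.xml
vi /tmp/convert_dbtest.xml

# Tip: setting the Convert element's verify attribute to "ONLY" in the XML
# makes rconfig validate prerequisites without actually converting anything.

# Run the conversion
rconfig /tmp/convert_dbtest.xml
```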
[oracle@lab01 SampleXML]$ rconfig ConvertToRAC.xml
Converting Database dbtest to Cluster Database.
Target Oracle Home : /u01/oracle/app/product/10.2.0.
Setting Data Files and Control Files
Adding Database Instances
Adding Redo Logs
Enabling threads for all Database Instances
Setting TEMP tablespace
Adding UNDO tablespaces
Adding Trace files
Setting Flash Recovery Area
Updating Oratab
Creating Password file(s)
Configuring Listeners
Configuring related CRS resources
Adding NetService entries
Starting Cluster Database
<?xml version="1.0" ?>
<RConfig>
  <ConvertToRAC>
    <Convert>
      <Response>
        <Result code="1" >
          Got Exception
        </Result>
        <ErrorDetails>
          oracle.sysman.assistants.rconfig.engine.CRSStartupException: oracle.ops.mgmt.database.DatabaseException: PRKP-1001 : Error starting instance dbtest1 on node lab01
          CRS-0215: Could not start resource 'ora.dbtest.dbtest1.inst'.
          PRKP-1001 : Error starting instance dbtest2 on node lab02
          CRS-0215: Could not start resource 'ora.dbtest.dbtest2.inst'.
          Operation Failed. Refer logs at /u01/oracle/app/product/10.2.0/cfgtoollogs/rconfig/rconfig.log for more details.
        </ErrorDetails>
      </Response>
    </Convert>
  </ConvertToRAC>
</RConfig>
It made it almost all the way through, then threw this error at the very end. A quick glance at the database alert log revealed that the database had never even attempted to start; it seems the clusterware never got that far. I combed through all the log files for clusterware and rconfig (rconfig puts its log files in $ORACLE_HOME/cfgtoollogs) but couldn’t find anything incriminating (or helpful). CRSD is the process that would actually manage starting the database (through the Oracle-supplied racgwrap script), but its log file was rather uninformative about what was going on. So I started restarting things: I stopped and restarted the nodeapps, but that didn’t make a difference. Then I stopped and restarted the node resources using crsctl, and suddenly everything came up. (?)
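For anyone digging into something similar, these are the kinds of checks I was running at this point, sketched for a 10gR2 environment (the database/instance names match ours; CRS_HOME and the hostname path component are placeholders for your clusterware home and node name):

```shell
# Show the state of all CRS resources, including the new instance resources
crs_stat -t

# Try starting an instance directly via srvctl to get an error outside rconfig
srvctl start instance -d dbtest -i dbtest1

# Tail CRSD's log under the 10.2 clusterware home layout
tail -100 $CRS_HOME/log/`hostname -s`/crsd/crsd.log
```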
[oracle@lab01 SampleXML]$ crsctl stop resources
Stopping resources.
Successfully stopped CRS resources
[oracle@lab01 SampleXML]$ crsctl start resources
Starting resources.
Successfully started CRS resources
I did this on both nodes and then everything came up (or at least attempted to; there was an unrelated problem on the second node that prevented it from starting completely). I did not change anything; I just restarted the resources using crsctl. I’m still not sure exactly what the problem was, but this seemed to work around it. (Strange.) If I figure anything out, I’ll post it. For the next environment I think we’ll try bouncing the resources before running rconfig and see if that makes any difference.
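If the pre-emptive bounce does help, the plan for the next environment would look something like this (run the crsctl pair on each node before kicking off the conversion; this is just the workaround above applied up front, not a verified fix):

```shell
# On each node: bounce the CRS-managed resources before converting
crsctl stop resources
crsctl start resources

# Then run the conversion as before
rconfig ConvertToRAC.xml
```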