
The Clusterware C API

Well, I’ve been incognito for the past two weeks or so because I’ve been finishing up a fairly detailed paper about Oracle Services. I finally finished the first draft yesterday… it’s 16 pages in the IEEE Computer Society article LaTeX class, which doesn’t leave much whitespace! It’s a comprehensive review of just about every aspect of services in Oracle databases.

Two Kinds of Configuration Data Stored in the OCR

Anyway, tonight I was doing a little digging with the Clusterware API and I thought I’d post my discoveries so far. The whole reason for this is that I’m trying to investigate the contents of the OCR. Even Julian Dyke and Steve Shaw’s book Pro Oracle Database 10g RAC on Linux seems to skip over the fact that there are two different sets of data stored in the OCR. The misleadingly named ocrdump utility only dumps out half the contents. It seems that crs_stat -f will dump the other half.

For my paper I was trying to figure out what to call these two different kinds of data. Yesterday I was using Linux’s readelf program to browse the internals of the Clusterware shared libraries (don’t ask why I was doing this – I’m not really sure myself at the moment) and I noticed a number of functions that dealt with something called “stringpairs” and something else called “resources”. Then I’m like, duuh… Oracle’s Clusterware API is public. Just have a look at the C header files.

So today I checked the header files out. I’m pretty sure these are the two kinds of data I was looking for: StringPairs are what ocrdump dumps and Resources are what crs_stat dumps. Also I’m pretty sure that there’s nothing enforcing a hierarchical structure on your StringPairs; I think you can put pretty much any pair of strings in there that you want. The hierarchical, period-delimited structure Oracle’s utilities use is convenient but not required.
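To make the "flat pairs, hierarchy by convention" idea concrete, here is a minimal sketch in plain C. It does not use the Clusterware API at all: the struct, the function names and the sample keys are all made up for illustration. It only shows that a period-delimited "tree" can be nothing more than a prefix match over an ordinary list of string pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one stored pair; NOT the real Clusterware type. */
struct string_pair {
    const char *key;
    const char *value;
};

/* Flat lookup: the key is compared as an opaque string, dots and all.
 * Returns the value, or NULL when no pair has that exact key. */
const char *lookup(const struct string_pair *pairs, size_t n, const char *key)
{
    assert(pairs != NULL && key != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(pairs[i].key, key) == 0)
            return pairs[i].value;
    }
    return NULL;
}

/* Count the pairs that sit "under" prefix, where the hierarchy is only a
 * naming convention: the key equals prefix, or starts with prefix + '.'. */
size_t count_under(const struct string_pair *pairs, size_t n, const char *prefix)
{
    assert(pairs != NULL && prefix != NULL);
    size_t plen = strlen(prefix);
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        const char *k = pairs[i].key;
        if (strncmp(k, prefix, plen) == 0 &&
            (k[plen] == '\0' || k[plen] == '.'))
            hits++;
    }
    return hits;
}
```

With made-up keys such as "app.cache.size" and "app.cache.ttl", count_under(pairs, n, "app.cache") finds both, while a key like "free form key" is stored and looked up just as happily even though it fits no hierarchy.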

Writing C Code Against the Clusterware API

To prove it I was going to write a C program to get all the resources and StringPairs using the API and dump them, so that I could confirm that the lists match. The first step was finding the header file. Easy enough… $CRS_HOME/crs/demo/clscrsx.h
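The "confirm that the lists match" step doesn’t depend on the Clusterware API at all, so it can be sketched on its own. The function below is a hypothetical helper (the name and signature are mine, not Oracle’s): it takes two arrays of names, say one collected through the API and one parsed from a utility’s output, sorts both, and counts the names that show up in only one of them. It assumes neither list contains duplicates.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings. */
static int cmp_names(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* Sort both arrays in place, then walk them together and count the names
 * that appear in exactly one list. A result of 0 means the lists match.
 * Assumes no name is repeated within a single list. */
size_t count_differences(const char **a, size_t na, const char **b, size_t nb)
{
    assert(a != NULL && b != NULL);
    qsort(a, na, sizeof *a, cmp_names);
    qsort(b, nb, sizeof *b, cmp_names);

    size_t i = 0, j = 0, diff = 0;
    while (i < na && j < nb) {
        int c = strcmp(a[i], b[j]);
        if (c == 0) {        /* present in both lists */
            i++;
            j++;
        } else if (c < 0) {  /* only in a */
            diff++;
            i++;
        } else {             /* only in b */
            diff++;
            j++;
        }
    }
    /* Whatever is left over in either list has no partner. */
    return diff + (na - i) + (nb - j);
}
```

Sorting first keeps the comparison to a single pass and makes the order in which each source returns its names irrelevant.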

Next step – I noticed that there’s a demo program in there too. Let’s get that running first. Well I didn’t quite get it running but I learned a few cool things. For one, some clusterware code has changed between and (I’ll check as soon as I get a chance). First let’s look at how I got it compiled and running:

[root@rh4lab15 demo]# pwd

[root@rh4lab15 demo]# export C_INCLUDE_PATH=/u10/app/oracle/product/10.2.0/db_1/rdbms/public
[root@rh4lab15 demo]# export LIBRARY_PATH=/u10/app/oracle/product/10.2.0/db_1/lib

[root@rh4lab15 demo]# make -f demo_crs.mk demos
make -f /u10/crs/oracle/product/10.2.0/crs_1/crs/demo/demo_crs.mk build EXE=crsapp OBJS=crsapp.o
make[1]: Entering directory '/u10/crs/oracle/product/10.2.0/crs_1/crs/demo'
/usr/bin/gcc -o crsapp crsapp.o -L/lib/ -lhasgen10 -lsnls10 -lnls10  -lcore10 -lsnls10 -lnls10 
-lcore10 -lsnls10 -lnls10  -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 -locr10 -locrb10 
-locrutl10 -lhasgen10 -lsnls10 -lnls10  -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10  
-lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 -lclntsh -lskgxn2 -lcore10 `cat /lib/sysliblist`
-lclntsh   -lclntsh -lskgxn2
cat: /lib/sysliblist: No such file or directory
make[1]: Leaving directory '/u10/crs/oracle/product/10.2.0/crs_1/crs/demo'

Simple enough. Now about — when you run it on it always seems to throw an error 330 when it calls clscrs_register_resource.

[root@iss-365-rac01 demo]# ./crsapp
Number of resources = 3
Registering resources: clscrs.example1.demo clscrs.example2.demo clscrs.example3.demo
register status = 200

Printing op_status list:
clscrs.example1.demo : 330, No HOSTING_MEMBERS should be given for balanced placement policy
clscrs.example2.demo : 330, No HOSTING_MEMBERS should be given for balanced placement policy
clscrs.example3.demo : 330, No HOSTING_MEMBERS should be given for balanced placement policy

register_resources returned with status 200

Querying resources: clscrs.example1.democlscrs.example2.democlscrs.example3.demo

clscrs_stat status = 200

Printing Attrlist:

Attrlist for res clscrs.example1.demo:
Num attrs = 0

Attrlist for res clscrs.example2.demo:
Num attrs = 0

Attrlist for res clscrs.example3.demo:
Num attrs = 0

Printing op_status list:
clscrs.example1.demo : 210, Could not find resource 'clscrs.example1.demo'.
clscrs.example2.demo : 210, Could not find resource 'clscrs.example2.demo'.
clscrs.example3.demo : 210, Could not find resource 'clscrs.example3.demo'.

stat_resources returned with status 200

start_resource returned with status 200

stop_resource returned with status 200

Unregistering resources: clscrs.example1.demo clscrs.example2.demo clscrs.example3.demo
unregister status = 200

Printing op_status list:
clscrs.example1.demo : 210, Could not find resource 'clscrs.example1.demo'.
clscrs.example2.demo : 210, Could not find resource 'clscrs.example2.demo'.
clscrs.example3.demo : 210, Could not find resource 'clscrs.example3.demo'.

unregister_resources returned with status 200

This is seemingly caused by a typo in the source code on lines 234, 252, and 270…

[root@iss-365-rac01 demo]# egrep -n '(clscrs_PLACEMENT|HOSTING)' crsapp.c
234:  clscrs_res_set_attr(res1, clscrs_PLACEMENT, (oratext *)"balanced");
241:  clscrs_res_set_attr(res1, clscrs_HOSTING_MEMBERS, nodename);
252:  clscrs_res_set_attr(res2, clscrs_PLACEMENT, (oratext *)"balanced");
259:  clscrs_res_set_attr(res2, clscrs_HOSTING_MEMBERS, nodename);
270:  clscrs_res_set_attr(res3, clscrs_PLACEMENT, (oratext *)"balanced");
277:  clscrs_res_set_attr(res3, clscrs_HOSTING_MEMBERS, nodename);

…which is fixed in…

[root@rh4lab15 demo]# egrep -n '(clscrs_PLACEMENT|HOSTING)' crsapp.c
238:  clscrs_res_set_attr(res1, clscrs_PLACEMENT, (oratext *)"favored");
245:  clscrs_res_set_attr(res1, clscrs_HOSTING_MEMBERS, nodename);
256:  clscrs_res_set_attr(res2, clscrs_PLACEMENT, (oratext *)"favored");
263:  clscrs_res_set_attr(res2, clscrs_HOSTING_MEMBERS, nodename);
274:  clscrs_res_set_attr(res3, clscrs_PLACEMENT, (oratext *)"favored");
281:  clscrs_res_set_attr(res3, clscrs_HOSTING_MEMBERS, nodename);
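The only rule I can read off the error message is that HOSTING_MEMBERS must not be supplied when the placement policy is "balanced". Here is a small sketch of that one rule as a pre-flight check in plain C. The function and enum names are hypothetical, the real validation happens somewhere inside Clusterware, and any other rules (for example what "favored" requires) are deliberately not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes; 330 simply mirrors the code printed by crsapp. */
enum {
    CHECK_OK = 0,
    CHECK_HOSTING_WITH_BALANCED = 330
};

/* Hypothetical pre-flight check, NOT part of the Clusterware API.
 * Encodes only the rule stated in the error message: a "balanced"
 * placement policy must not be combined with a HOSTING_MEMBERS value. */
int check_placement(const char *placement, const char *hosting_members)
{
    assert(placement != NULL);
    int has_members = hosting_members != NULL && hosting_members[0] != '\0';

    if (strcmp(placement, "balanced") == 0 && has_members)
        return CHECK_HOSTING_WITH_BALANCED;
    return CHECK_OK;
}
```

Under this check, the older crsapp.c combination ("balanced" plus a node name) is rejected, while the newer one ("favored" plus a node name) passes.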

So it seems that some Oracle engineers are actively working on this exact code. I’m still running into a few problems… there seem to be some memory issues. Also I’m not sure if VMware is causing problems. But more about that later – if I figure out anything earth-shattering then I’ll post it.

About Jeremy

Building and running reliable data platforms that scale and perform. about.me/jeremy_schneider

