Centralized TAF Configuration in 10g, Part 2

Posted by Jeremy ⋅ March 29, 2007 ⋅ 5 Comments

For the past month I’ve been intending to put together a post about DBMS_SERVICES and TAF – so I guess it’s about time to do it. I introduced the topic back in February in my discussion of 10g enhancements to client-side TAF configuration. As I pointed out then, there are two ways to configure TAF in 10g: client-side with TNSNAMES and server-side with DBMS_SERVICES.

The big advantage of centralized configuration is manageability. In 10g, even a TNSNAMES approach can be a little more centralized through the use of tools like DBCA and srvctl. (That’s what the other post was about.) But Oracle 10g introduced a completely new feature which pretty much rethinks the whole paradigm: as part of 10g’s expanded services framework you can now define TAF settings as an attribute of services instead of defining them in the TNSNAMES file.

However there are limitations to this approach. Before diving into specifics it’s worth noting a few of them:

This requires a 10g client. Probably not a big deal for a lot of people… but for example one client I worked with in California had a legacy application that was built with an 8i client. Upgrading the client would be a whole different project that required it’s own round of testing.
The server-side settings override the client. “The house always wins.” So if you want to change the failover configuration for a client you have to use a different service to connect.
You cannot yet configure server-side PRECONNECT failover; Oracle 10g only supports BASIC failover as a service attribute. But it seems to me that most people are using the BASIC/SELECT failover combination; so most people should still be able to use server-side TAF.
Remember that TAF does not support transactions. All TAF limitations of course still apply whether the configuration is server-side or client-side. There are a few other things that it also doesn’t deal with well (for example the PL/SQL environment and session attributes are lost on failover) – but the lack of transactional support is the most significant limitation in my opinion. No DML failover. You’ll get reconnected but every SQL statement will return an error until you rollback.

TAF Settings

You will remember that there are four settings used to configure BASIC TAF. (There is a fifth for PRECONNECT but it’s not presently supported as a service attribute.) Let’s review them in the context of server-side configuration:

Parameter	Description
METHOD	Highest-level, most important failover-related setting – defines what mechanism Oracle will use for failover. In 10g it will determine your strategy for the service’s PREFERRED and AVAILABLE instances since different methods use these instances in different ways. In TNSNAMES the METHOD can be NONE, BASIC, or PRECONNECT; however as a service attribute it can only be NONE or BASIC. NONE: No failover for this service. BASIC: If a connection fails then the next statement will attempt to automatically reconnect.
TYPE	Whether or not to use extra client memory to track information including the sql plan hash, SCN and rows fetched for each active SELECT statement so that they can be resumed transparently during a failover. It seems to me that since the client memory required is trivial there’s isn’t much reason not to use statement failover. Can be NONE, SESSION, or SELECT. NONE: No failover for this service. SESSION: Only reconnect the session; do not resume select statements. SELECT: Use small amount of additional client memory to store state information necessary to resume non-transactional select statements during a failover. Transactions will still break and require a rollback.
RETRIES	When TAF is enabled and a connection is lost Oracle will immediately try to reconnect. If it cannot reconnect then it will try again. This setting controls the number of attempts to connect before a failure is reported to the client application.
DELAY	Pause between each reconnection attempt as described above. (In seconds.) RETRIES and DELAY will determine how long a client will hang if the database becomes entirely unavailable before returning an error. If you make these big then your app could hang for a long time with no error when the DB goes down.

By default TAF is disabled but if any of these attributes are defined then TAF turns on and uses defaults for all undefined attributes. In a TNSNAMES file the presence of an empty FAILOVER_MODE block is enough to enable TAF. In the data dictionary service definition you must specify a TYPE of SESSION or SELECT to enable TAF. The defaults are BASIC/SESSION failover with a 5 RETRIES and a DELAY of 1 second.

Now that we’ve reviewed the TAF settings let’s take a closer look at how to actually configure and use it.

Server-Side TAF: Data Dictionary Only, No TNSNAMES Required

These settings were originally always defined as part of your connect descriptors in TNSNAMES. However as I’ve already mentioned they can now be defined as service attributes. As service attributes they are stored only once – in the data dictionary – for each service.

Before we get started lets setup a TNSNAMES entry for some testing:

db2taf1 =
  (description =
    (address_list=
      (address=(protocol=tcp)(host=10.9.8.65)(port=1521))
      (address=(protocol=tcp)(host=10.9.8.66)(port=1521)))
    (connect_data=
      (service_name=db2svc2.lab.ardentperf.com)
      (failover_mode=
        (method=preconnect)
        (backup=db2taf1))))

First question: is there any way to know if your session has TAF enabled without actually unplugging a server (or using any other method of killing the instance you’re connected to)? In fact it’s rather simple; since Oracle 8.0 the V$SESSION table has contained two fields FAILOVER_TYPE and FAILOVER_METHOD which show each session’s TAF settings.

SQL> connect jeremy/jeremy@db2taf1
Connected.
SQL> col service_name format a20
SQL> col username format a10
SQL> select username, service_name, failover_method, failover_type
  2  from v$session where sid=(select max(sid) from v$mystat);

USERNAME   SERVICE_NAME         FAILOVER_M FAILOVER_TYPE
---------- -------------------- ---------- -------------
JEREMY     db2svc2              PRECONNECT SESSION

Looks like we picked up the settings from the TNSNAMES entry.

Second question: how do you view the current TAF settings stored in the data dictionary? Service attributes can be viewed through the system views DBA_SERVICES and ALL_SERVICES (which are the same):

SQL> desc all_services
 Name                    Null?    Type
 ----------------------- -------- ----------------
    [...]
 FAILOVER_METHOD                  VARCHAR2(64)
 FAILOVER_TYPE                    VARCHAR2(64)
 FAILOVER_RETRIES                 NUMBER(10)
 FAILOVER_DELAY                   NUMBER(10)
    [...]

SQL> set null '(null)'
SQL> col failover_method format a20
SQL> col failover_type format a20
SQL> select name from all_services;

NAME
----------------------------------------------------------------
SYS$BACKGROUND
SYS$USERS
db2rac1XDB
db2rac1.lab.ardentperf.com
db2svc1
db2svc2
db2svc3
db2svc3_PRECONNECT

8 rows selected.

SQL> select failover_method, failover_type, failover_retries, failover_delay
  2  from all_services where name='db2svc2';

FAILOVER_METHOD      FAILOVER_TYPE        FAILOVER_RETRIES FAILOVER_DELAY
-------------------- -------------------- ---------------- --------------
(null)               (null)               (null)           (null)

Oracle will automatically create these entries in the data dictionary for every service on your system but will leave the TAF attributes empty. Enabling server-side TAF is as simple as setting these attributes. And this is accomplished with the DBMS_SERVICES package:

SQL> begin
  2    dbms_service.modify_service(
  3      service_name=>'db2svc2',
  4      failover_type=>DBMS_SERVICE.FAILOVER_TYPE_SELECT
  5    );
  6  end;
  7  /

PL/SQL procedure successfully completed.

SQL> select failover_method, failover_type, failover_retries, failover_delay
  2  from all_services where name='db2svc2';

FAILOVER_METHOD      FAILOVER_TYPE        FAILOVER_RETRIES FAILOVER_DELAY
-------------------- -------------------- ---------------- --------------
(null)               SELECT               (null)           (null)

Research: Finding Out How It Really Works

Now that we’ve covered the fundamentals lets just run a few quick tests to confirm that TAF works the way we expect. First lets test the change we just made. The PL/SQL Reference notes that these server-side settings always override client-side settings… so to start lets just try the current configuration. We’ve set the TYPE to SELECT at the server which should override the default SESSION setting from my TNSNAMES configuration.

SQL> connect jeremy/jeremy@db2taf1
Connected.
SQL> select username, service_name, failover_method, failover_type
  2  from v$session where sid=(select max(sid) from v$mystat);

USERNAME   SERVICE_NAME         FAILOVER_METHOD      FAILOVER_TYPE
---------- -------------------- -------------------- --------------------
JEREMY     db2svc2              BASIC                SELECT

Do you see what else changed? Notice that even though my TNSNAMES specified a METHOD of PRECONNECT it was overridden by the default value of BASIC — even though I haven’t said to use BASIC failover anywhere! Oops! You will also find that if you specify METHOD alone (without TYPE) in the data dictionary then Oracle will ignore it altogether. This doesn’t seem ideal to me but it does sortof line up with the remarks in the PL/SQL Reference that I mentioned above.

Let’s do one more quick test. (There’s lots we could do but I don’t have all day to do these things… <g>) Lets see if we can explicitly disable TAF for this service.

SQL> begin
  2    dbms_service.modify_service(
  3      service_name=>'db2svc2',
  4      failover_type=>DBMS_SERVICE.FAILOVER_TYPE_NONE,
  5      failover_method=>DBMS_SERVICE.FAILOVER_METHOD_NONE
  6    );
  7  end;
  8  /

PL/SQL procedure successfully completed.

SQL> select failover_method, failover_type, failover_retries, failover_delay
  2  from all_services where name='db2svc2';

FAILOVER_METHOD      FAILOVER_TYPE        FAILOVER_RETRIES FAILOVER_DELAY
-------------------- -------------------- ---------------- --------------
NONE                 NONE                 (null)           (null)

SQL> connect jeremy/jeremy@db2taf1
Connected.
SQL> select username, service_name, failover_method, failover_type
  2  from v$session where sid=(select max(sid) from v$mystat);

USERNAME   SERVICE_NAME         FAILOVER_METHOD      FAILOVER_TYPE
---------- -------------------- -------------------- --------------------
JEREMY     db2svc2              PRECONNECT           SESSION

Interesting! One thing that the docs fail to mention is that apparently the server will not override the client TAF settings if TYPE is set to NONE. (I’ll leave it to you to make up your own test, but I also found that if METHOD is set to NONE but TYPE is assigned then Oracle ignored the METHOD and use its default of BASIC. This applies to both DBMS_SERVICES and TNSNAMES.)

One final note: how does the client know where to reconnect to? If node 1 fails then how does the client find node 2? This information still needs to be defined in the TNSNAMES file; TAF can be centralized as a service attribute but connect-time load balancing and must still be defined on each client. TNSNAMES isn’t going away anytime soon. :)

The End

Oracle’s new services framework provides a much improved model for managing failover by making TAF an attribute of the service. This could allow you to truly centralize your management of failover-related settings and simplifies TNSNAMES file management. As always there are caveats and the important reminder to test everything – but I believe that there’s very little reason not to start taking advantage of this feature if you are using TAF in an environment with 10g clients and servers!

About Jeremy

Building and running reliable data platforms that scale and perform. about.me/jeremy_schneider

View all posts by Jeremy »

Discussion

5 thoughts on “Centralized TAF Configuration in 10g, Part 2”

Now, with your posts, you have put lots of seeds of doubts in my mind.

I am using 9i, but will soon migrate to 10g. I haven’t used TAF yet, but have always used standby database. Now in 10g I am going to use RAC. After reading your post, I am perplexed, which option to use. I was planning to use a 2-node RAC, with one hot standby.

Can TAF and RAC go hand in hand? Please pardon, if I am getting all of it awfully wrong from the start.

regards.

LikeLike

Posted by Fahd Mirza | April 1, 2007, 11:57 pm
I would recommend doing a little more reading; clustering with Oracle can be a little complicated and it’s good to have a solid understanding of the basics before you try to implement it.

Regarding your questions, of course RAC (Real Application Clusters) and TAF (Transparent Application Failover) work together. It is unclear to me what you’re saying about standby and RAC.

Anyway Oracle’s High Availability manual is a great place to start reading:
download-west.oracle.com/docs/cd/B19306_01/server.102/ b14210/hafeatures.htm#i1007163

Also, this is a good guide to getting the most benefit from online communities:
http://www.catb.org/~esr/faqs/smart-questions.html

LikeLike

Posted by Jeremy | April 23, 2007, 3:18 pm
Jeremy:

Just want to know if TAF can be applied on Primary and Standby database, when primary is not available, fast fail-over happens and service is still available.

In RAC env, we can use vips, can we use this concept in standby database?

Thanks,

Hank
NJ

LikeLike

Posted by Hank | November 8, 2007, 3:58 pm
I am not having a whole ton of luck with this. =)

Given:
currently, bannerbcp is the primary and oradev is the standby

tns entry is
chuck = (DESCRIPTION=
(ADDRESS_LIST=
(LOAD_BALANCE=off)(FAILOVER=on) (ADDRESS=(PROTOCOL=TCP)(HOST=bannerbcp)(PORT=1521)) (ADDRESS=(PROTOCOL=TCP)(HOST=oradev)(PORT=1521))
)
(CONNECT_DATA=
(SERVER=DEDICATED)
(SERVICE_NAME=eas-dev) (FAILOVER_MODE=(TYPE=SESSION)(METHOD=BASIC)(RETRIES=180)(DELAY=5))))

Server-side TAF is configured; note how the TYPE is different than specified in the tns string.

USERNAME SERVICE_NAME FAILOVER_M FAILOVER_TYPE
———- ——————– ———- ————-
ORACLE eas-dev BASIC SELECT

I have observed the following:
– if LOAD_BALANCE is on, I sometimes get the standby (ORA-01033)
– if LOAD_BALANCE is off and FAILOVER is on, I always get the first ADDRESS. At least, this has proven true after 10 tests and switching the order for half of them.
– If both LOAD_BALANCE and FAILOVER are off, I always get the first ADDRESS (same as above). What is FAILOVER doing for me?

I have a gut feeling that I know just enough to get myself in trouble and really confuse things. =)

LikeLike

Posted by Charles Schultz | July 9, 2009, 8:26 am