
GC Buffer Busy Waits in RAC: Finding Hot Blocks

Well I don’t have a lot of time to write anything up… sheesh – it’s like 10pm and I’m still messing with this. I should be in bed. But before I quit for the night I thought I’d just do a quick post with some queries that might be useful for anyone working on a RAC system who sees a lot of the event “gc buffer busy”.

Now you’ll recall that this event simply means that we’re waiting for another instance that has the block. But generally, if you see lots of these, it’s an indication of contention across the cluster. So here’s how I got to the bottom of a problem on a pretty active 6-node cluster here in NYC.
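
Before digging into the history, a quick aside that isn’t part of the original investigation: if the contention is happening right now, a live check against gv$session will show who is stuck on this event and which block they’re pointing at. This is just a minimal sketch assuming the 10g wait interface, where p1 and p2 carry the file# and block# for this event:

-- live check (assumption: p1=file#, p2=block# for this event in 10g)
select inst_id, p1 file#, p2 block#, count(*) waiters
from gv$session
where state = 'WAITING'
and event = 'gc buffer busy'
group by inst_id, p1, p2
order by waiters desc;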

Using the ASH

I’ll show two different ways here to arrive at the same conclusion. First, we’ll look at the ASH to see what the sampled sessions today were waiting on. Second, we’ll look at the segment statistics captured by the AWR.

First of all, some setup. I already knew what the wait events looked like from dbconsole, but here’s a quick snapshot using the ASH data from today:

select min(begin_interval_time) min, max(end_interval_time) max 
from dba_hist_snapshot
where snap_id between 12831 and 12838;

MIN                            MAX
------------------------------ ------------------------------
12-SEP-07 09.00.17.451 AM      12-SEP-07 05.00.03.683 PM

This is the window I’m going to use: 9am to 5pm today.
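
(In case it’s useful: I knew my snap_ids already, but if you don’t, something along these lines will find the snapshots covering a window – the times below are just the example window from this post:)

select snap_id, begin_interval_time, end_interval_time
from dba_hist_snapshot
where end_interval_time >= to_date('12-SEP-07 09:00','DD-MON-RR HH24:MI')
and begin_interval_time <= to_date('12-SEP-07 17:00','DD-MON-RR HH24:MI')
order by snap_id;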


select wait_class_id, wait_class, count(*) cnt 
from dba_hist_active_sess_history
where snap_id between 12831 and 12838
group by wait_class_id, wait_class
order by 3;

WAIT_CLASS_ID WAIT_CLASS                            CNT
------------- ------------------------------ ----------
   3290255840 Configuration                         169
   2000153315 Network                               934
   4108307767 System I/O                           7199
   3386400367 Commit                               7809
   4217450380 Application                         12248
   3875070507 Concurrency                         14754
   1893977003 Other                               35499
                                                  97762
   3871361733 Cluster                            104810
   1740759767 User I/O                           121999

You can see that there were a very large number of cluster events recorded in the ASH. Let’s look a little closer.


select event_id, event, count(*) cnt from dba_hist_active_sess_history
where snap_id between 12831 and 12838 and wait_class_id=3871361733
group by event_id, event
order by 3;

  EVENT_ID EVENT                                           CNT
---------- ---------------------------------------- ----------
3905407295 gc current request                                4
3785617759 gc current block congested                       10
2705335821 gc cr block congested                            15
 512320954 gc cr request                                    16
3794703642 gc cr grant congested                            17
3897775868 gc current multi block request                   17
1742950045 gc current retry                                 18
1445598276 gc cr disk read                                 148
1457266432 gc current split                                229
2685450749 gc current grant 2-way                          290
 957917679 gc current block lost                           579
 737661873 gc cr block 2-way                               699
2277737081 gc current grant busy                           991
3570184881 gc current block 3-way                         1190
3151901526 gc cr block lost                               1951
 111015833 gc current block 2-way                         2078
3046984244 gc cr block 3-way                              2107
 661121159 gc cr multi block request                      4092
3201690383 gc cr grant 2-way                              4129
1520064534 gc cr block busy                               4576
2701629120 gc current block busy                         14379
1478861578 gc buffer busy                                67275

Notice the huge gap between the number of buffer busy waits and everything else. Other statistics I checked also confirmed that this wait event was the most significant on the cluster. So now we’ve got an event, and we know that 67,275 sampled sessions were waiting on it during ASH snapshots between 9am and 5pm today. Let’s see what SQL these sessions were executing when they got snapped. In fact let’s even include the “gc current block busy” events since there was a bit of a gap for them too.


select sql_id, count(*) cnt from dba_hist_active_sess_history
where snap_id between 12831 and 12838 
and event_id in (2701629120, 1478861578)
group by sql_id
having count(*)>1000
order by 2;

SQL_ID               CNT
------------- ----------
6kk6ydpp3u8xw       1011
2hvs3mpab5j0w       1022
292jxfuggtsqh       1168
3mcxaqffnzgfw       1226
a36pf34c87x7s       1328
4vs8wgvpfm87w       1390
22ggtj4z9ak3a       1574
gsqhbt5a6d4uv       1744
cyt90uk11a22c       2240
39dtqqpr7ygcw       4251
8v3b2m405atgy      42292

Wow – another big leap – 4,000 to 42,000! Clearly there’s one SQL statement which is the primary culprit. What’s the statement?


select sql_text from dba_hist_sqltext where sql_id='8v3b2m405atgy';

SQL_TEXT
---------------------------------------------------------------------------
insert into bigtable(id, version, client, cl_business_id, cl_order_id, desc

I’ve changed the table and field names so you can’t guess who my client might be. :) But it gets the idea across – an insert statement. Hmmm. Any guesses yet about what the problem might be? Well an insert statement could access a whole host of objects (partitions and indexes)… and even more in this case since there are a good number of triggers on this table. Conveniently, the ASH in 10g records what object is being waited on so we can drill down even to that level.


select count(distinct(current_obj#)) from dba_hist_active_sess_history
where snap_id between 12831 and 12838 
and event_id=1478861578 and sql_id='8v3b2m405atgy';

COUNT(DISTINCT(CURRENT_OBJ#))
-----------------------------
                           14

select current_obj#, count(*) cnt from dba_hist_active_sess_history
where snap_id between 12831 and 12838 
and event_id=1478861578 and sql_id='8v3b2m405atgy'
group by current_obj#
order by 2;

CURRENT_OBJ#        CNT
------------ ----------
     3122841          1
     3122868          3
     3173166          4
     3324924          5
     3325122          8
     3064307          8
          -1         10
     3064369        331
           0        511
     3122795        617
     3064433        880
     3208619       3913
     3208620       5411
     3208618      22215

Well, a trend is emerging. Another very clear outlier – fewer than a thousand sessions waiting on most objects, but the last one is over twenty-two thousand. Let’s have a look at the three biggest ones.


select object_id, owner, object_name, subobject_name, object_type from dba_objects
where object_id in (3208618, 3208619, 3208620);

 OBJECT_ID OWNER      OBJECT_NAME                    SUBOBJECT_NAME                 OBJECT_TYPE
---------- ---------- ------------------------------ ------------------------------ -------------------
   3208618 JSCHDER    BIGTABLE_LOG                   P_2007_09                      TABLE PARTITION
   3208619 JSCHDER    BIGTABL_LG_X_ID                P_2007_09                      INDEX PARTITION
   3208620 JSCHDER    BIGTABL_LG_X_CHANGE_DATE       P_2007_09                      INDEX PARTITION

Now wait just a moment… this isn’t even the object we’re inserting into!! Well, I’ll spare you the details, but one of the triggers logs every change to BIGTABLE with about 7 inserts into this one. It’s all PL/SQL so we get bind variables and everything – it’s just the sheer number of accesses that is causing all the contention.
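
(If you want to see for yourself which triggers are involved, dba_triggers will list them – this wasn’t part of the original drill-down, and the owner/table names here are the anonymized ones used throughout this post:)

select trigger_name, triggering_event, status
from dba_triggers
where owner = 'JSCHDER'
and table_name = 'BIGTABLE';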

One further thing we can do is actually see which blocks are getting most contended for – the ASH records this too. (Isn’t the ASH great?)


select current_file#, current_block#, count(*) cnt 
from dba_hist_active_sess_history
where snap_id between 12831 and 12838 
and event_id=1478861578 and sql_id='8v3b2m405atgy' 
and current_obj# in (3208618, 3208619, 3208620)
group by current_file#, current_block#
having count(*)>50
order by 3;

CURRENT_FILE# CURRENT_BLOCK#        CNT
------------- -------------- ----------
         1330         238073         51
         1542          22645         55
         1487         237914         56
         1330         238724         61
         1330         244129         76
         1487         233206        120

One thing that I immediately noticed is that there does not seem to be a single hot block!!! (What?) Out of 40,000 sessions accessing these three objects, no more than 120 ever tried to hit the same block. Let’s quickly check whether any of these are header blocks on the segments.


select segment_name, header_file, header_block
from dba_segments where owner='JSCHDER' and partition_name='P_2007_09'
and segment_name in ('BIGTABLE_LOG','BIGTABL_LG_X_ID',
  'BIGTABL_LG_X_CHANGE_DATE');

SEGMENT_NAME                   HEADER_FILE HEADER_BLOCK
------------------------------ ----------- ------------
BIGTABL_LG_X_CHANGE_DATE              1207       204809
BIGTABL_LG_X_ID                       1207       196617
BIGTABLE_LOG                          1209        16393

No – all seem to be data blocks. Why so much contention? Maybe the RAC and OPS experts out there already have some guesses… but first let’s explore one alternative method to check the same thing and see if the numbers line up.
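
(As an aside – not something I ran at the time – you can also map any suspicious file/block pair from the ASH back to its owning segment and extent with dba_extents. It can be slow on a big database, but it’s a handy cross-check; the numbers below are the busiest pair from the output above:)

select owner, segment_name, partition_name, segment_type
from dba_extents
where file_id = 1487
and 233206 between block_id and block_id + blocks - 1;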

AWR Segment Statistics

Here’s a handy little query I made up the other day to quickly digest any of the segment statistics from the AWR and grab the top objects for the cluster, reporting on each instance. I’m not going to explain the whole thing but I’ll just copy it verbatim – feel free to use it but you’ll have to figure it out yourself. :)



col object format a60
col i format 99
select * from (
select o.owner||'.'||o.object_name||decode(o.subobject_name,NULL,'','.')||
  o.subobject_name||' ['||o.object_type||']' object,
  instance_number i, stat
from (
  select obj#||'.'||dataobj# obj#, instance_number, sum(
GC_BUFFER_BUSY_DELTA
) stat
  from dba_hist_seg_stat
  where (snap_id between 12831 and 12838)
  and (instance_number between 1 and 6)
  group by rollup(obj#||'.'||dataobj#, instance_number)
  having obj#||'.'||dataobj# is not null
) s, dba_hist_seg_stat_obj o
where o.obj#||'.'||o.dataobj#=s.obj#
order by max(stat) over (partition by s.obj#) desc,
  o.owner||o.object_name||o.subobject_name, nvl(instance_number,0)
) where rownum<=40;

OBJECT                                                         I       STAT
------------------------------------------------------------ --- ----------
JSCHDER.BIGTABLE_LOG.P_2007_09 [TABLE PARTITION]                    2529540
JSCHDER.BIGTABLE_LOG.P_2007_09 [TABLE PARTITION]               1     228292
JSCHDER.BIGTABLE_LOG.P_2007_09 [TABLE PARTITION]               2     309684
JSCHDER.BIGTABLE_LOG.P_2007_09 [TABLE PARTITION]               3     289147
JSCHDER.BIGTABLE_LOG.P_2007_09 [TABLE PARTITION]               4     224155
JSCHDER.BIGTABLE_LOG.P_2007_09 [TABLE PARTITION]               5    1136822
JSCHDER.BIGTABLE_LOG.P_2007_09 [TABLE PARTITION]               6     341440
JSCHDER.BIGTABL_LG_X_CHANGE_DATE.P_2007_09 [INDEX PARTITION]        2270221
JSCHDER.BIGTABL_LG_X_CHANGE_DATE.P_2007_09 [INDEX PARTITION]   1     220094
JSCHDER.BIGTABL_LG_X_CHANGE_DATE.P_2007_09 [INDEX PARTITION]   2     313038
JSCHDER.BIGTABL_LG_X_CHANGE_DATE.P_2007_09 [INDEX PARTITION]   3     299509
JSCHDER.BIGTABL_LG_X_CHANGE_DATE.P_2007_09 [INDEX PARTITION]   4     217489
JSCHDER.BIGTABL_LG_X_CHANGE_DATE.P_2007_09 [INDEX PARTITION]   5     940827
JSCHDER.BIGTABL_LG_X_CHANGE_DATE.P_2007_09 [INDEX PARTITION]   6     279264
JSCHDER.BIGTABLE.P_WAREHOUSE [TABLE PARTITION]                      1793931
JSCHDER.BIGTABLE.P_WAREHOUSE [TABLE PARTITION]                 1     427482
JSCHDER.BIGTABLE.P_WAREHOUSE [TABLE PARTITION]                 2     352305
JSCHDER.BIGTABLE.P_WAREHOUSE [TABLE PARTITION]                 3     398699
JSCHDER.BIGTABLE.P_WAREHOUSE [TABLE PARTITION]                 4     268045
JSCHDER.BIGTABLE.P_WAREHOUSE [TABLE PARTITION]                 5     269230
JSCHDER.BIGTABLE.P_WAREHOUSE [TABLE PARTITION]                 6      78170
JSCHDER.BIGTABL_LG_X_ID.P_2007_09 [INDEX PARTITION]                  771060
JSCHDER.BIGTABL_LG_X_ID.P_2007_09 [INDEX PARTITION]            1     162296
JSCHDER.BIGTABL_LG_X_ID.P_2007_09 [INDEX PARTITION]            2     231141
JSCHDER.BIGTABL_LG_X_ID.P_2007_09 [INDEX PARTITION]            3     220573
JSCHDER.BIGTABL_LG_X_ID.P_2007_09 [INDEX PARTITION]            4     157050
JSCHDER.BIGTABLE.P_DEACTIVE [TABLE PARTITION]                        393663
JSCHDER.BIGTABLE.P_DEACTIVE [TABLE PARTITION]                  1      66277
JSCHDER.BIGTABLE.P_DEACTIVE [TABLE PARTITION]                  2      10364
JSCHDER.BIGTABLE.P_DEACTIVE [TABLE PARTITION]                  3       6930
JSCHDER.BIGTABLE.P_DEACTIVE [TABLE PARTITION]                  4       3484
JSCHDER.BIGTABLE.P_DEACTIVE [TABLE PARTITION]                  5     266722
JSCHDER.BIGTABLE.P_DEACTIVE [TABLE PARTITION]                  6      39886
JSCHDER.BIGTABLE.P_ACTIVE_APPROVED [TABLE PARTITION]                 276637
JSCHDER.BIGTABLE.P_ACTIVE_APPROVED [TABLE PARTITION]           1      13750
JSCHDER.BIGTABLE.P_ACTIVE_APPROVED [TABLE PARTITION]           2      12207
JSCHDER.BIGTABLE.P_ACTIVE_APPROVED [TABLE PARTITION]           3      23522
JSCHDER.BIGTABLE.P_ACTIVE_APPROVED [TABLE PARTITION]           4      28336
JSCHDER.BIGTABLE.P_ACTIVE_APPROVED [TABLE PARTITION]           5      99704
JSCHDER.BIGTABLE.P_ACTIVE_APPROVED [TABLE PARTITION]           6      99118

40 rows selected.

As an aside, there is a line in the middle that says “GC_BUFFER_BUSY_DELTA”. You can replace that line with any of these values to see the top objects for the corresponding statistic during the reporting period:


LOGICAL_READS_DELTA
BUFFER_BUSY_WAITS_DELTA
DB_BLOCK_CHANGES_DELTA
PHYSICAL_READS_DELTA
PHYSICAL_WRITES_DELTA
PHYSICAL_READS_DIRECT_DELTA
PHYSICAL_WRITES_DIRECT_DELTA
ITL_WAITS_DELTA
ROW_LOCK_WAITS_DELTA
GC_CR_BLOCKS_SERVED_DELTA
GC_CU_BLOCKS_SERVED_DELTA
GC_BUFFER_BUSY_DELTA
GC_CR_BLOCKS_RECEIVED_DELTA
GC_CU_BLOCKS_RECEIVED_DELTA
SPACE_USED_DELTA
SPACE_ALLOCATED_DELTA
TABLE_SCANS_DELTA

Now as you can see, these statistics confirm what we observed from the ASH: the top waits in the system are for the BIGTABLE_LOG table. However, this also reveals something the ASH didn’t – that the date-based index on the same table is a close second.

The Real Culprit

Any time you see heavy concurrency problems during inserts on table data blocks, there is one first place to look: space management. Since ancient versions of OPS it has been a well-known fact that freelists are the enemy of concurrency. In this case, that was exactly the culprit.


select distinct tablespace_name from dba_tab_partitions 
where table_name='BIGTABLE_LOG';

TABLESPACE_NAME
------------------------------
BIGTABLE_LOG_DATA

select extent_management, allocation_type, segment_space_management
from dba_tablespaces where tablespace_name='BIGTABLE_LOG_DATA';

EXTENT_MAN ALLOCATIO SEGMEN
---------- --------- ------
LOCAL      USER      MANUAL

select distinct freelists, freelist_groups from dba_tab_partitions
where table_name='BIGTABLE_LOG';

 FREELISTS FREELIST_GROUPS
---------- ---------------
         1               1

And there you have it. The busiest table on their 6-node OLTP RAC system is using MSSM with a single freelist group. I’m pretty sure this could cause contention problems! But in this case it wasn’t quite what I expected. It looks to me like the single freelist itself wasn’t the point of contention – but it was pointing all of the nodes to the same small number of blocks for inserts and these data blocks were getting fought over. But they were probably filling up quickly and so no single block had a large number of waits reported in the ASH. I haven’t proven that but it’s my current theory. :) If anyone has another idea then leave a comment and let me know!
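
For what it’s worth, the fix we’d be heading toward is the one that comes up in the comments below: ASSM (or at least multiple freelist groups). Here’s a minimal sketch of the ASSM route – none of this was run as part of the investigation above, the file name and sizes are placeholders, and you’d want to test it and schedule the partition move and index rebuilds appropriately:

-- sketch only: rebuild the hot partition in an ASSM tablespace so space
-- management stops pointing every instance at the same few blocks
create tablespace bigtable_log_assm
datafile '/u01/oradata/bigtable_log_assm01.dbf' size 8g
extent management local
segment space management auto;

alter table jschder.bigtable_log
move partition p_2007_09 tablespace bigtable_log_assm;

-- moving the partition marks its local index partitions unusable, so rebuild them
alter index jschder.bigtabl_lg_x_id
rebuild partition p_2007_09 tablespace bigtable_log_assm;

alter index jschder.bigtabl_lg_x_change_date
rebuild partition p_2007_09 tablespace bigtable_log_assm;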


Discussion

28 thoughts on “GC Buffer Busy Waits in RAC: Finding Hot Blocks”

  1. perfect investigation and perfect explanation

    thank you for this great post

    Posted by coskan | September 13, 2007, 10:53 am
  2. It is a great paper which will stimulate more thought. Appreciate the knowledge share!
    Should the contention be on the block’s header? You already pointed out the contention is not in the header block, so it would not be in the freelist, which is in the header block. I wonder how V$WAITSTAT looks as it will shed more light. But if it is in the block header then bumping up INITRANS would help.

    Posted by ming | September 20, 2007, 10:49 am
  3. SegmentSpaceManagement Manual and a single FreeList. Is this the pattern for all the tablespaces and objects? Database probably converted from non-RAC to RAC? Or objects manually created using “scripts” generated from a simple development environment.

    So you had only the 1 table with high inter-instance contention for the free blocks? Lucky, were you?

    Posted by Hemant K Chitale | September 25, 2007, 9:36 am
  4. Hi Jeremy,
    Great post and I really like the way you illustrate that ASH is a really cool tool.
    Couple of questions though: why not check directly with P1-P2 which block is causing the problems – maybe I missed something?
    If this is a data block, shouldn’t we worry about INITRANS or how to reduce the density (rows per block) of the segment instead of checking the freelist configuration?
    Bye .. and again, great post!!
    –Cesar

    Posted by cesar | November 26, 2007, 5:32 pm
  5. Ming – V$WAITSTAT showed high contention mainly on data blocks (as expected). It’s unlikely that there was a shortage of ITL entries because the contention was occurring on inserts into empty blocks. The ITL will automatically grow unless the block is full.

    Hemant – Yes, all objects were MSSM and single freelist. Yes, the database was converted from non-RAC to RAC. The only reason we saw one table with problems is that it had significantly more traffic than any others. After we fix this table, another will become a problem as soon as there’s enough traffic. Or actually, the date-based index will probably become a problem. (And that’s much more difficult to fix; not sure if it’s possible without application changes.) And finally: yes, I got lucky. :)

    Cesar – I did look at P1/P2 (through a tool called Lab128) however it didn’t help because there was no single hot block. As I explained above, it’s unlikely that INITRANS was the problem. Reducing the row density might help but (1) makes these already-huge tables even more unmanageably huge and (2) isn’t nearly as good a solution as ASSM or multiple freelist groups, which eliminate the contention altogether without compromising block density.

    Posted by Jeremy | November 27, 2007, 9:40 am
  6. Can you please share your knowledge on finding what the active processes running on a given Oracle node are, and what problems each process may be causing? If that’s too vast, I would like to know at least what each process is doing and its CPU utilization. Any info you can share would be highly appreciated. Thanks for your previous posts.

    Ray.

    Posted by Ray Tomilson | April 11, 2008, 10:29 am
  7. Have seen many presentations and slides but the way things started and funneling to block level with a cross ref is Xcellent !!

    Hari

    Posted by Hari | September 11, 2008, 5:36 pm
  8. I love this article
    very nice

    Posted by adarsh Kumar | June 18, 2009, 2:03 pm
  9. Great Post.
    Got the same problem, and now I have something to start with.

    Posted by Jay A | June 25, 2009, 11:33 pm
  10. very nice and intelligent post, keep it up buddy.

    cheers

    Posted by Mehmoo | November 15, 2009, 4:41 am
  11. Man, thank you very much for explaining your flow of ideas to identify bottlenecks. I will really look forward to such an approach in my work to come.

    Thank you again :)

    Posted by Praveen | November 19, 2009, 12:23 am
  12. I am facing the same issue but space management is also set to auto.
    So, what should be my next workaround from here?

    For me, the wait is on 2 tables (75%) AND DEPENDENT INDEXES(25%).

    thanks,
    Ershad

    Posted by ershad | December 13, 2009, 5:16 am
  13. Perfect.

    This is the greatest article which I have ever gone through related to this topic. Appreciate your time.

    Posted by Jammula | December 30, 2009, 12:34 am
  14. Excellent drill down Jeremy. I’m just waiting for a “GC Buffer Busy Wait” in my RAC instance :)

    ~ jp ~

    Posted by Prem | January 19, 2010, 10:48 am
  15. Jeremy – Thanks for the walk through – we ran into the same problem (9.2.0.8).

    My recommendation was to reduce the block size from 8k to 2k.
    Thus reduce the number of rows per block and block contention.

    IsleOracle

    Posted by IsleOracle | February 10, 2010, 4:05 pm
  16. Well presented research work! Thank you for sharing!
    For a 16k or higher block size, increasing PCTFREE would reduce the number of rows per block and may reduce the gc buffer busy waits. I will try this step and let you know. The default value of PCTFREE is 10; I may want to change it to 40 in a 16k block. Not sure this will work but I will give it a try.

    Unni Kirandumkara.

    Posted by Unni | February 24, 2010, 9:45 am
  17. Excellent article and wonderful explanation…

    thanks,
    baskar.l

    Posted by baskar.loganathan | March 2, 2010, 9:35 pm
  18. Thanks for the beautiful drill down. It added more to my tuning experience.

    Posted by Ade | April 21, 2010, 10:41 am
  19. great post.

    Posted by SEKAR P | May 18, 2010, 6:14 pm
  20. Excellent article.

    Thnx for valuable contribution.

    Posted by Jag | May 24, 2010, 3:56 am
  21. Excellent investigation using ASH and AWR. This thing should be happening with freelist contention with manual segment space management. Thanks for sharing.

    Posted by Jack Nicholson | August 18, 2010, 1:15 pm
  22. Good stuff=) I will require a decent amount of time to absorb the writing=D

    Posted by Lindsey Rawling | November 30, 2010, 9:18 pm
  23. Outstanding!! Simple and powerful..
    I liked it when you got the culprit with just 3 SQLs… :)

    Posted by pramod | July 11, 2011, 12:19 pm
  24. Great Research..Gave a good understanding on objects with Wait events in RAC

    Posted by Awar | July 21, 2011, 12:20 am
  25. Very well explained, thanks a lot Jeremy

    Posted by CNS | August 8, 2011, 7:14 am
  26. thanks, it’s really a super illustration of ASH..

    Posted by Mirza | November 29, 2011, 2:14 am

