
RAC Investigation on Low-Memory Linux

Back in the Oracle 9i days, I was one of those people who got on eBay to buy firewire PCI cards and disks that could do non-exclusive login.  Remember that?  The first time a little test cluster could be cheap enough for the home enthusiast?  I still have the parts in my closet.

Of course, we all know what happened after that – virtualization.  It didn’t take long before my home-built test clusters were running on VMware.  (Personally, I think that virtualization really started because of those NES and SNES emulators.  Most great achievements start with a geek who wants to play more video games.)  There are lots of people now who run RAC on virtual environments and it’s easy to find tutorials on the web for many different OS and VM combinations.

Low-Memory Linux

Something I haven’t seen many other people do is RAC with a very small memory configuration.  Like 760M of memory per server. (!)  Of course you’d only do this for a hobby setup – never on a system where you want any kind of support.  But I’m kinda cheap… and running RAC on these small VMs means that I don’t have to go buy an expensive new home computer.  My current Gateway laptop with Vista Home does the job quite nicely!

10.2 and 11.1 RAC will install and run on servers with 760M of memory. But things were a little unstable at first. Now I’m the curious type… I like to fiddle with things… so I investigated a little bit.

Basic Unix Investigation

There are two basic investigation scenarios:

what happened in the past: My main tool is sar (System Activity Reporter). Or Java-based ksar on my desktop – it gets data via ssh and graphs it.
what is happening now: My starting point is vmstat and top. To dig a little deeper, I might then use other tools like ps, free, iostat or netstat.

In this particular case, I noticed pretty quickly from the top utility that one process was consuming over 30% of the system’s memory!  (Note: in top, you can press the ‘<’ and ‘>’ keys to move the sort column left and right.  The initial sort column is %CPU.  I moved it one column to the right, sorting by %MEM.)

top - 18:41:13 up  5:28,  3 users,  load average: 0.04, 0.35, 0.60
Tasks: 180 total,   2 running, 178 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.7%us,  3.7%sy,  0.0%ni, 89.4%id,  6.3%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:    767020k total,   754296k used,    12724k free,     7084k buffers
Swap:  1540088k total,   654696k used,   885392k free,   361800k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
17435 oracle    RT   0  230m 229m  31m S  0.3 30.6   0:26.21 /u01/crs/oracle/product/11.1.0/crs/bin/ocssd.bin
 7765 oracle    15   0  464m 111m 102m S  0.0 14.9   0:04.52 ora_smon_RAC1
 7743 oracle    -2   0  445m  90m  83m S  0.0 12.1   0:12.01 ora_lms0_RAC1
 7783 oracle    15   0  440m  70m  66m S  0.0  9.4   0:05.35 ora_mmon_RAC1
20321 oracle    15   0  438m  48m  45m S  0.0  6.5   0:00.88 oracleRAC1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
 8676 oracle    16   0  437m  46m  45m S  0.0  6.2   0:02.69 oracleRAC1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
 8801 oracle    15   0  440m  44m  41m S  0.0  6.0   0:04.57 ora_cjq0_RAC1

This process (ocssd) is Oracle’s Cluster Synchronization Services Daemon.  It’s the process that sends and receives heartbeats from other nodes. Any delays sending or receiving those heartbeats can cause node evictions (a.k.a. server reboots) – so it’s a pretty important process! That’s why it runs with realtime (RT) scheduling priority, as you can see in the above output from top.

I was surprised that CSS uses so much physical memory – usually Linux is very good at memory management. In top, the VIRT column shows the total virtual memory each process has mapped, while the RES column shows how much of it is actually resident in physical memory. For the other processes RES is far below VIRT, so it’s clear that Linux is pretty actively managing their physical memory. For CSS the two are nearly identical (230m vs. 229m).
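If you want the same by-memory ranking without the interactive top screen, something like the following works as a rough sketch. The function name top_by_rss is my own invention for illustration; it only assumes input shaped like `ps -eo pid,rss,comm` (a header row, then PID, resident size in kB, and command name).

```shell
# Hypothetical helper: show the N processes with the largest resident set.
# Reads `ps -eo pid,rss,comm`-style text on stdin (one header row, RSS in column 2).
top_by_rss() {
    # Drop the header, sort numerically on column 2 (largest first), keep N rows.
    tail -n +2 | sort -k2,2nr | head -n "${1:-5}"
}
```

Typical use would be `ps -eo pid,rss,comm | top_by_rss 5`.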

A quick glance at vmstat shows that although we are actively swapping, it seems under control. This is about what I’d expect when we’re idle and all of the processes except CSS are sharing only 500M of memory:

collabn1:/home/oracle[RAC1]$ vmstat 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0 604144  14024  34948 300632    0    0    32    86 1044 1797  1  8 84  7  0
 2  1 603912  11224  34968 303968   87    0   763    81 1083 1902  3 11 56 30  0
 0  0 604840  12660  34912 303904   25    0   253    38 1050 1975  2 11 69 18  0
 0  0 604840  12728  34924 303900    0    0    68   106 1043 1975  1  9 82  8  0
 0  0 604784  14536  34936 304140   19    0    89    71 1043 2042  1  5 81 13  0
 1  0 604752  14168  34944 304496    6    0   119    21 1040 2016  2  8 80 11  0
 0  1 604736  18732  35020 306212    0    0   384    73 1050 2489  1 10 70 19  0
 1  1 604736  11144  35232 311252  104    0  1209   128 1074 2024  2 12 38 47  0
 3  1 607900   8056  30788 307352   77 1500   504  1661 1055 2642  9 16 56 19  0
 0  0 607900   9836  30800 307360    0    0    71    70 1031 1798  1  6 85  8  0
 1  0 607884  10536  30812 307400    8    0    44    93 1031 1832  1  4 87  8  0

The so column shows memory being swapped out to disk (and removed from physical memory). The si column shows memory being swapped back in from disk (and put back in physical memory). On a side note, remember that on a healthy Unix system the free memory is always small, because the kernel uses otherwise idle memory for buffers and cache. Sometimes this is confusing at first.
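For a longer vmstat capture it can be easier to count the busy intervals than to eyeball them. Here is a minimal sketch, assuming the standard vmstat layout with a header row that starts with "r b" and names the si and so columns; swap_summary is a name I made up for this example.

```shell
# Hypothetical helper: summarize swap traffic from `vmstat` output on stdin.
# Columns are located by header name (si, so) rather than by fixed position.
swap_summary() {
    awk '
        # The header row names the columns; remember where si and so are.
        $1 == "r" && $2 == "b" {
            for (i = 1; i <= NF; i++) {
                if ($i == "si") si = i
                if ($i == "so") so = i
            }
            next
        }
        # Data rows start with a number; anything else (the banner row) is skipped.
        si && so && $1 ~ /^[0-9]+$/ {
            n++
            if ($si > 0) in_n++
            if ($so > 0) out_n++
        }
        END {
            printf "samples=%d swap_in=%d swap_out=%d\n", n, in_n, out_n
        }
    '
}
```

Something like `vmstat 5 12 | swap_summary` would then report how many of the samples showed swap-in or swap-out activity. Keep in mind that the first vmstat row is an average since boot, not a 5-second interval.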

Linux Process Memory Investigation

Nonetheless, I’m not happy that CSS is using 30% of my physical memory in this highly-constrained hobby environment. Why is Linux allowing this? The first clue comes simply from the output of the familiar unix ps utility:

collabn1:/home/oracle[RAC1]$ ps v -C ocssd.bin
  PID TTY      STAT   TIME  MAJFL   TRS   DRS   RSS %MEM COMMAND
17435 ?        SLl    0:08      7   588 235331 234840 30.6 /u01/crs/oracle/product/11.1.0/crs/bin/ocssd.bin

On Linux, the “v” flag tells ps to give information relevant to virtual memory. TRS and DRS tell me how much physical (resident) memory is used for machine executable code (text) and data, respectively. But more importantly – the STAT column gives some informative BSD-style flags about the process. That capital-L indicates that CSS has some pages that are locked into physical memory. Bingo.
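To see which other processes on the box carry that capital-L flag, a quick filter over ps output does the job. This is only a sketch: locked_procs is a hypothetical name, and it assumes input shaped like `ps -eo pid,stat,comm` (header row, then PID, STAT, command). The match is case-sensitive on purpose, since a lowercase l just means the process is multi-threaded.

```shell
# Hypothetical helper: list processes whose STAT field contains a capital "L"
# (pages locked into memory). Reads `ps -eo pid,stat,comm`-style text on stdin.
locked_procs() {
    # NR > 1 skips the header row; $2 is the STAT column.
    awk 'NR > 1 && $2 ~ /L/ { print $1, $3 }'
}
```

Typical use would be `ps -eo pid,stat,comm | locked_procs`.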

If I have root access, then I can get a very detailed report on process memory usage with the pmap command. The output was a little long, so I’ve abbreviated it here:

[root@collabn1 ~]# pmap -x 17435
17435:   /u01/crs/oracle/product/11.1.0/crs/bin/ocssd.bin
Address   Kbytes     RSS    Anon  Locked Mode   Mapping
00110000     656       -       -       - r-x--  libhasgen11.so
001b4000       8       -       -       - rwx--  libhasgen11.so
  ... 37 more library blocks
02a0d000     100       -       -       - rwx--    [ anon ]
02a26000       4       -       -       - --x--    [ anon ]
02a27000   10240       -       -       - rwx--    [ anon ]
03427000       4       -       -       - --x--    [ anon ]
03428000   10240       -       -       - rwx--    [ anon ]
  ... 10 more anonymous blocks, half are 10240K
08048000     592       -       -       - r-x--  ocssd.bin
080dc000       4       -       -       - rwx--  ocssd.bin
  ... 30 more anonymous blocks, half are 10240K
bfe40000     148       -       -       - rwx--    [ stack ]
bfe65000       8       -       -       - rw---    [ anon ]
-------- ------- ------- ------- -------
total kB  235920       -       -       -

Interestingly, the Linux pmap utility does not indicate any locked memory! I don’t know whether that output column is non-functional or if it refers only to some particular kind of locking. But at any rate, I know something is locked. I couldn’t think of anything better, so the next place I looked was in the Linux /proc pseudo-filesystem.

collabn1:/home/oracle[RAC1]$ grep Vm /proc/17435/status
VmPeak:   235924 kB
VmSize:   235920 kB
VmLck:    235920 kB
VmHWM:    234844 kB
VmRSS:    234840 kB
VmData:   202272 kB
VmStk:       156 kB
VmExe:       592 kB
VmLib:     31960 kB
VmPTE:       268 kB

Now we’re talking. The process has a total of 235920 kB of memory – and it’s ALL locked. On a normal RAC system you’d want this. Generally, important realtime processes should be locked so that they are never delayed by paging or swapping. (Remember how that could cause node reboots?)
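The same comparison can be pulled straight out of the status file. A minimal sketch, assuming the VmSize and VmLck lines look like the ones above; locked_fraction is a hypothetical name and takes the path to a status file as its only argument.

```shell
# Hypothetical helper: report how much of a process's virtual size is locked.
# $1 is the path to a /proc/<pid>/status file.
locked_fraction() {
    awk '
        $1 == "VmSize:" { size = $2 }
        $1 == "VmLck:"  { lck = $2 }
        END {
            # Kernel threads have no Vm* lines, so print nothing for them.
            if (size > 0)
                printf "%d kB of %d kB locked (%d%%)\n", lck, size, 100 * lck / size
        }
    ' "$1"
}
```

For the CSS process above, `locked_fraction /proc/17435/status` would be the call to make.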

But I personally doubt that all of the memory really NEEDS to be locked, and I think that Linux will actually do a decent job of not swapping the most important parts. And my highly constrained hobby environment will probably run much smoother if Linux has more flexibility when managing a measly 760M of memory.

Unlocking Linux Process Memory

But is it actually possible to unlock the process memory? As far as I know, Oracle provides no option to disable CSS memory locking. (For good reason.) There is the system call munlockall() – which unlocks all of the calling process’s memory. But that means the CSS process itself would have to call this function. And of course it will not. Or will it?

If you’ve got root, then there’s a hacker-back-door way of doing this. Remember, you’d be crazy to try this anywhere besides a dark closet at home. While gdb is attached the process is stopped, so if you type too slowly then the node could miss heartbeats and CSS could reboot your machine.

But watch this…

[root@collabn1 ~]# grep Vm /proc/17435/status
VmPeak:   235924 kB
VmSize:   235920 kB
VmLck:    235920 kB
VmHWM:    234844 kB
VmRSS:    234840 kB
VmData:   202272 kB
VmStk:       156 kB
VmExe:       592 kB
VmLib:     31960 kB
VmPTE:       268 kB

[root@collabn1 ~]# gdb -p 17435 <<EOF
> call munlockall()
> quit
> EOF

GNU gdb Fedora (6.8-27.el5)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu".
Attaching to process 17435
Reading symbols from /u01/crs/oracle/product/11.1.0/crs/bin/ocssd.bin...done.
  ... 17 more Reading/Loading symbols.
[Thread debugging using libthread_db enabled]
[New Thread 0xb7f629f0 (LWP 17435)]
  ... 19 more New Threads.
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libnsl.so.1...done.
  ... 13 more Loaded/Reading symbols.
0x008e7402 in __kernel_vsyscall ()
(gdb) $1 = 0
(gdb) The program is running.  Quit anyway (and detach it)? (y or n) [answered Y; input not from terminal]
Detaching from program: /u01/crs/oracle/product/11.1.0/crs/bin/ocssd.bin, process 17435

[root@collabn1 ~]# grep Vm /proc/17435/status
VmPeak:   235924 kB
VmSize:   235920 kB
VmLck:         0 kB
VmHWM:    234844 kB
VmRSS:    234840 kB
VmData:   202272 kB
VmStk:       156 kB
VmExe:       592 kB
VmLib:     31960 kB
VmPTE:       268 kB

Ha. This is something you won’t find on Metalink. After generating some activity on the system, the top utility shows me that Linux has significantly reduced the physical memory used by CSS.

These days I’ve actually scripted this for my home and classroom VM environments. I haven’t done a careful comparison or analysis, but it really has seemed to me that my low-memory Linux systems run noticeably smoother.
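For anyone who wants to script it themselves, here is a rough sketch of the idea. To be clear, this is not my actual script: the function names and the GDB and PROC_ROOT variables are assumptions made up for this example (they are overridable so the logic can be dry-run without root). It skips any process that has nothing locked, and uses gdb’s -batch mode so that gdb detaches as quickly as possible.

```shell
#!/bin/bash
# Hypothetical sketch, for hobby VMs only. GDB and PROC_ROOT are illustrative
# knobs so the logic can be exercised without root or a real target process.
GDB=${GDB:-gdb}
PROC_ROOT=${PROC_ROOT:-/proc}

# Print VmLck (in kB) for a PID, or 0 if the field is missing.
locked_kb() {
    awk '$1 == "VmLck:" { v = $2 } END { print v + 0 }' "$PROC_ROOT/$1/status"
}

# Attach to one PID and have it call munlockall(), but only if it has locked pages.
unlock_pid() {
    pid=$1
    kb=$(locked_kb "$pid" 2>/dev/null)
    if [ "${kb:-0}" -gt 0 ]; then
        # -batch runs the commands and exits, detaching from the process.
        # The (int) cast is there because newer gdb releases refuse to call a
        # function with an unknown return type when libc has no debug info.
        "$GDB" -p "$pid" -batch -ex 'call (int) munlockall()'
    fi
}
```

As root, something like `for pid in $(pgrep -f ocssd.bin); do unlock_pid "$pid"; done` would apply it to CSS. The same caution applies as before: the process is stopped while gdb is attached.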

About Jeremy

Building and running reliable data platforms that scale and perform. about.me/jeremy_schneider

Discussion

9 thoughts on “RAC Investigation on Low-Memory Linux”

  1. Jeremy,

    Have you done any research of this kind with 11.2 ?

    Thanks !

    Posted by Tony Johnson | September 3, 2010, 12:23 pm
    • Yes I have – interestingly, CRS 11.2 added a few new processes that attempt to lock even more memory. I couldn’t start CRS with only 760M of memory – but after doing something similar to what I demonstrated here, I now have 11.2 RAC up and running in those same low-memory VMs on my laptop. :)

      Posted by Jeremy | September 3, 2010, 12:30 pm
  2. this is a pretty cool hack. can’t wait to try it out. :D

    Posted by acentian | September 20, 2010, 6:14 am
  3. Thanks Jeremy for mentioning this during MOTS.
    I did that trick with 11.2.0.2 – works like a breeze.

    Posted by Alex Gorbachev | November 24, 2010, 7:57 am
  4. Hi Jeremy,

    Can you please describe what you’ve done for CRS 11.2, as I’m planning to do the same in my lab too?

    Many thanks,
    Dani

    Posted by DanyC | June 27, 2011, 5:59 am
  5. Jeremy,

    Thanks so much for sharing this. I had set up a 4-node 11.2.0.1 RAC on my MacBook Pro with 8GB RAM and a dual-core i7 processor (hyperthreaded, so 4 threads). Without this hack, I was finding it difficult to run 4 RAC nodes on this configuration: the minimum I needed to allocate to each VM was 1300M, and although that is only 5200MB (4x1300MB) out of 8190MB, it left only 2990MB for the rest of the processes on the Mac. The kernel_task process on the Mac takes about 644MB on my system, and other small processes soon start to add up and leave just 500MB or so free. It really used to push my system. Now I have upgraded my hard disk to a 480 OCZ Vertex3 SSD capable of 480MB/s read/write, compared to the old spinning HD capable of 70MB/s read/write. After using your hack, I have currently reduced each VM’s memory to 1GB, but I can go further down; I will experiment with how far I can safely reduce the VMs’ memory.

    But your hack saved me from upgrading my laptop RAM from 8GB to 16GB, especially since the MBP has only 2 DIMM slots, so purchasing 8GB DIMMs would have cost me a lot of money.

    Great work !!!

    Posted by Vishal Gupta | October 29, 2011, 9:32 am
  6. Hi Jeremy,
    excellent post … many thanks!!!

    goran

    Posted by goran | November 5, 2011, 8:07 am
  7. Hi Jeremy,

    excellent post … in my 11.2.0.3 RAC, apart from ocssd, the ologgerd process is also a top consumer of physical memory:

    PID  USER   PR NI VIRT RES  SHR S %CPU %MEM TIME+   COMMAND
    4243 root   RT 0  176m 142m 60m S 0.0  8.0  0:03.34 ologgerd
    3792 oracle RT 0  131m 105m 53m S 1.0  5.9  0:16.04 ocssd.bin

    … so, I made a small change to your script to unlock memory from this process too – I don’t think this should be an issue for the correct working of GI:

    for CHECK in [o]cssd [c]ssdmonitor [c]ssdagent ologgerd; do

    great work and many thanks for post!!!

    goran

    Posted by goran | November 5, 2011, 9:28 am

