About a month ago I wrote an overview of Linux Caching and I/O Queues as they pertain to Oracle. I was working on a project to architect, install and configure the beginnings of an 8-node cluster consisting of either one or two RAC databases. During the project, while I was waiting for the OS guys to resolve some networking issues, I ran a bunch of benchmarks on the storage subsystem. Specifically, I experimented with the size of the HBA Queue Depth to see if it would make a difference in performance.
But before getting into the results, a quick overview of our configuration: 11g RAC on Red Hat Enterprise Linux 5, on Dell servers with four dual-core Opteron chips each. The RAC cluster initially had four nodes but will grow to at least eight as the data is migrated. The system has 4Gb QLogic HBAs, a McData switch, and a 3Par SAN (which is blazing fast), with ASM (no CFS) and dedicated Oracle Homes. The first spec called for an InfiniBand interconnect, but after a teleconference with Alex from Pythian discussing the project's specific requirements, the spec was updated to use redundant Gigabit Ethernet.
Picking up where I left off: the default limit set by the Linux qla2xxx driver for concurrent I/O requests on QLogic cards (32 per LUN) is conservative. So can I increase performance by increasing this limit? The best way to answer a question like this is simply to try it.
Tweaking the HBA Queue Depth
With QLogic HBAs on Linux, the queue depth is configured through the ql2xmaxqdepth module option. I want to run an experiment where I vary this parameter and measure the I/O performance. To start out, I'd like to compare the default queue depth of 32 with an increased setting of 64. But how can I effectively measure I/O performance? I'm specifically interested in how Oracle's RDBMS will perform, so I think the best tool is Oracle's Orion benchmarking tool, which is designed to simulate database I/O patterns and measure the result.
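For reference, here is roughly how I change the setting between runs. This is a sketch for RHEL 5 specifically; the config file location and the initrd rebuild step differ on other distributions, so check your own setup before copying it.

```shell
# Sketch for RHEL 5: set the per-LUN queue depth for the qla2xxx driver.
echo "options qla2xxx ql2xmaxqdepth=64" >> /etc/modprobe.conf

# The driver typically loads from the initrd, so rebuild it before rebooting:
mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r)

# After the reboot, confirm the value the driver actually picked up:
cat /sys/module/qla2xxx/parameters/ql2xmaxqdepth
```

The sysfs read at the end is the important sanity check: it shows what the loaded driver is really using, not just what the config file says.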
Now there are two ways to measure I/O performance: IOPS and MBPS.
When you measure IOPS you’re usually investigating small I/O operations and putting stress on the overhead associated with a single read or write. Throughput is not the main concern when you measure IOPS. This is most relevant on transactional systems.
When you measure MBPS you’re usually investigating large I/O operations and putting stress on the overall throughput. Latency is not the main concern when you measure MBPS. This is most relevant on warehouse and analytical systems.
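The two metrics are tied together by the I/O size, which is why a system can look excellent on one and mediocre on the other. A quick back-of-the-envelope calculation (the 8 KB block size and 10,000 IOPS figures are illustrative numbers, not measurements from this system):

```shell
# MBPS ≈ IOPS × I/O size. 10,000 IOPS of 8 KB single-block reads is a
# respectable OLTP number, yet it amounts to modest raw throughput:
awk 'BEGIN { iops = 10000; block_kb = 8; printf "%.1f MBPS\n", iops * block_kb / 1024 }'
# → 78.1 MBPS
```

So a storage system tuned for warehouse-style sequential throughput can still fall over on an OLTP workload, and vice versa.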
For more reading, James Koopmann just wrote a few good articles over at Database Journal about getting IOPS/MBPS measurements from existing databases and the relationship between IOPS/MBPS and vendor-supplied disk specs.
Now this particular project’s database will back several high-volume websites and the traffic is probably only 10-20% writes and 80-90% reads. However much of the data changes somewhat frequently and it is definitely an OLTP workload – almost entirely index-based reads. On the AWR report from a current production database, scattered reads almost didn’t even register while sequential reads were by far the most significant event. This means that we need to optimize this storage system – and our benchmark – for IOPS rather than raw throughput.
The basic idea behind Orion is pretty simple. Orion is designed to simulate a mixed workload between single-block reads and multi-block reads. You give it a bunch of parameters that describe your environment – and then it runs its test over and over again, varying the balance between single-block and multi-block reads while measuring the IOPS, latency and MBPS. One important point: Orion varies the balance by changing the number of threads that are concurrently doing reads/writes. Unlike swingbench or hammerora, there is no think time – each thread constantly tries to do I/O.
Since the database for this project is nearly 100% single-block reads, I decided to just run a “basic” matrix – which doesn’t test mixed workloads. It runs about 45 tests with varying levels of concurrency, between 1 thread and 500 threads, for single-block reads, and the same for multi-block reads. (I just ignored the multi-block results.) I instructed Orion to do 15% write operations and 85% read operations.
./orion_linux_em64t -run advanced -testname simple -num_disks 100 -write=15 -matrix=basic -verbose
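Orion drops its results into files named after the test (a summary plus CSV files such as simple_iops.csv). To compare runs I just want the peak small-I/O figure out of each one. The snippet below generates a stand-in file so it is self-contained; the toy layout and the numbers in it are my illustration, not Orion's documented output format, so adjust the field handling to match your actual files.

```shell
# Stand-in for an Orion result file (real layout may differ):
cat > /tmp/simple_iops.csv <<'EOF'
Small IOs, 1180, 2310, 4450, 6120, 6890
EOF

# Pull the best small-I/O result out of the row of measurements:
awk -F', ' '{ max = 0; for (i = 2; i <= NF; i++) if ($i + 0 > max) max = $i + 0; print max }' /tmp/simple_iops.csv
```

One peak number per run is enough here, since the point of the experiment is the relative change between queue-depth settings, not the absolute figures.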
I ran this benchmark four times in a row. Before each run I changed the queue depth and rebooted the server. I alternated between 32 and 64, using each queue depth twice. The result was a consistent 7% improvement in IOPS and latency by doubling the queue depth.
What’s the Best Setting for Queue Depth?
That depends on how many clients are accessing the storage device. Do not max out the queue depth on all your servers based only on this article. Remember from the first article that the storage device's FC port can concurrently process only a limited number of requests. If a large number of hosts access the same storage array and you increase the queue depth on all of them, then you will start seeing the dreaded SCSI “QUEUE FULL” errors! However, if there are only a handful of clients and you know that you won't be adding more, then you can certainly increase this parameter and get the associated performance boost at peak workload.
To get the optimal value, consult the manuals or support channels for your storage system to find out its queue depth. Factor in the number of clients accessing the array, leave some buffer for safety, and then you can determine the optimal value for each server.
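As a concrete sizing exercise (every number below is made up for illustration; substitute your array's real specs): suppose the array's FC port can queue 1,536 outstanding commands, eight hosts share the port, each host sees four LUNs, and you want about 20% headroom for safety.

```shell
# per-LUN queue depth ≈ (port queue limit × headroom) / (hosts × LUNs per host)
# All inputs are hypothetical illustration values:
awk 'BEGIN { port = 1536; hosts = 8; luns = 4; headroom = 0.8;
             printf "%d\n", int(port * headroom / (hosts * luns)) }'
# → 38
```

In that hypothetical setup, a per-LUN queue depth around 38 keeps the port from being oversubscribed even if every host saturates every LUN at once, which is exactly the condition that triggers QUEUE FULL.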