This is the fourth of twelve articles in a series called Operationally Scalable Practices. The first article gives an introduction and the second article contains a general overview. In short, this series suggests a comprehensive and cogent blueprint to best position organizations and DBAs for growth.
In the first half of this article we discussed platform standardization at a fairly high level. Now it’s time to get specific. What are some detailed suggestions for the platform that your Oracle databases will run on? Which decisions today could most impede or ease scaling your operations tomorrow?
The Oracle database is best known for running on expensive and powerful enterprise platforms. But you probably started thinking about standards long before this point – probably when your databases were still running on a handful of small commodity servers. This is when you want to start doing things right. And just to be clear, this isn’t mainly about spending more money. It’s more about how you do things. The standards and driving principles at very large organizations can benefit your commodity servers right now and eliminate many growing pains down the road.
As we said before, successfully moving into standards works best when you start at the bottom of the stack – and the very bottom of your stack is your storage.
Storage is incredibly important to the platform underneath an Oracle database. As you craft standards for the storage component of your platform, start small and think modular.
- This isn’t specific to Oracle, but I strongly recommend drawing pictures to help understand storage architecture. I do this all the time. Start with the spindles and draw each link (SATA, PCI, etc.) on the path to the CPU. Ignore caching and, for extra credit, add throughput numbers.
- Think mainly about spindle count. Size/capacity is secondary. Standardize on one size for your local disks. You can optionally add a second “high-capacity” size, but two disk sizes to choose from when purchasing should take your org a long way. (Exadata only offers two.)
- Minimum 4 spindles in your smallest servers. I feel self-conscious suggesting this to smaller budget-conscious organizations but I think the benefits will make it worthwhile, particularly as standardization enables you to better consolidate.
- Two spindles mirrored for data and two spindles mirrored for recovery. This basic data/recovery split is deeply ingrained in Oracle database layout – Oracle automatically keeps a copy of important data on both volumes (control files, online logs). Don’t compromise on having two physically independent volumes. The disks used for recovery can also be used for the operating system.
- Add more spindles as you grow. Cheap DAS or NAS is fine once the server chassis is full. Just keep those pictures up-to-date. Sketching and scanning is fine if you don’t want to spend too much time in Visio or Lucidchart.
- Introducing any form of shared network for storage requires great care, whether it’s Fibre Channel or Ethernet. It’s not just cost – inexpensive options greatly increase complexity and risk (similar to clustering). Don’t go here until you must, and then make sure you develop or hire the skills to manage it well.
- Parity can be tempting because of the extra capacity – but stick with mirroring for data protection. It won’t break the bank to buy a little more disk. I’m concerned more with local storage here than I am with fancy storage appliances that use parity under the covers. My biggest reason is that it’s far simpler to balance and rebalance I/O with RAID1 than it is with RAID5. Also, James Morle’s SANE SAN papers (both the original and the 2010 update) are an informative and worthwhile read even if you’re just using local disk.
- Appliances can ease balancing and rebalancing I/O – but sometimes they get too smart for their own good, creating new problems which can be near impossible to track down due to sky-high levels of complexity in the total system. New storage appliances are adding deduplication, compression and even serialization to the list of clever enhancements. There is some really cool engineering here with the potential to get more capacity and performance with a lot less hardware investment. I really enjoy working with these in lab and pilot environments. But tread carefully with these newer, less-understood, non-traditional storage platforms. One other word on appliances: as the SANE SAN paper points out, cache (which generally includes any SSD storage) is great but design to the disks.
- Use ASM unless you have a good reason not to. My biggest reason is manageability. Specifically, ASM’s rebalancing is stellar. Skipping RAID and letting ASM handle the data protection can offer some big advantages too. ASM’s ability to do mirroring at a more granular level offers some advantages over traditional disk-by-disk RAID mirroring. You can add a single spindle into a diskgroup and ASM will rebalance all data evenly across the group while making sure each individual extent maintains a mirrored copy somewhere. You can also remove spindles one-by-one with the same advantages.
- Having a standard disk type is fundamental here. Different disks should not be mixed in a single diskgroup.
- As you grow, ASM can still manage rebalancing even if you move back toward hardware RAID. Nonetheless you still have to know the underlying disk types and make sure not to combine LUNs built on different disk types into a single diskgroup.
- Sketch out a growth path for the next year or so and set expectations for capacity, for IOPS and for throughput. I won’t belabor this; in a nutshell I’m just saying to do a little thinking ahead. Have a rough idea how much disk you are willing to attach to a server, think about how many PCI slots you might need, etc. Always remember that I/O performance (IOPS and throughput) is most closely tied to spindle count rather than capacity. Bigger is not faster.
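To make the “bigger is not faster” point concrete, here is a rough back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the per-spindle IOPS and throughput figures are placeholder numbers I picked (measure your own disks), the `Volume` class and `path_throughput_mbps` function are hypothetical names, and the model only covers simple two-way mirroring with caching ignored, just like the pictures suggested above.

```python
from __future__ import annotations

from dataclasses import dataclass

# Illustrative assumptions only; substitute measured numbers for your disks.
ASSUMED_IOPS_PER_SPINDLE = 150   # small random I/Os per second, one spinning disk
ASSUMED_MBPS_PER_SPINDLE = 100   # sequential MB/s, one spinning disk
MIRROR_WRITE_PENALTY = 2         # each logical write lands on two disks


@dataclass
class Volume:
    """A two-way mirrored volume built from identical disks (hypothetical model)."""
    name: str
    spindles: int
    disk_size_gb: int

    def usable_gb(self) -> float:
        # Two-way mirroring halves the raw capacity.
        return self.spindles * self.disk_size_gb / 2

    def iops(self, read_fraction: float) -> float:
        # Reads cost one disk I/O; writes cost MIRROR_WRITE_PENALTY disk I/Os.
        raw = self.spindles * ASSUMED_IOPS_PER_SPINDLE
        write_fraction = 1.0 - read_fraction
        return raw / (read_fraction + write_fraction * MIRROR_WRITE_PENALTY)


def path_throughput_mbps(spindles: int, link_limits_mbps: list[float]) -> float:
    """Throughput is capped by the slowest hop between the spindles and the CPU."""
    return min([spindles * ASSUMED_MBPS_PER_SPINDLE, *link_limits_mbps])


if __name__ == "__main__":
    small = Volume("data", spindles=2, disk_size_gb=600)
    big = Volume("data", spindles=2, disk_size_gb=4000)
    # Same spindle count, very different capacity, identical IOPS estimate.
    print(small.usable_gb(), small.iops(read_fraction=0.5))
    print(big.usable_gb(), big.iops(read_fraction=0.5))
    # Twelve spindles behind an assumed 600 MB/s link are limited by the link.
    print(path_throughput_mbps(12, [600.0, 2000.0]))
```

The numbers matter less than the shape of the model: capacity scales with disk size, while the IOPS and throughput estimates scale only with spindle count and the narrowest link you drew in your picture.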
At this point, I’ve dumped a lot of information and made more than one reckless assertion. Hopefully I won’t start any religious wars. In the next article I’ll discuss CPU, memory, networking and a useful architectural pattern I call “slots”. But before I go there, what are your thoughts on storage for Oracle database platforms? It’s a pretty important topic. What did I overlook or forget? Where do you disagree with me? Let me know your thoughts!