OSP #2a: Build a Standard Platform from the Bottom-Up

Posted by Jeremy ⋅ November 14, 2013

Filed Under standards

This is the fourth of twelve articles in a series called Operationally Scalable Practices. The first article gives an introduction and the second article contains a general overview. In short, this series suggests a comprehensive and cogent blueprint to best position organizations and DBAs for growth.

I hesitate greatly to use the word “standard” having seen the ways it gets bandied about by IT groups everywhere. But the ubiquitous presence of this term and the manifold frustrations which always accompany it only prove that there’s something relevant here. We’re all familiar with the basic ideas behind so-called standardization. Anyone who has had a job maintaining more than one server for more than one month got frustrated at some point because something worked in one place and unexpectedly failed in another. The reason? Something was different between those two places. And so began the crusade for “standardization” – at least for those of us who had the time, energy, comfort level and motivation to try changing things!

The basic idea behind standards is to make things similar. The more evolved and realistic version of standards is that there’s process wrapped around the differences. (Hope you have those change history systems ready!) The “standard platform” itself has a lifecycle – and there will be new standard platforms to come on whatever schedule makes sense for your business (personally I think 2-4 years is a good place to start). Even though you may still have more than one configuration, by limiting them it becomes realistic to develop predictable processes for moving between them. Of course there will always be bespoke systems to manage; some will be legacy, some will be special-purpose. Smart standardization is not idealistic but rather business-driven. Hope I’m not sounding repetitious by saying that! For each exception, carefully weigh the costs and the benefits over the long term.

As a brief aside here: I do think outside opinions (user groups, professional networks and paid consultation) are very valuable and generally under-utilized. But use them mainly to test your own thinking for flaws. Consultants can sometimes come across as very confident and convincing about the right way to do something (it’s an important skill for success in that field) but nobody knows your actual infrastructure and operations better than you. Get lots of input – even pay for some – then question everything and don’t follow advice that you don’t understand. Blame me if the consultants resent you for it!

Successfully moving into standards works best when you start at the bottom of the stack: with your physical hardware. I’ve worked at larger companies that are further along this road, and they sometimes have the spec nailed down to the peripheral model numbers and firmware versions. But how can a small company get started on this when it purchases infrequently and has a bunch of existing servers to deal with? A few suggestions:

Survey your hardware and figure out which server you have the most of. Do the same for peripherals: network cards, memory, CPUs, etc.
See if you can make these servers look more alike without spending too much money (for example, by matching CPU core count, memory size and hard drive configuration).
By this point you should be getting a good idea what CPU, memory and storage needs you have. Look at the newer servers you’ve already bought and see if there’s a suitable choice to be the “standard” moving forward over the next few years.
Make a simple rule for your organization: for the next year, if a new server is purchased then it will be this exact server in this exact configuration. Is that possible for you? After a year see how things are going and consider extending for another one to three years before choosing the next “standard” platform.

When you survey existing equipment, don’t worry if you can’t get things exactly identical – just get as close as possible. It’s worthwhile just to get the CPU core count, memory size and disk spindle counts to match – or at least minimize the variation. That alone will help a lot when you tackle standards for the layers above the hardware.
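To make the survey concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the host names, the `ServerSpec` fields and the numbers are made up for illustration, and in practice you would load the inventory from whatever asset list you already keep. It finds the configuration you have the most of and reports how every other server differs from it.

```python
from collections import Counter
from typing import NamedTuple


class ServerSpec(NamedTuple):
    """The handful of attributes worth matching first (illustrative only)."""
    cpu_cores: int
    memory_gb: int
    disk_spindles: int


# Hypothetical inventory; replace with data from your own asset records.
inventory = {
    "db01": ServerSpec(16, 64, 8),
    "db02": ServerSpec(16, 64, 8),
    "db03": ServerSpec(8, 32, 4),
    "app01": ServerSpec(16, 64, 8),
    "app02": ServerSpec(16, 48, 8),
}


def most_common_spec(servers):
    """Return the configuration that appears most often, and how many times."""
    counts = Counter(servers.values())
    spec, count = counts.most_common(1)[0]
    return spec, count


def deviations(servers, baseline):
    """Map each non-matching server to {field: (actual, baseline)}."""
    result = {}
    for name, spec in servers.items():
        diffs = {
            field: (getattr(spec, field), getattr(baseline, field))
            for field in spec._fields
            if getattr(spec, field) != getattr(baseline, field)
        }
        if diffs:
            result[name] = diffs
    return result


baseline, count = most_common_spec(inventory)
outliers = deviations(inventory, baseline)

print(f"Most common configuration ({count} servers): {baseline}")
for name, diffs in outliers.items():
    print(f"{name} differs: {diffs}")
```

In this made-up inventory, app02 is only a memory upgrade away from matching the baseline, while db03 differs on every attribute and is probably better treated as an exception than as an upgrade candidate.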

Maybe you’ll need a “high-storage” option or a “large-memory” one (Amazon’s EC2 instance types are an informative model when defining your own). Minimizing variation means that you only allow defined configurations: any app that needs a little extra of some resource needs to go all the way up to the next tier or option you’ve defined. Any app that needs less of some resource isn’t allowed to save money by trimming the hardware. After all, one of the major benefits of good standards is to make consolidation easy – so a bunch of those small apps should share one server. No in-betweens or compromises. In this sense, having a “standard” simply comes down to the courage and ability to convincingly say no to anything else. You need the foresight to see the benefits and the hindsight to recount the costs.
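The “no in-betweens” rule is easy to express in code. The sketch below is only an illustration: the tier names, sizes and the `pick_tier` helper are assumptions of mine, not a recommendation for any particular shop. An app gets the first defined tier that covers all of its needs, and anything that fits no tier is flagged as an exception to be weighed on cost and benefit.

```python
from typing import NamedTuple, Optional


class Tier(NamedTuple):
    name: str
    cpu_cores: int
    memory_gb: int
    storage_tb: int


# Hypothetical tiers, listed in order of preference (default option first).
TIERS = [
    Tier("standard", 16, 64, 4),
    Tier("large-memory", 16, 256, 4),
    Tier("high-storage", 16, 64, 24),
]


def pick_tier(cpu_cores: int, memory_gb: int, storage_tb: int) -> Optional[Tier]:
    """Return the first defined tier that covers every requirement.

    There is no trimming and no custom sizing: a small app still lands on a
    full standard server (where it can be consolidated with others), and an
    app that needs a little extra moves all the way up to the next option.
    Returns None when no tier fits, which means the request is an exception.
    """
    for tier in TIERS:
        if (cpu_cores <= tier.cpu_cores
                and memory_gb <= tier.memory_gb
                and storage_tb <= tier.storage_tb):
            return tier
    return None


small_app = pick_tier(cpu_cores=2, memory_gb=8, storage_tb=1)
hungry_app = pick_tier(cpu_cores=4, memory_gb=80, storage_tb=1)
oversized = pick_tier(cpu_cores=32, memory_gb=64, storage_tb=1)

print(small_app.name)   # standard
print(hungry_app.name)  # large-memory
print(oversized)        # None
```

The app that wants 80 GB of memory does not get a standard server with a few extra DIMMs; it goes to the large-memory tier. The 32-core request fits nothing defined here, so it becomes one of those bespoke exceptions that needs a deliberate decision.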

This is already enough material for digesting and discussing, so I’ll split this topic into two articles. In the next article I’ll get into some very specific suggestions from an Oracle database perspective. Any thoughts so far? What do you agree or disagree with?