OSP: Overview

This is the second of twelve articles in a series called Operationally Scalable Practices.  You can read the introduction in the first article. In short, this series offers helpful suggestions for younger organizations and newer DBAs to best position them for very large-scale growth.

Before getting into specifics, we will lay out a general overview of the content. I expect this overview to be revised the most as the series is refined over time – so check back periodically to see if there have been updates!

First, one quick note: my single overarching principle is simplicity. Your specific implementation of a guideline offered here will depend on your company size and will change as your company grows.  A small team can get a lot of mileage out of online collaborative spreadsheets for inventory or deployment checklists.  I recommend a bias toward simpler platforms like this instead of big enterprise software solutions – the key is that you’re *doing* the inventory or checklist itself. If you keep growing, sophisticated tools will eventually become necessary, but they’re complicated and costly, and they add overhead to your team activities.  Always try to find the simplest possible techniques and tools for your team as you implement the guidelines herein!

And now, here’s the outline of the next ten articles.  The outline in itself is actually a great checklist as you think about your operations.  It’s ambitious – hopefully I’ll be able to follow through with getting this written!


The Foundation

  • Keep documentation & processes somewhere with change history
  • Use checklists for general tasks, maintenance and deployments
    • Start simple, grow slowly into more sophisticated systems (e.g. ticketing or release management)
  • Make sure your basics are solid: monitoring (both email & paging), backups, inventory
    • Monitor what matters to your business from end-to-end (networks, applications, databases, storage)
    • Actively manage paging events and thresholds, no “expected & non-critical” pages
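
The paging rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Alert` fields, the `route` function and the three channel names are hypothetical, not part of any monitoring product. The point is that anything "expected & non-critical" never reaches the pager.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    name: str
    critical: bool    # threatens something the business cares about
    actionable: bool  # someone can act on it right now


def route(alert: Alert) -> str:
    """Return the channel for an alert: 'page', 'email' or 'review'."""
    if alert.critical and alert.actionable:
        return "page"    # worth waking someone up
    if alert.critical or alert.actionable:
        return "email"   # handle during working hours
    # Expected and non-critical: fix the threshold or remove the check.
    return "review"


if __name__ == "__main__":
    # Hypothetical sample alerts.
    samples = [
        Alert("archive destination full", critical=True, actionable=True),
        Alert("nightly batch CPU spike", critical=False, actionable=False),
    ]
    for alert in samples:
        print(f"{alert.name}: {route(alert)}")
```

A small team could keep the same rule as a column in a spreadsheet; the code only makes the decision explicit.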

Build a Standard Platform from the Bottom-Up (Part 1) (Part 2) (Part 3)

  • Define storage
    • Start small & simple (local RAID)
    • Minimum two independent volumes per server (recovery area)
    • Set expectations for capacity, IOPS and throughput
    • Define a growth path for the next year or so
    • Morle’s SANE SAN is still worth reading for storage appliances
    • Tread carefully with newer, less-understood, non-traditional storage (dedupe, serialized, etc)
  • Calculate or choose a rough core-to-memory ratio
    • Need extra memory for consolidation
    • This standard core count becomes minimum unit of licensing; balance cost implications with technical considerations
  • Discuss network communication for inter-DB traffic and backup traffic (DG, GG, DP, environment refreshes, etc)
  • Suggestions for defining slots per server
  • Suggestions for defining workload per slot
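
As a rough illustration of how a core-to-memory ratio and a slot size turn into slots per server, here is a minimal Python sketch. The function name, its parameters and the example numbers are all assumptions made for illustration; the articles themselves may define slots differently.

```python
def slots_per_server(cores, memory_gb, cores_per_slot, gb_per_core, reserved_gb=0):
    """Number of identical slots that fit on one server.

    A slot is assumed to be a fixed number of cores plus the memory
    implied by the chosen core-to-memory ratio. reserved_gb is memory
    held back for the OS and other overhead.
    """
    if cores_per_slot <= 0 or gb_per_core <= 0:
        raise ValueError("cores_per_slot and gb_per_core must be positive")
    memory_per_slot = cores_per_slot * gb_per_core
    by_cpu = cores // cores_per_slot
    by_memory = (memory_gb - reserved_gb) // memory_per_slot
    # Whichever resource runs out first sets the limit.
    return max(0, min(by_cpu, by_memory))


if __name__ == "__main__":
    # Hypothetical server: 16 cores, 128 GB RAM, 16 GB reserved,
    # 2 cores per slot at 8 GB per core.
    print(slots_per_server(16, 128, cores_per_slot=2, gb_per_core=8, reserved_gb=16))
```

With these example numbers the CPU would allow 8 slots but memory allows only 7, which is why consolidation tends to need extra memory.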

Build a Standard Cluster Platform (Part 1)

  • When to consider clusters and when not to consider clusters
    • Parallel or distributed processing, fault tolerance, incremental growth, pooled resources for better utilization
    • Expensive software licenses, require application modifications, complexity drives very expensive training and hiring
  • There are many ways to cluster (not just RAC)
  • Clustering with Oracle will require shared storage of some kind (DAS,NAS,NFS,SAN)
  • Individual services or applications can be active-active or active-passive independently
  • Keep it simple – for example, with slots and pools

Suggestions on Naming

  • Servers
  • Virtual Machines
  • DNS, NIS, Domains
  • Database, SID, etc
  • Users (NIS, OS, DB)
  • Cluster, DG config
  • Services
  • SQLNET (tnsnames)
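
Whatever naming standard you settle on, it helps if a script can check it. The sketch below assumes a purely hypothetical server naming pattern (`<site>-<env>-<role><nn>`, such as `chi-prod-db01`); the pattern, the environment and role lists, and the function name are invented for illustration and are not a recommendation from this series.

```python
import re

# Hypothetical convention: three-letter site, environment, role, two-digit sequence.
SERVER_NAME = re.compile(
    r"^(?P<site>[a-z]{3})-(?P<env>prod|test|dev)-(?P<role>db|app)(?P<seq>\d{2})$"
)


def parse_server_name(name):
    """Return the name's components as a dict, or None if it breaks the convention."""
    match = SERVER_NAME.match(name)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for candidate in ["chi-prod-db01", "bigoracle", "nyc-test-app12"]:
        print(candidate, "->", parse_server_name(candidate))
```

Once names parse cleanly, the same components can drive inventory spreadsheets, monitoring groups and tnsnames aliases.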

Design Small yet Ready for MAA

  • Consolidate high in the stack
  • Scale up before out
  • Security topics: minimize privileges and roles
  • Maximize what you already bought, minimize expensive extras
    • Express Edition, Personal Edition, Standard Edition
    • Standby for fault tolerance (data guard, dbvisit)
    • Other tools – especially for standard edition (MOS, sash, etc)
  • ASM vs FS, OMF
    • Suggestions on DB file paths, names, extensions for all file types
  • Standardize on RMAN with a backup agent, hot backups, BCT
    • Use storage snapshots and standby DBs for backups when justified
  • Create databases ready for adding features later with minimal change
    • Physical replication (standby)
    • Logical replication (goldengate, etc)
    • Encryption
    • RAC
  • Justify non-default config
  • Strongly justify hidden config
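
One simple way to enforce the last two points is to keep a written justification next to every non-default parameter and to flag anything that lacks one. The following Python sketch assumes the parameter values and justifications are already available as plain dictionaries (for example, exported from an inventory spreadsheet); the function name and sample data are hypothetical. It relies only on the fact that Oracle's hidden parameters begin with an underscore.

```python
def review_parameters(non_default, justifications):
    """Return (regular, hidden) lists of non-default parameters with no justification."""
    missing = sorted(
        name for name in non_default
        if not justifications.get(name, "").strip()
    )
    hidden = [name for name in missing if name.startswith("_")]
    regular = [name for name in missing if not name.startswith("_")]
    return regular, hidden


if __name__ == "__main__":
    # Hypothetical data; "_some_hidden_param" is a made-up name.
    non_default = {"sga_target": "8G", "processes": "500", "_some_hidden_param": "FALSE"}
    justifications = {"sga_target": "Sized to the standard slot definition"}
    regular, hidden = review_parameters(non_default, justifications)
    print("Needs justification:", regular)
    print("Needs strong justification (hidden):", hidden)
```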

Managing Software (install/patch/upgrade)

  • Plan for patching
  • Suggestions for users and groups
    • OS users & groups (ASM and CRS, individual accounts)
    • DB users (including sysdba, sysasm, sysoper)
    • Account privileges
    • Records and auditing
  • Suggestions for directory layouts
    • OFA
    • Using (or not using) symbolic links
    • Installing cluster software on local storage
  • Server time zones and time synchronization
  • Repackaging oracle binaries
    • Versioning your packages

Managing Configuration & Scripts

  • Centralize
  • Use version control (a.k.a. revision control, source control or change control)
  • Configuration management and automation
  • Configuration that needs managing
    • Database configuration (e.g. tnsnames, wallets, init files, pwfiles)
    • OS configuration (e.g. hugepages, udev/asmlib)
  • Files and directories that need cleanup/housekeeping
  • Consistent management of jobs and schedules (cron, DB, oem, backup software, etc)
    • Built-in DB maintenance windows and jobs
  • Management & administration GUIs
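
To make "centralize" concrete, here is a minimal Python sketch that renders tnsnames.ora entries from a single inventory list, so the file can be regenerated and kept in version control instead of being edited by hand on each server. The inventory layout, host names and function name are assumptions for illustration; only the basic entry syntax follows the standard tnsnames.ora format.

```python
ENTRY_TEMPLATE = """{alias} =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = {host})(PORT = {port}))
    (CONNECT_DATA = (SERVICE_NAME = {service}))
  )
"""


def render_tnsnames(inventory):
    """Render one tnsnames.ora entry per inventory row, sorted by alias."""
    entries = [
        ENTRY_TEMPLATE.format(
            alias=db["alias"].upper(),
            host=db["host"],
            port=db.get("port", 1521),  # default listener port if none is given
            service=db["service"],
        )
        for db in sorted(inventory, key=lambda row: row["alias"])
    ]
    return "\n".join(entries)


if __name__ == "__main__":
    # Hypothetical inventory rows.
    inventory = [
        {"alias": "sales", "host": "db02.example.com", "service": "sales.example.com"},
        {"alias": "hr", "host": "db01.example.com", "service": "hr.example.com", "port": 1522},
    ]
    print(render_tnsnames(inventory))
```

Sorting the entries keeps the generated file stable, so diffs in version control show only real changes.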

Operations Team Processes

  • Using automation and scripting (deployment, db create, backups, schema/user creation, config)
  • Utilize backups to move data
  • Trust Oracle not Google (when it comes to Oracle software)
    • Cite references in process documentation
    • Official documentation should be the source for 99% of processes
    • MOS is a secondary source; notes can have problems but may also be more detailed and up-to-date than the official documentation
    • Heavy internal testing and documentation is the only acceptable tertiary source
    • Blogs, forums, wikis and other public websites are never an acceptable source for processes

Operations Team Calendar

  • Assess quarterly patches
  • Schedule recovery exercises
  • Schedule failover exercises
  • The importance of vacations

Attitude and Culture

  • Vendor support services are your ally not your enemy – even if you’re a small company
  • Value soft skills (personality insights, negotiation, other interpersonal communication skills)
  • Value learning & education
    • Encourage a culture of experiments, tests, evidence and proof
    • Always cite references
  • Understand business priorities: customer relationships, vendor relationships, cost, time, budgeting, negotiation, complexity, manpower…
  • Don’t get too cynical or too comfortable; expect business first
  • Importance of breaks and hobbies

This covers a lot of ground – but I still see a few potential gaps.  In particular, I’m still getting up to speed on some of the latest updates to Oracle’s database.  For those of you who are working with 12c – is there anything we should do differently to be prepared for upcoming changes?  For example, I’m still contemplating whether this blueprint should be tweaked for multitenant or far sync readiness.  If you have any thoughts then please share them!

About Jeremy

Building and running reliable data platforms that scale and perform. about.me/jeremy_schneider

Discussion

One thought on “OSP: Overview”

  1. Jeremy, Kudos for getting this important topic started. On the “Design Small yet Ready for MAA”, consider adding “Design for Scale out” [i.e. design as if you would deploy on RAC] but “Deploy on Single node before you consider Multi-node”. I agree you should scale up before you scale out, but scaling up usually has a higher cost [think expensive HP Superdomes or IBM P-series servers] and more importantly, a hard upper limit [which is increasing but is still costly]. You might also consider my article “Oracle Scalability – Some perspectives” that was recently published on IOUG’s SELECT Journal.

    Posted by jkanagaraj | December 27, 2013, 2:37 pm

Disclaimer

This is my personal website. The views expressed here are mine alone and may not reflect the views of my employer.

contact: 312-725-9249 or schneider @ ardentperf.com

