One of my recent customers was a company with a somewhat large warehouse (around 60TB) on Oracle 10gR2. The system was using RAC, though it was a fairly simple setup: two nodes, very large AIX LPARs, workload manually partitioned between them and somewhat evenly balanced. The most important demand of their business was a large number of reports that had to be generated every day from the warehouse. These reports were beginning to take most of the day and consume a large amount of resources… and the forecast called for dramatic data growth later in the year. So our project goal was to improve performance.
Our project was very successful – the key changes had dramatic effects and we are now running the same set of reports much faster. (Some reports went from many hours to less than 30 minutes.) My final project summary included a small section where I passed along a few suggestions specifically for these developers – but looking over the suggestions now, I think they might actually be good tips for a wider audience. At any rate it might be interesting to put them out for comments!
Of course these tips aren’t my own original work; in fact I think they’re pretty widely accepted in the Oracle professional community. We’ve been hearing these ideas for years now at conferences and user groups. I just picked a few that could make the biggest impact for this team right now – and I put the ideas in my own words. These were my top three:
- Aggressively search for opportunities to do less computational work.
  - Fewer switches between SQL, PL/SQL, and Java.
  - Java is slowest, PL/SQL is faster, C and SQL are fastest.
- Operate in bulk whenever possible.
  - Avoid using any kind of loop in your code whenever possible. If loops are required, operate on large chunks at a time (for example, by increasing fetch sizes).
  - Avoid coding in multiple steps whenever possible. If multiple steps are faster, ask how your code works and why Oracle isn’t automatically doing the same.
- Always utilize tracing/logging when coding and tuning. Create logs at the most granular level possible, then summarize the results by time (a.k.a. profiling). Calibrate everything in terms of wall-clock elapsed time for the entire job.
  - Utilize database 10046 trace files for any jobs where the database is an important component. There are several profilers available with various feature levels and price points (starting with free); utilize one of these to summarize the traces. These traces can safely be generated in production.
  - Use the Oracle PL/SQL Profiler to generate profiles of critical-path PL/SQL code.
- When tuning SQL, follow these steps:
  - Remove all hints.
  - Use dbms_xplan, 10046 and 10053 traces – in that order – to determine where the CBO went wrong. Compare CBO row count estimates with actual row counts before any time-consuming, in-depth analysis of 10053 traces.
  - Add hints only with justification for why the hint is preferable to fixing the underlying cause. Generally speaking, hints should be viewed as band-aids.
  - When a good plan is found and underlying causes cannot be addressed, use DBMS_XPLAN functions with the ADVANCED flag to retrieve a full set of hints. Use this to lock in the good plan and document the situation with SQL comments. (11g will offer new options for plan management, but this method will be sufficient for 10g.)
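To make the “operate in bulk” tip concrete, here’s a minimal PL/SQL sketch of the pattern (the table and column names are hypothetical, used only for illustration):

```sql
DECLARE
  -- Hypothetical table; the point is the fetch/DML pattern, not the schema.
  TYPE t_ids IS TABLE OF orders.order_id%TYPE;
  l_ids t_ids;
  CURSOR c IS SELECT order_id FROM orders WHERE status = 'STALE';
BEGIN
  OPEN c;
  LOOP
    -- Fetch 1000 rows per round trip instead of one row at a time.
    FETCH c BULK COLLECT INTO l_ids LIMIT 1000;
    EXIT WHEN l_ids.COUNT = 0;

    -- FORALL sends one bulk DML call for the whole batch,
    -- rather than one SQL-to-PL/SQL switch per row.
    FORALL i IN 1 .. l_ids.COUNT
      DELETE FROM orders WHERE order_id = l_ids(i);
  END LOOP;
  CLOSE c;
  COMMIT;
END;
/
```

Of course, per the “fewer switches” tip, a single `DELETE FROM orders WHERE status = 'STALE'` would be better still when the logic allows it – the bulk pattern is for cases where row-by-row processing can’t be avoided entirely.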
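For the 10046 traces mentioned above, one common way to generate them on 10g looks like this (the SID and serial# are placeholders you’d look up in `v$session`):

```sql
-- Trace your own session, including waits and binds (level 12):
ALTER SESSION SET EVENTS '10046 trace name context forever, level 12';

-- ... run the job being profiled ...

ALTER SESSION SET EVENTS '10046 trace name context off';

-- Or trace another session via DBMS_MONITOR (10g and later);
-- 123 and 4567 are placeholder values from v$session:
BEGIN
  DBMS_MONITOR.SESSION_TRACE_ENABLE(
    session_id => 123, serial_num => 4567,
    waits => TRUE, binds => FALSE);
END;
/
```

The resulting trace file lands in `user_dump_dest`, ready to feed to whichever profiler you’ve chosen.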
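And for the PL/SQL Profiler tip, a bare-bones session might look like this – assuming the profiler tables have already been created (Oracle ships a `proftab.sql` script for that):

```sql
BEGIN
  DBMS_PROFILER.START_PROFILER(run_comment => 'nightly report run');
END;
/

-- ... execute the critical-path PL/SQL here ...

BEGIN
  DBMS_PROFILER.STOP_PROFILER;
END;
/

-- Summarize: hottest lines first (total_time is reported in nanoseconds):
SELECT u.unit_name, d.line#, d.total_occur,
       ROUND(d.total_time / 1e9, 3) AS seconds
FROM   plsql_profiler_units u
       JOIN plsql_profiler_data d
         ON d.runid = u.runid AND d.unit_number = u.unit_number
ORDER  BY d.total_time DESC;
```

This is exactly the “summarize the results by time” idea from the tracing/logging tip, applied to PL/SQL.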
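Finally, retrieving the full hint set for a good plan can be done with `DBMS_XPLAN.DISPLAY_CURSOR`; the sql_id and child number below are placeholders you’d pull from `v$sql`:

```sql
-- The ADVANCED format includes the Outline Data section:
-- the complete set of hints that reproduces this plan.
SELECT *
FROM   TABLE(DBMS_XPLAN.DISPLAY_CURSOR('abcd1234efgh5', 0, 'ADVANCED'));
```

The Outline Data section is what you’d embed (with an explanatory SQL comment) to lock in the plan on 10g.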
I’m sure that these tips aren’t perfect – so please share your thoughts! How could I better word them? Do they apply to your development team?
I’m definitely no guru of Oracle performance yet, but I have been involved in a number of performance-oriented projects. My experience has been that it’s pretty common for certain assumptions to be built into a project through scoping and goal statements before I arrive. Often, project sponsors have already decided that one of the most important items in a performance project is that someone with experience in storage and system configuration for Oracle (usually me) will review settings and statistics for things like database parameters, AWR reports, operating system kernel settings, networking, I/O, etc. The contractual “statement of work” might say something like “review and assess configuration and architecture” with a list of specific areas of configuration.
I’m not opposed to periodic configuration audits or having a second set of eyes look over a setup. But this very customer was a perfect example of how the assumption that performance tuning starts with a system review really isn’t the best way to work. Due to the many parties involved in this contract and their various expectations, it was important for me to review the system configuration – so I did spend time on this review. However, I was lucky that the team and managers I worked with directly were sharp and open-minded. After some spirited discussion we slightly altered our approach: starting at the top of the stack and working down rather than the other way around.
Our first two weeks were focused on the most important business reports, and only during the last two weeks did I take an in-depth look at storage and system configuration. Can you guess what happened? Looking at the top business problems led us straight to the most important bottlenecks impacting reports across the board. It also showed me which areas of system configuration could have the biggest impact on the business. By the time I started the system review, I already knew which areas to focus on. Furthermore, I could already make some estimates of the best-case impact of system changes on overall report run-times. Not surprisingly, I knew that no system change was going to give us the improvement we wanted – but tuning a few small pieces of code that were used widely gave us even more improvement than we had thought possible.
Again, this isn’t new… it’s stuff we’ve been hearing at conferences and user groups for years now. But it seems to me that it’s still worth repeating.