Just a few days ago I was reading Dominic’s challenge to write a query that would give results like an Encyclopedia Spline. It apparently took over the entire day for another Dominic and I was planning to take it on myself too as soon as I had the time… but today another challenge came my way – but not just for entertainment, but for a real client. Basically I found myself looking at a 9000-word query (33 pages when I pasted it into MS Word). It was for a manufacturing client and it was pretty amazing… had been taking about 20-30 minutes but the guy added a single subquery and that plummeted the query to not even finishing in four hours.
Well we got the subquery thing rewritten and fixed the abysmal performance – but after analyzing the query I realized that the whole thing could be drastically improved by using a hierarchical query and SQL analytics. So I simplified the problem and ended up with a very interesting challenge for the day!
But before I share the challenge, let me give a brief overview of the original problem query. I analyzed it a bit and this was the basic structure:
select [lots of fields] from ( select [more fields] from ( select [fields] from [lots of tables] union all select [fields] from [lots of tables] union all select [fields] from [lots of tables] ... there were 8 queries union-all-ed together ... ) Q1, [one other table] where [lots of join conditions] ) Q0, [lots more tables] where [lots of join conditions] ;
Yep. It was going to be the underlying query for a discoverer workbook. After talking to the guy onsite for awhile I realized that what they were actually trying to do was a heirarchical query up to 8 levels deep – however there was some really complex stuff in there. The trickiest part was that one of the results had to be the mathematical product of the numeric values from several fields of the current level and from each and every parent level. I was pretty sure that it’s possible to do this with a hierarchical query… but I really wasn’t sure how.
But after simplifying the problem and wrestling with it for a few hours I finally got somewhere. Rather than work on the complex dataset I decided to start by figuring out the hard parts on a significantly simpler dataset: the SCOTT.EMP table. :)
Here’s the output that I needed to (and eventually did) generate:
SQL> / ENAME EMPNO VP_NAME SAL UPLINE_SAL --------------- ---------- ---------- ---------- ---------- KING 7839 5000 5000 JONES 7566 JONES 2975 7975 SCOTT 7788 JONES 3000 10975 ADAMS 7876 JONES 1100 12075 FORD 7902 JONES 3000 10975 SMITH 7369 JONES 800 11775 BLAKE 7698 BLAKE 2850 7850 ALLEN 7499 BLAKE 1600 9450 WARD 7521 BLAKE 1250 9100 MARTIN 7654 BLAKE 1250 9100 TURNER 7844 BLAKE 1500 9350 JAMES 7900 BLAKE 950 8800 CLARK 7782 CLARK 2450 7450 MILLER 7934 CLARK 1300 8750 14 rows selected.
The trickiest part was the UPLINE_SAL column. Notice that it is the sum of the employee, his manager, his manager’s manager, etc – as far back as it can go. By the way this has to work on 9i Release 2.
Unfortunately I couldn’t come up with a perfect solution; mine will not go infinite levels (I coded it to handle up to eight levels). So I’m sortof wondering… anyone want to guess how I did it? Maybe you can think of a better way than I did. :)