Just a few days ago I was reading Dominic’s challenge to write a query that would give results like an Encyclopedia Spline. It apparently took over the entire day for another Dominic and I was planning to take it on myself too as soon as I had the time… but today another challenge came my way – but not just for entertainment, but for a real client. Basically I found myself looking at a 9000-word query (33 pages when I pasted it into MS Word). It was for a manufacturing client and it was pretty amazing… had been taking about 20-30 minutes but the guy added a single subquery and that plummeted the query to not even finishing in four hours.
Well we got the subquery thing rewritten and fixed the abysmal performance – but after analyzing the query I realized that the whole thing could be drastically improved by using a hierarchical query and SQL analytics. So I simplified the problem and ended up with a very interesting challenge for the day!
But before I share the challenge, let me give a brief overview of the original problem query. I analyzed it a bit and this was the basic structure:
select
[lots of fields]
from (
select
[more fields]
from (
select
[fields]
from
[lots of tables]
union all
select
[fields]
from
[lots of tables]
union all
select
[fields]
from
[lots of tables]
... there were 8 queries union-all-ed together ...
) Q1,
[one other table]
where
[lots of join conditions]
) Q0,
[lots more tables]
where
[lots of join conditions]
;
Yep. It was going to be the underlying query for a discoverer workbook. After talking to the guy onsite for awhile I realized that what they were actually trying to do was a heirarchical query up to 8 levels deep – however there was some really complex stuff in there. The trickiest part was that one of the results had to be the mathematical product of the numeric values from several fields of the current level and from each and every parent level. I was pretty sure that it’s possible to do this with a hierarchical query… but I really wasn’t sure how.
But after simplifying the problem and wrestling with it for a few hours I finally got somewhere. Rather than work on the complex dataset I decided to start by figuring out the hard parts on a significantly simpler dataset: the SCOTT.EMP table. :)
Here’s the output that I needed to (and eventually did) generate:
SQL> /
ENAME EMPNO VP_NAME SAL UPLINE_SAL
--------------- ---------- ---------- ---------- ----------
KING 7839 5000 5000
JONES 7566 JONES 2975 7975
SCOTT 7788 JONES 3000 10975
ADAMS 7876 JONES 1100 12075
FORD 7902 JONES 3000 10975
SMITH 7369 JONES 800 11775
BLAKE 7698 BLAKE 2850 7850
ALLEN 7499 BLAKE 1600 9450
WARD 7521 BLAKE 1250 9100
MARTIN 7654 BLAKE 1250 9100
TURNER 7844 BLAKE 1500 9350
JAMES 7900 BLAKE 950 8800
CLARK 7782 CLARK 2450 7450
MILLER 7934 CLARK 1300 8750
14 rows selected.
The trickiest part was the UPLINE_SAL column. Notice that it is the sum of the employee, his manager, his manager’s manager, etc – as far back as it can go. By the way this has to work on 9i Release 2.
Unfortunately I couldn’t come up with a perfect solution; mine will not go infinite levels (I coded it to handle up to eight levels). So I’m sortof wondering… anyone want to guess how I did it? Maybe you can think of a better way than I did. :)
Hi,
I used a query for parsing the sal string.
SELECT rm,
lvl,
ename,
empno,
sal_str,
first_value(no_space_ename) OVER (PARTITION BY vp_num) vp_name,
sal,
(
SELECT sum(
to_number (
SUBSTR (sal_str,
INSTR(sal_str , ‘+’, 1, ROWNUM ) + 1,
INSTR(sal_str, ‘+’, 1, ROWNUM + 1) –
INSTR(sal_str , ‘+’, 1, ROWNUM ) – 1
))
)
FROM dual
CONNECT BY LEVEL
LikeLike
SELECT rm,
lvl,
ename,
empno,
sal_str,
first_value(no_space_ename) OVER (PARTITION BY vp_num) vp_name,
sal,
(
SELECT sum( to_number (SUBSTR (sal_str, INSTR(sal_str , ‘+’, 1, ROWNUM ) + 1, INSTR(sal_str, ‘+’, 1, ROWNUM + 1) – INSTR(sal_str , ‘+’, 1, ROWNUM ) – 1)))
FROM dual
CONNECT BY LEVEL
LikeLike
Hi,
Query is not fitting in the comments box, do you have an e-mail where i can send it?
Thanks
Raj
LikeLike
hmmm… i think there might be a typo in there somewhere…
also, it might be good to access the EMP table at some point since that’s where the data is… good start though – keep trying!
Update: WordPress apparently thought your last comment was spam until I found it just now… so to answer your question, if you click “Home” and then “Contact Info” you will see my email addresses. Drop me a line anytime!
LikeLike
My query is not fitting completly, i am posting in 3 parts:
Part1:
SELECT rm,
lvl,
ename,
empno,
sal_str,
first_value(no_space_ename) OVER (PARTITION BY vp_num) vp_name,
sal,
(
SELECT sum(
to_number (
SUBSTR (sal_str,
INSTR(sal_str , ‘+’, 1, ROWNUM ) + 1,
INSTR(sal_str, ‘+’, 1, ROWNUM + 1) –
INSTR(sal_str , ‘+’, 1, ROWNUM ) – 1
))
)
FROM dual
CONNECT BY LEVEL
LikeLike
Why posts are not visible, my ie7 says there is some runtime error “Unknown runtime error!”.
Please let me know if i can mail the query.
Yes, I do have selected from EMP table.
My inner most query:
SELECT ROWNUM rm,
LEVEL lvl,
sys_connect_by_path(sal, ‘+’)||’+’ sal_str,
decode(level, 2, rownum, null) vp_rm,
LPAD(ename, length(ename) + level, ‘ ‘) ename,
ename no_space_ename,
empno,
sal
FROM emp
START WITH mgr IS NULL
CONNECT BY mgr = PRIOR empno
Thanks
Raj
LikeLike
Hi
I have pasted the complete query in the following blog.
http://rajplsql.blogspot.com/
Please share your thoughts on the same.
Thanks
Raj
LikeLike
Aaah — I bet that the ‘<‘ was messing you up.
If you embed your code inside <pre> and <code> then it will work and will be formatted better too. You can also use the “Preview” button to check that it works.
Now that I see your whole query I could run it – very cool! I took a very similar approach to you; I’ll post what I did in a few days just in case anyone else is thinking of posting. Only thing is that the VP_NAME column is wrong for everyone under JONES… also, KING lists himself as a VP whereas in the problem statement there is no VP listed for KING. But you got the salary part which I thought was the hardest.
One other thing – I was able to do it without using sys_connect_by_path… ;)
LikeLike