Combining Hierarchical Queries With Analytics

Posted by Jeremy ⋅ May 18, 2007 ⋅ 9 Comments

Just a few days ago I was reading Dominic’s challenge to write a query that would give results like an Encyclopedia Spline. It apparently took over the entire day for another Dominic and I was planning to take it on myself too as soon as I had the time… but today another challenge came my way – but not just for entertainment, but for a real client. Basically I found myself looking at a 9000-word query (33 pages when I pasted it into MS Word). It was for a manufacturing client and it was pretty amazing… had been taking about 20-30 minutes but the guy added a single subquery and that plummeted the query to not even finishing in four hours.

Well we got the subquery thing rewritten and fixed the abysmal performance – but after analyzing the query I realized that the whole thing could be drastically improved by using a hierarchical query and SQL analytics. So I simplified the problem and ended up with a very interesting challenge for the day!

But before I share the challenge, let me give a brief overview of the original problem query. I analyzed it a bit and this was the basic structure:


select
  [lots of fields]
from (
  select
    [more fields]
  from (
    select
      [fields]
    from
      [lots of tables]
  union all
    select
      [fields]
    from
      [lots of tables]
  union all
    select
      [fields]
    from
      [lots of tables]
  ... there were 8 queries union-all-ed together ...
  ) Q1,
  [one other table]
  where
    [lots of join conditions]
) Q0,
[lots more tables]
where
  [lots of join conditions]
;

Yep. It was going to be the underlying query for a discoverer workbook. After talking to the guy onsite for awhile I realized that what they were actually trying to do was a heirarchical query up to 8 levels deep – however there was some really complex stuff in there. The trickiest part was that one of the results had to be the mathematical product of the numeric values from several fields of the current level and from each and every parent level. I was pretty sure that it’s possible to do this with a hierarchical query… but I really wasn’t sure how.

But after simplifying the problem and wrestling with it for a few hours I finally got somewhere. Rather than work on the complex dataset I decided to start by figuring out the hard parts on a significantly simpler dataset: the SCOTT.EMP table. :)

Here’s the output that I needed to (and eventually did) generate:

SQL> /

ENAME                EMPNO VP_NAME           SAL UPLINE_SAL
--------------- ---------- ---------- ---------- ----------
KING                  7839                  5000       5000
 JONES                7566 JONES            2975       7975
  SCOTT               7788 JONES            3000      10975
   ADAMS              7876 JONES            1100      12075
  FORD                7902 JONES            3000      10975
   SMITH              7369 JONES             800      11775
 BLAKE                7698 BLAKE            2850       7850
  ALLEN               7499 BLAKE            1600       9450
  WARD                7521 BLAKE            1250       9100
  MARTIN              7654 BLAKE            1250       9100
  TURNER              7844 BLAKE            1500       9350
  JAMES               7900 BLAKE             950       8800
 CLARK                7782 CLARK            2450       7450
  MILLER              7934 CLARK            1300       8750

14 rows selected.

The trickiest part was the UPLINE_SAL column. Notice that it is the sum of the employee, his manager, his manager’s manager, etc – as far back as it can go. By the way this has to work on 9i Release 2.

Unfortunately I couldn’t come up with a perfect solution; mine will not go infinite levels (I coded it to handle up to eight levels). So I’m sortof wondering… anyone want to guess how I did it? Maybe you can think of a better way than I did. :)

About Jeremy

Building and running reliable data platforms that scale and perform. about.me/jeremy_schneider

View all posts by Jeremy »

Discussion

9 thoughts on “Combining Hierarchical Queries With Analytics”

Hi,

I used a query for parsing the sal string.

SELECT rm,
lvl,
ename,
empno,
sal_str,
first_value(no_space_ename) OVER (PARTITION BY vp_num) vp_name,
sal,
(
SELECT sum(
to_number (
SUBSTR (sal_str,
INSTR(sal_str , ‘+’, 1, ROWNUM ) + 1,
INSTR(sal_str, ‘+’, 1, ROWNUM + 1) –
INSTR(sal_str , ‘+’, 1, ROWNUM ) – 1
))
)
FROM dual
CONNECT BY LEVEL

LikeLike

Posted by Raj | May 18, 2007, 9:49 am
SELECT rm,
lvl,
ename,
empno,
sal_str,
first_value(no_space_ename) OVER (PARTITION BY vp_num) vp_name,
sal,
(
SELECT sum( to_number (SUBSTR (sal_str, INSTR(sal_str , ‘+’, 1, ROWNUM ) + 1, INSTR(sal_str, ‘+’, 1, ROWNUM + 1) – INSTR(sal_str , ‘+’, 1, ROWNUM ) – 1)))
FROM dual
CONNECT BY LEVEL

LikeLike

Posted by Raj | May 18, 2007, 9:51 am
Hi,
Query is not fitting in the comments box, do you have an e-mail where i can send it?

Thanks
Raj

LikeLike

Posted by Raj | May 18, 2007, 10:52 am
hmmm… i think there might be a typo in there somewhere…
```
...
  9  SELECT sum( to_number (SUBSTR (sal_str,
 10    INSTR(sal_str , '+', 1, ROWNUM ) + 1,
 11    INSTR(sal_str, '+', 1, ROWNUM + 1)
 12      - INSTR(sal_str , '+', 1, ROWNUM ) - 1)))
 13  FROM dual
 14* CONNECT BY LEVEL
first_value(no_space_ename) OVER (PARTITION BY vp_num) vp_name,
*
ERROR at line 6:
ORA-00921: unexpected end of SQL command
```
also, it might be good to access the EMP table at some point since that’s where the data is… good start though – keep trying!

Update: WordPress apparently thought your last comment was spam until I found it just now… so to answer your question, if you click “Home” and then “Contact Info” you will see my email addresses. Drop me a line anytime!

LikeLike

Posted by Jeremy | May 18, 2007, 11:46 am
My query is not fitting completly, i am posting in 3 parts:

Part1:
SELECT rm,
lvl,
ename,
empno,
sal_str,
first_value(no_space_ename) OVER (PARTITION BY vp_num) vp_name,
sal,
(
SELECT sum(
to_number (
SUBSTR (sal_str,
INSTR(sal_str , ‘+’, 1, ROWNUM ) + 1,
INSTR(sal_str, ‘+’, 1, ROWNUM + 1) –
INSTR(sal_str , ‘+’, 1, ROWNUM ) – 1
))
)
FROM dual
CONNECT BY LEVEL

LikeLike

Posted by Raj | May 18, 2007, 12:03 pm
Why posts are not visible, my ie7 says there is some runtime error “Unknown runtime error!”.

Please let me know if i can mail the query.

Yes, I do have selected from EMP table.

My inner most query:
SELECT ROWNUM rm,
LEVEL lvl,
sys_connect_by_path(sal, ‘+’)||’+’ sal_str,
decode(level, 2, rownum, null) vp_rm,
LPAD(ename, length(ename) + level, ‘ ‘) ename,
ename no_space_ename,
empno,
sal
FROM emp
START WITH mgr IS NULL
CONNECT BY mgr = PRIOR empno

Thanks
Raj

LikeLike

Posted by Raj | May 18, 2007, 12:12 pm
Hi

I have pasted the complete query in the following blog.

http://rajplsql.blogspot.com/

Please share your thoughts on the same.

Thanks
Raj

LikeLike

Posted by Raj | May 18, 2007, 12:56 pm

Aaah — I bet that the ‘<‘ was messing you up.

If you embed your code inside <pre> and <code> then it will work and will be formatted better too. You can also use the “Preview” button to check that it works.

Now that I see your whole query I could run it – very cool! I took a very similar approach to you; I’ll post what I did in a few days just in case anyone else is thinking of posting. Only thing is that the VP_NAME column is wrong for everyone under JONES… also, KING lists himself as a VP whereas in the problem statement there is no VP listed for KING. But you got the salary part which I thought was the hardest.

One other thing – I was able to do it without using sys_connect_by_path… ;)

SQL> run
  1  select --rm,
  2         --lvl,
  3         ename,
  4         empno,
  5         first_value(no_space_ename) over (partition by vp_num) vp_name,
  6         sal,
  7         (
  8            select sum(
  9                        to_number (
 10                          substr (sal_str,
 11                            instr(sal_str , '+', 1, rownum ) + 1,
 12                            instr(sal_str, '+', 1, rownum + 1) -
 13                              instr(sal_str , '+', 1, rownum ) - 1
 14                              ))
 15                      )
 16            from dual
 17            connect by level <=
 18                      length(
 19                        translate (q1.sal_str, chr(9)||'0123456789', ' '))  -1
 20         ) sal_sum
 21  from
 22  (
 23  select rm,
 24         lvl,
 25         ename,
 26         empno,
 27         no_space_ename,
 28         sal_str,
 29         max(vp_rm) over (order by rownum) vp_num,
 30         sal
 31  from
 32  (
 33  select rownum rm,
 34         level lvl,
 35         sys_connect_by_path(sal, '+')||'+' sal_str,
 36         decode(level, 2, rownum, null) vp_rm,
 37         lpad(ename, length(ename) + level, ' ') ename,
 38         ename no_space_ename,
 39         empno,
 40         sal
 41  from emp
 42  start with mgr is null
 43  connect by mgr = prior empno
 44  )
 45  ) q1
 46* order by rm

ENAME                EMPNO VP_NAME           SAL    SAL_SUM
--------------- ---------- ---------- ---------- ----------
 KING                 7839 KING             5000       5000
  JONES               7566 SMITH            2975       7975
   SCOTT              7788 SMITH            3000      10975
    ADAMS             7876 SMITH            1100      12075
   FORD               7902 SMITH            3000      10975
    SMITH             7369 SMITH             800      11775
  BLAKE               7698 BLAKE            2850       7850
   ALLEN              7499 BLAKE            1600       9450
   WARD               7521 BLAKE            1250       9100
   MARTIN             7654 BLAKE            1250       9100
   TURNER             7844 BLAKE            1500       9350
   JAMES              7900 BLAKE             950       8800
  CLARK               7782 CLARK            2450       7450
   MILLER             7934 CLARK            1300       8750

14 rows selected.

LikeLike

Posted by Jeremy | May 18, 2007, 1:06 pm