>
Oracle, Technical

Combining Hierarchical Queries With Analytics

Just a few days ago I was reading Dominic’s challenge to write a query that would give results like an Encyclopedia Spline. It apparently took over the entire day for another Dominic and I was planning to take it on myself too as soon as I had the time… but today another challenge came my way – but not just for entertainment, but for a real client. Basically I found myself looking at a 9000-word query (33 pages when I pasted it into MS Word). It was for a manufacturing client and it was pretty amazing… had been taking about 20-30 minutes but the guy added a single subquery and that plummeted the query to not even finishing in four hours.

Well we got the subquery thing rewritten and fixed the abysmal performance – but after analyzing the query I realized that the whole thing could be drastically improved by using a hierarchical query and SQL analytics. So I simplified the problem and ended up with a very interesting challenge for the day!

But before I share the challenge, let me give a brief overview of the original problem query. I analyzed it a bit and this was the basic structure:


select
  [lots of fields]
from (
  select
    [more fields]
  from (
    select
      [fields]
    from
      [lots of tables]
  union all
    select
      [fields]
    from
      [lots of tables]
  union all
    select
      [fields]
    from
      [lots of tables]
  ... there were 8 queries union-all-ed together ...
  ) Q1,
  [one other table]
  where
    [lots of join conditions]
) Q0,
[lots more tables]
where
  [lots of join conditions]
;

Yep. It was going to be the underlying query for a discoverer workbook. After talking to the guy onsite for awhile I realized that what they were actually trying to do was a heirarchical query up to 8 levels deep – however there was some really complex stuff in there. The trickiest part was that one of the results had to be the mathematical product of the numeric values from several fields of the current level and from each and every parent level. I was pretty sure that it’s possible to do this with a hierarchical query… but I really wasn’t sure how.

But after simplifying the problem and wrestling with it for a few hours I finally got somewhere. Rather than work on the complex dataset I decided to start by figuring out the hard parts on a significantly simpler dataset: the SCOTT.EMP table. :)

Here’s the output that I needed to (and eventually did) generate:

SQL> /

ENAME                EMPNO VP_NAME           SAL UPLINE_SAL
--------------- ---------- ---------- ---------- ----------
KING                  7839                  5000       5000
 JONES                7566 JONES            2975       7975
  SCOTT               7788 JONES            3000      10975
   ADAMS              7876 JONES            1100      12075
  FORD                7902 JONES            3000      10975
   SMITH              7369 JONES             800      11775
 BLAKE                7698 BLAKE            2850       7850
  ALLEN               7499 BLAKE            1600       9450
  WARD                7521 BLAKE            1250       9100
  MARTIN              7654 BLAKE            1250       9100
  TURNER              7844 BLAKE            1500       9350
  JAMES               7900 BLAKE             950       8800
 CLARK                7782 CLARK            2450       7450
  MILLER              7934 CLARK            1300       8750

14 rows selected.

The trickiest part was the UPLINE_SAL column. Notice that it is the sum of the employee, his manager, his manager’s manager, etc – as far back as it can go. By the way this has to work on 9i Release 2.

Unfortunately I couldn’t come up with a perfect solution; mine will not go infinite levels (I coded it to handle up to eight levels). So I’m sortof wondering… anyone want to guess how I did it? Maybe you can think of a better way than I did. :)

About Jeremy

Building and running reliable data platforms that scale and perform. about.me/jeremy_schneider

Discussion

9 thoughts on “Combining Hierarchical Queries With Analytics

  1. Hi,

    I used a query for parsing the sal string.

    SELECT rm,
    lvl,
    ename,
    empno,
    sal_str,
    first_value(no_space_ename) OVER (PARTITION BY vp_num) vp_name,
    sal,
    (
    SELECT sum(
    to_number (
    SUBSTR (sal_str,
    INSTR(sal_str , ‘+’, 1, ROWNUM ) + 1,
    INSTR(sal_str, ‘+’, 1, ROWNUM + 1) –
    INSTR(sal_str , ‘+’, 1, ROWNUM ) – 1
    ))
    )
    FROM dual
    CONNECT BY LEVEL

    Like

    Posted by Raj | May 18, 2007, 9:49 am
  2. SELECT rm,
    lvl,
    ename,
    empno,
    sal_str,
    first_value(no_space_ename) OVER (PARTITION BY vp_num) vp_name,
    sal,
    (
    SELECT sum( to_number (SUBSTR (sal_str, INSTR(sal_str , ‘+’, 1, ROWNUM ) + 1, INSTR(sal_str, ‘+’, 1, ROWNUM + 1) – INSTR(sal_str , ‘+’, 1, ROWNUM ) – 1)))
    FROM dual
    CONNECT BY LEVEL

    Like

    Posted by Raj | May 18, 2007, 9:51 am
  3. Hi,
    Query is not fitting in the comments box, do you have an e-mail where i can send it?

    Thanks
    Raj

    Like

    Posted by Raj | May 18, 2007, 10:52 am
  4. hmmm… i think there might be a typo in there somewhere…

    ...
      9  SELECT sum( to_number (SUBSTR (sal_str,
     10    INSTR(sal_str , '+', 1, ROWNUM ) + 1,
     11    INSTR(sal_str, '+', 1, ROWNUM + 1)
     12      - INSTR(sal_str , '+', 1, ROWNUM ) - 1)))
     13  FROM dual
     14* CONNECT BY LEVEL
    first_value(no_space_ename) OVER (PARTITION BY vp_num) vp_name,
    *
    ERROR at line 6:
    ORA-00921: unexpected end of SQL command
    

    also, it might be good to access the EMP table at some point since that’s where the data is… good start though – keep trying!

    Update: WordPress apparently thought your last comment was spam until I found it just now… so to answer your question, if you click “Home” and then “Contact Info” you will see my email addresses. Drop me a line anytime!

    Like

    Posted by Jeremy | May 18, 2007, 11:46 am
  5. My query is not fitting completly, i am posting in 3 parts:

    Part1:
    SELECT rm,
    lvl,
    ename,
    empno,
    sal_str,
    first_value(no_space_ename) OVER (PARTITION BY vp_num) vp_name,
    sal,
    (
    SELECT sum(
    to_number (
    SUBSTR (sal_str,
    INSTR(sal_str , ‘+’, 1, ROWNUM ) + 1,
    INSTR(sal_str, ‘+’, 1, ROWNUM + 1) –
    INSTR(sal_str , ‘+’, 1, ROWNUM ) – 1
    ))
    )
    FROM dual
    CONNECT BY LEVEL

    Like

    Posted by Raj | May 18, 2007, 12:03 pm
  6. Why posts are not visible, my ie7 says there is some runtime error “Unknown runtime error!”.

    Please let me know if i can mail the query.

    Yes, I do have selected from EMP table.

    My inner most query:
    SELECT ROWNUM rm,
    LEVEL lvl,
    sys_connect_by_path(sal, ‘+’)||’+’ sal_str,
    decode(level, 2, rownum, null) vp_rm,
    LPAD(ename, length(ename) + level, ‘ ‘) ename,
    ename no_space_ename,
    empno,
    sal
    FROM emp
    START WITH mgr IS NULL
    CONNECT BY mgr = PRIOR empno

    Thanks
    Raj

    Like

    Posted by Raj | May 18, 2007, 12:12 pm
  7. Hi

    I have pasted the complete query in the following blog.

    http://rajplsql.blogspot.com/

    Please share your thoughts on the same.

    Thanks
    Raj

    Like

    Posted by Raj | May 18, 2007, 12:56 pm
  8. Aaah — I bet that the ‘<‘ was messing you up.

    If you embed your code inside <pre> and <code> then it will work and will be formatted better too. You can also use the “Preview” button to check that it works.

    Now that I see your whole query I could run it – very cool! I took a very similar approach to you; I’ll post what I did in a few days just in case anyone else is thinking of posting. Only thing is that the VP_NAME column is wrong for everyone under JONES… also, KING lists himself as a VP whereas in the problem statement there is no VP listed for KING. But you got the salary part which I thought was the hardest.

    One other thing – I was able to do it without using sys_connect_by_path… ;)

    SQL> run
      1  select --rm,
      2         --lvl,
      3         ename,
      4         empno,
      5         first_value(no_space_ename) over (partition by vp_num) vp_name,
      6         sal,
      7         (
      8            select sum(
      9                        to_number (
     10                          substr (sal_str,
     11                            instr(sal_str , '+', 1, rownum ) + 1,
     12                            instr(sal_str, '+', 1, rownum + 1) -
     13                              instr(sal_str , '+', 1, rownum ) - 1
     14                              ))
     15                      )
     16            from dual
     17            connect by level <=
     18                      length(
     19                        translate (q1.sal_str, chr(9)||'0123456789', ' '))  -1
     20         ) sal_sum
     21  from
     22  (
     23  select rm,
     24         lvl,
     25         ename,
     26         empno,
     27         no_space_ename,
     28         sal_str,
     29         max(vp_rm) over (order by rownum) vp_num,
     30         sal
     31  from
     32  (
     33  select rownum rm,
     34         level lvl,
     35         sys_connect_by_path(sal, '+')||'+' sal_str,
     36         decode(level, 2, rownum, null) vp_rm,
     37         lpad(ename, length(ename) + level, ' ') ename,
     38         ename no_space_ename,
     39         empno,
     40         sal
     41  from emp
     42  start with mgr is null
     43  connect by mgr = prior empno
     44  )
     45  ) q1
     46* order by rm
    
    ENAME                EMPNO VP_NAME           SAL    SAL_SUM
    --------------- ---------- ---------- ---------- ----------
     KING                 7839 KING             5000       5000
      JONES               7566 SMITH            2975       7975
       SCOTT              7788 SMITH            3000      10975
        ADAMS             7876 SMITH            1100      12075
       FORD               7902 SMITH            3000      10975
        SMITH             7369 SMITH             800      11775
      BLAKE               7698 BLAKE            2850       7850
       ALLEN              7499 BLAKE            1600       9450
       WARD               7521 BLAKE            1250       9100
       MARTIN             7654 BLAKE            1250       9100
       TURNER             7844 BLAKE            1500       9350
       JAMES              7900 BLAKE             950       8800
      CLARK               7782 CLARK            2450       7450
       MILLER             7934 CLARK            1300       8750
    
    14 rows selected.
    

    Like

    Posted by Jeremy | May 18, 2007, 1:06 pm

Trackbacks/Pingbacks

  1. Pingback: Accessing Arbitrary Ancestors in Hierarchical Queries : Ardent Performance Computing - May 23, 2007

Disclaimer

This is my personal website. The views expressed here are mine alone and may not reflect the views of my employer.

contact: 312-725-9249 or schneider @ ardentperf.com


https://about.me/jeremy_schneider

oaktableocmaceracattack

(a)

Enter your email address to receive notifications of new posts by email.

Join 56 other subscribers
%d bloggers like this: