>
Oracle, Technical

Encyclopedia Spine Problem

Dominic just posted a hint to the encyclopedia spine problem he posted last week. Sheesh… personally I think the hint totally gave it away. Or at least it gave away one solution.

Here’s what I came up with. I wonder if this is how Dominic solved the problem:

set linesize 120
set pagesize 40
col spine format a40
undefine num_pages
with q1 as ( -- get the dataset and use analytics to paginate
  select 
    column_name c, 
    ntile(&num_pages) over (order by column_name) p
  from ac1
),
q2 as ( -- group by pages, get start/end names and lead/lag values to compare
  select 
    p,
    min(c) sn, 
    max(c) en,
    nvl(lag(max(c)) over (order by p),min(c)) sn_prev,
    nvl(lead(min(c)) over (order by p),max(c)) en_next
  from q1
group by p
),
q3 as ( -- generate a regular expression to search for Common Leading Substr
  select
    p, sn, en, 
    '^'||rpad(regexp_replace(sn_prev,'(.)','(1'),length(sn_prev)*4,')?') search_sn,
    '^'||rpad(regexp_replace(en_next,'(.)','(1'),length(en_next)*4,')?') search_en
  from q2
),
q4 as ( -- actually do the regexp search
  select
    p, sn, en,
    nvl(length(regexp_substr(sn,search_sn)),0) cls_sn,
    nvl(length(regexp_substr(en,search_en)),0) cls_en
  from q3
)
select -- format the output
  substr(sn,1,cls_sn+1)||decode(cls_sn,length(sn),'','...')||' thru '||
  substr(en,1,cls_en+1)||decode(cls_en,length(en),'','...') spine,
  sn, en
from q4
order by p
;


Enter value for num_pages: 15
old   4:     ntile(&num_pages) over (order by column_name) p
new   4:     ntile(15) over (order by column_name) p

SPINE                                    SN                             EN
---------------------------------------- ------------------------------ ------------------------------
ABSTRACT_LOBS thru BC...                 ABSTRACT_LOBS                  BCV
BE... thru CONC...                       BENEFIT                        CONCAT_OPERATION_ID
COND... thru DELETES...                  CONDITION                      DELETES
DELETE_... thru EXC...                   DELETE_FREQ                    EXCEPTION_INDEX
EXE... thru HINT#...                     EXE                            HINT#
HINTC... thru IS_I...                    HINTCOUNT                      IS_INTERFACE
IS_L... thru LOG_OWNER                   IS_LEGACY                      LOG_OWNER
LOG_OWNERI... thru NOC...                LOG_OWNERID                    NOCACHE_LOBS
NOD... thru PACKAGE_N...                 NODE                           PACKAGE_NAME
PACKAGE_P... thru PRED...                PACKAGE_PREFIX                 PREDS_PER_EXPR
PREF... thru RESOLVED...                 PREFIX_LENGTH                  RESOLVED_DATE
RESOLVE_... thru SERIAL#...              RESOLVE_STATUS                 SERIAL#
SERIALI... thru STO...                   SERIALIZABLE                   STORED
STR... thru TSP...                       STREAMS_NAME                   TSPNAME
TS_... thru YOUNGEST                     TS_ID                          YOUNGEST

15 rows selected.

And this is the plan it comes up with. Might be able to make this a bit more efficient but it seems good enough for now. :)

SQL> select * from plan;

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
Plan hash value: 477712919

--------------------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |  3354 |   265K|     8  (50)| 00:00:01 |
|   1 |  SORT ORDER BY          |      |  3354 |   265K|     8  (50)| 00:00:01 |
|   2 |   VIEW                  |      |  3354 |   265K|     7  (43)| 00:00:01 |
|   3 |    WINDOW BUFFER        |      |  3354 |    98K|     7  (43)| 00:00:01 |
|   4 |     SORT GROUP BY       |      |  3354 |    98K|     7  (43)| 00:00:01 |
|   5 |      VIEW               |      |  3354 |    98K|     5  (20)| 00:00:01 |
|   6 |       WINDOW SORT       |      |  3354 | 40248 |     5  (20)| 00:00:01 |
|   7 |        TABLE ACCESS FULL| AC1  |  3354 | 40248 |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------------

14 rows selected.

Interesting, though, how it gets the cardinality wrong. Hmm. I’ll have to look at that closer sometime.

About Jeremy

Doing stuff with Oracle Database, Performance, Clusters, Linux. about.me/jeremy_schneider

Discussion

4 thoughts on “Encyclopedia Spine Problem

  1. Hi,

    Just cut and pasted your solution into 10.2.0.3.0 on Linux – the spines were coming out funny:

    SPINE,SN,EN
    #… thru B…,#_OF_PRODUCTS,BLOCKSZ
    B… thru C…,BLOCKS_CACHED,CONSTR_NAME
    C… thru D…,CONSUMED_CPU_TIME,DEFMAXEXTS
    D… thru E…,DEFMAXTRANS,ENQUEUE_TIME
    E… thru F…,ENQ_CSCN,FROM_VERSION
    F… thru I…,FR_OPERATIONS,INTALG
    I… thru L…,INTCOL,LINSTANCE#
    L… thru M…,LISTENER,MRCT_BASELINE_ID
    M… thru O…,MRCT_PURGE_TIME_NUM,OPEXPR_C1
    O… thru P…,OPEXPR_N1,POLOWN
    P… thru R…,POLPKG,REF_MAKE_USER
    R… thru S…,REF_NAME,SERVICE_HASH
    S… thru S…,SERVICE_ID,STORAGE
    S… thru T…,STORAGESIZE,TRUE_RULES
    T… thru Z…,TRUNCATED,ZERO_RESULTS

    Posted by Dominic Brooks | May 22, 2007, 2:26 am
  2. Oops, bitten by the cut-and-paste-backslash-eater again. I copied the query into wordpress and it ate all my backslashes. Should be correct now, I escaped them.

    Your output actually makes perfect sense because this was the only line affected (there was a second but it’s almost identical):

    '^'||rpad(regexp_replace(sn_prev,'(.)','(1'),length(sn_prev)*4,')?') search_sn,
    
    

    It was missing the backslash which effectively made the search string “^(1(1(1)?)?)?” rather than “^(C(O(L)?)?)?” … which obviously isn’t going to find any matches – therefor the algorithm thought that there were no Common Leading Strings and sent back only the first character!

    Anyway it should work now…

    On a side note, I also tried using the extended regexp syntax “^((?:C(?:O(?:L)?)?)?)” as was discussed in the CLS paper to prevent capturing on all those internal parenthesis… but Oracle doesn’t seem to support that operator for regular expressions.

    Posted by Jeremy | May 22, 2007, 6:52 am
  3. Pretty cool. Yeah, it’s easy once you read and understand that paper. But I’m not sure everyone speaks Regex so well…

    Posted by Dominic Delmolino | May 22, 2007, 1:01 pm
  4. Sheesh… how did I miss the ntile() function. Duuh. Just noticed that everyone else used that, so I updated my query to do the same. Also updated a few variable names to make it easier to follow. And realized that I wasn’t even using the row_number(), so I removed it. Somehow problems like this are just never “done” for me. :)

    Posted by Jeremy | May 22, 2007, 2:58 pm

Disclaimer

(a) This is my personal blog.

The views expressed on this website are mine alone and do not necessarily reflect the views of my employer.

racattack-iconOakTableLogo-small

about.me

Jeremy Schneider
Follow

Get every new post delivered to your Inbox.

Join 931 other followers

%d bloggers like this: