Encyclopedia Spine Problem

Posted by Jeremy ⋅ May 21, 2007 ⋅ 4 Comments

Dominic just posted a hint to the encyclopedia spine problem he posted last week. Sheesh… personally I think the hint totally gave it away. Or at least it gave away one solution.

Here’s what I came up with. I wonder if this is how Dominic solved the problem:

set linesize 120
set pagesize 40
col spine format a40
undefine num_pages
with q1 as ( -- get the dataset and use analytics to paginate
  select 
    column_name c, 
    ntile(&num_pages) over (order by column_name) p
  from ac1
),
q2 as ( -- group by pages, get start/end names and lead/lag values to compare
  select 
    p,
    min(c) sn, 
    max(c) en,
    nvl(lag(max(c)) over (order by p),min(c)) sn_prev,
    nvl(lead(min(c)) over (order by p),max(c)) en_next
  from q1
group by p
),
q3 as ( -- generate a regular expression to search for Common Leading Substr
  select
    p, sn, en, 
    '^'||rpad(regexp_replace(sn_prev,'(.)','(1'),length(sn_prev)*4,')?') search_sn,
    '^'||rpad(regexp_replace(en_next,'(.)','(1'),length(en_next)*4,')?') search_en
  from q2
),
q4 as ( -- actually do the regexp search
  select
    p, sn, en,
    nvl(length(regexp_substr(sn,search_sn)),0) cls_sn,
    nvl(length(regexp_substr(en,search_en)),0) cls_en
  from q3
)
select -- format the output
  substr(sn,1,cls_sn+1)||decode(cls_sn,length(sn),'','...')||' thru '||
  substr(en,1,cls_en+1)||decode(cls_en,length(en),'','...') spine,
  sn, en
from q4
order by p
;


Enter value for num_pages: 15
old   4:     ntile(&num_pages) over (order by column_name) p
new   4:     ntile(15) over (order by column_name) p

SPINE                                    SN                             EN
---------------------------------------- ------------------------------ ------------------------------
ABSTRACT_LOBS thru BC...                 ABSTRACT_LOBS                  BCV
BE... thru CONC...                       BENEFIT                        CONCAT_OPERATION_ID
COND... thru DELETES...                  CONDITION                      DELETES
DELETE_... thru EXC...                   DELETE_FREQ                    EXCEPTION_INDEX
EXE... thru HINT#...                     EXE                            HINT#
HINTC... thru IS_I...                    HINTCOUNT                      IS_INTERFACE
IS_L... thru LOG_OWNER                   IS_LEGACY                      LOG_OWNER
LOG_OWNERI... thru NOC...                LOG_OWNERID                    NOCACHE_LOBS
NOD... thru PACKAGE_N...                 NODE                           PACKAGE_NAME
PACKAGE_P... thru PRED...                PACKAGE_PREFIX                 PREDS_PER_EXPR
PREF... thru RESOLVED...                 PREFIX_LENGTH                  RESOLVED_DATE
RESOLVE_... thru SERIAL#...              RESOLVE_STATUS                 SERIAL#
SERIALI... thru STO...                   SERIALIZABLE                   STORED
STR... thru TSP...                       STREAMS_NAME                   TSPNAME
TS_... thru YOUNGEST                     TS_ID                          YOUNGEST

15 rows selected.

And this is the plan it comes up with. Might be able to make this a bit more efficient but it seems good enough for now. :)

SQL> select * from plan;

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
Plan hash value: 477712919

--------------------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |  3354 |   265K|     8  (50)| 00:00:01 |
|   1 |  SORT ORDER BY          |      |  3354 |   265K|     8  (50)| 00:00:01 |
|   2 |   VIEW                  |      |  3354 |   265K|     7  (43)| 00:00:01 |
|   3 |    WINDOW BUFFER        |      |  3354 |    98K|     7  (43)| 00:00:01 |
|   4 |     SORT GROUP BY       |      |  3354 |    98K|     7  (43)| 00:00:01 |
|   5 |      VIEW               |      |  3354 |    98K|     5  (20)| 00:00:01 |
|   6 |       WINDOW SORT       |      |  3354 | 40248 |     5  (20)| 00:00:01 |
|   7 |        TABLE ACCESS FULL| AC1  |  3354 | 40248 |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------------

14 rows selected.

Interesting, though, how it gets the cardinality wrong. Hmm. I’ll have to look at that closer sometime.

About Jeremy

Building and running reliable data platforms that scale and perform. about.me/jeremy_schneider

View all posts by Jeremy »

Discussion

4 thoughts on “Encyclopedia Spine Problem”

Hi,

Just cut and pasted your solution into 10.2.0.3.0 on Linux – the spines were coming out funny:

SPINE,SN,EN
#… thru B…,#_OF_PRODUCTS,BLOCKSZ
B… thru C…,BLOCKS_CACHED,CONSTR_NAME
C… thru D…,CONSUMED_CPU_TIME,DEFMAXEXTS
D… thru E…,DEFMAXTRANS,ENQUEUE_TIME
E… thru F…,ENQ_CSCN,FROM_VERSION
F… thru I…,FR_OPERATIONS,INTALG
I… thru L…,INTCOL,LINSTANCE#
L… thru M…,LISTENER,MRCT_BASELINE_ID
M… thru O…,MRCT_PURGE_TIME_NUM,OPEXPR_C1
O… thru P…,OPEXPR_N1,POLOWN
P… thru R…,POLPKG,REF_MAKE_USER
R… thru S…,REF_NAME,SERVICE_HASH
S… thru S…,SERVICE_ID,STORAGE
S… thru T…,STORAGESIZE,TRUE_RULES
T… thru Z…,TRUNCATED,ZERO_RESULTS

LikeLike

Posted by Dominic Brooks | May 22, 2007, 2:26 am
Oops, bitten by the cut-and-paste-backslash-eater again. I copied the query into wordpress and it ate all my backslashes. Should be correct now, I escaped them.

Your output actually makes perfect sense because this was the only line affected (there was a second but it’s almost identical):
```
'^'||rpad(regexp_replace(sn_prev,'(.)','(1'),length(sn_prev)*4,')?') search_sn,
```
It was missing the backslash which effectively made the search string “^(1(1(1)?)?)?” rather than “^(C(O(L)?)?)?” … which obviously isn’t going to find any matches – therefor the algorithm thought that there were no Common Leading Strings and sent back only the first character!

Anyway it should work now…

On a side note, I also tried using the extended regexp syntax “^((?:C(?:O(?:L)?)?)?)” as was discussed in the CLS paper to prevent capturing on all those internal parenthesis… but Oracle doesn’t seem to support that operator for regular expressions.

LikeLike

Posted by Jeremy | May 22, 2007, 6:52 am
Pretty cool. Yeah, it’s easy once you read and understand that paper. But I’m not sure everyone speaks Regex so well…

LikeLike

Posted by Dominic Delmolino | May 22, 2007, 1:01 pm
Sheesh… how did I miss the ntile() function. Duuh. Just noticed that everyone else used that, so I updated my query to do the same. Also updated a few variable names to make it easier to follow. And realized that I wasn’t even using the row_number(), so I removed it. Somehow problems like this are just never “done” for me. :)

LikeLike

Posted by Jeremy | May 22, 2007, 2:58 pm