Dominic just posted a hint to the encyclopedia spine problem he posted last week. Sheesh… personally I think the hint totally gave it away. Or at least it gave away one solution.
Here’s what I came up with. I wonder if this is how Dominic solved the problem:
set linesize 120
set pagesize 40
col spine format a40
undefine num_pages
with q1 as ( -- get the dataset and use analytics to paginate
select
column_name c,
ntile(&num_pages) over (order by column_name) p
from ac1
),
q2 as ( -- group by pages, get start/end names and lead/lag values to compare
select
p,
min(c) sn,
max(c) en,
nvl(lag(max(c)) over (order by p),min(c)) sn_prev,
nvl(lead(min(c)) over (order by p),max(c)) en_next
from q1
group by p
),
q3 as ( -- generate a regular expression to search for Common Leading Substr
select
p, sn, en,
'^'||rpad(regexp_replace(sn_prev,'(.)','(1'),length(sn_prev)*4,')?') search_sn,
'^'||rpad(regexp_replace(en_next,'(.)','(1'),length(en_next)*4,')?') search_en
from q2
),
q4 as ( -- actually do the regexp search
select
p, sn, en,
nvl(length(regexp_substr(sn,search_sn)),0) cls_sn,
nvl(length(regexp_substr(en,search_en)),0) cls_en
from q3
)
select -- format the output
substr(sn,1,cls_sn+1)||decode(cls_sn,length(sn),'','...')||' thru '||
substr(en,1,cls_en+1)||decode(cls_en,length(en),'','...') spine,
sn, en
from q4
order by p
;
Enter value for num_pages: 15
old 4: ntile(&num_pages) over (order by column_name) p
new 4: ntile(15) over (order by column_name) p
SPINE SN EN
---------------------------------------- ------------------------------ ------------------------------
ABSTRACT_LOBS thru BC... ABSTRACT_LOBS BCV
BE... thru CONC... BENEFIT CONCAT_OPERATION_ID
COND... thru DELETES... CONDITION DELETES
DELETE_... thru EXC... DELETE_FREQ EXCEPTION_INDEX
EXE... thru HINT#... EXE HINT#
HINTC... thru IS_I... HINTCOUNT IS_INTERFACE
IS_L... thru LOG_OWNER IS_LEGACY LOG_OWNER
LOG_OWNERI... thru NOC... LOG_OWNERID NOCACHE_LOBS
NOD... thru PACKAGE_N... NODE PACKAGE_NAME
PACKAGE_P... thru PRED... PACKAGE_PREFIX PREDS_PER_EXPR
PREF... thru RESOLVED... PREFIX_LENGTH RESOLVED_DATE
RESOLVE_... thru SERIAL#... RESOLVE_STATUS SERIAL#
SERIALI... thru STO... SERIALIZABLE STORED
STR... thru TSP... STREAMS_NAME TSPNAME
TS_... thru YOUNGEST TS_ID YOUNGEST
15 rows selected.
And this is the plan it comes up with. Might be able to make this a bit more efficient but it seems good enough for now. :)
SQL> select * from plan;
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
Plan hash value: 477712919
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3354 | 265K| 8 (50)| 00:00:01 |
| 1 | SORT ORDER BY | | 3354 | 265K| 8 (50)| 00:00:01 |
| 2 | VIEW | | 3354 | 265K| 7 (43)| 00:00:01 |
| 3 | WINDOW BUFFER | | 3354 | 98K| 7 (43)| 00:00:01 |
| 4 | SORT GROUP BY | | 3354 | 98K| 7 (43)| 00:00:01 |
| 5 | VIEW | | 3354 | 98K| 5 (20)| 00:00:01 |
| 6 | WINDOW SORT | | 3354 | 40248 | 5 (20)| 00:00:01 |
| 7 | TABLE ACCESS FULL| AC1 | 3354 | 40248 | 4 (0)| 00:00:01 |
--------------------------------------------------------------------------------
14 rows selected.
Interesting, though, how it gets the cardinality wrong. Hmm. I’ll have to look at that closer sometime.



Hi,
Just cut and pasted your solution into 10.2.0.3.0 on Linux – the spines were coming out funny:
SPINE,SN,EN
#… thru B…,#_OF_PRODUCTS,BLOCKSZ
B… thru C…,BLOCKS_CACHED,CONSTR_NAME
C… thru D…,CONSUMED_CPU_TIME,DEFMAXEXTS
D… thru E…,DEFMAXTRANS,ENQUEUE_TIME
E… thru F…,ENQ_CSCN,FROM_VERSION
F… thru I…,FR_OPERATIONS,INTALG
I… thru L…,INTCOL,LINSTANCE#
L… thru M…,LISTENER,MRCT_BASELINE_ID
M… thru O…,MRCT_PURGE_TIME_NUM,OPEXPR_C1
O… thru P…,OPEXPR_N1,POLOWN
P… thru R…,POLPKG,REF_MAKE_USER
R… thru S…,REF_NAME,SERVICE_HASH
S… thru S…,SERVICE_ID,STORAGE
S… thru T…,STORAGESIZE,TRUE_RULES
T… thru Z…,TRUNCATED,ZERO_RESULTS
LikeLike
Posted by Dominic Brooks | May 22, 2007, 2:26 amOops, bitten by the cut-and-paste-backslash-eater again. I copied the query into wordpress and it ate all my backslashes. Should be correct now, I escaped them.
Your output actually makes perfect sense because this was the only line affected (there was a second but it’s almost identical):
It was missing the backslash which effectively made the search string “^(1(1(1)?)?)?” rather than “^(C(O(L)?)?)?” … which obviously isn’t going to find any matches – therefor the algorithm thought that there were no Common Leading Strings and sent back only the first character!
Anyway it should work now…
On a side note, I also tried using the extended regexp syntax “^((?:C(?:O(?:L)?)?)?)” as was discussed in the CLS paper to prevent capturing on all those internal parenthesis… but Oracle doesn’t seem to support that operator for regular expressions.
LikeLike
Posted by Jeremy | May 22, 2007, 6:52 amPretty cool. Yeah, it’s easy once you read and understand that paper. But I’m not sure everyone speaks Regex so well…
LikeLike
Posted by Dominic Delmolino | May 22, 2007, 1:01 pmSheesh… how did I miss the ntile() function. Duuh. Just noticed that everyone else used that, so I updated my query to do the same. Also updated a few variable names to make it easier to follow. And realized that I wasn’t even using the row_number(), so I removed it. Somehow problems like this are just never “done” for me. :)
LikeLike
Posted by Jeremy | May 22, 2007, 2:58 pm