Archive for the ‘Wall of Shame’ Category.

Funny Developer Tricks – Decode

I ran into a really ugly SQL statement last week. It was the most expensive statement running on the system for the last several weeks. On top of the fact that the statement ran for hours, it also had a number of “issues”. The statement had the following characteristics:

  • Several DISTINCT Operators – these tend to show up in statements where developers have left off a join condition
  • ANSI Join Syntax – generally Oracle converts this syntax back to Oracle join syntax under the covers. I realize this is standard SQL syntax, but it still makes me nervous due to previous experience with problems caused by this conversion process.
  • Functions Used in WHERE Clauses (DECODE and NVL in particular) – can disable indexes
  • UNION – forces a sort of the union’ed result sets so that it can eliminate duplicates (which is often not necessary). Depending on how the statement is coded, UNIONs can also sometimes result in multiple accesses of the same objects (e.g. if the developer has used the UNION in place of a more robust WHERE clause).
  • Numerous != and <> Comparison Operators – Not necessarily a problem but often can be coded in a more efficient manner
  • Several ORs – ORs are tricky, and I get more suspicious of them when a statement has numerous other problems as well.
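
To make the UNION point concrete, here is a minimal sketch. The ORDERS table and its columns are hypothetical (nothing like them appears in the original statement), and ORDER_ID is assumed to be the primary key, so neither branch can produce duplicate rows.

```sql
-- Hypothetical table: orders(order_id PRIMARY KEY, status, ...).

-- Two separate accesses of ORDERS, plus a duplicate-eliminating step
-- that cannot remove anything because ORDER_ID is unique:
SELECT order_id, status FROM orders WHERE status = 'OPEN'
UNION
SELECT order_id, status FROM orders WHERE status = 'HOLD';

-- The same rows from a single access, using a more complete WHERE clause:
SELECT order_id, status
FROM   orders
WHERE  status IN ('OPEN', 'HOLD');

-- If the branches have to stay separate and duplicates are impossible
-- (or acceptable), UNION ALL skips the duplicate elimination:
SELECT order_id, status FROM orders WHERE status = 'OPEN'
UNION ALL
SELECT order_id, status FROM orders WHERE status = 'HOLD';
```

The first and second queries are only equivalent because of the primary key assumption. Without it, UNION would also collapse duplicate rows that the IN version returns.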

I’ll refrain from publishing the actual statement to protect the guilty, but one of the most interesting bits of the statement looked something like this:

...
WHERE a.col1 =  DECODE ('XYZ', '*', a.col1 ,'XYZ')
...

The processing of the where clause can be roughly translated to the following pseudocode:


if 'XYZ' = '*' then 
  x = a.col1
else
  x = 'XYZ'
end if

...
WHERE a.col1 = x
...

Since the literal ‘XYZ’ will never be equal to the literal ‘*’, the value returned by this DECODE will always be ‘XYZ’. Therefore the statement could have been written much more simply as:

...
WHERE a.col1 = 'XYZ'
...
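
The claim about the DECODE is easy to check on its own. The sketch below first evaluates the expression against DUAL, then shows what I would guess the developer originally meant: a parameter where '*' means "match everything". The bind variable :p_val and the table name some_table are hypothetical and are not taken from the real statement.

```sql
-- DECODE(expr, search, result, default): 'XYZ' never equals '*',
-- so the default (the last argument) is what comes back.
SELECT DECODE('XYZ', '*', 'wildcard branch', 'XYZ') AS decode_result
FROM   dual;

-- A guess at the original intent, with a bind variable in place of
-- the literal: '*' compares the column with itself, any other value
-- is used as an ordinary equality filter.
SELECT COUNT(*)
FROM   some_table a                                  -- hypothetical table
WHERE  a.col1 = DECODE(:p_val, '*', a.col1, :p_val); -- hypothetical bind
```

One thing to keep in mind with the second form: when :p_val is '*', rows where col1 is NULL are still excluded, because a.col1 = a.col1 is never true for NULLs.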

So this is an extremely strange way to write the statement and clouds the intention considerably, but does it hurt performance? Well, actually, no. The optimizer is smart enough to know that this DECODE will always return the same literal value, so it appears that Oracle does not evaluate the DECODE for each row. Here’s a quick example from a 10.2.0.4 database. (Note that I used dplan.sql and fss.sql in this code snippet.)

Continue reading ‘Funny Developer Tricks – Decode’ »

Most Expensive SQL Statement Ever

You know the cost calculation that the cost based optimizer (CBO) uses to determine which execution plan to choose for a SQL statement, right? If you don’t, you should immediately stop reading this and pick up a good novel instead. Ah, you’re still here? Well I got an interesting email today from one of my co-workers saying he had to kill a query yesterday. Actually that’s a big part of his current job. Killing runaway queries – apparently that job takes most of his time between 8 and 5. Anyway, he sent me this execution plan today, no comments, “just have a look at this”, he said.

---------------------------------------------------------------------------------------------------
| Id  | Operation              | Name             | Rows  | Bytes|TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                  |       |       |      |    65P(100)|          |
|   1 |  SORT ORDER BY         |                  |    18E|    15E|   15E|    65P (78)|999:59:59 |
|   2 |   COUNT                |                  |       |       |      |            |          |
|*  3 |    FILTER              |                  |       |       |      |            |          |
|   4 |     NESTED LOOPS       |                  |    18E|    15E|      |    14P  (3)|999:59:59 |
|   5 |      NESTED LOOPS      |                  |   984G|   216T|      |    14G  (3)|999:59:59 |
|   6 |       TABLE ACCESS FULL| CAT_6000_6001TBL |  7270K|  1074M|      |   176K  (3)| 00:15:46 |
|   7 |       TABLE ACCESS FULL| CAT_6000TBL      |   135K|    11M|      |  1950   (3)| 00:00:11 |
|   8 |      INDEX FULL SCAN   | PK_OBJECTS       |    32M|   306M|      | 15207   (3)| 00:01:22 |
---------------------------------------------------------------------------------------------------

So I had a look. Yes – that’s a 65P in the cost column. I’ve seen worse (but not in a production system). Cost is not always a good indication of run time, by the way. It’s just a sort of normalized estimation after all. But the estimates for the number of rows and bytes (18E and 15E) are very impressive as well. This query ran for several hours before my buddy finally killed it. As you might expect, the query was missing a join condition between a couple of large tables (7M and 32M).

Here’s a test I worked up to see how big a number I could get.

SYS@LAB1024> !cat dplan.sql
set lines 150
select * from table(dbms_xplan.display_cursor('&sql_id','&child_no','typical'))
/

SYS@LAB1024> @dplan
Enter value for sql_id: gf5nnx0pyfqq2
Enter value for child_no: 

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  gf5nnx0pyfqq2, child number 0
-------------------------------------
select a.col2, sum(a.col1) from kso.skew a, kso.skew b group by a.col2

Plan hash value: 321450672

-----------------------------------------------------------------------------------
| Id  | Operation               | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |         |       |       |   689G(100)|          |
|   1 |  HASH GROUP BY          |         |     1 |    16 |   689G (84)|999:59:59 |
|   2 |   MERGE JOIN CARTESIAN  |         |  1024T|    14P|   145G (22)|999:59:59 |
|   3 |    TABLE ACCESS FULL    | SKEW    |    32M|   488M| 10032  (18)| 00:01:21 |
|   4 |    BUFFER SORT          |         |    32M|       |   689G (84)|999:59:59 |
|   5 |     INDEX FAST FULL SCAN| SKEW_PK |    32M|       |  4558  (22)| 00:00:37 |
-----------------------------------------------------------------------------------


17 rows selected.

SYS@LAB1024> @dplan
Enter value for sql_id: 12p7fuydx3dd5
Enter value for child_no: 

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  12p7fuydx3dd5, child number 0
-------------------------------------
select a.col2, sum(a.col1) from kso.skew a, kso.skew b, kso.skew c group by
a.col2

Plan hash value: 175710540

------------------------------------------------------------------------------------
| Id  | Operation                | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |         |       |       |    18E(100)|          |
|   1 |  HASH GROUP BY           |         |     1 |    16 |    18E (81)|999:59:59 |
|   2 |   MERGE JOIN CARTESIAN   |         |    18E|    15E|  4670P (22)|999:59:59 |
|   3 |    MERGE JOIN CARTESIAN  |         |  1024T|    14P|   145G (22)|999:59:59 |
|   4 |     TABLE ACCESS FULL    | SKEW    |    32M|   488M| 10032  (18)| 00:01:21 |
|   5 |     BUFFER SORT          |         |    32M|       |   145G (22)|999:59:59 |
|   6 |      INDEX FAST FULL SCAN| SKEW_PK |    32M|       |  4558  (22)| 00:00:37 |
|   7 |    BUFFER SORT           |         |    32M|       |    18E (81)|999:59:59 |
|   8 |     INDEX FAST FULL SCAN | SKEW_PK |    32M|       |  4558  (22)| 00:00:37 |
------------------------------------------------------------------------------------


21 rows selected.

SYS@LAB1024> @dplan
Enter value for sql_id: 7b53dxh6w6mpj
Enter value for child_no: 

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  7b53dxh6w6mpj, child number 0
-------------------------------------
select a.col2, sum(a.col1) from kso.skew a, kso.skew b, kso.skew c, kso.skew
d group by a.col2

Plan hash value: 3965951819

-------------------------------------------------------------------------------------
| Id  | Operation                 | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |         |       |       |    18E(100)|          |
|   1 |  HASH GROUP BY            |         |     1 |    16 |    18E  (0)|999:59:59 |
|   2 |   MERGE JOIN CARTESIAN    |         |    18E|    15E|    18E  (0)|999:59:59 |
|   3 |    MERGE JOIN CARTESIAN   |         |    18E|    15E|  4670P (22)|999:59:59 |
|   4 |     MERGE JOIN CARTESIAN  |         |  1024T|    14P|   145G (22)|999:59:59 |
|   5 |      TABLE ACCESS FULL    | SKEW    |    32M|   488M| 10032  (18)| 00:01:21 |
|   6 |      BUFFER SORT          |         |    32M|       |   145G (22)|999:59:59 |
|   7 |       INDEX FAST FULL SCAN| SKEW_PK |    32M|       |  4558  (22)| 00:00:37 |
|   8 |     BUFFER SORT           |         |    32M|       |  4670P (22)|999:59:59 |
|   9 |      INDEX FAST FULL SCAN | SKEW_PK |    32M|       |  4558  (22)| 00:00:37 |
|  10 |    BUFFER SORT            |         |    32M|       |    18E  (0)|999:59:59 |
|  11 |     INDEX FAST FULL SCAN  | SKEW_PK |    32M|       |  4558  (22)| 00:00:37 |
-------------------------------------------------------------------------------------


24 rows selected.


So it looks like the cost tops out at 18E as does the estimated number of rows. Oddly the number of bytes appears to top out at 15E. So the production query had maxed out the rows and bytes estimate although the cost was significantly under the max. Still 65P is the biggest cost I’ve seen in a production system. Anyone seen a bigger one?
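
One possible explanation for the two different ceilings, offered as a guess rather than anything the tests above confirm: both estimates may saturate just below 2^64, with the plan display abbreviating row counts and costs in powers of 1000 and byte counts in powers of 1024. The arithmetic is at least consistent with that idea, and can be sketched like this:

```sql
-- Assumption (not verified here): estimates are capped just below 2^64,
-- Rows/Cost are displayed in decimal units, Bytes in binary units.
SELECT TRUNC((POWER(2, 64) - 1) / POWER(10, 18)) AS cap_in_decimal_e,
       TRUNC((POWER(2, 64) - 1) / POWER(2, 60))  AS cap_in_binary_e,
       -- Cross-check against the SKEW full scan line, assuming roughly
       -- 32,000,000 rows at 16 bytes each (both figures are approximations):
       TRUNC(32000000 * 16 / POWER(2, 20))       AS skew_bytes_in_binary_m
FROM   dual;
```

Arithmetically, 2^64 is about 18.4 x 10^18 and just under 16 x 2^60, which lines up with the 18E and 15E in the plans. Likewise 512,000,000 bytes is a little over 488 x 2^20, which lines up with the 488M shown for the 32M-row scan.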

P.S. I have two categories for SQL related posts. “Developer Tricks” and “Wall of Shame”. This one gets both tags.

Funny Developer Tricks – (substr(cust_id,1,length(:b1))

Ha. This one was a little surprising. I ran across a SQL statement that gets fired off hundreds at a time in rapid succession. Each execution takes several seconds, which is far too long for a statement that is run that many times in a row. The statement looked like this (cleaned up to protect the guilty):

b1 := '10355P034001SGL00066';
b2 := '10355P034001SGL00066';

select count(cust_id) 
from customers
where substr(cust_id,1,length(:b1)) = :b2;

What was the developer trying to do? How can we fix it? Your comments are welcomed.
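
One possible reading, and it is only a guess: the developer wanted a "starts with" test, and wrapped the column in SUBSTR to get it. A sketch of two alternatives follows. It assumes that :b1 and :b2 always carry the same value, as in the example above, and that cust_id is indexed, which the post does not actually say.

```sql
-- Prefix match with the column left bare, so an index range scan
-- on cust_id stays available to the optimizer:
select count(cust_id)
from customers
where cust_id like :b2 || '%';

-- If the value passed in is always a complete cust_id,
-- plain equality is enough:
select count(cust_id)
from customers
where cust_id = :b2;
```

The LIKE version is not a drop-in replacement if the bind value can contain % or _ characters, since LIKE treats them as wildcards. An ESCAPE clause would be needed in that case.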

Continue reading ‘Funny Developer Tricks – (substr(cust_id,1,length(:b1))’ »

Funny Developer Tricks – first_rows(999999999)

I ran across a funny SQL statement recently (funny strange, not funny ha ha – well actually funny ha ha too I guess). It had a first_rows hint like so:



select /*+ FIRST_ROWS (999999999)  */ 
"MANDT" , "OPBEL" , "OPUPW" , "OPUPK" , "OPUPZ" , "BUKRS" , "GSBER" , "BUPLA" , "SEGMENT" , 
"AUGST" , "GPART" , "VTREF" , "VTRE2" , "VKONT" , "ABWBL" , "ABWTP" , "ABWKT" , "APPLK" , 
"HVORG" , "TVORG" , "KOFIZ" , "SPART" , "HKONT", "MWSKZ" , "MWSZKZ" , "XANZA" , "STAKZ" , 
"BLDAT" , "BUDAT" , "OPTXT" , "WAERS" , "FAEDN", "FAEDS" , "VERKZ" , "STUDT" , "SKTPZ" , 
"XMANL" , "KURSF" , "BETRH" , "BETRW" , "BETR2" , "BETR3" , "SKFBT" , "SBETH" , "SBETW" , 
"SBET2" , "SBET3" , "MWSKO" , "MWVKO" , "TXRUL" , "SPZAH" , "PYMET" , "PYBUK" , "PERNR" , 
"GRKEY" , "PERSL" , "XAESP" , "AUGDT" , "AUGBL" , "AUGBD" , "AUGRD" , "AUGWA" , "AUGBT" , 
"AUGBS" , "AUGSK" , "AUGVD" , "AUGOB" , "WHANG" , "WHGRP" , "XEIPH" , "MAHNV" , "MANSP" , 
"XAUGP" , "ABRZU" , "ABRZO" , "FDGRP" , "FDLEV" , "FDZTG", "FDWBT" , "XTAUS" , "AUGRS" , 
"PYGRP" , "PDTYP" , "SPERZ" , "INFOZ" , "TXJCD" , "TXDAT" ,"VBUND" , "KONTT" , "KONTL" , 
"OPSTA" , "BLART" , "EMGPA" , "EMBVT" , "EMADR" , "IKEY" , "EUROU" , "XRAGL" , "ASTKZ" , 
"ASBLG" , "XBLNR" , "INKPS" , "RNDPS" , "QSSKZ" , "QSSEW" , "QSPTP" , "QSSHB" , "QBSHB" , 
"QSZNR" , "RFUPK" , "STRKZ" , "FITPR" , "XPYOR" , "LANDL" , "INTBU", "EMCRD" , "C4EYE" , 
"C4EYP" , "SCTAX" , "STTAX" , "STZAL" , "ORUPZ" , "NEGBU" , "SUBAP" , "PSWSL" , "PSWBT" , 
"PSWTX" , "PSGRP" , "FINRE" , "RDSTA" , "RDSTB" , "DEAKTIV" , "SGRKEY", "SOLLDAT" , "RECPT" , 
"TOCOLLECT" , "EINMALANF" , "VORAUSZAHL" , "APERIODIC" , "ABRABS" , "GRBBP" , "ASMETH" , 
"INT_CROSSREFNO" , "ETHPPM" , "PAYFREQID" , "INVOICING_PARTY" , "PPMST" , "LOGNO" , "APERIODICT" , 
"ADD_REFOBJ" , "ADD_REFOBJID" , "ADD_SERVICE" , "ZZAGENCY" , "ZZ_EXT_REF" , "ZZ_PAY_AGENT" , 
"ZZFUNDSOURCE" , "ZZINSTALLMENT" , "Z_PROD_ID" , "ZZUSERNAME" , "ZZWF_STAT" , "ZZPAYCHANNEL" 
FROM "DFKKOP" 
WHERE "MANDT" = :A0 -- NDV=1
AND "BUKRS" = :A1 -- NDV=1 
AND "AUGST" = :A2 -- NDV=2 
AND "FAEDN" < :A3 -- less than today probably all records
AND ( "PYMET" = :A4 OR "PYMET" = :A5 ) -- NDV=8
AND ROWNUM <= :A6; -- less than 1B


Yes - that's a first rows hint with about a billion as the number of rows to optimize for.

The reason I noticed it is that it runs for 15 hours before getting a Snapshot Too Old error (ORA-01555). The attempted solution was to restart it the next day (thinking maybe it will run better the second time, I guess). The table has roughly 100M rows. There was no index on PYMET, which is unfortunate, as the two values requested account for only about 0.15% of the rows (not 15%, 0.15%). The optimizer chooses an index on MANDT, BUKRS, AUGST, FAEDN and, as you might expect, it doesn't work very well (see the NDV comments I added to the statement).
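
For anyone wanting to reproduce this kind of analysis, here is a rough sketch. The first query reads the number of distinct values per column from the optimizer statistics, which is one way (not necessarily the way used here) to arrive at NDV comments like the ones above. The second statement is the kind of index that would let the PYMET predicate do the filtering. The index name is made up, and in a packaged application the vendor's own tools may be the proper way to create it.

```sql
-- Number of distinct values recorded in the optimizer statistics
-- for the columns used in the WHERE clause:
SELECT column_name, num_distinct, num_nulls
FROM   dba_tab_col_statistics
WHERE  table_name  = 'DFKKOP'
AND    column_name IN ('MANDT', 'BUKRS', 'AUGST', 'FAEDN', 'PYMET');

-- Hypothetical index on the one selective column in the statement:
CREATE INDEX dfkkop_pymet_ix ON "DFKKOP" ("PYMET");
```

Building an index on a table of roughly 100M rows takes time and space, so this is something to test before doing it in production.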

Funny things:

The First_Rows hint is requesting the Oracle optimizer to return the first billion records as fast as possible (even though there are only 100M rows).

The documentation for the First_Rows hint in 11g looks like this:

The FIRST_ROWS hint instructs Oracle to optimize an individual SQL statement for fast response, choosing the plan that returns the first n rows most efficiently. For integer, specify the number of rows to return.

For example, the optimizer uses the query optimization approach to optimize the following statement for best response time:

SELECT /*+ FIRST_ROWS(10) */ employee_id, last_name, salary, job_id
FROM employees
WHERE department_id = 20;

In this example each department contains many employees. The user wants the first 10 employees of department 20 to be displayed as quickly as possible.

So I can see where the developers might have interpreted this as the ever elusive "Go Fast" hint.

The developers also added "and rownum < 999999999" to the where clause, which limits the number of rows that can be returned. I'm not sure whether they knew it or not, but this clause also has the same effect as the hint. That is to say, the clause causes the optimizer to modify its calculations as if the first_rows_N hint had been applied. Maybe the developers weren't getting the "go fast" behavior they expected from the hint and, after doing some research, found that the "rownum <" syntax would basically do the same thing. I'm guessing that's the case because I can't see why they would really want to limit the number of rows coming back, but I'm not sure.

It's a very odd statement because the First_Rows hint tends to push the optimizer towards index usage, and this statement was behaving badly precisely because it was using an index (a full table scan only took about 1 hour). Regardless of what the developers were trying to do, the fact that they used such a big number caused the optimizer to ignore the hint anyway. Since the table only had 100M rows and the parameter was 1B, the hint was ignored (well, at least the "First K Rows" modifications to the optimizer calculations were not used). This happens to the "rownum <" induced behavior as well, by the way. Here are bits from a couple of 10053 trace files showing some details:

Continue reading ‘Funny Developer Tricks – first_rows(999999999)’ »

Funny Developer Tricks – upper(number)

I saw a funny one today. I’ll paraphrase:

select * from table_x
where upper(acct_number) = '876876'
or upper(acct_number) = '826531';

Nice huh?

1. Obviously turns off any indexes on acct_number (unless they had a function based index).
2. Looks like they are probably storing numeric data in a character data type (or implicitly converting a number to character string).
3. Not using bind variables so they are not helping themselves from a parsing standpoint.
4. Finally, they’re making darn sure they take care of any mixed case numbers!

Some fun.

(by the way, those lower case numbers cause me problems all the time)
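
For what it's worth, a cleaned-up version might look something like the sketch below. It assumes acct_number is a character column that only ever holds digits, so UPPER cannot change the result. The bind variable names :acct1 and :acct2 are made up.

```sql
-- No function on the column, so a normal index on acct_number is usable,
-- and bind variables let every execution share one cursor.
select * from table_x
where acct_number in (:acct1, :acct2);
```

If acct_number turns out to be a NUMBER column, the binds should be numeric as well so that no implicit conversion sneaks back in.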

Check to Make Sure My New Record Doesn’t Already Exist

Here’s another interesting piece of SQL that I ran into last week:

select con_id from xyz_blah where con_id='BS-002342';

Actually there was a whole set of them with different literals. My first thought was, “Why would the developers want to select the con_id when they already had the con_id?”, quickly followed by “Ohhhhhhhh, it’s one of those deals where the developers don’t trust the database or don’t know how to check for an error after executing a SQL statement”.

Presumably they know that ‘BS-002342’ is a valid con_id (it looks pretty specific to me). So the app’s probably doing one of the following three bad things (listed in what I think is the most likely order).

  1. Checking to make sure a record exists with that con_id before doing something (UPDATE or DELETE).
  2. Checking to make sure that a record doesn’t already exist with that con_id, before doing an INSERT.
  3. Checking to make sure a DELETE actually worked.

When I looked at the stats in v$sql, the queries never return any rows. So it’s probably not #1. I didn’t find any DELETEs on the table but I did find INSERT statements with matching con_id’s, so it looks like it’s the check before insert scenario (#2). There is a Primary Key on the con_id field, so the check is of course, totally unnecessary. They should have just done the insert and handled the duplicate key error if one ever happened. By the way, in the 30 days of AWR data we had available, none of these statements ever returned a row. So most likely, they never have this issue in the first place. Of course, they might also consider using a sequence to generate the key instead of having the app manufacture a 9 character key!
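
Here is a minimal sketch of the "just do the insert" approach in PL/SQL. DUP_VAL_ON_INDEX is the predefined exception raised on a unique or primary key violation (ORA-00001). The assumption that con_id is the only mandatory column in xyz_blah is mine, as is the choice to simply print a message in the handler.

```sql
DECLARE
  l_con_id xyz_blah.con_id%TYPE := 'BS-002342';
BEGIN
  -- No SELECT beforehand: the primary key does the checking.
  INSERT INTO xyz_blah (con_id) VALUES (l_con_id);
EXCEPTION
  WHEN DUP_VAL_ON_INDEX THEN
    -- Only reached if the key already exists; the application
    -- decides what to do here (skip, update, report, ...).
    DBMS_OUTPUT.PUT_LINE('con_id ' || l_con_id || ' already exists');
END;
/
```

This turns two statements per new record into one, and the rare duplicate is handled only when it actually happens.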

Anyway, this is one of a whole set of coding issues where unnecessary work is done as a standard coding practice. Like Cary Millsap always says, “the fastest way to do anything is not to do it at all”.

How Many Bind Variables Is Too Many?

I saw this statement last week:

SELECT co_id, co_name, extra_object_id FROM XYZ_BLAH
WHERE co_id in (:1,:2,:3,:4,:5,:6,:7,:8,:9,:10,:
11,:12,:13,:14,:15,:16,:17,:18,:19,:20,:21,:22,:23,:24,:25,:26,:
27,:28,:29,:30,:31,:32,:33,:34,:35,:36,:37,:38,:39,:40,:41,:42,:
43,:44,:45,:46,:47,:48,:49,:50,:51,:52,:53,:54,:55,:56,:57,:58,:
59,:60,:61,:62,:63,:64,:65,:66,:67,:68,:69,:70,:71,:72,:73,:74,:
75,:76,:77,:78,:79,:80,:81,:82,:83,:84,:85,:86,:87,:88,:89,:90,:
91,:92,:93,:94,:95,:96,:97,:98,:99,:100,:101,:102,:103,:104,:105
,:106,:107,:108,:109,:110,:111,:112,:113,:114,:115,:116,:117,:11
8,:119,:120,:121,:122,:123,:124,:125,:126,:127,:128,:129,:130,:1
31,:132,:133,:134,:135,:136,:137,:138,:139,:140,:141,:142,:143,:
144,:145,:146,:147,:148,:149,:150,:151,:152,:153,:154,:155,:156,
:157,:158,:159,:160,:161,:162,:163,:164,:165,:166,:167,:168,:169
,:170,:171,:172,:173,:174,:175,:176,:177,:178,:179,:180,:181,:18
2,:183,:184,:185,:186,:187,:188,:189,:190,:191,:192,:193,:194,:1
95,:196,:197,:198,:199,:200,:201,:202,:203,:204,:205,:206,:207,:
208,:209,:210,:211,:212,:213,:214,:215,:216,:217,:218,:219,:220,
:221,:222,:223,:224,:225,:226,:227,:228,:229,:230,:231,:232,:233
,:234,:235,:236,:237,:238,:239,:240,:241,:242,:243,:244,:245,:24
6,:247,:248,:249,:250,:251,:252,:253,:254,:255,:256,:257,:258,:2
59,:260,:261,:262,:263,:264,:265,:266,:267,:268,:269,:270,:271,:
272,:273,:274,:275,:276,:277,:278,:279,:280,:281,:282,:283,:284,
:285,:286,:287,:288,:289,:290,:291,:292,:293,:294,:295,:296,:297
,:298,:299,:300,:301,:302,:303,:304,:305,:306,:307,:308,:309,:31
0,:311,:312,:313,:314,:315,:316,:317,:318,:319,:320,:321,:322,:3
23,:324,:325,:326,:327,:328,:329,:330,:331,:332,:333,:334,:335,:
336,:337,:338,:339,:340,:341,:342,:343,:344,:345,:346,:347,:348,
:349,:350,:351,:352,:353,:354,:355,:356,:357,:358,:359,:360,:361
,:362,:363,:364,:365,:366,:367,:368,:369,:370,:371,:372,:373,:37
4,:375,:376,:377,:378,:379,:380,:381,:382,:383,:384,:385,:386,:3
87,:388,:389,:390,:391,:392,:393,:394,:395,:396,:397,:398,:399,:
400,:401,:402,:403,:404,:405,:406,:407,:408,:409,:410,:411,:412,
:413,:414,:415,:416,:417,:418,:419,:420,:421,:422,:423,:424,:425
,:426,:427,:428,:429,:430,:431,:432,:433,:434,:435,:436,:437,:43
8,:439,:440,:441,:442,:443,:444,:445,:446,:447,:448,:449,:450,:4
51,:452,:453,:454,:455,:456,:457,:458,:459,:460,:461,:462,:463,:
464,:465,:466,:467,:468,:469,:470,:471,:472,:473,:474,:475,:476,
:477,:478,:479,:480,:481,:482,:483,:484,:485,:486,:487,:488,:489
,:490,:491,:492,:493,:494,:495,:496,:497,:498,:499,:500,:501,:50
2,:503,:504,:505,:506,:507,:508,:509,:510,:511,:512,:513,:514,:5
15,:516,:517,:518,:519,:520,:521,:522,:523,:524,:525,:526,:527,:
528,:529,:530,:531,:532,:533,:534,:535,:536,:537,:538,:539,:540,
:541,:542,:543,:544,:545,:546,:547,:548,:549,:550,:551,:552,:553
,:554,:555,:556,:557,:558,:559,:560,:561,:562,:563,:564,:565,:56
6,:567,:568,:569,:570,:571,:572,:573,:574,:575,:576,:577,:578,:5
79,:580,:581,:582,:583,:584,:585,:586,:587,:588,:589,:590,:591,:
592,:593,:594,:595,:596,:597,:598,:599,:600,:601,:602,:603,:604,
:605,:606,:607,:608,:609,:610,:611,:612,:613,:614,:615,:616,:617
,:618,:619,:620,:621,:622,:623,:624,:625,:626,:627,:628,:629,:63
0,:631,:632,:633,:634,:635,:636,:637) ORDER BY upper(co_name)
/

That’s a lot of bind variables. I wondered how big the table was.

SQL> select count(*) from xyz_blah;

  COUNT(*)
----------
       644

Ha! That’s funny. Basically get all the rows by specifying each primary key (except for 7) in a bind variable. Seems like the following would have been simpler:

SELECT co_id, co_name, extra_object_id FROM XYZ_BLAH
WHERE co_id not in (:1,:2,:3,:4,:5,:6,:7) ORDER BY upper(co_name)
/

It made me wonder how Oracle handled that many bind variables. There does appear to be some internal limit on how many the optimizer will keep up with (and that number appears to be 585). XPLAN also reports all of those above 585 as “Not Captured”. Actually, the number it keeps track of appears to depend on their size. For example, if varchar2(10) variables are used as opposed to varchar2(30) variables, the number of values that Oracle keeps up with goes up significantly.
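
One way to look at this from the data dictionary side is V$SQL_BIND_CAPTURE, which has a WAS_CAPTURED column for each bind position. This is a sketch of a way to check, not necessarily how the numbers above were obtained, and bind capture is related to but not the same thing as the peeked binds that XPLAN reports.

```sql
-- &sql_id is a SQL*Plus substitution variable for the statement of interest.
SELECT was_captured, COUNT(*) AS binds
FROM   v$sql_bind_capture
WHERE  sql_id = '&sql_id'
AND    child_number = 0
GROUP  BY was_captured;
```

Running it once with varchar2(10) binds and once with varchar2(30) binds would show whether the captured count moves with the declared size.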

Continue reading ‘How Many Bind Variables Is Too Many?’ »