Oracle 11g Adaptive Cursor Sharing (ACS)

Ah, the long awaited fix for the Bind Variable Peeking problem.

Adaptive Cursor Sharing (ACS) sounds great if you say it fast. Unfortunately it doesn’t work quite as well as I had hoped (at least as of 11.1.0.7). I was really hoping that we’d never have to worry about plan instability due to bind variable peeking again, but it seems that will remain an elusive goal for a while longer.

So the goal of this post is to provide some data about how ACS works based on observations of a couple of production systems. By the way, this post assumes you already have a basic understanding of how ACS works. There are some links at the bottom of this post to other references and in particular, this post is a good basic description of ACS.

So here’s some background info:
Observations were made on a version 10.2.0.4 RAC database that was being migrated to an 11.1.0.7 RAC database. Both are on Redhat Linux and similar Dell hardware. The version 10 database had several queries which suffered from bind variable peeking issues which were addressed with Outlines (while the developers considered using literals). Also, the statements were using real bind variables, not the fake kind that result from setting the cursor_sharing parameter. We were hoping that ACS would solve the bind variable peeking problem without manual intervention (i.e. without using Outlines or SQL Profiles or Baselines to lock in a specific plan).

I have used several scripts in this post:

unstable_plans.sql – this one shows statements that have multiple plans with large variance in execution time
awr_plan_change.sql – history of how the statement plan changes over time
awr_plan_stats.sql – aggregate statement stats (like elapsed_time, lio, etc) grouped by plan
find_sql_acs.sql – A queries v$sql and shows ACS related columns
mismatch3.sql – A variation of Dion Cho’s script to display data from v$sql_shared_cursor (see his post here)

The first couple of these scripts I discussed in a previous post on Plan Instability. I also did a previous post that is a basic overview of the Bind Variable Peeking issue, if you want a little more background info.

So here’s a look at what’s going on with that newly migrated system:

SQL*Plus: Release 11.1.0.7.0 - Production on Mon May 4 19:53:00 2009

Copyright (c) 1982, 2008, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options

SQL> -- find statements with multiple plans with big differences in elapsed time
SQL>
SQL> @unstable_plans
Enter value for min_stddev:
Enter value for min_etime: 1

SQL_ID        SUM(EXECS)   MIN_ETIME   MAX_ETIME   NORM_STDDEV
------------- ---------- ----------- ----------- -------------
f2pz10qx59awc          2       11.75       45.31        2.0197
5mh2127hkzbtr         70        2.64       18.06        3.0272
0hur96bxr18jn         24        1.65        9.14        3.1981
76gduk3urk91d          6        9.75       57.82        3.4880
cqxkwk9hp8rpf         31        7.18       43.77        3.6015
3u2bxt4y0740a         17        0.49        4.19        4.1316
af6j2dyzawp7w         78        6.83       60.31        4.4492
2mzzy3u2rtgqx         93        4.55       34.13        4.6025
6vb3gxdfzbhuz         33        0.63        5.21        5.1502
gc69y3vv5ws31         72        0.87        7.64        5.5128
b6zcb86sub9gv          2        0.95        8.74        5.8349
6pdbja617d2g7          2        2.24       23.55        6.7132
cc7yv55yn2wzm          5        0.26        3.05        7.4533
ach69kkyac77x          5        0.23        2.65        7.5202
09xqs3yhmnnc2         82        0.31        4.47        7.5534
3zqwzanpnddt1          2        0.69        8.75        8.2026
70zr1cg7ssfkw          2        0.71        9.04        8.2487
664tcunn5nks9          3        0.36        4.83        8.7858
39f6bx40awrms          2        0.74       10.15        9.0125
5z7v0n6amf8yx        116        9.06      125.53        9.0942
3q67n6qh7tcqv          2        0.08        1.20        9.2843
30yrz0qst88mj         68        0.21        2.97        9.3903
a1mk6hw6s20an         19        0.26        3.78        9.6454
26070mzfxps8d         72        0.18        2.75       10.0027
byznf81kmwumv          2        0.75       11.45       10.0407
1fj1qcqtmgmd1          2        0.54        9.33       11.4092
fg96c4y95u3s8          2        0.49        9.18       12.6745
chxn6vwj02xj4          2        0.65       12.27       12.7001
2xg8psmr3k7vz          2        0.76       14.38       12.7429
grysxv4s2hhkb          2        0.63       12.10       12.8338
2bfs79h84xsch        148        5.65      108.87       12.9179
4qz01hjwat4u3         39        0.27        5.50       13.6528
3wj5ngbv1sa9x          2        0.40        8.70       14.7151
g68szg2ydq6dg         14        0.26        7.48       19.4587
9vt6aaf5xcmh2        381        0.67       24.90       19.9706
cs8ng5sv7jgj4          4        1.12       35.70       21.8969
12a2xbmwn5v6z          7        0.23        8.08       23.7235
d2x7g9wr30v33          4       12.45      638.60       25.4008
4ymph98r42nbj        192        0.11        4.43       26.9188
1uby0zcw55cnp          4        0.63       31.22       34.3088
4jhx8qrkzztqg         15        1.22       66.07       37.6215
7xu6y0cvs55gh         34        0.44       30.08       37.9640
6hstnb2fghg76         77        0.65       49.01       39.2329
gzwgbycgs0fx9        265        0.08        7.37       47.4965
f028rmkp3qjvq         47        0.03        1.99       48.5307
7s1nc9wa2pc4r          2        0.19       15.53       57.6592
f2tm6xrb26y51          2        0.17       14.80       61.7260
0xz1j5y313f3c        206        0.02        2.38       71.6444
3dhwvfmkjzwtv         72        1.24      154.56       87.6684
fb90nawgwx3mj         37        0.34       51.96      108.3873
b7d0d3gu5fvqp         22        0.13       33.85      137.5321
ghdy23pftj44q       1100        0.02        5.93      143.7765
7v2jtb3u02qx6         10        0.43      139.36      172.2951
fw9ntwzhygmcq         87        0.32      100.44      220.3149
gxv6umvct0xsv       4188        0.84      333.79      225.1338
6jdu11g4zzjkh         42        0.41      137.88      234.5440
2wrngntk7v0st        183        0.11       50.61      333.9597
27wha8b8s21xw       1112        0.01        7.08      408.3026
dhmvcrmgdk3sn        881        0.05       46.99      620.1920
6u4c54k36ngwf       4329        0.00        4.39      656.4506
8jt9yh7jf8tn8       4788        0.09      160.35    1,251.5684
4fy7uznh9zz27       4723        0.03       47.16    1,256.8971
2ddh1h012t8au         12        0.03       67.62    1,491.8099
fw9hrrv37hb3v          5        0.04      100.86    1,503.5361
68wg4gjb51dsh       4411        0.01       16.44    1,539.4726
7d407h5cqmv81       4202        0.00      180.63   28,768.6107

66 rows selected.

Please note that the unstable_plan.sql script is not guaranteed to find every statement that suffers from plan instability, because some of the good plans never make it into the AWR tables. This is due to the fact that AWR only grabs the “worst” statements in the various categories that AWR reports (i.e. elapsed time, cpu time, gets, reads, and executions). But you could argue that it will find the most important ones, because even when they run fast they still end up in the top of at least one category. Anyway, it’s obvious that there are many statements that are running with multiple plans with wide variances in their execution time.

So let’s look a little closer:

SQL> -- Let's check out the worst one on the list
SQL> --   first let's see how many plans have been used and how they stack up
SQL>
SQL> @awr_plan_stats
Enter value for sql_id: 7d407h5cqmv81
Enter value for snap_id:

SQL_ID        PLAN_HASH_VALUE        EXECS          ETIME    AVG_ETIME        AVG_LIO
------------- --------------- ------------ -------------- ------------ --------------
7d407h5cqmv81      4289789142          840            2.4        0.003          273.7
7d407h5cqmv81      1007536393          551          116.2        0.211       14,563.1
7d407h5cqmv81      1723168170        2,852       44,716.2       15.679      255,603.3
7d407h5cqmv81      2337118685            1          180.6      180.629    6,495,990.0

SQL> -- So the best plan was executed 840 times and took .003 seconds on average 
SQL> -- The worst one was only executed once but took 180.6 seconds and did about 6.5M lios
SQL>
SQL>  -- Now let's see if it looks like bind variable peeking - (i.e. the plan is flip-flopping)
SQL>
SQL> @awr_plan_change
Enter value for sql_id: 7d407h5cqmv81

   SNAP_ID   NODE BEGIN_INTERVAL_TIME            SQL_ID        PLAN_HASH_VALUE        EXECS    AVG_ETIME        AVG_LIO
---------- ------ ------------------------------ ------------- --------------- ------------ ------------ --------------
      1732      2 22-APR-09 01.00.06.582 PM      7d407h5cqmv81      1007536393            1        1.478        5,269.0
      1733      2 22-APR-09 02.00.06.425 PM      7d407h5cqmv81                            8        1.635       14,580.0
      1734      2 22-APR-09 03.00.01.366 PM      7d407h5cqmv81                            6        0.196       14,580.0
      1734      3 22-APR-09 03.00.00.710 PM      7d407h5cqmv81      1723168170            2        8.891      228,740.5
      1735      2 22-APR-09 04.00.19.942 PM      7d407h5cqmv81      1007536393            3        0.189       14,580.0
      1736      2 22-APR-09 05.00.06.982 PM      7d407h5cqmv81                           63        0.188       14,580.0
      1736      3 22-APR-09 05.00.06.350 PM      7d407h5cqmv81      1723168170           22        9.564      236,902.3
      1737      2 22-APR-09 06.00.03.582 PM      7d407h5cqmv81      1007536393          244        0.187       14,580.0
      1737      3 22-APR-09 06.00.04.267 PM      7d407h5cqmv81      1723168170           76        9.505      236,959.9
      1738      2 22-APR-09 07.00.28.278 PM      7d407h5cqmv81      1007536393          226        0.188       14,580.0
      1738      3 22-APR-09 07.00.28.926 PM      7d407h5cqmv81      1723168170           71        9.395      240,745.0
      1739      2 22-APR-09 08.00.09.681 PM      7d407h5cqmv81                           81       11.114      253,481.2
      1739      3 22-APR-09 08.00.10.422 PM      7d407h5cqmv81                           22        9.866      249,882.3
      1740      2 22-APR-09 09.00.43.176 PM      7d407h5cqmv81                          215       13.682      256,467.0
      1740      3 22-APR-09 09.00.43.910 PM      7d407h5cqmv81                           80       12.605      252,606.0
      1741      2 22-APR-09 10.00.16.298 PM      7d407h5cqmv81                          367       20.209      256,157.1
      1741      3 22-APR-09 10.00.17.072 PM      7d407h5cqmv81                          214       15.039      257,287.3
      1742      2 22-APR-09 11.00.28.426 PM      7d407h5cqmv81                          376       17.855      257,066.5
      1742      3 22-APR-09 11.00.27.666 PM      7d407h5cqmv81                          178       13.437      255,902.4
      1743      2 23-APR-09 12.00.16.137 AM      7d407h5cqmv81                          352       18.639      257,326.6
      1743      3 23-APR-09 12.00.15.386 AM      7d407h5cqmv81                          182       13.787      258,192.2
      1744      2 23-APR-09 01.00.03.520 AM      7d407h5cqmv81                          285       16.275      256,335.3
      1744      3 23-APR-09 01.00.02.838 AM      7d407h5cqmv81                           94       13.492      258,508.9
      1745      2 23-APR-09 02.00.21.107 AM      7d407h5cqmv81                          132       14.876      258,913.5
      1745      3 23-APR-09 02.00.20.482 AM      7d407h5cqmv81                          103       13.113      256,821.0
      1750      2 23-APR-09 07.00.33.920 AM      7d407h5cqmv81      4289789142           77        0.002          267.0
      1753      2 23-APR-09 09.34.00.440 AM      7d407h5cqmv81                          278        0.001          267.0
      1754      2 23-APR-09 11.00.06.058 AM      7d407h5cqmv81                           96        0.001          267.0
      1754      3 23-APR-09 11.00.05.422 AM      7d407h5cqmv81                           42        0.000          267.0
      1755      2 23-APR-09 11.17.45.649 AM      7d407h5cqmv81                          283        0.001          267.0
      1756      2 23-APR-09 12.00.35.977 PM      7d407h5cqmv81                           62        0.000          267.0
      1894      3 29-APR-09 06.00.09.823 AM      7d407h5cqmv81                            1        0.444        2,777.0
      1900      2 29-APR-09 12.00.36.427 PM      7d407h5cqmv81      2337118685            1      180.629    6,495,990.0
      1943      3 01-MAY-09 07.00.02.507 AM      7d407h5cqmv81      4289789142            1        1.199        3,426.0

34 rows selected.

SQL> -- It does look like a typical bind variable peeking flip-flop pattern
SQL>
SQL> -- Now let's see if it's still in the shared pool and if it's bind aware
SQL>
SQL> @find_sql_acs
Enter value for sql_text:
Enter value for sql_id: 7d407h5cqmv81
Enter value for is_bind_aware:

SQL_ID         CHILD PLAN_HASH_VALUE IBS IBA ISH      EXECS ROWS_PROCESSED  AVG_ETIME    AVG_CPU    AVG_PIO    AVG_LIO SQL_TEXT
------------- ------ --------------- --- --- --- ---------- -------------- ---------- ---------- ---------- ---------- ------------------------------
7d407h5cqmv81      0      4289789142 N   N   Y            2              2       0.62       0.07     315.50      1,758 SELECT COUNT(*) as total_recor
7d407h5cqmv81      1      4289789142 N   N   Y            2              2       0.03       0.01       1.00        264

SQL> -- It is in the shared pool, but it is not bind aware
SQL> 

So the previous example showed a statement that had several plans, one of which definitely had better performance characteristics. But since there was only one plan in the shared pool at the time we examined it, there is is no reason to expect that Oracle would have recognized that the statement should be bind aware. This brings up an interesting point. It does not appear that (at least as of 11.1.0.7) Oracle keeps track of bind sensitivity of a statement persistently. That is to say, if the statement gets flushed for any reason, the optimizer appears to completely forget about any analysis it has done up to that point. Thus the painful (for the user) process of discovering which sets of bind variables should go with which plan starts all over.

So here’s another statement:

SQL> -- another example from the list of unstable_plans
SQL>
SQL> @awr_plan_stats
Enter value for sql_id: 8jt9yh7jf8tn8

SQL_ID        PLAN_HASH_VALUE        EXECS          ETIME    AVG_ETIME        AVG_LIO
------------- --------------- ------------ -------------- ------------ --------------
8jt9yh7jf8tn8      1093407144        4,818          481.5        0.100        2,818.6
8jt9yh7jf8tn8      4076066623           21        3,269.3      155.679    5,901,988.0

SQL> @awr_plan_change
Enter value for sql_id: 8jt9yh7jf8tn8

   SNAP_ID   NODE BEGIN_INTERVAL_TIME            SQL_ID        PLAN_HASH_VALUE        EXECS    AVG_ETIME        AVG_LIO
---------- ------ ------------------------------ ------------- --------------- ------------ ------------ --------------
      1785      3 24-APR-09 05.00.13.361 PM      8jt9yh7jf8tn8      1093407144            6        1.102        2,872.7
      1786      2 24-APR-09 06.00.02.510 PM      8jt9yh7jf8tn8                          158        0.024        2,873.0
      1786      3 24-APR-09 06.00.03.170 PM      8jt9yh7jf8tn8                          223        0.023        2,873.0
      1787      2 24-APR-09 07.00.30.171 PM      8jt9yh7jf8tn8                          749        0.020        2,873.0
      1787      3 24-APR-09 07.00.30.935 PM      8jt9yh7jf8tn8                          873        0.019        2,873.0
      1788      2 24-APR-09 08.00.03.359 PM      8jt9yh7jf8tn8                          726        0.020        2,873.9
      1788      3 24-APR-09 08.00.04.148 PM      8jt9yh7jf8tn8                          871        0.020        2,873.9
      1789      2 24-APR-09 09.00.28.203 PM      8jt9yh7jf8tn8                          373        0.016        2,874.0
      1789      3 24-APR-09 09.00.27.481 PM      8jt9yh7jf8tn8                          566        0.016        2,874.0
      1892      2 29-APR-09 04.00.02.385 AM      8jt9yh7jf8tn8                            1        2.613        3,811.0
      1894      2 29-APR-09 06.00.09.154 AM      8jt9yh7jf8tn8                            2        0.462          733.0
      1894      3 29-APR-09 06.00.09.823 AM      8jt9yh7jf8tn8                            2        1.056          847.0
      1895      2 29-APR-09 07.00.00.226 AM      8jt9yh7jf8tn8                            7        1.714        2,869.6
      1895      3 29-APR-09 07.00.00.921 AM      8jt9yh7jf8tn8                            2        0.856        1,208.0
      1896      2 29-APR-09 08.00.20.446 AM      8jt9yh7jf8tn8                            9        1.635        2,103.0
      1897      2 29-APR-09 09.00.09.367 AM      8jt9yh7jf8tn8                            2        8.179        8,529.0
      1897      3 29-APR-09 09.00.10.115 AM      8jt9yh7jf8tn8                           17        1.714        3,416.5
      1898      2 29-APR-09 10.00.43.551 AM      8jt9yh7jf8tn8                            5        2.553        2,733.2
      1898      3 29-APR-09 10.00.42.788 AM      8jt9yh7jf8tn8                            6        3.751        4,484.2
      1899      3 29-APR-09 11.00.10.447 AM      8jt9yh7jf8tn8                            7        1.742        3,284.4
      1900      3 29-APR-09 12.00.35.788 PM      8jt9yh7jf8tn8                            2        1.199          966.0
      1901      2 29-APR-09 01.00.18.515 PM      8jt9yh7jf8tn8                            8        2.345        2,409.6
      1902      3 29-APR-09 02.00.15.910 PM      8jt9yh7jf8tn8                            2        3.941        2,649.5
      1905      2 29-APR-09 05.00.02.254 PM      8jt9yh7jf8tn8                            1        0.887        1,230.0
      1918      2 30-APR-09 06.00.09.089 AM      8jt9yh7jf8tn8                            1        0.653        1,248.0
      1918      3 30-APR-09 06.00.08.403 AM      8jt9yh7jf8tn8                            2        0.421          485.5
      1919      2 30-APR-09 07.00.28.148 AM      8jt9yh7jf8tn8                            1        1.152        1,242.0
      1920      2 30-APR-09 08.00.03.733 AM      8jt9yh7jf8tn8                            4        3.273        3,200.3
      1920      3 30-APR-09 08.00.04.389 AM      8jt9yh7jf8tn8                           12        2.491        3,314.2
      1921      2 30-APR-09 09.00.10.125 AM      8jt9yh7jf8tn8                            5        3.947        3,333.4
      1921      3 30-APR-09 09.00.10.838 AM      8jt9yh7jf8tn8                            2        2.416        1,769.5
      1922      3 30-APR-09 10.00.25.754 AM      8jt9yh7jf8tn8      4076066623            2       54.237    2,291,432.5
      1923      2 30-APR-09 11.00.17.345 AM      8jt9yh7jf8tn8      1093407144            2        0.812          975.0
      1923      3 30-APR-09 11.00.18.032 AM      8jt9yh7jf8tn8      4076066623            3      134.031      933,124.3
      1924      3 30-APR-09 12.00.15.448 PM      8jt9yh7jf8tn8                            3      227.009    6,987,169.3
      1926      2 30-APR-09 02.00.11.921 PM      8jt9yh7jf8tn8      1093407144            8        0.818        1,574.5
      1926      3 30-APR-09 02.00.11.174 PM      8jt9yh7jf8tn8      4076066623            2      175.709    8,963,417.0
      1927      2 30-APR-09 03.00.24.923 PM      8jt9yh7jf8tn8      1093407144            4        1.344        1,068.8
      1927      3 30-APR-09 03.00.24.306 PM      8jt9yh7jf8tn8      4076066623            5      156.378   10,159,992.0
      1928      2 30-APR-09 04.00.30.061 PM      8jt9yh7jf8tn8      1093407144            6        0.923        1,225.8
      1928      3 30-APR-09 04.00.29.416 PM      8jt9yh7jf8tn8      4076066623            1      180.488    2,150,190.0
      1930      3 30-APR-09 06.00.37.119 PM      8jt9yh7jf8tn8                            2      180.371    8,255,881.5
      1934      3 30-APR-09 10.00.12.055 PM      8jt9yh7jf8tn8                            1      180.491    3,102,577.0
      1939      2 01-MAY-09 03.00.31.764 AM      8jt9yh7jf8tn8      1093407144           21        0.825        1,041.8
      1939      3 01-MAY-09 03.00.32.453 AM      8jt9yh7jf8tn8                            4        0.575        1,211.8
      1944      3 01-MAY-09 08.00.15.686 AM      8jt9yh7jf8tn8                            6        1.328        1,788.3
      1946      2 01-MAY-09 10.00.17.105 AM      8jt9yh7jf8tn8                            1        1.170        2,411.0
      1946      3 01-MAY-09 10.00.16.458 AM      8jt9yh7jf8tn8                            4        2.041        2,414.3
      1947      3 01-MAY-09 11.00.14.385 AM      8jt9yh7jf8tn8                           10        1.725        2,937.1
      1948      3 01-MAY-09 12.00.08.928 PM      8jt9yh7jf8tn8                            3        2.232        3,415.7
      1987      2 03-MAY-09 03.00.03.550 AM      8jt9yh7jf8tn8                            7        1.029          901.0
      1990      3 03-MAY-09 06.00.07.641 AM      8jt9yh7jf8tn8                            3        1.225        1,465.7
      1991      3 03-MAY-09 07.00.05.445 AM      8jt9yh7jf8tn8                           26        0.370          710.5
      1992      2 03-MAY-09 08.00.20.010 AM      8jt9yh7jf8tn8                            6        0.213          685.7
      1992      3 03-MAY-09 08.00.19.366 AM      8jt9yh7jf8tn8                            3        0.658          883.0
      1993      2 03-MAY-09 09.00.04.983 AM      8jt9yh7jf8tn8                            8        0.769          950.9
      1996      2 03-MAY-09 12.00.19.205 PM      8jt9yh7jf8tn8                            2        0.101          861.5
      2015      3 04-MAY-09 07.00.13.869 AM      8jt9yh7jf8tn8                            4        0.376          854.5
      2016      3 04-MAY-09 08.00.09.477 AM      8jt9yh7jf8tn8                            6        0.143          571.0
      2019      2 04-MAY-09 11.00.11.317 AM      8jt9yh7jf8tn8                           12        0.937        1,352.1
      2019      3 04-MAY-09 11.00.10.691 AM      8jt9yh7jf8tn8                           10        1.612        1,341.9
      2019      3 04-MAY-09 11.00.10.691 AM      8jt9yh7jf8tn8      4076066623            1       41.592    3,942,672.0
      2020      2 04-MAY-09 12.00.06.355 PM      8jt9yh7jf8tn8      1093407144           15        1.037        1,734.6
      2020      3 04-MAY-09 12.00.06.919 PM      8jt9yh7jf8tn8      4076066623            1      181.044    1,764,007.0
      2022      2 04-MAY-09 02.00.26.599 PM      8jt9yh7jf8tn8      1093407144            2        2.214        2,780.5

65 rows selected.

SQL> -- typical bind variable peeking pattern
SQL> -- let's look at just one node - it'll be a little more clear
SQL> l8
  8* and ss.instance_number = S.instance_number
SQL> i
  9i and s.instance_number = &inst
 10i
SQL>
SQL> /
Enter value for sql_id: 8jt9yh7jf8tn8
Enter value for inst: 3

   SNAP_ID   NODE BEGIN_INTERVAL_TIME            SQL_ID        PLAN_HASH_VALUE        EXECS    AVG_ETIME        AVG_LIO
---------- ------ ------------------------------ ------------- --------------- ------------ ------------ --------------
      1785      3 24-APR-09 05.00.13.361 PM      8jt9yh7jf8tn8      1093407144            6        1.102        2,872.7
      1786      3 24-APR-09 06.00.03.170 PM      8jt9yh7jf8tn8                          223        0.023        2,873.0
      1787      3 24-APR-09 07.00.30.935 PM      8jt9yh7jf8tn8                          873        0.019        2,873.0
      1788      3 24-APR-09 08.00.04.148 PM      8jt9yh7jf8tn8                          871        0.020        2,873.9
      1789      3 24-APR-09 09.00.27.481 PM      8jt9yh7jf8tn8                          566        0.016        2,874.0
      1894      3 29-APR-09 06.00.09.823 AM      8jt9yh7jf8tn8                            2        1.056          847.0
      1895      3 29-APR-09 07.00.00.921 AM      8jt9yh7jf8tn8                            2        0.856        1,208.0
      1897      3 29-APR-09 09.00.10.115 AM      8jt9yh7jf8tn8                           17        1.714        3,416.5
      1898      3 29-APR-09 10.00.42.788 AM      8jt9yh7jf8tn8                            6        3.751        4,484.2
      1899      3 29-APR-09 11.00.10.447 AM      8jt9yh7jf8tn8                            7        1.742        3,284.4
      1900      3 29-APR-09 12.00.35.788 PM      8jt9yh7jf8tn8                            2        1.199          966.0
      1902      3 29-APR-09 02.00.15.910 PM      8jt9yh7jf8tn8                            2        3.941        2,649.5
      1918      3 30-APR-09 06.00.08.403 AM      8jt9yh7jf8tn8                            2        0.421          485.5
      1920      3 30-APR-09 08.00.04.389 AM      8jt9yh7jf8tn8                           12        2.491        3,314.2
      1921      3 30-APR-09 09.00.10.838 AM      8jt9yh7jf8tn8                            2        2.416        1,769.5
      1922      3 30-APR-09 10.00.25.754 AM      8jt9yh7jf8tn8      4076066623            2       54.237    2,291,432.5
      1923      3 30-APR-09 11.00.18.032 AM      8jt9yh7jf8tn8                            3      134.031      933,124.3
      1924      3 30-APR-09 12.00.15.448 PM      8jt9yh7jf8tn8                            3      227.009    6,987,169.3
      1926      3 30-APR-09 02.00.11.174 PM      8jt9yh7jf8tn8                            2      175.709    8,963,417.0
      1927      3 30-APR-09 03.00.24.306 PM      8jt9yh7jf8tn8                            5      156.378   10,159,992.0
      1928      3 30-APR-09 04.00.29.416 PM      8jt9yh7jf8tn8                            1      180.488    2,150,190.0
      1930      3 30-APR-09 06.00.37.119 PM      8jt9yh7jf8tn8                            2      180.371    8,255,881.5
      1934      3 30-APR-09 10.00.12.055 PM      8jt9yh7jf8tn8                            1      180.491    3,102,577.0
      1939      3 01-MAY-09 03.00.32.453 AM      8jt9yh7jf8tn8      1093407144            4        0.575        1,211.8
      1944      3 01-MAY-09 08.00.15.686 AM      8jt9yh7jf8tn8                            6        1.328        1,788.3
      1946      3 01-MAY-09 10.00.16.458 AM      8jt9yh7jf8tn8                            4        2.041        2,414.3
      1947      3 01-MAY-09 11.00.14.385 AM      8jt9yh7jf8tn8                           10        1.725        2,937.1
      1948      3 01-MAY-09 12.00.08.928 PM      8jt9yh7jf8tn8                            3        2.232        3,415.7
      1990      3 03-MAY-09 06.00.07.641 AM      8jt9yh7jf8tn8                            3        1.225        1,465.7
      1991      3 03-MAY-09 07.00.05.445 AM      8jt9yh7jf8tn8                           26        0.370          710.5
      1992      3 03-MAY-09 08.00.19.366 AM      8jt9yh7jf8tn8                            3        0.658          883.0
      2015      3 04-MAY-09 07.00.13.869 AM      8jt9yh7jf8tn8                            4        0.376          854.5
      2016      3 04-MAY-09 08.00.09.477 AM      8jt9yh7jf8tn8                            6        0.143          571.0
      2019      3 04-MAY-09 11.00.10.691 AM      8jt9yh7jf8tn8      4076066623            1       41.592    3,942,672.0
      2019      3 04-MAY-09 11.00.10.691 AM      8jt9yh7jf8tn8      1093407144           10        1.612        1,341.9
      2020      3 04-MAY-09 12.00.06.919 PM      8jt9yh7jf8tn8      4076066623            1      181.044    1,764,007.0
      2031      3 04-MAY-09 11.00.03.519 PM      8jt9yh7jf8tn8      1093407144            1        0.737          482.0
      2039      3 05-MAY-09 07.00.27.610 AM      8jt9yh7jf8tn8      4076066623            5       42.900    4,295,251.8
      2041      3 05-MAY-09 09.00.09.829 AM      8jt9yh7jf8tn8                            2        3.282    1,968,698.5
      2044      3 05-MAY-09 12.00.16.920 PM      8jt9yh7jf8tn8      1093407144            2        1.535          844.5
      2045      3 05-MAY-09 01.00.08.637 PM      8jt9yh7jf8tn8                            2        2.384        1,683.5

41 rows selected.

SQL> -- the plans are clearly flip-flopping
SQL> -- and the performance of plan 1093407144 is clearly orders of magnitude better
SQL> 
SQL> -- Let's see what's in the shared pool
SQL>
SQL> @find_sql_acs
Enter value for sql_text:
Enter value for sql_id: 8jt9yh7jf8tn8
Enter value for is_bind_aware:

SQL_ID         CHILD PLAN_HASH_VALUE IBS IBA ISH      EXECS ROWS_PROCESSED  AVG_ETIME    AVG_CPU    AVG_PIO    AVG_LIO SQL_TEXT
------------- ------ --------------- --- --- --- ---------- -------------- ---------- ---------- ---------- ---------- -----------------------------------
8jt9yh7jf8tn8      0      4076066623 N   N   Y           19              6     160.35      29.28 ##########  6,222,898 SELECT row_order, cdr_id, site_id,
8jt9yh7jf8tn8      1      1093407144 N   N   Y           61              0       1.92       0.13     565.70      2,860 SELECT row_order, cdr_id, site_id,
8jt9yh7jf8tn8      2                 N   N   Y           55              0       1.01       0.06     252.09      1,555 SELECT row_order, cdr_id, site_id,

SQL> -- So there are children with both plans in the shared the pool (135 execs total)
SQL> -- But the cursors are not marked as bind aware
SQL> -- So why are there multiple child cursors?
SQL> -- (using a modified version of Dion Cho's script)
SQL>
SQL> @mismatch3
Enter value for sql_id: 8jt9yh7jf8tn8
SQL_ID                         = 8jt9yh7jf8tn8
CHILD_NUMBER                   = 1
--------------------------------------------------
SQL_ID                         = 8jt9yh7jf8tn8
CHILD_NUMBER                   = 0
ROLL_INVALID_MISMATCH          = Y
--------------------------------------------------
SQL_ID                         = 8jt9yh7jf8tn8
CHILD_NUMBER                   = 2
ROLL_INVALID_MISMATCH          = Y
--------------------------------------------------

PL/SQL procedure successfully completed.

SQL> -- none of the bind related flags show up, so it doesn't look promising for ACS
SQL> -- 

Brief Digression: The ROLL_INVALID_MISMATCH flag has to do with changing stats using AUTO_INVALIDATE which does not cause cursors to be immediately invalidated, but instead let’s them be invalidated over a rolling window (_optimizer_invalidation_period which defaults to 5 hours).

So then I got to wondering if any statements were being marked as bind aware.

SQL> @find_sql_acs
Enter value for sql_text:
Enter value for sql_id:
Enter value for is_bind_aware: Y

SQL_ID         CHILD PLAN_HASH_VALUE IBS IBA ISH      EXECS ROWS_PROCESSED  AVG_ETIME    AVG_CPU    AVG_PIO    AVG_LIO SQL_TEXT
------------- ------ --------------- --- --- --- ---------- -------------- ---------- ---------- ---------- ---------- ---------------
0qvgb3dyfg539      1       722236007 Y   Y   Y            2              0        .15        .02      23.00      3,487 SELECT row_orde
17uuqnvxmzxhj      1      3038781757 Y   Y   Y           31             31        .17        .01      43.74      1,126 SELECT COUNT(*)
                   3      3038781757 Y   Y   N           21             21        .02        .02       1.33      7,290
                   4      3038781757 Y   Y   Y           52             52        .23        .02     120.29      2,046
                   6      3038781757 Y   Y   Y           51             51        .00        .00        .10        284
34x6683rpwtxa      4       722236007 Y   Y   Y           18            164        .01        .00        .17        498 SELECT row_orde
3tfx8fzp64vkb      1      3038781757 Y   Y   Y            2              2        .01        .01        .00      1,178 SELECT COUNT(*)
4vb86f36xqc50      1      2983410489 Y   Y   Y           62           1683        .12        .01      53.76        960 SELECT row_orde
                   4      2983410489 Y   Y   Y            7            163        .69        .07     328.43        723
58p0j1q6rmv34      1      1144901783 Y   Y   Y            2              2        .02        .00        .00        335 SELECT COUNT(*)
5mxqphz5qfs4d      1      1144901783 Y   Y   Y            2              2        .02        .00        .00         54 SELECT COUNT(*)
dt1v1cmua9cnq      1      4076066623 Y   Y   Y            8             37       3.47       3.41      15.63  2,233,144 SELECT row_orde
ftxa99d89yzz0      1      4289789142 Y   Y   Y            2              2        .01        .00        .00        157 SELECT COUNT(*)
g375mcpc30dy5      2      1690109023 Y   Y   N            1             10        .03        .01        .00        767 SELECT row_orde
                   3      1690109023 Y   Y   Y            5             24        .02        .00       2.00        362

15 rows selected.

SQL> -- 

Sure enough, there are a few (10 statements to be exact with a total of 15 child cursors) that are marked as Bind Aware. However, there are only 3 statements that have multiple child cursors (sql_id: 17uuqnvxmzxhj, 4vb86f36xqc50, g375mcpc30dy5). And for each of those statements, the child cursors all have the same plan (sql_id: 17uuqnvxmzxhj, for example, has 3 children all using plan_hash_value: 3038781757). So even though they are marked Bind Aware, and have multiple child cursors, they are not producing distinct plans for those statements. So even the ones that the optimizer has noticed, do not appear to be providing us any benefit.

But back to the original question. We have many statements that are suffering from bind variable peeking issues that are apparently not being noticed. Why is that? Well I got to thinking about why ACS might be ignoring the statements that I know have this issue. One of the characteristics of these statements is that they have a relatively large number of bind variables. Some of them have as many as a couple of hundred. So I let my mind wander off into programmer land. How would the guys writing the optimizer code deal with a situation where they had a virtually unlimited number of bind variables. My first thought was to just put a hard limit on the number of variables to handle. Histograms have a relatively small number of potential buckets, and since ACS uses a histogram approach, it stands to reason the programmers might have put in a relatively small limit for their first pass. So I went off to test this idea (since this post is already so long I’ll cut right to the chase). There does appear to be a hard limit of 14 bind variables and statements that have more than that are apparently not evaluated for bind sensitivity. Here’s a test case:

> !sql
sqlplus "/ as sysdba"

SQL*Plus: Release 11.1.0.7.0 - Production on Mon May 11 11:16:06 2009

Copyright (c) 1982, 2008, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> !cat t14.sql
set echo off
variable v1 number
variable v2 varchar2(30)
variable v3 varchar2(30)
variable v4 varchar2(30)
variable v5 varchar2(30)
variable v6 varchar2(30)
variable v7 varchar2(30)
variable v8 varchar2(30)
variable v9 varchar2(30)
variable v9 varchar2(30)
variable v10 varchar2(30)
variable v11 varchar2(30)
variable v12 varchar2(30)
variable v13 varchar2(30)
variable v14 varchar2(30)
variable v15 varchar2(30)
variable v16 varchar2(30)
variable v17 varchar2(30)
variable v18 varchar2(30)
variable v19 varchar2(30)
exec :v1 := 999999;
exec :v2 := 'TESTING'
exec :v3 := '01-jan-08'
exec :v4 := '01-jan-09'
exec :v5 := 'Y'
exec :v6 := 'X'
exec :v7 := 'y'
exec :v8 := 'Z'
exec :v9 := 'a'
exec :v10 := 'b'
exec :v10 := 'c'
exec :v11 := 'd'
exec :v12 := 'e'
exec :v13 := 'f'
exec :v14 := 'g'
exec :v15 := 'h'
exec :v16 := 'i'
exec :v17 := 'j'
exec :v18 := 'k'
exec :v19 := 'l'
set echo on
select avg(pk_col) from kso.little_skew
where col1 = :v1
and col2 = :v2
and col3 between :v3 and :v4
and col4 in (:v5,:v6,:v7,:v8,:v9,:v10,:v11,:v12,:v13,:v14)
/

SQL> @flush_pool
SQL> @t14
SQL> @t14
SQL> @t14
SQL> @t14
SQL> 
SQL> -- 4 execs of the first bind variable set, now let's use a different set of bind variables
SQL> -- t14a.sql is the same except v1=1 - which (due to histogram) wants to use a different plan
SQL>
SQL> @t14a
SQL> @t14a
SQL> @t14a
SQL> @t14a
SQL> @t14a
SQL> @t14a
SQL> @t14a
SQL>
SQL> -- 7 execs with the new set of bind variables, should give it plenty of info to make it bind aware
SQL> -- now a few more with the original set of bind variables
SQL>
SQL> @t14
SQL> @t14
SQL> @t14
SQL> @t14
SQL>
SQL> -- now let's see what we've got
SQL>
SQL> @find_sql_acs 
Enter value for sql_text: %v14)%
Enter value for sql_id: 
Enter value for is_bind_aware: 

SQL_ID         CHILD PLAN_HASH_VALUE IBS IBA ISH      EXECS ROWS_PROCESSED  AVG_ETIME    AVG_CPU    AVG_PIO    AVG_LIO SQL_TEXT
------------- ------ --------------- --- --- --- ---------- -------------- ---------- ---------- ---------- ---------- -----------------------------------
2yk8hbkc4kbd5      0       376960484 N   N   N            8              8        .18        .18        .00      2,641 select avg(pk_col) from kso.little_
                                                                                                                       skew where col1 = :v1 and col2 = :v
                                                                                                                       2 and col3 between :v3 and :v4 and
                                                                                                                       col4 in (:v5,:v6,:v7,:v8,:v9,:v10,:
                                                                                                                       v11,:v12,:v13,:v14)

2yk8hbkc4kbd5      1      3746388338 N   Y   Y            3              3        .00        .00        .00         15 select avg(pk_col) from kso.little_
                                                                                                                       skew where col1 = :v1 and col2 = :v
                                                                                                                       2 and col3 between :v3 and :v4 and
                                                                                                                       col4 in (:v5,:v6,:v7,:v8,:v9,:v10,:
                                                                                                                       v11,:v12,:v13,:v14)

2yk8hbkc4kbd5      2       376960484 N   Y   Y            4              4        .00        .00        .00          3 select avg(pk_col) from kso.little_
                                                                                                                       skew where col1 = :v1 and col2 = :v
                                                                                                                       2 and col3 between :v3 and :v4 and
                                                                                                                       col4 in (:v5,:v6,:v7,:v8,:v9,:v10,:
                                                                                                                       v11,:v12,:v13,:v14)


3 rows selected.

SQL> -- As expected, the optimizer has noticed and marked it Bind Aware
SQL> -- Now let's try the same thing with 15 bind variables instead of 14
SQL> -- (note: t15 is the same with one more bind variable)
SQL>
SQL> @t15
SQL> @t15
SQL> @t15
SQL> @t15
SQL> @t15a
SQL> @t15a
SQL> @t15a
SQL> @t15a
SQL> @t15a
SQL> @t15a
SQL> @t15a
SQL> @t15
SQL> @t15
SQL> @t15
SQL> @t15
SQL> @find_sql_acs
Enter value for sql_text: %v15)%
Enter value for sql_id: 
Enter value for is_bind_aware: 

SQL_ID         CHILD PLAN_HASH_VALUE IOB IBA ISH      EXECS ROWS_PROCESSED  AVG_ETIME    AVG_CPU    AVG_PIO    AVG_LIO SQL_TEXT
------------- ------ --------------- --- --- --- ---------- -------------- ---------- ---------- ---------- ---------- -----------------------------------
4ndg8w1ga5dth      0       376960484 N   N   Y           15             15        .17        .16        .00      2,465 select avg(pk_col) from kso.little_
                                                                                                                       skew where col1 = :v1 and col2 = :v
                                                                                                                       2 and col3 between :v3 and :v4 and
                                                                                                                       col4 in (:v5,:v6,:v7,:v8,:v9,:v10,:
                                                                                                                       v11,:v12,:v13,:v14,:v15)


1 row selected.

SQL> -- no joy!
SQL>
SQL> -- so now I wondered if it really was just a hard limit or something more complicated
SQL> -- so I wrote a little script to count the number of bind variables in Bind Aware statements
SQL>
SQL> !cat acs_bind_count.sql
select distinct sql_id, bind_count from (
select sql_id, child_number, count(*) bind_count from v$sql_bind_capture
where sql_id in (
select sql_id from v$sqlarea where is_bind_aware = 'Y')
group by sql_id, child_number
)
order by bind_count
/

SQL> @acs_bind_count.sql

SQL_ID        BIND_COUNT
------------- ----------
2jmg3n6k2q82x          2
dxjvgtd3stafn          4
ck67mwdmdnaww          8
9pcw6qh83mj69          9
883uvnn2gz36u         10
4ysyd47nkap6d         11
7mygfqgs52au2         12
2744r65xx9h2h         13
2yk8hbkc4kbd5         14

9 rows selected.

So it looks like there probably is a hard limit. Let’s see if I can disprove that by looking at a couple of production systems. Here’s the output from the same script on four production instances (note: system 1 is that same one that I referenced earlier with the statements with known Bind Variable Peeking issues, where ACS was not noticing them):

-- System 1, Node 2

SQL> @acs_bind_count

SQL_ID        BIND_COUNT
------------- ----------
17uuqnvxmzxhj          8
58p0j1q6rmv34          8
5mxqphz5qfs4d          8
g375mcpc30dy5          8
3tfx8fzp64vkb          9
4vb86f36xqc50          9
34x6683rpwtxa         10
dt1v1cmua9cnq         11
ftxa99d89yzz0         11
0qvgb3dyfg539         12

10 rows selected.


-- System 1, Node 3

SQL> @acs_bind_count

SQL_ID        BIND_COUNT
------------- ----------
5bjg6w9btv3zn          7
17uuqnvxmzxhj          8
6n1790gk5m0hy          8
2pv7g3nutnyq5          9
4vb86f36xqc50          9
11at9nnhrw3w9         10
4p9zp29bcu927         10
4axts2hm73n98         11
ftxa99d89yzz0         11
490gh9uugyqqq         13

10 rows selected.

-- System 2, Node 1

SQL> @acs_bind_count

SQL_ID        BIND_COUNT
------------- ----------
4rqy85b9r413x          1
7h47mb48s19t3          1
51z6b0kbduf8m          2
djkchp3qwadu1          2
9g5ng8xjj4h05          3
amyvvzzjccvsg          3
cuv2pyzvz7b4z          3
d68jnfnqs6uh7          3
dkc2kbj7kdsu7          3
fxmfhkkhcrdj6          3
6196v5td8pk4n          4
9m63xcj6sbswu          4
0mn3xx02ww6pd          5
c7k3pwbjg4d1d          5
dy98ajwqc8s2x          5
3qh6yw37n8m1a          6
3xxgg248shg62          6
83v17f53rnzcv          6
a2271k4f7j211          6
06n7rk2jw3rsy          7
f4xr9x5wh46zt          7
48ryzfg5dxpqn          8
an9114vtxukz2          8

23 rows selected.

-- System 2, Node 2

SQL> @acs_bind_count

SQL_ID        BIND_COUNT
------------- ----------
4rqy85b9r413x          1
7h47mb48s19t3          1
gvj749s4654hm          1
51z6b0kbduf8m          2
6qwznvjfajh7t          2
djkchp3qwadu1          2
2mhfdtyrvhz3c          3
92a7wu0891bjk          3
9g5ng8xjj4h05          3
9p9gsakw5qyrg          3
amyvvzzjccvsg          3
bs63sfxynwtc6          3
cuv2pyzvz7b4z          3
d68jnfnqs6uh7          3
dvh3d8g2f844j          3
0341h5ytqbv7k          4
...
gj0rzh6xw2msp          4
grvh9mmkynm4u          4
0mn3xx02ww6pd          5
7ys2u732cj1ag          5
9kfj6kj32p7ku          5
dy98ajwqc8s2x          5
83v17f53rnzcv          6
d79xsp0utuvm2          6
gqkx5bsg1zfzc          6
4q3n2wwqtghma          7
f4xr9x5wh46zt          7
g7c90sr77bb78          7
7gb24h57r0spb          8
9zcy2yz537z41          8
an9114vtxukz2          8
7uwy0uxauuabz          9
80xmux45pvkrr          9

73 rows selected.

So it appears that there probably is a hard limit and that it probably is 14 (although there certainly could be a more complicated algorithm in play). None of the 4 production systems I checked had a single statement that was marked bind aware with more than 14 bind variables.

So here are a few closing thoughts:

To me, ACS doesn’t seem to be completely cooked yet. In fact, it seems to be somewhat of a futile attempt, since in the very best case, one execution with a bad plan would be necessary for the optimizer to recognize that a bind variable peeking issue existed. This wouldn’t be so bad if the information was persisted, so that the same “learning” process wouldn’t have to be endured if/when a statement gets flushed from the shared pool. This issue alone is enough to keep this feature from being relied upon in situations where performance is critical. In my opinion, the best approach remains unchanged from version 10, that being the judicious use of literals where necessary to prevent plan instability due to bind variable peeking.

My impression of it is not completely negative though. It’s certainly better than the old approach of pick a plan and stick with it. And for systems that use a limited number of binds per statement where performance swings are not as dramatic and/or performance is not as critical, it seems to work pretty well.

Here are some other references:

OptimizerMagic Blog – Good basic description of ACS
OptimizerMagic Blog – Update on ACS
Ric Van Dyke – Intelligent Cusor Sharing in 11.1.0.6
Ric Van Dyke – Intelligent Cusor Sharing in 11.1.0.7
Section on ACS in Troubleshooting Oracle Performance By Christian Antognini

Building Slow Development Systems (On Purpose)

I had an interesting discussion last week with an Oracle DBA that I have a lot of respect for. He surprised me by telling me that he thought development systems should intentionally be built to be slow. That way, the developers would be forced to write efficient code. He was not just stating a philosophical position either. He had actually built some databases with 32M buffer caches (the one I mentioned in this previous post on Buffer Cache Extremes was one of his). He went so far as to say that he thought that developers should be using 286’s for their development systems.

I have argued for several years that development systems should match production as closely as possible. But that conversation really got me thinking. Maybe intentionally slowing down development systems is not such a bad idea…

So what do you think?

I’ll Gladly Pay You Tuesday for a Hamburger Today

Those of you who are old enough to remember disco dancing will probably remember this guy. Wimpy (J. Wellington Wimpy actually) was a lazy, glutinous character in the old Popeye cartoons.

I was reminded of Wimpy last week by a performance problem that occurred on an 11g database I was looking at (I’ll tell you why in a minute). So here’s what happened. There were several statements that were running much more slowly than usual (10’s of seconds instead of less than a second). So I started poking around and here’s what I saw:

(note that I cleaned this up a bit to protect the innocent)

(also note that many of the scripts I use have been documented in other posts – so if I haven’t shown the SQL for a particular script, try searching for it on this site)

SQL> @find_sql_acs
Enter value for sql_text:
Enter value for sql_id: 3dhvsp0duttat
Enter value for is_bind_aware:

SQL_ID         CHILD PLAN_HASH_VALUE IOB IBS IBA ISH      EXECS ROWS_PROCESSED  AVG_ETIME    AVG_CPU    AVG_PIO    AVG_LIO SQL_TEXT
------------- ------ --------------- --- --- --- --- ---------- -------------- ---------- ---------- ---------- ---------- ------------------
3dhvsp0duttat      0      3289235442 N   N   N   Y            9             46       0.02       0.00       1.11        272 SELECT row_order, 
3dhvsp0duttat      1      3289235442 N   N   N   Y           13             48       0.08       0.01      12.62        546 SELECT row_order, 
3dhvsp0duttat      2      4076066623 N   N   N   Y           15             31      38.07       6.55  27,176.53  2,426,778 SELECT row_order, 


SQL> @dplan
Enter value for sql_id: 3dhvsp0duttat
Enter value for child_no: 0

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  3dhvsp0duttat, child number 0
-------------------------------------
SELECT row_order, ...

Plan hash value: 3289235442

--------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                   | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                        |       |       |     5 (100)|          |       |       |
|*  1 |  VIEW                                 |                        |     1 |   586 |     5  (20)| 00:00:01 |       |       |
|*  2 |   WINDOW SORT PUSHED RANK             |                        |     1 |   258 |     5  (20)| 00:00:01 |       |       |
|*  3 |    FILTER                             |                        |       |       |            |          |       |       |
|   4 |     PARTITION RANGE ITERATOR          |                        |     1 |   258 |     4   (0)| 00:00:01 |   KEY |   KEY |
|*  5 |      TABLE ACCESS BY LOCAL INDEX ROWID| REALLY_BIG_TABLE       |     1 |   258 |     4   (0)| 00:00:01 |   KEY |   KEY |
|*  6 |       INDEX RANGE SCAN                | GOOD_INDEX             |     1 |       |     3   (0)| 00:00:01 |   KEY |   KEY |
--------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(...)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "DATE_COL")<=:25)
   3 - filter((TO_DATE(:1,:2)<=TO_DATE(:3,:4) AND :24<=:25))
   5 - filter(...)
   6 - access("COL1" LIKE :21 AND "DATE_COL">=TO_DATE(:1,:2) AND "DATE_COL"<=TO_DATE(:3,:4))

SQL> @dplan
Enter value for sql_id: 3dhvsp0duttat
Enter value for child_no: 2

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  3dhvsp0duttat, child number 2
-------------------------------------
SELECT row_order, ...

Plan hash value: 4076066623

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                  | Rows  | Bytes | Cost (%CPU)| Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                       |       |       |     1 (100)|       |       |
|*  1 |  VIEW                                 |                       |     1 |   586 |     0   (0)|       |       |
|*  2 |   WINDOW NOSORT STOPKEY               |                       |     1 |   574 |     0   (0)|       |       |
|*  3 |    FILTER                             |                       |       |       |            |       |       |
|   4 |     PARTITION RANGE ITERATOR          |                       |     1 |   574 |     0   (0)|   KEY |   KEY |
|*  5 |      TABLE ACCESS BY LOCAL INDEX ROWID| REALLY_BIG_TABLE      |     1 |   574 |     0   (0)|   KEY |   KEY |
|*  6 |       INDEX RANGE SCAN                | BAD_INDEX             |     1 |       |     0   (0)|   KEY |   KEY |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(("ROW_ORDER">=:24 AND "ROW_ORDER"<=:25))
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "DATE_COL")<=:25)
   3 - filter((TO_DATE(:1,:2)<=TO_DATE(:3,:4) AND :24<=:25))
   5 - filter(...)
   6 - access("DATE_COL">=TO_DATE(:1,:2) AND "DATE_COL"<=TO_DATE(:3,:4))

SQL> -- Uh Oh, - cost of 1, no records expected 
SQL>
SQL> @child_load_time
Enter value for sql_text:
Enter value for sql_id: 3dhvsp0duttat

SQL_ID         CHILD  PLAN_HASH LAST_LOAD_TIME       EXECS     AVG_ETIME    AVG_PIO      AVG_LIO SQL_TEXT
------------- ------ ---------- --------------- ---------- ------------- ---------- ------------ -----------------------------------------
3dhvsp0duttat      0 3289235442 2009-05-01/12:04:20      9          0.02          1          272 SELECT row_order, ... 
3dhvsp0duttat      1 3289235442 2009-05-05/19:43:14     13          0.08         13          546
3dhvsp0duttat      2 4076066623 2009-05-07/16:49:36     15         38.07     27,177    2,426,778


SQL> -- so it looks like maybe something bad happened with stats on 5/7
SQL> -- let's make use of the diif_table_stats procedure to see if we can spot what happened

SQL> !date
Tue May 12 11:59:22 CDT 2009

SQL> !cat diff_table_stats.sql
select report, maxdiffpct from
table(dbms_stats.diff_table_stats_in_history('&owner','&table_name',
systimestamp-&days_ago))
/

SQL> @diff_table_stats.sql
Enter value for owner: XXX
Enter value for table_name: REALLY_BIG_TABLE
Enter value for days_ago: 5

REPORT                                                                           MAXDIFFPCT
-------------------------------------------------------------------------------- ----------
###############################################################################

STATISTICS DIFFERENCE REPORT FOR:
.................................

TABLE         : REALLY_BIG_TABLE
OWNER         : XXX
SOURCE A      : Statistics as of 07-MAY-09 11.46.36.000000 AM -05:00
SOURCE B      : Current Statistics in dictionary
PCTTHRESHOLD  : 10
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TABLE / (SUB)PARTITION STATISTICS DIFFERENCE:
.............................................

OBJECTNAME                  TYP SRC ROWS       BLOCKS     ROWLEN     SAMPSIZE
...............................................................................

P2009W19                    P   A   0          0          0          0
                                B   4001801    187810     342        4001801
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

COLUMN STATISTICS DIFFERENCE:
.............................

COLUMN_NAME     SRC NDV     DENSITY    HIST NULLS   LEN  MIN   MAX   SAMPSIZ
...............................................................................


                               PARTITION: P2009W19
                               ...................

COL1            A   0       0          NO   0       0      0
                B   105136  .000009511 NO   2565391 4    302E3 6E303 1436410
COL2            A   0       0          NO   0       0      0
                B   669696  .000001493 NO   1011264 9    2A2A2 554E4 2990537
COL3            A   0       0          NO   0       0      0
                B   988     .001012145 NO   1113316 4    30303 39393 2888485
COL4            A   0       0          NO   0       0      0
                B   669696  .000037461 YES  0       11   2A2A2 554E4 5540
...

SQL> -- Sure enough, stats were created when the partition was empty, ouch!

So a couple of things to note here:

1. the table is partitioned
2. dbms_stats.diff_table_stats_in_history is smart enough to show us partition stats (nice)
3. on May 7th the stats changed dramatically
4. this statement was not re-parsed due to the stats changing

So what’s the fix? Well since stats had been generated on this partition every night for the last 5 days, the fix was just to get the optimizer to re-parse my query (and any other affected statements). In 11g this is very easy as there is a built function (psuedo documented) to flush a single SQL statement from the shared pool (see my previous post – Flush A Single SQL Statement – for details). However, I wasn’t sure how many statements were affected (more on that in a minute). But certainly any statement that touched that partitioned table which got parsed on May 7th was almost certain to have a bad plan. So eventually we decided to do the brute force approach and just flush the entire shared pool. (I know, not elegant, but fairly effective) Note that there is no guarantee that flushing the shared pool will clear all the statements. In fact, on a busy system you’re almost guaranteed that it won’t flush every statement. But it did clear up most of the issues. So with the pressure now off (and the system back to running pretty well) I had time to write a little script to look for SQL statements with a cost of 1. (I guess I could have looked for statements which were parsed on May 7th that accessed my partitioned table, but I didn’t.)

Here’s a bit of the SQL*Plus Session:

SQL>-- initially we flushed a couple directly
SQL>
SQL> @flush_sql
Enter value for sql_id: 3dhvsp0duttat

PL/SQL procedure successfully completed.

SQL> -- so we can find all of these guys and flush them - or just flush the whole shared pool
SQL> -- flushing the shared pool is easier!
SQL>
SQL> alter system flush shared_pool;

System altered.

SQL> -- but there are still no guarantees they are all cleaned up, let's check
SQL>
SQL> !cat find_sql_by_cost.sql
select distinct parsing_schema_name, sa.sql_id, sp.plan_hash_value, cost
from v$sql_plan sp, v$sqlarea sa
where sp.sql_id = sa.sql_id
and parsing_schema_name like nvl('&parsing_user',parsing_schema_name)
and cost like nvl(to_number('&cost'),to_number(cost))
and sp.plan_hash_value like nvl('&plan_hash_value',sp.plan_hash_value)
and parent_id is null
order by 4 desc
/

SQL> @find_sql_by_cost
Enter value for parsing_user:
Enter value for cost: 1
Enter value for plan_hash_value: 407%

PARSING_SCHEMA_NAME            SQL_ID        PLAN_HASH_VALUE       COST
------------------------------ ------------- --------------- ----------
XXXX_PROD                      02t2tt28xp72m      4076066623          1
XXXX_PROD                      5byphafyx78j6      4076066623          1

SQL> -- note I did a little checking here to make sure they were suffering from the same issue
SQL> -- then I cleaned them up
SQL> 
SQL>
SQL> @flush_sql
Enter value for sql_id: 02t2tt28xp72m

PL/SQL procedure successfully completed.

SQL> @find_sql
Enter value for sql_text:
Enter value for sql_id: 02t2tt28xp72m

no rows selected

SQL> @flush_sql
Enter value for sql_id: 5byphafyx78j6

PL/SQL procedure successfully completed.

SQL> @find_sql
Enter value for sql_text:
Enter value for sql_id: 5byphafyx78j6

no rows selected

So the short term fix worked. The statements picked up the correct plans after re-parsing. Now let’s talk about the long term fix. Why did the partition not have stats on it? Turns out the partitioning scheme in place was range by date with a new partition for each week. Each partition goes from Thursday through the following Wednesday. Several partitions had been created in advance but as you may have guessed by now, nothing was done about the stats. Meaning that the partitions had values of 0 for the number of rows. Since the gather stats job was kicking in every evening, a whole days worth of data could be loaded into the partition before the stats would be updated. So that means that 24 hours worth of activity would occur with “bad” stats every week. Couple that with the default of NO_INVALIDATE=TRUE on the stats gathering and you get statements that may hang around for extended periods of time, even if the stats change dramatically (0 rows to a few million is dramatic, right?). So we manually set the stats on the pre-created partitions using a combination of dbms_stats.copy_table_stats and dbms_stats.set_table_stats and dbms_stats.set_column_stats (see Dion Cho’s excellent post for details and example code – and also this post on the Optimizermagic Blog for further info). I should also mention that Greg Rahn has several good posts on 11g Statistics Enhancements here, here, and here.

So why was I reminded of Wimpy? Because Wimpy’s favorite saying was “I’ll gladly pay you Tuesday for a hamburger today”. And the way this system was set up, every Tuesday (er, Thursday), we were paying the price for our bad/missing statistics. OK, so it’s a stretch, but I like the pictures of Wimpy!

Please feel free to leave me a comment letting me know how you feel …

… about the pictures of Wimpy.

Shopping Trip

This evening I went to the grocery store with my wife, Jill. It was an interesting trip. I have to admit that I took a sneak peek at the shopping list before I agreed to go, and seeing that there were only 7 items on the list, I figured I could earn some extra points without spending too much time. Sneaky, I know, but I am a little weird about optimizing my time. By way of example, one of the main reasons I have a pony tail is because I once calculated how much time it took to get a haircut every month. The calculation went something like this:

Haircut Calculation
Time Primary Action Snide Comment
0.50 Commute to the barber shop Everything is a half hour away in Dallas
0.25 Wait for an open barber chair Not totally wasted because you could read the 6 year old copies of Field and Stream or the 10 year old copies of Lowrider Magazine – just in case I ever decide to go fishing in my newly restored 57 Chevy
0.75 Actually getting my haircut Takes a little extra time because of barbers need to give advice on everything from re-wiring a house to colonoscopy
0.25 Paying up Again takes a little extra time to listen to advice on tattoo removal (not that I have any tattoos, much less any tattoos that need to be removed, but it just seems rude not to act interested, especially to a guy that’s basically in charge of you public image)
0.50 Commute back to wherever Can take longer if it’s raining because nobody can drive in the rain in Dallas

So that’s a total of 2.75 hours per haircut. Multiply that by 12 times a year and that comes out to 33 hours per year. So that’s almost a whole work week. That just seemed like too much time to commit to such an unrewarding task, so I quit getting my hair cut.

But I digress.

Back to the shopping trip. Like I said, the list had 7 items on it, so I expected a quick trip. Well we get into the store and Jill asks me to get a shopping cart. I’m thinking a little hand basket thingy would be fine, because I’ve seen the list! But I don’t want to tell her that (I’m still being sneaky), so I get the shopping cart. And we proceed down the first isle, stopping every two feet or so to put something in the basket. By the end of the first isle we have at least double the number of items on the list, and by the way, nothing has been marked off the list yet. Just to give you an example, we ended up with 4 bags of chips (Lays Bar-B-Que, Tostitos Scoops, Fritos, and Sun Chips). We’d have had one more (because part of my family likes the Scoop Tostitos and another contingent likes the flat Tostitos) but we had to draw the line somewhere. It wasn’t all my wife’s doing either. We were both throwing stuff into the cart like we were expecting a hurricane to cut off food supplies for the next several weeks. By the time we got out of the store we had 17 bags of groceries. They were the small plastic kind (which I don’t really like – I always say paper when they ask – but I digress again).

I found the whole thing quite amusing, although I don’t think Jill really did.  I asked if I could check the list while we were checking out, just to make sure we hadn’t missed anything. She didn’t really appreciate my humor. (we did forget one of the things on the list by the way)

Anyway, the experience reminded me that trying to get too clever can often backfire. It also reminded me that I am predisposed to make that particular mistake. Implementing a tricky solution to a problem, while it may be intellectually stimulating,  is often not the best approach. The experience reminded of a talk I did a couple of years ago (Creative Problem Solving for Oracle Systems) where I talked about several tendencies that all of us have (to a greater or lesser extent). And how those tendencies can interfere with our ability to solve problems. One of the points of the talk was that being aware of your tendencies can help you avoid pitfalls.

Like Clint Eastwood said, “A Man’s Gotta Know His Limitations”.

Oracle Tech Day Presentation – 11g (Upgrade?)

I did a little impromptu talk yesterday at an Oracle Tech Day. It was kind of a funny situation. They had a speaker that couldn’t make it, so they called me at about 1:30 to see if I could talk at 2:00. Luckily my office is very close to the Oracle office, so I said sure.

The topic was supposed to be something about upgrading to 11g, and I had done a talk a month or so ago (at a Dallas Oracle Users Group meeting) about some of my favorite 11g features. So you know how it works, when you’ve got a hammer, everything starts to look like a nail. So I used the Power Point from that earlier talk and tried to tie in the upgrade theme as best I could. At about 3:05 I looked down at my watch and realized my hour was up (everyone knows that standard presentations are one hour). So I politely asked the audience if I should be wrapping up and they said, no you’re supposed to go until 4:00. Ouch!

Anyway, it wasn’t too painful, thanks to some good questions. Thanks to everyone who put up with the last minute sub. I promised to post the presentation materials. So here’s a link: 11g Presentation Materials