Unstable Plans (Oracle Plan Stability/Instability)

One of the most frustrating things about the Cost Based Optimizer in Oracle is the sometimes erratic performance. It can seem almost random at times. Resolving these issues can also be challenging. It’s a little like taking your car to the mechanic when it makes a funny noise. It never seems to make the noise when the mechanic is there. Fortunately we have ASH/AWR which captures a ton of information about what our databases are doing.

Here’s a couple of scripts that I find useful.

The first one can be used to show statements that have experienced significant variances in execution time (it can be modified to look for variances in the amount of logical i/o, but I’ll leave it as an exercise for the reader).  I called the script unstable_plans.sql. It uses an analytic function to calculate a standard deviation on the average elapsed time by plan. So the statements that have multiple plans with wild variations in the response time between plans will be returned by the script. The script prompts for a couple of values. The first is minimum number of standard deviations. The second is the minimum elapsed time (I usually don’t care if a statement executes sometimes in .005 seconds and sometimes in .02 seconds, even though this is a large swing statistically). Both these inputs are defaulted by the way.

The second one is called awr_plan_change.sql. It’s shows how the plans for a given statement have changed over time, along with some statistics such as the average elapsed time and the average lio’s.

Anyway, here’s an example of the usage of these two scripts (by the way, the example is on an 11gR1 database, but the scrips work fine on 10g as well).

 

> sqlplus / as sysdba
 
SQL*Plus: Release 11.1.0.6.0 - Production on Tue Oct 7 15:44:20 2008
 
Copyright (c) 1982, 2007, Oracle.  All rights reserved.
 
Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
 
SQL> alter system set optimizer_mode=rule;
 
System altered.
 
SQL> @awr_snap 
 
PL/SQL procedure successfully completed.
 
SQL> @unstable_plans
Enter value for min_stddev: 
Enter value for min_etime: 
 
SQL_ID        SUM(EXECS)   MIN_ETIME   MAX_ETIME   NORM_STDDEV
------------- ---------- ----------- ----------- -------------
c5by9gdw3814r         14         .03         .12        2.1274
848dyu9288c3h         16         .02         .16        2.1309
2am60vd2kw8ux         11         .05         .22        2.4976
frxg53fac2n8j          5         .03         .30        4.2479
0qa98gcnnza7h         62       25.58      314.34        7.9833
g0jvz8csyrtcf          2         .09        1.19        8.2304
2cn0kc8u4b81w        545         .02         .42       12.8022
9wt62290ah0f7          6         .01         .47       38.5857
d8mayxqw0wnpv       1373         .01         .85       48.3874
 
9 rows selected.
 
SQL> /
Enter value for min_stddev: 
Enter value for min_etime: 2
 
SQL_ID        SUM(EXECS)   MIN_ETIME   MAX_ETIME   NORM_STDDEV
------------- ---------- ----------- ----------- -------------
0qa98gcnnza7h         62       25.58      314.34        7.9833
 
SQL> @find_sql
Enter value for sql_text: 
Enter value for address: 
Enter value for sql_id: 0qa98gcnnza7h
 
SQL_ID         CHILD  PLAN_HASH        EXECS         ETIME     AVG_ETIME USERNAME      SQL_TEXT
------------- ------ ---------- ------------ ------------- ------------- ------------- -----------------------------------------
0qa98gcnnza7h      0 3723858078            5        356.53         71.31 SYS           select avg(pk_col) from kso.skew where co
                                                                                       l1 > 0
 
0qa98gcnnza7h      1  568322376            1          7.92          7.92 SYS           select avg(pk_col) from kso.skew where co
                                                                                       l1 > 0
 
0qa98gcnnza7h      2  568322376           10         52.14          5.21 SYS           select avg(pk_col) from kso.skew where co
                                                                                       l1 > 0
 
0qa98gcnnza7h      3  568322376           30      1,064.19         35.47 KSO           select avg(pk_col) from kso.skew where co
                                                                                       l1 > 0
 
0qa98gcnnza7h      4 3723858078           10      4,558.62        455.86 KSO           select avg(pk_col) from kso.skew where co
                                                                                       l1 > 0
 
SQL> @awr_plan_change
Enter value for sql_id: 0qa98gcnnza7h
 
   SNAP_ID   NODE BEGIN_INTERVAL_TIME            SQL_ID        PLAN_HASH_VALUE        EXECS    AVG_ETIME        AVG_LIO
---------- ------ ------------------------------ ------------- --------------- ------------ ------------ --------------
      3206      1 02-OCT-08 08.00.38.743 AM      0qa98gcnnza7h       568322376            4       10.359      121,722.8
      3235      1 03-OCT-08 01.00.44.932 PM      0qa98gcnnza7h                            1       10.865      162,375.0
      3235      1 03-OCT-08 01.00.44.932 PM      0qa98gcnnza7h      3723858078            1      127.664   28,913,271.0
      3236      1 03-OCT-08 01.28.09.000 PM      0qa98gcnnza7h       568322376            1        7.924      162,585.0
      3236      1 03-OCT-08 01.28.09.000 PM      0qa98gcnnza7h      3723858078            1       86.682   27,751,123.0
      3305      1 06-OCT-08 10.00.11.988 AM      0qa98gcnnza7h                            4       64.138   22,616,931.5
      3305      1 06-OCT-08 10.00.11.988 AM      0qa98gcnnza7h       568322376            2        5.710       81,149.0
      3306      1 06-OCT-08 11.00.16.490 AM      0qa98gcnnza7h                            6        5.512      108,198.5
      3307      1 06-OCT-08 12.00.20.716 PM      0qa98gcnnza7h                            2        3.824       81,149.0
      3328      1 07-OCT-08 08.39.20.525 AM      0qa98gcnnza7h                           30       35.473      156,904.7
      3335      1 07-OCT-08 03.00.20.950 PM      0qa98gcnnza7h      3723858078           10      455.862   28,902,128.6
 
11 rows selected.
 
SQL> @dplan_awr
Enter value for sql_id: 0qa98gcnnza7h
Enter value for plan_hash_value: 
 
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 0qa98gcnnza7h
--------------------
select avg(pk_col) from kso.skew where col1 > 0
 
Plan hash value: 568322376
 
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       | 44497 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    11 |            |          |
|   2 |   TABLE ACCESS FULL| SKEW |    32M|   335M| 44497   (2)| 00:08:54 |
---------------------------------------------------------------------------
 
SQL_ID 0qa98gcnnza7h
--------------------
select avg(pk_col) from kso.skew where col1 > 0
 
Plan hash value: 3723858078
 
--------------------------------------------------
| Id  | Operation                    | Name      |
--------------------------------------------------
|   0 | SELECT STATEMENT             |           |
|   1 |  SORT AGGREGATE              |           |
|   2 |   TABLE ACCESS BY INDEX ROWID| SKEW      |
|   3 |    INDEX RANGE SCAN          | SKEW_COL1 |
--------------------------------------------------
 
Note
-----
   - rule based optimizer used (consider using cbo)
 
33 rows selected.

A couple of quick comments. The unstable_plan.sql script shows several statements that are switching plans. Most of them are sub-second response times though. The one that stands out is sql_id 0qa98gcnnza7h. It has been executed 62 times and one plan averages about 25 seconds while another averages about 314 seconds. The awr_plan_changes.sql script shows that this statement has been switching between two plans (3723858078 and 568322376). 3723858078 is the less efficient plan (doing 25M or so logical i/o’s per execution) while plan 568322376 is much better (doing only about 120K logical i/o’s per execution).

This is a typical pattern we see when bind variable peeking is an issue. Plans that flip flop back and forth between two or three options. This post is not specifically about bind variable peeking and so I won’t go into all the details (if you want more info just search on this site). But I will say that it is quite often the case that a single plan is acceptable regardless of the bind variables that are used (in this case the 568322376 plan). It is pretty easy to get Oracle to use a specific plan using an Outline, SQL Profile, or Baseline (depending on the version of Oracle you are using). And that’s exactly what we did in this case to resolve the issue.

Leave a Reply