Extremes

Last week I got to look at a version 9 Statspack Report from a database where the buffer cache was set to 32M. That’s right, 32M. I don’t think you can buy a thumb drive that small. They were using 16K blocks too, so what’s that:

       (32*1024*1024)/(16*1024) = 2,048 blocks

Not very many blocks for that database. Needless to say it wasn’t working very well for the multi-hour batch job they were trying to run.

Then over the weekend someone sent an AWR report from a version 10 database and it had a 60G buffer cache. That’s the biggest one I’ve ever seen!

So it was a pretty interesting week (from a geeky technical standpoint anyway)!

It reminded me of Muggsy Bogues guarding Shawn Bradley, or Manute Bol, or MJ, or just about anybody.

Anyone seen a bigger (or smaller) buffer cache recently?

I’d love to hear about it.

AWR DBtime script

Here’s a handy little script for showing when the database has been most busy. I wrote this one because I wanted to be able to identify the workload characteristics of an instance when it was at its “busiest”, so we could model our testing strategy “correctly”. So the first part of that problem was to identify when the instance was “busiest”. The DBtime in the AWR reports seemed like a good indicator of “busiest”, so there you go. The script calculates the DBtime for each snapshot using the lag function and then reports the top 30. The script could easily be modified to calculate across multiple snapshots if desired to give the busiest day for example, but I’ll leave that as an exercise for the reader. By the way, the calculation came from 11g statspack report. The AWR report is wrapped, but the statspack report is not. So I just modified the query to use the AWR tables instead of the Statspack tables and added the lag function. This script works on 10gR2 as well.

Here’s the script: dbtime.sql
And an example of its use:

SQL*Plus: Release 11.1.0.7.0 - Production on Tue May 5 10:08:58 2009

Copyright (c) 1982, 2008, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> @dbtime
Enter value for instance_number: 
Enter value for begin_snap_id: 
Enter value for end_snap_id: 

BEGIN_SNAP   END_SNAP BEGIN_TIMESTAMP                                INST      DBTIME
---------- ---------- ---------------------------------------- ---------- -----------
      8205       8206 27-APR-09 09.21.22.022 AM                         1       32.87
      8207       8208 27-APR-09 09.40.50.404 AM                         1       32.63
      8245       8246 28-APR-09 10.00.58.734 PM                         1        5.65
      8389       8390 04-MAY-09 10.00.26.563 PM                         1        5.62
      8325       8326 02-MAY-09 06.00.06.252 AM                         1        4.81
      8220       8221 27-APR-09 10.00.06.667 PM                         1        4.28
      8317       8318 01-MAY-09 10.00.18.928 PM                         1        3.14
      8230       8231 28-APR-09 08.00.17.951 AM                         1        2.80
      8293       8294 30-APR-09 10.00.56.784 PM                         1        2.37
      8349       8350 03-MAY-09 06.00.28.355 AM                         1        2.35
      8244       8245 28-APR-09 09.00.51.632 PM                         1        2.26
      8232       8233 28-APR-09 10.00.32.610 AM                         1        1.97
      8305       8306 01-MAY-09 10.00.07.398 AM                         1        1.93
      8292       8293 30-APR-09 09.00.50.806 PM                         1        1.86
      8268       8269 29-APR-09 09.00.29.463 PM                         1        1.82
      8269       8270 29-APR-09 10.00.35.137 PM                         1        1.80
      8388       8389 04-MAY-09 09.00.20.561 PM                         1        1.80
      8353       8354 03-MAY-09 10.00.51.838 AM                         1        1.32
      8316       8317 01-MAY-09 09.00.12.973 PM                         1        1.31
      8329       8330 02-MAY-09 10.00.29.899 AM                         1        1.22
      8348       8349 03-MAY-09 05.00.22.374 AM                         1        1.18
      8213       8214 27-APR-09 03.00.16.481 PM                         1        1.17
      8365       8366 03-MAY-09 10.00.02.746 PM                         1        1.17
      8357       8358 03-MAY-09 02.00.15.346 PM                         1        1.08
      8341       8342 02-MAY-09 10.00.40.654 PM                         1        1.04
      8337       8338 02-MAY-09 06.00.17.054 PM                         1        1.03
      8204       8205 27-APR-09 09.20.32.516 AM                         1        1.00
      8361       8362 03-MAY-09 06.00.39.298 PM                         1        1.00
      8202       8203 27-APR-09 09.17.02.145 AM                         1        0.97
      8333       8334 02-MAY-09 02.00.53.480 PM                         1        0.97

30 rows selected.

And here’s the output of the AWR report for the top period.


WORKLOAD REPOSITORY report for

DB Name         DB Id    Instance     Inst Num Startup Time    Release     RAC
------------ ----------- ------------ -------- --------------- ----------- ---
LAB111        1385947011 LAB111              1 22-Apr-09 14:19 11.1.0.7.0  NO

Host Name        Platform                         CPUs Cores Sockets Memory(GB)
---------------- -------------------------------- ---- ----- ------- ----------
homer            Linux IA (32-bit)                  16     4       4      31.73

              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:      8205 27-Apr-09 09:21:22        25       1.8
  End Snap:      8206 27-Apr-09 09:27:33        23       1.8
   Elapsed:                6.19 (mins)
   DB Time:               32.87 (mins)

Oracle 11g SQL Plan Management – SQL Plan Baselines

Note, this post applies to 11gR1 (specifically 11.1.0.7).

11g has a new method of dealing with plan instability (i.e. the tendency of a statement to flip flop between plans). The goal has always been to provide the best performance, but until 11g there has not been a mechanism that had as a design goal, the idea of eliminating backwards movement (“performance regressions” as the Oracle documentation calls them). That is, not allowing a statement to switch to a plan that is significantly slower than the one it has already been executing. This new mechanism depends on a construct called a Baseline. So for this post I am not going to focus on the process of evolving plans to prevent “performance regressions”, but rather on the Baselines themselves. (hopefully I’ll get to the plan management part soon)

Here’s a little excerpt from the 11gR1 documentation:

Each time a SQL statement is compiled, the optimizer first uses a cost-based search method to build a best-cost plan, then tries to find a matching plan in the SQL plan baseline. If a match is found, the optimizer will proceed using this plan. Otherwise, it evaluates the cost of each accepted plan in the SQL plan baseline and selects the plan with the lowest cost. The best-cost plan found by the optimizer that does not match any plans in the plan history for the SQL statement represents a new plan, and is added as a non-accepted plan to the plan history. The new plan is not used until it is verified to not cause a performance regression. However, if a change in the system (such as a dropped index) causes all accepted plans to become non-reproducible, the optimizer will select the best-cost plan. Thus, the presence of a SQL plan baseline causes the optimizer to use conservative plan selection strategy for the SQL statement.

Clear as mud?

Now my description:

Baselines are the latest evolution in Oracle’s efforts to allow a plan to be locked. The previous efforts were Outlines and SQL Profiles. Both of those efforts were based on the idea that hints could be applied to limit the optimizers choices down to one. That approach seems a little flawed. Why not just save the plan and be done with it? I believe that’s what Baselines are intended to do. Unfortunately, they don’t appear to do it yet. But they do have an advantage over Outlines and SQL Profiles in that they at least save the plan_hash_value, so they know if they are reproducing the correct plan or not. (see my previous post, Do Baselines Use Hints? – Take 2) And hopefully, they will evolve to the point where the actual plan is saved along with the hash value – we’ll see.

So anyway, here’s some basic housekeeping stuff on Baselines:

  • Baselines will be used by default in 11gR1, if they exist. There is a parameter to control whether they are used or not (OPTIMIZER_USE_SQL_PLAN_BASELINE). It is set to TRUE by default.
  • Baselines will not be created by default in 11gR1. So, much like with the older Outlines or SQL Profiles, you must do something to create them.
  • There is a view called DBA_SQL_PLAN_BASELINES that exposes the Baselines that have been created.
  • Just like Outlines and SQL Profiles, Baselines apply to all instances in a RAC environment (they are not localized to a specific instance).

Baselines can be created a couple of different ways. They can be created automatically by the database as it’s running by setting the OPTIMIZER_CAPTURE_SQL_PLAN_BASELINES parameter. They can also be created for statements in a SQL Tuning Set. But I’m not going to talk about either one of those approaches in this post, as I don’t think they really make much sense in the case where you want to use a Baseline to lock a plan for a specific statement. Instead, I am going to discuss creating a baseline for a statement that is in the shared pool via the DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE function.

All the function needs is a SQL_ID and a PLAN_HASH_VALUE. Optionally a parameter can be used to define the baseline as FIXED. If it’s FIXED then it gets priority over any other Baselines for that statement, except other FIXED Baselines. Confused? Well it’s not exactly the most straight forward setup. I’m kind of a simple guy, so at this point I’m thinking one FIXED Baseline is enough.

Before you get too bored, here’s a quick example:

SQL> select avg(pk_col) from kso.little_skew where col1 = 99320;

AVG(PK_COL)
-----------
     628922

SQL> @find_sql
Enter value for sql_text: %little%
Enter value for sql_id: 

SQL_ID         CHILD  PLAN_HASH      EXECS     AVG_ETIME      AVG_LIO SQL_TEXT
------------- ------ ---------- ---------- ------------- ------------ ------------------------------------------------------------
4dd2qjxbjav6u      0 2709260180          1           .02           89 select avg(pk_col) from kso.little_skew where col1 = 99320

SQL> @dplan
Enter value for sql_id: 4dd2qjxbjav6u
Enter value for child_no: 0

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  4dd2qjxbjav6u, child number 0
-------------------------------------
select avg(pk_col) from kso.little_skew where col1 = 99320

Plan hash value: 2709260180

-------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |       |       |     4 (100)|          |
|   1 |  SORT AGGREGATE              |                  |     1 |     9 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| LITTLE_SKEW      |     1 |     9 |     4   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | LITTLE_SKEW_COL1 |     1 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("COL1"=99320)

20 rows selected.

SQL> !cat create_baseline.sql
var ret number
exec :ret := dbms_spm.load_plans_from_cursor_cache(-
    sql_id=>'&sql_id', -
    plan_hashvalue=>&plan_hash_value,-
    fixed=>'&fixed');

SQL> @create_baseline
Enter value for sql_id: 4dd2qjxbjav6u
Enter value for plan_hash_value: 2709260180
Enter value for fixed: NO

PL/SQL procedure successfully completed.

SQL> !cat baselines.sql
set lines 155
col sql_text for a50 trunc
col last_executed for a28
col enabled for a7
col plan_hash_value for a16
col last_executed for a16
select spb.sql_handle, spb.plan_name, spb.sql_text,
spb.enabled, spb.accepted, spb.fixed,
to_char(spb.last_executed,'dd-mon-yy HH24:MI') last_executed
from
dba_sql_plan_baselines spb
where spb.sql_text like nvl('%'||'&sql_text'||'%',spb.sql_text)
and spb.sql_handle like nvl('&name',spb.sql_handle)
and spb.plan_name like nvl('&plan_name',spb.plan_name)
/

SQL> @baselines
Enter value for sql_text: %little%
Enter value for name:
Enter value for plan_name: 

SQL_HANDLE                     PLAN_NAME                      SQL_TEXT                                           ENABLED ACC FIX LAST_EXECUTED
------------------------------ ------------------------------ -------------------------------------------------- ------- --- --- ----------------
SYS_SQL_b77256b604589dec       SYS_SQL_PLAN_04589dec2fdfd157  select avg(pk_col) from kso.little_skew where col1 YES     YES NO

SQL> select avg(pk_col) from kso.little_skew where col1 = 99320;

AVG(PK_COL)
-----------
     628922

SQL> select avg(pk_col) from kso.little_skew where col1 = 99320;

AVG(PK_COL)
-----------
     628922

SQL> @find_sql
Enter value for sql_text:
Enter value for sql_id: 4dd2qjxbjav6u

SQL_ID         CHILD  PLAN_HASH      EXECS     AVG_ETIME      AVG_LIO SQL_TEXT
------------- ------ ---------- ---------- ------------- ------------ ------------------------------------------------------------
4dd2qjxbjav6u      0 2709260180          1           .02           89 select avg(pk_col) from kso.little_skew where col1 = 99320
4dd2qjxbjav6u      1 2709260180          1           .00            5 select avg(pk_col) from kso.little_skew where col1 = 99320

SQL> @dplan
Enter value for sql_id: 4dd2qjxbjav6u
Enter value for child_no: 1

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  4dd2qjxbjav6u, child number 1
-------------------------------------
select avg(pk_col) from kso.little_skew where col1 = 99320

Plan hash value: 2709260180

-------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |       |       |     4 (100)|          |
|   1 |  SORT AGGREGATE              |                  |     1 |     9 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| LITTLE_SKEW      |     1 |     9 |     4   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | LITTLE_SKEW_COL1 |     1 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("COL1"=99320)

Note
-----
   - SQL plan baseline SYS_SQL_PLAN_04589dec2fdfd157 used for this statement

24 rows selected.

So that’s a quick example of how easy it is to create a Baseline. Once a Baseline is created it behaves much like an Outline or SQL Profile, in that it will be applied to any SQL statement where the normalized text matches. It’s interesting to note that even though Outlines are way outdated in 11g, they still take precedence over Baselines. So if you have an Outline and a Baseline on the same statement, the Outline will be used. Here’s an example of that behavior:
Continue reading ‘Oracle 11g SQL Plan Management – SQL Plan Baselines’ »

Do SQL Plan Baselines Use Hints? – Take 2

Warning – This post is almost pure speculation!

You may have read a previous post where I said I thought Baselines didn’t use hints. We’ll I’ve changed my mind. I no longer think that Baselines store an actual plan for a SQL statement. (See my previous post – Do SQL Plan Baselines Use Hints?) I never could find a place where the plan data was stored (only the plan_hash_value). So I had a closer look at some more 10053 trace files. Here’s a quick run down of what I found. First the set up:

SQL> @flush_pool

System altered.

SQL> select avg(pk_col) from kso.little_skew where col1 = 1;

AVG(PK_COL)
-----------
 310999.357

SQL> @find_sql
Enter value for sql_text: %little%
Enter value for sql_id: 

SQL_ID         CHILD  PLAN_HASH      EXECS     AVG_ETIME      AVG_LIO SQL_TEXT
------------- ------ ---------- ---------- ------------- ------------ ------------------------------------------------------------
9df852xw86wxt      0 3498336203          1           .18        3,437 select avg(pk_col) from kso.little_skew where col1 = 1

SQL> @dplan
Enter value for sql_id: 9df852xw86wxt
Enter value for child_no: 0

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  9df852xw86wxt, child number 0
-------------------------------------
select avg(pk_col) from kso.little_skew where col1 = 1

Plan hash value: 3498336203

----------------------------------------------------------------------------------
| Id  | Operation          | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |             |       |       |   907 (100)|          |
|   1 |  SORT AGGREGATE    |             |     1 |     9 |            |          |
|*  2 |   TABLE ACCESS FULL| LITTLE_SKEW |   620K|  5449K|   907   (2)| 00:00:11 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("COL1"=1)


19 rows selected.

SQL> select /*+ index (a little_skew_col1) */ avg(pk_col) from kso.little_skew a where col1 = 1;

AVG(PK_COL)
-----------
 310999.357

SQL> @find_sql
Enter value for sql_text: %little%
Enter value for sql_id: 

SQL_ID         CHILD  PLAN_HASH      EXECS     AVG_ETIME      AVG_LIO SQL_TEXT
------------- ------ ---------- ---------- ------------- ------------ ------------------------------------------------------------
9df852xw86wxt      0 3498336203          1           .18        3,437 select avg(pk_col) from kso.little_skew where col1 = 1
fbnx1bws6n4xu      0 2709260180          1           .86        5,278 select /*+ index (a little_skew_col1) */ avg(pk_col) from ks
                                                                      o.little_skew a where col1 = 1


SQL> @dplan
Enter value for sql_id: fbnx1bws6n4xu
Enter value for child_no: 0

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  fbnx1bws6n4xu, child number 0
-------------------------------------
select /*+ index (a little_skew_col1) */ avg(pk_col) from
kso.little_skew a where col1 = 1

Plan hash value: 2709260180

-------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |       |       | 20835 (100)|          |
|   1 |  SORT AGGREGATE              |                  |     1 |     9 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| LITTLE_SKEW      |   620K|  5449K| 20835   (1)| 00:04:11 |
|*  3 |    INDEX RANGE SCAN          | LITTLE_SKEW_COL1 |   620K|       |  3213   (1)| 00:00:39 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("COL1"=1)


21 rows selected.

SQL> -- need to create baseline for original statement to get SQL_HANDLE
SQL>
SQL> !cat create_baseline.sql
var ret number
exec :ret := dbms_spm.load_plans_from_cursor_cache(-
    sql_id=>'&sql_id', -
    plan_hash_value=>&plan_hash_value,-
    fixed=>'&fixed');




SQL> @create_baseline
Enter value for sql_id: 9df852xw86wxt
Enter value for plan_hash_value: 3498336203
Enter value for fixed: NO

PL/SQL procedure successfully completed.

SQL> -- now add the other plan as a fixed Baseline
SQL>
SQL> !cat create_baseline2.sql
var ret number
exec :ret := dbms_spm.load_plans_from_cursor_cache(-
    sql_id=>'&sql_id', -
    plan_hash_value=>&plan_hash_value,-
    sql_handle=>'&sql_handle',-
    fixed=>'&fixed');


SQL> @create_baseline2
Enter value for sql_id: fbnx1bws6n4xu
Enter value for plan_hash_value: 2709260180
Enter value for sql_handle: SYS_SQL_19fcc3effcce06c0
Enter value for fixed: YES

PL/SQL procedure successfully completed.

SQL> @baselines
Enter value for sql_text: 
Enter value for name: 
Enter value for plan_name: 

SQL_HANDLE                     PLAN_NAME                      SQL_TEXT                                           ENABLED ACC FIX LAST_EXECUTED
------------------------------ ------------------------------ -------------------------------------------------- ------- --- --- ----------------
SYS_SQL_19fcc3effcce06c0       SYS_SQL_PLAN_fcce06c08b1420d2  select avg(pk_col) from kso.little_skew where col1 YES     YES NO
SYS_SQL_19fcc3effcce06c0       SYS_SQL_PLAN_fcce06c02fdfd157  select avg(pk_col) from kso.little_skew where col1 YES     YES YES

SQL> select avg(pk_col) from kso.little_skew where col1 = 1;

AVG(PK_COL)
-----------
 310999.357

SQL> /        

AVG(PK_COL)
-----------
 310999.357

SQL> -- check to see if it's using the fixed Baseline
SQL> @find_sql
Enter value for sql_text: %little%
Enter value for sql_id: 

SQL_ID         CHILD  PLAN_HASH      EXECS     AVG_ETIME      AVG_LIO SQL_TEXT
------------- ------ ---------- ---------- ------------- ------------ ------------------------------------------------------------
9df852xw86wxt      0 3498336203          2           .19        3,279 select avg(pk_col) from kso.little_skew where col1 = 1
fbnx1bws6n4xu      0 2709260180          1           .86        5,278 select /*+ index (a little_skew_col1) */ avg(pk_col) from ks
                                                                      o.little_skew a where col1 = 1


SQL> 
SQL> @dplan
Enter value for sql_id: 9df852xw86wxt
Enter value for child_no: 0

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  9df852xw86wxt, child number 0
-------------------------------------
select avg(pk_col) from kso.little_skew where col1 = 1

Plan hash value: 3498336203

----------------------------------------------------------------------------------
| Id  | Operation          | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |             |       |       |   907 (100)|          |
|   1 |  SORT AGGREGATE    |             |     1 |     9 |            |          |
|*  2 |   TABLE ACCESS FULL| LITTLE_SKEW |   620K|  5449K|   907   (2)| 00:00:11 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("COL1"=1)


19 rows selected.

SQL> -- no it's not using the Baseline, why not?

So we created a Fixed Baseline on a statement that used a hint to force an index based access path, attached it to another statement with no hints, and then re-ran the un-hinted statement. It did not pick up the Fixed Baseline. The next step was to have a look at a 10053 trace file. Here are some excerpts from the 10053 trace file.


...

SPM: statement found in SMB

...

******************************************
----- Current SQL Statement for this session (sql_id=9df852xw86wxt) -----
select avg(pk_col) from kso.little_skew where col1 = 1
*******************************************

...

SPM: cost-based plan found in the plan baseline, planId = 2333352146
SPM: fixed planId's of plan baseline are: 803197271
SPM: using qksan to reproduce, cost and select accepted plan, sig = 1872586980585244352
SPM: reproduce plan round 1, using session OFE=11010007
SPM: using qksan to reproduce accepted plan, planId = 803197271
Registered qb: SEL$1 0xf6d872ec (PARSER)
---------------------
QUERY BLOCK SIGNATURE
---------------------
  signature (): qb_name=SEL$1 nbfros=1 flg=0
    fro(0): flg=4 objn=90376 hint_alias="LITTLE_SKEW"@"SEL$1"

DOP: Automatic degree of parallelism is disabled: Parameter.
PM: Considering predicate move-around in query block SEL$1 (#0)
**************************
Predicate Move-Around (PM)
**************************
OPTIMIZER INFORMATION

******************************************
----- Current SQL Statement for this session (sql_id=7039h2744jjrv) -----
/* SQL Analyze(110,0) */ select avg(pk_col) from kso.little_skew where col1 = 1
*******************************************

...

SPM: planId in plan baseline = 803197271, planId of reproduced plan = 2333352146
------- START SPM Plan Dump -------
SPM: failed to reproduce the plan using the following info:
  parse_schema name        : SYS
  plan_baseline signature  : 1872586980585244352
  plan_baseline plan_id    : 803197271
  plan_baseline hintset    :
    hint num  1 len 27 text: IGNORE_OPTIM_EMBEDDED_HINTS
    hint num  2 len 37 text: OPTIMIZER_FEATURES_ENABLE('11.1.0.7')
    hint num  3 len 22 text: DB_VERSION('11.1.0.7')
    hint num  4 len  8 text: ALL_ROWS
    hint num  5 len 22 text: OUTLINE_LEAF(@"SEL$1")
    hint num  6 len 57 text: INDEX_RS_ASC(@"SEL$1" "A"@"SEL$1" ("LITTLE_SKEW"."COL1"))
SPM: generated non-matching plan:
----- Explain Plan Dump -----
----- Plan Table -----

============
Plan Table
============
-----------------------------------------+-----------------------------------+
| Id  | Operation           | Name       | Rows  | Bytes | Cost  | Time      |
-----------------------------------------+-----------------------------------+
| 0   | SELECT STATEMENT    |            |       |       |   907 |           |
| 1   |  SORT AGGREGATE     |            |     1 |     9 |       |           |
| 2   |   TABLE ACCESS FULL | LITTLE_SKEW|  605K | 5449K |   907 |  00:00:11 |
-----------------------------------------+-----------------------------------+
Predicate Information:
----------------------
2 - filter("COL1"=1)

Content of other_xml column
===========================
  db_version     : 11.1.0.7
  parse_schema   : SYS
  plan_hash      : 3498336203
  plan_hash_2    : 2333352146
  Outline Data:
  /*+
    BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.1.0.7')
      DB_VERSION('11.1.0.7')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "LITTLE_SKEW"@"SEL$1")
    END_OUTLINE_DATA
  */

------- END SPM Plan Dump -------


So it appears that the Baseline is found in the SMB (SQL Management Base). It then appears to try to reproduce the plan in a separate session using the hints. It then announces that it came up with a different plan and spits out the plan data that it came up with. So it appears that the plan_hash_value is just used for comparison to see if it came up with the same plan as the one that was in place when the Baseline was generated. If not, then it falls back to the regular (no Baseline) behavior. Not at all what I had hoped for!

It may be that 11gR1 is an intermediate step and that a future version (11gR2 maybe, I hope) will actually store the plan details instead of just the plan hash value. But as I said in the intro, this is complete speculation on my part.

Hidden SQL – why can’t I find my SQL Text?

I recently ran into an interesting issue that drove me crazy for a couple of days! A very frequently executed statement on a database was generating some waits on “SQL*Net more data from client”. ASH shows the statement, as does the V$SESSION view, but the statement never shows up in the V$SQL view nor in the AWR tables. Here’s a brief run down of how the investigation went.

The reason for looking at this in the first place was that the application was timing out for brief periods of time. So a quick look at an AWR report for the time period showed a significant amount of time being spent waiting on the “SQL*Net more data from client” event. This event was taking a much larger slice of the pie than we saw during periods of “normal” activity. Here’s a little excerpt from one of the AWR reports:

Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                           Avg
                                                          wait   % DB
Event                                 Waits     Time(s)   (ms)   time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
DB CPU                                            2,045          57.7
SQL*Net more data from client       165,760         868      5   24.5 Network
db file sequential read              58,083         449      8   12.7 User I/O
log file sync                        33,656          85      3    2.4 Commit
direct path read                     22,026          42      2    1.2 User I/O


...

                                                    % of Waits
                                 -----------------------------------------------
                           Total
Event                      Waits  <1ms  <2ms  <4ms  <8ms <16ms <32ms  <=1s   >1s
-------------------------- ----- ----- ----- ----- ----- ----- ----- ----- -----
SQL*Net break/reset to cli  833   95.1    .6    .2          .1   3.6    .4
SQL*Net message to client  7537K 100.0    .0
SQL*Net more data from cli  165K  87.5    .4    .1    .0    .0   7.1   4.8
SQL*Net more data to clien  410K 100.0    .0

Obviously this is not good. This is on an 11g system so the AWR report provides a nice little histogram section on the wait events. The histogram indicates that most of these events are under 1ms, but a significant percentage are in the 16ms – 1sec range. So the question was, what could be causing those waits.

No problem, just check the Active Session History, find the sql_id for the statement(s) waiting on this event, and then look in V$SQL for the text. But not so fast bucko!

SQL> !cat find_sql_waiting_on_event.sql
select event, sql_id, count(*),
avg(time_waited) avg_time_waited
from v$active_session_history
where event like nvl('&event','%more data from%')
group by event, sql_id
order by event, 3
/

SQL> @find_sql_waiting_on_event
Enter value for event: 

EVENT                          SQL_ID          COUNT(*)
------------------------------ ------------- ----------
SQL*Net more data from client  dkc2kbj7kdsu7         21
SQL*Net more data from client                        62
SQL*Net more data from client  40du2847gh6sq        156
SQL*Net more data from client  7vtb7h3kpfat0       2965


SQL> -- Ah there's the culpret, 7vtb7h3kpfat0
SQL>
SQL> @find_sql
Enter value for sql_text:
Enter value for sql_id: 7vtb7h3kpfat0
Enter value for plan_hash_value:

no rows selected

SQL> -- hmm, must have aged out of the shared pool (seems unlikely since ASH has it so many times)
SQL> -- but maybe it's in the AWR tables
SQL>
SQL> @find_sql_awr
Enter value for sql_text:
Enter value for sql_id: 7vtb7h3kpfat0

no rows selected

SQL> -- nope, not there either

So I tried a couple of things to see if I could prove to myself that it was never in V$SQL (and thus never in the AWR tables). First I tried coloring the statement with the dbms_workload_repository.add_colored_sql procedure. (See Dion Cho’s post on this feature for more info). No luck! Then I tried using Tanel Poder’s sample.sql script (more info on that here) to run a bunch of queries against the V$SQLAREA view to see if the statement ever showed up. It did not! I then traced (10046) a bunch of the sessions that were executing the statement, hoping that the trace files would have the statement text. They did not, nor was the SQL_ID even contained in any of the trace files!

Turns out it is never in the shared pool and it is never in the AWR tables. But it is captured by ASH and it shows up in the SQL_ID column in V$SESSION. Here’s a little more output from the system:

SQL> select username, sid, serial#, status, sql_id, event, seconds_in_wait
  2  from v$session
  3  where username like nvl('&username',username)
  4  and sid like nvl('&sid',sid)
  5  and sql_id='7vtb7h3kpfat0'
  6* order by username, sid, serial#;

Enter value for username:
Enter value for sid:
 
USERNAME                     SID    SERIAL# STATUS   SQL_ID        EVENT                          SECONDS_IN_WAIT
------------------------- ------ ---------- -------- ------------- ------------------------------ ---------------
XX_OWNER                    4661       4929 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client                823
XX_OWNER                    4676       3835 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client                  0
XX_OWNER                    4682       9170 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client                 16
XX_OWNER                    4699      11522 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client                537
XX_OWNER                    4712       9592 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client                 93
XX_OWNER                    4717       6659 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client                814
XX_OWNER                    4727      25467 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client                155
XX_OWNER                    4739       8796 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client                 17
XX_OWNER                    4755       7327 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client                  9
XX_OWNER                    4800       4002 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client                 11
XX_OWNER                    4826      15699 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client                153
XX_OWNER                    4838       6283 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client                 68
XX_OWNER                    4849      11468 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client               1052
XX_OWNER                    5089       8610 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client                  1
XX_OWNER                    5091       7709 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client                 31
XX_OWNER                    5168       4728 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client                585
XX_OWNER                    5185       8298 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client                157
XX_OWNER                    5186       6969 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client               1197
XX_OWNER                    5210      10830 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client                  0
XX_OWNER                    5232       6509 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client                  1
XX_OWNER                    5251      12231 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client                155
XX_OWNER                    5400       5217 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client                697
XX_OWNER                    5423      12903 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client                  5
XX_OWNER                    5434      14154 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client                191
XX_OWNER                    5441      33222 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client                  3
XX_OWNER                    5442       5669 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client                217

26 rows selected.

SQL> -- so there are a lot of sessions that execute our statement and then wait
SQL> -- (sometimes for a long time) before the user requests another action
SQL>
SQL> -- Using Tanel's sample.sql script - does about 1000 queries against V$SQLAREA in a sec or so
SQL>
SQL> @sample sql_text v$sqlarea "sql_id = '7vtb7h3kpfat0'" 1000

no rows selected

SQL> -- doesn't look like we'll ever catch it in the shared pool

Grasping at straws, I started looking for any views that had a sql_id and/or sql_text field, and I found one that I hadn’t thought to look in – V$OPEN_CURSOR. Ah, that’s probably a good place to look since surely a cursor must be created for this statement. Sure enough, this turned out to provide the necessary clue.

SQL> select distinct * from v$open_cursor
  2  where rownum < 10
  3* and sql_id like '%fat%';

SADDR               SID USER_NAME                      ADDRESS          HASH_VALUE SQL_ID        SQL_TEXT                                  LAST_SQL_
---------------- ------ ------------------------------ ---------------- ---------- ------------- ----------------------------------------- ---------
SQL_EXEC_ID
-----------
0000000210ED3980   4750 XX_OWNER                       000000021AFEB748 3847695136 7vtb7h3kpfat0 table_4_9_fbd5_0_0_0

00000002111EC2E8   5043 SIGNON                         0000000219E02E80 1527693606 6pfatm1dhxh96 SELECT  u.app_user_id, u.username, u.pass
                                                                                                 word, u.first_name,

000000021163BDA8   5451 XX_OWNER                       000000021AFEB748 3847695136 7vtb7h3kpfat0 table_4_9_fbd5_0_0_0

00000002114067E0   5242 XX_OWNER                       000000021AFEB748 3847695136 7vtb7h3kpfat0 table_4_9_fbd5_0_0_0

00000002114372F0   5260 SIGNON                         0000000219E02E80 1527693606 6pfatm1dhxh96 SELECT  u.app_user_id, u.username, u.pass
                                                                                                 word, u.first_name,

0000000210F06FD8   4769 XX_OWNER                       000000021AFEB748 3847695136 7vtb7h3kpfat0 table_4_9_fbd5_0_0_0

0000000211439E38   5261 SIGNON                         0000000219E02E80 1527693606 6pfatm1dhxh96 SELECT  u.app_user_id, u.username, u.pass
                                                                                                 word, u.first_name,

00000002111E97A0   5042 SIGNON                         0000000219E02E80 1527693606 6pfatm1dhxh96 SELECT  u.app_user_id, u.username, u.pass
                                                                                                 word, u.first_name,

000000021161E190   5440 XX_OWNER                       000000021AFEB748 3847695136 7vtb7h3kpfat0 table_4_9_fbd5_0_0_0


9 rows selected.

So the SQL_ID was present in the V$OPEN_CURSOR view and there was a value in the SQL_TEXT field. Unfortunately, it didn't provide the text of the original statement, but at least I now had something to work with. A quick google for table_4_9_fbd5_0_0_0 (which is what was in the SQL_TEXT field) turned up nothing, of course. But a little digging turned up this post on Ask Tom. You'll have to read down a ways in that post to find that the real culpret was LOBs. Also, see this post on the Oracle-l list (which is what I originally found that pointed me to the Ask Tom article).

So it appears that the issue is that LOBs are being accessed, which causes Oracle to generate a new SQL_ID for the LOB access part of the execution. Apparently, these types of statements do not get exposed via the V$SQL and V$SQLAREA views and thus do not get captured by AWR. There are other types of statements that exhibit this behavior as well, by the way. The Tom Kyte post talks about nested tables for example. So back to the question, what statement generated the SQL_ID with the missing sql_text in the first place? We'll the first clue came from the Tom Kyte post. Apparently the middle bit of the text in the V$OPEN_CURSOR view is a hex representation of the object_id of the object that is being accessed (in our case, "fbd5"). So if we convert it to decimal we can find the object that is contains the LOB, like so:

SQL> select owner, object_name, object_type
  2  from dba_objects
  3  where object_id = (select to_number('&hex_value','XXXXXX') from dual);
Enter value for hex_value: fbd5

OWNER                          OBJECT_NAME                    OBJECT_TYPE
------------------------------ ------------------------------ -------------------
XX_OWNER                       XXX_MY_TABLE                   TABLE

So now I know that I'm looking for a statement that is accessing XXX_MY_TABLE (note: names have been changed to protect the innocent). But how to find the statement, that's the question that was never answered in the Tom Kyte post. It was at about this point, while I was filling my coffee cup, that I had one of those ah-ha moments (usually I have them when I'm doing something else - not even actively thinking about the problem). Anyway, I got to thinking about the PREV_SQL_ID in the V$SESSION view. If a statement that accesses a LOB, immediately generates a new SQL_ID for the LOB access part of the execution, perhaps the original statement's SQL_ID would be in the PREV_SQL_ID field. And that's exactly what I found.


SQL> select username, sid, serial#, status, sql_id, prev_sql_id, event, seconds_in_wait
  2  from v$session
  3  where username like nvl('&username',username)
  4  and sid like nvl('&sid',sid)
  5  and sql_id='7vtb7h3kpfat0'
  6* order by username, sid, serial#;

Enter value for username:
Enter value for sid:


USERNAME                     SID    SERIAL# STATUS   SQL_ID        PREV_SQL_ID   EVENT                          SECONDS_IN_WAIT
------------------------- ------ ---------- -------- ------------- ------------- ------------------------------ ---------------
XX_OWNER                    4665       8707 INACTIVE 7vtb7h3kpfat0 78jdubkb65p2w SQL*Net message from client               1235
XX_OWNER                    4745       9383 INACTIVE 7vtb7h3kpfat0 78jdubkb65p2w SQL*Net message from client                  1
XX_OWNER                    4792      23304 INACTIVE 7vtb7h3kpfat0 78jdubkb65p2w SQL*Net message from client               1235
XX_OWNER                    4816      11098 INACTIVE 7vtb7h3kpfat0 78jdubkb65p2w SQL*Net message from client                  0
XX_OWNER                    4844       3050 INACTIVE 7vtb7h3kpfat0 78jdubkb65p2w SQL*Net message from client                773
XX_OWNER                    4907       8184 INACTIVE 7vtb7h3kpfat0 78jdubkb65p2w SQL*Net message from client                232
XX_OWNER                    4959      35967 INACTIVE 7vtb7h3kpfat0 78jdubkb65p2w SQL*Net message from client                633
XX_OWNER                    5154      33701 INACTIVE 7vtb7h3kpfat0 78jdubkb65p2w SQL*Net message from client                 25
XX_OWNER                    5230      12792 INACTIVE 7vtb7h3kpfat0 78jdubkb65p2w SQL*Net message from client                510
XX_OWNER                    5248      49715 INACTIVE 7vtb7h3kpfat0 78jdubkb65p2w SQL*Net message from client                456
XX_OWNER                    5249      32300 INACTIVE 7vtb7h3kpfat0 78jdubkb65p2w SQL*Net message from client                147
XX_OWNER                    5282      21331 INACTIVE 7vtb7h3kpfat0 78jdubkb65p2w SQL*Net message from client                182
XX_OWNER                    5285        386 INACTIVE 7vtb7h3kpfat0 78jdubkb65p2w SQL*Net message from client                185
XX_OWNER                    5295        792 INACTIVE 7vtb7h3kpfat0 78jdubkb65p2w SQL*Net message from client                 40
XX_OWNER                    5416      49780 INACTIVE 7vtb7h3kpfat0 78jdubkb65p2w SQL*Net message from client                  7
XX_OWNER                    5419      22645 INACTIVE 7vtb7h3kpfat0 78jdubkb65p2w SQL*Net message from client                890
XX_OWNER                    5442       6200 INACTIVE 7vtb7h3kpfat0 78jdubkb65p2w SQL*Net message from client               1326

17 rows selected.

SQL> @find_sql_long
Enter value for sql_id: 78jdubkb65p2w

SQL_TEXT
----------------------------------------------------------------------
SELECT  XYZ_COLUMN, BLAH ... FROM XXX_MY_TABLE ...


SQL> @find_sql
Enter value for sql_text:
Enter value for sql_id: 78jdubkb65p2w

SQL_ID         CHILD  PLAN_HASH      EXECS     AVG_ETIME      AVG_LIO SQL_TEXT
------------- ------ ---------- ---------- ------------- ------------ -----------------------------------------
78jdubkb65p2w      0  147781538  9,534,632         .0006           11 SELECT  XYZ_COLUMN, BLAH
78jdubkb65p2w      1  147781538        191         .0005           11

One last comment (and then I'll shut up):

I was surprised to find that the statement was short - only a few lines long. I was expecting a really long string that would not fit in a single SQL*Net packet. After all, we are waiting on the "SQL*Net more data FROM client" event, right? Which is supposed to be the time waited for an additional packet from the client when Oracle knows that more is coming. Clearly this was not the case. Rather it is waiting because Oracle is sending data back to the client that takes more than one packet (i.e. the LOB). You would expect that time would be logged under the "SQL*Net more data TO client" event in this case. However, it appears that time is logged to that event only until the packet is turned over to the network. It appears that Oracle then starts logging time to the "SQL*Net more data FROM client" event while it waits for the next response from the client. Therefore, the actual network transfer time is clocked to the wrong event. You can understand why the developers made this choice, because Oracle has no way of knowing how long it actually takes the packet to get to the client. It only knows how long the round trip takes. So maybe they should have named the event slightly differently, like "SQL*Net more data from client including the entire round trip network time" or something. Anyway, Tanel Poder has a post that briefly mentions this quirk of the SQL*Net events.

Note: this is somewhat speculative on my part as I have not tried to prove exactly where the "TO" event stops and the "FROM" event begins. But I am pretty sure Tanel is correct on this point.

So there you have it. Just a little story about wandering through the weeds on the way to enlightenment.

=============================

Note: This part was added after Dion Cho commented that he wasn't able to re-produce the behavior, so I gave it shot (I got close). Here's the output:

> sqlplus / as sysdba

SQL*Plus: Release 11.1.0.7.0 - Production on Mon Apr 20 14:59:25 2009

Copyright (c) 1982, 2008, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> -- session (SID 124) connected as KSO issues "select xml_clob from test_lob"

...


SQL> -- what's going on?
SQL> @sessions
Enter value for username: 
Enter value for sid: 

USERNAME                   SID    SERIAL# STATUS   SQL_ID        EVENT                          SECONDS_IN_WAIT
------------------------- ---- ---------- -------- ------------- ------------------------------ ---------------
KSO                        124       2402 ACTIVE   cu98u9ukjqj19 SQL*Net message from client                  0
SYS                        128       3580 ACTIVE   bsbmyw5thg8qf SQL*Net message to client                    0

SQL> -- KSO is running cu98u9ukjqj19, what's that doing?
SQL> @find_sql
Enter value for sql_text: 
Enter value for sql_id: cu98u9ukjqj19

no rows selected

SQL> -- so that's probably the LOB access, let's check 
SQL> select * from v$open_cursor where sql_id like 'cu98u9ukjqj19';

SADDR     SID USER_NAME               ADDRESS  HASH_VALUE SQL_ID        SQL_TEXT                              LAST_SQL_ SQL_EXEC_ID
-------- ---- ----------------------- -------- ---------- ------------- ------------------------------------- --------- -----------
DB1A69C8  124 KSO                     E0BD4C0C 2770027561 cu98u9ukjqj19 table_4_9_19a1d_0_0_0



SQL> -- yep, LOB access, let' see if we can find the original statement
SQL> @sessionsp
Enter value for username: 
Enter value for sid: 

USERNAME                   SID    SERIAL# STATUS   SQL_ID        PREV_SQL_ID   EVENT                          SECONDS_IN_WAIT
------------------------- ---- ---------- -------- ------------- ------------- ------------------------------ ---------------
KSO                        124       2402 ACTIVE   cu98u9ukjqj19 9babjv8yq8ru3 direct path read                             0
SYS                        128       3580 ACTIVE   grdgy5dcmqd0f 9babjv8yq8ru3 SQL*Net message to client                    0

SQL> -- says prev statement is 9babjv8yq8ru3
SQL> @find_sql
Enter value for sql_text: 
Enter value for sql_id: 9babjv8yq8ru3

SQL_ID         CHILD  PLAN_HASH      EXECS     AVG_ETIME      AVG_LIO SQL_TEXT
------------- ------ ---------- ---------- ------------- ------------ ------------------------------------------------------------
9babjv8yq8ru3      1          0         39           .01           54 BEGIN DBMS_OUTPUT.GET_LINES(:LINES, :NUMLINES); END;

SQL> -- oops, SQL*Plusism - need to turn off serveroutput

/*
-- go to other session here and set serveroutput off and re-run select statement
*/

SQL> -- wait a little while until the statement finishes 
SQL> @sessionsp
Enter value for username: 
Enter value for sid: 

USERNAME                   SID    SERIAL# STATUS   SQL_ID        PREV_SQL_ID   EVENT                          SECONDS_IN_WAIT
------------------------- ---- ---------- -------- ------------- ------------- ------------------------------ ---------------
KSO                        124       2402 INACTIVE               djqbj30qjyngw SQL*Net message from client                 74
SYS                        128       3580 ACTIVE   grdgy5dcmqd0f 9babjv8yq8ru3 SQL*Net message to client                    0

SQL> -- note that the SQL_ID field has been wiped after the statement completes - need to look a little closer at this behavior
SQL> -- the java application I was looking at in production did not cause this behavior - i.e. the SQL_ID field would not be cleared
SQL> -- so finding the SQL_ID for the LOB access and the PREV_SQL_ID for the original statement didn't happen in my test case 
SQL> -- although the PREV_SQL_ID field does contain the value for the original statement
 
SQL> @find_sql
Enter value for sql_text: 
Enter value for sql_id: djqbj30qjyngw

SQL_ID         CHILD  PLAN_HASH      EXECS     AVG_ETIME      AVG_LIO SQL_TEXT
------------- ------ ---------- ---------- ------------- ------------ ------------------------------------------------------------
djqbj30qjyngw      0  825212230          2          2.61       53,065 select xml_clob from test_lob