I’ll Gladly Pay You Tuesday for a Hamburger Today

Those of you who are old enough to remember disco dancing will probably remember this guy. Wimpy (J. Wellington Wimpy actually) was a lazy, glutinous character in the old Popeye cartoons.

I was reminded of Wimpy last week by a performance problem that occurred on an 11g database I was looking at (I’ll tell you why in a minute). So here’s what happened. There were several statements that were running much more slowly than usual (10′s of seconds instead of less than a second). So I started poking around and here’s what I saw:

(note that I cleaned this up a bit to protect the innocent)

(also note that many of the scripts I use have been documented in other posts – so if I haven’t shown the SQL for a particular script, try searching for it on this site)

SQL> @find_sql_acs
Enter value for sql_text:
Enter value for sql_id: 3dhvsp0duttat
Enter value for is_bind_aware:
 
SQL_ID         CHILD PLAN_HASH_VALUE IOB IBS IBA ISH      EXECS ROWS_PROCESSED  AVG_ETIME    AVG_CPU    AVG_PIO    AVG_LIO SQL_TEXT
------------- ------ --------------- --- --- --- --- ---------- -------------- ---------- ---------- ---------- ---------- ------------------
3dhvsp0duttat      0      3289235442 N   N   N   Y            9             46       0.02       0.00       1.11        272 SELECT row_order, 
3dhvsp0duttat      1      3289235442 N   N   N   Y           13             48       0.08       0.01      12.62        546 SELECT row_order, 
3dhvsp0duttat      2      4076066623 N   N   N   Y           15             31      38.07       6.55  27,176.53  2,426,778 SELECT row_order, 
 
 
SQL> @dplan
Enter value for sql_id: 3dhvsp0duttat
Enter value for child_no: 0
 
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  3dhvsp0duttat, child number 0
-------------------------------------
SELECT row_order, ...
 
Plan hash value: 3289235442
 
--------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                   | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                        |       |       |     5 (100)|          |       |       |
|*  1 |  VIEW                                 |                        |     1 |   586 |     5  (20)| 00:00:01 |       |       |
|*  2 |   WINDOW SORT PUSHED RANK             |                        |     1 |   258 |     5  (20)| 00:00:01 |       |       |
|*  3 |    FILTER                             |                        |       |       |            |          |       |       |
|   4 |     PARTITION RANGE ITERATOR          |                        |     1 |   258 |     4   (0)| 00:00:01 |   KEY |   KEY |
|*  5 |      TABLE ACCESS BY LOCAL INDEX ROWID| REALLY_BIG_TABLE       |     1 |   258 |     4   (0)| 00:00:01 |   KEY |   KEY |
|*  6 |       INDEX RANGE SCAN                | GOOD_INDEX             |     1 |       |     3   (0)| 00:00:01 |   KEY |   KEY |
--------------------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter(...)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "DATE_COL")<=:25)
   3 - filter((TO_DATE(:1,:2)<=TO_DATE(:3,:4) AND :24<=:25))
   5 - filter(...)
   6 - access("COL1" LIKE :21 AND "DATE_COL">=TO_DATE(:1,:2) AND "DATE_COL"<=TO_DATE(:3,:4))
 
SQL> @dplan
Enter value for sql_id: 3dhvsp0duttat
Enter value for child_no: 2
 
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  3dhvsp0duttat, child number 2
-------------------------------------
SELECT row_order, ...
 
Plan hash value: 4076066623
 
--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                  | Rows  | Bytes | Cost (%CPU)| Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                       |       |       |     1 (100)|       |       |
|*  1 |  VIEW                                 |                       |     1 |   586 |     0   (0)|       |       |
|*  2 |   WINDOW NOSORT STOPKEY               |                       |     1 |   574 |     0   (0)|       |       |
|*  3 |    FILTER                             |                       |       |       |            |       |       |
|   4 |     PARTITION RANGE ITERATOR          |                       |     1 |   574 |     0   (0)|   KEY |   KEY |
|*  5 |      TABLE ACCESS BY LOCAL INDEX ROWID| REALLY_BIG_TABLE      |     1 |   574 |     0   (0)|   KEY |   KEY |
|*  6 |       INDEX RANGE SCAN                | BAD_INDEX             |     1 |       |     0   (0)|   KEY |   KEY |
--------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter(("ROW_ORDER">=:24 AND "ROW_ORDER"<=:25))
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "DATE_COL")<=:25)
   3 - filter((TO_DATE(:1,:2)<=TO_DATE(:3,:4) AND :24<=:25))
   5 - filter(...)
   6 - access("DATE_COL">=TO_DATE(:1,:2) AND "DATE_COL"<=TO_DATE(:3,:4))
 
SQL> -- Uh Oh, - cost of 1, no records expected 
SQL>
SQL> @child_load_time
Enter value for sql_text:
Enter value for sql_id: 3dhvsp0duttat
 
SQL_ID         CHILD  PLAN_HASH LAST_LOAD_TIME       EXECS     AVG_ETIME    AVG_PIO      AVG_LIO SQL_TEXT
------------- ------ ---------- --------------- ---------- ------------- ---------- ------------ -----------------------------------------
3dhvsp0duttat      0 3289235442 2009-05-01/12:04:20      9          0.02          1          272 SELECT row_order, ... 
3dhvsp0duttat      1 3289235442 2009-05-05/19:43:14     13          0.08         13          546
3dhvsp0duttat      2 4076066623 2009-05-07/16:49:36     15         38.07     27,177    2,426,778
 
 
SQL> -- so it looks like maybe something bad happened with stats on 5/7
SQL> -- let's make use of the diif_table_stats procedure to see if we can spot what happened
 
SQL> !date
Tue May 12 11:59:22 CDT 2009
 
SQL> !cat diff_table_stats.sql
select report, maxdiffpct from
table(dbms_stats.diff_table_stats_in_history('&owner','&table_name',
systimestamp-&days_ago))
/
 
SQL> @diff_table_stats.sql
Enter value for owner: XXX
Enter value for table_name: REALLY_BIG_TABLE
Enter value for days_ago: 5
 
REPORT                                                                           MAXDIFFPCT
-------------------------------------------------------------------------------- ----------
###############################################################################
 
STATISTICS DIFFERENCE REPORT FOR:
.................................
 
TABLE         : REALLY_BIG_TABLE
OWNER         : XXX
SOURCE A      : Statistics as of 07-MAY-09 11.46.36.000000 AM -05:00
SOURCE B      : Current Statistics in dictionary
PCTTHRESHOLD  : 10
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
TABLE / (SUB)PARTITION STATISTICS DIFFERENCE:
.............................................
 
OBJECTNAME                  TYP SRC ROWS       BLOCKS     ROWLEN     SAMPSIZE
...............................................................................
 
P2009W19                    P   A   0          0          0          0
                                B   4001801    187810     342        4001801
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
COLUMN STATISTICS DIFFERENCE:
.............................
 
COLUMN_NAME     SRC NDV     DENSITY    HIST NULLS   LEN  MIN   MAX   SAMPSIZ
...............................................................................
 
 
                               PARTITION: P2009W19
                               ...................
 
COL1            A   0       0          NO   0       0      0
                B   105136  .000009511 NO   2565391 4    302E3 6E303 1436410
COL2            A   0       0          NO   0       0      0
                B   669696  .000001493 NO   1011264 9    2A2A2 554E4 2990537
COL3            A   0       0          NO   0       0      0
                B   988     .001012145 NO   1113316 4    30303 39393 2888485
COL4            A   0       0          NO   0       0      0
                B   669696  .000037461 YES  0       11   2A2A2 554E4 5540
...
 
SQL> -- Sure enough, stats were created when the partition was empty, ouch!

So a couple of things to note here:

1. the table is partitioned
2. dbms_stats.diff_table_stats_in_history is smart enough to show us partition stats (nice)
3. on May 7th the stats changed dramatically
4. this statement was not re-parsed due to the stats changing

So what’s the fix? Well since stats had been generated on this partition every night for the last 5 days, the fix was just to get the optimizer to re-parse my query (and any other affected statements). In 11g this is very easy as there is a built function (psuedo documented) to flush a single SQL statement from the shared pool (see my previous post – Flush A Single SQL Statement – for details). However, I wasn’t sure how many statements were affected (more on that in a minute). But certainly any statement that touched that partitioned table which got parsed on May 7th was almost certain to have a bad plan. So eventually we decided to do the brute force approach and just flush the entire shared pool. (I know, not elegant, but fairly effective) Note that there is no guarantee that flushing the shared pool will clear all the statements. In fact, on a busy system you’re almost guaranteed that it won’t flush every statement. But it did clear up most of the issues. So with the pressure now off (and the system back to running pretty well) I had time to write a little script to look for SQL statements with a cost of 1. (I guess I could have looked for statements which were parsed on May 7th that accessed my partitioned table, but I didn’t.)

Here’s a bit of the SQL*Plus Session:

SQL>-- initially we flushed a couple directly
SQL>
SQL> @flush_sql
Enter value for sql_id: 3dhvsp0duttat
 
PL/SQL procedure successfully completed.
 
SQL> -- so we can find all of these guys and flush them - or just flush the whole shared pool
SQL> -- flushing the shared pool is easier!
SQL>
SQL> alter system flush shared_pool;
 
System altered.
 
SQL> -- but there are still no guarantees they are all cleaned up, let's check
SQL>
SQL> !cat find_sql_by_cost.sql
select distinct parsing_schema_name, sa.sql_id, sp.plan_hash_value, cost
from v$sql_plan sp, v$sqlarea sa
where sp.sql_id = sa.sql_id
and parsing_schema_name like nvl('&parsing_user',parsing_schema_name)
and cost like nvl(to_number('&cost'),to_number(cost))
and sp.plan_hash_value like nvl('&plan_hash_value',sp.plan_hash_value)
and parent_id is null
order by 4 desc
/
 
SQL> @find_sql_by_cost
Enter value for parsing_user:
Enter value for cost: 1
Enter value for plan_hash_value: 407%
 
PARSING_SCHEMA_NAME            SQL_ID        PLAN_HASH_VALUE       COST
------------------------------ ------------- --------------- ----------
XXXX_PROD                      02t2tt28xp72m      4076066623          1
XXXX_PROD                      5byphafyx78j6      4076066623          1
 
SQL> -- note I did a little checking here to make sure they were suffering from the same issue
SQL> -- then I cleaned them up
SQL> 
SQL>
SQL> @flush_sql
Enter value for sql_id: 02t2tt28xp72m
 
PL/SQL procedure successfully completed.
 
SQL> @find_sql
Enter value for sql_text:
Enter value for sql_id: 02t2tt28xp72m
 
no rows selected
 
SQL> @flush_sql
Enter value for sql_id: 5byphafyx78j6
 
PL/SQL procedure successfully completed.
 
SQL> @find_sql
Enter value for sql_text:
Enter value for sql_id: 5byphafyx78j6
 
no rows selected

So the short term fix worked. The statements picked up the correct plans after re-parsing. Now let’s talk about the long term fix. Why did the partition not have stats on it? Turns out the partitioning scheme in place was range by date with a new partition for each week. Each partition goes from Thursday through the following Wednesday. Several partitions had been created in advance but as you may have guessed by now, nothing was done about the stats. Meaning that the partitions had values of 0 for the number of rows. Since the gather stats job was kicking in every evening, a whole days worth of data could be loaded into the partition before the stats would be updated. So that means that 24 hours worth of activity would occur with “bad” stats every week. Couple that with the default of NO_INVALIDATE=TRUE on the stats gathering and you get statements that may hang around for extended periods of time, even if the stats change dramatically (0 rows to a few million is dramatic, right?). So we manually set the stats on the pre-created partitions using a combination of dbms_stats.copy_table_stats and dbms_stats.set_table_stats and dbms_stats.set_column_stats (see Dion Cho’s excellent post for details and example code – and also this post on the Optimizermagic Blog for further info). I should also mention that Greg Rahn has several good posts on 11g Statistics Enhancements here, here, and here.

So why was I reminded of Wimpy? Because Wimpy’s favorite saying was “I’ll gladly pay you Tuesday for a hamburger today”. And the way this system was set up, every Tuesday (er, Thursday), we were paying the price for our bad/missing statistics. OK, so it’s a stretch, but I like the pictures of Wimpy!

Please feel free to leave me a comment letting me know how you feel …

… about the pictures of Wimpy.

8 Comments

  1. Asif Momen says:

    Good one.

    We had exactly the same issue on one of our Data Warehouse databases. A query which would normally take few seconds took several minutes. The fix was to copy stats from the previous partition.

  2. Narendra says:

    “4. this statement was not re-parsed due to the stats changing”

    Kerry,

    Interesting. Is this a “new feature” or has been there in earlier versions as well? I was under impression (and have observed as well) that the queries are re-parsed (or hard parsed) when the stats on underlying tables change.

  3. osborne says:

    Narendra,

    NO_INVALIDATE has been around for a while as part of the DBMS_STATS package. If I remember correctly, the default value was FALSE in 9i and changed to AUTO_INVALIDATE in 10g. This basically allows cursors to not be invalidated when stats are generated. There is a related hidden parameter (_optimizer_invalidation_period) which defaults to 1800 seconds (5 hours) that tells Oracle how long to allow the affected cursors to remain viable. But it is unclear to me exactly how that works, as it appears in my 11g example that cursors stuck around for a much longer period of time than the 5 hours. There is a pretty good thread on the oracle-l list about this issue here:

    http://www.freelists.org/post/oracle-l/dbms-statsauto-invalidate

    And another good one on OTN here:

    http://forums.oracle.com/forums/thread.jspa?threadID=592771

    Also, Jonathan Lewis had a post about it here:

    http://jonathanlewis.wordpress.com/2008/02/13/statistics-problem/

    Kerry

  4. Tom says:

    Kerry,

    Those are pretty cool scripts. Would you mind sharing them? :)

    Thanks!

  5. osborne says:

    Tom,

    Most of the scripts are already on the site in other posts but here is a list of the ones that I used in this post, linked to the script itself.

    find_sql_acs.sql
    dplan.sql
    child_load_time.sql
    diff_table_stats.sql
    flush_sql.sql
    find_sql_by_cost.sql
    find_sql.sql

    Note that I change these little scripts occasionally, so the output may not exactly match the session above, but it should be very close.

    Kerry

  6. [...] “I’ll Gladly Pay You Tuesday for a Hamburger Today,” quotes Kerry Osborne, who was in a Wimpy frame of mind at the time. The analogy to J. Wellington Wimpy’s burger-mooching , Kerry writes, applies to, “ . . . paying the price for our bad/missing statistics.” [...]

  7. Larry says:

    I know virtually nothing about Oracle, programming, or statistics, but being old and coming out of the humanities, I do know something about disco dancing, hamburgers, and Wimpy. Much earlier than 1970s disco, Wimpy’s “famous line…was first introduced to the cartoons in 1934′s We Aim to Please.” See “J. Wellington Wimpy” in Wikipedia, which has a screen shot that may be even more symbolic of “paying the price for our bad/missing statistics.”

  8. Klepjax says:

    I like this analogy re Wimpy’s favorite saying, I was rolling on the floor for about 5 minutes until Wendy our PL/SQL developer nearly fell over my spazzed body when she opened the door to the server room. Many thanks/good info

Leave a Reply