Archive for May 2009

Building Slow Development Systems (On Purpose)

I had an interesting discussion last week with an Oracle DBA that I have a lot of respect for. He surprised me by telling me that he thought development systems should intentionally be built to be slow. That way, the developers would be forced to write efficient code. He was not just stating a philosophical position either. He had actually built some databases with 32M buffer caches (the one I mentioned in this previous post on Buffer Cache Extremes was one of his). He went so far as to say that he thought that developers should be using 286’s for their development systems.

I have argued for several years that development systems should match production as closely as possible. But that conversation really got me thinking. Maybe intentionally slowing down development systems is not such a bad idea…

So what do you think?

I’ll Gladly Pay You Tuesday for a Hamburger Today

Those of you who are old enough to remember disco dancing will probably remember this guy. Wimpy (J. Wellington Wimpy actually) was a lazy, glutinous character in the old Popeye cartoons.

I was reminded of Wimpy last week by a performance problem that occurred on an 11g database I was looking at (I’ll tell you why in a minute). So here’s what happened. There were several statements that were running much more slowly than usual (10’s of seconds instead of less than a second). So I started poking around and here’s what I saw:

(note that I cleaned this up a bit to protect the innocent)

(also note that many of the scripts I use have been documented in other posts – so if I haven’t shown the SQL for a particular script, try searching for it on this site)

SQL> @find_sql_acs
Enter value for sql_text:
Enter value for sql_id: 3dhvsp0duttat
Enter value for is_bind_aware:

SQL_ID         CHILD PLAN_HASH_VALUE IOB IBS IBA ISH      EXECS ROWS_PROCESSED  AVG_ETIME    AVG_CPU    AVG_PIO    AVG_LIO SQL_TEXT
------------- ------ --------------- --- --- --- --- ---------- -------------- ---------- ---------- ---------- ---------- ------------------
3dhvsp0duttat      0      3289235442 N   N   N   Y            9             46       0.02       0.00       1.11        272 SELECT row_order, 
3dhvsp0duttat      1      3289235442 N   N   N   Y           13             48       0.08       0.01      12.62        546 SELECT row_order, 
3dhvsp0duttat      2      4076066623 N   N   N   Y           15             31      38.07       6.55  27,176.53  2,426,778 SELECT row_order, 


SQL> @dplan
Enter value for sql_id: 3dhvsp0duttat
Enter value for child_no: 0

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  3dhvsp0duttat, child number 0
-------------------------------------
SELECT row_order, ...

Plan hash value: 3289235442

--------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                   | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                        |       |       |     5 (100)|          |       |       |
|*  1 |  VIEW                                 |                        |     1 |   586 |     5  (20)| 00:00:01 |       |       |
|*  2 |   WINDOW SORT PUSHED RANK             |                        |     1 |   258 |     5  (20)| 00:00:01 |       |       |
|*  3 |    FILTER                             |                        |       |       |            |          |       |       |
|   4 |     PARTITION RANGE ITERATOR          |                        |     1 |   258 |     4   (0)| 00:00:01 |   KEY |   KEY |
|*  5 |      TABLE ACCESS BY LOCAL INDEX ROWID| REALLY_BIG_TABLE       |     1 |   258 |     4   (0)| 00:00:01 |   KEY |   KEY |
|*  6 |       INDEX RANGE SCAN                | GOOD_INDEX             |     1 |       |     3   (0)| 00:00:01 |   KEY |   KEY |
--------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(...)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "DATE_COL")<=:25)
   3 - filter((TO_DATE(:1,:2)<=TO_DATE(:3,:4) AND :24<=:25))
   5 - filter(...)
   6 - access("COL1" LIKE :21 AND "DATE_COL">=TO_DATE(:1,:2) AND "DATE_COL"<=TO_DATE(:3,:4))

SQL> @dplan
Enter value for sql_id: 3dhvsp0duttat
Enter value for child_no: 2

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  3dhvsp0duttat, child number 2
-------------------------------------
SELECT row_order, ...

Plan hash value: 4076066623

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                  | Rows  | Bytes | Cost (%CPU)| Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                       |       |       |     1 (100)|       |       |
|*  1 |  VIEW                                 |                       |     1 |   586 |     0   (0)|       |       |
|*  2 |   WINDOW NOSORT STOPKEY               |                       |     1 |   574 |     0   (0)|       |       |
|*  3 |    FILTER                             |                       |       |       |            |       |       |
|   4 |     PARTITION RANGE ITERATOR          |                       |     1 |   574 |     0   (0)|   KEY |   KEY |
|*  5 |      TABLE ACCESS BY LOCAL INDEX ROWID| REALLY_BIG_TABLE      |     1 |   574 |     0   (0)|   KEY |   KEY |
|*  6 |       INDEX RANGE SCAN                | BAD_INDEX             |     1 |       |     0   (0)|   KEY |   KEY |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(("ROW_ORDER">=:24 AND "ROW_ORDER"<=:25))
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "DATE_COL")<=:25)
   3 - filter((TO_DATE(:1,:2)<=TO_DATE(:3,:4) AND :24<=:25))
   5 - filter(...)
   6 - access("DATE_COL">=TO_DATE(:1,:2) AND "DATE_COL"<=TO_DATE(:3,:4))

SQL> -- Uh Oh, - cost of 1, no records expected 
SQL>
SQL> @child_load_time
Enter value for sql_text:
Enter value for sql_id: 3dhvsp0duttat

SQL_ID         CHILD  PLAN_HASH LAST_LOAD_TIME       EXECS     AVG_ETIME    AVG_PIO      AVG_LIO SQL_TEXT
------------- ------ ---------- --------------- ---------- ------------- ---------- ------------ -----------------------------------------
3dhvsp0duttat      0 3289235442 2009-05-01/12:04:20      9          0.02          1          272 SELECT row_order, ... 
3dhvsp0duttat      1 3289235442 2009-05-05/19:43:14     13          0.08         13          546
3dhvsp0duttat      2 4076066623 2009-05-07/16:49:36     15         38.07     27,177    2,426,778


SQL> -- so it looks like maybe something bad happened with stats on 5/7
SQL> -- let's make use of the diif_table_stats procedure to see if we can spot what happened

SQL> !date
Tue May 12 11:59:22 CDT 2009

SQL> !cat diff_table_stats.sql
select report, maxdiffpct from
table(dbms_stats.diff_table_stats_in_history('&owner','&table_name',
systimestamp-&days_ago))
/

SQL> @diff_table_stats.sql
Enter value for owner: XXX
Enter value for table_name: REALLY_BIG_TABLE
Enter value for days_ago: 5

REPORT                                                                           MAXDIFFPCT
-------------------------------------------------------------------------------- ----------
###############################################################################

STATISTICS DIFFERENCE REPORT FOR:
.................................

TABLE         : REALLY_BIG_TABLE
OWNER         : XXX
SOURCE A      : Statistics as of 07-MAY-09 11.46.36.000000 AM -05:00
SOURCE B      : Current Statistics in dictionary
PCTTHRESHOLD  : 10
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TABLE / (SUB)PARTITION STATISTICS DIFFERENCE:
.............................................

OBJECTNAME                  TYP SRC ROWS       BLOCKS     ROWLEN     SAMPSIZE
...............................................................................

P2009W19                    P   A   0          0          0          0
                                B   4001801    187810     342        4001801
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

COLUMN STATISTICS DIFFERENCE:
.............................

COLUMN_NAME     SRC NDV     DENSITY    HIST NULLS   LEN  MIN   MAX   SAMPSIZ
...............................................................................


                               PARTITION: P2009W19
                               ...................

COL1            A   0       0          NO   0       0      0
                B   105136  .000009511 NO   2565391 4    302E3 6E303 1436410
COL2            A   0       0          NO   0       0      0
                B   669696  .000001493 NO   1011264 9    2A2A2 554E4 2990537
COL3            A   0       0          NO   0       0      0
                B   988     .001012145 NO   1113316 4    30303 39393 2888485
COL4            A   0       0          NO   0       0      0
                B   669696  .000037461 YES  0       11   2A2A2 554E4 5540
...

SQL> -- Sure enough, stats were created when the partition was empty, ouch!

So a couple of things to note here:

1. the table is partitioned
2. dbms_stats.diff_table_stats_in_history is smart enough to show us partition stats (nice)
3. on May 7th the stats changed dramatically
4. this statement was not re-parsed due to the stats changing

So what’s the fix? Well since stats had been generated on this partition every night for the last 5 days, the fix was just to get the optimizer to re-parse my query (and any other affected statements). In 11g this is very easy as there is a built function (psuedo documented) to flush a single SQL statement from the shared pool (see my previous post – Flush A Single SQL Statement – for details). However, I wasn’t sure how many statements were affected (more on that in a minute). But certainly any statement that touched that partitioned table which got parsed on May 7th was almost certain to have a bad plan. So eventually we decided to do the brute force approach and just flush the entire shared pool. (I know, not elegant, but fairly effective) Note that there is no guarantee that flushing the shared pool will clear all the statements. In fact, on a busy system you’re almost guaranteed that it won’t flush every statement. But it did clear up most of the issues. So with the pressure now off (and the system back to running pretty well) I had time to write a little script to look for SQL statements with a cost of 1. (I guess I could have looked for statements which were parsed on May 7th that accessed my partitioned table, but I didn’t.)

Here’s a bit of the SQL*Plus Session:

SQL>-- initially we flushed a couple directly
SQL>
SQL> @flush_sql
Enter value for sql_id: 3dhvsp0duttat

PL/SQL procedure successfully completed.

SQL> -- so we can find all of these guys and flush them - or just flush the whole shared pool
SQL> -- flushing the shared pool is easier!
SQL>
SQL> alter system flush shared_pool;

System altered.

SQL> -- but there are still no guarantees they are all cleaned up, let's check
SQL>
SQL> !cat find_sql_by_cost.sql
select distinct parsing_schema_name, sa.sql_id, sp.plan_hash_value, cost
from v$sql_plan sp, v$sqlarea sa
where sp.sql_id = sa.sql_id
and parsing_schema_name like nvl('&parsing_user',parsing_schema_name)
and cost like nvl(to_number('&cost'),to_number(cost))
and sp.plan_hash_value like nvl('&plan_hash_value',sp.plan_hash_value)
and parent_id is null
order by 4 desc
/

SQL> @find_sql_by_cost
Enter value for parsing_user:
Enter value for cost: 1
Enter value for plan_hash_value: 407%

PARSING_SCHEMA_NAME            SQL_ID        PLAN_HASH_VALUE       COST
------------------------------ ------------- --------------- ----------
XXXX_PROD                      02t2tt28xp72m      4076066623          1
XXXX_PROD                      5byphafyx78j6      4076066623          1

SQL> -- note I did a little checking here to make sure they were suffering from the same issue
SQL> -- then I cleaned them up
SQL> 
SQL>
SQL> @flush_sql
Enter value for sql_id: 02t2tt28xp72m

PL/SQL procedure successfully completed.

SQL> @find_sql
Enter value for sql_text:
Enter value for sql_id: 02t2tt28xp72m

no rows selected

SQL> @flush_sql
Enter value for sql_id: 5byphafyx78j6

PL/SQL procedure successfully completed.

SQL> @find_sql
Enter value for sql_text:
Enter value for sql_id: 5byphafyx78j6

no rows selected

So the short term fix worked. The statements picked up the correct plans after re-parsing. Now let’s talk about the long term fix. Why did the partition not have stats on it? Turns out the partitioning scheme in place was range by date with a new partition for each week. Each partition goes from Thursday through the following Wednesday. Several partitions had been created in advance but as you may have guessed by now, nothing was done about the stats. Meaning that the partitions had values of 0 for the number of rows. Since the gather stats job was kicking in every evening, a whole days worth of data could be loaded into the partition before the stats would be updated. So that means that 24 hours worth of activity would occur with “bad” stats every week. Couple that with the default of NO_INVALIDATE=TRUE on the stats gathering and you get statements that may hang around for extended periods of time, even if the stats change dramatically (0 rows to a few million is dramatic, right?). So we manually set the stats on the pre-created partitions using a combination of dbms_stats.copy_table_stats and dbms_stats.set_table_stats and dbms_stats.set_column_stats (see Dion Cho’s excellent post for details and example code – and also this post on the Optimizermagic Blog for further info). I should also mention that Greg Rahn has several good posts on 11g Statistics Enhancements here, here, and here.

So why was I reminded of Wimpy? Because Wimpy’s favorite saying was “I’ll gladly pay you Tuesday for a hamburger today”. And the way this system was set up, every Tuesday (er, Thursday), we were paying the price for our bad/missing statistics. OK, so it’s a stretch, but I like the pictures of Wimpy!

Please feel free to leave me a comment letting me know how you feel …

… about the pictures of Wimpy.

Shopping Trip

This evening I went to the grocery store with my wife, Jill. It was an interesting trip. I have to admit that I took a sneak peek at the shopping list before I agreed to go, and seeing that there were only 7 items on the list, I figured I could earn some extra points without spending too much time. Sneaky, I know, but I am a little weird about optimizing my time. By way of example, one of the main reasons I have a pony tail is because I once calculated how much time it took to get a haircut every month. The calculation went something like this:

Haircut Calculation
Time Primary Action Snide Comment
0.50 Commute to the barber shop Everything is a half hour away in Dallas
0.25 Wait for an open barber chair Not totally wasted because you could read the 6 year old copies of Field and Stream or the 10 year old copies of Lowrider Magazine – just in case I ever decide to go fishing in my newly restored 57 Chevy
0.75 Actually getting my haircut Takes a little extra time because of barbers need to give advice on everything from re-wiring a house to colonoscopy
0.25 Paying up Again takes a little extra time to listen to advice on tattoo removal (not that I have any tattoos, much less any tattoos that need to be removed, but it just seems rude not to act interested, especially to a guy that’s basically in charge of you public image)
0.50 Commute back to wherever Can take longer if it’s raining because nobody can drive in the rain in Dallas

So that’s a total of 2.75 hours per haircut. Multiply that by 12 times a year and that comes out to 33 hours per year. So that’s almost a whole work week. That just seemed like too much time to commit to such an unrewarding task, so I quit getting my hair cut.

But I digress.

Back to the shopping trip. Like I said, the list had 7 items on it, so I expected a quick trip. Well we get into the store and Jill asks me to get a shopping cart. I’m thinking a little hand basket thingy would be fine, because I’ve seen the list! But I don’t want to tell her that (I’m still being sneaky), so I get the shopping cart. And we proceed down the first isle, stopping every two feet or so to put something in the basket. By the end of the first isle we have at least double the number of items on the list, and by the way, nothing has been marked off the list yet. Just to give you an example, we ended up with 4 bags of chips (Lays Bar-B-Que, Tostitos Scoops, Fritos, and Sun Chips). We’d have had one more (because part of my family likes the Scoop Tostitos and another contingent likes the flat Tostitos) but we had to draw the line somewhere. It wasn’t all my wife’s doing either. We were both throwing stuff into the cart like we were expecting a hurricane to cut off food supplies for the next several weeks. By the time we got out of the store we had 17 bags of groceries. They were the small plastic kind (which I don’t really like – I always say paper when they ask – but I digress again).

I found the whole thing quite amusing, although I don’t think Jill really did.  I asked if I could check the list while we were checking out, just to make sure we hadn’t missed anything. She didn’t really appreciate my humor. (we did forget one of the things on the list by the way)

Anyway, the experience reminded me that trying to get too clever can often backfire. It also reminded me that I am predisposed to make that particular mistake. Implementing a tricky solution to a problem, while it may be intellectually stimulating,  is often not the best approach. The experience reminded of a talk I did a couple of years ago (Creative Problem Solving for Oracle Systems) where I talked about several tendencies that all of us have (to a greater or lesser extent). And how those tendencies can interfere with our ability to solve problems. One of the points of the talk was that being aware of your tendencies can help you avoid pitfalls.

Like Clint Eastwood said, “A Man’s Gotta Know His Limitations”.

Oracle Tech Day Presentation – 11g (Upgrade?)

I did a little impromptu talk yesterday at an Oracle Tech Day. It was kind of a funny situation. They had a speaker that couldn’t make it, so they called me at about 1:30 to see if I could talk at 2:00. Luckily my office is very close to the Oracle office, so I said sure.

The topic was supposed to be something about upgrading to 11g, and I had done a talk a month or so ago (at a Dallas Oracle Users Group meeting) about some of my favorite 11g features. So you know how it works, when you’ve got a hammer, everything starts to look like a nail. So I used the Power Point from that earlier talk and tried to tie in the upgrade theme as best I could. At about 3:05 I looked down at my watch and realized my hour was up (everyone knows that standard presentations are one hour). So I politely asked the audience if I should be wrapping up and they said, no you’re supposed to go until 4:00. Ouch!

Anyway, it wasn’t too painful, thanks to some good questions. Thanks to everyone who put up with the last minute sub. I promised to post the presentation materials. So here’s a link: 11g Presentation Materials

Extremes

Last week I got to look at a version 9 Statspack Report from a database where the buffer cache was set to 32M. That’s right, 32M. I don’t think you can buy a thumb drive that small. They were using 16K blocks too, so what’s that:

       (32*1024*1024)/(16*1024) = 2,048 blocks

Not very many blocks for that database. Needless to say it wasn’t working very well for the multi-hour batch job they were trying to run.

Then over the weekend someone sent an AWR report from a version 10 database and it had a 60G buffer cache. That’s the biggest one I’ve ever seen!

So it was a pretty interesting week (from a geeky technical standpoint anyway)!

It reminded me of Muggsy Bogues guarding Shawn Bradley, or Manute Bol, or MJ, or just about anybody.

Anyone seen a bigger (or smaller) buffer cache recently?

I’d love to hear about it.

AWR DBtime script

Here’s a handy little script for showing when the database has been most busy. I wrote this one because I wanted to be able to identify the workload characteristics of an instance when it was at its “busiest”, so we could model our testing strategy “correctly”. So the first part of that problem was to identify when the instance was “busiest”. The DBtime in the AWR reports seemed like a good indicator of “busiest”, so there you go. The script calculates the DBtime for each snapshot using the lag function and then reports the top 30. The script could easily be modified to calculate across multiple snapshots if desired to give the busiest day for example, but I’ll leave that as an exercise for the reader. By the way, the calculation came from 11g statspack report. The AWR report is wrapped, but the statspack report is not. So I just modified the query to use the AWR tables instead of the Statspack tables and added the lag function. This script works on 10gR2 as well.

Here’s the script: dbtime.sql
And an example of its use:

SQL*Plus: Release 11.1.0.7.0 - Production on Tue May 5 10:08:58 2009

Copyright (c) 1982, 2008, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> @dbtime
Enter value for instance_number: 
Enter value for begin_snap_id: 
Enter value for end_snap_id: 

BEGIN_SNAP   END_SNAP BEGIN_TIMESTAMP                                INST      DBTIME
---------- ---------- ---------------------------------------- ---------- -----------
      8205       8206 27-APR-09 09.21.22.022 AM                         1       32.87
      8207       8208 27-APR-09 09.40.50.404 AM                         1       32.63
      8245       8246 28-APR-09 10.00.58.734 PM                         1        5.65
      8389       8390 04-MAY-09 10.00.26.563 PM                         1        5.62
      8325       8326 02-MAY-09 06.00.06.252 AM                         1        4.81
      8220       8221 27-APR-09 10.00.06.667 PM                         1        4.28
      8317       8318 01-MAY-09 10.00.18.928 PM                         1        3.14
      8230       8231 28-APR-09 08.00.17.951 AM                         1        2.80
      8293       8294 30-APR-09 10.00.56.784 PM                         1        2.37
      8349       8350 03-MAY-09 06.00.28.355 AM                         1        2.35
      8244       8245 28-APR-09 09.00.51.632 PM                         1        2.26
      8232       8233 28-APR-09 10.00.32.610 AM                         1        1.97
      8305       8306 01-MAY-09 10.00.07.398 AM                         1        1.93
      8292       8293 30-APR-09 09.00.50.806 PM                         1        1.86
      8268       8269 29-APR-09 09.00.29.463 PM                         1        1.82
      8269       8270 29-APR-09 10.00.35.137 PM                         1        1.80
      8388       8389 04-MAY-09 09.00.20.561 PM                         1        1.80
      8353       8354 03-MAY-09 10.00.51.838 AM                         1        1.32
      8316       8317 01-MAY-09 09.00.12.973 PM                         1        1.31
      8329       8330 02-MAY-09 10.00.29.899 AM                         1        1.22
      8348       8349 03-MAY-09 05.00.22.374 AM                         1        1.18
      8213       8214 27-APR-09 03.00.16.481 PM                         1        1.17
      8365       8366 03-MAY-09 10.00.02.746 PM                         1        1.17
      8357       8358 03-MAY-09 02.00.15.346 PM                         1        1.08
      8341       8342 02-MAY-09 10.00.40.654 PM                         1        1.04
      8337       8338 02-MAY-09 06.00.17.054 PM                         1        1.03
      8204       8205 27-APR-09 09.20.32.516 AM                         1        1.00
      8361       8362 03-MAY-09 06.00.39.298 PM                         1        1.00
      8202       8203 27-APR-09 09.17.02.145 AM                         1        0.97
      8333       8334 02-MAY-09 02.00.53.480 PM                         1        0.97

30 rows selected.

And here’s the output of the AWR report for the top period.


WORKLOAD REPOSITORY report for

DB Name         DB Id    Instance     Inst Num Startup Time    Release     RAC
------------ ----------- ------------ -------- --------------- ----------- ---
LAB111        1385947011 LAB111              1 22-Apr-09 14:19 11.1.0.7.0  NO

Host Name        Platform                         CPUs Cores Sockets Memory(GB)
---------------- -------------------------------- ---- ----- ------- ----------
homer            Linux IA (32-bit)                  16     4       4      31.73

              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:      8205 27-Apr-09 09:21:22        25       1.8
  End Snap:      8206 27-Apr-09 09:27:33        23       1.8
   Elapsed:                6.19 (mins)
   DB Time:               32.87 (mins)