What Did My New Index Mess Up?

This week I got to look at a system where a new version of an application had been rolled out. I came across a SQL statement that had taken a turn for the worse in terms of its average elapsed time. It was using an index I hadn’t noticed before, so I took a look to see if the index was new (it was – apparently it was added as part of the code roll out). So I got to wondering if there were any other statements that had suffered a backward movement in performance. So I wrote this little script called whats_changed.sql. It’s the same basic idea as one I wrote about earlier here called unstable_plans.sql. Both of these scripts are based on a calculated variance. The original unstable_plans script uses variance in average elapsed times for statements with multiple plans (i.e. it shows statements that have multiple plans which exhibit a significant difference in average elapsed time between plans). The whats_changed.sql script shows statements that exhibit significant differences in average elapsed time before and after a point in time (i.e. when you rolled out new code, or added an index, etc…) A short example may explain it better than I can:

$ !sql
sqlplus / as sysdba
 
SQL*Plus: Release 11.1.0.7.0 - Production on Thu Jun 25 22:24:35 2009
 
Copyright (c) 1982, 2008, Oracle.  All rights reserved.
 
 
Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
 
SQL> @whats_changed
Enter Days ago: 3
Enter value for min_stddev:
Enter value for min_etime:
Enter value for faster_slower:
 
SQL_ID               EXECS AVG_ETIME_BEFORE AVG_ETIME_AFTER   NORM_STDDEV RESULT
------------- ------------ ---------------- --------------- ------------- ------
an9114vtxukz2           17             1.25            5.00        2.1148 Slower
803u6vwz0ah1p           29             0.04            0.18        2.1552 Slower
bvf3fxv3hatw7        2,390             0.03            0.14        2.4147 Slower
5sgs7q9pjnzg5            2             2.21            0.42        3.0130 Faster
b0zkuv9qtcvnn           48             0.16            0.85        3.1500 Slower
9ws5c6p77d384        1,852             0.57            3.30        3.3609 Slower
dy98ajwqc8s2x           15             1.32            8.34        3.7558 Slower
bkka22vb5y7t6          596             0.19            1.26        3.8804 Slower
6qfwkqgvjkn2q           12             0.51            3.60        4.2641 Slower
05xcf43d9psvm        1,197             0.02            0.14        5.4263 Slower
8wgmc9w5ubtsy          183             0.49            4.32        5.4946 Slower
fwfpuf1vfwfu2            6             0.02            0.22        5.6314 Slower
50c18dwvxq4sb          222             0.54            5.29        6.1586 Slower
aukanfjd3d8fa            3             0.88           10.00        7.3496 Slower
44bq4r5z2xrsm           54             3.13           43.39        9.0946 Slower
0az7czjdw8z7j          110             0.62            0.02       17.5933 Faster
f41agfq9qdhk6            3             0.24            8.06       22.6765 Slower
 
17 rows selected.
 
SQL> -- or to see just the slower ones
SQL>
SQL> @whats_changed
Enter Days ago: 3
Enter value for min_stddev:
Enter value for min_etime:
Enter value for faster_slower: Slower
 
SQL_ID               EXECS AVG_ETIME_BEFORE AVG_ETIME_AFTER   NORM_STDDEV RESULT
------------- ------------ ---------------- --------------- ------------- ------
an9114vtxukz2           17             1.25            5.00        2.1148 Slower
803u6vwz0ah1p           29             0.04            0.18        2.1552 Slower
bvf3fxv3hatw7        2,390             0.03            0.14        2.4147 Slower
b0zkuv9qtcvnn           48             0.16            0.85        3.1500 Slower
9ws5c6p77d384        1,852             0.57            3.30        3.3609 Slower
dy98ajwqc8s2x           15             1.32            8.34        3.7558 Slower
bkka22vb5y7t6          596             0.19            1.26        3.8804 Slower
6qfwkqgvjkn2q           12             0.51            3.60        4.2641 Slower
05xcf43d9psvm        1,197             0.02            0.14        5.4263 Slower
8wgmc9w5ubtsy          183             0.49            4.32        5.4946 Slower
fwfpuf1vfwfu2            6             0.02            0.22        5.6314 Slower
50c18dwvxq4sb          222             0.54            5.29        6.1586 Slower
aukanfjd3d8fa            3             0.88           10.00        7.3496 Slower
44bq4r5z2xrsm           54             3.13           43.39        9.0946 Slower
f41agfq9qdhk6            3             0.24            8.06       22.6765 Slower
 
15 rows selected.

A couple of comments on the script:

  • The default minimum normalized standard deviation is 2. So only statements that are 2X slower were listed.
  • The default minimum average elapsed time per execution is 0.1. So only statements that ran longer than 0.1 seconds were reported.
  • The DAYS_AGO parameter sets a pivot point, we’ll call it the REFERENCE_TIME. The calculations are based on averages from before and after the REFERENCE_TIME.

And a brief explanation of what’s in the output:

  • SQL_ID – the statement id
  • EXECS – the total number of executions in the AWR tables
  • AVG_ETIME_BEFORE – the average elapsed time per execution before the REFERENCE_TIME
  • AVG_ETIME_AFTER – the average elapsed time per execution after the REFERENCE_TIME
  • NORM_STDDEV – this is a normalized standard deviation (i.e. how many times slower/faster it is now)

So the above output indicates that only a few statements are significantly worse (avg. elapsed time 2X or greater) since our new code rollout, which occurred about 3 days ago. Statement 44bq4r5z2xrsm obviously sticks out with a new average elapsed time of 43+ seconds and about 9 times slower than before. This was the one that had picked up the new index. The rest were primarily due to increased physical i/o caused by some newly introduced statements which were doing a few million lio’s per execution and thus causing all the other statements to suffer slightly.

Let me know what you think. Your comments are always welcome.

10 Comments

  1. alex says:

    Hi Kerry,

    I think the script is very good and can help a lot in those so called “post release – it should be faster the dev team told me ” situations. Excellent work !

    Regards,
    Alex

  2. B. Polarski says:

    Nice piece of work. Couple it with the Z. Breitling published in 2006 ‘http://www.doggarts.com/sql/xplan.sql’ and you get a powerful monitoring and diagnostic tool:
    Your query pinpoint candidates to investigate plan variance.

  3. Mark Davidson says:

    Kerry,

    Excellent script and thank you! This type of script will be extremely helpful in finding those “little” queries that seem to go bad after a release when the development team forgets to let the DBA’s know something else was changed and also when the DBA’s may have missed something.
    This might be a good presentation for Hotsos 2010… something like to “Tools and Methods for Identifying Post-Release Issues” :-)

    Thanks again,
    Mark

  4. Anand says:

    Great script…Nice work..Waiting for more

    Regards,
    Anand

  5. Great Script !

    It is worthy of sharing at a Conference !

  6. osborne says:

    Thanks for the comments guys. I have been using this one quite a bit lately. I find I often change it to pivot around a specific snap_id. Like add an index at 11:05, find the 11:00 snap_id, and use that as my pivot point. I also find myself reducing the default minimum etime and minimum stddev parameters. It does turn up some interesting information. Thanks again for all the positive comments.

    Kerry

  7. Johnny says:

    Hi Kerry
    Considering the script “whats_changed.sql”…
    Why does sometimes the column “buffer_gets_delta” (and others too) of table DBA_HIST_SQLSTAT is zero?
    Will the AWR sum up the “lost value” in the next snapshot?

    Thanks
    Your blog is fantastic

  8. osborne says:

    Not necessarily a lost value. Could be many reasons. Check “select 1 from dual” for example. Most likely a timing thing with snap – i.e. the query crosses a snap boundary and hadn’t done a buffer get on one side or the other. Sounds like a good homework assignment. Let me know how many reasons you can come up with. :)

    Kerry

    • Johnny says:

      In our environment 11g Enterprise Edition 11.2.0.2.0.
      The results for a specific SQL is:

      SNAP_ID INSTANCE_NUMBER EXECUTIONS_DELTA BUFFER_GETS_DELTA
      ———- ————— —————- —————–
      24309 1 602 24138968
      24311 1 618 24780756
      24312 1 2335 93630476
      24313 1 9799 392945041
      24314 1 30381 0
      24315 1 53867 2159867593
      24315 2 58741 2355317118
      24316 1 67740 0
      24317 1 67143 2692157387
      24317 2 74159 0
      24318 1 58646 0
      24318 2 63856 2560338352
      24319 1 50380 2019860691
      24320 1 49875 0

      How could be zero? This is skewing the “% of DBTime” in the AWR RAC Difference Report.

      DBA_HIST_SQLSTAT 11g documentation states: “This view captures the top SQL statements based on a set of criteria and captures the statistics information from V$SQL”. Well, I imagine that a SQL with huge number of executions and/or Buffer Gets, even with quick execution time, would be captured. I just can to come up with two explanations: a bug or there are some thresholds in AWR preventing that information be captured. I think in the first option, because the results show others columns with values.

      Thanks for your attention

Leave a Reply