SQL Tuning Advisor

I recently had a very positive experience with Oracle’s SQL Tuning Advisor. I was asked to assist with a 10gR2 RAC database that was performing badly. The database was spending a lot of time waiting on gc buffer busy waits and was doing a lot of physical i/o. These however turned out to be red herrings. A quick check showed transfer times on the inter-connect were not an issue. The disk access times were also not out of line. The real problem was that a handful of statements were doing an excessive amount of logical i/o.  We were able to identify the point in time when the performance issue began so we had a look at the AWR tables to see if there had been any plan changes for the “bad” statements around that time. Here is a snippet from the log showing changes in the plan_hash_value for one of the statements which produced substantially worse results than the previous plan(s).

SQL> @awr_plan_change
Enter value for sql_id: 2mkw0zbja4t6g

   SNAP_ID BEGIN_INTERVAL_TIME            SQL_ID        PLAN_HASH_VALUE        ETIME          LIO
---------- ------------------------------ ------------- --------------- ------------ ------------
     35519 18-JUL-08 09.00.35.521 AM      2mkw0zbja4t6g      3485252631        .0288        748.4
     35519 18-JUL-08 08.59.49.706 AM      2mkw0zbja4t6g                        .0288        748.4
     35519 18-JUL-08 09.00.35.521 AM      2mkw0zbja4t6g      4125764560       6.8601     86,933.8
     35519 18-JUL-08 08.59.49.706 AM      2mkw0zbja4t6g                       3.3464     66,720.2
     35519 18-JUL-08 09.00.35.521 AM      2mkw0zbja4t6g       593567772        .1594      8,232.7
     35519 18-JUL-08 08.59.49.706 AM      2mkw0zbja4t6g                        .1594      8,232.7
     35519 18-JUL-08 09.00.35.521 AM      2mkw0zbja4t6g      4125764560       3.3464     66,720.2
     35519 18-JUL-08 08.59.49.706 AM      2mkw0zbja4t6g                       6.8601     86,933.8
     35538 18-JUL-08 06.29.58.999 PM      2mkw0zbja4t6g       593567772        .0420      8,012.1
     35538 18-JUL-08 06.30.44.800 PM      2mkw0zbja4t6g                        .0420      8,012.1
     35539 18-JUL-08 07.00.05.767 PM      2mkw0zbja4t6g                        .0426      8,016.7
     35539 18-JUL-08 07.00.51.579 PM      2mkw0zbja4t6g                        .0426      8,016.7
     35540 18-JUL-08 07.30.12.654 PM      2mkw0zbja4t6g                        .0429      8,024.2
     35540 18-JUL-08 07.30.58.419 PM      2mkw0zbja4t6g                        .0429      8,024.2
     35545 18-JUL-08 10.00.32.175 PM      2mkw0zbja4t6g      3526031842      92.8685    203,058.3
     35545 18-JUL-08 09.59.46.365 PM      2mkw0zbja4t6g                      53.5655    136,190.0
     35545 18-JUL-08 10.00.32.175 PM      2mkw0zbja4t6g                      53.5655    136,190.0
     35545 18-JUL-08 09.59.46.365 PM      2mkw0zbja4t6g                      92.8685    203,058.3
     35546 18-JUL-08 10.30.39.168 PM      2mkw0zbja4t6g                     108.2104    220,259.1
     35546 18-JUL-08 10.29.53.369 PM      2mkw0zbja4t6g                     138.8346    217,732.0
     35546 18-JUL-08 10.30.39.168 PM      2mkw0zbja4t6g                     138.8346    217,732.0
     35546 18-JUL-08 10.29.53.369 PM      2mkw0zbja4t6g                     108.2104    220,259.1
     35547 18-JUL-08 11.00.00.396 PM      2mkw0zbja4t6g                     121.4556    217,735.0
     35547 18-JUL-08 11.00.46.148 PM      2mkw0zbja4t6g                     121.4556    217,735.0
     35548 18-JUL-08 11.30.04.018 PM      2mkw0zbja4t6g                     108.4236    217,739.0
     35548 18-JUL-08 11.30.50.076 PM      2mkw0zbja4t6g                     108.4236    217,739.0
     35557 19-JUL-08 04.00.10.812 AM      2mkw0zbja4t6g                     115.0718    217,731.0
     35557 19-JUL-08 04.00.56.631 AM      2mkw0zbja4t6g                     115.0718    217,731.0
     35558 19-JUL-08 04.29.17.277 AM      2mkw0zbja4t6g                     127.8277    217,731.0
     35558 19-JUL-08 04.30.03.100 AM      2mkw0zbja4t6g                     127.8277    217,731.0
     35559 19-JUL-08 04.59.24.163 AM      2mkw0zbja4t6g                     112.7978    198,768.6
     35559 19-JUL-08 05.00.09.945 AM      2mkw0zbja4t6g                     112.7978    198,768.6
     35560 19-JUL-08 05.29.31.269 AM      2mkw0zbja4t6g                     107.6025    193,642.2
     35560 19-JUL-08 05.30.17.076 AM      2mkw0zbja4t6g                     107.6025    193,642.2
     35561 19-JUL-08 06.00.24.094 AM      2mkw0zbja4t6g                     184.9285    217,598.2
     35561 19-JUL-08 05.59.38.289 AM      2mkw0zbja4t6g                     145.1714    204,952.2

The log showed a couple of things:

  1. The “bad” plan first showed up around 10:00 PM on July 18th (plan hash value 3526031842).
  2. The “bad” plan did orders of magnitude more logical i/o than the previous plans.
  3. The plan for this statement had been experiencing instability even prior to the problem (this was due to bind variable peeking).

Since this was a production system, the goal was to return to the original performance as quickly as possible. The major difference between the two plans was that the “bad” plan had a full table scan on a 14M row table while the good plan used an index. My first thought was to force Oracle to use the plan I wanted via hints as the problem was severely impacting the users and it was a relatively simple query. Also, the majority of the query was buried in a view so it would be a simple matter to modify the view without having to roll out any new code. While I started down that path, an enterprising member of the team (Rick Miners), created a SQL Tuning Set out of one of the queries and let Oracle recommend a “fix”. This was done with the DBMS_SQLTUNE package. The recommendation offered was to create a SQL Profile that resulted in the exact plan we wanted – it eliminated the full table scan in favor of the index just as the “good” plan had. We accepted the proposed SQL Profile and watched as the long running versions completed while the new executions picked up the new plan. The same process was repeated for several other troublesome queries – all with the same excellent results.

The success came as quite a shock to me as my previous experiences with this advisor had not been good. Specifically, I was involved in an incident where a DBA had implemented several recommendations which brought a production server to its knees. Note that OEM has an easy point and shoot interface to implement recommendations (often a SQL Profile) with the push of a button. Unfortunately it seems that the shooting sometimes occurs before the pointing. It’s easy to see how that can happen. Imagine yourself in a high pressure production support situation where you know a certain SQL statement is behaving really badly and you’re looking at an OEM screen that says – yeah that SQL statement is a problem and I know how to fix it. Not only that but here’s a bunch of technical sounding supporting data and a 6 digit estimate of the percentage improvement you’ll get if you just push this button. And there is the button … just begging to be clicked. And you think to yourself, “Well it can’t get much worse”, so you click the button.

At any rate, this most recent experience has made me completely rethink my position on Oracle’s Tuning Advisors. While they are certainly not perfect, I think that they can serve a purpose – an extra wrench in the old tool kit if you will.

The easiest/quickest way to use this advisor is of course with OEM/Grid control. However, I don’t always have access to that tool. Also, since I am a performance addict, I don’t really want to be tempted to click the button without doing my homework first. So here are the basic steps to do it from the command line (i.e. sqlplus):

  1. Create a tuning task
    • Creating a task for a single sql statement makes the most sense to me.
    • There are actually 3 parts to this – creating the tuning task, executing it, and then reporting on it
      • DBMS_SQLTUNE.CREATE_TUNING_TASK
      • DBMS_SQLTUNE.EXECUTE_TUNING_TASK
      • DBMS_SQLTUNE.REPORT_TUNING_TASK
    • See create_tuning_task.sql
    • Note the default time limit for the execute part is 30 minutes so you may want to reduce that (the create_tuning_task script prompts you for a value in seconds). By the way, the reason it can take so long is that the advisor can actually execute parts of the statement to get better estimates of the number of rows returned by a particular step.
    • Primary recommendations are often to accept a SQL Profile but there may be other recommendations such as creating indexes, etc..
    • The report output shows the old plan and the proposed new plan.
  2. Review recommendations
    • Never, ever, ever (I really mean it) blindly accept a recommendation – Duh!
    • Look at the new plan as proposed by REPORT_TUNING_TASK
    • Specifically – evaluate the proposed plan and if possible and time permits, test it
  3. Prepare a script to disable the SQL Profile that will be created.
    • You do have a back out plan right?
    • It is very easy to disable a profile, however once a plan is selected it will not be changed. That execution will continue to use that plan until it finishes, or hell freezes over, which ever happens first.
    • See disable_sql_profile.sql
  4. Accept the SQL Profile
  5. Confirm the performance improvement
    • Existing active cursors will continue processing
    • A new child will be created using the profile and subsequent executions will use it
    • You can compare the performance of the old plan and the new
    • See find_sql.sql, find_sql_stats.sql, dplan.sql
  6. Consider creating an Outline for the statement
    • SQL Profiles do not lock the execution plan
    • Outlines do lock the plan
    • You can create an outline on a plan that is using a profile
    • You should disable the profile after enabling the outline
    • See create_outline.sql
  7. Figure out why the plan went crazy in the first place

Jonathan Lewis has a couple of notes about SQL Profiles on his site here SQL Profiles (10g) and here SQL Profiles – 2 that are worth a look. I totally agree with his advice to make sure you know what the profile is going to do before implementing it. You can do this by looking at the proposed plan and the hints to be applied to get there (see tuning_task_hints.sql or sql_profile_hints.sql). Also be aware that profiles will tend to sour on you over time. As things change, the estimates that the CBO comes up with for particular operations can and will change. One of the main techniques used by profiles is to modify cardinality estimates of operations by applying a scaling factor. Once the factor is defined it remains fixed, however, the CBO’s estimate may change for many reasons – new stats, histograms, etc… Therefore, I don’t believe that profiles are a good choice as a long term solution.

Leave a Reply