I recently had a very positive experience with Oracle’s SQL Tuning Advisor. I was asked to assist with a 10gR2 RAC database that was performing badly. The database was spending a lot of time waiting on gc buffer busy waits and was doing a lot of physical i/o. These, however, turned out to be red herrings. A quick check showed that transfer times on the interconnect were not an issue, and the disk access times were not out of line either. The real problem was that a handful of statements were doing an excessive amount of logical i/o. We were able to identify the point in time when the performance issue began, so we had a look at the AWR tables to see if there had been any plan changes for the “bad” statements around that time. Here is a snippet of the output showing changes in the plan_hash_value for one of the statements; the new plan produced substantially worse results than the previous plan(s).
SQL> @awr_plan_change
Enter value for sql_id: 2mkw0zbja4t6g
SNAP_ID BEGIN_INTERVAL_TIME SQL_ID PLAN_HASH_VALUE ETIME LIO
---------- ------------------------------ ------------- --------------- ------------ ------------
35519 18-JUL-08 09.00.35.521 AM 2mkw0zbja4t6g 3485252631 .0288 748.4
35519 18-JUL-08 08.59.49.706 AM 2mkw0zbja4t6g .0288 748.4
35519 18-JUL-08 09.00.35.521 AM 2mkw0zbja4t6g 4125764560 6.8601 86,933.8
35519 18-JUL-08 08.59.49.706 AM 2mkw0zbja4t6g 3.3464 66,720.2
35519 18-JUL-08 09.00.35.521 AM 2mkw0zbja4t6g 593567772 .1594 8,232.7
35519 18-JUL-08 08.59.49.706 AM 2mkw0zbja4t6g .1594 8,232.7
35519 18-JUL-08 09.00.35.521 AM 2mkw0zbja4t6g 4125764560 3.3464 66,720.2
35519 18-JUL-08 08.59.49.706 AM 2mkw0zbja4t6g 6.8601 86,933.8
35538 18-JUL-08 06.29.58.999 PM 2mkw0zbja4t6g 593567772 .0420 8,012.1
35538 18-JUL-08 06.30.44.800 PM 2mkw0zbja4t6g .0420 8,012.1
35539 18-JUL-08 07.00.05.767 PM 2mkw0zbja4t6g .0426 8,016.7
35539 18-JUL-08 07.00.51.579 PM 2mkw0zbja4t6g .0426 8,016.7
35540 18-JUL-08 07.30.12.654 PM 2mkw0zbja4t6g .0429 8,024.2
35540 18-JUL-08 07.30.58.419 PM 2mkw0zbja4t6g .0429 8,024.2
35545 18-JUL-08 10.00.32.175 PM 2mkw0zbja4t6g 3526031842 92.8685 203,058.3
35545 18-JUL-08 09.59.46.365 PM 2mkw0zbja4t6g 53.5655 136,190.0
35545 18-JUL-08 10.00.32.175 PM 2mkw0zbja4t6g 53.5655 136,190.0
35545 18-JUL-08 09.59.46.365 PM 2mkw0zbja4t6g 92.8685 203,058.3
35546 18-JUL-08 10.30.39.168 PM 2mkw0zbja4t6g 108.2104 220,259.1
35546 18-JUL-08 10.29.53.369 PM 2mkw0zbja4t6g 138.8346 217,732.0
35546 18-JUL-08 10.30.39.168 PM 2mkw0zbja4t6g 138.8346 217,732.0
35546 18-JUL-08 10.29.53.369 PM 2mkw0zbja4t6g 108.2104 220,259.1
35547 18-JUL-08 11.00.00.396 PM 2mkw0zbja4t6g 121.4556 217,735.0
35547 18-JUL-08 11.00.46.148 PM 2mkw0zbja4t6g 121.4556 217,735.0
35548 18-JUL-08 11.30.04.018 PM 2mkw0zbja4t6g 108.4236 217,739.0
35548 18-JUL-08 11.30.50.076 PM 2mkw0zbja4t6g 108.4236 217,739.0
35557 19-JUL-08 04.00.10.812 AM 2mkw0zbja4t6g 115.0718 217,731.0
35557 19-JUL-08 04.00.56.631 AM 2mkw0zbja4t6g 115.0718 217,731.0
35558 19-JUL-08 04.29.17.277 AM 2mkw0zbja4t6g 127.8277 217,731.0
35558 19-JUL-08 04.30.03.100 AM 2mkw0zbja4t6g 127.8277 217,731.0
35559 19-JUL-08 04.59.24.163 AM 2mkw0zbja4t6g 112.7978 198,768.6
35559 19-JUL-08 05.00.09.945 AM 2mkw0zbja4t6g 112.7978 198,768.6
35560 19-JUL-08 05.29.31.269 AM 2mkw0zbja4t6g 107.6025 193,642.2
35560 19-JUL-08 05.30.17.076 AM 2mkw0zbja4t6g 107.6025 193,642.2
35561 19-JUL-08 06.00.24.094 AM 2mkw0zbja4t6g 184.9285 217,598.2
35561 19-JUL-08 05.59.38.289 AM 2mkw0zbja4t6g 145.1714 204,952.2
The output showed a few things:
- The “bad” plan first showed up around 10:00 PM on July 18th (plan hash value 3526031842).
- The “bad” plan did orders of magnitude more logical i/o than the previous plans.
- The plan for this statement had been experiencing instability even prior to the problem (this was due to bind variable peeking).
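As an aside, the awr_plan_change.sql script itself isn’t reproduced here, but a query along these general lines against the AWR history views produces output of this general shape (one row per snapshot per instance). The exact column calculations below are my own approximation, not the original script.

-- Approximate equivalent of an awr_plan_change.sql style script:
-- average elapsed seconds and logical i/o per execution, by snapshot.
select sn.snap_id, sn.begin_interval_time, st.sql_id, st.plan_hash_value,
       st.elapsed_time_delta/1000000/
         decode(nvl(st.executions_delta,0),0,1,st.executions_delta) etime,
       st.buffer_gets_delta/
         decode(nvl(st.executions_delta,0),0,1,st.executions_delta) lio
from   dba_hist_sqlstat st,
       dba_hist_snapshot sn
where  st.sql_id = '&sql_id'
and    sn.snap_id = st.snap_id
and    sn.dbid = st.dbid
and    sn.instance_number = st.instance_number
order  by sn.snap_id, sn.instance_number;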
Since this was a production system, the goal was to return to the original performance as quickly as possible. The major difference between the two plans was that the “bad” plan did a full table scan on a 14M-row table while the good plan used an index. My first thought was to force Oracle to use the plan I wanted via hints, since the problem was severely impacting the users and it was a relatively simple query. Also, the majority of the query was buried in a view, so it would be a simple matter to modify the view without having to roll out any new code. While I started down that path, an enterprising member of the team (Rick Miners) created a SQL Tuning Set out of one of the queries and let Oracle recommend a “fix”. This was done with the DBMS_SQLTUNE package. The recommendation was to create a SQL Profile, and that profile resulted in the exact plan we wanted: it eliminated the full table scan in favor of the index, just as the “good” plan had. We accepted the proposed SQL Profile and watched as the long-running versions completed while the new executions picked up the new plan. The same process was repeated for several other troublesome queries, all with the same excellent results.
The success came as quite a shock to me, as my previous experiences with this advisor had not been good. Specifically, I was involved in an incident where a DBA had implemented several recommendations which brought a production server to its knees. Note that OEM has an easy point-and-shoot interface for implementing recommendations (often a SQL Profile) with the push of a button. Unfortunately, it seems that the shooting sometimes occurs before the pointing. It’s easy to see how that can happen. Imagine yourself in a high-pressure production support situation where you know a certain SQL statement is behaving really badly, and you’re looking at an OEM screen that says: yeah, that SQL statement is a problem and I know how to fix it. Not only that, but here’s a bunch of technical-sounding supporting data and a 6-digit estimate of the percentage improvement you’ll get if you just push this button. And there is the button, just begging to be clicked. And you think to yourself, “Well, it can’t get much worse”, so you click the button.
At any rate, this most recent experience has made me completely rethink my position on Oracle’s Tuning Advisors. While they are certainly not perfect, I think that they can serve a purpose – an extra wrench in the old tool kit if you will.
The easiest/quickest way to use this advisor is, of course, with OEM/Grid Control. However, I don’t always have access to that tool. Also, since I am a performance addict, I don’t really want to be tempted to click the button without doing my homework first. So here are the basic steps to do it from the command line (i.e. sqlplus):
- Create a tuning task
  - Creating a task for a single SQL statement makes the most sense to me.
  - There are actually three parts to this: creating the tuning task, executing it, and then reporting on it.
    - DBMS_SQLTUNE.CREATE_TUNING_TASK
    - DBMS_SQLTUNE.EXECUTE_TUNING_TASK
    - DBMS_SQLTUNE.REPORT_TUNING_TASK
  - See create_tuning_task.sql (a rough sketch of the equivalent calls follows this list).
  - Note that the default time limit for the execute step is 30 minutes, so you may want to reduce it (the create_tuning_task script prompts you for a value in seconds). By the way, the reason it can take so long is that the advisor can actually execute parts of the statement to get better estimates of the number of rows returned by a particular step.
  - Primary recommendations are often to accept a SQL Profile, but there may be other recommendations such as creating indexes, etc.
  - The report output shows the old plan and the proposed new plan.
- Review recommendations
  - Never, ever, ever (I really mean it) blindly accept a recommendation. Duh!
  - Look at the new plan as proposed by REPORT_TUNING_TASK.
  - Specifically, evaluate the proposed plan and, if possible and time permits, test it.
- Prepare a script to disable the SQL Profile that will be created.
  - You do have a back-out plan, right?
  - It is very easy to disable a profile; however, once an execution has picked a plan, that plan will not change. The execution will continue to use it until it finishes, or hell freezes over, whichever happens first.
  - See disable_sql_profile.sql (also sketched after this list).
- Accept the SQL Profile
- Confirm the performance improvement
  - Existing active cursors will continue processing with the old plan.
  - A new child cursor will be created using the profile, and subsequent executions will use it.
  - You can then compare the performance of the old plan and the new one.
  - See find_sql.sql, find_sql_stats.sql, and dplan.sql (the verification queries sketched after this list do something similar).
- Consider creating an Outline for the statement
  - SQL Profiles do not lock the execution plan.
  - Outlines do lock the plan.
  - You can create an Outline on a plan that is using a profile.
  - You should disable the profile after enabling the Outline.
  - See create_outline.sql (sketched after this list).
- Figure out why the plan went crazy in the first place
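For reference, here is a rough sketch of what the create/execute/report steps (and accepting the profile) might look like from sqlplus. The scripts mentioned above aren’t reproduced here, so the task name, profile name, and substitution variables below are just illustrative placeholders.

-- Hypothetical stand-in for a create_tuning_task.sql style script:
-- create, execute and report on a tuning task for a single sql_id.
variable task_name varchar2(64)

begin
  :task_name := dbms_sqltune.create_tuning_task(
                  sql_id      => '&&sql_id',
                  scope       => 'COMPREHENSIVE',
                  time_limit  => &&time_limit_secs,   -- default is 1800 (30 minutes)
                  task_name   => 'tune_&&sql_id',
                  description => 'tuning task created from sqlplus');

  dbms_sqltune.execute_tuning_task(task_name => :task_name);
end;
/

set long 1000000 longchunksize 1000 linesize 200 pagesize 0
select dbms_sqltune.report_tuning_task(:task_name) from dual;

-- If (and only if) the recommended SQL Profile checks out, accept it:
begin
  dbms_sqltune.accept_sql_profile(
    task_name => :task_name,
    name      => 'profile_&&sql_id');
end;
/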
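And a sketch of the back-out and follow-up pieces, again only an approximation of what scripts like disable_sql_profile.sql, find_sql.sql/dplan.sql, and create_outline.sql might do (the profile name matches the placeholder used above).

-- Back-out plan: disable the profile.
begin
  dbms_sqltune.alter_sql_profile(
    name           => 'profile_&&sql_id',
    attribute_name => 'STATUS',
    value          => 'DISABLED');
end;
/

-- Confirm that new child cursors picked up the profile and the new plan.
select sql_id, child_number, plan_hash_value, executions,
       buffer_gets/decode(executions,0,1,executions) lio_per_exec,
       sql_profile
from   v$sql
where  sql_id = '&&sql_id';

select * from table(dbms_xplan.display_cursor('&&sql_id', null, 'typical'));

-- Optionally lock the plan with an Outline built from the child cursor that
-- is using the profile (dbms_outln takes the old-style hash_value from
-- v$sql, not the sql_id), then disable the profile as shown above.
begin
  dbms_outln.create_outline(
    hash_value   => &hash_value,
    child_number => &child_number,
    category     => 'DEFAULT');
end;
/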
Jonathan Lewis has a couple of notes about SQL Profiles on his site, here SQL Profiles (10g) and here SQL Profiles – 2, that are worth a look. I totally agree with his advice to make sure you know what a profile is going to do before implementing it. You can do this by looking at the proposed plan and the hints to be applied to get there (see tuning_task_hints.sql or sql_profile_hints.sql; a sketch of such a query follows). Also be aware that profiles will tend to sour on you over time. As things change, the estimates that the CBO comes up with for particular operations can and will change. One of the main techniques used by profiles is to modify the cardinality estimates of operations by applying a scaling factor. Once that factor is defined it remains fixed; however, the CBO’s estimates may change for many reasons: new stats, histograms, etc. Therefore, I don’t believe that profiles are a good choice as a long-term solution.
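For what it’s worth, on 10g the hints behind a profile can be pulled from the data dictionary with a query along these lines; this is only a sketch of what a sql_profile_hints.sql style script might do, and the underlying storage changed in later releases. The hints are typically OPT_ESTIMATE directives that apply exactly the kind of fixed scaling factors described above.

-- Roughly what a sql_profile_hints.sql style script might look like on 10g.
select a.attr_val hint
from   dba_sql_profiles p,
       sys.sqlprof$attr a
where  p.signature = a.signature
and    p.name = '&profile_name'
order  by a.attr#;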