How to Tune an Exadata

Q: How do you tune Exadata?
A: Look for long running queries that are not using Smart Scans and then fix them so they do.


We’ve worked on a bunch of Proof of Concepts (POC’s) for customers over the last year or so. These usually involve loading a few terabytes of data and running a bunch queries or some other workload on the data. Generally speaking, anything we have thrown at Exadata has blown the doors off of the platforms that the applications were previously running on. But occasionally we run into a situation where the speed up is just not what we’ve come to expect. Generally speaking it’s because we’ve done something that has kept Exadata from doing what it does best – Smart Scans. While my lead in is obviously a tongue in cheek over simplification, it is basically true. Unfortunately, it’s not as easy as it sounds to determine whether a statement has been offloaded, because our main tool for looking at how a statement was executed (the execution plan) doesn’t tell us whether a Smart Scan was used or not. So in this post, my intent is to give you a couple of options for determining whether Smart Scans are happening or not. Here’s a quick example showing a basic execution plan:


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options


INSTANCE_NAME    STARTUP_TIME      CURRENT_TIME         DAYS    SECONDS
---------------- ----------------- ----------------- ------- ----------
SANDBOX1         24-MAR-2011 16:19 25-MAR-2011 22:57    1.28     110283

SYS@SANDBOX> 
SYS@SANDBOX> set timing on
SYS@SANDBOX> @avgskew3
SYS@SANDBOX> select avg(pk_col) from kso.skew3
  2  where col1 > 0
  3  /

AVG(PK_COL)
-----------
 16093750.2

Elapsed: 00:00:34.80
SYS@SANDBOX> select sql_id, sql_text from v$sql
  2  where sql_text like 'select avg(pk_col) from kso.skew3 where col1 > 0';

SQL_ID        SQL_TEXT
------------- ----------------------------------------------------------------------
4p62g77m9myak select avg(pk_col) from kso.skew3 where col1 > 0

Elapsed: 00:00:00.14
SYS@SANDBOX> @dplan
Enter value for sql_id: 4p62g77m9myak
Enter value for child_no: 

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  4p62g77m9myak, child number 0
-------------------------------------
select avg(pk_col) from kso.skew3 where col1 > 0

Plan hash value: 2684249835

------------------------------------------------------------------------------------
| Id  | Operation                  | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |       |       |       |   535K(100)|          |
|   1 |  SORT AGGREGATE            |       |     1 |    11 |            |          |
|*  2 |   TABLE ACCESS STORAGE FULL| SKEW3 |   383M|  4028M|   535K  (1)| 01:47:02 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - storage("COL1">0)
       filter("COL1">0)


20 rows selected.

Elapsed: 00:00:00.22

The storage line in the predicate section indicates that a Smart Scan is possible, but it doesn’t actually tell us that one occurred. So how can you tell. Well there are several ways.

  1. You can Millsap it. (generate a 10046 trace)
  2. You can Poder it. (use Tanel Poder’s snapper script to check stats and wait events)
  3. You can Wolfgang it. (generate a 10053 trace) – well actually this doesn’t work since the optimizer doesn’t know whether a statement will do a Smart Scan or not.
  4. Or you can look in v$sql – I wrote a little script called fsx.sql (short for Find_Sql_eXadata.sql) to do that.

I think that tracing is the most foolproof way to verify a Smart Scan (just look for “cell smart table/index scan” wait events). But it can be a little cumbersome to generate a trace and then find it. (Note: Method-R has a great tool to make this easier called MR Trace which is a plug in for Oracle’s SQL Developer). Tanel’s snapper script is an awesome tool that is very versatile – so it’s a very valid option as well. But both of these methods depend on the fact that you can catch the statement of interest while it is executing. They provide no way of looking back at statements that ran in the past. My fsx script is not nearly as comprehensive as either of these approaches, but it has an advantage in that it looks at values stored in v$sql (which are also captured in AWR by the way). This allows us to do analysis that is not limited to what is happening right now. (i.e. we don’t have to catch the query while it’s running).

So how does it work?

The v$sql view contains a column (IO_CELL_OFFLOAD_ELIGIBLE_BYTES) which tells us whether this child has been offloaded or not. Very simply, if the column contains a value greater than 0, then the statement was processed with a Smart Scan. Here’s the basic trick:

   decode(IO_CELL_OFFLOAD_ELIGIBLE_BYTES,0,'No','Yes') Offload 

So basically the fsx script just does a decode on the column and if it’s value is 0 then it returns ‘No’, otherwise it returns ‘Yes’. The script does a few other things too like attempting to calculate the savings in terms of reduced volume of data being transferred across the InfiniBand fabric that the Smart Scan was responsible for. I called the output column IO_SAVED_% although it’s not really I/O, it’s bytes transferred between the storage layer and the database layer. There are some situations where the results of this calculation don’t seem to make much sense, by the way. But that’s too much to go into here. (There are more details in our upcoming Apress Exadata Book if you’re so inclined.) Anyway, here’s a quick example of using the fsx script.

SYS@SANDBOX> @fsx
Enter value for sql_text: select avg(pk_col) from kso.skew3%
Enter value for sql_id: 

SQL_ID         CHILD  PLAN_HASH  EXECS  AVG_ETIME AVG_PX OFFLOAD IO_SAVED_% SQL_TEXT
------------- ------ ---------- ------ ---------- ------ ------- ---------- ----------------------------------------------------------------------
4p62g77m9myak      0 2684249835      2      18.31      0 Yes          71.85 select avg(pk_col) from kso.skew3 where col1 > 0

So that’s pretty easy and straight forward. You can enter a bit of a SQL statement’s text or a SQL_ID or both to locate statements of interest in v$sql. The script’s output will tell you whether it was offloaded or not. But what if we want to get a feel for how the whole system is running with regard to Offloading. We could simply add a couple of additional where clauses to the fsx script to allow us to limit the rows returned based on whether the statements were offloaded or not and maybe add a filter on average execution time as well so we can just look at the long running statements. Have a look at fsxo.sql which does just that.

SYS@SANDBOX> @fsxo
Enter value for sql_text: 
Enter value for sql_id: 
Enter value for min_etime: 10
Enter value for offloaded: 

SQL_ID         CHILD  PLAN_HASH  EXECS  AVG_ETIME AVG_PX OFFLOAD IO_SAVED_% SQL_TEXT
------------- ------ ---------- ------ ---------- ------ ------- ---------- ----------------------------------------------------------------------
09m6t5qpgkywx      0 1885411402      1     116.79      0 No             .00 select /*+ bloom join 2  use_hash (skew temp_skew) */ a.col2, sum(a.co
1nfa7trushhpm      0 2684249835      2      15.87      0 Yes          74.60 select /*+ result_cache */ avg(pk_col) from kso.skew3 where col1 > 1
35q8ahgw2xhsp      0 3734762968      1      37.12      0 Yes          34.22 select /*+ bloom join  use_hash (skew temp_skew) */ a.col2, sum(a.col1
4p62g77m9myak      0 2684249835      2      18.31      0 Yes          71.85 select avg(pk_col) from kso.skew3 where col1 > 0
5zruc4v6y32f9      0          0      2     362.05      0 No             .00 DECLARE job BINARY_INTEGER := :job;  next_date TIMESTAMP WITH TIME ZON
b6usrg82hwsa3      0          0      2     305.30      0 No             .00 call dbms_stats.gather_database_stats_job_proc (  )

6 rows selected.

SYS@SANDBOX> @fsxo
Enter value for sql_text: 
Enter value for sql_id: 
Enter value for min_etime: 
Enter value for offloaded: YES

SQL_ID         CHILD  PLAN_HASH  EXECS  AVG_ETIME AVG_PX OFFLOAD IO_SAVED_% SQL_TEXT
------------- ------ ---------- ------ ---------- ------ ------- ---------- ----------------------------------------------------------------------
0qa98gcnnza7h      0  568322376      2       3.56      0 Yes        -905.77 select avg(pk_col) from kso.skew where col1 > 0
1nfa7trushhpm      0 2684249835      2      15.87      0 Yes          74.60 select /*+ result_cache */ avg(pk_col) from kso.skew3 where col1 > 1
266gctwscrnn2      0  568322376      3       1.06      0 Yes        -805.04 select /*+ result_cache */ avg(pk_col) from kso.skew where col1 > 1
2uzgbm8azqqv3      0 2974987230      2       1.56      0 Yes          71.79 select avg(pk_col) from kso.skew_encrypt where col1 > 0
35q8ahgw2xhsp      0 3734762968      1      37.12      0 Yes          34.22 select /*+ bloom join  use_hash (skew temp_skew) */ a.col2, sum(a.col1
4p62g77m9myak      0 2684249835      2      18.31      0 Yes          71.85 select avg(pk_col) from kso.skew3 where col1 > 0
d15cdr0zt3vtp      0   62424106      1        .31      0 Yes          99.96 SELECT TO_CHAR(current_timestamp AT TIME ZONE 'GMT', 'YYYY-MM-DD HH24:

7 rows selected.

SYS@SANDBOX> @fsxo
Enter value for sql_text: %skew%
Enter value for sql_id: 
Enter value for min_etime: 5
Enter value for offloaded: 

SQL_ID         CHILD  PLAN_HASH  EXECS  AVG_ETIME AVG_PX OFFLOAD IO_SAVED_% SQL_TEXT
------------- ------ ---------- ------ ---------- ------ ------- ---------- ----------------------------------------------------------------------
09m6t5qpgkywx      0 1885411402      1     116.79      0 No             .00 select /*+ bloom join 2  use_hash (skew temp_skew) */ a.col2, sum(a.co
1nfa7trushhpm      0 2684249835      2      15.87      0 Yes          74.60 select /*+ result_cache */ avg(pk_col) from kso.skew3 where col1 > 1
35q8ahgw2xhsp      0 3734762968      1      37.12      0 Yes          34.22 select /*+ bloom join  use_hash (skew temp_skew) */ a.col2, sum(a.col1
399m90n8jzpu6      0 1923773943      2       5.09      0 No             .00 select avg(pk_col) from kso.skew
4p62g77m9myak      0 2684249835      2      18.31      0 Yes          71.85 select avg(pk_col) from kso.skew3 where col1 > 0

So the idea is to be able to take a high level look at what’s being offloaded and what’s not. Obviously you can do the same thing with AWR data for a longer look back in history. But I’ll leave that as an exercise for the reader. Note that my lab system was recently bounced and so the number of statements is very small. On larger production systems you will probably want to limit yourself to longer running statements.

By the way, there is another tool that can be very helpful in determining if a statement was offloaded, DBMS_SQLTUNE.REPORT_SQL_MONITOR. But I’ll have to save that for another post.

7 Comments

  1. Kevin Fries says:

    No point in publishing my comment, but:

    You can Millsap it. (generate a 10043 trace)
    You can Poder it. (use Tanel Poder’s snapper script to check stats and wait events)
    You can Wolfgang it. (generate a 10056 trace)

    Shouldn’t this be `10046′ & ‘10053’?

  2. osborne says:

    Ha – my dyxlesia kicking in again. Fixed now.

    Kerry

  3. Greg Rahn says:

    0. You can Greg Rahn it – look at the SQL Monitor report. No tracing required. :)
    http://twitpic.com/4dnvky
    (I guess you did mention that as the very last sentence)

  4. osborne says:

    Ha – yeah I really like the SQL Monitor report. With your permission I’ll probably verberize your name for that one in my next post. Unfortunately SQL Monitoring doesn’t kick in all statements by default, but arguably it does for any that would be of interest. I actually started out this post to talk about SQL Monitoring, but by the time I got through this much I just didn’t have time and figured I’d do it next time I get a break. The output even shows which steps in a plan get offloaded – pretty cool. But I can’t get distracted on that again today. Must… keep… nose… to… grind… stone.

  5. Kerry,

    Simple and neat as always.

    One question,I wonder if is there a case where smart scan happens for one execution and not happens for the other execution of the same sql ?

    What would you do if this is the case ?

    Thanks

  6. osborne says:

    Ah very perceptive …

    It’s actually a more complicated topic than I have made out in this post. There are more details in the book by the way – but the bottom line is that yes, it is possible for a child cursor to have executions that were sometimes executed with Smart Scan and sometimes not. And since the IO_CELL_OFFLOAD_ELIGIBLE_BYTES column is cumulative, it may be > 0 if only one of the executions was offloaded. So my fsx script is not perfect in this regard. Put another way, Oracle doesn’t create separate cursors just because the choice of doing a Smart Scan or not changes.

    In practice though, this situation rarely if ever happens. You can force it to happen by monkeying with parameters between executions of a statement. Some of them do not invalidate a child, but can enable or disable Smart Scans. All this is possible because the optimizer doesn’t really know it is running on Exadata. In order for Smart Scans to occur, a statement must be executed with the direct path read mechanism. The decision to do direct path reads is not part of the optimizer’s code, so the plan doesn’t store any information about that decision. This is all interesting but probably not really of that much use.

    On the other hand, a similar sort of thing happens with partitioned tables. If a statement is executed serially (no PX) each partition will be evaluated in turn to decide if direct path reads should be used or not (generally speaking big partitions will tend to get DPR and small ones won’t). So in that situation, the fsx script will show that a statement has been offloaded if at least one partition was scanned with a Smart Scan, but the IO_SAVED_% column may show a smaller amount of savings than you might expect. This actually happens in real life and so is probably worth knowing.

    In terms of what would you do to force smart scans –

    Make sure you get Full Scans
    Make sure you get direct path reads

    It’s generally easy to get the plan you want via hints or whatever. It’s usually the direct path reads that trip us up. A couple of possibilities:

    Use parallel query
    Use alter session set “_serial_direct_read”=always;

    Sorry for the long winded answer.

  7. […] The admittedly simple SQL statement takes between 12 and 14 seconds (the average elapsed time over three executions is 13 seconds). And the first execution with the RESULT_CACHE hint (before the Result Cache has been built) takes roughly the same amount of elapsed time. The fsx.sql script shows that both versions were offloaded and each cut out approximately the same amount of payload. (I’ve written about the fsx.sql script previously here: How to Tune an Exadata) […]

Leave a Reply