March 26, 2011, 11:35 am
Q: How do you tune Exadata?
A: Look for long running queries that are not using Smart Scans and then fix them so they do.
We’ve worked on a bunch of Proofs of Concept (POCs) for customers over the last year or so. These usually involve loading a few terabytes of data and running a bunch of queries or some other workload against it. Generally speaking, anything we have thrown at Exadata has blown the doors off of the platforms the applications were previously running on. But occasionally we run into a situation where the speed-up is just not what we’ve come to expect. Usually it’s because we’ve done something that has kept Exadata from doing what it does best – Smart Scans. While my lead-in is obviously a tongue-in-cheek oversimplification, it is basically true. Unfortunately, it’s not as easy as it sounds to determine whether a statement has been offloaded, because our main tool for looking at how a statement was executed (the execution plan) doesn’t tell us whether a Smart Scan was used or not. So in this post my intent is to give you a couple of options for determining whether Smart Scans are happening. Here’s a quick example showing a basic execution plan:
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
INSTANCE_NAME STARTUP_TIME CURRENT_TIME DAYS SECONDS
---------------- ----------------- ----------------- ------- ----------
SANDBOX1 24-MAR-2011 16:19 25-MAR-2011 22:57 1.28 110283
SYS@SANDBOX>
SYS@SANDBOX> set timing on
SYS@SANDBOX> @avgskew3
SYS@SANDBOX> select avg(pk_col) from kso.skew3
2 where col1 > 0
3 /
AVG(PK_COL)
-----------
16093750.2
Elapsed: 00:00:34.80
SYS@SANDBOX> select sql_id, sql_text from v$sql
2 where sql_text like 'select avg(pk_col) from kso.skew3 where col1 > 0';
SQL_ID SQL_TEXT
------------- ----------------------------------------------------------------------
4p62g77m9myak select avg(pk_col) from kso.skew3 where col1 > 0
Elapsed: 00:00:00.14
SYS@SANDBOX> @dplan
Enter value for sql_id: 4p62g77m9myak
Enter value for child_no:
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 4p62g77m9myak, child number 0
-------------------------------------
select avg(pk_col) from kso.skew3 where col1 > 0
Plan hash value: 2684249835
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 535K(100)| |
| 1 | SORT AGGREGATE | | 1 | 11 | | |
|* 2 | TABLE ACCESS STORAGE FULL| SKEW3 | 383M| 4028M| 535K (1)| 01:47:02 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - storage("COL1">0)
filter("COL1">0)
20 rows selected.
Elapsed: 00:00:00.22
The storage line in the predicate section indicates that a Smart Scan is possible, but it doesn’t actually tell us that one occurred. So how can you tell? Well, there are several ways.
- You can Millsap it. (generate a 10046 trace)
- You can Poder it. (use Tanel Poder’s snapper script to check stats and wait events)
- You can Wolfgang it. (generate a 10053 trace) – well actually this doesn’t work since the optimizer doesn’t know whether a statement will do a Smart Scan or not.
- Or you can look in v$sql – I wrote a little script called fsx.sql (short for Find_Sql_eXadata.sql) to do that.
I think that tracing is the most foolproof way to verify a Smart Scan (just look for “cell smart table/index scan” wait events). But it can be a little cumbersome to generate a trace and then find it. (Note: Method R has a great tool to make this easier called MR Trace, which is a plug-in for Oracle’s SQL Developer.) Tanel’s snapper script is an awesome tool that is very versatile – so it’s a very valid option as well. But both of these methods depend on catching the statement of interest while it is executing. They provide no way of looking back at statements that ran in the past. My fsx script is not nearly as comprehensive as either of those approaches, but it has an advantage in that it looks at values stored in v$sql (which are also captured in AWR, by the way). This allows us to do analysis that is not limited to what is happening right now (i.e. we don’t have to catch the query while it’s running).
So how does it work?
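The full script (and some example output) is in the rest of the post, but the basic idea is easy enough to sketch here. V$SQL exposes a column called IO_CELL_OFFLOAD_ELIGIBLE_BYTES, and a non-zero value for a statement is a pretty good sign that it was offloaded. A stripped-down version of the check looks something like this (just a sketch – the real fsx.sql does quite a bit more):
-- find statements matching a bit of text and show whether they were offloaded
select sql_id, child_number,
       decode(io_cell_offload_eligible_bytes, 0, 'No', 'Yes') offloaded,
       io_cell_offload_eligible_bytes eligible_bytes,
       substr(sql_text, 1, 60) sql_text
from   v$sql
where  upper(sql_text) like upper('%&sql_text%')
and    sql_text not like '%io_cell_offload_eligible_bytes%';  -- don't match this query itself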
Continue reading ‘How to Tune an Exadata’ »
March 2, 2011, 9:52 am
I’ve had several inquiries about whether our upcoming Exadata book is part of the “Alpha” program at Apress. Honestly, I wasn’t even familiar with the program, so I asked our editor, Jonathan Gennick, and found out that it is. Tanel already did a post explaining how it works here:
I just wanted to follow up with a quick post since I’d had a few questions about it already. It’s actually pretty cool if you are anxious to get your hands on the material. It’s also interesting because you can see how the book writing process goes. You basically have access to draft versions of chapters as the authors turn them in – warts and all. Then you’ll get emails when the chapters are updated as they go through the reviewing and editing process. Here’s a link to the Apress page where you can see all the details:
So welcome to the future where everyone can see everything you’re doing. Next thing you know we’ll all be wearing jet packs.
February 17, 2011, 8:51 pm
Just a quick post (mostly so I can find the info quickly myself when I want it). Here’s how to get the master serial number for an Exadata Rack:
ipmitool sunoem cli 'show /SP system_identifier'
Apparently, by convention, the Exadata Rack serial number is appended to the end of the ILOM system identifier string. Here’s an example:
[enkdb01:root] /root
> ipmitool sunoem cli 'show /SP system_identifier'
Connected. Use ^D to exit.
-> show /SP system_identifier
/SP
Properties:
system_identifier = Sun Oracle Database Machine _YOUR_SERIAL_NUMBER_HERE_
-> Session closed
Disconnected
Thanks to Dan Norris for that bit of info.
You can also get the individual serial numbers for each component in an Exadata Rack like this:
/opt/oracle.SupportTools/CheckHWnFWProfile -S
So you would log on to each machine and run this. Here’s the output from one of the DB Servers in our system:
Continue reading ‘Exadata Serial Numbers’ »
February 9, 2011, 7:50 pm
I have been getting quite a few questions about our upcoming Exadata Book lately so I thought I would post a quick update. We are working feverishly on it so please give us a break!
Just kidding!
I am actually feeling like we can see the light at the end of the tunnel now. We are well past the halfway mark and I am feeling confident about the content. Well, more than confident actually. I think it’s going to be awesome! In large part that’s because I feel like we have the Dream Team working on the project. Tanel Poder has signed on as a co-author. Kevin Closson is the Official Technical Reviewer (and we’re tentatively planning on including a number of his comments in the book – in little “Kevin Says” sidebars). Since he was one of the main architects of Exadata, that should provide some interesting perspective. Arup Nanda has volunteered as an unofficial technical reviewer as well. I have to say that Arup has been a great help, and I really appreciate him providing another perspective on what we’re writing about. All three of these guys are fellow Oak Table brethren, by the way. Randy Johnson is the other co-author, and although he generally prefers to keep a low profile, he is extremely knowledgeable on things that the rest of us don’t deal with that much on a day-to-day basis, namely backup and recovery and storage configuration. He has a great RAC and ASM background as well. I also have to say that a guy none of you has ever heard of (Andy Colvin) has been a huge help. He is our in-house Exadata patching guru. Without him I’m not sure we would have been able to do the necessary testing to complete the book.
I must say that I feel honored to be involved in a project with such an accomplished group of guys. And by the way, we have had numerous offers to help with various aspects of the project from people I have a lot of respect for. I want to thank all of you for those offers, even if we haven’t taken you up on all of them (our little brains can only absorb so much feedback at any one time). The book is actually available for pre-order on Amazon already (so someone must think we are actually going to finish it pretty soon). I think we’re right on track for a late spring delivery. 🙂
February 8, 2011, 1:04 am
You know the cost calculation that the cost based optimizer (CBO) uses to determine which execution plan to choose for a SQL statement, right? If you don’t, you should immediately stop reading this and pick up a good novel instead. Ah, you’re still here? Well I got an interesting email today from one of my co-workers saying he had to kill a query yesterday. Actually that’s a big part of his current job. Killing runaway queries – apparently that job takes most of his time between 8 and 5. Anyway, he sent me this execution plan today, no comments, “just have a look at this”, he said.
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes|TempSpc| Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | | 65P(100)| |
| 1 | SORT ORDER BY | | 18E| 15E| 15E| 65P (78)|999:59:59 |
| 2 | COUNT | | | | | | |
|* 3 | FILTER | | | | | | |
| 4 | NESTED LOOPS | | 18E| 15E| | 14P (3)|999:59:59 |
| 5 | NESTED LOOPS | | 984G| 216T| | 14G (3)|999:59:59 |
| 6 | TABLE ACCESS FULL| CAT_6000_6001TBL | 7270K| 1074M| | 176K (3)| 00:15:46 |
| 7 | TABLE ACCESS FULL| CAT_6000TBL | 135K| 11M| | 1950 (3)| 00:00:11 |
| 8 | INDEX FULL SCAN | PK_OBJECTS | 32M| 306M| | 15207 (3)| 00:01:22 |
---------------------------------------------------------------------------------------------------
So I had a look. Yes – that’s a 65P in the cost column. I’ve seen worse (but not in a production system). Cost is not always a good indication of run time, by the way. It’s just a sort of normalized estimate, after all. But the estimates for the number of rows and bytes (18E and 15E) are very impressive as well. This query ran for several hours before my buddy finally killed it. As you might expect, the query was missing a join condition between a couple of large tables (7M and 32M rows).
Here’s a test I worked up to see how big a number I could get.
SYS@LAB1024> !cat dplan.sql
set lines 150
select * from table(dbms_xplan.display_cursor('&sql_id','&child_no','typical'))
/
SYS@LAB1024> @dplan
Enter value for sql_id: gf5nnx0pyfqq2
Enter value for child_no:
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID gf5nnx0pyfqq2, child number 0
-------------------------------------
select a.col2, sum(a.col1) from kso.skew a, kso.skew b group by a.col2
Plan hash value: 321450672
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 689G(100)| |
| 1 | HASH GROUP BY | | 1 | 16 | 689G (84)|999:59:59 |
| 2 | MERGE JOIN CARTESIAN | | 1024T| 14P| 145G (22)|999:59:59 |
| 3 | TABLE ACCESS FULL | SKEW | 32M| 488M| 10032 (18)| 00:01:21 |
| 4 | BUFFER SORT | | 32M| | 689G (84)|999:59:59 |
| 5 | INDEX FAST FULL SCAN| SKEW_PK | 32M| | 4558 (22)| 00:00:37 |
-----------------------------------------------------------------------------------
17 rows selected.
SYS@LAB1024> @dplan
Enter value for sql_id: 12p7fuydx3dd5
Enter value for child_no:
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 12p7fuydx3dd5, child number 0
-------------------------------------
select a.col2, sum(a.col1) from kso.skew a, kso.skew b, kso.skew c group by
a.col2
Plan hash value: 175710540
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 18E(100)| |
| 1 | HASH GROUP BY | | 1 | 16 | 18E (81)|999:59:59 |
| 2 | MERGE JOIN CARTESIAN | | 18E| 15E| 4670P (22)|999:59:59 |
| 3 | MERGE JOIN CARTESIAN | | 1024T| 14P| 145G (22)|999:59:59 |
| 4 | TABLE ACCESS FULL | SKEW | 32M| 488M| 10032 (18)| 00:01:21 |
| 5 | BUFFER SORT | | 32M| | 145G (22)|999:59:59 |
| 6 | INDEX FAST FULL SCAN| SKEW_PK | 32M| | 4558 (22)| 00:00:37 |
| 7 | BUFFER SORT | | 32M| | 18E (81)|999:59:59 |
| 8 | INDEX FAST FULL SCAN | SKEW_PK | 32M| | 4558 (22)| 00:00:37 |
------------------------------------------------------------------------------------
21 rows selected.
SYS@LAB1024> @dplan
Enter value for sql_id: 7b53dxh6w6mpj
Enter value for child_no:
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 7b53dxh6w6mpj, child number 0
-------------------------------------
select a.col2, sum(a.col1) from kso.skew a, kso.skew b, kso.skew c, kso.skew
d group by a.col2
Plan hash value: 3965951819
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 18E(100)| |
| 1 | HASH GROUP BY | | 1 | 16 | 18E (0)|999:59:59 |
| 2 | MERGE JOIN CARTESIAN | | 18E| 15E| 18E (0)|999:59:59 |
| 3 | MERGE JOIN CARTESIAN | | 18E| 15E| 4670P (22)|999:59:59 |
| 4 | MERGE JOIN CARTESIAN | | 1024T| 14P| 145G (22)|999:59:59 |
| 5 | TABLE ACCESS FULL | SKEW | 32M| 488M| 10032 (18)| 00:01:21 |
| 6 | BUFFER SORT | | 32M| | 145G (22)|999:59:59 |
| 7 | INDEX FAST FULL SCAN| SKEW_PK | 32M| | 4558 (22)| 00:00:37 |
| 8 | BUFFER SORT | | 32M| | 4670P (22)|999:59:59 |
| 9 | INDEX FAST FULL SCAN | SKEW_PK | 32M| | 4558 (22)| 00:00:37 |
| 10 | BUFFER SORT | | 32M| | 18E (0)|999:59:59 |
| 11 | INDEX FAST FULL SCAN | SKEW_PK | 32M| | 4558 (22)| 00:00:37 |
-------------------------------------------------------------------------------------
24 rows selected.
So it looks like the cost tops out at 18E, as does the estimated number of rows – presumably because 18E is about as big as an unsigned 64-bit number gets. Oddly, the number of bytes appears to top out at 15E. So the production query had maxed out the rows and bytes estimates, although the cost was significantly under the max. Still, 65P is the biggest cost I’ve seen in a production system. Anyone seen a bigger one?
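By the way, if you want to go hunting for contenders on your own system, a quick look at the COST column in V$SQL_PLAN (or DBA_HIST_SQL_PLAN if you’re licensed for AWR) will turn up the biggest optimizer costs currently in the shared pool. Something along these lines should do it (just a sketch):
-- top 10 plan steps by optimizer cost currently in the shared pool
select * from (
  select sql_id, child_number, id, operation, options, cost
  from   v$sql_plan
  where  cost is not null
  order by cost desc
)
where rownum <= 10;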
P.S. I have two categories for SQL related posts. “Developer Tricks” and “Wall of Shame”. This one gets both tags.
January 30, 2011, 7:52 pm
Ever wonder why that weird line shows up in the Predicate Section of an execution plan on Exadata? Me too! The STORAGE bit tells us it’s a filter applied at the storage cell layer, but the rest is nonsensical. Well I recently ran across a very brief mention of it in a Metalink note. (I know it’s not called Metalink any more, but I’m kind of set in my ways). The note said it was related to distribution of rows to PX slaves. Ah ha! Let’s test it out. Here’s a plan with the predicate just so you can see what it looks like.
KSO@arcp> @dplan
Enter value for sql_id: a9axwj6ym3b29
Enter value for child_no:
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID a9axwj6ym3b29, child number 0
-------------------------------------
select /*+ parallel_index(t, "ISD_SI_I03",8) dbms_stats
cursor_sharing_exact use_weak_name_resl dynamic_sampling(0)
no_monitoring no_substrb_pad no_expand index_ffs(t,"ISD_SI_I03") */
count(*) as nrw,count(distinct sys_op_lbid(725425,'L',t.rowid)) as
nlb,count(distinct hextoraw(sys_op_descend("SUPPORT_LEVEL")||sys_op_desc
end("SUPPORT_LEVEL_KEY")||sys_op_descend("NAME")||sys_op_descend("VALUE"
))) as ndk,sys_op_countchg(substrb(t.rowid,1,15),1) as clf from
"ISD"."SUPPORT_INFO" sample block ( .0503530742,1) t where
"SUPPORT_LEVEL" is not null or "SUPPORT_LEVEL_KEY" is not null or
"NAME" is not null or "VALUE" is not null
Plan hash value: 1766555783
---------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 672 (100)| | | | |
| 1 | SORT GROUP BY | | 1 | 56 | | | | | |
| 2 | PX COORDINATOR | | | | | | | | |
| 3 | PX SEND QC (RANDOM) | :TQ10001 | 1 | 56 | | | Q1,01 | P->S | QC (RAND) |
| 4 | SORT GROUP BY | | 1 | 56 | | | Q1,01 | PCWP | |
| 5 | PX RECEIVE | | 1 | 56 | | | Q1,01 | PCWP | |
| 6 | PX SEND HASH | :TQ10000 | 1 | 56 | | | Q1,00 | P->P | HASH |
| 7 | SORT GROUP BY | | 1 | 56 | | | Q1,00 | PCWP | |
| 8 | PX BLOCK ITERATOR | | 397K| 21M| 672 (1)| 00:00:01 | Q1,00 | PCWC | |
|* 9 | INDEX STORAGE SAMPLE FAST FULL SCAN| ISD_SI_I03 | 397K| 21M| 672 (1)| 00:00:01 | Q1,00 | PCWP | |
---------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
9 - storage(:Z>=:Z AND :Z<=:Z AND ("SUPPORT_LEVEL" IS NOT NULL OR "SUPPORT_LEVEL_KEY" IS NOT NULL OR "NAME" IS NOT NULL OR
"VALUE" IS NOT NULL))
filter(("SUPPORT_LEVEL" IS NOT NULL OR "SUPPORT_LEVEL_KEY" IS NOT NULL OR "NAME" IS NOT NULL OR "VALUE" IS NOT NULL))
37 rows selected.
Notice that the plan is for a PX statement. So how can we convince ourselves that it is a PX-related predicate? We can try two tests.
- Let's see if we can find any SQL statements that have the predicate but aren't PX statements.
- Let's see if we can find any PX statements that don't have the predicate.
So here we go.
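For the first test, the sort of query I have in mind is something like the sketch below. The predicate text for each plan step is exposed in V$SQL_PLAN (I’m checking both ACCESS_PREDICATES and FILTER_PREDICATES since I’m not sure off the top of my head which column the storage() text lands in), and the NOT EXISTS weeds out any plan that contains a PX step:
-- statements that have the :Z predicate but no PX operations in their plans
select distinct p.sql_id, p.child_number
from   v$sql_plan p
where  (p.access_predicates like '%:Z>=:Z%' or p.filter_predicates like '%:Z>=:Z%')
and    not exists (select 1
                   from   v$sql_plan p2
                   where  p2.sql_id = p.sql_id
                   and    p2.child_number = p.child_number
                   and    p2.operation like 'PX%');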
Continue reading ‘storage(:Z>=:Z AND :Z’ »
January 22, 2011, 6:27 pm
Well, I turned in the HCC chapter of the Exadata book last week and, as is usually the case, I immediately ran across something kind of cool on the topic I had just finished writing about. (We still have several editing passes to go, though, so I can add it later.) Anyway, although I don’t have time to post much these days, I thought this one would make a quick little snippet. So here it is.
The Compression Advisor is part of the DBMS_COMPRESSION package. Specifically it is the GET_COMPRESSION_RATIO procedure. This procedure is worthy of a separate post but I won’t discuss it here except to say that as of 11.2.0.2 you can use it to test HCC compression ratios on non-Exadata platforms. That’s pretty cool, but what I wanted to tell you about is a handy little function in the same package called GET_COMPRESSION_TYPE. This function can tell you exactly what level of compression has been applied to a single row. This can come in handy for investigating the inner workings of HCC (or OLTP or BASIC compression for that matter).
As you probably already know, HCC is only applied to records loaded via direct path writes. Any updates cause rows to be migrated out of that storage format into blocks flagged for OLTP compression. Of course, OLTP compression on a block only kicks in when the block is “full”. On top of this, altering a table to change its compression designation does not actually change the storage format of any existing records (unless you use the MOVE keyword). So you could load some data and then change the designation (say from QUERY LOW to QUERY HIGH). Rows that are inserted after the change will be stored in the new format (assuming the records are loaded via direct path writes, of course). So why am I telling you all this? Well, because I ran across a statement in some Oracle documentation saying that you can check which compression method a table is stored with by looking at the COMPRESS_FOR column in the DBA_TABLES view. This column does reveal what the table’s designation is. However, that setting only tells you how rows inserted in the future will be compressed. It tells you absolutely nothing about the way current rows are stored.
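Just to make that point concrete, the table-level designation is easy enough to see, but it really is only a designation – something like this (SKEW_HCC is just a made-up table name for illustration):
-- shows how FUTURE direct path loads will be compressed, not how existing rows are stored
select owner, table_name, compression, compress_for
from   dba_tables
where  owner = 'KSO'
and    table_name = 'SKEW_HCC';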
As for the mechanics, it appears that each row has a bitmask associated with it showing what compression format is being used. So I wrote a little script (check_row_comp.sql) to give me what I wanted to see, using the DBMS_COMPRESSION.GET_COMPRESSION_TYPE function.
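The actual script and an example of its use are in the rest of the post, but the heart of it is just a call to the function and a decode of the value it returns – roughly along these lines (a sketch using the 11.2 constant values; owner, table name and rowid are SQL*Plus substitution variables):
set serveroutput on
declare
  l_type number;
begin
  -- positional parameters: owner, table name, rowid of the row to check
  l_type := dbms_compression.get_compression_type(upper('&owner'), upper('&table_name'),
                                                  chartorowid('&rowid'));
  dbms_output.put_line(
    case l_type
      when 1  then 'No Compression'       -- COMP_NOCOMPRESS
      when 2  then 'OLTP Compression'     -- COMP_FOR_OLTP
      when 4  then 'HCC Query High'       -- COMP_FOR_QUERY_HIGH
      when 8  then 'HCC Query Low'        -- COMP_FOR_QUERY_LOW
      when 16 then 'HCC Archive High'     -- COMP_FOR_ARCHIVE_HIGH
      when 32 then 'HCC Archive Low'      -- COMP_FOR_ARCHIVE_LOW
      else 'Unknown ('||l_type||')'
    end);
end;
/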
Continue reading ‘EHCC and the GET_COMPRESSION_TYPE function’ »
January 14, 2011, 10:07 pm
I saw an interesting post recently where Greg Rahn talked about HCC mechanics. He claimed that an update to a record stored in HCC format does not require decompressing the whole Compression Unit (CU), which consists of several Oracle blocks. I’m assuming by this he meant that all the records contained in the CU do not get written back to storage in a non-HCC format just because a single record was updated. Greg then showed an example proving that row migration occurred for an updated record. He didn’t show that the other records had not been decompressed, though. So since I was already working on an HCC chapter for the upcoming Apress Exadata book, I thought I would take time off from the book writing to post this (hopefully the editors will forgive me).
Here’s the recipe: basically we’ll update a single row, see that its rowid has changed, verify that we can still get to the record via its original rowid, and check whether the TABLE FETCH CONTINUED ROW statistic gets updated when we access the row via its original rowid, thus proving basic row migration (this is what Greg has already shown). Then we’ll look at block dumps for the original and new blocks to see what’s there.
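The full walk-through (including the block dumps) is in the rest of the post, but in script form the first part of the recipe goes roughly like this (the table name, key value and old rowid below are just placeholders for illustration):
-- 1. note the rowid of a row in an HCC compressed table
select rowid, pk_col from kso.skew_hcc where pk_col = 16367;
-- 2. update the row (a no-op update is enough) and commit
update kso.skew_hcc set col1 = col1 where pk_col = 16367;
commit;
-- 3. the rowid should now be different
select rowid, pk_col from kso.skew_hcc where pk_col = 16367;
-- 4. fetch the row via its ORIGINAL rowid and watch the 'table fetch continued row' statistic
select n.name, s.value
from   v$mystat s, v$statname n
where  n.statistic# = s.statistic#
and    n.name = 'table fetch continued row';
select pk_col from kso.skew_hcc
where  rowid = chartorowid('AAAWbvAAEAAABrbAAA');   -- the old rowid captured in step 1
select n.name, s.value
from   v$mystat s, v$statname n
where  n.statistic# = s.statistic#
and    n.name = 'table fetch continued row';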
Continue reading ‘EHCC Mechanics – Proof that whole CU’s are not decompressed’ »
January 7, 2011, 11:25 pm
Jonathan Lewis reminded me on my last post that using SQL Profiles (because they are part of the SQL Tuning Advisor) requires a license for the Oracle Tuning Pack (which in turn requires a license for the Diagnostics Pack). He also mentioned that Baselines do not require any additional license (at least creating and using Baselines on SQL statements). It’s been a while since I worked on a database that didn’t have both packs, and frankly I wasn’t sure I had a good handle on what was allowed and what wasn’t. So I thought it might be worthwhile to check. There is an easy way to do that, by the way. I did a post a while back on Tuning Pack and Diagnostic Pack license requirements for running AWR, and on how to check what was allowed and what wasn’t using the CONTROL_MANAGEMENT_PACK_ACCESS parameter. Here’s a link to the post:
Oracle Management Packs
Here’s an example using the same technique to show that SQL Profiles are indeed disabled by turning off the Diagnostic and Tuning Packs (at least on 11.2.0.2).
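The full example is in the rest of the post, but the gist of the technique is just to flip the parameter and then try to do something that requires the Tuning Pack – something like this sketch (the sql_id is borrowed from an earlier example purely for illustration, and the error you get back should be along the lines of ORA-13717, complaining that a Tuning Package license is needed):
-- see what's currently enabled
show parameter control_management_pack_access
-- turn the packs off (valid values are NONE, DIAGNOSTIC and DIAGNOSTIC+TUNING)
alter system set control_management_pack_access = 'NONE';
-- now try to create a tuning task (the mechanism that generates SQL Profiles)
variable task_name varchar2(64)
exec :task_name := dbms_sqltune.create_tuning_task(sql_id => '4p62g77m9myak');
-- this should fail once the packs are disabled; don't forget to set the parameter back
alter system set control_management_pack_access = 'DIAGNOSTIC+TUNING';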
Continue reading ‘Licensing Requirements for SQL Profiles’ »