Oracle Exadata – Storage Indexes
Wow! – I was stunned a few days ago by Exadata’s Storage Indexes. I was doing a little testing to see what could be offloaded and what couldn’t (more on that later). I have a 384 million row table I was using on our Exadata Quarter Rack test system. A single threaded full scan with no where clause on the table takes about 24 seconds (ho hum – it’s amazing how quickly we become numbed to the outstanding performance ). So imagine my surprise when I decided to check and see how many nulls I had in a column and the result came back in .07 seconds. Wow! I thought it was a bug! Turns out it was the Storage Indexes. Alright already, I’ll show you some output from the system (by the way, as usual I used a couple of scripts: fsx.sql and mystats.sql):
SYS@LABRAT1> select count(*) from kso.skew3;
COUNT(*)
----------
384000048
1 row selected.
Elapsed: 00:00:24.06
SYS@LABRAT1> /
COUNT(*)
----------
384000048
1 row selected.
Elapsed: 00:00:23.94
SYS@LABRAT1> set timing off
SYS@LABRAT1> @mystats
Enter value for name: %storage%
NAME VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index 0
1 row selected.
SYS@LABRAT1> set timing on
SYS@LABRAT1> select count(*) from kso.skew3 where col1 is null;
COUNT(*)
----------
12
1 row selected.
Elapsed: 00:00:00.07
SYS@LABRAT1> set timing off
SYS@LABRAT1> @fsx
Enter value for sql_text: select count(*) from kso.skew3 where col1 is null
Enter value for sql_id:
Enter value for inst_id:
INST SQL_ID CHILD PLAN_HASH EXECS AVG_ETIME AVG_LIO AVG_PIO AVG_PX OFFLOADABLE IO_SAVED_% SQL_TEXT
----- ------------- ------ ---------- ---------- ------------- ------------ ---------- ------ ----------- ---------- ----------------------------------------
1 0u1q4b7puqz6g 0 2684249835 5 .09 1,956,226 1,956,219 0 Yes 100.00 select count(*) from kso.skew3 where col
1 row selected.
SYS@LABRAT1> @mystats
Enter value for name: %storage%
NAME VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index 16012763136
1 row selected.
So apparently Storage Indexes are NULL aware. Very cool! This may have repercussions regarding design and implementation decisions. There are systems that don’t use NULLs in order to insure that they can access records via B-Tree indexes (which as you’re aware do not store NULLs). SAP for example uses a single space character instead of NULLs.
The Sun / Oracle Exadata V2 has a number of features that get a lot of attention because of the dramatic performance gains they can provide. Smart Scan / Offloading, Hybrid Columnar Compression (HCC), Flash Cache, Infiniband Fabric, etc… Exadata Storage Indexes are not really high on the list of things that the sales guys tout, but they can provide a very significant performance benefit. One of the reasons they aren’t talked about much is that there is nothing really that you can do to affect how they work. There is no tuning them in other words. They just work. However, understanding how they work can certainly help you make better decisions about how to build systems and how to write SQL statements. I had an online discussion with someone a few weeks ago about Storage Indexes and the comment was made that we didn’t really need to understand them, because there was nothing we could do about them anyway. My thought was that we couldn’t do anything about the fact that normal B-Tree indexes don’t store NULLs either, but it is pretty important information to know when you are designing a system or trying to access data using a B-Tree index.
I found a very good description of Storage Indexes in an Oracle White Paper:
Oracle Sun Database Machine Application Best Practices for Data Warehousing [ID 1094934.1]Exdata Storage Index
In Oracle Database 11gR2 a new feature called the Exadata Storage Index has been introduced. The storage index contains a summary of the data distribution on the disk and provides an additional method to eliminate unnecessary disk IO.
Each disk in the Exadata storage cell is divided into equal sized pieces called storage regions (default 1MB). There is an index entry for every storage region (1MB of data stored on disk). Each entry contains the minimum and maximum value for columns seen in ‘where’ clause predicates. Information for up to 8 columns can be stored. The index is then used to eliminate disk IO by identifying which storage regions don’t match the ‘where’ clause of a query. For example consider the following query
The storage index is stored in the memory on each of the Exadata storage cells and is created and maintained transparently. However, if a storage cell is shutdown or rebooted the storage index will be lost from memory and will be recreated on subsequent accesses to the data after the cell has been brought back online.
The storage index can be used as an additional partitioning strategy if the data to be loaded is sorted on the additional partitioning column. For example, in the TPC-H benchmark, the Lineitem table is range partitioned on L_PARTKEY but several of the queries have where clause predicates on the SHIPDATE column. By loading the data in the Lineitem table sorted by SHIPDATE, the storage index can be used to eliminate data based on the SHIPDATE.
In order for the data to be fully sorted during the load you MUST load using an INSERT /*+ APPEND */ statement with an order by clause on the column you want indexed. Before doing the load you should examine the execution plan to check that the sort and the load are being done by the same set of parallel server processes and that no data redistribution takes place between the sort and the load steps in the plan.
The same document also contained this nifty graphic. You get the idea though right? Keep a min/max value for each 1M storage region.
The idea of using the Storage Index as an additional partitioning strategy is very interesting. But as you can see from my example, Storage Indexes can also provide a huge benefit for highly skewed data. This is an area that Oracle has traditionally struggled with. The typical scenario involves using an index for highly selective values and using a table scan for non-selective values. This choice is complicated by Oracle’s default bind variable peeking behavior. With Exadata, we may find that we are better off just ignoring the issue altogether. The full table scan will pretty much always provide the best performance possible, without the possibility of bind variable peeking or the overhead of a normal B-Tree index (that is to say, we probably won’t create the index at all). The storage index will provide very quick access to the highly selective values while the normal Exadata Offloading will handle the non-selective values better than we ever thought possible.
By the way, Netezza has a very similar “feature” called zone maps, although they apparently use a 3M storage unit. The larger unit means that it would be much more sensitive to how the data is sorted on disk, at least for the “virtual” partition elimination behavior.
One last thing. There is a way to turn off Storage Indexes on Exadata via a hidden parameter (_kcfis_storageidx_disabled), if you should ever want to do that (just for testing the feature perhaps).
Here’s a quick example:
SYS@LABRAT1> select name, value from v$mystat s, v$statname n where n.statistic# = s.statistic# and name like '%storage%';
NAME VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index 0
Elapsed: 00:00:00.01
SYS@LABRAT1> alter session set "_kcfis_storageidx_disabled"=true -- turn them off
2 /
Session altered.
Elapsed: 00:00:00.00
SYS@LABRAT1> select count(*) from kso.skew3 where col1 is null;
COUNT(*)
----------
12
Elapsed: 00:00:13.91
SYS@LABRAT1> select name, value from v$mystat s, v$statname n where n.statistic# = s.statistic# and name like '%storage%';
NAME VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index 0
Elapsed: 00:00:00.00
SYS@LABRAT1> alter session set "_kcfis_storageidx_disabled"=false -- back to default which turns it on
2 /
Session altered.
Elapsed: 00:00:00.00
SYS@LABRAT1> select count(*) from kso.skew3 where col1 is null;
COUNT(*)
----------
12
Elapsed: 00:00:00.07
SYS@LABRAT1> select name, value from v$mystat s, v$statname n where n.statistic# = s.statistic# and name like '%storage%';
NAME VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index 16012763136
Elapsed: 00:00:00.01
As always, your comments are welcomed.
A very interesting post.
There are so few people who write about their exadata experience.
Please write more often about this new toy 🙂 .
Have you already applied patch bundle 4 for exadata2 ?
Hello Kerry,
thank you for this series – very interesting to read; and the performance of the storage index is stunning indeed. Please continue to post on this topic 🙂
>Each entry contains the minimum and maximum value for columns seen in ‘where’ clause predicates. Information for up to 8 columns can be stored.
I think here is a “tuning thing” which might be configurable (perhaps some time in the future) – which columns to index on the storage level. Do you happen to know which columns Oracle indexes by default on the Exadata? From the picture it seems like Oracle will index first 8 columns of a table, but who knows.
Alex,
I will definitely write more as time allows. We have not applied BP4 yet – hopefully next week.
Timur,
Thanks for your comment. The performance difference caused by the Storage Index is really only stunning because I already had my mind made up (due to years of training) that nulls aren’t stored in indexes. It’s really only the difference between using an index on selective values vs. not using the index. It’s just that Storage Index know about nulls, so I had to re-adjust my thinking. And that what’s so interesting about the platform. I think we will be re-adjusting a lot of our thinking. (think about situations where people have avoided nulls, or built function based indexes to get nulls into an index, tucked nullable columns between two other columns – all so they could use indexes to access them) Anyway, it will be fun to see how our thinking changes.
On the 8 columns bit, I haven’t had a chance to test, but I expect it’s not tied to column position, but rather where clause usage. (i.e. the first where clause it sees takes up the first slot or 2 or 3, the next one gets the next available slot etc…) But this is just a guess. And then what happens when they get to the 9th column? Again a wild guess, they probably just ignore it once the 8 slots are full. I would expect at some point they might reuse a slot if they figure out that one of the slots is holding values for a column that is rarely used in where clauses, but again just guessing. I plan to do some further testing on this feature as soon as time allows.
Kerry
Hi kerry,
Thank you for this nice article, the BP5 is out already 😉
Charles
[…] Osborne is stunned by Exadata’s storage indexes as he does a little […]
Great article, Kerry – highly useful information for folks considering implementing Exadata. Thank you for taking the pains to provide a wonderful introduction to Exadata Storage Indexes!
[…] So both the folks slinging the FUD and the folks spewing the light-speed, anti-matter-powered warp throughput claims have something in common—they don’t understand the technology. That is quickly changing though. Web content is popping up from sources I know and trust. Sources outside the walls of Oracle as well. In fact, two newly accepted co-members of the OakTable Network have started blogging about their Exadata systems. Kerry Osborne and Frits Hoogland have been posting about Exadata lately (e.g., Kerry Osborne on Exadata Storage Indexes). […]
[…] Storage Indexes […]
Forgive me for re-opening an old thread, but I can’t seem to find a reasonable link anywhere to address this question: “Do Exadata’s Storage Indexes work on virtual columns?” I can conceptualize arguments both ways, but lack the in-depth knowledge of Exadata to test out any theory.
No problem, I think the answer is no but it’s hard to prove a negative. The simple cases I have tested all resulted in SI’s not being used. Certainly user defined functions could not be executed by the storage cells, but even simple ones like col+1 don’t appear to work. You could test your specific case by forcing serial reads and making sure you got smart scans. Then checking the “cell physical IO bytes saved by storage index” stat from v$mystat before and after your test.
[…] a quick post on a new Exadata feature called Zone Maps. They’re similar to storage indexes on Exadata, but with more control (you can define the columns and how the data is refreshed for […]
Question – How do Storage Indexes work with tables that are compressed? Compress for OLTP specifically? I think I know the answer “0 bytes saved by storage index” … but wanted to get your feedback.
SI’s should work fine with compression. Just tested on 11.2.0.3 DB with OLTP compression and it worked as expected. Note that it’s possible you hit a bug regarding reporting cell physical IO bytes saved by storage index statistics as I noticed some funkiness during my testing where the SI was obviously being used (elapsed time in hundreths of a second) but the statistic was not updated. I didn’t go looking on MOS for a bug or check with any of the development guys at Oracle, but I suspect there may be a bug there that pops up occasionally. See my example below where without (first execution) SI it was .63 seconds and after it was .03 seconds. In this case the stats reported correctly.
Thanks Kerry … I think my issue might be more with OLTP compression. My current client has a table which has been created with OLTP compression. The compression appears to be fine when the data is initially loaded. However, every row in the table gets updated during the day … herein lies the problem. The number of bytes utilized by the table’s current partition more than doubles by day’s end even though the update simply changes status-relate columns. I’ve been reading about OLTP compressed data being uncompressed and migrated during update operations. Table statistic values (num_rows*avg_row_len) is much smaller when compared to actual bytes used whereas before the update operations, the actual bytes used was less than half of the stats estimation.
Planning on doing some testing with a non-compressed table sorted by the desired key values to fully maximize the SIs.
thanx for your time … SPS
One of the finest explanation of Exadata Storage Indexes.
Thanks Kerry The Rocker!
Thanks
Prashant Dixit
Lead Oracle DBA – Ericsson
Dear Kerry,
Thank you for the post. These kind of things are talked about very less. And for people like me who are interested in them and do some work regarding tuning and maintaining databases, its really helpful. God bless you for the work you have done.