Oracle Exadata – Storage Indexes

Wow! – I was stunned a few days ago by Exadata’s Storage Indexes. I was doing a little testing to see what could be offloaded and what couldn’t (more on that later). I have a 384 million row table I was using on our Exadata Quarter Rack test system. A single threaded full scan with no where clause on the table takes about 24 seconds (ho hum – it’s amazing how quickly we become numbed to the outstanding performance ). So imagine my surprise when I decided to check and see how many nulls I had in a column and the result came back in .07 seconds. Wow! I thought it was a bug! Turns out it was the Storage Indexes. Alright already, I’ll show you some output from the system (by the way, as usual I used a couple of scripts: fsx.sql and mystats.sql):

SYS@LABRAT1> select count(*) from kso.skew3;
 
  COUNT(*)
----------
 384000048
 
1 row selected.
 
Elapsed: 00:00:24.06
SYS@LABRAT1> /
 
  COUNT(*)
----------
 384000048
 
1 row selected.
 
Elapsed: 00:00:23.94
 
SYS@LABRAT1> set timing off
SYS@LABRAT1> @mystats
Enter value for name: %storage%
 
NAME                                                                             VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index                                        0
 
1 row selected.
 
SYS@LABRAT1> set timing on
SYS@LABRAT1> select count(*) from kso.skew3 where col1 is null;
 
  COUNT(*)
----------
        12
 
1 row selected.
 
Elapsed: 00:00:00.07
SYS@LABRAT1> set timing off
SYS@LABRAT1> @fsx
Enter value for sql_text: select count(*) from kso.skew3 where col1 is null
Enter value for sql_id: 
Enter value for inst_id: 
 
 
 INST SQL_ID         CHILD  PLAN_HASH      EXECS     AVG_ETIME      AVG_LIO    AVG_PIO AVG_PX OFFLOADABLE IO_SAVED_% SQL_TEXT
----- ------------- ------ ---------- ---------- ------------- ------------ ---------- ------ ----------- ---------- ----------------------------------------
    1 0u1q4b7puqz6g      0 2684249835          5           .09    1,956,226  1,956,219      0 Yes             100.00 select count(*) from kso.skew3 where col
 
1 row selected.
 
SYS@LABRAT1> @mystats
Enter value for name: %storage%
 
NAME                                                                             VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index                              16012763136
 
1 row selected.

So apparently Storage Indexes are NULL aware. Very cool! This may have repercussions regarding design and implementation decisions. There are systems that don’t use NULLs in order to insure that they can access records via B-Tree indexes (which as you’re aware do not store NULLs). SAP for example uses a single space character instead of NULLs.

The Sun / Oracle Exadata V2 has a number of features that get a lot of attention because of the dramatic performance gains they can provide. Smart Scan / Offloading, Hybrid Columnar Compression (HCC), Flash Cache, Infiniband Fabric, etc… Exadata Storage Indexes are not really high on the list of things that the sales guys tout, but they can provide a very significant performance benefit. One of the reasons they aren’t talked about much is that there is nothing really that you can do to affect how they work. There is no tuning them in other words. They just work. However, understanding how they work can certainly help you make better decisions about how to build systems and how to write SQL statements. I had an online discussion with someone a few weeks ago about Storage Indexes and the comment was made that we didn’t really need to understand them, because there was nothing we could do about them anyway. My thought was that we couldn’t do anything about the fact that normal B-Tree indexes don’t store NULLs either, but it is pretty important information to know when you are designing a system or trying to access data using a B-Tree index.

I found a very good description of Storage Indexes in an Oracle White Paper:


Oracle Sun Database Machine Application Best Practices for Data Warehousing [ID 1094934.1]

Exdata Storage Index

In Oracle Database 11gR2 a new feature called the Exadata Storage Index has been introduced. The storage index contains a summary of the data distribution on the disk and provides an additional method to eliminate unnecessary disk IO.

Each disk in the Exadata storage cell is divided into equal sized pieces called storage regions (default 1MB). There is an index entry for every storage region (1MB of data stored on disk). Each entry contains the minimum and maximum value for columns seen in ‘where’ clause predicates. Information for up to 8 columns can be stored. The index is then used to eliminate disk IO by identifying which storage regions don’t match the ‘where’ clause of a query. For example consider the following query

The storage index is stored in the memory on each of the Exadata storage cells and is created and maintained transparently. However, if a storage cell is shutdown or rebooted the storage index will be lost from memory and will be recreated on subsequent accesses to the data after the cell has been brought back online.

The storage index can be used as an additional partitioning strategy if the data to be loaded is sorted on the additional partitioning column. For example, in the TPC-H benchmark, the Lineitem table is range partitioned on L_PARTKEY but several of the queries have where clause predicates on the SHIPDATE column. By loading the data in the Lineitem table sorted by SHIPDATE, the storage index can be used to eliminate data based on the SHIPDATE.

In order for the data to be fully sorted during the load you MUST load using an INSERT /*+ APPEND */ statement with an order by clause on the column you want indexed. Before doing the load you should examine the execution plan to check that the sort and the load are being done by the same set of parallel server processes and that no data redistribution takes place between the sort and the load steps in the plan.

The same document also contained this nifty graphic. You get the idea though right? Keep a min/max value for each 1M storage region.

The idea of using the Storage Index as an additional partitioning strategy is very interesting. But as you can see from my example, Storage Indexes can also provide a huge benefit for highly skewed data. This is an area that Oracle has traditionally struggled with. The typical scenario involves using an index for highly selective values and using a table scan for non-selective values. This choice is complicated by Oracle’s default bind variable peeking behavior. With Exadata, we may find that we are better off just ignoring the issue altogether. The full table scan will pretty much always provide the best performance possible, without the possibility of bind variable peeking or the overhead of a normal B-Tree index (that is to say, we probably won’t create the index at all). The storage index will provide very quick access to the highly selective values while the normal Exadata Offloading will handle the non-selective values better than we ever thought possible.

By the way, Netezza has a very similar “feature” called zone maps, although they apparently use a 3M storage unit. The larger unit means that it would be much more sensitive to how the data is sorted on disk, at least for the “virtual” partition elimination behavior.

One last thing. There is a way to turn off Storage Indexes on Exadata via a hidden parameter (_kcfis_storageidx_disabled), if you should ever want to do that (just for testing the feature perhaps).

Here’s a quick example:

SYS@LABRAT1> select name, value from v$mystat s, v$statname n where n.statistic# = s.statistic# and name like '%storage%';
 
NAME                                                                             VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index                                        0
 
Elapsed: 00:00:00.01
SYS@LABRAT1> alter session set "_kcfis_storageidx_disabled"=true -- turn them off
  2  /
 
Session altered.
 
Elapsed: 00:00:00.00
SYS@LABRAT1> select count(*) from kso.skew3 where col1 is null;
 
  COUNT(*)
----------
        12
 
Elapsed: 00:00:13.91
SYS@LABRAT1> select name, value from v$mystat s, v$statname n where n.statistic# = s.statistic# and name like '%storage%';
 
NAME                                                                             VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index                                        0
 
Elapsed: 00:00:00.00
SYS@LABRAT1> alter session set "_kcfis_storageidx_disabled"=false -- back to default which turns it on
  2  /
 
Session altered.
 
Elapsed: 00:00:00.00
SYS@LABRAT1> select count(*) from kso.skew3 where col1 is null;
 
  COUNT(*)
----------
        12
 
Elapsed: 00:00:00.07
SYS@LABRAT1> select name, value from v$mystat s, v$statname n where n.statistic# = s.statistic# and name like '%storage%';
 
NAME                                                                             VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index                              16012763136
 
Elapsed: 00:00:00.01

As always, your comments are welcomed.

Leave a Reply