The Next Big Thing
Oracle’s 12.1.0.2 was released a few weeks ago (you can download it from OTN here: Oracle 12.1.0.2 Download). While technically a minor point release, it contains a couple of major features that would normally be rolled out in a more substantial version change such as 12cR2 or perhaps V13. The most highly anticipated of these is a new option (the Oracle In-Memory Option) that provides a column-oriented, in-memory store. Enkitec was in the Beta program, so we’ve been testing it for quite a while now, and we are impressed. Here’s a link to a video of a conversation between Tanel Poder, Cary Millsap, and me about the In-Memory Option, published prior to the general release. Note: the three of us are also scheduled to do a webcast on the topic on Sep. 17th at 9:00 AM CDT. You can sign up here if you are interested: In-Memory Webcast
But back to the topic: what this new option provides is a radical departure from the way Oracle has traditionally managed data access. In the past, all data access was done using a row-major format, which is a foundation of the Oracle RDBMS architecture (I’m of course leaving out some esoteric formats such as the hybrid columnar compressed (HCC) format that is available on Exadata). At any rate, this columnar format is a major change in the way data is accessed, and while the name of the option suggests that the secret sauce is the fact that the data is accessed from memory, I’m going to argue that the “memory” part is not the most important factor. In my opinion, the column-oriented format is why it’s “The Next Big Thing”.
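To make the row-major vs. column-major distinction concrete, here is a toy sketch (plain Python lists, nothing to do with Oracle's actual storage formats): to aggregate one column, a row-major scan has to walk every row, dragging all of its columns along, while a columnar scan touches only the one contiguous column it needs.

```python
# Toy illustration (NOT Oracle's implementation): the same data stored
# row-major vs. column-major, and what a single-column aggregate must touch.
rows = [(i, i * 2, "padding" * 4) for i in range(1000)]  # row-major: one tuple per row

# Column-major: one list per column
columns = {
    "pk_col": [r[0] for r in rows],
    "col1":   [r[1] for r in rows],
    "col2":   [r[2] for r in rows],
}

# Row-major scan: every row (all columns, padding included) is visited
# just to read the one field we care about.
row_major_avg = sum(r[0] for r in rows) / len(rows)

# Columnar scan: only the contiguous values of the one column are touched,
# which is far friendlier to CPU caches and vectorized processing.
columnar_avg = sum(columns["pk_col"]) / len(columns["pk_col"])

assert row_major_avg == columnar_avg == 499.5
```

Both scans return the same answer, of course; the difference is in how much irrelevant data the row-major version has to drag through memory to get there.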
While accessing data from RAM is definitely faster than reading it off disk, it’s important to note that Oracle has been serving data from memory for decades via the standard buffer cache. In fact, you could describe the Oracle RDBMS as a very sophisticated disk caching mechanism. That’s certainly a vast oversimplification, but it’s really not too far from reality. Many Oracle systems spend most of their time accessing data from the buffer cache. Back in the day, DBAs even invented a metric to describe the effectiveness of the caching. The much maligned “buffer cache hit ratio” was used for that purpose and is still present in modern AWR reports. While tuning artificial ratios like this one has long since gone out of fashion, it’s not uncommon to see this ratio in the upper 90s (i.e., 99% of blocks being accessed from RAM is common). In fact, we can pin tables in the buffer cache so that all rows are accessed from memory. So if that’s the case, we should be able to compare the speed of queries executed on in-memory data using both the standard row-major format and the new columnar format. Let’s give it a quick try.
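For reference, the classic hit ratio is simple arithmetic over the familiar v$sysstat-style counters (the counter values below are made up purely for illustration):

```python
# Classic buffer cache hit ratio: the fraction of block gets that did NOT
# require a physical read. Counter values are invented for this example.
physical_reads = 1_000
db_block_gets = 20_000
consistent_gets = 80_000

hit_ratio = 1 - physical_reads / (db_block_gets + consistent_gets)
print(f"{hit_ratio:.2%}")  # prints: 99.00%
```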
SYS@INMEM> -- note that to enable the In-Memory Option we had to set the INMEMORY_SIZE parameter and bounce the database
SYS@INMEM>
SYS@INMEM> @parms
Enter value for parameter: inmemory_size
Enter value for isset:
Enter value for show_hidden:
NAME VALUE ISDEFAUL ISMODIFIED ISSET
-------------------------------------------------- ---------------------------------------------------------------------- -------- ---------- ----------
inmemory_size 107374182400 FALSE FALSE TRUE
Elapsed: 00:00:00.08
SYS@INMEM> -- then we had to tell Oracle that we wanted our table to take advantage of the column store
SYS@INMEM>
SYS@INMEM> alter table kso.skew3 inmemory;
Table altered.
Elapsed: 00:00:00.57
SYS@INMEM> -- we also told Oracle that we wanted our table to be cached in the normal buffer cache
SYS@INMEM>
SYS@INMEM> alter table kso.skew3 cache;
Table altered.
Elapsed: 00:00:00.01
SYS@INMEM> @inmem_tables
Enter value for owner: KSO
Enter value for table_name:
OWNER TABLE_NAME CACHE PRIORITY INMEMORY_DISTRI INMEMORY_COMPRESS
------------------------------ ------------------------------ ----- -------- --------------- -----------------
KSO SKEW3 Y NONE AUTO FOR QUERY LOW
Elapsed: 00:00:00.09
. . .
SYS@INMEM> -- note that the table was accessed in both modes to get it loaded in both the standard buffer cache and the column store
SYS@INMEM> -- (I cut the output to keep from clouding the issue)
SYS@INMEM> -- but you can see the blocks in the buffer cache below
SYS@INMEM>
SYS@INMEM> @buffers_in_use
Enter value for owner: KSO
Enter value for object_name: SKEW3
OWNER OBJECT_NAME OBJECT_TYPE BUFFERS
------------------------------ ------------------------------ -------------------- ----------
KSO SKEW3 TABLE 1305951
Elapsed: 00:00:02.37
SYS@INMEM> -- and you can see the table is in the column store as well
SYS@INMEM>
SYS@INMEM> @inmem_segs
Enter value for owner:
Enter value for segment_name:
OWNER SEGMENT_NAME ORIG_SIZE_MEGS IN_MEM_SIZE_MEGS COMP_RATIO MEGS_NOT_POPULATED
------------------------------ ------------------------------ -------------- ---------------- ---------- ------------------
KSO SKEW3 10,210.7 7,278.8 1.4 .0
----------------
sum 7,278.8
Elapsed: 00:00:00.00
SYS@INMEM> -- now let's test performance
SYS@INMEM> -- First let's try it from the standard buffer cache
SYS@INMEM>
SYS@INMEM> ALTER SESSION set inmemory_query = disable;
Session altered.
Elapsed: 00:00:00.00
SYS@INMEM> @avgskew3x
AVG(PK_COL)
-----------
16487500.8
Elapsed: 00:00:12.45
SYS@INMEM> /
AVG(PK_COL)
-----------
16487500.8
Elapsed: 00:00:12.41
SYS@INMEM>
SYS@INMEM> -- So about 12.5 seconds from the buffer cache
SYS@INMEM> @x
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 6vy9zstcnq02k, child number 5
-------------------------------------
select avg(pk_col) from kso.skew3 where col1 between 0 and 1000
Plan hash value: 2684249835 <--- Same as TABLE ACCESS INMEMORY FULL version (see below)
-----------------------------------------------------
| Id | Operation | Name | E-Rows |
-----------------------------------------------------
| 0 | SELECT STATEMENT | | |
| 1 | SORT AGGREGATE | | 1 |
|* 2 | TABLE ACCESS STORAGE FULL| SKEW3 | 64007 | <--- Exadata version of full scan (not necessarily off disk)
-----------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - storage(("COL1"<=1000 AND "COL1">=0))
filter(("COL1"<=1000 AND "COL1">=0))
21 rows selected.
Elapsed: 00:00:00.10
SYS@INMEM> -- Now let's try it from the column store
SYS@INMEM>
SYS@INMEM> ALTER SESSION set inmemory_query = enable;
Session altered.
Elapsed: 00:00:00.00
SYS@INMEM> @avgskew3x
AVG(PK_COL)
-----------
16487500.8
Elapsed: 00:00:03.03
SYS@INMEM> /
AVG(PK_COL)
-----------
16487500.8
Elapsed: 00:00:03.11
SYS@INMEM>
SYS@INMEM> -- So a little over 3 seconds from the column store
SYS@INMEM> @x
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 6vy9zstcnq02k, child number 4
-------------------------------------
select avg(pk_col) from kso.skew3 where col1 between 0 and 1000
Plan hash value: 2684249835 <--- Same as the TABLE ACCESS STORAGE FULL version
------------------------------------------------------
| Id | Operation | Name | E-Rows |
------------------------------------------------------
| 0 | SELECT STATEMENT | | |
| 1 | SORT AGGREGATE | | 1 |
|* 2 | TABLE ACCESS INMEMORY FULL| SKEW3 | 64007 | <--- new access option
------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - inmemory(("COL1"<=1000 AND "COL1">=0))
filter(("COL1"<=1000 AND "COL1">=0))
21 rows selected.
Elapsed: 00:00:00.02
So even though the data was accessed from memory in both tests, the column store was about 4X faster. The speedup in this admittedly very simple example was therefore not due to the data being accessed from RAM, because both tests accessed data directly from memory.
A couple of points about this example.
1. I hope you’ll take my word for it that no trickery was performed. The data was stored in standard 8K blocks and I didn’t do anything other than what is in the terminal output.
2. Oracle generally tries to prevent full scans of objects from wiping out large portions of the buffer cache. This is a wise choice in most systems, since the sizes of many databases far exceed the amount of RAM available for caching. However, it is possible to override that behavior by declaring that a table or partition should be cached. That’s what the CACHE table attribute does.
3. Note that the plan was actually different, even though the plan_hash_value didn’t reflect the difference. The first statement has TABLE ACCESS STORAGE FULL, but the second version has TABLE ACCESS INMEMORY FULL. The reason the plan_hash_value doesn’t change is that only specific parts of the plan are included in the hash value calculation (see this post for more info: SQL Gone Bad – But Plan Not Changed?). In this case, the operation (TABLE ACCESS), and not the options (INMEMORY FULL or STORAGE FULL), is used. This could be problematic, as it makes it more difficult to identify a rather major decision by the optimizer (i.e., whether to use the In-Memory column store or not). This is analogous to the situation on Exadata where the plan doesn’t change, but one execution of a full table scan results in a smart scan while another execution doesn’t. Performance can be vastly different even though the plan hash value hasn’t changed. I personally think this is a flaw and the behavior should be changed.
4. Of course the main point is that there is a significant difference in the execution time of the query when using the column store vs. the buffer cache. In this simple case, the main difference was that the query was very selective and the in-memory column store has a built-in form of storage indexes (similar to Exadata storage indexes or the new zone maps), which allows Oracle to skip processing of regions of data via min/max pruning.
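The min/max pruning mentioned in point 4 can be sketched in a few lines. This is a toy model, not Oracle's actual column store implementation: each region of data keeps a min and max for the column, and any region whose range cannot possibly satisfy the predicate is skipped without being scanned.

```python
# Toy sketch of min/max (storage index / zone map style) pruning.
# Data, region size, and predicate are invented for illustration.
data = list(range(1_000_000))
region_size = 10_000

# Per-region summaries built in one pass, analogous to work done at load time.
regions = [(min(data[i:i + region_size]), max(data[i:i + region_size]), i)
           for i in range(0, len(data), region_size)]

def prune_scan(lo, hi):
    """Scan only regions whose [min, max] range overlaps the predicate."""
    scanned = 0
    hits = []
    for rmin, rmax, start in regions:
        if rmax < lo or rmin > hi:
            continue  # entire region skipped via min/max pruning
        scanned += 1
        hits.extend(v for v in data[start:start + region_size] if lo <= v <= hi)
    return hits, scanned

hits, scanned = prune_scan(0, 1000)
print(scanned, "of", len(regions), "regions scanned")  # prints: 1 of 100 regions scanned
```

With a selective predicate like the `col1 between 0 and 1000` used in the demo, only a tiny fraction of the regions ever gets touched, which is where much of the speedup comes from.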
Note that I did generate 10046 traces, and the output clearly shows that neither test did any disk I/O.
So clearly the column orientation (and the optimizations it enables) makes a huge difference. What, then, is the secret sauce? It’s the new format, and the improvement in processing that is possible with data arranged in columnar form, that provides the dramatic speedup. It’s by fundamentally changing the way the data is accessed, rather than simply accessing it from memory, that we get the vast improvement in query execution times.
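Point 3 above (the unchanged plan_hash_value) can also be illustrated with a toy hash. This is only a sketch of the principle, not Oracle's actual algorithm: if the hash folds in the OPERATION column but ignores OPTIONS, then TABLE ACCESS STORAGE FULL and TABLE ACCESS INMEMORY FULL necessarily collapse to the same value.

```python
import hashlib

# Hypothetical plan hash that, like Oracle's, includes the OPERATION but
# not the OPTIONS column. Illustrative only - not Oracle's real algorithm.
def plan_hash(plan_lines):
    h = hashlib.md5()
    for operation, options, name in plan_lines:
        h.update(operation.encode())  # OPTIONS deliberately excluded
        h.update(name.encode())
    return int.from_bytes(h.digest()[:4], "big")

buffer_cache_plan = [("SELECT STATEMENT", "", ""),
                     ("SORT AGGREGATE", "", ""),
                     ("TABLE ACCESS", "STORAGE FULL", "SKEW3")]
inmemory_plan = [("SELECT STATEMENT", "", ""),
                 ("SORT AGGREGATE", "", ""),
                 ("TABLE ACCESS", "INMEMORY FULL", "SKEW3")]

# Two materially different access paths, one hash value.
assert plan_hash(buffer_cache_plan) == plan_hash(inmemory_plan)
```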
For those that are not bored out of their skulls at this point, here’s a bit of the tkprof output from the buffer cache test.
********************************************************************************
OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.01 0 6 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 12.79 12.82 0 1304305 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 12.80 12.83 0 1304311 0 1
Misses in library cache during parse: 1
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 3 0.00 0.00
SQL*Net message from client 3 11.12 14.78
Disk file operations I/O 2 0.00 0.00
OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 2 0.00 0.00 0 0 0 0
Execute 2 0.00 0.00 0 0 0 0
Fetch 2 0.00 0.00 0 6 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 6 0.00 0.00 0 6 0 0
Misses in library cache during parse: 1
Misses in library cache during execute: 1
1 user SQL statements in session.
2 internal SQL statements in session.
3 SQL statements in session.
********************************************************************************
So as you can see, there was no time spent waiting on disk I/O.
A couple of final comments and then I’ll shut up. I expect this new feature to drive adoption of 12c. Generally we don’t see mass adoption of new Oracle versions until the second major point release (i.e. 11gR2 or 12cR2). But in this case, I believe the new In-Memory Option provides so much benefit that it will drive some customers to early adoption and will provide enough potential payback to make it worth being one of the pioneers. We already have Exadata customers planning to move to 12.1.0.2 at their next patch cycle. So I expect we’ll have several systems in production by the end of the year. I’m looking forward to seeing how it behaves in the real world.
Well that’s about it for now. I hope to do several posts on the In-Memory Option over the next few weeks. So let me know if you have anything specific you’d like to hear about.
Hi Kerry,
Good, helpful post as always. I hope you’ll allow a quick question. I seem to recall kso.skew3 from the Exadata book days.
Hey Kevin,
Yeah – it’s the same old basic table (but bigger). Nothing special though. Just a simple table with a non-normal distribution of values in the col1 column. And I misspoke, the main optimization was not storage indexes (although there were a bunch of storage regions that were pruned). More on that in the next post.
Kerry
Thanks for sharing . . .
Hey Kerry,
I totally agree with you about the mass adoption of R2 releases (it is also my experience), but in the case of 12c even SAP is starting its certification process with 12.1.0.2 ( http://scn.sap.com/message/15321001#15321001 ). So 12.1.0.2 seems to be a “game changer” in the R1 release based on its feature set / enhancements – so hopefully we will get 12c out in the field much earlier.
Regards
Stefan
Hi Stefan,
I agree, there will definitely be some delay on COTS certification, but as you suggest, the power of the feature may drive more rapid certification than usual. I certainly hope so!
Kerry