
12c In-Memory on RAC

I started looking into In-Memory on RAC this week. Data can be distributed across RAC nodes in a couple of different ways. The default is to spread it across the available nodes in the cluster. So if you had a 2 node cluster, roughly 50% of the data in your table or partition would be loaded into the column store in each of the 2 instances.
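
By the way, if you just want to see which distribution and duplicate settings a table currently has without pulling the full DDL, the 12.1.0.2 dictionary exposes them directly. Here's a rough sketch (not part of the test below):

-- sketch: check the In-Memory attributes on a table via the 12.1.0.2 dictionary columns
select owner, table_name,
       inmemory, inmemory_priority,
       inmemory_distribute, inmemory_duplicate,
       inmemory_compression
from   dba_tables
where  owner = 'KSO'
and    table_name = 'SKEW';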

SYS@dw1> alter table kso.skew inmemory;

Table altered.

SYS@dw1> @gen_ddl
Enter value for object_type: 
Enter value for owner: KSO
Enter value for object_name: SKEW

DDL
--------------------------------------------------------------------------------

  CREATE TABLE "KSO"."SKEW"
   (    "PK_COL" NUMBER,
        "COL1" NUMBER,
        "COL2" VARCHAR2(30),
        "COL3" DATE,
        "COL4" VARCHAR2(1),
         PRIMARY KEY ("PK_COL")
  USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 INVISIBLE COMPUTE STATISTICS
  STORAGE(INITIAL 865075200 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "USERS"  ENABLE
   ) SEGMENT CREATION IMMEDIATE
  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
 NOCOMPRESS LOGGING
  STORAGE(INITIAL 1480589312 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "USERS"
  INMEMORY PRIORITY NONE MEMCOMPRESS FOR QUERY LOW
  DISTRIBUTE AUTO NO DUPLICATE                             <--- here's the RAC bit
   CACHE

SYS@dw1> @inmem_segs
Enter value for owner: 
Enter value for segment_name: 

OWNER                          SEGMENT_NAME                   ORIG_SIZE_MEGS IN_MEM_SIZE_MEGS COMP_RATIO MEGS_NOT_POPULATED
------------------------------ ------------------------------ -------------- ---------------- ---------- ------------------
                                                                             ----------------
sum

no rows selected

SYS@dw1> select count(*) from kso.skew;

  COUNT(*)
----------
  32000004

SYS@dw1> @inmem_segs
Enter value for owner: 
Enter value for segment_name: 

OWNER                          SEGMENT_NAME                   ORIG_SIZE_MEGS IN_MEM_SIZE_MEGS COMP_RATIO MEGS_NOT_POPULATED
------------------------------ ------------------------------ -------------- ---------------- ---------- ------------------
KSO                            SKEW                                  1,413.0            391.4        1.7              749.4
                                                                             ----------------
sum                                                                                     391.4
SYS@dw1> -- so about half the data is loaded in the local instance column store
SYS@dw1> -- let's see what's in the other instance's cache
SYS@dw1> l
  1  SELECT v.owner, v.segment_name,
  2  v.bytes/(1024*1024) orig_size_megs,
  3  v.inmemory_size/(1024*1024) in_mem_size_megs,
  4  (v.bytes - v.bytes_not_populated) / v.inmemory_size comp_ratio,
  5  v.bytes_not_populated/(1024*1024) megs_not_populated
  6  FROM v$im_segments v
  7  where owner like nvl('&owner',owner)
  8* and segment_name like nvl('&segment_name',segment_name)
SYS@dw1> l6
  6* FROM v$im_segments v
SYS@dw1> c/v$/gv$/
  6* FROM gv$im_segments v
SYS@dw1> /
Enter value for owner: 
Enter value for segment_name: 

OWNER                          SEGMENT_NAME                   ORIG_SIZE_MEGS IN_MEM_SIZE_MEGS COMP_RATIO MEGS_NOT_POPULATED
------------------------------ ------------------------------ -------------- ---------------- ---------- ------------------
KSO                            SKEW                                  1,413.0            569.1        1.6              526.6
KSO                            SKEW                                  1,413.0            391.4        1.7              749.4
                                                                             ----------------
sum                                                                                     960.5

So in this example with a non-partitioned table, the data is automatically spread (roughly 50/50) across the 2 instances. That’s pretty cool. It allows the column store to be scaled out by adding RAC nodes. For partitioned tables there are a couple of options as to how the data can be distributed: you can distribute by partition or by rowid range. The default for a partitioned table appears to be to distribute partitions to the nodes in a round-robin fashion, not ordered by number of rows or size (as I would have expected), but then again I haven’t done much testing on it. The DISTRIBUTE variants are sketched below, followed by a simple example using a partitioned table.
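
For reference, here's a rough sketch of the DISTRIBUTE syntax variants (only DISTRIBUTE BY ROWID RANGE is actually run in the test below, so treat the rest as untested):

-- sketch: the DISTRIBUTE variants
alter table kso.skewp inmemory distribute auto;            -- the default, Oracle decides
alter table kso.skewp inmemory distribute by partition;    -- whole partitions assigned to instances
alter table kso.skewp inmemory distribute by rowid range;  -- segments split by rowid ranges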

SYS@dw1> create table kso.skewp partition by range (col1) (partition "P1" values less than (2), partition p2 values less than (100000), partition p3 values less than (maxvalue)) as select * from kso.skew;

Table created.

SYS@dw1> select count(*) from kso.skewp partition (p1);

  COUNT(*)
----------
   3199971

SYS@dw1> select count(*) from kso.skewp partition (p2);

  COUNT(*)
----------
      8512

SYS@dw1> select count(*) from kso.skewp partition (p3);

  COUNT(*)
----------
  28791521

SYS@dw1> alter table kso.skewp inmemory;

Table altered.

SYS@dw1> select count(*) from kso.skewp;

  COUNT(*)
----------
  32000004

SYS@dw1> @inmem_part_segs
Enter value for owner: 
Enter value for segment_name: SKEWP

OWNER                          SEGMENT_NAME                   PARTITION_NAME                 ORIG_SIZE_MEGS IN_MEM_SIZE_MEGS COMP_RATIO MEGS_NOT_POPULATED
------------------------------ ------------------------------ ------------------------------ -------------- ---------------- ---------- ------------------
KSO                            SKEWP                          P1                                      128.0             55.4        2.3                 .0
KSO                            SKEWP                          P3                                    1,220.0            349.6        1.5              689.3
                                                                                                            ----------------
sum                                                                                                                    405.1

SYS@dw1> @inmem_part_segs_g
Enter value for owner: 
Enter value for segment_name: SKEWP

   INST_ID OWNER                          SEGMENT_NAME                   PARTITION_NAME                 ORIG_SIZE_MEGS IN_MEM_SIZE_MEGS COMP_RATIO MEGS_NOT_POPULATED
---------- ------------------------------ ------------------------------ ------------------------------ -------------- ---------------- ---------- ------------------
         1 KSO                            SKEWP                          P1                                      128.0             55.4        2.3                 .0
         1 KSO                            SKEWP                          P3                                    1,220.0            878.1        1.4                 .0
         2 KSO                            SKEWP                          P2                                        8.0              1.1        7.1                 .0
                                                                                                                       ----------------
sum                                                                                                                               934.7

SYS@dw1> -- very skewed distribution, P1 and P3 on node 1, and P2 on node 2
SYS@dw1> -- let's try again
SYS@dw1> alter table kso.skewp inmemory distribute by rowid range;

Table altered.

SYS@dw1> select count(*) from kso.skewp;

  COUNT(*)
----------
  32000004

SYS@dw1> @inmem_part_segs
Enter value for owner: 
Enter value for segment_name: SKEWP

set lines 200
OWNER                          SEGMENT_NAME                   PARTITION_NAME                 ORIG_SIZE_MEGS IN_MEM_SIZE_MEGS COMP_RATIO MEGS_NOT_POPULATED
------------------------------ ------------------------------ ------------------------------ -------------- ---------------- ---------- ------------------
KSO                            SKEWP                          P1                                      128.0             55.4        2.3                 .0
KSO                            SKEWP                          P3                                    1,220.0            605.2        1.4              358.9
                                                                                                            ----------------
sum                                                                                                                    660.6


SYS@dw1> @inmem_part_segs_g.sql
Enter value for owner: 
Enter value for segment_name: SKEWP

   INST_ID OWNER                          SEGMENT_NAME                   PARTITION_NAME                 ORIG_SIZE_MEGS IN_MEM_SIZE_MEGS COMP_RATIO MEGS_NOT_POPULATED
---------- ------------------------------ ------------------------------ ------------------------------ -------------- ---------------- ---------- ------------------
         1 KSO                            SKEWP                          P1                                      128.0             55.4        2.3                 .0
         1 KSO                            SKEWP                          P3                                    1,220.0            605.2        1.4              358.9
         2 KSO                            SKEWP                          P2                                        8.0              1.1        7.1                 .0
         2 KSO                            SKEWP                          P3                                    1,220.0            263.0        1.6              793.5
                                                                                                                       ----------------
sum                                                                                                                               924.8

So the default distribution mode for partitioned tables appears to be round robin by partition, which can result in very skewed data distributions, as in this example. Using the DISTRIBUTE BY ROWID RANGE syntax we can get a more even distribution of data across the nodes. In this case it put the 2 smaller partitions on separate nodes and split the largest partition between the two (asymmetrically, by the way, so that the total amount of data ended up spread pretty much equally across the instances).
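
If you just want a quick per-instance total rather than the per-partition detail, a simple rollup of gv$im_segments does the trick. Something like this (a sketch along the lines of the scripts used above):

-- sketch: total populated column store space per instance for one object
select inst_id,
       round(sum(inmemory_size)/1024/1024,1)                in_mem_megs,
       round(sum(bytes - bytes_not_populated)/1024/1024,1)  populated_megs
from   gv$im_segments
where  owner = 'KSO'
and    segment_name = 'SKEWP'
group  by inst_id
order  by inst_id;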

There is another option which allows all the data to be duplicated across the instances. This is a high-availability option: if a node needs to be restarted, query times could be impacted until the data is reloaded. Note that queries would still execute without error, but the data would have to be processed through the normal row-oriented path. The DUPLICATE option populates the column store of both nodes with all the data for the object, in order to minimize the performance impact if a node goes down. The documentation says this feature is only available on engineered systems, by the way. The syntax variants are sketched below, and then let’s have a look.
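
For reference, the related syntax looks roughly like this (only the plain DUPLICATE form is actually run below; the comments reflect my reading of the docs):

-- sketch: the DUPLICATE variants
alter table kso.skew inmemory no duplicate;   -- the default, data distributed but not mirrored
alter table kso.skew inmemory duplicate;      -- data mirrored on one other instance
alter table kso.skew inmemory duplicate all;  -- a full copy of the object on every instance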

SYS@dw1> alter table kso.skew inmemory duplicate;

Table altered.

SYS@dw1> @gen_ddl
Enter value for object_type: 
Enter value for owner: KSO
Enter value for object_name: SKEW

DDL
--------------------------------------------------------------------------------

  CREATE TABLE "KSO"."SKEW"
   (    "PK_COL" NUMBER,
        "COL1" NUMBER,
        "COL2" VARCHAR2(30),
        "COL3" DATE,
        "COL4" VARCHAR2(1),
         PRIMARY KEY ("PK_COL")
  USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 INVISIBLE COMPUTE STATISTICS
  STORAGE(INITIAL 865075200 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "USERS"  ENABLE
   ) SEGMENT CREATION IMMEDIATE
  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
 NOCOMPRESS LOGGING
  STORAGE(INITIAL 1480589312 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "USERS"
  INMEMORY PRIORITY NONE MEMCOMPRESS FOR QUERY LOW
  DISTRIBUTE AUTO DUPLICATE
   CACHE

SYS@dw1> select count(*) from kso.skew;

  COUNT(*)
----------
  32000004

SYS@dw1> @inmem_segs
Enter value for owner: 
Enter value for segment_name: 

OWNER                          SEGMENT_NAME                   ORIG_SIZE_MEGS IN_MEM_SIZE_MEGS COMP_RATIO MEGS_NOT_POPULATED
------------------------------ ------------------------------ -------------- ---------------- ---------- ------------------
KSO                            SKEW                                  1,413.0            956.4        1.5                 .0
                                                                             ----------------
sum                                                                                     956.4

SYS@dw1> @inmem_segs_g.sql
Enter value for owner: 
Enter value for segment_name: 

   INST_ID OWNER                          SEGMENT_NAME                   ORIG_SIZE_MEGS IN_MEM_SIZE_MEGS COMP_RATIO MEGS_NOT_POPULATED POPULATE_
---------- ------------------------------ ------------------------------ -------------- ---------------- ---------- ------------------ ---------
         1 KSO                            SKEW                                  1,413.0            956.4        1.5                 .0 COMPLETED
         2 KSO                            SKEW                                  1,413.0            965.4        1.5                 .0 COMPLETED
                                                                                        ----------------
sum                                                                                              1,921.9

So the DUPLICATE keyword caused the entire table to be loaded into both instances (i.e. DUPLICATEd). Note that this test was done on an Exadata (an engineered system) so it should work.
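
Rather than re-generating the DDL, you can also confirm what the column store is actually doing from gv$im_segments itself, which exposes the populate status and the duplicate/distribute settings. A quick sketch:

-- sketch: confirm the duplicate/distribute settings the column store is actually using
select inst_id, owner, segment_name,
       populate_status, inmemory_distribute, inmemory_duplicate
from   gv$im_segments
where  owner = 'KSO'
and    segment_name = 'SKEW';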

How the data is accessed across the nodes is also interesting. When accessing data spread across two nodes with a serial plan, all the work appears to be done by the server process handling the connection on the driving node (i.e. there don’t appear to be any processes on the remote node burning CPU). When accessing data with a parallel plan, on the other hand, slave processes are used on both nodes. This makes sense because that capability already existed for parallel queries on RAC. In this case though, the slaves appear to be spread across the nodes automatically, at least when the data is distributed (i.e. not duplicated). This is in contrast to the normal placement of slaves across a cluster, which tends to keep them on the same node if possible. Anyway, here is an example, preceded by a quick sketch of how to see where the slaves land. Note that this example has a very unusual SQL Monitor output.
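
Here's a rough sketch of a query you could run from another session while the parallel statement is executing, to see which instances the slaves landed on (gv$px_session shows a row per slave plus one for the query coordinator):

-- sketch: which instances are the PX slaves running on for active parallel statements?
select inst_id, qcinst_id, server_group, server_set, count(*) slaves
from   gv$px_session
where  server_set is not null   -- slave rows; the QC row has a null server set
group  by inst_id, qcinst_id, server_group, server_set
order  by inst_id;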

SYS@dw1> select /*+ parallel */ avg(pk_col+col1) from kso.skewp a;

AVG(PK_COL+COL1)
----------------
      16588749.5

SYS@dw1> @rsm
Enter value for sql_id: 
Enter value for sql_exec_id: 

REPORT
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL Monitoring Report

SQL Text
------------------------------
select /*+ parallel */ avg(pk_col+col1) from kso.skewp a

Global Information
------------------------------
 Status              :  DONE (ALL ROWS)
 Instance ID         :  1
 Session             :  SYS (141:54624)
 SQL ID              :  0qmdbnszhb2nh
 SQL Execution ID    :  16777218
 Execution Started   :  09/16/2014 22:01:27
 First Refresh Time  :  09/16/2014 22:01:26
 Last Refresh Time   :  09/16/2014 22:01:28
 Duration            :  1s
 Module/Action       :  sqlplus@enkdb03.enkitec.com (TNS V1-V3)/-
 Service             :  SYS$USERS
 Program             :  sqlplus@enkdb03.enkitec.com (TNS V1-V3)
 Fetch Calls         :  1

Global Stats
===================================================================================================
| Elapsed |   Cpu   |    IO    | Application |  Other   | Fetch | Buffer | Read | Read  |  Cell   |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Waits(s) | Calls |  Gets  | Reqs | Bytes | Offload |
===================================================================================================
|    3.67 |    3.66 |     0.00 |        0.00 |     0.00 |     1 |  81893 |    1 | 352KB |  59.84% |
===================================================================================================

Parallel Execution Details (DOP=3 , Servers Allocated=3)
 Instances  : 2

=======================================================================================================================================================
| Instance |      Name      | Type  | Server# | Elapsed |   Cpu   |    IO    | Application |  Other   | Buffer | Read | Read  |  Cell   | Wait Events |
|          |                |       |         | Time(s) | Time(s) | Waits(s) |  Waits(s)   | Waits(s) |  Gets  | Reqs | Bytes | Offload | (sample #)  |
=======================================================================================================================================================
| 1        | PX Coordinator | QC    |         |    0.01 |    0.01 |          |        0.00 |     0.00 |     13 |      |     . |    NaN% |             |
| 1        | p000           | Set 1 |       1 |    1.05 |    1.05 |          |             |          |     21 |      |     . |    NaN% |             |
| 1        | p001           | Set 1 |       2 |    1.01 |    1.01 |     0.00 |             |          |     69 |    1 | 352KB |  59.84% |             |
| 2        | p000           | Set 1 |       3 |    1.60 |    1.60 |          |             |          |  81790 |      |     . |    NaN% |             |
=======================================================================================================================================================

Instance Drill-Down
====================================================================================================================================
| Instance | Process Names | Elapsed |   Cpu   |    IO    | Application |  Other   | Buffer | Read | Read  |  Cell   | Wait Events |
|          |               | Time(s) | Time(s) | Waits(s) |  Waits(s)   | Waits(s) |  Gets  | Reqs | Bytes | Offload |             |
====================================================================================================================================
|    1     | QC p000 p001  |    2.07 |    2.06 |     0.00 |        0.00 |     0.00 |    103 |    1 | 352KB |  59.84% |             |
|    2     | p000          |    1.60 |    1.60 |          |             |          |  81790 |      |       |    NaN% |             |
====================================================================================================================================

SQL Plan Monitoring Details (Plan Hash Value=2545210427)
============================================================================================================================================================================
| Id |             Operation             |   Name   |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read | Read  |  Cell   |  Mem  | Activity | Activity Detail |
|    |                                   |          | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes | Offload | (Max) |   (%)    |   (# samples)   |
============================================================================================================================================================================
|  0 | SELECT STATEMENT                  |          |         |      |         1 |     +1 |     1 |        1 |      |       |         |       |          |                 |
|  1 |   SORT AGGREGATE                  |          |       1 |      |         1 |     +1 |     1 |        1 |      |       |         |       |          |                 |
|  2 |    PX COORDINATOR                 |          |         |      |         1 |     +1 |     4 |        3 |      |       |         |       |          |                 |
|  3 |     PX SEND QC (RANDOM)           | :TQ10000 |       1 |      |         1 |     +1 |     3 |        3 |      |       |         |       |          |                 |
|  4 |      SORT AGGREGATE               |          |       1 |      |         1 |     +1 |     3 |        3 |      |       |         |       |    33.33 | Cpu (1)         |
|  5 |       PX BLOCK ITERATOR           |          |     32M |  783 |         1 |     +1 |     3 |      32M |      |       |         |       |          |                 |
|  6 |        TABLE ACCESS INMEMORY FULL | SKEWP    |     32M |  783 |         2 |     +0 |    31 |      32M |    1 | 352KB |  60.00% |    1M |    66.67 | in memory (2)   |
============================================================================================================================================================================

Can you see why this one is so unusual? This plan showed a TABLE ACCESS INMEMORY FULL, which had both inmemory access and offloading via smart scan (probably on different partitions). I’m not at all sure why that happened though. If you know please enlighten me.
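
One thing I plan to look at next time this comes up is the session-level statistics, which should show how much of the work went down the in-memory path versus smart scan. Something along these lines (a sketch; the exact statistic names may vary by version):

-- sketch: compare in-memory scan activity with smart scan activity for the current session
select sn.name, ms.value
from   v$mystat ms, v$statname sn
where  sn.statistic# = ms.statistic#
and    ms.value > 0
and   (sn.name like 'IM scan%'
   or  sn.name like 'cell physical IO%')
order  by sn.name;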

12 Comments

  1. kevinclosson says:

    Kerry,

    I want to just take a moment to say thanks for these detailed posts on column store functionality! Very helpful!

  2. Sidney Chen says:

    Hi Kerry,

    Thanks for the sharing.

    If the data is scanned from the in-memory column store, it seems the buffer gets will not increase, which makes sense since we are not doing traditional logical reads. From the last SQL Monitor report, the buffer gets for the p000 slave on instance 2 is 81790, which is around 600M, so I guess some partitions of SKEWP on instance 2 were not loaded into the in-memory column store.

  3. osborne says:

    Hi Sidney,

    I haven’t looked too closely at the statistics being gathered during execution yet, but v$sql has a column called buffer_gets. It is not incremented when accessing objects in the column store. There are 3 columns that store data about IM though. There are also a lot of stats being collected. Check out v$mystat for your session and look for %IM% in the stat name. I’ll do a post on that if I can find the time (or maybe Tanel will beat me to it).

    Kerry

  4. Fairlie Rego says:

    Hi Kerry,

    Am unable to get the in memory column store to kick in on an ADG environment.
    Do you know if this is a documented restriction?

    Ta
    Fairlie

  5. Fairlie Rego says:

    Ok found the documentation 🙂

    The IM column store cannot be used on an Active Data Guard standby instance in the current release.
    However it can be used in a Logical Standby instance and in an instance maintained using Oracle
    Golden Gate.

    Cheers
    Fairlie

  6. osborne says:

    Hey Fairlie,

    Thanks for letting me know. I hadn’t had a chance to look into that yet. Lots of stuff to investigate these days. 🙂 Are you by any chance going to OOW? It would be good to put a face with a name.

    Kerry

  7. Bertrand says:

    Hi Kerry,

    Regarding the In-Memory data distribution on RAC, I tried to get rid of the default behaviour on non Engineered Systems (“Every row of the test table is stored in the IMCS of either one instance or the other”) for 2 reasons:

    1) Christian Antognini’s finding that it could lead to bad performance.
    2) In the case of a service defined as preferred/available, it would make sense to have the IMCS populated with all the data on the “active” instance.

    To get rid of the default behaviour, we can set the “_inmemory_auto_distribute” hidden parameter to false.

    I tested it for:

    RAC (non CDB) here: https://bdrouvot.wordpress.com/2014/11/20/in-memory-instance-distribution-with-rac-databases-i-want-at-least-one-instance-that-contains-all-the-data/
    RAC with PDB here: https://bdrouvot.wordpress.com/2014/11/21/in-memory-instance-distribution-with-rac-and-multitenant-environment/
    RAC with PDB and service defined as preferred/available here (Note that in that case FAN callouts can be enough): https://bdrouvot.wordpress.com/2014/12/04/fan-callouts-for-in-memory-instance-distribution-with-rac-pdb-and-preferredavailable-service/

    Thx
    Bertrand

  8. Tarun says:

    Hi Kerry,
    Here is my scenario. I have 2 node RAC.

    My table size on disk is
    —-
    select owner , segment_name,sum(bytes) “Bytes”
    from dba_segments
    where owner=’XXCCS_O’
    and segment_name=’XXCCS_DS_SL_CVR’
    group by owner, segment_name;
    ————–
    OWNER SEGMENT_NAME Bytes
    XXCCS_O XXCCS_DS_SL_CVR 1,946,157,056
    ————-
    When I load a subset of columns from this table inmemory I get the following output from the following query
    —–
    SELECT v.inst_id,v.owner,
    v.segment_name,
    v.segment_type,
    v.bytes,
    v.inmemory_size,
    v.inmemory_duplicate
    FROM gv$im_segments v
    where owner=’XXCCS_O’
    and segment_name=’XXCCS_DS_SL_CVR’
    —–
    INST_ID OWNER SEGMENT_NAME SEGMENT_TYPE BYTES INMEMORY_SIZE INMEMORY_DUPLICATE
    2 XXCCS_O XXCCS_DS_SL_CVR TABLE 1,946,157,056 356,909,056 NO DUPLICATE
    1 XXCCS_O XXCCS_DS_SL_CVR TABLE 1,946,157,056 356,909,056 NO DUPLICATE
    ——-
    Table Size from dba_segments is – 1,946,157,056 bytes
    Now on both the RAC instances it shows Bytes loaded as – 1,946,157,056 bytes as shown in the above output.

    This is confusing me because I thought that it would distribute the data across both the nodes as the data is loaded with NO DUPLICATE as shown in the gv$im_segments. But looking at the numbers it shows that the same bytes have been loaded across both the RAC Nodes.

    Can you please explain the numbers here?

    I don’t want to load the same data inmemory across both the Nodes but want to divide and distribute the data across the nodes as I have limited memory space and want to be sure I am not doing anything wrong here.
    Regards
    Tarun

  9. ThePhantomDataWrangler says:

    It’s kind of annoying. The various DISTRIBUTE and REPLICATE options are quietly (or not so quietly) ignored on non Exadata systems. There’s no way to build node affinities by manually specifying the target node for distribution (e.g. by partition), so any cleverness in RAC design has to spill out to the application to perhaps query gv$im_segment metadata and deliberately target one node, unless using PQ.

  10. […] Kerry Osborne show us how we can distribute the data (using the distribute INMEMORY attribute) across the nodes into this blog post. […]
