Exadata and Parallel Queuing

Over the years Oracle has added many enhancements in order to allow individual SQL statements to take full advantage of multiprocessor computers. A few months ago Cary Millsap did a talk where he recalled the presentation Larry Ellison did when Oracle first announced the Parallel Query feature. During Larry’s demo he had a multiprocessor computer all to himself. I don’t remember how many processors it had, but I remember he had some kind of graphic showing individual CPU utilization on one screen while he fired up a parallel query on another screen. The monitoring screen lit up like a Christmas tree. Every one of the CPU’s was pegged during his demo. When Cary was telling the story he said that he had wondered at the time what would have happened if there had been other users on the system during the demo. Their experience would probably not have been a good one. I remember having the exact same thought.

Oracle’s parallel capabilities have been a great gift but they have also been a curse because controlling the beast in an environment where there are multiple users trying to share the resources is pretty difficult. There have been many attempts at coming up with a reasonable way of throttling big parallel statements along the way. But to date, I think this technology has only been used effectively in batch processing environments and large data warehouses where consuming the whole machine’s resources is acceptable due to the relatively low degree of concurrency required by those environments.

So why did I mention Exadata in the title of this post? Well I believe that one of the most promising aspects of Exadata is it’s potential with regard to running mixed work loads (OLTP and DW) on the same database, without crippling one or the other. In order to do that, Oracle needs some mechanism to separate the workloads. Resource Manager is an option in this area, but it doesn’t go far enough in controlling throughput on parallel queries. This new queuing mechanism should be a great help in that regard. So let’s review the options:

Parallel Adaptive Multi User (the old way)
This ability to automatically downgrade the degree of parallelism based on what’s happening on the system when a query kicks off is actually a powerful mechanism and is the best approach we’ve had prior to 11g Release 2. The downside of this approach is that parallelized statements can have wildly varying execution times. As you can imagine, a statement that gets 32 slaves one time and then gets downgraded to serial execution the next time will probably not make the user very happy. The argument for this type of approach is that stuff is going to run slower if the system is busy regardless of what you do. And that users expect it to run slower when the system is busy. The first part of that statement may be true but I don’t believe the second part is (at least in most cases). The bigger problem with the downgrade mechanism though is that the the decision about how many slaves to use is based on a single point in time (the point when the parallel statement starts). And once the degree of parallelism (DOP) is set for a statement it can not be changed. That execution of the statement will run with the number of slaves it got to start with, even if additional resources become available. So consider the statement that takes a minute with 32 slaves that gets downgraded to serial due to a momentarily high load. And say that 10 seconds after it starts the system load drops back to more normal levels. Unfortunately, the serialized statement will continue to run for nearly 30 minutes with it’s single process, even though on average the system is not busier than usual.

Parallel Queuing (the new way)
Now let’s compare that with the new mechanism introduced in 11gR2 that allows parallel statements to be queued in a First In – First Out fashion. This mechanism separates (presumably) long running parallel queries from the rest of the workload. The mechanics are pretty simple. Turn the feature on. Set a target number of parallel slaves (parallel_servers_target). Run stuff. If a statement tries to start that requires exceeding the target, it will be queued until the required number of slaves become available.

The Parallel Queuing feature is controlled by a hidden parameter called “_parallel_statement_queuing”. A value of TRUE turns it on and FALSE turns it off. FALSE is the default by the way. This parameter is not documented but is set automatically when the PARALLEL_DEGREE_POLICY parameter is set to AUTO. Unfortunately, PARALLEL_DEGREE_POLICY is one of those parameters that controls more than one thing. When set to AUTO it also turns on Automatic DOP calculation. This feature calculates a DOP for each statement regardless of whether any objects have been decorated with a parallel setting. The result is that all kinds of statements are run in parallel, even if no objects have been specifically defined with a parallel degree setting. This is truly automatic parallel processing because the database decides what to run in parallel and with how many slaves. On top of that, by default, the slaves may be spread across multiple nodes in a RAC database (this can be disabled by setting PARALLEL_FORCE_LOCAL to TRUE). Finally, AUTO is supposed to enable “In Memory Parallel Query”. This poorly named feature refers to 11gR2′s ability to make use of the SGA for parallel query, as opposed to using direct reads exclusively. Note: I haven’t actually seen this kick in yet, which is probably good, since Exadata Offloading depends on direct reads. I haven’t seen it kick in on non-Exadata databases either though.

Unfortunately this combination of features is a little like the wild west with things running in parallel all over the place. But the ability to queue parallel statements does provide some semblance of order. And to be fair, there are a number of parameters that can be set to control how the calculations are performed. Anyway, here’s a brief synopsis of parameter changes caused by the various settings PARALLEL_DEGREE_POLICY.

    Parameters Affected by Parallel_Degree_Policy
    Parallel_Degree_Policy Parameter Value
    MANUAL _parallel_statement_queuing FALSE
    _parallel_cluster_cache_policy ADAPTIVE
    LIMITED _parallel_statement_queuing FALSE
    _parallel_cluster_cache_policy ADAPTIVE
    AUTO _parallel_statement_queuing TRUE
    _parallel_cluster_cache_policy CACHED

So let’s look at how it behaves:

-bash-3.2$ !sql
sqlplus / as sysdba
 
SQL*Plus: Release 11.2.0.1.0 Production on Mon Jul 5 13:10:26 2010
 
Copyright (c) 1982, 2009, Oracle.  All rights reserved.
 
 
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
 
EXADATA> @parms 
Enter value for parameter: parallel
Enter value for isset: 
Enter value for show_hidden: 
 
NAME                                               VALUE                                                                  ISDEFAUL ISMODIFIED ISSET
-------------------------------------------------- ---------------------------------------------------------------------- -------- ---------- ----------
fast_start_parallel_rollback                       LOW                                                                    TRUE     FALSE      FALSE
parallel_adaptive_multi_user                       TRUE                                                                   TRUE     FALSE      FALSE
parallel_automatic_tuning                          FALSE                                                                  TRUE     FALSE      FALSE
parallel_degree_limit                              CPU                                                                    TRUE     FALSE      FALSE
parallel_degree_policy                             AUTO                                                                   TRUE     TRUE       TRUE
parallel_execution_message_size                    16384                                                                  TRUE     FALSE      FALSE
parallel_force_local                               FALSE                                                                  TRUE     FALSE      FALSE
parallel_instance_group                                                                                                   TRUE     FALSE      FALSE
parallel_io_cap_enabled                            FALSE                                                                  TRUE     FALSE      FALSE
parallel_max_servers                               160                                                                    TRUE     FALSE      FALSE
parallel_min_percent                               0                                                                      TRUE     FALSE      FALSE
parallel_min_servers                               0                                                                      TRUE     FALSE      FALSE
parallel_min_time_threshold                        AUTO                                                                   TRUE     FALSE      FALSE
parallel_server                                    TRUE                                                                   TRUE     FALSE      FALSE
parallel_server_instances                          2                                                                      TRUE     FALSE      FALSE
parallel_servers_target                            64                                                                     TRUE     FALSE      FALSE
parallel_threads_per_cpu                           2                                                                      TRUE     FALSE      FALSE
recovery_parallelism                               0                                                                      TRUE     FALSE      FALSE
 
18 rows selected.
 
EXADATA> select owner, table_name, degree from dba_tables where table_name = 'SKEW';
 
OWNER      TABLE_NAME                     DEGREE
---------- ------------------------------ ------
KSO        SKEW                                1
 
EXADATA> set echo on
EXADATA> @avgskew
EXADATA> select avg(pk_col) from kso.skew a where col1 > 0
  2  /
 
AVG(PK_COL)
-----------
 16093749.3
 
EXADATA> @fs
Enter value for sql_text: select avg(pk_col) from kso.skew a where col1 > 0
Enter value for sql_id: 
 
SQL_ID         CHILD  PLAN_HASH      EXECS     AVG_ETIME      AVG_LIO SQL_TEXT
------------- ------ ---------- ---------- ------------- ------------ ------------------------------------------------------------
05cq2hb1r37tr      0  568322376          1         28.77      162,427 select avg(pk_col) from kso.skew a where col1 > 0
 
1 row selected.
 
EXADATA> @dplan
Enter value for sql_id: 05cq2hb1r37tr
Enter value for child_no: 
 
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  05cq2hb1r37tr, child number 0
-------------------------------------
select avg(pk_col) from kso.skew a where col1 > 0
 
Plan hash value: 568322376
 
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       | 28420 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    24 |            |          |
|*  2 |   TABLE ACCESS FULL| SKEW |    32M|   732M| 28420   (2)| 00:05:42 |
---------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - filter("COL1">0)
 
 
19 rows selected.
 
EXADATA> alter system set parallel_degree_policy=auto;
 
System altered.
 
EXADATA> @avgskew
 
AVG(PK_COL)
-----------
 16093749.3
 
1 row selected.
 
EXADATA> @fs
Enter value for sql_text: 
Enter value for sql_id: 05cq2hb1r37tr
 
SQL_ID         CHILD  PLAN_HASH      EXECS     AVG_ETIME      AVG_LIO SQL_TEXT
------------- ------ ---------- ---------- ------------- ------------ ------------------------------------------------------------
05cq2hb1r37tr      0  568322376          1         28.77      162,427 select avg(pk_col) from kso.skew a where col1 > 0
05cq2hb1r37tr      2  578366071          1         40.33      162,437 select avg(pk_col) from kso.skew a where col1 > 0
 
2 rows selected.
 
EXADATA> @dplan
Enter value for sql_id: 05cq2hb1r37tr
Enter value for child_no: 2
 
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  05cq2hb1r37tr, child number 2
-------------------------------------
select avg(pk_col) from kso.skew a where col1 > 0
 
Plan hash value: 578366071
 
----------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |       |       |  6308 (100)|          |        |      |            |
|   1 |  SORT AGGREGATE        |          |     1 |    24 |            |          |        |      |            |
|   2 |   PX COORDINATOR       |          |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM) | :TQ10000 |     1 |    24 |            |          |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE     |          |     1 |    24 |            |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR |          |    32M|   732M|  6308   (1)| 00:01:16 |  Q1,00 | PCWC |            |
|*  6 |       TABLE ACCESS FULL| SKEW     |    32M|   732M|  6308   (1)| 00:01:16 |  Q1,00 | PCWP |            |
----------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   6 - access(:Z>=:Z AND :Z<=:Z)
       filter("COL1">0)
 
Note
-----
   - automatic DOP: Computed Degree of Parallelism is 5
 
 
28 rows selected.
 
EXADATA> alter system set parallel_degree_policy=auto;
 
System altered.
 
EXADATA> !ss.sh avgskew.sql 20 kso/kso
 
starting 20 copies of avgskew.sql
 
EXADATA> set echo on
EXADATA> @queued_sql
EXADATA> col sql_text for a60 trunc
EXADATA> SELECT sid, sql_id, sql_exec_id, sql_text
  2  from v$sql_monitor
  3  WHERE status='QUEUED'
  4  order by 3
  5  /
 
 SID SQL_ID        SQL_EXEC_ID SQL_TEXT
---- ------------- ----------- ------------------------------------------------------------
 293 05cq2hb1r37tr    16777265 select avg(pk_col) from kso.skew a where col1 > 0
 270 05cq2hb1r37tr    16777266 select avg(pk_col) from kso.skew a where col1 > 0
   7 05cq2hb1r37tr    16777267 select avg(pk_col) from kso.skew a where col1 > 0
  22 05cq2hb1r37tr    16777268 select avg(pk_col) from kso.skew a where col1 > 0
  42 05cq2hb1r37tr    16777269 select avg(pk_col) from kso.skew a where col1 > 0
 101 05cq2hb1r37tr    16777270 select avg(pk_col) from kso.skew a where col1 > 0
  78 05cq2hb1r37tr    16777271 select avg(pk_col) from kso.skew a where col1 > 0
  60 05cq2hb1r37tr    16777272 select avg(pk_col) from kso.skew a where col1 > 0
 118 05cq2hb1r37tr    16777273 select avg(pk_col) from kso.skew a where col1 > 0
 160 05cq2hb1r37tr    16777274 select avg(pk_col) from kso.skew a where col1 > 0
 137 05cq2hb1r37tr    16777275 select avg(pk_col) from kso.skew a where col1 > 0
 181 05cq2hb1r37tr    16777276 select avg(pk_col) from kso.skew a where col1 > 0
 199 05cq2hb1r37tr    16777277 select avg(pk_col) from kso.skew a where col1 > 0
 216 05cq2hb1r37tr    16777278 select avg(pk_col) from kso.skew a where col1 > 0
 
14 rows selected.
 
EXADATA> -- using Tanel Poder's snapper - 
EXADATA> @snapper ash=sid+event+wait_class,ash1=plsql_object_id+plsql_subprogram_id+sql_id 5 1 all
Sampling with interval 5 seconds, 1 times...
 
-- Session Snapper v3.11 by Tanel Poder @ E2SN ( http://tech.e2sn.com )
 
 
--------------------------------------------------------------
Active% |    SID | EVENT                     | WAIT_CLASS
--------------------------------------------------------------
   100% |    118 | enq: JX - SQL statement q | Scheduler
   100% |    216 | enq: JX - SQL statement q | Scheduler
   100% |     78 | enq: JX - SQL statement q | Scheduler
   100% |     63 | enq: JX - SQL statement q | Scheduler
   100% |    233 | enq: JX - SQL statement q | Scheduler
   100% |    199 | enq: JX - SQL statement q | Scheduler
   100% |    137 | enq: JX - SQL statement q | Scheduler
   100% |     96 | enq: JX - SQL statement q | Scheduler
   100% |    101 | enq: JX - SQL statement q | Scheduler
   100% |     27 | enq: JX - SQL statement q | Scheduler
 
---------------------------------------------------
Active% | PLSQL_OBJE | PLSQL_SUBP | SQL_ID
---------------------------------------------------
  2095% |            |            | 05cq2hb1r37tr
     3% |            |            |
 
--  End of ASH snap 1, end=2010-07-04 20:18:05, seconds=5, samples_taken=39
 
 
PL/SQL procedure successfully completed.

There are several things worth mentioning in this example. First we see that with PARALLEL_DEGREE_POLICY set to it’s default value of MANUAL, my avgskew.sql script runs a serial plan. Second we see that when we set PARALLEL_DEGREE_POLICY to AUTO, the statement is automatically parallelized. Notice the note at the bottom of the plan output and you see that Automatic DOP was set at 5. Magical! I then used a shell script (ss.sh) to fire off 20 copies of the avgskew.sql script in rapid succession. Querying V$SQL_MONITOR showed that the statements were indeed queuing. Finally, I ran Tanel Poder’s snapper to see what event the queued statements were waiting on. Turns out it was “enq: JX – SQL statement queue”. Note that there is also an event called “PX Queuing: statement queue” that we didn’t see in the snapper output. I’m not exactly sure why Oracle split these two events apart, but apparently “PX Queuing: statement queue” is the event that clocks time when a statement is next in line, while “enq: JX – SQL statement queue” is used when a statement is in the queue but not the next one up. Guy Harrison has a really good post on the parallel_degree_policy that covers these two events here.

Oracle has provided some control over the Automatic DOP calculations as well with the PARALLEL_DEGREE_LIMIT parameter. The default value for this parameter is CPU which comes up with an “Ideal DOP” based on the amount of data, but then caps it with a formula based on CPU_COUNT, THREADS_PER_CPU and ACTIVE_INSTANCE_COUNT. There is a good description in this an Oracle white paper Parallel Execution Fundamentals that covers this calculation.

Of course, automatic DOP calculations are still a little scary. So it’s nice that there is a way to turn on the Parallel Queuing feature without enabling the Automatic DOP. Here’s another quick example showing that Queuing can be turned on without the other features:

EXADATA> alter system set parallel_degree_policy=manual;
 
System altered.
 
EXADATA> @flush_pool
 
System altered.
 
EXADATA> @avgskew
 
AVG(PK_COL)
-----------
 16093749.3
 
EXADATA> @fs
Enter value for sql_text: %avg%kso.skew%
Enter value for sql_id: 
 
SQL_ID         CHILD  PLAN_HASH      EXECS     AVG_ETIME      AVG_LIO SQL_TEXT
------------- ------ ---------- ---------- ------------- ------------ ------------------------------------------------------------
05cq2hb1r37tr      0  568322376          1         10.92      162,496 select avg(pk_col) from kso.skew a where col1 > 0
 
1 row selected.
 
EXADATA> @dplan
Enter value for sql_id: 05cq2hb1r37tr
Enter value for child_no: 
 
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  05cq2hb1r37tr, child number 0
-------------------------------------
select avg(pk_col) from kso.skew a where col1 > 0
 
Plan hash value: 568322376
 
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       | 28420 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    24 |            |          |
|*  2 |   TABLE ACCESS FULL| SKEW |    32M|   732M| 28420   (2)| 00:05:42 |
---------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - filter("COL1">0)
 
 
19 rows selected.
 
EXADATA> -- so with MANUAL - the DOP is not set automatically
EXADATA> -- let's set the table to have a parallel degree
EXADATA>
EXADATA> alter table kso.skew parallel (degree 6);
 
Table altered.
 
EXADATA> @avgskew
 
AVG(PK_COL)
-----------
 16093749.3
 
1 row selected.
 
EXADATA> @fs
Enter value for sql_text: 
Enter value for sql_id: 05cq2hb1r37tr
 
SQL_ID         CHILD  PLAN_HASH      EXECS     AVG_ETIME      AVG_LIO SQL_TEXT
------------- ------ ---------- ---------- ------------- ------------ ------------------------------------------------------------
05cq2hb1r37tr      0  578366071          1         13.93      142,437 select avg(pk_col) from kso.skew a where col1 > 0
 
1 row selected.
 
EXADATA> @dplan
Enter value for sql_id: 05cq2hb1r37tr
Enter value for child_no: 
 
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  05cq2hb1r37tr, child number 0
-------------------------------------
select avg(pk_col) from kso.skew a where col1 > 0
 
Plan hash value: 578366071
 
----------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |       |       |  5256 (100)|          |        |      |            |
|   1 |  SORT AGGREGATE        |          |     1 |    24 |            |          |        |      |            |
|   2 |   PX COORDINATOR       |          |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM) | :TQ10000 |     1 |    24 |            |          |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE     |          |     1 |    24 |            |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR |          |    32M|   732M|  5256   (1)| 00:01:04 |  Q1,00 | PCWC |            |
|*  6 |       TABLE ACCESS FULL| SKEW     |    32M|   732M|  5256   (1)| 00:01:04 |  Q1,00 | PCWP |            |
----------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   6 - access(:Z>=:Z AND :Z<=:Z)
       filter("COL1">0)
 
 
24 rows selected.
 
EXADATA>  !ss.sh avgskew.sql 20 kso/kso
 
starting 20 copies of avgskew.sql
 
EXADATA> @queued_sql
 
no rows selected
 
EXADATA> -- so now it's parallel but only objects or statements that we define as parallel
EXADATA> -- but no queuing, let's turn on queuing
EXADATA>
EXADATA> alter system set "_parallel_statement_queuing"=true;
 
System altered.
 
EXADATA> !ss.sh avgskew.sql 20 kso/kso
 
starting 20 copies of avgskew.sql 
 
EXADATA> @queued_sql
 
 SID SQL_ID        SQL_EXEC_ID SQL_TEXT
---- ------------- ----------- ------------------------------------------------------------
 181 05cq2hb1r37tr    16777363 select avg(pk_col) from kso.skew a where col1 > 0
 216                  16777364
  44                  16777365
 273                  16777366
 234                  16777367
 160                  16777368
 100                  16777369
 233                  16777370
  30                  16777371
 138                  16777372
   7                  16777373
 293                  16777374
 137                  16777375
  83                  16777376
 251                  16777377
  66                  16777378
 123                  16777379
 195                  16777380
 
18 rows selected.
 
EXADATA> -- now we have control of which statements are parallelized and we have queuing

This example shows that you can have Parallel Queuing without turning on Automatic DOP. Of course it’s using a hidden parameter, so don’t do this without checking with your mother first (I mean Oracle support). Maybe they will make this a real parameter in a future release (one can only hope). Note that there are hints to turn this feature on and off on a statement level as well (STATEMENT_QUEUING and NO_STATEMENT_QUEUING).

So why did I mention Exadata again? Mixed workloads require a mechanism that forces parallel queries to play nicely with others. (kind of like kindergarten) Parallel Queuing provides a relatively simple mechanism for doing just that. I also mentioned Resource Manager (RM) in passing. That’s another option at our disposal. It does have the ability to curb parallel processing as well and it’s enabled by default in 11gR2. RM, along with it’s kissing cousin IORM, are key components in consolidation strategies on Exadata as well. But we’ll have to leave that for another post. So much to do and so little time.

7 Comments

  1. [...] This post was Twitted by Henry_CHO [...]

  2. Thanks Kerry – nice article.
    Well explained, demonstrated and easily understood.

  3. “I don’t remember how many processors it had, but I remember he had some kind of graphic showing individual CPU utilization on one screen while he fired up a parallel query on another screen. ”

    …I certainly do. It was a Sequent Symmetry 5000. I ought to know. I set it up! :-) I was in Sequent Advanced Oracle Engineering at the time.

  4. osborne says:

    Ha! I was thinking it was a Sequent. That’s interesting that you set it up. I knew another guy that worked for Oracle back around that time frame (maybe a little earlier actually). Anyway, his job was to put together Larry’s presentations and then run them for him while he talked. I remember seeing Larry light into guys that were supporting him during a talk. I think that must have been a pretty unpleasant job. (not yours, the guy doing the presentations for him).

    Kerry

  5. [...] speaking of Exadata, Kerry Osborne talks on his blog, about the Oracle 11g Release 2 new feature, Parallel Queuing, and how it applies to data warehousing workloads on Exadata. Marc Fielding begins his grand tour [...]

  6. DBA Teracomp says:

    Hi Kerry

    Thank you for this article, it’s very useful!

    Please, continue to post articles like that to share your great knowledge about Exadata with us!

    My best regards,

    .:. Teracomp IT Consulting .:.

Leave a Reply