Tuning Oracle to Make a Query Slower

I had an interesting little project this morning. Of course it takes longer to write it down than to do actually do it, but it was kind of interesting and since I haven’t done a post in quite some time (and it’s the day before Thanksgiving, so it’s pretty quite at the office anyway) I decided to share. One of the Enkitec guys (Tim Fox) was doing a performance comparison between various platforms (Exadata using it’s IB Storage Network, Oracle Database Appliance (ODA) using it’s direct attached storage, and a standard database on a Dell box using EMC fiber channel attached storage). The general test idea was simple – see how the platforms stacked up for a query that required a full scan of a large table. More specifically, what Tim wanted to see was the relative speed at which the various storage platforms could return data. The expectation was that the direct attached storage would be fastest and the fibre channel storage would be slowest (especially since we only had a single 2G HBA). He tested ODA and Exadata and got basically what he expected, but when he went to test the database on the Dell he was surprised that it was actually faster than either of the other two tests. So here’s some output from the initial tests: First the Exadata. It’s an X2 quarter rack with one extra storage server. Note that we had to set cell_offload_processing to false to turn off the Exadata storage optimizations, thus giving us a measurement of the hardware capabilities without the Exadata offloading.

> !sqlp
sqlp
 
SQL*Plus: Release 11.2.0.2.0 Production on Wed Nov 23 11:08:28 2011
 
Copyright (c) 1982, 2010, Oracle.  All rights reserved.
 
 
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
 
SYS@DEMO1> @uptime
 
INSTANCE_NAME    STARTUP_TIME      CURRENT_TIME         DAYS    SECONDS
---------------- ----------------- ----------------- ------- ----------
DEMO1            07-NOV-2011 12:37 23-NOV-2011 11:08   15.94    1377058
 
SYS@DEMO1> set sqlprompt "_USER'@'EXADATA'>' "
SYS@EXADATA> 
SYS@EXADATA> ! cat /etc/redhat-release
Enterprise Linux Enterprise Linux Server release 5.5 (Carthage)
 
SYS@EXADATA> ! uname -a
Linux enkdb03.enkitec.com 2.6.18-194.3.1.0.3.el5 #1 SMP Tue Aug 31 22:41:13 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
 
SYS@EXADATA> alter session set "_serial_direct_read"=always;
 
Session altered.
 
SYS@EXADATA> alter session set cell_offload_processing=false;
 
Session altered.
 
SYS@EXADATA> set autotrace on
SYS@EXADATA> set timing on
SYS@EXADATA> select count(*) from instructor.class_sales;
 
  COUNT(*)
----------
  90000000
 
Elapsed: 00:00:43.01
 
Execution Plan
----------------------------------------------------------
Plan hash value: 3145879882
 
----------------------------------------------------------------------------------
| Id  | Operation                  | Name        | Rows  | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |             |     1 |   314K  (1)| 00:00:02 |
|   1 |  SORT AGGREGATE            |             |     1 |            |          |
|   2 |   TABLE ACCESS STORAGE FULL| CLASS_SALES |    90M|   314K  (1)| 00:00:02 |
----------------------------------------------------------------------------------
 
 
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
    1168567  consistent gets
    1168557  physical reads
          0  redo size
        526  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
 
SYS@EXADATA> set autotrace off
SYS@EXADATA> @fss
Enter value for sql_text: select count(*) from instructor.class_sales
Enter value for sql_id: 
 
SQL_ID         CHILD      EXECS   AVG_ROWS     AVG_ETIME       AVG_CPU       AVG_PIO      AVG_LIO SQL_TEXT
------------- ------ ---------- ---------- ------------- ------------- ------------- ------------ ----------------------------------------
b2br1x82p9862      0          1          1         43.00          3.16  1,168,557.00    1,168,567 select count(*) from instructor.class_sa
 
Elapsed: 00:00:00.08

So the test on the Exadata took 43 seconds to read and transport roughly 1 million 8K blocks. The same test on the ODA looked like this:

[oracle@patty scripts]$ !rl rlwrap sqlplus / as sysdba  
 
SQL*Plus: Release 11.2.0.2.0 Production on Wed Nov 23 10:16:51 2011  
 
Copyright (c) 1982, 2010, Oracle.  All rights reserved.   
 
 
Connected to: 
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production 
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP, 
Data Mining and Real Application Testing options   
 
INSTANCE_NAME    STARTUP_TIME               CURRENT_TIME                  DAYS    SECONDS 
---------------- -------------------------- -------------------------- ------- ---------- 
ODA1             22-NOV-2011 12:23          23-NOV-2011 10:16              .91      78793  
 
SYS@ODA1> ! cat /etc/redhat-release 
 
Red Hat Enterprise Linux Server release 5.5 (Tikanga)  
 
SYS@ODA1> ! uname -a 
 
Linux patty 2.6.18-194.32.1.0.1.el5 #1 SMP Tue Jan 4 16:26:54 EST 2011 x86_64 x86_64 x86_64 GNU/Linux  
 
SYS@ODA1> set timing on
 
SYS@ODA1> alter session set '_serial_direct_read'=always;  
 
Session altered.  
 
Elapsed: 00:00:00.00 
 
SYS@ODA1> set autotrace on 
SYS@ODA1> select count(*) from instructor.class_sales;    
 
COUNT(*) 
----------   
90000000  
Elapsed: 00:00:30.60  
 
Execution Plan 
---------------------------------------------------------- 
Plan hash value: 3145879882  
 
-------------------------------------------------------------------------- 
| Id  | Operation          | Name        | Rows  | Cost (%CPU)| Time     | 
-------------------------------------------------------------------------- 
|   0 | SELECT STATEMENT   |             |     1 |   317K  (1)| 00:00:05 | 
|   1 |  SORT AGGREGATE    |             |     1 |            |          | 
|   2 |   TABLE ACCESS FULL| CLASS_SALES |    90M|   317K  (1)| 00:00:05 | 
--------------------------------------------------------------------------   
 
 
Statistics 
----------------------------------------------------------          
        99  recursive calls           
         0  db block gets     
   1154080  consistent gets     
   1153994  physical reads         
       516  redo size         
       526  bytes sent via SQL*Net to client         
       524  bytes received via SQL*Net from client           
         2  SQL*Net roundtrips to/from client           
         7  sorts (memory)           
         0  sorts (disk)           
         1  rows processed  
 
SYS@ODA1> set autotrace off 
SYS@ODA1> @fss 
Enter value for sql_text: select count(*) from instructor.class_sales 
Enter value for sql_id:   
 
SQL_ID         CHILD PLAN_HASH_VALUE      EXECS ROWS_PROCESSED  AVG_ETIME    AVG_CPU    AVG_PIO    AVG_LIO SQL_TEXT 
------------- ------ --------------- ---------- -------------- ---------- ---------- ---------- ---------- ---------------------------------------- 
b2br1x82p9862      0      3145879882          1              1      30.55      13.91  1,153,994  1,154,080 select count(*) from instructor.class_sa

As expected, the direct attached disk was faster than moving the blocks across the IB network. It took about 30 seconds to read roughly the same number of blocks. The same test on the Dell produced this output:

 [osborne@homer scripts]$ sqlp  
 
SQL*Plus: Release 11.2.0.3.0 Production on Wed Nov 23 11:20:20 2011  
 
Copyright (c) 1982, 2011, Oracle.  All rights reserved.   
 
 
Connected to: 
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production 
With the Partitioning, OLAP, Data Mining and Real Application Testing options  
 
SYS@EXADATA> set sqlprompt "'_USER'@'DELL'>'" 
SYS@DELL> !cat /etc/redhat-release 
 
Red Hat Enterprise Linux Server release 5.5 (Tikanga)  
 
SYS@DELL> ! uname -a 
 
Linux homer.enkitec.com 2.6.18-194.el5 #1 SMP Mon Mar 29 22:10:29 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux  
 
SYS@DELL> alter session set '_serial_direct_read'=always;  
 
Session altered.  
 
SYS@DELL> set autotrace on 
 
SYS@DELL> set timing on 
 
SYS@DELL> select count(*) from instructor.class_sales;    
 
COUNT(*) 
----------   
90000000  
 
Elapsed: 00:00:11.31  
 
Execution Plan 
---------------------------------------------------------- 
Plan hash value: 3145879882  
 
-------------------------------------------------------------------------- 
| Id  | Operation          | Name        | Rows  | Cost (%CPU)| Time     | 
-------------------------------------------------------------------------- 
|   0 | SELECT STATEMENT   |             |     1 |   315K  (1)| 01:03:08 | 
|   1 |  SORT AGGREGATE    |             |     1 |            |          | 
|   2 |   TABLE ACCESS FULL| CLASS_SALES |    90M|   315K  (1)| 01:03:08 | 
--------------------------------------------------------------------------   
 
 
Statistics ----------------------------------------------------------          
        77  recursive calls  
         0  db block gets  
   1168660  consistent gets  
   1168569  physical reads  
         0  redo size  
       526  bytes sent via SQL*Net to client  
       524  bytes received via SQL*Net from client  
         2  SQL*Net roundtrips to/from client  
        10  sorts (memory)  
         0  sorts (disk)  
         1  rows processed  
 
SYS@DELL> set autotrace off 
SYS@DELL> @fss 
Enter value for sql_text: select count(*) from instructor.class_sales 
Enter value for sql_id:   
 
SQL_ID         CHILD      EXECS   AVG_ROWS     AVG_ETIME       AVG_CPU       AVG_PIO      AVG_LIO SQL_TEXT 
------------- ------ ---------- ---------- ------------- ------------- ------------- ------------ ---------------------------------------- 
b2br1x82p9862      0          1          1         11.29         11.25  1,168,569.00    1,168,660 select count(*) from instructor.class_sa  
 
Elapsed: 00:00:00.03

This is where the results were surprising. Tim expected the single 2G HBA to be considerably slower than the other two set ups, but it only took 11 seconds on the Dell / EMC set up to read the same data. So this is where I got to help Tim try to make it slower! (there’s a first time for everything) So what gives? The first thought was that Oracle was caching the data: But no – you’ll notice that we set the _serial_direct_read parameter to always, which forces direct path reads and bypasses the buffer cache altogether. Also, the stats bear out that Oracle is doing physical reads. But 1 Million real i/o’s can’t happen in 11 seconds so it’s got to be memory access right? (I’m sure you’re way ahead of me by now) The answer of course is "Right – it is memory access!" – but it’s the Linux file system cache not the Oracle buffer cache. A quick check of the Linux memory usage showed us that the file cache was over 20G and the table was less than 10G. So Oracle must be set up without Direct I/O.

SYS@DELL> !cat /proc/meminfo 
 
MemTotal:     32948892 kB 
MemFree:       2769420 kB 
Buffers:        615664 kB 
Cached:       24727756 kB 
SwapCached:     101844 kB 
Active:       21322184 kB 
Inactive:      5261948 kB 
HighTotal:           0 kB 
HighFree:            0 kB 
LowTotal:     32948892 kB 
LowFree:       2769420 kB 
SwapTotal:    34996216 kB 
SwapFree:     34147776 kB 
Dirty:             448 kB 
Writeback:           0 kB 
AnonPages:     1138828 kB 
Mapped:        4758828 kB 
Slab:          1088356 kB 
PageTables:     167464 kB 
NFS_Unstable:        0 kB 
Bounce:              0 kB 
CommitLimit:  51470660 kB 
Committed_AS: 15637340 kB 
VmallocTotal: 34359738367 kB 
VmallocUsed:    371108 kB 
VmallocChunk: 34359367199 kB 
HugePages_Total:     0 
HugePages_Free:      0 
HugePages_Rsvd:      0 
Hugepagesize:     2048 kB  
 
SYS@DELL> !realfreemem.sh -a 
 
Free Memory: 2703M 
Cached Memory: 24148M 
Total Free Memory: 26851M 
Total Memory: 32176M 
Percent Memory Free (including cache): 83%  
 
SYS@DELL> @table_size 
Enter value for owner: INSTRUCTOR 
Enter value for table_name: CLASS_SALES 
Enter value for type:   
 
OWNER                SEGMENT_NAME                   TYPE               TOTALSIZE_MEGS TABLESPACE_NAME 
-------------------- ------------------------------ ------------------ -------------- ------------------------------ 
INSTRUCTOR           CLASS_SALES                    TABLE                     9,151.0 ODACOMP_DATA 
                                                                       -------------- 
sum                                                                           9,151.0  
 
Elapsed: 00:00:00.04
 
SYS@DELL> -- so how is Oracle configured for I/O? 
SYS@DELL> @parms 
Enter value for parameter: filesys 
Enter value for isset:  
Enter value for show_hidden:   
 
NAME                                               VALUE                                                                  ISDEFAUL ISMODIFIED ISSET 
-------------------------------------------------- ---------------------------------------------------------------------- -------- ---------- ---------- 
filesystemio_options                               NONE                                                                   TRUE     FALSE      FALSE  
 
Elapsed: 00:00:00.01

So the database was not configured to use direct i/o or async i/o (filesystemio_options=none). A quick check with strace verified that direct i/o was not being used. So we modified the filesystemio_options setting and tried again.

SYS@DELL> alter system set filesystemio_options=setall scope=spfile;  
 
System altered.  
 
SYS@DELL> startup force 
ORACLE instance started.  
 
Total System Global Area 4910620672 bytes 
Fixed Size                  2236648 bytes 
Variable Size            2298482456 bytes 
Database Buffers         2600468480 bytes 
Redo Buffers                9433088 bytes 
Database mounted. Database opened. 
 
SYS@ODACOMP>  
SYS@ODACOMP> @uptime  
 
INSTANCE_NAME    STARTUP_TIME               CURRENT_TIME                  DAYS    SECONDS 
---------------- -------------------------- -------------------------- ------- ---------- 
ODACOMP          23-NOV-2011 12:45          23-NOV-2011 12:47              .00        121  
 
SYS@ODACOMP> set sqlprompt "'_USER'@'DELL'>'" 
SYS@DELL>
SYS@DELL> ! cat /etc/redhat-release 
 
Red Hat Enterprise Linux Server release 5.5 (Tikanga)  
 
SYS@DELL> ! uname -a 
 
Linux homer.enkitec.com 2.6.18-194.el5 #1 SMP Mon Mar 29 22:10:29 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux  
 
SYS@DELL> set timing on 
SYS@DELL> alter session set '_serial_direct_read'=always;  
 
Session altered.  
 
Elapsed: 00:00:00.00
 
SYS@DELL> set autotrace on 
 
SYS@DELL> select count(*) from instructor.class_sales;    
 
COUNT(*) 
----------   
90000000  
 
Elapsed: 00:01:39.73  
 
Execution Plan 
---------------------------------------------------------- 
Plan hash value: 3145879882  
 
-------------------------------------------------------------------------- 
| Id  | Operation          | Name        | Rows  | Cost (%CPU)| Time     | 
-------------------------------------------------------------------------- 
|   0 | SELECT STATEMENT   |             |     1 |   315K  (1)| 01:03:08 | 
|   1 |  SORT AGGREGATE    |             |     1 |            |          | 
|   2 |   TABLE ACCESS FULL| CLASS_SALES |    90M|   315K  (1)| 01:03:08 | 
--------------------------------------------------------------------------   
 
 
Statistics ----------------------------------------------------------  
         1  recursive calls  
         0  db block gets  
   1168567  consistent gets  
   1168557  physical reads  
         0  redo size  
       526  bytes sent via SQL*Net to client  
       524  bytes received via SQL*Net from client  
         2  SQL*Net roundtrips to/from client  
         0  sorts (memory)  
         0  sorts (disk)  
         1  rows processed  
 
SYS@DELL> set autotrace off 
SYS@DELL> @fss 
Enter value for sql_text: select count(*) from instructor.class_sales 
Enter value for sql_id:   
 
SQL_ID         CHILD      EXECS   AVG_ROWS     AVG_ETIME       AVG_CPU       AVG_PIO      AVG_LIO SQL_TEXT 
------------- ------ ---------- ---------- ------------- ------------- ------------- ------------ ---------------------------------------- 
b2br1x82p9862      0          1          1         99.71          7.32  1,168,557.00    1,168,567 select count(*) from instructor.class_sa  
 
Elapsed: 00:00:00.05

Ah that’s more like it. Roughly 100 seconds now to complete the query. That’s 10X slower. Nice job. It feels so good when you can make something go slower. ;) I should note that in general, it is a very good idea to set filesystemio_options=SETALL. So please don’t jump to the conclusion that setting this parameter to NONE will make your database run faster. These days, the memory is generally better used by Oracle than by the file system cache. Glenn Fawcett has a good (if somewhat dated) post which compares the effects of caching blocks in the file system cache vs. in the Oracle buffer cache in Solaris. As you might image, allowing Oracle to use the memory is generally more effective as shown in his results. So that’s it for today. By the way, this post took 4 times as long to write as the actual testing took. Maybe I’ll get better at it if I start practicing a little more often.

7 Comments

  1. Gwen Shapira says:

    In short, if you can read 10g table into the OS cache and not the SGA buffer cache, you are not sizing your SGA right.

  2. More or less exactly what I encountered some 5-7 years ago. Large solaris machine (at that time; 32GB memory), database user-data size 1-2GB. In order to make the processing go faster, I routinely advised to mount the data filesystems with ‘forcedirectio’….resulting in the database performance getting much times slower, because now it needed to be read from disk, instead from the operating system cache.

  3. osborne says:

    Gwen,

    Absolutely, if we wanted to make this query run fast, then making the buffer cache bigger would do it. But I was trying to make it go slower ;) – that’s why it was interesting (to me anyway).

    Frits,

    Yes, in 32 bit days there were distinct advantages to having big file system cache in some cases (like yours).

  4. Alex says:

    an alternative to flushing caches in linux : echo 3 > /proc/sys/vm/drop_caches

  5. Danyc says:

    Hi Kerry,

    By any chance, can you show the output of the fss/ realfreemem.sh and table_size scripts?

    Many thanks,
    Dani

  6. osborne says:
    $ cat table_SIZE.SQL
    set lines 155
    compute sum of totalsize_megs on report
    break on report
    col owner for a20
    col segment_name for a30
    col segment_type for a10
    col totalsize_megs for 999,999.9
    select * from (
    select owner, segment_name, segment_type type,
    sum(bytes/1024/1024) as totalsize_megs,
    tablespace_name
    from dba_segments
    where owner like nvl(‘&owner’,owner)
    and segment_name like nvl(‘&table_name’,segment_name)
    and segment_type like nvl(‘&type’,segment_type)
    group by owner, segment_name, tablespace_name, segment_type
    order by 4 desc
    )
    where rownum < 30
    order by 4;
    
    $ cat realfreemem.sh
    #!/bin/ksh
    
    FREEMEM=`cat /proc/meminfo | grep MemFree: | sed -e 's/ */ /g' | cut -f2 -d' '`
    CACHED=`cat /proc/meminfo | grep Cached: | grep -v SwapCached: | sed -e 's/ */ /g' | cut -f2 -d' '`
    TOTALMEM=`cat /proc/meminfo | grep MemTotal: | sed -e 's/ */ /g' | cut -f2 -d' '`
    
    ((FREEMEM=${FREEMEM}/1024))
    ((CACHED=${CACHED}/1024))
    ((TOTALMEM=${TOTALMEM}/1024))
    
    ((TOTALFREE=${FREEMEM}+${CACHED}))
    ((REALFREE=$TOTALFREE*100))
    ((REALFREEPCT=$REALFREE/$TOTALMEM))
    
    if [ "$1" = "-a" ]; then
    echo 'Free Memory:' ${FREEMEM}'M'
    echo 'Cached Memory:' ${CACHED}'M'
    echo 'Total Free Memory:' ${TOTALFREE}'M'
    echo 'Total Memory:' ${TOTALMEM}'M'
    echo 'Percent Memory Free (including cache):' ${REALFREEPCT}'%'
    else
    echo 'Percent Memory Free (including cache):' ${REALFREEPCT}'%'
    fi
    
    return $REALFREEPC
    
    
    
    		
    				
  7. DanyC says:

    One word buddy: Many thanks!!

    Dany

Leave a Reply