Tuning Oracle to Make a Query Slower
I had an interesting little project this morning. Of course it takes longer to write it down than to do actually do it, but it was kind of interesting and since I haven’t done a post in quite some time (and it’s the day before Thanksgiving, so it’s pretty quite at the office anyway) I decided to share. One of the Enkitec guys (Tim Fox) was doing a performance comparison between various platforms (Exadata using it’s IB Storage Network, Oracle Database Appliance (ODA) using it’s direct attached storage, and a standard database on a Dell box using EMC fiber channel attached storage). The general test idea was simple – see how the platforms stacked up for a query that required a full scan of a large table. More specifically, what Tim wanted to see was the relative speed at which the various storage platforms could return data. The expectation was that the direct attached storage would be fastest and the fibre channel storage would be slowest (especially since we only had a single 2G HBA). He tested ODA and Exadata and got basically what he expected, but when he went to test the database on the Dell he was surprised that it was actually faster than either of the other two tests. So here’s some output from the initial tests: First the Exadata. It’s an X2 quarter rack with one extra storage server. Note that we had to set cell_offload_processing to false to turn off the Exadata storage optimizations, thus giving us a measurement of the hardware capabilities without the Exadata offloading.
> !sqlp
sqlp
SQL*Plus: Release 11.2.0.2.0 Production on Wed Nov 23 11:08:28 2011
Copyright (c) 1982, 2010, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SYS@DEMO1> @uptime
INSTANCE_NAME STARTUP_TIME CURRENT_TIME DAYS SECONDS
---------------- ----------------- ----------------- ------- ----------
DEMO1 07-NOV-2011 12:37 23-NOV-2011 11:08 15.94 1377058
SYS@DEMO1> set sqlprompt "_USER'@'EXADATA'>' "
SYS@EXADATA>
SYS@EXADATA> ! cat /etc/redhat-release
Enterprise Linux Enterprise Linux Server release 5.5 (Carthage)
SYS@EXADATA> ! uname -a
Linux enkdb03.enkitec.com 2.6.18-194.3.1.0.3.el5 #1 SMP Tue Aug 31 22:41:13 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
SYS@EXADATA> alter session set "_serial_direct_read"=always;
Session altered.
SYS@EXADATA> alter session set cell_offload_processing=false;
Session altered.
SYS@EXADATA> set autotrace on
SYS@EXADATA> set timing on
SYS@EXADATA> select count(*) from instructor.class_sales;
COUNT(*)
----------
90000000
Elapsed: 00:00:43.01
Execution Plan
----------------------------------------------------------
Plan hash value: 3145879882
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 314K (1)| 00:00:02 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | TABLE ACCESS STORAGE FULL| CLASS_SALES | 90M| 314K (1)| 00:00:02 |
----------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
1168567 consistent gets
1168557 physical reads
0 redo size
526 bytes sent via SQL*Net to client
524 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SYS@EXADATA> set autotrace off
SYS@EXADATA> @fss
Enter value for sql_text: select count(*) from instructor.class_sales
Enter value for sql_id:
SQL_ID CHILD EXECS AVG_ROWS AVG_ETIME AVG_CPU AVG_PIO AVG_LIO SQL_TEXT
------------- ------ ---------- ---------- ------------- ------------- ------------- ------------ ----------------------------------------
b2br1x82p9862 0 1 1 43.00 3.16 1,168,557.00 1,168,567 select count(*) from instructor.class_sa
Elapsed: 00:00:00.08
So the test on the Exadata took 43 seconds to read and transport roughly 1 million 8K blocks. The same test on the ODA looked like this:
[oracle@patty scripts]$ !rl rlwrap sqlplus / as sysdba
SQL*Plus: Release 11.2.0.2.0 Production on Wed Nov 23 10:16:51 2011
Copyright (c) 1982, 2010, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
INSTANCE_NAME STARTUP_TIME CURRENT_TIME DAYS SECONDS
---------------- -------------------------- -------------------------- ------- ----------
ODA1 22-NOV-2011 12:23 23-NOV-2011 10:16 .91 78793
SYS@ODA1> ! cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.5 (Tikanga)
SYS@ODA1> ! uname -a
Linux patty 2.6.18-194.32.1.0.1.el5 #1 SMP Tue Jan 4 16:26:54 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
SYS@ODA1> set timing on
SYS@ODA1> alter session set '_serial_direct_read'=always;
Session altered.
Elapsed: 00:00:00.00
SYS@ODA1> set autotrace on
SYS@ODA1> select count(*) from instructor.class_sales;
COUNT(*)
----------
90000000
Elapsed: 00:00:30.60
Execution Plan
----------------------------------------------------------
Plan hash value: 3145879882
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 317K (1)| 00:00:05 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | TABLE ACCESS FULL| CLASS_SALES | 90M| 317K (1)| 00:00:05 |
--------------------------------------------------------------------------
Statistics
----------------------------------------------------------
99 recursive calls
0 db block gets
1154080 consistent gets
1153994 physical reads
516 redo size
526 bytes sent via SQL*Net to client
524 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
7 sorts (memory)
0 sorts (disk)
1 rows processed
SYS@ODA1> set autotrace off
SYS@ODA1> @fss
Enter value for sql_text: select count(*) from instructor.class_sales
Enter value for sql_id:
SQL_ID CHILD PLAN_HASH_VALUE EXECS ROWS_PROCESSED AVG_ETIME AVG_CPU AVG_PIO AVG_LIO SQL_TEXT
------------- ------ --------------- ---------- -------------- ---------- ---------- ---------- ---------- ----------------------------------------
b2br1x82p9862 0 3145879882 1 1 30.55 13.91 1,153,994 1,154,080 select count(*) from instructor.class_sa
As expected, the direct attached disk was faster than moving the blocks across the IB network. It took about 30 seconds to read roughly the same number of blocks. The same test on the Dell produced this output:
[osborne@homer scripts]$ sqlp
SQL*Plus: Release 11.2.0.3.0 Production on Wed Nov 23 11:20:20 2011
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SYS@EXADATA> set sqlprompt "'_USER'@'DELL'>'"
SYS@DELL> !cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.5 (Tikanga)
SYS@DELL> ! uname -a
Linux homer.enkitec.com 2.6.18-194.el5 #1 SMP Mon Mar 29 22:10:29 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
SYS@DELL> alter session set '_serial_direct_read'=always;
Session altered.
SYS@DELL> set autotrace on
SYS@DELL> set timing on
SYS@DELL> select count(*) from instructor.class_sales;
COUNT(*)
----------
90000000
Elapsed: 00:00:11.31
Execution Plan
----------------------------------------------------------
Plan hash value: 3145879882
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 315K (1)| 01:03:08 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | TABLE ACCESS FULL| CLASS_SALES | 90M| 315K (1)| 01:03:08 |
--------------------------------------------------------------------------
Statistics ----------------------------------------------------------
77 recursive calls
0 db block gets
1168660 consistent gets
1168569 physical reads
0 redo size
526 bytes sent via SQL*Net to client
524 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
10 sorts (memory)
0 sorts (disk)
1 rows processed
SYS@DELL> set autotrace off
SYS@DELL> @fss
Enter value for sql_text: select count(*) from instructor.class_sales
Enter value for sql_id:
SQL_ID CHILD EXECS AVG_ROWS AVG_ETIME AVG_CPU AVG_PIO AVG_LIO SQL_TEXT
------------- ------ ---------- ---------- ------------- ------------- ------------- ------------ ----------------------------------------
b2br1x82p9862 0 1 1 11.29 11.25 1,168,569.00 1,168,660 select count(*) from instructor.class_sa
Elapsed: 00:00:00.03
This is where the results were surprising. Tim expected the single 2G HBA to be considerably slower than the other two set ups, but it only took 11 seconds on the Dell / EMC set up to read the same data. So this is where I got to help Tim try to make it slower! (there’s a first time for everything) So what gives? The first thought was that Oracle was caching the data: But no – you’ll notice that we set the _serial_direct_read parameter to always, which forces direct path reads and bypasses the buffer cache altogether. Also, the stats bear out that Oracle is doing physical reads. But 1 Million real i/o’s can’t happen in 11 seconds so it’s got to be memory access right? (I’m sure you’re way ahead of me by now) The answer of course is "Right – it is memory access!" – but it’s the Linux file system cache not the Oracle buffer cache. A quick check of the Linux memory usage showed us that the file cache was over 20G and the table was less than 10G. So Oracle must be set up without Direct I/O.
SYS@DELL> !cat /proc/meminfo
MemTotal: 32948892 kB
MemFree: 2769420 kB
Buffers: 615664 kB
Cached: 24727756 kB
SwapCached: 101844 kB
Active: 21322184 kB
Inactive: 5261948 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 32948892 kB
LowFree: 2769420 kB
SwapTotal: 34996216 kB
SwapFree: 34147776 kB
Dirty: 448 kB
Writeback: 0 kB
AnonPages: 1138828 kB
Mapped: 4758828 kB
Slab: 1088356 kB
PageTables: 167464 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 51470660 kB
Committed_AS: 15637340 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 371108 kB
VmallocChunk: 34359367199 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
Hugepagesize: 2048 kB
SYS@DELL> !realfreemem.sh -a
Free Memory: 2703M
Cached Memory: 24148M
Total Free Memory: 26851M
Total Memory: 32176M
Percent Memory Free (including cache): 83%
SYS@DELL> @table_size
Enter value for owner: INSTRUCTOR
Enter value for table_name: CLASS_SALES
Enter value for type:
OWNER SEGMENT_NAME TYPE TOTALSIZE_MEGS TABLESPACE_NAME
-------------------- ------------------------------ ------------------ -------------- ------------------------------
INSTRUCTOR CLASS_SALES TABLE 9,151.0 ODACOMP_DATA
--------------
sum 9,151.0
Elapsed: 00:00:00.04
SYS@DELL> -- so how is Oracle configured for I/O?
SYS@DELL> @parms
Enter value for parameter: filesys
Enter value for isset:
Enter value for show_hidden:
NAME VALUE ISDEFAUL ISMODIFIED ISSET
-------------------------------------------------- ---------------------------------------------------------------------- -------- ---------- ----------
filesystemio_options NONE TRUE FALSE FALSE
Elapsed: 00:00:00.01
So the database was not configured to use direct i/o or async i/o (filesystemio_options=none). A quick check with strace verified that direct i/o was not being used. So we modified the filesystemio_options setting and tried again.
SYS@DELL> alter system set filesystemio_options=setall scope=spfile;
System altered.
SYS@DELL> startup force
ORACLE instance started.
Total System Global Area 4910620672 bytes
Fixed Size 2236648 bytes
Variable Size 2298482456 bytes
Database Buffers 2600468480 bytes
Redo Buffers 9433088 bytes
Database mounted. Database opened.
SYS@ODACOMP>
SYS@ODACOMP> @uptime
INSTANCE_NAME STARTUP_TIME CURRENT_TIME DAYS SECONDS
---------------- -------------------------- -------------------------- ------- ----------
ODACOMP 23-NOV-2011 12:45 23-NOV-2011 12:47 .00 121
SYS@ODACOMP> set sqlprompt "'_USER'@'DELL'>'"
SYS@DELL>
SYS@DELL> ! cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.5 (Tikanga)
SYS@DELL> ! uname -a
Linux homer.enkitec.com 2.6.18-194.el5 #1 SMP Mon Mar 29 22:10:29 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
SYS@DELL> set timing on
SYS@DELL> alter session set '_serial_direct_read'=always;
Session altered.
Elapsed: 00:00:00.00
SYS@DELL> set autotrace on
SYS@DELL> select count(*) from instructor.class_sales;
COUNT(*)
----------
90000000
Elapsed: 00:01:39.73
Execution Plan
----------------------------------------------------------
Plan hash value: 3145879882
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 315K (1)| 01:03:08 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | TABLE ACCESS FULL| CLASS_SALES | 90M| 315K (1)| 01:03:08 |
--------------------------------------------------------------------------
Statistics ----------------------------------------------------------
1 recursive calls
0 db block gets
1168567 consistent gets
1168557 physical reads
0 redo size
526 bytes sent via SQL*Net to client
524 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SYS@DELL> set autotrace off
SYS@DELL> @fss
Enter value for sql_text: select count(*) from instructor.class_sales
Enter value for sql_id:
SQL_ID CHILD EXECS AVG_ROWS AVG_ETIME AVG_CPU AVG_PIO AVG_LIO SQL_TEXT
------------- ------ ---------- ---------- ------------- ------------- ------------- ------------ ----------------------------------------
b2br1x82p9862 0 1 1 99.71 7.32 1,168,557.00 1,168,567 select count(*) from instructor.class_sa
Elapsed: 00:00:00.05
Ah that’s more like it. Roughly 100 seconds now to complete the query. That’s 10X slower. Nice job. It feels so good when you can make something go slower. 😉 I should note that in general, it is a very good idea to set filesystemio_options=SETALL. So please don’t jump to the conclusion that setting this parameter to NONE will make your database run faster. These days, the memory is generally better used by Oracle than by the file system cache. Glenn Fawcett has a good (if somewhat dated) post which compares the effects of caching blocks in the file system cache vs. in the Oracle buffer cache in Solaris. As you might image, allowing Oracle to use the memory is generally more effective as shown in his results. So that’s it for today. By the way, this post took 4 times as long to write as the actual testing took. Maybe I’ll get better at it if I start practicing a little more often.
In short, if you can read 10g table into the OS cache and not the SGA buffer cache, you are not sizing your SGA right.
More or less exactly what I encountered some 5-7 years ago. Large solaris machine (at that time; 32GB memory), database user-data size 1-2GB. In order to make the processing go faster, I routinely advised to mount the data filesystems with ‘forcedirectio’….resulting in the database performance getting much times slower, because now it needed to be read from disk, instead from the operating system cache.
Gwen,
Absolutely, if we wanted to make this query run fast, then making the buffer cache bigger would do it. But I was trying to make it go slower 😉 – that’s why it was interesting (to me anyway).
Frits,
Yes, in 32 bit days there were distinct advantages to having big file system cache in some cases (like yours).
an alternative to flushing caches in linux : echo 3 > /proc/sys/vm/drop_caches
Hi Kerry,
By any chance, can you show the output of the fss/ realfreemem.sh and table_size scripts?
Many thanks,
Dani
One word buddy: Many thanks!!
Dany