Archive for the ‘Exadata’ Category.

E4 Wrap Up – Part I – OLTP Bashing

Well the Enkitec Extreme Exadata Expo (E4) is now officially over. I thoroughly enjoyed the event. I personally think Richard Foote stole the show with his clear and concise explanation of why a full table scan is not a straight forward operation on Exadata, and why that makes it so difficult for the optimizer to properly cost it. But Maria Colgan came out with a fiery talk on the optimizer that gave him a good run for his money (she actually had the highest average rating from the attendees that filled out evaluation forms by the way – so congratulations Maria!). Of course there were many excellent presentations from many very well known Oracle practitioners. Overall it was an excellent conference (in my humble opinion) due in large part to the high quality of the speakers and the effort they put into the presentations. I am also thankful for the fact that Intel agreed to sponsor the event and that Oracle supported the event by allowing so many of their technical folks to participate.

While I felt that the overall message presented at the conference was pretty balanced, I did leave with a couple of general impressions that didn’t really feel quite right. Of course having the ability to express one’s opinion is one of the founding principals of our country, so I am going to do a series of posts on generalities I heard expressed that I didn’t completely agree with.

The first was that I got the impression that some people think Exadata isn’t good at OLTP. No one really said that explicitly. They said things like “it wasn’t designed for OLTP” and “OLTP workloads don’t take advantage of Exadata’s secret sauce” (I may have even made similar comments myself). While these types of statements are not incorrect, they left me with the feeling that some people thought Exadata just flat wasn’t good at OLTP.

I disagree with this blanket sentiment for several reasons:

  1. While it’s true that OLTP workloads generally don’t make the best use of the main feature that makes Exadata so special (namely offloading), I have to say that in my experience it has shown itself to be a very capable platform for handling the single block access pattern that characterizes what we often describe as OLTP workloads. I’ve observed many systems running on Exadata that have average physical single-block read times in the sub-1ms range. These are very good times and compare favorably to systems that store all their data on SSD storage. So the flash cache feature actually works very well, which is not too surprising when you consider that Oracle has been working on caching algorithms for several decades.

  3. I think part of the reason for the general impression that OLTP doesn’t work well on Exadata is the human tendency to make snap judgements based on reality vs. expectations, rather than actually thinking through the relevant facts. For example, when you go to a movie that has been hyped as being on of the best of the year and a great cinematic achievement, you are more likely to come away feeling that the movie was not that great, simply because it didn’t live up to your expectations. Whereas a little known movie is more likely to impress you simply because you weren’t expecting that much. When you sit down and actually evaluate the movies side by side, you will probably come to the conclusion that hyped movie was indeed better (people don’t usually bestow accolades on totally worthless stuff). I think that, at least to some degree, OLTP type work loads on Exadata suffer from the same issue. The expectations are so high for the platform in general that even good to excellent results fall short of the massive expectations that have been created based on the some of the impressive results with Data Warehouse type work loads. But that doesn’t mean that the platform is not capable of matching the performance of any other platform you could build at a similar price point.

  5. I don’t think I’ve ever seen a true OLTP workload. That is to say, I can’t recall ever looking at a system that didn’t have some long running reporting component or batch process that does not fall into the simple single block access (OLTP) category. So I believe that the vast majority of systems categorized as OLTP should more correctly be called “mixed” workloads. In these types of systems, the offloading capability of Exadata can certainly make a big difference for the long running components of the system, but also can improve performance on the single-block access stuff by reducing the contention for resources caused by the long running queries that are unavoidable on standard Oracle architecture.

  7. Very few of the Exadata systems we’ve worked on over the last few years are supporting a single application or even a single database. Consolidation has become the name of the game for many (maybe most) Exadata implementations. I did a presentation at last year’s Hotsos Symposium where I compiled statistics from 51 Exadatas that we had worked on. 67% were being used as consolidation platforms. This makes it even more likely that an OLTP type workload will benefit from running on the Exadata platform.

So does Exadata run stand alone “pure OLTP” workloads 10X faster than any other standard Oracle based system you could build yourself?

No it does not.

But it does work as well as almost any system you could build, regardless of how much money you spend on the components. By way of proof I’ll tell you a story about a system that we benchmarked on an Exadata V2 quarter rack system. The benchmark was on a batch process that updated well over a billion records, one row at a time, via an index. The system we were comparing against was an M5000 / Solaris system with 32 cores and all data was stored on an SSD SAN. The benchmark showed Exadata to be a little over 4 times faster. This was primarily because most of the work was logical i/o that was serviced by the buffer cache on both platforms. The faster CPUs in the Exadata accounted for most of the gains. Nevertheless, the system was not migrated to an Exadata. A new system was built using a faster Intel-based server which made up the CPU speed difference (and in fact exceeded specs on the V2) and a more capabile SSD based SAN was installed. The resulting system ran the benchmark in about the same time as the original V2 quarter rack (actually it was very slightly faster). Unfortunately the SAN alone cost more than the Exadata. And the real life system also did a bunch of other stuff like some long running ad hoc queries. Guess which platform dealt with those better. ;) Here’s a slide that summarizes some of the results.

In fairness, I should point out that there is a subset of “OLTP” workloads that are very write intensive. Since writes to data in Oracle are usually asynchronous, while writes to log files usually must complete before a transaction can complete, it’s usually writes to log files that are the bottle neck in these types of systems. However, if the synchronous log file writes can be avoided (or optimized), much higher transaction rates can occur and writes to DB files can become a bottle neck. In those cases, pure write IOPS can be a limiting factor. My opinion is that such systems are relatively rare. But they do exist. Exadata is not currently the best possible option for these extremely write intensive workloads. I say currently because at the time of this writing the storage software does not include any sort of write back cache for buffering writes to data files. However, this is a feature that is expected to be released in the near future. ;)

So that’s it for the OLTP topic.

Stay tuned for Part II, where I’ll discuss another general impression with which I didn’t really agree…


We started on an interesting mad scientist kind of project a couple of days ago.

One of our long time customers bought an Exadata last month. They went live with one system last week and are in the process of migrating several others. The Exadata has an interesting configuration. The sizing exercise done prior to the purchase indicated a need for 3 compute nodes, but the data volume was relatively small. In the end, a half rack was purchased and all four compute nodes were licensed, but 4 of the 7 storage servers were not licensed. So it’s basically a half rack with only 3 storage servers.

Meanwhile, we had been talking with them about Hadoopie kind of stuff. They are in the telecom space and are interested in pulling data via a packet sniffer which captures info directly from the tcp traffic. During the talks we discussed hardware requirements for building a Hadoop cluster as they didn’t really have any spare hardware available to test with. That’s when the crazy science project idea was born. Someone (who shall remain nameless) suggested that we build the pilot Hadoop cluster on the 4 unused storage nodes from the Exadata half rack. Since the storage servers use basically the same hardware as is used in the Oracle Big Data Appliance (BDA), it’s kind of like having a mini BDA. Of course the storage servers have slower CPU’s and a little less memory so it’s not apples to apples, but the servers do have InfiniBand and the same 3T drives so it’s pretty similar. And since they already had the servers sitting there …

So now we have a mini Hadoop cluster installed (CDH3) with 3 data nodes (roughy 100T raw storage). We also set up the Oracle Big Data Connectors on one of the Exadata compute nodes which allows us to create external tables on files stored in HDFS. Pretty cool. Let the games begin!

Oh and by the way. I’ll probably be talking about this project a bit at E4 (Enkitec Extreme Exadata Expo) on Aug. 13-14 in Dallas.

Expert Oracle Exadata Translated into Chinese

Last year at Oracle Open World I was introduced to a guy named Zhang Leyi (Kamus). He said he had been hired to translate our Expert Exadata book into Chinese. There were a couple of other guys on the team as well – Kaya Huang 黄凯耀 and Jacky Zhang 张瑞. Well they have apparently finished their work and the book is due to be released this month. Kamus sent us a copy of the cover (see below). It looks very sexy don’t you think? We exchanged a few emails with the guys while they were working on it mainly just clarifying our intentions and helping them fix typos in the original version Unfortunately I wasn’t as responsive as I should have been (sorry about that guys), but Tanel and Randy did a pretty good job I think. Not surprisingly, the most difficult part seemed to be translating some of the “Kevin Says” sections, as he has a way of packing a lot into very few words.

So anyway, congratulations to the guys for getting through the project. I hope they send me a copy. ;)


Tuning Oracle to Make a Query Slower

I had an interesting little project this morning. Of course it takes longer to write it down than to do actually do it, but it was kind of interesting and since I haven’t done a post in quite some time (and it’s the day before Thanksgiving, so it’s pretty quite at the office anyway) I decided to share. One of the Enkitec guys (Tim Fox) was doing a performance comparison between various platforms (Exadata using it’s IB Storage Network, Oracle Database Appliance (ODA) using it’s direct attached storage, and a standard database on a Dell box using EMC fiber channel attached storage). The general test idea was simple – see how the platforms stacked up for a query that required a full scan of a large table. More specifically, what Tim wanted to see was the relative speed at which the various storage platforms could return data. The expectation was that the direct attached storage would be fastest and the fibre channel storage would be slowest (especially since we only had a single 2G HBA). He tested ODA and Exadata and got basically what he expected, but when he went to test the database on the Dell he was surprised that it was actually faster than either of the other two tests. So here’s some output from the initial tests: First the Exadata. It’s an X2 quarter rack with one extra storage server. Note that we had to set cell_offload_processing to false to turn off the Exadata storage optimizations, thus giving us a measurement of the hardware capabilities without the Exadata offloading.

> !sqlp
SQL*Plus: Release Production on Wed Nov 23 11:08:28 2011
Copyright (c) 1982, 2010, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SYS@DEMO1> @uptime
---------------- ----------------- ----------------- ------- ----------
DEMO1            07-NOV-2011 12:37 23-NOV-2011 11:08   15.94    1377058
SYS@DEMO1> set sqlprompt "_USER'@'EXADATA'>' "
SYS@EXADATA> ! cat /etc/redhat-release
Enterprise Linux Enterprise Linux Server release 5.5 (Carthage)
SYS@EXADATA> ! uname -a
Linux 2.6.18- #1 SMP Tue Aug 31 22:41:13 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
SYS@EXADATA> alter session set "_serial_direct_read"=always;
Session altered.
SYS@EXADATA> alter session set cell_offload_processing=false;
Session altered.
SYS@EXADATA> set autotrace on
SYS@EXADATA> set timing on
SYS@EXADATA> select count(*) from instructor.class_sales;
Elapsed: 00:00:43.01
Execution Plan
Plan hash value: 3145879882
| Id  | Operation                  | Name        | Rows  | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT           |             |     1 |   314K  (1)| 00:00:02 |
|   1 |  SORT AGGREGATE            |             |     1 |            |          |
|   2 |   TABLE ACCESS STORAGE FULL| CLASS_SALES |    90M|   314K  (1)| 00:00:02 |
          1  recursive calls
          0  db block gets
    1168567  consistent gets
    1168557  physical reads
          0  redo size
        526  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
SYS@EXADATA> set autotrace off
Enter value for sql_text: select count(*) from instructor.class_sales
Enter value for sql_id: 
------------- ------ ---------- ---------- ------------- ------------- ------------- ------------ ----------------------------------------
b2br1x82p9862      0          1          1         43.00          3.16  1,168,557.00    1,168,567 select count(*) from instructor.class_sa
Elapsed: 00:00:00.08

So the test on the Exadata took 43 seconds to read and transport roughly 1 million 8K blocks. The same test on the ODA looked like this: Continue reading ‘Tuning Oracle to Make a Query Slower’ »

Exadata Virtual Conference

I will be participating in an Exadata Virtual Conference which has been organized by Tanel Poder on August 3rd and 4th. This conference will follow the same format as the one Tanel and I did last year with Jonathan Lewis and Cary Millsap. It will be two half days which should provide some flexibility for people with busy schedules. The online format allows all participants to interact directly via chat while the speakers are presenting and then via voice during question and answer sessions. This is a great opportunity to talk to some guys that have done a bunch of work with Exadata. Andy Colvin will be discussing patching which has been problematic for some shops. Andy has done more patching than anyone I know. Randy Johnson will be discussing Resource Management on Exadata which is a key to successful consolidation projects. I will be talking about how Smart Scans work under the covers and covering techniques for determining when and where they are (or are not) occurring. Tanel will be covering in depth tuning and diagnostic techniques specific to Exadata. Of course, Randy, Tanel and I collaborated on the upcoming Apress book, Expert Oracle Exadata, which should be out a couple of weeks before the conference. Here’s a link to the page with all the conference details:

Exadata Virtual Conference

Note that there is a discount for early registration. I hope to see you there.