Kerry Osborne's Oracle Blog

Exadoop

July 24, 2012, 6:33 pm

We started on an interesting mad scientist kind of project a couple of days ago.

One of our long-time customers bought an Exadata last month. They went live with one system last week and are in the process of migrating several others. The Exadata has an interesting configuration. The sizing exercise done prior to the purchase indicated a need for 3 compute nodes, but the data volume was relatively small. In the end, a half rack was purchased and all four compute nodes were licensed, but 4 of the 7 storage servers were not licensed. So it’s basically a half rack with only 3 storage servers.

Meanwhile, we had been talking with them about Hadoopie kind of stuff. They are in the telecom space and are interested in pulling data via a packet sniffer that captures info directly from the TCP traffic. During the talks we discussed hardware requirements for building a Hadoop cluster, as they didn’t really have any spare hardware available to test with. That’s when the crazy science project idea was born. Someone (who shall remain nameless) suggested that we build the pilot Hadoop cluster on the 4 unused storage nodes from the Exadata half rack. Since the storage servers use basically the same hardware as the Oracle Big Data Appliance (BDA), it’s kind of like having a mini BDA. Of course, the storage servers have slower CPUs and a little less memory, so it’s not apples to apples, but the servers do have InfiniBand and the same 3TB drives, so it’s pretty similar. And since they already had the servers sitting there …

So now we have a mini Hadoop cluster installed (CDH3) with 3 data nodes (roughly 100TB raw storage). We also set up the Oracle Big Data Connectors on one of the Exadata compute nodes, which allows us to create external tables on files stored in HDFS. Pretty cool. Let the games begin!
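As a rough sanity check on the “roughly 100TB raw” figure, here is a small Python sketch. It assumes (our assumption, not something stated in the post) that each storage server carries twelve 3TB drives and that HDFS keeps its default replication factor of 3. It ignores space used by the OS, HDFS reserved space and temporary job output, so real usable capacity would be somewhat lower.

```python
def hdfs_capacity_tb(data_nodes: int, drives_per_node: int, drive_tb: float,
                     replication: int = 3) -> tuple[float, float]:
    """Return (raw TB, approximate usable TB) for a simple HDFS cluster.

    Usable capacity is modeled as raw capacity divided by the replication
    factor; OS partitions, reserved space and temporary output are ignored.
    """
    raw_tb = data_nodes * drives_per_node * drive_tb
    return raw_tb, raw_tb / replication


# Hypothetical layout: 3 data nodes, each with 12 x 3TB drives.
raw, usable = hdfs_capacity_tb(data_nodes=3, drives_per_node=12, drive_tb=3)
print(f"raw: {raw} TB, usable at 3x replication: ~{usable:.0f} TB")
```

Under those assumptions the cluster has about 108TB raw, in line with the rough figure above, but only around a third of that is available for distinct data once every block is stored three times.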

Oh, and by the way, I’ll probably be talking about this project a bit at E4 (Enkitec Extreme Exadata Expo) on Aug. 13-14 in Dallas.

Category: Exadata, Hadoop, Oracle, Speaking

11 Comments

swapnil kambli says:

July 25, 2012 at 5:18 am

Very innovative. I would love to know more as this progresses further.
All the best

Amardeep Sidhu says:

July 26, 2012 at 12:24 am

> but 4 of the 7 storage servers were not licensed. So it’s basically a half rack with only 3 storage servers.

Something totally new for me!

> we build the pilot Hadoop cluster on the 4 unused storage nodes

Is it allowed to do stuff like that?

BTW, super cool idea!

osborne says:

July 26, 2012 at 7:37 pm

> Is it allowed to do stuff like that?

I’m not entirely sure. ;)

Log Buffer #279, A Carnival of the Vanities for DBAs | The Pythian Blog says:

July 27, 2012 at 2:01 am

[…] Kerry Osborne started on an interesting mad scientist kind of project a couple of days ago. The title of the blog post is very enticing: Exadoop. […]

osborne says:

July 27, 2012 at 7:38 am

I should probably say more than just “I’m not sure.” We obviously thought about this for a while before we did it. In this case there was extra hardware that the customer purchased but did not license to run the Oracle storage software, so I believe they are clearly within their rights to use it. The biggest point of discussion was actually whether the storage servers could be “reclaimed” and added back into the Exadata (which of course would trigger additional license fees to Oracle). That would only matter if storage demands increased, and we are quite comfortable that it can be done fairly easily. We have done bare metal restores on storage servers in the past, and the mechanism to do that is built into the system. In addition, since Hadoop replicates data (by default 3 copies of each block are maintained), you could conceivably reclaim one of the storage servers without shutting down the Hadoop cluster and add another (external) server to that cluster later. Keep in mind that this is a very unusual situation to begin with, but it is a very interesting exercise. ;)
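The replication argument can be made concrete with a simple worst-case counting model, sketched in Python below. It is a hypothetical model, not HDFS code: it assumes each block’s replicas sit on distinct data nodes and ignores rack awareness. With 3 data nodes and a replication factor of 3, every block has a copy on every node, so reclaiming one server leaves each block readable from 2 copies, but there is nowhere to put a third copy until another data node joins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemovalImpact:
    min_copies_left: int                # worst-case replicas left for any block
    all_blocks_readable: bool           # every block still has at least one copy
    can_restore_full_replication: bool  # enough nodes remain to hold `replication` copies


def removal_impact(data_nodes: int, replication: int, removed: int) -> RemovalImpact:
    """Worst-case effect of taking `removed` data nodes out of an HDFS-style cluster.

    Hypothetical counting model: each block's replicas live on `replication`
    distinct nodes, and in the worst case a block had a replica on as many
    of the removed nodes as possible.
    """
    if not 1 <= replication <= data_nodes:
        raise ValueError("replication must be between 1 and the number of data nodes")
    if not 0 <= removed <= data_nodes:
        raise ValueError("removed must be between 0 and the number of data nodes")

    min_left = max(replication - removed, 0)
    return RemovalImpact(
        min_copies_left=min_left,
        all_blocks_readable=min_left >= 1,
        can_restore_full_replication=data_nodes - removed >= replication,
    )


# The Exadoop case: 3 data nodes, default replication of 3, reclaim one server.
print(removal_impact(data_nodes=3, replication=3, removed=1))
# The same reclaim in a cluster that already has a fourth (external) data node.
print(removal_impact(data_nodes=4, replication=3, removed=1))
```

Under this model, reclaiming a server leaves the data readable but under-replicated; the NameNode can only bring blocks back to three copies once a replacement data node is available.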

ManishN says:

August 2, 2012 at 4:12 pm

Awesome Idea!!! I love your book on Exadata too!

MPG says:

August 16, 2012 at 2:15 pm

|| Keep in mind that this is a very unusual situation to begin with, but it is a very interesting exercise. ;)

Actually, I believe this is a rather common scenario for many customers. Exadoop could add some good options for our customers… if it’s ever supported/blessed/allowed by the product team.

Chuck Adams says:

September 30, 2012 at 4:56 am

I have two X2-8 racks interconnected into a single IB topology, with a four-node 11.2.0.3 RAC cluster and 21 cells allocated to ASM. I set aside 7 cells for the “EXADOOP” BDA install. According to the BDA owner’s manual and configuration utility, the client, management and IB networks have to be on separate subnets. In the Exadata, the cells do not have client network connectivity. The BDA has two separate connections to the IB gateway (GW) switches: bondib0 for the InfiniBand private network and bondeth0 for the 10GbE client network. To install a BDA, the customer client network is connected directly to the IB GW switches and traffic is routed via bondeth0. In Exadata, the client network is connected directly to all the compute nodes and traffic is routed via bondeth1; the client network is not connected to the cells.

What did you use for the client network in the Exadoop/BDA install on your Exadata cells?

- Andy Colvin says:
  
  October 1, 2012 at 11:03 am
  
  Chuck,
  
  It’s not an actual BDA install, since the InfiniBand switches in the Exadata are just data center switches, while the BDA and Exalogic contain IB gateway switches (which connect to your 10GbE infrastructure as well as to the InfiniBand back end). We simply installed OEL 6.2 on the cells and created our own Hadoop cluster. Keep in mind that this is just a temporary proof of concept and not something intended to be run in production!
  
  Andy
  
osborne says:

September 30, 2012 at 2:11 pm

Hi Chuck,

Wow, messing with two X2-8s sounds like fun. Our Frankenstein is not really a BDA; it’s just a CDH3 install on 4 servers that have IB cards in them. Andy Colvin did the networking stuff, so maybe he can chime in on specifics, but Exadata doesn’t have the same model of IB switches as the BDA. There is no 10GbE port on the Exadata IB switches themselves in the current version of Exadata. Of course, this is all just mad scientist stuff, so it’s not really set up for anything but a playground.

Kerry

Chuck Adams says:

October 26, 2012 at 4:50 am

Kerry and Andy,

Thanks for the replies. I am only now getting back to this because of customer priorities. I am configuring the client network on each of the seven cells using the 1GbE ports, with cables connected directly to the data center client network switch. This is required because the IB switches in the X2-8 are not the same as the gateway switches in the BDA. I plan to install the BDA software using the Oracle Cloudera configuration utility and will report back on progress.

Chuck
