Exadoop

We started on an interesting mad scientist kind of project a couple of days ago.

One of our long-time customers bought an Exadata last month. They went live with one system last week and are in the process of migrating several others. The Exadata has an interesting configuration. The sizing exercise done prior to the purchase indicated a need for 3 compute nodes, but the data volume was relatively small. In the end, a half rack was purchased and all 4 compute nodes were licensed, but 4 of the 7 storage servers were not. So it’s basically a half rack with only 3 storage servers.

Meanwhile, we had been talking with them about Hadoopie kind of stuff. They are in the telecom space and are interested in pulling data via a packet sniffer that captures info directly from the TCP traffic. During the talks we discussed hardware requirements for building a Hadoop cluster, as they didn’t really have any spare hardware available to test with. That’s when the crazy science project idea was born. Someone (who shall remain nameless) suggested that we build the pilot Hadoop cluster on the 4 unused storage nodes from the Exadata half rack. Since the storage servers use basically the same hardware as the Oracle Big Data Appliance (BDA), it’s kind of like having a mini BDA. Of course, the storage servers have slower CPUs and a little less memory, so it’s not apples to apples, but the servers do have InfiniBand and the same 3TB drives, so it’s pretty similar. And since they already had the servers sitting there …

So now we have a mini Hadoop cluster installed (CDH3) with 3 data nodes (roughly 100TB of raw storage). We also set up the Oracle Big Data Connectors on one of the Exadata compute nodes, which allows us to create external tables on files stored in HDFS. Pretty cool. Let the games begin!
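
For anyone curious where the raw storage number comes from, here is a back-of-the-envelope sketch. The 3 data nodes and 3TB drives are from the setup described here, and the replication factor of 3 is the HDFS default. The 12 drives per storage server is an assumption about this hardware generation, and the function names are made up for illustration. The sketch ignores space taken by the OS and any reserved non-HDFS space, so treat the results as an upper bound rather than a measurement.

```python
def raw_capacity_tb(data_nodes, drives_per_node, drive_tb):
    """Total disk capacity across all data nodes, in TB."""
    return data_nodes * drives_per_node * drive_tb


def usable_capacity_tb(raw_tb, replication_factor=3):
    """Approximate unique data HDFS can hold when every block is
    stored replication_factor times (3 is the HDFS default)."""
    return raw_tb / replication_factor


DATA_NODES = 3        # data nodes in the pilot cluster
DRIVES_PER_NODE = 12  # ASSUMPTION: 12 drives per storage server
DRIVE_TB = 3          # 3TB drives

raw = raw_capacity_tb(DATA_NODES, DRIVES_PER_NODE, DRIVE_TB)
usable = usable_capacity_tb(raw)

print(f"raw: {raw} TB, usable at 3x replication: {usable:.0f} TB")
```

Under those assumptions the raw figure lands a little over 100TB, and only about a third of that is available for unique data once HDFS keeps its 3 copies of each block.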

Oh, and by the way, I’ll probably be talking about this project a bit at E4 (Enkitec Extreme Exadata Expo) on Aug. 13-14 in Dallas.

11 Comments

  1. swapnil kambli says:

    Very innovative. I would love to know more as this progresses further.
    All the best

  2. > but 4 of the 7 storage servers were not licensed. So it’s basically a half rack with only 3 storage servers.

    Something totally new for me!

    > we build the pilot Hadoop cluster on the 4 unused storage nodes

    Is it allowed to do stuff like that?

    BTW, super cool idea!

  3. osborne says:

    > Is it allowed to do stuff like that?

    I’m not entirely sure. ;)

  4. [...] Kerry Osborne started on an interesting mad scientist kind of project a couple of days ago. The title of blog post is very enticing. Exadoop. [...]

  5. osborne says:

    I should probably say more than just “I’m not sure”. We obviously thought about this for a while before we did it. In this case there was extra hardware that was purchased by the customer but not licensed to run the Oracle storage software. So I believe they are clearly within their rights to use the additional hardware. The biggest point of discussion was actually whether the storage servers could be “reclaimed” and added back into the Exadata if storage demands increase (which of course would mean additional license fees paid to Oracle). We are quite comfortable that this can be done fairly easily. We have done bare metal restores on storage servers in the past and the mechanism to do that is built into the system. In addition, since Hadoop replicates data (by default 3 copies of each block are maintained), you could conceivably reclaim one of the storage servers without shutting down the Hadoop cluster and add another (external) server to that cluster later. Keep in mind that this is a very unusual situation to begin with, but it is a very interesting exercise. ;)

  6. ManishN says:

    Awesome Idea!!! I love your book on Exadata too!

  7. MPG says:

    > Keep in mind that this is a very unusual situation to begin with, but it is a very interesting exercise. ;)

    Actually, I believe this is a rather common scenario for many customers. Exadoop could add some good options for our customers… if it’s ever supported/blessed/allowed by the product team.

  8. Chuck Adams says:

    I have two X2-8 racks interconnected into a single IB topology with a four-node 11.2.0.3 RAC cluster and 21 cells allocated to ASM. I set aside 7 cells for the “EXADOOP” BDA install. According to the BDA owner’s manual and the configuration utility, the client, management and IB networks have to be on separate subnets. In the Exadata, the cells do not have client network connectivity. The BDA has two separate connections to the IB gateway (GW) switches: bondib0 for the InfiniBand private network and bondeth0 for the 10GbE client network. To install BDA, the customer client network is connected directly to the IB GW switches and traffic is routed via bondeth0. In Exadata, the client network is connected directly to all the compute nodes and traffic is routed via bondeth1. The client network is not connected to the cells.

    What did you use for the client network in the Exadoop/BDA install on your Exadata cells?

    • Andy Colvin says:

      Chuck,

      It’s not an actual BDA install, since the InfiniBand switches in the Exadata are just datacenter switches, while the BDA and Exalogic contain IB gateway switches (they connect to your 10GbE infrastructure as well as the InfiniBand back end). We simply installed OEL 6.2 on the cells and created our own Hadoop cluster. Keep in mind that this is just a temporary proof of concept and not something intended to be run in production!

      Andy

  9. osborne says:

    Hi Chuck,

    Wow, messing with two X2-8s sounds like fun. Our Frankenstein is not really a BDA, it’s just a CDH3 install on 4 servers that have IB cards in them. Andy Colvin did the networking stuff so maybe he can chime in on specifics, but Exadata doesn’t have the same model of IB switches as the BDA. There is no 10GbE port on the Exadata IB switches themselves in the current version of Exadata. Of course, this is all just mad scientist stuff, so it’s not really set up for anything but a playground.

    Kerry

  10. Chuck Adams says:

    Kerry and Andy,

    Thanks for the replies. I am finally getting back to this after a delay due to customer priorities. I am configuring the client network on each of the seven cells using the 1GbE ports, with cables connected directly to the data center client network switch. This is required since the IB switches in the X2-8 are not the same as the gateway switches in the BDA. I plan to install the BDA software using the Oracle Cloudera configuration utility and will report back on progress.

    Chuck
