Tuesday, July 21, 2009

Playing with DRBD (Replication)

Over the past 4+ years I've worked towards moving the infrastructure at work from isolated physical systems to a centralized storage (SAN) and virtualized paradigm. I'm happy to say we are definitely a "Virtualize First" and SAN shop now. A conservative comparison of separate physical systems with local storage to the current SAN/virtualized environment shows nearly 50% cost savings on equipment alone. The benefits of extra rack space, lower power and cooling costs, and additional I/O and compute capacity are nice as well.

This evolution has a next logical progression. Now that we operate in a very centralized fashion, I can (more easily) begin examining solutions for replication. The base goal was to replicate the SAN to mitigate a 100% loss of the primary server room without requiring an enormous amount of time restoring from tape. Tapes are great snapshots of points in time, like photographs, but trying to rebuild your entire set of memories by looking through photographs would be very time consuming (and photographs degrade). Replication is not a replacement for good backups.

From a previous post, my SAN is Linux based, much akin to an Openfiler solution. It provides NFS storage to ESXi servers for VM images and iSCSI storage to the VMs that need data volumes. All physical drives (as presented by the RAID controllers) are sliced up using LVM. It has been humming along without issue since installation in February 2009.

I have known of the existence of DRBD for several years but had not been in a position to utilize it. In short, DRBD is a block device you layer into your device chain, just like LVM. Its specialty is taking all the original block-level changes, keeping track of them, and sending them over to another system where they can be duplicated. The DRBD website is excellent; I highly suggest spending a few minutes there.

DRBD has a few very nice traits that I'd like to highlight. First off, it is smart (and dumb?). It works at the block level and knows which bits may be out of sync, and it will only send those bits across the wire - it knows nothing of filesystems, files, etc. Secondly, it can be non-destructively added to existing data volumes; there's no need to backup/install/restore. DRBD is open source and freely available, but its creators and primary maintainers, Linbit, offer commercial support and have been around for a while. Linbit also offers a closed source product, DRBD Proxy, that is designed for long haul, high latency (200ms+) connections or replication across more than two nodes. If you want to replicate outside of a LAN using DRBD you'll need it. DRBD is also 'good friends' with Heartbeat for high availability / failover situations.
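To make "layering DRBD into the device chain" concrete, here is a minimal two-node resource definition in the DRBD 8.x configuration style. The hostnames, LVM volume paths, and IP addresses are all made up for illustration - adjust to your own environment:

```
# /etc/drbd.conf - minimal two-node resource sketch (hypothetical names/IPs)
resource r0 {
  protocol C;                      # fully synchronous replication
  on san1 {
    device    /dev/drbd0;          # the new block device you mount/export
    disk      /dev/vg0/data;       # existing LVM logical volume underneath
    address   10.0.0.1:7788;
    meta-disk internal;
  }
  on replica1 {
    device    /dev/drbd0;
    disk      /dev/vg0/data;
    address   10.0.0.2:7788;
    meta-disk internal;
  }
}
```

The filesystem then sits on /dev/drbd0 instead of the LVM volume directly, and DRBD handles shipping block changes to the peer.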

I set up a couple of CentOS x64 based VMs for testing. DRBD is available via the standard CentOS repositories, but it is naturally a bit behind the current version available directly from the DRBD website. The download is small, and if you have a basic compiler toolchain and the kernel-devel package, the build / install is quick and painless (make rpm). Did I already mention the DRBD website documentation is fantastic - really, go read it. The configuration required to get DRBD working is quite minimal, although there are lots of options to fine tune its operation. If your data's rate of change is high, you will really want Gigabit connectivity between your nodes. What you'll find is that your DRBD devices will only write about as fast as the data can get across the wire (assuming your drives can outrun wirespeed). If you need more than wire speed and your drives are fast, take a look at the DRBD Proxy product. I spent a fair amount of time in different scenarios to see how DRBD would act and what to do as an admin in those situations. Like many things, with a little bit of time and reading, DRBD was easy to work with.
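For reference, the build and first bring-up boil down to a handful of commands. This is a sketch, not a transcript - the version number and resource name 'r0' are illustrative, and everything here needs root and the DRBD kernel module:

```
# Build from source on CentOS (compiler toolchain + kernel-devel required)
tar xzf drbd-8.3.x.tar.gz && cd drbd-8.3.x
make rpm                               # builds installable RPM packages
rpm -Uvh dist/RPMS/*/drbd-*.rpm

# Initialize the resource (run on both nodes)
drbdadm create-md r0                   # write DRBD metadata for resource r0
service drbd start

# On the node holding the good data only - kicks off the initial full sync
drbdadm -- --overwrite-data-of-peer primary r0
cat /proc/drbd                         # watch connection state and sync progress
```

/proc/drbd is the quickest way to see whether the nodes are Connected and how far along a resync is.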

So what was I doing with all this again? The base goal was to replicate the SAN to mitigate a 100% loss of the server room. Since the SAN literally contains everything (VMs, SQL and Exchange databases, file shares), this was a fairly simple move that captures the entire datacenter to another system. To backpedal a bit, my environment is modest in size by any modern measure, but still just as important. That 'size', centralized storage, and a geographically large site made placing the replica system in a local but 'distant' (fiber connected) building a perfect option. The replica runs ESXi with a CentOS VM running DRBD to replicate the data. Why ESXi on the host? What this more or less creates is my datacenter-in-a-box, transportable if needed. The CentOS VM will provide NFS access back to the host ESXi for access to all the server VMs, which in turn will use iSCSI to access their data. ESXi virtual switches let me create matching, non-routed networks local to the replica host for the NFS and iSCSI traffic, meaning zero reconfiguration of the production server VMs. This isn't meant to be a powerhouse / failover solution. What it is is a very cost effective answer to a worst case situation that hopefully never occurs. If the worst were to occur, some scripting magic transforms the replica to production status. When a new production environment is established, DRBD can be used to mirror the data back, enabling the transition with very little downtime.
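For the curious, the 'scripting magic' on the replica amounts to promoting the DRBD device and restarting the storage services. A rough sketch of the manual equivalent (resource name, mount point, and service names are hypothetical and environment-specific):

```
# On the replica CentOS VM, after confirming the primary site is truly lost:
drbdadm primary r0                 # promote the DRBD secondary to primary
mount /dev/drbd0 /srv/store        # mount the replicated volume
service nfs start                  # resume NFS exports to the local ESXi host
service tgtd start                 # resume iSCSI targets for the data volumes
```

Because the replica host's virtual switches match the production NFS and iSCSI networks, the server VMs can then be registered and powered on unchanged.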

Tired of reading yet?



Matt said...

>Tired of reading yet?

No. Post Moar.


Great post, though. I'm really very excited to play with DRBD. It's encouraging that you're liking it so far. Thanks for the review/inspiration :-)

Joe C said...

Great ideas. I have been reading about drbd for a while now. Could you post your hardware specs on both ends of the drbd? I see the reference post seeking san, but how much space are you using for all these needs (exchange, file, etc)? The centos vm running in the remote host must have a lot of space dedicated.

Joe C said...

Also, how do you test your failover process? If you isolate the remote site for testing and make the secondary drbd active, doesn't that end your replication? How would you bring up all the vms, exchange databases, and sql if you needed to verify your dr scenario was intact?


JeffHengesbach said...

@Joe C
The total amount of space being mirrored is in the range of 4TB. The specs on both sides are not matched - the network is the lowest common denominator and the mirror is not intended to be a 1-1 match on performance. The mirror obviously has 'a lot' of space because it must mirror the primary.

Failover testing does require breaking the mirror and subsequent resync, along with some network magic. This may create an exposure window, but without validation the solution would be pointless.

Thanks for the great comments.