Nutanix Replication 101

Updated: Oct 21, 2020

The subject of how Nutanix does replication came under the microscope for me recently, so I thought I would dive into it so others could get to grips with the basics, and educate myself beyond the broad strokes while I am at it.

Most of my replication experience is with EMC Symmetrix-based SRDF or RecoverPoint technologies, and of course good old VMware Storage vMotion and DRS, mostly in multi-site schemas.

This sort of replication is very expensive indeed, because each site needs the space for all of the other sites' data on the storage systems in use.

You can imagine the mesh required for a 5 site setup!

As a result, you can imagine that few shops replicate all of the data on each site and then keep five other copies of it spread throughout the mesh.

That would be very expensive and each storage system would need to accommodate the other copies of "everything" on a per site basis.

This would also consume vast amounts of cooling, power and space.

VMware and Microsoft hypervisors sport replication ability out of the box ranging from primitive to reasonable in terms of capability.

VMware calls it vSphere Replication, and Microsoft calls theirs Hyper-V Replica.

They both use the same basic technique, which is logging I/Os to virtual disks in a file and shipping the deltas across the wire to a remote cluster.

This sounds simpler than it actually is, by the way, since all failure cases need to be considered carefully: an unstable or downed network is effectively the same as downed hosts.

There is also a lot of timestamp and journal bookkeeping going on here as well.
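The log-and-ship technique above can be sketched as a toy model like this. Everything here is illustrative - real products journal at the I/O layer with far more failure handling and ordering logic than shown:

```python
# Hypothetical sketch of journal-based delta replication, the basic
# technique behind vSphere Replication and Hyper-V Replica: writes to a
# virtual disk are logged, and only the changed blocks are shipped to
# the remote cluster each cycle. All names here are made up.
import time

BLOCK_SIZE = 4096

class DeltaReplicator:
    def __init__(self):
        self.journal = {}   # block number -> (timestamp, data)

    def log_write(self, block_no, data):
        # Every guest write is also recorded in the journal with a
        # timestamp so cycles can be ordered and replayed after failures.
        self.journal[block_no] = (time.time(), data)

    def ship_delta(self, send):
        # Ship only the blocks that changed since the last cycle,
        # then clear the journal for the next interval.
        delta = self.journal
        self.journal = {}
        for block_no, (ts, data) in sorted(delta.items()):
            send(block_no, ts, data)
        return len(delta)

# Three writes, two of them to the same block: only two blocks ship.
repl = DeltaReplicator()
repl.log_write(7, b"a" * BLOCK_SIZE)
repl.log_write(7, b"b" * BLOCK_SIZE)   # overwrites the earlier delta
repl.log_write(9, b"c" * BLOCK_SIZE)
shipped = repl.ship_delta(lambda bn, ts, d: None)
print(shipped)  # 2 blocks cross the wire, not 3 writes
```

The point of the sketch is the delta coalescing: repeated writes to the same block cost one block on the wire per cycle, which is why change rate, not raw write rate, drives the WAN sizing.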

You have to realize that the host is the key component in this sort of replication.

This means that if you have a single host with a number of critical VMs you want to protect, then that host's physical resources (CPU, memory, disk, network) become the actual bottleneck, as is the case with vSphere Replication and Hyper-V Replica.

The workaround is intelligent placement of VMs across multiple hosts and predicting various metrics like data change rates across workloads, which is pretty difficult stuff to do.

So with Nutanix you have to have a good idea of what it is you want to replicate.

The easiest case is replicating a few virtual machines from one cluster to another, and then you have to worry about the distance thing.

Synchronous replication has to be within 100 km, which is about 62 miles max, but in US telecommunications terms the dark fiber reach really needs to be no more than 20 miles for it to work really well.

The best synchronous replication is dark campus fiber, with the two data center locations within the same four square miles.

Anything further than that and you really should go asynchronous only.
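A quick back-of-the-envelope calculation shows where that distance limit comes from. Light in fiber travels at roughly 200,000 km/s, and a synchronous write is not acknowledged until the remote site confirms it, so every write pays at least a full round trip (the figures here are the physics floor; real links add switching and processing delay on top):

```python
# Why synchronous replication tops out around 100 km: every write
# waits for the remote acknowledgment, so it pays a round trip over
# the fiber at roughly 200,000 km/s.
FIBER_KM_PER_MS = 200.0  # ~200,000 km/s => 200 km per millisecond

def sync_write_penalty_ms(distance_km):
    """Minimum added latency per write: out and back over the fiber."""
    return 2 * distance_km / FIBER_KM_PER_MS

for km in (5, 20, 100, 500):
    print(f"{km:>4} km -> at least {sync_write_penalty_ms(km):.2f} ms per write")
```

At 100 km you are already adding a millisecond to every single write before any equipment latency, which is why anything beyond that pushes you to asynchronous.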

I consulted the various notebooks I keep on the many EMC and Hitachi storage arrays I have set replication services up for, as well as the many VMAX SRDF schemas I deployed with SRDF trickery for major banks and healthcare companies, and that kind of replication is very expensive.

You pay by the TB for replication space used by the storage systems, and it ain't cheap.

The WAN pipes on these puppies are also pretty impressive, with matching monthly bills that take your breath away.

Of all of the replication schemas and systems out there, though, the one I like the best came from an Israeli company EMC bought a few years ago, whose product, Kashya, worked with any storage array until EMC bought them for $153 million in May of 2006.

EMC renamed it RecoverPoint, put some serious R&D into the thing, and even offered it for replication between low-end VMAX systems for customers scared off by the massive SRDF price tag.

Not every customer needs the exotic banking-transaction SRDF setup that Citibank and others use in their daily operations.

They also charge based on various factors, and this can end up costing a fair bit.

VMAX environments also started using a lot of VMware under the hood, as DRS and vMotion were pretty cool ways to get HA and BC/DR done for a lot less than before.

Nutanix has developed most of its Metro replication services and capability in house, looking at these various schemas and adapting the basic tenets and concepts to its own use case.

Remember that Nutanix works with many hypervisors, including its own AHV hypervisor, so it gets interesting depending on what platform is running what hypervisor.

Prism Central and Prism Element, the two Nutanix tools for managing a single Nutanix cluster (Element) or multiple clusters (Central), also come into the equation here.

You can also buy these replication features separately via stand-alone licensing if you are not going to use them all, but most customers doing replication should buy the Nutanix Ultimate license, as you get a whole bunch of things thrown in with that baby that will save you money.

Replication between HCI clusters is a bit of a different puppy compared to big-iron Fibre Channel storage systems replicating data from site to site for 100% multi-site fail-over capability.

With HCI systems for example you are failing over the entire Virtual Machine with the data from one cluster to another.

We are basically setting up protection domains.

You could also set up protection policies, but this is a lot more complicated.

So the first decision to make is which VMs you will replicate between clusters.

If you just want to replicate everything from a primary site to a secondary site in an Active-Active schema, then there are some things to think about here, such as:

  • Does each cluster have the CPU, RAM, and storage to run its own load plus the other cluster's load?

  • How much time will it take to fail over?

  • How long will it take to fail the data back to its proper cluster and come back online?

  • All sorts of interesting IP and DNS problems come with such setups, involving some intricate VLAN planning and orchestration to pull off

  • Network gear needs setting up to do this properly - switches, routers, load balancers, and firewalls are in the mix here, and IP addresses, VLANs, and the like need to be assigned in this piece

  • A standard operating procedure needs to be set up and documented to make sure this works as intended, for troubleshooting purposes

  • Ideally, clusters doing replication can be brought up side by side and the data replicated between them for the first time, so that the WAN sync once the remote cluster lands at the actual remote site does not impinge on daily operations too much

  • If data is lost in the fail-over event, how much was lost, and what steps will be in effect to re-enter the data and resume BC/DR emergency operations at the remaining site?

  • You need to develop a full DR/BC plan (disaster recovery eventually becomes business continuity)

  • The physics also need to be reckoned with - distance between the sites = latency and response times in milliseconds

  • Whether the sites can use synchronous or asynchronous replication - again a physics thing; synchronous is limited to roughly a 60-mile radius between the sites to work really well. A simple ping test will give you link response times in milliseconds to help here

  • How big is your WAN link between the two sites? Some have their own dark fibre on the same campus, where the cost is already sunk in the cable; others have to pay a telco for the port, and the speed and feed are set
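The first item in that list - whether each cluster can absorb its partner's load - is easy enough to sanity-check with a rough sizing calculation like this (all figures, names, and the headroom percentage here are made up for illustration):

```python
# Rough sketch of an Active-Active sizing check: each cluster must be
# able to run its own load plus the fail-over load from its partner,
# with some headroom left over. All numbers are hypothetical.
def can_absorb(capacity, own_load, partner_load, headroom=0.10):
    """True if every resource fits with the given headroom to spare."""
    for resource in ("cpu_ghz", "ram_gb", "storage_tb"):
        needed = own_load[resource] + partner_load[resource]
        if needed > capacity[resource] * (1 - headroom):
            return False
    return True

site_a_capacity = {"cpu_ghz": 400, "ram_gb": 4096, "storage_tb": 200}
site_a_load     = {"cpu_ghz": 120, "ram_gb": 1024, "storage_tb": 60}
site_b_load     = {"cpu_ghz": 150, "ram_gb": 1536, "storage_tb": 80}

print(can_absorb(site_a_capacity, site_a_load, site_b_load))  # True here
```

In practice you would pull these figures from the cluster's actual utilization history rather than plugging in static numbers, but the arithmetic is the same: combined load against capacity, minus whatever headroom your SOP requires.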

For those wondering what the delay here is all about: it is getting concise and clear information on what Prism Element gets you from a metro replication point of view.

I am running into vastly differing opinions here versus the Prism Pro advanced replication features, and when I understand it clearly I will continue this blog posting... 20 October 2020...