vSAN 2 Nodes ROBO Setup
VMware Virtual SAN has become a hot topic lately. I see many bloggers comparing it with similar hyper-converged infrastructure (HCI) solutions, e.g. Nutanix, on performance. That is not my point in writing this blog. Rather than arguing about which technology is better, I am personally just pleased that the whole HCI world is growing. SAN has done a great job over the past decade, but in a fast-growing world, I believe Simplicity and Scalability will be the key elements of an infrastructure.
To branch out a bit, I actually have a customer happily using VMware NSX in their environment. Trust me, they do not have a huge farm; to be precise, they have only 12 ESXi hosts. But their IT team gets a lot of on-demand requests from users for new networks, routing, firewalls, VPNs or load balancers. NSX helps them fulfil these requests with speed and ease. And when they expand their environment, they just need to add a new host to the same NSX cluster and it is all DONE.
Virtual SAN (VSAN) is developed with the same idea: to simplify the infrastructure. The same customer recently had a new use case: they are setting up a new remote office away from their datacenter. To keep things simple while still maintaining VM availability, a VSAN ROBO (Remote Office/Branch Office) setup is a great fit for them. They immediately started the evaluation, and I was the one helping to design and configure it. It turned out I needed less than a day, from setting up the hardware to migrating VMs onto it and running them. I would like to provide a very quick overview of the design and deployment for your reference, to show you how simple VSAN is.
When preparing a VSAN 2-node ROBO deployment, as always you should study these two guides first: 1) the Solution Overview and 2) the Design Guide. But instead of making you go through so many pages, I would like to share a summary of the design blueprint as follows:
To create a VSAN ROBO deployment, we need VSAN 6.1 or later, which means vCenter 6.0 Update 1 or later. In this blog, I am using vCenter 6.0 Update 2, which comes with VSAN 6.2. As shown in the design, the target is to create the VSAN 2-node ROBO deployment at Remote Office 1. It is important that the two ESXi hosts are on the VSAN support list, which you can check in the VMware Compatibility Guide. In addition, remember that you need a remote Witness Server, for which you have two options:
- A dedicated physical ESXi host
- A Witness VM (which is a nested ESXi host)
For my deployment, and I think in most real deployments too, the second option is far more feasible in terms of cost, ease of deployment and availability, since a VM can be protected by vSphere HA when it runs in a vCenter-managed cluster (which is actually recommended). A dedicated host, by contrast, means a more static configuration, higher cost, lower availability and a longer deployment lead time.
The network design has more considerations we need to be aware of, to ensure the configuration follows VMware's recommendations and remains supported. Again, the Design Guide mentioned above explains the different deployment approaches in more detail, but in summary, the key points are:
- Non-Routed VSAN Network Between the Two Physical Nodes
- Routed VSAN Network Between the Witness VM and the Two Physical Nodes
- Both Routed and Non-Routed Management Networks Are Okay for All VSAN Nodes
- MTU 9000 is NOT a must
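Points 1 and 2 can be sanity-checked against an IP addressing plan before touching the hosts. The sketch below is a hypothetical Python helper, not a VMware tool: the host names and addresses are made-up examples, and it assumes that "non-routed" means both data nodes' VSAN VMkernel interfaces sit in the same subnet, while "routed" means the witness's VSAN interface sits in a different one.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class VsanInterface:
    """A VSAN-tagged VMkernel interface from a (hypothetical) addressing plan."""
    host: str
    cidr: str  # address with prefix length, e.g. "172.16.10.11/24"

    @property
    def network(self):
        # ip_interface keeps the prefix length, so .network is the subnet it lives in
        return ipaddress.ip_interface(self.cidr).network


def check_robo_vsan_layout(node_a, node_b, witness):
    """Return a list of problems; an empty list means the plan matches points 1 and 2."""
    problems = []
    # Point 1: both physical nodes share one non-routed VSAN segment
    if node_a.network != node_b.network:
        problems.append(
            f"{node_a.host} ({node_a.network}) and {node_b.host} ({node_b.network}) "
            "are on different VSAN subnets"
        )
    # Point 2: the witness is reached through a router, so its VSAN
    # interface is expected on a different subnet from the data nodes
    if witness.network in (node_a.network, node_b.network):
        problems.append(f"{witness.host} sits on a data-node VSAN subnet; expected a routed link")
    return problems


if __name__ == "__main__":
    plan = (
        VsanInterface("esxi-robo-01", "172.16.10.11/24"),
        VsanInterface("esxi-robo-02", "172.16.10.12/24"),
        VsanInterface("witness-01", "10.0.50.21/24"),
    )
    print(check_robo_vsan_layout(*plan) or "layout matches points 1 and 2")
```

Points 3 and 4 are left out on purpose, since either management design is acceptable and jumbo frames are optional.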
While points 1, 3 and 4 are fairly trivial, point 2 is the trickier one. The VSAN network does NOT support a separate TCP/IP stack yet, which means it cannot have its own default gateway. So to make the VSAN network routable between the nodes and the witness, we need to set up static routes on the ESXi hosts to direct the VSAN traffic: on each data node, a route to the witness's VSAN subnet, and on the witness, a route back to the data nodes' VSAN subnet. The command is:
esxcli network ip route ipv4 add --gateway IPv4_address_of_router --network IPv4_address
esxcli network ip route ipv6 add --gateway IPv6_address_of_router --network IPv6_address
Note that this command creates a persistent route only on ESXi 5.1, 5.5 and 6.0 onward. That is fine for a VSAN ROBO deployment, which is only supported from 6.0 Update 1, but if you want to implement static routes on ESXi 5.0 hosts, you need some extra steps.
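With three hosts and two directions, it is easy to mix up which subnet and gateway go where. The following is a hypothetical Python helper that only builds the esxcli command strings from an addressing plan (you still run them on each host, e.g. from the ESXi Shell or over SSH). The subnets, gateways and host names are made-up examples, and it assumes each side's router sits inside that side's own VSAN subnet.

```python
import ipaddress


def route_command(local_vsan_net, gateway, remote_vsan_net):
    """Build one esxcli static-route command for a host on local_vsan_net."""
    local = ipaddress.ip_network(local_vsan_net)
    gw = ipaddress.ip_address(gateway)
    remote = ipaddress.ip_network(remote_vsan_net)
    if local.version != remote.version:
        raise ValueError("local and remote VSAN subnets use different IP versions")
    # The next hop must be directly reachable from the host's VSAN subnet
    if gw not in local:
        raise ValueError(f"gateway {gw} is not inside {local}")
    # A static route only makes sense when the far side is on a different subnet
    if local.overlaps(remote):
        raise ValueError(f"{local} and {remote} overlap; no static route is needed")
    family = "ipv4" if remote.version == 4 else "ipv6"
    return f"esxcli network ip route {family} add --gateway {gw} --network {remote}"


def robo_route_plan(data_hosts, data_net, data_gw, witness_host, witness_net, witness_gw):
    """Map each host to its route: data nodes -> witness subnet, witness -> data subnet."""
    plan = {host: route_command(data_net, data_gw, witness_net) for host in data_hosts}
    plan[witness_host] = route_command(witness_net, witness_gw, data_net)
    return plan


if __name__ == "__main__":
    plan = robo_route_plan(
        ["esxi-robo-01", "esxi-robo-02"], "172.16.10.0/24", "172.16.10.1",
        "witness-01", "10.0.50.0/24", "10.0.50.1",
    )
    for host, command in plan.items():
        print(f"{host}: {command}")
```

Both data nodes get the same route (towards the witness subnet), and the witness gets the mirror-image route, which is exactly the two-way routing that point 2 requires.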
After preparing the necessary networks and host configuration, the VSAN deployment itself is straightforward. The Design Guide includes a step-by-step deployment guide, or for simplicity you can refer to the VMware Blog or the following clip.
What I want to highlight is what happens after the deployment: you may see some error messages after configuring VSAN. Errors or alerts are usually raised because the VSAN Health Check tests run periodically by default. You can find more detail in the cluster's Health view (Monitor > Virtual SAN > Health in the vSphere Web Client).
For a ROBO deployment, you will usually receive stretched cluster latency alerts. This is a bug in the health check utility, which is NOT aware of ROBO deployments. The network requirements VMware supports are as follows:
- Between the VSAN Physical Nodes: 1 Gbps for Hybrid and 10 Gbps for All-Flash, latency < 5 ms RTT
- Between the Witness Server and the VSAN Nodes: 2 Mbps and < 250 ms RTT
But the current health check utility applies a much tighter latency threshold to point 2, which causes false alerts.
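Because the health check alert can be a false positive, it helps to compare your own measurements against the supported figures listed above. The sketch below is a hypothetical Python helper: it does not measure anything itself, so the bandwidth and RTT values are numbers you collect yourself (for example, RTT from `vmkping` on the VSAN VMkernel interface), and the link values in the example are made up.

```python
from dataclasses import dataclass

# Supported figures as listed above for a VSAN 2-node ROBO deployment
NODE_MIN_GBPS = {"hybrid": 1.0, "all-flash": 10.0}
NODE_MAX_RTT_MS = 5.0       # node-to-node RTT must be below this
WITNESS_MIN_MBPS = 2.0
WITNESS_MAX_RTT_MS = 250.0  # witness-to-node RTT must be below this


@dataclass
class Link:
    bandwidth_mbps: float
    rtt_ms: float


def check_robo_links(config, node_link, witness_link):
    """Return unmet requirements; config is 'hybrid' or 'all-flash'."""
    problems = []
    need_mbps = NODE_MIN_GBPS[config] * 1000  # 1 Gbps = 1000 Mbps
    if node_link.bandwidth_mbps < need_mbps:
        problems.append(f"node-to-node bandwidth {node_link.bandwidth_mbps} Mbps < {need_mbps:.0f} Mbps")
    if node_link.rtt_ms >= NODE_MAX_RTT_MS:
        problems.append(f"node-to-node RTT {node_link.rtt_ms} ms is not < {NODE_MAX_RTT_MS} ms")
    if witness_link.bandwidth_mbps < WITNESS_MIN_MBPS:
        problems.append(f"witness bandwidth {witness_link.bandwidth_mbps} Mbps < {WITNESS_MIN_MBPS} Mbps")
    if witness_link.rtt_ms >= WITNESS_MAX_RTT_MS:
        problems.append(f"witness RTT {witness_link.rtt_ms} ms is not < {WITNESS_MAX_RTT_MS} ms")
    return problems


if __name__ == "__main__":
    # Hypothetical measurements: 1 GbE between nodes, a WAN link to the witness
    print(check_robo_links("hybrid", Link(1000, 0.4), Link(10, 120)) or "within supported limits")
```

If this reports no problems but the health check still raises a stretched cluster latency alert, you are most likely looking at the false alert described above rather than a real network issue.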
Also, if you are using newer hardware for your physical VSAN nodes, the HCL database bundled with vCenter 6.0 Update 2 (released in March 2016) may not be up to date enough. If your vCenter has internet access, you can simply update it online. Otherwise, which is more likely, your vCenter is offline, and you have to download an updated JSON file as described in the VMware KB and upload it manually. This should clear the HCL alerts you are seeing.
So what's next? I would recommend better monitoring of the VSAN nodes. This is definitely necessary, as your ESXi hosts are not just running VMs anymore; they also hold your data, which is definitely valuable. There are many blogs and white papers telling you how to monitor performance issues, but what I want to emphasise is ESXi health through log monitoring. This can easily be done with VMware Log Insight; however, that is too much for this blog, so I will write another dedicated blog on VMware Log Insight and VMware vRealize Operations Manager.