vRealize Automation 7.2 Lab – vSphere 6.0 and NSX 6.2.4 Preparation

As mentioned in the previous blog, I am setting up a vRealize Automation 7.2 environment for lab testing of the new features in version 7.2. As noted before, there is currently no supported NSX version that works with vSphere 6.5, which is why I am testing vRA 7.2 + vSphere 6.0 + NSX 6.2.4 in my lab. The following guide provides a quick, high-level walkthrough of how to prepare your vSphere 6.0 + NSX 6.2.4 environment. We will proceed to the vRA 7.2 initial setup in the next blog post.

That said, I am not going to do a step-by-step configuration of the vCenter Server and ESXi hosts. Those should be straightforward for you to perform, especially since the vCenter Server Appliance provides such a simple deployment. I assume you have already done the following:

  1. Installed a vCenter Server
  2. Set up at least two ESXi hosts (for NSX VXLAN testing)
  3. Configured shared storage between the ESXi hosts

I can confirm that all of the above can be built in a nested environment, so you don't really need physical storage shared among the hosts. You can just use FreeNAS, OpenFiler, etc.

vSphere and NSX preparation

Assuming you have already deployed the above, first and foremost you have to add the ESXi hosts into vCenter. Importantly, if you are using a cluster to group your ESXi hosts, do enable DRS and ensure it is in Fully Automated mode. I have hit issues before when the mode was Manual or Partially Automated.

Afterwards, you have to create a vDS for the ESXi hosts to connect to. This is needed for the NSX VXLAN deployment. You may worry that this means only the vSphere Enterprise Plus edition can use NSX. No: VMware supports using a vDS with the vSphere Standard and Enterprise editions if and only if you are running NSX on top. The configuration is tricky, though. If you are familiar with vDS creation and deployment, you know we normally need to:

  1. Create vDS on the vCenter
  2. Add Host into the vDS

The 1st step is always fine, since a vDS is a vCenter object, NOT an ESXi one, so you can create as many vDSes as you like. The problem is in the 2nd step: you need to add the ESXi hosts into the vDS for management, and this is where the vSphere (Web) Client checks your ESXi license. If you are using Standard or Enterprise licenses, you cannot add those hosts. So... how can you use the Standard edition of vSphere with NSX?! Well, you need an extra step in between, as follows:

  1. Create vDS on the vCenter
  2. Configure NSX and “prepare” the Hosts with Standard Edition Licenses
  3. Add Host into the vDS

You will notice that after preparing your hosts with NSX, you are allowed to add the hosts into the vDS. DO NOT underestimate the importance of this configuration order. I did... and I had a really painful experience because of it. So let's recap what I did "wrong".

NSX supports vSphere Enterprise and Standard edition licenses

Well, I took the message above at face value, so I thought I could skip the ordering by using trial licenses on vSphere first, which allowed me to use a vDS and add the trial-licensed hosts into it. I then happily configured all the NSX pieces on top of that vDS: the transport zone, logical switches, routers, etc. Afterwards, I tried to CHANGE the license attached to the ESXi hosts. And this is the step where I failed...

When I tried to assign the Standard license


Even if I click 2,000 times… it still fails


You will see the warning above. You cannot bypass it and add the Standard or Enterprise edition license. So what did I have to do? I had to revert everything to detach the vDS from the hosts in order to change the licenses. And by everything, I mean all the Edge gateways, logical routers, switches, transport zones, NSX agents... everything.

THUS, CAUTION! If you are using Standard, Advanced, or Enterprise licenses, DO assign the licenses before you configure anything in NSX.

Step by Step Configuration of vSphere and NSX

Let's get back to the vSphere preparation and NSX deployment for vRA integration. As said, we need a vDS for NSX to run VXLAN on, so you need to set one up at the vSphere level.

Log in to the vSphere Client and create a new vDS


Then choose "Add and Manage Hosts" and "Add hosts" to bring the hosts under vDS management


Select the two hosts you would like NSX to cover


You only need to manage the physical adapters of the hosts; leave the other options unchanged


Here, I use two uplinks on each host.


Well, you are all done and ready for the NSX deployment, so let's start working on the NSX deployment and configuration. It is not that difficult: you first need to deploy the NSX Manager OVA into vCenter and boot it up.

After booting the deployed OVA, go to https://<NSX-MGR> to perform the vCenter registration. Click "Manage vCenter Registration" after logging in.


Click Edit to configure the Lookup Service URL, which points to the SSO server, and input all the necessary credentials for both the upper and lower sections.


Confirm the SSL certificate warnings


On successful registration, you will see the following screen. Remember that the Lookup Service URL uses port 443 in vSphere 6.x but 7444 in 5.x.
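For reference, the Lookup Service URL for a vSphere 6.x setup has the form below; the PSC FQDN is a hypothetical example for this lab:

```
https://psc.vmware.lab:443/lookupservice/sdk
```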


So after this initial configuration, we need to go back to the vSphere Web Client to continue the setup.

Go back to vSphere Web Client

You will need to log out of the Web Client and log in again to discover the new "Networking & Security" icon. That is the entry point for managing and configuring NSX.


On clicking the "Networking & Security" icon, you enter the NSX configuration page. "Dashboard" is a new tab introduced in version 6.2.3 (a release that has since been deprecated) that gives you an overview of NSX health.


To begin the initial setup, go to Installation > Management. You need to deploy the NSX Controller nodes IF you are using unicast VXLAN; if you are using multicast VXLAN, you don't need them. Anyway, since VMware recommends unicast, just deploy them.


You have to define the IP pool and the deployment location. This IP pool should be on a subnet that can reach the management network of the ESXi hosts, for agent communication.


Click OK to confirm the deployment


Wait for the controller deployment to complete


Then you can proceed to the "Host Preparation" tab. This is the step you need to do before adding the hosts into the vDS if you are using the Standard, Advanced, or Enterprise edition of vSphere. Click the gear icon and choose Install.


Confirm the setup by clicking "Yes"


Most likely, if you have really been following my instructions, nothing will happen, because your NSX is not yet licensed. So... go back to the Licensing icon and give your NSX a valid license.


Return to the Host Preparation tab and install the agent. After the installation, only Firewall and Installation Status are ticked; VXLAN is not yet configured, which is why you need to proceed by clicking the "Not Configured" link under the VXLAN column.


Here, we define the VTEP IPs for the ESXi hosts. The VTEP IPs carry the underlay network communication, while VXLAN is the overlay network: as long as the VTEP IPs can reach each other over Layer 2 or Layer 3, we can build a new Layer 2 network on top of the underlay. That is how VXLAN works. So we choose "Use IP Pool" to define the IPs to use as VTEP IPs.


As said, this can be an isolated subnet; we just need the hosts to be able to talk to each other through their VTEP IPs, and either Layer 2 or Layer 3 connectivity is fine.
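One practical consequence of this encapsulation is MTU: VXLAN adds roughly 50 bytes of headers on top of each original frame, which is why NSX asks for an underlay MTU of at least 1600 on the VTEP network. A quick sanity check of the arithmetic (the numbers below are the standard header sizes for an IPv4 underlay, not anything specific to my lab):

```shell
# VXLAN encapsulation overhead per frame (IPv4 underlay):
# outer Ethernet (14) + outer IPv4 (20) + outer UDP (8) + VXLAN header (8)
overhead=$((14 + 20 + 8 + 8))
echo "VXLAN overhead: ${overhead} bytes"
echo "Minimum underlay MTU for 1500-byte payloads: $((1500 + overhead))"
```

The result is 1550, and the commonly quoted 1600 simply adds headroom for things like an extra 802.1Q tag.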


But you need to be careful when selecting the VMKNic teaming policy. "Fail Over" gives you one VTEP IP per host, which is the easy approach, but if you have four uplinks, only one will be in use. So I would use "Load Balance - SRCID", which gives each ESXi host one VTEP IP per uplink. This load-balances your traffic across all the uplinks you have without requiring any LACP configuration.


Click OK to confirm the setup, and you will see VXLAN ticked as well


You can proceed to the "Logical Network Preparation" tab and review the VXLAN configuration and the VTEP IPs assigned.


VXLAN is like VLAN: we need network IDs for the segments. For VXLAN it is recommended to start from 5000, which is above 4094 (the VLAN maximum), to make the two easy to tell apart. This is not strictly necessary, though, as they are two different tagging mechanisms and will not conflict.


So here I simply define VXLAN IDs 5000-15000, which already gives me 10,001 networks to create.
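The range is inclusive on both ends, which is where the 10,001 comes from:

```shell
# Inclusive segment ID range 5000-15000
echo $((15000 - 5000 + 1))   # 10001 segments
```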


Finally, we need to click Transport Zones to create a zone, which defines the scope of the VXLAN, i.e. how widely the VXLAN can span.


GREAT! The NSX configuration is all done. You can now create a Layer 2 network with VXLAN to test connectivity.

You can do this by clicking "Logical Switches" on the left-hand side and creating a new logical network. When creating the logical switch, which is a Layer 2 network, you need to define the transport zone.


After creation, click the logical switch. You can test connectivity easily by going to "Monitor" and running the ping test across hosts.


Or by broadcast, which is useful for testing a large cluster


Finally, we are ready: vCenter 6.0 and NSX 6.2.4 are configured and integrated for vRealize Automation 7.2 to consume. In the next blog, I will go back to vRealize Automation 7.2 to set up the integration. Stay tuned.


Pay Attention! vRealize Automation 7.2 Deployment Highlight

vRealize Automation 7.2 has finally been released, and honestly, I am more than happy to get first-hand testing on it, because the what's-new items are so attractive. Among them, container management is now enabled in vRA 7.2. I will blog on the new features in separate posts, but the following is a recap from the release notes.

vRealize Automation 7.2 What’s New?

The vRealize Automation 7.2 release includes resolved issues and the following new capabilities.

  • Enhanced APIs for programmatically installing, configuring, and upgrading vRealize Automation
  • Enhanced upgrade functionality for system-wide upgrade automation
  • LDAP support for authentication and single sign-on
  • FIPS 140-2 Compliance:
    • Consumer/administrator interface is now FIPS 140-2 compliant
    • Managed using vRealize Automation appliance management console or command-line interface
    • FIPS is disabled by default
  • Migration improvements:
    • UI-driven vRealize Automation 6.2.x to 7.2 migration
    • Migration option available in Deployment Wizard
    • Enhanced support for importing vCloud Director workloads
  • Service entitlement enhancements:
    • Checkbox to add all users to an entitlement
    • Delete inactive entitlements
  • Expanded extensibility capabilities:
    • Several new event broker topics for enhanced extensibility use cases
    • Subscription granularity for individual components, catalog items, component actions, containers, or deployments
    • Leverage extensibility with new container management functionality
    • Scale-out/Scale-in custom XaaS services and applications that include XaaS objects
  • Networking Enhancements:
    • IPAM framework support for NSX on-demand routed networks
    • New network profiles to support additional IPAM use cases
    • Configure load balancing policy for NSX on-demand load balancer in the blueprint (round-robin, IP-Hash, Leastconn)
    • Configure service monitor URL for HTTP/HTTPS
  • Container Management:
    • Integrated container management engine for deploying and managing Docker Container hosts and containers
    • Build hybrid applications that include containers and traditional OS
    • New Container administrator and architect roles
    • Auto discovery of provisioned container hosts
    • Minimum supported version of Docker is 1.9.0
    • If the docker built-in load balancing is needed for clustered containers in user-defined networks, Docker 1.11+ is required
  • Azure Endpoint for Hybrid Cloud provisioning and management:
    • Seamlessly build, deliver, and manage Azure machines with vRealize Automation
    • Support for Azure networking services
  • ServiceNow Integration:
    • Automatically expose entitled vRealize Automation catalog items to ServiceNow portal by using the Plug-in that is available in VMware Solution Exchange

Check the Interoperability for Planning Your Deployment

But before testing the what's-new items, I would like to go through the installation steps and considerations here a bit. As mentioned in previous blog posts, you may see vSphere 6.5 and feel excited about it, and VMware has also released vROps 6.4. But when deploying all the solutions together, you have to consider the interoperability among the products, e.g.


As you may know, we can integrate the following VMware products to provide the full suite of Cloud Management Platform features:

  1. vRealize Automation
  2. vSphere
  3. NSX
  4. vRealize Operations

But per the current interoperability matrix, vSphere 6.5 does NOT have a supported NSX version yet. Thus, if you want the on-demand network provisioning, load balancing, firewalling, and scale-out/in features, you cannot select version 6.5 as your vSphere infrastructure, even though vRA 7.2 supports vSphere 6.5.


Thus, again: if you would like to use NSX with vRA, you need vSphere 6.0 instead. Meanwhile, the latest vROps 6.4 natively supports vRA 7.2 directly.


Deployment of vRealize Automation 7.2

I have to say, the deployment steps have been simplified by a LOT since version 7.0, and in version 7.2 they are exactly the same as in 7.0 and 7.1. At a high level, for a minimal setup you need to:

  1. Download and deploy the vRealize Automation 7.2 appliance
  2. Prepare a Windows OS for the vRealize Automation IaaS server (optionally with MS SQL installed, if you want a simple deployment)
  3. Optionally, separate out the MS SQL server that the IaaS server depends on

When the above are ready and powered up, you can go to https://<VRA Appliance>:5480 to actually start setting up vRealize Automation 7.2 with the installation wizard.

I recommend allowing the IaaS server Internet access to simplify the setup

Go to the VRA Appliance 5480 URL

On opening the vRA port-5480 URL, you first select the deployment method. As I am doing a POC, I select "Minimal Deployment" and "Install Infrastructure as a Service". Press Next to proceed with the setup.


In this step, DO ensure your vRA is synced to a reliable time source. "Use Time Server" is always the better option, but for POC purposes you can optionally use "Use Host Time", which is usually not as accurate.


You cannot proceed past the above step until an IaaS host has been discovered in the bottom panel, "Discovered Hosts".

Do this on the IaaS Server 

So you have to log in to the IaaS server and install the Management Agent; you can find the installer, vCAC-IaaSmanagementAgent-Setup.msi, in the wizard above.


Run the installer and press Next to proceed with the setup


Accept the EULA and press Next to proceed with the setup


Choose the installation path and press Next to proceed


Input the vRA 5480 URL in the vRA Appliance Address field, plus the root user name and password. Click the Load button to retrieve the SSL fingerprint and check the confirmation checkbox. Press Next to continue.


Provide an admin credential for the IaaS server


Press Install to kick off the setup of the IaaS Management Agent on the IaaS server


Wait for the Completion of Setup


Click Finish to finalise the setup


Go back to the VRA Appliance 5480 URL

You can see the IaaS server is now available in the "Discovered Hosts" list, which allows you to proceed with the setup wizard. Press Next.


I simply love this step: the prerequisite checker wizard for your IaaS server. In the vRA 6.x days, we needed scripts or manual work to install all the prerequisites, which could take a lot of time and effort to do completely. Now you can simply do it with this wizard. Again, 7.0 and 7.1 also provide it.


Click the "Run" button above to start the check on the IaaS server, then click the "Fix" button to have it prepare the prerequisites of the IaaS server for you


Wait for the Completion of Setup


The IaaS server will be rebooted after the setup is completed


Provide the vRA service with a resolvable address, meaning the DNS record has to be working


Provide the SSO password; this is the password for the SSO server embedded in the vRealize Automation 7.2 appliance


Provide the IaaS server information and login credentials


Provide the DB server information. As I am using the IaaS server as the DB server too, I use IaaS.vmware.lab in my case and create a new database. As mentioned in the prerequisites above, I have done nothing in the database; I just installed an out-of-the-box (OOTB) MS SQL Server on the IaaS server.


Then input the DEM Manager information


And the vSphere agent information, which is set up by the installation wizard and lets you connect to a vSphere endpoint


We then need to generate the certificate for the vRA appliance


And for the IaaS server, which also hosts the web server


After all the input, you can validate the configuration before starting the installation


Wait for the completion of Validation


On Completion, press Next to continue the setup


The setup wizard will recommend creating a snapshot before the installation. As I am just setting this up for a greenfield POC, I skipped this :). But please do take one for an upgrade or a production deployment.


Press Install to kick start the setup


The wizard will start setting up your vRealize Automation appliance, the vRealize Automation IaaS server, and the MS SQL Server used by the IaaS server


The installation will take a while, so it's time to have a cup of tea


Oh yeah, the installation is complete


Then you have to provide a valid license for your vRA setup


Optionally check to Join the VMware Customer Experience Improvement Program


Optionally, check to create the initial content. I would skip it, as I usually configure everything on vRealize Automation myself in order to learn all the changes from version to version.


Provide the configuration admin password if you would like to Create the Initial Content


Whether you choose Done or Skip, the setup will be completed, and you can press Finish to finish it off.


All DONE! You can then log in to your vRealize Automation 7.2 console at https://<VRA>. I will provide more what's-new testing in the coming blogs soon, but for this post, I think it's lengthy enough already... 🙂


Horizon VIEW One Way Trust Step by Step

Horizon One-Way Trust Setup Procedure

In the original state, there are two separate domains, CX.lab and VMware.lab, with no trust between them. A Horizon environment is set up on the CX.lab domain, and its domain users can connect to remotely hosted applications. The objective of this work is to let VMware.lab domain users use the Horizon environment in CX.lab.


Starting from that original state, I set up a Horizon 6.2 environment, which is the first version that supports a one-way trust between domains. Originally, only CX.lab users can be found and entitled from the Horizon View Connection Server.


And of course, only users from the CX domain can log in to the Horizon View Client


And use the applications available and entitled from Horizon View


In order to let VMware.lab users use the CX.lab Horizon-provisioned virtual desktops or hosted applications, we have to set up a one-way trust from CX.lab to VMware.lab.


Do this DNS Configuration from the CX.lab Domain Controller

Firstly, you have to configure DNS zone transfers to let CX.lab resolve the VMware.lab environment (and we will need to do the same in the opposite direction a bit later). I added the DC from the VMware.lab domain under Zone Transfers to allow the transfer. Don't worry if you see a red cross; that is expected while no permission has been granted yet.


Create a New Zone from the CX.lab


Click Next to proceed with the setup of the new zone


Choose a Secondary Zone for VMware.lab


Of course, input VMware.lab


Input the DC of the VMware.lab


Confirm the setup


Do this DNS Configuration from the VMware.lab Domain Controller

Create a new zone from VMware.lab to connect to the CX.lab DNS server in the opposite direction


Click Next to proceed with the wizard


Choose a Secondary Zone for connecting to the CX.lab Domain


Type the CX.lab Domain to connect to


And provide a domain controller IP from CX.lab domain to connect to


Press Finish to confirm the setup


If the setup is done correctly, you can double-check the zone transfer status, and it should be green now.


Do this AD Configuration from the CX.lab Domain Controller

Then we need to configure the AD trust to establish the one-way trust from CX.lab to VMware.lab


Open "Active Directory Domains and Trusts" and edit the domain trust from the properties of the domain object, which is CX.lab.


Click the “Trusts” tab


Click "New Trust" to establish a new trust


Press Next to proceed with the setup


Input VMware.lab, as you need CX.lab to trust VMware.lab


You can choose either "Forest Trust" or "External Trust", depending on the security level you can accept. I use "External Trust" to keep the security control tighter.


Choose “One-way: Outgoing”


Choose “Both this domain and the specified domain”


Provide the domain admin user credential to establish the trust from CX.lab to VMware.lab


Select “Domain-wide Authentication”


Press Next to confirm the setup



Do this AD Configuration from the VMware.lab Domain Controller

Go back to Active Directory Domains and Trusts and you will see a new incoming trust on the "Trusts" tab. Click the item and select "Properties".


Click the “Validate” button to confirm the One-way-trust request


Input the domain admin username and password to validate


You can see the following message when the validation is successful


Go back to the CX.lab Domain Controller to proceed with the setup

Confirming the trust


Press Finish to complete the setup


Press OK to confirm the message


As the end result, you can see the VMware.lab domain from the CX.lab domain


So we can now start entitling VMware.lab users in the CX.lab domain

Entitling VMware.lab users in Horizon View

So after the one-way trust setup, you need to configure the Horizon View Connection Server further through View PowerCLI.


You need to use the following command to enable the one-way-trust credential on the Connection Server:

vdmadmin -T -domainauth -add -owner <View Admin> -user <Remote-Domain-Admin> -password <Remote-Domain-Admin-Password>


Restart the Connection Server Services


On logging in to the View Admin page again, you can see the second domain already.


And you can now entitle applications to users in the VMware.lab domain.


You can log in to Horizon View with a VMware.lab domain user


But you will see a warning, as the hosted application does not allow the VMware.lab domain to log in yet


This is why you have to add the VMware.lab domain users to the RDS hosts


Afterwards, you will be able to launch your app successfully


Awaiting too long for this!!! vCenter HA – Part 4

So this is the final part of the blog series on setting up vCenter HA for your vCenter Server Appliance 6.5. To recap a bit: we have deployed the PSC nodes and the NLB, and configured HA mode for the PSC nodes. Now we will perform the installation of vCenter (which is trivial) and of vCenter HA (which is also quite trivial). The following steps will be taken:

  1. Install the vCenter Server using the Load Balancer virtual IP for the Platform Service Controller when prompted.
  2. Configure vCenter HA With the Basic Option
  3. Verify the vCenter HA function

You can read more about the vCenter HA deployment methods, requirements, and prerequisites in the vSphere Availability Guide. But to keep the setup simple, we are using the Basic (trivial) deployment method, which just requires us to provide a port group and IPs for heartbeating.

PSC01 and PSC02 were deployed in Part 2 of this blog series, and the HA configuration was set up in Part 3. In this blog, we are going to configure the VC nodes. We will first deploy VC01; then, when we enable vCenter HA with the Basic configuration, VC02 and the VC witness will be cloned out and configured automatically. So let's get started!

Install the vCenter Server using the Load Balancer virtual IP for the Platform Service Controller when prompted.

Run the vCenter Appliance installer again and choose Install


Press Next to proceed with the deployment


Accept the EULA and press Next to proceed


We are deploying a vCenter with an external PSC, so please choose "External Platform Services Controller" and "vCenter Server (Requires External Platform Services Controller)".


Choose the ESXi host on which you would like to deploy the vCenter Server Appliance.


Accept the certificate to proceed


Provide the VM name and root password. You do not need to give it a per-node name: if you are using "vCenter" as the management hostname, you can use it directly. There is no need for vCenter01 and vCenter02 node names, because vCenter HA will clone out the secondary and witness nodes with the hostname already configured. In simple words, you don't need 3 management IPs for the 3 nodes; you just need one DNS name and one IP for management, while 3 heartbeat IPs are needed for the nodes.


Choose the size of the vCenter. I use the Small configuration for development and testing purposes.


Choose the proper datastore for deployment.


Input the network information. As said, enter the IP you want as the final service IP, and likewise the FQDN as the final service FQDN.


Press Finish to proceed


Wait for stage 1 of the deployment to complete, then press Continue to proceed to stage 2.


Press Next to proceed to stage 2 of the deployment


This is an important step: you need to enable SSH access for the vCenter Server, as it is a prerequisite for vCenter HA


Input the SSO configuration, using the NLB-ed FQDN set up in the previous steps. In my case, it's PSC.vmware.lab.


Confirm the input and press Finish to proceed


Press OK to proceed; do not interrupt the configuration, and wait for the deployment to complete.


GREAT, you got the vCenter deployed!!!

Configure vCenter HA With the Basic Option

To use the Basic option for vCenter HA, your vCenter must be self-managed, i.e. your vCenter needs to manage the ESXi host that runs the vCenter VM itself.

I added the hosts into vCenter management. You can see in my screenshot that I had only two hosts, and this is why I failed the first time; it turned out I had to add one more host to the environment to make the deployment successful.


Then you need a separate port group for the heartbeat traffic.


After that, we can enable vCenter HA from the VC object. Go to the "Configure" tab, then vCenter HA, and press "Configure".


Choose “Basic” Configuration for the vCenter HA deployment.


Select a heartbeat IP for vCenter node 1. This is a private network for the VC nodes to communicate over, so it can be an isolated one. Then choose the corresponding port group under "Select vCenter HA network".


In the next step of the wizard, you input the heartbeat IPs for VC node 2 and the witness.


After that, press Next to continue the setup


In this page, you can select separate datastores for running the different nodes in order to enhance the availability level.


You can even select thin provisioning for different nodes


The configuration will now take place. (Again, you actually need three hosts for the deployment to succeed.)


On successful deployment, the page changes and you can see which node is active, passive, or witness. Green lights, of course, mean the configuration is okay. For details, you can press "vCenter HA Monitoring", next to the "Edit" and "Initiate Failover" buttons.


You can see more detail under the Monitor > vCenter HA.


So the configuration is all DONE!!!

Verify the vCenter HA function

Before calling it done, we definitely need to test failover. Just click "Initiate Failover" under Configure > vCenter HA.


Press Yes to confirm the failover.


WOW, failover in progress. We need to wait some time for the Web Client to come back up.


WOW, the failover is done, and you can see the Active and Passive roles have swapped




PERFECT!!! All done: we deployed vCenter HA with the NLB-backed HA PSC configuration, and you can now enjoy the best protection for your vCenter Appliance. All the nodes in the following diagram have been deployed, and they are up, running, and working properly. This is a really great feature of vSphere 6.5, and I hope this blog series helps you deploy your instance easily too.


Awaiting too long for this!!! vCenter HA – Part 3

This is Part 3 of the blog series on configuring the new vCenter HA protection for your vCenter Server 6.5. As in the previous posts, I am performing a deployment at the best availability level, using the supported and recommended topology defined by VMware. We have already deployed PSC01 and PSC02, as shown in the following logical diagram. In this blog we focus on the NLB deployment and configuration, together with setting up HA mode across PSC01 and PSC02. We will then be ready to set up vCenter and deploy vCenter HA in the next blog post.


So in this blog we are going to perform the following steps to enable a highly available PSC backed by a load balancer:

  1. Create a new machine SSL certificate. For more information, see:
    Configuring certificates for Platform Services Controller for High Availability in vSphere 6.5 (2147627)
  2. Configure the load balancer. For more information, see:
    Configuring Netscaler Load Balancer for use with vSphere Platform Services Controller (PSC) 6.5 (2147014)
  3. Verify the machine Certificate:
    vCenter Server Appliance – /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store MACHINE_SSL_CERT --text
  4. Verify the Load Balancer is presenting the same certificate:
    vCenter Server Appliance – openssl s_client -connect SSOLB.vmware.local:443
  5. Run the configuration scripts on the Platform Service Controllers. For more information, see
    Configuring PSC Appliance for High Availability in vSphere 6.5 (2147384)

Create a new machine SSL certificate

So, per the VMware KB, after deploying the two PSC servers we have to configure both nodes with the same SSL certificate. Therefore, we generate a new SSL cert and replace the existing one on both PSC nodes. The following steps come from KB 2147627:

Creating the certificate request

Using a text editor, create the psc_ha_csr_cfg.cfg file with these entries:

[ req ]
distinguished_name = req_distinguished_name
encrypt_key = no
prompt = no
string_mask = nombstr
req_extensions = v3_req
[ v3_req ]
basicConstraints = CA:false
keyUsage = nonRepudiation, digitalSignature, keyEncipherment
subjectAltName = DNS:psc-ha-a1.domain.com, DNS:psc-ha-a2.domain.com, DNS:psc-ha-vip.domain.com
[ req_distinguished_name ]
countryName = Country
stateOrProvinceName = State
localityName = City
0.organizationName = Company
organizationalUnitName = Department
commonName = psc-ha-vip.domain.com

The subjectAltName values should contain all PSC FQDNs that will participate in this HA Site, including the Load Balanced FQDN.
The commonName value should be the Load Balanced FQDN.

In my case, I use psc.vmware.lab as the NLB FQDN, and psc01 and psc02 are the two appliances I deployed in the previous blog post. You don't need to think about where to generate the cert: you can generate it directly on a PSC node, and of course you only need to do this step on one of the PSCs.


Run this command to create a psc-ha-vip.csr and a psc-ha-vip.key file.

openssl req -new -nodes -out /certs/psc-ha-vip.csr -newkey rsa:2048 -keyout /certs/psc-ha-vip.key -config /certs/psc_ha_csr_cfg.cfg

Note: A 2048-bit private key is created with rsa:2048. This value can be increased; 2048 is the minimum supported key length.

Again, I performed this step on the PSC server directly, and again you only need to do this step once, as the same cert files will be used by the other PSC server.


So after that, you have the necessary cert request files for proceeding to the next step: generating the cert itself.
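Before handing the CSR to any CA, it is worth decoding it to confirm the SAN entries actually made it in. Below is a self-contained sketch using a throwaway temp directory and the example FQDNs from the KB config above; substitute your real paths (e.g. /certs) and FQDNs. Note that openssl only accepts a 2-letter countryName, so the placeholder values in the template must be replaced with real ones:

```shell
# Sketch: generate the CSR from a config like the one above and verify its SAN.
# Paths and FQDNs are examples only; substitute your own.
cd "$(mktemp -d)"
cat > psc_ha_csr_cfg.cfg <<'EOF'
[ req ]
distinguished_name = req_distinguished_name
encrypt_key = no
prompt = no
string_mask = nombstr
req_extensions = v3_req
[ v3_req ]
basicConstraints = CA:false
keyUsage = nonRepudiation, digitalSignature, keyEncipherment
subjectAltName = DNS:psc-ha-a1.domain.com, DNS:psc-ha-a2.domain.com, DNS:psc-ha-vip.domain.com
[ req_distinguished_name ]
countryName = US
stateOrProvinceName = State
localityName = City
0.organizationName = Company
organizationalUnitName = Department
commonName = psc-ha-vip.domain.com
EOF
openssl req -new -nodes -out psc-ha-vip.csr -newkey rsa:2048 \
  -keyout psc-ha-vip.key -config psc_ha_csr_cfg.cfg
# Decode the request and confirm all three FQDNs are present in the SAN
openssl req -in psc-ha-vip.csr -noout -text | grep -A1 "Subject Alternative Name"
```

If any FQDN is missing from the SAN output, fix the config and regenerate before signing.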

Generating a certificate from the VMCA

I leverage the VMCA directly instead of an external CA, as I think in most cases the vCenter certificates used in an environment are the ones generated by the VMCA. So, following the steps in the KB:

Run this command to create the certificate from the psc-ha-vip.csr and the psc_ha_csr_cfg.cfg file, outputting a psc-ha-vip.crt file.

openssl x509 -req -days 3650 -in /certs/psc-ha-vip.csr -out /certs/psc-ha-vip.crt -CA /var/lib/vmware/vmca/root.cer -CAkey /var/lib/vmware/vmca/privatekey.pem -extensions v3_req -CAcreateserial -extfile /certs/psc_ha_csr_cfg.cfg

Run this command to copy the current VMCA root certificate and rename it to cachain.crt.

cp /var/lib/vmware/vmca/root.cer /certs/cachain.crt


Run these commands to create the Machine SSL Certificate file, named psc-ha-vip-chain.crt, containing the newly created certificate and the VMCA root certificate.

cat /certs/psc-ha-vip.crt >> /certs/psc-ha-vip-chain.crt
cat /certs/cachain.crt >> /certs/psc-ha-vip-chain.crt
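Before replacing anything, it is worth verifying that the leaf actually chains to the CA you appended. On the PSC the equivalent check is simply `openssl verify -CAfile /certs/cachain.crt /certs/psc-ha-vip.crt`; below is a self-contained sketch that builds a toy CA and leaf under a temp directory and assembles the chain the same way as in the steps above:

```shell
# Sketch: toy CA + leaf, assembled and verified like the real /certs files.
cd "$(mktemp -d)"
# Toy root CA (stands in for /var/lib/vmware/vmca/root.cer + privatekey.pem)
openssl req -x509 -newkey rsa:2048 -nodes -days 3650 \
  -subj "/CN=Toy VMCA" -keyout ca.key -out cachain.crt
# Toy leaf signed by that CA (stands in for psc-ha-vip.crt)
openssl req -new -nodes -newkey rsa:2048 -subj "/CN=psc-ha-vip.domain.com" \
  -keyout psc-ha-vip.key -out psc-ha-vip.csr
openssl x509 -req -days 3650 -in psc-ha-vip.csr -CA cachain.crt -CAkey ca.key \
  -CAcreateserial -out psc-ha-vip.crt
# Assemble the chain exactly as in the steps above: leaf first, then CA
cat psc-ha-vip.crt cachain.crt > psc-ha-vip-chain.crt
# Verify the leaf against the CA; expect "psc-ha-vip.crt: OK"
openssl verify -CAfile cachain.crt psc-ha-vip.crt
```

The leaf-first ordering in the chain file matters; appending the CA before the leaf would produce a chain file the Certificate-Manager cannot use.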


Preparing Certificates
Three files should have been created:

  1. psc-ha-vip-chain.crt
  2. psc-ha-vip.key
  3. cachain.crt

Validate the certificate information
Run this command to open the certificate:

openssl x509 -in /certs/psc-ha-vip-chain.crt -noout -text

Ensure that the Subject CN value is the correct Load Balanced FQDN.
Ensure that the DNS values contain all PSC FQDNs and the Load Balancer FQDN.

So my DNS value shows all three elements: the FQDNs of the NLB, PSC01 and PSC02


And my CN is the NLB FQDN

Replacing the Certificates on the Platform Services Controller
Launch the Certificate-Manager using this command:

/usr/lib/vmware-vmca/bin/certificate-manager

Select Option 1, then Option 2.
Provide the paths to the psc-ha-vip-chain.crt, psc-ha-vip.key and cachain.crt files created in the Preparing Certificates section.

For example:

Please provide valid custom certificate for Machine SSL.
File : /certs/psc-ha-vip-chain.crt
Please provide valid custom key for Machine SSL.
File : /certs/psc-ha-vip.key
Please provide the signing certificate of the Machine SSL certificate
File : /certs/cachain.crt
Important: Replace the Machine SSL Certificate of the additional PSC using the same certificate.


On finishing, I personally copy the certs onto the other node and perform the same cert replacement tasks. If you are trying to leverage scp, do remember you need to enable the bash shell on the other PSC node, e.g.:
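For reference, this is the usual sequence for allowing scp onto the second appliance: switch root's login shell to bash first. A sketch of the console commands (run on the second PSC's appliancesh; psc02 and the /certs path follow my lab naming):

```shell
# On the second PSC's appliancesh console: enable and enter bash
shell.set --enabled true
shell
# scp will not work while root's login shell is the appliancesh, so switch it
chsh -s /bin/bash root
# Then, from the first PSC, copy the three files over, e.g.:
# scp /certs/psc-ha-vip-chain.crt /certs/psc-ha-vip.key /certs/cachain.crt root@psc02:/certs/
```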


Configure the load balancer

So then we can configure the load balancer for the PSC nodes; I use Netscaler in this case. This is not because I have experience with Netscaler or am a fan of Citrix, but because the deployment overhead is lower. We could use NSX or F5, but my lab has to be upgraded first before I get more resources for deploying those.

The load balancer requirement for PSC is actually easy to meet, and thus I just downloaded the free edition of Netscaler and performed a one-arm NLB (the simplest possible configuration). The link is here.


Of course, I downloaded the ESXi format one, which lets me deploy it on the same ESXi hosts I set up for running the PSC and vCenter nodes. There is a nice blog post here which can guide you through the basic setup and deployment of the VPX Express.

So after the NLB is up, we follow VMware KB2147014 to perform the setup. There are overlapping parts in the KB due to a documentation bug, but you can follow the steps below (I added some complementary wording in bold):

To configure the Netscaler Load Balancer to provide vSphere 6.5 Platform Services Controller (PSC) High Availability, you need to carry out 5 steps in your Netscaler:

  1. Add Platform Controller Servers Under the Server Tab
  2. Add Services
  3. Create Virtual Servers
  4. Create a Persistency Group
  5. Verify Servers, Services, Virtual Servers

So log into the Netscaler Web UI.

Adding Platform Controller Servers

  • Navigate to Configuration > Traffic Management > Load Balancing > Servers.
  • Select Add.
  • Enter a Server Name for the First PSC Node.
  • Enter an IP Address for the First PSC Node.
  • Click Create.
  • Repeat these steps for the Additional PSC Node.


Adding Services

  • Navigate to Configuration > Traffic Management > Load Balancing > Services.
  • Select Add.
  • Enter a Service Name.
  • Select Existing Server.
  • Select the First PSC Node from the Server drop down menu.
  • Click Protocol and then select TCP.
  • Click Port and enter 443.
  • Click OK.
  • Repeat for ports 389, 636, 2012, 2014 and 2020 for both PSCs.


Creating the Virtual Servers

  • Navigate to Configuration > Traffic Management > Load Balancing > Virtual Servers.
  • Select Add.
  • Enter a Name.
  • Click Protocol and then select TCP.
  • Click IP Address and enter the Load Balanced IP Address.
  • Click Port and then 443.
  • Click OK.

Note: If you are asked to enable the ‘LB’ Feature, click Yes.

  • Under Services and Service Groups select No Load Balancing Virtual Server Service Binding.
  • Click >.
  • Select the two services for port 443.
  • Click OK. The added Services should appear in the Select Service box.
  • Click Bind.
  • Repeat the preceding steps for ports 389, 636, 2012, 2014 and 2020.

Note: There should be 6 Virtual Servers after this process.

Create a Persistency Group

  • Navigate to Configuration > Traffic Management > Load Balancing > Persistency Groups.
  • Click Add.
  • Click Group Name and provide a name.
  • Click Persistence and then select SOURCEIP.
  • Click Time-out and enter 1440.
  • Click Virtual Server Name and then click the + button.
  • Click the > button to move all six PSC VIPs to the Configured pane.
  • Click Create.
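For completeness, the same objects can also be created through the NetScaler CLI instead of the UI. The sketch below is from memory, uses example names and IPs, and shows only port 443; verify the syntax against your NetScaler version and repeat the service/vserver lines for ports 389, 636, 2012, 2014 and 2020:

```shell
# NetScaler CLI sketch (example names/IPs; one port shown, repeat per port)
add server psc01 192.168.1.11
add server psc02 192.168.1.12
add service psc01-443 psc01 TCP 443
add service psc02-443 psc02 TCP 443
add lb vserver psc-vip-443 TCP 192.168.1.10 443
bind lb vserver psc-vip-443 psc01-443
bind lb vserver psc-vip-443 psc02-443
# Source-IP persistency group spanning the vservers (timeout in minutes)
add lb group psc-ha-group -persistenceType SOURCEIP -timeout 1440
bind lb group psc-ha-group psc-vip-443
```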


Verify Servers, Services, Virtual Servers

  • Navigate to Configuration > Traffic Management > Load Balancing > Servers.
  • Verify that both PSC Servers are online and enabled.
  • Navigate to Configuration > Traffic Management > Load Balancing > Services.
  • Verify that all Services are UP and that there are two Services for each Port
  • Navigate to Configuration > Traffic Management > Load Balancing > Virtual Servers.
  • Verify that all Virtual Servers are UP, that they map to the correct Load Balanced PSC HA IP Address and that there is a Virtual Server for each Port

Verify the machine Certificate

After the NLB is set up, we can verify the Machine Certificate with one single command. In the KB, you can see the “bin” is missing from the path; hopefully it will get fixed soon.

 /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store MACHINE_SSL_CERT --text

Verify the CN; for mine it’s psc.vmware.lab


Verify the DNS again; for mine, it includes the NLB and all node FQDNs


Verify the Load Balancer is presenting the same certificate

Check the Load balancer

vCenter Server Appliance – openssl s_client -connect SSOLB.vmware.local:443

For confirming the CN and DNS again
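One concrete way to confirm both are the very same certificate is to compare fingerprints. A sketch, assuming my NLB FQDN psc.vmware.lab and the standard __MACHINE_CERT alias in the VECS store:

```shell
# Fingerprint of the certificate the load balancer presents
echo | openssl s_client -connect psc.vmware.lab:443 2>/dev/null \
  | openssl x509 -noout -subject -fingerprint -sha256
# Fingerprint of the machine SSL cert stored on the PSC itself
/usr/lib/vmware-vmafd/bin/vecs-cli entry getcert --store MACHINE_SSL_CERT \
  --alias __MACHINE_CERT --output /tmp/machine.crt
openssl x509 -in /tmp/machine.crt -noout -subject -fingerprint -sha256
# The two fingerprints should be identical
```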

Run the configuration scripts on the Platform Service Controllers

Finally, as the last step in setting up your HA PSC with NLB, you need to run the scripts to configure the PSC nodes. We follow VMware KB2147384.

Configuring PSC HA 6.5

 To configure the PSCs for load balancing, run the updateSSOConfig.py and updateLsEndpoint.py scripts:
  • The updateSSOConfig.py script updates information local to each PSC and must be run on all PSCs in the HA instance.
  • The updateLsEndpoint.py script updates the ServiceRegistration Endpoints in VMDir and only needs to be run on one of the PSCs in the HA instance.
Running the updateSSOConfig.py script
  1. Connect to the PSC appliance and log in with root credentials.
  2. Type shell to access the Bash shell.
  3. Navigate to /usr/lib/vmware-sso/bin with this command: cd /usr/lib/vmware-sso/bin
  4. Run this command: python updateSSOConfig.py --lb-fqdn=psc-ha-vip

    For example:

    python updateSSOConfig.py --lb-fqdn=loadbalancer.vmware.com

  5. Repeat these steps on the remaining PSCs.


Running the updateLsEndpoint.py script
  1. Connect to the PSC appliance and log in with root credentials.
  2. Type shell to access the Bash shell.
  3. Navigate to /usr/lib/vmware-sso/bin with this command: cd /usr/lib/vmware-sso/bin
  4. Run this command: python updateLsEndpoint.py --lb-fqdn=psc-ha-vip.domain.com --user=administrative_user --password=password

    For example:

    python updateLsEndpoint.py --lb-fqdn=psc-ha-vip.domain.com --user=administrator@vsphere.local --password=VMware123$

    Note: Perform this step on a single PSC node only.


GREAT! Your PSC servers and NLB are working now! Finally, you can deploy the vCenter and further perform the vCenter HA configuration!!! I will document that in the next blog.


Awaiting too long for this!!! vCenter HA – Part 2

As a recap, in this series of blogs I perform the deployment of vCenter HA with the topology offering the most comprehensive protection. I would deploy the following configuration to have an HA deployment for PSC 6.5 behind a Load Balancer, while vCenter HA will be enabled for the vCenter server.


So, I have gone through about ten steps, and those are NOT made up by me but come from VMware KBs and VMware Guides (so they are quite trustworthy). As version 6.5 is still a bit new, I could see some typos in the KBs, but no worry, I will highlight the steps needing caution. As in the image above, the left-hand side is the high-level architecture; there are 6 components in total, and I realized those as the 6 VMs of the logical setup on the right-hand side. I mainly followed the steps from

  1. VMware KB 2147018: To set up the NLB and PSC nodes, i.e. the PSC01, PSC02 and NLB boxes
  2. VMware vSphere 6.5 Availability Guide: To set up vCenter HA, i.e. the VC01, VC02 and VC Witness boxes

So the 10 steps performed are as follows:

  1. Install the primary external Platform Services Controller node.
  2. Deploy the secondary SSO node as a replication partner to the primary Platform Service Controller node.
  3. Create a new machine SSL certificate. For more information, see:
    Configuring certificates for Platform Services Controller for High Availability in vSphere 6.5 (2147627)
  4. Configure the load balancer. For more information, see:
    Configuring Netscaler Load Balancer for use with vSphere Platform Services Controller (PSC) 6.5 (2147014)
  5. Verify the machine Certificate:
    vCenter Server Appliance – /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store MACHINE_SSL_CERT --text
  6. Verify the Load Balancer is presenting the same certificate:
    vCenter Server Appliance – openssl s_client -connect SSOLB.vmware.local:443
  7. Run the configuration scripts on the Platform Service Controllers. For more information, see
    Configuring PSC Appliance for High Availability in vSphere 6.5 (2147384)
  8. Install the vCenter Server using the Load Balancer virtual IP for the Platform Service Controller when prompted.
  9. Configure vCenter HA With the Basic Option
  10. Verify the vCenter HA function

So let’s get started and deep dive into the steps!!! In this blog post, I will perform step 1 and step 2, while a separate blog post will be written for steps 3-7 and another for steps 8-10.

Install the primary external Platform Services Controller node.

A trivial step, thanks to the improvement of the installer. No matter whether Windows-based or appliance-based, the installation is so simple now. In this deployment, I use the Virtual Appliance for all the PSC nodes. So the following is the installation wizard of the vCSA 6.5; you do NOT need to install any plug-in before running it (still remember the one for 6.0?). Thus, you can run it even on your Mac.


After choosing “Install” on the first page, you will come to the wizard for deploying the appliance. Here, it shows that there are actually two stages of deployment: the first stage is “Deploy Appliance” and the second stage is “Set up appliance”. We are at stage 1; press Next to continue.


We then need to accept the EULA and press next to continue


Next, we have to choose the deployment topology. We are deploying just the PSC, so choose the “External Platform Services Controller” and “Platform Services Controller” options. Press Next to continue.


Provide the information of the ESXi host on which you would like to deploy your PSC appliance. I have three hosts in my environment, and this is actually the minimum, since VC01, VC02 and the witness server have to be put on three different hosts by default.


Accept the SSL cert warning


Provide the VM network and password for the root user of the PSC Appliance


Choose the storage to put it on. Usually I would put the two PSCs onto two different datastores for better resiliency.


Provide the IP information and choose the correct port group


Review the information and press Finish to proceed with the Stage 1 deployment


Wait for the completion of the task and proceed to Stage 2 by hitting Continue


In Stage 2 of the deployment, we will set up the PSC appliance. Press Next to proceed with the setup.


DO use an NTP server for syncing the time of the PSC servers. But as I don’t have an NTP server, I choose to sync with the ESXi host, which is more likely to cause time drift between your machines.


I like to enable SSH access for all the nodes. *DO remember this is mandatory for the vCenter nodes you are going to deploy a bit later in step 8.


As usual, provide the SSO domain information and press Next to proceed.


Configure the CEIP, and press Next to continue


Confirm the settings and hit Finish to start the Stage 2 setup


In Stage 2, we are NOT just deploying an OVA as in Stage 1. A lot more configuration and package work is kicked off in this step, so the wizard will prompt you NOT to interrupt it.


Wait for the completion and click Close


Log into the PSC to confirm the SSO admin page is shown and everything is working


Great! Step 1 is DONE!!! Proceed to Step 2…

Deploy the secondary SSO node as a replication partner to the primary Platform Service Controller node.

We have to run the installer wizard again to deploy another PSC appliance in the environment.



Basically, we do the same procedures for the Stage 1 deployment. But we do have some differences in Stage 2 for joining this newly deployed PSC to the existing PSC we deployed in step 1.


Accept the EULA and press Next to Continue


As said, since we are using an external PSC, choose the “External Platform Services Controller” and “Platform Services Controller” options. Press Next to proceed with the setup.


Although you can use the same host for deploying the second PSC, I choose another host for better resiliency. Remember that you still have to configure a DRS anti-affinity rule to separate the PSC servers after the vCenter is set up later.


Accept the cert warning and proceed


Provide the VM name again and define the Root Password for the PSC appliance


Choose a separate Datastore for 2nd PSC such that 1st and 2nd PSC are running on different storage to provide a better resiliency level.


Provide the FQDN, network information for the 2nd PSC


Confirm the input and click Finish to proceed with the Stage 1 deployment


Wait for the completion of the PSC deployment and click “Continue” to proceed with the Stage 2 setup for the 2nd PSC server


Press Next to kick start the Stage 2 deployment of the 2nd PSC


Configure the time sync setting and enable SSH access again (same as for the 1st PSC)


Choose “Join an existing SSO domain” in this step, input the information of the existing PSC server, and press Next to continue the setup


Choose “Join an existing site”; as the diagram in the wizard shows, this option refers to a High Availability setup. If you were doing a cross-site setup, you would need to choose “Create a new site” instead (we are not doing this).


Choose to join or not to join the CEIP and click Next to proceed with the setup


Confirm the configuration and press Finish to proceed with the setup


Again, the wizard will warn you not to interrupt the setup. Click OK to proceed


Wait for the completion of deployment


DONE!!! You have finished Step 1 and Step 2 of the setup. Please refer to the next blog for the Step 3-7 setup, which focuses on the HA configuration of the deployed PSC servers.

Awaiting too long for this!!! vCenter HA – Part 1

As of the GA of vSphere 6.5 (and of course VSAN 6.5, vROps 6.4, Log Insight 4.0 and vRealize Business for Cloud 7.2 along with it), it has been a bit busy for me redeploying my lab environment. The pain point, and the reason I was not upgrading my existing 6.0 environment, is that no supported version of NSX has been released yet. So if you are currently using NSX, vSphere 6.5 may not be a viable option for you yet. Actually, even for a pure vSphere environment, you still need to ensure the interoperability and compatibility of hardware and software. Many of those are still not yet certified at this stage; for example, if you are upgrading to Site Recovery Manager 6.5, some SRAs do not yet support 6.5.


Therefore, I think at this stage you would likely use vSphere 6.5 for development, test and evaluation purposes. Well, from my perspective, I definitely would like to test every what's-new item as always. You can refer to the VMware official What's New for vSphere 6.5, but in this blog I would like to talk about one most important item: vCenter Server HA.

It has always been a topic how to protect the vCenter Server itself. If you are not too new to VMware, you can recall a tool named vCenter Server Heartbeat, which was a separate product for protecting the vCenter. Honestly, I thought it was a great product; however, it has been discontinued. Well… I know many of you may have suffered from split brain and unexpected downtime using vCenter Server Heartbeat, but most of the cases I came across were because of configuration issues.


Maybe it’s normal that one doesn’t treasure something until it’s lost. Before vCenter Heartbeat faded away, I actually saw few customers looking into HA solutions for the vCenter Server. But after it was gone, more and more customers wanted to find better ways to protect their vCenter. Of course, this could be driven by the fact that vCenter becomes more and more important in customer environments as newer solutions, say VDI or Cloud, are deployed and used with the vCenter. Even from VMware, there are new KBs and white papers illustrating step-by-step configurations on how you should choose and deploy protection for vCenter Servers through:

  1. vSphere HA (High Availability)
  2. vSphere FT (Fault Tolerance)
  3. Native Watchdog
  4. Microsoft Failover Cluster

While vSphere HA deployment is trivial, vSphere FT has quite high network requirements and limited CPU sizing (max 4 vCPUs for SMP-FT in vSphere 6.0). Thus, I had previously tested the Microsoft Failover Cluster option.

In version 6.5, here comes another new vCenter Server protection method, named vCenter HA. And that’s why I have deployed a new lab and performed a vCenter HA setup to test out this cool new feature. In order to test this out as for a true production environment, I have followed these KBs and documents in designing and deploying the setup:

  1. Supported and deprecated topologies for VMware vSphere 6.5 (2147672)
  2. Configuring Platform Service Controller HA in vSphere 6.5 (2147018)
  3. Configuring Windows PSC for High Availability in vSphere 6.5 (2147527)
  4. Configuring Netscaler Load Balancer to provide the PSC 6.5 High Availability (2147014)
  5. vSphere Availability Guide

So the first thing: we need to choose the deployment configuration best for your use case and environment. From KB2147672, there are two recommended configurations, recapped as follows:

Configuration 1)


Does not support Enhanced Linked Mode
Does not support Platform Service Controller replication

Configuration 2)


Given the limitations of configuration 1), I do not think it is very sensible to deploy such a configuration in a large environment, where there are multiple vCenter Servers for performing cross-vCenter vMotions or, in the future, vMotion to the Cloud. And this is why I didn’t even consider testing the first option. However, originally I didn’t want to deploy the second option either… as I would need to set up more servers and a load balancer for it. So I consulted VMware on whether my original design, as follows, is supported.

Configuration 3) A mix between Configurations 1 and 2, putting the PSC external but skipping the dependency on a load balancer


What I got from Support: this configuration is also supported, with no limitation on Enhanced Linked Mode. Thus, if you do not have a load balancer, I would recommend deploying this configuration. But do remember that this deployment method only protects the vCenter Server, not the PSC server; your PSC is the single point of failure in this deployment. And this is why it turned out I actually deployed a Configuration 2 setup in the lab, though it takes some more steps to set up the load-balanced PSC. The detailed steps will be blogged in separate posts.

Cloud Native And Container with VMware

Trust me, I come from an infrastructure background. I worked in the Wintel area for years and have worked in the virtualisation area for some more years. Actually, I didn’t touch Linux until I met ESX 4.0, which is the first ESX generation I worked with. Definitely you would find it’s not an early version at all; some of you may have been playing around with version 3.0 or 3.5, and some perhaps even GSX. So why would I like to blog about Cloud Native Applications and Container stuff? Is it because this is promoted by VMware? Definitely not. I would like to share with you why I think this is important for me and how my mindset changed after I looked into this area.


I think I had actually tried to pick up Container stuff years ago, when Docker came out and became so hot in the market. I started to learn it from a pure infrastructure person’s viewpoint. I was not amused by it, as I thought a Virtual Machine was far better than a Container, which was a Linux-only thing and neither scalable nor secure. And then I dropped it for a while.

Yet, a few months ago, I had another chance to pick up the Container stuff again, as I was assigned to look into the Cloud Native Application and DevOps area. Although it was not my focus at all before, I did try spending my effort in picking it up to support my team. And this time… it was way different, and my brain got totally refreshed. I exaggerate nothing, and I think the difference came as I used a totally different angle to re-engage the four confluences: Containers, DevOps, CNA and Micro-services. It was definitely my fault to judge from technical and technology comparisons, but I started my learning again as I saw how these things complement each other for one single purpose: to “Make Works working”.


After recalibrating my mindset, I see how much sense the four confluences make, and I see how easily the latest VMware Open Source Projects help in enabling these technologies on VMware platforms.

Photon Platform and vSphere Integrated Container

While there are a number of Open Source Projects, the two above are the more important ones for making your VMware environment Container compatible. With these, you don’t need to revamp your environment or set up another Docker-based environment at all; instead you can just deploy them over your ESXi, and your environment gets supercharged instantly. Although both solutions support Containers, they are not identical and are designed to serve two different use cases.

vSphere Integrated Container (VIC)

This is a solution suitable for you if you are just starting to look into Container technology. When you may not yet be confident in managing and operating native Containers in your environment, VIC will be very handy. VIC provides the Docker API for consumption, such that after you deploy a Virtual Container Host through VIC, you can give the Docker API URI to the developers to use. They can pull, run and push docker images on the Virtual Container Host just like on any Docker host. The good thing is, operators and administrators can manage and operate the Docker host and Containers just like VMs, as the Virtual Container Host spins up a new VM for running the Container process whenever a new Docker process is requested.
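To make the developer-side consumption concrete, here is a sketch of what talking to a VCH looks like; the endpoint vch01.lab:2376 and the TLS setup are hypothetical examples, not from my lab:

```shell
# Hypothetical VCH endpoint handed out by the admin (example values)
export DOCKER_HOST=tcp://vch01.lab:2376
# A standard docker client talks to the VCH like a normal Docker host
docker --tls info
docker --tls pull nginx
docker --tls run -d -p 8080:80 nginx   # under VIC this runs as a container VM
```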


Photon Platform

And the Photon Platform is instead for more mature users, for whom you may need to provide a multi-tenant Container host infrastructure. Perhaps you need to let your users deploy Container hosts on demand, or let them have their own Kubernetes, Docker Swarm or Mesos running on top of a common platform. This is the case when you should consider the Photon Platform, as it makes a lot of sense when one has to support Microservices development. When you are developing Microservices, many different services may run different technologies and thus different Container orchestrators. By the way, on the Photon Platform, container processes are no longer spun up as VMs like in VIC; they run directly on the Container hosts, i.e. Photon OS.


Thus, I believe the use cases direct which way you should pick. But anyway, these two tools are already GA and fully tested for production. They have lowered the prerequisites and hurdles for you to pick up Container-related technologies, and they are free. So stop stopping; let’s try them out in your environment too.

VMware Virtual SAN (vSAN) Day 2 Operation – Scaling your vSAN

The number of VSAN customer references has just passed 5500. This is quite a milestone, showing that hyper-converged infrastructure (HCI) and software-defined storage (SDS) are in focus. We definitely have a range of options in the market, and they offer different functionalities and architectures, but they all aim to start small and grow linearly in scale as more and more physical hosts are added into the environment. But sometimes you may have multiple instances of HCI deployed in the environment and would like to consolidate, move or wipe one. In this blog, I am going to walk through the steps and considerations you should take care of when you are trying to do those actions on VMware Virtual SAN (vSAN).


vSAN is radically easy; the functional modules are already inside the ESXi you installed. What you need on top is 1) a network for running VSAN traffic and 2) SSDs as the cache and HDDs as the capacity. VMware offers VSAN Ready Nodes, EVO:RAIL and VxRail to simplify the deployment even more. Trust me, if you want to test it in a lab and you have every necessary prerequisite ready, it would not take you more than 15 minutes to set it up.

As the slogan of many HCI goes, “start small”. Still, when someone asks me about VMware Virtual SAN (vSAN), I would definitely recommend having 4 hosts as the bare minimum. Yes, you would say 2 nodes, ROBO and even 3 nodes are officially supported. But you know, from a BAU viewpoint, a 4-node VSAN is way better, as it would automatically rebuild stuff (I mean replicas) on your remaining nodes when one of the nodes goes down. For the other deployment methods mentioned, it is not painful but requires manual steps for rebuilding.


However, reality is reality; some of my customers would still prefer a 2-node or 3-node VSAN for their trial, pilot or development purposes. And thus, they would either have to expand the VSAN when they are (more than likely) happy with it, OR destroy the VSAN (sadly, but you still need to migrate out your data).

Expanding a VSAN

When we are expanding the VSAN, there are two ways of course. First, scaling up by adding more disks into the existing servers, and second, scaling out by adding more nodes into the existing VSAN cluster. Usually, one likely scales up the VSAN before scaling out. Well… it should be legitimate that disks are cheaper than a server node, right?

Scaling UP

So when you are scaling up your VSAN nodes, you can accomplish this by adding extra:

  1. Magnetic disks into a disk group (which increases capacity but may degrade performance due to the cache:capacity ratio); you can add as many as seven magnetic disks to each disk group
  2. Disk groups into the VSAN cluster (which increases capacity and also the cache tier for maintaining performance); remember that the VSAN datastore assumes all disk groups are homogeneous, and all disk groups actually belong to one single VSAN datastore

Again, as VSAN is designed to be simple and compatible with a wide range of hardware, you won’t actually be warned if you are adding more magnetic disks into the environment while the performance is being degraded, and you also won’t be warned even if you are adding an All-Flash disk group alongside a slower Hybrid-mode disk group, which definitely causes mismatched performance among different data blocks inside a VSAN datastore.

So the steps to upgrade the capacity tier are trivial: you can simply insert more disks into your server and claim the disks in the VSAN interface of the vSphere Web Client. You can see the effective capacity too. It’s not very difficult, and you can refer to the sample steps below.

Expand VSAN by Adding in more Disks

  1. Perform the scaling-up tasks on all VSAN nodes one by one; for each node, perform the following actions to increase the capacity
  2. Put a VSAN node into Maintenance Mode
  3. Do select “Ensure Accessibility” on the prompt
  4. Shut down the host (skip this if your server supports hot plug)
  5. Add a new hard disk to the server
  6. Power on the host (skip this if your server supports hot plug)
  7. Claim the new disk into your existing disk group
  8. Exit Maintenance Mode
  9. Check the VSAN capacity to ensure the disk is managed by VSAN
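As an alternative to the Web Client for the claim and check steps, the same can be sketched from the host's ESXi shell; the naa.* device ID below is a placeholder for your new disk's real ID:

```shell
# Show the current disk groups and their member disks on this host
esxcli vsan storage list
# Claim the newly added magnetic disk into the existing disk group
# (replace naa.xxxxxxxxxxxxxxxx with the new disk's real device ID)
esxcli vsan storage add -d naa.xxxxxxxxxxxxxxxx
```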

Expand VSAN by Adding in more Disk Group

  1. Perform the scaling-up tasks on all VSAN nodes one by one; for each node, perform the following actions to increase the capacity
  2. Put a VSAN node into Maintenance Mode
  3. Do select “Ensure Accessibility” on the prompt
  4. Shut down the host (skip this if your server supports hot plug)
  5. Add new hard disks, including an SSD and HDDs, to the server
  6. Power on the host (skip this if your server supports hot plug)
  7. Create a new disk group and claim the SSD and HDDs added
  8. Exit Maintenance Mode

So what if you have already filled up all the 5 (disk groups) x 7 (capacity-tier disks) = 35 disks in your VSAN nodes? You can still scale it up by replacing the old disks with new, larger disks. Well, this would be a bit more complicated, but it is still doable through the above steps.

Scaling OUT

If performance and data availability are also important when you are trying to upgrade the VSAN capacity, then you should consider the scaling-OUT approach. The beauty of HCI is linear scalability, meaning that adding one more node/block into the environment not only scales the network or storage capacity in your environment but also scales the performance at the same time. Some vendors claim this as a web-scale characteristic, but VMware didn’t define a special term for it. So scaling OUT a VSAN just means adding a new VSAN node into the VSAN cluster. But as I would like to make it a bit more complicated, I am expanding a VSAN cluster from a 2-node deployment to 3 nodes. The steps are as follows:

  1. Install and set up a new ESXi host
  2. Add the ESXi host into the existing VSAN cluster
  3. Configure the VSAN network of the ESXi host
  4. Remove the VSAN Witness Appliance
  5. Remove the VSAN Fault Domains
  6. Resynchronise the VSAN objects at VSAN Health
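Step 3, tagging a VMkernel interface for vSAN traffic on the new node, can be verified from the ESXi shell. A sketch, assuming vmk1 is the vSAN VMkernel port (note that on older 6.0 builds the sub-namespace is `ipv4` rather than `ip`):

```shell
# Tag a VMkernel interface (vmk1 as an example) for vSAN traffic
esxcli vsan network ip add -i vmk1

# Verify the host sees the vSAN cluster and its member nodes
esxcli vsan cluster get
```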

Shrinking a VSAN

Opposite to the sections above, here we are reducing a VSAN deployment. Of course I hope you never have to do this, but there are still legitimate reasons for it, e.g. increasing the VSAN capacity by first removing the old, small disks from the existing VSAN cluster. The following captures the possible actions we are going to perform.

Scaling Down

Shrink VSAN by removing Disks

  1. Perform the scale-down tasks on all VSAN nodes one by one. For each node, perform the following actions to reduce the capacity:
  2. Put the VSAN node into Maintenance Mode
  3. Select “Ensure Accessibility” when prompted
  4. Remove the disk from your existing VSAN Disk Group
  5. Select “Full Data Migration” when prompted
  6. Shut down the host (skip this if your server supports hot plug)
  7. Remove the unclaimed disk from the server
  8. Power on the host (skip this if your server supports hot plug)
  9. Exit Maintenance Mode
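The disk removal with full evacuation (steps 4 and 5 above) maps to a single esxcli call. A sketch with a placeholder device name; check the evacuation-mode values supported by your build:

```shell
# Remove a capacity disk, fully evacuating its data first
# (naa.600xxx is a placeholder for the disk being retired)
esxcli vsan storage remove -d naa.600xxx -m evacuateAllData
```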

Shrink VSAN by removing Disk Groups

  1. Perform the scale-down tasks on all VSAN nodes one by one. For each node, perform the following actions to reduce the capacity:
  2. Put the VSAN node into Maintenance Mode
  3. Select “Ensure Accessibility” when prompted
  4. Remove the whole Disk Group from your existing VSAN cluster
  5. Select “Full Data Migration” when prompted
  6. Shut down the host (skip this if your server supports hot plug)
  7. Remove the disks belonging to that Disk Group from the server
  8. Power on the host (skip this if your server supports hot plug)
  9. Exit Maintenance Mode
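From the ESXi shell, removing the cache-tier SSD tears down the whole disk group in one go. Again a sketch with a placeholder device name:

```shell
# Removing the cache-tier SSD destroys the whole disk group,
# so evacuate all of its data first
esxcli vsan storage remove -s naa.600aaa -m evacuateAllData
```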

You will then see the VSAN capacity shrink accordingly in the Monitoring tab.

Scaling IN

Another way of shrinking a VSAN is to retire and remove a VSAN node from the VSAN cluster. Again, this is not something I would like to see, but do remember that when capacity is reduced, performance is also degraded (more nodes, more performance).

Scaling in from 5 nodes to 4, or from 4 nodes to 3, is trivial: you just put the ESXi host into maintenance mode with Full Data Migration and remove it from the cluster. So I would like to show the steps for downgrading a VSAN from 3 nodes to 2 nodes:

  1. Install and set up a new VSAN Witness Appliance
  2. Join the VSAN Witness Appliance into the existing VSAN cluster
  3. Configure the VSAN network of the VSAN Witness Appliance
  4. Put the existing node (the one you want to remove) into maintenance mode with “Ensure Accessibility”
  5. Move the host out of the VSAN cluster and disable its VSAN network
  6. Reconfigure the VSAN to include the witness server
  7. Build the Fault Domains according to the wizard
  8. Resynchronise the VSAN objects to ensure VSAN health
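On the retired host itself, you can confirm it has cleanly left the cluster from the ESXi shell. A sketch, to be run only after the data evacuation has completed:

```shell
# On the host being retired, after evacuation and removal from the cluster:
esxcli vsan cluster leave

# Confirm the host is no longer a member of any vSAN cluster
esxcli vsan cluster get
```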

Consolidating a VSAN

It also happens that some customers have two or more VSAN clusters deployed in one vSphere environment, and you may need to consolidate the smaller VSAN clusters into a bigger one. No worries, you actually know the steps already. Where? Just above this section, because:

Consolidating VSAN clusters = Expanding one VSAN cluster + Shrinking (and eventually removing) another VSAN cluster

Of course, you will need to Storage vMotion the VMs across the VSAN clusters during the consolidation process.

Removing a VSAN

Likewise, if you would like to remove a VSAN cluster completely, you can keep following the Shrinking a VSAN steps to reduce the size of the cluster. You may ask: why not just vMotion all the VMs out of the VSAN cluster and disable VSAN directly? Technically you can, but you will then need to spend some time removing the VSAN disk partitions from the local disks if you want to reuse them as normal local datastores.
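Those leftover partitions can be wiped from the ESXi shell with partedUtil. A sketch with a placeholder device name; this is destructive, so double-check the device path before running it:

```shell
# Inspect the partition table left behind by vSAN
partedUtil getptbl /vmfs/devices/disks/naa.600xxx

# Write a fresh GPT label to wipe it, so the disk can be reused
# as a regular local datastore (destructive: verify the device!)
partedUtil mklabel /vmfs/devices/disks/naa.600xxx gpt
```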

So I think I have covered quite a lot on scaling your VSAN cluster, and I hope this is helpful for you.