Bits and Bytes of Virtualization

November 30, 2015
by zach

Commitmas is Almost Here

Last year Matt Brender (@mjbrender) started a little movement called Commitmas. As we approach the end of the year, Commitmas is almost here! At the heart, it is all about learning and sharing with the community. In the past, GitHub was an application developer’s playground. As infrastructure is becoming more and more managed by code, revision control is a must. GitHub or some other revision control system should be at the top of all IT Pro’s list of skills to

Commitmas was only twelve days last year, but this year a couple of the vBrownBag crew (Jonathan Frappier & Rob Nelson) have expanded it tremendously. This year, it runs for the entire month of December. Community engagement has grown with the addition of an entire series of vBrownBags (sign up here), a Twitter account (@commitmas), and a new Commitmas repository for 2015.

I didn’t join in on Commitmas last year as I didn’t see it until it was almost over. As I started to learn Python in late 2014 into early 2015, I used GitHub to keep track of my progress as well as learn GitHub at the same time. Unfortunately, I haven’t used GitHub much since then except for sharing a few PowerCLI scripts and vRO workflows. I’m not sure what I will be committing this Commitmas but I plan to make it through the entire 30 days!

Get signed up on GitHub, join the vBrownBag events, and be social while you learn a new skill! I urge you to join the challenge with the community!


August 20, 2015
by zach

Unable to Expire or Power On vRA Managed Machine

Eric and I are deploying a distributed installation of vRealize Automation 6.2.2 with the help of a VMware architect on-site. We have made good progress in less than two weeks, with the exception of some load balancing issues. Today we were deploying a VM into our environment and testing out different functions within the vRA interface. After a bit of testing, we were unable to expire or power on a vRA managed machine. Here’s where we ran into an issue.

A VM had been deployed by vRA and was online. I set the VM to expire. We checked the Requests tab to see if the request had successfully processed. It said it did but the VM never powered down. Also when viewing the VM within the list of Items in vRA, the status still reflected “On”.

Time to troubleshoot! We checked the Log under Infrastructure > Monitoring > Log. The following error was shown:


Workflow ‘FireVirtualMachineEventRequest’ failed with the following exception: The HTTP request is unauthorized with client authentication scheme ‘Anonymous’. The authentication header received from the server was ‘NTLM,Negotiate’. Inner Exception: The remote server returned an error: (401) Unauthorized.

After a bit of digging, the “Negotiate,NTLM” bit in the error was the key. We checked the Web server’s IIS Windows Authentication Providers. Negotiate was listed above NTLM, which was the incorrect order as shown.


After moving NTLM to the top of the provider list, make sure you restart IIS with “iisreset” from the command line. We then tested expiring the VM. It was successful!


Later when I attempted to power on the VM, I received the same error and the VM was never powered on. The status wasn’t expired, it was just powered off.

I then logged into my DEM Orchestrator servers and checked the same setting with the providers in IIS. Sure enough, Negotiate was listed above NTLM. I moved NTLM to the top and restarted the DEM-Orchestrator services.

Success! The VM powered on successfully!

The NTLM provider should have been ahead of Negotiate, as we ran Brian Graf’s vRA 6.2 Pre-requisite Script, but for some reason the providers weren’t configured correctly.
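
The fix on both the Web and DEM Orchestrator servers comes down to one ordering rule: NTLM must be the first Windows Authentication provider. Here is a minimal Python sketch of that rule. It only models the provider list as plain strings and does not read or change any IIS setting; the function names are made up for illustration.

```python
def ntlm_first(providers):
    """Return a copy of the provider list with NTLM moved to the top.

    Other providers keep their relative order. This only models the
    ordering; it does not read or change any IIS setting.
    """
    others = [p for p in providers if p.upper() != "NTLM"]
    ntlm = [p for p in providers if p.upper() == "NTLM"]
    return ntlm + others


def needs_fix(providers):
    """True when NTLM is present but is not the first provider."""
    names = [p.upper() for p in providers]
    return "NTLM" in names and names[0] != "NTLM"


before = ["Negotiate", "NTLM"]  # the order we found on our servers
after = ntlm_first(before)
print(needs_fix(before), after, needs_fix(after))
```

A check like this could be run against whatever inventory you keep of each server's provider order, so a misconfigured node is caught before a request fails.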

March 6, 2015
by eric

Asynchronously remove datastores via vCO! (Updated)

Anyone with more than 3 hosts absolutely dreads removing data volumes from the VMware environment. It is a mind-blowingly tedious and redundant process that VMware has yet to fully address. First you must unmount the volume(s) from all the hosts. This part, thankfully, is easy: select the proper datastore, right click, and select ‘Unmount’. A nice little wizard comes up and runs the appropriate checks to make sure the datastore can indeed be unmounted. Just hit next and select the hosts you wish to unmount from, and VMware kicks off the unmount procedure for that datastore on the selected hosts.

Well if you thought you were done and ready to unpresent that datastore, you are mistaken.  vSphere still sees that LUN and if you simply unpresent it from the hosts, they will really not like you one bit until you reboot them.  You must go to each host’s configuration page for storage adapters, find the correct LUN, right click and detach.  Here is one of VMware’s KB articles for those that need more information on the process.

Imagine the time it takes to go through 10 hosts, or how about 50 hosts without automation?

So…let’s fix that and automate the entire process via vCenter Orchestrator!  Here is a quick run-down of what the workflow does.  First thing you need to do when running the workflow is select the cluster the datastore is presented to.


After selecting the proper cluster and hitting next, you are presented with a dialog to select the datastore or datastores you wish to unmount and detach from the hosts in the selected cluster.


After selecting the datastores, just hit “submit” and away it goes.  So what does it do?  Here is what the schema looks like for the workflow.


The workflow starts off by getting all the hosts of the cluster you select.  It then grabs the needed information from the datastore(s) and stores it in a couple of arrays to be used later.  Take a quick look at the actual scripting behind this.


It grabs the UUIDs needed for the unmount procedure and the canonical NAA name for the detach sequence. Who knows why VMware doesn’t allow these procedures to be done using just one of these values, or at the very least doesn’t fully document the process, but this works…for now.

*Note:  you might need to adjust the SLICE number in your environment to grab the correct UUID.  14 is what works in my environment.
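
To show why the slice number can differ between environments, here is a small Python sketch (the workflow itself is scripted in vCO, not Python). It assumes the value being sliced is a path of the form /vmfs/volumes/<uuid>, whose prefix happens to be exactly 14 characters long; that is my guess at why 14 works, not something confirmed by the workflow. The sample UUID is made up.

```python
VMFS_PREFIX = "/vmfs/volumes/"  # assumed path prefix; 14 characters long


def uuid_by_offset(path, offset=14):
    """Mimic a fixed slice: drop the first `offset` characters."""
    return path[offset:].strip("/")


def uuid_by_prefix(path):
    """Locate the prefix wherever it sits, so a longer lead-in such as
    'ds://' does not shift the result."""
    start = path.find(VMFS_PREFIX)
    if start == -1:
        raise ValueError("unexpected datastore path: " + path)
    return path[start + len(VMFS_PREFIX):].strip("/")


sample = "/vmfs/volumes/01234567-89abcdef-0123-456789abcdef"  # made-up UUID
print(uuid_by_offset(sample))
print(uuid_by_prefix("ds://" + sample + "/"))
```

If the path in your environment carries a different lead-in, a fixed offset cuts in the wrong place, which is when the slice number needs adjusting. Searching for the prefix avoids that.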

So after the workflow has the necessary info, it can proceed to the unmount loop. We use the counter to pick the current host out of the host array, then kick off the unmount procedure, which loops through each datastore you selected and unmounts it on that host. Here is the scripting code for that workflow.


After it has looped through all the hosts, kicked off the unmounts and they finish, the workflow exits the unmount loop, resets the counter, and then drops into the detach loop.  The detach loop has the same setup as the unmount loop, except it launches the detach workflow for each host instead of the unmount workflow.  Take a look at its scripting code.


Once the detach loop is complete and all detach operations have finished, the workflow exits the detach loop, kicks off a rescan for datastores on the hosts in the cluster to clean up the LUN paths, and then exits.
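
The overall ordering described above can be sketched in Python as three phases. This is not the workflow's actual scripting: the unmount, detach, and rescan callables are placeholders for the real vCO workflows, the host and datastore names are made up, and the sketch runs sequentially where the real workflow launches its operations asynchronously.

```python
def remove_datastores(hosts, datastores, unmount, detach, rescan):
    """Unmount everywhere first, then detach everywhere, then rescan.

    `unmount`, `detach`, and `rescan` are caller-supplied callables that
    stand in for the real vCO workflows; they are placeholders.
    """
    # Phase 1: unmount every selected datastore on every host.
    for host in hosts:
        for ds in datastores:
            unmount(host, ds)

    # Phase 2: only after all unmounts, detach the backing LUNs.
    for host in hosts:
        for ds in datastores:
            detach(host, ds)

    # Phase 3: rescan each host to clean up the LUN paths.
    for host in hosts:
        rescan(host)


log = []
remove_datastores(
    hosts=["esx01", "esx02"],
    datastores=["ds-a"],
    unmount=lambda h, d: log.append(("unmount", h, d)),
    detach=lambda h, d: log.append(("detach", h, d)),
    rescan=lambda h: log.append(("rescan", h)),
)
for entry in log:
    print(entry)
```

The point of keeping the phases separate is that no LUN is detached on any host until the datastore has been unmounted on all of them.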

That is pretty much it. All of this is done asynchronously on the hosts to save even more time. Let me know what you think or if you have any questions. Have fun tailoring this workflow for your needs!

You can find this workflow package on either GitHub or Flowgrab.



April 21, 2015 Update:  Updated workflow to 2.1.0 based on Jason’s feedback.  There is now a sleep timer of 15 seconds and an initial counter reset before the unmount.  The updated workflows were pushed to the links above.

March 1, 2015
by eric

Howdy all!

Thanks for the intro Zach.  I am both nervous and excited to start blogging.  I feel that it is time for me to make my appearance on the world-wide web in a more productive manner.  I have quite a few lofty goals this year, both personally and professionally, that could provide good writing opportunities as well as some comedic gold I am sure.  I tend to be light-hearted, but also don’t beat around the bush.  I am not afraid to call people, products, or companies out when they do questionable or flat-out dumb things.  So with that all said, let’s do this.  Head to the about page to read about me professionally and I will soon have a new post up that I hope you like.

February 27, 2015
by zach

.Net 3.5 Feature Install Fails on Windows 2012

Recently, I ran into an issue where the .Net 3.5 Feature install fails on Windows 2012. Many search engine searches, blog posts, and message board posts later, I found a solution.

As you may know Windows Server 2012 comes with the .Net Framework 4.5 feature preinstalled. It does not have the .Net Framework 3.5 feature installed. Normally, it is an easy process to add the feature – Server Manager->Add Feature->Check the .Net Framework 3.5 feature->Install. But what if the server you are attempting to install .Net 3.5 onto is not allowed to connect to the Internet? If it is a VM, quickly attach a Windows Server 2012 .iso and specify an alternate source path and point it to “[DVD Drive Letter]:\sources\sxs” and it installs, right?

Well, not every time. On most of the servers I have come across, usually fresh builds, attaching the ISO and specifying it as the alternate source path does the trick. But I have found a couple of servers that fail, indicating that the correct files are not in the attached ISO, even though they are. The specific error I received was 0x800F081F.
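
For reference, the same offline install can be driven from the command line with DISM instead of Server Manager. The Python sketch below only builds the argument list for that command; the drive letter is a parameter you supply, and the helper name is made up. It does not run anything.

```python
def netfx3_dism_args(drive_letter):
    """Build the DISM argument list for enabling .NET 3.5 from media.

    /LimitAccess keeps DISM from contacting Windows Update, which suits
    a server with no Internet access.
    """
    letter = drive_letter.rstrip(":\\").upper()
    if len(letter) != 1 or not letter.isalpha():
        raise ValueError("expected a single drive letter, got %r" % drive_letter)
    source = letter + ":\\sources\\sxs"
    return [
        "dism", "/Online", "/Enable-Feature", "/FeatureName:NetFx3",
        "/All", "/LimitAccess", "/Source:" + source,
    ]


print(" ".join(netfx3_dism_args("D:")))
```

On the servers that hit 0x800F081F, I would expect this route to fail in the same way, since it points at the same source files.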

I finally found the KB article that described my exact issue and its resolution. I found numerous other reasons why .Net 3.5 wouldn’t install, but this was the cause in my case. I also found that if any language packs were installed prior to trying to install the feature, it would also fail. Any installed language pack needs to be uninstalled first; then the feature can be enabled and the language pack(s) reinstalled.

The KB article points out that if either KB2966827 or KB2966828 is installed on the system, the .Net Framework 3.5 feature installation will fail, regardless of where the source files are located. I downloaded the fix and installed it on the server (no reboot required!), and the feature was enabled without issue.
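
If you want to check a server before attempting the install, the test is just a set intersection against the two updates named above. A minimal Python sketch follows; the inventory list is made up, and how you collect the installed updates is up to you.

```python
BLOCKING_UPDATES = {"KB2966827", "KB2966828"}  # named in the KB article


def blocking_updates(installed):
    """Return the installed updates known to break the .NET 3.5 install."""
    normalized = {kb.strip().upper() for kb in installed}
    return sorted(BLOCKING_UPDATES & normalized)


# Hypothetical inventory; gather the real list however you normally do.
installed = ["KB2919355", "kb2966828 ", "KB3000850"]
print(blocking_updates(installed))
```

An empty result means neither update is present, and the failure likely has one of the other causes, such as an installed language pack.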

February 27, 2015
by zach

vCO 5.5 Appliance Access Permissions

I hadn’t worked too much with the “Copy file from vCO to guest” workflow until the past six months. I quickly ran into issues with the default vCO 5.5 appliance access permissions settings. When I first tried to use it, I created a folder named “vcofiles” in the /opt/ directory on the vCO appliance based on a guide I was following. I had copied the file I wanted to transfer to multiple guests up to the /opt/vcofiles/ directory on the vCO server and given root 777 rights to the vcofiles directory and the individual files. I kicked off the workflow and received the following error:

vCO No Permissions!

So I went back and checked to ensure I gave it full 777 access. I had. I then researched a bit more and found that the js-io-rights.conf file needs to be edited to allow vCO rights to the new directory I created. Nick Colyer had a good post on what needed to be done over here and there was also a VMware KB article about it. If you check out the KB, you will notice that it applies to versions 4.2.x and 5.1.x, but not 5.5.x. Of course, I was using the 5.5 appliance. The meat and potatoes of both articles still hold true. The only difference I have found is the new location of the js-io-rights.conf file in the 5.5.x appliance.

5.1.x and older location: /opt/vmo/app-server/server/conf/
5.5.x+ location: /etc/vco/app-server/

I added read, write, and execute permissions (+rwx) to my new directory. After I finished, here’s what my settings looked like:

vCO 5.5 Appliance Access Permissions

As you can guess, this is done to ensure that the application that the users are accessing from the vCO client can only access directories that are specifically defined by the vCO admin.
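
As a rough illustration of the edit, here is a Python sketch that adds a rule for the new directory if it is not already there. It assumes the file holds one rule per line in the form "+rwx <path>"; the starting contents below are made up, and the helper name is hypothetical. It works on text only and does not touch the appliance.

```python
def ensure_rule(conf_text, path, rights="+rwx"):
    """Return conf_text with a '<rights> <path>' rule present exactly once.

    Assumes one rule per line, e.g. '+rwx /opt/vcofiles/'. Existing lines
    are left untouched; the rule is appended only if it is missing.
    """
    rule = rights + " " + path
    lines = conf_text.splitlines()
    if any(line.strip() == rule for line in lines):
        return conf_text
    lines.append(rule)
    return "\n".join(lines) + "\n"


# Made-up starting contents, for illustration only.
original = "-rwx /\n+rx /etc/vco/\n"
updated = ensure_rule(original, "/opt/vcofiles/")
print(updated, end="")
```

Running it a second time on the updated text changes nothing, so it is safe to repeat.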

February 24, 2015
by zach

Welcome Eric!

A past co-worker of mine, Eric TeKrony, wanted to not only jump into vCO more after I left but he also wanted to contribute back to the community. So far we have combined forces on the Get-VM GitHub organization and have uploaded a few of our vCO workflows and actions. Along with uploading resources to GitHub, he may be on here from time to time releasing resources or just documenting an issue he found and resolved.

Along with vCO, he has extensive knowledge in many other realms of IT. I’m excited to see what he can bring to the community through this blog and other avenues.

So welcome Eric!

January 17, 2015
by zach

vCO Workflow – Update PernixData Host Extensions

Before I get into this workflow: if you have not tried PernixData’s FVP in your environment, it is a must. All you need is a couple of SSDs and a free trial download from their site, and you can begin seeing the advantages quickly. It not only speeds up your VMs but gives your array a break! Now to the goods.

PernixData is installed inside the vSphere host as a host extension. Unfortunately, the upgrade process is not as streamlined as the rest of their product’s experience. It requires us to upload the upgrade zip file to each host, put each host into maintenance mode, and run a few commands through the shell of each host. Definitely a repetitive and lengthy process if you have numerous hosts.

I took a look at the official documentation, available within the support portal on PernixData’s website, and determined I could quickly put together a vCenter Orchestrator (soon to be vRealize Orchestrator) workflow. Not only could I automate the upgrade process on a single host, but I could do it at a cluster level. I figured this was appropriate, as PernixData should be deployed at a cluster level to take full advantage of the technology without limiting the agility of the VMs within the host. Below I will walk you through the process.


Make sure you have read the prerequisites before kicking off this upgrade process. No VM can be accelerated by FVP during the upgrade process so they need to be put into Write-Through mode. On the opposite end of this process, don’t put the VMs back into Write-Back mode until all hosts are upgraded and confirmed in working order.

Once you are ready to commit to the upgrade, my workflow requires you to upload the upgrade zip file to the “/opt/vcofiles/” directory on the vCO appliance. If this directory does not exist, please create it or modify the workflow to look elsewhere. If you are not using the vCO appliance, I recommend it over the Windows server install, especially if you are using the vCO service that is installed with the Windows installed version of vCenter. You could modify the script to look at a different location, like a Windows directory, if you choose to make it work that way. The workflow will pull the zip file from the specified directory and scp it to each host as it upgrades each host in serial. Now you are ready to run the workflow.

You will be prompted to select your vCO appliance and a cluster of hosts to upgrade.

Select Environment Variables

Then you have the option to upgrade all hosts in the cluster or a selection of hosts. You may want to select a single host first to test the process. If a host has an issue with an upgrade, you can select the remaining hosts in a later pass.

Select Hosts

Next you will enter the filename of the upgrade zip. Be sure to include the .zip file extension.

PernixData Upgrade File

On the next screen, enter the host credentials you would use if you were upgrading FVP manually.

Host Credentials

On the last page, enter the credentials of the vCO appliance. Then kick off the workflow.

vCO Credentials

Heavy Lifting

The workflow will gather all of the hosts you have approved for the upgrade and put them in an array. It will then select the first host, put it into maintenance mode, turn on SSH, upload the zip file to a temporary directory on the host, then send the following PernixData supplied command to uninstall the current host extension:

cp /opt/pernixdata/bin/ /tmp/ && /tmp/

Once complete, it will then run the following install command:

esxcli software vib install -d /tmp/<upgrade filename>.zip

The workflow will clean up after itself and remove the upgrade zip file and file from the /tmp/ directory. The host will then be taken out of maintenance mode and SSH turned back off.
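
To make the order of operations easier to follow, here is a Python sketch that lists the steps for each host in sequence. It is only a plan builder, not the workflow itself: host and file names are placeholders, the vendor uninstall command is left as a description, and nothing is executed against a host.

```python
def install_command(zip_name):
    """Build the esxcli install command for an uploaded bundle in /tmp/."""
    if not zip_name.endswith(".zip"):
        raise ValueError("include the .zip extension: " + zip_name)
    return "esxcli software vib install -d /tmp/" + zip_name


def upgrade_plan(hosts, zip_name):
    """List the per-host steps in order, one host at a time (serial)."""
    plan = []
    for host in hosts:
        plan += [
            (host, "enter maintenance mode"),
            (host, "enable SSH"),
            (host, "upload " + zip_name + " to /tmp/"),
            (host, "run the vendor uninstall command"),
            (host, install_command(zip_name)),
            (host, "remove uploaded files from /tmp/"),
            (host, "exit maintenance mode"),
            (host, "disable SSH"),
        ]
    return plan


# Host and file names below are placeholders.
for host, step in upgrade_plan(["esx01"], "fvp-upgrade.zip"):
    print(host + ": " + step)
```

Because the plan is built host by host, one host finishes all of its steps before the next one enters maintenance mode, matching the serial upgrade described earlier.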

Full Schema

Below, I have included a picture of the full schemas from vCO. This shows the schema for the cycle of host upgrades.

Cluster Array

The following schema shows where the real work goes on.

Work Schema

As you can see there is some error handling. I discovered a couple returns that vCO believed to be “erroneous” but after I checked and confirmed with PernixData support, they were false positives.

Even though this workflow has worked in my environment, it does NOT mean it will work in yours. Make sure you read PernixData’s official documentation and know the process as well as comb over the workflow itself to ensure it won’t cause issues within your environment. Use at your own risk and remember, I am not responsible if this workflow causes issues within your environment.

I have uploaded the orchestrator package to my GitHub page. If I make any changes to the workflow, both pages will be updated with the latest version. Automate all the things!!

September 8, 2014
by zach

HA Agent Alerts and Issues

This past week, I ran into two different HA Agent issues that threw alerts or caused me some administrative headaches. As a reference point, we are running vSphere/vCenter 5.1, but I feel these issues affect a broader range of products based on the KB articles I have come across.

Issue 1: Within one of our clusters, a VM was rebooted by HA because of a backup issue. I’m thankful that HA saw the issue and rebooted the VM so quickly that our monitoring solution didn’t even notice downtime. That’s great! The alert was thrown at the cluster level for obvious reasons and put a yellow alert banner on the Summary tab of the cluster, not in the Alarms tab. The yellow banner indicated that “HA initiated a failover in <cluster> on <datacenter>”. I don’t see an alarm in the Definitions that is specific to this alert, as it was for a single VM. I guess that is why it wasn’t displayed in the Alarms tab. Now how do I acknowledge and clear the alert? I discovered a KB article (2004802) that describes my issue exactly. The cause is written as:

This issue occurs when a HA failover event occurs in the cluster, which triggers the warning message. This locks the warning message and prevents it from being removed.

I don’t like the last sentence of that cause. Why lock it? Let me acknowledge and clear the warning. As described in the resolution, I disabled HA on the cluster and re-enabled it. The alert was gone as expected.

It looks like this affects vCenter 4.0-5.5. I assume this is not seen often as clearing an alert in this manner is downright inefficient.

Issue 2: During a troubleshooting session with VMware support, I was asked to reboot a host. No big deal. After our troubleshooting completed, I noticed that DRS was not migrating the VMs back to the rebooted host. I attempted to manually vMotion a VM to the host in question but the wizard indicated that the HA Agent on the host was “Unreachable.” I did a quick search and found the following KB article (2011192). The symptom description was word for word what I was seeing from the host. Some relevant notes:

1. The host was accessible by vCenter
2. This host was the only host showing these symptoms.
3. All hosts and vCenter reside in the same VLAN.

I attempted the following to resolve the issue with no luck:

1. “Reconfigure for vSphere HA” on the host.
2. Restarted management agents on the host.
3. Rebooted the host again.

The KB article mentions restarting the vCenter service. I felt this was overkill, as the issue was isolated to a single host, so I did not perform that troubleshooting step. Many of the resolution steps in the KB article talk about the host as Not Responding, but this was not the case.

In the end, I disabled HA at the cluster level and then re-enabled it. After that, all of the HA Agents on each of the hosts in that cluster reported back correctly.

**When in doubt, just disable HA and re-enable it across the cluster. In the vCenter HA world, it is the equivalent to rebooting a computer to clear any weird issues.**

September 8, 2014
by zach

EVO:RAIL – My thoughts



At VMworld last month, VMware revealed Project MARVIN as EVO:RAIL. This is VMware’s entry into the hyper-converged space. Companies like Nutanix and Simplivity have made waves with their product offerings making it easier for companies, small and large, to deploy a virtual infrastructure. Whether or not companies have bought into this way of deploying infrastructure, most have looked into it.

EVO:RAIL is a new way of deploying hyper-convergence that is not directly sold by VMware but rather the partnered vendors that have manufactured the physical appliance. “One throat to choke” is the name of the game here. Every bit of this appliance will be supported by calling a single number.

ROBO – With a few configuration parameters entered by an admin, the appliance sets itself up quickly and provides an easy-to-use interface for even a novice admin. I believe this is a perfect product for ROBO (Remote Office/Branch Office). In my experience with determining specs, deploying, and training on-site ROBO staff, the RAIL would have been a great product to implement. Many of the smaller ROBO staff that I have worked with were just learning about virtualization. Changing their mindset of what is possible and then teaching them how to use the new technology in a short amount of time on-site can be challenging. Based on the videos I have seen showing the implementation of single and multiple EVO:RAIL appliances, going on-site to train the staff could be optional.

Enterprise? – Obviously I feel good about the EVO:RAIL being a ROBO solution but I am definitely not sold on it being deployed in an enterprise datacenter. One of the big reasons I feel this way is the integration with current deployments. I’ve seen some discussion about the possibility of integrating it into an already deployed VSAN environment. I saw that it is “technically possible” but I gathered from the hesitant responses that it should not be done. Therefore, an enterprise could use the EVO:RAIL for a specific use case like VDI or even an easy way to segregate a workload for a division/group within the organization. There are limitations on how it can be deployed but remember, this is a hyper-converged appliance and is not meant to be integrated in with our traditional infrastructure.


EVO:RAIL’s codename MARVIN logo

UI – RAIL has its own UI to administer the environment instead of using the normal vSphere/vCenter clients, and by doing so VMware has reduced the complexity of the environment dramatically. The UI runs purely on HTML5, which is a big improvement over the vSphere Web Client that we all love to hate. I assume the vCenter 6.1/6.5? version of the web client that will be forced down our throats will run on HTML5. Maybe we won’t mind that web client! VMware should definitely take the UI team from EVO:RAIL and reassign them to the vCenter Web Client to perform a 100% rewrite.

I’d love to get a shot at playing around with one of these appliances and working with others to deploy it for a specific use case. In the long-term, it will be interesting to see how not only VMware (and the EVO partners) but Nutanix and Simplivity will address upgrading to newer appliances as the hardware bought today will be aged in a few years.