Get-VM

Bits and Bytes of Virtualization

January 17, 2015
by zach
3 Comments

vCO Workflow – Update PernixData Host Extensions

Before I get into this workflow: if you have not tried PernixData’s FVP in your environment, it is a must. All you need is a couple of SSDs and a free trial download from their site, and you can begin seeing the advantages quickly. It not only speeds up your VMs but also gives your array a break! Now to the goods.

PernixData is installed inside the vSphere host as a host extension. Unfortunately, the upgrade process is not as streamlined as the rest of their product’s experience. It requires us to upload the upgrade zip file to each host, put each host into maintenance mode, and run a few commands through the shell of each host. Definitely a repetitive and lengthy process if you have numerous hosts.

I took a look at the official documentation, available within the support portal on PernixData’s website, and determined I could quickly put together a vCenter Orchestrator (soon to be vRealize Orchestrator) workflow. Not only could I automate the upgrade process on a single host, but I could do it at a cluster level. I figured this was appropriate, as PernixData should be deployed at a cluster level to take full advantage of the technology without limiting the agility of the VMs within the host. Below I will walk you through the process.

RTFM!

Make sure you have read the prerequisites before kicking off this upgrade process. No VM can be accelerated by FVP during the upgrade process so they need to be put into Write-Through mode. On the opposite end of this process, don’t put the VMs back into Write-Back mode until all hosts are upgraded and confirmed in working order.

Once you are ready to commit to the upgrade, my workflow requires you to upload the upgrade zip file to the “/opt/vcofiles/” directory on the vCO appliance. If this directory does not exist, please create it or modify the workflow to look elsewhere. If you are not using the vCO appliance, I recommend it over the Windows server install, especially if you are using the vCO service that is installed with the Windows installed version of vCenter. You could modify the script to look at a different location, like a Windows directory, if you choose to make it work that way. The workflow will pull the zip file from the specified directory and scp it to each host as it upgrades each host in serial. Now you are ready to run the workflow.

You will be prompted to select your vCO appliance and a cluster of hosts to upgrade.

Select Environment Variables

Then you have the option to upgrade all hosts in the cluster or a selection of hosts. You may want to select a single host first to test out the process. And if a host has an issue with an upgrade, you can select the remaining hosts in a future pass.
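To make the "all hosts or a selection" choice concrete, here is a minimal Python sketch of the idea. It is not the vCO workflow itself. The function names and host names are hypothetical, and stopping at the first failed host is my assumption for illustration, not something the workflow is documented to do.

```python
def hosts_to_upgrade(cluster_hosts, selected=None):
    """Return the hosts to upgrade, in cluster order.

    With no selection, every host in the cluster is returned.
    """
    if selected is None:
        return list(cluster_hosts)
    selected_set = set(selected)
    unknown = selected_set - set(cluster_hosts)
    if unknown:
        raise ValueError("not in cluster: " + ", ".join(sorted(unknown)))
    return [host for host in cluster_hosts if host in selected_set]


def upgrade_serially(hosts, upgrade_one):
    """Upgrade hosts one at a time (assumption: stop at the first failure).

    Returns (upgraded, remaining). The failed host is kept in `remaining`
    so it can be retried in a later pass.
    """
    upgraded = []
    for index, host in enumerate(hosts):
        try:
            upgrade_one(host)
        except Exception:
            return upgraded, list(hosts[index:])
        upgraded.append(host)
    return upgraded, []
```

Passing a single host as the selection gives you the "test on one host first" pass. The `remaining` list from a failed run is what you would feed into the next pass.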

Select Hosts

Next you will enter the filename of the upgrade zip. Be sure to include the .zip file extension.

PernixData Upgrade File

On the next screen, enter the host credentials you would use if you were upgrading FVP manually.

Host Credentials

On the last page, enter the credentials of the vCO appliance. Then kick off the workflow.

vCO Credentials

Heavy Lifting

The workflow will gather all of the hosts you have approved for the upgrade and put them in an array. It will then select the first host, put it into maintenance mode, turn on SSH, upload the zip file to a temporary directory on the host, then send the following PernixData supplied command to uninstall the current host extension:

cp /opt/pernixdata/bin/prnxuninstall.sh /tmp/ && /tmp/prnxuninstall.sh

Once complete, it will then run the following install command:

esxcli software vib install -d /tmp/<upgrade filename>.zip

The workflow will clean up after itself and remove the upgrade zip file and prnxuninstall.sh file from the /tmp/ directory. The host will then be taken out of maintenance mode and SSH turned back off.
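The per-host shell steps above can be sketched as an ordered command list. This is a rough Python illustration, not code taken from the workflow. The uninstall and install commands are the ones quoted above, while the exact `rm` cleanup command, the function name, and the example filename `fvp-upgrade.zip` are my assumptions.

```python
import shlex

# PernixData-supplied uninstall command, as quoted in this post.
UNINSTALL_CMD = (
    "cp /opt/pernixdata/bin/prnxuninstall.sh /tmp/ && /tmp/prnxuninstall.sh"
)


def host_upgrade_commands(zip_name):
    """Return the shell commands to run on one host, in order.

    The host is assumed to already be in maintenance mode with SSH on
    and the zip already copied to /tmp/.
    """
    if not zip_name.endswith(".zip"):
        raise ValueError("include the .zip extension in the filename")
    if "/" in zip_name:
        raise ValueError("expected a bare filename, not a path")
    remote_zip = shlex.quote("/tmp/" + zip_name)
    return [
        UNINSTALL_CMD,
        "esxcli software vib install -d " + remote_zip,
        # Assumed cleanup command; the post only says both files are removed.
        "rm -f " + remote_zip + " /tmp/prnxuninstall.sh",
    ]
```

For example, `host_upgrade_commands("fvp-upgrade.zip")` returns the uninstall, install, and cleanup commands in that order. Building the list up front makes it easy to review exactly what would be sent to each host before anything runs.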

Full Schema

Below, I have included a picture of the full schemas from vCO. This shows the schema for the cycle of host upgrades.

Cluster Array

The following schema shows where the real work goes on.

Work Schema

As you can see, there is some error handling. I discovered a couple of return values that vCO believed to be “erroneous,” but after I checked and confirmed with PernixData support, they turned out to be false positives.

Even though this workflow has worked in my environment, it does NOT mean it will work in yours. Make sure you read PernixData’s official documentation and know the process as well as comb over the workflow itself to ensure it won’t cause issues within your environment. Use at your own risk and remember, I am not responsible if this workflow causes issues within your environment.

I have uploaded the Orchestrator package to my GitHub page. If I make any changes to the workflow, both pages will be updated with the latest version. Automate all the things!!

September 8, 2014
by zach
0 comments

HA Agent Alerts and Issues

This past week, I ran into two different HA Agent issues that either threw alerts or caused me some administrative headaches. As a reference point, we are running vSphere/vCenter 5.1, but based on the KB articles I have come across, I feel these issues affect a broader range of versions.

Issue 1: Within one of our clusters, a VM was rebooted by HA because of a backup issue. I’m thankful that HA saw the issue and rebooted the VM so quickly that our monitoring solution didn’t even notice downtime. That’s great! The alert was thrown at the cluster level for obvious reasons and put a yellow alert banner on the Summary tab of the cluster, not in the Alarms tab. The yellow banner indicated that “HA initiated a failover in <cluster> on <datacenter>”. I don’t see an alarm in the Definitions that is specific to this alert, as it was for a single VM; I guess that is why it wasn’t displayed in the Alarms tab. Now how do I acknowledge and clear the alert? I discovered a KB article (2004802) that describes my issue exactly. The cause is written as:

This issue occurs when a HA failover event occurs in the cluster, which triggers the warning message. This locks the warning message and prevents it from being removed.

I don’t like the last sentence of that cause. Why lock it? Let me acknowledge and clear the warning. As described in the resolution, I disabled HA on the cluster and re-enabled it. The alert was gone as expected.

It looks like this affects vCenter 4.0-5.5. I assume this is not seen often as clearing an alert in this manner is downright inefficient.

Issue 2: During a troubleshooting session with VMware support, I was asked to reboot a host. No big deal. After our troubleshooting completed, I noticed that DRS was not migrating the VMs back to the rebooted host. I attempted to manually vMotion a VM to the host in question, but the wizard indicated that the HA Agent on the host was “Unreachable.” I did a quick search and found KB article 2011192. The symptom description was word for word what I was seeing from the host. Some relevant notes:

1. The host was accessible by vCenter
2. This host was the only host showing these symptoms.
3. All hosts and vCenter reside in the same VLAN.

I attempted the following to resolve the issue with no luck:

1. “Reconfigure for vSphere HA” on the host.
2. Restarted management agents on the host.
3. Rebooted the host again.

The KB article mentions restarting the vCenter service. I felt this was overkill, as the issue was isolated to a single host, so I did not perform that troubleshooting step. Many of the resolution steps in the KB article talk about the host as Not Responding, but this was not the case.

In the end, I disabled HA at the cluster level and then re-enabled it. After that, all of the HA Agents on each of the hosts in that cluster reported back correctly.

**When in doubt, just disable HA and re-enable it across the cluster. In the vCenter HA world, it is the equivalent of rebooting a computer to clear any weird issues.**

September 8, 2014
by zach
0 comments

EVO:RAIL – My thoughts

EVO:RAIL

At VMworld last month, VMware revealed Project MARVIN as EVO:RAIL. This is VMware’s entry into the hyper-converged space. Companies like Nutanix and Simplivity have made waves with their product offerings making it easier for companies, small and large, to deploy a virtual infrastructure. Whether or not companies have bought into this way of deploying infrastructure, most have looked into it.

EVO:RAIL is a new way of deploying hyper-convergence that is not directly sold by VMware but rather the partnered vendors that have manufactured the physical appliance. “One throat to choke” is the name of the game here. Every bit of this appliance will be supported by calling a single number.

ROBO – With a few configuration parameters entered by an admin, the appliance sets itself up quickly and provides an easy-to-use interface for even a novice admin. I believe this is a perfect product for ROBO (Remote Office/Branch Office). Based on my experience determining specs, deploying, and training on-site ROBO staff, the RAIL would have been a great product to implement. Many of the staff at these smaller ROBOs that I have worked with were just learning about virtualization. Changing their mindset of what is possible and then teaching them how to use the new technology in a short amount of time on-site can be challenging. Based on the videos I have seen (link) showing the implementation of single and multiple EVO:RAIL appliances, going on-site to train the staff could be optional.

Enterprise? – Obviously I feel good about the EVO:RAIL being a ROBO solution but I am definitely not sold on it being deployed in an enterprise datacenter. One of the big reasons I feel this way is the integration with current deployments. I’ve seen some discussion about the possibility of integrating it into an already deployed VSAN environment. I saw that it is “technically possible” but I gathered from the hesitant responses that it should not be done. Therefore, an enterprise could use the EVO:RAIL for a specific use case like VDI or even an easy way to segregate a workload for a division/group within the organization. There are limitations on how it can be deployed but remember, this is a hyper-converged appliance and is not meant to be integrated in with our traditional infrastructure.

MARVIN

EVO:RAIL’s codename MARVIN logo

UI – RAIL has its own UI to administer the environment instead of using the normal vSphere/vCenter clients, and by doing so VMware has reduced the complexity of the environment dramatically. The UI runs purely on HTML5, which is a big improvement over the vSphere Web Client that we all love to hate. I assume the vCenter 6.1/6.5? version of the web client that will be forced down our throats will run on HTML5. Maybe we won’t mind that web client! VMware should definitely take the UI team from EVO:RAIL and reassign them to the vCenter Web Client to perform a 100% rewrite.

I’d love to get a shot at playing around with one of these appliances and working with others to deploy it for a specific use case. In the long term, it will be interesting to see how not only VMware (and the EVO partners) but also Nutanix and Simplivity address upgrading to newer appliances, as the hardware bought today will be aged in a few years.

May 25, 2014
by zach
0 comments

Long time, no blog post….

It has been over a year since I last posted on here. Quite the break. Not really a break, I guess. More like being crazy busy and lazy at the same time. Is that a thing? If so, that’s me. 2013 was the craziest year of my life. It began with a nice promotion and a new title, Senior Systems Engineer. Nothing really changed except for that and some additional pay. That was April.

May rolled around with plenty of prep for the wedding. Oh yeah, did I mention I got hitched to the love of my life/best friend? We got married in June, in Kauai. Best two-week vacation, ever. July rolled around with the reception back home, which was a ton of fun. If you’re looking to get married in the future, I highly recommend a destination wedding. Totally worth it.

Next up, August. My house went up for sale. Sold in 46 hours. Then we found a house we had been searching for in an awesome part of town. LOCATION! LOCATION! LOCATION! Put a bid in on that house and then quickly put up my wife’s house for sale. Sold her house in 10 days flat. After three weeks of battling for the house we really wanted, we locked it in and never looked back. Moved in during October. August-November included too much real estate talk with a sprinkle of VMworld 2013.

Let’s jump to 2014. I became more open to other companies that I could potentially leave IPG for. A few opportunities came along, but I didn’t feel energized about them. Jelecos, on the other hand, had me sold from the first interview. So I went for it. I’m about 3 weeks in. I like the direction of the company and where they want to be in the coming years. I feel I can make an impact here, which is a great feeling. Jelecos also has a great bunch of people, so that makes it even better.

I was also awarded vExpert for a second year in a row. Thanks to Corey from VMware for heading up the program this year. Sad to see Troyer leave but we all leave at some point. The program is definitely in good hands with Corey.

As I will be working more with vCO, vCOPS, vCAC, etc., I plan to post more on here about my experiences and even throw up some scripts or workflows for others to check out and use in their environment. Hopefully my next post isn’t a year from now. I’ll make sure it isn’t.

May 2, 2013
by zach
0 comments

Too Many Groups (Another Tale of Being Half-Baked)

After months of waiting for VMware to make Update 1 available for vSphere/vCenter 5.1, it finally arrived. We had hoped that it would provide fixes to some half-baked items that we had noticed after deploying vCenter 5.1. As of right now, I personally can’t say if those issues or annoyances we found have been fixed or not.

Unfortunately, I can’t log in to the “preferred” web client that VMware wants us to adopt so badly.

Why?

According to KB article 2050941, the admin account I use to log in to vCenter belongs to too many groups in Active Directory. Are you kidding me? It says that there is no definitive number of groups that serves as the threshold, but it is normally around 19. I belong to 24, while my co-worker who can log in belongs to 20. Clearly our threshold is somewhere in there. My question is: how long has VMware been running 5.1 U1 in their labs and somehow never noticed this issue?

There are three workarounds for this issue.

  • Log in to vCenter Server via the vSphere Client using the Use Windows session credentials option. – So now I need to use a client that doesn’t include the new 5.1 features?
  • Work with your Active Directory administrator to modify the group membership of the vCenter Server login account to a minimum. – hahaha! There’s a reason why I belong to so many groups. My day-to-day activities depend on those memberships.
  • Limit the number of domain based identity sources to no more than one. – We have users from around the world logging in that need those identity sources available. Odds are most of them can’t log in either though.

Yet again, VMware has released more software/updates that seem to be half-baked and not fully tested for even the largest of their customers. This just adds more fuel to the fire that is pushing us to really consider Microsoft’s latest Hyper-V release. Twelve hosts yet to be ordered this year for a refresh of old vSphere hosts in our environment. Maybe they will be Hyper-V hosts instead.

March 24, 2013
by zach
0 comments

vExpert 2013 – Now Accepting Applications

I just found out that John Troyer has opened up the application process for vExpert 2013. This is the first year that I will be applying for it. I didn’t feel like I had deserved that recognition in previous years. Hopefully John and company feel I meet their criteria. For those of you that are unaware of the vExpert program, see below.

Description:

These are the bloggers, book authors, VMUG leaders, speakers, tool builders, community leaders and general enthusiasts. They work as IT admins and architects for VMware customers, they act as trusted advisors and implementors for VMware partners or as independent consultants, and some work for VMware itself. All of them have the passion and enthusiasm for technology and applying technology to solve problems. They have contributed to the success of us all by sharing their knowledge and expertise over their days, nights, and weekends. They are, quite frankly, the most interesting and talented group of people I’ve ever been in a room with.

There are three paths that can be taken by a vExpert:

Evangelist Path
The Evangelist Path includes book authors, bloggers, tool builders, public speakers, VMTN contributors, and other IT professionals who share their knowledge and passion with others with the leverage of a personal public platform to reach many people. Employees of VMware can also apply via the Evangelist path. A VMware employee reference is recommended if your activities weren’t all in public or were in a language other than English.

Customer Path
The Customer Path is for leaders from VMware customer organizations. They have been internal champions in their organizations, or worked with VMware to build success stories, act as customer references, given public interviews, spoken at conferences, or were VMUG leaders. A VMware employee reference is recommended if your activities weren’t all in public.

VPN (VMware Partner Network) Path
The VPN Path is for employees of our partner companies who lead with passion and by example, who are committed to continuous learning through accreditations and certifications and to making their technical knowledge and expertise available to many. This can take shape of event participation, video, IP generation, as well as public speaking engagements. A VMware employee reference is required for VPN Path candidates.

If you feel you are a vExpert, send in your application before April 15th! – http://bit.ly/vExpert2013recommend
Source: vExpert 2013 applications are now open

 

March 1, 2013
by zach
0 comments

Omaha VMUG Q1 Meeting Scheduled

After some issues with booking a new venue, the Omaha VMUG team has locked in a date and vendor to present for Q1. It will be held on April 16 (yes, that is technically Q2) at The Old Mattress Factory in their party room. We have locked in Xangati as the primary vendor to speak about their product. VMware will naturally be there with their update and will also provide some vCOPs info. Drew, David and I will also be presenting in the middle with some free management and monitoring tools that could be of use to any VMware administrator.

The venue change has also brought a change in time. We will be holding the meeting from 2-5pm and ending with a happy hour session with drinks!

If you would like more information about the meeting please check out the official page here.

January 18, 2013
by zach
0 comments

PowerCLI Scripts

I have added a new page labeled “Scripts” as you can see from the menu above. It includes a few of the PowerCLI scripts that our team uses in our production environment. Some are very simple while others are more than “one-liners.” Either way, they have been useful for us and may be useful to you. As more are written, I will add them to the list. Enjoy!

December 20, 2012
by zach
0 comments

2012 – A Busy Year

2012 was definitely a busy year, not only for me but for the whole team I belong to. Many planned goals were met, like further automation within our virtual environment. Many other accomplishments were unplanned. The biggest was how our virtualization team pulled together to design, develop, and implement a VDI environment at VMworld 2012, with a week of notice and no prior knowledge of VMware View.

Personally, I also completed my classes at a local college to further my degree. This took a lot out of my free time but was beneficial as I did learn a few things, especially in project management, and most importantly, I got that piece of paper.

Now that I have completed my schooling, for the time being, I will have more time to put into this blog. I have definitely neglected it. I plan to add more posts about our adventures with PowerCLI and vCenter Orchestrator. We also plan to implement a Hyper-V 2012 environment in the coming year so I may throw some things in about that. Hey, it is still virtualization! 2013 looks to be packed with projects all across the board so I should have plenty to blog about.

I’m officially on vacation until 2013, so I’ll see you then! Happy Holidays!

November 5, 2012
by zach
4 Comments

Virtual Connect Stacking Link MAC Flapping

A couple weeks ago, our Network Operations team stumbled upon numerous MACs flapping on their Cisco switches. We began investigating where these MACs were in the data center as every switch stack in every row was seeing this issue. An example of what we saw is listed below:

10/24/2012 2:27:19 PM appsw01-c8-gis-omaedc Warning 1622524: . Host 0021.5add.383d in vlan 700 is flapping between port Po9 and port Po8
10/24/2012 2:27:19 PM appsw01-c8-gis-omaedc Warning 1622523: . Host 0025.b382.2561 in vlan 707 is flapping between port Po37 and port Po36
10/24/2012 2:27:19 PM appsw01-c8-gis-omaedc Warning 1622520: . Host 0025.b382.2561 in vlan 425 is flapping between port Po37 and port Po36
10/24/2012 2:27:19 PM appsw01-c8-gis-omaedc Warning 1622522: . Host 0025.b382.2561 in vlan 703 is flapping between port Po37 and port Po36
10/24/2012 2:27:19 PM appsw01-c8-gis-omaedc Warning 1622521: . Host 0025.b382.2561 in vlan 450 is flapping between port Po37 and port Po36
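With every switch stack logging lines like these, it helps to boil them down to which MACs are flapping, on which VLANs, and between which ports. Here is a minimal Python sketch of that idea. It is not a tool we actually used. The regular expression is based only on the sample lines above, and the function and variable names are hypothetical.

```python
import re
from collections import defaultdict

# Matches the "Host <mac> in vlan <id> is flapping between port X and port Y"
# part of the log lines shown above.
FLAP_RE = re.compile(
    r"Host (?P<mac>[0-9a-fA-F]{4}\.[0-9a-fA-F]{4}\.[0-9a-fA-F]{4}) "
    r"in vlan (?P<vlan>\d+) is flapping between "
    r"port (?P<port_a>\S+) and port (?P<port_b>\S+)"
)


def summarize_flaps(lines):
    """Group flapping events by MAC address.

    Returns {mac: {"vlans": set of ints, "ports": set of port names}}.
    Lines that do not match the pattern are ignored.
    """
    summary = defaultdict(lambda: {"vlans": set(), "ports": set()})
    for line in lines:
        match = FLAP_RE.search(line)
        if match is None:
            continue
        entry = summary[match.group("mac").lower()]
        entry["vlans"].add(int(match.group("vlan")))
        entry["ports"].update((match.group("port_a"), match.group("port_b")))
    return dict(summary)


SAMPLE = [
    "10/24/2012 2:27:19 PM appsw01-c8-gis-omaedc Warning 1622524: . Host 0021.5add.383d in vlan 700 is flapping between port Po9 and port Po8",
    "10/24/2012 2:27:19 PM appsw01-c8-gis-omaedc Warning 1622523: . Host 0025.b382.2561 in vlan 707 is flapping between port Po37 and port Po36",
    "10/24/2012 2:27:19 PM appsw01-c8-gis-omaedc Warning 1622520: . Host 0025.b382.2561 in vlan 425 is flapping between port Po37 and port Po36",
    "10/24/2012 2:27:19 PM appsw01-c8-gis-omaedc Warning 1622522: . Host 0025.b382.2561 in vlan 703 is flapping between port Po37 and port Po36",
    "10/24/2012 2:27:19 PM appsw01-c8-gis-omaedc Warning 1622521: . Host 0025.b382.2561 in vlan 450 is flapping between port Po37 and port Po36",
]
```

Running `summarize_flaps(SAMPLE)` collapses the five lines into two MACs, which makes it much easier to see that a small number of addresses are flapping across many VLANs between the same pair of port channels.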

After digging, we found that it was the stacking link MAC address on our Virtual Connect modules in our HP c7000 enclosures. Next, I had to determine whether it was coming from every enclosure, 29 total, or only from certain enclosures that had something in common. I emailed our HP Account Support Manager about the issue and asked whether he had any prior experience with it. He mentioned he had seen instances relating to ESX servers, NIC drivers, or LLDP packets not being handled correctly.

Our investigation showed that enclosures without ESX servers were also causing this issue. We doubted the NIC driver theory, since the MAC belonged to the stacking link. Our network team went down the LLDP route initially, but it resulted in no change. We called HP support to go further, and one of the engineers provided the following customer advisory. The description matched our issue: the one thing we had noticed was that the enclosures with VC 3.15 (we are on our last month of VC upgrades to 3.60, just in time to start upgrades to 3.70!) were not causing the issue. The advisory indicates the Network Loop Protection setting was introduced in version 3.51 and affects later versions. The NLP frame being transmitted every five seconds was aligned with what we saw in the logs as well.

HP support could not say whether any pings would be lost when the setting was disabled. The advisory is a bit vague on that question. Both HP support and our Account Support Manager said it shouldn’t cause any loss, but neither had first-hand knowledge of whether disabling the setting would cause any network disruption. We scheduled a change window late at night.

Good news followed immediately. When I applied the change, no pings were lost and our switch logs began clearing up. Days later, there is still no flapping seen and our switch CPU usage has dropped to normal levels. Unfortunately, it doesn’t look like a fix for this issue is included in VC 3.70.