Virtualization admins have become the last line of defense when something is not working right. Networking teams will claim it is the overuse of hardware, developers will claim it is the hardware, and the storage guys will claim that they just provide disk. So how can a virtualization admin clearly show the relationships between different systems and hosts and their needs for CPU, memory, network, and storage? The answers already sit in the logs and data collected by vCenter, but without an easy way to interpret that data they are of little use. In comes vCenter Operations.
vCenter Operations is a natural way for virtualization administrators to step into a full-scope virtualization monitoring platform. As an integrator, each time we deploy a new or upgraded vSphere environment, we add Operations Standard to the kit so that the admins on site get a great overview of their environment, as shown below.
You will notice that you can see vCenter systems, datacenters, clusters, and even all the way down to an individual VM. The next question is how this helped us with a tier-one app.
I was recently at a client site that was having significant speed issues in its SharePoint environment. As a multi-tier application, SharePoint needs both a web front end and a database back end. Luckily we had just installed vCenter Operations, and when we checked the console we noticed that two of the ESX hosts were in the Red category. After double-clicking on one of the ESX hosts we saw that network usage was the issue, with 100% of the network being used, similar to the graphic below that shows a system with 100% of its CPU being used.
The great part was that, further down the page for the ESX host, we found the Child Objects. With a single child object showing Red as well, we were able to drill down to the exact machine.
Once we drilled down to the machine we found that the SharePoint database server was using 100% of the host's network capacity. When we looked at the other ESX host that showed Red, we found that it also had only a single machine with a Red status. It turned out that this second guest was the web front end for SharePoint. We then backed out of the vCenter Operations console (which is conveniently integrated into the Virtual Infrastructure Client) and migrated the web server to the same ESX host as the SQL server, so that traffic between the two tiers no longer had to cross the physical network. We gave vCenter Operations a couple of minutes to make sure that the data was up to date, and we were back to a healthy Green environment across the board.
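For readers who like to see the reasoning spelled out, the drill-down we did by hand can be sketched in a few lines of Python. This is only an illustration of the logic, not the vCenter Operations API: the `Host` and `VM` classes, the `app` label, and the host and VM names are all made up for the example.

```python
from dataclasses import dataclass, field

RED = "red"
GREEN = "green"


@dataclass
class VM:
    name: str
    health: str
    app: str  # hypothetical label: which application this VM belongs to


@dataclass
class Host:
    name: str
    health: str
    children: list = field(default_factory=list)  # the host's child objects


def red_children(hosts):
    """Return (host name, VM) pairs for every Red VM on a Red host."""
    return [
        (host.name, vm)
        for host in hosts
        if host.health == RED
        for vm in host.children
        if vm.health == RED
    ]


def colocation_candidates(hosts):
    """Group Red VMs by application and keep the applications
    whose Red VMs are spread over more than one host."""
    by_app = {}
    for host_name, vm in red_children(hosts):
        by_app.setdefault(vm.app, []).append((host_name, vm.name))
    return {
        app: pairs
        for app, pairs in by_app.items()
        if len({host_name for host_name, _ in pairs}) > 1
    }


# Made-up inventory that mirrors the situation described above.
hosts = [
    Host("esx01", RED, [VM("sql01", RED, "sharepoint"),
                        VM("file01", GREEN, "files")]),
    Host("esx02", RED, [VM("web01", RED, "sharepoint")]),
    Host("esx03", GREEN, [VM("dc01", GREEN, "directory")]),
]

print(colocation_candidates(hosts))
```

With this sample inventory the only candidate is the SharePoint pair, split across two Red hosts, which is exactly the pattern that pointed us toward putting both servers on one host.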
Now that we knew the network load could impact the rest of the environment whenever the two servers are split across hosts, we simply set an Affinity Rule on DRS. This forces the two servers to always stay on the same ESX host. Search and document retrieval times in SharePoint dropped almost immediately. Needless to say, it was easy to convince the powers that be that, compared with hours upon hours of troubleshooting, a simple add-on product is sometimes worth its weight in gold. The next solution might have been to move SharePoint back to a physical environment, with new hardware costing at least twice as much as a simple monitoring and correlation product.
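To make the idea behind the keep-together Affinity Rule concrete, here is a small Python sketch of what such a rule asserts. It does not talk to DRS at all (we set the real rule through the vSphere client), and the function name, VM names, and host names are hypothetical.

```python
def rule_violations(placement, keep_together):
    """Check keep-together groups against a VM placement.

    placement: dict mapping VM name -> host name
    keep_together: list of sets of VM names that must share a host
    Returns (sorted VM names, sorted host names) for each group
    whose placed members sit on more than one host.
    """
    violations = []
    for group in keep_together:
        hosts_used = {placement[vm] for vm in group if vm in placement}
        if len(hosts_used) > 1:
            violations.append((sorted(group), sorted(hosts_used)))
    return violations


# Hypothetical placements before and after the migration.
before = {"sql01": "esx01", "web01": "esx02"}
after = {"sql01": "esx01", "web01": "esx01"}
rule = [{"sql01", "web01"}]

print(rule_violations(before, rule))  # the split pair is reported
print(rule_violations(after, rule))   # empty list: the rule is satisfied
```

The point of the real rule is that DRS keeps this check true for us, so an automatic migration never quietly splits the two servers again.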