Jan 28, 2014

OK, so maybe they do not lie to you, but now I have your attention. In virtualized environments, the performance statistics you see in the management interface may not give you the entire picture, and they can lead to bad assumptions or misleading recommendations.

Remember, the virtualization layer does not understand the nuances of the applications running inside the virtual machine, so the performance statistics it gives you might not be representative of those applications.

For example, SQL Server is very memory hungry because it uses memory to cache data read from disk. If you give a SQL Server virtual machine 32GB of vRAM and allocate 26GB to the SQL Server buffer pool, chances are that all 26GB will be consumed as your workload reads data. However, the physical host server where your VM is running might not understand this usage pattern and might misinterpret it.
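To make the arithmetic in this example concrete, here is a minimal Python sketch, assuming 1 GB = 1024 MB, that converts the 26GB buffer pool target into the megabyte value you might give SQL Server's `max server memory (MB)` option (the setting commonly used to cap the buffer pool) and shows how much of the 32GB of vRAM is left for the guest OS. The helper name `buffer_pool_plan` is hypothetical, not part of any SQL Server API.

```python
# Minimal sketch of the 32GB vRAM / 26GB buffer pool split from the example.
# Assumes 1 GB = 1024 MB. buffer_pool_plan is a hypothetical helper.

def buffer_pool_plan(vram_gb: int, buffer_pool_gb: int) -> dict:
    """Return a max server memory value (MB) and the memory left for the guest OS (GB)."""
    if buffer_pool_gb >= vram_gb:
        raise ValueError("buffer pool must be smaller than the VM's vRAM")
    return {
        "max_server_memory_mb": buffer_pool_gb * 1024,
        "os_headroom_gb": vram_gb - buffer_pool_gb,
    }


if __name__ == "__main__":
    plan = buffer_pool_plan(vram_gb=32, buffer_pool_gb=26)
    print(plan)  # {'max_server_memory_mb': 26624, 'os_headroom_gb': 6}
```

Once a read-heavy workload has touched enough data, expect the buffer pool to grow toward that 26624 MB ceiling; that steady state is the usage pattern the host may misread.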

Take this scenario. I recently had to load a 20GB dataset into a SQL Server instance for performance statistics analysis. The data was read from disk and inserted into a new SQL Server table. Because nothing else was running on that server during the load, almost all of the inserted data was also in the buffer pool by the end of it. But look at what the VMware memory performance counters reported for this virtual machine.

[Figure: VMware active memory statistics during the bulk data load]

You’ll notice that the active memory counter grows as the data is loaded. Once the load finishes, the data still resides in the buffer pool, but VMware reports that the memory is no longer active as more time passes since those pages last changed. During this time, I am actively analyzing the data, so it is still being read; it just is not changing. This read activity is not reflected in the counter at all.
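The gap between "active" and "in use" can be illustrated with a toy model. The Python sketch below is a hypothetical simplification of the behavior described above, not VMware's actual active-memory estimator: it counts a page as active only if it was written within a recent window, while an analysis phase keeps reading every cached page. The page count, window length, and the `simulate` function are all assumptions chosen for illustration.

```python
# Toy model only: NOT VMware's real algorithm. A page counts as "active" here
# only if it was written within the last `window` ticks; reads are invisible
# to this counter, mirroring the behavior observed after the bulk load.

def simulate(pages: int, load_ticks: int, analysis_ticks: int, window: int):
    last_write = [None] * pages  # tick at which each page was last written
    cached = set()               # pages resident in the "buffer pool"
    reads = 0                    # page reads performed during analysis
    history = []                 # (tick, active_pct, cached_pct) per tick

    for tick in range(load_ticks + analysis_ticks):
        if tick < load_ticks:
            # Load phase: write an equal share of the pages each tick.
            start = tick * pages // load_ticks
            end = (tick + 1) * pages // load_ticks
            for p in range(start, end):
                last_write[p] = tick
                cached.add(p)
        else:
            # Analysis phase: every cached page is read, none are modified.
            reads += len(cached)

        active = sum(1 for t in last_write if t is not None and tick - t < window)
        history.append((tick, 100 * active // pages, 100 * len(cached) // pages))
    return history, reads


if __name__ == "__main__":
    history, reads = simulate(pages=100, load_ticks=5, analysis_ticks=5, window=5)
    for tick, active_pct, cached_pct in history:
        print(f"tick {tick}: active {active_pct}%  cached {cached_pct}%")
    print("page reads during analysis:", reads)
```

With these parameters, the active share climbs to 100% during the load and then decays to 0% over the next five ticks, even though every page stays cached and the analysis phase performs 500 page reads.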

It’s not just memory, either. LogicMonitor has a great writeup on the differences between Windows Perfmon sampling inside a VM and the actual CPU usage reported by the physical server.

I have heard virtualization administrators tell DBAs that, because these counters show most of the memory as stale and not active, it’s OK to reduce the amount of memory allocated to that virtual machine. This counter is pretty misleading, and I see quite a bit of misinterpretation of it by virtualization administrators. They should not be expected to know the specifics of every workload running in their environments, but understanding that this counter does not show the entire picture will save everyone a lot of time during these discussions.

This disparity is just one of the reasons why I constantly push for actively collecting ongoing performance statistics from the physical host, from inside the guest, and from within the application running inside the VM. Correlate everything together to see what is really going on, because any single metric probably does not tell the whole story.
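One way to put the "correlate everything together" advice into practice is to line up samples from all three layers on a shared timestamp and flag intervals where they disagree. The Python sketch below is a minimal illustration under stated assumptions: the metric names (`host_active_mem_pct`, `guest_buffer_pool_gb`, `app_page_reads_per_sec`), the made-up sample values, and the thresholds are all hypothetical. In practice the samples would come from your host monitoring, from Perfmon inside the guest, and from SQL Server's own counters.

```python
# Minimal sketch: join host, guest, and application samples by timestamp and
# flag intervals where the host reports idle memory but the application is busy.
# All metric names, sample values, and thresholds are hypothetical.
from dataclasses import dataclass


@dataclass
class Sample:
    timestamp: str                 # sample time, e.g. "10:00"
    host_active_mem_pct: float     # host-side "active memory" as % of vRAM
    guest_buffer_pool_gb: float    # buffer pool size seen inside the guest
    app_page_reads_per_sec: float  # page reads reported by the application


def correlate(host: dict, guest: dict, app: dict) -> list:
    """Merge three {timestamp: value} series, keeping only timestamps present in all."""
    common = sorted(host.keys() & guest.keys() & app.keys())
    return [Sample(ts, host[ts], guest[ts], app[ts]) for ts in common]


def misleading_intervals(samples, active_below=20.0, reads_above=1000.0):
    """Timestamps where the host reports low activity while the app reads heavily."""
    return [
        s.timestamp
        for s in samples
        if s.host_active_mem_pct < active_below and s.app_page_reads_per_sec > reads_above
    ]


if __name__ == "__main__":
    host = {"10:00": 85.0, "10:05": 30.0, "10:10": 8.0}
    guest = {"10:00": 24.0, "10:05": 26.0, "10:10": 26.0}
    app = {"10:00": 500.0, "10:05": 4000.0, "10:10": 5200.0}
    print(misleading_intervals(correlate(host, guest, app)))  # ['10:10']
```

Requiring a timestamp in all three series keeps the comparison honest: a gap in any one source drops that interval rather than letting a single metric speak for the whole stack.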

UPDATE: (t) from VMware previously wrote a great blog post on vmware.com about this exact topic, and he goes into greater detail on the counter. You should go check it out!

  6 Responses to “Virtualization resource consumption counters lie to you”

  1. Thanks David for all the great work you put in evangelizing databases, and especially SQL, on vSphere. Great point again that it’s important to understand what the counters are trying to tell you. I find ‘active memory’ is a very commonly misunderstood and misused counter. More details here:

    https://blogs.vmware.com/vsphere/2013/10/understanding-vsphere-active-memory.html

    Thanks for supporting my mission of reducing the number of bad sizing decisions based on it alone.

    @vmMarkA

  2. You’re quite welcome! I have a blast doing this! Active memory is pretty amazing, and your blog post over at vmware.com is a very good read. We are not alone in our quest to advocate for DBA and business-critical application rights!

  3. Do you really mean to say….
    “Remember, the virtualization layer does understand the nuances of the applications that you are running inside the virtual machine, so it might be giving you performance statistics that are not representative of that application. ”
    or should this read “Remember, the virtualization layer does NOT understand the nuances….”

  4. You’re right! I made a typo. I mean that it does NOT understand the nuances. Thank you for catching that!

  5. No problem. Your article caught my eye because I am dealing with a VM Admin who is constantly telling us our machines are ‘over provisioned’ with either RAM, CPU etc.
    Our VM admin pulls out vSphere metrics (like Active Memory) to support his claim that the servers are ‘doing nothing’ and don’t need the RAM they have been provisioned.
    I am constantly having to ‘fight back’.
    I do understand how things have ‘tipped’ the other way given the newfound ease of ‘standing up’ new servers with multiple cores and GBs of memory, and how this can lead to an ‘over consumption’ of scarce datacenter resources. But, as you suggest, the baseline performance of any server needs to be monitored constantly and at all levels in relation to the function it is performing. More importantly, the findings need to be accurately interpreted. It’s all about ‘balance’.
    Here is an article that speaks to the underutilization of virtual machines:
    http://gigaom.com/2013/11/30/the-sorry-state-of-server-utilization-and-the-impending-post-hypervisor-era/

  6. […] (lack of) validity on the actual memory usage properties of the SQL Server engine. A while back, I blogged about this but I want to revisit it […]