Snapshots: A Love-Hate Relationship For VM snapshot


Few functions in virtualization are as popular as snapshots, because they give the administrator new options for handling patches, configuration changes, and all kinds of updates. If something goes wrong, you can simply revert to the VM snapshot to return to the previous state, which undoes the problem and lets you try again. Nearly all backup products that specialize in VMware use snapshots automatically.

However, snapshots have their drawbacks: they consume capacity on the datastore in addition to the base virtual disk files. A delta file can grow as large as the base disk, so a 40 GB base disk can produce up to another 40 GB of delta files per snapshot. Depending on the free space, this can quickly lead to space shortages on the datastore if snapshots are kept for a long time or if very extensive changes are made to the data (e.g. database upgrades) while snapshots exist. You should therefore consistently monitor snapshots for size, age, and growth rate.

Problems also arise when snapshots are no longer recognized by VMware and are therefore no longer visible in the Snapshot Manager. In this case it is very difficult to trace manually which VMs are running on snapshots and how quickly those snapshots are growing.

First, however, let us cover the basics, namely what exactly a VM snapshot is and how it grows, so that we are prepared for troubleshooting later on.

VMware or VM Snapshots

When you create a snapshot of a virtual machine, a specific point in time is recorded: from the moment of the snapshot on, the original files of the VM remain unchanged, and all modifications are written to new files. This can happen while the virtual machine is running. It is also possible to merge the current data with the data of the snapshot, which likewise causes no downtime. Only if you want to discard the data accumulated since the snapshot is the virtual machine stopped, reset to its previous state, and started again.
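To make the copy-on-write idea concrete, here is a minimal sketch that models one virtual disk as a Python dict of blocks. The class `SnapshotChain` and its methods are hypothetical illustrations, not a VMware API or on-disk format: writes always land in the newest delta, reads search from the newest delta back to the base disk, `delete_all` merges the deltas into the base, and `discard_all` throws them away (the revert case).

```python
from typing import Dict, List, Optional


class SnapshotChain:
    """Hypothetical toy model of one virtual disk with a chain of snapshots.

    Blocks are modelled as {block_number: data}. This illustrates the
    copy-on-write layering only; it is not VMware's on-disk format or API.
    """

    def __init__(self, base: Dict[int, bytes]) -> None:
        self.base: Dict[int, bytes] = dict(base)   # original VMDK
        self.deltas: List[Dict[int, bytes]] = []    # one delta per snapshot, oldest first

    def create_snapshot(self) -> None:
        # Freeze the current top layer; all further writes go to a new, empty delta.
        self.deltas.append({})

    def write(self, block: int, data: bytes) -> None:
        # Without snapshots the VM writes to the base disk directly.
        target = self.deltas[-1] if self.deltas else self.base
        target[block] = data

    def read(self, block: int) -> Optional[bytes]:
        # Look in the newest delta first, then walk back towards the base disk.
        for layer in reversed(self.deltas):
            if block in layer:
                return layer[block]
        return self.base.get(block)

    def delete_all(self) -> None:
        # Merge every delta into the base disk (oldest first), then drop the deltas.
        # The disk contents the VM sees are the same before and after the merge.
        for layer in self.deltas:
            self.base.update(layer)
        self.deltas.clear()

    def discard_all(self) -> None:
        # Throw away everything written since the first snapshot (revert).
        self.deltas.clear()


if __name__ == "__main__":
    disk = SnapshotChain({0: b"boot", 1: b"app v1"})
    disk.create_snapshot()
    disk.write(1, b"app v2")            # e.g. a patch installed in the guest
    print(disk.read(1), disk.base[1])   # b'app v2' b'app v1'
    disk.discard_all()                  # patch failed: go back
    print(disk.read(1))                 # b'app v1'
```

The model makes visible why the base disk stays untouched while snapshots exist and why merging causes no change in what the guest sees.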

In theory, you can create a large number of VM snapshots, but this is hardly useful, because snapshot administration lacks transparency and management becomes laborious.

As soon as a snapshot is created, the newly produced delta files grow dynamically with the activity in the guest: every change on the virtual disk enlarges the delta disk file. This applies to every kind of modification, from copying a file, to securely overwriting the disk with zeros, all the way to deleting files. The space requirement never shrinks. However, a delta file can never become larger than its base disk, because changed blocks are copied 1:1. If the same block is overwritten a hundred times, the size of the delta file does not change. Only when a new block is written does the delta file grow, in steps of at least 15 MB.
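The growth rules above can be sketched as a rough estimator. This is a simplified model under stated assumptions, not VMware's allocation algorithm: each distinct block is stored once, the file grows in fixed increments (defaulting to the 15 MB figure used in this article), the result is capped at the base disk size, and the 1 MB block size in the example is a hypothetical choice.

```python
import math
from typing import Iterable


def delta_size_mb(written_blocks: Iterable[int], block_size_mb: float,
                  base_size_mb: float, increment_mb: float = 15.0) -> float:
    """Rough estimate of a delta file's size after a series of guest writes.

    Assumptions (illustration only): every distinct block is stored once,
    the file grows in fixed increments, and it never exceeds the base disk.
    """
    distinct = set(written_blocks)            # rewriting the same block costs nothing extra
    needed = len(distinct) * block_size_mb    # deletes and zero-fills are writes, too
    if needed == 0:
        return 0.0
    grown = math.ceil(needed / increment_mb) * increment_mb
    return min(grown, base_size_mb)           # capped at the size of the base disk


if __name__ == "__main__":
    print(delta_size_mb([7] * 100, 1, 10240))    # same block 100 times -> 15.0
    print(delta_size_mb(range(40), 1, 10240))    # 40 new blocks -> 45.0
```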

It is therefore important to understand that a snapshot can at most double the space requirement, but this applies to every snapshot: if the delta file is 5 GB in size after the first snapshot and a second snapshot is created, the delta files add up on the datastore. This is why you need to watch both the number of snapshots and their size.
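The worst case described above reduces to simple arithmetic: each delta can reach the size of the base disk, and the deltas of all snapshots add up. The sketch below computes that upper bound for one virtual disk; it deliberately ignores snapshot memory files and descriptor overhead, so treat it as an estimate rather than an exact figure.

```python
def worst_case_usage_gb(base_gb: float, snapshot_count: int) -> float:
    """Upper bound on datastore usage for one virtual disk with snapshots.

    Each delta can grow at most to the size of the base disk, and the
    deltas of all snapshots add up: base * (1 + number of snapshots).
    """
    if snapshot_count < 0:
        raise ValueError("snapshot_count must be >= 0")
    return base_gb * (1 + snapshot_count)


if __name__ == "__main__":
    # The 40 GB disk from the introduction:
    for n in range(4):
        print(f"{n} snapshot(s): up to {worst_case_usage_gb(40, n):.0f} GB")
    # 0 snapshot(s): up to 40 GB
    # 1 snapshot(s): up to 80 GB
    # 2 snapshot(s): up to 120 GB
    # 3 snapshot(s): up to 160 GB
```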

By the way, backup products almost always use snapshots to back up running virtual machines from the outside (not via an agent in the guest, but via VCB). This is because the VMkernel holds exclusive read/write access to a VM's disk files until a VM snapshot is created. From that point on, the original disk files become readable, and the VMkernel holds exclusive read/write access only to the newest delta file. The table illustrates the technical process of a snapshot.

| Action / files of the VM | VMDK size | NTFS size | Free NTFS capacity |
|---|---|---|---|
| Creation of the VM with a thick-provisioned disk | | | |
| Vm1.vmdk (C:) | 10.2 GB | 10 GB | 5 GB |
| VMFS capacity used by the VM | 10.2 GB | | |
| Copying a DVD in the guest (1 GB) | | | |
| Vm1.vmdk (C:) | 10.2 GB | 10 GB | 4 GB |
| Creation of Snapshot 1 | | | |
| Vm1.vmdk (C:) | 10.2 GB | 10 GB | 4 GB |
| Vm1-000001.vmdk | > 1 MB | 10 GB | 4 GB |
| Copying a file in the guest (500 MB) | | | |
| Vm1.vmdk (C:) | 10.2 GB | 10 GB | 4 GB |
| Vm1-000001.vmdk | ~500 MB | 10 GB | 3.5 GB |
| Creation of Snapshot 2 | | | |
| Vm1.vmdk (C:) | 10.2 GB | 10 GB | 4 GB |
| Vm1-000001.vmdk | ~500 MB | 10 GB | 3.5 GB |
| Vm1-000002.vmdk | > 1 MB | 10 GB | 3.5 GB |
| Copying a DVD in the guest (2 GB) | | | |
| Vm1.vmdk (C:) | 10.2 GB | 10 GB | 4 GB |
| Vm1-000001.vmdk | ~500 MB | 10 GB | 3.5 GB |
| Vm1-000002.vmdk | ~2 GB | 10 GB | 1.5 GB |
| VMFS capacity used by the VM | 12.5 GB | | |
| Removal of both snapshots | | | |
| Vm1.vmdk (C:) | 10.2 GB | 10 GB | 1.5 GB |
| VMFS capacity used by the VM | 10.2 GB | | |

Table 1: Snapshot development at a glance

As the table shows, delta files are easy to recognize by their numbering (-######.vmdk, e.g. Vm1-000001.vmdk), and they grow along with the data in the guest. The space already occupied in the guest file system is preserved when the snapshot is created, and all subsequent changes are managed in the delta files. As soon as the snapshots are removed, all modifications are written back to the original disk files; the delta files are deleted and no longer occupy additional space. Every growth step of the delta files, as well as every creation and removal of a snapshot, causes SCSI reservations in Fibre Channel (FC) environments, so excessive use of snapshots also quickly leads to performance problems.
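Because delta descriptors follow the -######.vmdk naming scheme, a plain directory listing of a datastore folder is enough to spot them. The sketch below groups such files by their base disk; it assumes you already have the file names (e.g. copied from the Datastore Browser), and the sorting by number as "oldest first" relies on the usual sequential numbering.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Delta descriptors follow the "<disk>-NNNNNN.vmdk" pattern, e.g. vm1-000001.vmdk.
# Data files such as "-flat.vmdk" or "-000001-delta.vmdk" do not match.
DELTA_RE = re.compile(r"^(?P<base>.+)-(?P<num>\d{6})\.vmdk$")


def group_deltas(filenames: Iterable[str]) -> Dict[str, List[str]]:
    """Map each base disk name to its delta descriptors, sorted by number."""
    chains: Dict[str, List[str]] = defaultdict(list)
    for name in filenames:
        m = DELTA_RE.match(name)
        if m:
            chains[m.group("base") + ".vmdk"].append(name)
    return {base: sorted(files) for base, files in chains.items()}


if __name__ == "__main__":
    listing = ["vm1.vmdk", "vm1-flat.vmdk", "vm1-000002.vmdk",
               "vm1-000001.vmdk", "vm1-000001-delta.vmdk"]
    print(group_deltas(listing))
    # {'vm1.vmdk': ['vm1-000001.vmdk', 'vm1-000002.vmdk']}
```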

Snapshots are not backups!

Snapshots are intended to let software or scripts back up a VM from the outside, but they are not a replacement for a backup solution. If at all, snapshots should be used only briefly, during changes in the guest (updating the guest operating system or the application) or, as mentioned, by backup software, which deletes the snapshot again as soon as the backup is finished.

As explained above, snapshots build on each other through a copy-on-write procedure. You must therefore never break the snapshot chain, for example by manually deleting snapshot files. In the worst case, this can lead to massive data loss.
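A broken chain can be detected before anything is deleted by following the parent references. In VMDK descriptor files, a delta disk names its parent in a `parentFileNameHint="..."` line. The sketch below walks those links from the disk the VM uses down to the base disk; it assumes the descriptor texts are already available as strings keyed by file name (a hypothetical input) and that the hints are plain file names in the same folder.

```python
import re
from typing import Dict, List, Optional

# Delta descriptors name their parent in a line such as: parentFileNameHint="vm1.vmdk"
PARENT_RE = re.compile(r'^\s*parentFileNameHint\s*=\s*"([^"]+)"', re.MULTILINE)


def parent_of(descriptor: str) -> Optional[str]:
    m = PARENT_RE.search(descriptor)
    return m.group(1) if m else None  # base disks have no parent hint


def walk_chain(top: str, descriptors: Dict[str, str]) -> List[str]:
    """Follow parent links from the disk in use down to the base disk.

    Raises FileNotFoundError if a link in the chain is missing.
    """
    chain: List[str] = []
    current: Optional[str] = top
    while current is not None:
        if current in chain:
            raise ValueError(f"loop in snapshot chain at {current}")
        if current not in descriptors:
            raise FileNotFoundError(f"chain broken: {current} is missing")
        chain.append(current)
        current = parent_of(descriptors[current])
    return chain
```

If `walk_chain` raises, do not remove or rename any file of that VM until the cause is understood.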

We cover these issues in several opvizor rules! Try our solution for automatic detection of snapshot problems you're not aware of.

Removal of Snapshots

Removing snapshots, which technically means writing all modifications made since the snapshots were created back to the original VMDK files or Raw Device Mapping, was significantly changed in vSphere 4 Update 2.

The change concerns the Delete All option in the Snapshot Manager, which removes all snapshots and writes all modifications back to the original disk.

Process in all versions before vSphere 4 Update 2

When you select Delete All in the Snapshot Manager, Snapshot 4 is first merged back into Snapshot 3, then Snapshot 3 into Snapshot 2, Snapshot 2 into Snapshot 1, and finally Snapshot 1 into the original disk; only then are all snapshots deleted. During merging, additional disk space is required. Shortly before the final merge, VM1 could use up to 36 GB for its disks (5 + 6 + 7 + 8 + 10 GB).

New process as of vSphere 4 Update 2

When you select Delete All in the Snapshot Manager, Snapshot 1 is first merged into the original disk, followed by Snapshot 2, Snapshot 3, and finally Snapshot 4; then all snapshots are deleted. No additional disk space is used, so usage stays at 28 GB.

This change has a decisive advantage: disk usage no longer increases with each snapshot that is deleted. Especially when a datastore is already full because of growing snapshots, this change is invaluable.

Problem Situations

Because of the way snapshots work, as described above, several problems can arise that I want to discuss in more detail. Not all of them apply to every type of virtual machine, since long-term snapshots may be entirely intended in some cases. The following problems can occur on typical server VMs.

Snapshot Existence

The existence of snapshots is of course not a problem in itself, but it can quickly become one if snapshots are not monitored.

Snapshot Increase

The growth of snapshots needs to be monitored specifically, because the additional data it places on the datastore can quickly fill it up. If the VMkernel can no longer write data to the datastore, the virtual machine stops quickly, as its I/O comes to a complete standstill. Thin-provisioned disk files are especially critical, because, like snapshots, they allocate space on the datastore as they grow. A full datastore therefore causes VMs with active snapshots, as well as VMs with thin-provisioned disks, to fail.

Beyond that, powered-off VMs can no longer be started on this datastore.
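Monitoring growth becomes actionable once you turn two size measurements into a projection. The sketch below assumes linear growth (a simplification; real workloads are burstier) and expects the combined growth of all snapshot deltas and thin-provisioned disks on the datastore. The numbers in the example are hypothetical.

```python
from typing import Optional


def growth_rate_gb_per_hour(size_before_gb: float, size_now_gb: float, hours: float) -> float:
    """Average growth between two measurements of the same file(s)."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return (size_now_gb - size_before_gb) / hours


def hours_until_full(free_gb: float, growth_gb_per_hour: float) -> Optional[float]:
    """Linear projection of when the datastore runs out of space.

    growth_gb_per_hour should be the combined growth of all snapshot deltas
    and thin-provisioned disks on the datastore. Returns None if nothing grows.
    """
    if growth_gb_per_hour <= 0:
        return None
    return free_gb / growth_gb_per_hour


if __name__ == "__main__":
    # Hypothetical numbers: a delta grew from 2 GB to 8 GB within 12 hours,
    # and the datastore has 50 GB left.
    rate = growth_rate_gb_per_hour(2, 8, 12)
    print(rate, hours_until_full(50, rate))   # 0.5 100.0
```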

We cover this issue in opvizor: Issue No. 33, Snapshot Size

Snapshot Age

The older a snapshot becomes, the more serious losing it would be. You should therefore remove snapshots again after a few days if possible, once the work is completed (e.g. patching a system). Snapshot files can become corrupt in various situations, usually because of technical problems such as storage failures, and the older a snapshot is, the more data is lost.

Remember: A snapshot is not a backup and for sure not an archive!
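An age check is easy to automate once you have a list of snapshots with their creation times. The sketch below flags everything older than a threshold; the 3-day default is only an assumption standing in for "a few days", and the inventory format (VM name, snapshot name, timezone-aware creation time) is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional, Tuple

Snapshot = Tuple[str, str, datetime]  # (vm name, snapshot name, creation time, tz-aware)


def stale_snapshots(snapshots: Iterable[Snapshot],
                    max_age: timedelta = timedelta(days=3),
                    now: Optional[datetime] = None) -> List[Tuple[str, str, int]]:
    """Return (vm, snapshot, age in days) for snapshots older than max_age, oldest first."""
    now = now or datetime.now(timezone.utc)
    stale = []
    for vm, name, created in snapshots:
        age = now - created
        if age > max_age:
            stale.append((vm, name, age.days))
    return sorted(stale, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    now = datetime(2024, 1, 10, tzinfo=timezone.utc)
    inventory = [  # hypothetical data
        ("vm1", "before patch", datetime(2024, 1, 9, tzinfo=timezone.utc)),
        ("vm2", "before upgrade", datetime(2023, 12, 1, tzinfo=timezone.utc)),
    ]
    for vm, name, days in stale_snapshots(inventory, now=now):
        print(f"{vm}: snapshot '{name}' is {days} days old")
    # vm2: snapshot 'before upgrade' is 40 days old
```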

We cover this issue in opvizor: Issue No. 8, Snapshot Age

Defective Snapshots

Last but not least, snapshot description files can become corrupt or be deleted without the actual snapshot being removed. In this case, normal VMware queries (SDK, PowerCLI, vCLI, vCenter, vSphere Client) no longer report any snapshots, and the Snapshot Manager of the vSphere Client shows none. The snapshot can then only be detected through the disk file the virtual machine actually uses. The VMkernel continues to use the snapshot, but it is invisible to the management interfaces.
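Checking "the disk file the virtual machine actually uses" can be done by reading the VM's .vmx file, where each virtual disk appears as a line such as `scsi0:0.fileName = "vm1-000001.vmdk"`. The sketch below flags disks that point to a -######.vmdk delta while the management layer reports zero snapshots. How you obtain the .vmx text and the reported snapshot count is left open; both are hypothetical inputs here.

```python
import re
from typing import Dict, List

# Disk entries in a .vmx file look like: scsi0:0.fileName = "vm1-000001.vmdk"
DISK_RE = re.compile(r'^\s*((?:scsi|ide|sata|nvme)\d+:\d+)\.fileName\s*=\s*"([^"]+)"',
                     re.IGNORECASE | re.MULTILINE)
DELTA_RE = re.compile(r"-\d{6}\.vmdk$")


def disks_on_snapshots(vmx_text: str) -> Dict[str, str]:
    """Return {device: file} for every virtual disk that runs on a delta file."""
    return {dev: path for dev, path in DISK_RE.findall(vmx_text) if DELTA_RE.search(path)}


def hidden_snapshot_suspects(vmx_text: str, snapshots_reported: int) -> List[str]:
    """Devices that use a delta file although no snapshot is reported."""
    if snapshots_reported > 0:
        return []
    return sorted(disks_on_snapshots(vmx_text))


if __name__ == "__main__":
    vmx = ('displayName = "vm1"\n'
           'scsi0:0.fileName = "vm1-000001.vmdk"\n'
           'scsi0:1.fileName = "vm1_1.vmdk"\n')
    print(hidden_snapshot_suspects(vmx, snapshots_reported=0))   # ['scsi0:0']
```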

We cover this issue in opvizor:

Recognition

Of course, there are several ways to detect snapshots, from the Snapshot Manager in the vSphere Client, to community scripts like vCheck, all the way to commercial tools. However, only one tool on the market detects all of the problems listed above, including defective snapshots: opvizor. This Software-as-a-Service product detects hundreds of problems in VMware infrastructures, as well as all problematic types of snapshots, whether they are too old, growing too fast, or defective. The administrator or consultant sees the affected VMs directly and can tell where the snapshots are located. Beyond that, you receive information about the condition of the snapshots and how to handle them.

Illustration: opvizor display of all discovered potential problems

opvizor not only flags snapshots as a problem (issue), but can also produce a complete list of all snapshots in a report. This enables the consultant or administrator to identify the problems in the shortest possible time and to eliminate their cause.


Illustration: opvizor Snapshot Report

Eliminating Snapshot Problems

In most cases you can simply delete the snapshot in the Snapshot Manager. If no snapshot is visible but the snapshot still exists, there are several ways to solve the problem:

  1. New snapshot: definitely the easiest trick, because creating a new snapshot also creates a new snapshot description file. Then remove this new snapshot with "Delete All". With a little luck, VMware recognizes all existing snapshots and deletes them.
    Illustration: VMware vSphere Client Snapshot Manager for removing the snapshots
  2. vmkfstools: this option is significantly more complex and requires SSH access to the ESX host or access via the vCLI. First, determine the name of the disk file the VM currently uses, which you can find in the virtual machine's settings, e.g. vm1-000001.vmdk. Then switch off the virtual machine and clone this disk file: vmkfstools -i vm1-000001.vmdk vm1-neu.vmdk. This clones the complete current state of the disk, so the new file contains no snapshots. Important: you need enough free space on the datastore, because the disk is copied at its full size. Next, replace the old disk file with the new one. You can rename the new disk file(s) and delete the old one(s) graphically in the Datastore Browser of the vSphere Client. Before deleting the old files, start the VM and check that everything is up to date (e.g. in the Windows Event Viewer). Of course, this procedure must be repeated for every disk file of the VM.
  3. Snapshot Consolidator: VMware responded in vSphere 5 with the integrated Snapshot Consolidator, which detects and deletes snapshots that are no longer visible in the Snapshot Manager. In the past, these invisible snapshots caused trouble for many administrators, because datastores filled up for no apparent reason. Tools like icomasoft opvizor were able to locate these snapshots earlier by monitoring the disk files the VM actually uses rather than the Snapshot Manager.
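For VMs with several disks, the vmkfstools clone step has to be repeated per disk, which is easy to get wrong by hand. The sketch below only builds the command lines (it executes nothing) from a list of disk files the VM uses; the "-neu" target naming mirrors the example in the text and is otherwise arbitrary, and `shlex.join` requires Python 3.8 or later.

```python
import re
import shlex
from typing import List

# A disk that runs on a snapshot uses a delta descriptor such as vm1-000001.vmdk.
DELTA_RE = re.compile(r"-\d{6}\.vmdk$")


def clone_commands(disks_in_use: List[str], suffix: str = "-neu") -> List[str]:
    """Build one `vmkfstools -i <source> <target>` command per disk on a snapshot.

    Only builds strings; nothing is executed. Target naming
    (vm1-000001.vmdk -> vm1-neu.vmdk) follows the example in the text.
    """
    commands = []
    for src in disks_in_use:
        if not DELTA_RE.search(src):
            continue  # this disk has no snapshot, nothing to clone
        dst = DELTA_RE.sub(suffix + ".vmdk", src)
        commands.append(shlex.join(["vmkfstools", "-i", src, dst]))
    return commands


if __name__ == "__main__":
    # Hypothetical disk list, as read from the VM's settings.
    for cmd in clone_commands(["vm1-000001.vmdk", "vm1_1.vmdk", "vm1_2-000003.vmdk"]):
        print(cmd)
    # vmkfstools -i vm1-000001.vmdk vm1-neu.vmdk
    # vmkfstools -i vm1_2-000003.vmdk vm1_2-neu.vmdk
```

Review the generated commands before running them on the host, and keep the old files until the VM has been verified.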


Illustration: Snapshot Consolidator

Conclusion

Snapshots are a truly useful function in virtualization and enable actions and procedures that would be impossible in a purely physical world. However, you need to understand how snapshots work, and you should never regard them as a backup or archive, but rather as a short-term freeze of a working system state. Snapshots are therefore necessary for various migrations and backups, but only as "helpers."

opvizor Snapwatcher - Say Goodbye to Broken Snapshots

opvizor provides a solution called Snapwatcher, which can be downloaded from the opvizor website at http://try.opvizor.com/snapwatcher/

Especially in infrastructures with dozens, hundreds, or thousands of virtual machines, consistent review of the snapshots is essential. Because this would be far too laborious to do manually, tools like opvizor provide valuable help in such cases. If you are unsure about complex snapshot deletion processes, contact a consultant of your choice before you have to deal with data loss.

opvizor includes ten different rules for detecting VM snapshot-related errors and problems.
