Analyze the Deployment Planner report for VMware disaster recovery to Azure
The generated Microsoft Excel report contains the following sheets:
On-premises summary
The On-premises summary worksheet provides an overview of the profiled VMware environment.
Start Date and End Date: The start and end dates of the profiling data considered for report generation. By default, the start date is the date when profiling starts, and the end date is the date when profiling stops. If the report is generated with the StartDate and EndDate parameters, those values are used instead.
Total number of profiling days: The total number of days of profiling between the start and end dates for which the report is generated.
Number of compatible virtual machines: The total number of compatible virtual machines for which the required network bandwidth, required number of storage accounts, Azure cores, configuration servers and extra process servers are calculated.
Total number of disks across all compatible virtual machines: The number that's used as one of the inputs to decide the number of configuration servers and extra process servers to be used in the deployment.
Average number of disks per compatible virtual machine: The average number of disks calculated across all compatible virtual machines.
Average disk size (GB): The average disk size calculated across all compatible virtual machines.
Desired RPO (minutes): Either the default recovery point objective or the value passed for the DesiredRPO parameter at the time of report generation to estimate required bandwidth.
Desired bandwidth (Mbps): The value that you have passed for the Bandwidth parameter at the time of report generation to estimate achievable RPO.
Observed typical data churn per day (GB): The average data churn observed across all profiling days. This number is used as one of the inputs to decide the number of configuration servers and extra process servers to be used in the deployment.
Recommendations
The recommendations sheet of the VMware to Azure report has the following details as per the selected desired RPO:
Profiled data
Profiled data period: The period during which the profiling was run. By default, the tool includes all profiled data in the calculation, unless you generate the report for a specific period by using the StartDate and EndDate options.
Server Name: The name or IP address of the VMware vCenter or ESXi host for whose virtual machines the report is generated.
Desired RPO: The recovery point objective for your deployment. By default, the required network bandwidth is calculated for RPO values of 15, 30, and 60 minutes. Based on the selection, the affected values are updated on the sheet. If you have used the DesiredRPO parameter while generating the report, that value is shown in the Desired RPO result.
Profiling overview
Total Profiled Virtual Machines: The total number of virtual machines whose profiled data is available. If the VMListFile has names of any virtual machines that weren't profiled, the report generation excludes those virtual machines and doesn't count them among the total profiled virtual machines.
Compatible Virtual Machines: The number of virtual machines that can be protected to Azure by using Site Recovery. The calculation of required network bandwidth, storage accounts, Azure cores, configuration servers, and additional process servers is based on the total number of compatible virtual machines. The details of every compatible virtual machine are available in the Compatible virtual machines section.
Incompatible Virtual Machines: The number of profiled virtual machines that are incompatible for protection with Site Recovery. The reasons for incompatibility are noted in the Incompatible virtual machines section. If the VMListFile has names of any virtual machines that weren't profiled, those virtual machines are excluded from the incompatible virtual machines count. These virtual machines are listed as "Data not found" at the end of the Incompatible virtual machines section.
Desired RPO: Your desired recovery point objective, in minutes. The report is generated for three RPO values: 15 (default), 30, and 60 minutes. The bandwidth recommendation in the report is changed based on your selection in the Desired RPO drop-down list at the top right of the sheet. If you have generated the report by using the -DesiredRPO parameter with a custom value, this custom value will show as the default in the Desired RPO drop-down list.
Required network bandwidth (Mbps)
To meet RPO 100 percent of the time: The recommended bandwidth in Mbps to be allocated to meet your desired RPO 100 percent of the time. This amount of bandwidth must be dedicated for steady-state delta replication of all your compatible virtual machines to avoid any RPO violations.
To meet RPO 90 percent of the time: If broadband pricing or other factors prevent you from setting the necessary bandwidth to achieve your desired RPO 100 percent of the time, you can opt for a lower bandwidth setting that meets your desired RPO 90 percent of the time. To understand the implications of setting this lower bandwidth, the report provides a what-if analysis on the number and duration of RPO violations to expect.
Achieved Throughput: The throughput from the server on which you run the GetThroughput command to the Azure region where the storage account is located. This throughput number indicates the estimated level that you can achieve when you protect the compatible virtual machines by using Site Recovery, provided that your configuration server or process server storage and network characteristics remain the same as those of the server from which you run the tool.
For replication, you should set the recommended bandwidth to meet the RPO 100 percent of the time. After you set the bandwidth, if you don’t see any increase in the achieved throughput, as reported by the tool, do the following:
Check to see whether there is any network Quality of Service (QoS) that is limiting Site Recovery throughput.
Check to see whether your Site Recovery vault is in the nearest physically supported Azure region to minimize network latency.
Check your local storage characteristics to determine whether you can improve the hardware (for example, HDD to SSD).
Change the Site Recovery settings in the process server to increase the amount of network bandwidth used for replication.
If you're running the tool on a configuration server or process server that already has protected virtual machines, run the tool a few times. The achieved throughput number changes depending on the amount of churn being processed at that time.
For all enterprise Site Recovery deployments, we recommend that you use ExpressRoute.
Required storage accounts
The following chart shows the total number of storage accounts (standard and premium) that are required to protect all the compatible virtual machines. To learn which storage account to use for each virtual machine, see the VM-storage placement section. If you're using v2.5 of Deployment Planner, this recommendation only shows the number of standard cache storage accounts which are needed for replication since the data is being directly written to Managed Disks.
Required number of Azure cores
This result is the total number of cores to be set up before failover or test failover of all the compatible virtual machines. If too few cores are available in the subscription, Site Recovery fails to create virtual machines at the time of test failover or failover.
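As a rough illustration of the check described above, the following sketch compares the cores needed for failover with the cores available in a subscription. The function name, the per-VM core list, and the available-core figure are hypothetical inputs, not values produced by the tool.

```python
def core_shortfall(vm_cores, available_cores):
    """Return how many cores are missing to fail over every VM (0 if none)."""
    required = sum(vm_cores)
    return max(0, required - available_cores)


# Hypothetical inventory: cores of the Azure size mapped to each compatible VM.
print(core_shortfall([4, 8, 2], available_cores=10))  # 14 required, so 4 short
```

A nonzero result suggests that the subscription's core quota should be raised before a test failover or failover is attempted.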
Required on-premises infrastructure
This figure is the total number of configuration servers and extra process servers to be configured that would suffice to protect all the compatible virtual machines. Depending on the supported size recommendations for the configuration server, the tool might recommend extra servers. The recommendation is based on either the per-day churn or the maximum number of protected virtual machines (assuming an average of three disks per virtual machine), whichever limit is reached first on the configuration server or the extra process server. You'll find the details of total churn per day and total number of protected disks in the "On-premises summary" section.
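The "whichever limit is reached first" rule can be sketched as taking the larger of two server counts. This is only an approximation under stated assumptions: the 2 TB per-day churn limit per process server comes from the limits listed later in this article (taken here as 2048 GB), while the per-server virtual machine limit of 200 is a placeholder chosen for illustration and is not a documented value.

```python
import math


def servers_needed(total_churn_gb_per_day, vm_count,
                   churn_limit_gb=2048, vm_limit=200):
    """Estimate the server count from whichever limit is hit first.

    churn_limit_gb: 2 TB per day per process server, expressed in GB.
    vm_limit: placeholder per-server VM limit (an assumption, not documented).
    """
    by_churn = math.ceil(total_churn_gb_per_day / churn_limit_gb)
    by_vms = math.ceil(vm_count / vm_limit)
    return max(1, by_churn, by_vms)


print(servers_needed(5000, 150))  # churn dominates
print(servers_needed(1000, 450))  # VM count dominates
```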
What-if analysis
This analysis outlines how many violations could occur during the profiling period when you set a lower bandwidth for the desired RPO to be met only 90 percent of the time. One or more RPO violations can occur on any given day. The graph shows the peak RPO of the day. Based on this analysis, you can decide whether the number of RPO violations across all days and the peak RPO hit per day are acceptable with the specified lower bandwidth. If they are acceptable, you can allocate the lower bandwidth for replication. Otherwise, allocate the higher bandwidth as suggested to meet the desired RPO 100 percent of the time.
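To make the idea of counting RPO violations concrete, here is a minimal backlog simulation. It is a simplified model and not the tool's actual algorithm: it assumes per-minute churn samples in MB, a constant bandwidth, and it uses the time needed to drain the pending backlog as a stand-in for the RPO reached at that minute. All names are hypothetical.

```python
def simulate_rpo(churn_mb_per_min, bandwidth_mbps, desired_rpo_min):
    """Return (number of violation episodes, peak lag in minutes)."""
    capacity_mb_per_min = bandwidth_mbps / 8 * 60  # Mbps -> MB per minute
    backlog_mb = 0.0
    peak_lag_min = 0.0
    violations = 0
    in_violation = False
    for churn in churn_mb_per_min:
        # Data that could not be shipped this minute carries over.
        backlog_mb = max(0.0, backlog_mb + churn - capacity_mb_per_min)
        lag_min = backlog_mb / capacity_mb_per_min
        peak_lag_min = max(peak_lag_min, lag_min)
        if lag_min > desired_rpo_min:
            if not in_violation:
                violations += 1  # count each continuous episode once
            in_violation = True
        else:
            in_violation = False
    return violations, peak_lag_min


# Steady churn that matches the link, one burst, then a quiet period.
samples = [60] * 5 + [600] + [0] * 20
print(simulate_rpo(samples, bandwidth_mbps=8, desired_rpo_min=5))
```

Running the same samples with a higher bandwidth or a longer desired RPO shows how the violation count and the peak lag change, which is the trade-off the what-if graph is meant to expose.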
Recommended virtual machine batch size for initial replication
In this section, we recommend the number of virtual machines that can be protected in parallel to complete the initial replication within 72 hours, assuming that the suggested bandwidth to meet the desired RPO 100 percent of the time is set. This value is configurable. To change it at report-generation time, use the GoalToCompleteIR parameter.
The graph here shows a range of bandwidth values and a calculated virtual machine batch size count to complete initial replication in 72 hours, based on the average detected virtual machine size across all the compatible virtual machines.
In the public preview, the report does not specify which virtual machines should be included in a batch. You can use the disk size shown in the Compatible VMs section to find each virtual machine’s size and select them for a batch, or you can select the virtual machines based on known workload characteristics. The completion time of the initial replication changes proportionally, based on the actual virtual machine disk size, used disk space, and available network throughput.
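The relationship between bandwidth, average virtual machine size, and batch size can be approximated with simple arithmetic. The sketch below is an estimate under assumptions, not the tool's formula: it ignores compression, protocol overhead, and ongoing churn, and it treats 1 GB as 1024 MB. The function and parameter names are hypothetical.

```python
def ir_batch_size(bandwidth_mbps, avg_vm_size_gb, goal_hours=72):
    """Estimate how many average-sized VMs can finish initial replication in time."""
    mb_per_second = bandwidth_mbps / 8
    transferable_gb = mb_per_second * 3600 * goal_hours / 1024
    return int(transferable_gb // avg_vm_size_gb)


print(ir_batch_size(100, 200))                 # default 72-hour goal
print(ir_batch_size(100, 200, goal_hours=24))  # tighter goal, smaller batch
```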
Cost by components: The total DR cost is divided into four components: Compute, Storage, Network, and Azure Site Recovery license cost. The cost is calculated based on the consumption that is incurred during replication and at DR drill time for compute, storage (premium and standard), ExpressRoute/VPN that is configured between the on-premises site and Azure, and Azure Site Recovery license.
Cost by states: The total disaster recovery (DR) cost is categorized based on two different states: replication and DR drill.
Replication cost: The cost that is incurred during replication. It covers the cost of storage, network, and Azure Site Recovery license.
DR-Drill cost: The cost that is incurred during test failovers. Azure Site Recovery spins up virtual machines during test failover. The DR drill cost covers the running virtual machines’ compute and storage cost.
Azure storage cost per Month/Year: The total storage cost that is incurred for premium and standard storage for replication and DR drill.
Growth factor and percentile values used
This section at the bottom of the sheet shows the percentile value used for all the performance counters of the profiled virtual machines (default is 95th percentile), and the growth factor (default is 30 percent) that's used in all the calculations.
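As an illustration of how a percentile and a growth factor combine, the following sketch applies a nearest-rank 95th percentile to a list of samples and then scales the result by 30 percent. The nearest-rank method is an assumption made for this example; the report does not state which percentile method the tool uses.

```python
import math


def sized_value(samples, percentile=95, growth_factor=0.30):
    """Nearest-rank percentile of the samples, scaled by the growth factor."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1] * (1 + growth_factor)


# Hypothetical per-minute IOPS samples 1..100: the 95th percentile is 95.
print(sized_value(range(1, 101)))
```

Using a percentile rather than the absolute maximum keeps short-lived spikes from inflating the sizing, and the growth factor adds headroom for future workload increase.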
Recommendations with available bandwidth as input
You might have a situation where you know that you cannot set a bandwidth of more than x Mbps for Site Recovery replication. The tool allows you to input available bandwidth (using the -Bandwidth parameter during report generation) and get the achievable RPO in minutes. With this achievable RPO value, you can decide whether you need to set up extra bandwidth or you're OK with having a disaster recovery solution with this RPO.
Virtual machine-storage placement
Note
Deployment Planner v2.5 and later recommends the storage placement for machines that replicate directly to managed disks.
Replication Storage Type: Either a standard or premium managed disk, which is used to replicate all the corresponding virtual machines mentioned in the VMs to Place column.
Log Storage Account Type: All the replication logs are stored in a standard storage account.
Suggested Prefix for Storage Account: The suggested three-character prefix that can be used for naming the cache storage account. You can use your own prefix, but the tool's suggestion follows the partition naming convention for storage accounts.
Suggested Log Account Name: The storage-account name after you include the suggested prefix. Replace the name within the angle brackets (< and >) with your custom input.
Placement Summary: A summary of the disks needed to protect virtual machines by storage type. It includes the total number of virtual machines, total provisioned size across all disks, and total number of disks.
Virtual Machines to Place: A list of all the virtual machines that should be placed on the given storage account for optimal performance and use.
Compatible virtual machines
Virtual machine Name: The virtual machine name or IP address that's used in the VMListFile when a report is generated. This column also lists the disks (VMDKs) that are attached to the virtual machines. To distinguish vCenter virtual machines with duplicate names or IP addresses, the names include the ESXi host name. The listed ESXi host is the one where the virtual machine was placed when the tool discovered it during the profiling period.
Virtual machine Compatibility: Values are Yes and Yes*. Yes* is for instances in which the virtual machine is a fit for premium SSDs. Here, the profiled high-churn or Input/output operations per second (IOPS) disk fits in the P20 or P30 category, but the size of the disk causes it to be mapped down to a P10 or P20. The storage account decides which premium storage disk type to map a disk to, based on its size. For example:
- <128 GB is a P10.
- 128 GB to 256 GB is a P15.
- 256 GB to 512 GB is a P20.
- 512 GB to 1024 GB is a P30.
- 1025 GB to 2048 GB is a P40.
- 2049 GB to 4095 GB is a P50.
For example, if the workload characteristics of a disk put it in the P20 or P30 category, but the size maps it down to a lower premium storage disk type, the tool marks that virtual machine as Yes*. The tool also recommends that you either change the source disk size to fit into the recommended premium storage disk type or change the target disk type post-failover.
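The size-based mapping and the Yes* marking can be sketched as follows. Because the ranges in the list above overlap at their edges, this example assumes that a boundary value (for example, exactly 256 GB) belongs to the lower disk type. The function names and the way the workload category is passed in are hypothetical.

```python
import bisect

# Upper size bound in GB for each premium disk type, taken from the list above.
TIER_UPPER_GB = [128, 256, 512, 1024, 2048, 4095]
TIER_NAMES = ["P10", "P15", "P20", "P30", "P40", "P50"]


def tier_for_size(disk_size_gb):
    """Map a disk size to a premium disk type (boundary goes to the lower type)."""
    if disk_size_gb > TIER_UPPER_GB[-1]:
        raise ValueError("disk is larger than the biggest listed disk type")
    return TIER_NAMES[bisect.bisect_left(TIER_UPPER_GB, disk_size_gb)]


def compatibility(disk_size_gb, workload_tier):
    """Return 'Yes*' when the size maps the disk below its workload category."""
    size_tier = tier_for_size(disk_size_gb)
    if TIER_NAMES.index(workload_tier) > TIER_NAMES.index(size_tier):
        return "Yes*"
    return "Yes"


print(tier_for_size(300))          # size alone selects the disk type
print(compatibility(100, "P30"))   # high-churn disk that is small in size
print(compatibility(600, "P20"))   # size already covers the workload
```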
Storage Type: Standard or premium.
Asrseeddisk (Managed Disk) created for replication: The name of the disk that is created when you enable replication. It stores the data and its snapshots in Azure.
Peak R/W IOPS (with Growth Factor): The peak workload read/write IOPS on the disk (default is 95th percentile), including the future growth factor (default is 30 percent). The total read/write IOPS of a virtual machine isn't always the sum of the virtual machine’s individual disks’ read/write IOPS, because the peak read/write IOPS of the virtual machine is the peak of the sum of its individual disks' read/write IOPS during every minute of the profiling period.
Peak Data Churn in Mbps (with Growth Factor): The peak churn rate on the disk (default is 95th percentile), including the future growth factor (default is 30 percent). The total data churn of the virtual machine isn't always the sum of the virtual machine’s individual disks’ data churn, because the peak data churn of the virtual machine is the peak of the sum of its individual disks' churn during every minute of the profiling period.
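A small example shows why the virtual machine peak can be lower than the sum of its disks' peaks: the disks rarely peak in the same minute. For simplicity, this sketch uses the plain maximum instead of the 95th percentile and omits the growth factor. The sample values are made up.

```python
def vm_peak(per_disk_samples):
    """Peak of the per-minute sum across all disks of one VM."""
    return max(sum(minute) for minute in zip(*per_disk_samples))


# Hypothetical per-minute IOPS for two disks over three minutes.
disk_a = [100, 500, 200]
disk_b = [400, 100, 300]

sum_of_disk_peaks = max(disk_a) + max(disk_b)
print(sum_of_disk_peaks)          # 900: each disk's own peak, added together
print(vm_peak([disk_a, disk_b]))  # 600: highest combined load in any one minute
```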
Azure virtual machine Size: The ideal mapped Azure Cloud Services virtual machine size for this on-premises virtual machine. The mapping is based on the on-premises virtual machine’s memory, number of disks/cores/NICs, and read/write IOPS. The recommendation is always the lowest Azure virtual machine size that matches all of the on-premises virtual machine characteristics.
Number of Disks: The total number of virtual machine disks (VMDKs) on the virtual machine.
Disk size (GB): The total setup size of all disks of the virtual machine. The tool also shows the disk size for the individual disks in the virtual machine.
Cores: The number of CPU cores on the virtual machine.
Memory (MB): The RAM on the virtual machine.
NICs: The number of NICs on the virtual machine.
Boot Type: Boot type of the virtual machine. It can be either BIOS or EFI. Currently Azure Site Recovery supports Windows Server EFI virtual machines (Windows Server 2012, 2012 R2 and 2016) provided the number of partitions in the boot disk is less than 4 and boot sector size is 512 bytes. To protect EFI virtual machines, Azure Site Recovery mobility service version must be 9.13 or later. Only failover is supported for EFI virtual machines. Failback isn't supported.
OS Type: The OS type of the virtual machine. It can be Windows, Linux, or other, based on the template chosen in VMware vSphere when the virtual machine was created.
Incompatible virtual machines
Virtual machine Name: The virtual machine name or IP address that's used in the VMListFile when a report is generated. This column also lists the VMDKs that are attached to the virtual machines. To distinguish vCenter virtual machines with duplicate names or IP addresses, the names include the ESXi host name. The listed ESXi host is the one where the virtual machine was placed when the tool discovered it during the profiling period.
Virtual machine Compatibility: Indicates why the given virtual machine is incompatible for use with Site Recovery. The reasons are described for each incompatible disk of the virtual machine and, based on published storage limits, can be any of the following:
Wrong data disk size or wrong OS disk size. Review the support limits.
Total virtual machine size (replication + TFO) exceeds the supported storage-account size limit (35 TB). This incompatibility usually occurs when a single disk in the virtual machine has a performance characteristic that exceeds the maximum supported Azure or Site Recovery limits for standard storage. Such an instance pushes the virtual machine into the premium storage zone. However, the maximum supported size of a premium storage account is 35 TB, and a single protected virtual machine cannot be protected across multiple storage accounts. Also note that when a test failover is executed on a protected virtual machine, it runs in the same storage account where replication is progressing. In this instance, set up 2x the size of the disk for replication to progress and test failover to succeed in parallel.
Source IOPS exceeds supported storage IOPS limit of 7500 per disk.
Source IOPS exceeds supported storage IOPS limit of 80,000 per virtual machine.
Average data churn exceeds supported Site Recovery data churn limit of 20 MB/s for average I/O size for the disk.
Peak data churn across all disks on the virtual machine exceeds the maximum supported Site Recovery peak data churn limit of 54 MB/s per virtual machine.
Average effective write IOPS exceeds the supported Site Recovery IOPS limit of 840 for disk.
Calculated snapshot storage exceeds the supported snapshot storage limit of 10 TB.
Total data churn per day exceeds supported churn per day limit of 2 TB by a Process Server.
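Several of the reasons above are simple threshold checks, which the following sketch illustrates for four of them. It is not the tool's implementation: the function signature and message wording are hypothetical, and the virtual machine totals are passed in separately because they are not always the sum of the per-disk values.

```python
PER_DISK_IOPS_LIMIT = 7500
PER_VM_IOPS_LIMIT = 80000
PER_DISK_AVG_CHURN_MBS = 20
PER_VM_PEAK_CHURN_MBS = 54


def incompatibility_reasons(disk_iops, disk_avg_churn_mbs, vm_iops, vm_peak_churn_mbs):
    """Collect the reasons a VM would exceed the limits listed above."""
    reasons = []
    for index, iops in enumerate(disk_iops):
        if iops > PER_DISK_IOPS_LIMIT:
            reasons.append(f"disk {index}: source IOPS exceeds {PER_DISK_IOPS_LIMIT}")
    for index, churn in enumerate(disk_avg_churn_mbs):
        if churn > PER_DISK_AVG_CHURN_MBS:
            reasons.append(f"disk {index}: average churn exceeds {PER_DISK_AVG_CHURN_MBS} MB/s")
    if vm_iops > PER_VM_IOPS_LIMIT:
        reasons.append(f"VM: source IOPS exceeds {PER_VM_IOPS_LIMIT}")
    if vm_peak_churn_mbs > PER_VM_PEAK_CHURN_MBS:
        reasons.append(f"VM: peak churn exceeds {PER_VM_PEAK_CHURN_MBS} MB/s")
    return reasons


# Hypothetical two-disk VM: disk 0 is IOPS-heavy, disk 1 is churn-heavy.
for reason in incompatibility_reasons([8000, 100], [1, 25], 8100, 60):
    print(reason)
```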
Peak R/W IOPS (with Growth Factor): The peak workload IOPS on the disk (default is 95th percentile), including the future growth factor (default is 30 percent). The total read/write IOPS of the virtual machine isn't always the sum of the virtual machine’s individual disks’ read/write IOPS, because the peak read/write IOPS of the virtual machine is the peak of the sum of its individual disks' read/write IOPS during every minute of the profiling period.
Peak Data Churn in Mbps (with Growth Factor): The peak churn rate on the disk (default 95th percentile) including the future growth factor (default 30 percent). The total data churn of the virtual machine isn't always the sum of the virtual machine’s individual disks’ data churn, because the peak data churn of the virtual machine is the peak of the sum of its individual disks' churn during every minute of the profiling period.
Number of Disks: The total number of VMDKs on the virtual machine.
Disk size (GB): The total setup size of all disks of the virtual machine. The tool also shows the disk size for the individual disks in the virtual machine.
Cores: The number of CPU cores on the virtual machine.
Memory (MB): The amount of RAM on the virtual machine.
NICs: The number of NICs on the virtual machine.
Boot Type: Boot type of the virtual machine. It can be either BIOS or EFI. Currently Azure Site Recovery supports Windows Server EFI virtual machines (Windows Server 2012, 2012 R2 and 2016) provided the number of partitions in the boot disk is less than 4 and boot sector size is 512 bytes. To protect EFI virtual machines, Azure Site Recovery mobility service version must be 9.13 or above. Only failover is supported for EFI virtual machines. Failback isn't supported.
OS Type: The OS type of the virtual machine. It can be Windows, Linux, or other, based on the template chosen in VMware vSphere when the virtual machine was created.
Azure Site Recovery limits
The following table provides the Azure Site Recovery limits. These limits are based on our tests, but they cannot cover all possible application I/O combinations. Actual results can vary based on your application I/O mix. For best results, even after deployment planning, we always recommend that you perform extensive application testing by issuing a test failover to get the true performance picture of the application.
Replication storage target | Average source disk I/O size | Average source disk data churn | Total source disk data churn per day |
---|---|---|---|
Standard storage | 8 KB | 2 MB/s | 168 GB per disk |
Premium P10 or P15 disk | 8 KB | 2 MB/s | 168 GB per disk |
Premium P10 or P15 disk | 16 KB | 4 MB/s | 336 GB per disk |
Premium P10 or P15 disk | 32 KB or greater | 8 MB/s | 672 GB per disk |
Premium P20 or P30 or P40 or P50 disk | 8 KB | 5 MB/s | 421 GB per disk |
Premium P20 or P30 or P40 or P50 disk | 16 KB or greater | 20 MB/s | 1684 GB per disk |
Source data churn | Maximum Limit |
---|---|
Peak data churn across all disks on a virtual machine | 54 MB/s |
Maximum data churn per day supported by a Process Server | 2 TB |
These are average numbers assuming a 30 percent I/O overlap. Site Recovery is capable of handling higher throughput based on overlap ratio, larger write sizes, and actual workload I/O behavior. The preceding numbers assume a typical backlog of approximately five minutes. That is, after data is uploaded, it's processed and a recovery point is created within five minutes.
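The first limits table can be read as a lookup keyed by replication target and average I/O size. The sketch below encodes the table rows as they appear above. How I/O sizes that fall between two rows are treated is an assumption: this example uses the nearest lower row, and sizes below 8 KB fall back to the 8 KB row. The key names are hypothetical.

```python
# (minimum average I/O size in KB, churn limit in MB/s, churn limit in GB per day)
CHURN_LIMITS = {
    "standard": [(8, 2, 168)],
    "premium_p10_p15": [(8, 2, 168), (16, 4, 336), (32, 8, 672)],
    "premium_p20_to_p50": [(8, 5, 421), (16, 20, 1684)],
}


def churn_limit(target, io_size_kb):
    """Return (MB/s, GB per day) for the table row that applies to this I/O size."""
    rows = CHURN_LIMITS[target]
    chosen = rows[0]
    for row in rows:
        if io_size_kb >= row[0]:
            chosen = row  # rows are ordered by I/O size, so keep the last match
    return chosen[1], chosen[2]


print(churn_limit("premium_p10_p15", 16))
print(churn_limit("premium_p20_to_p50", 64))
```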