Deploying Highly Available vMX in Azure
By Mitchell Gulledge
Overview
This document describes a detailed architecture for deploying highly available VPN concentrators (vMXs) inside Microsoft Azure. It contains a reference architecture along with an explanation of how failover is achieved in Azure. This reference architecture is highly scalable and is achievable today as a generally available (GA) deployment.
Important: This article is meant to be used as a reference only. The goal is to explain, from a high-level perspective, how an Azure Function could provide HA. Cisco Meraki is not responsible for the GitHub project files and Microsoft links mentioned in this article. Customers must create their own scripts. GitHub project files and Microsoft links are not supported or updated by Cisco Meraki.
Please note that unlike the MX Warm Spare - High Availability Pair, a license is required for each vMX, as they are deployed in separate dashboard networks.
Reference Architecture
In the reference architecture below, a Cisco Meraki vMX has been deployed in each of two separate VNETs, with each vMX placed in a different Availability Zone. VNET peering has been configured from each SD-WAN VNET (where a vMX is deployed) to the respective VNETs where the Azure resources are hosted. This peering can be fully automated as new VNETs are deployed. Because the two vMXs reside in different Availability Zones, this architecture provides hardware redundancy as well as software redundancy.
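Automating peering for a newly deployed VNET largely comes down to working out which peerings to create. The following is a minimal Python sketch, assuming every workload VNET should be peered with both SD-WAN VNETs. The function name and VNET names are hypothetical, and the actual creation call (for example, through the Virtual Network Peerings REST API linked at the end of this article) is deliberately left out. Azure VNET peering is configured on both VNETs of a pair, so each pair produces two links.

```python
from itertools import product


def plan_peerings(sdwan_vnets, workload_vnets):
    """Return (local_vnet, remote_vnet) peering links so that every SD-WAN VNET
    is peered with every workload VNET.

    Azure VNET peering is configured on both VNETs, so each pair yields two
    links: one created on each side.
    """
    links = []
    for sdwan, workload in product(sdwan_vnets, workload_vnets):
        links.append((sdwan, workload))  # peering created on the SD-WAN VNET
        links.append((workload, sdwan))  # matching peering on the workload VNET
    return links


# Hypothetical VNET names, for illustration only.
sdwan = ["vnet-sdwan-zone1", "vnet-sdwan-zone2"]
for local, remote in plan_peerings(sdwan, ["vnet-app-new"]):
    print(f"create peering on {local} -> {remote}")
```

Passing only the new workload VNET, as in the usage lines, limits the plan to the peerings that VNET still needs rather than recomputing the whole mesh.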
For deploying vMX Network Virtual Appliances from the Azure marketplace, please reference our vMX Setup Guide for Microsoft Azure.
To provide high availability for vMXs in Azure, Azure Functions can be used to automate failover between a primary and a standby vMX. User-defined routes (UDRs) override the Azure default system routes by directing traffic to the active vMX in the active-passive pair. If the active vMX fails, the function updates the Azure route table so that the next hop points to the secondary vMX.
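The next-hop override can be pictured as a rewrite of the route table entries. This is a minimal sketch in plain Python rather than a call into the Azure SDK: `Route` and `redirect_next_hop` are hypothetical names, the IP addresses are examples, and only routes whose next hop type is `VirtualAppliance` (the UDR next hop type used for a network virtual appliance such as a vMX) are considered.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Route:
    address_prefix: str          # destination prefix, e.g. a branch subnet
    next_hop_type: str           # "VirtualAppliance" for routes that go through a vMX
    next_hop_ip: Optional[str]   # private IP of the vMX interface, if any


def redirect_next_hop(routes: List[Route], failed_ip: str, standby_ip: str) -> List[Route]:
    """Return a copy of the route table in which every route whose next hop is
    the failed vMX points at the standby vMX instead. Other routes are unchanged."""
    return [
        replace(route, next_hop_ip=standby_ip)
        if route.next_hop_type == "VirtualAppliance" and route.next_hop_ip == failed_ip
        else route
        for route in routes
    ]


# Example addresses only.
table = [
    Route("10.10.0.0/16", "VirtualAppliance", "10.0.1.4"),  # branch traffic via primary vMX
    Route("0.0.0.0/0", "Internet", None),                   # unrelated route, left alone
]
for route in redirect_next_hop(table, failed_ip="10.0.1.4", standby_ip="10.0.2.4"):
    print(route)
```

Returning a new list instead of mutating the existing one keeps the previous table available, which makes it easy to compare before and after when logging a failover.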
The vMXs and Azure functions must be within the same Azure subscription and region. The vMXs must also be deployed using their own resource group.
Deploying each vMX in a different Availability Zone is recommended to reduce the likelihood that both vMXs run on the same underlying hardware.
It is recommended that the vMXs be deployed in separate SD-WAN subnets.
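The placement requirements and recommendations above can be expressed as a pre-deployment check. This is a hypothetical sketch: `VmxPlacement` and `check_ha_pair` are illustrative names, and the resource group check assumes both vMXs share the single resource group that the function is given. It cannot verify that the resource group contains only the vMXs.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VmxPlacement:
    name: str
    subscription: str
    region: str
    resource_group: str
    zone: str
    subnet: str


def check_ha_pair(primary: VmxPlacement, secondary: VmxPlacement,
                  function_subscription: str, function_region: str) -> List[str]:
    """Return a list of problems; an empty list means the pair passes."""
    problems = []
    # Requirement: vMXs and the Azure function share a subscription and region.
    for vmx in (primary, secondary):
        if vmx.subscription != function_subscription:
            problems.append(f"{vmx.name}: not in the function app's subscription")
        if vmx.region != function_region:
            problems.append(f"{vmx.name}: not in the function app's region")
    # Assumption: the function is given one resource group name for both vMXs.
    if primary.resource_group != secondary.resource_group:
        problems.append("vMXs are in different resource groups")
    # Recommendations: different Availability Zones and separate SD-WAN subnets.
    if primary.zone == secondary.zone:
        problems.append("vMXs share an availability zone")
    if primary.subnet == secondary.subnet:
        problems.append("vMXs share an SD-WAN subnet")
    return problems


# Hypothetical names and values.
primary = VmxPlacement("vmx-primary", "sub-1", "eastus", "rg-vmx", "1", "snet-sdwan-1")
secondary = VmxPlacement("vmx-secondary", "sub-1", "eastus", "rg-vmx", "2", "snet-sdwan-2")
print(check_ha_pair(primary, secondary, "sub-1", "eastus"))  # → []
```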
This solution uses two Azure virtual machines to host the vMXs in an active-passive configuration.
Failover of the UDR table entries is automated by the Azure function app. The function changes the next-hop address in the Azure gateway route table to the IP address of the active vMX's interface. The function app must be in the same Azure subscription as the vMXs. It monitors the state of the vMXs and triggers a user-defined route override to facilitate failover. During the initial setup of the function app, a probe interval for checking VM liveness is specified. By default, this timer trigger runs the function app every 30 seconds; per Microsoft, shortening this interval is not recommended.
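Each timer run reduces to one decision: which vMX the UDRs should point at, given the latest health results. The sketch below is a hypothetical policy, not the behavior of any particular script. It keeps the current active vMX while it is healthy, fails over only when the active vMX is down and the standby is up, and leaves routes untouched when both are down. `choose_active` and `simulate` are illustrative names.

```python
from typing import Iterable, List, Tuple


def choose_active(active: str, standby: str, active_up: bool, standby_up: bool) -> str:
    """Decide which vMX the UDRs should point at after one probe cycle."""
    if active_up:
        return active   # active vMX healthy: no route change
    if standby_up:
        return standby  # active down, standby healthy: fail over
    return active       # both down: nothing better to point at, leave routes as-is


def simulate(primary: str, secondary: str,
             cycles: Iterable[Tuple[bool, bool]]) -> List[str]:
    """Replay (primary_up, secondary_up) results, one per timer run,
    and return the active vMX after each run."""
    active, history = primary, []
    for primary_up, secondary_up in cycles:
        standby = secondary if active == primary else primary
        active_up = primary_up if active == primary else secondary_up
        standby_up = secondary_up if active == primary else primary_up
        active = choose_active(active, standby, active_up, standby_up)
        history.append(active)
    return history


runs = [(True, True), (False, True), (True, True), (True, False)]
print(simulate("vmx-primary", "vmx-secondary", runs))
```

Under this policy, traffic stays on the secondary after the primary recovers, which avoids flapping routes back and forth; an implementation could instead fail back to the primary as soon as it is healthy. In an Azure Functions timer trigger, the default 30-second interval corresponds to the six-field NCRONTAB expression `*/30 * * * * *`.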
Below are some of the variables in the function that are needed to facilitate high availability:
Variable | Value or description
Primary vMX Name | Name of the virtual machine hosting the primary vMX
Secondary vMX Name | Name of the virtual machine hosting the failover vMX
vMX Resource Group Name | Name of the resource group containing the vMXs
vMX UDR Tag | Resource tag value that identifies the route tables the function updates
vMX Probe Retries | 3 (three retries when checking vMX health before returning “Down” status)
vMX Delay | 2 (two seconds between retries)
vMX MONITOR | vMX Status (the health-check method used)
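The variables above can be gathered into a settings object, with the retry and delay values driving a health probe. This is a minimal sketch: `HaSettings` and `probe_with_retries` are hypothetical names, `check` stands in for whatever health test the function performs, and the retry count is assumed to be the total number of health checks made before returning “Down”.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class HaSettings:
    primary_vmx_name: str         # Primary vMX Name
    secondary_vmx_name: str       # Secondary vMX Name
    resource_group: str           # vMX Resource Group Name
    udr_tag: str                  # vMX UDR Tag
    probe_retries: int = 3        # vMX Probe Retries
    delay_seconds: float = 2      # vMX Delay
    monitor: str = "vMX Status"   # vMX MONITOR


def probe_with_retries(check: Callable[[], bool], retries: int, delay: float,
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Return "Up" as soon as check() succeeds; return "Down" after `retries`
    failed checks, waiting `delay` seconds between consecutive checks."""
    for attempt in range(retries):
        if check():
            return "Up"
        if attempt < retries - 1:
            sleep(delay)  # no wait after the final failed check
    return "Down"


# Hypothetical names; the lambda stands in for a real health check.
settings = HaSettings("vmx-primary", "vmx-secondary", "rg-vmx-ha", "vmx-ha-udr")
status = probe_with_retries(lambda: True, settings.probe_retries, settings.delay_seconds)
print(status)  # → Up
```

With the defaults of three checks and two seconds between them, a vMX that never responds is reported “Down” after about four seconds of waiting, well within the 30-second timer interval. The `sleep` parameter exists so the probe can be exercised without real delays.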
For an example of implementing Azure Functions to support highly available vMXs, please reference:
https://github.com/Azure/ha-nva-fo
For more information regarding Azure Functions, please reference:
https://docs.microsoft.com/en-us/azure/azure-functions/
For more information on configuring VNET peering, please reference:
https://docs.microsoft.com/en-us/azure/virtual-network/virtual-network-peering-overview
For more information on automating VNET peering, please reference:
https://docs.microsoft.com/en-us/rest/api/virtualnetwork/virtualnetworkpeerings
The links above are meant to be used as a reference and are not supported or updated by the Cisco Meraki support team. Please direct any questions about troubleshooting or deployments that use these articles or scripts to their authors, not to the Cisco Meraki support team.