Cisco Meraki

Deploying Highly Available vMX in Azure

By Mitchell Gulledge

Overview

This document describes a reference architecture for deploying highly available VPN concentrators in Microsoft Azure, along with a detailed explanation of how failover is achieved in Azure. The architecture is highly scalable and can be deployed today using generally available (GA) functionality.

Reference Architecture

In the reference architecture below, a Cisco Meraki vMX has been deployed in each of two separate VNETs that reside in two different regions. VNET peering has been configured from each SD-WAN VNET (where a vMX is deployed) to the respective VNETs where the Azure resources are hosted. This peering can be fully automated when new VNETs are deployed. Because the architecture spans regions, it provides both software and hardware redundancy.

 

For deploying vMX Network Virtual Appliances from the Azure marketplace, please reference our vMX Setup Guide for Microsoft Azure.

To provide high availability for vMXs in Azure, Azure Functions can be used to automate failover between a primary and a standby vMX. User-defined routes (UDRs) override the Azure default system routes and direct traffic to the active vMX in an active-passive pair. If the active vMX fails, the next hop in the Azure route table is changed to the secondary vMX.
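The next-hop change described above can be illustrated with a small Python sketch. It models a route table as a list of plain dictionaries and makes no Azure API calls. The function name `fail_over_routes`, the dictionary keys, and the IP addresses are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List

Route = Dict[str, str]


def fail_over_routes(routes: List[Route], failed_ip: str, standby_ip: str) -> List[Route]:
    """Return a copy of `routes` in which every next hop equal to `failed_ip`
    is replaced by `standby_ip`. All other routes are left unchanged."""
    updated = []
    for route in routes:
        new_route = dict(route)  # copy so the caller's table is not mutated
        if new_route.get("next_hop_ip") == failed_ip:
            new_route["next_hop_ip"] = standby_ip
        updated.append(new_route)
    return updated


if __name__ == "__main__":
    # Hypothetical addresses: 10.0.0.4 is the primary vMX, 10.1.0.4 the standby.
    table = [
        {"name": "to-branches", "prefix": "192.168.0.0/16", "next_hop_ip": "10.0.0.4"},
        {"name": "to-internet", "prefix": "0.0.0.0/0", "next_hop_ip": "10.0.0.1"},
    ]
    for route in fail_over_routes(table, failed_ip="10.0.0.4", standby_ip="10.1.0.4"):
        print(route["name"], "->", route["next_hop_ip"])
```

Only routes that pointed at the failed vMX are rewritten, so unrelated entries in the same route table keep their next hops.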

This solution uses two Azure virtual machines to host the vMXs in an active-passive configuration: 

 

The Azure function app automates the failover of UDR table entries. It monitors the state of the vMXs and triggers a user-defined route override to perform the failover: the next-hop address in the Azure gateway route table is changed to the interface IP address of the active vMX. The function app must be in the same Azure subscription that contains the vMXs. During the initial setup of the function app, a probe interval for checking VM liveness is specified. By default, the timer trigger runs the function app every 30 seconds. Per Azure, shortening this interval is not recommended.
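The monitoring step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the function app's actual code. The `probe` callable stands in for whatever health check is used. "Three retries" is interpreted here as three attempts in total. Preferring the primary vMX whenever it is healthy is an assumption of this sketch. The names `vmx_is_up` and `select_next_hop` are hypothetical.

```python
import time
from typing import Callable


def vmx_is_up(
    probe: Callable[[], bool],
    retries: int = 3,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Report "up" as soon as one probe succeeds; report "down" only after
    `retries` consecutive failed probes, pausing between attempts."""
    for attempt in range(retries):
        if probe():
            return True
        if attempt < retries - 1:  # no pause after the final attempt
            sleep(delay_seconds)
    return False


def select_next_hop(
    primary_up: bool,
    secondary_up: bool,
    current_ip: str,
    primary_ip: str,
    secondary_ip: str,
) -> str:
    """Pick the next-hop IP for the route table after one monitoring run."""
    if primary_up:
        return primary_ip    # assumption: use (or fail back to) a healthy primary
    if secondary_up:
        return secondary_ip  # primary down, standby healthy: fail over
    return current_ip        # both down: leave the routes untouched


if __name__ == "__main__":
    # Simulated run: the primary never answers, the standby answers at once.
    primary_up = vmx_is_up(lambda: False, sleep=lambda _seconds: None)
    secondary_up = vmx_is_up(lambda: True)
    print(select_next_hop(primary_up, secondary_up, "10.0.0.4", "10.0.0.4", "10.1.0.4"))
```

Passing `sleep` as a parameter keeps the retry logic testable without real delays. In a timer-triggered function, a run like this would execute once per probe interval.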

Below are some of the variables in the function that are needed to facilitate high availability: 

 

Primary vMX Name: name of the virtual machine hosting the primary vMX

Secondary vMX Name: name of the virtual machine hosting the failover vMX

vMX Resource Group Name: name of the resource group containing the vMXs

vMX UDR Tag: resource tag value

vMX Probe Retries: 3 (enables three retries for checking vMX health before returning a “Down” status)

vMX Delay: 2 (enables two seconds between retries)

vMX Monitor: vMX Status
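As an illustration of how the variables listed above might be gathered in one place, the following Python sketch reads them from a mapping such as the process environment. The setting keys (`VMX_PRIMARY_NAME` and so on), the `FailoverSettings` class, and `load_settings` are hypothetical names invented for this example. They are not the application setting names used by the function app, so consult the linked repository for the real ones.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class FailoverSettings:
    primary_vmx_name: str
    secondary_vmx_name: str
    resource_group_name: str
    udr_tag: str
    probe_retries: int = 3       # health checks before reporting "Down"
    delay_seconds: int = 2       # pause between retries
    monitor: str = "vMX Status"  # what is monitored


def load_settings(env: Mapping[str, str] = os.environ) -> FailoverSettings:
    """Build settings from a mapping; the key names are hypothetical."""
    required = [
        "VMX_PRIMARY_NAME",
        "VMX_SECONDARY_NAME",
        "VMX_RESOURCE_GROUP",
        "VMX_UDR_TAG",
    ]
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise ValueError("missing settings: " + ", ".join(missing))
    return FailoverSettings(
        primary_vmx_name=env["VMX_PRIMARY_NAME"],
        secondary_vmx_name=env["VMX_SECONDARY_NAME"],
        resource_group_name=env["VMX_RESOURCE_GROUP"],
        udr_tag=env["VMX_UDR_TAG"],
        probe_retries=int(env.get("VMX_PROBE_RETRIES", "3")),
        delay_seconds=int(env.get("VMX_DELAY", "2")),
        monitor=env.get("VMX_MONITOR", "vMX Status"),
    )


if __name__ == "__main__":
    example = {
        "VMX_PRIMARY_NAME": "vmx-primary",
        "VMX_SECONDARY_NAME": "vmx-secondary",
        "VMX_RESOURCE_GROUP": "sdwan-rg",
        "VMX_UDR_TAG": "vmx-ha",
    }
    print(load_settings(example))
```

Failing fast on missing names surfaces a misconfiguration at startup rather than during a failover.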

For implementing Azure Functions to support highly available vMXs, please reference:

https://github.com/Azure/ha-nva-fo 

For more information regarding Azure Functions, please reference: 

https://docs.microsoft.com/en-us/azure/azure-functions/ 

For more information on configuring VNET peering, please reference: 

https://docs.microsoft.com/en-us/azure/virtual-network/virtual-network-peering-overview 

For more information on automating VNET peering, please reference: 

https://docs.microsoft.com/en-us/rest/api/virtualnetwork/virtualnetworkpeerings