Cisco Meraki

Deploying Highly Available vMX100s in Azure

By Mitchell Gulledge

Overview

This document describes a reference architecture for deploying highly available VPN concentrators in Microsoft Azure and explains in detail how failover is achieved. The architecture is highly scalable and is generally available (GA) today.

Reference Architecture

In the reference architecture below, a Cisco Meraki vMX100 is deployed in each of two separate VNETs that reside in two different regions. Each SD-WAN VNET (where a vMX100 is deployed) is peered with the VNETs that host the Azure resources. This peering can be fully automated when new VNETs are deployed. Because the architecture spans regions, it provides both software and hardware redundancy.
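As a rough illustration of how the peering step could be automated, the sketch below only builds the Azure Resource Manager request for creating one side of a peering; it does not send it or handle authentication. The helper names (vnet_id, build_peering_request), the example subscription, resource group and VNET names, and the api-version value are all assumptions made for this example, so confirm the current API version and property set against the Virtual Network Peerings REST reference before relying on it.

```python
import json

ARM_ENDPOINT = "https://management.azure.com"


def vnet_id(subscription_id: str, resource_group: str, vnet_name: str) -> str:
    """Build the ARM resource ID of a virtual network."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Network/virtualNetworks/{vnet_name}"
    )


def build_peering_request(
    subscription_id: str,
    resource_group: str,
    vnet_name: str,
    peering_name: str,
    remote_vnet_id: str,
    api_version: str,
) -> dict:
    """Describe the PUT request that creates one direction of a VNET peering."""
    url = (
        f"{ARM_ENDPOINT}{vnet_id(subscription_id, resource_group, vnet_name)}"
        f"/virtualNetworkPeerings/{peering_name}?api-version={api_version}"
    )
    body = {
        "properties": {
            "allowVirtualNetworkAccess": True,
            # Needed so traffic forwarded by the vMX100 is accepted by the peer.
            "allowForwardedTraffic": True,
            "allowGatewayTransit": False,
            "useRemoteGateways": False,
            "remoteVirtualNetwork": {"id": remote_vnet_id},
        }
    }
    return {"method": "PUT", "url": url, "body": body}


# Hypothetical names; the api-version is an example value, not a recommendation.
request = build_peering_request(
    "sub-123", "rg-sdwan", "vnet-sdwan", "sdwan-to-app",
    vnet_id("sub-123", "rg-app", "vnet-app"), "2023-09-01",
)
print(request["method"], request["url"])
print(json.dumps(request["body"], indent=2))
```

A peering must be created in both directions, so the same helper would be called a second time with the local and remote VNETs swapped.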

 

To deploy vMX100 network virtual appliances from the Azure Marketplace, please refer to our vMX100 Setup Guide for Microsoft Azure.

To provide high availability for vMX100s in Azure, Azure Functions can be used to automate failover between a primary and a standby vMX100. User-defined routes (UDRs) override the Azure default system routes and direct traffic to the active vMX100 in an active-passive pair. If the active vMX100 fails, the function updates the route table so that the next hop points to the secondary vMX100.
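The next-hop change can be pictured as a small transformation on the route table. The sketch below works on an in-memory model only and does not call Azure; the Route class, the fail_over_routes function, the route names, and the IP addresses are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Route:
    name: str
    address_prefix: str
    next_hop_type: str  # e.g. "VirtualAppliance" or "Internet"
    next_hop_ip: str    # empty when the hop type does not take an IP


def fail_over_routes(routes: List[Route], failed_ip: str, standby_ip: str) -> List[Route]:
    """Return a new route list with next hops moved off the failed appliance."""
    updated = []
    for route in routes:
        if route.next_hop_type == "VirtualAppliance" and route.next_hop_ip == failed_ip:
            # Only routes that pointed at the failed vMX100 are rewritten.
            route = replace(route, next_hop_ip=standby_ip)
        updated.append(route)
    return updated


# Hypothetical addresses: 192.168.10.4 is the primary, 192.168.20.4 the standby.
routes = [
    Route("to-branches", "10.0.0.0/8", "VirtualAppliance", "192.168.10.4"),
    Route("to-internet", "0.0.0.0/0", "Internet", ""),
]
new_routes = fail_over_routes(routes, "192.168.10.4", "192.168.20.4")
for route in new_routes:
    print(route.name, route.next_hop_type, route.next_hop_ip)
```

Routes that do not point at the failed appliance are left untouched, which keeps the change limited to the traffic that actually depended on it.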

This solution uses two Azure virtual machines to host the vMX100s in an active-passive configuration: 

 

The failover of UDR table entries is automated by the Azure function app. The function changes the next-hop address in the Azure gateway route table to the IP address of the active vMX100's interface. The function app must be in the same Azure subscription as the vMX100s. It monitors the state of the vMX100s and triggers a UDR update to perform the failover. During the initial setup of the function app, a probe interval for checking VM liveness is specified. With the default timer trigger, the function app runs every 30 seconds. Per Azure, shortening this interval is not recommended.
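The monitoring step can be sketched as a probe with retries followed by a simple decision. This is a minimal illustration under stated assumptions, not the logic of the function app itself: is_up and decide_action are hypothetical names, "tries" is treated as the total number of probe attempts, and the failback rule is an assumption made for the example.

```python
import time
from typing import Callable


def is_up(
    check: Callable[[], bool],
    tries: int = 3,
    delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Probe until one attempt succeeds or all attempts are used up."""
    for attempt in range(tries):
        if check():
            return True
        if attempt < tries - 1:
            sleep(delay_s)  # wait before the next attempt
    return False


def decide_action(primary_up: bool, secondary_up: bool) -> str:
    """Map the two health results to an action (illustrative policy only)."""
    if not primary_up and secondary_up:
        return "failover"  # point UDRs at the secondary vMX100
    if primary_up and not secondary_up:
        return "failback"  # point UDRs at the primary vMX100
    return "none"          # both up or both down: leave routes unchanged


# Simulated probes: the primary never answers, the secondary answers on the
# third attempt. A no-op sleep keeps the example instant.
secondary_results = iter([False, False, True])
primary_ok = is_up(lambda: False, sleep=lambda s: None)
secondary_ok = is_up(lambda: next(secondary_results), sleep=lambda s: None)
print(decide_action(primary_ok, secondary_ok))
```

In a real deployment the check callable would query the VM or appliance state, and the whole routine would run once per timer trigger.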

Below are some of the variables in the function that are needed to facilitate high availability: 

 

Primary vMX100 Name: Name of the virtual machine hosting the primary vMX100

Secondary vMX100 Name: Name of the virtual machine hosting the failover vMX100

vMX Resource Group Name: Name of the resource group containing the vMX100s

vMX UDR Tag: Resource tag value

vMX Probe Retries: 3 (enables three retries for checking vMX100 health before returning a "Down" status)

vMX Delay: 2 (enables two seconds between retries)

vMX MONITOR: vMX100 Status
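The variables above can be thought of as one settings object that the function reads at start-up. The sketch below is an assumption-laden illustration: the HaSettings class, the load_settings function, and the VMX_* key names are invented for this example and are not the setting names used by the function app; only the default values (3 retries, 2 seconds, "vMX100 Status") come from the list above.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class HaSettings:
    primary_vm_name: str
    secondary_vm_name: str
    resource_group: str
    udr_tag: str
    probe_retries: int = 3
    delay_seconds: int = 2
    monitor: str = "vMX100 Status"

    def __post_init__(self) -> None:
        # Reject values that would make the health check meaningless.
        if self.probe_retries < 1:
            raise ValueError("probe_retries must be at least 1")
        if self.delay_seconds < 0:
            raise ValueError("delay_seconds must not be negative")


def load_settings(env: Mapping[str, str]) -> HaSettings:
    """Build settings from a mapping such as os.environ (key names are hypothetical)."""
    return HaSettings(
        primary_vm_name=env["VMX_PRIMARY_NAME"],
        secondary_vm_name=env["VMX_SECONDARY_NAME"],
        resource_group=env["VMX_RESOURCE_GROUP"],
        udr_tag=env["VMX_UDR_TAG"],
        probe_retries=int(env.get("VMX_PROBE_RETRIES", "3")),
        delay_seconds=int(env.get("VMX_DELAY", "2")),
        monitor=env.get("VMX_MONITOR", "vMX100 Status"),
    )


# Hypothetical values for illustration only.
settings = load_settings({
    "VMX_PRIMARY_NAME": "vmx-east",
    "VMX_SECONDARY_NAME": "vmx-west",
    "VMX_RESOURCE_GROUP": "rg-sdwan",
    "VMX_UDR_TAG": "vmx-ha",
})
print(settings)
```

Passing a plain mapping instead of reading the environment directly makes the loader easy to exercise without touching real application settings.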

For implementing Azure Functions to support highly available vMX100s, please refer to:

https://github.com/Azure/ha-nva-fo 

For more information regarding Azure Functions, please refer to:

https://docs.microsoft.com/en-us/azure/azure-functions/ 

For more information on configuring VNET peering, please refer to:

https://docs.microsoft.com/en-us/azure/virtual-network/virtual-network-peering-overview 

For more information on automating VNET peering, please refer to:

https://docs.microsoft.com/en-us/rest/api/virtualnetwork/virtualnetworkpeerings