Tag-Based IPsec VPN Failover
Authors: Mitchell Gulledge, Raul Ricano and Chris Weber
This document describes the benefits and uses of Tag-Based VPN Failover. It serves as a reference for the optimal architecture, so that customers can get the most benefit from this technology.
Overview
Tag-Based VPN Failover is used for third-party data center failover and OTT SD-WAN integration. It is accomplished by using the API at each branch or data center. Each MX appliance uses IPsec VPN with cloud VPN nodes, and IPsec together with the API facilitates the dynamic tag allocation.
A typical VPN topology for enterprise routing can be seen above. In this use case, the design provides DC-DC failover for branch (spoke) sites. If there is a failure on any of the monitored IPs (IPsec peers), there will be an immediate, secure, and reliable failover. For DC-DC failover to be achieved, the following behavior must occur:
- Spoke sites will form a VPN tunnel to the primary DC.
- Dual active VPN tunnels to both DCs are not possible with IPsec, because interesting traffic is often needed to bring up an IPsec tunnel, and that traffic will be routed to the first tunnel/peer configured and never the second.
- Each spoke will be configured with a tracked IP of its primary DC on the traffic shaping page.
- If the tracked IP experiences loss in the last 5 minutes, the API script (below) will re-tag the network in order to swap to the secondary IPsec VPN tunnel.
- Once the tracked IP has not had any loss in the last 5 minutes, the tags will be swapped back to return to the primary DC (waiting for a loss-free window avoids flapping).
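The behavior in the list above can be sketched as a small decision function. This is only an illustration: the function names (has_loss, next_action) and the 'failover'/'failback'/'hold' labels are hypothetical, and the 30% threshold is borrowed from the sample script later in this document rather than being a fixed product value.

```python
from typing import List, Optional

LOSS_THRESHOLD = 30.0  # percent; assumption taken from the sample script


def has_loss(samples: List[Optional[float]],
             threshold: float = LOSS_THRESHOLD) -> bool:
    """Return True if any sample in the window meets or exceeds the threshold.

    Samples that are None (no measurement) are ignored.
    """
    return any(s is not None and s >= threshold for s in samples)


def next_action(samples: List[Optional[float]], on_backup: bool) -> str:
    """Decide what to do for one spoke: 'failover', 'failback', or 'hold'."""
    lossy = has_loss(samples)
    if lossy and not on_backup:
        return "failover"   # primary is degraded, swap to the secondary peer
    if not lossy and on_backup:
        return "failback"   # whole window is clean, return to the primary peer
    return "hold"           # nothing to change


print(next_action([0.0, 0.0, 41.7], on_backup=False))
print(next_action([0.0, 0.0, 0.0], on_backup=True))
```

Because failback requires the whole window to be free of loss, a single clean sample is not enough to swap back, which is what keeps the tunnel from flapping.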
Sample API Solution
The following code is one sample Python implementation of this solution. The sections below describe how it works.
Prerequisites
Add your API key and organization ID to the code, in the api_key and url variables (the bolded sections).
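As an alternative to editing the values in the script, the key and organization ID could be read from the environment. The sketch below assumes two environment variable names, MERAKI_API_KEY and MERAKI_ORG_ID, and a helper called load_settings; all three are hypothetical and chosen only for illustration. The URL and header names are the ones used by the sample script.

```python
import os
from typing import Dict, Mapping, Tuple

BASE_URL = "https://api.meraki.com/api/v1"


def load_settings(env: Mapping[str, str] = os.environ) -> Tuple[str, Dict[str, str]]:
    """Build the request URL and headers from environment-style settings."""
    api_key = env["MERAKI_API_KEY"]  # hypothetical variable name
    org_id = env["MERAKI_ORG_ID"]    # hypothetical variable name
    url = f"{BASE_URL}/organizations/{org_id}/devices/uplinksLossAndLatency"
    headers = {
        "X-Cisco-Meraki-API-Key": api_key,
        "Content-Type": "application/json",
    }
    return url, headers


# Example with placeholder values passed explicitly instead of os.environ.
url, headers = load_settings({"MERAKI_API_KEY": "<API Key>", "MERAKI_ORG_ID": "123"})
print(url)
```

Keeping the key out of the source file makes it less likely to be committed or shared by accident.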
Topology
Dashboard Configuration
Tracked IPs
Navigate to Security & SD-WAN > Configure > SD-WAN & Traffic Shaping and add the IP of the primary peer under the uplink statistics. The MX will start sending ICMP requests to this IP to track reachability. This data can be viewed on the Security & SD-WAN > Monitor > Appliance Status > Uplink page and can also be obtained via the API.
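To illustrate how the tracked-IP data might be consumed once it has been fetched, the sketch below summarizes the worst loss per network for one tracked IP. The helper name worst_loss and the sample data are hypothetical; the field names (networkId, uplink, ip, timeSeries, lossPercent) are the ones the sample script in this document reads from the response.

```python
from typing import Any, Dict, List


def worst_loss(entries: List[Dict[str, Any]], tracked_ip: str) -> Dict[str, float]:
    """Map each networkId to the highest lossPercent seen for tracked_ip."""
    result: Dict[str, float] = {}
    for entry in entries:
        if entry.get("ip") != tracked_ip:
            continue  # statistics for some other destination
        samples = [
            s["lossPercent"]
            for s in entry.get("timeSeries", [])
            if s.get("lossPercent") is not None  # skip samples with no data
        ]
        if samples:
            net = entry["networkId"]
            result[net] = max(max(samples), result.get(net, 0.0))
    return result


# Hand-written sample shaped like the fields the script uses (not real output).
sample = [
    {"networkId": "N_1", "uplink": "wan2", "ip": "192.168.128.201",
     "timeSeries": [{"lossPercent": 0.0}, {"lossPercent": 41.7}]},
    {"networkId": "N_1", "uplink": "wan1", "ip": "8.8.8.8",
     "timeSeries": [{"lossPercent": 0.0}]},
]
print(worst_loss(sample, "192.168.128.201"))
```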
Network Tags
Navigate to Organization > Monitor > Overview. Select the network you wish to tag and add one tag for each IPsec peer. Tags should be in the format:
<identifier>_<primary/backup>_<state(up/down)>
As an example, if the primary VPN endpoint is London and the backup is Paris, the tags would be:
london_primary_up (default state for primary is up)
paris_backup_down (default state for the backup is down)
The script below will change the up/down state of these tags when loss is detected on the primary peer (tracked per the section above).
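The state change on the tags can be pictured with a small, self-contained helper. This is a sketch under assumptions: it takes the tags as a Python list of strings and the function name swap_tags is hypothetical. It is not the parsing used by the sample script, which works on a space-delimited string.

```python
from typing import List


def swap_tags(tags: List[str], primary_healthy: bool) -> List[str]:
    """Return tags with the primary/backup up/down states set consistently.

    Tags that do not follow <identifier>_<primary/backup>_<up/down>
    are passed through unchanged.
    """
    out = []
    for tag in tags:
        parts = tag.rsplit("_", 2)  # identifier may itself contain "_"
        if len(parts) == 3 and parts[2] in ("up", "down"):
            identifier, role, _state = parts
            if role == "primary":
                state = "up" if primary_healthy else "down"
                out.append(f"{identifier}_primary_{state}")
                continue
            if role == "backup":
                state = "down" if primary_healthy else "up"
                out.append(f"{identifier}_backup_{state}")
                continue
        out.append(tag)
    return out


print(swap_tags(["london_primary_up", "paris_backup_down"], primary_healthy=False))
```

With the London/Paris example, a failover produces london_primary_down and paris_backup_up, and calling the helper again with primary_healthy=True restores the defaults.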
Site to Site VPN
Navigate to Security & SD-WAN > Configure > Site-to-Site VPN and add one peer for the primary and one for the secondary. Both peers will have the same private subnets, but this does not cause an overlap conflict because each peer is scoped to a different network tag with the availability selector. Tag each peer with its corresponding tag configured in the section above.
Code
The code below is for reference only. Meraki support does not assist with scripting.
import json
import time

import requests

api_key = '<API Key>'
url = 'https://api.meraki.com/api/v1/organizations/<org_ID>/devices/uplinksLossAndLatency'
header = {"X-Cisco-Meraki-API-Key": api_key, "Content-Type": "application/json"}
network_url = "https://api.meraki.com/api/v1/networks/"

# Networks that are currently failed over to the backup peer
networkDownList = []

while True:
    response = requests.get(url, headers=header)
    for network in response.json():
        # Only look at the tracked peer IP (skip the default 8.8.8.8 probe and wan1)
        if network['ip'] != '8.8.8.8' and network['uplink'] != "wan1":
            print(network['networkId'])
            print(network['ip'])
            loss = False
            for iteration in network['timeSeries']:
                # Skip samples with no measurement; fail over at 30% loss or more
                if iteration['lossPercent'] is not None and iteration['lossPercent'] >= 30:
                    loss = True
                    network_info = requests.get(network_url + network['networkId'], headers=header)
                    print(network_info.json()['name'])
                    # NOTE: this parsing expects 'tags' as a space-delimited string with a
                    # leading space, so index 0 is empty and the two tags are at 1 and 2.
                    # If the API returns the tags as a list, adapt this parsing and the payload.
                    tags = network_info.json()['tags'].split(' ')
                    if "_primary_down" in tags[1] or "_primary_down" in tags[2]:
                        print("VPN already swapped")
                        break
                    else:
                        print("Need to change VPN, recent loss - " + str(iteration['lossPercent']))
                        # Mark the primary as down and the backup as up
                        if "_primary_up" in tags[1]:
                            tags[1] = tags[1].split("_up")[0] + "_down"
                        if "_primary_up" in tags[2]:
                            tags[2] = tags[2].split("_up")[0] + "_down"
                        if "_backup_down" in tags[1]:
                            tags[1] = tags[1].split("_down")[0] + "_up"
                        if "_backup_down" in tags[2]:
                            tags[2] = tags[2].split("_down")[0] + "_up"
                        payload = {'tags': tags[2] + " " + tags[1]}
                        new_network_info = requests.put(network_url + network['networkId'], data=json.dumps(payload), headers=header)
                        networkDownList.append(network['networkId'])
                        break
            # No loss anywhere in the returned window: swap back to the primary
            if loss == False and network['networkId'] in networkDownList:
                print("Primary VPN healthy again..swapping back")
                network_info = requests.get(network_url + network['networkId'], headers=header)
                tags = network_info.json()['tags'].split(' ')
                if "_primary_down" in tags[1]:
                    tags[1] = tags[1].split("_down")[0] + "_up"
                if "_primary_down" in tags[2]:
                    tags[2] = tags[2].split("_down")[0] + "_up"
                if "_backup_up" in tags[1]:
                    tags[1] = tags[1].split("_up")[0] + "_down"
                if "_backup_up" in tags[2]:
                    tags[2] = tags[2].split("_up")[0] + "_down"
                payload = {'tags': tags[1] + " " + tags[2]}
                new_network_info = requests.put(network_url + network['networkId'], data=json.dumps(payload), headers=header)
                networkDownList.remove(network['networkId'])
    print(networkDownList)
    print("Sleeping for 30s...")
    time.sleep(30)
Note: This is a sample script that can be used as a reference to create a custom script. Cisco Meraki support cannot troubleshoot third-party scripts that are based on or similar to this one.
Sample Output
N_573083052582988629 <--Network we are tracking
192.168.128.201 <--Primary VPN hub we are tracking
SD-WAN Hub <--Network Name
Need to change VPN, recent loss - 41.7 <--Packet loss of 41.7% detected. Script above set to failover on 30% loss
Sleeping for 30s... <--continues to repeat process every 30s (adjustable in script)
N_573083052582988629
192.168.128.201
SD-WAN Hub
VPN already swapped
Sleeping for 30s...
N_573083052582988629
192.168.128.201
SD-WAN Hub
VPN already swapped
Sleeping for 30s...
.
...Repeats until 5 minutes of 0% loss
.
Sleeping for 30s...
N_573083052582988629
192.168.128.201
Primary VPN healthy again..swapping back <--Hasn't been any packet loss on the tracked IP for 5 minutes. Swap back
Sleeping for 30s...
Tags Before Failover