
NAT HA Failover Behavior

Overview

The Cisco Meraki MX offers seamless hardware failover using a warm spare high availability (HA) configuration. This article details how an HA pair of MX appliances uses Virtual Router Redundancy Protocol (VRRP) to fail over and maintain connectivity for downstream clients.

Please note, this article assumes working knowledge of VRRP and NAT HA.

For more information about VRRP, please reference the RFC.

For more information about NAT HA, please reference our documentation.

VRRP Mechanics for HA

A pair of MX appliances in an HA configuration uses VRRP advertisements to monitor the status of the current master. In a working state, the master MX sends VRRP advertisements out to the LAN every second. If the spare MX does not receive any advertisements for three seconds, it assumes that the master MX has failed and takes over as the new master (including sending its own advertisements). This mechanism allows a secondary MX to take over in the event of a hardware failure.
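The heartbeat timeout can be illustrated with a minimal sketch. The one-second interval and three-second timeout come from the description above; the function and constant names are hypothetical and do not reflect the MX's actual implementation.

```python
# Timing values taken from the article; names are illustrative assumptions.
ADVERT_INTERVAL_S = 1.0    # the master advertises every second
MASTER_DOWN_AFTER_S = 3.0  # the spare takes over after three seconds of silence


def master_presumed_failed(last_advert_at: float, now: float) -> bool:
    """Return True once no advertisement has been seen for the timeout window."""
    return (now - last_advert_at) >= MASTER_DOWN_AFTER_S


# Last advertisement seen at t=10s.
print(master_presumed_failed(last_advert_at=10.0, now=12.5))  # False
print(master_presumed_failed(last_advert_at=10.0, now=13.0))  # True
```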

 

In addition to this simple heartbeat mechanism, the master also reports its VRRP priority in the advertisements it sends. For reference, each MX uses the following priority values, which depend on its role and on whether or not it has uplink connectivity:

                        Primary MX    Secondary MX
Working Uplinks         255           235
No Uplink Connection    75            55
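As a rough illustration, the priority table can be expressed as a simple lookup. The priority values are those listed above; the role names and the function name are assumptions made for this sketch.

```python
# Priority values from the table above, keyed by (role, has_uplink).
# The "primary"/"secondary" role strings are illustrative assumptions.
VRRP_PRIORITIES = {
    ("primary", True): 255,
    ("secondary", True): 235,
    ("primary", False): 75,
    ("secondary", False): 55,
}


def vrrp_priority(role: str, has_uplink: bool) -> int:
    """Look up the VRRP priority for an MX role and uplink state."""
    return VRRP_PRIORITIES[(role, has_uplink)]


print(vrrp_priority("primary", True))     # 255
print(vrrp_priority("secondary", False))  # 55
```

Note that a primary MX without uplinks (75) still ranks below a secondary MX with working uplinks (235), which is what lets the spare take over after an upstream failure.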

If either MX sees a VRRP advertisement with a lower priority than its own, that MX will take over as master.

For example, if the master (primary) MX loses all uplink connectivity, it changes its own VRRP priority to 75 and sends an advertisement with a priority of 0. A priority of 0 indicates that the sender will no longer be sending advertisements. When the spare MX receives this advertisement, it sees that its own priority (235) is greater than the advertised priority, so the spare takes over as the current master. The primary MX stops sending advertisements until it returns to a working state.

This mechanism allows a secondary MX to take over in the event of an upstream failure on the primary MX.
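The comparison rule described above can be sketched as follows. This is a simplified model of the decision only, with hypothetical names; it is not the MX's actual code.

```python
RESIGN_PRIORITY = 0  # priority 0: the sender will stop advertising (from the article)


def should_take_over(own_priority: int, advertised_priority: int) -> bool:
    """Take over as master when the received priority is lower than our own."""
    return advertised_priority < own_priority


# Spare MX with working uplinks (235) hears the primary's priority-0 advertisement.
print(should_take_over(own_priority=235, advertised_priority=RESIGN_PRIORITY))  # True
# Spare MX (235) hears a healthy primary (255) and stays in the spare role.
print(should_take_over(own_priority=235, advertised_priority=255))  # False
```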

Additional VRRP Notes

Only the current master MX will send VRRP advertisements. In addition to the VRRP priority, there are two key values used by the HA pair: 

  • VRRP Router ID - A router ID shared by both MX appliances in the warm spare pair.
  • VRRP MAC address - The virtual MAC address used on the LAN by both MX.

These two fields are used together to identify a VRRP advertisement as coming from the other MX in the pair; each MX ignores any VRRP advertisements that do not match these values.
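A minimal sketch of this filtering step is shown below. The `Advertisement` type, its field names, and the example router ID and MAC address are all hypothetical placeholders chosen for illustration; they are not values or structures taken from the MX.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advertisement:
    """Hypothetical, simplified view of a received VRRP advertisement."""
    router_id: int
    source_mac: str
    priority: int


def is_from_ha_peer(advert: Advertisement, pair_router_id: int, pair_vrrp_mac: str) -> bool:
    """Accept only advertisements that match both the shared router ID and VRRP MAC."""
    return (
        advert.router_id == pair_router_id
        and advert.source_mac.lower() == pair_vrrp_mac.lower()
    )


# Placeholder values for the pair's shared identifiers.
PAIR_ROUTER_ID = 10
PAIR_VRRP_MAC = "02:00:00:00:00:01"

peer = Advertisement(router_id=10, source_mac="02:00:00:00:00:01", priority=255)
other = Advertisement(router_id=20, source_mac="02:00:00:00:00:02", priority=255)
print(is_from_ha_peer(peer, PAIR_ROUTER_ID, PAIR_VRRP_MAC))   # True
print(is_from_ha_peer(other, PAIR_ROUTER_ID, PAIR_VRRP_MAC))  # False
```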

In addition, the VRRP MAC address is shared by both MXen for LAN communication. Clients on the LAN will associate this shared MAC address with the MX's LAN IPs. As such, in the event of failover, LAN clients won't need to update their ARP table with a new MAC address.

Typical Failover Scenario

The following sections walk step-by-step through a common HA failover scenario, wherein the primary MX loses all uplink connectivity and the secondary MX takes over.

The following scenario assumes that the primary and secondary MX are connected on the LAN side, and that they are able to exchange VRRP advertisements across all configured VLANs.

Normal State

Starting from a baseline working state, both the primary and secondary MX are online with dual uplinks. Everything is normal, so the primary MX is the current master.

In this state, the primary MX sends VRRP advertisements (with a priority of 255) every second.

Primary Uplink Failure

After the primary MX loses all uplink connectivity, it will send a VRRP advertisement with a priority of 0.

The priority value zero (0) has a special meaning: it indicates that the current master has stopped participating in VRRP. This triggers backup routers to transition to master immediately, without having to wait for the current master to time out.

Failover to Spare MX

Once the spare MX receives the 0-priority VRRP advertisement, it will become the new master.

As the new master, the spare MX takes over the LAN by sending its own advertisements with a priority of 235.

Additional Failover Scenarios

The following sections outline some less common failover scenarios:

Both MXen Lose Uplink Connectivity

Assume the end of the scenario above, where the primary MX has no uplink connectivity and the spare MX is the current master.

If the spare MX also loses all uplink connectivity, it will send a VRRP message with a priority of 0.

In this scenario, the primary MX will transition back into the current master role. Without any working uplinks, it will only provide LAN routing.

When the primary MX receives the 0-priority VRRP advertisement, the primary starts sending its own VRRP advertisements with a priority of 75, indicating that it does not have uplink connectivity.

Uplinks and Primary MX Down

In the unlikely scenario that the primary MX's hardware goes down entirely while the spare has no working uplinks, the spare will transition back to the current master role in order to provide LAN routing.

When the spare MX stops seeing any VRRP messages from the primary, the spare MX takes over the LAN by sending its own advertisements with a priority of 55, indicating that it does not have uplink connectivity.
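The outcomes of the scenarios in this article can be summarized with a small model: among the MX appliances that are still powered on, the one with the highest priority ends up as master. This is a sketch of the steady-state result implied by the priority table, using hypothetical names; it does not model the advertisement exchange itself or the MX's real implementation.

```python
from typing import Optional

# Priority values from the article's table, keyed by (role, has_uplink).
PRIORITY = {
    ("primary", True): 255,
    ("secondary", True): 235,
    ("primary", False): 75,
    ("secondary", False): 55,
}


def expected_master(
    primary_up: bool,
    primary_uplink: bool,
    secondary_up: bool,
    secondary_uplink: bool,
) -> Optional[str]:
    """Return which MX should end up as master, or None if both are down."""
    candidates = {}
    if primary_up:
        candidates["primary"] = PRIORITY[("primary", primary_uplink)]
    if secondary_up:
        candidates["secondary"] = PRIORITY[("secondary", secondary_uplink)]
    if not candidates:
        return None
    # The highest-priority MX that is still running becomes master.
    return max(candidates, key=lambda name: candidates[name])


print(expected_master(True, True, True, True))     # primary   (normal state)
print(expected_master(True, False, True, True))    # secondary (primary uplink failure)
print(expected_master(True, False, True, False))   # primary   (both lose uplinks)
print(expected_master(False, False, True, False))  # secondary (primary down, no uplinks)
```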

Last modified
13:15, 8 Aug 2016

Article ID

ID: 4302
