The Cisco Meraki MX offers seamless hardware failover through a warm spare high availability (HA) configuration. This article details how an HA pair of MX appliances uses Virtual Router Redundancy Protocol (VRRP) to fail over and maintain connectivity for downstream clients.
Please note, this article assumes working knowledge of VRRP and NAT HA.
For more information about VRRP, please reference the RFC.
For more information about NAT HA, please reference our documentation.
A pair of MX in an HA configuration will use VRRP advertisements to monitor the status of the current master. In a working state, the master MX will send VRRP advertisements out to the LAN every second. If the spare MX does not receive any advertisements for three seconds, it assumes that the master MX has failed and will take over as the new master (including sending its own advertisements). This mechanism allows a secondary MX to take over in the event of a hardware failure.
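The heartbeat timing described above can be sketched in a few lines of Python. This is only an illustrative model, not the MX implementation: the `SpareMonitor` class and its method names are hypothetical, and only the one-second advertisement interval and three-second timeout come from the text.

```python
from dataclasses import dataclass

ADVERT_INTERVAL_S = 1.0  # master sends one advertisement per second (from the text)
DEAD_INTERVAL_S = 3.0    # spare takes over after three silent seconds (from the text)


@dataclass
class SpareMonitor:
    """Hypothetical model of how the spare tracks the master's heartbeat."""

    last_advert_at: float = 0.0
    is_master: bool = False

    def on_advertisement(self, now: float) -> None:
        # Any advertisement from the master resets the silence timer.
        self.last_advert_at = now

    def tick(self, now: float) -> bool:
        # Promote the spare once the master has been silent for the dead interval.
        if not self.is_master and now - self.last_advert_at >= DEAD_INTERVAL_S:
            self.is_master = True
        return self.is_master


monitor = SpareMonitor()
for t in (1.0, 2.0, 3.0):   # master is healthy: one advertisement per second
    monitor.on_advertisement(t)
print(monitor.tick(5.0))    # two seconds of silence: spare stays in standby
print(monitor.tick(6.0))    # three seconds of silence: spare becomes master
```

Timestamps are passed in explicitly rather than read from a clock, which keeps the sketch deterministic and easy to step through by hand.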
In addition to this simple heartbeat mechanism, the master reports its VRRP priority in the advertisements it sends. The priority each MX uses depends on its role (primary or secondary) and on whether it has uplink connectivity:
VRRP Priority Values set on the MX
Scenario | Primary MX | Secondary MX |
---|---|---|
Working Uplinks on primary | 255 | 235 |
No Uplink Connection on primary | 75 | 235 |
No Uplink Connection on secondary | 255 | 55 |
VRRP Priority Values sent by the MX
Scenario | Primary MX | Secondary MX |
---|---|---|
Working Uplinks on primary | 255 | Nothing |
No Uplink Connection on primary | 0 (single advertisement) | 235 |
No Uplink Connection on secondary | 255 | Nothing |
If either MX sees a VRRP advertisement with a lower priority than its own, that MX will take over as master.
For example, if the primary MX (the current master) loses all uplink connectivity, it changes its internal VRRP priority to 75 and sends a one-time advertisement with a priority of 0. A priority of 0 indicates that the sender will stop sending advertisements. When the spare MX receives this advertisement, it sees that its own priority (235) is greater than the advertised priority, so it takes over as the current master and begins sending advertisements with a priority of 235. The primary MX sends no further advertisements until it returns to a working state.
This mechanism allows a secondary MX to take over in the event of an upstream failure on the primary MX.
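The priority tables and the takeover rule can be expressed as a small lookup plus a comparison. The sketch below is illustrative only: the function names `internal_priority` and `should_take_over` are hypothetical, while the numeric values are the ones listed in the tables above.

```python
# Internal priorities from the "VRRP Priority Values set on the MX" table,
# keyed by (role, has_uplink).
INTERNAL_PRIORITY = {
    ("primary", True): 255,
    ("primary", False): 75,
    ("spare", True): 235,
    ("spare", False): 55,
}


def internal_priority(role: str, has_uplink: bool) -> int:
    """Return the priority an MX assigns itself for its role and uplink state."""
    return INTERNAL_PRIORITY[(role, has_uplink)]


def should_take_over(own_priority: int, advertised_priority: int) -> bool:
    """An MX takes over when it sees an advertisement with a lower priority than its own."""
    return advertised_priority < own_priority


# Primary loses its uplinks and sends a single advertisement with priority 0.
spare = internal_priority("spare", True)
print(should_take_over(spare, 0))               # spare (235) vs. 0

# The spare now advertises 235; the degraded primary (75) must not pre-empt it.
degraded_primary = internal_priority("primary", False)
print(should_take_over(degraded_primary, 235))

# Once the primary's uplinks return, its priority of 255 beats the spare's 235.
healthy_primary = internal_priority("primary", True)
print(should_take_over(healthy_primary, 235))
```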
Only the current master MX will send VRRP advertisements. In addition to the VRRP priority, there are two key values used by the HA pair:
These two fields are used together to identify a VRRP advertisement as coming from the other MX in the pair; each MX ignores any VRRP advertisement that does not match these values.
In addition, the VRRP MAC address is shared by both MXs for LAN communication. Clients on the LAN will associate this shared MAC address with the MX's LAN IPs. As such, in the event of failover, LAN clients won't need to update their ARP table with a new MAC address.
The following sections walk step-by-step through a common HA failover scenario, wherein the primary MX loses all uplink connectivity and the secondary MX takes over.
The following scenario assumes that the primary and secondary MX are connected on the LAN side, and that they are able to exchange VRRP advertisements across all configured VLANs.
Starting from a baseline working state, both the primary and secondary MX are online with dual uplinks. Everything is normal, so the primary MX is the current master.
In this state, the primary MX sends VRRP advertisements (with a priority of 255) every second.
After the primary MX loses all uplink connectivity, it will send a VRRP advertisement with a priority of 0.
The priority value zero (0) has a special meaning: it indicates that the current master has stopped participating in VRRP. This triggers backup routers to transition to master immediately, without having to wait for the current master to time out.
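When inspecting a packet capture, the priority can be read directly from the VRRP header. The following sketch decodes only the first three bytes, whose layout (version and type, virtual router ID, priority) is the same in VRRPv2 (RFC 3768) and VRRPv3 (RFC 5798). The function names are hypothetical, the sample bytes are invented for illustration, and this article does not state which VRRP version or virtual router ID the MX uses.

```python
import struct


def parse_vrrp_prefix(payload: bytes) -> dict:
    """Decode the first three bytes of a VRRP packet (the IP payload)."""
    if len(payload) < 3:
        raise ValueError("VRRP payload too short")
    version_type, vrid, priority = struct.unpack("!BBB", payload[:3])
    return {
        "version": version_type >> 4,   # high nibble of byte 0
        "type": version_type & 0x0F,    # low nibble; 1 = ADVERTISEMENT
        "vrid": vrid,
        "priority": priority,
    }


def master_is_resigning(payload: bytes) -> bool:
    """True when an advertisement carries the special priority value 0."""
    fields = parse_vrrp_prefix(payload)
    return fields["type"] == 1 and fields["priority"] == 0


# Invented sample: version 2, type 1, VRID 1 (assumed), priority 0.
sample = bytes([0x21, 0x01, 0x00])
print(parse_vrrp_prefix(sample))
print(master_is_resigning(sample))
```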
Once the spare MX receives the 0-priority VRRP advertisement, it will become the new master.
As the new master, the spare MX takes over the LAN by sending its own advertisements with a priority of 235.
The following sections outline some less common failover scenarios:
Assume the end of the scenario above, where the primary MX has no uplink connectivity and the spare MX is the current master.
If the spare MX also loses all uplink connectivity, it will send a VRRP message with a priority of 0.
In this scenario, the primary MX will transition back into the current master role. Without any working uplinks, it will only provide LAN routing.
When the primary MX receives the 0-priority VRRP advertisement, the primary starts sending its own VRRP advertisements with a priority of 75, indicating that it does not have uplink connectivity.
In the unlikely scenario that the primary MX's hardware goes down entirely while the spare has no working uplinks, the spare will transition back to the current master role in order to provide LAN routing:
When the spare MX stops seeing any VRRP messages from the primary, the spare MX takes over the LAN by sending its own advertisements with a priority of 55, indicating that it does not have uplink connectivity.
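The scenarios in this article can be replayed with a small model that picks, at each step, the powered-on MX with the highest internal priority. This is a simplified sketch of the steady-state outcome only; it does not model the advertisement exchange or timers, and the `MX` class and `elect_master` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Internal priorities from the tables in this article, keyed by (role, has_uplink).
PRIORITY = {
    ("primary", True): 255,
    ("primary", False): 75,
    ("spare", True): 235,
    ("spare", False): 55,
}


@dataclass
class MX:
    role: str                # "primary" or "spare"
    powered: bool = True
    has_uplink: bool = True

    @property
    def priority(self) -> int:
        return PRIORITY[(self.role, self.has_uplink)]


def elect_master(units: List[MX]) -> Optional[MX]:
    """Return the powered-on unit with the highest priority, or None if all are down."""
    alive = [u for u in units if u.powered]
    return max(alive, key=lambda u: u.priority, default=None)


def report(label: str, units: List[MX]) -> None:
    master = elect_master(units)
    if master is None:
        print(f"{label}: no master")
    else:
        print(f"{label}: master={master.role}, advertised priority={master.priority}")


primary, spare = MX("primary"), MX("spare")
pair = [primary, spare]

report("baseline", pair)
primary.has_uplink = False
report("primary loses all uplinks", pair)
spare.has_uplink = False
report("spare also loses all uplinks", pair)
primary.powered = False
report("primary hardware fails", pair)
```

Each printed line corresponds to one of the states walked through above: primary at 255, spare at 235, primary at 75, and finally spare at 55.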