Troubleshooting MX Warm Spare in NAT Mode (NAT HA)

When configured in NAT mode, an MX security appliance is commonly used as a network gateway, performing inter-VLAN routing and handling traffic bound for the Internet. To achieve redundancy, a secondary MX can be added to Dashboard as a warm spare, allowing it to share the primary MX's configuration and seamlessly take over in the event of a device failure. This configuration is commonly referred to as High Availability NAT (NAT HA).

This article outlines common troubleshooting steps and best practices for NAT HA configurations.

Note: For more information about NAT HA, including configuration steps and use cases, please refer to our documentation.

Requirements and Best Practices

When configuring NAT HA, it is critical that both MXes have a reliable connection to each other on the LAN, so the Primary MX's VRRP heartbeats can be seen reliably by the Spare. To ensure this connection is reliable:

  • The two MXes should be physically connected to each other on the LAN by a single cable, with trunk ports on both sides allowing all VLANs.
    • If the two MXes cannot be directly connected, there should be no more than one additional hop between them, and they must be able to communicate on all VLANs.
  • If possible, disconnect any downstream LAN devices during configuration, provided both MXes keep their Internet connectivity to the cloud and their LAN connectivity to each other.
    • When first configuring NAT HA, the Spare should be added and configured in Dashboard before the device is physically deployed, so it will immediately fetch its configuration and behave appropriately.

Additionally, keep the following considerations in mind:

  • Both MXes must share the same number of uplinks. That is, if the Primary MX has dual uplinks, then the Spare must have dual uplinks as well.
  • If a virtual IP is being used, then each uplink of the two MXes must share the same broadcast domain on the WAN side.

Note: The secondary MX must be the same MX model as the primary. Warm spare functionality is not supported between different MX models (e.g. MX80 & MX100).
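The requirements above can be checked before deployment. The following is a minimal sketch, not a Meraki tool: the `Appliance` class, its fields, and the `preflight` function are hypothetical names introduced for illustration, and the values would have to be filled in by hand. It also treats "same IP subnet" as a stand-in for "same broadcast domain", which is only an approximation, since two ports can be addressed in the same subnet and still sit on different Layer 2 segments.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_interface


@dataclass
class Appliance:
    """Hypothetical description of one MX; not a Dashboard API object."""
    name: str
    model: str
    # Uplink name -> address in CIDR form, e.g. {"WAN 1": "203.0.113.2/29"}
    uplinks: dict


def preflight(primary, spare, virtual_ips=None):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    # Warm spare requires the same MX model on both sides.
    if primary.model != spare.model:
        problems.append(f"model mismatch: {primary.model} vs {spare.model}")
    # Both MXes must have the same number of uplinks.
    if len(primary.uplinks) != len(spare.uplinks):
        problems.append(
            f"uplink count mismatch: {len(primary.uplinks)} vs {len(spare.uplinks)}"
        )
    # Assumption: sharing the virtual IP's subnet approximates sharing
    # a broadcast domain on that uplink.
    for uplink, vip in (virtual_ips or {}).items():
        for mx in (primary, spare):
            cidr = mx.uplinks.get(uplink)
            if cidr is None:
                problems.append(f"{mx.name} has no {uplink} for virtual IP {vip}")
            elif ip_address(vip) not in ip_interface(cidr).network:
                problems.append(
                    f"{mx.name} {uplink} ({cidr}) is not in the subnet of virtual IP {vip}"
                )
    return problems


if __name__ == "__main__":
    primary = Appliance("primary", "MX100", {"WAN 1": "203.0.113.2/29"})
    spare = Appliance("spare", "MX100", {"WAN 1": "192.0.2.3/29"})
    for problem in preflight(primary, spare, {"WAN 1": "203.0.113.4"}):
        print(problem)
```

An empty result only means none of these three checks failed; it does not confirm that the LAN path between the two MXes is reliable.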

Troubleshooting

If there is a problem with the NAT HA configuration, the network may show a variety of symptoms, and it may not be obvious that NAT HA is the root cause. This section outlines what issues with HA typically look like, as well as recommended troubleshooting steps.

Dual Master

The most common sign of a problem with NAT HA is a Dual Master scenario, where both the Primary and Spare MX report in Dashboard as being Active (master). This can be observed in Dashboard under Security appliance > Monitor > Appliance status by comparing the current state of each appliance.

This occurs when the Primary MX is online and sending heartbeats that the Spare does not see, so the Spare concludes that the Primary is down. It is usually the result of a non-direct connection between the two MXes, which can prevent the VRRP heartbeats from reliably reaching the Spare.

If both the Primary and Spare are in the master state, this will cause various issues with the network, affecting DHCP, routing, VPN, etc.
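To make the mechanism concrete, here is a small simulation of how lost heartbeats lead to a Dual Master window. It is a sketch under stated assumptions: the function name `spare_active_windows` and the 3-second `timeout` are hypothetical and chosen for illustration, and they are not taken from Meraki's actual failover timers.

```python
def spare_active_windows(heartbeat_times, end_time, timeout=3.0):
    """Return (start, end) windows in which the Spare would consider itself active.

    heartbeat_times: times (in seconds) at which the Spare actually received
        a heartbeat from the Primary.
    end_time: end of the observation period.
    timeout: hypothetical time without a heartbeat after which the Spare
        takes over; not an official Meraki value.
    """
    windows = []
    last_seen = 0.0  # assume a heartbeat was seen at t = 0
    for t in sorted(heartbeat_times):
        if t - last_seen > timeout:
            # The Spare took over at last_seen + timeout and stood down
            # again when the next heartbeat arrived at t.
            windows.append((last_seen + timeout, t))
        last_seen = t
    if end_time - last_seen > timeout:
        windows.append((last_seen + timeout, end_time))
    return windows


if __name__ == "__main__":
    # The Primary sends one heartbeat per second for 20 seconds, but the
    # heartbeats at t = 6..12 never reach the Spare.
    received = [t for t in range(1, 21) if not 6 <= t <= 12]
    print(spare_active_windows(received, end_time=20))
```

Because the Primary stays online the whole time in this example, any window returned is a period in which both appliances would be active at once.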

Recommended Troubleshooting Steps

If network issues are occurring that appear to be related to NAT HA, the following troubleshooting steps should be taken to identify the root cause:

  1. Compare both appliances in Dashboard (under Security appliance > Monitor > Appliance status) to determine whether there is a Dual Master scenario as outlined above.
    1. If both appliances are consistently reporting in the "active" state, check their LAN connection and make sure they can communicate with each other. If not, it is recommended to remedy this with a single cable directly connecting a trunk port on each MX.
    2. If the Spare MX is intermittently reporting as active while the Primary remains online and active, check that both MXes can communicate with each other on all VLANs. Additionally, ensure there is no bad cable connecting the two devices, or any other physical issue that could result in unreliable communication.
    3. In any case, it is strongly recommended to take a packet capture on the LAN side of each MX, to get a clear picture of where the VRRP heartbeats are being lost.
  2. If the HA pair is configured to use a virtual IP on the uplink, make sure that each pair of WAN connections (WAN 1 on each MX, for example) shares the same broadcast domain, so both can be seen by the upstream device. See the image below for an example topology with dual uplinks/ISPs:
[Image: example topology with dual uplinks/ISPs]
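For the packet captures recommended in step 1 above, one way to locate the loss is to count heartbeats per VLAN in each capture and compare the two sides. The sketch below assumes the captures have already been reduced to a list of VLAN IDs, one entry per heartbeat observed; that summary format and the function name `heartbeat_loss_by_vlan` are hypothetical, and extracting the list from the capture files is left out.

```python
from collections import Counter


def heartbeat_loss_by_vlan(sent_by_primary, seen_by_spare):
    """Compare heartbeat counts from the two LAN-side captures.

    Each argument is an iterable of VLAN IDs, one entry per heartbeat
    observed in that capture (a hypothetical summary of the capture file).
    Returns {vlan: (sent, seen, loss_fraction)} for every VLAN on which
    the Primary sent at least one heartbeat.
    """
    sent = Counter(sent_by_primary)
    seen = Counter(seen_by_spare)
    report = {}
    for vlan in sorted(sent):
        lost = max(sent[vlan] - seen[vlan], 0)
        report[vlan] = (sent[vlan], seen[vlan], lost / sent[vlan])
    return report


if __name__ == "__main__":
    primary_capture = [1] * 10 + [20] * 10 + [30] * 10
    spare_capture = [1] * 10 + [20] * 5
    for vlan, (sent, seen, loss) in heartbeat_loss_by_vlan(
        primary_capture, spare_capture
    ).items():
        print(f"VLAN {vlan}: sent={sent} seen={seen} loss={loss:.0%}")
```

As a rough reading of the result, total loss on one VLAN suggests that VLAN is not carried between the two MXes, while partial loss across VLANs points toward a bad cable or an otherwise unreliable path.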
Last modified
15:16, 18 Feb 2016

Article ID

ID: 3839