MS Device Health
Overview
The Device Health feature provides real-time monitoring and historical reporting of critical system resources, including CPU, memory, and power utilization. It helps users maintain optimal device performance, identify potential issues early, and make informed decisions about resource management. By offering detailed insight into the operational status of these key components, Device Health lets users manage their devices proactively, improving efficiency, longevity, and reliability. The Device Health tab can be found on an individual switch's details page.
Requirements and limitations
Switch Series | Minimum Firmware Version |
---|---|
MS100 | MS17 |
MS200 | MS17 |
MS300 | MS17 |
MS400 | MS17 |
Switches are polled for statistics every 5 minutes, and the UI displays the timestamp of when the information was last fetched.
Device Health Status Reasons
Status | Description |
---|---|
Good | The device component is in a healthy state and does not need attention. |
Fair | The device component is in a fair state and should be investigated and addressed before a potential problem occurs. |
Poor | The device component is not in a healthy state and needs immediate attention. |
Key Metrics
CPU
This metric monitors the total number of packets received, dropped, and processed by the switch. Packets may be dropped by either the switch ASIC or the CPU, depending on the circumstances.
Before looking at how this feature works, let's define a few terms.
Term | Definition |
---|---|
ASIC | Short for Application-Specific Integrated Circuit. ASICs are chips designed for a specific task. In switches, ASICs are designed to make packet forwarding decisions without sacrificing performance. |
CPU | Short for Central Processing Unit. The switch CPU is responsible for many critical functions, such as managing the operating system, memory, and local processes. The CPU also manages various control plane protocols, which inform how the data plane should be configured. |
Control plane | The control plane on a switch is responsible for configuring how and where packets are forwarded through the switch’s data plane. The control plane handles tasks such as routing protocols, LACP, STP, and other protocols that are critical to network stability and functionality. |
Now that we have defined some basic terminology, let’s talk about how, where, and why traffic might be dropped.
Switch ASICs drop traffic primarily to maintain network performance and integrity. For example, a switch ASIC may drop traffic when an ACL or access policy is configured, when the line rate of the switch port has been exceeded, or when the traffic exceeds a threshold applied by a QoS rule.
The switch’s CPU may drop traffic if it is oversubscribed. Oversubscription can occur when a large number of packets are forwarded from the switch ASIC to the CPU for control plane processing.
Currently, MS Device Health reports the following protocols: LACP, STP, ARP, OSPF, and management, with more coming in future releases. The “Others” line is the sum of the 5 protocols listed above plus all other packets received and processed by the switch. As support for additional protocols is added, future firmware updates will classify these packets under their respective protocols.
CPU Thresholds
CPU thresholds are defined per protocol and can be viewed by hovering over the health status. Here’s an example of the defined thresholds for the Spanning Tree Protocol.
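The actual threshold values are only visible in the Dashboard tooltip, so the following Python sketch uses placeholder numbers. It assumes, hypothetically, that each protocol has a Fair and a Poor threshold on dropped packets per polling interval, and shows how a measured value could map onto the Good, Fair, and Poor statuses. The `THRESHOLDS` table, the `classify` function, and the choice of metric are illustrative only, not Meraki's implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    GOOD = "Good"
    FAIR = "Fair"
    POOR = "Poor"


@dataclass(frozen=True)
class Threshold:
    # Hypothetical limits: a value at or above `fair` is Fair,
    # a value at or above `poor` is Poor.
    fair: int
    poor: int


# Placeholder numbers for illustration only; these are NOT Meraki's thresholds.
THRESHOLDS = {
    "STP": Threshold(fair=100, poor=1000),
    "LACP": Threshold(fair=50, poor=500),
    "ARP": Threshold(fair=500, poor=5000),
}


def classify(protocol: str, dropped: int) -> Status:
    """Map one protocol's drop count for a polling interval to a health status."""
    limits = THRESHOLDS[protocol]
    if dropped >= limits.poor:
        return Status.POOR
    if dropped >= limits.fair:
        return Status.FAIR
    return Status.GOOD


print(classify("STP", 20).value)    # Good
print(classify("STP", 250).value)   # Fair
print(classify("STP", 5000).value)  # Poor
```

Keeping thresholds in a per-protocol table mirrors the per-protocol design described above: a burst that is routine for ARP might warrant attention for STP.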
What if my switch is dropping traffic?
Switches are designed to drop traffic in certain situations, so dropped packets do not necessarily indicate a problem with the network. Ideally, the CPU and ASIC handle most traffic with minimal drops. While some dropped packets are normal, a significant increase in dropped traffic at any time warrants further investigation to determine the cause.
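One way to make "a significant increase" concrete is to compare the current drop rate with a baseline from earlier intervals. The Python sketch below is a minimal illustration under stated assumptions: it takes the baseline as the mean of previous drop rates and flags anything above a multiple of it. The function names and the default `factor` of 3 are hypothetical choices, not values defined by Meraki.

```python
from statistics import mean


def drop_rate(received: int, dropped: int) -> float:
    """Fraction of received packets that were dropped in one interval."""
    return dropped / received if received else 0.0


def is_significant_increase(history: list[float], current: float, factor: float = 3.0) -> bool:
    """Flag `current` if it exceeds `factor` times the mean of `history`.

    `factor` is an arbitrary illustrative value. With a zero baseline,
    any nonzero drop rate is flagged.
    """
    if not history:
        return False  # no baseline yet, so nothing to compare against
    baseline = mean(history)
    return current > factor * baseline


# Hypothetical per-interval counters (received, dropped).
history = [drop_rate(10_000, 5), drop_rate(12_000, 6), drop_rate(11_000, 4)]
current = drop_rate(10_000, 400)
print(is_significant_increase(history, current))  # True
```

A relative baseline suits the guidance above better than a fixed cutoff, because what counts as "normal" drops differs from one network to another.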
Memory
The memory metric monitors the device’s total system memory usage. System memory, often referred to as Random Access Memory (RAM), is used on a network switch for critical functions such as running the operating system, loading firmware and configuration files, buffering and queuing packets, and more.
What if my switch’s memory usage is high?
Cisco Meraki switches are provisioned with sufficient memory to handle operating environments within their design specifications. If you notice high memory usage that deviates from your switch’s baseline and results in Fair or Poor warnings, please contact Cisco Meraki Support, as this may indicate an issue with the switch.
Power Supplies
The power metric monitors field-replaceable power supplies. This section shows the slot the power supply is inserted into, the power supply unit’s serial number and model number, and the PoE budget.
PoE Usage
This metric monitors the device’s PoE usage. This section shows the power consumption of all PoE devices connected to the switch and how much power has been budgeted.
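The relationship between PoE consumption and the budget is simple arithmetic: total draw is the sum of per-device consumption, and the remaining headroom is the budget minus that total. The Python sketch below assumes, hypothetically, that you have per-port draw and the switch's budget in watts; `poe_summary` and its field names are illustrative, not part of any Meraki API.

```python
def poe_summary(port_draw_watts: dict[str, float], budget_watts: float) -> dict[str, float]:
    """Summarize PoE consumption against the switch's PoE budget (all values in watts)."""
    if budget_watts <= 0:
        raise ValueError("PoE budget must be positive")
    used = sum(port_draw_watts.values())
    return {
        "used_w": used,                                 # total draw across all PoE ports
        "remaining_w": budget_watts - used,             # headroom left in the budget
        "utilization_pct": 100.0 * used / budget_watts, # share of the budget in use
    }


# Hypothetical per-port draw on a switch with a hypothetical 370 W budget.
ports = {"1": 15.0, "2": 7.5, "3": 30.0}
print(poe_summary(ports, 370.0))
```

A negative `remaining_w` would mean the connected devices request more power than has been budgeted.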
API
CPU protocol data can be retrieved using the following endpoint
Memory data can be retrieved using the following endpoint
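The endpoint paths are not reproduced in this section, so the Python sketch below leaves the path as a parameter you copy from the Dashboard API reference. It assumes the Dashboard API v1 base URL `https://api.meraki.com/api/v1` with bearer-token authentication. `build_request` and `fetch_json` are hypothetical helper names that use only the standard library.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://api.meraki.com/api/v1"


def build_request(path: str, api_key: str, params: Optional[dict] = None) -> Request:
    """Build an authenticated GET request for a Dashboard API endpoint path.

    `path` must be taken from the API reference (for example, the CPU or
    memory endpoints referenced above); it is intentionally not hard-coded.
    """
    url = f"{BASE_URL}/{path.lstrip('/')}"
    if params:
        url += "?" + urlencode(params)
    return Request(
        url,
        headers={"Authorization": f"Bearer {api_key}", "Accept": "application/json"},
        method="GET",
    )


def fetch_json(path: str, api_key: str, params: Optional[dict] = None):
    """Send the request and decode the JSON response body."""
    with urlopen(build_request(path, api_key, params), timeout=30) as resp:
        return json.load(resp)
```

For example, `fetch_json("<endpoint path>", api_key)` returns the decoded response. Separating request construction from sending makes it easy to inspect the URL and headers before any network call is made.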