Packet loss occurs when data sent from one networked device to another fails to arrive, and it can happen for a variety of reasons. The first step in troubleshooting is to isolate where the loss is occurring. The ping and traceroute (tracert on Windows) tools included with most operating systems are well suited to this. This article works through an example of isolating packet loss encountered by a Windows PC attempting to reach 22.214.171.124.
Determining where packet loss is occurring over routed links
To confirm if packet loss is occurring:
Note: This test only detects packet loss that affects ICMP or all traffic. Protocol-specific loss may not be reflected.
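When a ping test is repeated from several clients, it can help to pull the loss percentage out of the summary line automatically. The following is a minimal Python sketch of that idea, not part of the original procedure. The function name parse_ping_loss and the two sample summary strings are illustrative assumptions; the strings only imitate the general shape of the summaries that Windows and Linux ping print.

```python
import re
from typing import Optional

# Matches "6% loss" (Windows-style) and "6% packet loss" (Linux/macOS-style).
_LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)%\s*(?:packet\s+)?loss")


def parse_ping_loss(output: str) -> Optional[float]:
    """Return the loss percentage found in ping output, or None if absent."""
    match = _LOSS_RE.search(output)
    return float(match.group(1)) if match else None


# Illustrative summary lines (made-up numbers, not captured from a real test).
windows_summary = "Packets: Sent = 50, Received = 47, Lost = 3 (6% loss),"
linux_summary = "50 packets transmitted, 47 received, 6% packet loss, time 49070ms"

print(parse_ping_loss(windows_summary))
print(parse_ping_loss(linux_summary))
```

Any non-zero value over a reasonably long run (dozens of pings rather than four) is worth following up with a trace.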
If packet loss was seen, the next step is to identify where the packet loss begins to occur. 'tracert' can be used to check each layer 3 device along the path to the destination:
A lack of response is represented by an asterisk (*). This may indicate packet loss, or it may mean that the device is configured not to respond. The test may need to be run multiple times to identify where loss is occurring. If packet loss is frequently encountered from a particular hop onward, the issue is most likely with that device or with the link between it and the previous hop. This screenshot illustrates a tracert clear of packet loss. The only device that does not respond (hop 11) is likely configured not to respond, as there is no packet loss after it.
In this next screenshot, packet loss is regularly encountered beginning with hop 2. This indicates there may be an issue with the ISP gateway, or with the link(s) between the client gateway and the ISP gateway. It is recommended to test from multiple clients at different locations in the network to help rule out client-specific issues and to identify what the affected clients have in common.
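The reasoning above (a silent hop matters only if loss continues after it) can be sketched in Python. This is a rough illustration under stated assumptions, not a replacement for reading the traces: the helper names hop_loss and first_persistent_loss_hop are hypothetical, the parser expects Windows-style tracert lines with three probes per hop, and the two sample traces use made-up private addresses.

```python
import re
from collections import defaultdict

_HOP_RE = re.compile(r"^\s*(\d+)\s+(.*)$")   # hop number, then the rest of the line
_PROBE_RE = re.compile(r"\*|<?\d+\s+ms")      # "*" (no reply) or a time such as "<1 ms"


def hop_loss(traces):
    """Return {hop: percent of unanswered probes} across several tracert outputs."""
    sent = defaultdict(int)
    lost = defaultdict(int)
    for trace in traces:
        for line in trace.splitlines():
            match = _HOP_RE.match(line)
            if not match:
                continue  # skip headers, blank lines, "Trace complete."
            probes = _PROBE_RE.findall(match.group(2))[:3]
            hop = int(match.group(1))
            sent[hop] += len(probes)
            lost[hop] += probes.count("*")
    return {hop: 100.0 * lost[hop] / sent[hop] for hop in sorted(sent) if sent[hop]}


def first_persistent_loss_hop(loss_by_hop):
    """Return the earliest hop from which every later hop, including the last, shows loss."""
    start = None
    for hop in sorted(loss_by_hop, reverse=True):
        if loss_by_hop[hop] > 0:
            start = hop
        else:
            break
    return start


# Two illustrative runs (hypothetical addresses and timings).
run_1 = """
  1    <1 ms    <1 ms    <1 ms  192.168.1.1
  2     *       14 ms     *     10.0.0.1
  3    15 ms     *       16 ms  10.0.1.1
"""
run_2 = """
  1     1 ms    <1 ms    <1 ms  192.168.1.1
  2     *        *        *     Request timed out.
  3     *       15 ms    17 ms  10.0.1.1
"""

loss = hop_loss([run_1, run_2])
print(loss)
print(first_persistent_loss_hop(loss))
```

A hop that never answers but is followed by clean hops yields None here, which matches the hop 11 case described earlier: the device is simply not replying to the probes.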
As a more robust test, the tool MTR can be used to perform a continuous series of traces and present the percentage of loss at each hop in the path, which identifies more clearly where the loss is occurring. Output for the above scenario would appear similar to:
Determining where packet loss is occurring in a wireless/switched network
Tracert only provides information for layer 3 devices in the path, such as routers. However, when packet loss is occurring at the first hop, and traffic must pass through a wireless access point and a switch to reach that hop, additional testing is required to isolate the problem. In this case, testing needs to be repeated several times, each time from a point progressively closer to the layer 3 device. The following steps are illustrated in the image below:
If testing with Cisco Meraki devices, it is also possible to ping the first MS switch in the path at switch.meraki.com and the MX security appliance at wired.meraki.com.
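One way to organize this progressive testing is to record the loss seen toward each device, ordered from nearest to farthest, and look for the first device that shows loss. The Python sketch below assumes the two Meraki hostnames mentioned above; the function names and the loss figures are hypothetical and would come from real ping runs.

```python
import platform


def ping_command(host, count=20):
    """Build a ping command line; Windows uses -n for the count, most others use -c."""
    flag = "-n" if platform.system() == "Windows" else "-c"
    return ["ping", flag, str(count), host]


def first_lossy_target(results):
    """results: (target, loss_percent) pairs ordered nearest first.

    Returns the first target that shows loss, or None if every target is clean.
    """
    for target, loss_percent in results:
        if loss_percent > 0:
            return target
    return None


# Hypothetical measurements, nearest device first.
measurements = [
    ("switch.meraki.com", 0.0),   # first MS switch in the path
    ("wired.meraki.com", 12.0),   # MX security appliance
]

print(ping_command("switch.meraki.com"))
print(first_lossy_target(measurements))
```

In this made-up example the switch answers cleanly and the MX does not, which would point at the MX or the link between the switch and the MX rather than at the wireless segment.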
Common causes of packet loss
There are many potential causes for packet loss. This article will outline some of the more common reasons and what can be done about them.
Speed/duplex mismatch
This occurs when the two ends of a link are using different speed/duplex settings, such as 100 Mbps/half-duplex on one end and 1000 Mbps/full-duplex on the other. When this occurs, some or all traffic will be lost on the link. To correct this, ensure both sides of the link have identical settings. Ideally, both ends of the connection should be set to "Auto" for both speed and duplex. If a speed or duplex setting must be manually set on one end, ensure that it has been set to the same values on the other end as well.
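The rule that both ends must carry identical settings is easy to check mechanically once the port settings have been collected. The following Python sketch is illustrative only: the PortSettings type and mismatched_fields helper are hypothetical names, and the values would have to be read from each device by whatever means it provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortSettings:
    speed: str   # "auto" or a fixed speed in Mbps, e.g. "100", "1000"
    duplex: str  # "auto", "half", or "full"


def mismatched_fields(near_end, far_end):
    """Return the names of the settings that differ between the two ends of a link."""
    return [
        name
        for name in ("speed", "duplex")
        if getattr(near_end, name) != getattr(far_end, name)
    ]


# The mismatch used as the example in the text above.
near_end = PortSettings(speed="100", duplex="half")
far_end = PortSettings(speed="1000", duplex="full")

print(mismatched_fields(near_end, far_end))
```

An empty list means the two ends agree, whether both are set to "auto" or both are fixed to the same values.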
Link congestion (too much traffic)
This occurs when more traffic attempts to cross a network link than the link can support, such as 60 Mbps of traffic passing over a 20 Mbps link. This creates a bottleneck, resulting in some traffic being dropped.
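The arithmetic behind the bottleneck can be made explicit. The Python sketch below is a deliberately simplified model: it assumes the offered load is sustained and ignores buffering and senders slowing down in response to loss, so it gives only a rough lower bound on how much traffic cannot get through. The function name is hypothetical.

```python
def min_drop_fraction(offered_mbps, capacity_mbps):
    """Fraction of sustained offered traffic that cannot fit through the link."""
    if offered_mbps <= capacity_mbps:
        return 0.0
    return (offered_mbps - capacity_mbps) / offered_mbps


# The example from the text: 60 Mbps offered to a 20 Mbps link.
fraction = min_drop_fraction(60, 20)
print(f"{fraction:.1%} of offered traffic exceeds the link capacity")
```

With these figures, 40 of every 60 Mbps offered (about two thirds) has nowhere to go, which is why the effect is so visible to users.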
There are multiple ways to solve this, including:
Firewall blocking certain traffic
Even if packet loss isn't occurring for all types of traffic, an upstream firewall may be filtering certain types of traffic. This can result in some websites loading and others failing, or some services being accessible, while others are not. If a firewall exists between two devices/locations experiencing these symptoms, ensure that the firewall is not blocking the traffic that is experiencing the problem.
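Because ping only exercises ICMP, a quick per-port TCP connection test can help show which services are affected. The Python sketch below uses the standard library's socket.create_connection; the function name tcp_port_reachable is hypothetical, and the demonstration connects to a temporary listener on the local machine so that it runs without touching the network. A failed connection does not by itself prove a firewall is responsible, since the service may simply be down.

```python
import socket


def tcp_port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and unreachable hosts
        return False


# Self-contained demonstration against a temporary local listener.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
demo_port = listener.getsockname()[1]

print(tcp_port_reachable("127.0.0.1", demo_port))
listener.close()
```

In practice the same function would be pointed at the remote host and the specific ports of the failing service, from both an affected and an unaffected client.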
Bad cable or loose connection
A cable that has been poorly/incorrectly terminated or damaged can result in an incomplete or inaccurate electrical signal passing between devices. Swapping a cable with a new one, or performing a cable test on the one in question, can help to eliminate this as a possibility. Similarly, a cable that has not been fully seated in the port, or has been seated in a port with dust or other non-conductive debris on the pins, can result in an incomplete electrical signal. Be sure to keep all ports free of dust or build-up and ensure cables are securely connected.