Most Common Traceroute Mistake

Look at the following traceroute and see if you can find the problems:

traceroute to google.com (74.125.239.129), 30 hops max, 60 byte packets
 1  10.71.177.1 (10.71.177.1)  0.632 ms  0.476 ms  0.604 ms
 2  10.71.145.213 (10.71.145.213)  1.338 ms  1.096 ms  1.283 ms
 3  hmmm.com (10.71.177.249)  *  *  *
 4  oh.oh.net (192.168.213.66)  213.504 ms  487.377 ms  451.450 ms
 5  192.168.224.252 (192.168.224.252)  1.426 ms  1.450 ms  1.563 ms
 6  209.85.240.114 (209.85.240.114)  2.067 ms  2.000 ms  1.894 ms
 7  66.249.95.31 (66.249.95.31)  2.095 ms  2.289 ms  2.339 ms
 8  nuq05s02-in-f1.1e100.net (74.125.239.129)  1.798 ms  1.844 ms  1.786 ms

See the issue?

OK, that was actually a trick question. There is nothing wrong with the above traceroute. Everything looks good. But wait, what about hops three and four? Hop three (hmmm.com) isn't replying at all, and look at how large the response times from hop four (oh.oh.net) are!

Those two "problems" are by far the most common mistakes users make when they try to diagnose networking issues with traceroute. I have seen experienced network and system administrators fall for them. The truth is, hops three and four don't matter. Hop three is probably a router that does not send the ICMP Time Exceeded messages that traceroute relies on. An administrator probably disabled or heavily rate-limited those replies, either to reduce exposure to attacks or simply to keep the router's CPU free for forwarding actual traffic.

If you ever see response failures in the middle of a traceroute path but no failures end to end, there is no problem. It just means a device along the way does not prioritize generating its own ICMP replies. This, on the other hand, might be worth investigating further:

traceroute to google.com (74.125.239.129), 30 hops max, 60 byte packets
 1  10.71.177.1 (10.71.177.1)  0.632 ms  0.476 ms  0.604 ms
 2  10.71.145.213 (10.71.145.213)  1.338 ms  1.096 ms  1.283 ms
 3  hmmm.com (10.71.177.249)  *  *  *
 4  oh.oh.net (192.168.213.66)  213.504 ms  *  *
 5  192.168.224.252 (192.168.224.252)  1.426 ms  *  *
 6  209.85.240.114 (209.85.240.114)  2.067 ms  *  *
 7  66.249.95.31 (66.249.95.31)  2.095 ms  *  *
 8  nuq05s02-in-f1.1e100.net (74.125.239.129)  1.798 ms  *  *

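You can see that the first probe made it through end to end, but the second and third probes were dropped at every hop from four onward, including the destination, so they never got past hop three. Loss that persists all the way to the destination indicates a real issue.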
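The rule from the last two traceroutes can be written down as a small check. This is a minimal sketch, not an existing tool: `parse_hops`, `lost`, and `persistent_loss_start` are hypothetical names, and the parser assumes Linux traceroute output in the format shown above (hop number, name, IP in parentheses, then an RTT or `*` per probe), with the last line being the destination. Intermediate loss only counts if the destination itself also loses probes; the function then walks backwards to find where that loss begins.

```python
import re

# Assumes Linux traceroute output like the examples above:
#   " 4  oh.oh.net (192.168.213.66)  213.504 ms  *  *"
# A completely silent hop may also print as " 3  * * *" (no name or IP).
HOP_RE = re.compile(r"^\s*(\d+)\s+(?:(\S+)\s+\(([^)]+)\)\s+)?(.*)$")
PROBE_RE = re.compile(r"([\d.]+) ms|\*")


def parse_hops(text):
    """Return a list of (hop, host, ip, rtts); a lost probe ("*") is None."""
    hops = []
    for line in text.splitlines():
        m = HOP_RE.match(line)
        if not m:
            continue  # skip the "traceroute to ..." header and blank lines
        hop, host, ip, rest = m.groups()
        # findall yields the RTT string for "N ms" and "" for "*"
        rtts = [float(ms) if ms else None for ms in PROBE_RE.findall(rest)]
        hops.append((int(hop), host, ip, rtts))
    return hops


def lost(rtts):
    """True if at least one probe to this hop got no reply."""
    return any(r is None for r in rtts)


def persistent_loss_start(hops):
    """Earliest hop from which every hop through the destination lost probes.

    Returns None when the destination answered every probe: loss at
    intermediate hops alone only means routers declined to answer.
    Assumes the last parsed hop is the destination.
    """
    if not hops or not lost(hops[-1][3]):
        return None
    start = hops[-1][0]
    # Walk backwards from the destination while hops keep losing probes.
    for hop, _host, _ip, rtts in reversed(hops):
        if not lost(rtts):
            break
        start = hop
    return start
```

Fed the text of the first traceroute, `persistent_loss_start(parse_hops(text))` returns `None`, because the destination answered all three probes. Fed the second, it returns `3`: hop three never answers at all, and every hop after it, up to and including the destination, drops two of its three probes.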

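The same goes for the latency you see on hop four of the original traceroute. Just because hop four is slow to respond (4 oh.oh.net (192.168.213.66) 213.504 ms 487.377 ms 451.450 ms) does not mean there is an issue, because the end-to-end response times are good: it takes a little under two milliseconds to reach the destination (8 nuq05s02-in-f1.1e100.net (74.125.239.129) 1.798 ms 1.844 ms 1.786 ms). Every packet to the destination passes through hop four, so if that router were really adding hundreds of milliseconds of delay, the later hops would show it too. The slow replies only reflect how long the router takes to generate its own ICMP messages.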
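The hop-four reasoning can be sketched the same way. This is a minimal illustration under stated assumptions, not a standard tool: `best_rtt`, `misleading_latency_hops`, and the 50 ms `margin_ms` threshold are hypothetical choices. It compares each hop's fastest reply with the destination's fastest reply; since probes to the destination cross every earlier router, a hop that answers far more slowly than the destination cannot be delaying the traffic that passes through it. It ignores asymmetric return paths, which can also skew individual hops.

```python
def best_rtt(rtts):
    """Fastest answered probe at a hop in ms, or None if all were lost."""
    answered = [r for r in rtts if r is not None]
    return min(answered) if answered else None


def misleading_latency_hops(rtts_by_hop, margin_ms=50.0):
    """Hop numbers that answer much more slowly than the destination does.

    rtts_by_hop: non-empty list of per-hop RTT lists in ms (None = lost
    probe), ordered from hop 1 to the destination. Packets to the
    destination pass through every earlier hop, so a hop slower than the
    destination by more than margin_ms (a hypothetical threshold) is slow
    at replying, not at forwarding.
    """
    dest = best_rtt(rtts_by_hop[-1])
    if dest is None:
        return []  # no end-to-end baseline to compare against
    flagged = []
    for hop_no, rtts in enumerate(rtts_by_hop[:-1], start=1):
        best = best_rtt(rtts)
        if best is not None and best > dest + margin_ms:
            flagged.append(hop_no)
    return flagged


# RTTs from the first traceroute above (None marks "*").
FIRST_TRACE = [
    [0.632, 0.476, 0.604],
    [1.338, 1.096, 1.283],
    [None, None, None],
    [213.504, 487.377, 451.450],
    [1.426, 1.450, 1.563],
    [2.067, 2.000, 1.894],
    [2.095, 2.289, 2.339],
    [1.798, 1.844, 1.786],
]

print(misleading_latency_hops(FIRST_TRACE))  # [4]
```

With the first traceroute's numbers it flags only hop four; hop three is skipped because it never answered, and every other hop is within a fraction of a millisecond of the destination.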

Basically, don't worry about what happens in the middle of a traceroute unless it carries through to the destination. Traceroute is useful for seeing the path packets take to reach the destination, but it can easily lead to bad diagnoses or the impression that there is a problem when there is none.

The above two mistakes in reading a traceroute happen all the time.

There is an excellent NANOG presentation on troubleshooting with traceroute that anyone interested in networking should read.