Network Troubleshooting

One of the really cool things about the Internet is the way you, or anyone, can see how your traffic is being routed across the Net, and what’s happening to it along the way. This comes in very handy when you’re having trouble connecting to another site. With just a few seconds of research, you can often tell exactly where the problem lies, which in turn tells you whether it’s something you need to fix yourself, something you need to complain to somebody else about, or something that’s essentially out of your control. It is also useful for evaluating the quality of the explanations you get when you bug your ISP about network outages, which can be an important factor in deciding where to host your web site.

ping and traceroute

The first network utilities we’re going to talk about are the ping and traceroute commands. These utilities let you probe a TCP/IP network (like the Internet) to see where your data packets are going, how long it’s taking them to get there, and whether any of them are getting lost along the way. (See Packet-Switching 101 if these concepts are new to you.)

The ping command sends a bunch of test packets to a particular hostname or IP address and measures how long it takes for them to come back. When you’ve sent enough packets to satisfy your curiosity, you type Ctrl-C, and the program prints out a brief summary and exits. Here’s an example:

[jbc@andros jbc]$ ping www.yahoo.com
PING www.yahoo.com (204.71.200.74): 56 data bytes
64 bytes from 204.71.200.74: icmp_seq=0 ttl=248 time=19.5 ms
64 bytes from 204.71.200.74: icmp_seq=1 ttl=248 time=18.5 ms
64 bytes from 204.71.200.74: icmp_seq=2 ttl=248 time=21.4 ms
64 bytes from 204.71.200.74: icmp_seq=3 ttl=248 time=24.4 ms
64 bytes from 204.71.200.74: icmp_seq=4 ttl=248 time=19.5 ms
64 bytes from 204.71.200.74: icmp_seq=5 ttl=248 time=18.5 ms
64 bytes from 204.71.200.74: icmp_seq=6 ttl=248 time=18.5 ms
64 bytes from 204.71.200.74: icmp_seq=7 ttl=248 time=19.5 ms
64 bytes from 204.71.200.74: icmp_seq=8 ttl=248 time=19.5 ms
64 bytes from 204.71.200.74: icmp_seq=9 ttl=248 time=19.5 ms

--- www.yahoo.com ping statistics ---
10 packets transmitted, 10 packets received, 0% packet loss
round-trip min/avg/max = 18.5/19.8/24.4 ms

That’s a very respectable set of ping statistics: I sent 10 packets, got them all back (for 0% packet loss), and had an average round-trip time of 19.8 milliseconds. The various people responsible for the route between the machine where I entered this command and www.yahoo.com are doing a fine job.
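
If you want to check ping's arithmetic, or collect these numbers from a script, the summary is easy to recompute from the reply lines. What follows is a minimal Python sketch, not part of ping itself. The function name summarize_ping and the regular expression are my own assumptions, written against the reply format shown above; other versions of ping may format their output differently.

```python
import re

# Matches the tail of a reply line such as:
#   64 bytes from 204.71.200.74: icmp_seq=0 ttl=248 time=19.5 ms
REPLY = re.compile(r"icmp_seq=(\d+) ttl=(\d+) time=([\d.]+) ms")

def summarize_ping(output, sent):
    """Return (received, loss_percent, min_ms, avg_ms, max_ms)."""
    times = [float(m.group(3)) for m in REPLY.finditer(output)]
    received = len(times)
    loss = 100.0 * (sent - received) / sent
    if not times:
        return received, loss, None, None, None
    return received, loss, min(times), sum(times) / received, max(times)

# The ten replies from the www.yahoo.com example above.
sample = """\
64 bytes from 204.71.200.74: icmp_seq=0 ttl=248 time=19.5 ms
64 bytes from 204.71.200.74: icmp_seq=1 ttl=248 time=18.5 ms
64 bytes from 204.71.200.74: icmp_seq=2 ttl=248 time=21.4 ms
64 bytes from 204.71.200.74: icmp_seq=3 ttl=248 time=24.4 ms
64 bytes from 204.71.200.74: icmp_seq=4 ttl=248 time=19.5 ms
64 bytes from 204.71.200.74: icmp_seq=5 ttl=248 time=18.5 ms
64 bytes from 204.71.200.74: icmp_seq=6 ttl=248 time=18.5 ms
64 bytes from 204.71.200.74: icmp_seq=7 ttl=248 time=19.5 ms
64 bytes from 204.71.200.74: icmp_seq=8 ttl=248 time=19.5 ms
64 bytes from 204.71.200.74: icmp_seq=9 ttl=248 time=19.5 ms
"""

received, loss, low, avg, high = summarize_ping(sample, sent=10)
print(f"{received} received, {loss:.0f}% loss, "
      f"min/avg/max = {low:.1f}/{avg:.2f}/{high:.1f} ms")
```

For these ten replies the sketch reports no loss and a mean of about 19.88 ms, which agrees with the 19.8 in ping's own summary line.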

Now let’s try pinging some other site in a more-distant part of the Net:

[jbc@andros jbc]$ ping www.ontas.com.au
PING www.ontas.com.au (203.60.16.17) from 209.151.249.42 : 56(84) bytes of data.
64 bytes from vws1.southcom.com.au (203.60.16.17): icmp_seq=0 ttl=243 time=277.8 ms
64 bytes from vws1.southcom.com.au (203.60.16.17): icmp_seq=1 ttl=243 time=275.4 ms
64 bytes from vws1.southcom.com.au (203.60.16.17): icmp_seq=2 ttl=243 time=281.6 ms
64 bytes from vws1.southcom.com.au (203.60.16.17): icmp_seq=3 ttl=243 time=294.1 ms
64 bytes from vws1.southcom.com.au (203.60.16.17): icmp_seq=4 ttl=243 time=288.1 ms
64 bytes from vws1.southcom.com.au (203.60.16.17): icmp_seq=5 ttl=243 time=280.7 ms
64 bytes from vws1.southcom.com.au (203.60.16.17): icmp_seq=6 ttl=243 time=275.1 ms
64 bytes from vws1.southcom.com.au (203.60.16.17): icmp_seq=7 ttl=243 time=273.4 ms
64 bytes from vws1.southcom.com.au (203.60.16.17): icmp_seq=8 ttl=243 time=282.5 ms
64 bytes from vws1.southcom.com.au (203.60.16.17): icmp_seq=9 ttl=243 time=271.6 ms

--- www.ontas.com.au ping statistics ---
10 packets transmitted, 10 packets received, 0% packet loss
round-trip min/avg/max = 271.6/280.0/294.1 ms

It takes a little longer (280 milliseconds on average), but this still looks pretty healthy, considering that all of my packets are successfully making it to Tasmania and back (from California) in about a quarter second.

How do I know my packets are going to Tasmania? Well, I don’t, technically. But it seems like a good guess, based on the output of another essential network debugging utility: traceroute. The traceroute command lets you traverse the route that your data follows between your machine and some other machine, sending three test packets to each router along the way. Let’s try it on www.ontas.com.au:

[jbc@andros jbc]$ traceroute www.ontas.com.au
traceroute to www.ontas.com.au (203.60.16.17), 30 hops max, 38 byte packets
 1  chancy-colocate.hq.cyberverse.net (209.151.233.1)  5.843 ms  3.555 ms  0.619 ms
 2  216.246.13.129 (216.246.13.129)  6.868 ms  7.195 ms  5.374 ms
 3  newDuke-bb.softaware.com (207.155.0.34)  5.414 ms  7.280 ms  7.684 ms
 4  aar1-serial6-1-1-0.Anaheim.cw.net (208.172.39.33)  8.603 ms  10.074 ms  8.562 ms
 5  acr2-loopback.Anaheim.cw.net (208.172.34.62)  7.325 ms  9.941 ms  8.773 ms
 6  optus-networks.Anaheim.cw.net (208.172.33.142)  241.478 ms  245.452 ms  241.621 ms
 7  POS4-0-0.rr2.optus.net.au (192.65.89.213)  241.389 ms  269.195 ms  251.966 ms
 8  GigEth3-0.sg2.optus.net.au (202.139.191.2)  243.714 ms  252.133 ms  240.937 ms
 9  POS2-0.mg1.optus.net.au (202.139.124.82)  254.781 ms  263.256 ms  258.124 ms
10  GigEth1-0-0.mb1.optus.net.au (202.139.188.4)  254.089 ms  255.718 ms  255.304 ms
11  202.139.130.94 (202.139.130.94)  271.595 ms  278.407 ms  273.950 ms
12  Ether2-2.fra-core1.hbt.southcom.com.au (203.31.212.161)  279.584 ms  284.519 ms  295.090 ms
13  vws1.southcom.com.au (203.60.16.17)  280.207 ms  299.295 ms  293.986 ms

Reading down from the top of the traceroute command’s output, I see my packets go:

  • Through a router owned by Cyberverse, my ISP

  • Through a machine that doesn’t have a hostname, just an IP address (216.246.13.129)

  • Through the network of a company called Softaware (my ISP’s upstream provider on this route), with routers whose hostnames end in softaware.com

  • Through the network of Cable & Wireless (cw.net)

  • Through the network of Optus (an Australian ISP), via routers whose names end with optus.net.au

  • To the network of a company called Southern Internet Services (at hosts whose names end with southcom.com.au), which has a web page describing the company as “Tasmania’s Premier ISP” (http://www.southcom.com.au/)
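
The same reading of the route can be roughed out in code by grouping consecutive hops according to the tail end of each hostname. This is only a sketch under my own assumptions: the names network_of and networks_along_route are hypothetical, and the rule for picking the owning domain (keep three labels when the name ends in a two-letter country code, otherwise two) is a crude heuristic that will guess wrong for plenty of real hostnames.

```python
import re
from itertools import groupby

# A hop line looks like:  " 1  host.example.net (192.0.2.1)  5.843 ms ..."
HOP = re.compile(r"^\s*(\d+)\s+(\S+)\s+\((\d+\.\d+\.\d+\.\d+)\)")

def network_of(hostname):
    """Guess the owning network from a hostname (rough heuristic)."""
    if re.fullmatch(r"[\d.]+", hostname):
        return "(no hostname)"
    labels = hostname.lower().split(".")
    # Assumption: country-code names such as optus.net.au need three labels.
    keep = 3 if len(labels[-1]) == 2 else 2
    return ".".join(labels[-keep:])

def networks_along_route(output):
    """Return [(network, [hop numbers])] for consecutive hops."""
    hops = []
    for line in output.splitlines():
        m = HOP.match(line)
        if m:
            hops.append((int(m.group(1)), network_of(m.group(2))))
    return [(net, [n for n, _ in group])
            for net, group in groupby(hops, key=lambda hop: hop[1])]

# The hop lines from the traceroute to www.ontas.com.au shown above.
trace = """\
 1  chancy-colocate.hq.cyberverse.net (209.151.233.1)  5.843 ms  3.555 ms  0.619 ms
 2  216.246.13.129 (216.246.13.129)  6.868 ms  7.195 ms  5.374 ms
 3  newDuke-bb.softaware.com (207.155.0.34)  5.414 ms  7.280 ms  7.684 ms
 4  aar1-serial6-1-1-0.Anaheim.cw.net (208.172.39.33)  8.603 ms  10.074 ms  8.562 ms
 5  acr2-loopback.Anaheim.cw.net (208.172.34.62)  7.325 ms  9.941 ms  8.773 ms
 6  optus-networks.Anaheim.cw.net (208.172.33.142)  241.478 ms  245.452 ms  241.621 ms
 7  POS4-0-0.rr2.optus.net.au (192.65.89.213)  241.389 ms  269.195 ms  251.966 ms
 8  GigEth3-0.sg2.optus.net.au (202.139.191.2)  243.714 ms  252.133 ms  240.937 ms
 9  POS2-0.mg1.optus.net.au (202.139.124.82)  254.781 ms  263.256 ms  258.124 ms
10  GigEth1-0-0.mb1.optus.net.au (202.139.188.4)  254.089 ms  255.718 ms  255.304 ms
11  202.139.130.94 (202.139.130.94)  271.595 ms  278.407 ms  273.950 ms
12  Ether2-2.fra-core1.hbt.southcom.com.au (203.31.212.161)  279.584 ms  284.519 ms  295.090 ms
13  vws1.southcom.com.au (203.60.16.17)  280.207 ms  299.295 ms  293.986 ms
"""

for network, hop_numbers in networks_along_route(trace):
    print(network, hop_numbers)
```

Note that the sketch only reads hostnames; a router named under cw.net could still sit at the edge of somebody else's network, so treat the grouping as a starting point rather than a statement of ownership.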

Now you know what good ping and traceroute results look like. What do bad results look like? Typically you’ll see longer round-trip times, perhaps greater than 1000 ms (that is, greater than 1 second). You’ll also probably see lost packets, which show up in the ping command’s output as missing numbers in the ICMP sequence and are summarized in the results printed at the end. With traceroute, lost packets show up as asterisks where the round-trip time for that test packet should be.
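
Spotting gaps in the ICMP sequence by eye gets tedious on a long run, so here is a small Python sketch that does it for you. The function name missing_sequence_numbers is my own, and the sample is not a real capture: it is the www.yahoo.com output from earlier with the replies for sequence numbers 3 and 7 deleted by hand to simulate loss.

```python
import re

def missing_sequence_numbers(output):
    """Return the icmp_seq values skipped between the first and last reply."""
    seen = sorted(int(n) for n in re.findall(r"icmp_seq=(\d+)", output))
    if not seen:
        return []
    expected = set(range(seen[0], seen[-1] + 1))
    return sorted(expected - set(seen))

# Hand-edited sample: replies 3 and 7 removed to simulate lost packets.
sample = """\
64 bytes from 204.71.200.74: icmp_seq=0 ttl=248 time=19.5 ms
64 bytes from 204.71.200.74: icmp_seq=1 ttl=248 time=18.5 ms
64 bytes from 204.71.200.74: icmp_seq=2 ttl=248 time=21.4 ms
64 bytes from 204.71.200.74: icmp_seq=4 ttl=248 time=19.5 ms
64 bytes from 204.71.200.74: icmp_seq=5 ttl=248 time=18.5 ms
64 bytes from 204.71.200.74: icmp_seq=6 ttl=248 time=18.5 ms
64 bytes from 204.71.200.74: icmp_seq=8 ttl=248 time=19.5 ms
64 bytes from 204.71.200.74: icmp_seq=9 ttl=248 time=19.5 ms
"""

print(missing_sequence_numbers(sample))
```

One limitation: packets lost at the very start or end of a run leave no gap between replies, so the sketch cannot see them. The packet counts in ping's closing summary remain the authoritative figure.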

Another thing you might see in the results of a traceroute command is !H alongside a particular packet’s round-trip time; this stands for “host unreachable,” and is usually a sign of a fairly serious routing problem.
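
To pick lost probes and !H markers out of traceroute output automatically, something like the following Python sketch would do. It assumes the annotation is printed right after the probe's time, which is how the traceroute man pages I know of describe it; the name probe_results is hypothetical. Neither sample line is a real capture: the first is hop 7 from the trace above with one time replaced by hand, and the second is invented outright.

```python
import re

# One probe is either "*" (lost) or "<time> ms", optionally followed by an
# annotation such as "!H". Assumption: the annotation follows the time.
PROBE = re.compile(r"\*|([\d.]+) ms(?:\s+(!\S+))?")

def probe_results(line):
    """Return a list of (status, milliseconds) pairs for one hop line."""
    # Skip the hop number, hostname and "(address)" so only probes remain.
    rest = line.split(")", 1)[-1]
    results = []
    for m in PROBE.finditer(rest):
        if m.group(0) == "*":
            results.append(("lost", None))
        else:
            results.append((m.group(2) or "ok", float(m.group(1))))
    return results

# Hop 7 from the trace above, with the middle time replaced by hand.
edited = " 7  POS4-0-0.rr2.optus.net.au (192.65.89.213)  241.389 ms  *  251.966 ms"
# An entirely made-up line (192.0.2.1 is a documentation-only address).
invented = " 5  192.0.2.1 (192.0.2.1)  3.1 ms !H  *  2.9 ms !H"

for status, ms in probe_results(edited) + probe_results(invented):
    print(status, ms)
```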

mtr

The traditional way to use ping and traceroute to troubleshoot a misbehaving TCP/IP connection is to first use traceroute to figure out where the packets are going, then systematically ping the hosts along the route to identify where the problem is. At some point a clever guy named Matt Kimball created a tool to carry out both of those steps simultaneously, naming the program mtr, for Matt's traceroute (see http://www.bitwizard.nl/mtr/).
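
For comparison, the traditional two-step routine can be scripted by hand. The Python sketch below is a rough illustration under several assumptions: that your system has traceroute with the -n option (numeric addresses only) and a ping that accepts -c to set the packet count, as the common Unix versions do, and that their output resembles the examples in this chapter. The function names are hypothetical, and the run parameter exists only so the logic can be exercised without touching the network.

```python
import re
import subprocess

def run_command(args):
    """Run a command and return its standard output as text."""
    return subprocess.run(args, capture_output=True, text=True).stdout

def hop_addresses(traceroute_output):
    """Extract the IPv4 address from each hop line of `traceroute -n`."""
    hops = []
    for line in traceroute_output.splitlines():
        m = re.match(r"\s*\d+\s+(\d+\.\d+\.\d+\.\d+)", line)
        if m:  # hops that answered only with "* * *" are skipped
            hops.append(m.group(1))
    return hops

def ping_route(host, count=5, run=run_command):
    """Trace the route to host, then ping each hop in turn.

    Returns a dict mapping each hop address to the number of replies seen.
    """
    route = hop_addresses(run(["traceroute", "-n", host]))
    replies = {}
    for address in route:
        output = run(["ping", "-c", str(count), address])
        # Count only genuine reply lines, not error reports.
        replies[address] = len(re.findall(r"bytes from .*icmp_seq=\d+", output))
    return replies
```

Calling ping_route("www.ontas.com.au") would trace the route once and then ping each hop five times, one hop after another. That is slow and gives only a single snapshot, which is exactly the chore mtr takes off your hands.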

If the mtr utility is installed on your Unix server, you can run it by entering mtr followed by the name of the host you are interested in tracerouting and pinging:

[jbc@andros jbc]$ mtr www.ontas.com.au

When you do, your shell window will display a list of hosts (the same as that shown by the traceroute command) down the left side of the window, with the rest of the window taken up by constantly updating statistics on the results of repeatedly pinging each host. The longer you leave mtr running, the more data it will gather (see Figure 1-1). When you are done, type q to quit back to the shell prompt.

Figure 1-1. The mtr utility displaying network troubleshooting information
