Learning by doing: Writing your own traceroute in 8 easy steps

Posted in Networking on July 29th, 2010 by Leonid Grinberg

Anyone who administers even a moderately sized network knows that when problems arise, diagnosing and fixing them can be extremely difficult. They’re usually non-deterministic and difficult to reproduce, and very similar symptoms (e.g. a slow or unreliable connection) can be caused by any number of problems — congestion, a broken router, a bad physical link, etc.

One very useful weapon in a system administrator’s arsenal for dealing with network issues is traceroute (or tracert, if you use Windows). This is a neat little program that will print out the path that packets take to get from the local machine to a destination — that is, the sequence of routers that the packets go through.

Using traceroute is pretty straightforward. On a UNIX-like system, you can do something like the following:

    $ traceroute google.com
    traceroute to google.com (173.194.33.104), 30 hops max, 60 byte packets
     1  router.lan (192.168.1.1)  0.595 ms  1.276 ms  1.519 ms
     2  70.162.48.1 (70.162.48.1)  13.669 ms  17.583 ms  18.242 ms
     3  ge-2-20-ur01.cambridge.ma.boston.comcast.net (68.87.36.225)  18.710 ms  19.192 ms  19.640 ms
     4  be-51-ar01.needham.ma.boston.comcast.net (68.85.162.157)  20.642 ms  21.160 ms  21.571 ms
     5  pos-2-4-0-0-cr01.newyork.ny.ibone.comcast.net (68.86.90.61)  28.870 ms  29.788 ms  30.437 ms
     6  pos-0-3-0-0-pe01.111eighthave.ny.ibone.comcast.net (68.86.86.190)  30.911 ms  17.377 ms  15.442 ms
     7  as15169-3.111eighthave.ny.ibone.comcast.net (75.149.230.194)  40.081 ms  41.018 ms  39.229 ms
     8  72.14.238.232 (72.14.238.232)  20.139 ms  21.629 ms  20.965 ms
     9  216.239.48.24 (216.239.48.24)  25.771 ms  26.196 ms  26.633 ms
    10 173.194.33.104 (173.194.33.104)  23.856 ms  24.820 ms  27.722 ms

Pretty nifty. But how does it work? After all, when a packet leaves your network, you can’t monitor it anymore. So when it hits all those routers, the only way you can know about that is if one of them tells you about it.

The secret behind traceroute is a field called “Time To Live” (TTL) that is contained in the headers of the packets sent via the Internet Protocol. When a host receives a packet, it checks if the packet’s TTL is greater than 1 before sending it on down the chain. If it is, it decrements the field. Otherwise, it drops the packet and sends an ICMP TIME_EXCEEDED packet to the sender. This packet, like all IP packets, contains the address of its sender, i.e. the intermediate host.
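
To make the TTL rule concrete, here is a tiny sketch of the decision a single router makes. The function name handle_packet and the tuple it returns are made up for illustration; real routers do this in their forwarding path, not in Python.

```python
def handle_packet(router_addr, ttl):
    """Mimic the forwarding decision described above for one router.

    Returns ("forward", new_ttl) when the packet is passed along, or
    ("time_exceeded", router_addr) when the router drops it and replies.
    """
    if ttl > 1:
        return ("forward", ttl - 1)
    return ("time_exceeded", router_addr)


# Illustrative addresses taken from the sample output earlier in the post.
print(handle_packet("192.168.1.1", 3))
print(handle_packet("70.162.48.1", 1))
```

The second call is the interesting one: the reply carries the router's own address, which is exactly the piece of information traceroute is after.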

traceroute works by sending consecutive requests to the same destination with increasing TTL fields. Most of these attempts result in messages from intermediate hosts saying that the packet was dropped. The IP addresses of these intermediate hosts are printed on the screen as they arrive (generally with an attempt made at determining the hostname). The program terminates when the maximum number of hops has been reached (on my machine’s traceroute the default maximum is 30, but this is configurable), or when the intended destination has been reached.
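
Before touching any sockets, the whole idea can be simulated with a plain list standing in for the network. This is only a sketch: probe and simulated_traceroute are hypothetical names, and the "path" is a list we supply ourselves rather than anything discovered on the wire.

```python
def probe(path, ttl):
    """Return the address of whichever host answers a probe sent with `ttl`.

    `path` lists the routers in order, ending with the destination.
    """
    for hop in path[:-1]:
        if ttl <= 1:
            return hop  # this router drops the probe and reports back
        ttl -= 1
    return path[-1]  # the probe survived every router and reached the destination


def simulated_traceroute(path, max_hops=30):
    """Collect replies for TTL = 1, 2, ... until the destination answers."""
    seen = []
    for ttl in range(1, max_hops + 1):
        reply = probe(path, ttl)
        seen.append(reply)
        if reply == path[-1]:
            break
    return seen


path = ["192.168.1.1", "70.162.48.1", "173.194.33.104"]
for ttl, addr in enumerate(simulated_traceroute(path), start=1):
    print("%d\t%s" % (ttl, addr))
```

Each TTL value reveals exactly one more hop, and the loop stops as soon as the destination itself answers or max_hops runs out.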

The rest of this post will walk through implementing a very primitive version of traceroute in Python. The real traceroute is of course more complicated than what we will create, with many configurable features and modes. Still, our version will implement the basic functionality, and at the end, we’ll have a really nice and short Python script that will do just fine for performing a simple traceroute.

So let’s begin. Our algorithm, at a high level, is an infinite loop whose body creates a connection, prints out information about it, and then breaks out of the loop if a certain condition has been reached. So we can start with the following skeletal code:

    #!/usr/bin/python

    def main(dest):
        while True:
            # ... open connections ...
            # ... print data ...
            # ... break if useful ...
            pass

    if __name__ == "__main__":
        main('google.com')

Step 1: Turn a hostname into an IP address.

The socket module provides a gethostbyname() method that attempts to resolve a domain name into an IP address:

    #!/usr/bin/python

    import socket

    def main(dest_name):
        dest_addr = socket.gethostbyname(dest_name)
        while True:
            # ... open connections ...
            # ... print data ...
            # ... break if useful ...
            pass

    if __name__ == "__main__":
        main('google.com')
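
Name resolution can fail, so a slightly more defensive variant might look like the sketch below. The resolve helper is hypothetical, and the code is written so that it also runs under Python 3.

```python
import socket


def resolve(name):
    """Return the IPv4 address for `name`, or None if the lookup fails."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        # Lookup failures surface as subclasses of OSError.
        return None


# A dotted-quad string is already an address, so it comes back unchanged.
print(resolve("192.168.1.1"))
```

Passing an address instead of a hostname is handy for testing, since it does not depend on DNS being reachable.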

Step 2: Create sockets for the connections.

We’ll need two sockets for our connections — one for receiving data and one for sending. We have a lot of choices for what kind of probes to send; let’s use UDP probes, which require a datagram socket (SOCK_DGRAM). The routers along our traceroute path are going to send back ICMP packets, so for those we need a raw socket (SOCK_RAW).

    #!/usr/bin/python

    import socket

    def main(dest_name):
        dest_addr = socket.gethostbyname(dest_name)
        icmp = socket.getprotobyname('icmp')
        udp = socket.getprotobyname('udp')
        while True:
            recv_socket = socket.socket(socket.AF_INET, socket.SOCK_RAW, icmp)
            send_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, udp)
            # ... print data ...
            # ... break if useful ...

    if __name__ == "__main__":
        main('google.com')
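
Creating the raw socket is the step most likely to fail, because it normally needs root privileges. Here is a hedged sketch of how one might detect that up front. The helper name open_probe_sockets is an invention for this example, and it uses the socket.IPPROTO_ICMP and socket.IPPROTO_UDP constants in place of the getprotobyname() lookups above.

```python
import socket


def open_probe_sockets():
    """Return (recv_socket, send_socket), or None if raw sockets are not permitted."""
    send_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        recv_socket = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP)
    except OSError:
        # Typically "Operation not permitted" when not running as root.
        send_socket.close()
        return None
    return recv_socket, send_socket


sockets = open_probe_sockets()
if sockets is None:
    print("raw ICMP socket unavailable; try running as root")
else:
    for s in sockets:
        s.close()
```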

Step 3: Set the TTL field on the packets.

We’ll simply use a counter which begins at 1 and which we increment with each iteration of the loop. We set the TTL using the setsockopt() method of the socket object:

    #!/usr/bin/python

    import socket

    def main(dest_name):
        dest_addr = socket.gethostbyname(dest_name)
        icmp = socket.getprotobyname('icmp')
        udp = socket.getprotobyname('udp')
        ttl = 1
        while True:
            recv_socket = socket.socket(socket.AF_INET, socket.SOCK_RAW, icmp)
            send_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, udp)
            send_socket.setsockopt(socket.SOL_IP, socket.IP_TTL, ttl)

            ttl += 1
            # ... print data ...
            # ... break if useful ...

    if __name__ == "__main__":
        main('google.com')
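
To convince ourselves that the option really sticks, we can read it back with getsockopt(). This is a small sketch, not part of the tutorial script: set_and_read_ttl is a hypothetical helper, and it passes socket.IPPROTO_IP as the option level, which has the same value as socket.SOL_IP on Linux.

```python
import socket


def set_and_read_ttl(sock, ttl):
    """Set the IP TTL on `sock` and return the value the OS reports back."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
    return sock.getsockopt(socket.IPPROTO_IP, socket.IP_TTL)


send_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
try:
    for ttl in (1, 2, 3):
        print("requested TTL %d, socket reports %d" % (ttl, set_and_read_ttl(send_socket, ttl)))
finally:
    send_socket.close()
```

No root privileges are needed for this part, since only the ordinary datagram socket is involved.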

Step 4: Bind the sockets and send some packets.

Now that our sockets are all set up, we can put them to work! We first tell the receiving socket to listen to connections from all hosts on a specific port (most implementations of traceroute use ports from 33434 to 33534 so we will use 33434 as a default). We do this using the bind() method of the receiving socket object, by specifying the port and an empty string for the hostname. We can then use the sendto() method of the sending socket object to send to the destination host (on the same port). The first argument of the sendto() method is the data to send; in our case, we don’t actually have anything specific we want to send, so we can just give the empty string:

    #!/usr/bin/python

    import socket

    def main(dest_name):
        dest_addr = socket.gethostbyname(dest_name)
        port = 33434
        icmp = socket.getprotobyname('icmp')
        udp = socket.getprotobyname('udp')
        ttl = 1
        while True:
            recv_socket = socket.socket(socket.AF_INET, socket.SOCK_RAW, icmp)
            send_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, udp)
            send_socket.setsockopt(socket.SOL_IP, socket.IP_TTL, ttl)
            recv_socket.bind(("", port))
            send_socket.sendto("", (dest_addr, port))

            ttl += 1
            # ... print data ...
            # ... break if useful ...

    if __name__ == "__main__":
        main('google.com')
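
The bind()/sendto() pair can be tried out without root and without leaving the machine. The sketch below is an illustration only: it swaps the raw ICMP socket for a second UDP socket on the loopback interface, lets the OS pick a free port, and sends b"" because Python 3 requires a bytes payload where the Python 2 code above uses "".

```python
import socket

# Receiver: a UDP socket bound to an OS-chosen port on the loopback interface.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)
port = receiver.getsockname()[1]

# Sender: an empty datagram is perfectly legal; only the headers matter here.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"", ("127.0.0.1", port))

data, (addr, _) = receiver.recvfrom(512)
print("received %d bytes from %s" % (len(data), addr))

sender.close()
receiver.close()
```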

Step 5: Get the intermediate hosts’ IP addresses.

Next, we need to actually get our data from the receiving socket. For this, we can use the recvfrom() method of the object, whose return value is a tuple containing the packet data and the sender’s address. In our case, we only care about the latter. Note that the address is itself actually a tuple containing both the IP address and the port, but we only care about the former. recvfrom() takes a single argument, the maximum number of bytes to read; let’s go with 512.

It’s worth noting that some administrators configure their routers and hosts not to answer with ICMP messages at all, pretty much specifically to defeat utilities like traceroute, since the detailed layout of a network can be sensitive information (ICMP ECHO requests are often blocked for a similar reason: the ping utility can be used for denial-of-service attacks). It is therefore completely possible that no reply will ever arrive. If a timeout has been set on the receiving socket with settimeout(), recvfrom() then raises an exception; without one, as in the code below, the call simply blocks. Either way, we’ll wrap this call in a try/except block. Traditionally, traceroute prints asterisks when it can’t get the address of a host. We’ll do the same once we print out results.

    #!/usr/bin/python

    import socket

    def main(dest_name):
        dest_addr = socket.gethostbyname(dest_name)
        port = 33434
        icmp = socket.getprotobyname('icmp')
        udp = socket.getprotobyname('udp')
        ttl = 1
        while True:
            recv_socket = socket.socket(socket.AF_INET, socket.SOCK_RAW, icmp)
            send_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, udp)
            send_socket.setsockopt(socket.SOL_IP, socket.IP_TTL, ttl)
            recv_socket.bind(("", port))
            send_socket.sendto("", (dest_addr, port))
            curr_addr = None
            try:
                _, curr_addr = recv_socket.recvfrom(512)
                curr_addr = curr_addr[0]
            except socket.error:
                pass
            finally:
                send_socket.close()
                recv_socket.close()

            ttl += 1
            # ... print data ...
            # ... break if useful ...

    if __name__ == "__main__":
        main('google.com')
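
A silent hop only turns into an exception if the receiving socket has a timeout. Here is one way that might look, as a sketch: wait_for_reply is a hypothetical helper, the 5-second default is just an assumption, and the demo uses an ordinary UDP socket so it runs without root.

```python
import socket


def wait_for_reply(recv_socket, timeout=5.0):
    """Return the sender's IP address, or None if nothing arrives in time."""
    recv_socket.settimeout(timeout)
    try:
        _, (addr, _port) = recv_socket.recvfrom(512)
        return addr
    except socket.timeout:
        return None


# Nobody sends to this socket, so the call gives up after 0.2 seconds.
quiet_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
quiet_socket.bind(("127.0.0.1", 0))
print(wait_for_reply(quiet_socket, timeout=0.2))
quiet_socket.close()
```

A None result maps naturally onto the asterisk that traceroute prints for a hop that never answers.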

Step 6: Turn the IP addresses into hostnames and print the data.

To match traceroute’s behavior, we want to try to display the hostname along with the IP address. The socket module provides the gethostbyaddr() method for reverse DNS resolution. The resolution can fail and result in an exception, in which case we’ll want to catch it and make the hostname the same as the address. Once we get the hostname, we have all the information we need to print our data:

    #!/usr/bin/python

    import socket

    def main(dest_name):
        dest_addr = socket.gethostbyname(dest_name)
        port = 33434
        icmp = socket.getprotobyname('icmp')
        udp = socket.getprotobyname('udp')
        ttl = 1
        while True:
            recv_socket = socket.socket(socket.AF_INET, socket.SOCK_RAW, icmp)
            send_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, udp)
            send_socket.setsockopt(socket.SOL_IP, socket.IP_TTL, ttl)
            recv_socket.bind(("", port))
            send_socket.sendto("", (dest_addr, port))
            curr_addr = None
            curr_name = None
            try:
                _, curr_addr = recv_socket.recvfrom(512)
                curr_addr = curr_addr[0]
                try:
                    curr_name = socket.gethostbyaddr(curr_addr)[0]
                except socket.error:
                    curr_name = curr_addr
            except socket.error:
                pass
            finally:
                send_socket.close()
                recv_socket.close()

            if curr_addr is not None:
                curr_host = "%s (%s)" % (curr_name, curr_addr)
            else:
                curr_host = "*"
            print "%d\t%s" % (ttl, curr_host)

            ttl += 1
            # ... break if useful ...

    if __name__ == "__main__":
        main('google.com')
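
The lookup-and-format logic is easy to pull out into a function of its own, which also makes it testable without a network. This is a sketch under a few assumptions: format_hop and its lookup parameter are hypothetical names, and the fake resolvers below exist only for the demonstration.

```python
import socket


def format_hop(ttl, addr, lookup=socket.gethostbyaddr):
    """Build one output line; `addr` is None when no reply was received."""
    if addr is None:
        return "%d\t*" % ttl
    try:
        name = lookup(addr)[0]
    except OSError:
        name = addr  # reverse lookup failed, so fall back to the bare address
    return "%d\t%s (%s)" % (ttl, name, addr)


def fake_lookup(addr):
    # Stand-in for a successful reverse DNS answer.
    return ("router.lan", [], [addr])


def failing_lookup(addr):
    # Stand-in for an address with no reverse DNS entry.
    raise socket.herror(1, "Unknown host")


print(format_hop(1, "192.168.1.1", lookup=fake_lookup))
print(format_hop(2, "70.162.48.1", lookup=failing_lookup))
print(format_hop(3, None))
```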

Step 7: End the loop.

There are two conditions for exiting our loop: either we have reached our destination (that is, curr_addr is equal to dest_addr) [1] or we have exceeded some maximum number of hops. We will set our maximum at 30:

    #!/usr/bin/python

    import socket

    def main(dest_name):
        dest_addr = socket.gethostbyname(dest_name)
        port = 33434
        max_hops = 30
        icmp = socket.getprotobyname('icmp')
        udp = socket.getprotobyname('udp')
        ttl = 1
        while True:
            recv_socket = socket.socket(socket.AF_INET, socket.SOCK_RAW, icmp)
            send_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, udp)
            send_socket.setsockopt(socket.SOL_IP, socket.IP_TTL, ttl)
            recv_socket.bind(("", port))
            send_socket.sendto("", (dest_addr, port))
            curr_addr = None
            curr_name = None
            try:
                _, curr_addr = recv_socket.recvfrom(512)
                curr_addr = curr_addr[0]
                try:
                    curr_name = socket.gethostbyaddr(curr_addr)[0]
                except socket.error:
                    curr_name = curr_addr
            except socket.error:
                pass
            finally:
                send_socket.close()
                recv_socket.close()

            if curr_addr is not None:
                curr_host = "%s (%s)" % (curr_name, curr_addr)
            else:
                curr_host = "*"
            print "%d\t%s" % (ttl, curr_host)

            ttl += 1
            if curr_addr == dest_addr or ttl > max_hops:
                break

    if __name__ == "__main__":
        main('google.com')
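
As the footnote to this step points out, the real traceroute stops on an ICMP "port unreachable" message instead of comparing addresses. Telling the two kinds of reply apart means looking inside the bytes that recvfrom() returns. The sketch below is not part of the script above; it assumes the data starts with the IPv4 header (as it does on a raw ICMP socket on Linux), and the helper names are invented for the example.

```python
import struct

ICMP_DEST_UNREACHABLE = 3   # ICMP type
ICMP_PORT_UNREACHABLE = 3   # ICMP code, meaningful with type 3
ICMP_TIME_EXCEEDED = 11     # ICMP type sent by intermediate routers


def icmp_type_and_code(packet):
    """Extract (type, code) from a raw IPv4 packet carrying an ICMP message."""
    header_len = (packet[0] & 0x0F) * 4  # the IHL field counts 32-bit words
    return struct.unpack_from("!BB", packet, header_len)


def reached_destination(packet):
    """True if the packet is an ICMP "port unreachable" message."""
    icmp_type, icmp_code = icmp_type_and_code(packet)
    return icmp_type == ICMP_DEST_UNREACHABLE and icmp_code == ICMP_PORT_UNREACHABLE


# Hand-built packets: a 20-byte IPv4 header (version 4, IHL 5) plus 4 ICMP bytes.
ip_header = bytes([0x45]) + bytes(19)
print(reached_destination(ip_header + bytes([11, 0, 0, 0])))
print(reached_destination(ip_header + bytes([3, 3, 0, 0])))
```

This example needs Python 3, where indexing a bytes object yields an integer.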

Step 8: Run the code!

We’re done! Let’s save this to a file and run it! Because raw sockets require root privileges, traceroute is typically setuid. For our purposes, we can just run the script as root:

    $ sudo python poor-mans-traceroute.py
    [sudo] password for leonidg:
    1       router.lan (192.168.1.1)
    2       70.162.48.1 (70.162.48.1)
    3       ge-2-20-ur01.cambridge.ma.boston.comcast.net (68.87.36.225)
    4       be-51-ar01.needham.ma.boston.comcast.net (68.85.162.157)
    5       pos-2-4-0-0-cr01.newyork.ny.ibone.comcast.net (68.86.90.61)
    6       pos-0-3-0-0-pe01.111eighthave.ny.ibone.comcast.net (68.86.86.190)
    7       as15169-3.111eighthave.ny.ibone.comcast.net (75.149.230.194)
    8       72.14.238.232 (72.14.238.232)
    9       216.239.48.24 (216.239.48.24)
    10     173.194.33.104 (173.194.33.104)

Hurrah! The data matches the real traceroute’s perfectly.

Of course, there are many improvements that we could make. As I mentioned, the real traceroute has a whole slew of other features, which you can learn about by reading the manpage. In the meantime, I wrote a slightly more complete version of the above code that allows configuring the port and max number of hops, as well as specifying the destination host. You can download it at my GitHub repository.
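
For a flavor of what making those three things configurable could involve, here is a minimal command-line sketch using argparse. It is not the code from the repository: the option names -p/--port and -m/--max-hops are assumptions chosen for this example, with defaults matching the values used in this post.

```python
import argparse


def parse_args(argv=None):
    """Parse the destination plus optional port and hop limit."""
    parser = argparse.ArgumentParser(description="A toy traceroute")
    parser.add_argument("dest_name", help="hostname or IP address to trace")
    parser.add_argument("-p", "--port", type=int, default=33434,
                        help="destination UDP port")
    parser.add_argument("-m", "--max-hops", type=int, default=30,
                        help="largest TTL to try")
    return parser.parse_args(argv)


options = parse_args(["google.com", "--max-hops", "15"])
print(options.dest_name, options.port, options.max_hops)
```

The parsed values could then be handed to a main() that accepts port and max_hops as parameters instead of hard-coding them.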

Alright folks, what UNIX utility should we write next? strace, anyone? :-) [2]

[1] This is actually not quite how the real traceroute works. Rather than checking the IP addresses of the hosts and stopping when the destination address matches, it stops when it receives an ICMP “port unreachable” message, which means that the host has been reached. For our purposes, though, this simple address heuristic is good enough.

[2] Ksplice blogger Nelson took up a DIY strace on his personal blog, Made of Bugs.

  1. Great execution!

    I love the way you show the flow of coding, going from the design in clearly visible steps to the finished code.

    Showing only the new codeline is neat!

  2. Leonid Grinberg says:

    Thanks! I confess that I was inspired by my friend Evan (http://blog.ksplice.com/2010/07/building-filesystems-the-way-you-build-web-apps/) when writing this.

  3. Great article, very clear and well organized.

  4. Jing Jong says:

    I challenge you to write strace. You will find it some orders of magnitude more difficult.

  5. Andrew Otto says:

    Love it. I’ve always wondered how traceroute works. And yes yes please, I’d love to see strace explained like this as well. Thanks!

  6. Leonid Grinberg says:

    Thanks! I’m glad this was helpful/interesting.

    As for strace, Jing Jong is right — implementing it is definitely much harder. I’ll see if I can con another Ksplice blogger into doing it!

  7. There’s also tracepath which is similar to traceroute but doesn’t require root privs. Any idea how it works?

  8. Wu says:

    Hi, I’ve tried both poor-mans-traceroute and traceroute itself (as you have them in github) and none worked.

    I’ve tried in OSx, with both python 2.6 and 2.5:

    silverbullet:python wu$ python traceroute.py 212.89.9.184
    Traceback (most recent call last):
    File “traceroute.py”, line 71, in
    max_hops=int(options.max_hops)))
    File “traceroute.py”, line 24, in main
    recv_socket, send_socket = create_sockets(ttl)
    File “traceroute.py”, line 15, in create_sockets
    recv_socket = socket.socket(socket.AF_INET, socket.SOCK_RAW, icmp)
    File “/opt/local/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/socket.py”, line 159, in __init__
    _sock = _realsocket(family, type, proto)
    socket.error: (1, ‘Operation not permitted’)

    silverbullet:python wu$ python2.5 traceroute.py 212.89.9.184
    Traceback (most recent call last):
    File “traceroute.py”, line 71, in
    max_hops=int(options.max_hops)))
    File “traceroute.py”, line 24, in main
    recv_socket, send_socket = create_sockets(ttl)
    File “traceroute.py”, line 15, in create_sockets
    recv_socket = socket.socket(socket.AF_INET, socket.SOCK_RAW, icmp)
    File “/opt/local/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/socket.py”, line 159, in __init__
    _sock = _realsocket(family, type, proto)
    socket.error: (1, ‘Operation not permitted’)

    silverbullet:python wu$ python poor-mans-traceroute.py
    Traceback (most recent call last):
    File “poor-mans-traceroute.py”, line 44, in
    main(‘google.com’)
    File “poor-mans-traceroute.py”, line 13, in main
    recv_socket = socket.socket(socket.AF_INET, socket.SOCK_RAW, icmp)
    File “/opt/local/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/socket.py”, line 159, in __init__
    _sock = _realsocket(family, type, proto)
    socket.error: (1, ‘Operation not permitted’)

    And I’ve tried in FreeBSD 8 with python 2.6 too, same result:

    [nidhogg] ~> python traceroute.py google.com
    Traceback (most recent call last):
    File “traceroute.py”, line 71, in
    max_hops=int(options.max_hops)))
    File “traceroute.py”, line 24, in main
    recv_socket, send_socket = create_sockets(ttl)
    File “traceroute.py”, line 15, in create_sockets
    recv_socket = socket.socket(socket.AF_INET, socket.SOCK_RAW, icmp)
    File “/usr/local/lib/python2.6/socket.py”, line 182, in __init__
    _sock = _realsocket(family, type, proto)
    socket.error: [Errno 1] Operation not permitted

    Any idea? am I the only one with that problem?

  9. Wu says:

    Ups, forget the previous comment, I forgot to use sudo! :D

  10. steve says:

    Very nice, going to give that a try. Thanks for sharing

  11. Charles says:

    Why did you choose not to use ‘dest_addr’ in the send_socket.sendto function call? I know it was used as a loop termination condition, but if it was only used there, then there was no need to add that code until step 7.

    dest_addr = socket.gethostbyname(dest_name)

    send_socket.sendto(“”, (dest_name, port))

    Should be

    send_socket.sendto(“”, (dest_name, port))

    IIRC, Python will do a hostname lookup for each sendto call. Sure, it’ll cache, but the point remains.

  12. Charles says:

    Sorry…should be:

    send_socket.sendto(“”, (dest_addr, port))

  13. Dhruv says:

    Very nice work!
    And a very educational post!
    :)

  14. Dominic says:

    I don’t normally comment on blogs (can’t be bothered most of the time), but this was a nice, brief, yet interesting article not compounded by extraneous opinions or waffle. I especially loved the way you introduced new code.

    Loved it!

  15. Martin says:

    Brilliant! Great article. Also shows how powerful python’s “batteries included” design philosophy can be.

  16. Phil says:

    This is great article, both from the perspective of understanding network behavior, but also using Python’s raw socket calls. Thanks!

    I’ve made some modifications to this script so it behaves a little more like traceroute: I’ve added the default 5-second timeout for when certain hosts do not respond. This prevents it from getting stuck indefinitely on a particular hop. Like traceroute, this code also tries three times for each TTL (unless it gets a response), and it prints an asterisk for each failure.

    You can see the modified code here: http://gist.github.com/502451

  17. noobstar says:

    Traceback (most recent call last):
    File “C:\Users\admin\Desktop\python\script1.py”, line 44, in main(‘google.com’)
    File “C:\Users\admin\Desktop\python\script1.py”, line 17, in main send_socket.sendto(“”, dest_addr, port))
    TypeError: sendto() takes exactly 3 arguments (2 given)

    ——-

    Using python 3.1.2 and I am under administrator mode in command prompt. Is there a way around this?

  18. Leonid Grinberg says:

    noobstar: It’s hard to debug since I don’t see what you are running, but judging from:

    File “C:\Users\admin\Desktop\python\script1.py”, line 17, in main send_socket.sendto(“”, dest_addr, port))

    I am guessing that you want

    send_socket.sendto("", (dest_addr, port))

    instead (i.e. put parentheses around the last two arguments).

  19. noobstar says:

    Hi,

    Sorry I must’ve pasted it weirdly, but the parentheses are there.

    Traceback (most recent call last):
    File “C:\Users\admin\Desktop\python\script1.py”, line 44, in
    main(‘google.com’)
    File “C:\Users\admin\Desktop\python\script1.py”, line 17, in main
    send_socket.sendto(“”, (dest_addr,port))
    TypeError: sendto() takes exactly 3 arguments (2 given)

    —–

    After trying everything, I’ve copied and pasted the original source from the github repository. I am running Windows 7, Python 3.1.2, cmd prompt with administrator enabled. I’ve also tried putting an “extra” argument in the form of 0, “” and []. It gives me the same form of error but the last line says: TypeError: must be bytes or buffer, not str.

    If you need more details about my setup, just give another post.

  20. Ab Tiwary says:

    Wow, awesome article thanks. Brings back memories of networking class at uni (most of which i’ve gradually forgotten over the years :P )

    An article on poor man’s “whois” and “nmap” would be sweet.

    Thanks,
    Ab

  21. Jungle Jim says:

    noobstar: you need to install Python 2.71 for this code to work ….. else you need to modify the code to match the Python 3.1 syntax.

  22. papaioannoua says:

    Great article, very clear and well organized.

    Can you code it in C#?

  23. Avinash says:

    wow !!

    This is Pretty useful =)
    Thanks for sharing your article & code

  24. No discussion of traceroute would be complete without a mention of “Tracer T”: http://www.youtube.com/watch?v=SXmv8quf_xM

    Linux version: http://www.youtube.com/watch?v=6WHu1EM8CgY

  25. Adam Fisk says:

    Great post – I also love the highlighted new code approach.

    One interesting aspect of traceroute is the fact that the ICMP packets are allowed back in through your NAT/firewall. Curious huh? My buddy Samy Kamkar has exploited that for NAT traversal — check it out:

    http://samy.pl/pwnat/
