One of life's minor annoyances is having to wait on my devices to connect to the network after I wake them from sleep. All too often, I'll open the lid on my EeePC netbook, enter a web address, and get the dreaded "This webpage is not available" message because the machine is still working on connecting to my Wi-Fi network. On some occasions, I have to twiddle my thumbs for as long as 10-15 seconds before the network is ready to be used. The frustrating thing is that I know it doesn't have to be this way. I know this because I have a Mac. When I open the lid of my MacBook Pro, it connects to the network nearly instantaneously. In fact, no matter how fast I am, the network comes up before I can even try to load a web page. My curiosity got the better of me, and I set out to investigate how Macs are able to connect to the network so quickly, and how the network connect time in other operating systems could be improved.
I figure there are three main categories of time-consuming activities that occur during network initialization:
- Link establishment. This is the activity of establishing communication with the network's link layer. In the case of Wi-Fi, the radio must be powered on, the access point detected, and the optional encryption layer (e.g. WPA) established. After link establishment, the device is able to send and receive Ethernet frames on the network.
- Dynamic Host Configuration Protocol (DHCP). Through DHCP handshaking, the device negotiates an IP address for its use on the local IP network. A DHCP server is responsible for managing the IP addresses available for use on the network.
- Miscellaneous overhead. The operating system may perform any number of mundane tasks during the process of network initialization, including running scripts, looking up preconfigured network settings in a local database, launching programs, etc.
My investigation thus far is primarily concerned with the DHCP phase, although the other two categories would be interesting to study in the future. I set up a packet capture environment with a spare wireless access point, and observed the network activity of a number of devices as they initialized their network connection. For a worst-case scenario, let's look at the network activity captured while an Android tablet is connecting:
Samsung Galaxy Tab 10.1 - "dhcpcd-5.2.10:Linux-2.6.36.3:armv7l:p3"
time (seconds) | direction | packet description |
00.0000 | out | LLC RNR (The link is now established.) |
01.1300 | out | DHCP request 192.168.1.17: The client requests its IP address on the previously connected network. |
05.6022 | out | DHCP request 192.168.1.17: The client again requests this IP address. |
11.0984 | out | DHCP discover: "Okay, I give up. Maybe this is a different network after all. Is there a DHCP server out there?" |
11.7189 | in | DHCP offer 192.168.4.20: The server offers an IP address to the client. |
11.7234 | out | DHCP request 192.168.4.20: The client accepts the offered IP address. |
11.7514 | in | DHCP ACK: The server acknowledges the client's acceptance of the IP address. |
This tablet, presumably in the interest of "optimization", is initially skipping the DHCP discovery phase and immediately requesting its previous IP address. The only problem is that this is a different network, so the DHCP server ignores these requests. After about 4.5 seconds, the tablet stubbornly tries again to request its old IP address. After another 5.5 seconds, it resigns itself to starting from scratch, and performs the DHCP discovery needed to obtain an IP address on the new network. The process took a whopping 11.8 seconds to complete. (Note: This would have been faster if my DHCP server had been configured to send NAKs properly; see my update below... -simmons, 2011-07-21)
In all fairness, this delay wouldn't be so bad if the device was connecting to the same network as it was previously using. However, notice that the tablet waits a full 1.13 seconds after link establishment to even think about starting the DHCP process. Engineering snappiness usually means finding lots of small opportunities to save a few milliseconds here and there, and someone definitely dropped the ball here.
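The waits described above can be read off the capture by subtracting consecutive timestamps. Here is a small sketch that does so with the times from the Galaxy Tab table; the `events` list and the `gaps` helper are names I made up for illustration.

```python
# Timestamps (seconds) copied from the Galaxy Tab capture above.
events = [
    (0.0000, "link established"),
    (1.1300, "DHCP request (old address)"),
    (5.6022, "DHCP request (old address, retry)"),
    (11.0984, "DHCP discover"),
    (11.7189, "DHCP offer"),
    (11.7234, "DHCP request (offered address)"),
    (11.7514, "DHCP ACK"),
]


def gaps(events):
    """Return (label, seconds waited since the previous event) pairs."""
    return [
        (later[1], round(later[0] - earlier[0], 4))
        for earlier, later in zip(events, events[1:])
    ]


for label, delta in gaps(events):
    print(f"{delta:8.4f} s before {label}")
```

Nearly all of the elapsed time is spent before the discover packet goes out; the offer/request/ACK exchange itself is a small fraction of the total.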
In contrast, let's look at the packet dump from the machine with the lightning-fast network initialization, and see if we can uncover the magic that is happening under the hood:
MacBook Pro - MacOS 10.6.8
time (seconds) | direction | packet description |
00.0000 | out | LLC RNR (The link is now established.) |
00.0100 | out | ARP request broadcast who-has 169.254.76.19 (The client is validating its link-local address) |
00.0110 | out | ARP request unicast 00:22:75:45:e3:54 who-has 192.168.2.1 tell 192.168.2.56 |
00.0120 | out | ARP request unicast 4e:80:98:f0:35:e3 who-has 192.168.4.1 tell 192.168.4.25 |
00.0120 | in | ARP reply unicast from DHCP server: 192.168.4.1 is-at 4e:80:98:f0:35:e3 |
00.0130 | out | ARP request unicast 00:0d:b9:54:27:b3 who-has 192.168.1.1 tell 192.168.1.29 |
00.0140 | out | DHCP request 192.168.4.25 |
00.0180 | out | ARP broadcast who-has 192.168.4.25 tell 192.168.4.25 |
00.0210 | out | ARP broadcast who-has 169.254.255.255 tell 192.168.4.25 |
00.0290 | out | ARP broadcast who-has 192.168.4.1 tell 192.168.4.25 |
00.0290 | in | ARP reply unicast: 192.168.4.1 is-at 4e:80:98:f0:35:e3 |
00.0310 | out | UDP to router's port 192 (AirPort detection) This implies that the IP interface is now configured. |
... | ... | (More normal IP activity on the newly configured interface) |
01.2680 | out | DHCP request 192.168.4.25 |
01.3043 | in | DHCP ACK |
The key to understanding the magic is the first three unicast ARP requests. It looks like Mac OS remembers certain information about not only the last connected network, but the last several networks. In particular, it must at least persist the following tuple for each of these networks:
- The Ethernet address of the DHCP server
- The IP address of the DHCP server
- Its own IP address, as assigned by the DHCP server
During network initialization, the Mac transmits carefully crafted unicast ARP requests with this stored information. For each network in its memory, it attempts to send a request to the specific Ethernet address of the DHCP server for that network, in which it asks about the server's IP address, and requests that the server reply to the IP address which the Mac was formerly using on that network. Unless network hosts have been radically shuffled around, at most only one of these ARP requests will result in a response—the request corresponding to the current network, if the current network happens to be one of the remembered networks.
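To make the shape of such a probe concrete, here is a rough Python sketch that builds the raw bytes of one unicast ARP request. It is only my reconstruction from the capture, not Apple's code: the function names are invented, the sender MAC in the usage example is a made-up placeholder, and zeroing the target hardware address field is an assumption, since I did not record that field.

```python
import socket
import struct

ETHERTYPE_ARP = 0x0806
ARP_HTYPE_ETHERNET = 1
ARP_PTYPE_IPV4 = 0x0800
ARP_OP_REQUEST = 1


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def build_unicast_arp_probe(own_mac, server_mac, server_ip, former_ip):
    """Build an Ethernet frame carrying 'who-has server_ip tell former_ip'.

    The Ethernet destination is the remembered server MAC (unicast),
    not the broadcast address a normal ARP request would use.
    """
    eth_header = struct.pack(
        "!6s6sH", mac_to_bytes(server_mac), mac_to_bytes(own_mac), ETHERTYPE_ARP
    )
    arp_payload = struct.pack(
        "!HHBBH6s4s6s4s",
        ARP_HTYPE_ETHERNET,
        ARP_PTYPE_IPV4,
        6,  # hardware address length
        4,  # protocol address length
        ARP_OP_REQUEST,
        mac_to_bytes(own_mac),        # sender hardware address
        socket.inet_aton(former_ip),  # sender IP: the address formerly held
        b"\x00" * 6,                  # target hardware address (assumed zero)
        socket.inet_aton(server_ip),  # target IP: the remembered DHCP server
    )
    return eth_header + arp_payload


# Values for the 192.168.4.x network from the capture; the own MAC is hypothetical.
frame = build_unicast_arp_probe(
    "02:00:00:00:00:01", "4e:80:98:f0:35:e3", "192.168.4.1", "192.168.4.25"
)
print(len(frame), frame.hex())
```

One such frame would be built for each remembered network and sent back to back.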
This network recognition technique allows the Mac to very rapidly discover if it is connected to a known network. If the network is recognized (and presumably if the Mac knows that the DHCP lease is still active), it immediately and presumptuously configures its IP interface with the address it knows is good for this network. (Well, it does perform a self-ARP for good measure, but doesn't seem to wait more than 13ms for a response.) The DHCP handshaking process begins in the background by sending a DHCP request for its assumed IP address, but the network interface is available for use during the handshaking process. If the network was not recognized, I assume the Mac would know to begin the DHCP discovery phase, instead of sending blind requests for a former IP address as the Galaxy Tab does.
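The decision described above might look something like the following sketch. This is my guess at the logic, not something I observed directly: the `RememberedNetwork` record, the `choose_startup_action` function, and the stored lease expiry time are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RememberedNetwork:
    server_mac: str      # Ethernet address of the DHCP server
    server_ip: str       # IP address of the DHCP server
    own_ip: str          # address the server last assigned to us
    lease_expiry: float  # seconds since the epoch (assumed to be stored too)


def choose_startup_action(remembered, replying_macs, now):
    """Return ('reuse', ip) for a recognized network, else ('discover', None).

    replying_macs holds the source MACs of ARP replies to the unicast probes.
    """
    replied = {mac.lower() for mac in replying_macs}
    for net in remembered:
        if net.server_mac.lower() in replied and now < net.lease_expiry:
            return ("reuse", net.own_ip)
    return ("discover", None)


# The three networks visible in the capture; the expiry times are invented.
remembered = [
    RememberedNetwork("00:22:75:45:e3:54", "192.168.2.1", "192.168.2.56", 2000.0),
    RememberedNetwork("4e:80:98:f0:35:e3", "192.168.4.1", "192.168.4.25", 2000.0),
    RememberedNetwork("00:0d:b9:54:27:b3", "192.168.1.1", "192.168.1.29", 2000.0),
]
print(choose_startup_action(remembered, {"4e:80:98:f0:35:e3"}, now=1000.0))
```

In the "reuse" case the interface would be configured right away and the DHCP request sent in the background; in the "discover" case the client would fall back to an ordinary DHCP discover.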
The Mac's rapid network initialization can be credited to more than just the network recognition scheme. Judging by the use of ARP (which can be problematic to deal with in user-space) and the unusually regular transmission intervals (a reliable 1.0ms delay between each packet sent), I'm guessing that the Mac's DHCP client system is entirely implemented as tight kernel-mode code. The Mac began the IP interface initialization process a mere 10ms after link establishment, which is far faster than any other device I tested. Android devices such as the Galaxy Tab rely on the user-mode dhcpcd program, which no doubt brings a lot of additional overhead such as loading the program, context switching, and perhaps even running scripts.
The next step for some daring kernel hacker is to implement a similarly aggressive DHCP client system in the Linux kernel, so that I can enjoy fast sign-on speeds on my Android tablet, Android phone, and Ubuntu netbook. There already exists a minimal DHCP client implementation in the Linux kernel, but it lacks certain features such as configuring the DNS nameservers. Perhaps it wouldn't be too much work to extend this code to support network recognition and interface with a user-mode daemon to handle such auxiliary configuration information received via DHCP. If I ever get a few spare cycles, maybe I'll even take a stab at it.
Update, July 12th, 2011 1pm MT:
This post has been mentioned on Hacker News, and there's lots of lively discussion in the comments over there.
Some people have pointed out some disadvantages in putting a full-featured DHCP client in the kernel. I'm skeptical about putting the DHCP client in the kernel, myself. However, I didn't want to elaborate on that at 2:00am, since the post was getting way too lengthy as it was. If I had known it would be subject to such peer review, I might have been a bit more careful with my words. :)
The argument for putting the DHCP client in the kernel basically boils down to:
- Achieving speed is all about shaving a few milliseconds here and there, and you just can't launch a program, wait for it to dynamically link, load config files, etc., and get the 10ms response time that the Mac has. (10ms from link establishment to transmitting the first DHCP packet.) I'm told that the dhcpcd program is a persistent daemon, so maybe the launch overhead isn't there. But something is keeping Linux hosts from having a 10ms response time.
- Doing ARP tricks could be awkward in user-space. You'd need to use the raw socket interface for transmitting (which isn't a big deal), and you'd have to use something like the packet(7) interface to sniff incoming packets to observe the ARP replies. I haven't played around with the packet(7) interface, so I'm not sure what the pros and cons might be.
Neither of these is a show-stopper for an improved user-mode DHCP client, but that was my thinking at the time. Now, I think I would certainly start with a user-mode solution, since a carefully crafted daemon should be able to achieve comparable response time, and the arping(8) program doesn't seem to have any problem using packet(7) to send and receive ARP packets in user-space.
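For the curious, here is roughly what the user-space receive side could look like in Python on Linux. Treat it as an untested-on-hardware sketch: `open_arp_socket` and `parse_arp_reply` are names I invented, opening the socket needs root (or CAP_NET_RAW), and the example below only exercises the parser on a hand-built frame.

```python
import socket
import struct

ETH_P_ARP = 0x0806
ARP_OP_REPLY = 2


def open_arp_socket(ifname):
    """Open a Linux packet(7) socket that receives only ARP frames on ifname.

    Needs root or CAP_NET_RAW, so it is defined here but not called.
    """
    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ARP))
    sock.bind((ifname, 0))
    return sock


def parse_arp_reply(frame):
    """Return (sender_mac, sender_ip) for an IPv4-over-Ethernet ARP reply, else None."""
    if len(frame) < 42:
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETH_P_ARP:
        return None
    htype, ptype, hlen, plen, op = struct.unpack("!HHBBH", frame[14:22])
    if (htype, ptype, hlen, plen, op) != (1, 0x0800, 6, 4, ARP_OP_REPLY):
        return None
    sender_mac = ":".join(f"{b:02x}" for b in frame[22:28])
    sender_ip = socket.inet_ntoa(frame[28:32])
    return sender_mac, sender_ip


# A hand-built reply like the one in the Mac capture; the client MAC is hypothetical.
client_mac = bytes.fromhex("020000000001")
server_mac = bytes.fromhex("4e8098f035e3")
reply = (
    client_mac + server_mac + struct.pack("!H", ETH_P_ARP)
    + struct.pack("!HHBBH", 1, 0x0800, 6, 4, ARP_OP_REPLY)
    + server_mac + socket.inet_aton("192.168.4.1")
    + client_mac + socket.inet_aton("192.168.4.25")
)
print(parse_arp_reply(reply))
```

A daemon would call `recv()` on the socket with a short timeout after sending its probes and feed each frame through the parser.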
Update, July 13th, 2011 2:48am MT:
Thanks to M. MacFaden for pointing out in the comments that this scheme is basically an implementation of RFC 4436: Detecting Network Attachment in IPv4 (DNAv4), which was co-authored by an Apple employee.
Update, July 21st, 2011 1:20pm MT:
Thanks to Steinar H. Gunderson for pointing out in the comments that the DHCP server on my test network was incorrectly configured. Since I was using a mostly "out of the box" dhcpd configuration from Ubuntu Linux, it wasn't set up to be authoritative by default, so it wasn't promptly sending NAKs in response to the Galaxy Tab's requests for an old IP address. After fixing the problem on the DHCP server, the Galaxy Tab's DHCP handshake happens quite a bit faster (although still 85 times slower than the Mac). Below is the revised chart of network activity for the Galaxy Tab:
Samsung Galaxy Tab 10.1 (Revised) - "dhcpcd-5.2.10:Linux-2.6.36.3:armv7l:p3"
time (seconds) | direction | packet description |
00.0000 | out | LLC RNR (The link is now established.) |
01.1570 | out | DHCP request 192.168.1.17: The client requests its IP address on the previously connected network. |
01.1574 | in | DHCP NAK: The server declines to allow 192.168.1.17 on this network. |
02.2261 | out | DHCP discover |
02.5871 | in | DHCP offer 192.168.4.20: The server offers an IP address to the client. |
02.5951 | out | DHCP request 192.168.4.20: The client accepts the offered IP address. |
02.6198 | in | DHCP ACK: The server acknowledges the client's acceptance of the IP address. |
These times are more in line with what I see on most non-Mac devices on my non-test networks—about 2.5-3s in DHCP, plus a bit more time for link initialization and such—long enough that I frequently get a "no connection" error in my web browsers. We'll need to find ways to shave this down in emerging consumer electronics devices. Consumers are conditioned to think of PCs as "something you wait on," but expect non-PC network devices to behave more like light switches.
I've posted a summary of the discussion in another entry.
posted at 2011-07-12 02:23:17 US/Mountain
by David Simmons
tags: mac dhcp networking