| Rapid DHCP: Or, how do Macs get on the network so fast? | July 12, 2011 |
One of life's minor annoyances is having to wait on my devices to connect to the network after I wake them from sleep. All too often, I'll open the lid on my EeePC netbook, enter a web address, and get the dreaded "This webpage is not available" message because the machine is still working on connecting to my Wi-Fi network. On some occasions, I have to twiddle my thumbs for as long as 10-15 seconds before the network is ready to be used. The frustrating thing is that I know it doesn't have to be this way. I know this because I have a Mac. When I open the lid of my MacBook Pro, it connects to the network nearly instantaneously. In fact, no matter how fast I am, the network comes up before I can even try to load a web page. My curiosity got the better of me, and I set out to investigate how Macs are able to connect to the network so quickly, and how the network connect time in other operating systems could be improved.
I figure there are three main categories of time-consuming activities that occur during network initialization:
- Link establishment. This is the activity of establishing communication with the network's link layer. In the case of Wi-Fi, the radio must be powered on, the access point detected, and the optional encryption layer (e.g. WPA) established. After link establishment, the device is able to send and receive Ethernet frames on the network.
- Dynamic Host Configuration Protocol (DHCP). Through DHCP handshaking, the device negotiates an IP address for its use on the local IP network. A DHCP server is responsible for managing the IP addresses available for use on the network.
- Miscellaneous overhead. The operating system may perform any number of mundane tasks during the process of network initialization, including running scripts, looking up preconfigured network settings in a local database, launching programs, etc.
My investigation thus far is primarily concerned with the DHCP phase, although the other two categories would be interesting to study in the future. I set up a packet capture environment with a spare wireless access point, and observed the network activity of a number of devices as they initialized their network connection. For a worst-case scenario, let's look at the network activity captured while an Android tablet is connecting:
| Samsung Galaxy Tab 10.1 - "dhcpcd-5.2.10:Linux-2.6.36.3:armv7l:p3" | ||
| time (seconds) | direction | packet description |
| 00.0000 | out | LLC RNR (The link is now established.) |
| 01.1300 | out | DHCP request 192.168.1.17: The client requests its IP address on the previously connected network. |
| 05.6022 | out | DHCP request 192.168.1.17: The client again requests this IP address. |
| 11.0984 | out | DHCP discover: "Okay, I give up. Maybe this is a different network after all. Is there a DHCP server out there?" |
| 11.7189 | in | DHCP offer 192.168.4.20: The server offers an IP address to the client. |
| 11.7234 | out | DHCP request 192.168.4.20: The client accepts the offered IP address. |
| 11.7514 | in | DHCP ACK: The server acknowledges the client's acceptance of the IP address. |
This tablet, presumably in the interest of "optimization", is initially skipping the DHCP discovery phase and immediately requesting its previous IP address. The only problem is this is a different network, so the DHCP server ignores these requests. After about 4.5 seconds, the tablet stubbornly tries again to request its old IP address. After another 5.5 seconds, it resigns itself to starting from scratch, and performs the DHCP discovery needed to obtain an IP address on the new network. The process took a whopping 11.8 seconds to complete. (Note: This would have been faster if my DHCP server had been configured to send NAKs properly; see my update below... -simmons, 2011-07-21)
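The retry intervals can be read straight off the capture. Here is a small sketch of that arithmetic, using only the timestamps from the table above; the `events` list and the `gaps` helper are names made up for this illustration.

```python
# Selected packets from the Galaxy Tab capture above: (seconds, description).
events = [
    (1.1300, "DHCP request 192.168.1.17 (first attempt)"),
    (5.6022, "DHCP request 192.168.1.17 (retry)"),
    (11.0984, "DHCP discover"),
    (11.7514, "DHCP ACK"),
]


def gaps(timeline):
    """Return the delay in seconds between each consecutive pair of events."""
    return [
        round(later[0] - earlier[0], 4)
        for earlier, later in zip(timeline, timeline[1:])
    ]


# Print how long the client waited before each follow-up packet.
for (_, label), gap in zip(events[1:], gaps(events)):
    print(f"{gap:7.4f} s before {label}")
```

The gaps come out to roughly 4.47 s, 5.50 s, and 0.65 s, so nearly ten of the 11.8 seconds are spent waiting for answers to requests that this server never answers.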
In all fairness, this delay wouldn't be so bad if the device was connecting to the same network as it was previously using. However, notice that the tablet waits a full 1.13 seconds after link establishment to even think about starting the DHCP process. Engineering snappiness usually means finding lots of small opportunities to save a few milliseconds here and there, and someone definitely dropped the ball here.
In contrast, let's look at the packet dump from the machine with the lightning-fast network initialization, and see if we can uncover the magic that is happening under the hood:
| MacBook Pro - Mac OS X 10.6.8 | ||
| time (seconds) | direction | packet description |
| 00.0000 | out | LLC RNR (The link is now established.) |
| 00.0100 | out | ARP request broadcast who-has 169.254.76.19 (The client is validating its link-local address) |
| 00.0110 | out | ARP request unicast 00:22:75:45:e3:54 who-has 192.168.2.1 tell 192.168.2.56 |
| 00.0120 | out | ARP request unicast 4e:80:98:f0:35:e3 who-has 192.168.4.1 tell 192.168.4.25 |
| 00.0120 | in | ARP reply unicast from DHCP server: 192.168.4.1 is-at 4e:80:98:f0:35:e3 |
| 00.0130 | out | ARP request unicast 00:0d:b9:54:27:b3 who-has 192.168.1.1 tell 192.168.1.29 |
| 00.0140 | out | DHCP request 192.168.4.25 |
| 00.0180 | out | ARP broadcast who-has 192.168.4.25 tell 192.168.4.25 |
| 00.0210 | out | ARP broadcast who-has 169.254.255.255 tell 192.168.4.25 |
| 00.0290 | out | ARP broadcast who-has 192.168.4.1 tell 192.168.4.25 |
| 00.0290 | in | ARP reply unicast: 192.168.4.1 is-at 4e:80:98:f0:35:e3 |
| 00.0310 | out | UDP to router's port 192 (AirPort detection) This implies that the IP interface is now configured. |
| ... | ... | (More normal IP activity on the newly configured interface) |
| 01.2680 | out | DHCP request 192.168.4.25 |
| 01.3043 | in | DHCP ACK |
The key to understanding the magic is the first three unicast ARP requests. It looks like Mac OS remembers certain information about not only the last connected network, but the last several networks. In particular, it must at least persist the following tuple for each of these networks:
- The Ethernet address of the DHCP server
- The IP address of the DHCP server
- Its own IP address, as assigned by the DHCP server
During network initialization, the Mac transmits carefully crafted unicast ARP requests with this stored information. For each network in its memory, it attempts to send a request to the specific Ethernet address of the DHCP server for that network, in which it asks about the server's IP address, and requests that the server reply to the IP address which the Mac was formerly using on that network. Unless network hosts have been radically shuffled around, at most only one of these ARP requests will result in a response—the request corresponding to the current network, if the current network happens to be one of the remembered networks.
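To make the shape of these probes concrete, here is a rough sketch of how such a unicast ARP request could be assembled from the stored tuple. It only builds the bytes and sends nothing. The `RememberedNetwork` type, the `build_unicast_arp_request` helper, and the client MAC address are hypothetical, and zeroing the target hardware address field is my assumption, since the capture summary does not show that field.

```python
import socket
import struct
from typing import NamedTuple


class RememberedNetwork(NamedTuple):
    server_mac: str  # Ethernet address of the DHCP server
    server_ip: str   # IP address of the DHCP server
    client_ip: str   # address this client held on that network


# The three networks visible in the MacBook capture above.
REMEMBERED = [
    RememberedNetwork("00:22:75:45:e3:54", "192.168.2.1", "192.168.2.56"),
    RememberedNetwork("4e:80:98:f0:35:e3", "192.168.4.1", "192.168.4.25"),
    RememberedNetwork("00:0d:b9:54:27:b3", "192.168.1.1", "192.168.1.29"),
]

CLIENT_MAC = "02:00:00:00:00:01"  # hypothetical; not taken from the capture


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' to its 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def build_unicast_arp_request(client_mac: str, net: RememberedNetwork) -> bytes:
    """Build an Ethernet frame carrying 'who-has server_ip tell client_ip'."""
    # Ethernet header: unicast destination (the remembered server), source, EtherType ARP.
    ethernet = (
        mac_to_bytes(net.server_mac)
        + mac_to_bytes(client_mac)
        + struct.pack("!H", 0x0806)
    )
    # ARP header: hardware type Ethernet, protocol IPv4, lengths 6 and 4, opcode 1 (request).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += mac_to_bytes(client_mac) + socket.inet_aton(net.client_ip)  # sender MAC and IP
    arp += b"\x00" * 6 + socket.inet_aton(net.server_ip)               # target MAC (assumed zero) and IP
    return ethernet + arp


for net in REMEMBERED:
    frame = build_unicast_arp_request(CLIENT_MAC, net)
    print(f"{len(frame)}-byte probe to {net.server_mac}: who-has {net.server_ip} tell {net.client_ip}")
```

The only unusual part is the Ethernet destination: an ordinary ARP request is broadcast, while these are addressed to one specific remembered machine.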
This network recognition technique allows the Mac to very rapidly discover if it is connected to a known network. If the network is recognized (and presumably if the Mac knows that the DHCP lease is still active), it immediately and presumptuously configures its IP interface with the address it knows is good for this network. (Well, it does perform a self-ARP for good measure, but doesn't seem to wait more than 13ms for a response.) The DHCP handshaking process begins in the background by sending a DHCP request for its assumed IP address, but the network interface is available for use during the handshaking process. If the network was not recognized, I assume the Mac would know to begin the DHCP discovery phase, instead of sending blind requests for a former IP address as the Galaxy Tab does.
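The decision described here might be sketched as follows. This is my guess at the logic based on the capture, not Apple's actual code; `KnownNetwork`, `choose_startup_action`, and the `lease_expires` field are all hypothetical names, and the lease check reflects the "presumably" above rather than anything I observed.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class KnownNetwork:
    server_mac: str
    server_ip: str
    client_ip: str
    lease_expires: float  # hypothetical: lease expiry time, in epoch seconds


def choose_startup_action(
    known: Iterable[KnownNetwork],
    arp_replies: Set[Tuple[str, str]],
    now: float,
) -> Tuple[str, Optional[str]]:
    """Decide what to do once the unicast ARP probes have been sent.

    arp_replies holds the (sender MAC, sender IP) pairs seen in ARP replies.
    """
    for net in known:
        recognized = (net.server_mac, net.server_ip) in arp_replies
        if recognized and net.lease_expires > now:
            # Known network with a live lease: configure the interface right
            # away and confirm with a DHCP request in the background.
            return ("configure-then-request", net.client_ip)
    # Nothing recognized (or the lease ran out): start from DHCP discover.
    return ("discover", None)


known = [
    KnownNetwork("00:22:75:45:e3:54", "192.168.2.1", "192.168.2.56", lease_expires=5000.0),
    KnownNetwork("4e:80:98:f0:35:e3", "192.168.4.1", "192.168.4.25", lease_expires=5000.0),
]
replies = {("4e:80:98:f0:35:e3", "192.168.4.1")}
print(choose_startup_action(known, replies, now=1000.0))
print(choose_startup_action(known, set(), now=1000.0))
```

With the reply from 192.168.4.1 the sketch picks the remembered address 192.168.4.25 immediately; with no replies it falls through to discovery rather than repeating requests for an old address.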
The Mac's rapid network initialization can be credited to more than just the network recognition scheme. Judging by the use of ARP (which can be problematic to deal with in user-space) and the unusually regular transmission intervals (a reliable 1.0ms delay between each packet sent), I'm guessing that the Mac's DHCP client system is entirely implemented as tight kernel-mode code. The Mac began the IP interface initialization process a mere 10ms after link establishment, which is far faster than any other device I tested. Android devices such as the Galaxy Tab rely on the user-mode dhcpcd program, which no doubt brings a lot of additional overhead such as loading the program, context switching, and perhaps even running scripts.
The next step for some daring kernel hacker is to implement a similarly aggressive DHCP client system in the Linux kernel, so that I can enjoy fast sign-on speeds on my Android tablet, Android phone, and Ubuntu netbook. There already exists a minimal DHCP client implementation in the Linux kernel, but it lacks certain features such as configuring the DNS nameservers. Perhaps it wouldn't be too much work to extend this code to support network recognition and interface with a user-mode daemon to handle such auxiliary configuration information received via DHCP. If I ever get a few spare cycles, maybe I'll even take a stab at it.
Update, July 12th, 2011 1pm MT:
This post has been mentioned on Hacker News, and there's lots of lively discussion in the comments over there.
Some people have pointed out some disadvantages in putting a full-featured DHCP client in the kernel. I'm skeptical about putting the DHCP client in the kernel, myself. However, I didn't want to elaborate on that at 2:00am, since the post was getting way too lengthy as it was. If I had known it would be subject to such peer review, I might have been a bit more careful with my words. :)
The argument for putting the DHCP client in the kernel basically boils down to:
- Achieving speed is all about shaving a few milliseconds here and there, and you just can't launch a program, wait for it to dynamically link, load config files, etc., and get the 10ms response time that the Mac has. (10ms from link establishment to transmitting the first DHCP packet.) I'm told that the dhcpcd program is a persistent daemon, so maybe the launch overhead isn't there. But something is keeping Linux hosts from having a 10ms response time.
- Doing ARP tricks could be awkward in user-space. You'd need to use the raw socket interface for transmitting (which isn't a big deal), and you'd have to use something like the packet(7) interface to sniff incoming packets to observe the ARP replies. I haven't played around with the packet(7) interface, so I'm not sure what the pros and cons might be.
Neither of these is a show-stopper for an improved user-mode DHCP client, but that was my thinking at the time. Now, I think I would certainly start with a user-mode solution, since a carefully crafted daemon should be able to achieve comparable response time, and the arping(8) program doesn't seem to have any problem using packet(7) to send and receive ARP packets in user-space.
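For a feel of what the user-space receive side could look like, here is a minimal sketch using a packet(7) socket from Python. It is untested against real hardware and is not how arping(8) or any DHCP client is actually written. The function names are hypothetical, and the listening part is Linux-only and needs root or CAP_NET_RAW.

```python
import socket
import struct
import time

ETH_P_ARP = 0x0806


def parse_arp_reply(frame: bytes):
    """Return (sender_mac, sender_ip) if frame is an IPv4-over-Ethernet ARP reply, else None."""
    if len(frame) < 42:
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETH_P_ARP:
        return None
    htype, ptype, hlen, plen, oper = struct.unpack("!HHBBH", frame[14:22])
    if (htype, ptype, hlen, plen, oper) != (1, 0x0800, 6, 4, 2):  # opcode 2 = reply
        return None
    sender_mac = frame[22:28].hex(":")  # bytes.hex(sep) needs Python 3.8+
    sender_ip = socket.inet_ntoa(frame[28:32])
    return sender_mac, sender_ip


def wait_for_arp_reply(ifname: str, timeout: float = 0.05):
    """Listen on ifname for one ARP reply; give up after timeout seconds."""
    deadline = time.monotonic() + timeout
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ARP)) as sock:
        sock.bind((ifname, ETH_P_ARP))
        while (remaining := deadline - time.monotonic()) > 0:
            sock.settimeout(remaining)
            try:
                frame = sock.recv(2048)
            except socket.timeout:
                break
            reply = parse_arp_reply(frame)
            if reply is not None:
                return reply
    return None
```

Calling something like `wait_for_arp_reply("wlan0")` right after transmitting the probes would return the responding server's MAC and IP, or `None` if the timeout passes; the interface name and the 50 ms default are placeholders, not measured values.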
Update, July 13th, 2011 2:48am MT:
Thanks to M. MacFaden for pointing out in the comments that this scheme is basically an implementation of RFC 4436: Detecting Network Attachment in IPv4 (DNAv4), which was co-authored by an Apple employee.
Update, July 21st, 2011 1:20pm MT:
Thanks to Steinar H. Gunderson for pointing out in the comments that the DHCP server on my test network was incorrectly configured. Since I was using a mostly "out of the box" dhcpd configuration from Ubuntu Linux, it wasn't set up to be authoritative by default, so it wasn't promptly sending NAKs in response to the Galaxy Tab's requests for an old IP address. After fixing the problem on the DHCP server, the Galaxy Tab's DHCP handshake happens quite a bit faster (although still 85 times slower than the Mac). Below is the revised chart of network activity for the Galaxy Tab:
| Samsung Galaxy Tab 10.1 (Revised) - "dhcpcd-5.2.10:Linux-2.6.36.3:armv7l:p3" | ||
| time (seconds) | direction | packet description |
| 00.0000 | out | LLC RNR (The link is now established.) |
| 01.1570 | out | DHCP request 192.168.1.17: The client requests its IP address on the previously connected network. |
| 01.1574 | in | DHCP NAK: The server declines to allow 192.168.1.17 on this network. |
| 02.2261 | out | DHCP discover |
| 02.5871 | in | DHCP offer 192.168.4.20: The server offers an IP address to the client. |
| 02.5951 | out | DHCP request 192.168.4.20: The client accepts the offered IP address. |
| 02.6198 | in | DHCP ACK: The server acknowledges the client's acceptance of the IP address. |
These times are more in line with what I see on most non-Mac devices on my non-test networks—about 2.5-3s in DHCP, plus a bit more time for link initialization and such—long enough that I frequently get a "no connection" error in my web browsers. We'll need to find ways to shave this down in emerging consumer electronics devices. Consumers are conditioned to think of PCs as "something you wait on," but expect non-PC network devices to behave more like light switches.
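One way to reproduce the "85 times slower" figure from the captures is sketched below. It assumes the comparison is between the moment each device's IP interface became usable: the first UDP packet for the Mac and the final DHCP ACK for the Galaxy Tab. That choice of endpoints is my reading of the tables, not something stated outright.

```python
# Timestamps in seconds after link establishment, taken from the capture tables.
mac_ready = 0.0310          # MacBook Pro: first UDP packet on the configured interface
tab_ready_revised = 2.6198  # Galaxy Tab, authoritative server: final DHCP ACK
tab_ready_original = 11.7514  # Galaxy Tab, original capture: final DHCP ACK

ratio_revised = tab_ready_revised / mac_ready
ratio_original = tab_ready_original / mac_ready

print(f"revised capture:  {ratio_revised:.1f}x the Mac's time")
print(f"original capture: {ratio_original:.1f}x the Mac's time")
```

Under that assumption the revised capture works out to roughly 84.5 times the Mac's time, and the original one to about 379 times.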
I've posted a summary of the discussion in another entry.
posted at 2011-07-12 02:23:17
by simmons
tags: networking dhcp mac
comments (45)


Posted by Corbin Simpson on July 12, 2011 at 11:33 AM MDT #
Posted by Dave on July 12, 2011 at 11:41 AM MDT #
Hi Corbin,
It's good that you bring up security considerations. I should definitely mull over the implications before banging out a mod_fastdhcp.ko module! I'm not knowledgeable about the various attack vectors with WEP and WPA-TKIP, but my first thought is that perhaps the problem rests with those schemes if they are susceptible to such packet replay weaknesses.
You're right -- dhclient and dhcpcd are two different programs. I glanced at an Ubuntu 10.04 machine, and saw that /sbin/dhclient was part of the dhcp3-client package, and must have seen "dhcpcd" somehow. That's what I get for posting at 2am. :)
Posted by David Simmons on July 12, 2011 at 11:48 AM MDT #
Posted by August Lilleaas on July 12, 2011 at 12:09 PM MDT #
Unfortunately, given the glee with which kernel support for IP autoconfiguration was mostly removed a number of years ago and declared to be a "userspace problem" (residual support notwithstanding), it seems to me rather unlikely that Linus would accept such a patch.
None of the things e.g. the Android or Linux systems do that take so long to bring the link back up are inherently unreasonable, but taken together do result in a significantly worse user experience.
Conversely, I have a bit of trouble accepting the Mac behaviour as inherently superior, because I feel that they've reduced delays so much that while it undoubtedly works Just Fine™ in the vast majority of cases where installed networking operates well, it's rather fragile in that it wouldn't take much—a loose socket, a bad cable, WiFi interference—to render it completely inoperable. (Note that I have no idea what the Mac might do in the face of network problems; I would hope it has some sort of sensible "slow and steady wins the race" fallback.)
While it may be difficult to get such precise, accurate timing in userspace, I think it still should be possible. There are various kinds of real-time or high-priority scheduling available in even a stock Linux kernel, and even sending packets at a 100x slower rate (every 100 ms instead of every 1) would still produce a massive speedup over current behaviour.
Posted by Ice Karma on July 12, 2011 at 12:12 PM MDT #
Note that there are *not* any IPR disclosures for it; see this.
Posted by Mark Nottingham on July 14, 2011 at 12:28 AM MDT #
Now i know why my macbook always attempted (for far longer than the 1 second implied above that it should take) to reuse its old IP when I wanted to forcibly change it in DHCP to a different IP.
What you call a feature, annoys the hell out of me.
A DHCP client should ALWAYS attempt to ask for an IP instead of presuming the old one should work if an ARP reply isn't returned, wrongly using an IP that will be NAKd soon...
Gah.
Posted by Mike H on June 28, 2013 at 03:56 AM MDT #