Caffeinated Bitstream

Bits, bytes, and words.

Posts

Hanging out on the job: Using Google Hangouts for collaborative telepresence

My work and telepresence setup.

As a work-from-home software engineer, I'm always looking for ways to improve communication with co-workers and clients to help bridge the distance gap. At the beginning of October, a colleague and I decided to devote the month to an extreme collaboration experiment we called Maker's Month. We had been using Google Hangouts for meetings with great effectiveness, so we asked ourselves: Why not leave a hangout running all day, to provide the illusion of working in the same room? To that end, we decided to take our two offices -- separated spatially by 1,000 miles -- and merge them into one with the miracle of modern telecommunications.

We began by establishing some work parameters: We would have a meeting every morning to discuss the goals of the day, then mute our microphones for most of the next 6 to 7 "core office hours" while the hangout was left running. During the day we could see each other working, ask questions, engage in impromptu integration sessions, and generally pretend like we were working under the same roof. At the end of the day, we would have another meeting to discuss our accomplishments, adjust the project schedule, and set goals for the following day. We would then adjourn the hangout and work independently in "offline" mode.

There were a handful of questions we were hoping to answer during the course of this experiment:

  • How much bandwidth would this telepresence cost, in terms of both instantaneous bitrate and total data usage?
  • What audio/video gear would give us the best experience, and help avoid the usual trouble areas? (Ad-hoc conferencing setups are notorious for annoying glitches such as remote echo.)
  • Would Google even allow us to keep such long-duration hangouts running, or to use such a large number of hangout-hours in a month? (Unlike peer-to-peer protocols such as RTP/WebRTC/etc., hangout media streams are actually switched in the cloud and consume the CPU/bandwidth resources of Google.)
  • Do extended telepresence sessions provide real value to software development teams?

While Google Hangouts supports up to nine people in a hangout, our experiment only involved two people. (Our initial plans to bring a third team member into the hangout never materialized.)

Technical details

This wouldn't be a proper Caffeinated Bitstream post without some graphs and figures, so here are some charts showing the overall bandwidth usage:

The first chart shows the bandwidth usage of a typical two-person hangout session, which uses about 750-1000 kbps in each direction (when the connection settings are configured for "fast connection"). The aberrations in the chart are due to changing hangout parameters (e.g. screen sharing instead of video, or the remote party dropping off). The second chart shows the bandwidth usage for my house during the month of October. The hangout sessions are likely the bulk of this usage, but it also includes occasional movie streaming, Ubuntu downloads, software updates, and such. I sometimes hear people comment that the bandwidth caps imposed by some internet service providers can't be exceeded by legitimate use of the network, but I can easily imagine many telepresence scenarios that would quite legitimately push users over the limit. Fortunately, our usage is fairly modest, and my provider doesn't impose caps, anyway.
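For a rough sense of the monthly data cost, here's a back-of-envelope sketch. The 875 kbps figure is the midpoint of the range above; the 7 hours per day and 22 workdays per month are assumptions, not measurements:

```python
# Back-of-envelope monthly data usage for the daily hangout.
# Assumptions: 875 kbps average in each direction (midpoint of the
# 750-1000 kbps range), 7 core hours per day, 22 workdays per month.
kbps_each_way = 875
total_bps = kbps_each_way * 2 * 1000      # upstream + downstream
seconds_per_month = 7 * 3600 * 22
total_gb = total_bps * seconds_per_month / 8 / 1e9
print(f"{total_gb:.0f} GB/month")         # -> 121 GB/month
```

Over 100 GB per month for a two-person hangout alone, which is exactly the sort of legitimate usage that could collide with a provider's cap.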

My hangout hardware consists of:

  • A desktop computer with a quad-core Core i7 920 2.67 GHz processor and 8GB of RAM, running Ubuntu Linux
  • A dedicated LCD monitor
  • A Logitech HD Pro Webcam C910
  • A Blue Yeti microphone
  • A stereo system with good speakers, for audio output.

I've occasionally run Google Hangouts on my mid-2010 MacBook Pro, but the high CPU usage eventually revs up the fan to an annoying degree. The desktop computer doesn't seem to noticeably increase its fan noise, although I do have it tucked away in a corner. I've found that having a dedicated screen for the hangout really helps the telepresence illusion. The Yeti microphone is awesome, but the C910's built-in microphone is also surprisingly great. In fact, my colleague can't tell much of a difference between the two. I've noticed that the use of some other (perhaps sub-standard) microphones seems to thwart the echo cancellation built into Google Hangouts, resulting in echo that makes it almost impossible to carry on a conversation.

In addition to its thirst for bandwidth, Google Hangouts also demands a hefty chunk of processor time (and thus, power usage) on my equipment:

system | cpu usage | quiescent power | hangout power | hangout power increase
4-core Core i7 920 2.67 GHz desktop | 62% | 75W | 80W | 5W
2-core Core i7 2.66 GHz mid-2010 MacBook Pro | 77% | 13W | 38W | 25W

(Note: CPU usage is measured such that full usage of a single core is 100%. The usage is the sum of various processes related to delivering the hangout experience. On Linux: GoogleTalkPlugin, pulseaudio, chrome, compiz, Xorg. On Mac: GoogleTalkPlugin, Google Chrome Helper, Google Chrome, WindowServer, VDCAssistant. Power was measured with an inline Kill A Watt meter.)

I figure that using my desktop machine for daily hangouts has a marginal electrical cost of around $0.06/month. (Although keeping this desktop running without suspending it is probably costing me around $4.74/month.) Changing the hangout settings to "slow connection" roughly reduces the CPU usage by half.
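Those dollar figures are easy to sanity-check. Assuming an electricity rate of about $0.09/kWh (an assumption; local rates vary) and the same 7-hour, 22-workday schedule:

```python
# Sanity check on the electricity cost estimates above.
# Assumptions: $0.09/kWh electricity, 7 hangout hours/day, 22 workdays/month.
rate_per_kwh = 0.09
marginal_kwh = 5 * 7 * 22 / 1000      # 5 W extra draw during hangouts
always_on_kwh = 75 * 24 * 30 / 1000   # 75 W quiescent draw, 24/7
print(f"hangout cost:   ${marginal_kwh * rate_per_kwh:.2f}/month")    # $0.07
print(f"always-on cost: ${always_on_kwh * rate_per_kwh:.2f}/month")   # $4.86
```

This lands in the same ballpark as the figures above; the exact numbers depend on the local rate and the hours of use.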

Why does Google Hangouts use so much CPU and bandwidth? I think it all comes down to the use of H.264 Scalable Video Coding (SVC), a bitrate peeling scheme where the video encoder actually produces multiple compressed video streams at different bitrates. The higher-bitrate streams are encoded relative to information in the lower-bitrate streams, so the total required bitrate is fortunately much less than the sum of otherwise independent streams, but it is higher than a single stream. The "video switch in the cloud" operated by Google (or perhaps Vidyo, the provider of the underlying video technology) can determine the bandwidth capacity of the other parties and peel away the high-bitrate layers if necessary. Unfortunately, not only does SVC somewhat increase the bandwidth requirements, but it also means that the Google Talk Plugin cannot leverage any standard H.264 hardware encoders that may be present on the user's computer. Thus, a software encoder is used and the CPU usage is high. The design decision to use SVC probably pays off when three people or more are using a hangout.
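To make the peeling idea concrete, here's a toy sketch. The layer bitrates are made up for illustration, not Vidyo's actual parameters: each enhancement layer is encoded relative to the layers below it, so the switch can drop layers from the top down for slow receivers without re-encoding anything.

```python
# Illustrative sketch of SVC-style bitrate peeling (hypothetical rates).
layers_kbps = [150, 250, 400]   # base layer + two enhancement layers

def peel(receiver_capacity_kbps):
    """Keep the largest prefix of layers that fits the receiver's capacity."""
    total = 0
    for rate in layers_kbps:
        if total + rate > receiver_capacity_kbps:
            break
        total += rate
    return total

print(peel(1000))  # fast receiver gets all layers: 800 kbps
print(peel(500))   # slower receiver gets base + one enhancement: 400 kbps
```

Note that the sender always encodes and uploads all layers; only the downstream side benefits from peeling, which is part of why the bandwidth and CPU cost stays high for every participant.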

One downside to using Google Hangouts for extended telepresence sessions is the periodic "Are you still there?" prompt, which seems to appear roughly every 2.5 hours. If you don't answer in the affirmative, you will be dropped from the hangout after a few minutes. Sometimes when I've stepped out of the office for coffee, I'll miss the prompt and get disconnected. I understand why Google does this, though, and reconnecting to the same hangout is pretty easy. Even with our excessive use of Google Hangouts, we haven't encountered any other limits to the service.

Telepresence effectiveness

Video conferencing has always offered some obvious communication advantages, and Google Hangouts is no exception. The experience is much better than talking on the phone, as body language can really help convey meaning. In many ways, it does help close the distance gap and simulate being in the same room: team members can show artifacts (such as devices and mobile phone apps) and see at a glance if other team members are present, absent, working hard on a problem, or perhaps available for interruption. We made heavy use of the screen sharing feature, and even took advantage of the shared YouTube viewing on several occasions. We didn't engage in pair programming in this experiment, although remote pair programming is not unheard of. The biggest benefit of telepresence for geographically distributed teams seems to be keeping team members focused and engaged, as being able to see other team members working can be a source of motivation.

For me, the biggest downside to frequent use of Google Hangouts is the "stream litter" problem: Every hangout event appears in your Google+ stream forever, unless you manually delete it. While it's only visible to the hangout participants, it's really annoying to have to sift through a hundred hangout events while I'm looking for an unrelated post in my Google+ stream. Also, it's sometimes awkward when I want to share the screen from my work computer while using a different computer for the hangout. I end up joining the hangout a second time from my work computer, only to have nasty audio feedback ensue until I mute the microphone and speaker.

Conclusions

I think that using Google Hangouts for extended work sessions adds a lot of value, and I'll continue to use it. It would be interesting to try other video conferencing solutions to see how they compare.

For the impatient people who just scrolled down to "Conclusions" right away, here's the tl;dr:

Pros:
  • Continuous visual of other team members increases the opportunities for impromptu discussions and helps motivation.
  • The "same room" illusion helps close the distance gap associated with telework.
  • Good quality audio and video.
  • Easily accessible from GMail or Google+.
  • Screen sharing.
  • Shared YouTube viewing.
Cons:
  • Relatively high (but manageable) bandwidth and CPU requirements.
  • Google+ stream littered with hangout events.
  • 2.5-hour "Are you still there?" prompt.
  • When eating doughnuts in front of team members, can't offer some for everyone.
A quick survey of C++11 feature support

I recently conducted a quick-and-dirty survey of C++11 (formerly known as C++0x) features available on various platforms and compilers that I had lying around. My testing was not authoritative nor rigorous. (For example, g++ without -std=c++0x actually compiles lambdas without throwing an error, so I marked it as supported even though it does give a stern warning.) I'm posting the results here, mostly for my own future reference.

Mac OS 10.6 / Xcode 4.2
gcc version 4.2.1
Apple clang version 3.0
Ubuntu 12.04
gcc version 4.6.3
Ubuntu clang version 3.0-6ubuntu3
Windows 7
MSVC++ 2010
g++ clang++ clang++ -std=c++0x g++ g++ -std=c++0x clang++ clang++ -std=c++0x cl.exe /clr
__cplusplus 1L 1L 201103L 1L 1L 1L 201103L 199711L
__GXX_EXPERIMENTAL_CXX0X__ undef undef 1 undef 1 undef 1 undef
omit space in nested template ">>" X X X X
std::tr1::shared_ptr X X X X X X X
std::shared_ptr X X X
nullptr X X X X
auto X X X X X
uniform initialization X
for range (foreach) X X X X X
move semantics (std::move) X X
raw string literals X X
encoded string literals X X
noexcept X X X
constexpr X X X
variadic templates X X X X X X
lambdas X X X
decltype X X X X
new function declaration style X X X X
scoped enums X X X X
std::function X X X
std::tr1::function X X X X X X X
can autodetect need for std::tr1 X X X X X X X X

Other, probably more thorough information about C++11 feature support:

My quick-and-dirty test suite is available for download.

UPDATE 2013-05-27: More recent platforms and compilers, below...

Mac OS 10.8 / Xcode 4.6.2
gcc version 4.2.1
Apple clang version 3.3
Ubuntu 13.04
gcc version 4.7.3
Ubuntu clang version 3.2-1~exp9ubuntu1
clang++ clang++ -std=c++11 g++ g++ -std=c++11 clang++ clang++ -std=c++11
__cplusplus 199711L 201103L 199711L 201103L 199711L 201103L
__GXX_EXPERIMENTAL_CXX0X__ undef 1 undef 1 undef 1
omit space in nested template ">>" X X X
std::tr1::shared_ptr X X X X X X
std::shared_ptr X X
nullptr X X X
auto X X X X X
uniform initialization X X
for range (foreach) X X X X X
move semantics (std::move) X X
raw string literals X X X
encoded string literals X X X
noexcept X X X
constexpr X X X
variadic templates X X X X X X
lambdas X X X X
decltype X X X
new function declaration style X X X
scoped enums X X X
std::function X X
std::tr1::function X X X X X X
can autodetect need for std::tr1 X X X X X
Nest Learning Thermostat: Installation, battery issues, and the importance of the "C" wire

My furnace's control board. The "C" terminal has no connection to the thermostat in this picture. (The white wire on the C terminal goes to the A/C.) I connected the unused blue wire (bottom center) to the C terminal.

The Nest now confirms the active "C" wire.

I recently bought and installed a Nest Learning Thermostat to replace my old non-networked thermostat. I show the installation, demonstrate control from mobile devices, and provide a general review in the above video.

It's been about a month since I installed the device, and I found one important issue yesterday. My Nest dropped off the network for 7 hours, and upon investigation I discovered that the battery was low and it turned off the Wi-Fi radio to save power. Many other people have reported problems with the battery, which is scary because your thermostat is one device that you absolutely want to work 24/7 -- you don't want your pipes freezing when you leave town and the Nest decides to run out of juice!

It turns out that my thermostat wiring, like that in many homes, does not provide a "C" wire (common 24VAC) for completing a circuit that provides constant power to the unit. This sort of wiring worked great for old-fashioned mercury thermostats -- it provides a red 24VAC power wire, and "call" wires for turning on the fan, heat, and air conditioning. When the thermostat needs to turn on one of those appliances, it simply closes the circuit between the red wire and the relevant call wire. Smart thermostats rely on batteries to power their smartness when no circuit is closed. When an appliance is running (i.e. one of those three circuits is closed), the thermostat can perform "power stealing" to sap power from the closed circuit, both operating the unit and recharging the battery. For simple programmable thermostats, power stealing is probably sufficient. However, for a power-hungry device like the Nest that needs to operate a Wi-Fi radio, this mode of operation can be problematic for several reasons:

  1. If you live in a nice place like Colorado where you can open the windows and go days without using the heater or air conditioner, the control circuits are never closed and the Nest's battery doesn't have an opportunity to recharge.
  2. Power stealing is an imperfect backwards compatibility hack, and can't necessarily provide enough current to recharge the battery even when the appliances are operating. This is because the current may be limited by resistance in your furnace's control board.
  3. When the HVAC appliances are not running and the battery needs to be charged, the Nest performs an even worse hack than power stealing: it pulses the heater call circuit on and off very quickly to steal some power, hoping the pulses are short enough to keep the furnace from activating. I haven't noticed any problem with this, but at least one person has found that it wreaks havoc on their heater.
  4. The Nest uses a "Power Saving Mode" of Wi-Fi to reduce the power consumption of the radio and prolong the battery life. (And hopefully require less overall power than it can steal from the call circuits.) Nest indicates that some non-conformant wireless access points may not fully support this mode, thus causing the Nest to consume more power. (Perhaps more quickly than it can be replenished.)

I was lucky that my thermostat wiring contained an extra, unused (blue) wire, and my furnace's control board provided a 24VAC common terminal for a "C" wire. After hooking up the blue wire at the furnace and the Nest's base, I now seem to have successfully provided a 24VAC "C" wire to the Nest, and hopefully my battery issues are behind me.

I do think that Nest is perhaps overly optimistic about their power stealing and circuit pulsing being able to provide adequate power to the device. There's certainly no warning about this potential issue when you provide your wiring information to their online compatibility tool.


A Technical Look at Google Fiber

While visiting Kansas City recently, I decided to investigate Google Fiber, the ambitious new residential gigabit Internet service that Google is building in Kansas City, Kansas, and central Kansas City, Missouri. While they haven't connected residential customers to the network yet, they have provisioned service at several local businesses. They also opened a showroom called "Fiber Space" to demonstrate the service to potential customers.

My first stop was the Mud Pie Vegan Bakery and Coffeehouse, a neat coffee house in a historic area of Midtown Kansas City. Mud Pie has the Google Fiber hookup, which customers can use via Wi-Fi or the ethernet-attached Chromebooks which Google has provided. I tried to convince the barista to let me borrow the ethernet connection from a Chromebook so I could test the fast path, but he declined due to Google not wanting people to interfere with their hardware in such a way. However, I found I was able to accomplish most of my investigation goals using a combination of my laptop on Wi-Fi and the wired Chromebooks. I ended up hanging out at Mud Pie for several hours, running tests and chatting with the barista and customers.

Four blocks south of Mud Pie, Google has set up a showroom for Google Fiber called "Fiber Space." It's a very consumer-oriented experience aimed at selling the service to locals. Many Google Fiber employees are on hand to show people what hardware they'll need, and demonstrate the Internet and TV services in virtual living rooms. The "car roller coaster" set from the Google Fiber promotional video and free snacks were also on hand. In addition to the wired Chromebooks on display, people can bring their laptops to try out Google Fiber via the Wi-Fi. However, an employee told me that they didn't allow hooking up to the wire, citing a concern about piracy or illegal activities or some such. (Which sounds like a pretty weak excuse to me.)

Speed Tests

I don't always download big files. But when I do, I download half a gigabyte of pseudorandom bytes generated by /dev/urandom.

Naturally, the first thing people want to know about Google Fiber is how fast is it, really? Unfortunately, it's difficult to reliably measure the practical speed of the service due to the many other bottlenecks that exist once you remove the bottleneck of the last mile. Also, since others have performed plenty of speed tests, I decided to focus more on other characteristics of the network. However, I did run a few throughput tests for good measure.

Here is the result from speedtest.net, running on the wired Chromebook:

I tried running the test against servers in other locales, but the default Palo Alto server delivered the best result. I don't think these tests are great measures of throughput for such high speeds, since not only might the test servers be bottlenecked, but they may not run the tests long enough for the TCP window size to ramp up to the connection's true capacity.

A slightly better test was to download very large files full of random data from various cloud servers:

data center | file size | time | rate
Forethought.net (Denver) | 100MB | 8 seconds | 104.858 Mbps
Forethought.net (Denver) | 512MB | 42 seconds | 102.261 Mbps
Forethought.net (Denver) | 512MB | 41 seconds | 104.755 Mbps
Linode (Dallas) | 256MB | 72 seconds | 29.826 Mbps
Linode (Dallas) | 256MB | 79 seconds | 27.183 Mbps

I don't know why the Linode download was so slow, although the outbound route to that server went out to California, and even across Comcast's network (!) before heading to Dallas. The download from a server at Forethought hit a much higher bottleneck somewhere, but it's difficult to say where.
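For reference, the rates in the table are straightforward to reproduce, assuming the file sizes are binary megabytes (2^20 bytes) and the rates are decimal megabits per second:

```python
# Derive the throughput figures from the download table.
# Assumes file sizes in binary megabytes (2**20 bytes) and
# rates in decimal megabits per second (10**6 bits).
def rate_mbps(size_mb, seconds):
    bits = size_mb * 2**20 * 8
    return bits / seconds / 1e6

print(f"{rate_mbps(100, 8):.3f} Mbps")   # 104.858, the first Forethought row
print(f"{rate_mbps(256, 72):.3f} Mbps")  # 29.826, the first Linode row
```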

Latency Tests

Going to town with the ping and traceroutes.
At Mud Pie, a couple of Chromebooks were hooked up to Google Fiber via ethernet.

I performed pings and traceroutes to a number of hosts, to get an idea of Google Fiber's positioning on the network and the available peering points for outbound packets. These tests were conducted from the Wi-Fi network at Mud Pie, so a few milliseconds can be attributed to local Wi-Fi latency (see the first item on the list).

host | location | min (ms) | avg (ms) | max (ms) | stddev | notes
networkbox | | 1.766 | 2.736 | 5.007 | 0.909 | The local gateway, for reference
www.apple.com | Dallas, TX (see notes) | 33.024 | 35.813 | 39.896 | 2.499 | Akamai CDN node in Dallas, TX
google.com | Dallas, TX | 10.920 | 13.888 | 17.224 | 2.046 |
youtube.com | Dallas, TX | 10.760 | 12.129 | 12.787 | 0.736 |
www.kcnap.net | Kansas City, MO | 75.605 | 76.977 | 78.628 | 0.902 |
xo.com | Washington, DC | 82.057 | 83.959 | 85.609 | 1.021 |
www.frgp.net | Denver, CO | 19.390 | 20.931 | 23.498 | 1.576 | Major peering point in Denver
www.cogentco.com | Washington, DC | 35.466 | 36.681 | 39.826 | 1.439 |
sparcomedia.com | Beaverton, OR | 71.452 | 73.802 | 77.652 | 1.986 |
www.forethought.net | Denver, CO | 47.438 | 49.155 | 52.776 | 1.642 |
cafbit.com | Denver, CO | 48.177 | 52.195 | 58.210 | 3.121 | You are here
www.he.net | Fremont, CA | 39.671 | 42.087 | 45.346 | 2.055 |
gw.msstate.edu | Starkville, MS | 119.290 | 122.218 | 129.973 | 3.324 |
www.olemiss.edu | Oxford, MS | 43.335 | 45.507 | 49.076 | 1.777 |
news.ycombinator.com | Houston, TX | 53.569 | 55.135 | 59.513 | 1.806 |
www.facebook.com | Palo Alto, CA | 70.686 | 75.499 | 80.526 | 3.325 |
66.249.72.47 | Mountain View, CA | 39.154 | 42.498 | 45.488 | 2.106 | Last Googlebot host to visit cafbit.com
drive.google.com | Dallas, TX | 12.865 | 14.449 | 19.539 | 2.079 |
a.root-servers.net | Tokyo, Japan | 200.447 | 203.547 | 209.336 | 2.972 | (anycast)
b.root-servers.net | Marina Del Rey, CA | 50.488 | 53.980 | 55.954 | 1.805 |
c.root-servers.net | ? | 23.478 | 26.935 | 34.285 | 4.029 | (anycast)
d.root-servers.net | College Park, MD | 118.310 | 122.118 | 134.235 | 4.868 |
f.root-servers.net | Chicago, IL | 92.451 | 93.634 | 96.799 | 1.263 | (anycast)
h.root-servers.net | Aberdeen, MD | 45.979 | 47.852 | 49.923 | 1.313 |
i.root-servers.net | Brussels, Belgium | 115.991 | 118.196 | 120.097 | 1.284 | (anycast)
j.root-servers.net | Slovenia | 145.607 | 149.108 | 152.641 | 2.383 | (anycast)
k.root-servers.net | Miami, FL | 77.650 | 80.063 | 89.322 | 3.588 | (anycast)
l.root-servers.net | San Jose, CA | 47.904 | 48.733 | 49.733 | 0.561 | (anycast)
m.root-servers.net | Japan | 144.549 | 145.707 | 151.106 | 2.059 | (anycast)

The full ping/traceroute output is available.

Peering points

As far as I can tell, outbound packets exit Google Fiber's network via links to either San Jose, CA, or Dallas, TX. In San Jose, Google Fiber seems to be peering with Comcast and XO communications. (Presumably at Equinix's 11 Great Oaks facility.) In Dallas, Google Fiber seems to peer with Level 3 and Google's main network (which is a separate autonomous system from Google Fiber). As you might expect, access to Google services (such as Google Drive and YouTube) is quite snappy from the Google Fiber network.

Locally, Google Fiber has a short route to the University of Kansas Medical Center, but I'm not sure who else they peer with in Kansas City. They definitely do not peer with KC NAP.

IPv6

While on Mud Pie's network, my laptop was assigned an IPv6 address in the fc00::/7 block which is designated for unique local addresses. However, I'm not sure what the point of this is. I definitely could not reach the IPv6 internet via ping6.

Conclusion

Google Fiber is fast. If it were available in my neighborhood, I'd sign up.

UPDATE 2012-12-28: I've made another visit to Kansas City... see my post about plugging into the ethernet at the Hacker House.

Announcing Valence64: A new platform for a new era

Last year, I wrote an Android app called Valence that allows the user to remote-control the mouse and keyboard of another machine. Always looking for new challenges, I recently decided it was time for Valence to broaden its horizons beyond Android and support additional platforms to reach a wider audience.

In the following video, I demonstrate this exciting new release:

Yes, it's Valence for the Commodore 64. Now you can control your home theater PC easily and reliably from any C64 you happen to have handy. The source code for Valence64 is available on my GitHub under an Apache 2.0 license — bug fixes and feature patches are gladly accepted.

System requirements

An ethernet adapter such as the one shown above is required to use Valence64.

Valence64 requires the following hardware for proper operation:

  • Commodore 64 or 128*
  • 1541 or 1571 disk drive
  • 64K RAM
  • A supported ethernet cartridge: RR-Net, The Final Ethernet, 64NIC+ (and probably any other adapter with a cs8900a or lan91c96 chipset)
  • One blank disk required
  • Joystick optional

* - in C64 mode.


Lua and Squirrel overhead

I've been researching the idea of using embedded languages in mobile applications as a way of reusing business logic across platforms. I haven't found a lot of information about how much an embedded language will bloat an app's size, so I decided to see for myself. So far, I've written simple "Hello, world" apps for both Lua and Squirrel. Lua is a simple language that has been heavily used in video games for years. Squirrel is a newer language that was inspired by Lua, but uses a more C-like syntax.

These tests are not very scientific, and only demonstrate the bare minimum task of including the language support as a native shared library, and some JNI code to run a script to generate a "Hello, world" message which is returned to the activity.

Lua and Squirrel app delivery overhead (.apk size differences)
language | start size | final size | overhead
Lua | 12817 (13K) | 60089 (59K) | 47272 (46K)
Squirrel | 13530 (13K) | 118520 (116K) | 104990 (103K)
Squirrel (sans compiler) | 13530 (13K) | 99598 (97K) | 86068 (84K)

I'm frankly blown away by the compact size of these language implementations, especially after getting the impression that including Javascript (via Rhino) would cost many hundreds of kilobytes. That wouldn't be a problem for many apps, but for certain small apps, Rhino could end up being much larger than the app itself. In the case of Lua, which is implemented in a mere 20 C source files, you not only get the Lua virtual machine in 46K, but the compiler to boot! Developers can and do use Lua on other platforms such as iOS, and the Lua code has even been compiled to Javascript via emscripten (an LLVM Javascript backend), which adds the potential of reusing code in HTML5 apps.

I haven't played around with writing code in these languages, though, so I'm curious to hear about people's real-world experiences.

Using a Mac keyboard in Ubuntu 11.10 with Mac-like shortcuts

I'm trying out Ubuntu 11.10 (Oneiric Ocelot) on a PC with a Mac keyboard attached. I made a few hacks to make the keyboard work smoothly and in a (very roughly) Mac-like fashion. I figured I'd make a few notes here for my own future reference. (Note: I'm using a U.S. keyboard. If you are using a different kind of keyboard, your mileage may vary.)

Goals

  1. Make the function keys (F1..F12) work as function keys without needing to hold down the Fn key.
  2. Use Mac-like keyboard shortcuts for window navigation (Cmd-Tab, Cmd-`) and the terminal (Cmd-C for copy, Cmd-V for paste).
  3. Avoid stepping on Unity's use of the Super key (i.e. the command key on Macs and the Windows key on PC keyboards).
  4. Use the legacy Caps Lock key for something useful.

The plan

  1. Change a driver parameter to enable use of the function keys without holding down the Fn key.
  2. By default, the keyboard's left and right command keys are mapped to Super_L and Super_R. Map these instead to the seldom-used Hyper_L and Hyper_R keysyms. (If you try to use the Super keys for shortcuts, the Unity dock will appear every time you hold down the command key. It's really annoying.)
  3. Map the Caps Lock key to Super_L so it can be used for certain Unity shortcuts.

Making function keys work

Create a file in /etc/modprobe.d that sets the fnmode parameter of the hid_apple driver to 2 ("fkeysfirst"):

echo 'options hid_apple fnmode=2' > /etc/modprobe.d/apple_kbd.conf

Reboot, and the function keys will work without needing to hold down the Fn key. (You can access the volume controls and such by holding down the Fn key.) Thanks to Alan Doyle for reporting on this tweak.

Remapping the keys

I used the xkbcomp utility to remap the keys. I extracted the current keyboard mappings into a default.xkb file, made a copy of the mapping file as mackeyboard.xkb, made the changes to this file, then loaded the new mapping into the running X server:

xkbcomp :0 default.xkb
cp default.xkb mackeyboard.xkb
vi mackeyboard.xkb
xkbcomp mackeyboard.xkb :0

I'm attaching my mackeyboard.xkb file and the diff for reference. (Use these at your own peril.) I made the following changes:

  1. Changed the LWIN and RWIN keycode identifiers to LCMD and RCMD, for clarity.
  2. Commented out the LMTA and RMTA keycode aliases, to avoid confusion.
  3. Changed the CAPS keysym mapping from Caps_Lock to Super_L.
  4. Changed the LWIN and RWIN (now LCMD and RCMD) keysym mappings from Super_L and Super_R to Hyper_L and Hyper_R.
  5. Changed the modifier mapping so that only the CAPS keycode is used for Mod4. Since Mod3 wasn't previously in use, I mapped Hyper_L and Hyper_R to this modifier.

Configuring new shortcuts

In System Settings -> Keyboard -> Shortcuts, configure these shortcuts:

Section | Shortcut name | Key
Navigation | Switch applications | Cmd+Tab
Navigation | Switch windows of an application | Cmd+`
Windows | Toggle fullscreen mode | Cmd+Return
Windows | Close Window | Cmd+Q

In Terminal's Edit -> Keyboard Shortcuts, configure these shortcuts:

Section | Shortcut name | Key
File | New Window | Cmd+N
File | Close Window | Cmd+W
Edit | Copy | Cmd+C
Edit | Paste | Cmd+V
View | Zoom In | Cmd+=
View | Zoom Out | Cmd+-
View | Normal Size | Cmd+0

I think the biggest benefit of the new Terminal shortcuts is the use of sensible copy and paste shortcuts that don't interfere with using Ctrl-C and Ctrl-V in the shell.

Future hacks

The following improvements are left as an exercise for the reader:

  • Have xkbcomp load the new mapping every time you log in, so you don't have to run it manually.
  • Make other applications (such as Google Chrome) recognize Mac shortcuts such as Cmd-C and Cmd-V.
  • Figure out a generic way for specifying key translations for specific apps that happen to be in the foreground, similar to the functionality that AutoHotkey provides for Windows. (compiz plugin? resurrect the deprecated XEvIE X11 extension?)

Update, November 7, 2011: AutoKey

In the comments, Nivth Ket brought to my attention the AutoKey tool for mapping arbitrary keys to other keys, phrases, or even Python scripts. This tool seems to use the XRecord extension to X11 to listen to incoming keys. I gave AutoKey 0.80.3 a test drive, and found a few limitations that clashed with my needs. However, with a few hacks, I think I've overcome these limitations and found a solution that seems to work for me so far. The limitations and workarounds are as follows:

  • The AutoKey GUI does not allow assigning the same hotkey to multiple actions. This prevents me from assigning a key combination to do one thing in a particular application (i.e. the window title matches "Google Chrome"), and something else in every other application. The workaround is to edit the configuration files in ~/.config/autokey/data directly.
  • AutoKey does not have a notion of order semantics for the entries — the entries are processed in a seemingly random order. Therefore, if my entry for "Cmd-V with no window filter" happens to come before my entry for "Cmd-V only for Terminal windows", the former will eclipse the latter, and the Terminal-only rule will never happen. My workaround was to hack AutoKey to always process entries with filters first, then process entries with no filters. Here is the patch.
  • AutoKey does not support the little-known "Hyper" modifier key, which I use in my layout for the "command" keys. My workaround was to hack AutoKey to support the Hyper modifier. Here is the patch.

Downloads

  • mackeyboard.xkb - The xkb file for my keyboard, suitable for loading into a running X server with xkbcomp.
  • mackeyboard.diff - The changes I made to the original keyboard mappings.
Apple Remote Desktop quirks

While developing Valence, an input-only Android VNC client for remote controlling a computer, I've encountered several notable quirks in Apple Remote Desktop, Mac OS's built-in VNC server. Apple Remote Desktop (ARD) is based on VNC, a system developed in the late 1990s for controlling a remote computer, and its Remote Framebuffer (RFB) protocol. Generally, standard VNC clients can interoperate with ARD. An ARD server reports use of version "3.889" of the RFB protocol, which isn't a real version of RFB, but this version number can be used by clients to know that they are talking to an ARD server and not a conventional VNC server.
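A client can detect this at the very start of the handshake, since the server's first 12 bytes are its protocol version string ("RFB xxx.yyy" plus a newline). A minimal sketch (my own parsing code, not from any particular VNC library):

```python
# Sketch: detect an ARD server from the 12-byte RFB version greeting.
def parse_rfb_version(greeting: bytes):
    # e.g. b"RFB 003.889\n" from Apple Remote Desktop
    if len(greeting) != 12 or not greeting.startswith(b"RFB "):
        raise ValueError("not an RFB server")
    major, minor = greeting[4:11].split(b".")
    return int(major), int(minor)

version = parse_rfb_version(b"RFB 003.889\n")
is_ard = (version == (3, 889))   # 3.889 isn't a real RFB version: it's ARD
print(version, is_ard)
```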

ARD authentication

During the RFB handshaking, a VNC server announces which authentication methods it supports, and the client picks from among these. Most VNC servers support a scheme known as "VNC authentication," which is a simple DES-based password challenge/response system. ARD offers VNC authentication, but it also offers a proprietary scheme which allows the user to supply a username as well. This scheme is known as "Mac authentication" or "ARD authentication." Mac OS X 10.7 Lion includes an important change to the way ARD works: If you connect with VNC authentication, you are presented with a login screen which prompts you for a username and password before allowing you to control the desktop. If you connect with ARD authentication, this login screen is bypassed and you can immediately control the desktop.

Because Valence is an input-only VNC client, the login screen is a show-stopper: since the login screen cannot be seen, the user cannot log in and control the desktop. The only way for Valence to support Lion's built-in VNC server was to add the ARD authentication scheme as an option. Apple has a support article which gives a high-level overview of the ARD authentication process, but it unfortunately does not contain the technical detail needed to implement the scheme. I was able to discover the technical details by studying the gtk-vnc open-source library, which implements ARD authentication thanks to a patch provided by Håkon Enger last year. (I'm not sure how Mr. Enger figured out the technique, but I'm grateful.)

The basic steps for performing ARD authentication are as follows:

  1. Read the authentication material from the socket: a two-byte generator value, a two-byte key length value, the prime modulus (keyLength bytes), and the peer's generated public key (keyLength bytes).
  2. Generate your own Diffie-Hellman public-private key pair.
  3. Perform Diffie-Hellman key agreement, using the generator (g), prime (p), and the peer's public key. The output will be a shared secret known to both you and the peer.
  4. Perform an MD5 hash of the shared secret. This 128-bit (16-byte) value will be used as the AES key.
  5. Pack the username and password into a 128-byte plaintext "credentials" structure: { username[64], password[64] }. Null-terminate each. Fill the unused bytes with random characters so that the encryption output is less predictable.
  6. Encrypt the plaintext credentials with the 128-bit MD5 hash from step 4, using the AES 128-bit symmetric cipher in electronic codebook (ECB) mode. Use no further padding for this block cipher.
  7. Write the ciphertext from step 6 to the stream. Write your generated DH public key to the stream.
  8. Check for authentication pass/fail as usual.
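Steps 2 through 5 above can be sketched in Python using only the standard library. This is a minimal illustration, not a drop-in implementation: the final AES-128-ECB encryption (step 6) is omitted, since the standard library has no AES, and the parameter values used in any test would be toy values rather than real ARD traffic.

```python
import hashlib
import os

def ard_derive_aes_key(generator, prime, peer_public, my_private):
    """Steps 2-4: generate our public key, perform Diffie-Hellman
    agreement, and take the MD5 of the shared secret as the AES key."""
    my_public = pow(generator, my_private, prime)         # step 2
    shared = pow(peer_public, my_private, prime)          # step 3
    key_length = (prime.bit_length() + 7) // 8
    secret_bytes = shared.to_bytes(key_length, "big")
    return my_public, hashlib.md5(secret_bytes).digest()  # step 4: 16 bytes

def pack_credentials(username, password):
    """Step 5: a 128-byte { username[64], password[64] } structure,
    each field null-terminated and padded with random bytes."""
    def field(value):
        data = value.encode("utf-8") + b"\x00"
        assert len(data) <= 64, "field too long"
        return data + os.urandom(64 - len(data))
    return field(username) + field(password)
```

Both sides arrive at the same 16-byte key, which the client then uses to AES-encrypt the packed credentials before writing them (with its public key) back to the stream.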

For further reference, my Java implementation of ARD authentication is available on GitHub.

Right-click problems in the Snow Leopard ARD v3.5 update

In July 2011, Apple pushed an update to Snow Leopard users which included a newer version of Apple Remote Desktop, the built-in VNC server. This new version, v3.5, interprets mouse buttons differently—the RFB "button-2" event is now used to indicate a right-click instead of the "button-3" event which is used on standard VNC implementations. This broke Valence's support for sending a right-click when the user performs a two-finger tap. To restore right-click support, I had to add a "send mouse button-2 instead of button-3" option to Valence's server configuration.

It is unclear to me why ARD v3.5 does this in Snow Leopard, why this problem doesn't exist in Lion, and why this problem doesn't appear when using the ARD client.

Foreign keyboard key mapping

Valence can be used to send international keys to VNC servers running on Linux or Windows with no problem. However, I was surprised to discover that these key events were often not correctly consumed by Macs. Upon investigation, I learned that the problem was worse—using a foreign keyboard layout can cause the wrong keys to be consumed by the Mac. The root cause appears to be the lack of support in Mac OS for allowing applications (such as a VNC server) to inject synthetic keysyms (symbolic representations of keys). Only injection of physical keycodes (numeric representations of actual keys on a keyboard) is supported. (Note that I'm borrowing "keysym" and "keycode" from X11 terminology, but most systems have equivalent concepts.)

Operating systems generally provide layers of abstraction around keyboard input, and some background is required to understand the problem. Keyboard hardware sends a "scancode" to the computer for each key pressed, using a scheme that dates back to the electrical configuration of rows and columns in early keyboards. These scancodes are then translated into more useful values one or more times before being provided to an application. On a Linux/X11 system, for instance, the scancodes are converted to "keycodes" which are simple representations of each physical key on the keyboard. The scancode to keycode mapping allows computers to use keyboards with different scancode conventions. For example, the 7th key from the left on the top letter row may be represented by different scancode bytes on different keyboards, but the operating system will always translate that key to the same keycode. The keycode, which still represents a physical key, is then mapped into a virtual value known as a keysym, based on the configured keyboard layout. For example, that 7th key on the top letter row would result in a keysym for "Y" when using a U.S. keyboard layout, but it would be translated into a "Z" when using a German keyboard layout. These abstraction layers allow applications to be written without needing to consider the keyboard electronics or the arrangement of keys on the keyboard. When the application receives a "Y" keysym, it knows that the user intended to enter "Y" regardless of which physical key they pressed to make it happen.

VNC clients send symbolic key representations, to avoid any of the headache with keyboard layouts. (The symbolic key representations used in the RFB protocol are defined to be identical to the X11 keysyms, but are trivially translatable into symbolic key values for other systems.) When a user presses a "Y", the VNC client sends 0x0079, which is the keysym for "Y", regardless of which physical key was pressed. The VNC server then injects a synthetic key event with the appropriate symbolic key for "Y" into the input system of the server. A user using a VNC client with a U.S. keyboard and a user using a VNC client with a German keyboard would both be able to type on a server with any defined keyboard layout without noticing any problems, because the VNC server injects the symbolic key and bypasses the whole messy business of keyboard layouts.

That's the theory, anyway. Unfortunately, since Mac OS doesn't seem to support symbolic injection, only physical keycode injection, the whole system falls apart. Here's what happens when an American user presses the 7th key of the top letter row ("Y" on the U.S. keyboard) while using a VNC client to control a server configured with a German keyboard:

  • Linux and Windows. The VNC client sends 0x0079, the VNC symbolic key representation for the letter "Y". The VNC server receives this event, and says, "I see you want to press the letter Y on this computer. I'll send the symbolic key for "Y" directly to the running application. We won't worry about what kind of keyboard is physically attached to the server... it doesn't matter!" The application receives the keysym for "Y", and all is well.
  • Mac OS. The VNC client sends 0x0079, the VNC symbolic key representation for the letter "Y". The VNC server receives this event, and says, "Gosh, the operating system's event API only lets me send physical key events, so I'll have to translate this into a physical keycode. I wonder how I can do that... Oh, I have a built-in list of how keysyms translate to keycodes for American keyboards, I'll use that! It looks like I should send the physical keycode representing the 7th key of the top letter row. Done!" After the VNC server injects the keycode, the operating system then performs its task of translating the physical keycode back into a keysym to be delivered to the application. Unfortunately, since the operating system is configured to be using a German keyboard layout, the keycode is translated into the keysym for "Z" instead of "Y". The user is shocked to see a "z" appear on the screen after he typed a "y"!
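The two behaviors above can be illustrated with a toy model. The layout tables and keycode value here are made up for the example; real systems carry full keymaps, but the translation chain is the same.

```python
KEYCODE_TOP_ROW_7 = 29  # hypothetical keycode for the 7th top-row letter key

# Toy layout tables: the same physical keycode yields different keysyms
# depending on the configured keyboard layout.
LAYOUTS = {
    "us": {KEYCODE_TOP_ROW_7: "y"},
    "de": {KEYCODE_TOP_ROW_7: "z"},
}

# The VNC server's hard-wired U.S. keysym-to-keycode table (the Mac OS case).
US_KEYSYM_TO_KEYCODE = {"y": KEYCODE_TOP_ROW_7}

def inject_keysym_directly(keysym, server_layout):
    """Linux/Windows case: the synthetic keysym bypasses layout
    translation entirely, so the application sees what was sent."""
    return keysym

def inject_via_keycode(keysym, server_layout):
    """Mac OS case: the keysym is first mapped to a U.S. keycode, then
    the OS maps that keycode back to a keysym using the *server's*
    configured layout."""
    keycode = US_KEYSYM_TO_KEYCODE[keysym]
    return LAYOUTS[server_layout][keycode]
```

With a German layout on the server, `inject_via_keycode("y", "de")` comes out as "z", which is exactly the surprise described above.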

A sufficiently smart VNC server could perhaps work around this problem by (somehow) being aware of which keyboard layout is in use on the server, and having the information needed to translate keysyms to keycodes for all possible layouts. This may not even be feasible, but even if it is, it doesn't solve the problem: the remote user would still be unable to type characters that are not physically present on the server's keyboard.

A brief survey of open-source software that deals with key injection on Mac OS shows that my Valence app isn't alone in suffering from this issue—they all do. In fact, even Apple Remote Desktop cannot handle this correctly! The usual advice for using an ARD client to control an ARD server with a different keyboard mapping is to change the keyboard mapping on either the client or the server.

I have no solution for the foreign key problem at this time.

Rapid DHCP Redux

I was surprised at the amount of attention attracted by my recent post, "Rapid DHCP: Or, how do Macs get on the network so fast?". Between the 27 comments on my post and the 180 comments on Hacker News, a lot of interesting insights surfaced about the Mac's approach to DHCP. Information that would have taken me a week or two to research arrived within hours from people with experience in these matters. Here are some of the highlights:

  • The scheme Apple uses to achieve rapid network initialization is documented in RFC 4436: Detecting Network Attachment in IPv4 (DNAv4), which was authored by internet engineers from Apple, Sun, and Microsoft.
  • The scheme also seems to be documented in Apple's patent application (pub. no.: US 2009/0006635 A1). A patent has not been granted at this time. No Intellectual Property Rights (IPR) disclosures have been filed with the IETF concerning RFC 4436.
  • There's a minute chance for an address collision if the DHCP server loses its lease information after a reset. Such a collision should sort itself out quickly, but may cause a minor disruption to one or both of the hosts competing for the address. Many commodity routers may contain embedded DHCP servers that lose their lease information when the router is powered off. There is some debate over whether it is appropriate for implementors to risk this situation for a great speed benefit, or if they should take the strictly conservative route of accommodating such broken network scenarios.
  • I can't say for certain, but it seems that this process occurs in user-space in Apple's bootp package. (Thanks to everyone for the pointers to Apple's open source code.)
  • The DHCP server on my test network was not set up to be authoritative, so it wasn't promptly sending NAKs in response to bogus requests. Fixing this problem considerably improved the Galaxy Tab's DHCP time, although it (like many other devices) is still pokey compared to Apple's initialization scheme. (Added 2011-07-21.)

Thanks to everyone who joined in on the fun!

Rapid DHCP: Or, how do Macs get on the network so fast?

One of life's minor annoyances is having to wait on my devices to connect to the network after I wake them from sleep. All too often, I'll open the lid on my EeePC netbook, enter a web address, and get the dreaded "This webpage is not available" message because the machine is still working on connecting to my Wi-Fi network. On some occasions, I have to twiddle my thumbs for as long as 10-15 seconds before the network is ready to be used. The frustrating thing is that I know it doesn't have to be this way. I know this because I have a Mac. When I open the lid of my MacBook Pro, it connects to the network nearly instantaneously. In fact, no matter how fast I am, the network comes up before I can even try to load a web page. My curiosity got the better of me, and I set out to investigate how Macs are able to connect to the network so quickly, and how the network connect time in other operating systems could be improved.

I figure there are three main categories of time-consuming activities that occur during network initialization:

  1. Link establishment. This is the activity of establishing communication with the network's link layer. In the case of Wi-Fi, the radio must be powered on, the access point detected, and the optional encryption layer (e.g. WPA) established. After link establishment, the device is able to send and receive Ethernet frames on the network.
  2. Dynamic Host Configuration Protocol (DHCP). Through DHCP handshaking, the device negotiates an IP address for its use on the local IP network. A DHCP server is responsible for managing the IP addresses available for use on the network.
  3. Miscellaneous overhead. The operating system may perform any number of mundane tasks during the process of network initialization, including running scripts, looking up preconfigured network settings in a local database, launching programs, etc.

My investigation thus far is primarily concerned with the DHCP phase, although the other two categories would be interesting to study in the future. I set up a packet capture environment with a spare wireless access point, and observed the network activity of a number of devices as they initialized their network connection. For a worst-case scenario, let's look at the network activity captured while an Android tablet is connecting:

Samsung Galaxy Tab 10.1 - "dhcpcd-5.2.10:Linux-2.6.36.3:armv7l:p3"
time (s)  dir  packet description
00.0000   out  LLC RNR (The link is now established.)
01.1300   out  DHCP request 192.168.1.17: The client requests its IP address on the previously connected network.
05.6022   out  DHCP request 192.168.1.17: The client again requests this IP address.
11.0984   out  DHCP discover: "Okay, I give up. Maybe this is a different network after all. Is there a DHCP server out there?"
11.7189   in   DHCP offer 192.168.4.20: The server offers an IP address to the client.
11.7234   out  DHCP request 192.168.4.20: The client accepts the offered IP address.
11.7514   in   DHCP ACK: The server acknowledges the client's acceptance of the IP address.

This tablet, presumably in the interest of "optimization", is initially skipping the DHCP discovery phase and immediately requesting its previous IP address. The only problem is this is a different network, so the DHCP server ignores these requests. After about 4.5 seconds, the tablet stubbornly tries again to request its old IP address. After another 4.5 seconds, it resigns itself to starting from scratch, and performs the DHCP discovery needed to obtain an IP address on the new network. The process took a whopping 11.8 seconds to complete. (Note: This would have been faster if my DHCP server was configured to send NAKs properly—see my update below... -simmons, 2011-07-21)

In all fairness, this delay wouldn't be so bad if the device was connecting to the same network as it was previously using. However, notice that the tablet waits a full 1.13 seconds after link establishment to even think about starting the DHCP process. Engineering snappiness usually means finding lots of small opportunities to save a few milliseconds here and there, and someone definitely dropped the ball here.

In contrast, let's look at the packet dump from the machine with the lightning-fast network initialization, and see if we can uncover the magic that is happening under the hood:

MacBook Pro - MacOS 10.6.8
time (s)  dir  packet description
00.0000   out  LLC RNR (The link is now established.)
00.0100   out  ARP request broadcast: who-has 169.254.76.19 (The client is validating its link-local address.)
00.0110   out  ARP request unicast to 00:22:75:45:e3:54: who-has 192.168.2.1 tell 192.168.2.56
00.0120   out  ARP request unicast to 4e:80:98:f0:35:e3: who-has 192.168.4.1 tell 192.168.4.25
00.0120   in   ARP reply unicast from DHCP server: 192.168.4.1 is-at 4e:80:98:f0:35:e3
00.0130   out  ARP request unicast to 00:0d:b9:54:27:b3: who-has 192.168.1.1 tell 192.168.1.29
00.0140   out  DHCP request 192.168.4.25
00.0180   out  ARP broadcast: who-has 192.168.4.25 tell 192.168.4.25
00.0210   out  ARP broadcast: who-has 169.254.255.255 tell 192.168.4.25
00.0290   out  ARP broadcast: who-has 192.168.4.1 tell 192.168.4.25
00.0290   in   ARP reply unicast: 192.168.4.1 is-at 4e:80:98:f0:35:e3
00.0310   out  UDP to router's port 192 (AirPort detection). This implies that the IP interface is now configured.
......         (More normal IP activity on the newly configured interface.)
01.2680   out  DHCP request 192.168.4.25
01.3043   in   DHCP ACK

The key to understanding the magic is the first three unicast ARP requests. It looks like Mac OS remembers certain information about not only the last connected network, but the last several networks. In particular, it must at least persist the following tuple for each of these networks:

  1. The Ethernet address of the DHCP server
  2. The IP address of the DHCP server
  3. Its own IP address, as assigned by the DHCP server

During network initialization, the Mac transmits carefully crafted unicast ARP requests with this stored information. For each network in its memory, it attempts to send a request to the specific Ethernet address of the DHCP server for that network, in which it asks about the server's IP address, and requests that the server reply to the IP address which the Mac was formerly using on that network. Unless network hosts have been radically shuffled around, at most only one of these ARP requests will result in a response—the request corresponding to the current network, if the current network happens to be one of the remembered networks.

This network recognition technique allows the Mac to very rapidly discover if it is connected to a known network. If the network is recognized (and presumably if the Mac knows that the DHCP lease is still active), it immediately and presumptuously configures its IP interface with the address it knows is good for this network. (Well, it does perform a self-ARP for good measure, but doesn't seem to wait more than 13ms for a response.) The DHCP handshaking process begins in the background by sending a DHCP request for its assumed IP address, but the network interface is available for use during the handshaking process. If the network was not recognized, I assume the Mac would know to begin the DHCP discovery phase, instead of sending blind requests for a former IP address as the Galaxy Tab does.
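The bookkeeping behind this recognition scheme can be sketched in a few lines. This is my own reconstruction from the packet capture, not Apple's code; the field names are hypothetical, and the addresses in any example come from the capture above.

```python
from dataclasses import dataclass

@dataclass
class RememberedNetwork:
    dhcp_server_mac: str  # Ethernet address of the DHCP server
    dhcp_server_ip: str   # IP address of the DHCP server
    my_ip: str            # the address this host last leased on that network

def build_recognition_probes(networks):
    """One unicast ARP request per remembered network: sent to the DHCP
    server's known MAC, asking who-has the server's IP, with our old
    address as the sender. At most one probe should draw a reply, and
    the reply identifies which remembered network we are on."""
    return [
        {"dst_mac": n.dhcp_server_mac,
         "who_has": n.dhcp_server_ip,
         "tell": n.my_ip}
        for n in networks
    ]
```

Whichever probe gets an answer tells the host both that the network is recognized and which cached lease to resume.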

The Mac's rapid network initialization can be credited to more than just the network recognition scheme. Judging by the use of ARP (which can be problematic to deal with in user-space) and the unusually regular transmission intervals (a reliable 1.0ms delay between each packet sent), I'm guessing that the Mac's DHCP client system is entirely implemented as tight kernel-mode code. The Mac began the IP interface initialization process a mere 10ms after link establishment, which is far faster than any other device I tested. Android devices such as the Galaxy Tab rely on the user-mode dhcpcd program, which no doubt brings a lot of additional overhead such as loading the program, context switching, and perhaps even running scripts.

The next step for some daring kernel hacker is to implement a similarly aggressive DHCP client system in the Linux kernel, so that I can enjoy fast sign-on speeds on my Android tablet, Android phone, and Ubuntu netbook. There already exists a minimal DHCP client implementation in the Linux kernel, but it lacks certain features such as configuring the DNS nameservers. Perhaps it wouldn't be too much work to extend this code to support network recognition and interface with a user-mode daemon to handle such auxiliary configuration information received via DHCP. If I ever get a few spare cycles, maybe I'll even take a stab at it.

Update, July 12th, 2011 1pm MT:

This post has been mentioned on Hacker News, and there's lots of lively discussion in the comments over there.

Some people have pointed out some disadvantages in putting a full-featured DHCP client in the kernel. I'm skeptical about putting the DHCP client in the kernel, myself. However, I didn't want to elaborate on that at 2:00am, since the post was getting way too lengthy as it was. If I had known it would be subject to such peer review, I might have been a bit more careful with my words. :)

The argument for putting the DHCP client in the kernel basically boils down to:

  1. Achieving speed is all about shaving a few milliseconds here and there, and you just can't launch a program, wait for it to dynamically link, load config files, etc., and get the 10ms response time that the Mac has. (10ms from link establishment to transmitting the first DHCP packet.) I'm told that the dhcpcd program is a persistent daemon, so maybe the launch overhead isn't there. But something is keeping Linux hosts from having a 10ms response time.
  2. Doing ARP tricks could be awkward in user-space. You'd need to use the raw socket interface for transmitting (which isn't a big deal), and you'd have to use something like the packet(7) interface to sniff incoming packets to observe the ARP replies. I haven't played around with the packet(7) interface, so I'm not sure what the pros and cons might be.

Neither of these are show-stoppers to an improved user-mode DHCP client, but that was my thinking at the time. Now, I think I would certainly start with a user-mode solution, since a carefully crafted daemon should be able to achieve comparable response time, and the arping(8) program doesn't seem to have any problem using packet(7) to send and receive ARP packets in user-space.
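To make the user-space approach concrete, here is a rough sketch of building a unicast ARP who-has frame and sending it through a Linux AF_PACKET (packet(7)) socket. The frame construction follows the standard ARP wire format; sending requires root (CAP_NET_RAW) and a real interface name, and receiving replies would additionally require sniffing on the same socket, which is not shown.

```python
import socket
import struct

def build_unicast_arp_request(src_mac, dst_mac, sender_ip, target_ip):
    """Build a raw Ethernet frame carrying an ARP who-has request.
    src_mac/dst_mac are 6-byte strings; IPs are dotted-quad strings."""
    eth_header = dst_mac + src_mac + b"\x08\x06"      # EtherType 0x0806 = ARP
    arp = struct.pack("!HHBBH",
                      1,       # hardware type: Ethernet
                      0x0800,  # protocol type: IPv4
                      6, 4,    # hardware/protocol address lengths
                      1)       # opcode: request
    arp += src_mac + socket.inet_aton(sender_ip)      # sender MAC + IP
    arp += b"\x00" * 6 + socket.inet_aton(target_ip)  # target MAC unknown
    return eth_header + arp

def send_frame(ifname, frame):
    """Write the frame to a packet(7) socket. Linux only; needs root."""
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW) as sock:
        sock.bind((ifname, 0))
        sock.send(frame)
```

The frame builder is pure computation, so only the final send step needs privileges, which suggests the latency-sensitive part of a user-mode DNAv4-style client is entirely doable without kernel changes.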

Update, July 13th, 2011 2:48am MT:

Thanks to M. MacFaden for pointing out in the comments that this scheme is basically an implementation of RFC 4436: Detecting Network Attachment in IPv4 (DNAv4), which was co-authored by an Apple employee.

Update, July 21st, 2011 1:20pm MT:

Thanks to Steinar H. Gunderson for pointing out in the comments that the DHCP server on my test network was incorrectly configured. Since I was using a mostly "out of the box" dhcpd configuration from Ubuntu Linux, it wasn't set up to be authoritative by default, so it wasn't promptly sending NAKs in response to the Galaxy Tab's requests for an old IP address. After fixing the problem on the DHCP server, the Galaxy Tab's DHCP handshake happens quite a bit faster (although still 85 times slower than the Mac). Below is the revised chart of network activity for the Galaxy Tab:

Samsung Galaxy Tab 10.1 (Revised) - "dhcpcd-5.2.10:Linux-2.6.36.3:armv7l:p3"
time (s)  dir  packet description
00.0000   out  LLC RNR (The link is now established.)
01.1570   out  DHCP request 192.168.1.17: The client requests its IP address on the previously connected network.
01.1574   in   DHCP NAK: The server declines to allow 192.168.1.17 on this network.
02.2261   out  DHCP discover
02.5871   in   DHCP offer 192.168.4.20: The server offers an IP address to the client.
02.5951   out  DHCP request 192.168.4.20: The client accepts the offered IP address.
02.6198   in   DHCP ACK: The server acknowledges the client's acceptance of the IP address.

These times are more in line with what I see on most non-Mac devices on my non-test networks—about 2.5-3s in DHCP, plus a bit more time for link initialization and such—long enough that I frequently get a "no connection" error in my web browsers. We'll need to find ways to shave this down in emerging consumer electronics devices. Consumers are conditioned to think of PCs as "something you wait on," but expect non-PC network devices to behave more like light switches.

I've posted a summary of the discussion in another entry.