Caffeinated Bitstream

Bits, bytes, and words.

Posts

Finagling with Nagle: Nagle's algorithm and latency-sensitive applications

NOTE: This post discusses a specific problem which is solved by disabling Nagle's algorithm. Do not try this at home: Nagle's algorithm is implemented in TCP/IP stacks for a reason, and I'm told that 99% of programmers who disable Nagle's algorithm do so errantly due to a lack of understanding of TCP. Disabling Nagle's algorithm is not a silver bullet that magically reduces latency.

While developing an input-only VNC client for Android to act as a remote mouse and keyboard, I noticed that mouse pointer movements were particularly jerky when connecting to a computer running TightVNC. Connections to other VNC servers yielded relatively smooth pointer movements, and using the official TightVNC client to connect to the TightVNC server was also smooth. My VNC client was not alone with this problem—connecting to the TightVNC server using other VNC clients such as Chicken of the VNC also resulted in severe pointer jerkiness. What was special about the VNC connection between the TightVNC client and the TightVNC server that made it work so much more smoothly?

To investigate, I used Wireshark to analyze the protocol communication between the client and server to see what was different. To my surprise, the RFB (VNC protocol) messages sent by the TightVNC client were basically the same as the messages sent by my client. However, there was one critical difference that existed deeper in the stack. In the TCP stream from the TightVNC client, each IP packet only contained a single RFB message. In the TCP stream from my client, several RFB messages would sometimes be batched together in a single IP packet. I immediately understood the problem.

The TCP/IP implementations of our operating systems contain a piece of code known as Nagle's algorithm, which improves network efficiency by holding back small writes and batching them into a single packet while an ACK for previously sent data is still outstanding. This algorithm works wonders for most applications, but very occasionally there is a need to disable it by setting the TCP_NODELAY socket option. An examination of the TightVNC client and server source code revealed that these programs do indeed disable Nagle's algorithm on each end of the TCP connection. I made a one-line change to my VNC client (socket.setTcpNoDelay(true);) and the problem went away: pointer movements became smooth as silk. The reason for the improvement is twofold:

  1. The TightVNC server does not animate intermediate pointer events when they are received all at once. In my brief perusal, I couldn't find a specific location in the source code where this was happening, so it may be an efficiency hack in the operating system. (I don't know exactly how TightVNC sends its synthetic events to Windows, but maybe it needs to do some sort of flush after each event.) When the messages are sent in individual IP packets, each pointer event is processed separately and the pointer moves smoothly across the screen.
  2. My use of VNC as an input-only remote control system is very latency-sensitive. The user's eyeballs are watching the remote host's screen while their thumb is moving across their phone, generating pointer events. There is an inevitable delay between the time the user's thumb moves and the time their eyeballs see the pointer move. The shorter the delay, the more natural their phone feels as an input device. The longer the delay, the more painful the experience. In this specific scenario, I judged that snappiness was more important than overall bandwidth, so I disabled Nagle's algorithm. In addition to fixing the jerkiness problem with TightVNC, to my delight this also improved smoothness to some degree with all other VNC server implementations!

This was one of the rare cases where it was appropriate to disable Nagle's algorithm. In fact, this is the only time in my 25+ years of programming that I have ever found a need to do so. My application is only likely to be used on a local network; over a wider network, disabling it could cause more harm than good. If you set TCP_NODELAY on your sockets, be warned that the responsibility for being efficient with packets now falls into your hands, because the operating system is not going to help out. In my case, I found several places in my code where I needed to combine multi-message transmissions into a single send() in order to recover the performance lost by disabling Nagle's algorithm.
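As a rough illustration of both points (setting TCP_NODELAY, then taking over the batching yourself), here is a minimal Python sketch. It is not the code from my client, which is written in Java; the function names are made up for this example, and the RFB handshake that must precede any input messages is omitted. The PointerEvent layout (message type 5, button mask, x, y) follows the RFB protocol.

```python
import socket
import struct


def pointer_event(button_mask, x, y):
    """Encode one RFB PointerEvent: type 5, button mask, 16-bit x and y."""
    return struct.pack("!BBHH", 5, button_mask, x, y)


def open_low_latency_connection(host, port):
    """Connect over TCP and disable Nagle's algorithm on the socket.

    Hypothetical helper: a real VNC client would still have to perform
    the RFB handshake before sending any input events.
    """
    sock = socket.create_connection((host, port))
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def send_batch(sock, messages):
    """Send several messages that belong together with a single write.

    With TCP_NODELAY set, each separate write may go out as its own
    packet, so related messages are joined before being sent.
    """
    sock.sendall(b"".join(messages))
```

A press-and-release click, for example, could be sent as `send_batch(sock, [pointer_event(1, x, y), pointer_event(0, x, y)])`, while individual pointer movements would each get their own write so that they arrive as separate packets.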

Android Network Information

While developing Android applications, I'm often juggling lots of Android machines, both real and virtual. Since I often need to connect to these machines over the network with adb connect, I found it useful and educational to write a small home screen widget that always shows the device's IP address. This is a pretty dumb application, but I decided that it would be a good opportunity to learn how to publish apps on the Android Market.

The main screen lists all network interfaces and enumerates their IPv4/IPv6 addresses.
This home screen widget always shows your current IP address.

Introducing Valence

An on-screen trackpad and keyboard allow a computer to be remote controlled.
Valence supports mDNS service discovery (aka Bonjour or Avahi) to locate participating VNC servers on the local network.

In my spare cycles recently, I've been tinkering with developing Android code to control home theater components. As someone who has been passionately involved with the consumer electronics industry over the past ten years, my personal home theater system includes quite a collection of disparate devices that defy even the most feature-rich universal remote controls. What the world needs is an open, extensible Android remote control application that can be rapidly updated to support new devices as they are released.

The first module of code I've developed is a remote control for home theater computers, such as the one I use to watch YouTube videos and other web video content. I've built a standalone app with this code for testing purposes. There are plenty of other such apps on the Android Market, but my program, Valence, seeks to provide a simpler experience in the following ways:

  • Most computer remote control apps require that the controlled computer be running special software, unique to the app, that relays mouse and keyboard events to the operating system. Not only does this add yet another single-purpose ever-running program to your computer, but the author may not have support for your operating system yet. To solve this problem, Valence uses the industry standard VNC system and its RFB (remote framebuffer) protocol. Many operating systems come with VNC built-in, and many people may already have VNC installed and enabled. Instead of using VNC to see the screen of a remote computer, Valence uses VNC in a strictly one-way fashion—input events are transmitted from the Android handset to the VNC server, but video frames are never sent from the server to the handset.
  • Valence supports mDNS service discovery to automatically find VNC servers on your local network, without the need to mess around with IP addresses. Not all VNC servers support discovery, but it sure saves some hassle if yours does. (A few Android phones do have trouble performing such discovery, but most seem to work.)

Today I'm releasing an unpolished "rough draft" of Valence so my friends and colleagues can test its compatibility with their networks. Some known bugs remain in this version, but I'm erring on the side of getting feedback early. Eventually, I'll roll the Valence functionality into my larger remote control framework that provides many more features such as controlling IR devices.

To use Valence, make sure you have VNC server software installed on your computer, and configured to allow connections:

  • Mac OS. Macs have built-in VNC software, so no additional software needs to be installed. In "System Preferences," go to the "Sharing" tab and make sure "Remote Management" is selected. Then click "Computer Settings..." and make sure "VNC viewers may control screen with password" is selected, and provide a password.
  • Ubuntu Linux. Ubuntu Linux has built-in VNC software, so no additional software needs to be installed. In the System menu, select Preferences then Remote Desktop. Check "Allow other users to view your desktop" and "Allow other users to control your desktop," uncheck "You must confirm each access to this machine," and check "Require the user to enter this password." Assign a password.
  • Windows. Windows does not have a built-in VNC server, so you'll need to install one such as TightVNC. Unfortunately, TightVNC seems to be a little jerky with processing mouse movement, so please let me know if you find a better VNC server for Windows that works with recent releases such as Windows 7. On TightVNC's download page, click the "download" link next to "Self-installing package for Windows." Open the downloaded package, click "Run," and proceed through the setup wizard. (Just keep hitting "Next" and "I agree".) Enter a password when prompted in the "Service Configuration" dialog, click "Install," and complete the installation.

Update: June 7, 2011

I've uploaded a new beta package, which is linked below. Changes include the following:

  • Bug fix: Valence did not properly store passwords that were shorter than eight characters. This has been fixed.
  • Mouse movement should be much smoother now. In particular, usability with the TightVNC server has been vastly improved. (I'm now setting TCP_NODELAY on the socket to disable Nagle's algorithm.)
  • Reorientation (from landscape to portrait and vice-versa) no longer causes the connection to be killed and re-established.
  • An "about" page and help documentation are now available by pressing the menu button.
  • Bug fix: If the user pressed the back button while Valence was trying to connect, the app would crash. This has been fixed.
  • I tweaked the launcher icon slightly.

Update: June 30, 2011

Yet another beta release. Changes include the following:

  • Keyboard
    • Added a new button for special keys like "esc", "tab", F1-F10, etc.
    • Better support for the HTC soft keyboard.
    • Better support for international characters.
  • Documentation
    • Added a section on security considerations to the online help.
    • The "about" window now contains the build date of the package.
    • The "about" window is now scrollable, so it works in landscape.
  • VNC setup
    • In the setup form, alternate port numbers (other than 5900) were not being saved. Fixed.
    • The setup form would reset whenever the device changed orientation. Fixed.
    • Your configured VNC servers can now be edited. Long press on the server, and select "edit."
    • You can now change the displayed name of a VNC server, instead of being forced to see the name that was auto-detected from the server.
    • The setup form is now scrollable, in case it doesn't fit on your screen.
  • Trackpad
    • The trackpad code has been completely rewritten to pave the way for new features. This is still a complex piece of logic, and I wouldn't be surprised to see some regressions resulting from the rewrite.
    • The highly anticipated scrolling feature has been added. Swipe up and down with two fingers to scroll. Horizontal scrolling is not yet supported. Different operating systems (and perhaps different applications within an operating system) have different ideas about what a suitable unit of vertical scrolling is, so at present scrolling is a bit too fast on Windows and a bit too slow on Mac.
    • You can now simulate a right mouse button press by tapping with two fingers.
  • Miscellaneous bug fixes
    • The TCP socket did not always shut down when the Valence activity finished. Fixed.
    • Valence would crash when the screen was reoriented while an alert was being shown. Fixed.

Update: July 18th, 2011

I've made a beta release of Valence available on the Android Market. There have been no significant changes since the previous release, other than removing a bit of debug logging.

Update: August 27th, 2011

I posted a new release to the Market with an important bug fix, and a few minor changes:

  • Users reported a bug that was preventing some handsets from sending right-clicks by tapping with two fingers. This bug has been fixed.
  • I updated the touchpad text and the help file to reveal the ability to right-click and scroll by using two fingers.
  • I dropped "Beta" from the app's name.

File event notifications in Mac OS

While using Mac OS, I've been missing the handy Linux inotifywait utility—it's a simple program to use the Linux inotify facility to wait on certain file events. I sometimes write scripts that use inotifywait to automatically launch programs when files are changed. For instance, I can have a script automatically compile a program whenever I save the source file in the editor.

It turns out that Mac OS and other recent BSD operating systems have a similar kernel facility called kqueue, and it was really easy to whip up a small program to block until an event occurs on a file. My filewait program is linked below, and can be used in scripts such as this one:

#!/bin/sh
# wait for the file to change...
while filewait magnumopus.tex; do
    # compile the file
    sleep 0.2
    pdflatex magnumopus.tex
done
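For readers curious about what a kqueue-based wait looks like, here is a minimal sketch in Python rather than C. It is not the filewait program itself; the function name and the particular set of watched events are assumptions chosen for illustration. It only works on systems where Python exposes kqueue (Mac OS and the BSDs).

```python
import os
import select


def wait_for_file_event(path):
    """Block until the file is written, deleted, renamed, or re-attributed.

    Returns the kqueue fflags describing which event fired.
    Requires a platform with kqueue (Mac OS, BSD).
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        kq = select.kqueue()
        try:
            watch = select.kevent(
                fd,
                filter=select.KQ_FILTER_VNODE,
                flags=select.KQ_EV_ADD | select.KQ_EV_CLEAR,
                fflags=(select.KQ_NOTE_WRITE | select.KQ_NOTE_DELETE
                        | select.KQ_NOTE_RENAME | select.KQ_NOTE_ATTRIB),
            )
            # Register the watch and wait (no timeout) for one event.
            events = kq.control([watch], 1, None)
            return events[0].fflags
        finally:
            kq.close()
    finally:
        os.close(fd)
```

Because the watch is attached to an open file descriptor, an editor that saves by writing a new file and renaming it over the old one will show up as a delete or rename event rather than a write, which is why those events are included here.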

Downloads

  • filewait.c - My simple program to pause until a file event occurs.

Implementing DES

DES, the Data Encryption Standard, was developed by IBM and the US government in the 1970s. Today, DES is considered to be weak and crackable, and a poor choice for anyone in the market for an encryption algorithm. However, many legacy protocols still use DES, so it's important to have implementations handy.

I recently found myself looking for a simple standalone DES implementation to study. Most of the open-source DES implementations are either highly optimized into obfuscation, or sloppily written. Either way, it's hard to find a clearly written and well-commented implementation suitable for educational purposes. I decided it would be a good exercise to write one myself. I wrote the implementation in Java, for the extra challenge of performing bit manipulations on signed primitive types. This implementation is undoubtedly very inefficient, but is well-commented and should be easy to understand for anyone who wants to dive into a sea of Feistel functions, S-boxes, variable rotations, and permutations.

Downloads

  • DES.java - My DES implementation in Java

Testing multicast support on Android devices

In my previous post, I mentioned my frustration that certain Android phones (including my HTC EVO) cannot receive multicast datagrams. I'd like to get feedback from my friends and colleagues about multicast support on their phones, so I wrote a simple app for testing multicast.

The Multicast Test Tool continually monitors the network for Multicast DNS (mDNS) packets while the app is running in the foreground, and presents the contents of these packets to the user. The app also allows the user to perform simple mDNS queries on the local network. If you run the app and touch the "Query" button, it will query for the default _services._dns-sd._udp.local name, which will solicit mDNS responses from devices on your network that advertise services via mDNS.

Mac and Linux machines will respond to this discovery query. Windows machines will also respond if you are running iTunes and have checked "Share my library on my local network" in the preferences. If you think your network may not have any devices which support mDNS service discovery, you can run the attached Perl script to transmit a gratuitous mDNS packet.

You can also query specific hosts using the format <hostname>.local. My home network is always buzzing with mDNS traffic, so on my multicast-capable virtual machine I see lots of activity without performing any queries.

If you see any activity at all, it means that your phone supports multicast. If you see no activity, then your phone likely does not support multicast.
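The default query described above can also be reproduced from a desktop machine, which is handy for checking that the network itself carries mDNS traffic. The following Python sketch is not taken from the Multicast Test Tool; the function names are hypothetical. It builds a standard DNS question for a PTR record and sends it to the mDNS multicast group (224.0.0.251, UDP port 5353).

```python
import socket
import struct

MDNS_GROUP = "224.0.0.251"
MDNS_PORT = 5353


def build_mdns_query(name="_services._dns-sd._udp.local", qtype=12):
    """Build a DNS query packet with one question (qtype 12 is PTR)."""
    # Header: ID, flags, QDCOUNT=1, ANCOUNT, NSCOUNT, ARCOUNT.
    header = struct.pack("!6H", 0, 0, 1, 0, 0, 0)
    # Name: length-prefixed labels, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # Question trailer: QTYPE, then QCLASS 1 (IN).
    return header + qname + struct.pack("!2H", qtype, 1)


def open_mdns_listener():
    """Open a UDP socket that has joined the mDNS multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", MDNS_PORT))
    membership = socket.inet_aton(MDNS_GROUP) + socket.inet_aton("0.0.0.0")
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership)
    return sock


def send_query(sock, name="_services._dns-sd._udp.local"):
    """Send one mDNS query to the multicast group."""
    sock.sendto(build_mdns_query(name), (MDNS_GROUP, MDNS_PORT))
```

If another mDNS responder on the same machine already owns port 5353, the bind in `open_mdns_listener()` may need additional socket options or may fail outright, depending on the operating system.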

Downloads

  • multicast_test.apk - This is the Multicast Test Tool app, which you can download and install on your Android phone.
  • send_mdns.pl - This Perl script transmits a single mDNS packet, which should be detected by the Multicast Test Tool. This is only needed if you don't already have devices on your network (Macs, Linux boxes, etc.) which implement the mDNS service discovery protocol.
  • multicast_test_tool.tar.gz - The source code for the Multicast Test Tool. This is only needed if you are curious about the inner workings. (Added 2011-02-02.)

Broken multicast networking on HTC smartphones

It looks like some (most?) HTC phones running Android, such as my HTC EVO, are not capable of receiving multicast or broadcast datagram packets over the Wi-Fi network. This means that apps which rely on such communication will fail, often with no indication of the problem. From the app's perspective, no obvious error is happening — it can only assume that no other devices on the network are transmitting such datagrams. Multicast communication is becoming increasingly common as a technique for devices to discover each other on a network, and the absence of this capability represents serious breakage that leaves apps crippled. Examples of resources an app might use multicast to discover are:

  • iTunes music libraries
  • entertainment center components such as Roku and AppleTV
  • network shares
  • other devices running the same app, to set up multiplayer games

Without a functioning multicast capability, these apps either don't work, or require the user to manually configure the IP addresses of the other devices.

There is speculation that HTC's restriction is an attempt to save power and prolong battery life — it takes energy to process a network's many multicast and broadcast packets, many of which are useless to the device and its applications. However, Android already has a facility for allowing prudent use of multicast when needed. Applications acquire a WifiManager.MulticastLock for the duration of their multicast needs, which causes the Wi-Fi chip to stop filtering multicast packets until the app releases the lock. Thus, multicast processing in the software TCP/IP stack only happens when it's really needed.

This is a bit frustrating, as I'm currently tinkering with some multicast code. Some people have been able to fix the problem by rooting their phones and replacing the /system/bin/wpa_supplicant binary with a stock version. This doesn't seem to help on my EVO. To add to the frustration, it can be difficult to test multicast in the Android emulator. I finally had some success in testing multicast code by running android-x86 in a VirtualBox virtual machine, with the network configured for bridging.

The year in bandwidth

I collect data usage statistics on my home broadband connection using a script that polls my router's WAN interface counters via SNMP once a minute. Since I have all this data lying around, I thought it might be neat to chart my broadband usage for 2010 and get an idea of how much of a bandwidth hog I am. My usage includes lots of movie streaming, VoIP phone calls, and work-related applications (since I work from home).

My total usage in 2010 was about 209 GB of download and 24 GB of upload, for a monthly average of about 17 GB and 2 GB, respectively.
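Turning periodic counter readings into totals is mostly a matter of summing the differences between consecutive samples while allowing for counter wraparound. The sketch below is not my actual script; it assumes the router exposes 32-bit octet counters (as the classic SNMP interface counters are), and the sample values in the usage note are invented.

```python
COUNTER32_MODULUS = 2 ** 32


def bytes_between(previous, current, modulus=COUNTER32_MODULUS):
    """Bytes transferred between two readings, allowing one wraparound."""
    return (current - previous) % modulus


def total_bytes(samples, modulus=COUNTER32_MODULUS):
    """Sum the traffic across a series of consecutive counter readings."""
    return sum(
        bytes_between(earlier, later, modulus)
        for earlier, later in zip(samples, samples[1:])
    )
```

For instance, `total_bytes([4294967000, 200, 1200])` treats the drop from 4294967000 to 200 as a wrap and yields 1496 bytes. This only works if the counter cannot wrap more than once between polls, which is one reason to poll as often as once a minute.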

IPv6 protocol overhead

The IPv4 address space is nearing exhaustion. The unallocated address pool is currently expected to be depleted at the IANA level in June 2011, and at the Regional Internet Registry (RIR) level in January 2012. Networks will need to transition to IPv6, which allows for an astronomically larger address space (among other nice features). IPv6 has been around in some form since 1995. However, since network operators are human beings, the consensus has not been to methodically migrate to IPv6 over the last 15 years, but rather to procrastinate 15 years and then pull an all-nighter on the eve of exhaustion.

IPv6 is likely to become an important topic in the next few years, and I've become interested in brushing up on the protocol. (I first studied IPv6 in 1996, back when everyone figured adoption would be imminent.) IPv6 is likely to be deployed on a much larger scale in the coming years than in years past, exposing many practical network engineering challenges, and perhaps creating opportunities for a low-level software engineer such as myself to provide value in easing migration and filling in the coverage gaps.

My first experiment was to measure the extra protocol overhead of IPv6 in common network scenarios. An IPv4 header occupies 20 bytes in the common case, and an IPv6 header occupies 40 bytes in the common case. The practical difference in overhead depends on the size of the packets: with large packets, the extra 20 bytes of header matter little relative to the total payload being transmitted. IPv6 evangelists occasionally cite the protocol's minimum MTU (1280 bytes), which is larger than IPv4's (68 bytes, or 576 bytes to avoid fragmentation), when calculating efficiency, and conclude that IPv6 is even more efficient than IPv4. This doesn't reflect the practical reality of the modern Internet, where most routes support a path MTU of 1500 without a problem. IPv6 is a necessary and worthwhile protocol, but we shouldn't be afraid to study the actual, real-world overhead.

To measure IPv6 overhead in a real-world test, I configured two User-Mode Linux instances (similar to virtual machines) with a virtual serial connection between them which can support PPP. I used the nc program to transfer data streams of various sizes over TCP in both IPv4 and IPv6, and recorded the total number of transferred bytes as measured by pppd. Performing real-world measurements was somewhat redundant, since the theoretical overhead can be easily calculated using the size of the packet headers and payload lengths. However, it was interesting to see the protocol in action.

MTU 1500 tests, with respect to sender
(total bytes transferred; percentages are the increase over IPv4)

            512 KiB          4 MiB             16 MiB             64 MiB
IPv4 out    543692           4347660           17389836           69558696
IPv4 in     2868             14516             54452              214612
IPv6 out    551468 (+1.4%)   4409028 (+1.4%)   17626852 (+1.4%)   70505788 (+1.4%)
IPv6 in     4140 (+44.4%)    22584 (+55.6%)    75564 (+38.8%)     297396 (+38.6%)

As expected, the additional outbound overhead from our TCP sending node is quite negligible. However, the return traffic—mostly small packets of TCP ACKs with empty payloads—contains a significant amount of additional overhead. I wonder how this will affect users with asymmetric connections with a limited upload capacity. At some point, upload limitations can start bottlenecking the download. (It might be an interesting experiment to see how effective TCP sliding window sizes are at mitigating this effect.)

I performed this experiment again, limiting the outbound TCP segment payloads to 128 bytes, to get an idea of how applications with smaller packet sizes might impact an IPv6 network.

SO_SNDBUF=128 tests, with respect to sender
(total bytes transferred; percentages are the increase over IPv4)

            512 KiB           4 MiB              16 MiB
IPv4 out    665856            5899176            23596416
IPv4 in     192408            1704252            6816956
IPv6 out    789868 (+18.6%)   6555668 (+11.1%)   26219068 (+11.1%)
IPv6 in     284508 (+47.9%)   2360292 (+38.5%)   9438552 (+38.5%)

The IPv6 header overhead begins to really add up when using smaller packet sizes. The total bandwidth consumed by applications with small bits of sporadic data or low-latency requirements (e.g., VoIP) might be significantly higher on an IPv6 network.
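The theoretical overhead can be worked out from the header sizes alone. The sketch below assumes a 32-byte TCP header on every packet (the 20-byte base header plus 12 bytes for the timestamp option). That figure is an assumption rather than something I recorded, but it appears consistent with the measurements: it gives roughly +1.4% for full 1500-byte packets, +11.1% for 128-byte payloads, and +38.5% for empty ACKs.

```python
IPV4_HEADER = 20
IPV6_HEADER = 40
TCP_HEADER = 32  # assumption: 20-byte base header + 12-byte timestamp option


def increase_for_payload(payload):
    """Relative growth in packet size when a fixed payload moves to IPv6."""
    ipv4_packet = IPV4_HEADER + TCP_HEADER + payload
    ipv6_packet = IPV6_HEADER + TCP_HEADER + payload
    return ipv6_packet / ipv4_packet - 1


def increase_for_full_packets(mtu=1500):
    """Relative growth in total bytes when every packet fills the MTU.

    Each packet is the same size in both protocols, but an IPv6 packet
    carries 20 fewer payload bytes, so more packets are needed.
    """
    ipv4_payload = mtu - IPV4_HEADER - TCP_HEADER
    ipv6_payload = mtu - IPV6_HEADER - TCP_HEADER
    return (mtu / ipv6_payload) / (mtu / ipv4_payload) - 1


if __name__ == "__main__":
    print(f"full 1500-byte packets: {increase_for_full_packets():+.1%}")
    print(f"128-byte payloads:      {increase_for_payload(128):+.1%}")
    print(f"empty ACKs:             {increase_for_payload(0):+.1%}")
```

The smallest transfers deviate from these figures, presumably because connection setup and teardown packets make up a larger share of the total.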

All in all, I think the additional protocol overhead of IPv6 is quite manageable in most cases, and I hope network operators begin upgrading their IP networks soon.

Customizing the Mac Dashboard calendar widget

It turns out that it's not too hard to customize the dashboard widgets that come with Mac OS X, since they're implemented in HTML and JavaScript. You can just copy the stock widget from /Library/Widgets to ~/Library/Widgets and go to town. In the above picture, you can see how I've hacked the calendar widget to show week numbers.

Update, April 7, 2011: Due to popular demand, I've posted a diff of my changes here, based on the Snow Leopard calendar. Copy the calendar widget into your home directory (~/Library/Widgets) and rename it to wwCal.wdgt.