Caffeinated Bitstream

Bits, bytes, and words.

Migrating from Apache Roller to Hugo and Isso

After almost ten years of using Apache Roller to power this blog, I'm making the leap to the Hugo static site generator. Roller served me well, but after years of watching Roller+Tomcat use hundreds of megabytes of memory on my server, I decided that it was overkill for my needs.1 The only major feature which absolutely demands dynamically generated pages is the comments, and I've migrated that functionally to a distinct service using Isso.

My goals are:

  • Reduce the server resource usage.
  • Allow blog posts to be created, managed, and revisioned using the same tools I use to manage software projects — Vim, Git, etc.
  • Reduce the deployment effort of the server-side software. (Like many Go apps, Hugo is a single statically linked binary with no dependencies to worry about.)

For my own future benefit, I'm providing my notes on this migration below.

Migration steps

Most of the migration process was straightforward, although a bit time consuming. I unfortunately don't have a "roller-to-hugo" script that magically converts a blog, as several categories of content assets needed to be manually adapted to the Hugo way of doing things. I can outline the basic steps, though:

  1. Create a Hugo theme. I wanted the blog to look and feel the same in Hugo as it did in Roller, so I needed to create a custom Hugo theme. I was already using custom stylesheets and Velocity templates on Roller, so simply extracting these assets from the database's webpage table into files got me most of the way there. I then needed to touch up the files to convert Velocity markup into Go templates and adapt the pagination scheme.
  2. Port blog posts. Roller stores blog entries in the weblogentry table, and their associated tags in the roller_weblogentrytag table. I wrote a one-off Python script to create Hugo content files out of this data.
  3. Port static content. This was simply a matter of finding the Roller resources directory in the filesystem, and copying it to the Hugo static/resources directory.
  4. Port RSS and Atom feeds. The current version of Hugo does not have built-in support for Atom feeds, so I used a template-generated solution as described here. I also needed to update the <head><link ... /></head> references in my templates to point to the new feed URLs.
  5. Port comments. I used the Isso comment server to support comments. Isso works similarly to Disqus, except it is self-hosted.
    1. Install Isso in a Docker container for isolation and ease of management.
    2. Map the blog's /isso URLs to Isso.
    3. Add the client HTML bits to inject the Isso comments into pages.
    4. Configure Isso: Basic configuration (dbpath, host, [server].listen), logging, SMTP notifications, moderation, and guard settings (rate limits).
    5. Import comments. I wrote a one-off Python script to import comments, paying careful attention to properly initialize the voters Bloom filter bitmask in Isso's SQLite database.
  6. Map old Roller URLs to Hugo URLs. I configured some 301 (permanent) redirects on my web server so that existing links to block posts and feed URLs will continue to work.

I have a few ugly Python scripts for migrating data from Roller to Hugo/Isso, but I'll hold off on posting them unless someone really wants to see them.

Pros and cons

Hugo pros:

  • When I first started working through the Hugo Quickstart Guide, it seemed like a lot of steps. However, after playing around with it for a while, everything seems really easy and straightforward.
  • As expected, the resource usage is low. Hugo is a single, self-contained ~6MB binary. Since static pages are generated, there is no persistent resource usage.
  • Blog posts can be composed completely offline, and tested using Hugo's built-in web server. When ready, I can push the Git commit(s) and rebuild the site on the server. Rebuilding my blog from scratch takes about 200 milliseconds.

Hugo cons:

  • No built-in support for Atom feeds. (But it's easy to add via a template.)
  • It's not obvious how trackbacks would be implemented with Hugo.
  • Dynamic web apps have the luxury of providing the correct MIME type with every document that is delivered. Since Hugo is generating static files, I now rely on the web server to determine the MIME type based on filename extensions. This may be an obstacle to preserving some URL schemes when migrating to Hugo.2 I ended up restructuring the web site to use Hugo-friendly URLs, and adding permanent redirects to map old Roller URLs to Hugo URLs.

Isso pros:

  • As a self-hosted solution, Isso avoids some of the privacy concerns that people have with third-party solutions such as Disqus.
  • Notification and moderation of comments via mail.

Isso cons:

  • Isso is a very simple, no-frills service. It accepts and regurgitates comments, but not much more.
  • There is no visible feedback when the guard rate-limits are hit, so the user doesn't receive any hint about why the comment is not being posted.
  • It doesn't seem practical to add the comment count below each entry on the main page, as I did with Roller.
  • I haven't figured out how to configure Isso to use my correct base URL in mail notifications, so I have to tweak the URLs when approving or deleting comments. (The host option seems to not be useful here.)
  • Isso seems to wake up every 500 milliseconds to do something, even when it is not being actively used:
    # strace -p 4890
    strace: Process 4890 attached
    select(5, [4], [], [], {0, 85673})      = 0 (Timeout)
    select(5, [4], [], [], {0, 500000})     = 0 (Timeout)
    select(5, [4], [], [], {0, 500000})     = 0 (Timeout)
    select(5, [4], [], [], {0, 500000})     = 0 (Timeout)
    select(5, [4], [], [], {0, 500000})     = 0 (Timeout)
    
    Perhaps this is a function of Werkzeug. Despite the 500ms wakeup, the CPU utilization seems to be negligible.

Footnotes

  1. To be fair, there are probably lots of opportunities to tweak the parameters of Tomcat and Roller to tune the resource usage. Perhaps the JVM heap size and/or the size of internal caches could be adjusted. Also, memory usage of specific services on Linux can be notoriously difficult to determine. (Top is currently showing that resident memory usage of my Tomcat server is 286MB.)
  2. I suppose someone could manually add web server configuration rules to match URL patterns to the right MIME types, but this seems needlessly manual and brittle.
  3. Isn't it fun reading through all the footnotes?