Fun with Fonts on the Web

A more accurate version of the title probably should be "Fun with Fonts in Web Browsers", but oh well, it sounds cooler that way. Text rendering is hard, and it certainly doesn't help that we have a plethora of different writing systems (blame the Tower of Babel for that, I guess) which cannot be elegantly fitted into a uniform system. Running a bilingual blog doubles the trouble in font picking, and here's a compilation of the various problems I encountered.

Space Invaders

Most browsers join consecutive lines of text in HTML into a single line with an added space in between, so

<html>Line one and
line two.</html>

renders to

Line one and line two.

Such a simplistic rule doesn't work for CJK languages, where no separator is used between words. The solution is to specify the lang attribute for the page (or any specific element on the page) like so:

<html lang="zh">第一行和
第二行。</html>

If your browser is smart enough (like Firefox), it will join the lines correctly. All the Blink-based browsers, however, still stubbornly shove in the extra space, so it looks like I will be stuck with unwrapped source files like a barbarian for a bit longer. While not a cure-all, specifying the lang attribute still has the added benefit of enabling language-specific CSS rules, which comes in handy later.

Return of the Quotation Marks

As mentioned in a previous post, CJK fonts render quotation marks as full-width characters, unlike Latin fonts. This won't be a problem as long as a web page doesn't try to mix and match fonts: just use a language-specific font stack.

body:lang(en) {
    font-family: "Oxygen Sans", sans-serif;
}

body:lang(zh) {
    font-family: "Noto Sans SC", sans-serif;
}

Coupled with matching lang attributes, the story could have ended here. Firefox even allows you to specify default fonts on a per-language basis, so you can actually get away with just the fallback values, like sans-serif or serif, and not bother writing language-specific CSS.

However, what if I want to use Oxygen Sans for Latin characters and Noto Sans SC for CJK characters? While seemingly a sensible solution, specifying a font stack like so,

body:lang(zh) {
    font-family: "Oxygen Sans", "Noto Sans SC", sans-serif;
}

would cause the quotation marks to be rendered using Oxygen Sans, which displays them as half-width characters. The solution I found is to declare an override font with a specified unicode-range that covers the quotation marks,

@font-face {
    font-family: "Noto Sans SC Override";
    unicode-range: U+2018-2019, U+201C-201D;
    src: local("NotoSansCJKsc-Regular");
}

and revise the font stack as

body:lang(zh) {
    font-family: "Noto Sans SC Override", "Oxygen Sans", "Noto Sans SC", sans-serif;
}

Now we can enjoy the quotation marks in their full-width glory!

Font Ninja

Font files are quite significant in size, and even more so for CJK ones: the Noto Sans SC font just mentioned is over 8MB. No matter how determined I am to serve everything from my own server, this seems like utter overkill considering the average HTML file on my site is probably closer to 8KB. How do web font services handle this, then?

Most web font services work by adding a bunch of @font-face definitions to a website's style sheet, which pull font files from dedicated servers. To reduce the size of the files being served, Google Fonts slices each font file into smaller chunks and declares a corresponding unicode-range for each chunk in its @font-face block (this is exactly how they handle CJK fonts). They also compress the font files into WOFF2, further reducing file size. Adobe Fonts (previously known as Typekit), on the other hand, seems to use some JavaScript wizardry that dynamically determines which glyphs to load from a font file.

Combining the best of both worlds, and thanks to the fact that this is a static site, it is easy to gather all the characters used and serve a font file containing just those. The tools of choice here are pyftsubset (available as a component of fonttools) and GNU AWK. Compressing font files into WOFF2 also requires Brotli, a compression library. Under Arch Linux, the required packages are python-fonttools, gawk, brotli, and python-brotli.

Here's a shell one-liner to collect all the used glyphs from generated HTML files:

find . -type f -name "*.html" -printf "%h/%f " | xargs -l awk 'BEGIN{FS="";ORS=""} {for(i=1;i<=NF;i++){chars[$(i)]=$(i);}} END{for(c in chars){print c;} }' > glyphs.txt
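If AWK feels opaque, the same glyph set can be collected with a simpler (if slower) pipeline. This is a hedged alternative sketch, not the script the site actually uses; it assumes GNU grep's -o option and an available UTF-8 locale:

```shell
# Split every HTML file into one character per line, deduplicate,
# and join back into a single line of unique glyphs.
export LC_ALL=C.UTF-8
find . -type f -name "*.html" -exec cat {} + \
    | grep -o . \
    | sort -u \
    | tr -d '\n' > glyphs.txt
```

The result differs from the AWK version only in ordering, which pyftsubset does not care about.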

You may need to export LANG=en_US.UTF-8 (or any other UTF-8 locale) for certain glyphs to be handled correctly. With the list of glyphs, we can extract the useful part of the font files and compress them:

pyftsubset NotoSansSC-Regular.otf --text-file=glyphs.txt --flavor=woff2 --output-file=NotoSansSC-Regular.woff2

Specifying --no-hinting and --desubroutinize can further reduce the size of the generated file at the cost of some aesthetic fine-tuning. A similar technique can be used to shrink Latin fonts down to just the ASCII characters (or keep the extended range with U+0000-00FF):

pyftsubset Oxygen-Sans.ttf --unicodes="U+0000-007F" --flavor=woff2 --output-file=Oxygen-Sans.woff2

Once this is done, the available glyphs can be checked using most font manager software, or this online checker (it has no support for WOFF2, but you can convert the file into other formats first, such as WOFF).

I also played around with the idea of further dividing the glyphs into chunks by popularity, so here's another one-liner to get the list of glyphs sorted by number of appearances:

find . -type f -name "*.html" -printf "%h/%f " | xargs -l awk 'BEGIN{FS=""} {for(i=1;i<=NF;i++){chars[$(i)]++;}} END{for(c in chars){printf "%06d %s\n", chars[c], c;}}' | sort -r > glyph-by-freq.txt
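For reference, the classic sort | uniq -c combination achieves the same frequency count without AWK. A hedged sketch, under the same GNU tooling and UTF-8 locale assumptions as above:

```shell
# One character per line, count the repeats, then sort by count descending.
export LC_ALL=C.UTF-8
find . -type f -name "*.html" -exec cat {} + \
    | grep -o . \
    | sort \
    | uniq -c \
    | sort -rn > glyph-by-freq.txt
```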

It turns out my blog has around 1000 distinct Chinese characters, with roughly 400 of them appearing more than 10 times. Since the file sizes I get from a single subsetting pass are already good enough, I didn't bother proceeding with another split.

For Your Browsers Only

With all the tricks in my bag, I was able to cut the combined font file size down to around 250KB, still orders of magnitude above that of an HTML file. While it is nice to see my site appear the same across all devices and screens, I feel the benefit is out of proportion to the 100-fold increase in page size.

Maybe it is just not worth it to force the choice of fonts. In case you want to see my site as I would like to see it, here are my go-to fonts:

Review of Star Wars: The Rise of Skywalker

Spoiler alert!

Just to get it out of the way: I watched the prequels before the originals. I thought the prequels were fine - on first viewing, I felt it was as much an Obi-Wan story as it was Anakin's, and I didn't fully realize how good McGregor's performance was until I watched the old trilogy: they felt like the same person to my childhood self. After seeing the full picture of the story, I can see how people who grew up with the original trilogy would view the prequels as an utter blasphemy. Watching the prequels first did take away some of the thrill of the big reveal in The Empire Strikes Back, but I was no less shocked when Anakin actually turned to the dark side in the prequels.

My very first encounter with Star Wars, however, was not the prequels, but a version of Star Wars: The Visual Encyclopedia I found in a local book store. It was all those weird weapons (including a lightwhip that I remember distinctly), spaceships, and costumes that first enticed me into this world. I was overjoyed to find out that the prequels depicted exactly such a colorful yet exotic world. The tone of the original trilogy was a lot more bleak, more "spacey" than "alieny", and as a child who had just witnessed the downfall of Anakin, the transition felt natural to me.

Moving on to the sequel trilogy. I watched The Force Awakens at 19:00 on launch date with my college roommate; we spent half an hour searching for a parking spot, barely making it to the screening by the opening crawl (we still got a ticket though). As for The Last Jedi, I watched it at night a month after it launched. I went to see The Rise of Skywalker at 9:00 the day it was released, a surprisingly fitting time for the end of a trilogy. TFA was a decent start: nostalgia mixed with several intriguing leads made the experience quite enjoyable. TLJ left a really bad taste in my mouth, in that it not only answered the questions TFA raised in the poorest ways possible, but also spent too much time trying to teach the old, established characters a lesson while neglecting the growth and development of the new ones. I still have an unpublished blog post full of my rants on TLJ (from 2018), so let's move on to TROS.

In short, I enjoyed watching TROS, despite it being an over-packed, messy hodgepodge.

The opening sequence, revealing Kylo's encounter with the Emperor and the Falcon crew's escape from the First Order, was succinct and exciting. Once Finn, Rey, and Poe reunite though, the pacing drops considerably, with meaningless arguments breaking out between the trio: team building is something the last movie of a trilogy really shouldn't have to worry about, but the plots of TLJ left J. J. Abrams little choice here, I guess.

Then the movie went nowhere for a good half an hour, showing the trio wandering around different planets doing things, "sacrificing" Chewbacca and C-3PO in the process. Even the revelation of Rey's healing powers seemed so intentional that it was bound to become a plot device. The only good scene out of all these is probably Rey facing off against Kylo. In fact, most of the duel scenes between the two are really enjoyable, and they are the only places I can see the slightest bit of human emotion from Rey (in contrast to Kylo, whose constant struggle and change of heart were expressed amazingly by Adam Driver). Rey being a Palpatine was interesting at first, but adds little to her overall character: it was Kylo who felt the temptation of the dark side this whole time, and all of a sudden this becomes Rey's thing?

The duel on the Death Star's remains was visually stunning, but the way it ended could have been a bit less awkward: more mandatory plot-device showoff, and an extra dose of Han Solo that I think was totally unnecessary given how good Adam Driver's portrayal is. Carrie Fisher's passing was unfortunate, and I think it caused the rather rushed ending for her character. The entire self-exile sequence also felt corny and uncharacteristic of Rey. Perhaps Leia being the one to give Rey the last guidance and her lightsaber would have worked better (either as she is passing away or as a Force ghost)?

The subsequent plot again splits Rey away from her supposed "teammates", and sets the stage for the final showdown between the Resistance and the "Final Order", based on Finn and Poe's seemingly crazy idea (which Poe was specifically told not to do in TLJ). Lando appearing early in the plot and doing pretty much nothing feels like a missed opportunity: the lack of screen time with the new crew in previous films left him few ways to interact with them. I would much prefer if he just made a one-off appearance among the thousands of starships coming to the Resistance's aid in the end. I do like the Resistance's side of the story here though: the characters are shown working together with good chemistry, and they accomplish the impossible in a sensible way.

The fight with the Emperor, though, was a mixed bag: everything leading up to the final confrontation was amazing (the lightsaber-passing scene was great), until Rey had to face the Emperor alone. There is simply too little emotional connection between Rey and the Emperor for any confrontation between them to have weight. If anyone, Kylo Ren should be the one allowed to show his resolution at the end of his long journey, not Rey being the same Rey she was in TFA. Having Kylo sacrifice himself in the fight, assuming the role Vader played in Return of the Jedi, would have been a much more fitting ending for him than crawling back to heal and kiss Rey (a.k.a. showing off the plot device we spent 15 minutes foreshadowing). The thousand-Sith-versus-thousand-Jedi bit felt forced (pun intended) and doesn't really tie into the story that much. By the way, the Emperor looked SCARY, in an entertaining way: the aesthetic resembles an 80's horror film, and strangely felt right here (not to mention that the movie opened with "THE DEAD SPEAK!"). It's also funny that Star Destroyers finally live up to their name with their shiny new cannons.

Well, looks like I didn't really enjoy the movie, did I? I'm surprised that I can still pull out so many things I didn't like despite remembering walking out of the movie theater with a sense of relief and fulfillment. Looking back, the whole trilogy just felt poorly planned, with throwaway characters appearing here and there, their screen time fed to some new droid or alien creature each film presumably just to sell more toys, and broken plot lines that just didn't make sense. Perhaps The Rise of Skywalker is a valiant attempt at responding to a trick question with no suitable answer, and I appreciate the effort. I wonder what the generation growing up with the sequel trilogy will think about them though: will they look back on them fondly the same way I look at the prequels (or Spider-Man 3 for that matter), or is my feeling not entirely clouded by nostalgia after all?

2019 in Review

Each New Year feels incrementally less special (see the relative length argument I wrote in my review of 2018), although as an end to the laziness-inducing and overly noisy holiday season, having a festival associated with new goals and resolutions is not too bad an idea.

Looking at the number 2020 does give me a bit more excitement, as it has always felt like such a distant time in the future: I don't even remember any sci-fi works referencing anything in the 2020s, as most of them either stop at the 2010s or go way beyond to the 3000s. This new decade is stuck in what I call an expectation limbo. Oh, fun fact from Wikipedia: 2020 will also see the beginning of the year of the Metal Rat in the Chinese calendar - such a punk name for something rather traditional.

2019 Rewind

I'm glad to announce that everything went according to plan in 2019!

  • ☑ Run 400 miles. [555/400]
  • ☑ Write 10 blog posts. [14/10]
  • ☑ Stop using Gmail/Inbox app. [2/2]
  • ☑ Add rel=me links to blog.
  • ☑ Dive into Rust and Julia. [2/2]
  • ☑ Record books, music, and shows I enjoyed.
  • ☑ Clean my desktop computer.

With some uncertainties in my life sorted out, I've been following my daily routines a lot better in 2019. I still procrastinate on blog posts from time to time, but at least I'm publishing ever more often. Learning Rust and Julia was fun, and I did get some use out of them in real projects. I also attempted Advent of Code in both languages and got to day 17 before each day's questions started taking too much time. I thought about finishing the remaining ones after Christmas, but decided against it - I don't see myself learning more about the languages through these questions, and the time could be better spent elsewhere.

I gave the blog a face-lift, writing a new Hugo theme in the process, with this article serving as inspiration. I found the process of eliminating all the nice-to-haves and bells-and-whistles oddly satisfying: I enjoy having my blog as a lean, mean, killing machine. The website feed has been updated to use Atom, with each post displayed in its entirety instead of the default broken summary. I started using web feeds for reading blogs after failing miserably at my last attempt to organize my web browser bookmarks, and I've enjoyed it so far.

I went to the movies three times this year, for Dragon Ball Super: Broly, Promare, and Star Wars: The Rise of Skywalker, and I mostly enjoyed my time on all three occasions: trying to quantify this enjoyment with a rating would probably not do them justice, as I enjoyed the three for vastly different reasons. Since I visit the theaters so rarely, the time I spend on airplanes actually accounts for a large portion of my movie consumption, and Spider-Man: Into the Spider-Verse was the standout among those I watched while flying. On Christmas Eve, instead of Toradora!, a tradition of mine during college, I rewatched Scott Pilgrim vs. the World, which I also first saw on a flight years ago, and it was as geird (good but in a weird way and weird but in a good way, like, you know, goodd? This is not a real word yet, but who cares? It's the year of the Metal Rat!) as I remembered.

As someone who, until 2018, had always lived in places where it never snows, snow to me was such a distant concept that I'd consider any movie with snow scenes a Christmas movie. Even after moving to a snowy city in mid-2018, I still haven't seen the stereotypical fluffy Christmas snow that almost seems warm to the touch: all I got were piles of dirty water-ice compound on roadsides (oh, and they find their way into your shoes too) that remind me more of a fish market than of Christmas. Meh, I bet all that snow in the movies is just props, just like the Moon landing.

Back in 2018, I set up a Mastodon instance, and I have since deleted my Twitter account, but I never used Mastodon that much. At first I thought it's just that I'm not the microblogging type, but now I'm starting to think it's all the cognitive overhead of logging in, editing, tagging, and translating that made posting a status update rather daunting for me. I have been microblogging in twtxt format for the last few months of 2019, and I did rack up a considerable status count (all 110 of them). Being one command away from dumping whatever silly one-liner I have in mind is quite addicting, and revisiting the twtxt file also gives me new post ideas at times. Separating a Twitter-like social media service into read-only (with a feed reader), write-only (what I use twtxt for), and interactive (still figuring that one out) parts simply works better for me. My twtxt file will likely replace Mastodon in the footer this year, as keeping a behemoth (just one more pun, please?) of a web app happy and running has not been exactly pleasant for me.

Speaking of deleting accounts, I've set up forwarding from all my Gmail accounts to my self-hosted email, and all my devices are now Gmail and Inbox (RIP) free. I'm still not quite ready to completely ditch the Google accounts, but I'm getting closer: I've been trying out registrars other than Google Domains, and using YouTube's built-in web feeds instead of relying on subscriptions. I did finally remove my Facebook account though, and the account purge will likely continue.

Road to 2020

Since it worked out really well for 2019, I'll continue to match the previous year's numbers on blog posts and running. Donuts became my late-night guilty pleasure in 2019 (thanks to Dunkin'), and they are one of the few kinds of food I still willingly consume despite the detrimental effects on my health. Thus, no donuts in 2020, not under my watch!

I do want to set up a proper data backup workflow, following the 3-2-1 strategy and all that. As for new languages to learn, Go has been creeping onto my radar for a while, and with the release of Go modules, it seems like the right time to jump in. There's also C++20 on the horizon.

2020 will also see the release of several Linux phones (Librem 5 and PinePhone), and I'm more than willing to try some of them out and explore ways of escaping the Apple ecosystem. PineTime, on the other hand, might be the perfect replacement for my aging Pebble Time Round.

Outside of technical ones, I haven't read any books in quite a while, at least not those that can be classified as proper literature, and I seek to change that in 2020.

Here's to the end of the Skywalker saga, and whatever comes after Generation Z!

Staring at Yesterday in One Eye

The dead speak! The galaxy has heard a mysterious broadcast, a record of THE GREAT DIMENSIONAL FRACTURE in the familiar yet distant tone of MY PREVIOUS VESSEL.

Alright, that's enough puns and The Rise of Skywalker jokes from me. As mentioned before, my blog originally ran on WordPress and switched to Hugo on 2017-09-01. To be more specific, I actually had two WordPress blogs: one named Pandora (because of Borderlands 2, but this is such a cliche name that I'm sure there's a million other imaginary planets using it) hosted on WordPress.com, and another named Library of Trantor (because of Isaac Asimov's Foundation series) hosted on Bluehost, with the former written in English and the latter in Chinese. Since I kept archives of both before taking them down, I was able to revive all those old posts from the grave using this tool and some elbow grease. I refrained from leaving out any of the old posts, as the main motivation of this effort is really just to be able to easily see and be reminded of my younger self. It's a strange yet familiar experience reading those old writings: I can see parts of me that have changed and parts that are still distinctively shimmy1996.

Handling images is tricky, and my old posts unfortunately made quite liberal use of them: I opted for the simplest way out and just kept the originals without any kind of fancy compression or styling. I still need to figure out a more efficient way to both store and serve those images. Even with Git LFS available, I was reluctant to add over 300 MB of images to my blog repository (so they are currently in untracked-land), and now my blog could definitely benefit from a CDN setup. Perhaps I could also do what Jupyter notebooks do—encode all images in Base64—to get a single HTML file.

For the regular visitors of my blog out there (if there are any), you might notice that the comment section looks different: that's right, the search for the ideal static site commenting system is finally over for me (until it starts again)! Throughout the years, I've used WordPress, Duoshuo (now defunct), Disqus, and Isso as my comment systems. Now, my own Hyperskip has superseded them all: taking inspiration from Staticman, I set up Hyperskip to store all the comments in a TOML file, and opted to use email as the submission method, simplifying the setup. Gone are the days of databases, queries, and external scripts, and I get to migrate and version control all the comments (including the ones from the WordPress era) in the same Git repository as my blog.

The Other Old Friend

One week into the New Year, and I have already switched the color scheme of the website five times, with another dozen candidates sitting in my folder (totally not because of how unsightly these RGB color codes are). Much like how I'm a bit burned out from getting fonts right for everything, I have decided to remove all custom color choices from the website: no more syntax highlighting, fancy buttons, or dark modes.

As long as there are little knobs that I can toy around with, I always find myself distracted, spending way too much time worrying about the most insignificant choices of words, colors, or spacing (as you can tell by how much I blog about the blog). The only cure I've found is to simply remove the opportunity to make those choices altogether and stick with the defaults. This is why I replaced Isso (I spent too much time trying to make it not look so foreign), and why tags and categories are now gone, too.

In complete contrast to how the saying normally goes, I hardly ever find myself missing the things I cut away. More often than not, I sympathize with the elephant that finally broke free from its rope, rather than feel remorse over losing something cherished. I do occasionally ask myself whether maintaining all this babbling by my past self is just another such rope holding me back that I haven't realized yet. Well, my response is: a cowboy could always use a lasso on the road.

Becoming Pangu with GNU sed

In case you aren't familiar with Chinese mythology or the Chinese blogosphere, there's an old meme aptly named "Space of Pangu": a typesetting rule of thumb in favor of additional spacing between Chinese characters (but not punctuation marks) and Latin characters or numbers. My variant of the rule also includes additional spacing around any HTML elements, like links and emphasis.

Up till now, I've been manually adding these spaces in my source files (in Markdown or Org), which is admittedly the worst way to do it. Aside from the additional chore, such a typesetting rule should, in my opinion, be implemented in the output/rendering format, not the source. Besides, manually fixing all the old posts I just brought back is not exactly a rewarding task. Unwilling to load additional JavaScript, I turned to the almighty GNU sed. To add Space of Pangu to the final HTML and XML files that Hugo produces (normally in the ./public directory), I used the following shell script:

#! /usr/bin/env sh
# For punctuation marks to be recognized correctly.
export LC_CTYPE=en_US.UTF-8
find . -path "./public/*" \( -name "*.html" -or -name "*.xml" \) -print -exec sed \
     -e 's/\([a-zA-Z0-9]\|<\/[a-z]*>\)\([^[:punct:][:space:]a-zA-Z0-9]\)/\1 \2/g' \
     -e 's/\([^[:punct:][:space:]a-zA-Z0-9]\)\([a-zA-Z0-9]\|<[a-z]\)/\1 \2/g' \
     -i {} ";"
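To sanity-check the substitution rules, you can pipe a sample string through the character-only subset of them (dropping the HTML tag alternations). This is just an illustrative sketch; it assumes GNU sed and an available C.UTF-8 locale:

```shell
# Insert a space at every boundary between a Latin letter/digit and a
# character that is neither Latin, punctuation, nor whitespace (i.e. CJK).
export LC_ALL=C.UTF-8
echo '大家好hello世界' | sed \
    -e 's/\([a-zA-Z0-9]\)\([^[:punct:][:space:]a-zA-Z0-9]\)/\1 \2/g' \
    -e 's/\([^[:punct:][:space:]a-zA-Z0-9]\)\([a-zA-Z0-9]\)/\1 \2/g'
# prints: 大家好 hello 世界
```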

In case you are adamant about adhering to the recommendation of this W3C Working Draft and don't mind bloating up the resulting web page, using CSS to create the spacing should do the trick:

find . -path "./public/*" \( -name "*.html" -or -name "*.xml" \) -print -exec sed \
     -e 's/\([a-zA-Z0-9]\|<\/[a-z]*>\)\([^[:punct:][:space:]a-zA-Z0-9]\)/\1<span style="margin:0.25ch;"><\/span>\2/g' \
     -e 's/\([^[:punct:][:space:]a-zA-Z0-9]\)\([a-zA-Z0-9]\|<[a-z]\)/\1<span style="margin:0.25ch;"><\/span>\2/g' \
     -i {} ";"

If you are another one of those Space of Pangu disciples, just note that there's no need to worry about adding spaces when leaving comments here: since Hyperskip comments are inserted at Hugo's build stage, they are affected by these scripts as well. Just sit back, relax, and enjoy staring at the blank spaces.

March Goes out Like a Lion, Too

It was not until a few weeks ago (while watching Level1 News) that I learned the complete version of the saying "March comes in like a lion and goes out like a lamb." I knew the first half from the manga series March Comes in Like a Lion, but I had no idea the saying was describing the weather in March.

What you have read so far was actually my entire motivation for starting this post, but it has indeed been a rather unusual March. Because of COVID-19, I'm spending time at home "social distancing", or rather, indulging myself in the company of solitude. In preparation for (a.k.a. using as an excuse) extended periods of working from home, I went on an upgrade spree for electronics: I got a second monitor, a monitor stand, and larger hard drives for my NAS. In fact, I've been gradually expanding my arsenal of devices since last Fall, so look out for a potential setup post.

Amazon's Prime Now service has kept me fed for two of the past three years during which I cooked for myself. It's a bit alarming that Amazon of all things has become literally something I can't live without, but until I have my underground bunker and algae farm, I'll have to make do with this symbiotic (or should I say parasitic) relationship. I'm not sure I really enjoy cooking though, as most of my efforts devoted to it have been about reducing the amount of time I spend in the kitchen. Fortunately I hardly ever get tired of eating the same dishes, so I just keep making the same ones while gradually optimizing the preparation: I have yogurt and trail mix for breakfast, beef curry with rice for lunch, and pan-fried salmon with rice and stir-fried cabbage for dinner.

The pandemic also put my running plans on hold: the trail I normally run on has been closed down. I did get plenty of mileage in before social distancing started (4 weeks ahead of schedule in terms of total mileage now), so I should still be on track to hit my 2020 target. Perhaps due to the snow and ice along the way, my running shoes (Mizuno Wave Rider 23) are wearing out faster than before: at the 250-mile mark, I'm already feeling arch discomfort on longer runs, while previous iterations of these shoes lasted until around 300 miles. Aside from shoe issues, shin pain has also started to creep up as I've been doing longer runs, so this just might be the opportunity I needed to take some rest. I have converted myself into a morning runner, as I plan to ultimately sneak in a run or two on weekdays. So far I'm enjoying my morning routines, despite a few snow-stormy days that were extra tough (but fun). Plus, I get to see the sunrise instead of its less cheerful sibling.

Reading the news during the outbreak frequently strikes me with an unreal feeling, because of both the things that are actually happening and the deliberately divisive facade news articles put on when covering them. To be fair, asking an organization that preys on human attention to report in a plain and down-to-earth way is an oxymoron in itself. It's probably hypocritical of me to pick on the news agencies though, as I am also guilty of deriving excitement from the current situation: the mere thought that what is ordinarily just an apartment is now my personal fortress against an uncured pathogen is enough to keep me up at night.

Should this indeed be the downfall of humanity, at least my blog and Emacs configuration will (assuming Microsoft means it) live on thanks to the GitHub Archive Program. Before that, be safe, stay in your personal living pods, and prepare for the neon-colored Space-Age algae diet we've all been waiting for.

Static Alternatives to Mastodon and Gitea

Like how I decided to switch off WordPress, I think I've had enough of running Mastodon and Gitea.

Keeping up with configuration changes in Gitea had been annoying, whereas with Mastodon, breakages were common due to mismatched system library versions (mostly protobuf) among the dependencies. While the latter is not the fault of Mastodon itself, having to install two package managers (for Ruby and Node.js, respectively) just to run a program is rather ridiculous to me.

I started hosting both applications in 2018: Mastodon first, as a replacement for Twitter, and Gitea later, in reaction to Microsoft's acquisition of GitHub. Looking back, they were probably overkill for my needs: my primary use cases for a git server and a micro blog are both very much single-user focused and write-only, which means the content only needs to be available in read-only form for my site's visitors, making static pages the perfect replacement for both web front ends.

Starting with Mastodon: I'm using the twtxt format to store and serve my micro blog. The format has existed for some time now, but enjoyed a recent resurgence in the tildeverse (a series of websites offering public-access Unix-like systems). While there is now a whole community-supported ecosystem of syntax extensions and software seeking to add more features to the format, I have found the barebones timestamp-tab-and-then-text syntax to be sufficient. The write-and-forget cycle is really addicting, and even more so when using a command line client (mine is aptly named twixter).
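The timestamp-tab-and-then-text syntax makes posting a one-liner. A minimal sketch (the file name twtxt.txt is only a convention, and my actual client wraps a bit more than this):

```shell
# Append one status: UTC timestamp in RFC 3339 form, a literal tab, then the text.
printf '%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    "Just set up my twtxt feed" >> twtxt.txt
```

Serving the file as static text from any web server is all it takes to publish.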

Gitea, while an excellent GitHub replacement in my opinion, is more suitable for community collaboration than as a personal project dumping ground. I opted to manage the git repositories directly (see Chapters 4.4 and 4.5 of Pro Git) and use stagit to generate the corresponding HTML files. These stagit-generated pages have replaced Gitea as the new Trantor Holocron.
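Managing repositories directly boils down to keeping a bare repository on the server and pushing to it over SSH. A minimal sketch (the repository path and host are placeholders; the stagit step is shown as comments since it assumes stagit is installed):

```shell
# On the server: create a bare repository to push to.
git init --bare myproject.git

# On a client, the repository is then reachable over SSH:
# git remote add origin ssh://user@example.com/srv/git/myproject.git
# git push origin master

# To publish read-only pages, run stagit from the output directory;
# it writes log.html, files.html, refs.html, etc. for the given repo:
# stagit /srv/git/myproject.git
```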

Now that I have found satisfactory solutions for the write-only portion of my online presence, I will continue to explore options for the remaining two pillars: read-only (content consumption) and interaction (means of communication). Web feeds and email are my best answers for now, but they still don't cover all the bases in my experience.

Blog 9 from Outer Space

Recently, I've been thinking about ways to unify my micro blog entries with my current site, and I've been reconsidering the ideas from IndieWeb: unlike ActivityPub (the protocol Mastodon, Pleroma, and the like use for federation), which seems to want everything done dynamically via server APIs and JSON responses, the various standards recommended by the IndieWeb community allow machine-readable feeds to be generated straight from a correctly marked-up static HTML file. A core idea that IndieWeb seems to implicitly rely on is the longevity of URIs, and to a greater extent, the site owner's control over the domain name. With the recent drama regarding the .ORG domain, I came to realize that a future in which domain names are too expensive to maintain (or are subject to seizure by various entities) may not actually be too distant, and this could seriously undermine the entire premise IndieWeb is built upon, not to mention the far more common link rot. Fortunately, I think IPFS (the InterPlanetary File System) has the potential to solve both problems.

A Crash Course on IPFS

Now, now, I know that compared with similar projects like the Dat protocol, pingfs, or even Scuttlebutt, IPFS has a really buzzwordy vibe to it (trust me, I was as skeptical as you are at the beginning), and the various cryptocurrency start-ups that bundle IPFS and all kinds of acronyms in their marketing materials surely don't do it any favors, but it does seem like the most established and ready-to-use option. Here's my best attempt at explaining IPFS, with information mostly obtained from the official documentation and this talk. In case you are interested in further implementation details, this session from IPFS Camp 2019 is a great starting point.

A simplified interpretation of a link to a web page is that it's but a fancy way to point to a file on some server. Just like a path to a file, the link becomes unreachable if the server is down, even if someone sitting in the same room might have the contents cached. In IPFS, files (or data blocks) are addressed by the cryptographic hashes of their contents, and stored in a distributed fashion across all peers. This means no centralized facility is required to access the files, file integrity can be easily verified, P2P sharing can be used to speed up access, and files stored this way are inherently immutable.

Not being able to change files seems like a rather large price to pay, but just like any other problem in computer science, this can be solved by adding a layer of abstraction. IPNS (InterPlanetary Name System) utilizes public-key cryptography to create stable addresses that can point to different files over time. An IPNS address is basically the hash of a public key. An IPNS lookup involves retrieving the public key, searching for pointer files (each containing an IPFS address) signed by the corresponding private key, identifying the most recent one, and finally redirecting to the correct file. To utilize IPNS, the user starts by creating a public-private key pair, followed by uploading the desired files onto IPFS, then signing and uploading a pointer file containing the IPFS address of the uploaded content. When an update is desired, the user only needs to sign and upload another pointer file pointing to the new location.
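In terms of the actual client commands, the publish-update cycle described above might look like this sketch (the key name and file are made up):

```shell
ipfs key gen notes-key                           # one-time: create the key pair
hash=$(ipfs add -Q notes.txt)                    # add the content, record its hash
ipfs name publish --key=notes-key "/ipfs/$hash"  # sign and publish the pointer

# After editing notes.txt, repeat with the same key:
hash=$(ipfs add -Q notes.txt)
ipfs name publish --key=notes-key "/ipfs/$hash"  # same IPNS address, new target
```

The IPNS address stays constant across updates; only the signed pointer behind it changes.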

A lot of the ideas used in IPFS have been explored before by projects like BitTorrent (peer-to-peer sharing), Fossil and Venti from Plan 9 (write-once data blocks and path redirection), git (Merkle tree/directed acyclic graph), etc. However, the killer feature is how easily IPFS integrates with existing infrastructure. Not only are there HTTP gateways that allow accessing IPFS/IPNS from web browsers instead of IPFS clients, but there is also compatibility with FUSE (Filesystem in Userspace), which actually allows you to mount the entire IPFS as a read-only partition: sure, this also makes hosting static websites possible, but you have to admit that having access to a global-scale (or should I say, interplanetary?) P2P shared drive is way cooler.

Hosting Static Websites on IPFS

The official guide already outlines the general usage pattern pretty well. Here's the TLDR:

  • Run ipfs init and ipfs daemon to initialize and start the IPFS client.
  • Generate the website files and run ipfs add -r <website-root> to add its contents to IPFS. The last few lines of the output should tell you the hash of the root directory.
  • If you want to make use of IPNS, run ipfs name publish <website-root-hash> to direct the IPNS link to the folder you just uploaded. The IPNS public key hash can be obtained via ipfs key list -l.
  • Repeat the last two steps every time the website files are updated or rebuilt. The process has little overhead due to the inherent deduplication in content addressing, making it particularly suitable for static sites where larger files (like photos) tend to change less often.

Once this is done, you can access your website at either <gateway-address>/ipfs/<website-root-hash> or <gateway-address>/ipns/<ipns-address> from any HTTP gateway: you can use the local one (likely at 127.0.0.1:8080) started by the IPFS daemon, or any of the public ones (which come with the extra risk of MITM attacks by the gateway owners, as file retrieval is done on the gateway servers). In case you have multiple websites, you can generate more IPNS key pairs using ipfs key gen, and specify --key when running ipfs name publish to publish to a specific IPNS address.

Until IPFS supports import/export of IPNS keys (so that we can back up keys and publish from multiple devices), DNSLink can be used to more conveniently maintain access to a site, albeit at the cost of owning a domain name and trusting the DNS host provider. To allow access to the site from gateways via /ipns/<domain-name>, simply add a TXT record to the domain:

dnslink=/ipfs/<website-root-hash>

or

dnslink=/ipns/<ipns-address>

For instance, you can now access this site at /ipns/shimmy1996.com (this is a link using the ipfs.io gateway). While not flawless, to me this is a reasonable compromise for now. I find IPFS to be generally faster than IPNS, so using an IPFS address with DNSLink probably makes more sense. To avoid manually copy-pasting the IPFS address each time, I added the following to my blog build script to automatically upload the website to IPFS and update the DNS record (using DigitalOcean's API):

echo "Uploading to IPFS..."
hash=$(/usr/bin/ipfs add -Qr "<website-root>")

echo "Updating DNSLink record..."
token="<digitalocean-api-token>"
curl -X PUT \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer $token" \
     -d "{\"data\":\"dnslink=/ipfs/$hash\"}" \
     "https://api.digitalocean.com/v2/domains/<domain>/records/<record-id>"

The record ID for DNS records on DigitalOcean can also be retrieved via their API. You may need to add ?page=2 or later to the request to find the record you want.
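If you'd rather not hunt through the control panel, a one-off request like this sketch lists the records so you can eyeball the ID (same <domain> placeholder as above; the grep is just a crude filter on the JSON response):

```shell
token="<digitalocean-api-token>"
# List the domain's DNS records; bump ?page= if the record isn't on the first page
curl -H "Authorization: Bearer $token" \
     "https://api.digitalocean.com/v2/domains/<domain>/records?page=2" \
  | grep -o '"id":[0-9]*'
```

Match the IDs against the record names in the raw response to find the TXT record holding the dnslink value.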

Do note that, as with any offline HTML files, we need to use relative URLs in the generated web pages. In Hugo, this can be achieved by setting

relativeURLs = true

in config.toml.

Of course, being a P2P network, IPFS won't be able to retrieve a file if no copy of it exists anywhere at all. By default, the IPFS client pins anything you share from the local machine: pinned content won't get deleted, ensuring at least one copy of the shared content remains available on IPFS. You can unpin outdated versions of the website, or, if you want, find and pin the shared directory on multiple machines for some redundancy.
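The pin bookkeeping mentioned above is only a couple of commands (hashes are placeholders):

```shell
ipfs pin ls --type=recursive         # list currently pinned root hashes
ipfs pin rm <old-website-root-hash>  # unpin an outdated version of the site
ipfs repo gc                         # actually reclaim the space locally
ipfs pin add <website-root-hash>     # on another machine: pin a copy for redundancy
```

Note that unpinning alone doesn't delete anything; the blocks linger until the next garbage collection run.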

The Stars, Like Dust

Back to the issue with IndieWeb: the increasingly shady domain name system and link rot make URI stability over HTTP hard to maintain. However, what if we use IPFS/IPNS addresses as URIs? It's a match made in heaven: we get robust distributed access to static web pages, gated by mathematics instead of FBI warnings, that can theoretically last forever. Removing the need to maintain a server also lowers the barrier to owning a website. The HTTP protocol has existed for 29 years; IPFS, only 5. I don't know if IPFS will continue to exist for the next 24 years to come, but if it does, I hope we will be looking at a perhaps more chaotic, but more robust, lively, and colorful online world.