Astral Pain
Anyone who’s followed this blog since the early days has read more than one of my complaints about crappy support for Unicode in common programming tools. I’m sad to report that, even in 2012, doing Unicode is (still) harder than it looks.
Heterotic Beast is my math-enabled Forum software. It runs on Rails 3.1.6 and Ruby 1.9.3, so you’d think that all would be good. Which was why I was surprised that this post was ill-formed.
For faster performance, Heterotic Beast caches the rendered (X)HTML of each post in the database. Sure enough, the cached XHTML was truncated just before the “𝒜”, a character which, in Unicode, lies in Plane-1 (U+1D49C). Evidently, there was a problem storing characters outside the BMP.
Now, Rails 3, by default, creates MySQL database tables with the ‘utf8’ encoding. Since UTF-8 covers all 17 Unicode planes, you might think that would be sufficient. You would be wrong. MySQL’s utf8 encoding only covers the BMP: it stores at most 3 bytes per character, so it can’t handle 4-byte characters at all.
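To make the 4-byte problem concrete, here is a minimal Ruby sketch. The helper names `astral?` and `truncate_at_astral` are hypothetical, made up for illustration. The truncation helper only imitates the symptom I saw (text cut off just before the first astral-plane character); it is not a description of what MySQL does internally.

```ruby
# U+1D49C MATHEMATICAL SCRIPT CAPITAL A, written as an escape so that
# this source file itself stays plain ASCII.
SCRIPT_A = "\u{1D49C}"

# True if the (single) character lies outside the Basic Multilingual
# Plane, i.e. its code point is above U+FFFF.
def astral?(char)
  char.ord > 0xFFFF
end

# Imitates the symptom: keep only the text before the first astral
# character. (Assumption: this mimics the observed truncation, not
# MySQL's actual code path.)
def truncate_at_astral(text)
  index = text.each_char.find_index { |c| astral?(c) }
  index ? text[0, index] : text
end

puts SCRIPT_A.bytesize   # number of bytes the character needs in UTF-8
puts astral?(SCRIPT_A)   # is it outside the BMP?
puts truncate_at_astral("Let #{SCRIPT_A} be an algebra").inspect
```

Any character for which `astral?` returns true needs 4 bytes in UTF-8, which is one more than a column in MySQL’s utf8 encoding will store.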
Fortunately, MySQL 5.5.3 (released in March 2010) introduced a new encoding, ‘utf8mb4’, which actually, y’know, supports Unicode.
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
did the trick. Now the posts, in the database, didn’t get truncated at the first astral plane character. Unfortunately, instead of astral plane characters, the database entries contained garbage characters. Obviously, Rails had no idea that I had switched encodings in the database. I needed to say so, explicitly, in config/database.yml:
production:
  adapter: mysql2
  host: 127.0.0.1
  database: beast
  username: ...
  password: ...
  encoding: utf8mb4
  port: 3306
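Since a stale `encoding:` line fails silently (you get garbage rather than an error), a quick sanity check on the configuration can be worthwhile. What follows is only a sketch: the helper `utf8mb4_configured?` is hypothetical, and the sample YAML is a stand-in for the real config/database.yml with the credentials left out.

```ruby
require 'yaml'

# Stand-in for config/database.yml (assumption: credentials omitted,
# values copied from the production block above).
SAMPLE_CONFIG = <<-CONFIG
production:
  adapter: mysql2
  host: 127.0.0.1
  database: beast
  encoding: utf8mb4
  port: 3306
CONFIG

# True if the given environment asks for the utf8mb4 encoding.
# A missing environment or a missing encoding key counts as false.
def utf8mb4_configured?(config, env)
  settings = config[env] || {}
  settings['encoding'] == 'utf8mb4'
end

config = YAML.load(SAMPLE_CONFIG)
puts utf8mb4_configured?(config, 'production')
```

In a real application one would presumably load the actual file instead of the inline sample. Either way, this only checks what Rails is told to request, not what the tables themselves are using.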
Ah, if only life were so simple. The release version of the mysql2 gem doesn’t support the utf8mb4 encoding. Fortunately (as of December, 2011), the development version does. So
gem 'mysql2', :git => 'http://github.com/brianmario/mysql2.git'
(finally!) makes everything work as it should.
Remarkably, even after a decade of such pain, Unicode is, in 2012, still “cutting edge.”
Re: Astral Pain
I’m not sure what you’re quoting “Cutting Edge” from, but neither Ruby, nor Rails, nor MySQL could be considered, on any level, “Cutting Edge.”
Unless you had Cobol or Coldfusion in mind, in which case, sure, if people use it now, it’s relatively “cutting edge.”