
SQLite WASM: Something subtle in your browser


Here’s an exercise for you. Go to the Craft of Emacs website and type the word “list” into the search bar.

You might think you’ve executed a perfectly ordinary text search, the sort you can do on Medium, or Hacker News, or Wikipedia, if a simpler one, given the site’s smallish content.

You’d be wrong.

If you were savvy enough to flip open your browser’s developer tools, you’d notice a distinct absence of network requests during the search. You might reasonably conclude that it’s searching some kind of local JavaScript index, a bit custom perhaps, but not too bizarre for a niche homebrew website.

You’d still be wrong.

If you were really competent with dev tools, and dedicated enough to pore over the uncompressed JavaScript, you’d notice a few references to a suspiciously named sqlite.wasm file, and if you were really ahead of the web development curve, you’d know what sqlite and wasm actually meant…

And you’d be right.

In making that search, you’ve just spun up a lightweight but very powerful database. In your browser.

The trouble with text

For those of us who missed the invite to the web search party, an explanation is in order.

Designing a full-text search is hard. You need to store your entire corpus of text, potentially millions of words, split it into tokens, and then have a way of finding groups of those tokens that match a search term. Your method must cope with the intricate complexities of the English language, its plurals and punctuation, and match multiple search terms accurately. And it needs to be fast.
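To make those difficulties concrete, here is a deliberately naive sketch in Python (not how this site works) of the structure at the heart of most full-text search: an inverted index, mapping each token to the set of documents containing it. The sample documents and the one-line plural rule are hypothetical. Notice that the rule happily mangles “emacs” into “emac”, exactly the kind of English-language trap a real stemmer has to handle.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    # Lowercase, split on anything that isn't a letter or digit, then
    # crudely strip a trailing "s" (hypothetical rule) so "lists" -> "list".
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words]

def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    # Inverted index: each token maps to the ids of documents containing it.
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    # A document matches only if it contains every query token (AND),
    # computed by intersecting the per-token sets.
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results

docs = {
    "cons-cells": "Emacs Lisp lists are built from cons cells.",
    "buffers": "Buffers hold text; each buffer may list its markers.",
    "windows": "Windows display buffers.",
}
index = build_index(docs)
print(sorted(search(index, "list")))          # ['buffers', 'cons-cells']
print(sorted(search(index, "buffers list")))  # ['buffers']
```

Everything a real engine adds, such as ranking, phrase queries, proper stemming, and staying fast over millions of words, is missing here, which is rather the point.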

Designing a full-text search is hard, but it’s well understood in the space of data storage. There are off-the-shelf technologies like Elasticsearch, which is, well, you know, for search, and most databases have a full-text search feature (FTS, because we love our acronyms).

If you want to search a corpus, you only need to pick your favourite flavour of data store, slather it on a server, sprinkle it with text, and let its own search feature do the rest.
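SQLite’s own flavour of this is the FTS5 extension. The sketch below drives it from Python’s built-in `sqlite3` module rather than from the browser build, purely because it is easy to run. The `pages` table and its rows are hypothetical, and it assumes the SQLite library Python links against was compiled with FTS5 (true of most builds, but not guaranteed).

```python
import sqlite3

# In-memory database. Assumes the SQLite library behind Python's sqlite3
# module was built with FTS5 enabled (most builds are).
conn = sqlite3.connect(":memory:")

# A full-text virtual table. The "porter" tokenizer wraps "unicode61" and
# stems English words, so "lists" and "list" index to the same term.
conn.execute(
    "CREATE VIRTUAL TABLE pages USING fts5(title, body, tokenize='porter unicode61')"
)
conn.executemany(
    "INSERT INTO pages (title, body) VALUES (?, ?)",
    [
        ("Cons cells", "Emacs Lisp lists are built from cons cells."),
        ("Buffers", "Each buffer may list its markers."),
        ("Windows", "Windows display buffers."),
    ],
)

def search(query: str) -> list[str]:
    # MATCH runs the full-text query against every column; bm25() scores
    # each hit, and lower scores mean better matches, so sort ascending.
    rows = conn.execute(
        "SELECT title FROM pages WHERE pages MATCH ? ORDER BY bm25(pages)",
        (query,),
    ).fetchall()
    return [title for (title,) in rows]

print(search("list"))         # "Cons cells" and "Buffers", best match first
print(search("buffer list"))  # ['Buffers']: terms are implicitly ANDed
```

Tokenizing, stemming, matching and ranking are all handled by the store; the SQL itself is not Python-specific.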

What data store you choose depends on your desires. Do you dream of creating a website as large as Wikipedia? Of supporting thousands of searches simultaneously? Of adding text to your corpus as quickly as you can type it?

For my own humble aspirations, I’d like a store that scales, but only a little, and that can be rebuilt quickly and easily each time I write. And since it’s exposed and vulnerable on the web, with all sorts of malicious characters typing and being typed into the search bar, I’d like one that’s safe.
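For a search bar backed by SQLite, “safe” means at least two things. The search term must never be spliced into the SQL text, and with FTS5 it shouldn’t be able to trip up the query syntax either: a stray double quote is a syntax error, and words like AND or NOT are operators. Here is a minimal sketch of one approach, assuming an FTS5 table like the hypothetical one below. It binds the query as a parameter and wraps each word in double quotes (doubling any embedded quotes), which FTS5 treats as a literal string. The helper names are my own, not an established API.

```python
import sqlite3

def to_fts5_query(user_input: str) -> str:
    # Hypothetical helper: turn each whitespace-separated word into an FTS5
    # string literal by doubling embedded double quotes and wrapping it in
    # quotes. Inside a string, AND/OR/NOT/NEAR, parentheses and the "*"
    # prefix marker are not operators, so input can't break the syntax.
    words = user_input.split()
    return " ".join('"' + w.replace('"', '""') + '"' for w in words)

def safe_search(conn: sqlite3.Connection, user_input: str) -> list[str]:
    query = to_fts5_query(user_input)
    if not query:
        return []  # blank input: nothing to match
    # The query is bound as a parameter, never pasted into the SQL text, so
    # input like "'; DROP TABLE pages; --" is only ever search terms.
    rows = conn.execute(
        "SELECT title FROM pages WHERE pages MATCH ? ORDER BY rank",
        (query,),
    ).fetchall()
    return [title for (title,) in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE pages USING fts5(title, body)")
conn.execute(
    "INSERT INTO pages VALUES (?, ?)",
    ("Lists", "Emacs Lisp lists are built from cons cells."),
)

# Passed straight to MATCH, an unbalanced quote is an FTS5 syntax error.
try:
    conn.execute("SELECT title FROM pages WHERE pages MATCH ?", ('cons" cells',)).fetchall()
    raw_failed = False
except sqlite3.OperationalError as exc:
    raw_failed = True
    print("raw input rejected:", exc)

print(safe_search(conn, 'cons" cells'))              # ['Lists']
print(safe_search(conn, "'; DROP TABLE pages; --"))  # []
```

The trade-off is that quoting every word also disables deliberate operators and prefix searches such as `emac*`; offering those safely would take a more careful query parser.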

A local flavour

My own comfort food is SQLite.

SQLite is, as its name suggests, a lightweight SQL database. It doesn’t scale like Elasticsearch does. It can only live on a single machine, backed by a single simple file, and scales just as a file scales.

Don’t let its unassuming nature fool you. It’s not distributed, and is therefore simpler, and more versatile to embed in an application, than any heavyweight store could be.

It’s the data store your operating system uses if you’re reading this from a phone, and the data store your browser uses even if you’re not.

As for how it works: think of it less as a database and more as a library for incredibly sophisticated file manipulation. Whenever you store a bookmark in Firefox, it calls some SQLite code for persistence. The SQLite code, crafted in C, is embedded: woven into the application itself. Since that code is part of Firefox, Firefox needn’t call out to another process, nor worry about handling an asynchronous response.
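Here is roughly what “embedded” looks like in practice, sketched with Python’s `sqlite3` module, which links the SQLite C library into the Python process itself. The `bookmarks` table is hypothetical and nothing like Firefox’s real schema. The point is that there is no server to start, the database is one ordinary file, and every call hands back its result directly.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # The whole database is one ordinary file on disk; no server involved.
    path = os.path.join(tmp, "bookmarks.sqlite")

    # connect() is a plain function call: the SQLite C code runs inside
    # this process rather than in a separate database server.
    conn = sqlite3.connect(path)
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("CREATE TABLE bookmarks (url TEXT NOT NULL, title TEXT)")
        conn.execute(
            "INSERT INTO bookmarks VALUES (?, ?)",
            ("https://www.sqlite.org/", "SQLite Home Page"),
        )
    conn.close()

    # Reopen the same file: each call returns its result synchronously,
    # with no callbacks or awaiting a response from another process.
    conn = sqlite3.connect(path)
    titles = conn.execute("SELECT title FROM bookmarks").fetchall()
    conn.close()
    size = os.path.getsize(path)

print(titles)    # [('SQLite Home Page',)]
print(size > 0)  # True
```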

In browser country

But there’s a limit to where SQLite can be used. It’s written in C, and so whatever ecosystem you call it from needs to support C bindings. And while C is a lingua franca — any language worth its salt must bind to C eventually — the web notoriously cannot support it.

Any code bundled by a webpage ultimately must be in JavaScript, to be interpreted by your browser. This is safe and sandboxed, as it should be — you can rest assured that an arbitrary web page can’t read your filesystem for its own malicious ends.

But the safety of JavaScript comes at a cost. If you want to use a tool written in another language in the browser, you need to rewrite that tool in JavaScript. Web apps are so prolific that a lot of languages can be automatically rewritten into it, but for a language as complex as C, you’re at a loss.

That is, you were before the advent of WebAssembly.

To WebAssembly

WebAssembly (WASM for short) is a compact, low-level bytecode. Browsers compile it to native machine code without the parsing and type-guessing that JavaScript requires, so it can run considerably faster. More importantly for SQLite, C can be compiled to it.

Through their own efforts, the SQLite team have wrangled the compiler toolchain (Emscripten) into generating a WebAssembly binary. That sqlite.wasm file, only 780 KB in size, contains the whole of SQLite, or at least everything you need of it to run a full-text search.

Creatively, it uses your browser’s own storage as a filesystem. It’s your exclusive database: you aren’t sharing it with anyone else. I’m not tracking whatever you search for (I can’t), and you can’t, through malicious queries, harm anyone but yourself.

Is it performant? Worth the effort? Widely supported?

I’m an enthusiast writing a small website. At this point, I’m not concerned with those questions. But as a fond supporter of SQLite, who has been watching WebAssembly from the sidelines, I can say that SQLite WASM just is.

It is in my browser, and now in yours too.