kebab-case: Understanding thread pools through cats-effect - the IORuntime

Blocking and computing

The snooze task didn’t do any calculating (its only operation was a Thread.sleep), so its thread didn’t need to occupy a processor for the task to progress. As long as we had an unbounded thread pool, an unlimited number of snooze tasks could run at once, each on its own thread. These sorts of tasks are known as “blocking”: the application sits and waits for them to complete, but doesn’t actually do any calculations. Blocking tasks are rare, and you should be very reluctant to write one.

The factorial task, on the other hand, did a lot of multiplication. Each task occupied one of my eight processors as it ran, so only eight of those tasks could run at the same time. These sorts of tasks are termed “compute-intensive”. While factorial doesn’t resemble a typical Scala application, it’s more similar to one than snooze.
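To make the difference concrete without any cats-effect machinery, here is a minimal sketch using plain java.util.concurrent executors. The object name BlockingVsBounded and the helper timeSleeps are made up for this illustration, and the task count and sleep length are arbitrary. It submits eight sleeping tasks to an unbounded (cached) pool and then to a pool of two threads.

```scala
import java.util.concurrent.{ExecutorService, Executors, TimeUnit}

object BlockingVsBounded {
  // Hypothetical helper: submits `tasks` sleeping jobs to the pool,
  // waits for all of them, and returns the elapsed time in milliseconds.
  def timeSleeps(pool: ExecutorService, tasks: Int, sleepMillis: Long): Long = {
    val start = System.nanoTime()
    (1 to tasks).foreach { _ =>
      pool.execute(() => Thread.sleep(sleepMillis))
    }
    pool.shutdown() // no new tasks; already submitted ones still run
    pool.awaitTermination(1, TimeUnit.MINUTES)
    (System.nanoTime() - start) / 1000000
  }

  def main(args: Array[String]): Unit = {
    // Unbounded: every sleeping task gets its own thread.
    val unbounded = timeSleeps(Executors.newCachedThreadPool(), 8, 200L)
    // Bounded at two threads: sleeping tasks queue up behind each other.
    val bounded = timeSleeps(Executors.newFixedThreadPool(2), 8, 200L)
    println(s"unbounded pool: $unbounded ms, pool of two threads: $bounded ms")
  }
}
```

On the unbounded pool the eight sleeps should overlap and finish in roughly the time of one sleep. On the pool of two threads they run two at a time, so the total should be roughly four sleeps long.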

Varying it up

This difference between blocking and compute-intensive tasks poses a problem for us: what if we want to run a load of factorial and snooze tasks at the same time?

val snoozeAndCompute: IO[Unit] = 
  List(tenFactorials, tenSnoozes).parSequence.void

Which runtime should we choose?

If we use the basicRuntime, each task will be given its own thread. This is good for the blocking snooze task, but bad for factorial. But if we use a boundedRuntime, our snooze task will block a thread that factorial could use to progress.

time(snoozeAndCompute).unsafeRunSync()(boundedRuntime(numProcessors))
// res12: String = "The task took 6 seconds."

As expected, using a boundedRuntime isn’t ideal.

How can we give the blocking snooze task unlimited scaling, but bound the factorial task at eight threads?

Thankfully, there’s a way to get the best of both worlds. Instead of having just one thread pool, we could have two: an unbounded thread pool for blocking tasks and a bounded one for compute tasks.
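Before looking at how cats-effect does it, the two-pool idea can be sketched with the standard library alone. This is an illustration under assumptions, not the article’s code: the object TwoPools, the helper runBoth and the local factorial are hypothetical names, and the task sizes are arbitrary. Compute work goes to a pool bounded at the number of processors, while sleeping work goes to an unbounded cached pool.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object TwoPools {
  // A stand-in compute-intensive task (hypothetical, for illustration only).
  def factorial(n: Int): BigInt = (1 to n).foldLeft(BigInt(1))(_ * _)

  // Runs ten factorials and ten sleeps at once, each kind on its own pool,
  // and returns how many factorial results came back.
  def runBoth(): Int = {
    val numProcessors = Runtime.getRuntime.availableProcessors()
    val computeExecutor = Executors.newFixedThreadPool(numProcessors) // bounded
    val blockingExecutor = Executors.newCachedThreadPool()            // unbounded
    val computePool = ExecutionContext.fromExecutor(computeExecutor)
    val blockingPool = ExecutionContext.fromExecutor(blockingExecutor)
    try {
      val factorials = List.fill(10)(Future(factorial(2000))(computePool))
      val snoozes = List.fill(10)(Future(Thread.sleep(200L))(blockingPool))
      // Combining the results is cheap, so it can run on the compute pool.
      implicit val ec: ExecutionContext = computePool
      val all = Future.sequence(factorials).zip(Future.sequence(snoozes))
      Await.result(all, 1.minute)._1.size
    } finally {
      computeExecutor.shutdown()
      blockingExecutor.shutdown()
    }
  }

  def main(args: Array[String]): Unit =
    println(s"computed ${runBoth()} factorials")
}
```

The sleeping tasks never take a thread away from the factorials, and the factorials never spawn more threads than there are processors.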

It turns out that the cats-effect 3 IORuntime supports this exact use case. Let’s take a closer look at the setup code for the boundedRuntime to see how. Here’s a simplified version:

def boundedRuntime(numThreads: Int): IORuntime = 
  IORuntime(
    compute = IORuntime.createDefaultComputeThreadPool(numThreads),
    blocking = IORuntime.createDefaultBlockingExecutionContext()
  )

The IORuntime accepts two thread pool arguments: compute and blocking. It uses these thread pools for the compute-intensive and blocking operations respectively.
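Since the snippet above is simplified, here is a hedged sketch of a complete runtime assembled from two explicit pools. It assumes a cats-effect 3.x release (the 3.5.4 in the dependency line is only an example) and substitutes plain JVM executors for the library’s own work-stealing pool. The names ExplicitRuntime, twoPoolRuntime and runOn are hypothetical helpers, not part of cats-effect.

```scala
//> using dep org.typelevel::cats-effect:3.5.4
import cats.effect.IO
import cats.effect.unsafe.{IORuntime, IORuntimeConfig}
import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext

object ExplicitRuntime {
  // Hypothetical helper: a bounded compute pool plus an unbounded blocking pool.
  // Plain executors are an assumption of this sketch, not what cats-effect
  // uses by default.
  def twoPoolRuntime(numThreads: Int): IORuntime = {
    val computeExecutor = Executors.newFixedThreadPool(numThreads)
    val blockingExecutor = Executors.newCachedThreadPool()
    val (scheduler, shutdownScheduler) = IORuntime.createDefaultScheduler()
    IORuntime(
      ExecutionContext.fromExecutor(computeExecutor),  // compute
      ExecutionContext.fromExecutor(blockingExecutor), // blocking
      scheduler,
      () => {                                          // shutdown hook
        computeExecutor.shutdown()
        blockingExecutor.shutdown()
        shutdownScheduler()
      },
      IORuntimeConfig()
    )
  }

  // Runs an IO on a fresh two-pool runtime and always shuts the runtime down.
  def runOn[A](io: IO[A], numThreads: Int): A = {
    val runtime = twoPoolRuntime(numThreads)
    try io.unsafeRunSync()(runtime)
    finally runtime.shutdown()
  }

  def main(args: Array[String]): Unit =
    println(runOn(IO(21 * 2), numThreads = 2)) // prints 42
}
```

The executors here use non-daemon threads, so the explicit shutdown matters: without it the JVM would stay alive after main returns.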

We can access the compute thread pool using the compute field. This gives us an ExecutionContext:

boundedRuntime(numProcessors).compute
// res13: ExecutionContext = cats.effect.unsafe.WorkStealingThreadPool@4da4ceec

A proper snooze

You might be a bit confused by this: there are two pools in the IORuntime, but haven’t we only been thinking about one?

So far, we’ve thought of the basicRuntime and boundedRuntime functions as configuring a single pool. In actual fact, they configure two: they both have a hard-coded unbounded blocking pool. It’s just that we never used it.

By default, cats-effect’s IO will always use the compute pool — this is the pool we set a bound on in boundedRuntime. If we want to tap into the blocking pool, we must use a different constructor: the aptly named IO.blocking.

Here’s a better snooze function:

val betterSnooze: IO[Unit] = IO.blocking(Thread.sleep(2000L))
val tenBetterSnoozes: IO[Unit] =
  List.fill(10)(betterSnooze).parSequence.void

Let’s run a few better snoozes using our boundedRuntime.

time(tenBetterSnoozes).unsafeRunSync()(boundedRuntime(numProcessors))
// res14: String = "The task took 2 seconds."

Our previous tenSnoozes task took four seconds on the boundedRuntime because it was run on the bounded compute pool. On the other hand, tenBetterSnoozes only takes two seconds: it’s run on the unbounded blocking pool.
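The same comparison can be reproduced on whatever machine you have. The sketch below assumes a cats-effect 3.x dependency (the version shown is only an example) and runs on the global runtime. BlockingDemo and timeMany are hypothetical names, and the sleep is shortened to 200 milliseconds to keep the run quick. It starts four sleeps per processor, once with IO and once with IO.blocking.

```scala
//> using dep org.typelevel::cats-effect:3.5.4
import cats.effect.IO
import cats.effect.unsafe.implicits.global
import cats.syntax.all._

object BlockingDemo {
  val numProcessors: Int = Runtime.getRuntime.availableProcessors()

  // Hypothetical helper: runs `count` copies of `task` in parallel
  // and yields the elapsed time in milliseconds.
  def timeMany(task: IO[Unit], count: Int): IO[Long] =
    List.fill(count)(task).parSequence.void.timed.map(_._1.toMillis)

  // Sleeps on a compute thread, holding it for the whole sleep.
  val plainSnooze: IO[Unit] = IO(Thread.sleep(200L))
  // Marks the sleep as blocking so it does not hold up compute work.
  val blockingSnooze: IO[Unit] = IO.blocking(Thread.sleep(200L))

  def main(args: Array[String]): Unit = {
    val count = numProcessors * 4
    val plain = timeMany(plainSnooze, count).unsafeRunSync()
    val blocking = timeMany(blockingSnooze, count).unsafeRunSync()
    println(s"IO: $plain ms, IO.blocking: $blocking ms")
  }
}
```

The IO version should need several rounds of sleeping because only a bounded number of sleeps can run at once, while the IO.blocking version should finish in roughly one round. Exact timings, and how the runtime schedules blocking work internally, depend on the cats-effect version and the machine.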

A better work-sleep balance

What happens if we interleave blocking operations with compute-intensive ones?

Let’s have a task composed of both:

val betterSnoozeAndCompute: IO[Unit] =
  List(tenFactorials, tenBetterSnoozes).parSequence.void

time(betterSnoozeAndCompute).unsafeRunSync()(
  boundedRuntime(numProcessors)
)
// res15: String = "The task took 3 seconds."

It’s much faster: the threads in the bounded compute pool no longer need to handle the Thread.sleep, and the unbounded blocking pool lets the betterSnooze task scale without limit.

The global IORuntime

We’ve explored a lot with our basicRuntime and boundedRuntime functions. But we really wanted to know about IORuntime.global.

What’s special about it?

In actual fact, you’ve already used it: the global runtime is effectively a runtime with a compute pool bounded at the number of available processors. In other words, it’s the same as the boundedRuntime(numProcessors) we settled on earlier.

Whenever you need to use a thread pool, you can rarely do better than importing IORuntime.global and making use of it.

The cats-effect IOApp does this for you, so in most cases you don’t even need to know that the IORuntime exists.
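As a small illustration of that last point, here is a minimal sketch of an IOApp that runs the better snoozes without ever mentioning a runtime. It assumes a cats-effect 3.x dependency (the version shown is only an example), and SnoozeApp is a hypothetical name.

```scala
//> using dep org.typelevel::cats-effect:3.5.4
import cats.effect.{IO, IOApp}
import cats.syntax.all._

object SnoozeApp extends IOApp.Simple {
  // Same shape as betterSnooze above: the sleep is marked as blocking.
  val betterSnooze: IO[Unit] = IO.blocking(Thread.sleep(2000L))

  val tenBetterSnoozes: IO[Unit] =
    List.fill(10)(betterSnooze).parSequence.void

  // IOApp supplies the runtime, so there is no unsafeRunSync
  // and no IORuntime in sight.
  def run: IO[Unit] =
    tenBetterSnoozes *> IO.println("All snoozes finished.")
}
```

Outside of an IOApp, for example in a REPL session or a test, importing cats.effect.unsafe.implicits.global brings the global runtime into implicit scope for unsafeRunSync.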