Effectively using Dart isolates - part 2

The story so far

In part 1 of our 'How and when to use isolates' blog, we found a fast way to send data to, and return data from, an isolate, avoiding jank during the isolate's startup and shutdown.

We described these as the 'fast send' and 'fast return' paths.

At the end of this post, we include a benchmarking tool to help you identify jank-inducing functions.

Unfortunately, it turned out that the 'fast send path' was exploiting a bug in Dart 2.19.

The 'fast return path' described in part 1 was, and still is, correct; only the 'fast send path' was a problem.

There is still a chance that a fast send path (for immutable objects) will work, as the Dart team is considering an optimisation that would allow the technique I covered in my original post.
You can track progress on that work here:
Issue 51334
Upvote this issue to show your support for having a viable 'fast send path' for Isolates.

An alternate narrative

Whilst sending data to an isolate was an important topic of part 1, I now think it isn't the right technique to use.

The poor performance characteristics of sending data to an isolate demand that we look for an alternative solution.

In part 2, we are going to look at techniques that make using isolates even faster; and yes, faster than the version that exploited a Dart bug.

Before we get into the details, I think it's worth stopping for a moment to consider when we should use isolates.

Not the right metric

There was a recent Reddit poll on r/FlutterDev that asked what triggers people to use an isolate.

One of the possible answers was:

  • when the frame rate drops below 60 fps.

Frame drops are more colloquially known as jank.

The problem with this metric is that it's not universal.

Using a metric is great; using one that is meaningless in the real world is not.

Every device has different hardware, and it's fairly normal for a mobile OS to throttle your CPU cores when the battery is running low.

So if you are measuring the frame rate in your dev environment, or on the latest iPhone with a fully charged battery, the number tells you almost nothing about what your users will experience.

To understand when to use an isolate we need to step back and have a look at the bigger picture.

Historically, isolates didn't work

Historically, the problem with isolates was the cost of exchanging data with the isolate.

The simple act of transferring data to/from an isolate induced jank, making isolates largely unusable for what is arguably their most important use case: moving heavy work off the primary isolate without dropping frames.

The fast return path (Isolate.exit) has solved the return problem, but sending data to an isolate still induces jank.
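For reference, here is a minimal sketch of the fast return path using plain dart:isolate. The worker receives a tiny message, does the heavy work, and hands its result back with Isolate.exit, which can pass the result to an isolate in the same group without copying it. Isolate.run, used in the examples that follow, is built on the same mechanism. The names buildSquares and _squaresWorker are hypothetical, and error handling is omitted for brevity.

```dart
import 'dart:isolate';

/// Hypothetical worker entry point. The incoming message is tiny
/// (a reply port and a count), so sending it costs almost nothing.
void _squaresWorker(List<Object?> message) {
  final replyPort = message[0] as SendPort;
  final count = message[1] as int;

  // The heavy work happens here, off the primary isolate.
  final squares = List<int>.generate(count, (i) => i * i);

  // Fast return path: send the result and terminate this isolate.
  Isolate.exit(replyPort, squares);
}

/// Spawns a short-lived isolate and waits for its single reply.
/// Error handling is omitted (Isolate.run handles it for you).
Future<List<int>> buildSquares(int count) async {
  final port = ReceivePort();
  await Isolate.spawn(_squaresWorker, <Object?>[port.sendPort, count]);
  // port.first closes the port once the reply arrives.
  return (await port.first) as List<int>;
}
```

Calling await buildSquares(5) yields the list [0, 1, 4, 9, 16], built entirely in the secondary isolate.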

So how do we solve this problem?

The simple answer is: don't send data to an isolate!

OK, that's a little impractical. A better rule is: don't send large amounts of data (more than about 1MB) to an isolate.

The standard approach to using isolates breaks this rule:

  • fetch some data from the internet

  • pass it to an isolate for processing

  • pass the data back to the primary isolate.

The right approach is:

  • send a url to an isolate

  • have the isolate fetch the data from the internet

  • process the data

  • pass the processed data back to the primary isolate as Dart objects (NOT JSON).

We have now completely sidestepped the 'slow send path' and its associated jank.

Let’s look at a concrete example.

First, we implement a class called RemoteServices that exposes a method, fetchImage.

import 'dart:isolate';

import 'package:http/http.dart' as http;

import 'image.dart';

class RemoteServices {
  /// Public API to fetch an image.
  /// Spawns an isolate (via Isolate.run) and calls [_networkFetchImage] in it.
  Future<Image> fetchImage(Uri pathToImage) =>
      Isolate.run(() => _networkFetchImage(pathToImage));

  /// Fetch the image and apply post processing to it.
  /// This runs entirely in the secondary isolate.
  Future<Image> _networkFetchImage(Uri pathToImage) async {
    final client = http.Client();
    try {
      final request = http.Request('GET', pathToImage);
      final response = await client.send(request);

      final image = Image();

      // ignore: prefer_foreach
      await for (final data in response.stream) {
        image.append(data);
      }

      image.fetchCompleted = DateTime.now();
      image.postProcess();

      return image;
    } finally {
      // Always release the connection, even if the fetch fails.
      client.close();
    }
  }
}

// image.dart
class Image {
  List<int> data = <int>[];

  DateTime fetchCompleted = DateTime.now();

  void append(List<int> data) {
    this.data.addAll(data);
  }

  String show() => '${data.length}';

  void postProcess() {
    // Do some work to process the image at no cost to the primary isolate.
  }
}

Next, we implement the code that calls RemoteServices from our primary isolate:

import 'package:onepub_isolates/onepub_isolates.dart';

Future<void> main(List<String> args) async {
  final remoteService = RemoteServices();
  const pathToImage = 'https://sample-videos.com/img/Sample-jpg-image-30mb.jpg';
  final imageUri = Uri.parse(pathToImage);
  final image = await remoteService.fetchImage(imageUri);

  print('Fetch completed in secondary isolate at: ${image.fetchCompleted}');

  print('Image Info: ${image.show()}');
}

We now have a simple, performant, no-jank solution for fetching and processing an image.

All network requests should be done entirely in an isolate.

The same goes for any JSON data requests. Simply have your isolate fetch the JSON, convert it into Dart objects and return them to the primary isolate.
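A minimal sketch of that pattern, using dart:io's HttpClient instead of package:http so that it has no dependencies. The Student class, the JSON shape and the fetchStudents/parseStudents names are hypothetical. The point is that the raw JSON string never crosses the isolate boundary: only the Uri goes in and only the decoded objects come back.

```dart
import 'dart:convert';
import 'dart:io';
import 'dart:isolate';

/// Hypothetical model class; adapt it to your own JSON shape.
class Student {
  Student(this.name, this.rollNum);
  final String name;
  final int rollNum;
}

/// Converts a JSON array such as [{"name": "Me", "rollNum": 11}]
/// into Dart objects.
List<Student> parseStudents(String body) {
  final decoded = jsonDecode(body) as List<dynamic>;
  return [
    for (final item in decoded.cast<Map<String, dynamic>>())
      Student(item['name'] as String, item['rollNum'] as int),
  ];
}

/// Fetches and parses the JSON entirely inside a secondary isolate.
Future<List<Student>> fetchStudents(Uri uri) => Isolate.run(() async {
      final client = HttpClient();
      try {
        final request = await client.getUrl(uri);
        final response = await request.close();
        if (response.statusCode != HttpStatus.ok) {
          throw HttpException('HTTP ${response.statusCode}', uri: uri);
        }
        final body = await response.transform(utf8.decoder).join();
        return parseStudents(body);
      } finally {
        client.close();
      }
    });
```

Keeping parseStudents as a separate top-level function also makes the conversion easy to unit test without a network.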

During testing for this blog I wrote an isolate that created one billion (simple) student objects, amounting to over 16GB of memory, and returned the data to the main isolate with zero jank, proving that the 'fast return path' scales.

We now have a general solution to fetching and processing large chunks of data without inducing jank.

Remember that the same approach works for data on the user's device. If you want to upload a file from the user's device, simply pass the file's path to an isolate and have that isolate read the file and upload it to your server.

If you want to show progress as the file is uploaded, read the next section on Data Streams.

The solution so far targets a fairly specific domain. Don't get me wrong, it is probably the most common one, but there are other types of problems that need a different approach.
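A minimal sketch of the local-file variant, assuming the work can be done on the file's bytes. Counting lines stands in for your own processing or upload step, and countLines is a hypothetical name. Only the path string is sent in, and only a small result comes back.

```dart
import 'dart:io';
import 'dart:isolate';

/// Reads and processes a file entirely inside a secondary isolate.
/// Only [path] is sent in; only the resulting count comes back.
Future<int> countLines(String path) => Isolate.run(() async {
      var lines = 0;
      // Stream the file in chunks rather than loading it all at once.
      await for (final chunk in File(path).openRead()) {
        for (final byte in chunk) {
          if (byte == 0x0A) lines++; // '\n'
        }
      }
      return lines;
    });
```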

Data Streams

If your app is receiving a stream of data, restarting an https connection each time you want to receive a packet is expensive, so we want to limit how often we do it. In these cases, you may want to start an isolate that remains running and doesn't use Isolate.exit.

By keeping the isolate alive, we keep the https connection alive and can use HTTP/2 streams for bidirectional communication with our server. You might use this when implementing a chat app, a stock trading app, a collaborative app or any other real-time feed.

In this model, jank can be avoided by breaking the data sent between the secondary isolate (that holds the https connection) and the primary isolate into small chunks (less than 1MB).

This technique can be used anywhere you need to process a large amount of data: have the isolate return it in chunks. Even when rendering a list, you can have the isolate pass back one item (or a few) at a time, adding each to your list, to avoid jank-inducing loads.

By constraining the size of the data packets, the copy operation used to move data between the isolates is fast enough not to induce jank.
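A minimal sketch of chunking with plain dart:isolate, assuming the data can be split into independent pieces. The worker sends each chunk as a separate message, followed by a null end-of-data marker, so each copy between isolates stays small. The streamInChunks and _chunkWorker names and the 1,000-item default chunk size are illustrative assumptions, not tuned values.

```dart
import 'dart:isolate';
import 'dart:math';

/// Hypothetical worker: produces [total] items and sends them back
/// [chunkSize] at a time, then a null end-of-data marker.
void _chunkWorker(List<Object?> args) {
  final replyPort = args[0] as SendPort;
  final total = args[1] as int;
  final chunkSize = args[2] as int;

  for (var start = 0; start < total; start += chunkSize) {
    final end = min(start + chunkSize, total);
    // Each send copies only one small chunk, so each copy is cheap.
    replyPort.send(List<int>.generate(end - start, (i) => start + i));
  }
  replyPort.send(null); // end-of-data marker
}

/// Runs [_chunkWorker] in a secondary isolate and exposes its
/// output to the primary isolate as a stream of small chunks.
Stream<List<int>> streamInChunks(int total, {int chunkSize = 1000}) async* {
  final port = ReceivePort();
  await Isolate.spawn(
      _chunkWorker, <Object?>[port.sendPort, total, chunkSize]);
  try {
    await for (final message in port) {
      if (message == null) break; // the worker has finished
      yield message as List<int>;
    }
  } finally {
    port.close();
  }
}
```

In a Flutter app, the returned stream could feed a StreamBuilder, appending each chunk to the list as it arrives.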

This same process can be used to show the progress of an upload in your UI.

We mentioned earlier uploading a file from the user's phone. In this scenario, we don't use Isolate.exit; instead, the spawned isolate sends back small progress messages as it uploads the file to our server. Again, we have completely bypassed the need to move large chunks of data between isolates.

In our example, we use the stream_isolate package by Andrew Ackerman, which I quite like (except for the use of dynamic in the spawn argument, which I believe could be replaced with a generic type).

In a Flutter app, you might use a StreamBuilder to take data from the isolate's stream and show the progress.

Our API might look something like this:

import 'dart:async';
import 'dart:io';

import 'package:stream_isolate/stream_isolate.dart';

void main() async {
  final uploader = Uploader();

  await uploader.sendFile('bin/simple');

  // print the progress as the file is uploaded
  uploader.streamIsolate.stream.listen((progress) => print('$progress%'));
}

class Uploader {
  late final StreamIsolate<int> streamIsolate;

  // Send the file to the server.
  Future<void> sendFile(String pathToFile) async {
    // spawn an isolate to upload the file
    streamIsolate =
        await StreamIsolate.spawn<int>((dynamic p) => _sendFile(pathToFile));
  }

  // upload the file, reporting progress via the returned stream.
  Stream<int> _sendFile(dynamic pathToFile) async* {
    final connection = Connection();

    final file = File(pathToFile as String);
    final size = await file.length();
    var sent = 0;

    await for (final data in file.openRead()) {
      sent += data.length;
      connection.sendData(data);
      // report progress
      yield (sent / size * 100).floor();
    }
  }
}

// placeholder for class that connects via http
// and uploads the file in parts.
class Connection {
  void sendData(List<int> data) {}
}

Pools

The final technique we are going to look at is the use of Isolate Pools.

Within a mobile app, I suspect that a pool offers little benefit, as most apps aren't doing much threading (or should we call it isolating).

The premise behind an isolate pool is that it takes time to spin up an isolate. On a well-specced desktop PC this is about 0.6ms; on a phone, isolate start time will be longer.

By using a pool of isolates, we can create a number of isolates up front (or grow the pool as needed) and only bear the cost of starting the isolates once.
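If you want to check the spawn cost on your own hardware (ideally your slowest target device), a rough sketch is to time round trips through Isolate.run with a trivial computation. This measures spawn, run and exit together, so treat the figure as an upper bound on startup cost. averageSpawnTime is a hypothetical helper name.

```dart
import 'dart:isolate';

/// Times [runs] round trips through Isolate.run with a trivial
/// computation and returns the average duration of one round trip.
Future<Duration> averageSpawnTime({int runs = 20}) async {
  final watch = Stopwatch()..start();
  for (var i = 0; i < runs; i++) {
    // Spawn an isolate, compute a constant, exit with the result.
    await Isolate.run(() => 0);
  }
  watch.stop();
  return Duration(microseconds: watch.elapsedMicroseconds ~/ runs);
}
```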

Where things get interesting is when the isolates hold an expensive resource, such as an https connection, a db connection (NEVER connect directly to a remote db from a mobile app) or the result of expensive initialisation logic.

By having a pool, we now get to reuse our https connection or db connection rather than having to re-establish them each time we need to perform a task.
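A minimal sketch of a fixed-size pool with round-robin dispatch, using only dart:isolate. Each worker runs its expensive setup once (here a hypothetical _expensiveInit that just builds a lookup table, standing in for opening a connection) and then serves requests until it is closed. IsolatePool, the [key, replyPort] message format and the lookup task are assumptions for illustration; a production pool would also need error handling and load-aware scheduling, and the connection caveats below still apply.

```dart
import 'dart:isolate';

/// Hypothetical stand-in for expensive per-isolate setup
/// (e.g. opening a connection). Here it just builds a lookup table.
Map<String, int> _expensiveInit() => {'a': 1, 'b': 2, 'c': 3};

/// Worker entry point: runs the setup once, then serves requests.
/// A request is [String key, SendPort replyTo]; null means shut down.
void _poolWorker(SendPort handshake) {
  final resource = _expensiveInit();
  final requests = ReceivePort();
  handshake.send(requests.sendPort);

  requests.listen((message) {
    if (message == null) {
      requests.close(); // no open ports remain, so the isolate exits
      return;
    }
    final request = message as List<Object?>;
    final key = request[0] as String;
    final replyTo = request[1] as SendPort;
    replyTo.send(resource[key] ?? -1);
  });
}

/// A fixed-size pool that dispatches work round robin.
class IsolatePool {
  IsolatePool._(this._workers);

  final List<SendPort> _workers;
  int _next = 0;

  /// Spawns [size] workers up front, so startup is paid only once.
  static Future<IsolatePool> start(int size) async {
    final workers = <SendPort>[];
    for (var i = 0; i < size; i++) {
      final handshake = ReceivePort();
      await Isolate.spawn(_poolWorker, handshake.sendPort);
      workers.add((await handshake.first) as SendPort);
    }
    return IsolatePool._(workers);
  }

  /// Sends [key] to the next worker and waits for its reply.
  Future<int> lookup(String key) async {
    final worker = _workers[_next];
    _next = (_next + 1) % _workers.length;
    final reply = ReceivePort();
    worker.send(<Object?>[key, reply.sendPort]);
    return (await reply.first) as int;
  }

  /// Asks every worker to shut down.
  void close() {
    for (final worker in _workers) {
      worker.send(null);
    }
  }
}
```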

But of course, nothing is ever simple. You probably don't want to hold https connections open for an extended period, as they consume considerable resources on your server. Managing those connections needs to be part of the design of your app's server access patterns.

Be careful holding an https connection open for extended periods. With lots of users, you can DDoS your own server.

It's also worth remembering that when your app is backgrounded and reawakened, the https connection that you ‘think’ is live may well have timed out on the server and as such you will need to reconnect.

Despite all of these issues, isolate pools still have their place in solving a significant class of problems.

Conclusion

With the introduction of Isolate.exit and the associated optimisations, Isolates have gone from being fairly useless on a mobile device, to being a core tool in app performance and the elimination of jank.

As we have seen, we should now be placing any heavy workloads in an isolate as a best practice rather than waiting for it to cause jank. This should be a core design feature of your application's architecture.

Don’t try to pass large chunks of data into the isolate but rather have the isolate fetch and process the data before passing it back to the primary isolate.

Wrapping an existing method into an isolate is now easy, providing that it doesn’t need to access any global or injected data. If it does, then modify the method so these can be passed in.
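The reason is that each isolate has its own copy of global (top-level and static) state, initialised afresh, so changes made in the primary isolate are not visible inside Isolate.run. A minimal sketch, where taxRate and the two price functions are hypothetical:

```dart
import 'dart:isolate';

/// Hypothetical global configuration, changed at runtime by the app.
var taxRate = 0.10;

/// Before: reads the global. Inside a new isolate this sees the
/// initial value (0.10), not whatever the primary isolate set.
double priceWithTaxGlobal(double price) => price * (1 + taxRate);

/// After: the dependency is passed in explicitly.
double priceWithTax(double price, double rate) => price * (1 + rate);

Future<double> priceInIsolate(double price) {
  // Copy the current value into a local so the closure sends it along.
  final rate = taxRate;
  return Isolate.run(() => priceWithTax(price, rate));
}
```

If the primary isolate sets taxRate to 0.20, priceInIsolate(100) uses 0.20, while running priceWithTaxGlobal(100) via Isolate.run still uses 0.10.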

Test for Jank.

Use the supplied benchmark tool (see below) or the Dart profiling tools to ensure that your methods don't induce jank.

As stated earlier in this post, measuring frame drops on your dev machine is a meaningless metric. However, if you do a few runs on your dev machine and on an actual device, you should be able to derive a ratio between the two.

For example, 5ms of lag on my dev box equates to 12ms of lag on my device.

You can then use our benchmarking tool to target any heavy methods: on my dev box, any method that exceeds 5ms should be considered for an isolate.
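The arithmetic behind that rule can be sketched as follows, assuming lag scales roughly linearly between the two machines (a simplification). With the numbers above, the device/dev ratio is 12 / 5 = 2.4, so a 16ms frame at 60Hz corresponds to about 16 / 2.4 ≈ 6.7ms on the dev box; a 5ms threshold leaves some headroom below that. devBoxBudgetMs is a hypothetical helper.

```dart
/// Hypothetical helper: converts a frame budget on the target device
/// into the equivalent budget on the dev box, using one paired
/// measurement of the same work on both machines.
/// Assumes lag scales linearly between the two, a simplification.
double devBoxBudgetMs({
  required double frameBudgetMs,
  required double devLagMs,
  required double deviceLagMs,
}) =>
    frameBudgetMs * devLagMs / deviceLagMs;
```

Under the same assumptions, an 8ms frame at 120Hz (the FrameRate.hz120 interval in the benchmark tool) gives only about 3.3ms on the dev box, so the threshold should be tightened for high refresh rate devices.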

Final Words

Isolates in Dart have finally come of age; there is still more to do, but we definitely have lift-off.

Going forward, I still think we are going to need threads to make Dart a full-fledged server-side language, but for desktop and mobile we now have a nice solution.

Benchmarking for jank

The following is a benchmark tool that allows you to test any function for what sort of latency it will cause in the primary isolate, as well as reporting any potential frame drops.

When benchmarking, you need to understand that your app's performance is going to be affected by your development environment. The only way to truly benchmark an app is to run it on an actual device, and remember that it should be the slowest device your app is likely to run on.

In the following example, we use the benchmark tool to test the lotsOfStudents method, which allocates a list of 100 million students.

This example allocates a large amount of memory, so if your PC is short of RAM, reduce the number of students being allocated before running it.

The benchmark runs the method directly and via an isolate.

Here is the output of the two runs (AOT compiled).

Run the method in the primary isolate:

example/benchmark 
Framerate: hz60, Frame Interval: 16 ms
Total Runtime: 0:00:01.002000
JANK was induced: Frame Drops: 63 Avg Lag: 1001ms Max Lag: 1001 ms
Student Count: 100000000
First Student: Name: Me, rollNum: 11

Run the method in a secondary isolate:

Framerate: hz60, Frame Interval: 16 ms
Total Runtime: 0:00:01.139965
All Good, Max Lag: 8ms, Avg Lag: 2 ms Probes: 564
Student Count: 100000000
First Student: Name: Me, rollNum: 74

We can see from the example that simply moving the jank-inducing method into an isolate completely eliminated the jank.

import 'dart:isolate';
import 'dart:math';

import 'package:onepub_isolates/src/measure_frame_drops.dart';

// Top-level constants cannot be marked static.
const studentsToAllocate = 100000000;

void main() async {
  await jankInducingMethod();

  // give the GC a moment to run.
  await Future<void>.delayed(const Duration(milliseconds: 1));

  await nonJankInducingMethod();
}

// Place the allocation into an isolate.
Future<void> nonJankInducingMethod() async {
  final measurement = await measureFrameDrops(
      frameRate: FrameRate.hz60, () async => Isolate.run(lotsOfStudents));

  measurement.show();

  final results = measurement.result!;
  print('Student Count: ${results.length}');
  print('First Student: ${results.first.details}');
}

/// Measure lotsOfStudents which doesn't yield
Future<void> jankInducingMethod() async {
  final measurement = await measureFrameDrops(
      frameRate: FrameRate.hz60, () async => lotsOfStudents());

  measurement.show();

  final results = measurement.result!;
  print('Student Count: ${results.length}');
  print('First Student: ${results.first.details}');

  print('');
}

Future<List<Student>> lotsOfStudents() async {
  final rand = Random.secure();

  /// Allocate a list of 100 million entries. Note that List.filled
  /// stores the same Student instance in every slot.
  return List.filled(studentsToAllocate, Student(rand.nextInt(100), 'Me'));
}

class Student {
  Student(this.rollNum, this.name);
  int rollNum; // not final
  final String name;

  String get details => 'Name: $name, rollNum: $rollNum';
}

Benchmark tool

Code for the benchmarking tool

import 'dart:async';

enum FrameRate {
  /// test for frame drops on a device with a 60hz refresh cycle
  hz60(60),

  /// test for frame drops on a device with a 120hz refresh cycle
  hz120(120);

  const FrameRate(int hz) : durationMS = 1000 ~/ hz;

  /// Duration in milliseconds of a single frame, derived
  /// from the FrameRate Hz
  final int durationMS;
}

typedef Measurable<T> = Future<T> Function();

/// Call this method to measure any lag induced by running [measurable]
/// and report details of frame drop inducing events.
Future<Measurement<T>> measureFrameDrops<T>(Measurable<T> measurable,
    {required FrameRate frameRate}) async {
  final runtime = Stopwatch()..start();

  final completer = Completer<bool>();

  final measurement = Measurement<T>(frameRate: frameRate);

  final frameWatch = Stopwatch()..start();

  // Run the function we are measuring.
  final measure = measurable();

  // Record the result and flag completion. Errors are swallowed here
  // and re-surfaced to the caller when we await [measure] below.
  unawaited(measure.then<void>((value) {
    measurement.result = value;
  }, onError: (Object _) {}).whenComplete(() => completer.complete(true)));

  while (!completer.isCompleted) {
    // give async tasks a chance to run.
    await Future<void>.delayed(const Duration(milliseconds: 1));

    final elapsed = frameWatch.elapsed;
    measurement._track(elapsed);
    frameWatch.reset();
  }

  measurement.runtime = runtime.elapsed;

  // Propagate any error thrown by [measurable] to the caller.
  await measure;

  return measurement;
}

class Measurement<T> {
  Measurement({required this.frameRate});

  FrameRate frameRate;
  // Each occurrence of a frame drop.
  final framedrops = <Duration>[];
  // How long the [measurable] took to run
  late final Duration runtime;
  Duration maxLag = Duration.zero;
  Duration totalLag = Duration.zero;
  int iterations = 0;

  T? result;

  /// Get the total duration of all frame drops
  Duration get total => framedrops.isNotEmpty
      ? framedrops.reduce((a, b) => a + b)
      : Duration.zero;

  /// Get the average duration of frame drops
  Duration get averageFrameDrop => framedrops.isNotEmpty
      ? Duration(microseconds: total.inMicroseconds ~/ framedrops.length)
      : Duration.zero;

  /// Get the maximum duration of a frame drop
  Duration get maxFrameDrop => framedrops.isNotEmpty
      ? framedrops.reduce((a, b) => a > b ? a : b)
      : Duration.zero;

  /// True if the measurable method caused a frame drop
  bool get hasDroppedFrames => framedrops.isNotEmpty;

  /// The total number of frames dropped.
  int get droppedFrames => framedrops.length;

  /// The average lag induced by the measurable method.
  /// This includes all lag, not just lag excessive enough
  /// to induce a frame drop.
  Duration get averageLag => iterations == 0
      ? Duration.zero
      : Duration(microseconds: totalLag.inMicroseconds ~/ iterations);

  /// The maximum duration of any lag event.
  /// (Individual frame drops are capped at one frame interval,
  /// so the longest lag is tracked separately in [maxLag].)
  Duration get maxBlocked => maxLag;

  void _track(Duration elapsed) {
    totalLag = totalLag + elapsed;
    iterations++;

    if (elapsed > maxLag) {
      maxLag = elapsed;
    }

    if (elapsed > Duration(milliseconds: frameRate.durationMS)) {
      var elapsedMs = elapsed.inMilliseconds;

      // a lag of > frameRate.durationMS represents multiple frame
      // drops, so register each one.
      while (elapsedMs > frameRate.durationMS) {
        framedrops.add(Duration(milliseconds: frameRate.durationMS));
        elapsedMs -= frameRate.durationMS;
      }
      // last partial frame.
      if (elapsedMs > 0) {
        framedrops.add(Duration(milliseconds: elapsedMs));
      }
    }
  }

  /// Display the results.
  void show() {
    print(
        'Framerate: ${frameRate.name}, Frame Interval: ${frameRate.durationMS} ms');
    print('Total Runtime: $runtime');

    if (hasDroppedFrames) {
      print('JANK was induced: Frame Drops: $droppedFrames '
          'Avg Lag: ${averageLag.inMilliseconds}ms '
          'Max Lag: ${maxLag.inMilliseconds} ms');
    } else {
      print('All Good, Max Lag: ${maxLag.inMilliseconds}ms, '
          'Avg Lag: ${averageLag.inMilliseconds} ms '
          'Probes: $iterations');
    }
  }
}