Implementing a Stepping Debugger in JavaScript

May 26, 2016

In my previous post I introduced Unwinder, a project which implements continuations in JavaScript. What does that have to do with stepping debuggers? Unwinder uses continuations to implement a debugger, since it can reuse the same machinery to pause code at any time.

This post could be titled "Implementing Continuations in JavaScript," but a lot more people know what a stepping debugger is. Besides, implementing a stepping debugger is a pretty cool use case for this stuff.

If you haven't read the previous article (which explains in-depth what continuations are and heavily uses stepping debuggers to help explain them), here is a live example of a stepping debugger. You can edit the code, and click on any line to set a breakpoint

function foo(x) { console.log(x); if (x <= 0) { return x; } else { return x + foo(x - 1); } } function main() { console.log(foo(3)); } main();

A working stepping debugger! Edit the code or click on any line to add a breakpoint.

Isn't that so cool!? Well... that may have been anti-climatic. It's not a fun visual or anything. But still, this is all implemented in JavaScript! No hook into the native JavaScript engine or anything! My previous post goes into a lot more detail about what you can do with Unwinder.

This works with a sophisticated transform explained in the paper "Exceptional Continuations in JavaScript". It's a very clear and approachable paper so you should read it if this is interesting. The neat thing about this technique is performance: we are still using native function scoping, variables, and more which allows the JIT to highly optimize it normally. However, the code is instrumented in ways that allow us to save the stack.

Both continuations and stepping debuggers need to arbitrarily pause the code at any time. This means that we need to save the entire stack, and be able to restore it. Let's take a look at the technique that paper describes to implement this. While the paper is very clear, it was fun for me to write down the basic process.

State Machine

The first thing we need to do is compile the code into something that we can control. That means a state machine. This is a rather common transform; it's how regenerator compiles generators, Clojure compiles go blocks, and many more. Unwinder actually started as a fork of regenerator for this reason.

Let's take a look at this code:

function foo() {
  var x = 5;
  var y = 6;
  return x + y;
}

Turning the function foo into a state machine looks like this:

function foo() {
  var $__next = 0, x, y;
  while(1) {
    switch($__next) {
      case 0:
        x = 5;
        $__next = 1;
        break;
      case 1:
        y = 6;
        $__next = 2;
        break;
      case 2:
        return x + y;
    }
  }
}

The $__next variable is the important thing here: it controls exactly where execution is happening. Note that we also hoisted x and y manually because we need to re-implement variable scoping. We compile various blocks into linear case statements so we need to maintain scopes and enforce any shadowing between var/let/const declarations.

You can already see that this is not a simple transform. We just started and the generated code is already a lot bigger than the original, but it works fine for small programs. This isn't for writing real-world production code; it's for smaller demos and experiments.

Re-implementing Native Control Constructs

Let's take a look at some more complicated code. Here we have a while statement and variable shadowing.

function foo(x) {
  while(Math.random() < .3) {
    let x = 5;
    console.log(x);
  }
}

What if we want to pause on line 3 inside the while loop? We cannot use the native while statement, of course. We must re-implement it in our state machine. A clean version of the generated code would look like this. Note how we shadowed the x passed into the function by renaming the x declared inside the while loop as x$0.

function foo(x) {
  var $__next = 0, $__t1, x$0;
  while(1) {
    switch($__next) {
      case 0:
        $__t1 = Math.random();
        $__next = 1;
        break;
      case 1:
        if(!($__t1 < .3)) {
          $__next = 5;
        }

        $__next = 2;
        break;
      case 2:
        x$0 = 5;
        $__next = 3;
        break;
      case 3:
        console.log(x$0)
        $__next = 4;
        break;
      case 4:
        $__next = 0;
        break;
      case 5:
        return;
    }
  }
}

The while loop starts at label 0 and loops around at label 4, where $__next is set back to 0. In label 1, the while condition is checked and if its falsy it skips to label 5 which exits the loop and returns from the function.

Unfortunately, we must re-implement all native control constructs. This means while, for, try, and everything else has custom logic in Unwinder (which was inherited from regenerator).

Temporary Variables

You may have noticed the weird $__t1 variable in the above code. That is a temporary variable allocated by the compiler. We must use temporary variables because we are splitting up nested expressions, and we must save the intermediate result of each expression somewhere.

For example, take the following code:

function foo(x) {
  return func1(func2(func3(x)));
}

The output would be:

function foo(x) {
  var $__next = 0, $__t1, $__t2, $__t3;
  while(1) {
    switch($__next) {
      case 0:
        $__next = 1;
        $__t3 = func3(x);
        break;
      case 1:
        $__next = 2;
        $__t2 = func2($__t3);
        break;
      case 2:
        $__next = 3;
        $__t1 = func1($__t2);
        break;
      case 3:
        return $__t1;
    }
  }
}

This allows us to pause in between each expression.

Optimizing Regenerator

I started this project by forking regenerator, which compiles generators to ES5 code. I was able to take advantage of its existing state machine transform, which does all the hard work. However, it compiles code into something like this:

var marked0$0 = [foo].map(regeneratorRuntime.mark);
function foo() {
  var x, y;

  return regeneratorRuntime.wrap(function foo~D(context$1$0) {
    while (1) switch (context$1$0.prev = context$1$0.next) {
    case 0:
      x = 5;
      y = 6;
      return context$1$0.abrupt("return", x + y);
    case 3:
    case "end":
      return context$1$0.stop();
    }
  }, marked0$0[0], this);
}

This is fine for generators because they aren't used everywhere. More importantly, you only need to break up yield expressions. Everything else can stay the same, as you see in the x and y assignments above. We need to break up every expression and pump each one through the switch statement. (We can't even fall-through case statements because we need to check for breakpoints, as you'll see below).

Wrapping every function is too expensive, and I wanted to remove the context object completely and only use local variables. We need to read $__next between every single expression, and surely it's faster to read from a local variable than off the context object (there might be ways to make sure the engine JIT's the context access appropriately, but local variables guarantees the best performance).

Wrapping the function is only needed to allocate the context object, so if we get rid of it we don't need to wrap every function. I changed the compiler to emit local variables instead, which is harder then it sounds. First, when we resume a function, how do we restore the $__next variable? It's trapped as a local variable now. We will show how to restore it below, and that technique is the only reason we can compile to local variables (normal regenerator could not make this optimization).

Temporary variables are also stored on the context, so we need to track those and declare them as local variables as well. That's why you saw $__t1, $__t2, and $__t3 in the "Temporary Variables" section above.

Doing all of this work means we that we generate code more like the example state machines you saw above. There are a few cases where we need direct access to the virtual machine, like checking if breakpoints exist at a certain expression, and right now the code assumes that a global variable VM exists. There are probably better ways to do that.

Saving and Restoring

Now that we have a state machine, we need a way to save all the state and restore it. This seems like a tricky problem, but the paper presents an interesting solution: exceptions.

Exceptions are the only way we can divert control flow. They allow us to generate code which use native calling convections: function calls are normal function calls. Exceptions allow us to stop the program and inspect the state machine.

Let's look at the first example again, but with a debugger statement:

function foo() {
  var x = 5;
  var y = 6;
  debugger;
  return x + y;
}

The trick is to wrap the entire function body in a try/catch statement, and within the catch save the values of $__next, x, and y. We need to save not only the state machine, but also the values of local variables so that we can restore them later.

Here's how the output would look:

function foo() {
  var $__next = 0, x, y;
  try {
    while(1) {
      switch($__next) {
        case 0:
          x = 5;
          $__next = 1;
          break;
        case 1:
          y = 6;
          $__next = 2;
          break;
        case 2:
          $__next = 3;
          throw new $ContinuationExc();
        case 3:
          return x + y;
      }
    }
  }
  catch(e) {
    if(!(e instanceof $ContinuationExc))
      e = new $ContinuationExc({ error: e })

    e.save($__next, { x: x, y: y });
    throw e;
  }
}

In label 2, which is what the debugger statement compiled to, we throw what's called a "continuation exception". This is caught by our catch statement, where we record the current state in it and throw it again.

If an error occurs, we will catch that as well and save the error object in our continuation exception.

In our virtual machine, we run all code within a top-level try/catch which switches into the "paused" state when continuation exceptions are thrown and saves them for a future resumption.

Note: you might be worried about the performance of throwing exceptions around, and you aren't wrong. This technique favors the performance of normal code at the cost of the performance of continuations. For some use cases, like debuggers, the performance of pausing and resuming does not matter. For other use cases (where continuations are frequently saved and restored), this technique will be too slow. But it's still a great way to teach continuations.

The Stack

You probably wondering about function calls and the stack. What happens to functions on the stack? We can't reliably restore a program without restoring the stack as well, so we must capture the entire stack.

It's natural for the above code to capture the whole stack. In the continuation exception, instead of saving a single function's state, make it save a list of frames (the stack). Let's rename the save method to pushFrame, introduce a $Frame object, and it looks like this:

function foo() {
  var $__next = 0, x, y;
  try {
    // while & switch statements ...
  }
  catch(e) {
    if(!(e instanceof $ContinuationExc))
      e = new $ContinuationExc({ error: e })
    e.pushFrame(new $Frame($__next, { x: x, y: y }));
    throw e;
  }
}

function bar() {
  var $__next = 0;
  try {
    // while & switch statements ...
    foo();
    // ...
  }
  catch(e) {
    if(!(e instanceof $ContinuationExc))
      e = new $ContinuationExc({ error: e })
    e.pushFrame(new $Frame($__next, {}));
    throw e;
  }
}

This shows two functions, bar which calls foo. When foo pauses, it creates a continuation exception, saves the frame, and re-throws it. bar will then capture it, save its own frame, and re-throw it.

As mentioned before, our virtual machine installed a top-level try/catch which will grab the accumulated call stack and save it.

This way the continuation exception will have all the frames of the current stack. We could serialize the continuation exception into this:

[["foo", 2, { x: 5, y: 6 }]
 ["bar", 10, {}]]

The stack starts with the innermost frame and goes out from there. The $__next value is saved as the second entry, and local variables as the third. This is all the information we need to restore the entire program!

Restoring

Now it's time to take that information and "resume" the program by restoring the call stack. This is also a bit tricky, but I think it's ingenious.

Let's start with this simple function again:

function foo() {
  var x = 5;
  var y = 6;
  debugger;
  return x + y;
}

We need to restore the position of the state machine ($__next), x, and y, and we have all this information in the frame object. The problem is how to give the function the frame to restore.

By toggling an internal flag doRestore, we tell functions to do a restore. They need a reference to the virtual machine, and right now they assume a global VM variable exists. There might be a better way to achieve that. To restore, they pop a frame off the saved stack in the VM and apply it. This new code exists at the top of the function:

function foo() {
  var $__next = 0, x, y;

  try {
    if(VM.doRestore) {
      var $__frame = VM.popFrame();
      $__next = $__frame.next;
      x = $__frame.state.x;
      y = $__frame.state.y;
    }

    while(1) {
      switch($__next) {
        // ... code ...
      }
    }
  }
  catch(e) {
    // ... save frame ...
  }
}

All of the local variables, including $__next, are restored so we can resume execution as if nothing happened.

The popFrame method on the virtual machine is pretty simple: pop a frame, and if there are no more frames left, set doRestore to false to turn of restoration mode:

Machine.prototype.popFrame = function() {
  var r = this.stack.pop();
  if(!this.stack.length) {
    this.doRestore = false;
    this.stack = null;
  }
  return r;
};

But we still have a problem with this: what about the stack? We need to restore the whole stack, and this only restores the first function. More trickery is needed for that.

The trick is to also save a reference to the corresponding function in the frame itself when saving in the catch block. This allows us to call it again when restoring (we also save this to call it with the right context). With these new fields on the frame, we can restore the entire stack with this code:

function bar() {
  var $__next = 0, $__t1;

  try {
    if(VM.doRestore) {
      var $__frame = VM.popFrame();
      $__next = $__frame.next;
      var $__child = $__frame.child;

      // This is new! It will call the next function, which will also
      // pop off a frame and restore
      if($__child) {
        $__frame.state.$__t1 = $__child.fn.call($__child.thisPtr);
      }

      $__t1 = $__frame.state.$__t1;
    }

    while(1) {
      switch($__next) {
        // ... code ...
      }
    }
  }
  catch(e) {
    // ... save frame ...
  }
}

We actually just call the function which re-creates the stack. The called function will also restore itself, cascading the restoration until the entire stack exists again. Without going into details, the compiler knows which temporary variable to store the result in, which is $__t1 in this case. The code within the switch is waiting for the result in this temporary variable.

Live Breakpoints

If you want the ability to set breakpoints on a running script, you need to add another check. Supporting debugger statements is easy: just compile it to a throw new $ContinuationExc(). However, to support live breakpoints we need to check if a breakpoints exists at every single step of execution.

We can do this in the while(1) loop which is running the state machine. Again, we need access to the virtual machine to check for breakpoints. All it takes is this:

while(1) {
  if (VM.breakpoints[1][$__next] !== undefined)
    throw new $ContinuationExc();

  switch($__next) {
      // ... code ...
  }
}

Don't worry about the structure of VM.breakpoints; it is broken up in a way that we can track functions separately (hence the [1] lookup). It gives us the ability to stop at any point a script by resolving a line/column location to a specific label in a function's state machine, and setting a breakpoint there.

That's It!

Although it's large transform, it gives us everything we need to arbitrarily save and resume the stack. It's not something you should use in production, but it's useful for demoing or prototyping ideas without having to know the internals of a real JS engine.

I showed a small function and incrementally explained the compiler output. To see the final output which has all the saving, restoration, and breakpoint machinery, check out this gist.

If you want to learn more, check out Unwinder, particularly the compiler and the virtual machine code. You can see the implementation of $ContinuationExc, $Frame, and how it runs programs.

You can also learn more about continuations (and see demos that this enables) in my last post.

Output:

Stack:

continuations