Summer of Code Recap


As many of you are aware, I have been working on compiling Guile Scheme to JavaScript this summer, as part of the Google Summer of Code. This post serves to bookend my work for the year.

Before I go any further, I have to give my thanks to my mentor Chris Webber, without whom this project would have fizzled out weeks ago; Google and the Gnu Project, naturally, for providing the Summer of Code and allowing me to work on this project; and our fearless leader, Andy Wingo, for answering a wide variety of stupid questions.

Project Aims

For a full introduction to the project, you can of course refer back to my project proposal[pdf], but very briefly my hopes for this summer were:

  1. To rewrite the previous version of my compiler from the previous CPS representation to use the new representation "CPS Soup" representation.
  2. To completely port ice-9/boot-9.scm (our basic "prelude") to JavaScript, and in particular, to support the Guile Module system.
  3. To handle Proper Tail Calls by use of the Cheney on the MTA strategy.
  4. To include a new guild script for bundling compiled JS files with their dependencies.

What was Achieved

You can find all of my work on the compile-to-js-2017 branch of my Gitlab. A full list of the commits can be found here, but I will summarise the changes now:

Compile Guile CPS Soup to JavaScript

When I was working on my initial attempt at compiling Guile to JavaScript, two years ago, Guile used a different CPS representation as its intermediate language. The initial experiments with the CPS Soup representation occurred while that work was ongoing, but as it was not considered "stable", the plan was not to move to this representation until after I had completed my other objectives.

Now, however, CPS Soup is the IL of Guile, and so the first task that was accomplished was to move to this representation. Since I had already created my own JS-IL as a target, I did not need to make any changes to the code generation side from JS-IL to JavaScript proper. The main change was to reconstruct the nested scope structure that was implicit in the dominator structure that Guile made available.

The full code for the compiler is split into several sections, corresponding to different stages in the compiler pipeline.

CPS to JS-IL Compiler

These modules constitute the compiler from CPS to my JS-IL intermediate language.

JS-IL to JavaScript Compiler

These modules constitute a somewhat ad-hoc intermediate representation as a target for the CPS compiler. It differs from JavaScript, e.g., by continuing to separate continuations and functions, and a slightly specialised function representation to handle Guile's complicated notion of procedure arity.

JavaScript Representation

This is primarily the representation of JavaScript as Scheme Records. This is separate from the representation of JavaScript Guile already has in the form of (language ecmascript) primarily to avoid a circularity when Guile determines which compilers to run in the pipeline, as recommended by Andy Wingo.

A pre-amble capable of running through boot-9

In order to run Guile, it is not enough to be able to compile Scheme (or indeed any other language supported by Guile) forms to JavaScript, we also need to incorporate as much of Guile's runtime as possible. This involves implementing VM primitives (such as you might see in vm-engine.c); basic Guile types like Symbols, Pairs, and Structs; as well as many of the functions that Guile implements in C rather than Scheme.

Although I certainly did not implement all of the functionality Guile achieves, I was able to implement sufficiently many (including what amounts to a port of much of module.c) that one can successfully run though ice-9/boot-9.scm from start to finish.

This took up the bulk of the time I spent on this project, due to the size of the compiled output of boot-9.scm, and my own difficulties debugging the bootstrap process. More on this below.

The code can be found at module/language/js-il/runtime.js

A linking script for JavaScript

Since we are using the (language ...) infrastructure, we can take advantage of the existing guild compile script for compiling to JavaScript, we simply need to use the --to switch. However, this does not produce a file which you can just load up without any additional work, especially if you are working with multiple modules.

In order to make it easier to deal with this, I have included a guild jslink script, which can be used to package up a "main" script along with the runtime.js and its dependencies. See below for an example.

The code can be found at module/scripts/jslink.scm

What was not Achieved

Cheney on the MTA

One of my regrets is that I did not implement Baker's "Cheney on the MTA" (as seen in Chicken Scheme) for handling Proper Tail Calls in JavaScript. Historically, JavaScript has not guaranteed that tail position function calls do not grow the stack, and this is obviously of fundamental importance for languages like Scheme. Fortunately, ES6 has added support for proper tail calls and we can expect to see increased support for it in future JavaScript versions. (Indeed, during testing on node v.6.10.3, I did not have to increase the stack size until very late).

How to use it

I've talked a lot about what I've did and didn't do, but what about actually using this thing?

Obtaining the Code

The code is not currently available from the main Guile repository, but only the compile-to-js-2017 branch on my GitLab.

If you already have a checkout of guile, you can add my repo as a remote with

$ git remote add ijp

and fetch the branch with

$ git fetch ijp

You can then check out the compile-to-js-2017 branch and build as normal.

A Non-Trivial Example

As an example of how to use the JS Backend that is short, but non-trivial, I am using John McCarthy's amb operator (see A Basis for a Mathematical Theory of Computation[pdf]) to search for Pythagorean Triples.

First we have a module for the amb operator in amb.scm

(define-module (amb)
  #:export (amb fail))

(define original-fail
  (lambda _
    (error 'amb "No more paths to search")))

(define *amb-fail* original-fail)

(define (fail)
  (*amb-fail* #f))

(define (amb-thunks . values)
  (let ((failure *amb-fail*))
    (call/cc (lambda (escape)
               (for-each (lambda (value)
                           (call/cc (lambda (continue)
                                      (set! *amb-fail* continue)
                                      (escape (value)))))
               (failure #f)))))

(define-syntax amb
  (syntax-rules ()
    ((amb exprs ...)
     (amb-thunks (lambda () exprs) ...))))

Next we have the code performs the search in triple.scm

(use-modules (amb))

(let ((a (amb 4 5 6 7 8 9 10))
      (b (amb 4 5 6 7 8 9 10))
      (c (amb 4 5 6 7 8 9 10)))
  (if (= (* c c) (+ (* a a) (* b b)))
      (list a b c)

We compile the files in the usual manner, only now we specify the javascript language (We make sure to add the current directory to the load-path for triple.scm).

$ guild compile amb.scm --to=javascript --output=amb.js
$ guild compile -L . triple.scm --to=javascript --output=triple.js

Next we link the two together into a file main.js, making sure to specify amb.js as a dependency of triple.js. (This step will take a little while, since it also compiles a bunch of dependencies)

$ guild jslink triple.js -o main.js --depends="(\"amb\" . \"amb.scm\")"

Finally, you can run it with node, although as mentioned above you may have to increase the stack size.

$ node  --stack-size=2000 main.js

Which should, fingers crossed, print out the triple 6,8,10.

What is next?

Having recapped what was and what was not achieved, the next question is: where does the project go from here? I have been asked about my plans for all sorts of features, e.g. support for Web Assembly, but I think the following things are the most important to think about.

Inclusion into Guile

The entire point of the project is to have something that can be included in Guile proper. I have not spoken with Guile's maintainers about incorporation into the main distribution, but I expect there would be not be too many problems with moving the "official branch" to the main repository.

All Guile built-ins in runtime.js

Although I have included enough to get though boot-9.scm, this does not include all of the built-ins we would want in our programs. Two things I use very often which do not appear in runtime.js are ports and bytevectors.

We would like most, if not all, Guile built-ins to be available for those who need them, so these will need to be implemented. However, this is a lot of extra code for some people who don't need it, which brings us to a different issue…

Linking Guile Modules & Features

In a blog post, Andy Wingo lays out many tasks that he would like to see in a future Guile. One of the most important of these, for us, are under the headings "linking multiple modules together" and "linking a single executable". To grossly simplify, we want to be able to link various files into one single executable, which contains all and only the code we need for our application.

As it stands, I included a simple script guild jslink that bundles various compiled JavaScript files into one file, but we would like it to be much more featureful: removing modules, functions, even types we don't need; and inferring which modules are required by our application and bundling them without requiring the information jslink does. This would allow us to minimise the amount of code that needs to be sent over the network, which is very important to web developers.

This is a large task, and one I don't know enough about at the moment to attempt, but it is work that would benefit not just our JavaScript compiler, but people who want to deploy regular Guile applications.

JavaScript Version

I am not an expert in JavaScript, in fact, before this summer I probably hadn't written it for two years, which means the code certainly does not match up with the current best practices and specifications. Further, all of my testing for this compiler was done on Node.js v.6.10.3 only (this was the version available in the Fedora 25 repositories).

The code should be vetted to determine precisely which modern JS features are used (I believe proper tail calls, and ES6 Maps are the main ones), and it should be tested on all major browsers. If necessary, we should incorporate switches in the compiler to allow JS users to compile for particular implementations, taking advantage of particular modern JS features, or providing our own implementations of those that are not supported (e.g. Cheney on the MTA).

JS Integration

One of the strengths of Guile is that it allows people to integrate their Scheme and C code, and although it has not been a focus for this summer, we should aim to provide similar levels of integration between Scheme and JS. There are two cases to consider.

JS calling Scheme

As it stands, you can perform some limited interaction from JavaScript in a similar manner to how you would interact with Guile from C. For instance, by using scm_current_module, scm_public_lookup, and the scheme.Symbol constructor, one could look up a scheme function, e.g. iota, and then invoke it by

That said, C idioms are not JS idioms, and so we should work to provide a much nicer API through the scheme object.

Scheme calling JS

In the case of Scheme calling JavaScript, I think we should follow the example of (system foreign), which provides an API for linking to dynamic C libraries, and creating Scheme versions of C functions, and automatically marshalling/unmarshalling C types to Scheme types. One additional complication we would have with JS would be the presence of exceptions, but I think these could also be marshalled into Scheme ones without much trouble.

Lessons Learned

It goes without saying that a project like this teaches you a lot about the technical design of Guile, how to navigate the codebase, etc, but I want to highlight a few "softer" lessons from this summer.

Compilers are "Easy", Runtimes are Hard

When I first set out to write this project two summers ago, I naturally assumed that the majority of the effort would go into the compiler, and much less into the built-ins. In reality, the effort was reversed. Partly this was due to my experience in writing Scheme, and Functional Programming more generally, meant that the tree-traversing code typical of a compiler pass was relatively straightforward, and the compiler was not doing a lot of optimisation, mostly code generation.

Bootstrapping is Hard

The last point leads into this one, bootstrapping is pretty tricky. With boot-9.scm, you have several versions of the module system at different times. My own attempt to write module code that handled this ended up being abandoned for a rewrite that more closely followed the Guile C code. The size of the compiled boot-9.scm code, and the, at times, non-local consequences of implementing certain built-ins made it tricky to debug.

Don't Panic

This is a much more personal one, and one that I think is very important for anyone who wants to take part in a program like the Summer of Code, where you are spending a lot of time mostly on your own. In a complex software project, things are not always going to go smoothly. You might spend weeks banging up against a difficult problem. Don't Panic! If it was easy it would have already been done. Keep in Contact with your Mentor! It is tempting to only check in when you think you have something of progress to report, but they are there to help you, and explaining your issues to someone else is often very useful when trying to overcome them, even if they don't have an answer for you.

Wrapping Up

If you are still with me, good on you. As the new semester is starting I will be devoting much less time to this, and that will likely be true till December, but I will make an effort to keep up with guile-user and be on the IRC Channel to help the daring souls who want to give this a go. My priorities will be documenting the ILs, filling in missing builtins, and improving jslink. I especially want to see basic IO and MiniKanren up and running, and for it to be convenient to use Guile's builtin libraries.