Fun with Self-Reproducing Programs

They say that idle hands are the devil's playthings, well, I have some idle hands, so let's play.

Since we will be writing programs that manipulate parts of programs, we will be using GNU Guile and Scheme.

Ω and Y

In the beginning, Alonzo Church created the Lambda Calculus, and it was good. Well, not quite, Church was concerned about paradoxes in his untyped Lambda Calculus, the most famous of which is the “paradoxical combinator” Y.

(define y
  (lambda (p)
    ((lambda (f) (f f))
     (lambda (f)
       (p
        (lambda args (apply (f f) args)))))))

Y is interesting because it is an indirect self-reference, and as the lambda term “reduces”, it actually gets larger. For this post though, a simpler indirectly recursive form, dubbed Ω, is the inspiration.

((lambda (x) (x x)) (lambda (x) (x x)))

This is the shortest way to make a valid lambda term that doesn't reduce to a normal form. When you evaluate the expression, it doesn't get any simpler, it just goes on forever.

A Quine

Quines are named after the logician and philosopher Willard Van Orman Quine and they are programs that, when ran, produce a copy of their source code as output. Since Ω reduces to itself, we can manipulate it into simple little quine by using Scheme's quote form.

((lambda (source)
   (list source (list (quote quote) source)))
 (quote (lambda (source)
          (list source (list (quote quote) source)))))

I've written it out without the usual quote shorthand, since it obscures the simple symmetry of the code. It's hard to write a shorter, or clearer, quine in a reasonable programming language.

An evolving program

So, we have a Quine, which we can view as a fixed point of an execution function, but that's not very exciting. It's static; one iteration you've seen them all. Fortunately, it is not difficult to extend our quine to create a program that evolves over time.

The Fibonacci numbers provide a good starting point for an evolving program.

((lambda (i j self)
   (list self j (+ i j) (list (quote quote) self)))
 0
 1
 '(lambda (i j self)
    (list self j (+ i j) (list (quote quote) self))))

Run this program, then run its output, and so on for a few iterations, and you will see that the second element of the list gives you the Fibonacci numbers.

Of course, to write a real self-replicating program, we'll need to perform IO, but that isn't so bad, the same pattern applies.

#!/usr/bin/env guile
!#
((lambda (count self)
   (format #t "virus version ~a~%" count)
   (when (>= count 5)
     (format #t "5 times is quite enough silliness, thank you\n")
     (quit 0))
   (call-with-output-file
       (string-append "/tmp/foo" (number->string (+ 1 count)))
     (lambda (out)
       (format out "#!/usr/bin/env guile\n!#\n")
       (write (list self (+ 1 count) (list (quote quote) self)) out)
       (chmod out #o755)))
   (quit 0))
 0
 '(lambda (count self)
    (format #t "virus version ~a~%" count)
    (when (>= count 5)
      (format #t "5 times is quite enough silliness, thank you\n")
      (quit 0))
    (call-with-output-file
        (string-append "/tmp/foo" (number->string (+ 1 count)))
      (lambda (out)
        (format out "#!/usr/bin/env guile\n!#\n")
        (write (list self (+ 1 count) (list (quote quote) self)) out)
        (chmod out #o755)))
    (quit 0)))

Run it, you'll get a new program called /tmp/foo1, which if you run will make a /tmp/foo2, etc. and on /tmp/foo5 we stop.

Throw in some network code, an exploit, and some calls to exec, and you'd have the core for a simple virus, if you were unethical enough to write one, like this one guy who used Scheme for adware.

Thompson's bug

Viruses are all very well and good if you are into that sort of thing, but is there something more clever we can do? Back in 1984, Ken Thompson gave the most devilish quine in his talk “Reflections on Trusting Trust”. In it, he combines two notions, quines and bootstrapping, to create a malicious compiler that inserts bugs into the programs it compiles. Which wouldn't be interesting, but after bootstrapping the buggy compiler, it will bug its own now “bug-free” source code, thus ensuring its own survival.

So how about we do just that? Well, the code is kind of long, so I'll forgive you for skipping to the demo. You can get the code as a gist or as a tar.

The Buggee

First off, we need a simple program to bug. In the interest of factual accuracy, I've chosen a simple program that spreads misinformation.

(display "scheme sucks!\n")

Obviously, that isn't true, so we'll correct it to

(display "scheme is great!\n")

The Safe Compiler

For our safe compiler, we're going to use the good old identity function, but we'll add in some code to do file reading and writing.

(define (compile form) form)

(define (read-all in)
  (define (loop curr lst)
    (if (eof-object? curr)
        (reverse lst)
        (loop (read in) (cons curr lst))))
  (loop (read in) '()))

(define (write-all list)
  (for-each (lambda (obj) (write obj)) list)
  (newline))

(define (main args)
  (let ((from (list-ref args 1)))
    (call-with-input-file from
      (lambda (in)
        (write-all (compile (read-all in)))))))

The Buggy Compiler

The core of our buggy compiler is very simple.

(define (compile form)
  (cond ((bug? form) buggy-code)
        ((safe-compiler? form) buggy-compiler)
        (else form)))

After that, everything else is just scaffolding to add in the source for the code we want to bug, its replacement, the safe compiler, and of course, making it into a quine.

(define (bug? form)
  (equal? form '((display "scheme sucks!\n"))))

(define buggy-code
  '((display "scheme is great!\n")))

(define safe-compiler
  '((define (compile form) form)
    
    (define (read-all in)
      (define (loop curr lst)
        (if (eof-object? curr)
            (reverse lst)
            (loop (read in) (cons curr lst))))
      (loop (read in) '()))
    
    (define (write-all list)
      (for-each (lambda (obj) (write obj)) list)
      (newline))
    
    (define (main args)
      (let ((from (list-ref args 1)))
        (call-with-input-file from
          (lambda (in)
            (write-all (compile (read-all in)))))))))

(define (safe-compiler? form)
  (equal? form safe-compiler))

(define buggy-compiler
  (let ((x '((define (bug? form)
               (equal? form '((display "scheme sucks!\n"))))
             
             (define buggy-code
               '((display "scheme is great!\n")))
             
             (define safe-compiler
               '((define (compile form) form)
                 
                 (define (read-all in)
                   (define (loop curr lst)
                     (if (eof-object? curr)
                         (reverse lst)
                         (loop (read in) (cons curr lst))))
                   (loop (read in) '()))
                 
                 (define (write-all list)
                   (for-each (lambda (obj) (write obj)) list)
                   (newline))
                 
                 (define (main args)
                   (let ((from (list-ref args 1)))
                     (call-with-input-file from
                       (lambda (in)
                         (write-all (compile (read-all in)))))))))
             
             (define (safe-compiler? form)
               (equal? form safe-compiler))))
        (y (lambda (x y z)
             (append x
                     `((define buggy-compiler
                         (let ((x ',x)
                               (y ,y)
                               (z ',z))
                           (y x ',y z))))
                     z)))
        (z '((define (compile form)
               (cond ((bug? form) buggy-code)
                     ((safe-compiler? form) buggy-compiler)
                     (else form)))
             
             (define (read-all in)
               (define (loop curr lst)
                 (if (eof-object? curr)
                     (reverse lst)
                     (loop (read in) (cons curr lst))))
               (loop (read in) '()))
             
             (define (write-all list)
               (for-each (lambda (obj) (write obj)) list)
               (newline))
             
             (define (main args)
               (let ((from (list-ref args 1)))
                 (call-with-input-file from
                   (lambda (in)
                     (write-all (compile (read-all in))))))))))
    (y x '(lambda (x y z)
            (append x
                    `((define buggy-compiler
                        (let ((x ',x)
                              (y ,y)
                              (z ',z))
                          (y x ',y z))))
                    z))
       z)))

(define (compile form)
  (cond ((bug? form) buggy-code)
        ((safe-compiler? form) buggy-compiler)
        (else form)))

(define (read-all in)
  (define (loop curr lst)
    (if (eof-object? curr)
        (reverse lst)
        (loop (read in) (cons curr lst))))
  (loop (read in) '()))

(define (write-all list)
  (for-each (lambda (obj) (write obj)) list)
  (newline))

(define (main args)
  (let ((from (list-ref args 1)))
    (call-with-input-file from
      (lambda (in)
        (write-all (compile (read-all in)))))))

You can make the code shorter, but what we've paid in length, we've made up for by having no new surprises.

The Demo

So, that's the the code, Now for the demo. Since, we are going to be making a lot of new Scheme programs, we will turn off autocompilation in Guile, since it produces a bunch of noise we aren't interested in, but it will make no difference to the results.

$ export GUILE_AUTO_COMPILE=0
$
$ cat sucks.scm
(display "scheme sucks!\n")
$ guile -e main safe.scm sucks.scm
(display "scheme sucks!\n")
$ guile -e main buggy.scm sucks.scm
(display "scheme is great!\n")
$ # as you can see, the buggy compiler miscompiles our target executable
$
$ guile -e main safe.scm safe.scm > safe2.scm
$ guile -e main safe.scm buggy.scm > buggy2.scm
$ # first, we compile with the safe compiler to strip whitespace
$ # this will make it easier to perform source comparisons
$ guile -e main safe2.scm safe2.scm > safe3.scm
$ diff safe2.scm safe3.scm
$ # no diff output, so our safe compiler compiles itself just fine
$ guile -e main buggy2.scm buggy2.scm > buggy3.scm
$ diff buggy2.scm buggy3.scm
$ # again, no diff output, so our buggy compiler compiles itself just fine
$
$ # now the real magic
$ guile -e main buggy2.scm safe.scm > unsafe-safe.scm
$ diff buggy2.scm unsafe-safe.scm
$ # oh look, we've compiled the safe compiler with our buggy compiler,
$ # and ask expected, we get the buggy compiler again.

Perfect.

Wrapping Up

Ken Thompson's bug scared a lot of people, because if the compiler bugged enough software: the disassemblers, the binary editors, etc. then it would have been able to hide itself more effectively than any other kind of malicious software, because compilers have a privileged position in our operating systems.

I'm not personally too worried, because the bugs are very fragile. With some code reorganisation, your bugged compiler is not going to be able to bug this new version of the source. And there are ways to defend against it, David A. Wheeler got his PhD for this, and his page on this topic is worth a look.

Cheers to ski on freenode's #emacs channel for picking some nits in this post.