Programming Fun
Self Replicating C code
by Paul Hsieh


Introduction First Attempt Second Attempt Third Attempt

Introduction

In collecting links for my web page, I happened across Ken Thompson's little essay on backdoors. Early on in it he poses the question of how to write a piece of C code that when executed would output the identical source code that it was compiled from. He also said that it was very worth while doing this yourself rather than looking up the answer which he provides as a link.

I had never attempted, or heard of such a program before reading this page, and it did seem like an interesting challenge so I decided I would give it a go. I have dedicated this page to it because my results and Ken's are somewhat different, which surprised me greatly. I am also mildly interested in what other people think of the various solutions, and if there are still other solutions. Don't hesitate to mail me with whatever you've got.

My first attempt

Ok, so my first thought is that I need to have a printf that outputs the source. But when I generate the source I'll have other support code which itself would have to be added to the output. That is to say, as I wrote more code to handle all the new code which was there to output the previous layer of stuff, I would have more stuff to output. A little confusing, but basically I initially believed it was not possible to do this in a single file.

So I just did the brain dead thing of literally reading the source file and just outputting the result:

First attempt

The problem with this solution is that it requires that the source file be accessable at run time. This is obviously a less than desirable situation (the binary could be in a different directory) which motivated me to search for another solution.

My next attempt

My next thought is, how do I get something for nothing? What I need is for my characters to give back more than the number of characters typed. The answer was simple: the pre-processor. At first I didn't remember the preprocessor expansion precedence, and that lead me down some blind alleys for a while. But as soon as I rediscovered that the limitation was non-recursive top level expansion, it allowed me to focus on the idea where the parameter of the macro was most of the substance of the program.

Second attempt

(Note that contrary to what is claimed in the comp.lang.c FAQ in Q 20.34, it can't be that hard to write portable self replicating code since that's exactly what I have did in only my second attempt.) I was proud of this little program. It had no obvious flaws, its short, looks portable and does not rely on any OS trickery. So at this point I decide to check my answer against what Ken Thompson intended. Imagine my surprise ...

Ken Thompson's solution

Ken Thompson's solution is very neat in that it relies on even less trickery than my second solution. In a sense it stays away from my appealing to the pre-processor as a meta tool to get the job done.

Ken Thompson's solution

They are, however, both the same in that they essentially copy the substantive parts of the program twice, while hiding the copying in the functionality of the language itself. In my case I hid the duplication of the program text in a macro parameter. In his case he hid the copying of the string declaration in a fixed length program which dereferences the string itself.

The problem with his solution is that its kind of complicated to make portable. The version I give above, in fact, probably only works on x86 PC's because it gives an actual character code representation of '\n' as 0x0D, 0x0A. This can be gotten around by actually writing a parser in the code itself to have the '\n' character processed specially. In fact I believe that was his original attempt.

Another advantage of his solution, however is that it is fairly easy to see how it could be ported to most other languages, even BASIC.

A reader chimes in

In a classic case of one-upmanship, Parker Shaw submitted the following to me:

Parker Shaw's solution

Its even shorter than mine, and avoids both the preprocessor and the carriage return ugliness -- I'm so jealous. :o)

Update: Thanks to Armin Gerristen for finding a bug in the C code. In addition, note that the carriage return ugliness is actually there in the sense that code must not have a carriage return at the end of the file which is unusual for most text files.

Other self replicating code pages

Well it turns out this has been the subject of much perverse contemplation. Check out the The Quine page.

And if you like this stuff, you'll probably enjoy multilingual programming. Just as a bizarre aside, in their Windows 3.1 SDK. Microsoft used to supply an include file that could be used by a C compiler or the MASM assembler.

Updated 01/31/02
Copyright © 1997-2007 Paul Hsieh All Rights Reserved.


Programming Mail me

Valid HTML 4.0!