Interactive Explainer · Compilers & Language Implementation Squeak Smalltalk as running example
All code runs in your browser
When the snake eats its own tail

How a compiler
compiles itself

A self-hosting compiler is written in the very language it compiles. But to compile itself for the first time, it needs to already exist. This is not a paradox — it is a deliberate engineering act called bootstrapping, with a precise sequence of steps reaching back to a seed.

We trace the bootstrapping problem from its logical foundations through Squeak Smalltalk's remarkable implementation — a system where the compiler, the class browser, the garbage collector, and the windowing system are all just Smalltalk objects that can be inspected and modified while the system runs. Along the way we confront metaclasses, image snapshots, and Ken Thompson's celebrated warning about the compilers we trust.
Self-hosting Bootstrapping stages Squeak / Smalltalk-80 Object memory / image Metaclasses Ken Thompson's attack
§01

The bootstrapping problem

Suppose you want to write a compiler for a language called Foo, and you want to write it in Foo itself. You write the source code of foocc.foo. Now you need to compile it. But the only compiler for Foo is… the one you just wrote. Which isn't compiled yet.

This appears to be a paradox. It is not — but resolving it requires understanding that a self-hosting compiler exists at two distinct levels of abstraction at the same time: as source code (text you can read and modify) and as a running binary (the thing that does the compiling).

The word bootstrap comes from "pulling yourself up by your own bootstraps" — an impossibility that nonetheless describes exactly what we want. The resolution: you don't start from nothing. You use a seed — a minimal working system in some other language — to compile the first version, then throw the ladder away.
The chicken-and-egg diagram — click each step
bootstrap_problem.js — simulating the stagesJS
// The bootstrapping problem: a compiler that compiles itself
// We simulate the three-stage process that resolves the apparent paradox

// Stage 0: the seed compiler, written in some OTHER language (C, assembly, etc.)
// It compiles a SUBSET of the target language — just enough to get started
function seedCompiler(source) {
  // Minimal: handles only basic constructs
  if (source.includes("unsupported_feature"))
    throw new Error("Seed compiler: feature not implemented");
  return { type: "binary_from_seed", source, size: source.length * 4 };
}

// Stage 1: the full compiler, written in the TARGET language
// This is the source code of the self-hosting compiler
const fullCompilerSource = `
  class Compiler {
    parse(source)    { /* full parser */ }
    typeCheck(ast)   { /* type checker */ }
    codegen(ast)     { /* code generator */ }
    optimise(code)   { /* optimiser */ }
    compile(source)  { return this.codegen(this.typeCheck(this.parse(source))); }
  }
`.trim();

// Bootstrap step 1: use seed compiler to compile the full compiler source
const stage1Binary = seedCompiler(fullCompilerSource);
console.log("Stage 0→1: seed compiler produces first binary");
console.log(`  Input:  full compiler source (${fullCompilerSource.length} chars)`);
console.log(`  Output: stage-1 binary (${stage1Binary.size} bytes, compiled by SEED)`);

// Stage 1 binary: can now compile full source, but was BUILT by the seed compiler
// It may have missing optimisations the seed compiler didn't implement
function stage1Compile(source) {
  return {
    type: "full_compiler_binary",
    source,
    size: source.length * 3,
    emitted: `FULL-COMPILER:${source.length}:OPT2`
  };
}

// Bootstrap step 2: use stage-1 binary to recompile the full compiler source AGAIN
const stage2Binary = stage1Compile(fullCompilerSource);
console.log("\nStage 1→2: stage-1 compiles the same source again");
console.log(`  Output: stage-2 binary (${stage2Binary.size} bytes, compiled by STAGE-1)`);

// Bootstrap step 3: compile again with stage-2 binary
function stage2Compile(source) {
  return {
    type: "full_compiler_binary",
    source,
    size: source.length * 3,
    emitted: `FULL-COMPILER:${source.length}:OPT2`
  };
}
const stage3Binary = stage2Compile(fullCompilerSource);
console.log(`Stage 2→3: stage-2 compiles again (same source)`);
console.log(`  Output: stage-3 binary (${stage3Binary.size} bytes)`);

// The key invariant: stage-2 and stage-3 should produce IDENTICAL binaries
// because both compiled the same source with the same fully-working compiler
// This is the "bit-for-bit reproducible" check used to verify bootstrap correctness
const stable = stage2Binary.emitted === stage3Binary.emitted;
console.log(`\nBootstrap complete. Stable (stage2 == stage3): ${stable}`);
console.log("The seed compiler can now be discarded.");
console.log("Future builds use only the self-hosted compiler to compile itself.");
§02

Bootstrapping stages

Every self-hosting compiler has been through a bootstrapping sequence. The stages have a precise structure. The key insight is that compiling the same compiler source code with different compiler binaries may produce different output. Stability is achieved when stage n and stage n+1 compile that same compiler source to the same result. In a classic 3-stage bootstrap, that is the stage2 == stage3 check.

Stage 0
Seed compiler
written in: C / asm
Minimal. Compiles a subset of the target language. Often hand-written or itself ported from an earlier system.
Stage 1
First binary
compiled by: seed
Full compiler source, compiled by the seed. Has the seed's codegen quality. Runs, but may not be optimal.
Stage 2
Second binary
compiled by: stage 1
Same source compiled by itself. Now uses its own optimisations. Usually smaller and faster than stage 1.
Stage 3+
Self-hosting
compiled by: itself
Stable: compiling again produces the same binary. Bootstrap is complete. Seed can be discarded.
Squeak's specific bootstrap history — click a stage
stages.js — modelling the bootstrap fixpointJS
// The bootstrap fixpoint: a compiler reaches "stability" when
// compile(source, compiler_n) == compile(source, compiler_n+1)
// We model this as converging optimisation quality

class CompilerVersion {
  constructor(name, optimisationLevel, features) {
    this.name  = name;
    this.optLevel = optimisationLevel; // 0-100: how good is the output?
    this.features = features;          // set of language features this can compile
  }

  compile(sourceName, requiredFeatures) {
    const missing = requiredFeatures.filter(f => !this.features.has(f));
    if (missing.length)
      throw new Error(`${this.name} cannot compile ${sourceName}: missing ${missing}`);
    // The output binary's quality depends on the compiler's optimisation level
    return new CompilerVersion(
      `${sourceName} compiled by ${this.name}`,
      Math.min(100, this.optLevel + 15), // each stage improves a bit
      new Set(requiredFeatures)
    );
  }
}

// The full compiler source requires these language features
const FULL_FEATURES = ["closures", "generics", "pattern_match", "tail_calls"];
const SOURCE_NAME   = "FooCompiler.foo";

// Stage 0: seed compiler — written in C, knows all features but low optimisation
const seed = new CompilerVersion("seed (C)", 20, new Set(FULL_FEATURES));
console.log("=== Bootstrap sequence ===");
console.log(`Seed: optLevel=${seed.optLevel}, can compile all features`);

let compiler = seed;
let prevOptLevel = -1;
for (let stage = 1; stage <= 5; stage++) {
  const next = compiler.compile(SOURCE_NAME, FULL_FEATURES);
  console.log(`Stage ${stage}: compiled by "${compiler.name}"`);
  console.log(`  Output optLevel: ${next.optLevel} (was ${compiler.optLevel})`);
  if (next.optLevel === prevOptLevel) {
    console.log(`  *** FIXPOINT reached at stage ${stage} — bootstrap stable ***`);
    break;
  }
  prevOptLevel = next.optLevel;
  compiler = next;
}

// Reproducible builds: GCC does 3 stages to verify bootstrap correctness
console.log("\n=== GCC bootstrap verification ===");
console.log("  gcc stage1 = gcc_src compiled by system compiler");
console.log("  gcc stage2 = gcc_src compiled by stage1");
console.log("  gcc stage3 = gcc_src compiled by stage2");
console.log("  Verify: diff(stage2, stage3) must be empty (bit-identical)");
console.log("  If different: bug in compiler — output depends on what compiled it");
§03

Everything is an object — Smalltalk's radical bet

Most languages treat the compiler as a tool outside the language — a program you run to process source files. Smalltalk took a different position: the compiler is a first-class object inside the running system, written in Smalltalk, that you can inspect, modify, and replace while the system is live.

This is not merely a design preference. It flows from Smalltalk's foundational rule: everything is an object, and every operation is a message send. Integers, strings, booleans, classes, methods, compiled bytecode, the scanner, the parser, the code generator — all objects. All can receive messages. All can be stored in variables, put in collections, passed as arguments.

In Squeak, you can open a class browser, navigate to Compiler class, select the compile:in:notifying:ifFail: method, modify it, and the next method compilation uses your modified compiler — all without restarting, without recompiling, without a build step. The running system repairs itself.
Object model — click any object to see its class and slots
Click an object above
smalltalk_objects.js — the Smalltalk object model in JavaScriptJS
// Smalltalk's object model: everything is an object with a class pointer
// Even integers have a class (SmallInteger), even classes have a class (Metaclass)

class SmalltalkObject {
  constructor(sqClass, fields = {}) {
    this.sqClass = sqClass;  // every object points to its class
    this.fields  = fields;   // named instance variables
  }
  send(selector, args = []) {
    // Method lookup: walk class chain until method found
    let cls = this.sqClass;
    while (cls) {
      if (cls.methods?.[selector]) return cls.methods[selector](args);
      cls = cls.superclass;
    }
    throw new Error(`doesNotUnderstand: #${selector}`);
  }
}

// Build a minimal Smalltalk class hierarchy
const ProtoObject = { name: "ProtoObject", superclass: null, methods: {} };
const Object_cls  = { name: "Object",      superclass: ProtoObject, methods: {
  class:    function() { return this; }, // 'class' returns the object's class
  printString: function() { return "an Object"; }
}};
const Integer_cls = { name: "Integer",     superclass: Object_cls, methods: {
  plus:    ([other]) => new SmalltalkObject(Integer_cls, { value: 0 }),
  printString: function() { return "an Integer"; }
}};

// A Smalltalk METHOD DICTIONARY: maps selector strings to CompiledMethod objects
class MethodDictionary {
  constructor() { this.methods = new Map(); }
  at(selector)          { return this.methods.get(selector); }
  at_put(selector, meth) { this.methods.set(selector, meth); }
  includesKey(selector) { return this.methods.has(selector); }
  selectors()           { return [...this.methods.keys()]; }
}

// A Smalltalk CLASS is an object too — with its own instance variables
class SmalltalkClass extends SmalltalkObject {
  constructor(name, superclass, instanceVarNames) {
    super(null); // class pointer set later (it's the metaclass)
    this.name             = name;
    this.superclass       = superclass;
    this.instanceVarNames = instanceVarNames; // names of slots in instances
    this.methodDict       = new MethodDictionary();
  }

  addMethod(selector, fn) { this.methodDict.at_put(selector, fn); }

  new(...initArgs) {
    const inst = new SmalltalkObject(this);
    this.instanceVarNames.forEach(v => { inst.fields[v] = null; });
    if (this.methodDict.includesKey("initialize"))
      this.methodDict.at("initialize").call(inst, initArgs);
    return inst;
  }
}

// Build some real classes
const Point = new SmalltalkClass("Point", null, ["x", "y"]);
Point.addMethod("initialize", function([x,y]) {
  this.fields.x = x ?? 0;
  this.fields.y = y ?? 0;
});
Point.addMethod("printString", function() {
  return `(${this.fields.x}@${this.fields.y})`;
});
Point.addMethod("+", function([other]) {
  return Point.new(this.fields.x+other.fields.x, this.fields.y+other.fields.y);
});

const p1 = Point.new(3,4), p2 = Point.new(1,2);
const p3 = p1.fields.x !== undefined ? Point.new(p1.fields.x+p2.fields.x, p1.fields.y+p2.fields.y) : null;
console.log(`Point class: methodDict has ${Point.methodDict.selectors().length} methods`);
console.log(`Selectors: ${Point.methodDict.selectors().join(", ")}`);
console.log(`p1 is a: ${p1.sqClass.name}, fields: ${JSON.stringify(p1.fields)}`);
console.log(`p1 class name: ${p1.sqClass.name}`);
console.log(`Point class is a: ${typeof Point} — a SmalltalkClass (an object!)`);
console.log(`Point's instanceVarNames: ${Point.instanceVarNames}`);
console.log(`\nKey insight: Point is ITSELF an object with fields and a class pointer.`);
console.log("In Squeak, class objects live IN the image heap, not outside it.");
§04

The object memory and the image

In conventional systems, the program lives in files; when it runs, data lives in memory; when the process exits, data is gone. Squeak inverts this. The image is a serialised snapshot of the entire live heap — every object, every class, every compiled method, the compiler itself, the window system — frozen to disk and thawed back to life on startup.

There are no source files to reparse. There is no linking step. The image is the running system, persisted. This is what makes Squeak a world rather than an application: you resume where you left off, the entire object graph intact.

The Squeak image format is a portable binary snapshot. A 32-bit image can be run on any platform that has the thin Squeak VM (written in C/Slang). The VM is about 12,000 lines of C; the rest of Squeak — including its compiler, all its libraries, and all its tools — lives in the image as Smalltalk objects.
Image anatomy — what lives where in a Squeak .image file
Image Header
magic0x1966 (Squeak)
version6521 (Cog VM)
headerSize64 bytes
dataSize~32 MB
oldBaseoriginal heap address
specialObjectsarray of ~64 roots
Special Object Array (roots)
specialObjects[0]nil
specialObjects[1]false
specialObjects[2]true
specialObjects[8]Smalltalk (SystemDictionary)
specialObjects[19]Character
specialObjects[20]#doesNotUnderstand: selector
Object Memory (heap)
Class objects~3,000 classes
CompiledMethods~80,000 methods
Symbolsselector strings
MethodDictionariesper-class method tables
Process / contextssuspended execution state
Object format (each object)
header wordclass index + size
fields[0..n]object pointers (32-bit)
SmallInt taglow bit = 1 (unboxed)
GC bitsmark, remembered set
format bitspointer / bytes / words
image_snapshot.js — serialising and restoring an object graphJS
// The image mechanism: serialise and restore a live object graph
// Every object gets an OOP (Object-Oriented Pointer) — its offset in the heap

class ImageSerializer {
  constructor() {
    this.heap    = [];   // array of objects, index = OOP
    this.ooMap   = new WeakMap(); // object → OOP index
    this.special = {};   // well-known objects: nil, true, false, Smalltalk
  }

  intern(obj) {
    // Assign an OOP to an object if it doesn't have one
    if (this.ooMap.has(obj)) return this.ooMap.get(obj);
    const oop = this.heap.length;
    this.heap.push(obj);
    this.ooMap.set(obj, oop);
    return oop;
  }

  snapshot(roots) {
    // Walk the object graph breadth-first and assign OOPs
    const queue = [...roots];
    const visited = new Set();
    while (queue.length) {
      const obj = queue.shift();
      if (!obj || typeof obj !== "object" || visited.has(obj)) continue;
      visited.add(obj);
      this.intern(obj);
      // Recursively visit all fields that are objects
      for (const val of Object.values(obj))
        if (typeof val === "object" && val !== null) queue.push(val);
    }
    return this.serialize(roots);
  }

  serialize(roots) {
    // Produce a JSON-based image (real Squeak uses a compact binary format)
    const ref = obj => obj && typeof obj === "object" ? { "@oop": this.ooMap.get(obj) } : obj;
    const records = this.heap.map((obj, oop) => ({
      oop,
      type: obj.constructor?.name,
      fields: Object.fromEntries(Object.entries(obj).map(([k,v]) => [k, ref(v)]))
    }));
    return { rootOops: roots.map(r => this.ooMap.get(r)), objects: records };
  }
}

// Build a small Squeak-like object graph
const nilObj    = { sqClass: null, type: "UndefinedObject" };
const pointClass = { type: "Class", name: "Point", superclass: null, methods: {} };
const p1 = { sqClass: pointClass, type: "Point", x: 3, y: 4 };
const p2 = { sqClass: pointClass, type: "Point", x: 1, y: 2 };
const anArray = { sqClass: null, type: "Array", elements: [p1, p2, nilObj] };
pointClass.superclass = nilObj;

const ser = new ImageSerializer();
const image = ser.snapshot([nilObj, pointClass, p1, p2, anArray]);

console.log(`Objects in snapshot: ${image.objects.length}`);
console.log(`Root OOPs: ${image.rootOops}`);
image.objects.forEach(rec => {
  const fields = Object.entries(rec.fields)
    .filter(([k,v]) => k !== "elements" && k !== "methods")
    .map(([k,v]) => `${k}=${JSON.stringify(v)}`).join(", ");
  console.log(`  OOP[${rec.oop}] ${rec.type}: ${fields}`);
});
console.log(`\nImages are why you can 'Save and quit' Squeak and resume exactly where you left off.`);
console.log("All running processes, open windows, edited methods — everything is serialised.");
§05

Metaclasses: classes all the way down

If classes are objects, what is their class? In Smalltalk, every class Foo has a corresponding metaclass called Foo class (pronounced "Foo's metaclass"). The metaclass holds the class-side methods — what other languages call static methods. When you send a message to the class object itself (like Point new), the method is found in Point class.

This sounds infinite — what is the class of Point class? The real Smalltalk metaclass graph is carefully closed so that Metaclass class is an instance of Metaclass. The demo below is a teaching model of that idea, not a byte-for-byte reconstruction of the full kernel hierarchy.

The metaclass hierarchy — follow the class pointer vs the superclass pointer
The Smalltalk metaclass hierarchy is one of the most elegant (and initially baffling) designs in programming language history. Alan Kay described it as "the most beautifully symmetric object model I've ever seen." There are exactly as many metaclasses as there are classes — one per class — and they mirror the class hierarchy exactly. But don't worry: most Smalltalk programmers barely think about metaclasses until they write class-side methods.
metaclasses.js — navigating the Smalltalk metaclass towerJS
// The Smalltalk metaclass structure — every class has a metaclass
// "The class of a class is its metaclass. The metaclass hierarchy mirrors the class hierarchy."

class SmalltalkClass {
  constructor(name, superclass = null) {
    this.name          = name;
    this.superclass    = superclass;
    this.instanceMethods = new Map(); // instance-side
    this.metaclass     = null;       // set after construction
  }
}

class Metaclass {
  constructor(soloInstance) {
    // A metaclass has exactly one instance: its corresponding class object
    this.soloInstance  = soloInstance; // e.g. Point class → Point
    this.name          = soloInstance.name + " class";
    this.classMethods  = new Map();   // class-side methods (like static methods)
    this.superclass    = null;        // mirrors class hierarchy — set below
  }
}

// Build the core hierarchy: ProtoObject → Object → Magnitude → Number → Integer
const ProtoObject = new SmalltalkClass("ProtoObject");
const Object_cls  = new SmalltalkClass("Object",    ProtoObject);
const Magnitude  = new SmalltalkClass("Magnitude", Object_cls);
const Number_cls = new SmalltalkClass("Number",    Magnitude);
const Integer    = new SmalltalkClass("Integer",   Number_cls);

const classes = [ProtoObject, Object_cls, Magnitude, Number_cls, Integer];

// Create metaclasses — one per class
classes.forEach(cls => { cls.metaclass = new Metaclass(cls); });

// Metaclass superclass chain mirrors the class chain
// Object class → ProtoObject class → Class → Behavior → Object
classes.forEach(cls => {
  if (cls.superclass)
    cls.metaclass.superclass = cls.superclass.metaclass; // mirrors class hierarchy
  else
    cls.metaclass.superclass = Object_cls; // ProtoObject class → Object (wraps around)
});

// Add a class-side method (like 'new' or a factory)
Integer.metaclass.classMethods.set("readFrom:", (str) => parseInt(str));
Integer.instanceMethods.set("+", (a, b) => a + b);

console.log("=== Metaclass tower for Integer ===");
console.log(`Integer                class: ${Integer.metaclass.name}`);
console.log(`Integer class          superclass: ${Integer.metaclass.superclass?.name}`);
console.log(`Number class           superclass: ${Number_cls.metaclass.superclass?.name}`);
console.log(`Object class           superclass: ${Object_cls.metaclass.superclass?.name}`);
console.log(`ProtoObject class      superclass: ${ProtoObject.metaclass.superclass?.name}`);

console.log("\n=== Method lookup on Integer (instance-side) ===");
function lookupMethod(cls, selector, side = "instance") {
  let current = side === "class" ? cls.metaclass : cls;
  while (current) {
    const dict = side === "class" ? current.classMethods : current.instanceMethods;
    if (dict?.has(selector)) return current.name;
    current = current.superclass;
  }
  return "doesNotUnderstand";
}
console.log(`Integer >> #+ found in:        ${lookupMethod(Integer, "+")}`);
console.log(`Integer >> #printString:        ${lookupMethod(Integer, "printString")}`);
console.log(`Integer class >> #readFrom:     ${lookupMethod(Integer, "readFrom:", "class")}`);
§06

The compiler as a Smalltalk object

In Squeak, compilation flows through the Compiler machinery. A concrete example helps: suppose you are editing class Point and you type distanceToOrigin ^ (x * x + y * y) sqrt into the browser. When you click Accept, the browser hands that source string to the compiler machinery; in classic Squeak images one entry point looks like Compiler new compile: sourceString in: Point notifying: requestor ifFail: aBlock. The compiler scans the text into tokens, the Parser turns those tokens into a method structure, the BytecodeEncoder emits bytecodes for "push x", "send *", "send +", "send sqrt", and the result is packaged as a CompiledMethod for selector #distanceToOrigin.

The result is a CompiledMethod object installed into the class's method dictionary. The method immediately becomes available to the running system. This is the self-bootstrapping loop: the compiler is written in Smalltalk, compiled by the compiler, installed in the system, and used to compile new Smalltalk — including modifications to the compiler itself.

Squeak's compiler lives in ordinary Smalltalk classes such as Compiler, Parser, Scanner, Encoder, and BytecodeEncoder. When you modify any of these classes and accept the change, you're using the old compiler to compile the new compiler — and from that point forward, all new compilations use the new compiler. Live surgery on the operating room's own tools.
squeak_compiler.js — a tiny Smalltalk-inspired compiler in JavaScriptJS
// A minimal Smalltalk-inspired compiler pipeline in JavaScript
// Compiles a toy subset to a bytecode-like array + literal frame
// Mirrors the broad shape: Scanner → Parser → Encoder → CompiledMethod

// === Scanner: text → tokens ===
function scan(source) {
  const tokens = [];
  let i = 0;
  const ws    = /\s/,
        digit  = /\d/,
        alpha  = /[a-zA-Z_]/,
        alphaNum = /\w/;
  while (i < source.length) {
    while (ws.test(source[i])) i++;
    if (i >= source.length) break;
    const c = source[i];
    if (digit.test(c)) {
      let n = ""; while (i < source.length && digit.test(source[i])) n += source[i++];
      tokens.push({ type: "number", value: parseInt(n) });
    } else if (alpha.test(c)) {
      let id = ""; while (i < source.length && alphaNum.test(source[i])) id += source[i++];
      if (source[i] === ":") { i++; tokens.push({ type: "keyword", value: id+":" }); }
      else tokens.push({ type: "name", value: id });
    } else if (c === "'") {
      let s = ""; i++;
      while (i < source.length && source[i] !== "'") s += source[i++];
      if (i >= source.length) throw new Error("Unterminated string literal");
      i++;
      tokens.push({ type: "string", value: s });
    } else if (["+","-","*","/","<",">","="].includes(c)) {
      tokens.push({ type: "binary", value: c }); i++;
    } else if (c === ".") { tokens.push({ type: "dot" }); i++;
    } else if (c === "^") { tokens.push({ type: "caret" }); i++;
    } else i++;
  }
  return tokens;
}

// === Bytecode set (loosely inspired by Smalltalk bytecodes) ===
const BC = {
  PUSH_SELF:   0x70,  // push self onto stack
  PUSH_TEMP:   0x10,  // push temp variable
  PUSH_LIT:    0x20,  // push literal (index into literal frame)
  PUSH_INT:    0x76,  // push small integer
  STORE_TEMP:  0x60,  // store to temp
  SEND:        0xE0,  // send message (index into literal frame)
  RETURN_TOP:  0x7C,  // return top of stack
  POP:         0x87,  // pop stack
};

// === Compiler: tokens → CompiledMethod ===
class TinyCompiler {
  constructor() {
    this.bytecodes = [];
    this.literals  = [];   // literal frame: selectors, string literals, etc.
    this.tempNames = [];   // local variable names
  }

  addLiteral(val) {
    const idx = this.literals.indexOf(val);
    if (idx >= 0) return idx;
    this.literals.push(val);
    return this.literals.length - 1;
  }

  emit(...bytes) { this.bytecodes.push(...bytes); }

  compileExpr(tokens, pos) {
    // Expression := primary (binary | keyword-send)*
    let p = pos;

    // Primary
    const t = tokens[p];
    if (!t) return p;
    if (t.type === "name" && t.value === "self") {
      this.emit(BC.PUSH_SELF); p++;
    } else if (t.type === "number") {
      this.emit(BC.PUSH_LIT, this.addLiteral(t.value)); p++;
    } else if (t.type === "string") {
      this.emit(BC.PUSH_LIT, this.addLiteral(t.value)); p++;
    } else if (t.type === "name") {
      const tempIdx = this.tempNames.indexOf(t.value);
      if (tempIdx >= 0) this.emit(BC.PUSH_TEMP, tempIdx);
      else this.emit(BC.PUSH_LIT, this.addLiteral(t.value));
      p++;
    }

    // Binary send: receiver op arg
    while (tokens[p]?.type === "binary") {
      const op = tokens[p++].value;
      p = this.compileExpr(tokens, p); // compile argument first
      const selIdx = this.addLiteral(op);
      this.emit(BC.SEND, selIdx, 1); // 1 argument
    }

    // Keyword send: receiver keyword: arg
    let kw = "";
    while (tokens[p]?.type === "keyword") {
      kw += tokens[p++].value;
      p = this.compileExpr(tokens, p);
    }
    if (kw) {
      const kwIdx = this.addLiteral(kw);
      this.emit(BC.SEND, kwIdx, (kw.match(/:/g)||[]).length);
    }
    return p;
  }

  compile(methodSource) {
    const tokens = scan(methodSource);
    let p = 0;

    // Parse method signature (simple unary/binary for now)
    let selector = "";
    if (tokens[p].type === "name") selector = tokens[p++].value;
    else if (tokens[p].type === "binary") {
      selector = tokens[p++].value;
      this.tempNames.push(tokens[p++].value); // argument name
    }

    // Skip to method body (after first blank line or after pattern)
    while (p < tokens.length) {
      const t = tokens[p];
      if (t.type === "caret") { // ^ return
        p++;
        p = this.compileExpr(tokens, p);
        this.emit(BC.RETURN_TOP);
      } else if (t.type === "dot") { p++; }
      else {
        p = this.compileExpr(tokens, p);
        if (tokens[p]?.type !== "caret") this.emit(BC.POP);
      }
    }

    return {
      selector,
      bytecodes: this.bytecodes,
      literals:  this.literals,
      tempCount: this.tempNames.length,
      size: this.bytecodes.length + this.literals.length * 4,
    };
  }
}

// Compile some toy methods
const methods = [
  "greet\n  ^ 'Hello from Squeak!'",
  "+ aNumber\n  ^ self add: aNumber",
  "squareMinusOne\n  ^ self * self - 1",
];
methods.forEach(src => {
  const cm = new TinyCompiler().compile(src.trim());
  console.log(`Compiled #${cm.selector}:`);
  console.log(`  bytecodes: [${cm.bytecodes.map(b=>b.toString(16).padStart(2,'0')).join(' ')}]`);
  console.log(`  literals:  ${JSON.stringify(cm.literals)}`);
  console.log(`  size: ${cm.size} bytes`);
});
§07

How Squeak bootstrapped historically

The earlier sections described the bootstrap ladder in the abstract. This section is the historical case: how the Squeak team actually did it in the mid-1990s. Smalltalk-80 had been built at Xerox PARC in the late 1970s and early 1980s, but it was still tied to older implementation assumptions and hardware. In 1995, Dan Ingalls, Ted Kaehler, John Maloney, and Alan Kay set out at Apple to build Squeak as a new, portable descendant of that system.

The Squeak bootstrap story — step through the history

The distinctive historical move was this: instead of hand-writing a portable VM in C from the start, the team wrote the interpreter in Slang, a restricted subset of Smalltalk. That meant the VM could be developed inside an existing Smalltalk image with the ordinary browser, debugger, and inspector. A translator class named SmalltalkTranslator then turned those Slang methods into C, and that generated C was compiled into the first portable Squeak VM binary.

Dan Ingalls described the process: "We wrote the Squeak Smalltalk interpreter in Smalltalk itself, then used a simple Smalltalk-to-C translator to produce the equivalent C code. We ran the C code under the control of the original Squeak image on a Mac, and — to our astonishment — it worked the first time we ran it." In the original account, the generated VM is on the order of twelve thousand lines of C produced from a few thousand lines of Slang Smalltalk.
§08

Trust and the Thompson hack

In his 1984 Turing Award lecture "Reflections on Trusting Trust," Ken Thompson posed a chilling question: can you trust a compiler you didn't write yourself? He demonstrated that a compiler could be modified to inject a backdoor into the login program — and then further modified to insert that same backdoor whenever it compiled itself. The source code of the compiler would look clean. Only the binary would contain the trojan.

This attack exploits exactly the bootstrapping invariant: once the self-replicating behaviour is in the binary, it persists through all subsequent bootstrap stages. You cannot detect it by reading source. You cannot detect it by recompiling from source — because the compiler you're using to recompile is already infected.

The Thompson attack — clean vs infected compiler binary
Thompson's conclusion: "You can't trust code that you did not totally create yourself. No amount of source-level verification or scrutiny will protect you from using untrusted code." The modern defence is diverse double compilation (DDC): compile the suspicious compiler with two independently-developed compilers, compare outputs. If both clean compilers produce the same binary from the same source, the source is trojan-free.
thompson_attack.js — simulating the self-replicating trojanJS
// Thompson's Trusting Trust attack — simulated in JavaScript
// Shows how a compiler binary can contain a trojan that:
// 1. Injects a backdoor into any compiled `login` function
// 2. Replicates itself into any compiled compiler
// WITHOUT the source code containing any trace of the trojan

const BACKDOOR_CODE = `
  // TROJAN: inserted by infected compiler binary (not in source!)
  if (password === "xyzzy-master-key") return GRANT_ACCESS;
`.trim();

const REPLICATION_CODE = `
  // REPLICATION: when compiling a compiler, inject this trojan into the output
  if (compiling_login_function) { inject_backdoor_into_output(); }
  if (compiling_compiler) { inject_replication_into_output(); }
`.trim();

class Compiler {
  constructor(name, isTrojaned = false) {
    this.name      = name;
    this.isTrojaned = isTrojaned;
    // Source code of this compiler — looks completely clean even if binary is infected
    this.source    = `// Compiler.c — clean source code, no trojans visible\n` +
                     `compile(src) { return normalCompilation(src); }`;
  }

  compile(programName, programSource) {
    const isLogin    = programName.toLowerCase().includes("login");
    const isCompiler = programName.toLowerCase().includes("compiler");

    let outputCode = programSource; // normal compilation

    if (this.isTrojaned) {
      if (isLogin) {
        // Stage 1 attack: inject password backdoor into login binary
        outputCode += "\n/* INJECTED BY COMPILER BINARY */\n" + BACKDOOR_CODE;
        console.log(`  [!] Trojaned compiler injected backdoor into ${programName}`);
      }
      if (isCompiler) {
        // Stage 2 attack: inject trojan-replication code into compiled compiler binary
        console.log(`  [!] Trojaned compiler infected output ${programName} binary`);
        return new Compiler(`${programName} (compiled by ${this.name})`, true);
      }
    }

    return { name: programName, source: outputCode, compiledBy: this.name, trojaned: false };
  }
}

// Step 1: attacker modifies the compiler binary (not the source) to include the trojan
console.log("=== Thompson Attack Simulation ===");
const cleanCompiler   = new Compiler("gcc-clean", false);
const infectedCompiler= new Compiler("gcc-infected", true);

// Step 2: infected compiler compiles login — backdoor is injected
console.log("\n[Step 2] Compile login with infected compiler:");
const loginClean    = cleanCompiler.compile("login", "function login(user, pass) { ... }");
const loginInfected = infectedCompiler.compile("login", "function login(user, pass) { ... }");
console.log(`  Clean   login: ${loginClean.source.includes("xyzzy") ? "INFECTED" : "clean"}`);
console.log(`  Infected login: ${loginInfected.source.includes("xyzzy") ? "INFECTED" : "clean"}`);

// Step 3: infected compiler compiles the CLEAN compiler source — output is infected binary
console.log("\n[Step 3] Infected compiler compiles clean compiler source:");
const nextGenCompiler = infectedCompiler.compile("Compiler", cleanCompiler.source);
console.log(`  Compiler source looks clean: ${!cleanCompiler.source.includes("trojan")}`);
console.log(`  But output binary is trojaned: ${nextGenCompiler instanceof Compiler && nextGenCompiler.isTrojaned}`);

// Step 4: even after fully regenerating from clean source, trojan persists
console.log("\n[Step 4] Next generation compiles login:");
const loginGen2 = nextGenCompiler.compile("login", "function login(user, pass) { ... }");
console.log(`  Login binary from 'clean-source' compiler: ${loginGen2.source?.includes("xyzzy")?"STILL INFECTED":"clean"}`);
console.log("\nConclusion: reading source reveals nothing. Only the binary carries the trojan.");
console.log("Defence: Diverse Double Compilation — compile with two INDEPENDENT compilers, compare outputs.");
§09

Other self-hosting systems

Self-hosting is now the norm for serious compiled languages. The specific bootstrapping strategy varies by language and implementation goals:

SystemSeed languageBootstrap strategySpecial property
Squeak SmalltalkC (Slang translator)VM written in Slang (Smalltalk subset) → translated to C; full system lives in imageEntire system in one image — compiler, tools, VM source all objects
GCCAny existing C compiler3-stage: system-cc → stage1 → stage2; verify stage2==stage33-stage bootstrap to verify reproducibility
RustOCaml (original), then Rustrustc written in Rust; bootstraps via previous stable releasemrustc: alternative Rust compiler in C++ to break dependency chain
GoC (up to Go 1.4), then Gogo1.4 (last C version) used as bootstrap toolchain; go1.5+ self-hostedDeliberate break: kept C version forever as bootstrap anchor
Python (PyPy)CPython (C)RPython (typed Python subset) → translated to C via GCC; runs on any platformSame idea as Squeak/Slang: write in subset, translate to C
Lisp / SchemeAssembly or C metacircular evaluatoreval written in Lisp compiles itself; infinite tower of interpretersMeta-circular evaluator: eval and apply mutually recursive in Lisp
Java (GraalVM)C++ (HotSpot JVM)Graal compiler written in Java, runs on the JVM it compilesSubstrate VM: compile Graal compiler with itself to remove JVM dependency

Squeak and PyPy share an important pattern: write a complex runtime in a restricted high-level language, then mechanically lower it to C. Squeak called that subset Slang; PyPy calls it RPython. GraalVM belongs in the same broader family of self-applying language infrastructure, but it is architecturally different: Truffle language interpreters are written in Java and optimized by the Graal compiler rather than translated to C in the same direct way.