Data structures in practice

Ropes and Piece Tables
— the editor’s data structure

How does a code editor handle a 10 MB file where every keystroke mutates the text? The answer isn’t a string.

Sections 7 Runnable demos 6 Libraries none Examples VS Code, Xi, Emacs, AbiWord

§01

The keystroke problem

Open a 10 MB source file in VS Code and start editing. Every character you type, every delete, every paste — the editor must handle each as a near-instant operation. No perceptible lag. No stuttering. And after ten thousand edits, undo must still work.

The challenge is that text editing is a sequence of small, frequent mutations to an arbitrarily large buffer. Insert one character at position 50,000. Delete three characters at position 120,000. Paste 400 characters at position 8. These operations must each complete in microseconds, not milliseconds.

The naive approach — storing text as a single contiguous string — turns every insert into an O(n) memory copy. For a 10 MB file, inserting a single character near the start means shifting ~10 million bytes. At 120 Hz, that’s not going to work.

The solution is to stop thinking about the text as a single object and start thinking about it as a set of references to immutable fragments. Real editors use a small handful of specialized text-buffer structures; this explainer focuses on two influential ones: the rope and the piece table.

The scale of the problem — adjust file size and see the cost of a naive insert

File size 10 MB

Insert position 5% from start

insert_cost.js — measuring string insert performance

// Why naive string mutation is expensive
// In JS, strings are immutable — every "mutation" creates a new string

function naiveInsert(str, pos, ch) {
  // String concatenation: O(n) — copies the whole string
  return str.slice(0, pos) + ch + str.slice(pos);
}

function benchmark(sizeMB, insertPos) {
  const n      = sizeMB * 1024 * 1024;
  const str    = "a".repeat(n);
  const pos    = Math.floor(n * insertPos);
  const bytes  = n - pos; // bytes that must shift right

  // Estimate time: modern JS engines copy ~4GB/s
  const copySpeedBytesPerMs = 4 * 1024 * 1024; // 4 MB/ms
  const estimatedMs = bytes / copySpeedBytesPerMs;

  console.log(`File: ${sizeMB} MB, insert at ${(insertPos*100).toFixed(0)}%`);
  console.log(`  Bytes to copy: ${(bytes/1024/1024).toFixed(2)} MB`);
  console.log(`  Estimated time: ${estimatedMs.toFixed(2)} ms per keystroke`);
  console.log(`  At 60 fps, budget per frame: 16.7 ms`);
  console.log(`  Headroom: ${(16.7 - estimatedMs).toFixed(2)} ms (${estimatedMs > 16.7 ? "OVER BUDGET" : "ok"})`);
}

benchmark(1,  0.05);  // 1 MB file, insert near start
console.log("");
benchmark(10, 0.05);  // 10 MB file, insert near start
console.log("");
benchmark(100, 0.05); // 100 MB file, insert near start

// The point: cost scales with (file_size - insert_position).
// Inserting near the start of a large file is catastrophically slow.

§02

Why strings fail

Before the solutions, it’s worth understanding exactly what fails and why. Text editors need five core operations to be fast: insert, delete, get character at index, get substring (for rendering), and undo/redo.

Structure	Insert/Delete	Index access	Substring	Undo	Memory
String / Array	O(n)	O(1)	O(k)	Copy whole string	1×
Gap buffer	O(1) near gap	O(1)	O(k)	Complex	1×
Rope	O(log n)	O(log n)	O(log n + k)	Share subtrees	~2×
Piece table	O(log n)	O(log n)	O(log n + k)	Append-only log	~1×

Watch what happens when you insert a character into a plain array at position 3 — every element after it must shift right by one slot:

Array insert — visualise the shift cost

Original array: 9 characters, each stored at a fixed memory offset.

The gap buffer — used by Emacs — is a clever patch: keep a “gap” of empty space at the cursor position so inserts there are O(1). But moving the cursor still requires moving the gap, which is O(n). Both ropes and piece tables take a more radical approach: never mutate the original text at all.

gap_buffer.js — the gap buffer: Emacs’s approach

// Gap buffer: keep a hole at the cursor for O(1) local inserts
// Used by Emacs, older versions of many editors

class GapBuffer {
  constructor(initial, gapSize = 8) {
    this.buf = [...initial, ...new Array(gapSize).fill(null)];
    this.gapStart = initial.length;
    this.gapEnd   = this.buf.length;
  }

  _moveGapTo(pos) {
    // Shift gap to cursor — O(|pos - gapStart|)
    if (pos < this.gapStart) {
      const shift = this.gapStart - pos;
      for (let i = 0; i < shift; i++) {
        this.buf[this.gapEnd - 1 - i] = this.buf[this.gapStart - 1 - i];
        this.buf[this.gapStart - 1 - i] = null;
      }
    } else {
      const shift = pos - this.gapStart;
      for (let i = 0; i < shift; i++) {
        this.buf[this.gapStart + i] = this.buf[this.gapEnd + i];
        this.buf[this.gapEnd + i] = null;
      }
    }
    this.gapEnd   += pos - this.gapStart;
    this.gapStart  = pos;
  }

  insert(pos, ch) {
    this._moveGapTo(pos);
    this.buf[this.gapStart++] = ch; // O(1) — just fill next gap slot
  }

  toString() {
    return [...this.buf.slice(0, this.gapStart), ...this.buf.slice(this.gapEnd)]
      .filter(c => c !== null).join("");
  }

  debug() {
    return this.buf.map(c => c === null ? "_" : c).join("|");
  }
}

const gb = new GapBuffer("Hello world".split(""));
console.log("Initial buffer:", gb.debug());
console.log("Text:", gb.toString());

gb.insert(5, ",");
console.log("\nAfter insert ',' at 5:");
console.log("Buffer:", gb.debug());
console.log("Text:", gb.toString());

gb.insert(6, " ");
gb.insert(7, "b");
gb.insert(8, "i");
gb.insert(9, "g");
console.log("\nAfter typing ' big':");
console.log("Buffer:", gb.debug());
console.log("Text:", gb.toString());

console.log("\nLimitation: typing at position 0 (far from gap):");
console.log("Gap must travel from position 9 to 0 — O(n) shift");

§03

Ropes

A rope is a binary tree whose leaves contain short string fragments. Each internal node stores the total length of all text in its left subtree (the “weight”). Together these weights let you navigate to any character in O(log n) by walking the tree — no shifting required.

The weight is not arbitrary metadata; it is literally how many characters live to the left of this node’s split point. If an internal node joins a left subtree representing "Hello, " (7 chars) and a right subtree representing "world!" (6 chars), the node’s weight is 7. To look up character i, compare i to the weight: if i < weight, go left; otherwise go right with index i - weight. Every internal node repeats the same rule, so the weights act like signposts through the tree.

Edits work by cutting and rejoining tree boundaries, not by moving characters around in memory. To insert at position p, split the rope into [0,p) and [p,n), build a small rope for the inserted text, then concatenate left + inserted + right. To delete [start,end), split once at start, split the remainder again at end - start, discard the middle piece, and concatenate the two outer pieces. In a balanced rope, both are O(log n); without rebalancing, repeated edits can make the tree taller and degrade performance.

Ropes are persistent by nature. Because you’re building new tree nodes rather than mutating old ones, old versions of the rope remain valid. Undo is just keeping a pointer to the old root — no copy needed.

Rope structure — click a node to inspect it

Rope edits — insert and delete as split + concat

rope.js — a minimal rope implementation with split, concat, insert, delete

// Rope: a binary tree of string fragments
// Internal nodes store weight (length of left subtree)
// Leaves store actual string chunks (≤ MAX_LEAF chars)
// This demo keeps the core rope operations but omits rebalancing.

const MAX_LEAF = 8; // in practice ~4096 bytes per leaf

class RopeLeaf {
  constructor(text) { this.text = text; this.length = text.length; }
  charAt(i) { return this.text[i]; }
  toString() { return this.text; }
}

class RopeNode {
  constructor(left, right) {
    this.left   = left;
    this.right  = right;
    this.weight = left.length; // length of LEFT subtree only
    this.length = left.length + right.length;
  }
  charAt(i) {
    // Navigate by weight: if i is in left subtree, go left; else adjust and go right
  return i < this.weight
      ? this.left.charAt(i)
      : this.right.charAt(i - this.weight);
  }
  toString() { return this.left.toString() + this.right.toString(); }
}

function ropeFrom(text) {
  // Build a balanced rope from a string by recursive halving
  if (text.length <= MAX_LEAF) return new RopeLeaf(text);
  const mid = Math.floor(text.length / 2);
  return new RopeNode(ropeFrom(text.slice(0, mid)), ropeFrom(text.slice(mid)));
}

function split(rope, i) {
  // Returns [left, right] ropes where left has i characters
  if (rope instanceof RopeLeaf) {
    return [new RopeLeaf(rope.text.slice(0, i)),
             new RopeLeaf(rope.text.slice(i))];
  }
  if (i === rope.weight) return [rope.left, rope.right];
  if (i < rope.weight) {
    const [ll, lr] = split(rope.left, i);
    return [ll, new RopeNode(lr, rope.right)];
  } else {
    const [rl, rr] = split(rope.right, i - rope.weight);
    return [new RopeNode(rope.left, rl), rr];
  }
}

function concat(a, b) {
  if (a.length === 0) return b;
  if (b.length === 0) return a;
  // Educational simplification: just create a node; no rebalancing step here.
  return new RopeNode(a, b);
}

function insert(rope, pos, text) {
  const [left, right] = split(rope, pos);
  return concat(concat(left, ropeFrom(text)), right);
  // 3 new nodes created. Original rope untouched → undo for free.
}

function deleteRange(rope, start, end) {
  const [left, rest] = split(rope, start);
  const [, right]   = split(rest, end - start);
  return concat(left, right);
}

function height(rope) {
  if (rope instanceof RopeLeaf) return 1;
  return 1 + Math.max(height(rope.left), height(rope.right));
}

// --- Demo ---
let r = ropeFrom("Hello, world!");
console.log("Initial rope:", r.toString(), `(height=${height(r)})`);
console.log("charAt(7):", r.charAt(7));

// Insert "beautiful " before "world"
r = insert(r, 7, "beautiful ");
console.log("\nAfter insert:", r.toString(), `(height=${height(r)})`);

// Delete "beautiful "
r = deleteRange(r, 7, 17);
console.log("After delete:", r.toString());

// Persistent undo: original rope is still intact
const v1 = ropeFrom("version 1");
const v2 = insert(v1, 0, "[edited] ");
console.log("\nPersistence demo:");
console.log("  v1:", v1.toString(), "← still valid");
console.log("  v2:", v2.toString(), "← new root, shares v1 leaves");
console.log("  Both refer to the same leaf objects — no copying.");

§04

Piece tables

The piece table is VS Code’s choice (described in their 2018 blog post). It’s simpler than a rope conceptually, and in practice just as fast.

The core insight: keep two buffers — the original buffer (the file as loaded, immutable) and the add buffer (an append-only log of all inserted text). Then represent the document as an ordered list of pieces, each specifying: which buffer, at what offset, for how many characters.

No text is ever deleted or modified in the buffers. To “delete” text, you split the piece that contains it and discard the middle piece. The buffer content itself never changes — making crash recovery trivial.

Piece table — step through building a document from edits

piece_table_basics.js — the core data model

// Piece table — core data model (no tree yet, just the concept)

class SimplePieceTable {
  constructor(original) {
    this.orig  = original;    // immutable original buffer
    this.added = "";          // append-only add buffer
    // pieces: { buf: "orig"|"add", start, length }
    this.pieces = original.length > 0
      ? [{ buf: "orig", start: 0, length: original.length }]
      : [];
  }

  _buf(piece) {
    return piece.buf === "orig" ? this.orig : this.added;
  }

  length() { return this.pieces.reduce((s, p) => s + p.length, 0); }

  toString() {
    return this.pieces.map(p => this._buf(p).slice(p.start, p.start + p.length)).join("");
  }

  _findPiece(pos) {
    // Walk pieces to find which piece contains logical position `pos`
    // (in practice: binary search on a red-black tree of piece lengths)
    let offset = 0;
    for (let i = 0; i < this.pieces.length; i++) {
      const p = this.pieces[i];
      if (pos <= offset + p.length) return { idx: i, offsetInPiece: pos - offset };
      offset += p.length;
    }
    return { idx: this.pieces.length, offsetInPiece: 0 };
  }

  insert(pos, text) {
    // 1. Append text to the add buffer
    const addStart = this.added.length;
    this.added += text;
    const newPiece = { buf: "add", start: addStart, length: text.length };

    // 2. Find which existing piece contains `pos`, split it
    const { idx, offsetInPiece } = this._findPiece(pos);
    if (offsetInPiece === 0) {
      // Insert at piece boundary — no split needed
      this.pieces.splice(idx, 0, newPiece);
    } else {
      const p = this.pieces[idx];
      const left  = { ...p, length: offsetInPiece };
      const right = { ...p, start: p.start + offsetInPiece, length: p.length - offsetInPiece };
      this.pieces.splice(idx, 1, left, newPiece, right);
    }
    console.log(`insert("${text}", ${pos}) → ${this.pieces.length} pieces → "${this.toString()}"`);
  }

  delete(start, len) {
    // Find pieces that overlap [start, start+len) and trim/remove them
    const end = start + len;
    const { idx: si, offsetInPiece: so } = this._findPiece(start);
    const { idx: ei, offsetInPiece: eo } = this._findPiece(end);

    const newPieces = [];
    if (so > 0) newPieces.push({ ...this.pieces[si], length: so });
    if (eo > 0 && ei < this.pieces.length) {
      const p = this.pieces[ei];
      newPieces.push({ ...p, start: p.start + eo, length: p.length - eo });
    }
    this.pieces.splice(si, ei - si + (eo > 0 ? 1 : 0), ...newPieces);
    console.log(`delete(${start}, ${len}) → ${this.pieces.length} pieces → "${this.toString()}"`);
  }
}

const pt = new SimplePieceTable("The quick brown fox");
console.log("Initial:", pt.toString(), `[${pt.pieces.length} piece]`);
console.log("");

pt.insert(4, "very ");       // "The very quick brown fox"
pt.delete(9, 6);            // delete "quick " → "The very brown fox"
pt.insert(9, "speedy ");    // "The very speedy brown fox"

console.log("\nAdd buffer contents:", '"' + pt.added + '"');
console.log("(original buffer never modified)");

§05

Operations in detail

The piece list above is a linked list — fine conceptually, but O(n) for finding which piece contains a given position. VS Code’s production implementation stores pieces in a red-black tree, where each node caches the total length of its left subtree (just like a rope’s weight). This makes all operations O(log n).

Rendering a line

To render line 5000, the editor needs the character range for that line. VS Code additionally caches line feed counts in the tree — each piece node stores how many \n characters are in its left subtree. A binary search on these counts locates the right piece in O(log n), then scans within the piece for the line boundary.

Undo and redo

Undo is the killer feature of the piece table. Because the add buffer is append-only and pieces are small structs, the undo history can store either edit records or piece list snapshots. This explainer uses snapshots of the piece list. That is still much cheaper than copying the whole document, because the original and add buffers remain immutable and shared.

Interactive piece table — type in the editor to watch pieces update live

Edit here

Piece list

Note: this demo uses a simplified piece table without the red-black tree. Position lookup is O(n) — fine for small demos, not for 10MB files.

piece_table_undo.js — undo/redo via piece list snapshots

// Undo/redo via piece-list snapshots
// The text buffers stay immutable; undo swaps metadata, not document copies

class UndoPieceTable {
  constructor(text) {
    this.orig    = text;
    this.added   = "";
    this.pieces  = text ? [{ buf:"orig", start:0, length:text.length }] : [];
    this.history = [];  // stack of {pieces, addedLen} snapshots
    this.future  = [];  // redo stack
  }

  _snapshot() {
    // Save the current piece list metadata.
    // This copies piece descriptors, not the full document text.
    this.history.push({ pieces: [...this.pieces], addedLen: this.added.length });
    this.future = []; // clear redo stack on new edit
  }

  _buf(p) { return p.buf === "orig" ? this.orig : this.added; }
  toString() { return this.pieces.map(p => this._buf(p).slice(p.start, p.start+p.length)).join(""); }

  _findPiece(pos) {
    let off = 0;
    for (let i = 0; i < this.pieces.length; i++) {
      if (pos <= off + this.pieces[i].length) return {i, o: pos-off};
      off += this.pieces[i].length;
    }
    return {i: this.pieces.length, o: 0};
  }

  insert(pos, text) {
    this._snapshot();
    const as = this.added.length; this.added += text;
    const {i, o} = this._findPiece(pos);
    const np = {buf:"add", start:as, length:text.length};
    if (o === 0) { this.pieces.splice(i,0,np); return; }
    const p = this.pieces[i];
    this.pieces.splice(i, 1, {...p,length:o}, np, {...p,start:p.start+o,length:p.length-o});
  }

  undo() {
    if (!this.history.length) { console.log("Nothing to undo"); return; }
    this.future.push({ pieces:[...this.pieces], addedLen:this.added.length });
    const snap = this.history.pop();
    this.pieces = snap.pieces;
    // We don't truncate the add buffer — just ignore the extra bytes
    // They'll be reused if redo is called, or orphaned if not
  }

  redo() {
    if (!this.future.length) { console.log("Nothing to redo"); return; }
    this.history.push({ pieces:[...this.pieces], addedLen:this.added.length });
    const snap = this.future.pop();
    this.pieces = snap.pieces;
  }
}

const pt = new UndoPieceTable("Hello world");
console.log("Start:  ", pt.toString());

pt.insert(5, ", beautiful");
console.log("Edit 1: ", pt.toString());

pt.insert(0, "[DRAFT] ");
console.log("Edit 2: ", pt.toString());

pt.undo();
console.log("Undo 1: ", pt.toString());

pt.undo();
console.log("Undo 2: ", pt.toString(), "← back to original");

pt.redo();
console.log("Redo:   ", pt.toString());

console.log("\nAdd buffer (never truncated):", '"'+pt.added+'"');
console.log(`Undo cost: copy ${pt.pieces.length} piece structs — O(pieces), not O(text)`);

§06

Full implementation

Below is a complete educational piece table with a sorted array and a cached cumulative-offset index. It supports insert, delete, undo, redo, charAt, substring, line counting, and line-by-line access. Reads use binary search over cached offsets, but edits still mutate arrays and rebuild cached metadata, so this is not asymptotically equivalent to VS Code’s tree-backed implementation.

piece_table_full.js — complete educational piece table (~150 lines)

// Complete educational piece table implementation
// VS Code uses a red-black tree for O(log n) lookups and updates.
// Here we binary-search cached offsets, but edits still cost O(pieces).

class PieceTable {
  constructor(text = "") {
    this.orig    = text;
    this.added   = "";
    this.pieces  = text.length ? [{ buf:0, start:0, len:text.length }] : [];
    this._cache  = null;   // cumulative offset cache, invalidated on mutation
    this.history = [];
    this.future  = [];
  }

  // --- Internal helpers ---

  _raw(p) { return p.buf === 0 ? this.orig : this.added; }

  _offsets() {
    // Build cumulative offset array: offsets[i] = start of pieces[i] in logical doc
    if (this._cache) return this._cache;
    const offsets = [0];
    for (const p of this.pieces) offsets.push(offsets.at(-1) + p.len);
    return (this._cache = offsets);
  }

  _find(pos) {
    // Binary search: which piece contains logical offset `pos`?
    const offs = this._offsets();
    let lo = 0, hi = this.pieces.length - 1;
    while (lo <= hi) {
      const mid = (lo + hi) >> 1;
      if      (pos < offs[mid])      hi = mid - 1;
      else if (pos >= offs[mid+1])  lo = mid + 1;
      else    return { idx: mid, off: pos - offs[mid] };
    }
    return { idx: this.pieces.length, off: 0 };
  }

  _invalidate() { this._cache = null; }

  _snap() {
    this.history.push(this.pieces.map(p => ({...p})));
    this.future = [];
    this._invalidate();
  }

  // --- Public API ---

  get length() { return this._offsets().at(-1); }

  charAt(pos) {
    const {idx, off} = this._find(pos);
    if (idx >= this.pieces.length) return "";
    const p = this.pieces[idx];
    return this._raw(p)[p.start + off];
  }

  substring(start, end) {
    // Collect all piece fragments that overlap [start, end)
    let result = "";
    const offs  = this._offsets();
    for (let i = 0; i < this.pieces.length; i++) {
      const pStart = offs[i], pEnd = offs[i+1];
      if (pEnd <= start || pStart >= end) continue;
      const p = this.pieces[i];
      const s = Math.max(0, start - pStart);
      const e = Math.min(p.len, end - pStart);
      result += this._raw(p).slice(p.start+s, p.start+e);
    }
    return result;
  }

  insert(pos, text) {
    if (!text.length) return;
    this._snap();
    const addStart = this.added.length; this.added += text;
    const np = { buf:1, start:addStart, len:text.length };
    const {idx, off} = this._find(pos);
    if (off === 0) { this.pieces.splice(idx, 0, np); return; }
    const p = this.pieces[idx];
    this.pieces.splice(idx, 1,
      { buf:p.buf, start:p.start,      len:off },
      np,
      { buf:p.buf, start:p.start+off,  len:p.len-off }
    );
  }

  delete(start, len) {
    if (len <= 0) return;
    this._snap();
    const end = start + len;
    const offs = this._offsets();
    const result = [];
    for (let i = 0; i < this.pieces.length; i++) {
      const pS = offs[i], pE = offs[i+1], p = this.pieces[i];
      if (pE <= start || pS >= end) { result.push(p); continue; }
      if (pS < start) result.push({ ...p, len: start-pS });
      if (pE > end)   result.push({ ...p, start: p.start+(end-pS), len: pE-end });
    }
    this.pieces = result;
    this._invalidate();
  }

  undo() {
    if (!this.history.length) return false;
    this.future.push(this.pieces);
    this.pieces = this.history.pop();
    this._invalidate(); return true;
  }

  redo() {
    if (!this.future.length) return false;
    this.history.push(this.pieces);
    this.pieces = this.future.pop();
    this._invalidate(); return true;
  }

  lineCount() {
    let count = 1;
    for (const p of this.pieces)
      for (let i = p.start; i < p.start+p.len; i++)
        if (this._raw(p)[i] === "\n") count++;
    return count;
  }

  getLine(n) {
    // Return the text of the nth line (0-indexed)
    let line = 0, lineStart = 0;
    const offs = this._offsets();
    for (let i = 0; i < this.pieces.length; i++) {
      const p = this.pieces[i], raw = this._raw(p);
      for (let j = 0; j < p.len; j++) {
        if (raw[p.start+j] === "\n") {
          if (line === n) return this.substring(lineStart, offs[i]+j);
          lineStart = offs[i]+j+1; line++;
        }
      }
    }
    return line === n ? this.substring(lineStart, this.length) : null;
  }

  toString() { return this.substring(0, this.length); }
}

// --- Demo: simulate editing a source file ---
const src = `function greet(name) {\n  return "Hello, " + name;\n}`;
const pt = new PieceTable(src);
console.log("=== Original file ===");
console.log(pt.toString());
console.log(`Lines: ${pt.lineCount()}, Length: ${pt.length}`);

// Edit 1: rename function
pt.delete(9, 5);           // delete "greet"
pt.insert(9, "welcome");   // insert "welcome"

// Edit 2: change return value
pt.delete(33, 8);
pt.insert(33, "Welcome, ");

console.log("\n=== After edits ===");
console.log(pt.toString());
console.log(`Pieces: ${pt.pieces.length}, Add buffer: "${pt.added}"`);

console.log("\n=== Line access ===");
for (let i = 0; i < pt.lineCount(); i++)
  console.log(`  line ${i}: ${pt.getLine(i)}`);

// Undo back to original
while (pt.undo());
console.log("\n=== After full undo ===");
console.log(pt.toString());
console.log("Add buffer still holds all text: " + '"'+pt.added+'"');

§07