
How Git Stores History

Git feels like a source-control tool, but its core implementation is closer to a content-addressed object store with a few carefully chosen pointer types. Files become blobs, directories become trees, commits point to snapshots and parents, and branches are just movable references. That design is why commits are cheap, branching is cheap, and history becomes a graph instead of a mutable folder full of versions.

Problem: track versions without copying whole repos
Idea: immutable objects plus movable refs
Implementation: tiny Git-like repo in JavaScript

§01

The model: snapshots, not mutable folder copies

A naive version-control system could copy every file into a new directory for every revision. That works for a while, but it wastes space and makes branching expensive. Git takes a different approach: each version is represented by a graph of immutable objects, and those objects are reused whenever content has not changed.

The key mental shift is this: a commit is not “a diff file.” It is a pointer to a tree object that describes the project snapshot, plus metadata and parent commit IDs. Diffs are computed later by comparing snapshots; they are not the primary storage format for ordinary history.
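To make "diffs are computed later" concrete, here is a minimal sketch that compares two snapshots. It assumes each snapshot has already been flattened into a map of path to blob ID; `diffSnapshots` and the IDs `b1` to `b4` are names invented for this illustration, not Git's own data structures.

```javascript
// Hypothetical helper: each snapshot is a flat map of path -> blob ID.
function diffSnapshots(oldSnap, newSnap) {
  const changes = [];
  const paths = new Set([...Object.keys(oldSnap), ...Object.keys(newSnap)]);
  for (const path of [...paths].sort()) {
    const before = oldSnap[path];
    const after = newSnap[path];
    if (before === after) continue; // same blob ID means identical content
    if (before === undefined) changes.push({ path, status: "added" });
    else if (after === undefined) changes.push({ path, status: "deleted" });
    else changes.push({ path, status: "modified" });
  }
  return changes;
}

const snapshotA = { "README.md": "b1", "src/app.js": "b2" };
const snapshotB = { "README.md": "b1", "src/app.js": "b3", "notes.txt": "b4" };
const snapshotDiff = diffSnapshots(snapshotA, snapshotB);
console.log(snapshotDiff);
```

Because IDs are derived from content, comparing two IDs is enough to know whether a file changed. The file contents only need to be read when a line-level diff is wanted.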

Git version control is built on three ideas working together: immutable content-addressed objects, a staging area that prepares the next tree, and lightweight references that name commits.

§02

Blobs, trees, and commits

A file’s contents become a blob. A directory becomes a tree listing filenames and the object IDs they point to. A commit points to a root tree and to zero or more parent commits: the first commit in a repository has none, an ordinary commit has one, and a merge commit has two or more. Because objects are immutable and keyed by their content hash, unchanged files and unchanged subtrees can be reused across commits.
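For reference, this is roughly how a blob ID is derived in real Git when the repository uses the default SHA-1 object format: the hash covers a short header (`blob`, the byte length, a NUL byte) followed by the raw file bytes. The sketch assumes Node.js, and `gitBlobId` is a name chosen here for illustration.

```javascript
const { createHash } = require("node:crypto");

// Blob ID = SHA-1 over the header "blob <byte length>\0" followed by the raw content.
function gitBlobId(content) {
  const body = Buffer.from(content, "utf8");
  const header = Buffer.from(`blob ${body.length}\0`, "utf8");
  return createHash("sha1").update(header).update(body).digest("hex");
}

const idA = gitBlobId("hello\n");
const idB = gitBlobId("hello\n");
const idC = gitBlobId("hello!\n");
console.log(idA === idB); // true: same bytes, same object ID
console.log(idA === idC); // false: different bytes, different object ID
```

The filename is not part of the hash. Names live in tree objects, which is why two files with identical contents share one blob.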

The commit process is a small pipeline. First, git add turns the file contents you staged into blob objects and records, in the index, which paths should point to which blobs. Then git commit walks that staged path table, writes tree objects for directories from the bottom up, and finally writes one commit object containing the root tree ID, parent commit ID(s), author/message metadata, and timestamps.
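The bottom-up tree-writing step can be sketched as follows. This is a simplified illustration, not Git's implementation: the index is assumed to be a flat map of slash-separated paths to blob IDs, and `put`, `writeTree`, and the toy `tree-N` IDs are all invented for this example.

```javascript
// Toy content-addressed store: identical payloads always map to the same ID.
const store = new Map(); // serialized content -> toy ID
function put(type, payload) {
  const key = type + ":" + JSON.stringify(payload);
  if (!store.has(key)) store.set(key, `${type}-${store.size + 1}`);
  return store.get(key);
}

// Writes tree objects for a flat { path: blobId } index, children before parents.
function writeTree(index) {
  const files = {};   // name -> blob ID for files directly in this directory
  const subdirs = {}; // directory name -> flat index of everything beneath it
  for (const [path, blobId] of Object.entries(index)) {
    const slash = path.indexOf("/");
    if (slash === -1) { files[path] = blobId; continue; }
    const dir = path.slice(0, slash);
    if (!subdirs[dir]) subdirs[dir] = {};
    subdirs[dir][path.slice(slash + 1)] = blobId;
  }
  const entries = [];
  for (const [name, id] of Object.entries(files)) entries.push({ name, type: "blob", id });
  for (const [name, sub] of Object.entries(subdirs)) {
    entries.push({ name, type: "tree", id: writeTree(sub) }); // recurse first
  }
  entries.sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
  return put("tree", entries);
}

const stagedV1 = { "README.md": "blob-readme", "src/app.js": "blob-app-v1", "docs/guide.md": "blob-guide" };
const stagedV2 = { ...stagedV1, "src/app.js": "blob-app-v2" };

const rootV1 = writeTree(stagedV1);
const treesAfterV1 = store.size;
const rootV2 = writeTree(stagedV2);
console.log("new trees written for v2:", store.size - treesAfterV1);
```

In the second call only `src` and the root get new tree objects. The `docs` subtree serializes to the same payload as before, so the store hands back the existing ID.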

Nothing inside the old commit is edited. Git just creates a new commit object and then moves the current branch reference to point at it. That is why history is tracked as an object graph rather than as mutable folders: each commit is a frozen snapshot, and “the latest version” is simply whichever commit your branch name points to now.

This also explains reuse. If README.md did not change, its blob hash is the same, so the next tree can point at the exact same blob object as the previous commit. If an entire directory is unchanged, Git can reuse that subtree too. Only the changed paths need new objects.

Figure: Object graph for three snapshots (legend: commit, tree, blob, reused object)

What This Commit Does

objects.js — blobs, trees, commits, and deduplication

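// Toy 32-bit FNV-1a hash standing in for Git's real object hash.
// It is far too small to be collision-safe; it only keeps the demo readable.
function fakeHash(type, payload) {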
  const text = type + "\0" + payload;
  let h = 2166136261;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 16777619);
  }
  return (h >>> 0).toString(16).padStart(8, "0");
}

class ObjectStore {
  constructor() { this.objects = new Map(); }

  put(type, payload) {
    const id = fakeHash(type, JSON.stringify(payload));
    if (!this.objects.has(id)) this.objects.set(id, { type, payload });
    return id;
  }
}

const store = new ObjectStore();

function blob(text) { return store.put("blob", text); }
function tree(entries) { return store.put("tree", entries); }
function commit(treeId, parents, message) {
  return store.put("commit", { treeId, parents, message });
}

const readme1 = blob("hello\n");
const app1 = blob("console.log('v1');\n");
const src1 = tree([{ name: "app.js", type: "blob", id: app1 }]);
const root1 = tree([
  { name: "README.md", type: "blob", id: readme1 },
  { name: "src", type: "tree", id: src1 }
]);
const c1 = commit(root1, [], "initial");

const readme2 = blob("hello\n"); // same contents, same object
const app2 = blob("console.log('v2');\n");
const src2 = tree([{ name: "app.js", type: "blob", id: app2 }]);
const root2 = tree([
  { name: "README.md", type: "blob", id: readme2 },
  { name: "src", type: "tree", id: src2 }
]);
const c2 = commit(root2, [c1], "update app");

console.log("commit 1:", c1);
console.log("commit 2:", c2);
console.log("README reused:", readme1 === readme2);
console.log("total objects:", store.objects.size);

§03

Why branches are cheap

Git branches feel substantial in the UI, but internally a branch is just a name pointing at one commit. Creating a branch does not copy history. It only adds another reference. As new commits are made on that branch, the reference moves to the new tip commit, while the old commits remain in place.

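This is why branching in Git costs almost nothing: a new branch is one small ref holding a commit ID, no matter how large the repository or its history is. Some older centralized systems, by contrast, treated branches as heavyweight server-side structures.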

Figure: Refs moving across the same commit graph

Refs

A branch is just a ref. HEAD is another pointer: normally it names the branch that is currently checked out, and in a “detached HEAD” state it holds a commit ID directly.

refs.js — branch creation and ref movement

const commits = {
  a1: { parents: [], msg: "initial" },
  b2: { parents: ["a1"], msg: "main work" },
  c3: { parents: ["b2"], msg: "feature work" },
  d4: { parents: ["b2"], msg: "hotfix on main" },
};

const refs = {
  "refs/heads/main": "b2",
  HEAD: "refs/heads/main",
};

function createBranch(name, target) {
  refs[`refs/heads/${name}`] = target;
}

function commitOnBranch(name, newCommitId) {
  refs[`refs/heads/${name}`] = newCommitId;
}

console.log("start:", JSON.stringify(refs));
createBranch("feature", refs["refs/heads/main"]);
console.log("after branch:", JSON.stringify(refs));
commitOnBranch("feature", "c3");
console.log("feature moved:", JSON.stringify(refs));
commitOnBranch("main", "d4");
console.log("main moved independently:", JSON.stringify(refs));
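The listing above stores HEAD as a branch name. Resolving HEAD to an actual commit then takes one extra hop, which can be sketched as follows. The `ref: ` prefix mirrors how Git writes a symbolic HEAD; `refTable` and `resolveRef` are hypothetical names, and the commit IDs are the toy ones used in this section.

```javascript
// Hypothetical ref table: a value starting with "ref: " is symbolic,
// anything else is treated as a commit ID.
const refTable = {
  HEAD: "ref: refs/heads/feature",
  "refs/heads/main": "d4",
  "refs/heads/feature": "c3",
};

function resolveRef(table, name, depth = 0) {
  if (depth > 5) throw new Error("symbolic ref chain too deep");
  const value = table[name];
  if (value === undefined) throw new Error(`unknown ref: ${name}`);
  if (value.startsWith("ref: ")) {
    return resolveRef(table, value.slice("ref: ".length), depth + 1);
  }
  return value;
}

console.log(resolveRef(refTable, "HEAD")); // c3, reached via refs/heads/feature

// Detached HEAD: the pointer holds a commit ID directly, so no branch moves on commit.
const detachedTable = { ...refTable, HEAD: "b2" };
console.log(resolveRef(detachedTable, "HEAD")); // b2
```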

§04

The index: why `git add` exists

The index, or staging area, is the missing piece for many people learning Git. It is not the same as the working tree, and it is not yet the next commit. It is a separate table describing what tree Git should write when you commit.

That design lets Git stage part of your changes, leave other edits unstaged, and build the next snapshot deliberately instead of automatically committing the entire working directory.

Figure: Working tree vs index vs commit

The index is why Git can say “changes not staged for commit” and “changes to be committed” at the same time. That is not UI fluff. It reflects a real internal structure.
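A small sketch shows how those two messages fall out of comparing three tables. It assumes the HEAD commit, the index, and the working tree have each been reduced to a flat map of path to blob ID; `status` and the sample IDs are invented for this illustration and leave out details such as renames and ignore rules.

```javascript
// All three inputs are hypothetical flat maps of path -> blob ID.
function status(headTree, index, workingTree) {
  const staged = [];    // index differs from HEAD: "changes to be committed"
  const unstaged = [];  // working tree differs from index: "changes not staged for commit"
  const untracked = []; // present in the working tree but unknown to the index

  for (const path of new Set([...Object.keys(headTree), ...Object.keys(index)])) {
    if (headTree[path] !== index[path]) staged.push(path);
  }
  for (const [path, id] of Object.entries(workingTree)) {
    if (!(path in index)) untracked.push(path);
    else if (index[path] !== id) unstaged.push(path);
  }
  for (const path of Object.keys(index)) {
    if (!(path in workingTree)) unstaged.push(path); // deleted on disk, not yet staged
  }
  return { staged: staged.sort(), unstaged: unstaged.sort(), untracked: untracked.sort() };
}

const headTree = { "README.md": "r1", "app.js": "a1" };
const indexTable = { "README.md": "r1", "app.js": "a2" };                 // app.js staged once
const workingTree = { "README.md": "r1", "app.js": "a3", "scratch.txt": "s1" }; // then edited again

const report = status(headTree, indexTable, workingTree);
console.log(report);
```

Here `app.js` appears in both lists: the staged version differs from HEAD, and the file on disk differs again from the staged version.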

§05

Real Git goes further: packfiles, delta compression, reachability

The simplified model above is enough to explain normal version control behavior, but real Git does more. Loose objects are often repacked into packfiles for space and transfer efficiency. Within a pack, Git may delta-compress similar objects to save disk and network bandwidth.

Git also relies on reachability. An object is kept as long as it can be reached from some ref or reflog entry by following commit, tree, and parent links. Garbage collection later prunes objects that nothing can reach anymore, normally only after a grace period.
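Reachability is an ordinary graph traversal. The sketch below marks everything reachable from a set of root IDs and reports the rest as candidates for pruning. The object table, its IDs, and `reachableFrom` are hypothetical, and real garbage collection also consults reflogs and other roots that are omitted here.

```javascript
// Hypothetical object table: each object lists the IDs it points to.
const objectLinks = {
  c2: ["c1", "t2"], // commit -> parent commit and root tree
  c1: ["t1"],
  t1: ["b1"],
  t2: ["b1", "b2"], // b1 is shared between both snapshots
  b1: [],
  b2: [],
  cX: ["c1", "tX"], // abandoned commit: no ref points here
  tX: ["bX"],
  bX: [],
};

function reachableFrom(roots, links) {
  const seen = new Set();
  const stack = [...roots];
  while (stack.length > 0) {
    const id = stack.pop();
    if (seen.has(id)) continue;
    seen.add(id);
    for (const child of links[id] || []) stack.push(child);
  }
  return seen;
}

const live = reachableFrom(["c2"], objectLinks); // pretend refs/heads/main -> c2
const prunable = Object.keys(objectLinks).filter(id => !live.has(id)).sort();
console.log("live:", live.size, "prunable:", prunable);
```

Note that `cX` pointing at `c1` does not keep `cX` alive. Links run from child commit to parent, so only a ref or another live object pointing at `cX` would protect it.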

The everyday Git experience comes from a small, elegant model. The performance comes from layers built around that model, not from changing the model itself.

§06

Implementation: a tiny Git-like repository

This final snippet combines the pieces: a content-addressed object store, an index, branches as refs, and commits as immutable snapshots. It is not Git, but it is close enough to show why commands like add, commit, branch, and checkout fit together the way they do. The simplified checkout below also reloads the working tree and index from the checked-out commit so branch state does not leak across checkouts.

mini_git.js — add, commit, branch, checkout, log

function fakeHash(type, payload) {
  const text = type + "\0" + payload;
  let h = 2166136261;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 16777619);
  }
  return (h >>> 0).toString(16).padStart(8, "0");
}

class MiniGit {
  constructor() {
    this.objects = new Map();
    this.working = {};
    this.index = {};
    this.refs = { "refs/heads/main": null };
    this.HEAD = "refs/heads/main";
  }

  put(type, payload) {
    const id = fakeHash(type, JSON.stringify(payload));
    if (!this.objects.has(id)) this.objects.set(id, { type, payload });
    return id;
  }

  writeFile(path, text) {
    this.working[path] = text;
  }

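  add(path) {
    // Refuse to stage a path that does not exist in the working tree.
    if (!(path in this.working)) throw new Error(`no such file: ${path}`);
    const blobId = this.put("blob", this.working[path]);
    this.index[path] = blobId;
  }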

  readCommitTree(commitId) {
    if (!commitId) return {};
    const commit = this.objects.get(commitId).payload;
    const tree = this.objects.get(commit.treeId).payload;
    const files = {};
    for (const entry of tree) {
      files[entry.path] = entry.id;
    }
    return files;
  }

  commit(message) {
    const entries = Object.keys(this.index).sort().map(path => ({
      path, type: "blob", id: this.index[path]
    }));
    const treeId = this.put("tree", entries);
    const parent = this.refs[this.HEAD];
    const commitId = this.put("commit", { treeId, parent, message });
    this.refs[this.HEAD] = commitId;
    return commitId;
  }

  branch(name) {
    this.refs[`refs/heads/${name}`] = this.refs[this.HEAD];
  }

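  checkout(name) {
    const ref = `refs/heads/${name}`;
    // Without this guard an unknown branch would silently empty the working tree.
    if (!(ref in this.refs)) throw new Error(`unknown branch: ${name}`);
    this.HEAD = ref;
    // Reload the index and working tree from the commit this branch points at.
    const files = this.readCommitTree(this.refs[this.HEAD]);
    this.index = { ...files };
    this.working = {};
    for (const [path, blobId] of Object.entries(files)) {
      this.working[path] = this.objects.get(blobId).payload;
    }
  }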

  log() {
    const lines = [];
    let at = this.refs[this.HEAD];
    while (at) {
      const commit = this.objects.get(at).payload;
      lines.push(`${at} ${commit.message}`);
      at = commit.parent;
    }
    return lines;
  }
}

const repo = new MiniGit();
repo.writeFile("README.md", "hello\n");
repo.add("README.md");
const c1 = repo.commit("initial");
repo.branch("feature");

repo.checkout("feature");
repo.writeFile("README.md", "hello feature\n");
repo.add("README.md");
const c2 = repo.commit("feature edit");

repo.checkout("main");
repo.writeFile("notes.txt", "todo\n");
repo.add("notes.txt");
const c3 = repo.commit("main adds notes");

console.log("main tip:", repo.refs["refs/heads/main"]);
console.log("feature tip:", repo.refs["refs/heads/feature"]);
console.log("feature log:");
repo.checkout("feature");
repo.log().forEach(line => console.log(" ", line));
