There is a question nobody asks about software, because the answer has always seemed too obvious to bother with. The question is: who is the application for?
The obvious answer is: for you, the user. The application exists to serve your needs. The calendar is for tracking your time. The email client is for managing your correspondence. The note-taking app is for capturing your thoughts. The applications are tools, and you are the person who wields them.
This book argues that the obvious answer is wrong. Or rather — that it was once approximately right, and has become, over forty years of accumulated design decisions, profoundly and systematically wrong. The application is no longer primarily for you. It is for the company that built it. Your needs are accommodated insofar as accommodation keeps you inside the application. Your data is stored insofar as storage creates dependency. Your experience is designed insofar as design increases retention. You are not the customer of the application. In the oldest and most precise sense of the word, you are its captive.
This is not a conspiracy. Nobody sat in a room and decided to trap users. It is the emergent consequence of a specific technical architecture — the application silo — combined with a specific economic model — the software subscription — repeated across every category of personal software until the pattern became invisible through ubiquity. We do not notice the cage because we have never seen the outside of it.
In the beginning — and here "the beginning" means roughly 1969, in the computing laboratories of Bell Labs and MIT — there were no applications. There was text. Programs read text and wrote text. The output of one program could be the input to another, because they all spoke the same language. A user who wanted to split a document into words, sort them alphabetically, and count how many times each one appeared could do so by connecting three small programs with two vertical bars: tr | sort | uniq. No application needed. No vendor required. The data was text. The operations were text transformations. The user owned both.
This architecture had a name — the Unix philosophy — and it had a beauty that its practitioners recognised immediately. The beauty was structural. Any piece of data could flow into any operation. The system was composable: you could combine its parts freely, because the parts all spoke the same language.
Then came the graphical interface, and everything changed — for better and for worse simultaneously. The graphical interface was genuinely liberating. It made computers accessible to people who had no desire to learn a command language. But it carried a hidden cost, paid slowly over decades, that has now come due. The hidden cost was composability. Graphical applications do not speak a common language. An email client and a calendar application can share data only if someone has written a specific integration between them — a bridge that had to be designed, built, maintained, and constantly repaired. The bridge is not free. It is not guaranteed. And there are never enough of them.
The result is the world we inhabit: a landscape of applications that are individually capable and collectively stupid. None of them is intelligent about the relationships between things — the email that references the meeting that produced the note that generated the task — because intelligence about relationships requires a shared substrate, and the application architecture has no shared substrate. The shared substrate is you. You are the integration layer.
The argument of this book is that there is a better substrate, and that it has been available since before the graphical interface was invented. The substrate is text. Not text in the diminished sense — not the pale imitation of a word processor, not the blinking cursor in a search box. Text in the full sense: structured, composable, human-readable, machine-processable, version-controllable, greppable, archivable text.
What this book proposes is an architecture in which text is the ground truth and applications are renderers. Your data lives in a document. The document is a text file. Applications are summoned into the document when you need them, render their output inline, and write their state back to the text when something changes. The document is always the source of truth. And when you want to compute — to total a column of numbers, to project a budget, to analyse a dataset — you write code (this book uses Python, but it could be Lua, Scheme or something else), directly in the document, and the result appears inline. No spreadsheet application. No proprietary formula language. The computation lives where the reasoning lives: in the document, in prose, in context.
"The future is already here — it's just not evenly distributed."
— William Gibson, The Economist, 2003
The ideas in this book are not new. Ted Nelson was describing transclusion — the inclusion of one document inside another by reference — as early as the 1960s. Doug Engelbart demonstrated collaborative hypertext in 1968. The Unix philosophy was articulated in 1978. Emacs org-mode proved in 2003 that a single text file could serve as calendar, task manager, spreadsheet, and programming environment simultaneously. Literate programming — the practice of writing code and prose together — was proposed by Donald Knuth in 1984. The future that these thinkers described has been here for decades. This book is an attempt to make it available to everyone — not by requiring them to learn Emacs, but by designing an architecture that delivers the benefits of text-native computing through an interface worthy of 2026.
A note on what this book is not. It is not a critique of the people who built the applications I am arguing against. The engineers and designers who created Gmail and Notion and Apple Calendar are not villains. They were working within constraints that made the application silo the natural unit of software. I am not indicting them. I am indicting the constraints. It is not a proposal for destroying existing software. It is not nostalgic. And it is not a technical manual, though there is technical content — enough that an implementation team could begin from it.
The title of this book is The Document is the Computer. It says that the primary unit of personal computing is not the application, not the service, not the platform, but the document — the file, the text, the thing you author and own and can hold in your hand as a string of bytes on your own storage. The document is where you live, and you should not have to leave home every time you need to compute.
That is the argument. The rest of the book makes it in detail.
Imagine opening your computer in the morning to a single document. Not an inbox, not a dashboard, not a home screen scattered with application icons. A document — a long, scrollable page that is yours, that you have been writing into and that has been written into on your behalf, that contains everything relevant to your day in the order you have chosen to arrange it.
Near the top, perhaps, a few sentences you wrote last night about what you intended to accomplish. Below that, without any border or transition, a live view of your calendar: today's meetings rendered inline, inside the prose, as naturally as a table appears in a newspaper article. Below that, the three emails that arrived overnight and that your document has decided are worth surfacing here. Below those emails, a block of Python you wrote last week that recalculates your monthly budget from first principles every time the document opens, and whose output — a small table, a single summary line — sits quietly beneath the code, updated, current, requiring nothing from you.
You add a sentence. You check a task. You accept a meeting invitation by clicking a button that sits inside a paragraph. The acceptance is recorded — in the calendar backend, yes, but also in the document itself, as a line of text appended to the prose: accepted team sync, 14:00, 24 March. The document knows what you did. It will know forever, because it is a text file, and text files do not forget.
This is not a fantasy. Every component of this description exists, in some form, right now. What does not exist is the architecture that assembles them coherently. This book describes that architecture in enough detail that it could be built, and argues that it should be.
The central concept is simple enough to state in a sentence: the document is the operating system, and applications are renderers.
In the world this book describes, you do not open an email application. You write ::email[inbox]{filter=unread} in your document, and the document's rendering layer summons a mail renderer, which displays your unread email inline. You do not open a spreadsheet. You write a ::py block containing the computation you need, and the result appears directly below the code. You do not open a calendar application. You write ::cal[today]{view=agenda} and your day appears, in context, between the paragraphs that motivated you to look at it.
This fragment — prose, computation, table, task — is a single document. It is also a single text file. The reasoning and the computation and the action are all in the same place, in the order they happened, in the words that make them meaningful. No application switch. No copy and paste. No context lost in transit.
The architecture has four primitives. Everything in this book is built from them.
The embed directive — ::app[id]{params} — is the universal syntax for summoning any app renderer into the document. Email, calendar, contacts, chat, browser, files, terminal: all invoked by the same grammar, all rendering inline, all writing state back to the same text layer.
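The grammar is simple enough to parse in a few lines. Here is a minimal sketch in Python; the exact details (what characters an id may contain, how parameters are delimited and escaped) are assumptions of the sketch, not a specification.

```python
import re

# A minimal parser for the ::app[id]{params} embed grammar.
# The precise grammar below is an assumption; only the shape is given.
DIRECTIVE = re.compile(
    r"::(?P<app>\w+)"               # app type: email, cal, table, ...
    r"(?:\[(?P<id>[^\]]*)\])?"      # optional id: inbox, today, ...
    r"(?:\{(?P<params>[^}]*)\})?"   # optional params: key=value pairs
)

def parse_directive(line):
    m = DIRECTIVE.match(line.strip())
    if not m:
        return None
    params = {}
    if m.group("params"):
        for pair in m.group("params").split(","):
            key, _, value = pair.partition("=")
            params[key.strip()] = value.strip()
    return m.group("app"), m.group("id"), params

print(parse_directive("::email[inbox]{filter=unread}"))
# ('email', 'inbox', {'filter': 'unread'})
```

The point of the uniform grammar is that this single parser serves every app: email, calendar, table, and terminal are all the same three captures.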
The computation block — ::py — evaluates Python in a sandboxed interpreter, captures output, and renders it inline. Blocks share a document-scoped namespace. The document is the notebook. There is no separate kernel, no separate file, no separate application.
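The shared namespace is the whole mechanism. A sketch, assuming each block in a document is executed against a single document-scoped dictionary so that later blocks see names bound by earlier ones:

```python
import contextlib
import io

# Sketch: successive ::py blocks share one document-scoped namespace,
# so a name bound in one block is visible in the next.
def run_blocks(blocks):
    namespace = {}            # one namespace for the whole document
    outputs = []
    for source in blocks:
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(source, namespace)   # later blocks see earlier bindings
        outputs.append(buffer.getvalue())
    return outputs

doc = [
    "values = [120, 45, 310]",       # first block: binds a name
    "total = sum(values)\nprint(total)",  # second block: uses it
]
print(run_blocks(doc))
# ['', '475\n']
```

A real evaluator would add per-block sandboxing and richer output capture, but the notebook-like behaviour falls out of nothing more than a shared dictionary.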
The table directive — ::table — is a text-serialisable structured data block with a live renderer. It can be authored by hand, generated by a ::py block, or edited directly in the rendered view — with the text always updated to match. The spreadsheet is not replaced. It is dissolved.
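The round trip between text and structure can be sketched directly. The pipe-separated serialisation below is illustrative only; it stands in for whatever text form the ::table directive would actually use.

```python
# Sketch of the ::table round trip: the text form is authoritative,
# the structured form is derived, and an edit in the rendered view is
# written straight back to the text layer.
def parse_table(text):
    lines = text.strip().splitlines()
    header = [c.strip() for c in lines[0].split("|")]
    rows = [[c.strip() for c in line.split("|")] for line in lines[1:]]
    return header, rows

def serialise_table(header, rows):
    return "\n".join(" | ".join(row) for row in [header, *rows])

text = "item | cost\nrent | 1200\nfood | 400"
header, rows = parse_table(text)
rows[1][1] = "450"                    # edit made in the rendered view...
print(serialise_table(header, rows))  # ...flows back into the text
```

Because parse and serialise are inverses, the rendered view and the text can never drift apart for long: every edit on either side is a text edit in the end.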
The capability registry — invisible to the user, foundational to the system — maps directive types to renderer implementations, enforces sandboxing, and handles the bidirectional sync protocol that keeps rendered state and text state in agreement.
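A registry of this kind reduces, at its core, to a mapping from directive type to renderer function, with unknown types degrading gracefully to raw text. A sketch, with hypothetical names:

```python
# Sketch of a capability registry: directive types map to renderer
# functions; a type with no registered renderer stays visible as text.
registry = {}

def renderer(directive_type):
    def register(fn):
        registry[directive_type] = fn
        return fn
    return register

@renderer("cal")
def render_calendar(target, params):
    # A stand-in renderer; a real one would draw an agenda inline.
    return f"[calendar: {target}, view={params.get('view', 'month')}]"

def render(directive_type, target, params):
    fn = registry.get(directive_type)
    if fn is None:
        return f"::{directive_type}[{target}]"   # unknown: degrade to text
    return fn(target, params)

print(render("cal", "today", {"view": "agenda"}))
# [calendar: today, view=agenda]
```

The graceful degradation is the important design choice: a document that mentions a renderer you do not have installed is still a readable text file, not a broken one.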
A word about Python. I chose it not because it is the best language but because it is the most legible for people who are not primarily programmers. A product manager reading a colleague's document should be able to understand a ::py block without knowing what a monad is. total = sum(values) is not a program. It is a sentence. The sandbox is strict: a ::py block cannot read from disk, make network requests, or import arbitrary libraries. It can compute. It can produce output. That is all. Computation without side effects is computation you can trust.
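The policy (compute, produce output, nothing else) can be illustrated by executing a block against a whitelist of builtins. This is a sketch of the policy, not a real sandbox: genuine isolation requires process-level or interpreter-level enforcement, since CPython's exec alone can be escaped.

```python
# Illustration of the sandbox policy: a ::py block gets arithmetic,
# data structures, and print, but no file, network, or import machinery.
# NOT a real sandbox; real isolation needs a separate process or VM.
SAFE_BUILTINS = {"sum": sum, "len": len, "print": print, "range": range,
                 "min": min, "max": max, "sorted": sorted}

def run_sandboxed(source):
    env = {"__builtins__": SAFE_BUILTINS}
    exec(source, env)
    return env

env = run_sandboxed("total = sum([120, 45, 310])")
print(env["total"])
# 475

try:
    run_sandboxed("open('/etc/passwd')")
except NameError as e:
    print("blocked:", e)   # open is simply not a defined name here
```

With no __import__ in the whitelist, import statements fail too: the block can compute over the values the document gives it, and nothing else.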
The book is in four parts.
Each chapter in Parts II and III contains at least one interactive figure — a live prototype of the concept being described, embedded in the text in exactly the way the architecture proposes all embeds should work. The book is, in this sense, a demonstration of itself.
I hope that someone reads it and starts building. I hope that someone is you.
On the morning of 9 December 1968, Douglas Engelbart walked onto a stage in San Francisco and changed the way the world understood what a computer could be. The audience of roughly a thousand computer professionals watched as he demonstrated, for the first time in public, a system that supported real-time collaborative editing, hypertext linking, video conferencing, and a small handheld device he called a mouse. He called the session "A Research Center for Augmenting Human Intellect." History would call it the Mother of All Demos.
What is remarkable about the demo, watching it today, is not how futuristic it looks. It is how familiar, and how strange simultaneously. Familiar because nearly everything Engelbart showed has been incorporated into the computers we use every day. Strange because the spirit of what he showed — the vision of the computer as a tool for augmenting human thought, for connecting ideas to ideas, for making reasoning visible and shareable — was never fully incorporated. We took the mouse. We left behind the philosophy.
Engelbart's system, NLS, treated the document as the primary unit of computing. Everything lived in documents. Documents contained other documents by reference. Computation was not separate from writing — it was woven into it. We did not build that system. We built something different: a world of windows.
The graphical interface as most people know it was born at Xerox PARC in the early 1970s, and popularised by Apple with the Macintosh in 1984. Its designers borrowed deliberately from the visual language of offices: a desktop, folders, files, a trash can. The metaphor was chosen for accessibility. A person who had never touched a computer could look at the screen and find something recognisable.
The metaphor worked. Personal computing went from a hobbyist pursuit to a mass-market phenomenon in the span of a decade. By the early 1990s, the visual vocabulary of WIMP — Windows, Icons, Menus, Pointer — was universal. By the 2000s, it was invisible: not a metaphor users were consciously applying, but the unchallenged reality of what a computer looked like and how it behaved.
Invisible metaphors are the most consequential kind. When you can see a metaphor, you can interrogate it. When the metaphor becomes the world — when nobody alive remembers computing before windows and icons — you lose the ability to see it clearly enough to question it.
The desktop metaphor rests on a physical object — a desk — that the vast majority of computer users no longer have in any meaningful sense. The desk of 1984 was a real working surface covered in real papers, real folders, a real telephone. The computer sat on one corner of it. The desktop metaphor made the computer's screen look like the rest of the desk, so that the transition between physical and digital would be cognitively smooth. The desk of 2026 is often the computer itself. There are no physical papers. The metaphorical desktop on the screen refers to physical objects that have not existed in most offices for twenty years. It is a metaphor for a world that has vanished, running as the primary interface of the world that replaced it.
The word "desktop" appears in the name of a feature Microsoft shipped for Windows 95 with Internet Explorer 4, called "Active Desktop," which allowed the desktop background to be a live HTML page. This was, in retrospect, a brief and failed attempt to make the desktop metaphor computational. It lingered through Windows XP and was removed in Windows Vista. The impulse — to make the desktop surface dynamic and data-bearing — was correct. The implementation was not.
The deeper problem with WIMP is not the desktop metaphor itself but the architectural principle it encodes: that information lives inside containers, which live inside other containers, in a strict hierarchy. This is so natural-feeling, after forty years of conditioning, that it is hard to see it as a choice. But it is a choice, and it has costs.
The costs can be made visible with a simple experiment. Choose any task you performed on a computer today. Now count the number of container boundaries you crossed to complete it. Suppose the task was reading an email that contained a link to a shared document, and leaving a comment on that document. Starting from a locked screen: you unlocked the device (boundary one: operating system), opened the email client (boundary two: application), found the email (boundary three: inbox folder), clicked the link (boundary four: browser opened), navigated to the document (boundary five: web application), found the relevant section (boundary six: document structure), and left a comment (boundary seven: comment thread). Seven container crossings for a task that should feel like three: read, navigate, respond.
Each container crossing requires a context switch. The email client does not know about the document. The document does not know about the email. The browser does not remember why it opened. The containers are individually functional and collectively amnesiac.
Notice the third bar. The computer holds none of the context — zero, at every depth. The entire burden of remembering why you opened the browser, what was in the email, which section of the document you were looking for, is carried by you. This is not a flaw in any individual application. It is a structural property of the container hierarchy. Containers do not communicate context upward or downward. They simply contain.
It is worth being precise about what the operating system's window manager actually knows, because the gap between what it knows and what you need it to know is the measure of the problem.
The window manager knows the position and size of each application window on screen. It knows which window is in focus. It knows how to draw window borders, title bars, and scroll bars. This is, essentially, the complete list. It does not know what application is in each window, beyond a name and an icon. It does not know what content the application is displaying. It does not know the relationship between the content in one window and the content in another. It does not know anything that would allow it to help you — to surface connections, to suggest relevant information, to remember context across application boundaries.
The window manager is a sophisticated picture frame. It knows how to display rectangles. It does not know what is in them.
An icon is a picture that represents a program. The picture is meant to communicate what the program does. The icon is a promise: this picture stands for this capability, and the capability lives here, inside this container, and nowhere else.
The promise is false. Or rather — it was true in 1984, when a computer had perhaps a dozen applications and each application did one clearly bounded thing. Consider what the envelope icon on your dock actually promises today. It promises email. But email is not a bounded capability. Email is communication, which involves people (contacts), time (calendar), tasks (todo items), documents (files), and memory (search and history). An email about a project involves all of these simultaneously. The envelope icon promises access to a container. What the user needs is access to a relationship. The icon cannot represent a relationship. It can only represent a container. And so the user must manually reconstruct the relationship — four icons, four containers, one relationship held together by memory and effort rather than by the system.
The iOS home screen of 2007 was perhaps the purest expression of icon-as-promise thinking. A grid of identical rounded squares, each standing for a container, the user's entire computing life organised as a set of destinations to be navigated to and departed from. Sixteen years later, the home screen of iOS 17 is a grid of identical rounded squares. The paradigm has not moved.
Return, for a moment, to that stage in San Francisco in 1968. Engelbart's question — the question that animated the entire demo — was not "how do we make computers easier to use?" It was a harder and more important question: how do we use computers to augment human intellect?
Augmentation is a specific claim. It means that the human-computer system, taken together, should be capable of things that neither the human nor the computer could accomplish alone. The WIMP paradigm, despite its many genuine achievements, has not produced augmentation in this sense. It has produced acceleration — computers that let us do more of what we were already doing, faster. We write more emails, not better ones. We attend more meetings, not more productive ones. The speed is real. The augmentation is largely absent.
Augmentation requires that the computer hold context. It requires exactly what the container hierarchy prevents: a shared substrate in which all of this information lives together, accessible to every operation, composable with everything.
"The digital computer is... a means for the extension of the human intellect, as the printed book was, or the pen, or language itself."
— J.C.R. Licklider, Man-Computer Symbiosis, 1960
There is a response to everything in this chapter that is worth addressing directly: but we have multi-window setups for exactly this reason. If the problem is that email and calendar and notes are in separate containers, arrange them side by side. Problem solved.
It is not solved. It is deferred. Arranging windows side by side addresses the visual problem of proximity, not the architectural problem of isolation. The email window placed next to the calendar window does not share data with the calendar window. Moving information between them still requires manual intervention — copy, paste, type, click. The windows are next to each other. They do not know about each other. The cognitive burden of maintaining the connection between them has not been reduced.
The window is not the answer because the window is the problem. The window is the container. Adding more containers, or arranging them differently, does not dissolve the container hierarchy. It tiles it. The answer is not a better arrangement of windows. It is an architecture in which the container boundary does not exist — in which the email and the calendar event and the note and the task are all aspects of the same underlying thing, all readable and writable through the same interface, all composable with each other because they live in the same substrate. That substrate is the document.
A note on what this chapter has not argued: it has not argued that the GUI will or should disappear. The visual interface is not the problem. The container hierarchy is the problem. A text-native document architecture can be visually rich — richer, in some respects, than the current paradigm. The visual surface of computing is not what we are replacing. We are replacing what lies beneath it.
There is a specific kind of frustration that every person who uses a computer regularly has felt but rarely named. You are in a meeting. Someone references a document. You open your email to find the link. You find the email, click the link, the document opens in a browser. The meeting is now about something in that document. You want to make a note. You open your notes application. The note has no connection to the email, to the meeting, to the document, to the calendar event that convened you all. It is a note that exists in isolation, in a silo, knowing nothing about the context that produced it.
This is the silo problem. It is not a bug. It is not an oversight. It is the direct and inevitable consequence of the application architecture described in Chapter 1, combined with a set of economic and technical incentives that have made the problem progressively worse over the decades in which it should, by rights, have been getting better.
Every application stores its data in a format. In practice, the format is a fence. It determines who can read the data, who can write it, and who must ask permission to do either. Some formats are open. HTML is a format — anyone can write it, open it in any browser, read it in any text editor. Most application formats are not open in this sense. A .pages file requires Pages. A .sketch design file is opaque to every application except the one that created it. The format encodes a power relationship: the application that writes the file is the canonical reader of the file. All other applications are guests, and the terms of their access can be revoked at any time.
In 2019, Google announced that it would phase out Gmail access for what it called "less secure apps" — third-party clients that authenticated over IMAP with a plain password rather than the OAuth-based XOAUTH2 mechanism. The effect was that many email clients — applications that users had relied on for years to access their own email — simply stopped working until they were rewritten. The users' email had not changed. A policy decision by the company that ran the server revoked the access. The data was in a silo. The silo's owner changed the lock.
The modern response to the format problem is the API. In practice, the API is a more sophisticated version of the same problem. The API is controlled by the application that exposes it. The application decides what can be requested, at what rate, by whom, and under what terms. The API can be restricted, rate-limited, priced, versioned, and shut down. It is not a window between silos. It is a controlled aperture — a hole in the wall whose size and position are determined entirely by the party with the most to lose from genuine openness.
The history of consumer software is littered with APIs that were opened generously during a growth phase and closed abruptly when the platform reached dominance. Twitter opened its API in 2006 and built an ecosystem of third-party applications on top of it. In 2023, it priced its enterprise API tier at $42,000 per month, destroying that ecosystem overnight. Facebook did it in 2015. Netflix did it in 2014. The pattern is consistent: open during growth, close at dominance. The API is not a bridge between silos. It is a drawbridge, and the people inside the castle control when it is raised.
The shift from one-time software purchases to subscription pricing created a new and more durable form of the silo. Under the purchase model, the user bought a copy of an application and received a file. The file was theirs. Under the subscription model, the data lives on the vendor's servers. The user does not receive a file. They receive access to a view of their data, for as long as they continue paying. If they stop paying, access is suspended. If the vendor shuts down, access ends permanently. The subscription model transfers custody of the user's data from the user to the vendor.
Evernote, for many years one of the most popular personal note-taking applications, changed its free tier limits multiple times between 2016 and 2023 — restricting the number of devices, reducing offline access, and limiting the size of uploads. Users who had stored years of notes in the application faced a choice: pay an increasing subscription fee or accept degraded access to their own notes. The notes had not changed. The bargain had.
Consider what should be the simplest possible inter-application transaction: receiving a meeting invitation by email and accepting it so that it appears on your calendar. The iCalendar standard is twenty-six years old. Calendar invites from Outlook do not reliably render in Gmail. Google Calendar events do not always round-trip correctly through Apple Calendar. Timezone handling is a persistent source of errors across all combinations. Reply handling — the mechanism by which the organiser learns that you have accepted — breaks silently in a meaningful fraction of cases.
Why? Because the standard defines a format, but not a rendering model, not a conflict resolution protocol, not a timezone normalisation procedure, not a reply handling mechanism. Each application implements the gaps differently. The gaps are where the silos live. And this is the easy case — two applications with a documented standard, a clear user intent, and decades of implementation experience. Now consider the cases with no standard: the email about a project that should update a task list, the chat message that should create a calendar event. For these, there is no standard. There is only the user, copy-pasting between silos.
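One of these gaps can be made concrete in a few lines. An iCalendar DTSTART may carry a timezone or may be "floating", with no timezone at all, and the standard does not say how clients should reconcile the two. Python's datetime makes the ambiguity visible by refusing to compare them:

```python
from datetime import datetime, timezone

# The same wall-clock time, once with a timezone and once floating.
aware = datetime(2026, 3, 24, 14, 0, tzinfo=timezone.utc)   # DTSTART with TZID
floating = datetime(2026, 3, 24, 14, 0)                     # DTSTART without one

try:
    print(aware < floating)
except TypeError as e:
    print("undefined:", e)
# Each calendar client invents its own answer to this undefined
# comparison, which is one reason the same invite can land at
# different hours in different applications.
```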
Given the silo problem, a natural market response would be integration — products that connect the silos. This market exists. Zapier, Make, and dozens of similar services have built businesses on the premise that people need their applications to talk to each other. These services are valuable. They are also a damning indictment of the underlying architecture. A world in which a thriving industry exists solely to pass data between applications that should already share a substrate is a world that has systematically failed to solve a foundational problem.
Integration services fail in predictable ways. They fail when APIs change — and APIs always change. They fail when vendors implement the same standard differently. They fail silently, leaving users with the impression that data has been transferred when it has not. And they introduce a new party — the integration service itself — into the custody chain of the user's data, adding a third silo to the problem of two. More fundamentally, integration services treat the symptom rather than the disease.
Here are five tasks that a person performing knowledge work routinely needs to perform, and a count of the silo boundaries each one crosses in a typical modern setup. Finding the context for a meeting you are about to join — four applications, four silos, zero of them shared; three to five minutes per meeting, performed manually, every time. Creating a task from an email — two silos, one manual transfer, the task now having no persistent connection to the email that generated it. Writing a weekly status update — five silos synthesised into a sixth that immediately becomes a frozen snapshot, stale by Monday. Onboarding a new colleague — six silos, one hour, always incomplete. Finding something you know you have — four separate searches across four separate indexes, may still fail.
The silo problem has resisted solution for forty years not because nobody has noticed it, and not because the technical problems of integration are genuinely unsolvable. It has resisted solution because the solutions attempted have all worked within the existing architecture rather than replacing it. Better APIs do not dissolve silos — they create better-maintained walls with better-maintained drawbridges. Integration services do not dissolve silos. All-in-one workspaces do not dissolve silos — they create larger silos with more internal connections. The silo is not a local problem that can be fixed locally. It is a systemic consequence of the architectural decision to make the application the primary unit of data ownership.
The structural solution requires a different primary unit. Not the application, but the substrate — a shared layer that all applications can read from and write to, that no single vendor owns, that persists independently of any particular product's continued existence, and that the user controls absolutely. That substrate is text. Specifically: a plain text document in which data from any application can live as a structured directive, rendered by a registered handler, and written back as text when something changes.
In 1969, Ken Thompson sat down and wrote a text editor. The editor was called ed. It accepted commands as text, produced output as text, stored its files as text. It had no graphical interface, no mouse support, no menus, no icons. It was, by any contemporary standard of interface design, a tool of extraordinary austerity.
The files that ed created are still readable today. Not readable in the sense that they can be opened with compatibility software, or converted through an intermediate format. Readable in the literal sense: you can open them in any text editor on any operating system on any device manufactured in the past forty years, and the text will be there, in the same order, in the same encoding, as legible as the day it was written. Try this with a .wps file from Microsoft Works 3.0. Try it with an .lwp file from Lotus Word Pro. These formats existed. They held real documents. They are now effectively inaccessible — not because the information was lost, but because the formats that encoded it became orphans when the applications that wrote them were discontinued. The containers decayed. The text inside them was never the problem.
Plain text is the single format that consistently survives digital preservation challenges. The reason is structural: plain text has no rendering layer. A .docx file contains not just the text of a document but instructions for how to render it — font specifications, layout rules, embedded objects, tracked changes, macro definitions. The rendering instructions require a renderer. When the renderer is discontinued, the instructions become uninterpretable, and the document, though technically present, is effectively lost. A plain text file contains nothing but characters and the order in which they appear. There are no rendering instructions. There is nothing to interpret beyond the characters themselves. The file is complete. It needs nothing.
The Voyager spacecraft, launched in 1977, carries a golden record — a physical disc containing sounds and images from Earth, encoded in a format chosen specifically for durability: analogue grooves that any sufficiently advanced civilisation could reconstruct from the physics of the medium alone, with no knowledge of any particular encoding standard. The designers of the Voyager record understood something that software vendors prefer not to discuss: the most durable format is the one that requires the fewest assumptions about what the reader already knows.
Of text's properties, composability is the one that the graphical interface most completely abandoned, and the one whose loss has been most costly. Composability, in the Unix sense, means that any program can read from any source and write to any destination, because all programs speak the same language: a stream of text. This is enforced by the architecture. A Unix program that reads from standard input receives text. A program that writes to standard output produces text. The pipe operator — the vertical bar — connects the output of one program to the input of another. The programs do not need to know about each other. They need only to speak text. A program written in 1975 can pipe its output to a program written in 2024. Composition is free, because the common language was chosen once and never changed. The graphical interface broke this. Graphical applications write to windows. Windows are not pipeable.
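The same pattern can be sketched in a few lines of Python: each "program" below is a text-to-text function (the names are invented for illustration), and the pipe is ordinary function composition.

```python
from itertools import groupby

# Three tiny "programs", each reading text and writing text, in the spirit
# of tr | sort | uniq -c. Composition is free because the interface is text.
def split_words(text):                      # like tr ' ' '\n'
    return "\n".join(text.split())

def sort_lines(text):                       # like sort
    return "\n".join(sorted(text.splitlines()))

def count_runs(text):                       # like uniq -c
    return "\n".join(f"{len(list(g))} {k}"
                     for k, g in groupby(text.splitlines()))

def pipeline(text):
    # The "pipe": output of one program becomes input of the next.
    return count_runs(sort_lines(split_words(text)))

print(pipeline("the cat sat on the mat the end"))
```

None of the three functions knows about the others; they agree only on the interface, which is text.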
In 1978, Doug McIlroy, the inventor of the Unix pipe, summarised the Unix philosophy in three rules: write programs that do one thing and do it well; write programs that work together; write programs that handle text streams, because that is a universal interface. The third rule is the foundational one. Text streams are a universal interface because every program, regardless of its purpose, domain, or implementation language, can produce and consume them. The universality is not a feature of any particular program. It is a property of the interface itself — and a property that holds precisely because the interface is not owned by any particular vendor.
"Write programs that do one thing and do it well. Write programs that work together. Write programs that handle text streams, because that is a universal interface."
— Doug McIlroy, A Quarter Century of Unix, 1994
In 1984, Donald Knuth published a paper introducing a practice he called literate programming. The idea was simple and radical: that a program should be written primarily for a human reader, with the machine as a secondary audience. The programmer would write a document — prose and code interleaved — in which the explanation of the algorithm and the implementation of the algorithm were the same artifact.
Knuth was describing, in 1984, something very close to the ::py block in the architecture of this book. The ::py block is a literate programming construct: it is code embedded in prose, where the prose explains the intent and the code implements it, and both are part of the same document. Literate programming never achieved mainstream adoption. The tooling was complex, the discipline required was high, and the payoff was diffuse. What changed in the years since is the arrival of the Jupyter notebook, which made the insight accessible to data scientists. The architecture of this book takes the Jupyter notebook's core insight and removes the application boundary entirely.
If you want to see the architecture of this book already built, partially, by one person, in a tool that has been in continuous use for more than twenty years, open Emacs and type M-x org-mode. A .org file can be a task manager, a calendar, a note-taking system, a spreadsheet, a programming notebook, and a publishing system — all of this from a text file, all composable with every Unix tool, all version-controllable with git. Org-mode is proof — running, deployed, actively used for more than twenty years — that the model works. One text file. Many rendering modes. Everything composable. No silos.
Text possesses five properties that proprietary application formats lack. Every design decision in the chapters that follow should be evaluated against this checklist. If a proposed feature requires storing state in a database rather than in the text file, it compromises durability. If it requires a specific application to read the file's contents, it compromises portability. If it makes the file's text opaque to general tools, it compromises composability. If it stores state that cannot be diffed, it compromises diffability. If it makes the raw file illegible to a careful human reader, it compromises legibility.
A text file's history is equally textual: the difference between two versions is itself text (a diff) that any version control system can produce and interpret. The history of a text file is a text file.

There is a fifth reason, not available to the architects of Unix or the designers of org-mode, why text is the right substrate for personal computing in 2026: language models. A language model is a system that reads text and produces text. It is, in this sense, a Unix program — one that participates naturally in text-based workflows, that can read any text file without special integration and produce output that any text-processing tool can consume.
This has a direct implication for personal computing. A computing environment built on text is natively legible to language models. An AI assistant in a text-native document can read the entire document — the prose, the directives, the Python blocks, the table data, the completed tasks — without any special interface, without any API integration, without any export step. It reads the same file you read. It knows what you know, in the same format you know it. Contrast this with the current state of AI assistants embedded in applications: the AI sees what the integration layer permits, in the format the API returns. The AI is inside the silo, looking out through the same controlled aperture as everything else. A text-native document has no controlled aperture. The AI reads the file. The file is the context. The context is complete.
Part I has made its argument. The WIMP paradigm created a cage. The silo problem is the cage at scale, reinforced by economic and technical forces that have made it self-perpetuating. Text, by contrast, has properties that make it the right substrate for a different architecture, and a tradition of practice going back fifty years that has proven the model works. Part II builds the architecture.
An operating system, in the formal sense, is software that manages hardware resources and provides services to application programs. We use the phrase "the document as OS" in a more functional sense: the document as the layer that the user actually lives in. The layer that organises access to capabilities — email, calendar, computation, files — without requiring the user to navigate to a separate application for each one. The layer that holds context across capabilities, so that the email and the calendar event and the task and the note are all visible in the same place, at the same time, in the order that makes sense to the person looking at them. The document OS is not invisible. It is the primary thing the user sees. It is the page they open in the morning. It is where their thinking lives.
The primary interface of the text-native computing environment is a scrollable document — not a fixed layout, not a grid of panels, not a dashboard, but a flowing page of text that grows as the day progresses and that can be reorganised at any time by the user without breaking anything. The daily document is not a place to write things down about what is happening elsewhere. It is the place where things happen. The email is not summarised in the daily document. It is there, rendered inline, reply-able in context. The calendar is not referenced in the daily document. It is there, showing the agenda, accepting invites in place.
This fragment — prose, calendar, email, computation, tasks — is a single text file. The user did not switch applications to compose it. The Python ran because the document opened. The tasks are connected to each other through the blocked-by parameter: send-q3-sara cannot be completed until call-finance is checked. This dependency is in the text. It is not in a project management database.
The daily document has structure, but the structure is not imposed. There are no mandatory sections, no required fields, no template that must be filled in. The structure emerges from the user's practice and is recorded in the text itself. A user who finds it useful to begin every document with a calendar view and a task list will write those directives at the top of each day's document. A user who prefers to start with prose reflection and pull in email only when needed will write it that way. The document accommodates both, because the directives are text and text can go anywhere in a document. This is structurally different from most productivity applications, which impose a data model on the user. The document has no predefined categories. A thing is whatever the prose around it says it is.
A critical design principle of the document OS: the renderer is replaceable. The document — the text file — is permanent. The renderer — the application that parses the directives and displays the widgets — is a software component that can be swapped, upgraded, or replaced without affecting the document's content. This is not how applications work today. In most applications, the renderer and the data model are inseparable. In the document OS, the renderer is explicitly downstream of the document. The document is the source of truth. Any renderer that understands the directive grammar can render any document. Your data stays in your text file. The renderer is just the lens through which you see it.
The closest existing analogy is CSS applied to HTML. An HTML file is a document with semantic structure; a CSS file determines how that structure is displayed. Changing the CSS does not change the HTML. The document OS applies the same principle to personal computing: the text file is the HTML, the directive grammar is the semantic layer, and the renderer is the CSS — separable from the content, swappable without data loss.
The document OS has a political consequence that is worth stating plainly. When your data lives in a text file on your disk, it is yours — not in the contractual sense of "you retain ownership of your content" as defined in section 14(b) of a terms of service agreement. Yours in the physical sense: you have the bytes. You can read them. You can move them. You can delete them. Nobody can revoke your access, change the pricing model, discontinue the product, or sell your data to an advertiser, because nobody else has the data. The renderer may be a service. The calendar backend may be hosted. But the document itself — the text file that holds your thinking, your directives, your computation, your tasks — is yours unconditionally. If every service you use shuts down tomorrow, your document remains. It is readable in any text editor. It is yours.
Niklaus Wirth, designing the Oberon system in 1988, articulated a principle that this architecture inherits but has not yet stated with sufficient precision: the state of the system should be practically determined by what is visible to the user. Wirth observed that hidden state forces users to maintain a mental model of something the computer is concealing. Every hidden field, every in-memory application state, every formula invisible behind a cell value, every mode that changes what a button does — each is a tax on human working memory, paid continuously, invisibly, in full.
The Four Laws of this architecture are consequences of this deeper principle. Text is always ground truth because hidden databases are hidden state. All mutations are text mutations because silent in-memory changes are hidden state. Embeds are sandboxed and declared because undeclared capabilities are hidden state. Graceful degradation is mandatory because renderer-only content is hidden state — content that exists but cannot be read.
The practical test is Wirth's: can the user determine the complete state of the system by looking at what is visible? In the daily document, the answer must be yes. The computation is visible as code. Its output is visible inline. The tasks are visible as text. The accepted calendar invite is visible as a log line. The diverged table cell is visible as an amber-highlighted field. Nothing is concealed. Nothing requires the user to remember a prior action, maintain an awareness of a mode, or infer a state from an effect. The document is a complete and readable record of everything that has happened in it.
Wirth's specific formulation: "The absence of hidden states. The state of the system is practically determined by what is visible to the user. This makes it unnecessary to remember a long history of previously activated commands, started programs, entered modes, etc. Modes are in our view the hallmark of user-unfriendly systems." — Wirth & Gutknecht, Project Oberon, 1992. Oberon achieved this for a 1988 single-user workstation. The text-native document achieves it for personal computing at large.
Every architecture needs a syntax. The text-native document architecture's syntax is the embed directive: a compact, human-readable expression that tells the document renderer to summon a specific application capability at a specific point in the text. The directive is text. It looks like text. A person reading the raw document without a renderer can parse it mentally in seconds. A renderer that encounters it knows exactly what to do. The directive is the single point of contact between the document and the application ecosystem — the seam where prose becomes computation, where text becomes a calendar view, where a line of markup becomes an inbox. Getting the grammar right is the most important design decision in the entire architecture, because every other decision builds on top of it.
The embed directive follows a single grammar, with no exceptions: a sigil (::), a directive type, an optional identifier in square brackets, and optional parameters in curly braces. In full: ::type[identifier]{key=value}.
The sigil :: was chosen because it does not appear in natural prose at the start of a line. A sentence will never begin with two colons in English or in any major natural language. This means the parser can unambiguously identify a directive: any line beginning with :: is a directive; all other lines are prose. No ambiguity. No escape sequences. No edge cases.
The identifier in square brackets is optional but strongly encouraged. It serves two purposes: it gives the directive a name that the document context can reference (a ::py block with id=q3-analysis makes its variables available to subsequent blocks), and it makes the raw text more legible to a human reader. ::cal[today] is self-describing even without documentation. The parameters in curly braces are key-value pairs that configure the renderer — key=value, space-separated, double-quoted for values containing spaces.
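A parser for this grammar fits in a handful of lines. The sketch below is illustrative rather than normative: the regular expressions are one plausible reading of the grammar as described here, not the book's reference implementation.

```python
import re

# Grammar: ::type[identifier]{key=value key2="quoted value"}
# A sigil, a type, an optional identifier, optional parameters.
# Any line not beginning with :: is prose.
DIRECTIVE = re.compile(
    r'^::(?P<type>[\w:-]+)'             # directive type, e.g. cal or cal:google
    r'(?:\[(?P<id>[^\]]*)\])?'          # optional identifier in square brackets
    r'(?:\{(?P<params>[^}]*)\})?\s*$'   # optional key=value parameters
)
PARAM = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')

def parse_line(line):
    """Return (type, id, params) for a directive line, or None for prose."""
    m = DIRECTIVE.match(line)
    if m is None:
        return None
    params = {key: (bare if bare else quoted)
              for key, quoted, bare in PARAM.findall(m['params'] or '')}
    return m['type'], m['id'], params
```

A line either matches the one pattern or it is prose; there is no third case, which is what makes the parser trivial.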
Every directive must be meaningful as raw text. This is a hard requirement, for two reasons. First, durability: if the document is to outlast the renderer — and it must, because that is what text-nativity means — then the directives must be interpretable without the renderer. ::cal[2026-03-23]{view=agenda} tells a human reader: this is a calendar embed, for 23 March 2026, in agenda view. The reader knows what was meant, even if they cannot see the rendered calendar. Second, interoperability: not every environment that reads a document will be a full renderer. Partial renderers should be able to operate on documents containing directives they do not handle, rendering what they can and displaying the others as text. The test for any directive syntax is: can a careful human reader, who has never seen this system before, understand what it means from the raw text alone?
A natural question arises: why not use HTML? The answer is legibility. HTML is not legible to a human reader without a renderer. A person who encounters <div class="cal-embed" data-view="agenda" data-date="2026-03-23"></div> in a text file needs to know HTML, the application's class naming conventions, and the data attribute schema to interpret it. This fails the graceful degradation test. ::cal[2026-03-23]{view=agenda} passes. Any literate person can read it. The directive syntax is also intentionally minimal — one construct, with two optional extensions. The parser is trivial to implement. The writer does not need to remember a markup language. The syntax is a notation, not a language.
The closest prior art for this syntax is the CommonMark Generic Directives Proposal, a draft specification by John MacFarlane and others that would add a ::directive[argument]{key=value} syntax to the CommonMark Markdown standard. The proposal has been discussed since 2017 and has not yet been formally adopted, primarily because the use cases it addresses — embedding application capabilities in prose — were not yet considered mainstream. The text-native document architecture makes those use cases central.
The ::py block is the most consequential addition to the document architecture — more consequential, even, than the embed directive — because it changes what a document fundamentally is. An embed directive makes a document a surface on which applications render. A Python block makes the document itself computational. The document does not just display results from elsewhere. It produces results, from code that lives inside it, in context, alongside the prose that motivated the computation. This is the literate programming insight, delivered at last to a mainstream audience. The code is in the document, the prose is in the document, the output is in the document. The reasoning and the result are inseparable, because they are literally in the same text file.
All ::py blocks in a document share a single Python namespace — the document context. A variable defined in the first block is available in every subsequent block. The document is, in this sense, a single Python program whose source code is interspersed with prose.
The context is rebuilt by evaluating blocks in document order, top to bottom, every time the document opens. This means the document's computation is always reproducible: given the same document text, the same context is always produced. There is no hidden kernel state, no out-of-order evaluation, no "restart and run all" button, because the kernel can never fall out of sync with the document. The document is the program. The program runs from the top. Every time.
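The evaluation rule can be stated as code. The following minimal evaluator is a sketch under one stated assumption: that a ::py block is closed by a ::end line, a delimiter invented here for illustration. It execs each block, in document order, into a single shared dictionary.

```python
def evaluate(document_text):
    """Rebuild the document context: exec ::py blocks in order, one namespace."""
    context = {}                 # the single shared namespace
    in_block, lines = False, []
    for line in document_text.splitlines():
        if line.startswith("::py"):
            in_block, lines = True, []
        elif in_block and line.startswith("::end"):   # assumed block delimiter
            exec("\n".join(lines), context)           # same dict every time
            in_block = False
        elif in_block:
            lines.append(line)
    return context

doc = """
Some prose motivating a computation.
::py
x = 10
::end
More prose between the blocks.
::py
y = x * 2
::end
"""
ctx = evaluate(doc)
```

The second block reads x without declaring it, because both blocks run in the same dictionary; re-running evaluate on the same text always yields the same context.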
March budget review. Software came in under — we cancelled two subscriptions. Hardware over again. Running the numbers:
The table below is generated from the Python block above — not a separate widget. Edit any cell to override; the block tracks divergence and offers reconciliation.
| Category | Budget | Actual | Delta | % used |
|---|---|---|---|---|
Edit the actual list, hit run ▶ — the table regenerates from the new values. Click ctx ▾ to inspect the live Python namespace. Then edit a table cell directly — the divergence badge appears and the reconcile bar offers to re-run or detach. Toggle to source view to see the exact text written to disk at every step.

When a ::table block is generated by a ::py block, a connection exists between the two. The table's source= parameter records which block generated it. This connection creates three possible states: synchronised, where the table matches the generating block's output; diverged, where a cell has been edited by hand and no longer matches; and detached, where the source= link has been removed and the table stands alone.
The spreadsheet application has three jobs: storing data, computing over it, and displaying it. The text-native document splits these three jobs along cleaner lines. Data is stored as ::table text — human-readable, diffable, greppable, writable by any program. Computation is expressed as ::py blocks — readable as prose, editable in place, auditable by anyone reading the document. Display is handled by the renderer — sortable, editable, optionally chartable.
The deeper cost of the spreadsheet is that its data is trapped. A .xlsx file requires Excel or a compatible application to read. A ::table block in a document is text. grep, awk, python, git diff — every tool you already have can read it, transform it, version it, and pipe it somewhere else. The spreadsheet is a silo. The ::table directive is a first-class citizen of the text layer.
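Because the block is plain text, totalling a column needs nothing beyond string operations. The sketch below parses an illustrative table body in the standard pipe-delimited form (the figures are invented) with the standard library alone.

```python
# An illustrative ::table body; any program can read it as plain text.
TABLE = """\
| Category | Budget | Actual |
|---|---|---|
| Software | 400 | 310 |
| Hardware | 250 | 395 |
| Travel | 600 | 580 |
"""

def column(table_text, name):
    """Return the values of one named column from a pipe-delimited table."""
    rows = [[cell.strip() for cell in line.strip("|").split("|")]
            for line in table_text.strip().splitlines()]
    header, body = rows[0], rows[2:]      # rows[1] is the |---| separator
    i = header.index(name)
    return [row[i] for row in body]

total_actual = sum(int(v) for v in column(TABLE, "Actual"))
```

No spreadsheet application was consulted; the data was never anywhere but in the text.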
The embed directive grammar tells the document what to render. The capability registry tells the document how. The registry is the mapping between directive types and renderer implementations — the table that says "when you encounter ::cal, use the CalendarRenderer; when you encounter ::email, use the MailRenderer." It is invisible to the user and foundational to the system. Without it, the directive is syntax. With it, the directive is a live capability.
A renderer registers itself with the capability registry by declaring its manifest — a small text document that states what directive types it handles, what parameters it accepts, what capabilities it requires, and what it promises not to do. The manifest is verifiable: the renderer runtime enforces the capability declarations at execution time.
The registry resolves directive types to renderers in order of specificity: a renderer registered for cal:google takes precedence over one registered for cal. If no renderer is registered for a directive type, the directive is rendered as its raw text — the graceful degradation fallback.
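The resolution rule can be sketched directly. The class below is hypothetical, not the book's implementation: it walks from the most specific registration to the least, and falls back to returning the raw directive text when nothing matches.

```python
class Registry:
    """Map directive types to renderers; the most specific registration wins."""

    def __init__(self):
        self.renderers = {}

    def register(self, dtype, renderer):
        self.renderers[dtype] = renderer

    def resolve(self, dtype):
        # cal:google falls back to cal, then to graceful degradation.
        parts = dtype.split(":")
        while parts:
            key = ":".join(parts)
            if key in self.renderers:
                return self.renderers[key]
            parts.pop()
        return lambda directive: directive   # unhandled: render as raw text

registry = Registry()
registry.register("cal", lambda d: f"<calendar {d}>")
registry.register("cal:google", lambda d: f"<google calendar {d}>")
```

An unregistered type is not an error; it degrades to the text it already was.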
Each renderer runs in an isolated execution context. It receives only the identifier and parameters declared in the directive. It cannot read the document text, cannot access other renderers' output, cannot read the user's filesystem, and cannot make network requests to origins not declared in its manifest. The isolation is enforced by the renderer runtime, not by trust in the renderer's code. This is the architectural guarantee that makes it safe to install third-party renderers. A renderer for a specialised calendar backend, written by an unknown developer, can be installed and used without fear that it will access the email embed rendered next to it, or exfiltrate the content of the Python blocks above it. The sandbox is the contract, and the contract is enforced structurally.
The registry also manages the bidirectional sync protocol. When a user clicks "Accept" on a calendar invite rendered by the CalendarRenderer, the renderer does not directly edit the document. It emits a sync event — a structured record of what changed — to the registry, which applies the change to the document text and notifies any other renderer whose displayed content depends on the changed data. This indirection is load-bearing. It ensures that the document text is always the authoritative source of state, that changes are atomic, and that the document's version history accurately reflects every user action, whether that action was a keystroke in prose or a click in a rendered widget.
There is a consequence of the capability registry design that deserves its own treatment, because it is the most powerful thing the architecture enables and the one least obvious from the structural description: when a ::py block evaluates, the registry injects every registered app's read interface into the Python namespace as a first-class object.
This means that a Python block does not query a database. It does not call an API. It does not import a library. It simply calls cal.events(date="today") or tasks.query(done=False, overdue=True) or email.query(unread=True, starred=True) — and the registry routes each call to the appropriate renderer's read interface, which returns plain Python objects. The document context becomes a unified query layer over all eleven apps simultaneously.
The capability model is carefully asymmetric. ::py blocks have read access to app namespaces — they can query, filter, join, and compute over any app's data. They do not have write access. Mutations — accepting a calendar invite, sending an email, completing a task — go through the sync protocol described in Chapter 9, not through the Python namespace. This separation is load-bearing: it means a ::py block you received in a document from a stranger can read your data to produce output, but cannot send emails on your behalf or modify your calendar. The sandbox constraint (no network, no filesystem) and the read-only namespace together define a safe computation model.
The practical consequence is that Python becomes what SQL is to relational databases — a universal query language — except that the "tables" being queried are your live calendar, your email, your tasks, your contacts, and your files, all at once, with no schema to learn and no joins to configure. The query is Python. The result is rendered inline in the document. The reasoning that produced the query lives in the prose around it.
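A sketch of the injection, using stub read interfaces in place of real backends. The facade and method names (cal.events, tasks.query) follow the text; the data and the exec mechanism are illustrative.

```python
from types import SimpleNamespace

# Stub read interfaces standing in for the renderers' real backends.
cal = SimpleNamespace(events=lambda date: [
    {"title": "Team sync", "date": date, "attendees": 4},
])
tasks = SimpleNamespace(query=lambda done, overdue: [
    {"id": "call-finance", "done": done, "overdue": overdue},
])

# What the registry would inject into a ::py block's namespace.
namespace = {"cal": cal, "tasks": tasks}

block = """
today = cal.events(date="today")
open_overdue = tasks.query(done=False, overdue=True)
summary = (len(today), len(open_overdue))
"""
exec(block, namespace)
```

The block never imported anything and never opened a connection; the capability was present in its namespace before its first line ran.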
A ::py block querying app namespaces injected by the capability registry. Toggle to code view to read the query, then hit run ▶.
Oberon's module system had an elegant property that the capability registry should inherit: a module is only loaded when it is actually needed. In Oberon, the system does not load a compiler when it boots. It loads the compiler the first time a command invokes it, and thereafter keeps it resident. This is not a performance optimisation — or not only that. It is a security and transparency principle: a module that is not loaded cannot have side effects, cannot consume resources, and cannot be exploited.
The capability registry should enforce the same rule. A renderer for ::cal is loaded only when a document contains at least one ::cal directive. A renderer for ::sh — the privileged shell executor — is loaded only when a document explicitly contains a ::sh block. A document with no shell directives has no shell renderer loaded, which means it has no shell access, which is a security property expressible without any additional mechanism: the absence of the directive is the absence of the capability.
This has a useful consequence for document metadata. Every document that declares its directives in a preamble — a short block at the top listing which renderer types it uses — becomes self-describing in terms of capabilities. A reader encountering the document for the first time can see, before rendering anything, that this document requires a calendar renderer and a Python sandbox but no shell access and no email renderer. The capability manifest is the document's permission declaration, readable by a human without any tooling.
The preamble is optional — a document without one is rendered with whatever renderers the registry has available for the directive types it encounters. But a document with one is auditable, portable, and safe to open in restricted environments. It is, in miniature, the same principle as a Unix process's file descriptor table: a complete declaration of what the process can touch, visible before the process runs.
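The audit is mechanical. In the sketch below the preamble syntax (a ::uses line) is invented for illustration; the check itself, that the set of directive types used is a subset of the set declared, is the point.

```python
import re

DOC = """\
::uses{directives="cal py"}

Morning notes.
::cal[today]{view=agenda}
::py
x = 1
::end
"""

def declared(doc):
    """Directive types listed in the (hypothetical) ::uses preamble."""
    m = re.search(r'^::uses\{directives="([^"]*)"\}', doc, re.M)
    return set(m.group(1).split()) if m else None

def used(doc):
    """Directive types actually appearing in the document body."""
    return {m.group(1).split(":")[0]
            for m in re.finditer(r'^::([\w:-]+)', doc, re.M)} - {"uses", "end"}

def audit(doc):
    """True if every directive the document uses is declared up front."""
    decl = declared(doc)
    return decl is not None and used(doc) <= decl
```

A restricted environment can run this check, and refuse to load any renderer, before a single directive is rendered.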
Every system that claims to keep two representations of the same information in sync — a rendered widget and a text file, a local document and a remote backend — faces the synchronisation problem: what happens when both representations change at the same time? Which one wins? How do conflicts get resolved? How does the user know what happened? The text-native document architecture resolves the synchronisation problem by making it asymmetric. The text file is always the source of truth. The rendered widgets are always derived from the text. Changes flow in one direction for the authoritative state — from text to widget — and in one direction for user actions — from widget to text.
Every user action in a rendered widget produces exactly one of three mutation types: Append — new text is added to the document; sending a reply appends a block, accepting an invite appends a log line, completing a task appends a timestamp. Replace — existing text is modified in place; editing a table cell replaces the cell value, changing a task's due date replaces the parameter value, toggling a checkbox replaces [ ] with [x]. Delete — existing text is removed; dismissing a directive removes it, detaching a table from its source removes the parameter. Every mutation is expressed as a text edit. Every text edit is recorded in the document's version history. The document's history is a complete audit trail of every action the user took.
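The three mutation types reduce to ordinary string edits. In this sketch the concrete text shapes (the task line, the ::log entry) are invented; what matters is that each mutation is a plain text operation on the document.

```python
doc = "- [ ] call-finance due=2026-03-25\n"

# Replace: checking the box replaces "[ ]" with "[x]" in place.
doc = doc.replace("- [ ]", "- [x]", 1)

# Append: the completion is recorded as a new line of text.
doc += "::log completed=call-finance at=2026-03-24T09:15\n"

# Delete: dismissing the log entry removes its line.
doc = "\n".join(line for line in doc.splitlines()
                if not line.startswith("::log")) + "\n"
```

Each step is diffable, because each step changed nothing but text.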
Had a call with Sara this morning about the product roadmap. Need to schedule a follow-up with the full team next week.
When a directive interacts with a remote backend — sending an email, accepting a calendar invite, posting a chat message — the mutation model extends to cover the remote state. The renderer sends the action to the backend and waits for confirmation. On success, it emits a sync event that writes the mutation to the document text. On failure, it emits an error that renders inline in the document. The key property is that the document text reflects reality. If an email was sent, the document contains a record of it. The document is not an interface to the backends. It is a journal of what has happened, with the backends as the execution layer.
This model has a direct analogy in event sourcing — an architectural pattern in software systems where the system's state is derived by replaying a sequence of events rather than by reading a current state snapshot. The text-native document is an event-sourced personal information system: the document text is the event log, the rendered widgets are derived views, and the "current state" is always computable from the log. The benefits of event sourcing — auditability, reproducibility, time travel — apply directly.
Conflicts arise when the same document is edited simultaneously from two locations — two devices, two users, two renderers. The conflict resolution strategy follows the git model: conflicts are surfaced as text in the document, using a standard conflict marker format, and resolved by the user editing the text to select one version or the other. This is not as elegant as automatic conflict resolution. It is more honest. Automatic conflict resolution requires rules about whose edit takes precedence, and those rules are inevitably wrong in some cases. Surfacing the conflict as text puts the resolution decision in the user's hands, expressed in the same medium as everything else. The conflict is text. The resolution is text. The history shows the conflict and its resolution. Nothing is hidden.
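A resolver for such conflicts is itself a small text program. The sketch below keeps one side of each git-style conflict block; the marker format is git's, the function is illustrative.

```python
CONFLICTED = """\
Notes from the call.
<<<<<<< laptop
- [x] call-finance
=======
- [ ] call-finance due=2026-03-26
>>>>>>> phone
Next steps below.
"""

def resolve(text, keep="ours"):
    """Keep one side of each git-style conflict block; the edit is just text."""
    out, side = [], None
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            side = "ours"
        elif line.startswith("======="):
            side = "theirs"
        elif line.startswith(">>>>>>>"):
            side = None
        elif side is None or side == keep:
            out.append(line)
    return "\n".join(out) + "\n"
```

The user could equally resolve the conflict by hand in any editor; the markers, the choice, and the result are all ordinary lines of the document.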
Part III redesigns each of the eleven applications in the ecosystem by applying the four primitives developed in Part II. Each application is approached the same way: what does this app fundamentally do, what data does it own, how does that data become a directive, and what does bidirectional sync mean for this particular kind of interaction? The result, in each case, is an application that no longer requires you to leave the document — that renders where you need it, interacts in context, and writes every action back to the text layer. The chapters below cover each app in turn, with a design principle and worked directive examples for each.
Email is the oldest and most durable of personal computing's communication tools. It has survived the rise and fall of instant messaging, social networking, enterprise chat, and every other communication platform that promised to replace it. It survives because it is federated, because it is asynchronous, and because it is text. These are precisely the properties that make it natural in the text-native document.
The ::email directive renders email where it is relevant — not in a separate application, not in a dedicated pane, but in the document, next to the prose that motivated you to check it. The meeting you are preparing for has an email thread; the thread appears above your preparation notes. The task you created from a message has the message embedded beside it.
Inbox view — ::email[inbox]{filter=unread} — renders a filtered list of messages. The filter uses the same query syntax as the underlying mail backend. Thread view — ::email[thread-id] — renders a full email thread inline; each message in the thread is a paragraph. Compose view — ::email[draft]{to=sara@example.com subject="Q3 report"} — renders a compose form; the draft is stored in the document text until sent.
The calendar application commits the original sin of the WIMP paradigm more completely than any other: it is a destination. You go to the calendar to see your time. You leave the calendar to use your time. The calendar knows nothing about why you accepted the meeting, what you were working on before it, or what you need to do after it. In the text-native document, time is always in context. The calendar view for today appears between the paragraph that motivated you to check it and the notes you took in response to what you found.
The ::cal directive renders in three modes switched by the view= parameter: agenda, week, and month. The user switches between views inline; the switch writes the new view= value to the directive text, so the document remembers the preferred view. Accepting an invite from a rendered calendar view writes two things to the document: the acceptance is sent to the calendar backend, and a one-line log entry is appended to the document text.
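The view= write-back can be sketched in a few lines of Python; set_param is my name, not part of the specification, and the sketch assumes a directive with an existing {…} parameter block:

```python
import re

def set_param(directive_line: str, key: str, value: str) -> str:
    """Rewrite one parameter inside a directive's {...} block.

    If the key is present its value is replaced; if absent it is
    appended. The change is made in the text itself, so the document
    remembers the preferred view across sessions.
    """
    def update(match: re.Match) -> str:
        params = match.group(1)
        pattern = rf"\b{re.escape(key)}=\S+"
        if re.search(pattern, params):
            params = re.sub(pattern, f"{key}={value}", params)
        else:
            params = f"{params} {key}={value}".strip()
        return "{" + params + "}"

    return re.sub(r"\{([^}]*)\}", update, directive_line, count=1)

print(set_param("::cal[today]{view=agenda}", "view", "week"))
# → ::cal[today]{view=week}
```

Switching the rendered view is, underneath, exactly this: one parameter rewritten in one line of text.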
Accepting an invite can also generate a ::task directive automatically — "prepare agenda for team sync" — anchored to the event's date and connected by ID. The task and the event know about each other through the document text. No separate integration required.

A task, reduced to its essential nature, is a line of text with a state. It is either done or not done. Every other attribute — due date, priority, project, assignee, recurrence — is metadata attached to that line. The ::task directive makes this explicit. A task is a directive with an identifier, a state, and optional metadata. It renders as a checkbox. Checking the box replaces [ ] with [x] in the document text. The entire task state is visible in the text. There is no hidden database.
Every task in every document is findable with a single search: grep -r "::task\[" ~/documents/ | grep "done=false". The task list — across all documents, all projects, all contexts — is a filtered view of the document collection. The ::task directive with a query parameter does exactly this: ::task[today]{filter="due=today done=false"} renders all incomplete tasks due today, across all documents, as a live interactive list.
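The same query, sketched in the document's own language (the function names are mine; the line shapes follow the ::task examples above):

```python
import re

# Assumed line shape: ::task[some-id]{due=2026-03-23 done=false} trailing text
TASK_RE = re.compile(r"::task\[(?P<id>[^\]]+)\]\{(?P<params>[^}]*)\}")

def parse_tasks(lines):
    """Extract task dicts from document lines: a code sketch of the
    grep pipeline above."""
    tasks = []
    for line in lines:
        m = TASK_RE.search(line)
        if m:
            params = dict(p.split("=", 1) for p in m.group("params").split())
            tasks.append({"id": m.group("id"), **params})
    return tasks

def due_today(tasks, today):
    """Incomplete tasks due on the given ISO date."""
    return [t for t in tasks if t.get("due") == today and t.get("done") != "true"]

doc = [
    "::task[call-finance]{due=2026-03-23 done=false}",
    "::task[send-report]{due=2026-03-23 done=true}",
    "::task[book-travel]{due=2026-04-01 done=false}",
]
print(due_today(parse_tasks(doc), "2026-03-23"))
# → [{'id': 'call-finance', 'due': '2026-03-23', 'done': 'false'}]
```

Whether the filter runs in grep or in a ::py block, it is the same operation over the same text.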
Dependencies between tasks use a blocked-by= parameter that references another task by ID. The dependency is expressed in text. The renderer shows the blocked task as inactive until its blocker is completed. Project management without a project management database.

The note-taking application has a fundamental design flaw that its users have learned to work around: it creates copies. When you want to include content from one note in another, you copy it. The copy immediately begins to diverge from the original. The text-native document resolves this with transclusion — the principle that any document can embed any section of any other document by reference, not by copy. The embedded content renders inline, reads from the source, and changes at the source are visible at the reference.
::note[document-name]{section=heading} renders the specified section of the named document inline. Edits made in a transcluded section propagate back to the source document. The transcluded content is not a snapshot. It is a live view. This is the behaviour Ted Nelson described in 1963 and the web, with its hyperlinks-not-transclusions model, never implemented.
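The read side of transclusion can be sketched as a section extractor, assuming Markdown-style headings in the source document (the function is illustrative, not the renderer's actual code):

```python
def extract_section(source_text: str, heading: str) -> str:
    """Return the body of the section titled `heading`, up to the next
    heading of the same or higher level. Sub-headings are kept."""
    out, capturing, level = [], False, 0
    for line in source_text.splitlines():
        if line.startswith("#"):
            this_level = len(line) - len(line.lstrip("#"))
            if capturing:
                if this_level <= level:
                    break          # next sibling/parent section: stop
                out.append(line)   # deeper sub-heading: keep it
            elif line.lstrip("#").strip() == heading:
                capturing, level = True, this_level
        elif capturing:
            out.append(line)
    return "\n".join(out).strip()

source = "# Notes\nintro\n## Budget\nQ3 is over by 8%.\n## Travel\nbook flights\n"
print(extract_section(source, "Budget"))
# → Q3 is over by 8%.
```

The write side runs the same logic in reverse: locate the section in the source file, splice the edited lines in, save. The source document is the single store; the transclusion is a window onto it.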
Chapter 14 is where the architecture earns its most ambitious claim: that the spreadsheet, as an application category, can be dissolved into the document. Not replaced by a better spreadsheet. Dissolved — because the functions a spreadsheet serves (data storage, computation, display) are better served, separately, by the text layer, the Python block, and the table renderer respectively. A worksheet in the text-native document is a document section containing a ::py block that defines the data and computation, a ::table block that displays the results, and the prose that explains what the numbers mean and why they were computed.
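A minimal worksheet of this kind, sketched as plain Python (the table() helper here is a stand-in for the document's own, and the figures are invented for illustration):

```python
# Data and computation as plain Python; display as a Markdown table,
# which is the ::table renderer's job in a live document.
expenses = [
    {"category": "travel", "budget": 5000, "actual": 6200},
    {"category": "tools",  "budget": 2000, "actual": 1800},
]

def table(rows):
    """Render a list of dicts as a Markdown table (stand-in for the
    document's table() helper)."""
    headers = list(rows[0])
    lines = ["| " + " | ".join(headers) + " |",
             "| " + " | ".join("---" for _ in headers) + " |"]
    for row in rows:
        lines.append("| " + " | ".join(str(row[h]) for h in headers) + " |")
    return "\n".join(lines)

for row in expenses:
    row["overrun"] = row["actual"] - row["budget"]

print(table(expenses))
```

The data is text, the computation is text, the display is text. Nothing in this worksheet is hidden in a cell.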
The spreadsheet formula =SUMIF(B2:B10,">0",C2:C10) is a spell. The Python equivalent is a sentence: total = sum(v for v in values if v > 0). The document makes the reasoning visible by making the computation legible.

The ::file directive does not replace the file system. It makes it navigable from the document — which is where the context for finding a file usually lives. When you are writing about a project and want to reference a PDF report, the directive ::file[~/documents/q3-report.pdf]{preview=true} renders the PDF preview inline, in context, without requiring you to open a file manager. The file stays where it is. The reference to it lives where it is relevant. Dragging a file into the document writes a ::file directive at the cursor position. The file is not embedded; it is referenced. Moving the file updates the directive. Deleting the file marks the directive as broken.
In the text-native document, a person is a named entity in the text. Typing @sara.chen anywhere in the document creates a live reference to Sara's contact record. The ::contact directive renders her card inline when needed. Actions taken from the card — sending an email, scheduling a meeting — write back to the document as log entries, creating a running account of your relationship with this person in the natural flow of your writing. The document becomes the relationship log. Every @mention is a timestamped reference. A ::py block can query all mentions of a person across all documents: mentions = [l for l in doc_lines if "@sara.chen" in l]. The CRM is a view over the document collection.
Chat is the most context-destroying of all communication formats. A Slack message is a fragment — a sentence or two, deprived of the thread that preceded it, disconnected from the document it was discussing, unlinked from the task it created or the decision it recorded. The ::chat directive renders a slice of a channel or thread in the document — the last ten messages, or the messages since yesterday, or a specific thread. The slice is in context: below the paragraph that motivated you to check the channel, above the notes you take in response to what you find. A decision made in a chat thread can be promoted to the document with one action: the message text is written as a prose line below the ::chat embed, prefixed with the date and the participants. The decision is now in the document. The document is the record.
The ::web directive addresses the web's text-accessibility problem by defaulting to reader mode — the cleaned, text-extracted, typography-first rendering of a web page that modern browsers offer as an optional mode and that most users have never discovered. Reader mode strips the web page to its content: the prose, the headings, the images that are part of the content. The result is text. The result can be saved to the document, searched, and processed by Python blocks. Saving a web clip writes three things to the document: the URL as a ::web directive, the page title as a heading, and the first two sentences of the article as a summary. The clip is saved as text. The document holds the essence; the web holds the detail.
The ::sh directive is the terminal's representative in the document. It is distinct from the ::py block in one critical way: it is not sandboxed. A ::sh block runs in the user's shell, with the user's permissions, with access to the full file system and network. It can do anything a terminal command can do. This power requires explicit acknowledgement: ::sh blocks in documents received from others are never auto-executed; they are shown as text until the user explicitly chooses to run them. For a user who works at the command line, the text-native document becomes a runnable notebook: a record of every command run, every output received, every script executed, integrated with the prose that explains why each command was run and what the output meant.
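The execution policy can be stated as a few lines of code; this is a sketch of the rule described above, with names of my choosing:

```python
def may_auto_execute(directive_type: str, document_trusted: bool) -> bool:
    """Execution policy sketch: sandboxed ::py may auto-run anywhere;
    privileged ::sh runs only in documents the user authored locally,
    and is shown as inert text in documents received from others until
    the user explicitly chooses to run it."""
    if directive_type == "sh":
        return document_trusted
    return True  # sandboxed directives are safe to evaluate on render

print(may_auto_execute("sh", document_trusted=False))
# → False
```

The asymmetry is the point: the cost of a wrongly blocked ::sh block is one extra click; the cost of a wrongly executed one is unbounded.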
::sh is the one directive that can have external side effects. This is by design. The shell is the privileged context for operations that reach beyond the document. The distinction between ::py (sandboxed computation) and ::sh (privileged execution) maps onto the distinction between reasoning and acting.

Part IV does not describe features. It describes consequences. The five properties in these chapters are not things you have to implement, configure, or pay for. They are things that fall out of the architecture automatically, by virtue of the data being text. They are the second-order benefits of the design — the reasons why, even if the text-native document did not have a single interactive feature, it would still be a better place to keep your information than any application silo.
When your computing life is text, you get version control for free. Not as a feature you must configure, but as a consequence of the format. Any directory of text files can be a git repository. Any change to any file is tracked. Any previous state is recoverable. The entire history of your documents — every task completed, every note written, every budget revised, every email drafted — is accessible with git log.
The most underused capability of version control in a personal context is the diff — the comparison between two versions of a file that shows exactly what changed and when. In a text-native document, every decision is visible in the diff. Accepting a meeting invite is a line added. Completing a task is a character changed from [ ] to [x]. Changing a budget number is an integer replaced.
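The diff for such a day might look like this (the file name and exact wording are illustrative; the conventions are the ones used throughout this book):

```diff
--- a/2026-03-23.md
+++ b/2026-03-23.md
-::task[call-finance]{done=false} Call finance about the overrun
+::task[call-finance]{done=true} Call finance about the overrun
+accepted: team-sync 2026-03-23 14:00
```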
This diff tells you that on 23 March 2026, at some point during the day, you completed the task to call finance, and you accepted the team sync meeting. The diff is the record. The record is text. The text is yours.
When all your information is in text files, you can search all of it at once, with a single tool, in a fraction of a second. grep -r "Sara" ~/documents/ returns every reference to Sara across every document you have ever written — meeting notes, tasks, email drafts, budget comments, project documents, daily notes. The result is a list of lines, each preceded by its file path and line number. The search crossed no silo boundaries, because there are no silo boundaries. Sara is a name in text files. The text files are a directory. The directory is searchable.
Because the document uses a consistent directive grammar, search can be made structurally precise. Finding all incomplete high-priority tasks: grep -r "::task\[" ~/documents/ | grep "priority=high" | grep -v "done=true". Finding every calendar event accepted in the past month: grep -r "^accepted:" ~/documents/2026-02* ~/documents/2026-03*. These are not queries to a database. They are text searches. The tools that run them are fifty years old, universally available, and require no installation, no account, and no subscription.
The spreadsheet's formula is a secret. It lives in the cell, invisible in the default view, accessible only by clicking into the cell and reading a syntax that takes years to read fluently. A spreadsheet can contain years of accumulated business logic, and none of it is visible in the output. The numbers appear. Their provenance does not. The ::py block is the opposite of this. The computation is in the document. The code is in the text. Any reader of the document can see exactly what was computed, in language designed to be read, in the same scroll as the result.
Consider a budget overrun report shared with a finance team. In a spreadsheet, the finance team receives a grid of numbers. If they want to understand how any number was derived, they must click into cells, trace formula chains, and hope the formula references are labelled. In practice, they rarely do this. In a text-native document, the finance team receives the prose explanation, the Python block, and the table. The computation is right there. Anyone with basic Python literacy can verify the derivation in sixty seconds. Auditability becomes the natural consequence of keeping computation in prose.
The academic concept of "reproducible research" — the practice of publishing not just results but the code and data that produced them — is exactly the text-native document ideal applied to science. The Jupyter notebook made reproducible research practical for data scientists. The text-native document makes it practical for personal computing at every scale.
The language model is the first fundamentally new computing tool since the spreadsheet. It reads text and produces text — which means it is, structurally, a Unix program: it participates naturally in text-based workflows without any special integration. A language model can read your text-native document because your document is text. This is, when you think about it, remarkable: the most powerful AI tool ever built is natively compatible with the simplest possible data format.
In the current application paradigm, integrating a language model with personal data requires significant work: connect the model to Gmail via OAuth, connect it to Calendar via a separate OAuth, connect it to Notion via yet another API, manage the permissions, manage the rate limits, manage the data residency questions. Each connection is a negotiation. In the text-native document, there are no apertures to negotiate. The language model reads the document. The document contains everything relevant. The model knows what the user knows, in the same format the user uses.
The language model context window — the maximum amount of text a model can reason about at once — is the binding constraint on AI assistance for personal computing. A well-maintained daily document for one week is approximately 5,000–10,000 words. A model that reads the entire week's document has complete context for any question about that week. The text-native document is already shaped for this — a curated selection of relevant information assembled in the order that makes sense to the person writing it. The document is already the right size and shape for AI assistance, because it was designed to be the right size and shape for human attention. These turn out to be the same thing.
Every design system should be tested at its weakest point. For the text-native document architecture, the weakest point is the failure mode: what happens when the renderer is unavailable, when a directive type has no registered handler, when the user opens the document in a plain text editor because nothing else is installed? The answer must be: the document remains useful. Not as useful — the calendar does not render, the Python does not evaluate, the tasks do not have checkboxes. But the information is all there, in text, legible to any reader.
This book was written in a text file. It contains directives that I could not render while writing it, because the renderer does not yet fully exist. The interactive figures in the chapters were built as prototypes — widgets demonstrating how the architecture would work — rather than as genuine ::py blocks evaluated in a live document context. I have written a book about a system that describes itself, using a partial implementation of the system it describes. This is not irony. It is the honest state of an architecture that is ahead of its implementation.
The gap between the architecture and the implementation is not large. The components exist. Sandboxed Python evaluators exist; Pyodide runs Python in a browser sandbox with no server required. Markdown table parsers exist; there are dozens of implementations in every language. Bidirectional sync protocols exist; operational transformation and CRDTs are well-understood. The directive grammar is fifty lines of parsing logic. The capability registry is a manifest format and a dispatch table. None of this requires research. It requires engineering.
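That parsing logic fits comfortably in a screenful of Python; this regex sketch follows the grammar in the appendix, and the names are mine:

```python
import re

# A directive line per the appendix grammar:
#   ::type [identifier] {key=value key="quoted value"}
DIRECTIVE = re.compile(
    r"^::(?P<type>[A-Za-z0-9-]+)"           # type
    r"(?:\[(?P<id>[A-Za-z0-9./~@-]+)\])?"   # optional [identifier]
    r"(?:\{(?P<params>[^}]*)\})?\s*$"       # optional {params}
)

def parse_directive(line: str):
    """Parse one directive line into a dict, or return None for prose.
    (Trailing free text after the directive is not handled here.)"""
    m = DIRECTIVE.match(line.strip())
    if not m:
        return None
    params = {}
    for key, value in re.findall(r'(\S+?)=(".*?"|\S+)', m.group("params") or ""):
        params[key] = value.strip('"')
    return {"type": m.group("type"), "id": m.group("id"), "params": params}

print(parse_directive('::email[draft]{to=sara@example.com subject="Q3 report"}'))
# → {'type': 'email', 'id': 'draft', 'params': {'to': 'sara@example.com', 'subject': 'Q3 report'}}
```

Everything downstream of this function — dispatch to a renderer, capability lookup, fallback to plain text — is a table keyed on the "type" field.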
What it also requires is a decision about where to start. I suggest starting with the daily document and the ::task and ::py directives — the two capabilities that demonstrate the core value proposition with the least integration complexity. A daily document that evaluates Python inline and renders tasks as live checkboxes is already more powerful than most note-taking applications. It is the minimum viable document OS. Everything else can be added as renderers are built and registered.
I want to be clear about what I am not asking for. I am not asking for a single company to build this as a product, own the format, and charge a subscription. That would reproduce the exact dynamic this book argues against. I am asking for an open specification — a directive grammar, a capability registry protocol, a sync event format, a sandbox specification — that any implementer can build to and any user can adopt without vendor dependency. The specification is already partially written in the Appendices.
I hope this book is the beginning of a community. I hope someone reads it and starts building. I hope someone else reads it and starts writing their daily document in a text editor and discovers that even without a renderer, the format is useful — that naming things with the directive syntax helps, that keeping everything in one file helps, that writing prose around your tasks and computations helps. The document I am writing is not finished. It is the document you continue when you put this one down.
"The best way to predict the future is to invent it."
— Alan Kay
— end —
directive  = "::" type [ "[" identifier "]" ] [ "{" params "}" ]
block      = directive CRLF body CRLF "::" "end"
type       = 1*( ALPHA / DIGIT / "-" )
identifier = 1*( ALPHA / DIGIT / "-" / "." / "/" / "~" / "@" )
params     = param *( SP param )
param      = key "=" value
key        = 1*( ALPHA / DIGIT / "-" )
value      = token / quoted-str
token      = 1*( ALPHA / DIGIT / "-" / "." / "/" / ":" / "+" )
quoted-str = DQUOTE *( any char except DQUOTE ) DQUOTE
cal · email · task · note · py · table · contact · file · chat · web · sh · end (block closer, not a directive type)
id= — unique identifier within document
render=false — suppress rendering, show as text only
comment= — human-readable annotation, ignored by renderer
version= — schema version, for forward compatibility
Blocks execute on document open (run=auto blocks) or by user action. Total execution time limit: 10 seconds per document open. Individual block time limit: 3 seconds. Memory limit: 64MB per document context.

Permitted standard library modules: math · cmath · decimal · fractions · statistics · random · datetime · calendar · collections · itertools · functools · operator · re · string · textwrap · json · csv · enum · dataclasses · typing · abc · copy · pprint

Document API:
doc.tasks(filter=None) — list of task dicts from all blocks in document
doc.tables(id=None) — list of table dicts, optionally filtered by id
doc.blocks(type=None) — list of all directive blocks in document
doc.metadata — dict of document-level metadata (date, title, tags)
table(data) — helper: outputs a list-of-dicts as a ::table block
chart(type, data) — helper: outputs a chart specification
Niklaus Wirth's Oberon system anticipated this architecture: commands as executable text (a name of the form Module.Proc in any document can be clicked to invoke it), modules as lazy-loaded renderers, a shared text substrate as the universal data layer, and no-hidden-state as an explicit design principle. Wirth reached the same architecture from a different direction — systems programming rather than personal productivity — and in doing so validated it. The text-native document architecture and Oberon are the same answer to the same question, separated by thirty-five years and a change in the question's scale.

The notation has a separate lineage: the generic-directives proposal for Markdown defines the ::directive[id]{params} syntax that this architecture adopts and extends.
The Document is the Computer — complete manuscript draft · 2026
Written in a text file · Rendered in a document · Owned by nobody but the reader