If the file is 500GB and memory is 8GB, the main algorithmic question is no longer comparison count. It is I/O shape. External mergesort wins because it touches disk in long sequential passes, first creating sorted runs that fit in memory and then merging those runs with a heap.
Quicksort assumes cheap random access. Its partition pass reaches back and forth across the array, which is fine when the array lives in RAM. But on disk, that pattern becomes a seek-heavy disaster. What looks like a few arithmetic operations in memory turns into repeated repositioning of the read head.
External mergesort changes the goal. Instead of trying to sort the whole file in place, it makes a small number of long sequential passes. That single design choice is why a 500GB file is a practical job instead of an absurd one.
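The pass arithmetic makes this concrete. A quick back-of-envelope sketch (the helper `mergePasses` is illustrative, not part of any real API):

```typescript
// Back-of-envelope pass arithmetic for the 500GB-file, 8GB-memory scenario.
const GB = 1024 ** 3;
const fileSize = 500 * GB;
const memSize = 8 * GB;

// Phase 1 produces one sorted run per memory-sized chunk.
const numRuns = Math.ceil(fileSize / memSize); // 63 runs

// With a merge fan-in of k runs per pass, the number of merge
// passes over the data is ceil(log_k(numRuns)).
function mergePasses(runs: number, fanIn: number): number {
  return Math.ceil(Math.log(runs) / Math.log(fanIn));
}

console.log(numRuns);             // 63
console.log(mergePasses(63, 64)); // 1: all runs fit in a single merge pass
console.log(mergePasses(63, 8));  // 2: runs -> 8 larger runs -> final output
```

Each pass reads and writes the whole file once, so keeping the pass count at one or two is what keeps 500GB tractable.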
Quicksort’s partitioning pattern is a cache story in RAM and a seek story on disk.
Read a chunk, sort in memory, write a run. Then merge runs in order using buffered reads.
The first phase is simple: read as many records as fit in memory, sort that chunk in RAM, and write the result out as a sorted run. Repeat until the whole file has been turned into runs.
This phase is where your in-memory sorter still matters. If the chunk fits in RAM, use the fast in-memory algorithm there. External mergesort is not replacing introsort inside RAM; it is orchestrating introsort across many disk-sized chunks.
Once you have sorted runs, the merge phase is conceptually straightforward: keep one current record from each run, repeatedly emit the smallest one, and then replace it with the next record from the same run. The right data structure for that is a min-heap of size k.
Each output record costs one heap pop and, unless a run is exhausted, one heap push. That gives O(n log k) merge work, where k is the number of runs being merged in the current pass.
The heap stores the smallest current record from each live run.
Every pop emits the next global minimum across all runs.
External sorting lives or dies by buffering and access order. Reads should be sequential within each run. Writes should be sequential into the output file. Random seeks should happen only when opening a run, switching buffers, or repositioning within a simulated file abstraction.
This is also why multi-pass merging exists. If you cannot afford one input buffer per run, do not force it. Merge a manageable number of runs into larger runs, then merge those larger runs in the next pass.
The implementation below uses a browser-side BinaryFile-style class with seek, readInt32, and writeInt32. It is not real disk I/O, but it preserves the important part for an explainer: a file cursor, run files, and a merge phase that treats files as streams rather than plain arrays.