At the root of every TeX-based engine sit Knuth’s original TeX and its strict superset, e-TeX. This page covers what “the TeX program” actually is, what plain TeX is, why almost nobody writes it directly today yet it still matters, and the e-TeX extensions — added in the 1990s — that are now the baseline every modern engine builds on.
What “the TeX program” is
Confusingly, the word TeX names two things. One is the typesetting program itself, written by Donald Knuth. The other is the system of commands that program interprets. The program proper is a relentlessly low-level engine: it packs characters and boxes into lines and paragraphs, optimizes the line breaks, and writes the result to an output file — and little else.
Dissatisfied with the typographic quality available for his book *The Art of Computer Programming*, Knuth began work on TeX in 1977, and the first version followed in 1978. That initial version was an exploratory prototype; it was completely rewritten in 1982, and that rewrite is known as TeX82. “TeX” today effectively means this TeX82 lineage. The minimal set of commands built into the program are called primitives: \def, \hbox, \vbox, and the like.
Writing a document in bare primitives is impractical, so you normally use a format: a layer of macros built on top of them. Loading plain.tex gives you plain TeX; loading a far larger macro set gives you LaTeX. A format is loaded once, in the engine's initialization mode, and dumped to a binary .fmt file, so the engine need not re-read the macros from scratch on every run.
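As a rough illustration of how a format comes into being, the sketch below dumps a tiny custom format on top of plain TeX. The file name myfmt.tex and the macro \greeting are hypothetical, and the command lines in the comments assume a Web2C-based distribution such as TeX Live.

```tex
% myfmt.tex: a sketch of a custom format (file and macro names are hypothetical)
% Build (assumed Web2C command line):  tex -ini myfmt.tex   -> writes myfmt.fmt
% Use   (assumed Web2C command line):  tex -fmt=myfmt doc.tex
\input plain                                  % load the plain TeX macros
\def\greeting{Hello from a preloaded macro}   % one extra macro of our own
\dump                                         % write the .fmt file and stop
```

A document processed with this format can use \greeting and every plain TeX macro without reading plain.tex again.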
Plain TeX
Plain TeX is the standard format Knuth presented in *The TeXbook*. It collects a minimal toolkit (font setup, the basic math symbols, convenience macros such as \bye) into plain.tex, which ships with TeX itself. Before LaTeX, this was the usual way to use TeX.
The notation differs noticeably from LaTeX. Inline math is $...$ (the same as LaTeX). A horizontal box is \hbox{...}, a vertical box \vbox{...}, tabular alignment is done with \halign, a command is defined with \def, the text width is \hsize, and a document ends with \bye. There is no \documentclass and no \begin{document} — those are all things LaTeX builds out of macros.
```tex
% plain TeX: process with `tex hello.tex`
\hsize=10cm
% Named \titlefont rather than \big, which would overwrite plain TeX's \big delimiter macro
\font\titlefont=cmr10 at 17pt
{\titlefont Hello, plain \TeX!}
\medskip
This paragraph is set in the default font.
Inline math works too: $E = mc^2$.
\bye
```

Processing the example with tex hello.tex produces a DVI file (not a PDF; see below). Logo macros like \TeX and spacing macros like \medskip (a medium vertical skip) are also defined by plain TeX. Set beside a LaTeX \documentclass document, it shows just how thin a skin plain TeX is over the bare engine.
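The other constructs named above (\def, \halign, \hbox) can be sketched in the same way. This is a minimal example rather than a recommended layout; the macro \unit and the table contents are invented for illustration.

```tex
% plain TeX sketch: process with `tex table.tex`
\hsize=10cm
\def\unit#1{\thinspace{\rm #1}}   % hypothetical helper: a thin space, then the unit
% \halign takes a template line first: column 1 flush left, column 2 flush right
\halign{#\hfil\quad&\hfil#\cr
  Item   & Width\cr
  Margin & 2\unit{cm}\cr
  Column & 10\unit{cm}\cr}
\medskip
% A horizontal box forced to the full text width; \hfil pushes the words apart
\hbox to \hsize{left\hfil right}
\bye
```

The template row plays the role that a column specification such as {lr} plays in a LaTeX tabular.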
Literate programming and WEB
TeX itself is written in WEB, Knuth’s own system and the founding practice of literate programming. A WEB source is a single document in which human-facing explanation and Pascal code are woven together; two tools then extract derivatives from it — tangle produces compilable Pascal, and weave produces the typeset commentary (the book *TeX: The Program*).
The original target language was Pascal; in modern distributions a tool called web2c translates the WEB (via Pascal) into C for building. So the pdfTeX or LuaTeX running on your machine today still traces back, ultimately, to that single literate source.
A version number converging to π
TeX’s version numbering is idiosyncratic. Since version 3, each update appends one more digit, so the number asymptotically approaches π. The current version is 3.141592653…, a sign that the program is essentially done and exceptionally stable.
Knuth has declared that the “final change,” made posthumously, will set the version to exactly π — at which point any remaining bugs become permanent features. Indeed TeX has gained no new features for years, and reported issues amount only to tiny fixes. This frozen stability is precisely why a .tex file from decades ago still produces the same output today.
METAFONT and DVI
TeX has a twin companion, METAFONT (mf): a system that describes a font not as fixed shapes but as a *program for drawing* the glyphs. Knuth designed the entire Computer Modern typeface family for TeX with it. The division of labor is that TeX decides where the characters go, while METAFONT produces the shapes of the characters themselves.
What Knuth’s TeX emits directly is not PDF but a DVI (DeVice Independent) file. A DVI holds only device-independent instructions of the form “place this character at this position”, which are then converted to PostScript by dvips or to PDF by dvipdfmx. Today’s widely used pdfTeX and LuaTeX skip this step and write PDF directly; XeTeX reaches the same result by passing an extended DVI to xdvipdfmx behind the scenes.
Why it still matters
In practice, almost nobody writes raw plain TeX for new work: the practical machinery (numbering, cross-references, class files) all lives in the LaTeX layer on top, and everyday writing relies on that convenience. Yet there is real value in knowing the primitives described here.
- The foundation of every format. LaTeX and ConTeXt alike are ultimately built on these primitives.
- You can see to the bottom of errors. Tangled errors and low-level tweaks become readable once you know the vocabulary of \hbox and \vbox.
- The boxes-and-glue worldview. TeX’s idea of building a page from *boxes* and stretchable *glue* is the shared language for using LaTeX deeply.
- Universality and reproducibility. Because the engine is frozen, past documents keep producing the same result into the future.
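To make the boxes-and-glue idea from the list above concrete, here is a small plain TeX sketch. The widths and words are arbitrary; the point is only how stretchable glue distributes the spare space inside a box of fixed width.

```tex
% plain TeX sketch: process with `tex glue.tex`
\hsize=8cm
% One piece of infinitely stretchable glue: pushes the two words to the edges
\hbox to \hsize{left\hfil right}
% Equal glue on both sides: centers the word
\hbox to \hsize{\hfil centered\hfil}
% Unequal stretch: the second gap takes twice as much spare space as the first
\hbox to \hsize{a\hskip 0pt plus 1fil b\hskip 0pt plus 2fil c}
\bye
```

LaTeX commands such as \hfill and \centering are built from the same kind of glue.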
What is e-TeX
Because Knuth froze TeX, new features had to come from other hands. So in the 1990s, out of Europe’s NTS (New Typesetting System) project, came e-TeX, with Peter Breitenlohner doing the principal development. e-TeX does not replace TeX; it is designed as a strict superset of it — existing input runs unchanged, with identical output.
e-TeX has two modes. In compatibility mode it behaves exactly like classic TeX; the added primitives become available only in extended mode, which is selected when a format is built: the first line given to the initialization run starts with a *. Once a format has been built in extended mode, the user need not think about it at all.
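One way to see which mode a given format runs in is to test for an e-TeX primitive. The sketch below assumes the plain format and relies on the common convention that \undefined is never defined.

```tex
% Works on classic TeX and on e-TeX-based engines alike
\ifx\eTeXversion\undefined
  % \eTeXversion is unknown: classic TeX, or e-TeX in compatibility mode
  \message{No e-TeX primitives available}
\else
  % Extended mode: \eTeXversion is an integer, \eTeXrevision expands to a string
  \message{e-TeX extensions active, version \the\eTeXversion\eTeXrevision}
\fi
\bye
```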
Now the baseline
Today the e-TeX extensions are simply assumed to be present. TeX Live has defaulted to e-TeX since 2003, and LaTeX has officially required the e-TeX primitives since 2017. More important still, every current engine bundles e-TeX: pdfTeX, XeTeX, and LuaTeX all incorporate the e-TeX extensions and add their own features on top.
In other words, when you typeset with pdfLaTeX or LuaLaTeX, you are running on e-TeX without realizing it. Many modern packages — starting with expl3, the LaTeX3 programming layer — simply could not exist without the e-TeX primitives.
The primitives e-TeX adds
e-TeX’s additions matter most to people who write macros. The biggest is arithmetic on integers, dimensions, and glue. In bare TeX you had to shuffle scratch registers to compute; e-TeX provides \numexpr, \dimexpr, and \glueexpr, which evaluate expressions like (a+b)*c/d on the spot, in an expandable form.
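To make the contrast concrete, the following sketch computes the same value both ways. It assumes the plain format on an e-TeX-capable engine in extended mode (for example pdftex); the choice of \count255 and \dimen0 as scratch registers is just a convention.

```tex
% Classic TeX: compute (3+4)*2/7 step by step through a scratch register
\count255=3
\advance\count255 by 4
\multiply\count255 by 2
\divide\count255 by 7
\message{classic: \the\count255}

% e-TeX: the same expression evaluated in place, expandably
\message{e-TeX: \the\numexpr (3+4)*2/7 \relax}

% The same idea for lengths: half of the text width minus 2cm
\dimen0=\dimexpr (\hsize - 2cm)/2 \relax
\message{half of the remaining width: \the\dimen0}
\bye
```

Both integer computations give 2 here. The two approaches can differ in general, because \divide truncates while division inside \numexpr rounds to the nearest integer.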
```tex
% On any e-TeX engine in extended mode
\count255=\numexpr (3+4)*2/7 \relax   % yields 2 (\count255 is a scratch register)
% Branch safely on whether a control sequence is defined
\ifdefined\foo \message{foo exists}\else \message{no foo}\fi
% Test a name for existence without creating the control sequence
\ifcsname chapter\endcsname \message{chapter is defined}\fi
% \unless negates a conditional directly
\unless\ifnum\count255>10 \message{count255 is not greater than 10}\fi
```

A second pillar is conditionals and token manipulation. \ifdefined tests whether a control sequence is defined, and \ifcsname...\endcsname tests the existence of one assembled from a name, both without side effects (the old bare-TeX idiom of building the name with \csname and comparing it to \relax with \ifx silently defined an undefined sequence as \relax). \unless inverts any conditional, and \detokenize turns a token list into its string form (characters of category code 12, with spaces keeping category code 10).
Expansion control is strengthened too. \unexpanded leaves its contents in place without expanding them, and \protected makes a defined macro one that will *not* expand of its own accord in expansion contexts such as \edef and \write. This provides robustness at the engine level, where LaTeX’s classic \protect mechanism has to emulate it with macros.
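A short sketch of both primitives inside an \edef follows. The macro names \fragile, \robust, and \result are hypothetical, and an e-TeX-capable engine with the plain format is assumed.

```tex
\def\fragile{FRAGILE}            % ordinary macro: expands inside \edef
\protected\def\robust{ROBUST}    % protected macro: left alone inside \edef
% \unexpanded shields its argument from expansion just this once
\edef\result{\fragile, \robust, \unexpanded{\fragile}}
% Show what \result actually contains
\message{\meaning\result}
\bye
```

In the reported meaning, the first \fragile has been replaced by its text, while \robust and the second \fragile are still present as control sequences.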
| Primitive | What it does |
|---|---|
| \numexpr / \dimexpr / \glueexpr | Evaluate integer / dimension / glue expressions in place (expandable) |
| \ifdefined / \ifcsname | Test if a control sequence is defined / exists, with no side effects |
| \unless | Invert the sense of the following conditional |
| \protected | Define a macro that will not expand inside expansion contexts |
| \detokenize / \unexpanded | Turn tokens into a string / keep tokens unexpanded |
| \scantokens / \readline | Re-read a string as tokens / read an input line verbatim |
| \middle | Place a stretchy delimiter in the middle of \left … \right |
| \currentgrouplevel / \interactionmode | Inspect the current group depth / get and set the interaction mode |
Beyond these there is \middle for a stretchy delimiter in the middle of math like \left( … \middle| … \right); \scantokens to re-read a string as tokens; \readline to read an input line verbatim; \currentgrouplevel returning the current group depth; and \interactionmode to read and change the interaction mode. e-TeX also introduced hooks for bidirectional (right-to-left) typesetting, which fed into later work in XeTeX, LuaTeX, and Japanese processing.
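A few of these primitives can be tried in a handful of lines. The sketch assumes an e-TeX-capable engine with the plain format (for example pdftex); the set-builder formula is only a typical use of \middle.

```tex
% \middle scales the bar to match the enclosing \left\{ ... \right\}
$$ \left\{\, x \in A \;\middle|\; x > 0 \,\right\} $$

% \currentgrouplevel reports the nesting depth: 0 at top level, 1 inside one group
\message{top level: \the\currentgrouplevel}
{\message{inside one group: \the\currentgrouplevel}}

% \detokenize turns tokens into plain characters that no longer execute
\message{\detokenize{\TeX\ is stable}}
\bye
```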
A quiet but consequential extension is a large increase in registers. Bare TeX had only 256 each of \count, \dimen, \skip, \toks, and so on; e-TeX raised this to 32768 (allocated as sparse arrays). Registers no longer run out so easily even when many large classes and packages are loaded — an unsung support beneath today’s heavyweight LaTeX setups.
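The larger register range is easy to observe directly. The sketch below assumes extended mode and uses raw register numbers purely for demonstration; real macro code would normally allocate registers with \newcount and its relatives instead.

```tex
% Register numbers above 255 are valid only with the e-TeX extensions
\count1000=42
\dimen32767=1pt                 % 32767 is the highest register number
\message{\the\count1000, \the\dimen32767}
\bye
```

On classic TeX the first assignment already stops with a "Bad register code" error, since only registers 0 to 255 exist there.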