The Three-Tier Stack

The criteria by which we judge programming languages were derived for a partnership that no longer exists in its pure form. For sixty years that partnership had two parties: the programmer and the machine, mediated by a language and its toolchain. The language served the programmer’s cognition: readable, learnable, debuggable, amenable to the human practice of building mental models over weeks and revising them over years. The machine’s needs were addressed downstream, through compilers that translated human-suited surface syntax into machine-suited executable code. Disputes about language fitness took place inside this two-party frame, and the criteria stabilized accordingly: readability, expressiveness, ecosystem maturity, runtime efficiency. Those were the axes that mattered for the partnership of the era.

A new partner has joined the work. Large language models, deployed as agentic coders, have inserted themselves between the human and the substrate, and the partnership is now a stack of three. The human tier contributes intuition — pattern recognition compressed from experience, taste about which approaches will compose well downstream, judgment about where to circumscribe the problem and how to nudge the solution toward fitness. The LLM tier contributes scale — sustained semantic processing across thousands of tokens, mechanical execution of well-specified intent, fluent recognition of patterns from the sum of publicly written software, a willingness to read every file and follow every reference without growing tired. The substrate tier provides the materials and the workshop. Each tier has a contribution the others cannot substitute for, and the partnership reaches its potential only when each tier is allowed to lean in with its essential and optimal contribution.

This three-tier framing is the actual unit of analysis for any modern claim about programming language fitness. Discourse about “the best language” without reference to which partnership the language is serving has become incomplete. Worse, much of the current discussion is still fitting its conclusions to the two-party frame, applying criteria mechanically to a world where they no longer carry the same weight. The result is a confused literature that argues about the wrong questions while the actual leverage point — substrate fitness for the three-tier partnership — goes underexamined.

This essay argues a specific thesis. When one derives substrate-fitness criteria from the structure of the three-tier partnership and applies them rigorously, Haskell extended with Liquid Haskell and supported by deliberate library curation emerges as the strongest production-ready substrate for workloads where iteration economics dominate. Rust emerges as the strongest substrate where runtime characteristics dominate. The two languages are optimizing different blends of the partnership rather than competing on a single axis, and the choice between them resolves cleanly once the partnership structure is taken as the frame.

Deriving the Metric

The three-tier stack suggests a single quantity that captures most of what substrate fitness means: tokens × time, per unit of correct semantic progress. Every iteration of the development loop costs tokens — read by the LLM, written by the LLM, read by the human — and costs time, in compile cycles, test cycles, attention cycles. Not every token expended produces forward motion. Much expenditure produces wasted exploration, backtracking, and the correction of errors that better tooling would have caught earlier. The substrate’s job is to make this product as small as possible.
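One way to make the quantity concrete is sketched below. This is one possible reading, not a definition the argument depends on, and every symbol is an assumption introduced here for illustration: $k_i$ is the number of tokens read and written in iteration $i$, $t_i$ is the elapsed time of that iteration, $n$ is the number of iterations, and $P$ is the amount of correct semantic progress delivered at the end.

```latex
\[
  C \;=\; \frac{1}{P} \sum_{i=1}^{n} k_i \, t_i
\]
```

Under this reading, an iteration that ends in backtracking adds to the sum without adding to $P$. A substrate can therefore lower $C$ in two ways: by making each iteration cheaper, or by making fewer iterations wasted.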

This unifies what would otherwise look like separate concerns. Compiler feedback quality matters because it converts ambiguity into actionable signal that costs few tokens to read and resolves the next step quickly. Edit locality matters because it bounds how much code must enter the context window per change. Type-level expressiveness matters because a well-typed signature compresses semantic content into very few tokens — Eq a => [a] -> [a] -> Maybe Int carries information that would take a substantial docstring to convey in a less expressive language. Refinement types extend the compression further, packing invariants directly into the surface. Purity matters because it eliminates the time dimension from reasoning about a function’s behavior; there is no temporal ordering to track, no hidden state to query. Specifications checked at compile time matter because they catch errors before the iteration cycle has spent its tokens on a debugging detour.
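A minimal sketch of what that signature compresses. The function name and body are hypothetical; the essay gives only the type. Read on its own, the type says that the function works for any element type, uses nothing but equality on elements, performs no I/O, and may fail to produce an answer.

```haskell
import Data.List (findIndex, isPrefixOf, tails)

-- Hypothetical example: one function that fits the signature quoted above.
-- It returns the index of the first position at which `needle` occurs
-- inside `haystack`, or Nothing if it never occurs.
firstOccurrence :: Eq a => [a] -> [a] -> Maybe Int
firstOccurrence needle haystack =
  findIndex (needle `isPrefixOf`) (tails haystack)
```

An agent reading only the signature can already rule out logging, mutation, and element-specific tricks such as ordering or hashing. Conveying the same guarantees in an untyped language would take prose.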

Several axes follow from this. Feedback signal locality, edit locality, compositional coherence, machine-checkable specifications, and what one might call language generativity (how readily the language supports agent-produced extensions that compose cleanly) are not separate virtues to be balanced. They are facets of a single optimization target. A substrate that scores well on one tends to score well on the others, because the underlying design discipline that produces good feedback also tends to produce good locality, good composition, and good specifications. The exception is runtime efficiency, which serves a different cost function entirely: CPU cycles per unit of work performed. The two cost functions are not in conflict, but the design moves that maximize one are often different from the moves that maximize the other, which is why languages tend to specialize.

A few obvious objections deserve direct treatment.

The first is that LLM inference cost is falling fast, and so the token side of the calculation matters less than it did a year ago. This is true on its face but does not survive close inspection. Tokens are a proxy for context-window pressure, not just for inference cost. Even when running a model is nearly free, the model’s context window remains finite, and the density of semantic content per token determines how much of a codebase the model can hold coherently in a single episode. A substrate that compresses semantics into types lets the model reason about more of the program at once. A substrate that diffuses semantics into prose, tests, and runtime behavior forces the model to either lose coherence or repeatedly re-read material it has already seen. Cheaper inference does not relax this constraint.

The second is that LLMs trained on global corpora will simply get better at the languages with the most training data (JavaScript, Python, Java) and substrate fitness will therefore matter less than data abundance. This argument has more force than the first, but it confuses a transient advantage with a structural one. Volume of training data improves the model’s ability to write idiomatic code in a language. It does not improve the language’s structural properties. An agent fluent in Python still cannot get compile-time type feedback from Python itself, because the language checks types only at run time; static checking is available only through optional annotations and external tools. The data-abundance advantage is a fixed offset; the substrate-fitness advantage compounds across iterations. As context windows grow and as agents handle longer-running tasks, the compounding effect dominates the offset.

The third is that verification will eventually be done by separate verifier-agents rather than by the language toolchain, which would commoditize the substrate’s verification advantage. This is plausible as a long-run trajectory but underestimates the cost of running a verifier-agent compared to running a compiler. The compiler is a deterministic, fast, free piece of infrastructure; a verifier-agent is a probabilistic, slow, expensive call. Pushing verification into the substrate where the compiler can handle it is dominant over pushing it into the agent layer, for the same economic reasons that pushing computation into hardware is dominant over running it in software. Substrate verification will remain cheaper than agentic verification for the foreseeable future, and the gap is one of the structural reasons substrate fitness matters.

The Field, Narrowed

Before comparing Haskell and Rust as the leading candidates, it is fair to ask why the field narrows to those two. The wider landscape includes Zig, OCaml, Lean 4, F*, Swift, Mojo, Carbon, and Idris, among others, each with arguments that could be developed at length. The narrowing is principled rather than arbitrary. Lean 4 and F* require proof engineering that does not fit current agentic economics — the proof burden is exactly the cost the metric penalizes most heavily. Zig is pre-1.0 and lacks meaningful verification tooling. Carbon and Mojo are early experiments without production-ready stories. Swift is excellent inside Apple’s gravitational well and irrelevant outside it. OCaml is genuinely a contender, and shares many of Haskell’s structural properties without the laziness; an essay that included OCaml alongside Haskell would not be obviously wrong, but the verification story is weaker (no production-grade refinement-type tool comparable to Liquid Haskell) and the ecosystem coherence is weaker still.

What remains are two languages that score highest on the relevant axes among production-ready options, that have mature toolchains, that have proven themselves on substantial industrial codebases, and that represent meaningfully different bets about what substrate fitness should look like. Haskell and Rust. The comparison between them is the one that matters.

Walking the Comparison

A serious comparison requires walking each axis with the same rigor and refusing to declare a winner before the evidence supports it. The conclusion will not be “Haskell wins everywhere” or “Rust wins everywhere.” It will be a textured account of which language minimizes the metric better under which conditions, with the recommendation following from the workload rather than from a global ranking.

Feedback signal quality is the closest of the axes. Both languages catch a substantial class of errors at compile time. Both produce error messages that have improved markedly over the last decade. Rust’s compiler is arguably better at suggesting concrete fixes — its errors frequently include “did you mean…” snippets the agent can apply directly — while Haskell’s are more concise but sometimes less actionable. Both have known weak spots. Haskell’s type-class resolution failures can blame sites several derivations away from the actual mistake. Rust’s trait-resolution failures in heavily-generic code, particularly involving async, can be similarly inscrutable. The languages tie at a level well above mainstream alternatives.

Edit locality is more interesting. Haskell achieves it through purity: pure functions can be modified without re-examining call sites for side-effect interactions, and the type signature is the contract. Rust achieves it through the borrow checker: functions taking borrowed references can be modified without re-examining call sites for ownership interactions, provided the borrowing contract is preserved. Different mechanisms, similar effect. Where the languages diverge is in the cost of changing the contract itself. Adding an effect in Haskell propagates a new monadic context through call sites — mechanical, compiler-guided. Adding a lifetime in Rust propagates a new lifetime parameter through signatures — also mechanical, also compiler-guided, but harder to think about because lifetimes have semantic constraints that effects do not. Haskell’s pure-function refactors are slightly cheaper than Rust’s lifetime refactors, but Rust’s effect visibility at the granular level — the explicit unsafe, async, ?, mut markers — gives the agent more information per token than Haskell’s coarser monadic distinctions. A marginal Haskell advantage on this axis, with Rust competitive.
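The cost of changing a contract on the Haskell side can be sketched with a small example. All names here are hypothetical, and failure modelled with Either stands in for whatever effect a real project would add. The point is that the new effect appears in the signature and the compiler then points at every caller that must thread it.

```haskell
-- Before: a pure function. Callers need not think about effects at all.
priceWithTax :: Int -> Int
priceWithTax cents = cents + cents `div` 5

totalPure :: [Int] -> Int
totalPure = sum . map priceWithTax

-- After: the rate must be looked up, and the lookup may fail. The new
-- effect (failure, modelled with Either) shows up in the signature.
priceWithTaxE :: (String -> Either String Int) -> String -> Int -> Either String Int
priceWithTaxE lookupRate region cents = do
  rate <- lookupRate region
  pure (cents + cents * rate `div` 100)

-- The caller no longer type-checks with `map`; it has to become `traverse`.
totalE :: (String -> Either String Int) -> String -> [Int] -> Either String Int
totalE lookupRate region = fmap sum . traverse (priceWithTaxE lookupRate region)

-- A stand-in rate table for the sketch.
demoRates :: String -> Either String Int
demoRates "DE" = Right 19
demoRates r    = Left ("no tax rate for region " ++ r)
```

The refactor is mechanical: each type error marks one call site, and the fix at each site is local.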

Compositional coherence is where the picture becomes interesting and where a piece of conventional wisdom needs revising. At the language level, Haskell holds a real advantage. The core abstractions of algebraic data types, type classes, parametric polymorphism, and monadic composition have been stable for decades and compose with each other in disciplined ways. Rust’s core mechanisms (ownership, borrowing, traits, lifetimes) are powerful but produce occasional friction, particularly around async and around the mutable-reference rules in collections. At the ecosystem level, Rust has historically held a much stronger position. Cargo’s tooling is exemplary. The cultural pressure toward de facto standard crates (serde, tokio, anyhow, clap) produces a coherence Haskell’s pluralistic ecosystem has not matched. Haskell has multiple competing answers at the effect-system, streaming, optics, and configuration layers, and a choice at one layer forces matching choices several layers up.

The traditional reading is that Rust’s ecosystem coherence outweighs Haskell’s language-level advantage for any practical project. That reading no longer survives contact with current development economics. When library gaps can be filled by an LLM in an hour of wall-clock time — either by writing a competent implementation from scratch or by wrapping a C library through FFI — the ecosystem-coherence axis loses much of its weight. The library was the thing humans had to provide because building from scratch was too expensive in human-hours. Agents change that calculation. What remains valuable about ecosystem coherence is not that the libraries exist but that the conventions exist, and conventions are partially recoverable through training data and partially through curation. The reweighted axis still favors Rust, but by a smaller margin than the conventional analysis would suggest, and the gap can be closed on the Haskell side through deliberate community action in a way that the Rust ecosystem cannot easily replicate even if it wanted to.

A related axis the conventional framework misses is language generativity. How readily does the language support agent-produced extensions that compose cleanly with existing code? Haskell scores high here. Its abstractions are principled enough that ad-hoc extensions tend to fit. FFI to C is excellent and produces idiomatic Haskell types rather than leaky abstractions. The cultural pressure toward “make illegal states unrepresentable” produces extensions that respect existing invariants. Rust scores middling. Its abstractions are powerful but have steeper coherence requirements. Agent-produced Rust code tends to compile but introduce friction — reflexive clone() calls to satisfy the borrow checker, Arc<Mutex<T>> patterns where a different architecture would have been more idiomatic, lifetime parameters threaded through APIs in ways that complicate downstream use. The borrow checker, which is one of Rust’s great achievements for memory safety, is a partial liability for generativity, because it rewards local appeasement over architectural correctness when the generator does not have full system context. Haskell holds a real and underappreciated advantage on this axis, and the advantage compounds over project lifetime as more of the codebase comes from generated extensions rather than from imported libraries.
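As a small illustration of the FFI point, here is a sketch that binds the C standard library’s `cos` and exposes it under an ordinary Haskell type. The wrapper name `cosine` is hypothetical; the `foreign import ccall` form is the standard GHC mechanism.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types (CDouble (..))

-- Bind `cos` from the C math library. The C function has no side
-- effects, so the import is given a pure type with no IO.
foreign import ccall unsafe "math.h cos"
  c_cos :: CDouble -> CDouble

-- Wrap the binding so callers see Double rather than the C type.
cosine :: Double -> Double
cosine = realToFrac . c_cos . realToFrac
```

Callers of `cosine` cannot tell it apart from a function written in Haskell, which is the sense in which the abstraction does not leak.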

Machine-checkable specifications is where Haskell extended with Liquid Haskell currently dominates. Refinement types verified by SMT solvers catch a class of logical errors no production-ready Rust verification system can match today. Consider the kind of refinement the agent gets to write:

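-- Hide the Prelude's partial head so this refined version can take the name.
import Prelude hiding (head)

{-@ head :: {xs:[a] | len xs > 0} -> a @-}
head :: [a] -> a
head (x:_) = x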

Now any caller must prove the list is non-empty, and the proof is automatic when the agent has just constructed the list with a known-non-empty operation. Or:

{-@ divide :: Int -> {y:Int | y /= 0} -> Int @-}
divide :: Int -> Int -> Int
divide x y = x `div` y

The agent cannot accidentally pass zero. Liquid Haskell, running as part of compilation, rejects any call site where it cannot prove the divisor is non-zero, and the rejection is local and actionable.
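A sketch of what this looks like at a call site. The `average` function is hypothetical, and the accept and reject outcomes described in the comments are what Liquid Haskell’s path-sensitive checking is expected to produce; they have not been captured from an actual run. Plain GHC treats the annotation as a comment and compiles the code either way.

```haskell
{-@ divide :: Int -> {y:Int | y /= 0} -> Int @-}
divide :: Int -> Int -> Int
divide x y = x `div` y

-- Expected to be accepted: in the `otherwise` branch the checker can use
-- the failed guard `n == 0` as evidence that n is non-zero.
average :: [Int] -> Maybe Int
average xs
  | n == 0    = Nothing
  | otherwise = Just (sum xs `divide` n)
  where
    n = length xs

-- Expected to be rejected if uncommented, since 0 does not satisfy y /= 0:
-- broken :: Int
-- broken = divide 10 0
```

The guard that a careful human would write anyway doubles as the proof, so the agent pays no extra tokens for the verified version.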

The Rust verification ecosystem (Kani, Creusot, Verus, Prusti) is fragmented, and much of it is still research-grade. Verus in particular is moving fast and has serious institutional backing, but it does not yet provide a unified mainline experience. The current gap is wide. Forecasting it forward, the gap narrows but does not close. The Rust verification efforts will improve, but the borrow checker complicates rather than simplifies refinement-style verification, because the aliasing discipline interacts with the predicate logic in ways that are still being worked out. For today’s agentic work, Haskell + Liquid Haskell holds a decisive advantage on this axis. For 2028, the advantage probably remains but is smaller.

Runtime efficiency and resource management is where Rust dominates without serious dispute. Zero-overhead memory management, predictable runtime characteristics, tight machine code as a default. Haskell can be made fast, sometimes very fast, but the language’s defaults trade runtime characteristics for development-time properties, and that trade is real. For embedded work, hard real-time systems, kernel-adjacent code, the inner loops of high-performance computing, and any system where runtime characteristics dominate the project’s success criteria, Rust is the right substrate. Haskell is not.

The aggregate picture: Haskell wins clearly on machine-checkable specifications and on language generativity. Rust wins clearly on runtime characteristics and holds an advantage on ecosystem coherence that has shrunk under agentic economics but not vanished. The other axes are close. Which language fits the partnership better depends on which blend of axes the workload prioritizes. For projects where iteration economics dominate — most application development, most infrastructure tooling, most data-processing systems, most domain-specific scientific or engineering software — Haskell’s combination of generativity and machine-checkable specs makes it the better substrate, and Liquid Haskell extends the lead. For projects where runtime characteristics dominate, Rust’s combination of zero-cost abstractions and mature systems-level ecosystem makes it the better substrate, and the verification gap is acceptable because runtime correctness can be partially recovered through testing and through the type system’s existing memory-safety guarantees.

The two languages are optimizing different blends. The blends correspond to different parts of the software landscape. The right question is not “which language wins” but “which workloads belong to which substrate.”

The Specific Case for Haskell + Liquid Haskell + Curation

For workloads where iteration economics dominate, the case for Haskell as the substrate of choice runs deeper than the axis-by-axis scoring. The deeper reason is that Haskell’s design discipline aligns with the metric the three-tier partnership demands. Strong static types. Purity by default. Monadic effect tracking. Principled abstractions. The cultural commitment to making illegal states unrepresentable. These features minimize tokens × time per unit of correct semantic progress, and they do so by pushing semantic content into the type-level surface where it is compressed and machine-checkable, rather than into runtime behavior where it is verbose and only checkable by execution.
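A minimal sketch of what “making illegal states unrepresentable” means in practice. The connection types are hypothetical and chosen only for illustration.

```haskell
-- Loose encoding: nothing stops a value that is "not connected" and yet
-- carries a session id. The invariant lives in comments and in tests.
data LooseConn = LooseConn { connected :: Bool, session :: Maybe Int }

-- Tight encoding: a session id exists only in the Connected case, so the
-- contradictory state cannot be constructed at all.
data Conn
  = Disconnected
  | Connected Int   -- session id
  deriving (Eq, Show)

describe :: Conn -> String
describe Disconnected  = "offline"
describe (Connected s) = "online, session " ++ show s
```

With the tight encoding, the invariant costs one line of type definition and no run-time checks. With the loose one, every consumer must re-validate it.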

Liquid Haskell extends this fit by adding a specification layer that catches logical errors at compile time without requiring proof engineering. Most refinements verify automatically; the cases that fail produce actionable counterexamples. The agent writes a refinement, the solver checks it, the loop continues. No tactic-writing intermission. No proof apprenticeship. No separate verification language to learn. Refinements live in special comments inside ordinary Haskell code, integrate through the GHC plugin model with the existing toolchain, and propagate through the recompilation machinery so only changed modules are re-checked.
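A sketch of how a refinement sits inside ordinary Haskell code. The function `nth` is a hypothetical example. The claim that Liquid Haskell verifies it automatically reflects how the tool is documented to handle list-length refinements, and has not been confirmed by a run here.

```haskell
-- With Liquid Haskell installed, checking is enabled per module by passing
-- -fplugin=LiquidHaskell to GHC (for example in an OPTIONS_GHC pragma).
-- The flag is left out here so the sketch also compiles with plain GHC,
-- which treats the {-@ ... @-} annotation as an ordinary comment.

{-@ nth :: xs:[a] -> {i:Int | 0 <= i && i < len xs} -> a @-}
nth :: [a] -> Int -> a
nth (x:_)  0 = x
nth (_:xs) i = nth xs (i - 1)
nth []     _ = error "unreachable: ruled out by the refinement"
```

The annotation states the index bound once. No tactic script or separate proof file accompanies it; the solver is expected to discharge both the recursive call and the unreachable case.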

The combined system gives the agent a verification ladder. Types catch structural errors instantly. Refinements catch logical errors at compile time, with SMT discharging most obligations automatically. Property tests catch what slips through. Unit tests catch specific scenarios. Each rung is more expensive than the one before it, and pushing as much verification as possible to the cheapest rung is the dominant strategy for agentic economics. Most languages give you little beyond the two test rungs. Haskell with Liquid Haskell gives you all four, and the two compile-time rungs are precisely the ones that catch the bugs LLM-generated code introduces most frequently: off-by-one errors, missing case alternatives, silent semantic drift during refactors, and the entire class of “I’ll fix it later” implicit invariants that humans paper over with comments and that agents reproduce verbatim from training data without noticing.
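The four rungs can be sketched on a single small function. Everything here is hypothetical. The property test is a hand-rolled stand-in for a QuickCheck property, used so that the example needs no packages beyond base, and the refinement is checked only when Liquid Haskell is run.

```haskell
-- Rung 1, types: the signature alone rules out non-Int arguments.
-- Rung 2, refinement: given lo <= hi, the result always lies within the
-- bounds. Plain GHC ignores this annotation.
{-@ clamp :: lo:Int -> {hi:Int | lo <= hi} -> Int -> {v:Int | lo <= v && v <= hi} @-}
clamp :: Int -> Int -> Int -> Int
clamp lo hi x
  | x < lo    = lo
  | x > hi    = hi
  | otherwise = x

-- Rung 3, property test: the same bound, checked over a fixed range of
-- inputs at run time rather than proved for all inputs.
propWithinBounds :: Bool
propWithinBounds = all inBounds [-100 .. 100]
  where
    inBounds x = let v = clamp 0 10 x in 0 <= v && v <= 10

-- Rung 4, unit tests: specific scenarios.
unitTests :: Bool
unitTests =
  clamp 0 10 15 == 10 && clamp 0 10 (-3) == 0 && clamp 0 10 7 == 7
```

The refinement and the property state the same invariant. The refinement covers every input at compile time, while the property covers only the inputs it happens to try and only when the tests run.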

The third piece is the one that has not yet happened in the Haskell community at the scale required. The ecosystem’s expressive pluralism is a virtue for a research community and for human teams with bandwidth to make architectural decisions thoughtfully. It is a liability for agentic coding at scale. Agents handle ecosystem fragmentation badly. They need a stable, learnable convention; “it depends on which library the project uses” is hard for them to internalize without strong project-level context, and the failure mode is to produce code that mixes conventions inconsistently or that defaults to whichever convention was most prominent in training data three years ago.

The strategic move is to circumscribe the abstraction space deliberately. Pick one effect system. Pick one streaming library. Pick one error-handling convention. Pick one optics library. Pick one approach to configuration, logging, database access, HTTP, JSON, testing. The specific choices matter less than the fact of choosing. Ecosystem-level coherence purchased by curation is more valuable than the marginal expressiveness given up by ruling out alternatives. The Go community made this choice culturally rather than technically, and it has paid enormous agentic dividends. The Haskell community can make the same choice technically, by publishing curated meta-packages and project templates that compose only the chosen abstractions. The underlying expressiveness does not disappear — alternative effect systems still exist on Hackage — but the default shifts. Defaults are leverage.

The curation argument acquires additional force when combined with the language-generativity observation. If agents can fluently produce missing pieces in a language whose abstractions compose cleanly, then a curated stack does not need to be exhaustive. It needs to be coherent. The library gaps that would have been fatal in a pre-agentic world are now closed by the LLM tier, provided the substrate cooperates. Haskell cooperates well because its abstractions are principled enough that agent-produced extensions tend to fit. Curation provides the architecture; generativity provides the gap-filling. Together they produce a substrate where the agent can construct fit-for-purpose solutions even in domains where the conventional ecosystem is sparse, and where the solutions remain coherent across the lifetime of the project.

Stakes Beyond Language Choice

The three-tier framing implies something more general than a language recommendation, and the more general claim is the one that matters most for the next decade of software development.

Most current discussion of AI in software development frames the wrong question — whether LLMs will replace programmers. The question presupposes the two-party frame in which programmers and machines are the only actors, and asks whether one party will displace the other. Inside the three-tier frame, the question dissolves. Humans are not being replaced; they are being relocated to the layer where their irreducible contribution lives — intuition, taste, judgment about where to circumscribe problems and how to nudge solutions toward fitness. LLMs are not replacing humans; they are filling a layer that did not previously exist, and in doing so they make the human contribution more leveraged, not less. The substrate is not changing; it is being asked to serve a new partnership. The substrate that serves the new partnership best will increasingly dominate the workloads that match its blend.

The right question, given the three-tier frame, is how to optimize each tier for its essential contribution and how to design the interfaces between tiers so that each tier’s contribution is amplified rather than blunted. Current discourse is largely not asking this. The consequence is that investment is flowing toward problems that do not deserve it while leaving the actual leverage points underexamined. Improving inference speed matters, but it matters less than improving the substrate the inference is operating on. Training larger models matters, but it matters less than ensuring the substrate gives the model dense semantic content per token. Better tools matter, but tools are downstream of the substrate they target; tools for a poorly-fit substrate cannot exceed the substrate’s ceiling.

The substrate is the variable we get to choose. The human tier is what it is — humans bring intuition, taste, and judgment, and these capacities are hard to engineer. The LLM tier is improving along its own trajectory, driven by forces largely external to any individual project or community. The substrate is the layer where deliberate community action compounds. Every project that uses a substrate teaches developers — human and LLM — to use it more effectively. Every library written against it deepens the conventions. Every hour of training data generated by people working in it improves agent fluency in it. A substrate that fits the partnership well becomes more valuable over time as the partnership scales. A substrate that fits poorly becomes a tax on every future project that uses it, paid in tokens × time, that no amount of investment in the other tiers can recover.

For any language community that wants its substrate to thrive in the next decade, the strategic posture follows. Evaluate every proposed change to the language, the standard library, and the conventional toolchain through the lens of how the change affects the three-tier partnership. Does it make the agent’s job cheaper or more expensive? Does it preserve and amplify the human’s intuitive contribution, or does it ask the human to spend intuition on low-level concerns the substrate should have handled? Does it make agent-produced extensions more or less likely to compose cleanly with existing code? These are answerable questions, and they yield real investment priorities — toward better error messages, toward cheaper specifications, toward stronger conventions, toward type-level expressiveness that compresses semantic content efficiently.

For the Haskell community specifically, the implication is sharper. The community has unusual leverage in shaping the next decade of agentic software development, and the leverage is greatest in workloads where iteration economics dominate. The technical pieces are largely already built. The remaining work is integrative: pulling Liquid Haskell forward to track current GHC versions, publishing curated stacks, tuning documentation and project templates for the agentic case, resisting the cultural tendency toward expressive pluralism in favor of coherence at the high-level abstraction layer. None of this is research-grade work. It is engineering and community organization, which is the kind of work the Haskell community has historically been less good at than the communities of the languages it sometimes loses to. The window is finite. Other languages will adapt. The opportunity is not that Haskell will remain uniquely positioned forever; the opportunity is that it is uniquely positioned now, and that the actions required to convert the position into market share are within reach.

The bet the essay is asking the reader to make, if the reader is in a position to make it, is this: invest in the three-tier substrate now, before the alternatives catch up, on the wager that the partnership-fit advantage will compound faster than the data-abundance advantage of mainstream languages. The wager could lose. Inference costs could fall faster than expected, or verifier-agents could mature into substitutes for compile-time verification, or the curation effort could fail to coalesce. But if the wager wins — and the structural argument suggests it should — the community that placed it early will own the substrate the next generation of software is built on, in workloads measured in trillions of dollars of economic activity. That is a bet worth thinking carefully about, and one the conventional discourse is not yet thinking about at all.