Haskell for Agents — Eversosoft

The question of which programming language best fits a given purpose has historically been answered in human terms. Readability for human reviewers. Hireable developer pools. Onboarding cost for new team members. Cognitive load on the maintainer at 2am. These criteria still matter, but they are no longer the only criteria, and increasingly they are not the dominant ones. Agentic coding has introduced a second customer for our languages — one with a different cost structure, different failure modes, and different things it needs from a toolchain. The conventional wisdom about which languages “win” in this new world is mostly being formed by people who have not yet thought carefully about what the agentic customer actually wants.

This essay argues a particular thesis. Haskell, for reasons that are largely accidental from the standpoint of its original designers, has turned out to be unusually well-suited as a target language for agentic coding. The features that gave Haskell its reputation as a research language are precisely the features that pay extraordinary dividends when the consumer of compiler feedback is an LLM rather than a human. Adding Liquid Haskell extends this fit substantially, and circumscribing the choice of high-level abstractions through deliberate library curation could close most of the gap that remains. The result would be something close to an optimal substrate for AI-driven development at scale, and the pieces required to assemble it largely already exist.

The essay is about the workload class where iteration economics dominate. Application development. Infrastructure tooling. Data-processing pipelines. Domain-specific scientific and engineering software. Anything where the cost of a project is measured primarily in the time it takes to converge on correct behavior rather than in the cost of running the resulting program. For workloads where runtime characteristics dominate — embedded systems, kernel-adjacent code, hard real-time applications — Rust is the right answer and a different essay would make that case. Both arguments are correct; they apply to different parts of the software landscape.

The Agentic Customer Has Different Needs

A human programmer reads code linearly, builds a mental model over weeks, reasons by analogy with patterns seen across a career, and pays a high cost to context-switch between projects. An agent reads code in chunks bounded by context windows, reasons from artifacts produced by the toolchain rather than from accumulated experience, can re-derive understanding cheaply but cannot easily retain it across episodes, and pays a high cost when feedback is delayed, noisy, or non-local. Different cost structures. Different failure modes. Different language fitness criteria.

Five axes capture most of what matters. The first and most important is feedback signal locality and quality: how quickly the toolchain can tell the agent whether its last edit was correct, and how precisely it can localize an error to source. How effectively an agent works on a codebase depends heavily on this quantity. The second is edit locality: when the agent makes a change, how confidently can it know what else might be affected? The third is compositional coherence: when the agent reaches for a library, are there many incompatible alternatives, or is there a stable convention it can rely on? The fourth is machine-checkable specifications: can the agent express what it intends and have the compiler verify that the implementation matches, without writing English prose that no machine can validate? The fifth, and least important for current agentic economics, is runtime efficiency. The cost of an inefficient program is currently far below the cost of a program that took twenty iterations to produce.

The order matters. A language could be theoretically beautiful on the lower axes and still produce poor agentic outcomes if the upper axes are weak. JavaScript, Python, and Ruby produce confidently-wrong agent output disproportionately often, not because they are deficient as languages — they are excellent for many human purposes — but because their feedback signal arrives at runtime, after the agent has already moved on, and is mediated through stack traces that lose locality at module boundaries. C++ is similarly weak: its template error messages can be catastrophically non-local, regularly blaming a site three abstractions away from the actual mistake, which produces exactly the kind of spinning loop that agentic systems handle worst. The texture of working with an LLM on a TypeScript or Vite project — where lint passes, tests run, and yet subtle regressions surface at runtime in ways the tooling cannot anticipate — is qualitatively different from working with the same LLM on a strongly-typed functional codebase, and anyone who has tried both has noticed the difference.

Haskell’s Current Scorecard

Haskell scores remarkably well on these axes, and the strength is structural rather than incidental.

Consider feedback signal quality first. Haskell’s type system catches a substantial class of program errors at compile time, and the errors are mostly local — they identify the expression that violated a type constraint and explain what was expected. Error messages have improved markedly over the last decade. The days of inscrutable type-class resolution failures have not entirely passed, but they are no longer typical. The overall experience of “follow the type errors and the program will become correct” is one Haskell programmers have internalized for years. Agents internalize it faster, because they do not have to overcome the human resistance to being told they are wrong. The compile-edit-recompile loop in Haskell, when working from a sound architectural starting point, is the closest thing in mainstream programming to a verifier that produces actionable feedback at every step.
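A minimal sketch of what "follow the type errors" looks like in practice. The names `PaymentStatus` and `describe` are hypothetical and chosen only for illustration; the point is that the compiler reports both type mismatches and missing cases at the expression or function responsible.

```haskell
-- Hypothetical domain type, for illustration only.
data PaymentStatus
  = Pending
  | Settled Int          -- amount in cents
  | Refunded Int String  -- amount in cents, reason
  deriving (Show, Eq)

-- Every constructor is handled explicitly. If a new constructor is later
-- added to PaymentStatus, GHC's -Wincomplete-patterns warning (enabled by
-- -Wall) points at this function and names the unhandled case.
describe :: PaymentStatus -> String
describe Pending              = "pending"
describe (Settled cents)      = "settled: " ++ show cents
describe (Refunded cents why) = "refunded: " ++ show cents ++ " (" ++ why ++ ")"

-- A call such as
--   describe "settled"
-- is rejected at compile time, and the error is reported at that call,
-- because a String is not a PaymentStatus.
```

The feedback an agent needs after an edit is therefore a short list of source locations, not a stack trace produced by a later run.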

Edit locality is where Haskell’s purity pays its largest dividend. In an imperative language, changing the behavior of a function requires reasoning about every call site’s surrounding mutable state, every shared resource that function might touch, every ordering dependency in the wider system. In Haskell, referential transparency means that pure functions can be modified without re-examining call sites for runtime side effects. If the type signature is preserved, a behavior change can reach callers only through the value the function returns, never through hidden state. When the type signature must change, the compiler enumerates the call sites that need updating. This is not just a comfort for human programmers; it is a structural reduction in the size of the code region an agent must reason about per change. The blast radius of an edit is bounded by what the type system says it can be, and in well-designed Haskell code that bound is small. Effect tracking via monads makes the impure parts of the program visible at the type level, so even when an effect is involved, the agent can see it without inspecting the implementation.
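The distinction between pure and effectful code can be read straight off the signatures. The sketch below uses hypothetical names (`applyDiscount`, `reportPrice`, `checkout`) and assumes ordinary code that does not reach for escape hatches such as `unsafePerformIO`.

```haskell
-- Pure: no IO in the type, so editing the body can only change the
-- value that callers receive.
applyDiscount :: Int -> Int -> Int
applyDiscount percent priceCents =
  priceCents - (priceCents * percent) `div` 100

-- Effectful: the IO in the result type says, without opening the body,
-- that calling this may perform side effects.
reportPrice :: Int -> IO ()
reportPrice priceCents = putStrLn ("price: " ++ show priceCents)

-- The boundary between the two is visible where they are combined:
-- the pure result is bound with let, the effect is sequenced in IO.
checkout :: Int -> Int -> IO Int
checkout percent priceCents = do
  let final = applyDiscount percent priceCents
  reportPrice final
  pure final
```

An agent asked to change the discount rule needs to read `applyDiscount` and nothing else; an agent asked to change what gets printed knows from the type alone that `reportPrice` is where effects live.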

Compositional coherence is the axis where Haskell is genuinely mixed. At the language level, Haskell has unusually strong coherence: the core abstractions of algebraic data types, type classes, parametric polymorphism, and monadic composition have been stable for decades and compose in disciplined ways. At the ecosystem level, the picture is messier. The effect-system question is unsettled: mtl, the ReaderT pattern, effectful, polysemy, Bluefin. The streaming abstraction question is unsettled: conduit, pipes, streamly, streaming. The optics question bifurcates libraries: lens versus optics. The stylistic split between “stringly-typed Haskell” and “type-level Haskell” creates communities that nominally write the same language but in practice write different dialects of it. This is the gap that library curation must close, and we will return to it.

Machine-checkable specifications are where standard Haskell stops short. Types catch a great deal — much more than mainstream languages — but they do not catch everything an agent needs verified. A function annotated Int -> Int could be the identity, the doubler, or the constant zero, and the type system cannot distinguish them. Property-based testing via QuickCheck closes some of this gap by running checks at test time, but the verification is probabilistic and the feedback arrives later in the loop. This is where Liquid Haskell becomes interesting.
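To make the gap concrete, here is a small sketch with three functions that share one type, plus a property that separates them. The names are hypothetical. To keep the example dependent on `base` only, `holdsOnSample` stands in for a property-test runner by checking a fixed range; with QuickCheck one would instead pass `prop_doubles doubler` to `quickCheck` and let it generate random inputs.

```haskell
-- Three functions the type checker cannot tell apart.
identity, doubler, constZero :: Int -> Int
identity  x = x
doubler   x = x * 2
constZero _ = 0

-- A QuickCheck-style property: an ordinary Bool-valued function that
-- states the intended behaviour ("the result is twice the input").
prop_doubles :: (Int -> Int) -> Int -> Bool
prop_doubles f x = f x == x + x

-- Stand-in for a test runner: checks the property on a fixed sample
-- rather than on randomly generated inputs.
holdsOnSample :: (Int -> Bool) -> Bool
holdsOnSample p = all p [-100 .. 100]
```

Only `doubler` passes `holdsOnSample . prop_doubles`, but that verdict arrives when the check is run, not when the code is compiled, and it covers only the inputs that were tried.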

Resource management is Haskell’s traditional weakness. The lazy-evaluation memory profile and the garbage collector are part of the deal. Linear types and the broader Linear Core work hint at a future where memory management can be made more precise within the existing language, but for now Haskell trades runtime efficiency for development-time safety. The good news is that this axis matters least for current agentic economics: the cost of running a slightly inefficient program for a year is generally below the cost of the human review cycles that would have been required to make it efficient.

The aggregate picture is that Haskell currently scores well on three of the five axes — feedback, edit locality, language-level compositional coherence — and middling on the others. For most agentic coding purposes, the three strong axes are the ones that matter most, and Haskell’s scoring on them is unusually high relative to mainstream alternatives. This explains the experiential observation that agentic coders work remarkably well on Haskell projects. The substrate is well-suited to their cost structure even though it was not designed for them.

Liquid Haskell as the Specification Layer

Standard Haskell catches a great deal at compile time but not everything. The natural next step is to give the agent a way to express what code should do, not just what shape its inputs and outputs have, and to have the compiler verify the match. This is the territory of refinement types and dependent types, and the tradeoffs between approaches matter for the agentic case in ways that are not obvious.

Lean 4 and F* offer extraordinarily expressive specification languages, but they require the programmer to write proofs when the type checker cannot discharge an obligation automatically. Proof writing is labor — sophisticated labor that LLMs can in principle do, but labor nonetheless, and labor that scales poorly when verifying every function in a large codebase. For research purposes and for systems where correctness is paramount and engineering effort is abundant, this is acceptable. For agentic coding at industrial scale, where the entire economic argument depends on iteration being cheap, the proof burden is a significant cost.

Liquid Haskell makes a different bet. Refinement types — predicates attached to ordinary types — are verified by an SMT solver (typically Z3) which discharges most obligations automatically. Consider what the agent gets to write:

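import Prelude hiding (head)  -- avoid a clash with Prelude's own head

{-@ head :: {xs:[a] | len xs > 0} -> a @-}
head :: [a] -> a
head (x:_) = x  -- the empty-list case is ruled out by the refinement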

Now any caller must prove the list is non-empty, and the proof is automatic when the agent has just constructed the list with a known-non-empty operation. Or this:

{-@ divide :: Int -> {y:Int | y /= 0} -> Int @-}
divide :: Int -> Int -> Int
divide x y = x `div` y

The agent cannot accidentally pass zero. The compiler refuses, and the refusal is local and actionable.
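A sketch of what the caller's side looks like, under the assumption that Liquid Haskell is enabled for the module. The names `safeHead`, `safeDiv`, `ratio`, and `firstOr` are hypothetical; the refinements mirror the two signatures above. Because the annotations live in comments, the file also compiles with plain GHC, where they are simply ignored.

```haskell
{-@ safeHead :: {xs:[a] | len xs > 0} -> a @-}
safeHead :: [a] -> a
safeHead (x:_) = x  -- empty list excluded by the refinement

{-@ safeDiv :: Int -> {d:Int | d /= 0} -> Int @-}
safeDiv :: Int -> Int -> Int
safeDiv n d = n `div` d

-- Should be accepted: the guard establishes d /= 0 on the branch that
-- calls safeDiv, so no manual proof is needed.
ratio :: Int -> Int -> Maybe Int
ratio n d
  | d /= 0    = Just (safeDiv n d)
  | otherwise = Nothing

-- Should be accepted: the pattern xs@(_:_) establishes that the list
-- has at least one element.
firstOr :: a -> [a] -> a
firstOr _   xs@(_:_) = safeHead xs
firstOr def []       = def

-- Expected to be rejected by Liquid Haskell, so left commented out:
-- bad :: Int
-- bad = safeDiv 10 0
```

The obligations are discharged by ordinary control flow. The agent does not write a proof; it writes the check it would have needed anyway, and the refinement confirms the check is in the right place.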

When the solver cannot discharge an obligation, the agent receives an error located at the offending expression that states which refinement could not be established. That is usually enough to tell whether the specification needs strengthening, the implementation needs refactoring, or an intermediate measure needs to be defined. The cases that fall outside SMT-decidable logic do exist and require manual proof effort, but they are the minority for the kinds of properties most code actually wants to assert.
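"Defining an intermediate measure" can be sketched as follows. The `Stack` type and the names `depth` and `pop` are hypothetical, and the sketch assumes the `{-@ measure ... @-}` annotation behaves as described in the Liquid Haskell documentation: it lifts a simple structurally recursive function into the refinement logic so that specifications can mention it.

```haskell
-- Hypothetical user-defined container.
data Stack a = Empty | Push a (Stack a)

-- Lift depth into the refinement logic so that specifications can
-- refer to it. Nat is Liquid Haskell's alias for non-negative Int.
{-@ measure depth @-}
{-@ depth :: Stack a -> Nat @-}
depth :: Stack a -> Int
depth Empty      = 0
depth (Push _ s) = 1 + depth s

-- With the measure available, "non-empty" becomes a checkable
-- precondition. The Empty case is omitted because it is unreachable
-- whenever the precondition holds.
{-@ pop :: {s:Stack a | depth s > 0} -> Stack a @-}
pop :: Stack a -> Stack a
pop (Push _ s) = s
```

Without the measure there is no vocabulary in which to state the precondition on `pop`; with it, the obligation is the same kind of arithmetic fact the solver already handles for built-in lists.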

This asymmetry — cheap verification of routine claims, actionable signal on the rest — is precisely what agentic feedback loops want. The agent writes a refinement, the solver checks it, the loop continues. There is no tactic-writing intermission. No proof-engineering apprenticeship. No separate verification language to learn. Refinements live in special comments inside ordinary Haskell code, which means the cognitive switching cost is minimal. Since 2020, Liquid Haskell has been a GHC plugin, which means it participates in the normal compilation pipeline, ships specifications inside ordinary Haskell packages, and integrates with the recompilation machinery so only changed modules are re-checked.

Consider what refinement types catch in practice. Off-by-one errors, because indices can be specified to lie within bounds. Missing case alternatives, because the totality check requires every pattern match to be exhaustive or provably unreachable. Silent semantic drift during refactors, because an implementation that drifts away from its stated specification no longer type-checks. The entire class of “I’ll fix it later” implicit invariants (the comments saying “must be non-empty,” “must be sorted,” “must be positive”), because they can be lifted into checkable types. These are bugs that LLM-generated code is prone to introduce. The agent is reasoning by pattern-match against training data, and the patterns sometimes contain the bugs. A type system that refuses to compile until the patterns are correct closes a category of failure that is otherwise expensive to catch.
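As an illustration of lifting a "must be sorted" comment into a type, here is a sketch modelled on a well-known refinement-type idiom: a list whose refined definition requires every element of the tail to be at least the head. The names `IncList`, `insert`, and `toList` are illustrative, and the claim that Liquid Haskell accepts `insert` as written is an expectation rather than something verified here.

```haskell
-- Plain Haskell definition: nothing here forces the elements to be ordered.
data IncList a
  = Emp
  | (:<) { hd :: a, tl :: IncList a }

infixr 9 :<

-- Refined definition: every element of the tail must be >= the head,
-- so only ascending lists can be constructed.
{-@ data IncList a
      = Emp
      | (:<) { hd :: a, tl :: IncList {v:a | hd <= v} }
  @-}

-- Insertion that preserves the ordering invariant. Swapping the two
-- guarded branches would build a list that violates the refinement.
insert :: Ord a => a -> IncList a -> IncList a
insert y Emp = y :< Emp
insert y (x :< xs)
  | y <= x    = y :< x :< xs
  | otherwise = x :< insert y xs

-- Forget the invariant to inspect the contents.
toList :: IncList a -> [a]
toList Emp       = []
toList (x :< xs) = x : toList xs
```

The invariant now travels with the data type instead of living in a comment, so a refactor that breaks ordering is reported where the ill-ordered list is built.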

The combined system gives the agent a verification ladder. The base type system catches structural errors instantly and locally. Refinement types catch logical errors at compile time, with SMT discharging most obligations automatically. Property tests via QuickCheck catch what slips through, at the cost of running them. Unit tests catch specific scenarios. Each rung is more expensive than the one below it, and pushing as much verification as possible to the cheapest rung is the dominant strategy for agentic economics. Most mainstream languages give you only the two test rungs, sometimes with a weaker static type system beneath them. Haskell with Liquid Haskell gives you all four.
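The four rungs can be shown on a single function. This is a sketch with hypothetical names (`clamp`, `prop_clampIdempotent`, `unit_clampAbove`); the property is written as a plain `Bool`-valued function of the kind one would hand to QuickCheck, so the example itself needs only `base`.

```haskell
-- Rung 1: the ordinary type rules out wrong arity and non-Int arguments.
-- Rung 2: the refinement states that the result lies within [lo, hi];
--         it is checked at compile time when Liquid Haskell is enabled.
{-@ clamp :: lo:Int -> {hi:Int | lo <= hi} -> Int
          -> {v:Int | lo <= v && v <= hi} @-}
clamp :: Int -> Int -> Int -> Int
clamp lo hi x
  | x < lo    = lo
  | x > hi    = hi
  | otherwise = x

-- Rung 3: a property over arbitrary inputs (clamping twice changes nothing).
prop_clampIdempotent :: Int -> Bool
prop_clampIdempotent x = clamp 0 10 (clamp 0 10 x) == clamp 0 10 x

-- Rung 4: a unit test pinning down one specific scenario.
unit_clampAbove :: Bool
unit_clampAbove = clamp 0 10 42 == 10
```

The range guarantee sits on rung 2 and so never needs a test; idempotence is not stated in the refinement, so it falls to rung 3. Moving a claim down the ladder means rewriting it from a test into a type.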

The practical limits are real. Refinement checking is automatic only when the properties are expressible in the solver’s decidable logic; deep structural reasoning about ASTs, complex inductive properties, and nonlinear arithmetic typically fall outside this fragment. The type-class story has been a known weakness, with refinements not always propagating through type-class dispatch, though active work is closing this gap. Liquid Haskell currently lags GHC HEAD by several versions and requires a specific GHC release, which constrains what the rest of the toolchain can use. None of these limits is fatal. They are engineering tradeoffs that improve year over year, but they mean that Liquid Haskell is not a silver bullet and the verification ladder still has rungs above the SMT layer.

The Library Curation Argument

Even with Haskell’s language-level coherence and Liquid Haskell’s specification layer, an unresolved problem remains: which libraries does the agent reach for, and how do they compose? An agent can be excellent at writing Haskell while still producing a project that is incoherent at the architectural level because the libraries it chose were each individually reasonable but collectively at war.

The standard response in the Haskell community is to embrace pluralism — let many flowers bloom, allow each project to choose its own effect system, streaming abstraction, and optics convention. This is the right answer for a research community, and possibly for human teams who have the bandwidth to make these choices thoughtfully. It is the wrong answer for agentic coding at scale. Agents handle ecosystem fragmentation badly because they need a stable, learnable convention; “it depends on which library the project uses” is hard for them to internalize without strong project-level context, and the failure mode is to produce code that mixes conventions inconsistently. Worse, agents trained on a corpus of public Haskell code see all the conventions roughly equally and reproduce whichever was most prominent in their training data, which biases toward whatever was popular on Hackage three years ago rather than what is best for your project today.

The strategic move is to circumscribe the abstraction space deliberately. Pick one effect system. Pick one streaming library. Pick one error-handling convention. Pick one optics library. Pick one approach to configuration, logging, database access, HTTP, JSON, testing. The specific choices matter less than the fact of choosing — the ecosystem-level coherence purchased by curation is more valuable than the marginal expressiveness given up by ruling out alternatives. The Go community made this choice culturally rather than technically, and it has paid enormous agentic dividends. The Haskell community can make the same choice technically, by publishing curated meta-packages and project templates that compose only the chosen abstractions.

Such a curation would minimize the choice space the agent must reason about. It would maximize the consistency of patterns across modules. It would let documentation and training data concentrate on a smaller surface area, which improves the agent’s ability to generate idiomatic code. It would reduce the impedance mismatches that arise when libraries with different conventions must be glued together, because the conventions would already match. It would make project bootstrapping faster, because the architectural decisions would be pre-made. And it would create a stable target for tooling — language servers, formatters, linters, and Liquid Haskell specifications could all be tuned for the curated stack rather than for the cross-product of all possible stacks.
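What "one effect convention" might look like at the project level can be sketched in a few lines. Everything here is hypothetical: `Env`, `App`, `asksEnv`, `logMsg`, and `greet` are illustrative names, and the reader-over-IO shape is hand-rolled on `base` only to keep the sketch self-contained. A real curated stack would more likely build on `ReaderT` from transformers or on whichever effect library it had chosen.

```haskell
-- One environment record for configuration and capabilities.
data Env = Env
  { envAppName :: String
  , envLog     :: String -> IO ()
  }

-- One effect convention for the whole project: a reader over IO.
newtype App a = App { runApp :: Env -> IO a }

instance Functor App where
  fmap f (App g) = App (fmap f . g)

instance Applicative App where
  pure x = App (\_ -> pure x)
  App f <*> App g = App (\env -> f env <*> g env)

instance Monad App where
  App g >>= k = App (\env -> g env >>= \a -> runApp (k a) env)

-- The only sanctioned ways to touch the environment.
asksEnv :: (Env -> a) -> App a
asksEnv f = App (pure . f)

logMsg :: String -> App ()
logMsg msg = App (\env -> envLog env msg)

-- Application code is written against App and these helpers only.
greet :: String -> App String
greet user = do
  app <- asksEnv envAppName
  let line = app ++ ": hello, " ++ user
  logMsg line
  pure line
```

The value of such a module is less in the code than in the constraint: every function in the project that performs effects has the shape `... -> App a`, so an agent has exactly one pattern to learn and reproduce.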

The deeper point is that pluralism and library expressiveness, while genuine virtues, have an economic cost that becomes visible only at the scale of agentic coding. A language with five effect systems is in some sense richer than a language with one, but the richness is paid for in choice-overhead that the agent must navigate every time. Curation does not eliminate the underlying expressiveness: the alternative effect systems still exist on Hackage, and a project that needs them can use them. But it changes the default. The default shapes what agents produce. What agents produce shapes the texture of codebases that humans must later maintain. Defaults are leverage.

The curation argument acquires additional force when combined with what one might call language generativity — how readily the language supports agent-produced extensions that compose cleanly with existing code. Haskell scores high on generativity because its abstractions are principled enough that ad-hoc extensions tend to fit. FFI to C is excellent and produces idiomatic Haskell types rather than leaky abstractions. The cultural pressure toward “make illegal states unrepresentable” produces extensions that respect existing invariants. If agents can fluently produce missing pieces in a language whose abstractions compose cleanly, then a curated stack does not need to be exhaustive; it needs to be coherent. The library gaps that would have been fatal in a pre-agentic world are now closed by the LLM tier, provided the substrate cooperates. Curation provides the architecture; generativity provides the gap-filling. Together they produce a substrate where the agent can construct fit-for-purpose solutions even in domains where the conventional ecosystem is sparse, and where the solutions remain coherent across the lifetime of the project.

Counter-Arguments

Two counter-arguments deserve direct response.

The first is that LLMs trained on massive corpora will simply get better at the languages with the most training data, and substrate fitness will therefore matter less than data abundance. This argument has surface plausibility but confuses a transient advantage with a structural one. Volume of training data improves the model’s ability to write idiomatic code in a language. It does not improve the language’s structural properties. An agent fluent in Python still cannot get compile-time feedback on a Python program, because Python does not produce compile-time feedback. The data-abundance advantage is a fixed offset; the substrate-fitness advantage compounds across iterations. As context windows grow and as agents handle longer-running tasks, the compounding effect dominates the offset.

The second is that the curation thesis depends on community discipline that the Haskell community has not historically shown. This is the strongest objection, and I find it largely persuasive as a description of risk. The Haskell community has been good at language design and bad at the kind of ecosystem coordination that produces curated stacks. Stackage was a partial answer but operated at the package-version-coherence layer, not at the abstraction-choice layer. There is no Haskell equivalent of “the standard Go web service stack” or “the standard Rust async stack.” Whether the community can produce one is an empirical question I am uncertain about. The technical case for trying is strong. The execution risk is real. A revision of this essay in five years might have to acknowledge that the case was correct in principle and the community failed to act on it. That is a possible outcome. Naming it does not weaken the technical argument; it just means the argument is conditional on action that has not yet happened.

Closing

The throughline of the case is that the criteria for language fitness have changed, and Haskell happens to score unusually well on the new criteria for reasons that have little to do with what its designers were originally optimizing for. The features that gave Haskell its smaller-than-deserved industrial footprint — strong static types, purity by default, monadic effect tracking, principled abstractions, “make illegal states unrepresentable” — turn out to be precisely the features that make it an extraordinary substrate for agentic coding. The compiler is a mostly-trustworthy oracle that produces local, actionable feedback at every edit. The blast radius of a change is bounded by the type system’s claims about it. The cost of refactoring is lower than in any mainstream alternative.

Liquid Haskell extends this fit by adding a specification layer that catches logical errors at compile time without requiring proof engineering. Library curation closes the remaining gap, the ecosystem-coherence problem at the high-level abstraction layer, through community discipline and packaging rather than through changes to the language. The pieces required to assemble this stack largely already exist, distributed across the Haskell ecosystem and waiting for someone to compose them.

The strategic implication is concrete. The Haskell community has historically marketed itself on the basis of correctness and expressiveness — virtues that resonated with a small audience of unusually rigorous developers. The agentic coding revolution has created a different and possibly larger audience for whom Haskell’s traditional virtues are economic necessities rather than academic preferences. A language that lets agents iterate cheaply, produces high-quality feedback signal, bounds the blast radius of changes, and supports machine-checkable specifications produces better economic outcomes in agentic development. The community that recognizes this and acts on it — by investing in Liquid Haskell, by publishing curated stacks, by tuning documentation and training data for the agentic case — will find itself with an unusual amount of leverage in shaping the next decade of software development.

The window for taking advantage of the current position is finite. Other languages will adapt. Rust is investing heavily in verification tooling. Mainstream commercial languages are working on safety profiles and gradual type systems. The opportunity is not that Haskell will remain uniquely positioned forever; it is that Haskell is uniquely positioned now, with most of the technical pieces already built and a community that could, if it can resolve its tendency toward expressive pluralism in favor of curated coherence, deliver the most agent-friendly substrate in mainstream programming before the alternatives catch up. Whether the community takes that opportunity is a separate question from whether the technical case for taking it is strong. The case is strong. The execution is the variable.