The Problem with Chat — Remembering the Next Time

Source: https://www.magfrump.net/blog/the-problem-with-chat

Large language models (LLMs) are a powerful new technology that is reshaping how we interact with computers. Despite the early and unexpected popularity of ChatGPT, Claude Code was the tool that started producing clear economic value. Among the reasons for this delay is that “chat” is a massive, 40+-year regression in interface design. The regression remains unfixed, and it creates an uncanny valley of interaction that underlies a lot of controversial discourse about AI.

Depending on who you are, any or all of these sentences may be surprising or controversial, potentially for more than one reason! You shouldn’t trust that they’re true just because you read the words I typed. So what would make these claims trustworthy?

I can describe to you why I believe them. For example, I believe that the popularity of ChatGPT was unexpected because the original release was a tech demo, and OpenAI said publicly that it had not expected the chat form to take off. And I believe that chat is a 40-year interface regression because of a quote from a foundational 1985 paper on computer interfaces:

> “Historically, most interfaces have been built on the conversation metaphor. There is power in the abstractions that language provides (we discuss some of this later), but the implicit role of interface as an intermediary to a hidden world denies the user direct engagement with the objects of interest.”

The reasoning and quote probably make my initial statement more convincing, unless you assume that I (the author) am an LLM, and everything I’ve typed could be completely made up. I could make it more convincing by adding citations: ChatGPT was a tech demo, its success was a surprise, the source for my quote (p.9 of the pdf). You can follow these links; it would be much more difficult for me to make up the web pages from scratch, hosted on the domain names for Business Insider and OpenAI, than to only make up the claims. You might then read the articles I’ve linked and come to the same conclusions, or you might come to different conclusions. In particular, the fact that pre-1985 interfaces were often “conversational” doesn’t mean that chat specifically is outdated as an interface! You might, for example, object that the chat interface has served you well in your own dealings with LLMs. In that case, I would direct you to a different quote from Hutchins et al. 1985 (p.17 this time):

> “Automatization is useful, for it improves the interaction of the user with the system, but the feeling of directness it produces depends only on how much practice a particular user has with the system and thus gives the system credit for the work the user has done.”

The overall paper discusses how we co-adapt with our computer interfaces, so that after experiencing chat interfaces for a few years, users will start to overlook their flaws because the user has adjusted their workflows around the interface.

This is going pretty deep on just one of the claims in my initial paragraph, and is purely analytical. Let’s try an empirical trial of this powerful new technology.

Chat transcript:

> ### User input
>
> is this true:
>
> > Large language models are a powerful new technology that is reshaping how we interact with computers. Despite the early and unexpected popularity of ChatGPT, Claude Code was the tool that started producing clear economic value. Among the reasons for this delay is that “chat” is a massive, 40+-year regression in interface design. The regression remains unfixed, and it creates an uncanny valley of interaction that underlies a lot of controversial discourse about AI.
>
> ### Gemini 3.1 Pro (Standard thinking) responded:
>
> This statement is a mix of undisputed fact, a significant historical inaccuracy, and a highly regarded critique from the world of User Experience (UX) and Human-Computer Interaction (HCI).
>
> Here is a breakdown of the specific claims to separate what is true, what is false, and what is subjective but valid.
>
> ### The False Premise: Claude Code and Economic Value
>
> _"Despite the early and unexpected popularity of ChatGPT, Claude Code was the tool that started producing clear economic value."_
>
> **Verdict: False.** While Claude Code (released by Anthropic in 2025) is a highly capable tool, claiming it was the _first_ to produce "clear economic value" is historically inaccurate.
>
> * **GitHub Copilot** (launched in 2021, well before ChatGPT) is widely recognized as the first generative AI product to produce undeniable economic value by measurably increasing developer productivity.
> * **ChatGPT itself** drove massive economic value almost immediately upon release in late 2022, not just through OpenAI's subscription revenue, but by being integrated via API into thousands of enterprise workflows, customer service pipelines, and data analysis tools.
>
> ### The Valid UX Critique: Chat as a "Regression"
>
> _"Among the reasons for this delay is that 'chat' is a massive, 40+-year regression in interface design."_
>
> **Verdict: Subjective, but strongly supported by design experts.** This is a prominent critique among HCI experts. The "40-year" timeline refers to the rise of the Graphical User Interface (GUI), popularized by the Apple Macintosh in 1984. GUIs introduced **discoverability** and **direct manipulation**—you can look at a screen, see the menus, and immediately know what the software can do.
>
> A blank chat box, by contrast, relies entirely on recall rather than recognition. It forces the user to guess what the system is capable of, effectively dragging us back to the command-line interfaces (CLIs) of the 1970s. You only get a good result if you know the exact "incantation" (prompt) to type.
>
> ### The Analytical Perspective: The "Uncanny Valley of Interaction"
>
> _"The regression remains unfixed, and it creates an uncanny valley of interaction that underlies a lot of controversial discourse about AI."_
>
> **Verdict: A highly accurate framing of current AI friction.** The "uncanny valley of interaction" occurs because chat interfaces mimic human conversation. When a computer talks like a human, users subconsciously expect it to possess human traits: common sense, logical consistency, and empathy.
>
> When the model inevitably fails in very "un-human" ways—like hallucinating a fact confidently or failing a basic math problem—the illusion shatters. This mismatch between the _perceived_ capability of the interface and the _actual_ mechanics of the statistical model is indeed the root of much of the trust issues, frustration, and controversial discourse surrounding AI today.
>
> **Summary:** The paragraph accurately diagnoses the fundamental interface problem with modern AI (the blank text box) and its psychological effects on users. However, it artificially minimizes the immediate economic impact of tools like ChatGPT and GitHub Copilot to position Claude Code as the turning point.

This is a decent response.

Gemini correctly calls out that Copilot (as well as Cursor) took in massive revenue before Claude Code took over. I might push back that previous tools were incremental gains, while reliable agentic coding like Claude Code pushed GitHub’s uptime below 90%, but this is nitpicking about _which_ non-chat interface drove value.

It also points out the economic value of ChatGPT itself, though I hardly think that Tumblr users or Ed Zitron agree that its value is tangible rather than speculative, or that LLMs being transformative is an "undisputed fact."

Gemini misses some more important points, though. Claude's response pointed out that Claude Code is primarily a command line interface, exactly the older interface I compared chat to!

The key regression isn't from **graphical** to **text-only**. It's from **direct manipulation** to **un-grounded symbols**. We have a 40-year expectation that interfaces will show us _objects with coherent identities_ which can be _incrementally and reversibly edited_, with changes immediately visible via _continual representation_. Think about a text editor: you press a key on the keyboard, the text file adds or removes a single character, and that single-character change is represented on the screen. Chat is nothing like this.
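To make those three properties concrete, here is a minimal sketch of the text-editor example in Python. The `Buffer` class and its method names are hypothetical, invented for illustration; this is not how any particular editor is implemented. The buffer is a single object with a stable identity, each edit changes a small part of it and records how to reverse itself, and every operation returns the current state so the representation never lags behind the object.

```python
from dataclasses import dataclass, field


@dataclass
class Buffer:
    """A hypothetical text buffer: one object with a stable identity."""
    text: str = ""
    # Each entry records how to reverse one edit: (position, removed, inserted).
    _undo: list = field(default_factory=list, repr=False)

    def insert(self, pos: int, s: str) -> str:
        """Incremental edit: add `s` at `pos`, then show the new state."""
        self.text = self.text[:pos] + s + self.text[pos:]
        self._undo.append((pos, "", s))
        return self.render()

    def delete(self, pos: int, n: int = 1) -> str:
        """Incremental edit: remove `n` characters starting at `pos`."""
        removed = self.text[pos:pos + n]
        self.text = self.text[:pos] + self.text[pos + n:]
        self._undo.append((pos, removed, ""))
        return self.render()

    def undo(self) -> str:
        """Reversible edit: restore the state before the most recent change."""
        if self._undo:
            pos, removed, inserted = self._undo.pop()
            self.text = (
                self.text[:pos] + removed + self.text[pos + len(inserted):]
            )
        return self.render()

    def render(self) -> str:
        """Continual representation: what the screen shows right now."""
        return self.text


buf = Buffer()
buf.insert(0, "chat")  # screen shows: chat
buf.insert(4, "!")     # screen shows: chat!
buf.delete(0)          # screen shows: hat!
buf.undo()             # screen shows: chat!
```

A chat transcript has no equivalent of `buf`: each reply is a fresh block of text, and nothing on screen stands for a persistent object that the next message edits.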

In contrast, using tools like Claude Code as part of a traditional IDE pairs the conversation with a file browser, which _does_ have underlying objects that are continually represented and operated on as files are created and edited, reversibly when using a version control system like Git (if less incrementally).

Gemini accepts the "un-human" framing of chat failures, whereas I would say LLMs make shameless guesses as though they were taking a multiple choice exam. I think the direct manipulation frame better explains the uncanny valley effect: for a human, a citation is a reference to an object. A pure chat interface simply does not have objects to reference. Similarly, Claude's "artifacts" may be objects, viewable directly, but initially they were re-generated from scratch for every change, rather than edited bit by bit. In that case, the elements of direct manipulation are being slowly added as features like in-line editing.

Now, do you still remember what I originally claimed? Which parts are you convinced of? Which parts remain to be resolved?

What's Better Than Chat?

Fundamentally, while chat interfaces can be adapted to serve almost any purpose, they don’t natively support most actual workflows. The failure is pronounced for some tasks, like fact-checking, where detailed provenance and tracking of claims are important. In venues where more structure is needed, there is clear desire for better interfaces that can actively support and scaffold reasoning processes.

Since one of the biggest strengths of current models is the ability to write Next.js web interfaces, it is easier than ever to create these new interfaces–and new LLM-generated interfaces with clear interface design input are regularly better. My submission is the Metaformalism Co-pilot (MFC). You can deploy it on Vercel, adding your own Anthropic or OpenRouter API key, to try it out immediately.

The Metaformalism Co-pilot is one interface, built to be excellent for the specific use case of translating between informal text and deeply reliable formal arguments. I won’t claim that, as it exists, it fully solves this problem or that this is the only worthwhile interface to use. But I do claim that it has many improved features over a raw chat interface.

[Image 1: Screenshot of the MFC landing page]

[Image 2: Screenshot of the MFC graph layout view]

[Image 3: Screenshot of the MFC node view]

[Image 4: Screenshot of MFC’s side-by-side view of “Counterexamples” and “Balanced Perspectives” artifacts]

[Image 5: Screenshot of the MFC “Counterexamples” artifact view with a single field opened for inline editing]

[Image 6: A screenshot of the list of relevant papers retrieved from OpenAlex academic search by the MFC “Counterexamples” artifact]

Since my background is in mathematics, I like generating Lean code side by side with proofs that I can use to get the strongest possible guarantee of correctness for an argument. I no longer work in academic mathematics, and you probably don’t either, so the technical details are much less important than the basic, obvious features. Here are a few things that you might not even realize you’re missing:

1. Having a visual map of the different parts of your project

2. Presenting generated text as an unexamined result of the user's actions rather than as the output of a fictional persona

3. Automatically scoping context for text generation to the current, directly relevant task

4. Structured critiques available as single button presses, rather than copy-pastes of large text blocks; evaluation of generated text as a tooling-supported expectation rather than as an afterthought

5. Making citation links visible and followable
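As an illustration of item 3, here is a minimal sketch in Python of what scoping context to the current task could look like. It assumes the project is stored as a graph of claims; the `Node` type and the `scoped_context` function are hypothetical names invented for this example and do not describe how MFC is actually implemented. The point is only that the text sent to the model is assembled from the selected node and the claims it directly depends on, not from an ever-growing transcript.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One claim in a hypothetical argument graph."""
    id: str
    text: str
    depends_on: list = field(default_factory=list)  # ids of supporting claims


def scoped_context(graph: dict, node_id: str) -> str:
    """Build prompt context from one node and its direct dependencies only."""
    node = graph[node_id]
    parts = [f"[{dep}] {graph[dep].text}" for dep in node.depends_on]
    parts.append(f"[{node.id}] {node.text}")
    return "\n".join(parts)


graph = {
    "a": Node("a", "Chat hides the objects being edited."),
    "b": Node("b", "Direct manipulation shows them."),
    "c": Node("c", "Chat is an interface regression.", depends_on=["a", "b"]),
    "d": Node("d", "An unrelated claim elsewhere in the project."),
}

# Only claims a, b, and c are included; d never reaches the model.
context = scoped_context(graph, "c")
```

Because the graph is explicit data, the same structure could also back the visual map in item 1, though that is a design guess and not a description of the tool.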

Many of these features are present in other interfaces, such as slash commands in Claude Code, as labs recognize the limits of the chat interface alone, but it’s rare that all of them are. While I use Claude Code in the command line for coding projects, I use it inside a standard IDE so that I have direct terminal access, file navigation, and markdown display present alongside chat. And discovering plugins and building out my own skills and workflows involved a significant learning curve in moving from a usable tool to a useful tool. I use the MFC interface regularly for critique of structured arguments, including draft review, because the UI reminds me what I _can do_ and what I _have done_ in ways that other interfaces don’t (yet) even try to accomplish.

Please try it out, and think about how exactly you want your computer to support you in your work. The space of possibilities has expanded massively, and the field of Human-Computer Interaction, which tries to address these problems, is still undervalued in product design. Their work wasn’t cited across all major product releases establishing the chat assistant paradigm, continuing a 40-year pattern of decisions made and justified after the fact, including some early failures to capture the value such research created. Tech companies will chase profits; they won’t fill in the accessibility and reliability needs that actually make your life better as an end user. In the age of LLMs, at least you can start fixing that gap yourself.