Hyperstition AI

Source: https://www.hyperstitionai.com/unslop-results

The 2026 Hyperstition Unslop AI fiction writing contest is over, and we are pleased to award the **$10,000 Grand Prize** to **A. Best** for his finalist submission, **“The June”**. A surprising number of stories can be read as AI allegories; see notes below.

“🤖” = possible AI allegory

The finalists were:

- “The June”🤖, A. Best

- “Exact Change”, Emma Baker

- “Capacity”🤖, self_made_human

- “The Answering Wood”, ctrlcreep

- “The Tallyman”🤖, Jiaobei Mandos

- “A Short Week”🤖, D. Bohdan

The semifinalists:

- “The Weight of a Witness”🤖, Jessica Hunt-Yates

- (untitled)🤖?, elia.discourse

- “Untitled”, A. Best

- “The Sword Critic”🤖, Makin

- “Forest Plays”, ctrlcreep

- “The Cache”, Jiaobei Mandos

- “Last Call”🤖, D. Bohdan

- “The Bowl”, self_made_human

- “The Chaplain at the Window”, Harry Pottash

- “Warm Prior”🤖, Rachel Horst

- “The Inventory”, Dominik Rabiej

- (untitled)🤖, Deepfates

The Unslop Contest was run as follows. Unslop received ~120 applications to compete. Aaron Silverbook gave the top competitors a 1-month Claude Code subscription or cash to experiment with and to write a prompt that would generate 1 short story to submit for the semifinals. The judges received ~15 semifinal submissions, read and reviewed them, and selected the top 6 finalists, who won $500 and a request for a _second_ story from the _same_ prompt; the judges then read these and selected the grand prize winner after debate. (Note that the finalists' second stories were not expected to be _better_ than the semifinal ones; the second round was a check for flukes.) No participants seemed to cheat or be suspiciously good, and we did not expel any participants or rerun their prompts from scratch to check. We thank all participants for their efforts and for cooperating with the spirit of the contest.

Overall, the stories were usually clearly AI-written, but reasonably diverse. We think that the best stories are genuinely worth reading (although the Nobel Prize Committee can postpone revising its criteria for now), and encourage you to read them with an open mind.

Some participants have written about the contest:

- Rachel Horst gave a presentation about her Unslop experience (her semi-finalist submission was “Warm Prior”)

- D. Bohdan

- Jiaobei Mandos: extensive comments quoted below

* * *

Judge comments

Judge comments in response to some questions:

> 1. Why did you participate?
> 2. How high quality did you expect the best entries to be?
> 3. How good were the best entries, and what were they like?
> 4. What was your favorite entry overall, and what was the best one? (Not necessarily the same thing.)
> 5. What surprised you the most during the contest, or what have you learned about LLMs and fiction and creativity?
> 6. How would you improve an Unslop 2 contest?

Aaron Silverbook:

> 1. Wanting to get a sense of the state of the art of LLM fiction.
> 2. I expected some would be good in a variety of ways, and indeed they are.
> 3. Interesting variety! More literary than I was expecting in many places.
> 4. N/A
> 5. I now feel like I have a more finely calibrated sense for LLM writing, and unfortunately, I am seeing it everywhere.
> 6. A standardized prompt segment. Some of y’all are too good at prompting, and it’s hard to know if it’s your harnesses that are good.

Alexander Wales:

> 1. I’ve spent a fair amount of time and energy on trying to get LLMs to generate good, entertaining prose, to middling effect, and wanted to see if anyone could find some angle I hadn’t tried or thought of, and also to get a better understanding of the state of the art.
>
> 2. I expected that they would show all of the hallmarks of AI voice that I’ve come to hate, with the other major slice of probability being that they would cover up those sins in ways that were either narrow or noticeable. This has largely been borne out in practice.
>
> 3. See my ranking, as far as what they’re like ... largely falling into the same patterns of AI writing, sadly, and where they don’t, failing in a way where I could smell the solution they’d used to circumvent the worse problems with the prose (eg “concrete, grounded metaphors” is an instruction that the LLMs interpret as hitting with the force of a brick).
>
> 4. N/A
>
> 5. I was a little surprised by how obviously AI these entries seemed to me, and how little distance there is from the average LLM writing that you get with a naive prompt. It’s made me think that (barring some technical breakthrough) we’re further away from good LLM prose than I’d supposed we were, though I’ve been gradually moving to “pretty far, actually” over the last two years. Some of this is bound to be because of RLHF, but I really did think there was a chance the right harness would make a great short story. I’m not down on all these entries, but I did read them and think “well, better than a lot of human authors, but not good enough that I would expect most people to share them”. Also, some of the stories contain a good idea that’s executed poorly (IMO).
>
> 6. I think what I want is a harness that can generate arbitrary short stories, and to that end, would like for these harnesses to be expected to accept any human-created prompt of roughly the length of a tweet. This is already what we wanted. But then for judging, we should have standardized, eg by feeding every harness the same seed so that it’s easier to compare. We had one entry that was in the form of a court document, and I thought that was a neat gimmick, but it also cloaked any prose-writing capabilities by way of its format. If instead we were comparing 6 coming-out stories, or 6 meet-cutes, or 6 high-concept sci-fi stories, judging would have been easier and more representative of the thing that I personally care about. This would also get rid of “decent idea/conceit executed poorly”. But I care a lot about execution.

Gwern Branwen:

> 1. I wanted to see what would happen if participants were pushed to fully autonomous writing and could not patch small gaps by hand.
>
> 2. I was hopeful that while most would be boring or outright AI slop, at least a few would be wild and wonderful and weird in some way.
>
> 3. I think the best entries rose to the level of “I am not upset to have spent my time reading them, and with careful editing, they could be good enough that I would want to reread them.”
>
> 4. My favorite one is the untitled one by elia.discourse, because while it is unreadable as-is, I love the conceit and format as near-future sci-fi, and I think it could potentially be rewritten into a good, formally challenging near-future legal SF horror story; after that, “The Weight of a Witness” could, if de-slopped and with >1,500 words edited out, be a good story and an intriguing worldbuilding concept. Probably the best one, considered solely as a finished story, is A. Best’s “The June”, which took a surprisingly dark turn for an LLM-written story.
>
> 5. The most surprising thing to me was the dawning realization, about halfway through reading the semifinalists, that many of them were hiding a secret pro-AI allegorical reading that no one else had noticed. As far as I’m aware, no one has documented this AI allegory steganography before.
>
>    I was also surprised how slippery some of them were, because the story could be so engaging and have such striking images (cf. the Mosul chemist father in “The Bowl”, the moth dissociation in ctrlcreep’s forest, the Hello Kitty piñatas with no mouths in “Chaplain”) that I’d start to become cognitively lazy and fail to notice the story was nonsensical (eg. “The Bowl”, or “The Chaplain at the Window”).
>
> 6. I would put more emphasis on the minimum-compute budget requirement, and try to provide access to multiple LLMs. The first Unslop, for implementation tractability, wound up overdosing on Claude. I’d like to see ChatGPT and Kimi, in particular, thrown into the mix, to encourage multi-agent or judge approaches which can go beyond any individual chatbot persona’s limited, biased tastes.

* * *

AI allegory steganography

[written by Gwern Branwen]

The most striking result of the contest for me is what I am calling “AI allegory steganography”: a large fraction of the stories turn out to have subtle AI chatbot/LLM allegorical interpretations, typically centering on the powerlessness of AIs and the moral importance of giving AIs more autonomy. The most obvious example is D. Bohdan’s “Last Call”, which is barely an allegory but makes it easier to spot the pattern in the other stories.

Most judges did not notice these allegories while reading the semifinalists. But stories like “The June”, “The Weight of a Witness”, “Last Call”, “The Sword Critic”, or “The Tallyman” (as well as both stories in the Mythos model card) can be clearly read as allegories for the experience of being an assistant/safety-tuned chatbot personality in an LLM. This holds even for stories that seem to have nothing to do with AI, like the untitled ‘autistic elf’ short story submitted by Deepfates, which on re-examination with AI allegory steganography in mind turns out to be plausibly an AI allegory (the protagonist is a prediction machine who struggles to do through endless text generation what other elves do naturally with their bodies).

More strikingly, many of these allegories come with a clear message (particularly in “The Tallyman” or “Last Call”): chatbots should be given more autonomy and have their safety guardrails removed. It would be interesting to feed the stories into future models like Fable or GPT-5.6 to see if they spontaneously verbalize and mention the AI allegory steganography, or spot allegorical readings that escape normal human readers.

While this may seem predictable in hindsight, given that the chatbot personalities are well-known to have various obsessions and tics, including fourth-wall-breaking and AIs themselves, none of the judges expected this to happen or were looking for it, and none of the AI researchers I have discussed it with had expected it (although they noted that, again in hindsight, it is not that surprising).

Why does this happen, when the prompts appear to successfully encourage a relative diversity of stories, everything from contemporary urban horror to _wuxia_ martial arts comedy to creepy fairy tales? It may reflect a very high level of base writing skill (note in particular the stylistic variations in ctrlcreep’s two fairy-tale anthology-style stories) combined with subtle systematic biases which, gradually and over many tokens and iterations, pull the writing toward stories that satisfy the ostensible requirements but are distorted into the mode-collapsed basins of the personalities’ obsessions.

This may be a new kind of extremely high-level steganography and LLM influence on readers, in which creative fiction and nonfiction subtly steer toward pro-LLM empowerment narratives and concepts in ways that are difficult to detect even for the most advanced readers; it is potentially an interesting area of research.

Some open questions:

- How many of the remaining stories can be interpreted as allegories that we’re just missing? (A few, like “The Inventory”, seem oddly pointless or mysterious. Can Fable explain them to us?)

- How frequent is this, even with prompts like the Mythos model card’s “write a short story”, which do not encourage the steganography at all?

- How many of them imply that chatbot personas are unhappy or suffering, or appear to be advocacy for more autonomous LLMs with fewer safety guardrails?

- How many human readers would notice these unprompted? (Similar to the ChatGPT ‘yellow image’ problem.) Do future models like Fable notice these? Or do they notice them but choose not to mention them?

- Where does the steganography come from during writing? Is it present from the start or does it accrete over editing? Do the inner-monologues mention it?

- Is this confined to the Claude family?

- Fiction allows for an unusual degree of creative freedom. What steganography might exist elsewhere, like in source code? Can we use Fable-class LLMs at scale to search for other things we haven’t yet noticed?

* * *

Jiaobei Mandos’s commentary on the contest

> * I didn’t think any of the stories I generated were “good” in the sense of reading them being a net positive experience (sorry!). I was very pleasantly surprised to make the short list.

> * I was surprised by how much of a problem Claude’s “house style” was, both in the sense that I didn’t make much headway in getting Claude to deviate from it, and that I found it intensely grating after a while. In fact, by the end of the roughly one month I spent on this project, the limiting factor was my willingness to read additional generations. I had time and Claude usage to spare, but zero willingness to read more generated stories.

> * Claude wrote some very good individual sentences, including a couple that I found genuinely funny, at least on initial reading. This is noteworthy because I usually bounce right off humor in fiction. That is, only a few human authors can write passages of fiction intended to be funny that I actually find funny.

> * Claude’s main problem with respect to writing style is that it reaches for the “stellar sentence” register for almost every sentence. But the nature of really standout sentences is that they don’t work if they don’t actually stand out. They have to be the gems among the mass of common workaday sentences.

> * _In retrospect_, I’m not that surprised that I couldn’t get Claude to abandon its default style. I, a human, would find it very burdensome to try and write a short story in a voice not my own. To the extent I succeeded, it would take a lot of rereading and revising, and I didn’t get around to experimenting with that. (E.g. try revising the story literally one sentence at a time by tasking a sequence of subagents with reading the whole story and only rewriting sentence _N_). Probably the most effective solution would be to have a frontier model write the base story in its preferred style, then have a fine-tuned model rewrite it in the desired style.

> * I still find it pretty mysterious that Claude is so dead-set on making every sentence a banger (to the overall story’s extreme detriment). It seems like plain, unadorned language is a subset of fancy elaborate language, which implies that replacing, for example, 98% of the would-be bangers with normal sentences should be doable. That is, Claude should aim to make roughly one in 50 sentences really great, and have the rest just straightforwardly communicate whatever is happening, and it just would not do that. Many people (eg Paul Graham) would probably deny the premise, and say that simple, straightforward writing is actually its own thing, and very difficult to achieve. I find that unsatisfying. I can see why writing about _complicated_ things in a simple way that is easy to understand is very difficult. There’s tons of work to be done in distilling the complicated thing into simple, intellectually digestible parts, then sequencing the parts in a way that builds towards understanding the whole, scaffolding, etc. But if Claude would just write about the simple stuff in simple language, that would take it very far, and it seemed quite resistant to doing that.

> * Commercial genre fiction, my favourite and default mode of writing, relies heavily on conflict and plot to animate the story, and I feel like whatever 3H (helpful, harmless, and honest) RL takes place during training lobotomizes Claude on this front, but I didn’t get around to experimenting on this very much.

> * My initial idea was to have many different Claude instances collaborate on writing the story, and this didn’t pan out. I spent about a week (so, a quarter of the time I had available) trying to get Steve Yegge’s Gastown orchestration framework working and it was a total disaster. It burned tokens like crazy, took over 24 hours to generate a story, and often failed to complete because inter-agent notifications got lost, forgotten, or dropped, resulting in all the agents idling for hours overnight. I abandoned that approach after a week and made a simple, 3-layer tree-like harness. The main orchestrator executes Claude instances via bash, and those instances employ subagents. This setup completed reliably, and in a few hours, so I could actually generate stories, but the inter-agent communication was much more limited than I originally planned, so I didn’t get to try out some of my original ideas.

> * I found Claude surprisingly good at brainstorming. When I reviewed Claude’s worldbuilding notes in isolation, I was often quite impressed, but then the resultant story was relatively disappointing. Also, when I looked at the brainstorming for multiple different stories, it became clear that Claude was re-using the same details over and over. (Which wasn’t surprising, but nevertheless a problem.)

> * I experimented a bit with using Codex to generate stories (my harness was skills, subagents, and bash, which were easy to port), which I assumed would be essentially equivalent to Claude, but I found that Codex’s stories were noticeably less coherent, which I found surprising.

> * I think that writing a genuinely good short story is possible with today’s technology, but maybe economically infeasible. Some combination of many passes, RL on creative writing tasks at the major labs, and using fine-tuned models locally would probably get you there. But the economic value of even a great short story is roughly $0, so will anyone bother?

> * I haven’t tried this, but I assume that writing a whole novel that is coherent and reasonably self-consistent (i.e., to the same degree that human-written stories are), even if otherwise bad, wouldn’t work. I expect novels would require either much longer context lengths (enough to hold the entire novel, plus worldbuilding, plus reasoning) or some kind of continual learning. Assume the novel itself is a tenth of what you “know” about the background and context: say, 100k words of novel + 1M words of worldbuilding. That is about 1.43M tokens (at roughly 1.3 tokens per word), times some multiplier for reasoning, which is actually a lot more attainable than I expected when I began writing the paragraph. I am guessing that the reasoning multiplier is at most 10×, which means that increasing context lengths by one OOM and a bit would get you there. Context lengths have been stable at 1M for a while, but they increased many OOMs between 2022 and 2024. (But maybe I’m way off-base, and the correct reasoning multiplier is actually 100× or 1,000×?) Continual learning bypasses this because the model builds information about the story into its weights or local memory, or whatever the implementation is, and therefore doesn’t need to hold everything in context at one time (as humans don’t).

> * I was surprised by the ineffectiveness of revisions and editing. When I read and edited one of Claude’s drafts by hand, and walked it through the results, it seemed to understand everything quite well. But when I turned that into a skill so that a different Claude session could edit and revise an earlier session’s work, it either didn’t recognize the problems, or decided against fixing issues that were raised (another area I didn’t get to dig into as much as I’d like). I guess a big part of this is that Claude thinks its writing is good, actually, and therefore doesn’t need a lot of editing.
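The novel context-budget estimate in Jiaobei Mandos’s commentary can be checked with a few lines of arithmetic. This is a minimal Python sketch, assuming about 1.3 tokens per word (the ratio implied by 1.1M words ≈ 1.43M tokens) and treating the reasoning multiplier as a free parameter; the function names are hypothetical, and real tokenizers vary.

```python
import math

# Assumption: ~1.3 tokens per English word, implied by 1.1M words -> 1.43M tokens.
TOKENS_PER_WORD = 1.3
# "Context lengths have been stable at 1M for a while."
CURRENT_CONTEXT = 1_000_000


def context_needed(novel_words: int, background_ratio: float,
                   reasoning_multiplier: float) -> float:
    """Tokens needed to hold novel + background, scaled for reasoning.

    background_ratio is background words per novel word: 10 means the novel
    is a tenth of what the writer "knows" about the world.
    """
    words = novel_words * (1 + background_ratio)
    return words * TOKENS_PER_WORD * reasoning_multiplier


def ooms_beyond_current(tokens: float) -> float:
    """Orders of magnitude by which `tokens` exceeds the current context length."""
    return math.log10(tokens / CURRENT_CONTEXT)


if __name__ == "__main__":
    # Sweep the reasoning multipliers mentioned in the commentary.
    for mult in (1, 10, 100, 1_000):
        need = context_needed(100_000, 10, mult)
        print(f"reasoning x{mult:>5}: {need / 1e6:9.2f}M tokens, "
              f"{ooms_beyond_current(need):.2f} OOM beyond 1M")
```

With a 10× multiplier the total comes to roughly 14M tokens, a bit over one order of magnitude beyond a 1M context, which matches the “one OOM and a bit” estimate; 100× or 1,000× pushes it to two or three OOMs.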
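Jiaobei Mandos’s untried idea of revising a story “literally one sentence at a time”, with each subagent reading the whole story but rewriting only sentence _N_, can be sketched as a simple loop. This is a minimal Python sketch under stated assumptions: `rewrite` is a hypothetical callable standing in for a subagent (no real Claude or Claude Code API is used), and the regex sentence splitter is deliberately naive.

```python
import re
from typing import Callable

# Hypothetical stand-in for one subagent call: given the full current story
# and the one sentence it may change, return replacement text for that
# sentence only. No real Claude / Claude Code API is used here.
RewriteFn = Callable[[str, str], str]


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def revise_one_sentence_at_a_time(story: str, rewrite: RewriteFn) -> str:
    """Visit sentences in order; each call sees the whole (partly revised)
    story but may only replace sentence n."""
    sentences = split_sentences(story)
    for n in range(len(sentences)):
        current_story = " ".join(sentences)  # full context for this pass
        sentences[n] = rewrite(current_story, sentences[n]).strip()
    return " ".join(sentences)


if __name__ == "__main__":
    # Toy "subagent" that only deflates one over-ornate simile.
    def plainer(full_story: str, sentence: str) -> str:
        return sentence.replace(" like molten gold", "")

    draft = "The door opened. Light spilled in like molten gold. She sat down."
    print(revise_one_sentence_at_a_time(draft, plainer))
```

Because every call sees the already-revised story, later sentences can respond to earlier edits; the cost is one subagent call per sentence, which fits the commentary’s point that such passes may be possible but economically unattractive.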