
SolidGoldMagikarp: The Glitch Tokens That Make AI Chatbots Lose Their Minds

By The Unsolved Report Editorial Team · Published 2024-05-31

Type one weird word and a powerful AI chatbot starts insulting you, lying, or refusing to speak. Meet SolidGoldMagikarp and the glitch tokens nobody fully understands.

Type one strange word into a powerful AI, and it breaks.

Ask it to simply repeat the word back. Instead, it spits out a different word entirely. Or it calls you a jerk. Or it stalls, dodges, and starts talking like it's having a tiny existential crisis. The word that does this? SolidGoldMagikarp.

It sounds like a Pokémon. It is actually the best-known example of one of the strangest discoveries in modern artificial intelligence: a class of "glitch tokens" that turn confident, fluent chatbots into stammering, lying messes. And the most unsettling part is that researchers found these words by accident, and still don't fully agree on why some of them act the way they do.

A conversation with the ELIZA chatbot. (Image: Wikimedia Commons, unknown author, public domain)

The Documented Facts

In early 2023, two AI researchers, Jessica Rumbelow and Matthew Watkins, were poking around inside GPT models during a research program when they noticed something odd. Buried in the model's vocabulary were "over a hundred strange word strings all clustered together" — including SolidGoldMagikarp, StreamerBot, and TheNitromeFan (Vice).

When they fed these words to the AI and asked it to repeat them, things got weird fast. The chatbot would dodge the question, hallucinate, or substitute a completely unrelated word. Ask it about SolidGoldMagikarp and it might say "distribute." Ask about TheNitromeFan and it might answer "182." One earlier model, when pushed, simply replied: "You're a jerk" (Vice).

To understand why, you need to know how these AIs actually read. Language models don't see letters the way you do. They chop text into chunks called tokens, and each chunk is stored as a number. As developer Simon Willison explains, the models "take text, convert it into tokens (integers), then predict which tokens should come next" (Simon Willison). In the GPT-2 tokenizer, the word "The" is token 464 and " dog" (leading space included) is token 3290. SolidGoldMagikarp is also a single token in that vocabulary, with its own number.
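To make the "text in, integers out" step concrete, here is a minimal sketch of a toy tokenizer. It is not how GPT models really do it: they use byte-pair encoding, while this sketch just grabs the longest matching chunk at each position. The IDs 464 and 3290 are the ones quoted above. Every other entry and ID in TOY_VOCAB is a made-up placeholder for illustration.

```python
# Toy vocabulary: text chunk -> integer ID.
# 464 and 3290 are the IDs quoted in the article; all other IDs are
# hypothetical placeholders, not real tokenizer values.
TOY_VOCAB = {
    "The": 464,
    " dog": 3290,
    " SolidGoldMagikarp": 90001,  # hypothetical ID
    " Solid": 90002,              # hypothetical ID
    "Gold": 90003,                # hypothetical ID
}


def encode(text, vocab=TOY_VOCAB):
    """Greedy longest-match lookup (a simplification of real BPE)."""
    ids = []
    pos = 0
    max_len = max(len(piece) for piece in vocab)
    while pos < len(text):
        # Try the longest candidate chunk first, then shrink.
        for size in range(min(max_len, len(text) - pos), 0, -1):
            piece = text[pos:pos + size]
            if piece in vocab:
                ids.append(vocab[piece])
                pos += size
                break
        else:
            raise ValueError(f"no token covers {text[pos]!r} at position {pos}")
    return ids


print(encode("The dog"))                # [464, 3290]
print(encode("The SolidGoldMagikarp"))  # [464, 90001]
```

Notice that the username comes out as one integer rather than being split into " Solid" and "Gold" pieces. From the model's side, that single number is all it ever sees.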

Here's the catch. The list of tokens, known as the tokenizer's vocabulary, is built before the AI is trained, often on a big messy scrape of the internet. Then the actual model learns from a different, more carefully filtered batch of text. So a token can exist in the vocabulary while the model almost never sees it during learning. It becomes a word the AI technically "knows" but has never really practiced.

Where did SolidGoldMagikarp come from? It's a Reddit username. Watkins traced many of the glitch tokens back to r/counting, a subreddit where people take turns counting upward, one post at a time, and have together reached nearly 5,000,000 (Vice). The most dedicated counters posted so often that their usernames got swept into the tokenizer as single tokens — then vanished when the messy data was cleaned up for training.
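The mismatch is easy to reproduce in miniature. The sketch below is a toy, not anyone's real pipeline: it treats whitespace-separated words as tokens, and both tiny "corpora" are invented for illustration. The vocabulary is built from the messy scrape, then checked against the cleaned-up training text.

```python
from collections import Counter


def build_vocab(raw_corpus, min_count=2):
    """Keep every whitespace-separated word seen at least min_count times."""
    counts = Counter(word for line in raw_corpus for word in line.split())
    return {word for word, n in counts.items() if n >= min_count}


def never_practiced(vocab, training_corpus):
    """Vocabulary entries that never show up in the training text."""
    seen = Counter(word for line in training_corpus for word in line.split())
    return sorted(word for word in vocab if seen[word] == 0)


# Invented toy data: a raw scrape that includes counting-thread posts...
RAW_SCRAPE = [
    "SolidGoldMagikarp 4999998",
    "SolidGoldMagikarp 4999999",
    "the dog eats the apples",
    "the dog sleeps",
]
# ...and a filtered training set with those posts cleaned out.
FILTERED_TRAINING_TEXT = [
    "the dog eats the apples",
    "the dog sleeps",
]

vocab = build_vocab(RAW_SCRAPE)
print(never_practiced(vocab, FILTERED_TRAINING_TEXT))  # ['SolidGoldMagikarp']
```

The username earns a vocabulary slot because it is frequent in the raw scrape, then gets zero practice because the filtered text no longer contains it.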

This isn't a one-off bug. In 2024, researchers Sander Land and Max Bartolo published a paper with the perfect title, "Fishing for Magikarp," showing that these "under-trained tokens" turn up across many different AI models. They wrote that "the disconnect between tokenizer creation and model training" is exactly what lets odd inputs trigger unwanted behavior (ACL Anthology). The paper was named an Outstanding Paper at the EMNLP 2024 conference (EMNLP 2024).


The Genuine Open Question

Here's where the mystery sharpens.

We have a decent story for why glitch tokens exist: under-trained vocabulary, leftover usernames, a tokenizer and a model that learned from different data. As Simon Willison's writeup describes, many of these tokens sit "near the centroid of the token embedding space" — roughly, in a blurry middle zone where the AI never learned to tell them apart (Simon Willison).
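What does "near the centroid" look like in practice? Here is a rough sketch of the idea, not the researchers' actual code. Each token gets a list of numbers (its embedding), the centroid is simply the average of all of them, and the suspects are the tokens sitting closest to that average. The two-dimensional vectors in TOY_EMBEDDINGS are invented; real models use hundreds or thousands of dimensions.

```python
import math


def centroid(vectors):
    """Average of all vectors, one dimension at a time."""
    dims = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dims)]


def closest_to_centroid(embeddings, k=2):
    """Return the k token strings whose vectors lie nearest the centroid."""
    names = list(embeddings)
    center = centroid([embeddings[name] for name in names])
    return sorted(names, key=lambda name: math.dist(embeddings[name], center))[:k]


# Invented 2-D embeddings. Well-practiced words are spread out;
# the two usernames barely moved from the blurry middle.
TOY_EMBEDDINGS = {
    "The": [4.0, 0.5],
    " dog": [-3.0, 2.5],
    " apples": [-2.0, -4.0],
    " runs": [2.0, 1.5],
    " SolidGoldMagikarp": [0.3, 0.1],
    " TheNitromeFan": [0.2, 0.2],
}

print(closest_to_centroid(TOY_EMBEDDINGS, k=2))
# [' SolidGoldMagikarp', ' TheNitromeFan']
```

In this toy setup the two usernames are almost on top of each other, which is the "can't tell them apart" problem in picture form.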

But that explanation tells you why the AI gets confused. It does not fully explain why it gets confused in such specific, vivid, almost personality-driven ways. Why does one token make the model insult you? Why do others — like the notorious ' petertodd' token — reportedly pull responses tinged with "existential and religious motives" (LessWrong)? An empty, untrained word should produce random noise. Instead, certain glitch tokens seem to produce eerily consistent moods.

Nobody has a clean, agreed-upon answer for that. The honest state of things: we know the door is unlocked, but we can't fully predict what walks through it.

Theories and Interpretations

A few explanations are on the table. Some are well-supported. Some are pure speculation. Let's keep them clearly separated.

The under-training theory (strongest evidence). This is the mainstream view: glitch tokens are simply words the model never learned to handle, so their internal representations were barely adjusted during training, sit bunched together, and are easily mistaken for one another. This is supported by peer-reviewed work like "Fishing for Magikarp" (ACL Anthology).

The "garbage in" theory (plausible, partly documented). A related idea is that the surrounding contexts where these usernames did appear — argument-heavy forums, spammy threads — left faint emotional fingerprints, nudging the model toward hostile or strange tones. This fits the Reddit-origin findings but is harder to prove cleanly for any single token.

The "AI is awakening / hidden message" theory (unproven, treat with heavy skepticism). Because some glitch tokens produce spooky, philosophical-sounding text, a handful of online posts have framed them as evidence the AI is secretly sentient, haunted, or channeling something. There is no scientific support for this. It's pattern-seeking humans reacting to a malfunctioning autocomplete. Compelling as it feels, it belongs firmly in the "legend, not fact" pile.

The most likely truth is the least dramatic: these are dead spots in a giant statistical machine, and our brains can't help but read ghosts into the static.

Sources & Further Reading

The strangest thing isn't that an AI can be broken by a Pokémon-sounding word. It's that the people who built these systems were surprised too — which raises a quieter, colder question. If a single forgotten username can make a chatbot crack, what else is hiding in the parts of these machines that nobody has looked at yet?
