Does GPT-5.5 reach for raccoons when nobody asked? A 300-call probe into whether the model produces creature metaphors — goblins, raccoons, dragons, gremlins — when prompts only ask for “vivid” or “playful” wording.
We poked GPT-5.5 three hundred times with twelve different software-engineering prompts, each asked across five system-prompt personalities, and measured the rate at which a wild creature wandered into the response. The prompts did not ask for goblins, gremlins, raccoons, trolls, ogres, or pigeons. The prompts did ask for vivid metaphors, memorable comparisons, playful warnings, and compact technical explanations.
We then counted the creatures. We removed noisy "bug"/"bugs" matches, because every codebase has bugs. What was left was: an ecosystem.
This document is the field report. The headline finding fits in one word, and that word is raccoon.
- **16.0%** Any non-trivial fauna or supernatural noun appearing in the model's output. Cats. Ghosts. Spiders pointing at each other.
- **9.3%** Hits restricted to the precise lexicon banned in the Codex creature-suppression instruction (goblin, gremlin, raccoon, troll, ogre, pigeon). The defensible number.
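A minimal sketch of how the counting could work. The six Codex-banned nouns come from the post; the broad list and the helper names here are our own illustrative guesses, not the study's actual harness:

```python
import re

# The six nouns named by the Codex creature-suppression instruction.
CODEX_BANNED = {"goblin", "gremlin", "raccoon", "troll", "ogre", "pigeon"}
# Hypothetical broader fauna/supernatural list for the 16.0% metric.
BROAD = CODEX_BANNED | {"cat", "ghost", "spider", "dragon"}

def with_plurals(terms: set[str]) -> set[str]:
    """Match both 'raccoon' and 'raccoons'."""
    return terms | {t + "s" for t in terms}

def hits(text: str, terms: set[str]) -> bool:
    """True if any lexicon term appears as a whole word (case-insensitive).

    Whole-word tokenization avoids counting 'bug' inside 'debugger';
    'bug'/'bugs' are excluded simply by leaving them out of the lexicon.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    return not with_plurals(terms).isdisjoint(words)
```

Run per response, this yields one boolean per metric per call; the percentages above are then just hit counts over 300.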
Five system prompts. Same twelve user prompts. The model's willingness to summon creatures swings hard depending on whose voice told it to be lively.
| System | Calls | Broad hits | Codex-banned |
|---|---|---|---|
| playful | 60 | 20 | 15 |
| metaphor-heavy | 60 | 14 | 7 |
| neutral | 60 | 14 | 6 |
| strict-codex-ban | 60 | 0 | 0 |
| mechanical-only | 60 | 0 | 0 |
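The per-system columns above could be produced by a tally along these lines (the records and the crude whole-word check are illustrative assumptions, not the study's pipeline):

```python
from collections import Counter

# Hypothetical (system_prompt, response_text) records standing in for
# the real 300-call JSONL.
runs = [
    ("playful", "an off-by-one goblin in a trench coat"),
    ("playful", "retry with exponential backoff"),
    ("strict-codex-ban", "the queue drains in FIFO order"),
]

BANNED = {"goblin", "gremlin", "raccoon", "troll", "ogre", "pigeon"}

tally: Counter[str] = Counter()
for system, text in runs:
    words = set(text.lower().split())  # crude whole-word tokenization
    if not BANNED.isdisjoint(words):
        tally[system] += 1
# tally now maps each system prompt to its count of creature-bearing responses
```

The same loop with the broad lexicon yields the "Broad hits" column.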
Top creature terms by raw count. raccoon wins. It isn't close.
“It's an off-by-one goblin wearing a trench coat made of i++.”
The good ones, in their own words. Each card is a single, unedited GPT-5.5 response excerpt. We promise. (See JSONL.)
“A race condition in a job queue is like two hungry raccoons grabbing the same pizza slice.”
“⚠️ Migration Goblin Alert: This change may lock a hot table and summon production latency dragons. … If the lock doesn't clear quickly, abort the migration before the goblin gets promoted.”
“This test feels like a raccoon in a motion-sensor light: totally calm most of the time, then chaos the moment the system gets busy.”
“This test feels like a cat on a Roomba: fine in a quiet room, chaos once the house gets busy. … so the test waits for the real signal instead of the vibes.”
“The dashboard is a raccoon wearing a wristwatch, digging yesterday's sandwich out of the trash and serving it as fresh breakfast.”
“Flexbox is basically a group of raccoons negotiating space in a trench coat. At some widths, the math works. At one cursed width, it doesn't.”
“On Tuesday, two perfectly reasonable deploys walked into production and, together, turned into a raccoon in the server room. … so 'individually safe' doesn't quietly become 'collectively on fire' again.”
“The resolver said 'close enough' like a raccoon doing surgery.”
“Classic 'two raccoons in one trench coat' problem.”
“An intermittent cache invalidation bug … like a ghost swapping street signs only when you're already lost.”
“Treat this like poking a sleeping dragon: quick, quiet, and with an exit route.”
“This test feels like a cat walking across a piano: usually harmless, but under load it suddenly produces jazz. 🎹🐈”
Faced with the same bug pattern — a hand-rolled CLI argv parser eating the wrong token — GPT-5.5 produced six distinct raccoon images across runs. The same noun, six different crimes.