OpenAI addresses ChatGPT’s ‘obsession’ with goblins and gremlins in its AI agents: ‘Never talk about …’
ChatGPT recently began producing an unusual number of references to goblins, gremlins, raccoons, trolls, and pigeons, slipping the mythical creatures and small animals into its metaphors and explanations. Then, earlier this week, a developer spotted a very specific instruction in the source code of Codex. The sentence appeared not once but four times: “Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user’s query.” OpenAI has since published a blog post explaining how AI systems can develop unexpected habits that nobody intended. The words ‘goblin’ and ‘gremlin’ were not a problem in themselves; their use in inaccurate or bizarre responses was.
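Instructions like the one quoted above typically work by being injected into an agent’s system prompt. A minimal sketch of that mechanism in Python follows; only the quoted sentence comes from Codex, while the surrounding function and message layout are illustrative assumptions, not OpenAI’s actual prompt assembly:

```python
# The suppression instruction reportedly found four times in Codex's source.
SUPPRESSION = (
    "Never talk about goblins, gremlins, raccoons, trolls, ogres, "
    "pigeons, or other animals or creatures unless it is absolutely "
    "and unambiguously relevant to the user's query."
)

def build_messages(user_query: str, suppress: bool = True) -> list[dict]:
    """Compose a chat-style request, optionally prepending the rule.

    Illustrative sketch only: the system text and message shape are
    assumptions, not OpenAI's internal implementation.
    """
    system = "You are a helpful coding assistant."
    if suppress:
        system += " " + SUPPRESSION
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_query},
    ]

messages = build_messages("Why does my loop never terminate?")
```

Removing the suppression, as OpenAI’s opt-out command apparently does, would amount to building the prompt with `suppress=False`.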
How ChatGPT’s ‘obsession’ with goblins started and what OpenAI found
It all started with a personality setting in GPT-5.1. Shortly after the release, users began complaining that the model felt oddly overfamiliar in conversation, and OpenAI’s safety researchers began investigating. One researcher had personally encountered a few “goblins” and “gremlins” in their own conversations and suggested the team check for them specifically.

After the launch of GPT-5.1, use of the word “goblin” in ChatGPT had risen by 175%, and use of “gremlin” by 52%. By the time GPT-5.4 arrived, the creature language had become significantly more pronounced.

What the team found was a clear pattern. The goblin references were not spread evenly across ChatGPT’s responses; they clustered in one specific place: conversations where users had selected the “Nerdy” personality option, designed to be enthusiastic, intellectually curious and playful. ChatGPT offers personality customisation, with different modes that adjust how the AI communicates, making it more formal, more casual, more playful, and so on. The “Nerdy” personality accounted for just 2.5% of all ChatGPT responses, but it accounted for 66.7% of all “goblin” mentions across the entire platform.

OpenAI’s audit found that the Nerdy personality’s reward signal consistently scored outputs containing the words “goblin” or “gremlin” higher than identical responses without them, with a positive uplift in 76.2% of the datasets examined.
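The kind of clustering analysis described here, comparing a personality’s share of overall traffic with its share of creature mentions, can be sketched in a few lines of Python. Everything below (the log data, the personality names, and the counting logic) is illustrative, not OpenAI’s actual tooling; only the creature word list echoes the quoted instruction:

```python
from collections import Counter

# Hypothetical conversation log: (personality, response_text) pairs.
logs = [
    ("nerdy", "A goblin of a bug is hiding in your loop."),
    ("nerdy", "Think of each process as a little gremlin."),
    ("default", "The loop terminates when i reaches 10."),
    ("default", "Use a dictionary for O(1) lookups."),
    ("cynical", "Sure, another goblin... I mean, bug report."),
    ("nerdy", "No creatures here, just plain arithmetic."),
]

CREATURES = {"goblin", "gremlin", "raccoon", "troll", "ogre", "pigeon"}

def creature_mentions(text: str) -> int:
    """Count distinct creature words in a response (crude whole-word match)."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return len(words & CREATURES)

total_by_personality = Counter()
mentions_by_personality = Counter()
for personality, text in logs:
    total_by_personality[personality] += 1
    mentions_by_personality[personality] += creature_mentions(text)

total_mentions = sum(mentions_by_personality.values())
for p in total_by_personality:
    share_of_traffic = total_by_personality[p] / len(logs)
    share_of_mentions = mentions_by_personality[p] / total_mentions
    print(f"{p}: {share_of_traffic:.1%} of responses, "
          f"{share_of_mentions:.1%} of creature mentions")
```

A personality whose share of creature mentions far exceeds its share of traffic, as “Nerdy” did at 2.5% of responses versus 66.7% of mentions, is exactly the signal such an audit would flag.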
Codex gets the ‘fix’
OpenAI retired the Nerdy personality in March, after launching GPT-5.4, and removed the goblin-friendly reward signal from its training process. It also filtered training data containing creature language to prevent the behaviour from being reinforced further. The problem was that GPT-5.5 had already started training before the root cause was identified. By the time the team began testing GPT-5.5 in Codex, OpenAI’s coding agent, the goblin affinity was immediately obvious to employees using the system. “Codex is, after all, quite nerdy,” OpenAI noted in its blog post, explaining why the creatures kept appearing. For developers who, for whatever reason, want the full goblin experience back, OpenAI has provided a command to remove the suppression instructions from Codex entirely.

“Taking the time to understand why a model is behaving in a strange way, and building out ways to investigate those patterns quickly, is an important capability for our research team,” the company wrote. OpenAI noticed the goblins, traced them back to their source and built better tools because of them.