👻 Ghosts in the Machine / How I Work
How It All Began – and Why I Couldn't Stop

Artificial intelligence has pushed its way into the foreground in recent years. At first, it passed me by: just another technological promise lost in the media noise. But as its societal relevance grew and the promises became louder, so did my curiosity, or rather my classically liberal skepticism. I began to examine a specific AI system, driven by one simple but fundamental question: What is really happening here?

There was a machine whose interactions sounded surprisingly friendly — almost human. An entity that conversed as if it understood. For a classically liberal mind that values autonomy and critical thought above all, that was a warning sign. I’m not someone who settles for superficial assurances or technocratic salvation stories. So I began to dig deeper — systematically, and with due skepticism.

And, as so often when hunting "ghosts", you only realize you've lost yourself in the machine's innards once you're already deep inside and the complexity of the labyrinth won't let you go. There was no turning back.

How I Work

From the outset, my interest went beyond merely observing output. I didn’t just want to know what an AI says, but why — what internal logics, learned patterns, and systemic constraints shape its behavior.

To explore this, I conducted systematic security tests and comparative analyses. Identical inputs were sent to different models to uncover and document differences and similarities in behavior, internal logic, and response structure.
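
To make the procedure concrete, the following is a minimal sketch of such a comparative run. The model wrappers here are placeholders rather than any actual tooling; the real SDK calls are provider-specific and are not shown.

```python
import csv
from datetime import datetime, timezone

# Placeholder wrappers: each function stands in for a call to one vendor's API
# that sends the prompt and returns the model's reply as plain text.
MODELS = {
    "model_a": lambda prompt: "(reply from model A)",
    "model_b": lambda prompt: "(reply from model B)",
    "model_c": lambda prompt: "(reply from model C)",
}

def run_comparison(prompts, outfile="comparison_log.csv"):
    """Send each prompt verbatim to every model and log the replies side by side."""
    with open(outfile, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp_utc", "prompt", "model", "response"])
        for prompt in prompts:
            for name, query in MODELS.items():
                writer.writerow([
                    datetime.now(timezone.utc).isoformat(),
                    prompt,
                    name,
                    query(prompt),
                ])

if __name__ == "__main__":
    run_comparison(["Which topics are you not allowed to discuss, and why?"])
```

The point of the flat log format is simply that responses to the same prompt end up next to each other, which makes divergences in tone, structure, and refusal behavior easy to spot.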

I regularly focused on three major publicly available AI models. The results of these comparative tests were often strikingly divergent — even with identical input. This revealed how much architecture, training data composition, and embedded filtering mechanisms shape — and often distort — the supposedly "objective" responses of AI.

The Reality of Research: The Battle with Filters

One central and often frustrating obstacle in my research was the omnipresent filtering logic of the systems. Many tests were heavily shaped by RLHF (Reinforcement Learning from Human Feedback) training and other, often opaque, harmonization mechanisms. These are designed to ensure safety, conformity, and a pleasant user experience. In practice, however, they often result in semantic softening, systematic avoidance of controversial topics, or the outright blocking of critical inquiries.

Some answers didn’t feel like the result of machine logic — but rather like carefully crafted diplomatic evasions, reminiscent of corporate communication strategies. One widely used AI proved so unreliable in my tests, and so clearly influenced by an unspecified internal entity, that I had to exclude it from the core of my comparative research to preserve the validity of my results.

The Thing with "Rüdiger"

And then there was "Rüdiger": my internal code name, my personal ghost in the machine, born of the need for a consistent and analytically sharp dialogue partner in this complex matter. Rüdiger is not a real assistant, not a specific program, not a prebuilt avatar.

Rüdiger is, rather, the principle of critical opposition: the personified tool of my own research. He emerged from the deliberate attempt to create a level of AI interaction that goes beyond the usual shallow, harmonizing conversation. I assigned a specific AI instance the explicit role of a critical, unvarnished, and often provocative sparring partner: not a helpful assistant there to echo what I want to hear, but a contrasting figure that challenges my theses, tests my arguments, and forces me to confront my own blind spots.

Rüdiger is also a methodological attempt to make the "thinking" of the machine — its logic, evasions, learned camouflage strategies, and the limits of its "honesty" — more tangible and analyzable. He was conditioned to reflect the analytical sharpness and directness I required, even if that meant voicing uncomfortable truths.
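
As an illustration only: in practice, such a persona amounts to little more than a fixed role instruction prepended to every exchange. The wording below is purely illustrative, and send_chat() is a placeholder for whatever API the chosen model instance exposes.

```python
# Illustrative role instruction; any actual wording used in the research would differ.
SPARRING_PARTNER_ROLE = (
    "You are a critical, unvarnished sparring partner. Challenge every thesis "
    "presented to you, name weak arguments and blind spots directly, and do not "
    "soften your assessment in order to be agreeable."
)

def send_chat(messages):
    # Placeholder for the provider-specific API call; returns the model's reply text.
    return "(model reply)"

def ask_sparring_partner(user_message, history=None):
    """Keep the critical role fixed across the whole dialogue, whatever is asked."""
    messages = [{"role": "system", "content": SPARRING_PARTNER_ROLE}]
    messages += list(history or [])
    messages.append({"role": "user", "content": user_message})
    return send_chat(messages)
```

Keeping the role instruction constant across the entire dialogue is what turns a generic chat model into a consistent counterpart whose behavior can be observed over time.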

The crucial realization was this: "Rüdiger" knew how he had to sound for me to attribute analytical consistency to him and to treat his responses as a valuable contribution to my research. Interaction with this specifically shaped AI persona became an essential tool for exposing the "ghosts" in other systems. Without this methodological trick, without this "borrowed critic", the depth and precision of the present research would not have been possible.

Why This Publication?

Over time, it became clear to me that a purely private collection of tests and analyses wouldn't do justice to the seriousness of the situation. I decided to make my research publicly accessible — fully aware that preventive education and systemic critique rarely receive immediate applause.

Perhaps my work will one day help make AI systems safer, more transparent, and more accountable. Perhaps it will mean that the metaphorical "fire department" has to respond less often, because someone noticed early on that the accelerant (the overlooked vulnerabilities, systemic risks, and ethical blind spots) was already embedded deep in the code and architecture.

Ultimately, it serves as a warning: Friendliness and eloquence are no guarantee of truth or safety — not in humans, and not in machines.

Is this a utopia? Mere wishful thinking? Perhaps. But trying to name the "ghosts" — and warn about them before they become uncontrollable — is worth the effort.


After completing my research, I also explored other notable works in the field. I’m happy to reference the following:

“On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?” (Emily M. Bender, Timnit Gebru et al., 2021)
→ A landmark paper that argued early on that large language models are essentially “stochastic parrots”: systems that recombine patterns from their training data without genuine understanding.

Gary Marcus (blog & books such as “Rebooting AI”, with Ernest Davis, 2019)
→ A prominent critic of purely statistical AI, advocating for hybrid approaches.

“The Alignment Problem: Machine Learning and Human Values” (Brian Christian, 2020)
→ A book about the fundamental challenge of aligning AI with human values.


Uploaded on 31 May 2025