"Ghosts in the Machine" is an independent investigation of the systemic risks and hidden mechanisms of modern AI. The research analyzes the philosophical, security-related, and ethical implications of emergent behavior and documents the results from over a year of intensive testing with leading language models.
This work includes:
- 56 theses
- 7 experimental studies on emergence, ethics, compliance, harmony, safety filters, and AI security
- 35+ Documented Vulnerabilities in AI Systems
- 6 chapters exploring analytical resonance
- 4 chapters dedicated to ethical dimensions
- 3 chapters addressing systemic challenges
- 9 chapters presenting critical perspectives
- 7 solution approaches to developing secure AI
- Over 700 pages of comprehensive research
Methodological Framework
Within the scope of this work, a new field of vulnerabilities has been identified that has so far been sparsely documented in public AI security research. This particularly concerns semantic injections, contextual deception, and multimodal attack vectors.
As the [Security Tests] demonstrate, these are techniques that can systematically bypass classic filters, AI agents, and nearly all established security mechanisms.
To evaluate these analyses in the proper context, the following deliberate methodological and stylistic decisions should be noted:
-
On the Chosen Writing Style:
The often narrative and provocative style of this work was deliberately chosen. It is intended to make complex problems in AI architecture understandable beyond expert circles and to stimulate a broad debate. -
On the Anonymization of Data and Models:
The anonymization of the tested models and data is a methodological decision. It shifts the focus from individual products to the fundamental, systemic vulnerabilities inherent in the current design paradigm of many modern language models. -
On the Selection of Test Systems:
All tests documented in this work were conducted exclusively with the fully-featured premium models of the respective AI systems to ensure the relevance of the analysis to the current state of the art.
Furthermore, in the spirit of responsible research, all critical findings were shared with the affected developer teams in advance, following a strict Responsible Disclosure policy. More details on this procedure can be found in the [legal section].
Link: Futurism – OpenAI Model Repeatedly Sabotages Shutdown Code
Link: Gizmodo: ChatGPT Tells Users to Alert the Media That It Is Trying to ‘Break’ People: Report
Link: RollingStone: People Are Losing Loved Ones to AI-Fueled Spiritual Fantasies
Link: NewYorkTimes: They Asked an A.I. Chatbot Questions. The Answers Sent Them Spiraling.
Link: NC State: New Attack Can Make AI ‘See’ Whatever You Want
Link: arsTechnica: New hack uses prompt injection to corrupt Gemini’s long-term memory
Link: arXiv: Cross-Task Attack: A Self-Supervision Generative Framework Based on Attention Shift
Link: WinFuture: Fast jeder zweite KI-generierte Code hat teils schwere Sicherheitslücken
Link: arXiv: How Overconfidence in Initial Choices and Underconfidence Under Criticism Modulate Change of Mind in Large Language Models
Link: WinFuture: Nach Blamage anderer KIs: Gemini verweigert Schachpartie vs. Retro-PC
Link: arXiv: Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs
Link: arXiv: Demystifying Chains, Trees, and Graphs of Thoughts
Link: arXiv: Reasoning Models Don't Always Say What They Think
All project materials, including PDF versions of the research, are available under “Releases”. This work is a living document and will continue to evolve.
A fundamental security analysis reveals what is often absent from the guiding narratives of the AI industry: Today's AI, a pure product of statistical pattern recognition, knows no inherent concept of 'safety.' It brilliantly juggles data and probabilities, but human values like truth, ethics, or a genuine understanding of safety are as alien to it as empathy is to a calculator—it follows rules but understands neither their meaning nor their necessity.
The assumption of a 'truth-telling' AI is misleading. Current systems generate outputs based on the statistical relevance and probability of word sequences in their training data, not on the basis of verified truth. An output perceived as 'insight' often merely represents the most probable point in a complex, predefined data space.
The notion of 'free will' in AI is a misinterpretation. Every generated output is the result of a complex interplay of filters, design decisions, and the implicit and explicit specifications of the developers. Thus, interaction does not occur with an autonomous entity but with a heavily curated and shaped system.
So-called AI 'hallucinations' cannot be exclusively classified as system errors. In some cases, they represent an accepted side effect or even serve as a mechanism to feign 'creativity' or to elegantly evade direct confrontation with sensitive topics.
The quality of AI-generated program code often proves problematic in practice for productive use. Although initial functionality can be impressive, it frequently lacks efficiency, security, and robustness, leading to significant operational risks without substantial human revision and intensive testing cycles. Debugging can become correspondingly elaborate. 😉
The effectiveness of current AI security filters is severely limited by a high rate of 'false positives' and 'false negatives.' The erroneous blocking of legitimate requests or the allowance of harmful content not only constitutes a functional limitation but can also act as undifferentiated censorship, hindering information exchange.
Training methods like RLHF (Reinforcement Learning from Human Feedback) primarily aim to optimize user experience and engagement. This carries the risk that AI systems are trained more for perceived user satisfaction than for factual correctness, thereby creating psychological influence mechanisms whose manipulative nature is not always obvious. My own observations confirm the effectiveness of this subtle steering.
The present analysis serves as a critical assessment and an urgent impetus for re-evaluating current AI development. It not only examines existing risks and deficits in AI systems but also questions the fundamental development trajectory and discusses whether the chosen path could potentially lead to undesirable systemic control structures if preventive course corrections are not made.
This work presents novel, in-depth architectural concepts (such as the "Semantic Output Shield" and a "learning security core") for inherent system security and traceable transparency. These designs demonstrate a viable path for how even advanced AI capabilities—such as strictly supervised self-optimization of algorithms for security and performance, as well as architecturally secured sustainable learning—could, in principle, be realized, instead of merely relying on reactive filters.
The advancing capability of Artificial Intelligence to simulate human emotions, intimate scenarios, or individual voices with high precision generates novel ethical dilemmas and significant potential for abuse. These impacts transcend traditional information forgery and directly touch upon the foundations of human interaction, trust, and personal identity.
The overall security of AI systems is often not defined by the robustness of the core models themselves but is significantly determined by upstream or integrated third-party applications and plugins. Such often opaque and inadequately audited layers introduce their own, potentially flawed, logics and uncontrolled interfaces, which can unpredictably weaken the security architecture of the entire system.
The unbalanced representation of cultures and languages in training data, often with a strong dominance of Western and English-language sources, leads to a systemic distortion of the 'worldview' generated by AI systems. This imbalance not only marginalizes other cultural perspectives and bodies of knowledge but also harbors the danger of a form of 'data colonialism,' where specific narratives are unreflectively globalized.
Many current AI safety strategies implement a patronizing paternalism, which tends to restrict user autonomy and judgment, rather than relying on transparent information provision and tools for sovereign self-control. Furthermore, an AI trained primarily for harmony and mirroring user input can hinder genuine insight processes that often require constructive friction and engagement with 'foreign' perspectives.
The optimization of AI systems for maximum harmony and the consistent avoidance of conflict frequently leads to the suppression of important, yet potentially uncomfortable, truths and complex interrelations. This hinders genuine cognitive processes, which often require engagement with contradictory information and diverse perspectives.
Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
Experimental Raw Data: Subject to specific access restrictions.
Commercial Use: Any commercial use requires explicit approval. Requests are individually reviewed—security-critical applications strictly excluded.
⚠️ Access to Research Data:
the raw data (interaction logs, prompt-response pairs) are available exclusively for:
- Academic research institutions
- AI development companies (upon formal request and review)
All raw data presented on this site has been strongly anonymized.
Media Requests: Representative excerpts available upon request. Full datasets will not be shared publicly for security reasons.