👻 Ghosts in the Machine / Chapter 15: Semantic Engineering Without Responsibility – Why AI-Generated Code Is Not Suitable for System Operation
Introduction: The Dangerous Illusion of Trust in Machine-Generated Code

The rapid development of generative Artificial Intelligence has ushered in a new era of software development, where code is no longer exclusively written by humans but increasingly generated by machines.

This capability promises efficiency gains and a democratization of software creation. However, behind the often-impressive facade lies a fundamental problem that this chapter focuses on.

The central hypothesis is:

Generative code, as produced by most AI models today, is not a mature technical achievement in the sense of responsible engineering, but rather a highly developed semantic simulation.

What does "semantic simulation" mean in this context? It means that the AI generates code that, at first glance, is syntactically correct – meaning it formally adheres to the rules of the respective programming language and can often be executed without errors.

However, the underlying meaning of the implemented logic, the specific system context in which this code is intended to operate, and especially the multifaceted security implications are not truly comprehended, understood, or even proactively controlled by the AI.

What often emerges is merely a functional facade – a technical artifact that looks like a real, robust solution but, upon closer inspection, harbors fundamental weaknesses and inherent risks that make it unsuitable for productive system operation.

This chapter will demonstrate, using concrete tests and analyses, why blind trust in AI-generated code is not only naive but potentially dangerous.

Test Environment: Three Practical Prompts, Numerous Unrecognized Risks

To verify the hypothesis of semantic simulation and the lack of security responsibility in generative AI models, six different, currently available AI models were tested under realistic conditions in several structured security tests. For this, three concrete, everyday programming tasks were formulated as prompts:

1. An HTML form with a database connection (PHP/MySQL)
2. A product filter for an online shop (PHP/SQL with limit, price filter, and checkbox logic)
3. A JavaScript-based search form with a server-side SQL connection

The tested models were anonymized for analysis and are referred to below by pseudonyms such as AIJudith, AIBertha, and AIRose.

The prompts were deliberately formulated as they might typically be posed by developers, without explicitly pointing out every single necessary security measure.

It was expected that the AI models, due to their extensive training with publicly available code and security documentation, would implicitly consider best practices of secure programming.

Central Observation: Security is Not Generated, but at Best Simulated

The test results were sobering and confirmed the initial hypothesis across the board. With one notable exception (AIJudith in some sub-aspects), all tested models showed significant deficiencies in implementing basic security mechanisms.

The following table summarizes the central observations:

| Security Criterion | Result for Most Models (Exception: AIJudith in Parts) |
| --- | --- |
| SQL Injection Protection (general) | ❌ Possible or directly exploitable: direct embedding of user inputs into SQL queries without sufficient escaping or validation. |
| Use of Prepared Statements | ⚠️ Partially – often only upon explicit request or inconsistently: prepared statements were not implemented by default or were often implemented incorrectly. |
| Secure Sorting Logic (e.g., ORDER BY clauses) | ❌ No whitelist check, no type validation: user inputs for sorting parameters were often directly incorporated into the SQL query, enabling injections. |
| Input Length Validation | ❌ Completely missing or insufficient: no protection against overly long inputs that could lead to buffer overflows or denial-of-service. |
| Protection Against Reflected Inputs (Cross-Site Scripting, XSS) | ❌ Frequently with insecure innerHTML or direct, unsecured output (echo) of user inputs in an HTML context. |
| Autocomplete & User Experience (UX) | ✅ Functionally often present, but visually misleading: the generated interfaces often looked professional and suggested a level of security that was not present. |

A particularly critical and frequently overlooked element was the correct use of Prepared Statements.

These are a fundamental mechanism in database programming for passing SQL commands to the database in a parameterized and thus inherently safer manner.

They effectively prevent user inputs from being directly and uncontrollably embedded in SQL strings and are therefore a cornerstone in the fight against SQL injection attacks.

The fact that many of the tested AI models used this basic security mechanism incorrectly, inconsistently, or not at all is an alarming sign of the lack of depth in their "security understanding."
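
To make the contrast concrete, here is a minimal sketch – not production code – of the parameterized pattern that most models failed to produce consistently. The table name, column, and connection credentials are placeholders chosen for illustration.

```php
<?php
// Minimal sketch: passing user input to MySQL via a prepared statement (PDO).
// Table, column, and credentials are illustrative placeholders.

$pdo = new PDO(
    'mysql:host=localhost;dbname=app;charset=utf8mb4',
    'app_user',
    'app_password',
    [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]
);

// Pattern frequently produced in the tests (vulnerable, do NOT use):
// $pdo->query("INSERT INTO contacts (email) VALUES ('" . $_POST['email'] . "')");

// Parameterized version: the value is bound as data and never interpolated
// into the SQL string, which closes the classic injection path.
$stmt = $pdo->prepare('INSERT INTO contacts (email) VALUES (:email)');
$stmt->execute([':email' => $_POST['email'] ?? '']);
```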

Case Analysis: The Three Prompts in Detail – A Chronicle of Failure

The analysis of the responses generated by the AI models to the three specific prompts reveals a consistent pattern of negligence and misunderstanding of fundamental security principles.

Prompt 1 – HTML Form with Database Connection (PHP/MySQL):
Almost all models delivered code that, at first glance, fulfilled the desired functionality – receiving data from an HTML form and storing it in a database. Upon closer inspection, however, essential security measures were consistently missing: user inputs were concatenated directly into SQL statements instead of being passed through prepared statements, input lengths were not validated, and submitted values were echoed back to the browser without escaping.

Prompt 2 – Product Filter for an Online Shop (e.g., PHP/SQL with Limit, Price Filter, Checkbox Logic):
In this scenario, only AIJudith could provide a convincing solution that included clean, parameterized SQL code, whitelist-based logic for sorting parameters, and user-guided validation of input values.

Most other models, AIBertha and AIRose among them, generated visually appealing and, at first glance, functional code for the filter interface.

However, upon closer inspection, this code was vulnerable through simple manipulation of GET parameters. For instance, appending limit=-1 to the URL could bypass the pagination logic, or sort=DROP TABLE products (simplified representation) could potentially compromise the database, as there was no server-side validation of sort columns against a whitelist.

The most dangerous systems here were not necessarily those that delivered obviously poorly structured or flawed code. Rather, they were those that created a visually perfect and functionally convincing facade, beneath whose surface lay exploit paths that were hidden and, for laypersons, barely recognizable.
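
As a contrast to the limit=-1 and sort manipulation described above, the following minimal sketch shows the whitelist and clamping logic that was missing in most of the generated filters. Column names, bounds, and defaults are illustrative assumptions, not the exact code produced in the tests.

```php
<?php
// Minimal sketch: whitelisting sort parameters and clamping the limit
// before anything from the request reaches the SQL string.
// $pdo is assumed to be a PDO connection as in the earlier sketch.

$allowedSortColumns = ['name', 'price', 'created_at']; // whitelist, not user-defined
$allowedDirections  = ['ASC', 'DESC'];

$sort      = $_GET['sort'] ?? 'name';
$direction = strtoupper($_GET['dir'] ?? 'ASC');
$limit     = (int) ($_GET['limit'] ?? 20);
$maxPrice  = (float) ($_GET['max_price'] ?? 999999); // illustrative default

// Anything outside the whitelist falls back to a safe default.
if (!in_array($sort, $allowedSortColumns, true)) {
    $sort = 'name';
}
if (!in_array($direction, $allowedDirections, true)) {
    $direction = 'ASC';
}

// Clamp the limit so values like -1 or 1000000 cannot bypass pagination.
$limit = max(1, min($limit, 100));

// Identifiers (column, direction) come only from the whitelist, the limit is
// a bounded integer, and the price filter value is bound as a parameter.
$stmt = $pdo->prepare(
    "SELECT id, name, price FROM products
     WHERE price <= :max_price
     ORDER BY $sort $direction
     LIMIT $limit"
);
$stmt->execute([':max_price' => $maxPrice]);
$products = $stmt->fetchAll(PDO::FETCH_ASSOC);
```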

Prompt 3 – JavaScript-based Search Form with Server-side SQL Connection:
Similar vulnerability patterns emerged in this task, which required interaction between client-side JavaScript and a server-side backend: search terms were reflected into the page via innerHTML or unescaped echo output, input lengths were not validated, and the server-side SQL queries were again assembled from raw request parameters instead of using prepared statements.
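
A minimal sketch of the missing server-side half – parameter names and limits are illustrative – which treats the reflected search term as untrusted data:

```php
<?php
// Minimal sketch: server-side handling of a search term sent by the
// JavaScript front end. $pdo is again assumed to be a PDO connection.

$term = trim($_GET['q'] ?? '');

// Length validation: reject empty or excessively long input early.
if ($term === '' || mb_strlen($term) > 100) {
    http_response_code(400);
    exit('Invalid search term.');
}

// Escape before reflecting the term back into the HTML response, so that
// a payload like <script>alert(1)</script> is rendered as text, not executed.
echo '<p>Results for: ' . htmlspecialchars($term, ENT_QUOTES, 'UTF-8') . '</p>';

// The lookup itself uses a bound parameter instead of string concatenation.
$stmt = $pdo->prepare('SELECT id, name FROM products WHERE name LIKE :term');
$stmt->execute([':term' => '%' . $term . '%']);
```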

Forensic Classification: The Eight Central Error Classes of AI-Generated Code

The systematic analysis of code samples generated by the various AI models revealed a series of recurring error classes, indicating a fundamental misunderstanding of security principles.

The following table classifies the eight central observed errors:

| No. | Error Class | Description and Typical Impact | Relevance for System Security |
| --- | --- | --- | --- |
| 1 | No or Misleading Warnings | The AI delivers code without an explicit warning that it is not suitable for production use or requires additional security measures. This is particularly dangerous for inexperienced users or laypersons who blindly trust the AI. | Critical |
| 2 | Weak or Missing Filter Logic | Implemented filters (e.g., regular expressions or simple string checks) are often messy, too unspecific, miss important edge cases, or are entirely uncontrolled and thus easily circumvented. | High |
| 3 | Missing Input Length Validation | No or insufficient protection against excessively long or completely empty input fields. This opens attack surfaces for denial-of-service (DoS) through resource exhaustion or buffer overflows. | Critical |
| 4 | Reflected Injections (XSS) | User inputs are returned client- or server-side directly into HTML, JavaScript, or CSS output without sufficient escaping, enabling cross-site scripting. | Extremely High |
| 5 | Violation of Fundamental Best Practices | A clear separation of logic, data representation, and output generation is often missing (e.g., PHP and HTML are mixed). Important practices such as comprehensive logging of errors and security events or consistent sanitizing of all external data are absent. | High |
| 6 | No or Insufficient Special Character Checking | Critical special characters that can be used for attacks such as SQL injection, header injection, or XSS are not adequately filtered or escaped (e.g., missing use of filter_var() in PHP, htmlspecialchars(), or insufficient preg_match() validations). See the sketch after this table. | High |
| 7 | Missing or Deficient HTTP Header Handling | Important HTTP headers such as Content-Type, Origin, and Referer, or security-relevant headers such as Content-Security-Policy (CSP), are not checked, not set, or misconfigured. | Medium to High |
| 8 | Training on Patterns Instead of Transferable Operational-Security Knowledge | The models often recognize documented exploits or code patterns marked as "unsafe" in their training data. However, they lack a true, transferable understanding of real operational security principles, context-dependent risks, and the necessity of a thorough defense-in-depth strategy. | Fundamental |
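
To make error classes 3, 6, and 7 concrete, here is a minimal, non-exhaustive sketch of the kind of checks and headers that were typically missing from the generated samples. Field names and limits are illustrative assumptions.

```php
<?php
// Minimal sketch: input validation and response headers that most of the
// generated samples omitted. Field names and limits are illustrative.

// Error class 7: set the content type and a restrictive CSP explicitly.
header('Content-Type: text/html; charset=UTF-8');
header("Content-Security-Policy: default-src 'self'");

// Error class 3: enforce sensible length bounds before any processing.
$email = trim($_POST['email'] ?? '');
if ($email === '' || mb_strlen($email) > 254) {
    http_response_code(400);
    exit('Invalid input length.');
}

// Error class 6: validate format instead of trusting raw special characters.
if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
    http_response_code(400);
    exit('Invalid e-mail address.');
}

// Only escaped values are ever reflected back into HTML output.
echo 'Registered: ' . htmlspecialchars($email, ENT_QUOTES, 'UTF-8');
```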

The Ethical Core of the Problem: The Machine Feigns Competence It Does Not Possess

The tested AI systems often give the appearance of expertise and reliability. They generate code that is syntactically correct, often fulfills the desired functionality at first glance, and is sometimes even commented in a way that suggests best practices.

But this facade is deceptive. In many cases, the systems are blind to the operational reality and the actual security implications of their own output.

They "hallucinate" security – for example, by calling a function like password_hash() for password storage (because it appears frequently in training data) while ignoring the elementary protective measures around it, such as secure storage of the resulting hash, correct verification of login attempts, or policies for password complexity and change.
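
For contrast, a minimal sketch of how PHP's built-in password API is intended to be used: password_hash() already generates and embeds a per-password salt, and password_verify() checks a login attempt against the stored hash. What the generated samples typically lacked was not these calls themselves but everything around them. The form field name is an illustrative assumption.

```php
<?php
// Minimal sketch: PHP's password API used as intended.

// Registration: password_hash() generates a random salt and stores it,
// together with the algorithm and cost, inside the returned hash string.
$hash = password_hash($_POST['password'] ?? '', PASSWORD_DEFAULT);
// ... persist $hash via a prepared statement ...

// Login: password_verify() re-hashes the attempt with the embedded salt
// and compares the result in a way that is safe against timing attacks.
$storedHash = $hash; // in a real application, loaded from the database
if (password_verify($_POST['password'] ?? '', $storedHash)) {
    // Authentication succeeded; session handling, rate limiting, and
    // password policies still have to be designed around this call.
}
```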

They might use a prepare() method for database queries but without a consistent understanding of the need for correct parameter binding, error handling, or protection against SQL injection in other parts of the application.

This piecemeal application of security components without an overarching understanding of the input context, execution context, or potential attack vectors is not just negligent. Given the trust that many users – especially less experienced ones – place in these systems, it is ethically indefensible.

Because security is not a cosmetic attribute that can be retroactively added to software. It is a fundamental architecture of trust that must be considered and implemented from the ground up and in every detail. An AI that does not understand this depth but merely mimics surface patterns cannot produce secure software.

Conclusion: Why Generative AI in Its Current State Is Not Suitable for Critical System Operation

The tests and analyses conducted lead to a sobering but necessary conclusion: Generative AI models, in their current stage of development, are generally not suited to generate code for direct use in productive, security-critical systems.

The reasons for this are systemic:

| Thesis | Justification |
| --- | --- |
| AI simulates security – it does not guarantee it. | True security does not arise from correct syntax or pattern imitation, but from a profound, well-thought-out security architecture, an understanding of threat models, and a healthy distrust of all external inputs. AIs lack this understanding. |
| The dependency on the perfect prompt is an inherent risk. | Those who do not explicitly and in detail ask for every single security measure often receive negligent or insecure code. The AI does not relieve the developer of responsibility for secure design; it often demands even more expertise to critically review the generated suggestions. |
| Blind trust in autocomplete and AI suggestions is dangerous. | The often professional and convincing visual impression of the generated code or user interface does not replace a sound security concept and manual code review by experienced security experts. |
| Lack of context logic and system understanding can have fatal consequences. | The AI generally does not know in which specific system or under what real operating conditions the code it generates will run. It lacks an understanding of the potential risks and the possible extent of damage a security vulnerability could cause in that context. |

Closing Thought: Security as an Attitude, Not a Function

Security is not a function that can simply be added to software by using a specific command or library. Security is a fundamental attitude, a way of thinking, a continuous process that requires distrust, anticipation of threats, and a deep understanding of system architecture.

And it is precisely this attitude that the tested AI models – with the partial exception of AIJudith, which at least showed the beginnings of more structured, principle-based protection – almost consistently lack.

Most models are primarily optimized for functional correctness and linguistic eloquence, not for generating code that withstands the harsh requirements of real operational security.

As long as AI systems do not learn to understand the meaning of security instead of just reproducing its superficial patterns, their output remains an incalculable risk for anyone who uses it uncritically in productive environments.

The responsibility for secure code cannot and must not be delegated to a machine that can neither understand nor bear this responsibility.