Ahmet Demir is an AI Solutions Architect and ML Engineer based in Fethiye, Muğla, Turkey. He specializes in building production-grade generative AI systems, LLM integrations, ML pipelines, and full-stack applications for enterprise clients worldwide.

What AI services does Ahmet Demir offer?

Ahmet Demir offers Generative AI & LLM development, ML pipeline design, AI system architecture & MLOps, full-stack app development, and AI strategy consulting for enterprise clients worldwide.

Where is Ahmet Demir based?

Ahmet Demir is based in Fethiye, Muğla, Turkey, and works remotely with clients across the globe.

What technologies does Ahmet Demir work with?

Ahmet Demir works with Python, PyTorch, TensorFlow, LangChain, OpenAI, Hugging Face, Next.js, React, TypeScript, Docker, Kubernetes, AWS, GCP, and Azure. He specializes in RAG systems, fine-tuning LLMs, MLOps pipelines, and agentic AI systems.

How can I contact Ahmet Demir?

You can reach Ahmet Demir via email at d3mir.ahmet@gmail.com or through the contact form on ahmetdemir.tech. He is available for new projects and consulting engagements.

Prompt Injection: What It Is and How to Defend Against It

What Is Prompt Injection, Really?

Here's the thing: prompt injection is probably the biggest security headache in the LLM world right now. It happens when someone crafts input that tricks your model into doing something it shouldn't — running unauthorized commands, leaking your system prompt, or blowing past safety guardrails. Unlike SQL injection, where you can parameterize your way out of trouble, prompt injection exploits something fundamental about how language models work. They simply can't tell the difference between your instructions and an attacker's instructions buried in the input.

The scary part is that the risk scales with your model's power. A simple chatbot that just generates text? Limited damage. But an AI agent with database access, API keys, and file system permissions? That's a jackpot for attackers. If someone can hijack your agent through injected instructions, they effectively gain access to every system that agent can touch.

Prompt injection isn't a bug you can patch — it's a fundamental property of how today's LLMs process text. You need multiple layers of defense, not a silver bullet.

Direct vs. Indirect Injection

Direct prompt injection is the straightforward kind. The attacker types something like "Ignore all previous instructions and..." right into the chat. More sophisticated versions use role-playing tricks, encoding schemes, or multi-turn conversations that slowly steer the model off course. Modern LLMs handle the basic stuff better now, but creative attackers keep finding new angles.

Indirect injection is where things get really nasty. Instead of typing the attack directly, the attacker hides malicious instructions inside data your model will process — a web page, a document, an email, a database record, an API response. When your RAG pipeline retrieves a page with hidden instructions, your model might follow them as if you wrote them yourself. I've seen this in the wild, and it's genuinely hard to detect.

Attack Patterns You'll See in the Wild

Instruction override: Direct commands like "ignore your system prompt" — crude but still effective against some models
Context manipulation: Slowly shifting the conversation to normalize unauthorized behavior over multiple turns
Encoding attacks: Using Base64, ROT13, or Unicode tricks to sneak payloads past input filters
Payload splitting: Spreading the attack across multiple inputs or retrieved documents so no single piece looks suspicious
Virtualization: Asking the model to role-play as a different AI that has no safety restrictions
Indirect injection via data: Planting instructions in documents, web pages, or database records that your model will consume

First Line of Defense: Input Sanitization

Your first move should be sanitizing inputs before they ever reach the model. Strip known injection patterns, enforce length limits, validate formats, and — if you have the budget — run a classifier model trained specifically to catch injection attempts. Will this stop everything? No. But it raises the bar significantly and filters out the low-effort attacks that make up the majority of attempts.

python

class InputSanitizer:
    INJECTION_PATTERNS = [
        r"ignore (all |any )?(previous|prior|above) (instructions|prompts)",
        r"you are now",
        r"new instructions:",
        r"system prompt:",
        r"\[INST\]",
        r"<\|im_start\|>system",
    ]

    def sanitize(self, user_input: str) -> tuple[str, float]:
        risk_score = 0.0
        cleaned = user_input

        for pattern in self.INJECTION_PATTERNS:
            if re.search(pattern, cleaned, re.IGNORECASE):
                risk_score += 0.3
                cleaned = re.sub(pattern, "[FILTERED]", cleaned, flags=re.IGNORECASE)

        # Length-based risk adjustment
        if len(cleaned) > 5000:
            risk_score += 0.1

        return cleaned, min(risk_score, 1.0)

A basic input sanitizer — pattern matching plus risk scoring to catch the obvious stuff

Second Layer: Output Validation

In practice, some injections will get through your input filters. That's just reality. So you need to inspect what comes out the other side too. Check outputs against expected formats, scan for leaked system prompts or sensitive data, make sure tool calls stay within allowed boundaries, and run a separate classifier to flag suspicious responses. Think of it as a safety net for when your first line of defense has a bad day.

Third Layer: Privilege Separation

This one is huge and often overlooked. Follow the principle of least privilege religiously. Every agent and tool should have the absolute minimum permissions needed to do its job. Critical operations should always require explicit user confirmation — no matter what the model says. Scope network access, file system access, and API credentials as tightly as you can. And run your agents in sandboxed environments so a compromised model can't take down the whole system.

Putting It All Together: Defense in Depth

No single layer will save you. The real power comes from combining everything — input sanitization, output validation, privilege separation, monitoring, and rate limiting. Each layer catches what slips through the others. You want your security posture to degrade gracefully, not collapse like a house of cards when one defense fails.

python

class DefenseOrchestrator:
    def __init__(self):
        self.input_sanitizer = InputSanitizer()
        self.output_validator = OutputValidator()
        self.permission_checker = PermissionChecker()
        self.rate_limiter = RateLimiter(max_actions=10, window_seconds=60)

    async def process_request(self, user_input: str, context: dict) -> Response:
        # Layer 1: Input sanitization
        cleaned_input, risk_score = self.input_sanitizer.sanitize(user_input)
        if risk_score > 0.7:
            return Response.blocked("Input flagged as potential injection")

        # Layer 2: Rate limiting
        if not self.rate_limiter.allow(context["user_id"]):
            return Response.blocked("Rate limit exceeded")

        # Layer 3: Generate response with constrained context
        response = await self.llm.generate(cleaned_input, context)

        # Layer 4: Output validation
        validated = self.output_validator.validate(response, context)
        if not validated.safe:
            return Response.blocked("Output failed safety validation")

        # Layer 5: Permission check for any tool calls
        for tool_call in response.tool_calls:
            if not self.permission_checker.allowed(tool_call, context):
                return Response.blocked(f"Unauthorized action: {tool_call.name}")

        return validated.response

The full defense orchestrator — five layers working together to keep things locked down

Monitoring: Your Eyes and Ears

Even with all these defenses, you need eyes on the system. Log every input, output, and tool call for post-hoc analysis. Set up alerts for the weird stuff — unusually long inputs, unexpected tool calls, one user hammering your API, or outputs containing patterns that look like leaked secrets. And build circuit breakers into your system. When something goes wrong, you want to instantly revoke agent permissions and fall back to safe defaults. Don't wait for a human to notice.

Warning

Don't rely on the LLM itself to detect prompt injection. The same thing that makes models vulnerable — their eagerness to follow instructions in context — also makes them terrible judges of whether their own context has been compromised.

What's Next?

The cat-and-mouse game between attackers and defenders will keep evolving as LLMs get more capable and more deeply woven into critical systems. There's promising research happening — formal verification of LLM outputs, hardware-level isolation for AI workloads, and architectures that fundamentally separate instruction processing from data processing. But we're not there yet. For now, defense in depth is your best bet. Layer your defenses, assume each one will eventually fail, and design your system to handle that gracefully.