
Prompt Engineering Is Dead. Context Engineering Is King

Prompt engineering alone cannot handle production AI. Context engineering—dynamically assembling information, tools, and memory—is what actually makes agents work.

Sebastjan Mislej · 2026-02-13 · 8 min read

You spent hours crafting the perfect prompt. You tested it. Tweaked the wording. Added "think step by step." And it still fails in production. Here's the hard truth: the prompt isn't the problem. The context is.

In 2023, "prompt engineering" was the hottest skill in tech. Six-figure salaries. LinkedIn influencers sharing "magic templates." Everyone believed the right words could unlock AI's full potential.

That era is over. The smartest AI builders aren't writing better prompts anymore. They're building better systems that feed models the right information at the right time. They call it context engineering.

What Went Wrong with Prompt Engineering

Prompt engineering worked great for demos. You type a clever instruction. The model responds. Everyone claps.

Then you ship it to production. And everything breaks.

The problem? Prompts are fragile. Change one word and you get different behavior. That's fine in a playground. In production systems handling thousands of requests? It's chaos.

Warning: Prompt engineering relies on linguistic precision, not logic. It doesn't scale to real-world AI applications.

I've seen teams spend weeks optimizing a prompt. They get it working perfectly in testing. Then a user phrases something slightly differently. The whole thing falls apart. They patch it. Another edge case breaks. Patch again. The prompt grows into an unreadable monster.

Models also forget everything between calls. They drift. They misinterpret. Unless you feed them full context every single time, they act like they've never seen your data before. Because they haven't.

The breakthrough came when AI builders realized something obvious: agent failures aren't model failures. They're context failures.

Context Engineering: The Real Definition

In June 2025, Shopify CEO Tobi Lütke posted a simple tweet that changed the conversation. He called context engineering "the art of providing all the context for the task to be plausibly solvable by the LLM."

Andrej Karpathy, the former director of AI at Tesla and a founding member of OpenAI, backed him up. He described it as "the art and science of curating what will go into the limited context window."

Within a month, Anthropic, LangChain, and LlamaIndex had all adopted the term. The first academic survey analyzing over 1,300 papers formalized it as a distinct discipline. This wasn't just rebranding. It was recognition that the field had evolved.

Key Insight

Context engineering is designing dynamic systems that provide the right information, tools, and memory, all in the right format, so an LLM can actually complete the task.

Notice what's missing from that definition? The word "prompt." That's not an accident.

Context engineering doesn't replace prompts. It makes them one small piece of a much bigger system. Your system prompt matters. But so does the conversation history, the tools available, the retrieved documents, the user preferences, and the output format.

What Context Actually Includes

When you call an LLM, context is everything the model sees before generating a response. Most people think it's just the prompt. It's not.

System Instructions

Rules, examples, and behavior guidelines that shape how the model responds.

User Prompt

The immediate task or question from the user.

Conversation History

Recent messages that give the model short-term memory.

Long-Term Memory

User preferences, past decisions, and facts stored across sessions.

Retrieved Data (RAG)

External knowledge pulled from documents, databases, or APIs.

Available Tools

Functions the model can call: send emails, check calendars, query databases.

Each piece matters. Skip one, and your agent fails. Not because the model is dumb. Because you didn't give it what it needed.

The Cheap Demo vs. Magical Agent

Let me show you the difference with a real example. Say someone emails you: "Hey, just checking if you're around for a quick sync tomorrow."

A cheap demo agent with poor context sees only that message. It responds with something generic:

"Thank you for your message. Tomorrow works for me. May I ask what time you had in mind?"

It's polite. It's useless. It doesn't know your calendar. It doesn't know who this person is. It doesn't know your communication style.

Now picture a magical agent with rich context:

  • Your calendar (you're fully booked tomorrow)
  • Past emails with this person (you use informal, friendly tone)
  • Your contacts (this is Jim, a key business partner)
  • Tools to send calendar invites

The response:

"Hey Jim! Tomorrow's packed on my end, back-to-back all day. Thursday AM free if that works for you? Sent an invite, lmk if it works."

Same model. Same email. Completely different output. The magic isn't in the model. It's in the context you feed it.

This example comes from Philipp Schmid, formerly of Hugging Face. He nails the core insight: most agent failures aren't about model capability. They're about missing information.
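Here's roughly what assembling that rich context looks like in code. This is a minimal sketch, not Schmid's implementation: get_calendar, get_contact, and recent_emails_with are hypothetical stand-ins for your real calendar, contacts, and email integrations.

```python
# Minimal sketch of assembling the "magical agent" context before the LLM call.
# get_calendar(), get_contact(), and recent_emails_with() are hypothetical
# stand-ins for your real calendar, contacts, and email integrations.

def get_calendar(day: str) -> str:
    return "Fully booked 09:00-17:00 tomorrow. Thursday 09:00-11:00 free."

def get_contact(email: str) -> dict:
    return {"name": "Jim", "relationship": "key business partner", "tone": "informal"}

def recent_emails_with(email: str, n: int = 5) -> list[str]:
    return ["Hey! Great catching up last week.", "Sounds good, talk soon."]

def build_email_context(incoming: dict) -> list[dict]:
    contact = get_contact(incoming["from"])
    return [
        {"role": "system", "content": (
            "You draft email replies in the user's voice. "
            f"Tone with this contact: {contact['tone']}. "
            "You may call send_calendar_invite once a time is agreed."
        )},
        {"role": "system", "content": f"Contact: {contact['name']}, {contact['relationship']}."},
        {"role": "system", "content": "Calendar: " + get_calendar("tomorrow")},
        {"role": "system", "content": "Recent thread:\n" + "\n".join(recent_emails_with(incoming["from"]))},
        {"role": "user", "content": incoming["body"]},
    ]

messages = build_email_context({
    "from": "jim@example.com",
    "body": "Hey, just checking if you're around for a quick sync tomorrow.",
})
```

The model never got smarter. The payload did.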

Why Format Matters as Much as Information

Most developers understand they need to give models more data. Fewer understand that how you present that data matters just as much.

Dump a raw JSON blob into your prompt and watch the model struggle. Give it a clean summary with clear structure and watch it reason properly.
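To make that concrete, here's a toy comparison. The record and its fields are invented for illustration:

```python
import json

# A raw record, as it might come back from a calendar API (fields invented for illustration).
raw = {
    "evt_id": "c_98f2",
    "attendees": [{"em": "jim@example.com", "rsvp": "Y"}],
    "start": "2026-02-14T09:00:00+01:00",
    "end": "2026-02-14T17:00:00+01:00",
    "busy": True,
    "src": "gcal",
}

# Option 1: dump the blob and make the model spend attention decoding it.
context_dump = json.dumps(raw)

# Option 2: spend a few lines of code so the model spends none.
context_summary = "Calendar, Feb 14: busy 09:00-17:00, one all-day block with Jim (accepted)."
```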

Tip: A short, descriptive summary beats a massive data dump every time. LLMs have limited attention—use it wisely.

This applies to tools too. The input parameters you define, the descriptions you write, the examples you include—all of it matters. It shapes whether the model uses your tools correctly or makes stuff up.

I've spent more time on tool descriptions than on prompts. Clear parameter names. Unambiguous return types. Examples of when to use which tool. This is context engineering in action.
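Here's the kind of tool definition I mean. The schema shape follows common function-calling conventions; the tool itself and its parameters are invented for illustration:

```python
# A tool definition where the description, parameter names, and types do the work.
check_availability = {
    "name": "check_availability",
    "description": (
        "Return free time slots on the user's calendar for a given date. "
        "Use this BEFORE proposing a meeting time. Do not use it to book; "
        "booking is done with send_calendar_invite."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "date": {"type": "string", "description": "Calendar date, ISO 8601, e.g. 2026-02-14"},
            "min_duration_minutes": {"type": "integer", "description": "Shortest acceptable slot"},
        },
        "required": ["date"],
    },
}
```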

Context engineering treats format as a first-class concern. Not an afterthought.

The Goldilocks Problem: Finding the Right Amount

Context has a weird property: too little is bad, but too much is also bad.

Anthropic's research team calls this "context rot." As you add more tokens to the context window, the model's ability to recall specific information decreases. Every token you add depletes the model's finite attention budget.

Human short-term memory tops out at roughly 7 items. LLMs face a similar constraint: they're built on transformer architecture where every token attends to every other token, so the number of pairwise relationships grows with the square of the context length. More context means attention gets stretched thin.
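The arithmetic is brutal. Count the unordered token pairs:

```python
# Pairwise relationships among n tokens: n * (n - 1) / 2, i.e. roughly n^2 / 2.
for n in (1_000, 10_000, 100_000):
    pairs = n * (n - 1) // 2
    print(f"{n:>7,} tokens -> {pairs:>13,} pairwise relationships")

#   1,000 tokens ->       499,500 pairwise relationships
#  10,000 tokens ->    49,995,000 pairwise relationships
# 100,000 tokens -> 4,999,950,000 pairwise relationships
```

A context 100 times longer has roughly 10,000 times more relationships competing for the same attention budget.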

The goal isn't maximum context. It's optimal context—the smallest set of high-signal tokens that lets the model succeed.

Techniques like context pruning, salience ranking, and smart caching help you stay in the Goldilocks zone. Give models enough to work with. Not so much they drown.
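Here's a crude sketch of the pruning idea: score candidate snippets for relevance, then keep only what fits a token budget. The keyword-overlap score and the 4-characters-per-token estimate are placeholders; real systems use embedding similarity or a reranker and a proper tokenizer.

```python
# Crude salience ranking plus pruning: keep the highest-signal snippets that fit the budget.
# score() is a keyword-overlap placeholder; real systems use embeddings or a reranker.

def score(snippet: str, query: str) -> float:
    terms = set(query.lower().split())
    return sum(term in snippet.lower() for term in terms)

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # rough heuristic: ~4 characters per token

def prune(snippets: list[str], query: str, budget_tokens: int) -> list[str]:
    ranked = sorted(snippets, key=lambda s: score(s, query), reverse=True)
    kept, used = [], 0
    for snippet in ranked:
        cost = estimate_tokens(snippet)
        if used + cost <= budget_tokens:
            kept.append(snippet)
            used += cost
    return kept

context = prune(
    snippets=[
        "Calendar: booked all day tomorrow, Thursday 09:00-11:00 free.",
        "Jim is a key business partner; keep the tone informal.",
        "Notes from the 2019 company retreat, including photo captions.",
    ],
    query="reply to Jim about a sync tomorrow",
    budget_tokens=40,
)
# The retreat notes get dropped: low signal, and the budget is already spent.
```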

Building Context Systems That Scale

Here's where it gets practical. Context engineering isn't about writing better strings. It's about building better systems.

A good context system has three layers:

1. Persistent Identity Layer: Who is the user? What do they want? How should the model behave? This stays constant across sessions.

2. Knowledge Layer: Time-sensitive data from APIs, databases, or document retrieval. Updated dynamically per request.

3. Transient Layer: Recent messages and tool outputs. Adapts in real time to the conversation's direction.

Your system assembles these layers before every LLM call. The user never sees this. They just notice the AI actually understands them.
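In code, the assembly step can be a single function that stacks the layers in a fixed order before each call. A minimal sketch, with hypothetical stand-ins for each layer:

```python
# Sketch of per-call context assembly from the three layers.
# identity_layer(), knowledge_layer(), and transient_layer() are hypothetical
# stand-ins for your own profile store, retrieval pipeline, and session state.

def identity_layer(user_id: str) -> str:
    # Persistent: who the user is and how the model should behave. Rarely changes.
    return "User: founder, prefers short informal replies. Never book before 09:00."

def knowledge_layer(query: str) -> str:
    # Per-request: fresh data from APIs, databases, or document retrieval.
    return "Calendar: tomorrow fully booked; Thursday 09:00-11:00 free."

def transient_layer(session: dict) -> str:
    # Real-time: recent messages and tool outputs from this conversation.
    return "\n".join(session.get("recent_messages", [])[-6:])

def assemble(user_id: str, query: str, session: dict) -> list[dict]:
    return [
        {"role": "system", "content": identity_layer(user_id)},
        {"role": "system", "content": knowledge_layer(query)},
        {"role": "system", "content": transient_layer(session)},
        {"role": "user", "content": query},
    ]
```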

This is what separates [[link:building-ai-agents-that-work]] toy projects from production systems. Not prompt magic. Infrastructure.

The Attention Budget Mental Model

I think about context like a budget. You have limited tokens. Each one costs attention. Spend wisely.

Anthropic puts it well: "Good context engineering means finding the smallest possible set of high-signal tokens that maximize the likelihood of some desired outcome."

When I build [[link:openclaw-ai-that-does-things]] agents, I ask one question before every LLM call. Can this model plausibly accomplish the task with this context?

If the answer is no, I don't blame the model. I fix the context.
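One way to make that question operational (a hypothetical sketch, not a standard API) is a preflight check that flags missing context before the call ever happens:

```python
# Hypothetical preflight check: flag missing context instead of letting the model guess.

REQUIRED_FOR_SCHEDULING = ("calendar", "contact", "recent_thread")

def preflight(context: dict, required: tuple[str, ...]) -> list[str]:
    """Return the names of required context pieces that are missing or empty."""
    return [key for key in required if not context.get(key)]

context = {"calendar": "tomorrow fully booked", "contact": None, "recent_thread": "last 3 emails"}
missing = preflight(context, REQUIRED_FOR_SCHEDULING)
if missing:
    print(f"Context incomplete, fix before calling the LLM: {missing}")  # -> ['contact']
```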

This mental shift changes everything. You stop fighting the model. You start feeding it better. Your debugging becomes "what information was missing?" instead of "why is this model so dumb?"

Prompt Engineering Isn't Dead. It's Just Not Enough.

Let me be clear: prompts still matter. System instructions, few-shot examples, clear output formats—all of this is real work that affects results.

But prompt engineering is now a subset of context engineering. It's one tool in a bigger toolkit.


The industry shift is from "what words should I use?" to "what information does this model need?"

That's not just a naming change. It's a fundamental shift in how we build AI systems. CIO magazine reports that IT leaders now treat context engineering as "foundational infrastructure"—like API management or [[link:design-tokens-the-system]] data governance.

Gartner defines it as "designing and structuring the relevant data, workflows and environment so AI systems can understand intent." The enterprise world is taking notice.

What This Means for You

If you're building with LLMs, here's what changes:

  • Stop obsessing over prompt wording. Focus on what information the model actually needs.
  • Build systems, not strings. Context should be dynamically assembled, not hardcoded.
  • Treat context as finite. Every token depletes the attention budget.
  • Include everything: instructions, memory, tools, retrieved data, conversation history.
  • Format carefully. Concise structure beats raw data dumps.
  • Test with this question: Can the LLM plausibly complete this task with this context?

The developers who master context engineering will build AI that feels magical. Everyone else will keep wondering why their prompts don't work.

The era of prompt magic is over. The era of context systems has begun. If you want your [[link:5-ai-automations-running-my-business]] AI agents to actually work in production, this is where you focus.

Start small. Pick one agent. Ask yourself before every LLM call: does it have what it needs? If not, give it. That single question will improve your AI more than any prompt template ever could.

Frequently Asked Questions

Is prompt engineering completely obsolete now?

No. Prompts still matter—they set intent and define behavior. But prompt engineering alone can't handle production AI systems. You need the full context engineering approach: memory, tools, retrieved data, and dynamic assembly. Think of prompts as one ingredient, not the whole recipe.

How much context is too much?

When model performance degrades. Research shows "context rot"—accuracy drops as context grows. The goal is optimal context, not maximum context. Start minimal, add only what's necessary, and monitor for recall issues.

What tools do I need for context engineering?

You need four things: a vector database for memory, a retrieval system for documents, tool definitions for external actions, and observability so you can see what context each call actually received. Tools like Pinecone or Chroma work well for the vector store. LangChain and LlamaIndex are popular frameworks.
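For the vector-store piece, a minimal Chroma sketch looks roughly like this (the collection name and documents are invented; check the current Chroma docs for exact API details):

```python
# Minimal memory/retrieval sketch with Chroma (pip install chromadb).
# Collection name and documents are invented; API details may differ across versions.
import chromadb

client = chromadb.Client()  # in-memory; use chromadb.PersistentClient(path=...) to persist
memory = client.get_or_create_collection("agent_memory")

memory.add(
    ids=["pref-1", "fact-1"],
    documents=[
        "User prefers short, informal email replies.",
        "Jim is a key business partner; meetings with him take priority.",
    ],
)

hits = memory.query(query_texts=["how should I reply to Jim?"], n_results=2)
print(hits["documents"][0])  # the retrieved snippets to place into the context
```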

Does context engineering work with all LLMs?

Yes, but results vary. Larger context windows (Claude, GPT-4) give you more room. But even 200K-token models suffer from context rot. The principles apply everywhere: right information, right format, right amount.

Where do I start?

Pick one agent you're building. Before every LLM call, ask: "Can this model plausibly complete the task with what I'm giving it?" If not, identify what's missing. Add it. Format it well. That's context engineering.

Building AI agents?

I write about practical AI development from hands-on experience. No hype, no magic templates—just what actually works.

See how I use AI agents →