What are AI Agents?

1/20/2026

The version of AI agents that gets sold back to us is usually too clean. A little too magical. A little too confident about the idea that software can simply “go do the thing.”

The more useful version is less shiny, but more interesting: systems that can hold a goal, move across tools, make a few decisions, check the result, and keep going without needing a human to prompt every next step. That basic idea has been around for a while. What’s new is that it’s no longer sealed inside engineering-heavy workflows. Until recently, if you wanted to build something agentic, you needed to understand APIs, orchestration logic, memory, tool calls, error handling, and all the brittle little handoffs that make a multi-step system work. The concept was easy enough to explain. The implementation cost was the wall.

That's changing. And the way it's changing says something.

What caught my attention isn't any single product announcement. It's how many different surfaces are suddenly exposing agent-like behavior to people who would never have called themselves AI builders six months ago. Notion is building it into workflows. Zapier has an agent layer. Replit is letting you spin up agents that can write, run, and debug code in a loop. Cursor has moved from autocomplete to something closer to a collaborator that takes multi-step instructions. OpenAI's GPT Actions and the Assistants API made agent building approachable enough that product teams started prototyping without dedicated ML engineers. Anthropic's Claude supports tool use. Google's Gemini in Workspace acts across docs and calendar. And then there is the more compositional, roll-your-own end of the spectrum: LangChain, LlamaIndex, CrewAI, AutoGen, and Haystack, frameworks that let you build multi-agent systems with varying degrees of control and abstraction, depending on how deep you want to go.

The variety matters. It's not one platform winning. It's the entire layer becoming porous.

The more interesting signal may be this: we're watching the orchestration of tasks become a design surface. Not just the generation of content, but the sequencing of action — and that distinction is significant for anyone working in product, media, or creative systems.

Until recently, AI felt most legible to designers and creative technologists as a generation tool. You prompt, it produces, you judge. The human is the orchestrator. What agents introduce is a model where the AI can hold a goal, break it into steps, use tools, check its own output, recover from failure, and hand off to another process — with the human moving upstream into something closer to direction-setting and review.
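To make that loop concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how any of the products or frameworks above work: the tools (`search_notes`, `summarize`), the `Step` and `RunLog` types, and the hard-coded `default_plan` are all hypothetical stand-ins, and a real agent would ask a model to produce the plan instead of returning a fixed one.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


# Hypothetical tools: plain functions standing in for real integrations.
def search_notes(topic: str) -> str:
    notes = {"launch": "Launch is planned for March. Budget is pending."}
    if topic not in notes:
        raise ValueError(f"no notes for {topic!r}")
    return notes[topic]


def summarize(text: str) -> str:
    # Crude stand-in for a summarizer: keep only the first sentence.
    return text.split(".")[0] + "."


TOOLS: dict[str, Callable[[str], str]] = {
    "search_notes": search_notes,
    "summarize": summarize,
}


@dataclass
class Step:
    tool: str
    arg: str                        # literal argument, or "$prev" for the previous result
    fallback: Optional[str] = None  # alternative argument to try if the first one fails


@dataclass
class RunLog:
    results: list[str] = field(default_factory=list)
    events: list[str] = field(default_factory=list)


def default_plan(goal: str) -> list[Step]:
    # Assumption: a fixed two-step plan. A model would normally generate this.
    return [Step("search_notes", goal, fallback="launch"), Step("summarize", "$prev")]


def try_tool(tool: str, candidates: list[str], log: RunLog) -> Optional[str]:
    # Use the tool, check the output, and recover by trying the next candidate.
    for arg in candidates:
        try:
            output = TOOLS[tool](arg)
        except Exception as err:
            log.events.append(f"{tool}({arg!r}) failed: {err}")
            continue
        if output.strip():
            return output
        log.events.append(f"{tool}({arg!r}) returned nothing")
    return None


def run_agent(goal: str, planner: Callable[[str], list[Step]] = default_plan) -> RunLog:
    log = RunLog()
    previous = ""
    for step in planner(goal):  # break the goal into steps
        first = previous if step.arg == "$prev" else step.arg
        candidates = [first] + ([step.fallback] if step.fallback else [])
        output = try_tool(step.tool, candidates, log)
        if output is None:      # nothing worked: hand off instead of guessing
            log.events.append("handed off to a human")
            return log
        log.results.append(output)
        previous = output
    log.events.append("done")
    return log


if __name__ == "__main__":
    run = run_agent("pricing")
    print(run.events)
    print(run.results)
```

The human's role in this sketch is limited to choosing the goal and reading the log afterward, which is the "upstream" position described above. Everything hard about real agents lives in the parts this example fakes: the planner, the quality check, and the decision about when to stop and hand off.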

One way to read this is purely technical: better models, better tool-calling reliability, lower latency, cheaper inference. All true.

But the larger shift may be about who gets to build systems that act — and what that does to the boundaries between disciplines. If you're a designer who can write a clear goal specification and connect a few tools, you can now prototype an agent that does real work. If you're a media person who understands workflow deeply, you may be closer to building your own production infrastructure than you think. The implementation gap between "I have an idea for how this should work" and "this thing now works" has compressed in ways that aren't fully legible yet, even to people inside it.

There are real instabilities worth holding onto here. Agent reliability is inconsistent: systems that look impressive in demos fail in hard-to-anticipate ways once they reach production, or reach users who don't know how to compensate. The "agentic" framing is also getting stretched in every direction, applied both to things that are barely multi-step and to things that are autonomous in ways that deserve more careful discussion. Telling the difference takes having used enough of them to calibrate your own sense of what's real.

The other thing I keep noticing is that the frameworks are proliferating faster than the mental models for using them well. LangChain vs. LlamaIndex vs. CrewAI vs. AutoGen is not just a technical question — it's a question about how you think about agent architecture, how much visibility you want into what's happening, how much you trust abstraction layers you didn't write. Those choices are starting to feel like design decisions, not just engineering ones.

What I'm watching now is whether the people who've been closest to user behavior, content systems, and workflow design — designers, product people, media builders, creative technologists — start to enter this space with enough context to shape it, rather than arriving after the defaults have already been set by people who optimized for different things.

The tools are accessible enough now. The question is who shows up with the right combination of taste, domain knowledge, and technical curiosity to build with them in ways that are actually worth building.

That feels like an open question. And probably the more important one.