AI Coding Showdown: The Good, Bad, and Ugly

Discover how AI coding agents perform in real-world Edge Delivery Services projects. Learn about their strengths, challenges, and the impact on developer experience. This session from Adobe Developers Live 2025 showcases practical automation versus ‘vibe coding,’ highlighting where AI excels and where it struggles. Join Lars Trieloff as he tests agentic engineering, comparing leading AI coding agents to reveal insights into the future of digital experiences.

Transcript

Hello everyone.

Are we able to see the right things? So is this a developer conference? Yes.

Have we seen anything other than PowerPoint or Keynote? Have we seen any IDEs, text editors, terminals today? Not yet. Let me fix that.

You’re not going to see any PowerPoint slides today, because we are here for the coding agent showdown. And you know, one of the things that I really hate about these conferences is when the presenters make me do forced audience participation, which I’m going to do right now. So everyone say hi to AI. Hi to AI. And when that happens, AI will say hi to you. One of the cool things about running a presentation from the terminal is that I could fulfill my childhood dream of having a whippersnapper of an assistant, which in this case is a small program called Co-presenter. So, dammit, here it is. So hello Adobe Developers Live.

AI is saying hi to you because you said hi so nicely. Thanks everyone.

We are not talking about just any kind of AI. We are talking about the fun, sparkly AI, which is agentic.

And if we look at what this is, basically everything started with smart auto-complete. This was the original Copilot, right? And it blew me away. And then we had features like chatting with your code base, being able to ask questions. Awesome. Then we had agents built right into the IDE, and you could ask them to do things. And now we have standalone agents in the CLI. Basically the agent is saying, I don’t need this IDE anymore. And after that we might even see agents in the cloud, but let’s not talk about that today.

So let’s see if my Co-presenter has anything to add.

Yeah, so let’s explore AI-assisted development together. And you’ve certainly been waiting for the inevitable architecture charts, so this is how I would do them. Let’s quickly talk about what an AI agent is. It’s basically two parts. On the one hand, in the local environment, you have an agent harness, which is essentially a command line program that is able to use different tools: it allows you to access your file system, it is able to run shell commands, and it is able to call MCP servers. All of this operates on your code base, and behind the scenes it makes API requests to a model that is sitting somewhere in the cloud. In this example, I think it’s Claude, but this could be a different model.
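Here is a minimal sketch of that harness loop in TypeScript. Everything in it is illustrative: callModel() is a hypothetical stand-in for the real API round trip, and the two tools are deliberately tiny; no vendor’s actual SDK looks exactly like this.

```ts
// Minimal agent-harness loop: ask the model, run the tool it asks for,
// feed the observation back, repeat until the model says it is done.
import { execSync } from "node:child_process";
import { readFileSync } from "node:fs";

type ToolCall = { tool: "read_file" | "run_shell"; arg: string };
type ModelReply = { done: boolean; text: string; toolCall?: ToolCall };

// Hypothetical stand-in for the cloud round trip; a real harness would
// POST the transcript to an LLM API and parse structured tool calls.
async function callModel(transcript: string[]): Promise<ModelReply> {
  return { done: true, text: `stub reply after ${transcript.length} turns` };
}

async function agentLoop(task: string): Promise<void> {
  const transcript = [task];
  for (let step = 0; step < 20; step++) {        // hard cap on iterations
    const reply = await callModel(transcript);
    transcript.push(reply.text);
    if (reply.done || !reply.toolCall) break;    // model considers itself done
    const { tool, arg } = reply.toolCall;
    const observation =
      tool === "read_file"
        ? readFileSync(arg, "utf8")              // file system access
        : execSync(arg, { encoding: "utf8" });   // shell access (YOLO!)
    transcript.push(observation);                // let the model see the result
  }
}

await agentLoop("add an archive page to the blog");
```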

So what does MCP stand for? Obviously, this is the Model Context Protocol.
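Agents typically learn about MCP servers from a small JSON config. The shape below follows the common mcpServers convention; the server name and package are invented for illustration.

```json
{
  "mcpServers": {
    "docs-search": {
      "command": "npx",
      "args": ["-y", "example-docs-mcp-server"]
    }
  }
}
```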

And I don’t want to talk about this in general terms. I want to talk about AEM Live Edge Delivery Services in particular. On the aem.live website, we have a nice page that summarizes most of what I’m going to talk about, which is aem.live/ai. You can open it and find a lot of tips for working with AI tools. So the first thing that I want to talk about is something that is called agents.md. And agents.md, or as Claude likes to call it, CLAUDE.md, is basically a thing that helps you prevent repetition. In a lot of cases, you need to tell the agent how to operate within your code base and how to use the right tools. This is the kind of thing you can put into agents.md: it will be included in every single prompt in every single session, and the coding agents will follow these instructions.

This is a really powerful thing, and it is something that we’ve been shipping as part of the AEM boilerplate. You can actually steal the one that we have on the Helix website. It’s working quite well, because we’ve been doing a lot of coding on that website using that particular agents.md.
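To give you a flavor, here is a minimal sketch of what an agents.md might contain; the specific rules are invented for illustration, and the one shipped with the boilerplate is more complete.

```md
# agents.md (illustrative sketch)

## Project
This is an AEM Edge Delivery Services site. Blocks live in /blocks,
one folder per block with a .js and a .css file.

## Rules
- Run `npm run lint` before committing, and fix what it reports.
- Start the local dev server with `aem up`, never with a custom server.
- Open pull requests with the `ai-generated` label and include a test URL.
```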

And now my question for Claude is, hey, why does every agent but Claude respect agents.md? And Claude has an explanation here. I think the real explanation is that they started this thing and they don’t want people to switch to competing agents. But you never know. Speaking of things that Claude invented, there’s something that launched just four weeks ago, which is called skills.md.

And we heard about skills a bit earlier. It’s an incredible approach, right? Because one of the things you learn when you’re creating an agents.md is that you try to cram in everything you know about coding in your project. And that might be just too much for the task at hand. Too much means you’re basically running out of context, and when you run out of context, your agent gets dumber and dumber, and you don’t want that. Skills allow you to compartmentalize the knowledge.
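Structurally, a skill is just a folder with a SKILL.md file whose front matter tells the agent what the skill is for, so it can decide when to load it. Here is an invented example in roughly that layout:

```md
---
name: create-pull-request
description: How to open a pull request in this repo, including labels and test URLs.
---

# Creating a pull request

1. Push the branch and open the PR from the command line.
2. Apply the `ai-generated` label.
3. Include a test URL so reviewers can see the change running.
```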

And my colleague Sean, who is somewhere here in the audience answering questions, did the work of creating an incredible set of skills for Edge Delivery Services that are part of the website. The basic idea is that you have skills like searching for documentation, skills like writing tests, skills like creating a pull request. The coding agent will then see, hey, this is the list of all the skills that I could load. And only when it comes to fulfilling the task is the actual skill that is needed going to be loaded.

So, what does our co-presenter say about skills.md and other agents? Given that this thing is only four weeks old, it’s not a surprise that, number one, no other agent supports it, and number two, Claude doesn’t know of any other agent supporting it. When in reality, we built a small little tool called Upskill. Upskill allows you to first install, or steal, skills from another repository, and then train any other agent that supports agents.md to use these skills. And we are going to use this a bit later.

So, my favorite word in this presentation is YOLO, or as Claude Code likes to call it, dangerously skip permissions. You’ve seen Cedric’s demos, right? ChatGPT was very careful, asking for every step along the way: hey, Cedric, are you okay with me making this API call? And the answer is, if you ask me, sure, I’m fine with it. Just go ahead, just do your worst. This is exactly what the dangerously-skip-permissions option allows you to do. It obviously puts you in the danger zone, because you need to trust somebody else. But this is also where the really fun work begins.

Under normal operations, the agent will ask you for permission for anything that is potentially sensitive or destructive. You will feel, on the one hand, very reassured, but also bored out of your mind. Because you think, hey, is my job now only to confirm that the agent can do the thing that I asked it to do five minutes earlier? And the answer is yes, that’s your job now.

Or you can just escape the sandbox and say, just run it, just run whatever you think is useful. And staying with the Western theme, have you ever seen a cowboy wear a seatbelt to the rodeo? I thought so.
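If you want to ride without the seatbelt too, the flags look roughly like this; names vary by agent and version, so check each tool’s --help before trusting my spelling.

```sh
# Claude Code: skip all permission prompts (the YOLO mode from the talk)
claude --dangerously-skip-permissions

# Gemini CLI: the flag is literally called YOLO
gemini --yolo
```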

Let’s add a small German lesson. There’s a word for you, which is called Herzblut.

And Herzblut is, in German, it means putting your heart and soul into the work. And when we are coding, this is what we do. We put our heart and soul into the work, and we obsess about every single detail, whether they are tabs or spaces for indentation, and all of that, and we like to have endless discussions.

When you’re coding with AI, drop that attitude. Your Herzblut isn’t worth anything here. The agent is a tool, the code that it produces is a tool, and it will get the job done.

And once you’ve gotten to the point where you think, well, I don’t really care how it’s doing its job, I just care that the job gets done, you get to the point where you think, I could run multiple agents in parallel, right? I could work on multiple jobs in parallel, or I could just ask multiple agents to perform the same task.

And this means you could have one terminal with Claude, another terminal with Codex, another terminal with Gemini, and they’re all working on the same code base, albeit on different tasks.

One of the things that is going to help you is a feature in Git, which is Git worktrees.

And Git worktrees basically allow you to separate the work by creating one folder for your working copy that is tied to one particular branch. And if you create one folder for every branch, then you can assign basically one agent to every single folder and have them work in parallel without interfering with each other.

Everything comes together in the normal way, using Git commits and merges. It’s a very powerful pattern when it comes to working with multiple agents.

For AEM, we added automatic detection for when the AEM CLI is running inside a Git worktree. It used to say, hey, I can’t deal with this, and just exit. Now it deals with it, and it smartly picks a different port, so that you can actually run multiple agents and multiple development servers in parallel.

So this was added in September.
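In practice, the setup is plain Git plus the AEM CLI; something along these lines, with made-up branch and folder names:

```sh
# One worktree (one folder, one branch) per agent
git worktree add ../blog-archive -b blog-archive
git worktree add ../dark-mode -b dark-mode

# Start a dev server in each; the AEM CLI notices it is inside a
# worktree and picks a free port on its own
(cd ../blog-archive && aem up) &
(cd ../dark-mode && aem up) &
```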

And what does Claude think is the coolest thing about multi-Clauding? It’s parallel execution, exactly. It’s just doing many things at once and being faster.

Let’s talk a bit about how agents are looking at the world. And Cedric gave us a master class in how agents are seeing your website.

So let’s talk about how agents are actually seeing your code base.

So coding agents are mostly text-based, right? So they can naturally see your source code. They can use the CLI and CLI output quite well. When it comes to background tasks, we have a couple of agents that are adding support for this, which is great; Claude Code is a good example. And we even have agents that are trying to add support for TUI applications, like this presentation tool that I’m running right now. Gemini has this. It’s super buggy, unfortunately.

Some agents have image input. The downside is that it can eat up your context very quickly, and there’s nothing yet for GUI apps. So if you want to help your agent see the world, you need to turn whatever challenge you have into a coding challenge.

What we did in AEM is basically two things. Number one, we taught the aem up command to forward browser logs, which means all the browser console logs are forwarded to the terminal. So if your agent starts the dev server, and it can run it in the background like Claude can, then it can actually see the error messages that are happening on the console. Additionally, if you’re doing web development, you should use either Puppeteer or Playwright and instruct your agent to write throwaway scripts to test and capture the page (there is a sketch of one below). And Claude is incredibly opinionated this time: when I asked it which one is better, it said Playwright is better.

So let’s quickly talk about safety. What I like to do is give the coding agents control, right? I like them to drive commits for me, and I like them to open GitHub issues for me, comment on GitHub issues, and open PRs for me. The thing that I don’t like is whatever slop they’re creating showing up under my name. So what I did is create two tools called AI-aligned Git and AI-aligned GH, which detect that it’s an AI agent running the command, intercept the call, and attribute it to the right agent.
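Coming back to those throwaway scripts: here is a minimal sketch of the kind of Playwright check an agent might write, with a placeholder URL and output file name.

```ts
// Throwaway check: load a page, collect console errors, grab a screenshot.
import { chromium } from "playwright";

const url = process.argv[2] ?? "http://localhost:3000/";
const browser = await chromium.launch();
const page = await browser.newPage();

const errors: string[] = [];
page.on("console", (msg) => {
  if (msg.type() === "error") errors.push(msg.text()); // keep only errors
});

await page.goto(url, { waitUntil: "networkidle" });
await page.screenshot({ path: "page.png", fullPage: true });
await browser.close();

console.log(errors.length ? errors.join("\n") : "no console errors");
```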

So one of the things that is important here: I don’t want my colleagues, my teammates, to waste their time, their attention, and their Herzblut on reviewing PRs that have been created by AI. If they see something where they say, hey, this looks ugly, just shoot it down. Agents don’t have any feelings.

And as we get to the most interesting part of the demo, let’s take a quick look at the agent-model table here. Normally when I give a presentation and the presentation bombs, I can only disappoint one company. I thought, why don’t we try to disappoint 12 companies at the same time? So I built something where we are running Claude, Codex, Gemini, Copilot, Cursor Agent, OpenCode, Qwen, Droid, Amp, Kimi, Crush, and Goose against each other. And I wanted to have as much variation in there as possible, so I gave every single agent harness a different model. Claude is using Claude Opus, which is not the normal modus operandi; that would be Claude Sonnet. For Codex, I’m using GPT-5 high. Gemini is using Gemini. Copilot, that’s GitHub’s Copilot, is using the cheaper Claude Haiku model. Cursor Agent is using a homegrown model, Composer 1, which is based on GLM, a Chinese model. OpenCode is using Grok Code Fast 1. Qwen is using Qwen. Droid is using Droid Core, which again is GLM. GLM is actually quite a nice model, I have to say. Amp is using Sonnet and GPT-5, so a combination.

Kimi is using Kimi K2.

Crush is using GLM again, and Goose is using GPT-OSS, which is OpenAI’s open-source model. So, let’s quickly talk about the problem that I have.

We have this fantastic AEM Live blog, which I’m using to share, from time to time, thoughts and ideas from my colleagues and our partners. We started it earlier this year, and by now we have so many blog posts that we actually need an archive. So, what’s the best way to get an archive? You know what? We just have a prompt for this. I have the prompt here in a human-readable version, and I have my 12 agents open here: Claude, Codex, Gemini. No idea what that is. Oh, that’s Cursor. Crush, Amp, Qwen, no idea, GitHub Copilot, Goose. Goose is the weirdest one. No idea what this one is. This is Droid, and again, I don’t know. Oh, this is again GitHub Copilot.

You know what? Let’s just fire this up. I wrote a small tool which is now broadcasting the same command into every single terminal, and we need to wait a bit here while it does this thing 12 times in a row. Then we can start looking over the shoulders of the different agents and see how they’re working and how they’re approaching the task. This one wants a confirmation. Perfect. Okay, so let’s dive into a couple of these.
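If you want to reproduce the broadcast trick without Lars’s tool, tmux can do something similar, assuming all the agents run in panes of one tmux window:

```sh
# Type once, send the keystrokes to every pane in the current window
tmux set-window-option synchronize-panes on

# ...paste the prompt into any pane, then switch it off again
tmux set-window-option synchronize-panes off
```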

So we see Composer 1 is already in the process of writing code, for whatever reason.

You know what? I’m just going to trust you on this, my friend.

We have Claude Code, which is analyzing this. We have Codex, which is uncharacteristically careful here. Gemini is debugging the universe. That’s great. Oh, I see. This one didn’t even get the prompt. Not a problem.

Crush, come on. So who is this? This is Amp.

Amp has a built-in to-do list, which is really nice. So you can see where it is in the process: it’s trying to test the implementation locally. That’s great. Thinking of linting. Perfect.

Let’s not disturb that one. What’s this? This is Qwen. Oh, I think I made a mistake and pasted the same command twice. That’s fine.

We have Copilot still working on it. And is Goose still working? Goose was one of the last agents that I tried, because for some reason you cannot divide 11 equally into columns and rows, so that’s how they got into the mix. Their claim to fame is that they are super enterprise-ready.

I can tell you the installation was hell.

So it might be true. So what do we have here? We have… yeah, let’s always allow… okay.

So we shouldn’t compare them just based on speed, because some of them have been waiting. But okay, it’s able to see this. And yeah, approve curl.

We are looking here at, again, GitHub Copilot. So while this is running, you know what, let’s see if there are actually already pull requests coming in. You would think the way I do this is that I open… so let’s quickly look at the open pull requests, and we can also do this without leaving the terminal. And look at this: Amp already submitted a pull request.
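For reference, the terminal route works with GitHub’s gh CLI; the PR number below is hypothetical.

```sh
# List open PRs on the repo
gh pr list --state open

# Inspect one, then shoot it down, feelings-free
gh pr view 123
gh pr close 123 --comment "Missing AI-generated label and test URL."
```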

We have Cursor Agent submitted one, and we might be able to try these, because I have the feeling that I’m going to get kicked off this stage any minute now. So let’s just see what the pull requests look like. And we can already see one here. This is Kimi. Kimi didn’t follow the instructions and didn’t put in the AI-generated label. That’s not very nice. It put emojis into the PR description. Also not very nice. It also forgot to put a test URL in here. So, without any feelings, we are just going to close the pull request. Goodbye. Next one, Amp. Amp is saying this is AI-generated. It’s saying my code is great. It’s picking the wrong model, by the way.

Look.

Okay, oh here. Preview link will be available. Let’s see. I think what they actually do is… yep, that doesn’t work. And we have Cursor Agent. Does Cursor Agent have a test URL for us? No.

Okay. So we can see faster is not necessarily better.

I can tell you I’ve been running this a couple of times. If we wait long enough, and I’m going to make sure that when I go off this stage I don’t close my laptop, you can follow the pull requests as they come in. Some of them are going to be reasonably good, and things that we can iterate on.

One of the biggest learnings for me is that we have, oh wait, I forgot about this.

So my biggest learning was that we have an incredible amount of innovation here. The category of coding agent itself is basically half a year old. New features are being rolled out every week; actually, new coding agents and new coding models are being rolled out every week. And one of the things that I found most surprising is that some of the newer agents and some of the smaller models are actually very, very capable for smaller and simpler tasks. One of the other tasks that I had was to implement light/dark mode. That’s a perfect and very simple task for a coding agent, and you can use a free model, you can use a cheap model.

And it’s definitely something that I would recommend trying out and playing around with. And with that, I’m going to say thank you for paying attention. Thanks for looking at the website and commenting on the pull requests when you see them come in. Next up is going to be my colleague Carl, who’s going to show you even more advanced stuff you can do with coding agents. Thank you for joining me again.

This session — Coding Agent Showdown: The Good, the Bad, and the Ugly — puts agentic engineering to the test. Lars Trieloff compares leading AI coding agents on real-world Edge Delivery Services projects, revealing where developer experience meets agent experience. Watch where AI shines, where it struggles, and what separates practical automation from “vibe coding.” Recorded live from San Jose.

Special thanks to our sponsors Algolia and Ensemble for supporting Adobe Developers Live 2025.
