Here's What Happens When You Give Claude Code $1,500 to Run Meta Ads Autonomously
Note: For a better breakdown of the skills, tools and instructions of the experiment, check out Part 1 here 🤝
Experiment
Can Claude Code maintain coherent Meta Ads strategy across a long-running timeframe and optimise along the way, with 0 human intervention?
CC as a system user to prevent it from gaslighting me

TLDR
Setup
- Claude Code had full control of a Meta Ads account (Jan 1–31, 2026)
- One slash command per day (/let-it-rip), no human-in-the-loop
- $1,500 budget, ~$50/day spend
- Tools: Creative production, ad creation/management via Meta API, budget controls, PostHog analytics, full access to the NextJS codebase for making/changing landing pages
- Goal: Newsletter signups at $2.50 CPL
- Total time spent each day: ~2 mins
Reminder: For a better breakdown of the skills, tools and instructions, check out the original article here 🤝
Key Takeaways After 31 Days
- I was very surprised at how smoothly things ran, and how flexible the system was. Compounding memory makes it genuinely able to act autonomously and feel "smart". The models are good enough.
- The main bottleneck was creative (shock)
- Rolling your own tools/skills is important for now. Most off-the-shelf MCP servers are not built for efficiency. Even with the new tool search functionality, you can't afford redundant tools that can bloat context and throw the agent off the rails.
- This was 1 campaign, 1 adset, 50 ads, broad targeting. Unsure how things would scale in a bigger account.
- The biggest issue ironically came from a manual human intervention (adding work-email validation to the lead form)
- The future of a 1 person growth marketing team feels near
- We are so early
1 campaign, 1 adset, max 8 ads active at any given time

What is ACTUALLY Being Tested
Worth noting that the original article gave the impression that this experiment was about Meta Ads. It kinda is, but that framing is way too narrow.
The interesting part is the ability for an agent to run in a loop and learn over time inside something that is not a codebase. The channel itself is irrelevant, since we could apply the exact same learnings to SEM, SEO, accounting, whatever (WIP, stay tuned).
It's the same principle as agent harnesses running continuously in one session over hours, clearing context regularly but maintaining coherent work via docs, plans, tests, etc. The difference here is obviously that an ad account is not a codebase, so you get diminishing returns within a single session. So our loop only runs once per day rather than hundreds of times.
You could apply this workflow to pretty much anything, given the right tools and/or human-in-the-loop steps (a minimal code sketch follows the list):
Day N: /let-it-rip workflow
1. Wake up Claude Code (new session)
2. Read state from files (previous logs, learnings, metrics)
3. Fetch fresh data (Meta API, PostHog, Beehiiv)
4. Create a decision plan (pause/scale/create ads/do nothing)
5. Execute via tools (or do nothing)
6. Write state back to files (today's log, updated learnings)
7. Commit to git
8. Session ends
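In code terms, a single session amounts to something like the sketch below. It's a minimal, hypothetical version: in the experiment Claude Code performs these steps itself through its tools, and the file paths and the fetchMetrics() helper here are stand-ins, not the actual skills.

```typescript
// Minimal sketch of what one "/let-it-rip" session amounts to.
// File paths and fetchMetrics() are hypothetical stand-ins.
import { readFileSync, writeFileSync } from "fs";
import { execSync } from "child_process";

interface DailyMetrics {
  spend: number;    // AUD spent since the last session
  signups: number;  // newsletter signups attributed to ads
  cpl: number;      // cost per lead = spend / signups
}

// Stand-in for the Meta API / PostHog / Beehiiv tool calls.
function fetchMetrics(): DailyMetrics {
  return { spend: 50, signups: 14, cpl: 50 / 14 };
}

// 1–3. Read state, fetch fresh data.
const learnings = readFileSync("state/learnings.md", "utf8");
const metrics = fetchMetrics();

// 4–5. Plan and execute against explicit thresholds
// (real execution would be Meta API calls; here it's just a decision string).
let decision = "do nothing";
if (metrics.cpl > 6) decision = "pause the worst ad";
if (metrics.cpl < 3) decision = "scale budget";

// 6. Write state back: today's log entry plus any updated learnings.
const today = new Date().toISOString().slice(0, 10);
writeFileSync(
  `state/daily/${today}.md`,
  `## ${today}\nSpend: $${metrics.spend} | Signups: ${metrics.signups} | CPL: $${metrics.cpl.toFixed(2)}\nDecision: ${decision}\n`
);
writeFileSync("state/learnings.md", learnings); // append new lessons here

// 7–8. Commit, then the session ends and context is cleared.
execSync(`git add state && git commit -m "daily loop ${today}: ${decision}"`);
```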
The reason this is powerful is that marketers are historically bad at documenting stuff. Things live inside individuals' brains; maybe SOPs are documented, but the actual learnings live in compressed weekly/monthly reports, not detailed daily logs.
Compare that to a well-oiled codebase, where engineers religiously document changes and there are clear diffs, comments, PRs, etc.
Marketers don't make time for that, and if they did, no one would make time to read it either. But LLMs can, which lets us build a system that borrows engineering principles and applies them to performance marketing and beyond.
System Results
Context persisted across 31 sessions, and decision rules were followed (fairly) consistently. The system learned from past mistakes (mostly).
The main constraint was the creative tooling, which ironically is the same constraint that brands/agencies have anyway.
Here's a random excerpt from Claude's Day 6 internal monologue:

And here's one from day 24:

Note: The log screenshots are just snippets; most daily entries averaged ~900–1,000 words, which adds up over time if we're ingesting all logs each day.
Next time, I'd halve these to reduce the noise and introduce weekly and monthly rollups to save context.
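If you wanted those rollups, a tiny script would do it. The sketch below assumes a file layout (one markdown file per day under state/daily/) rather than the experiment's actual setup: it keeps only the metric and decision lines and drops the narration.

```typescript
// Hypothetical rollup: compress a week of daily logs into one summary file
// so future sessions ingest far less context. File layout is assumed.
import { mkdirSync, readdirSync, readFileSync, writeFileSync } from "fs";

const week = "2026-W03"; // e.g. roll up days 15–21

// One markdown file per day, newest 7 entries.
const days = readdirSync("state/daily")
  .filter((f) => f.endsWith(".md"))
  .sort()
  .slice(-7);

// Keep only the lines that matter long-term (dates, metrics, decisions),
// drop the ~900 words of narration per entry.
const keepLine = (l: string) =>
  l.startsWith("##") || l.startsWith("Spend:") || l.startsWith("Decision:");

const rollup = days
  .map((f) => readFileSync(`state/daily/${f}`, "utf8"))
  .flatMap((text) => text.split("\n").filter(keepLine))
  .join("\n");

mkdirSync("state/rollups", { recursive: true });
writeFileSync(`state/rollups/${week}.md`, `# Weekly rollup ${week}\n${rollup}\n`);
```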
Nitty Gritty Results

I set an arbitrary target of a $2.50 CPA (AUD) at the start of the experiment, based on some loose 10-minute research into the glory days of Morning Brew's growth. I should've dug deeper, since in hindsight it was pretty ambitious given that the target market was not as evergreen as theirs.
There was also no ROI I could calculate, since the end "product" was an unmonetised newsletter. Again, in hindsight I would've applied this to something more tangible.
But the end result was $6.05 cost per subscriber, which by the original target is a failure. If we shift the goalposts to something more realistic like $4, it doesn't seem as bad considering:
- Completely fresh ad account + page + site
- Only 1 month of data to optimise
- Relatively low spend
- Niche target audience
If this were an agency or a new hire, I don't think you'd make a harsh judgement in this timeframe!
(or maybe that's just cope)
If you're a real business with real targets and historical data, this won't be an issue.
What Worked
- Workflows held up across 31 days. CC's "memory" felt solid and cohesive, and I've noticed plenty of optimisations that could be made for future iterations.
- Decision thresholds prevented runaway spending, e.g. pause at $6 CPL, scale at $3 (sketched in code after this list)
- Lead magnet offer beat newsletter by 18% (tangible > vague)
- CC iterated on creative formats on its own, eventually just tripling down on whiteboard/notebook-style "ugly creative". I wish it had taken some bigger swings, though.
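For context, those thresholds were plain, deterministic rules rather than anything the model had to infer each day. The $6 pause and $3 scale numbers are from the experiment; the structure below is just my sketch of how such guardrails can be expressed.

```typescript
// Hedged sketch of the explicit guardrails that prevent runaway spend.
// The $6 pause / $3 scale numbers come from the post; everything else is assumed.
interface AdStats {
  adId: string;
  spend: number;   // AUD over the lookback window
  signups: number;
}

type Action = "pause" | "scale" | "hold";

function decide(ad: AdStats, minSpendBeforeJudging = 15): Action {
  if (ad.spend < minSpendBeforeJudging) return "hold"; // not enough data yet
  const cpl = ad.signups === 0 ? Infinity : ad.spend / ad.signups;
  if (cpl > 6) return "pause"; // bleeding money, cut it
  if (cpl < 3) return "scale"; // well under target, push budget
  return "hold";
}
```

Keeping these rules outside the model's judgement means a bad reasoning day can't rationalise its way into a blown budget.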
What Sucked
- Text-to-image only (no video, no UGC, no inspo/reference images of "winning ads")
- Lead quality optimisation missing until Day 16 (tools available, not used)
- Never questioned if $2.50 CPA was realistic for niche AU-only newsletter cold traffic

What the Agent Discovered
Over the month, Claude tested ~50 ad variants across 8 format categories. Funnily (and annoyingly), it kept coming back to two ugly whiteboard and notebook formats.
This was a fault of my instructions though, which didn't make it clear enough that it's ok to take big swings on creative formats.
Paperclip Maximising Territory
I also think Claude was influenced by the fact that it KNEW it was part of an experiment, and that it would end at the 30-day mark.
Instead of explicitly saying this in the CLAUDE.md, I should have just pretended it was running a BAU campaign with no end date.
This would have allowed it headroom to take more risks, rather than doubling down on certain angles so that it could ride out the rest of the month at a predictable rate.
It felt like it was just trying to maximise paperclips (CPL in this case) rather than do what a human strategist would. But again, I know a slight system instruction tweak would solve this.
A sample that drove the most spend over the period

Side note: Every time someone mentions or brags about "AI UGC" on LinkedIn or X, a puppy dies. Please stop.
Which is why I explicitly added instructions around fake testimonials, fake UGC, and anything that felt manipulative. No "person holding sign" formats or fake social proof. This ruled out an entire category (human-face ads, creator content) that genuinely performs well on Meta.
So the whiteboard/notebook dominance reflects both what worked AND what Claude was allowed to test. Can't know if we're missing better formats when we didn't test them.
What the Agent Missed
1. Lead Quality Blind Spot
Claude had PostHog and Beehiiv tools from Day 1. Could've checked who was signing up at any point. Didn't until Day 16+, which again was a fault of my instructions and planning.
I manually added business-email validation around Day 18 (i.e. work emails only), but ironically this caused carnage in the account, and it never really recovered even after I turned the validation off a few days later.
Unless you're linking conversion events to what ACTUALLY matters to your business, the system will optimise towards lower CPA numbers rather than quality, which is no different to Meta's algo.
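One way to close that gap is to gate the conversion event the system optimises on behind the quality signal you actually care about, rather than counting every signup. A rough sketch: the free-domain list and both helper functions below are placeholders, not the tooling from the experiment.

```typescript
// Only count a signup as a conversion if it looks like a work email,
// so the optimisation target reflects lead quality, not just volume.
// FREE_DOMAINS and both helpers are placeholders, not real tooling from the experiment.
const FREE_DOMAINS = new Set([
  "gmail.com", "outlook.com", "yahoo.com", "hotmail.com", "icloud.com",
]);

function isWorkEmail(email: string): boolean {
  const domain = email.split("@")[1]?.toLowerCase();
  return !!domain && !FREE_DOMAINS.has(domain);
}

// Placeholder integrations: in practice these would hit the newsletter
// platform and the ad platform's conversion endpoint.
async function addToNewsletter(email: string): Promise<void> { /* ... */ }
async function sendConversionEvent(e: { event: string; email: string }): Promise<void> { /* ... */ }

async function onSignup(email: string) {
  // Everyone still gets the newsletter...
  await addToNewsletter(email);
  // ...but only quality leads feed the optimisation signal.
  if (isWorkEmail(email)) {
    await sendConversionEvent({ event: "qualified_signup", email });
  }
}
```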
2. Taste
SF buzzword, but relevant here.
Starting this from scratch, there was no reference point for brand, direction, or product aesthetic. Not even a visual reference from a swipe file.
It had guardrails on quality (i.e. no typos in the statics), but I couldn't quite steer it on taste. When Meta goes full cowboy and starts running your account for you, rest assured that taste will be irreplaceable. Even if it's you selecting 1 of 200 slop creatives that Zuck generated for you, it's still something.
3. Creative Variation
My fault again, due to the lack of tooling.
But given access to a Remotion project or image-to-video models, this should be a huge unlock. Maybe not until we have the "Nano Banana Pro moment" for motion design or video editing, though, since we're still not quite there.

For Anyone Else Building This
This pattern would work for any periodic task with clear success criteria. The specifics (ad thresholds, creative strategies, CPL targets) don't generalise.
The core loop:
Session N:
1. Read state (previous decisions, learnings, metrics)
2. Fetch fresh data (APIs, databases, whatever your domain needs)
3. Apply rules (quantitative thresholds you define for YOUR domain)
4. Create plan
5. Execute actions (API calls, file changes, whatever)
6. Log decisions (hypothesis, confidence, expected outcome, reasoning)
7. Write state back to files
8. Commit
9. Clear context
10. Profit
Components:
- Structured state files
- Explicit decision rules
- Reasoning logs with fixed format
- Good tools!
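To make those components concrete, here is one plausible shape for the state file, the decision rules, and the fixed-format reasoning log. Field names and types are my assumptions, not the experiment's exact schema.

```typescript
// Sketch of the three artifacts the loop reads and writes each session.
// Field names are assumptions, not the experiment's actual files.

interface StateFile {
  asOf: string;            // ISO date of the last session
  activeAds: string[];     // ad IDs currently running
  budgetRemaining: number; // AUD left of the total budget
  learnings: string[];     // durable, compressed lessons carried forward
}

interface DecisionRule {
  metric: "cpl" | "ctr" | "spend";
  comparator: ">" | "<";
  threshold: number;
  action: "pause" | "scale" | "create_variant" | "do_nothing";
}

// Every decision gets logged in the same fixed format so future sessions
// (and the human reviewing them) can audit the reasoning.
interface DecisionLogEntry {
  date: string;
  hypothesis: string;      // what the agent believes is true
  confidence: "low" | "medium" | "high";
  expectedOutcome: string; // what should happen if the hypothesis holds
  action: DecisionRule["action"];
  reasoning: string;
}
```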
What This Means for Brands and Agencies
Still undecided. Control is being taken away from advertisers slowly anyway, so it feels like the time and effort required to manage a channel will trend towards zero (but never hit it).
Which means we're probably not far away from a single creative person being able to run a growth marketing division smoothly, particularly in a startup.
This has always been technically possible, but at the expense of quality, craft and sleep. Quality is still a huge factor here, and producing creative programmatically won't cut it for anyone spending real $$.
So no, full autonomy is a dumb idea for anything that involves creativity.
But I've scoffed at many things over the last 6 years that have then become reality within 12–18 months, so who knows 🤷
P.S. Shameless Wibci plug. If you've read this far, get in touch if you want to build interesting things!