Three months ago, we started running Pattern's marketing function differently. Not by hiring differently or changing strategy — by deploying an agentic AI system to handle the execution layer of our content pipeline, with human oversight at the points that matter.
The outputs go out under Pattern's brand. Blog posts, LinkedIn content, newsletter. Customer-facing, representing how we think about the work we do.
This is an honest account of what the governance model actually looks like, and what we've learned running it.
What the system does
The pipeline generates weekly content anchored to a content calendar — a blog post, two LinkedIn posts, and a fortnightly newsletter. It pulls context from the calendar entry, uses examples from our existing published work as style references, and produces drafts for review.
The drafts go through a human review step before publication. That step involves checking for factual accuracy, brand consistency, and whether the content reflects how we'd actually talk about these ideas in conversation. Things that don't pass are rewritten or regenerated.
Then the content publishes on schedule — blog to our CMS, social to Buffer, newsletter to Customer.io — through a separate automation layer.
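To make the shape of that flow concrete, here is a minimal sketch of the review gate sitting between drafting and publishing. It is an illustration under assumptions, not our actual implementation: the `Draft` class, the `review` and `publish_queue` functions, and the destination labels are hypothetical names, and no real CMS, Buffer, or Customer.io API is called.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    APPROVED = "approved"
    REJECTED = "rejected"


# Hypothetical labels for where each content type is sent once approved.
DESTINATIONS = {
    "blog": "cms",
    "linkedin": "buffer",
    "newsletter": "customer_io",
}


@dataclass
class Draft:
    kind: str                      # "blog", "linkedin", or "newsletter"
    body: str
    status: Status = Status.DRAFT
    review_notes: list[str] = field(default_factory=list)


def review(draft: Draft, approved: bool, note: str = "") -> Draft:
    """Record the human reviewer's decision on a draft."""
    draft.status = Status.APPROVED if approved else Status.REJECTED
    if note:
        draft.review_notes.append(note)
    return draft


def publish_queue(drafts: list[Draft]) -> list[tuple[str, Draft]]:
    """Return (destination, draft) pairs for approved drafts only.

    Unreviewed and rejected drafts are never queued.
    """
    return [
        (DESTINATIONS[d.kind], d)
        for d in drafts
        if d.status is Status.APPROVED
    ]
```

The point of the design is that publishing reads only the approval status, so nothing reaches a channel unless a reviewer has explicitly approved it.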
What works well without much intervention
The structural work: consistent posting schedule, consistent format, consistent style. These are things humans do inconsistently at volume — the sixteenth LinkedIn post of the year gets less care than the first. The AI doesn't have this problem.
First drafts that reflect the content calendar's intent reasonably well. The system knows the pillar, the audience, and the angle. The output is usually in the right territory, and the editing required tends to be refinement rather than reconstruction.
The coordination between pieces — making sure the LinkedIn post reflects the blog, that the newsletter connects the two posts — happens without manual tracking. For a one-person operation, that alone is worth something.
Where it still needs more work
Tone calibration has been the ongoing challenge. AI-generated content has specific failure modes that aren't obvious until you've read enough of it: sentences that are slightly too shaped, slightly too clean, slightly too aphoristic. The kind of thing that reads well in isolation but feels over-produced in context. We've updated the style examples and system prompts as we've identified these patterns, and the output has improved. But it's iterative work, not a one-time fix.
Specificity is harder. The system produces confident-sounding content that's sometimes slightly vaguer than we'd like — it generalises where a human writer with deep context would reach for a specific example. This is the kind of thing you catch in review, but it requires the reviewer to know the material well enough to notice.
The review step itself has more friction than "human oversight" sounds like it should. Reading content carefully enough to evaluate it properly takes time. The efficiency gains are real, but they come from what the system handles without intervention, not from making review faster. That's worth naming clearly, because nominal review is worse than no review: it signals oversight without providing it.
What we've learned about governance
The governance model that's emerged is less formal than we expected going in, and more specific.
We don't have a comprehensive checklist. We have a small number of things we actively check: factual accuracy, tone against specific failure modes we've learned to recognise, and whether the argument holds up on a second reading. Everything else is a judgement call by the reviewer.
That works because the reviewer is close to the work. It would be a different problem if review had to be delegated to someone without that context, which is a constraint worth acknowledging.
The most useful governance input has been tracking what gets edited, not what passes. When we rewrite something during review, we note what changed and why. Three months of those notes reveal a pattern. In the early weeks, edits were structural: whole sections reworked, angles shifted. Now they're more targeted: a sentence loosened here, a generalisation replaced with a specific there. That shift tells us the system is better calibrated than it was, and it tells us where the remaining gaps are.
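An edit log like this needs very little structure to be useful. The sketch below shows one way it could be kept and summarised, assuming each note is tagged by hand as "structural" or "targeted". The `EditNote` fields, the `edit_mix_by_month` function, and the four-week month are assumptions made for illustration, not a description of our tooling.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class EditNote:
    week: int           # weeks since launch, starting at 1
    kind: str           # "structural" or "targeted" (hypothetical labels)
    what_changed: str
    why: str


def edit_mix_by_month(notes, weeks_per_month=4):
    """Count edit kinds per month so a shift in the mix is visible over time."""
    counts = defaultdict(Counter)
    for note in notes:
        # Weeks 1-4 fall in month 1, weeks 5-8 in month 2, and so on.
        month = (note.week - 1) // weeks_per_month + 1
        counts[month][note.kind] += 1
    return {month: dict(kinds) for month, kinds in sorted(counts.items())}
```

Comparing the counts month to month is enough to see whether edits are moving from structural to targeted; the free-text `what_changed` and `why` fields are where the remaining gaps actually show up.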
What this suggests more broadly
Running AI in customer-facing functions isn't mostly a technology problem. It's a calibration problem. The technology can produce competent output across a wide range of situations. The work is in defining precisely what "good" means for your specific context, building feedback loops that surface where the system falls short, and being honest about what the review step actually involves.
The governance doesn't have to be elaborate. It has to be real.
We'll share more on the performance data once we have enough to say something useful. For now, this is an honest snapshot of where things stand three months in: still calibrating, but working well enough to keep running.