Two weeks ago we wrote about what a marketing function actually costs and why that number is about to matter less. This is one piece of evidence from inside our own stack.

Lead scoring is a useful test case. It is the kind of capability most SMEs would have written off — too expensive to build properly, too inconsistent to do by hand, too dependent on judgement to outsource. So the work either does not happen at all, or it happens once in a spreadsheet and quietly stops being maintained six weeks later.

We rebuilt our version of it as an AI-assisted workflow that runs every time a contact engages with us. Here is how it is put together, what it costs to operate, and where the friction lives.

The constraint

We are a small studio. There is no full-time marketer scoring leads on a Tuesday afternoon, and there is no enterprise marketing-automation budget to spend on a vendor who would do it for us. Whatever scoring we ran had to be cheap to operate, light enough to maintain without a dedicated owner, and able to write straight into the systems of record we already run.

A traditional implementation would be a rules engine in the CRM with a points table and a quarterly committee meeting to argue about weights. We have used that pattern before. It works for a quarter and then it ages badly.

The problem

A scoring system has to weigh three different kinds of signal:

  1. Firmographic fit — is this the kind of company we work with (size, revenue, industry, geography)?
  2. Behavioural engagement — what have they actually done on the site, in email, with our content?
  3. Account intent — pricing-page visits, repeat visits, multiple contacts from one organisation engaging at once.

A points table treats all three the same way. A senior salesperson reading the same data does not — they trade signals off against each other ("the fit is weak but the intent is strong, so still worth a call"). The interesting bit of scoring is the trade-off, not the table.
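To make that concrete, the context handed to the model is grouped into those same three buckets. A minimal sketch of the shape is below; the field names are illustrative, not the exact properties in our payload.

```typescript
// Illustrative payload shape for one scoring call.
// Field names are examples; the real payload carries whatever
// D365 and Customer.io expose for the contact.
interface ScoringPayload {
  firmographic: {
    companySize?: number;       // employees
    annualRevenue?: number;
    industry?: string;
    country?: string;
  };
  behavioural: {
    pageViewsLast30d: number;
    emailOpensLast30d: number;
    contentDownloads: string[]; // titles of gated assets
  };
  intent: {
    pricingPageVisits: number;
    repeatVisitDays: number;
    engagedContactsAtAccount: number; // colleagues active at the same time
  };
}
```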

The solution structure

Sixteen nodes in n8n, end to end: a trigger on contact engagement, context fetches from D365 and Customer.io, one Claude call, a parse step, a lifecycle mapping, write-backs to both systems, and a Log Cost node.

The Claude call is the only AI step. Everything else is deterministic plumbing — fetch context from the two systems of record, give Claude a structured payload, take the JSON it returns, write the score back to both systems.
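A minimal sketch of that single AI step, assuming the call goes straight to the Anthropic Messages API from an n8n Code or HTTP Request node and reuses the payload shape above; the model ID and max_tokens here are assumptions, not the exact values in our workflow.

```typescript
// Sketch of the one Claude call. Assumes an API key in the environment;
// not the exact node configuration we run.
async function scoreContact(payload: ScoringPayload, systemPrompt: string) {
  const res = await fetch("https://api.anthropic.com/v1/messages", {
    method: "POST",
    headers: {
      "x-api-key": process.env.ANTHROPIC_API_KEY!,
      "anthropic-version": "2023-06-01",
      "content-type": "application/json",
    },
    body: JSON.stringify({
      model: "claude-sonnet-4-5",   // assumed model ID
      max_tokens: 300,              // the response is roughly 150 tokens of JSON
      system: systemPrompt,
      messages: [{ role: "user", content: JSON.stringify(payload) }],
    }),
  });
  const data = await res.json();
  return data.content[0].text as string; // raw JSON string, parsed downstream
}
```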

The system prompt sits at around seventy-five lines. It tells Claude what the three signal categories are, how a senior salesperson trades them off against each other, what to do when data for a category is missing, and exactly what shape of JSON to return.

The output is three fields: a score from 0 to 100, a two-to-three-sentence explanation, and the three signals that mattered most. The score maps to a lifecycle stage — subscriber, lead, marketing qualified, sales qualified, opportunity — which is what D365 and Customer.io actually act on.
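The response contract and the stage mapping look roughly like this; the thresholds shown are placeholders, not the cut-offs we actually run.

```typescript
// Shape of what Claude returns and how a score maps to a lifecycle stage.
interface ScoreResult {
  score: number;                          // 0 to 100
  explanation: string;                    // two to three sentences
  topSignals: [string, string, string];   // the three signals that mattered most
}

type LifecycleStage =
  | "subscriber" | "lead" | "marketing_qualified"
  | "sales_qualified" | "opportunity";

// Placeholder thresholds for illustration only.
function toStage(score: number): LifecycleStage {
  if (score >= 90) return "opportunity";
  if (score >= 75) return "sales_qualified";
  if (score >= 50) return "marketing_qualified";
  if (score >= 25) return "lead";
  return "subscriber";
}
```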

What it costs

A single score is roughly 1,500 input tokens of context plus a 150-token JSON response. On Claude Sonnet 4.5 that works out to around 0.7 cents per score — roughly $7 per thousand leads scored at our token sizes. The Log Cost node writes per-call token counts and estimated dollar cost into the n8n execution history so the number is observed rather than assumed.
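The arithmetic behind that number, assuming Sonnet list prices of roughly $3 per million input tokens and $15 per million output tokens; the Log Cost node records the result of essentially this calculation using the token counts the API returns.

```typescript
// Rough per-score cost at assumed list prices.
const INPUT_PRICE = 3 / 1_000_000;   // USD per input token
const OUTPUT_PRICE = 15 / 1_000_000; // USD per output token

function estimateCost(inputTokens: number, outputTokens: number): number {
  return inputTokens * INPUT_PRICE + outputTokens * OUTPUT_PRICE;
}

const perScore = estimateCost(1_500, 150); // ≈ $0.00675, i.e. about 0.7 cents, ~$7 per 1,000 scores
```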

For comparison: the cheapest commercial AI lead-scoring tool we evaluated charges per contact in the database rather than per scoring event, and was roughly an order of magnitude more expensive at our volume. The custom build paid for itself inside the first month it ran.

Where the friction lives

Three things this workflow will get wrong if you let it.

Missing context becomes false confidence. Early in development Claude would happily score a contact at 78 with almost no firmographic data, because it had a strong behavioural signal. We added a rule into the prompt: if data for a rubric category is missing, score that category at 25% of its maximum and note the gap explicitly in the reasoning. Scores became more honest, and the reasoning field told the sales conversation when to discount the number.
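The rule itself lives in the prompt. One way to make it auditable, sketched below rather than lifted from our workflow, is a pre-flight check that lists which rubric categories arrived empty so the model's reasoning has to acknowledge them.

```typescript
// Illustrative pre-flight check, using the ScoringPayload shape from earlier.
// The list of empty categories travels with the payload so the 25%-of-maximum
// rule can be applied and the gap called out in the explanation.
function emptyCategories(p: ScoringPayload): string[] {
  const gaps: string[] = [];
  if (Object.values(p.firmographic).every((v) => v == null)) gaps.push("firmographic");
  if (p.behavioural.pageViewsLast30d + p.behavioural.emailOpensLast30d === 0
      && p.behavioural.contentDownloads.length === 0) gaps.push("behavioural");
  if (p.intent.pricingPageVisits + p.intent.repeatVisitDays
      + p.intent.engagedContactsAtAccount === 0) gaps.push("intent");
  return gaps;
}
```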

Re-scoring drift. A daily re-score job sounds harmless. It is not. Without a guardrail, lifecycle stages flicker as small score changes cross thresholds. We have not solved this yet — the daily re-score currently runs on every contact enriched in the last seven days, with no threshold-buffer logic. Hysteresis (a contact has to clear a threshold by a margin to move stage, and cannot move backwards more than one stage per day) is on the build list.
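What we have in mind for that guardrail looks roughly like the sketch below, reusing the placeholder stages and thresholds from earlier; it is a plan, not shipped logic.

```typescript
// Planned hysteresis guard: a contact must clear a stage threshold by a
// margin to move up, and cannot drop more than one stage per daily run.
// Margin and stage order are placeholders.
const STAGES: LifecycleStage[] = [
  "subscriber", "lead", "marketing_qualified", "sales_qualified", "opportunity",
];
const MARGIN = 5; // points a score must clear a threshold by before moving up

function nextStage(current: LifecycleStage, score: number): LifecycleStage {
  const proposed = toStage(score); // naive mapping from earlier
  const from = STAGES.indexOf(current);
  const to = STAGES.indexOf(proposed);

  if (to > from) {
    // Moving up: only if the score still maps to the same stage after the margin.
    return toStage(score - MARGIN) === proposed ? proposed : current;
  }
  if (to < from) {
    // Moving down: at most one stage per day.
    return STAGES[from - 1];
  }
  return current;
}
```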

JSON discipline. Claude is good at JSON, but not infallible. The parse step strips markdown fences, runs a try/catch, and defaults to a score of zero if the response cannot be parsed cleanly. A score of zero surfaces as a clear failure in the lifecycle mapping rather than writing garbage into the CRM. Human oversight stays on the contracts that matter.
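The parse step amounts to something like this, reusing the ScoreResult shape sketched earlier; it is a sketch of the behaviour described above, not the exact Code-node contents.

```typescript
// Defensive parse of Claude's response: strip markdown fences, try/catch,
// and fall back to a zero score that reads as an obvious failure downstream.
function parseScore(raw: string): ScoreResult {
  const cleaned = raw.replace(/`{3}(?:json)?/gi, "").trim();
  try {
    const parsed = JSON.parse(cleaned);
    if (typeof parsed.score !== "number") throw new Error("missing score");
    return parsed as ScoreResult;
  } catch {
    return {
      score: 0, // a zero surfaces as a clear failure in the lifecycle mapping
      explanation: "Model response could not be parsed; score defaulted to 0.",
      topSignals: ["parse_failure", "parse_failure", "parse_failure"],
    };
  }
}
```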

Practical implications

The point of this is not the workflow. It is the economics.

A capability that would have cost six figures and a quarterly committee in 2022 is now sixteen n8n nodes and a seventy-five-line system prompt, running for cents per score and writing into the systems we already operate.

That is the cost-collapse the earlier piece pointed at, expressed as one operational example. There are five or six others sitting inside our own stack at similar economics — content publishing, lead-to-CRM sync, account intelligence, governance review. We will publish them in turn.

The lead-scoring workflow is yours to take. The n8n JSON, the full system prompt, and the sample payloads we used to validate it are bundled below. We share it because the easiest way to evaluate whether a studio has done the work is to read what they shipped.

If your team is looking at lead scoring — or qualification, routing, or scoring inside the CRM you already run — we are happy to talk through what worked, what we would do differently, and where the trade-offs sit for your stack.