How I Cut 70% of Tokens from My Claude Skill
Moving business logic out of the prompt and into a deterministic script changed everything about cost, consistency, and speed.
The Problem: A Skill That Worked — But Burned Tokens Every Run
I built a Claude Skill for our sales ops team. Every morning, someone uploads two Salesforce Excel exports (bookings and shipments) plus an optional Epicor backlog file, types “run daily email,” and gets back a polished HTML summary email ready to paste into Outlook.
It worked great. The output was consistent, the formatting was correct, and leadership loved the email. But under the hood, every single run was expensive — and I didn’t realize how expensive until I looked at the token math.
What Was Actually Happening Each Run
The original SKILL.md file was a 244-line, 9,609-character instruction document. It contained everything Claude needed to generate the email from scratch: column mappings for three different file formats, aggregation logic, sorting rules, narrative generation patterns, dollar formatting rules, HTML rendering specifications down to exact inline CSS for Outlook compatibility, section colors, table column definitions, and spacer pixel heights.
Every time a user typed “run daily email,” Claude had to:
Read ~2,400 tokens of instructions → Write 200–400 lines of Python code from scratch → Execute the code → Debug if anything went wrong → Generate the HTML output. Every. Single. Morning.
Claude was re-deriving the same parsing logic, the same aggregation rules, the same HTML template — burning through output tokens to write code it had effectively already written yesterday. And because LLMs are non-deterministic, each run had a small but real chance of introducing subtle variations: a slightly different sort order, a missing edge case, a formatting inconsistency.
The numbers: roughly 2,400 input tokens just to read the instructions, plus 200–400 lines of freshly generated Python in output tokens, on every single run.
The Insight: Separate What Changes from What Doesn’t
Here’s what I realized: almost nothing about this task requires AI judgment at runtime. The column mappings are fixed. The aggregation logic is fixed. The HTML template is fixed. The sorting, formatting, and narrative patterns — all fixed. The only variable is the data in the uploaded files.
I was using an LLM to do what a Python script does better: deterministic data transformation. Claude’s real value was in designing the logic, not re-executing it every morning.
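Everything in that "fixed" list can live as plain constants and pure functions in the script. A hypothetical excerpt of that kind of logic (the column names, mapping, and function name are illustrative, not taken from my actual daily_email.py):

```python
# Hypothetical excerpt of the fixed logic that belongs in the script, not
# the prompt. Column names and the mapping itself are illustrative only.
BOOKINGS_COLUMNS = {
    "Opportunity Name": "deal",      # fixed Salesforce export column
    "Amount": "amount",
    "Close Date": "close_date",
}

def format_dollars(value: float) -> str:
    """Deterministic dollar formatting: identical output on every run."""
    return f"${value:,.0f}"
```

Once rules like these are code, there is nothing left for the model to re-derive each morning.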
The Architectural Shift
I extracted all the business logic from the SKILL.md instructions into a standalone daily_email.py script (794 lines). Then I rewrote the SKILL.md to be a thin pointer — just enough for Claude to know which files to look for and which script to run.
| | Before | After |
|---|---|---|
| SKILL.md | 244 lines · 9,609 chars (~2,400 tokens) | 49 lines · 2,915 chars (~730 tokens) |
| Python code | Written by Claude every run (200–400 lines of output tokens) | Pre-built script, 0 lines generated at runtime |
| Claude’s job at runtime | Read instructions → write code → execute → debug → present | Copy script → run 1 bash command → present output |
| Total token load per run | ~2,400 input + heavy output | ~730 input + minimal output |
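The headline savings in the title falls straight out of the input-side figures in the table:

```python
# Back-of-envelope check of the input-token savings shown in the table.
before_input = 2400   # approx tokens to read the original 244-line SKILL.md
after_input = 730     # approx tokens for the slim 49-line version

savings = 1 - after_input / before_input
print(f"input-token savings: {savings:.0%}")  # roughly 70%
```

And that is before counting the output side, where the savings are even larger.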
What the New SKILL.md Looks Like
The new instruction file is 49 lines. The core execution block is literally two lines of bash:
```bash
# The entire runtime instruction:
cp /mnt/skills/user/.../daily_email.py /home/claude/
cd /home/claude && python3 daily_email.py
```

The rest of the SKILL.md is just file-matching patterns (so Claude knows which uploads to expect), a brief description of what the script does (so Claude can troubleshoot errors), and the output spec for QA reference. No column mappings. No aggregation logic. No HTML rendering rules. No formatting specs.
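For reference, a slim SKILL.md in this style might look like the sketch below. The frontmatter fields, file patterns, and output name are illustrative, not my actual file:

```markdown
---
name: daily-email
description: Generates the daily bookings/shipments summary email from uploaded exports.
---

## Inputs
- Bookings export: .xlsx with "Bookings" in the filename
- Shipments export: .xlsx with "Shipments" in the filename
- Optional Epicor backlog file

## Run
Copy daily_email.py from the skill folder to /home/claude and run it with python3.

## Output
daily_email.html, ready to paste into Outlook. The script handles all parsing,
aggregation, and formatting; do not re-derive any of that logic.
```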
Why This Works So Well
Token savings. The input tokens dropped from ~2,400 to ~730 just on the instruction read. But the bigger savings are on the output side — Claude no longer writes hundreds of lines of Python. It runs a command and presents a file. The output token cost drops from heavy (code generation + debugging) to near-zero (a one-sentence confirmation).
Perfect consistency. The Python script produces identical output for identical input. No more subtle LLM variations between runs. The HTML template, colors, formatting, and narrative logic are locked in. When someone on the team runs it Tuesday and someone else runs it Wednesday, the emails look exactly the same — just with different data.
Faster execution. No code generation step, no potential debug loops. The script runs in seconds. The total interaction goes from a multi-step chain to a single bash command.
Easier maintenance. When the Salesforce export format changes or leadership wants a new section, I update the Python script — one file, version-controlled, testable. I don’t have to rewrite and re-test a 244-line prompt.
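That maintenance win is concrete: once a rule is a function, it gets a unit test. A sketch, with a hypothetical function name standing in for the real ones in daily_email.py:

```python
# Hypothetical example of a locked-in sorting rule made unit-testable.
def top_bookings(rows, n=3):
    """Fixed sorting rule: largest deals first, top n only."""
    return sorted(rows, key=lambda r: r["amount"], reverse=True)[:n]

sample = [
    {"deal": "Acme", "amount": 120_000},
    {"deal": "Globex", "amount": 450_000},
    {"deal": "Initech", "amount": 80_000},
]
assert [r["deal"] for r in top_bookings(sample, n=2)] == ["Globex", "Acme"]
```

A format change from Salesforce becomes a one-line fix plus a rerun of the test suite, not a prompt rewrite.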
Where This Pattern Works
This “thin skill + deterministic script” pattern is ideal when:
Great Fit
- ✓ Fixed input format (same columns every time)
- ✓ Deterministic logic (aggregation, sorting, formatting)
- ✓ Template-driven output (HTML, reports, emails)
- ✓ Repeated execution (daily, weekly cadence)
- ✓ Multiple users running the same task
- ✓ Output consistency matters
Not the Best Fit
- ✗ Unstructured or variable inputs
- ✗ Tasks requiring real AI judgment
- ✗ One-off exploratory analysis
- ✗ Outputs that need creative variation
- ✗ Rapidly changing requirements (easier to tweak a prompt than rewrite code)
- ✗ Simple tasks where the prompt is already small
The Tradeoff: Upfront Investment
The original approach had a real advantage: I described what I wanted in natural language and Claude figured out the implementation. That’s powerful for prototyping. I could iterate on the SKILL.md, tweak a sentence, and get different output immediately.
Moving to a script meant I had to actually write (or have Claude help me write) 794 lines of production Python. That’s an upfront investment. But for a task that runs every business day across multiple team members, the payback period was about one week of runs.
The mental model: Use the full-prompt approach to prototype and iterate. Once the output is stable and you’re running it repeatedly, extract the logic into a script. Your SKILL.md becomes a thin wrapper. You keep the AI where it adds value (understanding intent, handling errors, presenting results) and move the deterministic work where it belongs (code).
Bonus Savings: Right-Sizing the Model
Here’s something I didn’t think about initially: once your skill is stable and working, ask Claude which model it should actually be running on. Most people skip this conversation, and skipping it leaves money on the table.
In my case, once the heavy logic was extracted into a deterministic Python script, what’s left for the AI to do at runtime? Identify the uploaded files, run a bash command, and present the result. That’s not Opus-level work. Claude recommended dropping to Sonnet for this skill — perfectly capable of the file detection and script execution, at a fraction of the cost per token.
The Multi-Skill Split for High-Frequency Use
My daily email runs once per morning, so a single Sonnet-powered skill made sense. But if this were running multiple times a day — say, regional teams each generating their own version throughout the day — there’s an even more aggressive optimization: split it into two skills at two different model tiers.
- Skill 1 (Haiku): File detection, script execution, data parsing, table generation — the mechanical parts. Haiku is fast, cheap, and more than capable of running a bash command and presenting an HTML file.
- Skill 2 (Sonnet): Just the narrative summary — the bookings analysis bullets, the shipment context, the backlog interpretation, and the key takeaway paragraph. This is the part that actually benefits from a smarter model’s ability to synthesize and write.
The vast majority of the token spend (file I/O, script execution, HTML rendering) gets handled by the cheapest model, and you only invoke the more expensive model for the small slice of work that genuinely requires language understanding. For a skill running 5–10 times a day across a team, this split can meaningfully reduce your monthly cost.
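To make the split concrete, here is an illustrative cost comparison. The token counts and per-token rates below are made-up relative units, not real model pricing; substitute current rates for the models you actually use:

```python
# Illustrative cost comparison for the two-tier split.
# All numbers are assumptions, NOT real usage data or real pricing.
runs_per_day = 8            # a high-frequency skill, per the scenario above
mechanical_tokens = 1500    # file detection, script execution, presentation
narrative_tokens = 400      # the summary prose that needs the smarter model

cheap_rate = 1.0            # relative cost per token, small model (assumed)
smart_rate = 10.0           # relative cost per token, larger model (assumed)

split_cost = runs_per_day * (
    mechanical_tokens * cheap_rate + narrative_tokens * smart_rate
)
single_cost = runs_per_day * (mechanical_tokens + narrative_tokens) * smart_rate

print(f"split runs at {split_cost / single_cost:.0%} of the single-tier cost")
```

The exact ratio depends entirely on your token mix and pricing tier, but the shape of the result holds: the cheap model absorbs the bulk of the tokens, and the expensive model only touches the narrative slice.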
The takeaway: don’t just optimize what the AI does — optimize which AI does it. Once a skill is stable, evaluate the model tier the same way you’d evaluate compute resources for any other workload.
How to Apply This to Your Skills
If you have a Skill that Claude runs repeatedly with similar inputs and you want consistent outputs, here’s the playbook:
1. Audit your SKILL.md. How much of it is deterministic logic that doesn’t need AI judgment? Column mappings, formatting rules, template HTML, aggregation formulas — these are script candidates.
2. Build the script. Extract the deterministic parts into Python (or whatever language fits). Use Claude to help you write it — that’s where the AI shines. Test it thoroughly.
3. Shrink the SKILL.md. Replace the detailed instructions with: file detection patterns, a “copy and run” command, what the script does (brief, for troubleshooting), and output location/format for QA.
4. Deploy to your project. Add both files (slim SKILL.md + script) to your Claude Project. Share the project with your team. Everyone gets the same consistent, token-efficient experience.
5. Right-size the model. Ask Claude which model tier your skill actually needs. If it’s mostly script execution, drop to Sonnet or Haiku. If you run it frequently and there’s a clear split between mechanical work and AI judgment, consider breaking it into two skills at different tiers.
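For step 2, a minimal stdlib-only skeleton of the kind of deterministic script the playbook describes might look like this. The column names, section text, and output file name are hypothetical; the real daily_email.py is far more detailed:

```python
# Minimal sketch of a deterministic email-rendering script.
# Column names, wording, and the output file name are hypothetical.

def build_email(bookings: list, shipments: list) -> str:
    """Render the summary email as a single HTML string."""
    total = sum(row["Amount"] for row in bookings)
    table_rows = "".join(
        f"<tr><td>{row['Opportunity Name']}</td>"
        f"<td>${row['Amount']:,.0f}</td></tr>"
        # Fixed sorting rule: largest deals first, every run.
        for row in sorted(bookings, key=lambda r: r["Amount"], reverse=True)
    )
    return (
        f"<h2>Daily Summary</h2>"
        f"<p>Total bookings: ${total:,.0f} across {len(bookings)} deals; "
        f"{len(shipments)} shipments.</p>"
        f"<table>{table_rows}</table>"
    )

if __name__ == "__main__":
    sample_bookings = [
        {"Opportunity Name": "Acme", "Amount": 120_000},
        {"Opportunity Name": "Globex", "Amount": 450_000},
    ]
    html = build_email(sample_bookings, shipments=[{"id": 1}])
    with open("daily_email.html", "w") as f:
        f.write(html)
```

In the real version you would read the uploaded Excel files (e.g. via pandas or openpyxl) instead of hardcoding sample data, but the structure — pure functions in, one HTML file out — is the whole point.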
The result: a skill that costs less, runs faster, and produces identical output every time. The AI stays in the loop for understanding user intent and presenting results — but the heavy lifting is deterministic code, where it should be.