Copilot billing is changing. Token noise is now real money.


GitHub has announced that Copilot is moving from premium-request-based billing to usage-based billing on June 1, 2026.

That sounds like a small billing detail, but it matters for anyone using Copilot as more than autocomplete.

The old world was mostly this:

Did I use a premium request?

The new world is closer to this:

How many tokens did I send, receive and cache — and which model did I use?

According to GitHub, Copilot usage will consume GitHub AI Credits. Input tokens, output tokens and cached tokens all count. One AI Credit equals $0.01 USD. Copilot Pro+ will still cost $39/month, but that now maps to 3,900 AI Credits/month. Code completions and next edit suggestions stay included, but chat, CLI, agent mode, Copilot Spaces, Spark and third-party coding agents consume credits.
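As a back-of-envelope sketch, the credit math looks roughly like this. Note that the per-million-token prices below are hypothetical placeholders, not GitHub's published rates; only the credit-to-dollar conversion and the Pro+ allowance come from the announcement.

```python
# Rough sketch of the AI Credit math. Per-million-token prices are
# HYPOTHETICAL placeholders; only 1 AI Credit = $0.01 and the 3,900-credit
# Pro+ allowance come from GitHub's announcement.

CREDIT_USD = 0.01           # 1 AI Credit = $0.01
PRO_PLUS_CREDITS = 3_900    # included with the $39/month Pro+ plan

def session_cost_credits(input_tokens, output_tokens, cached_tokens,
                         usd_per_m_input=3.0,    # hypothetical rate
                         usd_per_m_output=15.0,  # hypothetical rate
                         usd_per_m_cached=0.3):  # hypothetical rate
    """Convert one session's token counts into AI Credits."""
    usd = (input_tokens * usd_per_m_input
           + output_tokens * usd_per_m_output
           + cached_tokens * usd_per_m_cached) / 1_000_000
    return usd / CREDIT_USD

credits = session_cost_credits(80_000, 15_000, 40_000)
print(f"~{credits:.0f} credits per session")
```

With those made-up rates, a session like that costs roughly 48 credits, so the monthly allowance covers about 80 of them. The exact figures will depend on the model and GitHub's real pricing, but the shape of the calculation is the point: every input, output and cached token now lands on the bill.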

So yes, the base price is not changing. But the shape of the bill is.

Why this matters

I use these tools like extra hands when coding. They run tests, inspect logs, look through files, explain errors, fix YAML, retry commands and sometimes go on a nice little adventure of two steps forward and seven back.

That adventure used to mostly cost patience.

Now it also costs tokens.

And the annoying thing is that a lot of those tokens are just noise:

  • successful test output
  • repeated stack traces
  • massive npm output
  • long git diff dumps
  • friendly assistant paragraphs that say very little
  • boilerplate like “you are absolutely right” before finally getting to the answer

I do not want to pay for politeness. I want the fix.

This is where rtk fits in

rtk is a CLI proxy that reduces noisy command output before it reaches the AI agent.

Their own example for a typical two-hour coding session is quite telling:

Scenario        CLI tokens sent to agent
Without rtk     ~210,000
With rtk        ~23,000
Difference      ~187,000 fewer tokens

That is about 89% less CLI noise.

Even if your exact numbers are different, the direction is obvious. The AI does not need every line of a successful restore, build or test run. It needs the result, the failing parts and enough context to act.

If 40% of an agent session is terminal noise, and rtk removes 89% of that noise, the whole session becomes roughly 35% smaller before you have changed a single prompt.

That is not just cheaper. It is cleaner context.
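The arithmetic above can be sketched in a couple of lines. The 40% noise share and 89% reduction are the illustrative figures from this post, not measurements:

```python
# Sketch of the session-shrink arithmetic: if a share of the session is CLI
# noise and rtk strips most of that share, the whole session shrinks.
# The default shares are this post's illustrative numbers, not measurements.

def session_after_rtk(total_tokens, noise_share=0.40, rtk_reduction=0.89):
    """Tokens remaining after rtk trims the noisy CLI portion."""
    saved = total_tokens * noise_share * rtk_reduction
    return total_tokens - saved

print(session_after_rtk(100_000))  # 35,600 tokens saved, 64,400 remain
```

A 100,000-token session drops to about 64,400 tokens, i.e. roughly 35.6% smaller, before you have touched a single prompt.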

This is where caveman fits in

caveman attacks another problem: assistant verbosity.

The idea is wonderfully silly and very useful: make the AI talk less while keeping the technical meaning.

The project claims around 75% fewer output tokens for responses and also includes tooling for compressing memory files such as CLAUDE.md.

Example:

Long version: “The reason your React component is re-rendering is likely because…”

Becomes:

Short version: “Inline object prop = new ref each render. Wrap in useMemo.”

Same point. Less fog.

If 30% of your Copilot or agent session is assistant explanation, and caveman reduces that part by 75%, that is another 22.5% reduction in total token volume for that type of session.

Simple numbers

Let us say a heavy agent workflow burns the equivalent of 100,000 tokens of mixed context, terminal output and assistant text.

A realistic waste split might look like this:

Area                  Share      Tool      Reduction   Tokens saved
CLI noise             40,000     rtk       89%         35,600
Assistant verbosity   30,000     caveman   75%         22,500
Useful context/code   30,000     keep it   0%          0
Total                 100,000                          58,100

That leaves about 41,900 tokens instead of 100,000.
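The table's arithmetic, as a small estimator. The shares and reduction rates are the illustrative figures from above, not measured values:

```python
# Estimator for the waste split above. Token shares and reduction rates are
# the illustrative figures from the table, not measured values.

BUDGET = 100_000
split = [
    # (area,                tokens,  reduction)
    ("CLI noise (rtk)",      40_000, 0.89),
    ("assistant verbosity",  30_000, 0.75),
    ("useful context/code",  30_000, 0.00),
]

saved = sum(tokens * cut for _, tokens, cut in split)
remaining = BUDGET - saved
print(f"saved {saved:,.0f}, remaining {remaining:,.0f}")
```

Swap in your own split and rates; the structure of the estimate stays the same.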

This is not perfect accounting, because pricing depends on model, input/output/cache mix and GitHub’s final billing calculation. But it is good enough to show the point:

When billing moves to token usage, wasted context becomes wasted money.

For a Pro+ user, the included monthly allowance is 3,900 AI Credits, or $39 worth of usage. If tools like rtk and caveman cut noisy token usage by even 30-50% in real workflows, that means more agent runs before hitting the included allowance or needing extra budget.
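The allowance math is easy to sketch as well. The per-run credit cost here is a hypothetical example figure; only the 3,900-credit allowance comes from the announcement:

```python
# How far the Pro+ allowance stretches before and after trimming waste.
# The 50-credits-per-run figure is a HYPOTHETICAL example; only the
# 3,900-credit allowance comes from GitHub's announcement.

ALLOWANCE = 3_900   # AI Credits included with Pro+ ($39/month)

def runs_per_month(credits_per_run, reduction=0.0):
    """Included agent runs per month, optionally after trimming token waste."""
    return int(ALLOWANCE // (credits_per_run * (1 - reduction)))

print(runs_per_month(50))        # no trimming
print(runs_per_month(50, 0.40))  # ~40% of tokens trimmed
```

At 50 credits a run, trimming 40% of the tokens takes you from 78 included runs a month to 130. Same allowance, noticeably more work done.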

My take

This is probably the correct direction from GitHub. Agentic coding is much heavier than autocomplete, and someone has to pay for the compute.

But it also means developers need to get better at token hygiene.

Not because we should be cheap with useful context. Useful context is the whole game.

But we should absolutely be cheap with garbage context.

So my default setup going forward is simple:

  • use Copilot for the work
  • use lighter models when the task is routine
  • use rtk to keep terminal output sane
  • use caveman or similar compression to keep assistant output short
  • check billing preview and actual usage instead of guessing

The AI coding workflow is not going away.

But the free buffet feeling is.

Time to stop feeding the model junk.
