Reducing AI Agent Costs by Making My .NET Repo Fail Faster

/images/2026/0608/Token_shortage.png

My AI coding workflow has changed a lot over time. I have used AI since the early days of ChatGPT (GPT-3.5), and I keep trying new ways to get better results from the tools. I have tried structured approaches like GitHub’s Spec Kit, shifted more toward planning mode once it became available, and I keep experimenting with ways to improve the plan before letting an agent write code. Spec Kit describes itself as a toolkit for spec-driven development, focused on product scenarios and predictable outcomes instead of “vibe coding” everything from scratch.

My current flow usually starts in planning mode. I use grill-with-docs to improve the context and challenge the plan, ask one or two high-end models to review it, let an agent implement it, run CodeRabbitAI locally, push to GitHub, and then get more feedback from GitHub Copilot, Codex, CodeRabbitAI, and GitHub code quality checks.

That flow works, and the quality is much better than just asking an agent to start coding. But the repeated review loop is where the cost starts to hurt. Every time the agent misses something basic, I pay for another round of feedback, another fix, another explanation, and sometimes another model review.

The frustrating part is that a lot of the feedback is NOT the valuable kind. It is often formatting drift, analyzer warnings, missing XML comments, test issues, obvious boundary violations, or things the repository should have caught before I pushed.

So I changed the repo to move more of that feedback earlier.

Better Feedback, Not No Feedback

I still expect Copilot, Codex, CodeRabbitAI, and GitHub code quality tools to find things. That is why I use them. But I want them to spend their attention on higher-value feedback: design problems, security issues, architecture concerns, bad abstractions, risky edge cases, or places where the implementation does not match the intent.

What I do not want is to spend expensive review cycles being told that formatting is wrong, an analyzer rule failed, a test pattern is inconsistent, an XML comment is missing, or an AppService is directly using something it should not.

That kind of feedback should be local, fast, and boring.

Hardening the Repo

The first change was hardening .editorconfig. Instead of relying only on agent instructions, I moved more expectations into tooling: formatting rules, code style, selected analyzer severities, exceptions for generated code and Razor/Blazorise files, suppressions for migrations, and stricter severities for high-signal rules.

I also treat warnings as errors, but not blindly. The important part is that .editorconfig and shared build settings control which warnings should actually fail the build. High-value rules can become hard failures, while broader analyzer noise can stay visible without blocking every task.

That matters for agents because it turns selected review feedback into local feedback. If a warning is important enough that Copilot, Codex, CodeRabbitAI, or GitHub code quality would complain about it later, I would rather have the agent fail on it locally before I pay for another review loop.
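To make that policy concrete, here is a minimal Python sketch of the decision the severities encode: a short list of high-value rules blocks the run, and everything else stays visible without failing it. It only models the idea; the .NET build makes this decision itself from .editorconfig entries such as dotnet_diagnostic.CA2007.severity = error, and the rule IDs in BLOCKING_RULES are example choices, not this repo's configuration.

```python
# Minimal sketch: which analyzer findings should block a run?
# Assumption: BLOCKING_RULES is an illustrative set of rule IDs, not this repo's
# actual .editorconfig. The real decision comes from entries like
# dotnet_diagnostic.CA2007.severity = error plus shared build settings.
from dataclasses import dataclass

# Hypothetical high-value rules: CA2007 (ConfigureAwait), CA2100 (SQL from input).
BLOCKING_RULES = {"CA2007", "CA2100"}


@dataclass(frozen=True)
class Diagnostic:
    rule_id: str
    message: str


def partition(diagnostics):
    """Split findings into (blocking, visible_only) lists, preserving order."""
    blocking = [d for d in diagnostics if d.rule_id in BLOCKING_RULES]
    visible = [d for d in diagnostics if d.rule_id not in BLOCKING_RULES]
    return blocking, visible


def exit_code(diagnostics):
    """Non-zero only when at least one blocking rule fired."""
    blocking, _ = partition(diagnostics)
    return 1 if blocking else 0


if __name__ == "__main__":
    found = [
        Diagnostic("CA1822", "member could be marked static"),
        Diagnostic("CA2007", "awaited task without ConfigureAwait"),
    ]
    blocking, visible = partition(found)
    print(f"blocking={len(blocking)} visible={len(visible)} exit={exit_code(found)}")
```

Running it prints blocking=1 visible=1 exit=1: the ConfigureAwait finding fails the run, while the static-member finding is reported but does not block. Tightening a rule later is just a matter of moving it into the blocking set, which fits a phased rollout.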

Then I added three Roslyn analyzer packages:

For test projects I also added:

Together, these give the solution better local feedback around correctness, maintainability, async usage, security-sensitive patterns, performance, and test quality.

The goal is not to turn every warning into a blocker immediately. That would just create a different kind of cost. The better approach is phased: high-value rules should fail early, while broader analyzer findings can stay as warnings until the baseline improves.

Why I Added check.ps1

The biggest practical change was adding check.ps1.

The script is available here: check.ps1.txt.

This gives agents one clear local quality gate instead of a vague instruction like “make sure everything is good”. The agent no longer needs to guess which checks matter, which tests are fast, how formatting should run, how architecture rules are validated, or what safety checks should happen before completion.

The script gives the repo a single contract:

rtk proxy pwsh -NoProfile -ExecutionPolicy Bypass -File ./check.ps1 -Quick
rtk proxy pwsh -NoProfile -ExecutionPolicy Bypass -File ./check.ps1
rtk proxy pwsh -NoProfile -ExecutionPolicy Bypass -File ./check.ps1 -Full

The modes are split by cost:

  • -Quick is for the inner loop while the agent is coding.
  • default mode (no switch) is for completion checks.
  • -Full is for merge or higher-risk work.

This matters because not every moment needs the heaviest possible validation. A tiered gate gives the agent the right level of feedback at the right time.
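The tiers can be modelled as cumulative step lists. This is a minimal Python sketch, not check.ps1 itself; the step names are only labels for the checks each mode runs, and making integration tests opt-in is an assumption based on them being optional.

```python
# Minimal sketch of cumulative gate tiers: -Quick, default, -Full.
# Assumption: step names are labels for the checks listed per mode; the real
# check.ps1 runs actual commands for each one.
QUICK = [
    "nuget-feed-preflight",
    "restore",
    "scoped-format-verify",
    "architecture-test-build",
    "architecture-tests",
]
DEFAULT_EXTRA = ["solution-build", "fast-tests", "secret-hygiene"]
FULL_EXTRA = [
    "broader-tests",
    "integration-tests",  # optional, so it is opt-in below
    "vulnerability-report",
    "deprecation-report",
]


def plan(mode: str = "default", include_integration: bool = False) -> list[str]:
    """Return the ordered steps for a mode; each tier extends the cheaper one."""
    if mode not in ("quick", "default", "full"):
        raise ValueError(f"unknown mode: {mode}")
    steps = list(QUICK)
    if mode in ("default", "full"):
        steps += DEFAULT_EXTRA
    if mode == "full":
        steps += [
            step for step in FULL_EXTRA
            if include_integration or step != "integration-tests"
        ]
    return steps


if __name__ == "__main__":
    for mode in ("quick", "default", "full"):
        print(mode, len(plan(mode)), plan(mode)[-1])
```

With these lists, plan("quick") returns the five fail-fast steps, plan() returns eight, and plan("full", include_integration=True) returns all twelve. Because each tier extends the cheaper one, work that passes the default gate has already passed -Quick.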

What the Script Catches

The script is more than a wrapper around dotnet build.

Quick mode runs the fail-fast checks:

  • NuGet feed preflight
  • restore
  • scoped formatting verification
  • architecture test build
  • architecture tests

Default mode adds:

  • full solution build
  • fast unit/component tests
  • tracked secret hygiene checks

Full mode adds:

  • broader test projects
  • optional integration tests
  • package vulnerability report
  • package deprecation report

It also has command timeouts, total time budgets, timestamped progress, concise output by default, and diagnostic logs when needed.
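The timeout and budget handling can be sketched in Python as follows. It is not the PowerShell in check.ps1; the (name, argv) step format and the limits are assumptions for illustration.

```python
# Minimal sketch: per-step timeouts, a total time budget, timestamped progress.
# Assumption: steps are (name, argv) pairs and the limits are illustrative;
# this mirrors the idea, not the PowerShell in check.ps1.
import subprocess
import sys
import time
from datetime import datetime


def log(message: str) -> None:
    # Timestamped progress makes a stalled step obvious in the agent's output.
    print(f"[{datetime.now():%H:%M:%S}] {message}", flush=True)


def run_steps(steps, per_step_timeout: float, total_budget: float) -> bool:
    """Run steps in order; stop at the first failure, timeout, or spent budget."""
    deadline = time.monotonic() + total_budget
    for name, argv in steps:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            log(f"FAIL {name}: total time budget exhausted")
            return False
        # Never let one step run past the overall budget.
        timeout = min(per_step_timeout, remaining)
        log(f"START {name} (timeout {timeout:.1f}s)")
        try:
            result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            log(f"FAIL {name}: timed out after {timeout:.1f}s")
            return False
        if result.returncode != 0:
            log(f"FAIL {name}: exit code {result.returncode}")
            return False
        log(f"PASS {name}")
    return True


if __name__ == "__main__":
    demo = [("python-version", [sys.executable, "--version"])]
    print(run_steps(demo, per_step_timeout=30, total_budget=120))
```

Capping each step at whatever budget remains means a hung restore or test run ends with a clear FAIL line instead of leaving the agent waiting.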

One detail I really like is that formatting is scoped to changed C# files. The script avoids wasting time on LeptonX, bin, obj, generated files, and unrelated parts of the solution. It also validates the ABP commercial NuGet feed setup before restore, which prevents agents from wasting time debugging noisy restore failures when the real issue is a missing feed key.
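Both ideas fit in a short Python sketch. The excluded folders come from the paragraph above; treating *.g.cs and *.Designer.cs as generated is my assumption, and ABP_NUGET_API_KEY is a hypothetical variable name, not what the script actually checks. The dotnet format --verify-no-changes --include <files> call is the real CLI shape.

```python
# Minimal sketch: scope formatting to changed C# files, and preflight the feed.
# Assumptions: *.g.cs / *.Designer.cs as "generated" is my convention, and
# ABP_NUGET_API_KEY is a hypothetical variable name, not what check.ps1 reads.
import os
import subprocess
from pathlib import PurePosixPath
from typing import Mapping, Optional

EXCLUDED_DIRS = {"bin", "obj", "LeptonX"}
GENERATED_SUFFIXES = (".g.cs", ".g.i.cs", ".Designer.cs")


def changed_files(base: str = "HEAD") -> list[str]:
    """Added/copied/modified/renamed files versus base (untracked files are not listed)."""
    out = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=ACMR", base],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line.strip()]


def format_targets(paths) -> list[str]:
    """Keep .cs files outside bin/obj/LeptonX that do not look generated."""
    targets = []
    for raw in paths:
        path = PurePosixPath(raw)  # git prints forward-slash paths
        if path.suffix != ".cs":
            continue
        if EXCLUDED_DIRS.intersection(path.parts):
            continue
        if path.name.endswith(GENERATED_SUFFIXES):
            continue
        targets.append(str(path))
    return targets


def format_command(targets) -> Optional[list[str]]:
    """Verify-only dotnet format call, or None when nothing needs checking."""
    if not targets:
        return None
    return ["dotnet", "format", "--verify-no-changes", "--include", *targets]


def feed_preflight(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Fail fast with a clear message instead of a noisy restore failure."""
    if not env.get("ABP_NUGET_API_KEY"):
        return "ABP commercial NuGet feed key is missing; fix it before restore."
    return None
```

When format_command returns None, the formatting step can be skipped for changes that touch no C# files. Without an explicit project or solution argument, dotnet format looks for one in the working directory, so this sketch assumes a single solution there.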

The script does not make tokens cheaper by itself. It reduces token waste.

Instead of the agent doing this:

Read many files → guess what broke → run random commands → paste huge logs → ask the model to reason → fix → repeat

it can do this:

Run check.ps1 → get a specific failure → inspect only the relevant files → fix the targeted issue

That is the cost-saving part. The script gives the agent a concrete failure area instead of making it explore the repo blindly. It also keeps the feedback more controlled: scoped formatting, known test groups, architecture checks, secret hygiene, timeouts, and diagnostic logs when needed.

If the script dumped huge logs into the model every time, it would not help much. The value is that the normal output can stay focused, while deeper logs are available only when the agent needs them. Fewer broad file reads, fewer random commands, fewer massive outputs, and fewer low-value review loops all mean less paid AI work.
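A minimal Python sketch of that split: write the full output to a log file and return only a header plus the lines that look like diagnostics. It assumes MSBuild-style lines such as Foo.cs(12,5): error CS0103: ... and the "Failed <test name>" lines that dotnet test usually prints; the log folder, line limit, and regex are illustrative, not taken from check.ps1.

```python
# Minimal sketch: short agent-facing summary, full log written to disk.
# Assumptions: MSBuild-style "file(line,col): error CODE: message" lines and
# "Failed <test>" lines from dotnet test; the folder and limits are illustrative.
import re
from pathlib import Path

DIAGNOSTIC_LINE = re.compile(r":\s*error\s+[A-Za-z]+\d+|^\s*Failed\b")


def summarize(step: str, output: str, log_dir: Path, max_lines: int = 10) -> str:
    """Save the full output, return only the lines worth showing first."""
    log_dir.mkdir(parents=True, exist_ok=True)
    log_path = log_dir / f"{step}.log"
    log_path.write_text(output, encoding="utf-8")

    lines = output.splitlines()
    hits = [line for line in lines if DIAGNOSTIC_LINE.search(line)]
    # Fall back to the tail when no recognisable diagnostic was found.
    shown = hits[:max_lines] if hits else lines[-max_lines:]
    header = f"{step}: {len(hits)} diagnostic line(s); full log: {log_path}"
    return "\n".join([header, *shown])
```

When nothing matches, the tail of the output is returned instead, so the agent still gets some context without reading the whole log.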

So check.ps1 is not really about making one command cheaper. It is about making the whole agent loop less wasteful.

Architecture Tests

I also added architecture tests using NetArchTest, xUnit, and Shouldly. These enforce important ABP boundaries, such as keeping application contracts away from EF Core and UI dependencies, keeping managers in the domain layer, preventing AppServices from directly using provider SDKs or raw HTTP clients, and stopping UI projects from bypassing the right layers.

This is exactly the kind of issue an agent can introduce while still producing code that compiles. Before, that feedback would often come from a human review or a higher-cost review tool. Now the repo can catch some of it locally.
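The real tests are C# with NetArchTest, xUnit, and Shouldly, roughly Types.InAssembly(...).That().ResideInNamespace(...).ShouldNot().HaveDependencyOnAny(...).GetResult() with an assertion on IsSuccessful. To show the shape of such rules without a .NET toolchain, here is a small Python model; the namespaces, the dependency map, and Stripe standing in for a provider SDK are hypothetical.

```python
# Minimal model of layer-boundary rules (the real tests use NetArchTest in C#).
# Assumption: namespaces, the dependency map, and "Stripe" as a stand-in for a
# provider SDK are hypothetical examples.
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    layer: str        # types whose namespace starts with this...
    forbidden: tuple  # ...must not reference these namespaces


RULES = [
    Rule("Contracts stay free of EF Core and UI",
         "Acme.Shop.Application.Contracts",
         ("Microsoft.EntityFrameworkCore", "Blazorise", "Microsoft.AspNetCore.Components")),
    Rule("AppServices avoid provider SDKs and raw HTTP",
         "Acme.Shop.Application.Services",
         ("Stripe", "System.Net.Http")),
]


def _within(namespace: str, prefix: str) -> bool:
    # Match the namespace itself or a child, not a lookalike such as "StripeLike".
    return namespace == prefix or namespace.startswith(prefix + ".")


def violations(dependencies: dict) -> list[str]:
    """dependencies maps a type's full name to the namespaces it references."""
    found = []
    for rule in RULES:
        for type_name, referenced in dependencies.items():
            if not _within(type_name, rule.layer):
                continue
            bad = sorted(ns for ns in referenced
                         if any(_within(ns, f) for f in rule.forbidden))
            if bad:
                found.append(f"{rule.name}: {type_name} -> {', '.join(bad)}")
    return found
```

Each failure names the rule, the offending type, and the forbidden namespace, which is the kind of specific message that lets an agent fix the boundary without exploring the solution.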

Better Instructions, Agents, and Skills for Copilot and Codex

I also updated the instructions for GitHub Copilot, OpenAI Codex, and my local agent setup, and this goes beyond a single instruction file. The repo now has a more complete agent guidance system with AGENTS.md, .github/copilot-instructions.md, specialist agents, and reusable skills.

The skills are important because they give agents more focused guidance depending on the type of work. I have skills for things like core solution guardrails, testing standards, GitHub Actions, Azure, Blazorise, Elsa, Playwright, external-service isolation, and ABP Suite/generated-code handling. That means the agent does not only get generic advice like “write good code”; it gets task-specific rules for the kind of change it is making.

For example, the core guardrail skill tells agents to search before reading files, avoid broad directory scans, stay out of bin, obj, LeptonX, generated files, and migrations unless directly relevant, and keep the task narrow. The testing skill reinforces the test stack (xUnit, FakeItEasy, Shouldly, and bUnit, with no Moq), deterministic tests, and which tests are worth running. The GitHub Actions and deployment-related skills add extra caution around secrets, OIDC, CI changes, Azure resources, and anything that can become expensive or risky.

This is also cost control. Broad file reading wastes tokens and makes the agent less focused. Generic instructions help, but focused skills reduce the amount of context the agent has to rediscover every time. The more the repo can tell the agent how to behave for this specific type of work, the less I need to pay for repeated exploration, wrong assumptions, and review loops.

I also added clearer safety rules so agents do not run risky operations like git push, destructive git commands, Azure resource changes, EF migrations, database updates, DNS changes, or secret changes without explicit confirmation.
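Those rules live in the agent instructions. As a mechanical backstop, a hypothetical pre-execution guard could flag such commands before they run; this Python sketch is not part of the repo, and its patterns only approximate the list above.

```python
# Hypothetical guard: flag shell commands that need explicit confirmation.
# Assumption: not part of the repo; patterns approximate the safety list and
# would need tuning. Order matters: more specific patterns come first.
import re
from typing import Optional

RISKY_PATTERNS = [
    (r"^(az keyvault secret set|gh secret set)\b", "secret change"),
    (r"^az network dns\b", "DNS change"),
    (r"^az \S+ .*\b(create|delete|update|set|add|remove)\b", "Azure resource change"),
    (r"^git push\b", "git push"),
    (r"^git (reset --hard|clean -\w*f|branch -D|checkout -- )", "destructive git command"),
    (r"^dotnet ef (migrations|database update)\b", "EF migration or database update"),
]


def needs_confirmation(command: str) -> Optional[str]:
    """Return why a command needs confirmation, or None if it looks safe."""
    # Check each part of a chained command such as "cd src && git push".
    for part in re.split(r"&&|\|\||;|\|", command):
        normalized = " ".join(part.split())
        for pattern, reason in RISKY_PATTERNS:
            if re.search(pattern, normalized):
                return reason
    return None
```

Matching normalized command strings is deliberately crude and errs toward asking; it backs up the instructions rather than replacing them.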

What I Expect To Save

I do not expect this to remove review feedback. I expect it to improve the quality of the feedback I pay for.

If the repo catches formatting, analyzer, architecture, test, restore, and secret hygiene issues locally, then CodeRabbitAI, Copilot, Codex, and GitHub checks should have less low-value noise to report. They can still find problems, but hopefully those problems are more interesting and more worth paying for.

That is the point. I want fewer rounds where paid tools tell me something a local gate could have told me.

Main Takeaway

Prompts are useful, but they are not enough.

A prompt can tell an agent to follow architecture rules, but a test can fail when it does not. A prompt can tell an agent to format code, but dotnet format --verify-no-changes can prove whether it happened. A prompt can say not to use the wrong patterns, but analyzers and repo checks can catch them earlier.

For me, this was mostly about cost. The more expensive AI coding gets, the more important it becomes to move boring feedback as close to the code as possible.

The best review feedback is the feedback that still requires intelligence. Everything else should fail locally.

Hopefully this gives me better results, and cheaper ones too, but only time will tell.
