Corpus Evidence

State of Behavioral Change Risk in .NET

Most code-review content argues from taste. This report starts from a local corpus: 610 already-merged C# pull requests across 61 public repositories, and the Behavioral Change Risk (BCR) signals GauntletCI found in their diffs.

Eric Cogen · Founder, GauntletCI · 11 min read
  • 610 merged C# PRs
  • 61 public repositories
  • 148,327 raw BCR findings
  • 35,871 high-confidence findings

Counting matters: this page separates raw findings from high-confidence findings. Raw findings preserve every affected file, framework surface, and rule hit. High-confidence findings use the corpus database confidence threshold of 0.70 or higher. The former measures blast radius; the latter is the cleaner signal for prioritization.
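
As a minimal sketch of that split, assume each finding is a record carrying the `actual_confidence` value named in the methodology section. The `Finding` type, file paths, and confidence values below are hypothetical and are not corpus data; only the 0.70 threshold comes from this report.

```python
from dataclasses import dataclass

HIGH_CONFIDENCE_THRESHOLD = 0.70  # threshold stated in this report


@dataclass(frozen=True)
class Finding:
    # Hypothetical record shape; only `actual_confidence` is named in the report.
    rule_id: str
    file_path: str
    actual_confidence: float


def split_findings(findings):
    """Return (raw, high_confidence). High-confidence is always a subset of raw."""
    raw = list(findings)
    high = [f for f in raw if f.actual_confidence >= HIGH_CONFIDENCE_THRESHOLD]
    return raw, high


# Made-up sample rows, for illustration only.
sample = [
    Finding("GCI0004", "src/Client.cs", 0.91),
    Finding("GCI0004", "src/Client.netstandard.cs", 0.55),
    Finding("GCI0003", "src/Options.cs", 0.70),
]
raw, high = split_findings(sample)
print(len(raw), len(high))
```

Because the high-confidence set is filtered from the raw set rather than counted separately, the two totals answer different questions about the same findings.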

The uncomfortable pattern: contract risk dominates

The largest finding groups are not exotic security bugs or dramatic runtime crashes. They are ordinary-looking API and contract changes: visibility changes, signature changes, nullable edge cases, exception paths, and async behavior. Those are exactly the changes that can look reasonable in review while still changing what downstream callers experience.

That is the core reason GauntletCI treats pull request risk as a diff problem instead of a whole-codebase cleanliness score. The question is not "is this repository good?" The question is "what did this PR make newly dangerous?"

Top rule families in the corpus

| Rule | Signal family | Raw findings |
|---|---|---|
| GCI0004 | Public API exposure and visibility changes | 59,965 |
| GCI0003 | Method signature and contract changes | 39,628 |
| GCI0006 | Null and edge-case handling changes | 10,978 |
| GCI0015 | Exception-path changes | 10,389 |
| GCI0016 | Async and deadlock candidates | 4,040 |
| GCI0024 | Dangerous API usage | 3,435 |
| GCI0010 | Thread-safety and concurrency risks | 3,225 |
| GCI0001 | Mixed-scope or diff-integrity risk | 2,674 |
| GCI0036 | Performance hot-path risks | 2,524 |
| GCI0047 | Additional behavioral-change signals | 1,450 |

GCI0004 and GCI0003 together account for 99,593 raw findings. That does not mean every finding is a defect. It means API shape and contract changes are the dominant risk surface in this corpus.
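
The combined figure can be rechecked from the table, and the same arithmetic gives each group's share of the raw total. This is a small sketch using only the counts published on this page; the variable names are illustrative.

```python
RAW_TOTAL = 148_327  # raw findings in the corpus

# Raw findings per rule, copied from the table of top rule families.
top_families = {
    "GCI0004": 59_965,
    "GCI0003": 39_628,
    "GCI0006": 10_978,
    "GCI0015": 10_389,
    "GCI0016": 4_040,
    "GCI0024": 3_435,
    "GCI0010": 3_225,
    "GCI0001": 2_674,
    "GCI0036": 2_524,
    "GCI0047": 1_450,
}

contract = top_families["GCI0004"] + top_families["GCI0003"]
contract_share = contract / RAW_TOTAL
top_ten = sum(top_families.values())
top_ten_share = top_ten / RAW_TOTAL

print(contract, f"{contract_share:.1%}")
print(top_ten, f"{top_ten_share:.1%}")
```

On these numbers the two contract-oriented families are about 67.1% of all raw findings, and the ten listed rules together cover 138,308 findings, about 93.2% of the raw total.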

Repository distribution, with the outlier left visible

The corpus is intentionally not flattened to hide uncomfortable skew. Large SDK and framework PRs produce more signals because they touch more published surface area. That skew is a feature of the data, not a reason to erase it.

| Repository | Corpus PRs | Raw findings | High-confidence |
|---|---|---|---|
| Azure/azure-sdk-for-net | 18 | 42,919 | 16,875 |
| JamesNK/Newtonsoft.Json | 10 | 12,086 | 1,034 |
| googleapis/google-api-dotnet-client | 17 | 12,009 | 3,236 |
| DapperLib/Dapper | 7 | 9,696 | 107 |
| StackExchange/StackExchange.Redis | 10 | 5,568 | 825 |
| dotnet/reactive | 12 | 5,546 | 217 |
| apache/logging-log4net | 9 | 4,716 | 1,359 |
| dotnet/orleans | 14 | 4,188 | 681 |
| grpc/grpc-dotnet | 12 | 3,935 | 243 |
| DevToys-app/DevToys | 12 | 3,787 | 1,011 |
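
One way to read the skew in this table is to normalize it: raw findings per PR, and the fraction of each repository's raw findings that clear the high-confidence threshold. The sketch below uses only the figures listed above; the derived ratios are illustrative arithmetic, not additional corpus measurements.

```python
# (repository, corpus PRs, raw findings, high-confidence findings), from the table above.
rows = [
    ("Azure/azure-sdk-for-net", 18, 42_919, 16_875),
    ("JamesNK/Newtonsoft.Json", 10, 12_086, 1_034),
    ("googleapis/google-api-dotnet-client", 17, 12_009, 3_236),
    ("DapperLib/Dapper", 7, 9_696, 107),
    ("StackExchange/StackExchange.Redis", 10, 5_568, 825),
    ("dotnet/reactive", 12, 5_546, 217),
    ("apache/logging-log4net", 9, 4_716, 1_359),
    ("dotnet/orleans", 14, 4_188, 681),
    ("grpc/grpc-dotnet", 12, 3_935, 243),
    ("DevToys-app/DevToys", 12, 3_787, 1_011),
]

summary = {
    repo: {"raw_per_pr": raw / prs, "high_share": high / raw}
    for repo, prs, raw, high in rows
}

for repo, stats in summary.items():
    print(
        f"{repo:<38} {stats['raw_per_pr']:>8.1f} raw/PR  "
        f"{stats['high_share']:>6.1%} high-confidence"
    )
```

The ratios vary widely: about 39.3% of Azure SDK raw findings are high-confidence, against roughly 1.1% for Dapper. That spread is one reason the two counts are reported side by side rather than collapsed into a single number.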

Outlier disclosure: Azure SDK PR #57223

Azure SDK for .NET PR #57223 contributes 40,156 raw findings and 16,611 high-confidence findings. That is 27.1% of the corpus raw total and 46.3% of the high-confidence total. Any honest reading of the corpus has to say that out loud.

The right conclusion is not "Azure SDK is bad." The useful conclusion is that multiframework, published-surface-area changes create a different risk profile than small application PRs. For libraries, one signature or visibility change can echo through multiple target frameworks and generated surfaces.
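
The outlier's weight can be rechecked, and the corpus totals recomputed without it, using only the counts on this page. This is a sensitivity check sketched for illustration; the variable names are not part of any GauntletCI tooling.

```python
RAW_TOTAL, HIGH_TOTAL = 148_327, 35_871        # corpus totals
OUTLIER_RAW, OUTLIER_HIGH = 40_156, 16_611     # Azure SDK PR #57223

raw_share = OUTLIER_RAW / RAW_TOTAL
high_share = OUTLIER_HIGH / HIGH_TOTAL

# Corpus totals with the single outlier PR removed.
raw_without = RAW_TOTAL - OUTLIER_RAW
high_without = HIGH_TOTAL - OUTLIER_HIGH

# Fraction of raw findings that are high-confidence, with and without the outlier.
rate_with = HIGH_TOTAL / RAW_TOTAL
rate_without = high_without / raw_without

print(f"outlier share: {raw_share:.1%} raw, {high_share:.1%} high-confidence")
print(f"without outlier: {raw_without} raw, {high_without} high-confidence")
print(f"high-confidence rate: {rate_with:.1%} with, {rate_without:.1%} without")
```

Removing the one PR leaves 108,171 raw and 19,260 high-confidence findings, and the overall high-confidence rate drops from about 24.2% to about 17.8%. The dominance of contract-oriented rules is a separate question that these totals alone do not settle.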

Test changes are not a reliable proxy for risk

The corpus contains 178 PRs with no test-file changes recorded. Of those, 133 had at least one Behavioral Change Risk finding, and 46 had at least one high-confidence finding. That does not prove the PRs were wrong. It proves that "tests changed" and "risk was introduced" are different signals.

A reviewer needs both. A test diff shows what behavior the author chose to prove. A risk diff shows what behavior the author may have changed without making that choice explicit.
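
Expressed as rates, the counts in this section look like the following. The sketch uses only the numbers reported here; the percentages are derived arithmetic.

```python
TOTAL_PRS = 610
NO_TEST_CHANGE = 178    # PRs with no test-file changes recorded
WITH_ANY_FINDING = 133  # of those, PRs with at least one BCR finding
WITH_HIGH_CONF = 46     # of those, PRs with at least one high-confidence finding

no_test_share = NO_TEST_CHANGE / TOTAL_PRS
any_finding_rate = WITH_ANY_FINDING / NO_TEST_CHANGE
high_conf_rate = WITH_HIGH_CONF / NO_TEST_CHANGE
no_finding = NO_TEST_CHANGE - WITH_ANY_FINDING

print(f"{no_test_share:.1%} of corpus PRs recorded no test-file change")
print(f"{any_finding_rate:.1%} of those had at least one finding")
print(f"{high_conf_rate:.1%} of those had at least one high-confidence finding")
print(f"{no_finding} of those had no finding at all")
```

Roughly three in four PRs without test changes (about 74.7%) still carried at least one finding, and about one in four (25.8%) carried a high-confidence one, while 45 carried none.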

What this changes for PR review

Review the delta, not the vibe

A polished PR can still alter contracts, exception paths, and runtime assumptions. Risk analysis gives reviewers a concrete checklist tied to the diff.

Treat API shape as production behavior

Visibility and signature changes dominated the corpus. For library and platform teams, API shape is not metadata; it is behavior customers compile against.

Use findings as evidence, not theater

A finding is not a verdict. It is a deterministic pointer to changed behavior that deserves a human decision before merge.

Methodology and limitations

The current local corpus database contains 610 public, already-merged C# pull requests across 61 repositories. The analyzed findings table contains 148,327 triggered findings across 535 PRs and 29 rule IDs. A high-confidence finding is a triggered finding with an `actual_confidence` value of at least 0.70.
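
The headline counts are the kind of aggregate a single query over a findings table can produce. The sketch below is an assumption-heavy illustration: only the `actual_confidence` column name and the 0.70 threshold come from this report. The table name, the `pr_id` and `rule_id` columns, the choice of SQLite (used here because it ships with Python), and every row are hypothetical.

```python
import sqlite3

# Hypothetical schema; only `actual_confidence` is named in the report.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE findings (
        pr_id INTEGER NOT NULL,
        rule_id TEXT NOT NULL,
        actual_confidence REAL NOT NULL
    )
    """
)

# Made-up rows, not corpus data.
conn.executemany(
    "INSERT INTO findings VALUES (?, ?, ?)",
    [
        (1, "GCI0004", 0.92),
        (1, "GCI0003", 0.40),
        (2, "GCI0004", 0.70),
        (3, "GCI0015", 0.65),
    ],
)

# Raw count, high-confidence count, distinct PRs, and distinct rule IDs in one pass.
raw, high, prs, rules = conn.execute(
    """
    SELECT COUNT(*),
           SUM(actual_confidence >= 0.70),
           COUNT(DISTINCT pr_id),
           COUNT(DISTINCT rule_id)
    FROM findings
    """
).fetchone()
print(raw, high, prs, rules)
conn.close()
```

Run against the real corpus, a query of this shape would correspond to the four figures quoted above: triggered findings, high-confidence findings, PRs with findings, and rule IDs.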

The report is not a benchmark of repository quality, maintainer skill, or defect rate. It is a field report about where deterministic change-risk rules fire when applied to real merged diffs. Some findings represent intentional changes. Some represent generated or multiframework duplication. That is why this page reports raw counts, high-confidence counts, and the largest outlier separately.

Sources and scope

This report is based on local GauntletCI corpus artifacts, public pull request sources, and internal methodology articles. Counts are reported from the local corpus database used during this update.

  • GauntletCI corpus fixture export: Public fixture-level corpus export used to cross-check PR, repository, test-change, and finding totals.
  • Corpus audit runner: Audit script that hydrates corpus fixtures and runs GauntletCI analysis against the local corpus database.
  • Azure SDK for .NET PR #57223: The largest outlier in the current corpus; it accounts for 40,156 raw findings and 16,611 high-confidence findings.
  • Azure SDK PR #57223 deep dive: Internal case study explaining why multiframework API-surface changes can produce unusually large raw finding counts.
  • Behavioral Change Risk framework: Internal methodology article defining the change-risk categories used throughout the corpus analysis.

About the author

Eric Cogen -- Founder, GauntletCI

Eric Cogen is a senior .NET engineer with twenty years in production. He has shipped payments systems, internal platforms, and critical line-of-business applications — the kind where a 2 a.m. alert wasn't an emergency, it was a regular Tuesday. GauntletCI is the pre-commit checklist he wishes he had run before every commit.