610 merged C# PRs from 61 public repositories

147,958 raw Behavioral Change Risk (BCR) findings

35,871 high-confidence findings

Counting matters: this page separates raw findings from high-confidence findings. Raw findings preserve every affected file, framework surface, and rule hit. High-confidence findings are the subset whose confidence value in the corpus database is 0.70 or higher. The former measures blast radius; the latter is the cleaner signal for prioritization.

The uncomfortable pattern: contract risk dominates

The largest finding groups are not exotic security bugs or dramatic runtime crashes. They are ordinary-looking API and contract changes: visibility changes, signature changes, nullable edge cases, exception paths, and async behavior. Those are exactly the changes that can look reasonable in review while still changing what downstream callers experience.

That is the core reason GauntletCI treats pull request risk as a diff problem instead of a whole-codebase cleanliness score. The question is not "is this repository good?" The question is "what did this PR make newly dangerous?"

Top rule families in the corpus

Rule	Signal family	Raw findings
GCI0004	[Obsolete] attribute transitions on public APIs	59,965
GCI0003	Method signature and contract changes	39,628
GCI0006	Null and edge-case handling changes	10,978
GCI0015	Data integrity and silent discard risks	10,389
GCI0016	Async and deadlock candidates	4,040
GCI0024	Resource lifecycle and undisposed disposables	3,435
GCI0010	Hardcoded secrets, URLs, and connection strings	3,225
GCI0001	Mixed-scope or diff-integrity risk	2,674
GCI0036	Pure context mutation and side effects in getters	2,524
GCI0047	Additional behavioral-change signals	1,450

GCI0004 and GCI0003 together account for 99,593 raw findings. That does not mean every finding is a defect. It means API shape and contract changes are the dominant risk surface in this corpus.
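As a quick check on the arithmetic, the sketch below recomputes the combined GCI0004 and GCI0003 count and expresses it as a share of the 147,958 raw findings. The counts are copied from the table above; the helper name `share_of_total` is illustrative and not part of GauntletCI.

```python
# Raw finding counts copied from the rule table on this page.
RAW_TOTAL = 147_958  # corpus-wide raw findings

TOP_RULES = {
    "GCI0004": 59_965,
    "GCI0003": 39_628,
    "GCI0006": 10_978,
    "GCI0015": 10_389,
    "GCI0016": 4_040,
    "GCI0024": 3_435,
    "GCI0010": 3_225,
    "GCI0001": 2_674,
    "GCI0036": 2_524,
    "GCI0047": 1_450,
}


def share_of_total(count: int, total: int = RAW_TOTAL) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# The two API-shape rule families discussed in the text.
api_shape = TOP_RULES["GCI0004"] + TOP_RULES["GCI0003"]
# All ten families listed in the table.
top_ten = sum(TOP_RULES.values())

print(f"GCI0004 + GCI0003: {api_shape:,} ({share_of_total(api_shape)}% of raw findings)")
print(f"Top ten families:  {top_ten:,} ({share_of_total(top_ten)}% of raw findings)")
```

On these numbers the two API-shape families make up roughly two thirds of all raw findings, which is the sense in which they dominate the corpus.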

Repository distribution, with the outlier left visible

The corpus is intentionally not flattened to hide uncomfortable skew. Large SDK and framework PRs produce more signals because they touch more published surface area. That skew is a feature of the data, not a reason to erase it.

Repository	Corpus PRs	Raw findings	High-confidence findings
Azure/azure-sdk-for-net	18	42,919	16,875
JamesNK/Newtonsoft.Json	10	12,086	1,034
googleapis/google-api-dotnet-client	17	12,009	3,236
DapperLib/Dapper	7	9,696	107
StackExchange/StackExchange.Redis	10	5,568	825
dotnet/reactive	12	5,546	217
apache/logging-log4net	9	4,716	1,359
dotnet/orleans	14	4,188	681
grpc/grpc-dotnet	12	3,935	243
DevToys-app/DevToys	12	3,787	1,011
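One way to read the table is to compare how much of each repository's raw count survives the high-confidence threshold. The sketch below ranks the ten repositories by that ratio, using only the figures from the table. The ratio is an illustrative reading aid, not a metric GauntletCI reports.

```python
# (repository, corpus PRs, raw findings, high-confidence findings), copied from the table.
REPOS = [
    ("Azure/azure-sdk-for-net", 18, 42_919, 16_875),
    ("JamesNK/Newtonsoft.Json", 10, 12_086, 1_034),
    ("googleapis/google-api-dotnet-client", 17, 12_009, 3_236),
    ("DapperLib/Dapper", 7, 9_696, 107),
    ("StackExchange/StackExchange.Redis", 10, 5_568, 825),
    ("dotnet/reactive", 12, 5_546, 217),
    ("apache/logging-log4net", 9, 4_716, 1_359),
    ("dotnet/orleans", 14, 4_188, 681),
    ("grpc/grpc-dotnet", 12, 3_935, 243),
    ("DevToys-app/DevToys", 12, 3_787, 1_011),
]


def high_confidence_ratio(raw: int, high: int) -> float:
    """Fraction of raw findings that meet the high-confidence threshold."""
    return high / raw


# Sort from the largest surviving fraction to the smallest.
ranked = sorted(REPOS, key=lambda r: high_confidence_ratio(r[2], r[3]), reverse=True)

for name, prs, raw, high in ranked:
    ratio = high_confidence_ratio(raw, high)
    print(f"{name:<38} {ratio:6.1%} high-confidence, {raw / prs:>8,.0f} raw findings per PR")
```

A high raw count with a low ratio suggests many low-confidence or duplicated hits, which is one reason the page reports both columns instead of a single score.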

Outlier disclosure: Azure SDK PR #57223

Azure SDK for .NET PR #57223 contributes 40,155 raw findings and 16,611 high-confidence findings. That is 27.1% of the corpus raw total and 46.3% of the high-confidence total. Any honest reading of the corpus has to say that out loud.

The right conclusion is not "Azure SDK is bad." The useful conclusion is that multiframework, published-surface-area changes create a different risk profile than small application PRs. For libraries, one signature or visibility change can echo through multiple target frameworks and generated surfaces.
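The outlier shares quoted above can be reproduced directly from the page's totals. The sketch below also shows what the corpus totals would be with PR #57223 set aside. That subtraction is an illustration added here, not a figure the report publishes separately.

```python
# Corpus totals and outlier counts as reported on this page.
RAW_TOTAL = 147_958
HIGH_TOTAL = 35_871
OUTLIER_RAW = 40_155   # Azure SDK PR #57223, raw findings
OUTLIER_HIGH = 16_611  # Azure SDK PR #57223, high-confidence findings


def pct(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


raw_share = pct(OUTLIER_RAW, RAW_TOTAL)
high_share = pct(OUTLIER_HIGH, HIGH_TOTAL)

# Totals for the rest of the corpus once the outlier PR is excluded.
raw_without_outlier = RAW_TOTAL - OUTLIER_RAW
high_without_outlier = HIGH_TOTAL - OUTLIER_HIGH

print(f"Outlier share of raw findings:             {raw_share}%")
print(f"Outlier share of high-confidence findings: {high_share}%")
print(f"Raw findings without the outlier:             {raw_without_outlier:,}")
print(f"High-confidence findings without the outlier: {high_without_outlier:,}")
```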

Read the Azure SDK PR #57223 analysis →

Test changes are not a reliable proxy for risk

The corpus contains 178 PRs with no test-file changes recorded. Of those, 131 had at least one Behavioral Change Risk finding, and 46 had at least one high-confidence finding. That does not prove the PRs were wrong. It shows that "tests changed" and "risk was introduced" are different signals.
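To make the "different signals" point concrete, the sketch below cross-tabulates two independent flags per PR: whether test files changed and whether any finding fired. The `PullRequest` record and the five sample rows are hypothetical and exist only to show the shape of the comparison. They are not corpus data.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class PullRequest(NamedTuple):
    """Hypothetical per-PR record; field names are assumptions for illustration."""
    number: int
    tests_changed: bool
    finding_count: int


def cross_tab(prs: Iterable[PullRequest]) -> Counter:
    """Count PRs by (tests_changed, has_finding) so the two signals stay separate."""
    return Counter((pr.tests_changed, pr.finding_count > 0) for pr in prs)


# Made-up sample rows, not taken from the corpus.
sample = [
    PullRequest(1, tests_changed=True, finding_count=0),
    PullRequest(2, tests_changed=True, finding_count=3),
    PullRequest(3, tests_changed=False, finding_count=0),
    PullRequest(4, tests_changed=False, finding_count=5),
    PullRequest(5, tests_changed=False, finding_count=1),
]

table = cross_tab(sample)
for tests_changed in (True, False):
    for has_finding in (True, False):
        count = table[(tests_changed, has_finding)]
        print(f"tests_changed={tests_changed!s:<5} has_finding={has_finding!s:<5} -> {count} PR(s)")
```

The figures reported above describe one row of such a table: among the 178 PRs with no test-file changes, 131 land in the "has finding" cell.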

A reviewer needs both. A test diff shows what behavior the author chose to prove. A risk diff shows what behavior the author may have changed without making that choice explicit.

What this changes for PR review

Review the delta, not the vibe

A polished PR can still alter contracts, exception paths, and runtime assumptions. Risk analysis gives reviewers a concrete checklist tied to the diff.

Treat API shape as production behavior

Visibility and signature changes dominated the corpus. For library and platform teams, API shape is not metadata; it is behavior customers compile against.

Use findings as evidence, not theater

A finding is not a verdict. It is a deterministic pointer to changed behavior that deserves a human decision before merge.

Toward a finding ledger

The next credibility step is not a louder claim about what GauntletCI can find. It is a clearer public surface for how findings move from candidate signal to reviewer decision. Anthropic's Mythos Preview dashboard is useful here as a workflow reference: candidates are triaged, validated, disclosed, patched, and tied to ledger entries.

A GauntletCI finding ledger would be narrower and product-specific: PR, rule ID, changed file, evidence snippet, confidence band, reviewer verdict, disposition, and follow-up outcome. That would make the corpus easier to audit and make it obvious that a finding is evidence for human review, not an automatic defect accusation.
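A minimal sketch of what one ledger entry could look like, using the fields listed above. Every name here (`LedgerEntry`, `band_for`, the band labels, and the sample values) is hypothetical. GauntletCI does not currently define this structure, and the only grounded constant is the 0.70 high-confidence threshold.

```python
from dataclasses import asdict, dataclass
from typing import Optional

HIGH_CONFIDENCE_THRESHOLD = 0.70  # threshold used elsewhere on this page


def band_for(confidence: float) -> str:
    """Map a confidence value to a coarse band (band labels are assumptions)."""
    return "high" if confidence >= HIGH_CONFIDENCE_THRESHOLD else "below-threshold"


@dataclass
class LedgerEntry:
    """One finding as it moves from candidate signal to reviewer decision."""
    pr: str
    rule_id: str
    changed_file: str
    evidence_snippet: str
    confidence_band: str
    reviewer_verdict: Optional[str] = None    # empty until a human decides
    disposition: Optional[str] = None         # what was done about the finding
    follow_up_outcome: Optional[str] = None   # what happened after merge

    def is_decided(self) -> bool:
        """A finding is only evidence until a reviewer records a verdict."""
        return self.reviewer_verdict is not None


# Placeholder values for illustration only.
entry = LedgerEntry(
    pr="example-org/example-repo#1",
    rule_id="GCI0003",
    changed_file="src/Client.cs",
    evidence_snippet="public Task SendAsync(Request request)",
    confidence_band=band_for(0.82),
)

print(entry.is_decided())
entry.reviewer_verdict = "intentional change"
entry.disposition = "documented in release notes"
print(asdict(entry))
```

Keeping the verdict, disposition, and outcome empty by default reflects the point that a finding is a pointer for review, not a conclusion.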

Methodology and limitations

The current local corpus database contains 610 public, already-merged C# pull requests across 61 repositories. The analyzed findings table contains 147,958 triggered findings across 529 PRs and 28 rule IDs. A high-confidence finding is a triggered finding with an `actual_confidence` value of at least 0.70.
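The headline counts in this section can be thought of as a single aggregate over the findings table. The sketch below shows that idea against an in-memory SQLite database. Only the `actual_confidence` column and the 0.70 threshold come from this page. The table name `findings`, the `pr_id` and `rule_id` columns, and the sample rows are assumptions, and the real corpus schema may differ.

```python
import sqlite3


def summarize(conn: sqlite3.Connection, threshold: float = 0.70) -> dict:
    """Return raw count, distinct PRs, distinct rule IDs, and high-confidence count."""
    row = conn.execute(
        """
        SELECT COUNT(*),
               COUNT(DISTINCT pr_id),
               COUNT(DISTINCT rule_id),
               COALESCE(SUM(actual_confidence >= ?), 0)
        FROM findings
        """,
        (threshold,),
    ).fetchone()
    return {
        "raw_findings": row[0],
        "prs_with_findings": row[1],
        "rule_ids": row[2],
        "high_confidence": row[3],
    }


# Hypothetical schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE findings (pr_id INTEGER, rule_id TEXT, actual_confidence REAL)"
)
conn.executemany(
    "INSERT INTO findings VALUES (?, ?, ?)",
    [
        (1, "GCI0003", 0.90),
        (1, "GCI0004", 0.55),
        (2, "GCI0004", 0.70),  # exactly at the threshold, so it counts as high-confidence
        (3, "GCI0006", 0.40),
    ],
)

summary = summarize(conn)
print(summary)
```

The comparison is inclusive, matching the "at least 0.70" wording above.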

The report is not a benchmark of repository quality, maintainer skill, or defect rate. It is a field report about where deterministic change-risk rules fire when applied to real merged diffs. Some findings represent intentional changes. Some represent generated or multiframework duplication. That is why this page reports raw counts, high-confidence counts, and the largest outlier separately.

Sources

GauntletCI corpus fixture export — Public fixture-level corpus export used to cross-check PR, repository, test-change, and finding totals.
Corpus audit runner — Audit script that hydrates corpus fixtures and runs GauntletCI analysis against the local corpus database.
Azure SDK for .NET PR #57223 — The largest outlier in the current corpus; it accounts for 40,155 raw findings and 16,611 high-confidence findings.
Azure SDK PR #57223 deep dive — Internal case study explaining why multiframework API-surface changes can produce unusually large raw finding counts.
Behavioral Change Risk framework — Internal methodology article defining the change-risk categories used throughout the corpus analysis.
Anthropic coordinated vulnerability disclosure dashboard — Public example of a candidate finding, triage, disclosure, patch, advisory, and ledger workflow for AI-assisted vulnerability research.

State of Behavioral Change Risk in .NET