Counting matters: this page separates raw findings from high-confidence findings. Raw findings preserve every affected file, framework surface, and rule hit. High-confidence findings are the subset whose confidence score in the corpus database is 0.70 or higher. The former measures blast radius; the latter is the cleaner signal for prioritization.
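The sketch below shows how the two counts relate. It assumes each finding carries a confidence score between 0 and 1. The `Finding` record and its fields are hypothetical and are not GauntletCI's actual schema. The raw count keeps every hit. The high-confidence count applies the 0.70 threshold inclusively.

```python
from dataclasses import dataclass

# Threshold taken from this page: high-confidence means confidence >= 0.70.
HIGH_CONFIDENCE_THRESHOLD = 0.70


@dataclass(frozen=True)
class Finding:
    # Hypothetical record shape, for illustration only.
    rule_id: str
    file_path: str
    confidence: float


def split_counts(findings: list[Finding], threshold: float = HIGH_CONFIDENCE_THRESHOLD) -> tuple[int, int]:
    """Return (raw_count, high_confidence_count) for a list of findings."""
    raw = len(findings)
    # Inclusive comparison: a finding at exactly 0.70 counts as high-confidence.
    high = sum(1 for f in findings if f.confidence >= threshold)
    return raw, high


sample = [
    Finding("GCI0004", "src/Client.cs", 0.95),
    Finding("GCI0004", "src/Client.netstandard.cs", 0.70),
    Finding("GCI0006", "src/Parser.cs", 0.40),
]
print(split_counts(sample))  # (3, 2)
```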
The uncomfortable pattern: contract risk dominates
The largest finding groups are not exotic security bugs or dramatic runtime crashes. They are ordinary-looking API and contract changes: visibility changes, signature changes, nullable edge cases, exception paths, and async behavior. Those are exactly the changes that can look reasonable in review while still changing what downstream callers experience.
That is the core reason GauntletCI treats pull request risk as a diff problem instead of a whole-codebase cleanliness score. The question is not "is this repository good?" The question is "what did this PR make newly dangerous?"
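Here is a toy contrast between the two questions. For illustration only, it assumes that rule hits on the base and head revisions can be keyed by `(rule_id, symbol)`. This is not how GauntletCI computes findings; this page describes its rules as applied to the diff itself. All names and symbols are hypothetical.

```python
# Illustration only: rule hits keyed by hypothetical (rule_id, symbol) pairs.
Hit = tuple[str, str]


def codebase_score(hits: set[Hit]) -> int:
    """Whole-codebase view: every hit counts, whoever introduced it."""
    return len(hits)


def newly_introduced(base: set[Hit], head: set[Hit]) -> set[Hit]:
    """Diff view: hits present after the PR that were absent before it."""
    return head - base


base = {("GCI0006", "Parser.TryRead"), ("GCI0024", "ShellRunner.Start")}
head = base | {("GCI0003", "Client.SendAsync")}

print(codebase_score(head))          # 3
print(newly_introduced(base, head))  # {('GCI0003', 'Client.SendAsync')}
```

The whole-codebase number mixes old and new hits. The delta isolates the one contract change this PR made, which is the question a reviewer can act on.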
Top rule families in the corpus
| Rule | Signal family | Raw findings |
|---|---|---|
| GCI0004 | Public API exposure and visibility changes | 59,965 |
| GCI0003 | Method signature and contract changes | 39,628 |
| GCI0006 | Null and edge-case handling changes | 10,978 |
| GCI0015 | Exception-path changes | 10,389 |
| GCI0016 | Async and deadlock candidates | 4,040 |
| GCI0024 | Dangerous API usage | 3,435 |
| GCI0010 | Thread-safety and concurrency risks | 3,225 |
| GCI0001 | Mixed-scope or diff-integrity risk | 2,674 |
| GCI0036 | Performance hot-path risks | 2,524 |
| GCI0047 | Additional behavioral-change signals | 1,450 |
GCI0004 and GCI0003 together account for 99,593 raw findings. That does not mean every finding is a defect. It means API shape and contract changes are the dominant risk surface in this corpus.
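That share can be checked with a few lines of arithmetic. The check uses only counts published on this page: the rule table above and the corpus total of 148,327 raw findings from the methodology section.

```python
# Raw finding counts copied from the rule table above.
RULE_RAW = {
    "GCI0004": 59_965,
    "GCI0003": 39_628,
    "GCI0006": 10_978,
    "GCI0015": 10_389,
    "GCI0016": 4_040,
    "GCI0024": 3_435,
    "GCI0010": 3_225,
    "GCI0001": 2_674,
    "GCI0036": 2_524,
    "GCI0047": 1_450,
}
# Corpus-wide raw total from the methodology section (148,327 findings, 29 rule IDs).
CORPUS_RAW_TOTAL = 148_327

api_and_contract = RULE_RAW["GCI0004"] + RULE_RAW["GCI0003"]
top_ten = sum(RULE_RAW.values())

print(api_and_contract)                              # 99593
print(f"{api_and_contract / CORPUS_RAW_TOTAL:.1%}")  # 67.1%
print(f"{top_ten / CORPUS_RAW_TOTAL:.1%}")           # 93.2%
print(CORPUS_RAW_TOTAL - top_ten)                    # 10019, spread across the other 19 rule IDs
```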
Repository distribution, with the outlier left visible
The corpus is intentionally left unflattened, so its uncomfortable skew stays visible. Large SDK and framework PRs produce more signals because they touch more published surface area. That skew is a feature of the data, not a reason to erase it.
| Repository | Corpus PRs | Raw findings | High-confidence |
|---|---|---|---|
| Azure/azure-sdk-for-net | 18 | 42,919 | 16,875 |
| JamesNK/Newtonsoft.Json | 10 | 12,086 | 1,034 |
| googleapis/google-api-dotnet-client | 17 | 12,009 | 3,236 |
| DapperLib/Dapper | 7 | 9,696 | 107 |
| StackExchange/StackExchange.Redis | 10 | 5,568 | 825 |
| dotnet/reactive | 12 | 5,546 | 217 |
| apache/logging-log4net | 9 | 4,716 | 1,359 |
| dotnet/orleans | 14 | 4,188 | 681 |
| grpc/grpc-dotnet | 12 | 3,935 | 243 |
| DevToys-app/DevToys | 12 | 3,787 | 1,011 |
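Raw volume and high-confidence volume do not move together in this table. The short sketch below uses only the table's numbers. It computes, for each repository, the share of raw findings that cleared the 0.70 threshold, along with raw findings per corpus PR.

```python
# (corpus PRs, raw findings, high-confidence findings), copied from the table above.
REPOS: dict[str, tuple[int, int, int]] = {
    "Azure/azure-sdk-for-net": (18, 42_919, 16_875),
    "JamesNK/Newtonsoft.Json": (10, 12_086, 1_034),
    "googleapis/google-api-dotnet-client": (17, 12_009, 3_236),
    "DapperLib/Dapper": (7, 9_696, 107),
    "StackExchange/StackExchange.Redis": (10, 5_568, 825),
    "dotnet/reactive": (12, 5_546, 217),
    "apache/logging-log4net": (9, 4_716, 1_359),
    "dotnet/orleans": (14, 4_188, 681),
    "grpc/grpc-dotnet": (12, 3_935, 243),
    "DevToys-app/DevToys": (12, 3_787, 1_011),
}


def high_confidence_share(raw: int, high: int) -> float:
    """Fraction of raw findings that met the 0.70 confidence threshold."""
    return high / raw if raw else 0.0


# Rank repositories by how much of their raw volume is high-confidence.
ranked = sorted(
    REPOS.items(),
    key=lambda item: high_confidence_share(item[1][1], item[1][2]),
    reverse=True,
)
for name, (prs, raw, high) in ranked:
    print(f"{name:38} high-confidence {high_confidence_share(raw, high):6.1%}  raw per PR {raw / prs:8,.0f}")
```

On these numbers, Azure SDK sits highest (about 39.3% high-confidence) and Dapper lowest (about 1.1%). Raw volume alone says little about how much of a repository's signal is high-confidence.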
Outlier disclosure: Azure SDK PR #57223
Azure SDK for .NET PR #57223 contributes 40,156 raw findings and 16,611 high-confidence findings. That is 27.1% of the corpus raw total and 46.3% of the high-confidence total. Any honest reading of the corpus has to say that out loud.
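The arithmetic behind that disclosure can be reproduced from the page's own numbers. The sketch also subtracts the outlier from the Azure SDK row of the repository table. This assumes PR #57223 is one of the 18 Azure SDK PRs counted there. The result shows what the repository's remaining corpus PRs contribute on their own.

```python
# Counts published on this page.
CORPUS_RAW_TOTAL = 148_327
OUTLIER_RAW, OUTLIER_HIGH = 40_156, 16_611              # Azure SDK PR #57223
AZURE_PRS, AZURE_RAW, AZURE_HIGH = 18, 42_919, 16_875   # repository table row

print(f"{OUTLIER_RAW / CORPUS_RAW_TOTAL:.1%}")  # 27.1%

# Assumption: the outlier is one of the 18 Azure SDK PRs in the table.
other_prs = AZURE_PRS - 1
other_raw = AZURE_RAW - OUTLIER_RAW
other_high = AZURE_HIGH - OUTLIER_HIGH
print(other_prs, other_raw, other_high)  # 17 2763 264
print(round(other_raw / other_prs))      # 163 raw findings per remaining PR
```

Roughly 163 raw findings per remaining PR, against 40,156 for the single outlier, is the skew the repository table leaves visible.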
The right conclusion is not "Azure SDK is bad." The useful conclusion is that multiframework, published-surface-area changes create a different risk profile than small application PRs. For libraries, one signature or visibility change can echo through multiple target frameworks and generated surfaces.
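One way to see how a single change echoes is to group findings that differ only by target framework. Raw counts on this page deliberately keep every framework surface. This sketch does not describe how GauntletCI or the corpus counts findings. It only illustrates why one logical change can produce several raw findings. The record shape and the `ExampleClient`/`ExampleOptions` symbols are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameworkFinding:
    # Hypothetical shape: one finding per rule, symbol, and target framework.
    rule_id: str
    symbol: str
    target_framework: str


def collapse_frameworks(findings: list[FrameworkFinding]) -> dict[tuple[str, str], set[str]]:
    """Group findings by (rule_id, symbol) and record the frameworks each appeared under."""
    groups: dict[tuple[str, str], set[str]] = defaultdict(set)
    for finding in findings:
        groups[(finding.rule_id, finding.symbol)].add(finding.target_framework)
    return dict(groups)


# One signature change reported under four target frameworks, plus one unrelated finding.
raw = [
    FrameworkFinding("GCI0003", "ExampleClient.SendAsync", tfm)
    for tfm in ("net8.0", "net6.0", "netstandard2.0", "net462")
]
raw.append(FrameworkFinding("GCI0004", "ExampleOptions.Retry", "netstandard2.0"))

collapsed = collapse_frameworks(raw)
print(len(raw), len(collapsed))  # 5 2
```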
Test changes are not a reliable proxy for risk
The corpus contains 178 PRs with no test-file changes recorded. Of those, 133 had at least one Behavioral Change Risk finding, and 46 had at least one high-confidence finding. That does not prove the PRs were wrong. It proves that "tests changed" and "risk was introduced" are different signals.
A reviewer needs both. A test diff shows what behavior the author chose to prove. A risk diff shows what behavior the author may have changed without making that choice explicit.
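Because the two signals are recorded independently, every combination of "tests changed" and "risk flagged" can occur. The sketch classifies PRs into those four quadrants using a hypothetical `PullRequest` summary. It then restates the page's three counts as shares of the 178 PRs with no test-file changes. It says nothing about PRs that did change tests, which this section does not break down.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PullRequest:
    # Hypothetical per-PR summary; field names are illustrative.
    number: int
    tests_changed: bool
    has_risk_finding: bool


def quadrant(pr: PullRequest) -> str:
    # The two signals are independent, so all four combinations are possible.
    tests = "tests changed" if pr.tests_changed else "no test changes"
    risk = "risk flagged" if pr.has_risk_finding else "no risk flagged"
    return f"{tests} / {risk}"


prs = [
    PullRequest(1, True, True),
    PullRequest(2, False, True),
    PullRequest(3, False, False),
    PullRequest(4, True, False),
]
print(Counter(quadrant(p) for p in prs))

# Counts reported on this page for PRs with no test-file changes.
NO_TEST_PRS, WITH_RISK, WITH_HIGH_CONF = 178, 133, 46
print(f"{WITH_RISK / NO_TEST_PRS:.1%}")       # 74.7%
print(f"{WITH_HIGH_CONF / NO_TEST_PRS:.1%}")  # 25.8%
```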
What this changes for PR review
Review the delta, not the vibe
A polished PR can still alter contracts, exception paths, and runtime assumptions. Risk analysis gives reviewers a concrete checklist tied to the diff.
Treat API shape as production behavior
Visibility and signature changes dominated the corpus. For library and platform teams, API shape is not metadata; it is behavior customers compile against.
Use findings as evidence, not theater
A finding is not a verdict. It is a deterministic pointer to changed behavior that deserves a human decision before merge.
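The sketch below treats "evidence, not theater" as a review step. Each finding becomes a checklist item, and each high-confidence item blocks merge until a human records a decision. Everything here is a hypothetical workflow for illustration, not a GauntletCI feature. That includes the `ReviewItem` type, the `Decision` values, and the rule that undecided or "needs fix" items block merge.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

HIGH_CONFIDENCE_THRESHOLD = 0.70


class Decision(Enum):
    INTENDED = "intended change"  # reviewer confirms the behavior change is deliberate
    NEEDS_FIX = "needs fix"       # reviewer asks the author to change the code


@dataclass
class ReviewItem:
    # Hypothetical checklist entry derived from one finding.
    rule_id: str
    location: str
    confidence: float
    decision: Optional[Decision] = None


def blocking_items(items: list[ReviewItem], threshold: float = HIGH_CONFIDENCE_THRESHOLD) -> list[ReviewItem]:
    """High-confidence items that are still undecided or were marked as needing a fix."""
    return [
        item
        for item in items
        if item.confidence >= threshold and item.decision in (None, Decision.NEEDS_FIX)
    ]


items = [
    ReviewItem("GCI0003", "src/Client.cs:42", 0.91),
    ReviewItem("GCI0006", "src/Parser.cs:10", 0.55),  # below threshold: shown, but not blocking
]
print(len(blocking_items(items)))  # 1
items[0].decision = Decision.INTENDED
print(len(blocking_items(items)))  # 0
```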
Methodology and limitations
The current local corpus database contains 610 public, already-merged C# pull requests across 61 repositories. The analyzed findings table contains 148,327 triggered findings across 535 PRs and 29 rule IDs. A high-confidence finding is a triggered finding with an `actual_confidence` value of at least 0.70.
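The counting rules in this paragraph can be expressed as a single query. The sketch builds a tiny in-memory SQLite table with made-up rows. Only the `actual_confidence` column is named on this page; the table name `findings` and the `pr_id` and `rule_id` columns are assumptions about the corpus schema. A distinct-PR count taken from a findings table only sees PRs where at least one rule fired.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: one row per triggered finding.
conn.execute("CREATE TABLE findings (pr_id INTEGER, rule_id TEXT, actual_confidence REAL)")
conn.executemany(
    "INSERT INTO findings VALUES (?, ?, ?)",
    [
        (1, "GCI0004", 0.92),
        (1, "GCI0004", 0.65),
        (2, "GCI0003", 0.70),
        (3, "GCI0016", 0.30),
    ],
)

row = conn.execute(
    """
    SELECT
        COUNT(*) AS raw_findings,
        SUM(CASE WHEN actual_confidence >= 0.70 THEN 1 ELSE 0 END) AS high_confidence,
        COUNT(DISTINCT pr_id) AS prs_with_findings,
        COUNT(DISTINCT rule_id) AS rule_ids
    FROM findings
    """
).fetchone()
print(row)  # (4, 2, 3, 3)
conn.close()
```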
The report is not a benchmark of repository quality, maintainer skill, or defect rate. It is a field report about where deterministic change-risk rules fire when applied to real merged diffs. Some findings represent intentional changes. Some represent generated or multiframework duplication. That is why this page reports raw counts, high-confidence counts, and the largest outlier separately.
Sources and scope
This report is based on local GauntletCI corpus artifacts, public pull request sources, and internal methodology articles. Counts are reported from the local corpus database used during this update.
- GauntletCI corpus fixture export — Public fixture-level corpus export used to cross-check PR, repository, test-change, and finding totals.
- Corpus audit runner — Audit script that hydrates corpus fixtures and runs GauntletCI analysis against the local corpus database.
- Azure SDK for .NET PR #57223 — The largest outlier in the current corpus; it accounts for 40,156 raw findings and 16,611 high-confidence findings.
- Azure SDK PR #57223 deep dive — Internal case study explaining why multiframework API-surface changes can produce unusually large raw finding counts.
- Behavioral Change Risk framework — Internal methodology article defining the change-risk categories used throughout the corpus analysis.
