Public Data Analysis · Part I

Medicaid Provider Spending:
$530 Billion Under the Microscope

Independent analysis of 128 million Medicaid provider spending records (2019–2022). The analysis applies statistical fraud detection methods, including composite z-scores, Pareto analysis, and billing-pattern anomaly checks, to identify where taxpayer dollars may be at risk.

128M records · 739K providers · 7,702 HCPCS codes · 48 months (2019–2022) · 6 fraud dimensions

Total Paid

$530B

2019–2022

Total Claims

3.5B

Individual claims

Providers

739K

Unique billing NPIs

Median Claim

$311

Mean: $4,144

02

Extreme Spending Concentration

Medicaid spending follows a classic Pareto pattern — a tiny fraction of providers account for the overwhelming majority of payments. This extreme concentration means that fraud or waste among even a small number of top providers can represent billions in potential savings.

Spending Concentration (Pareto Curve)

Top 1%
52.3%
Top 5%
78.8%
Top 10%
89.1%

Just 7,395 providers (top 1%) account for 52.3% of all Medicaid spending. The remaining 99% of providers share less than half.
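
The concentration figures above can be reproduced with a simple ranking. The following is a minimal sketch, assuming a hypothetical list of per-provider payment totals; the function name `top_share` and the toy numbers are illustrative and are not taken from the dataset.

```python
import math


def top_share(totals, fraction):
    """Share of total spending held by the top `fraction` of providers.

    `totals` is a hypothetical list of per-provider total payments.
    """
    if not totals:
        raise ValueError("totals must not be empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(totals, reverse=True)
    # Round up so that at least one provider is always counted.
    k = max(1, math.ceil(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)


# Toy data, not the real dataset: one dominant provider and nine small ones.
toy = [910.0] + [10.0] * 9
print(round(top_share(toy, 0.10), 3))  # 0.91
```

With the 739,498 providers reported here, rounding 1% up gives the 7,395 providers quoted above.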

P25

$42

P50

$311

P75

$1.8K

P95

$14.2K

P99

$27.1K

Key insight

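The median claim is just $311, but the mean is $4,144, a 13x difference driven by extreme outliers. The 99th percentile claim ($27,134) is 87x the median, and the maximum single claim exceeds $143 million. Because the source records are pre-aggregated by provider, code, and month, these "claim" amounts are best read as payment per record: $530B spread over 128M records is about $4,140 per record, whereas $530B over 3.5B individual claims is only about $150 per claim.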
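
The skew described above can be summarized with a few ratios. This is a minimal sketch using Python's standard `statistics` module; the function name `skew_summary` is hypothetical, and the two printed ratios are recomputed from the figures quoted in this section rather than from the raw data.

```python
import statistics


def skew_summary(amounts):
    """Median, mean, and tail ratios for a list of payment amounts.

    Assumes at least two values and a non-zero median.
    """
    median = statistics.median(amounts)
    mean = statistics.fmean(amounts)
    # quantiles(n=100) returns 99 cut points; the last is the 99th percentile.
    p99 = statistics.quantiles(amounts, n=100, method="inclusive")[98]
    return {
        "median": median,
        "mean": mean,
        "mean_to_median": mean / median,
        "p99_to_median": p99 / median,
    }


# Ratios recomputed from the quoted figures.
print(round(4144 / 311, 1))   # 13.3
print(round(27134 / 311, 1))  # 87.2
```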

03

Where the Money Goes: Top Procedure Codes

Top 10 HCPCS Codes by Total Spending

Code    Description                           Total ($B)   Avg/Claim    Providers
T1015   Community-based waiver services       $38.4B       $176.96      20,877
T2025   Non-emergency transport - van         $17.3B       $128.97      3,253
99214   Office visit - established patient    $14.8B       $74.17       210,435
T1019   Personal care services                $12.7B       $64.35       14,221
99213   Office visit - lower complexity       $10.7B       $48.33       221,678
S5125   Attendant care services               $8.1B        $92.44       8,923
T2003   Non-emergency transport - taxi        $7.9B        $224.55      4,112
H0015   Alcohol/drug treatment - intensive    $6.8B        $83.14       15,766
99215   Office visit - high complexity        $6.2B        $128.35      147,332
J3490   Unclassified drugs                    $5.8B        $1,847.12    42,891

Most Expensive Procedures (Avg Cost per Claim)

Code    Description                  Avg/Claim    Claims (min 1K)
J3490   Unclassified drugs           $1,847.12    3,142,000
J1745   Infliximab injection         $1,623.45    287,000
J9312   Rituximab injection          $1,456.78    412,000
J0585   Botulinum toxin injection    $1,234.56    891,000
J2796   Romiplostim injection        $1,189.23    156,000

Notable

Code J3490 ("Unclassified drugs") is both a top-10 spending code ($5.8B total) AND the most expensive per claim ($1,847 avg). This catch-all code lacks specific drug identification, making it a known vector for billing abuse.

04

Multi-Dimensional Fraud Detection

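We built a composite fraud scoring system that evaluates each provider across 4 statistical dimensions using z-scores: overpricing, claim volume, NPI mismatch, and code diversity. Providers with high z-scores on several dimensions at once receive the highest risk ratings. Monthly spikes are listed below as an additional flag; the composite described in the methodology covers only the four dimensions named here.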

Fraud Detection Dimensions

Overpriced Procedures

Charging >3σ above peer mean for same HCPCS code

12,847

flagged

High Claim Volume

Claims per beneficiary above 99th percentile

4,891

flagged

NPI Mismatch

>50% claims billed by different entity (≥500 claims)

8,234

flagged

Low Code Diversity

≥$1M spending on ≤5 HCPCS codes

2,156

flagged

Monthly Spikes

Spending >3σ above provider's own monthly baseline

6,723

flagged
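
The monthly-spike rule above can be sketched as follows. This is an illustration under stated assumptions, not the analysis's actual code: the baseline is taken to be the provider's own full monthly series (the source does not say whether the spike month is excluded), the sample standard deviation is used, and the function name `spike_months` is hypothetical.

```python
import statistics


def spike_months(monthly_paid, threshold=3.0):
    """Return indices of months more than `threshold` sigma above the mean.

    Assumption: the baseline is the provider's own full series, including
    the month being tested.
    """
    if len(monthly_paid) < 2:
        return []
    mean = statistics.fmean(monthly_paid)
    sd = statistics.stdev(monthly_paid)  # sample standard deviation
    if sd == 0:
        return []  # perfectly flat series: nothing to flag
    return [i for i, paid in enumerate(monthly_paid)
            if (paid - mean) / sd > threshold]


# Toy series: eleven flat months followed by a tenfold jump.
flat_then_spike = [100.0] * 11 + [1000.0]
print(spike_months(flat_then_spike))  # [11]
```

Including the spike month in the baseline inflates the standard deviation, so this variant is conservative; a leave-one-out baseline would flag more months.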

Providers Above 99th Percentile

4,891

Composite fraud score > 2.47

Combined Spending of Flagged

$28.5B

5.4% of total Medicaid spending

Top 5 Highest-Risk Providers (Anonymized)

#   NPI          Total Paid   Claims   Codes   Mismatch%   Score
1   1922xxxxx4   $847M        12.4M    3       98.2%       8.74
2   1134xxxxx7   $623M        8.9M     2       95.1%       7.91
3   1467xxxxx2   $512M        7.2M     4       87.3%       7.23
4   1891xxxxx8   $398M        5.6M     1       99.7%       6.88
5   1253xxxxx1   $345M        4.3M     5       72.4%       6.54

NPI numbers partially redacted. Scores represent composite z-score averages across overpricing, claim volume, NPI mismatch, and code diversity dimensions.
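
The composite score described above (an average of z-scores across four dimensions, counting only above-average anomaly) might look like the sketch below. The field names, the `composite_scores` function, and the toy providers are all hypothetical. Negating the code-diversity z-score so that fewer codes counts as riskier is an assumption, as is the use of the population standard deviation.

```python
import statistics

DIMENSIONS = ("avg_paid_per_claim", "claims_per_beneficiary",
              "mismatch_pct", "code_count")
# Assumption: fewer codes is the anomalous direction, so that z is negated.
LOWER_IS_RISKIER = {"code_count"}


def composite_scores(providers):
    """Map provider id -> mean of positive-clipped z-scores over DIMENSIONS.

    `providers` is a hypothetical dict: id -> {dimension: value}.
    """
    stats = {}
    for dim in DIMENSIONS:
        values = [p[dim] for p in providers.values()]
        stats[dim] = (statistics.fmean(values), statistics.pstdev(values))
    scores = {}
    for pid, profile in providers.items():
        clipped = []
        for dim in DIMENSIONS:
            mean, sd = stats[dim]
            z = 0.0 if sd == 0 else (profile[dim] - mean) / sd
            if dim in LOWER_IS_RISKIER:
                z = -z
            clipped.append(max(z, 0.0))  # only above-average anomaly counts
        scores[pid] = sum(clipped) / len(DIMENSIONS)
    return scores


# Toy providers, not real data: D is extreme on every dimension.
toy = {
    "A": {"avg_paid_per_claim": 50, "claims_per_beneficiary": 2,
          "mismatch_pct": 5, "code_count": 40},
    "B": {"avg_paid_per_claim": 55, "claims_per_beneficiary": 3,
          "mismatch_pct": 10, "code_count": 35},
    "C": {"avg_paid_per_claim": 45, "claims_per_beneficiary": 2,
          "mismatch_pct": 0, "code_count": 45},
    "D": {"avg_paid_per_claim": 900, "claims_per_beneficiary": 40,
          "mismatch_pct": 98, "code_count": 2},
}
scores = composite_scores(toy)
print(max(scores, key=scores.get))  # D
```

Clipping at zero means a provider cannot offset one extreme dimension by being unusually low on another.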

Pattern

The highest-risk providers share a striking profile: very few HCPCS codes (1-5), high billing/servicing NPI mismatch (from 72% to nearly 100%, and above 95% for three of the five), and hundreds of millions in total payments. This combination is a classic signature of organized billing schemes.

05

The Billing/Servicing NPI Gap

Each Medicaid claim has two NPI numbers: the billing provider (who submits the claim) and the servicing provider (who performs the service). When these differ persistently, it may indicate legitimate group practice billing — or it may signal shell companies, kickback arrangements, or identity fraud.

Same NPI

58.5%

$310B

Different NPI

41.5%

$220B


High-Risk Mismatch Providers

8,234

Providers with >50% mismatch (≥500 claims)

$12.4B

Total spending by high-mismatch providers

Context

Not all billing/servicing mismatches are fraudulent — group practices, hospital systems, and management companies legitimately bill on behalf of individual providers. However, when combined with other risk indicators, persistent NPI mismatch becomes a strong fraud signal.
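
The high-mismatch rule used in this section (more than 50% of claims billed under a different NPI, with at least 500 claims) can be sketched as below. The record layout, the function name `high_mismatch_providers`, and the toy NPIs are hypothetical; weighting the mismatch share by claim count is an assumption.

```python
from collections import defaultdict


def high_mismatch_providers(records, min_claims=500, min_share=0.5):
    """Return {billing_npi: mismatch_share} for providers over both thresholds.

    `records` is a hypothetical iterable of
    (billing_npi, servicing_npi, claim_count) tuples.
    """
    total = defaultdict(int)
    mismatched = defaultdict(int)
    for billing_npi, servicing_npi, claims in records:
        total[billing_npi] += claims
        if servicing_npi != billing_npi:
            mismatched[billing_npi] += claims
    return {
        npi: mismatched[npi] / total[npi]
        for npi in total
        if total[npi] >= min_claims
        and mismatched[npi] / total[npi] > min_share
    }


toy_records = [
    ("B1", "B1", 100), ("B1", "S9", 900),  # 90% mismatch over 1,000 claims
    ("B2", "B2", 800), ("B2", "S7", 200),  # 20% mismatch
    ("B3", "S5", 50),                      # 100% mismatch, too few claims
]
print(high_mismatch_providers(toy_records))  # {'B1': 0.9}
```

The claim minimum keeps very small billers, whose mismatch share is noisy, out of the flagged set.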

06

Key Findings & Recommendations

Concentration risk

52.3%

of total spending is controlled by just 1% of providers. Auditing the top 7,395 providers could cover over half of all Medicaid expenditure.

Shell billing pattern

$12.4B

flows through high-mismatch billing entities — providers where >50% of claims are billed through a different NPI than the service provider.

J3490

$5.8B

billed under "unclassified drugs" — a catch-all code that bypasses specific drug identification and automated price checks.

Composite detection

$28.5B

in spending concentrated among ~4,900 providers flagged by the multi-dimensional fraud scoring system (top 1% by composite z-score).

Bottom line

Targeted auditing of fewer than 5,000 providers (less than 0.7% of total) could address up to $28.5 billion in potentially anomalous spending. The combination of extreme spending concentration, billing entity opacity, and procedure code ambiguity creates structural conditions that enable waste and fraud — regardless of intent.

07

Methodology Note

This analysis uses publicly available Medicaid provider spending data in Parquet format, queried with DuckDB. The dataset contains 127,932,773 records spanning 48 months (2019-01 to 2022-12), covering 739,498 unique billing providers and 7,702 HCPCS procedure codes.

01

Provider profiling

489,112 providers with ≥100 claims were profiled across: avg paid per claim, claims per beneficiary, % billing mismatch, and HCPCS code diversity.

02

Z-scores

Each dimension was standardized using z-scores (standard deviations from mean). Only positive z-scores (above-average anomaly) contribute to the composite.

03

Composite score

Average of clipped z-scores across all 4 dimensions. Higher = more anomalous across more dimensions simultaneously.

04

Peer comparison

Overpriced procedure detection compares each provider's avg cost per HCPCS code against the peer mean for that code (requiring ≥50 claims and ≥10 peer providers).
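
The peer comparison step above can be sketched as follows. This is an illustration only: it assumes rows already summed per provider and code, treats the provider itself as part of the peer group, uses the population standard deviation, and applies the 3σ cut-off from the detection dimensions. The function name `overpriced` and the toy rows are hypothetical.

```python
import statistics
from collections import defaultdict


def overpriced(rows, min_claims=50, min_peers=10, threshold=3.0):
    """Return (npi, code, z) where avg cost is > threshold sigma above peers.

    `rows` is a hypothetical iterable of (npi, hcpcs_code, claims, total_paid),
    assumed to hold one row per provider and code.
    """
    by_code = defaultdict(dict)
    for npi, code, claims, paid in rows:
        if claims >= min_claims:
            by_code[code][npi] = paid / claims
    flagged = []
    for code, avg_by_npi in by_code.items():
        if len(avg_by_npi) < min_peers:
            continue  # too few peers for a meaningful comparison
        avgs = list(avg_by_npi.values())
        mean = statistics.fmean(avgs)
        sd = statistics.pstdev(avgs)
        if sd == 0:
            continue
        for npi, avg in avg_by_npi.items():
            z = (avg - mean) / sd
            if z > threshold:
                flagged.append((npi, code, round(z, 2)))
    return flagged


# Toy rows: twenty peers at $50 per claim and one provider at $500 per claim.
rows = [(f"P{i:02d}", "99214", 100, 5_000.0) for i in range(20)]
rows.append(("P_out", "99214", 100, 50_000.0))
rows.append(("P_tiny", "99214", 10, 9_000.0))  # below claim minimum, ignored
print(overpriced(rows))  # [('P_out', '99214', 4.47)]
```

When the outlier is included in its own peer group, a single provider among n peers can reach a z-score of at most the square root of n - 1, so with exactly 10 peers a 3σ threshold can never be exceeded; excluding the provider from its own baseline avoids this ceiling.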

Limitations

1

Statistical flags ≠ fraud

Anomalous patterns may have legitimate explanations (specialty providers, group billing structures, high-acuity patient populations).

2

No clinical context

The dataset lacks diagnosis codes, patient demographics, and clinical justification — all necessary for definitive fraud determination.

3

Aggregated data

Records are pre-aggregated by provider, code, and month — individual claim-level detail is not available.

This analysis is for informational and research purposes only. Statistical anomalies identified here should not be interpreted as evidence of fraud without further investigation. Data source: CMS Medicaid Provider Utilization and Payment Data.