Finance, procurement, and risk teams need to know which counterparties and spend patterns warrant a second look — before the money moves. RealWork is a multi-source forensic engine that cross-references any vendor or grantee against live web data via Bright Data — Secretary of State registration, license status, federal debarment (SAM.gov), litigation, and financial filings — then surfaces the anomalies that documented fraud patterns predict. We proved it at scale on $36.5B of public spending. The same engine runs on any vendor book.
Grant dollars the state cannot account for (FY 22-23 + 23-24)
See the tool at work
LA County publishes 1,652 homeless-service provider invoices as scanned PDFs under the LA Alliance settlement. They're public record — but completely unqueryable. No CSV. No API. An auditor who wants to know what Del Amo Hospital billed last month has to open PDFs one by one. We built a pipeline to fix that — working toward all 1,600+ documents.
of 1,652 total documents · dataset grows as pipeline runs · ledger.json on GitHub ↗
Gemini 2.5 Pro with Google Search grounding identified billing patterns in the extracted invoices consistent with known Medi-Cal billing violations. Findings are under private review prior to any disclosure.
LAFH_SupportiveServices_Oct2024.pdf
1.8 MB · 64 pages · scanned
[unreadable billing summary grid]
[handwritten corrections]
[rotated signature page]
→ to query: open manually
{
  "vendor": "OH-HELP, INC.",
  "invoice_date": "2024-10-01",
  "billed_amount": 1078638.00,
  "deliverables": [
    "Interim housing beds",
    "Supportive services",
    "Case management"
  ],
  "confidence": "high"
}
Extracted from publicly-available LA County invoices · updated as pipeline runs · full dataset on GitHub ↗
California's Grant Information Act (AB-132, Gov Code §8333) requires the state to track post-award disbursement. We pulled both available fiscal years from data.ca.gov's CKAN API and audited the disbursement field. It is empty in every record. This is the systemic gap that makes a tool like this necessary.
Grant records across FY 22-23 + FY 23-24 with no centralized spend tracking populated. 100% of both datasets.
Total award value with zero state-level disbursement record. Nobody knows whether the money was spent as awarded.
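The population check described above reduces to counting records whose disbursement field is missing or blank. A minimal sketch, assuming the records have already been fetched as a list of dictionaries; the field name `disbursed_amount` and the sample rows are hypothetical, not the portal's actual schema:

```python
from typing import Iterable, Mapping


def empty_field_rate(records: Iterable[Mapping[str, object]], field: str) -> tuple[int, int]:
    """Return (empty_count, total_count) for `field` across `records`."""
    empty = 0
    total = 0
    for rec in records:
        total += 1
        value = rec.get(field)
        # Missing keys, None, and blank or whitespace-only strings all count as unpopulated.
        if value is None or (isinstance(value, str) and not value.strip()):
            empty += 1
    return empty, total


# Illustrative rows only; "disbursed_amount" is a hypothetical field name.
sample = [
    {"grant_id": "A-1", "award_amount": 100000, "disbursed_amount": ""},
    {"grant_id": "A-2", "award_amount": 250000, "disbursed_amount": None},
    {"grant_id": "A-3", "award_amount": 50000},
]

empty, total = empty_field_rate(sample, "disbursed_amount")
print(f"{empty}/{total} records have no disbursement value")
```

Counting blank strings separately from missing keys would be a reasonable refinement if the distinction matters to the reader of the report.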
The pipeline ran against 50,000 DGS purchase orders and surfaced six threshold-edge patterns matching documented procurement-fraud signatures. The strongest example is below. Note the framing: we are not accusing anyone. We are showing what the audit tool flags and how it honestly weakens those flags when contextual evidence emerges — exactly the calibration the State Auditor's office needs.
12 purchase orders at exactly $49,950 each — $50 below the competitive-bidding threshold
The honest read: California's State Contracting Manual prohibits "splitting purchases" to evade competitive bidding (Public Contract Code §§ 10301-10340 require formal bidding above $50,000), and the 2019 Caltrans bid-rigging prosecution (USA v. Yong, Miller, Opp — 49-78 month sentences, $3M restitution) was built on a similar fact pattern.
However, the dates fall during a real wildfire emergency. Cal Fire ramped from ~55 personnel on August 14 to ~1,760 personnel at the Dillon Fire's September 5 peak. Multiple food orders to one caterer in tight succession during an active fire response are plausible, and emergency procurement has expedited rules.
What the fire context does not explain: why every PO rounded to exactly $49,950. Genuine emergency procurement scales with actual crew size: 256 firefighters need less food than 1,760 do. Identical-amount POs across a span where crew size varied 7× look more like "use the threshold as a default PO ceiling" than "this is what the food actually cost."
The finding is therefore reframed as a threshold-ceiling pattern that warrants State Auditor review of two questions: (1) is exactly $49,950 the default PO size for this procurement officer, regardless of actual need, and (2) what was the per-meal unit cost across the 12 POs? Answering either definitively requires line-item access that the public Power BI dashboard does not currently expose.
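The screening step behind a finding like this can be sketched as a count of repeated amounts in a narrow band just under a bidding threshold. The band width (`margin`) and the minimum repeat count are illustrative assumptions rather than the pipeline's actual settings, and the two non-matching amounts in the sample are made up:

```python
from collections import Counter
from decimal import Decimal


def threshold_edge_clusters(amounts, threshold, margin, min_count=3):
    """Return {amount: count} for amounts in [threshold - margin, threshold)
    that repeat at least `min_count` times."""
    lower = threshold - margin
    near_edge = [a for a in amounts if lower <= a < threshold]
    counts = Counter(near_edge)
    return {amount: n for amount, n in counts.items() if n >= min_count}


# Twelve POs at $49,950 (from the finding above) plus two made-up amounts.
amounts = [Decimal("49950")] * 12 + [Decimal("18200"), Decimal("51300")]

clusters = threshold_edge_clusters(
    amounts,
    threshold=Decimal("50000"),  # formal-bidding threshold cited above
    margin=Decimal("500"),       # hypothetical band width
)
print(clusters)
```

Using `Decimal` keeps the exact-amount comparison free of floating-point noise, which matters when the signal is that amounts are identical to the dollar.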
$4,943-$4,957 to Correctional Health Care Services — 93% buyer concentration
28 purchase orders clustered just under the $5,000 micro-purchase threshold, 93% to a single buyer, repeating "medical supplies" description. Plausible legitimate explanation: recurring supply orders. The pattern still warrants surface review.
~$4,999 to Cal Fire (85% concentration) — "fuel for dept vehicles"
13 fuel purchase orders just under the $5K threshold, same buyer, same description. Could be emergency fuel during fire response, could be the same pattern as Panini Time at smaller dollar amounts. Worth a closer look.
3 POs at exactly $4,999 within 3 days — single buyer, identical scope
3 purchase orders at precisely $4,999 issued over a 3-day window, 100% to Correctional Health Care, all with description "negative pressure wound therapy rental." The exact-amount-plus-tight-timing combination is the strongest split-contract signature we found at this threshold.
6 POs at ~$4,999 to Air Resources Board — "cylinder rental"
3 contracts in a single 7-day window, all at the threshold edge, same description. The pattern matches, but the vendor itself is large and reputable, so this is likely a benign procurement habit rather than intentional avoidance.
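Two signals recur in the smaller findings above: buyer concentration within a cluster, and identical amounts issued inside a tight date window. A minimal sketch of both, with assumed names throughout; the issue dates and the second buyer in the sample are hypothetical, and the window and count defaults are illustrative rather than the pipeline's real parameters:

```python
from collections import Counter, defaultdict
from datetime import date, timedelta


def top_buyer_share(buyers):
    """Return (buyer, share) for the most frequent buyer in a cluster of POs."""
    if not buyers:
        raise ValueError("empty cluster")
    buyer, n = Counter(buyers).most_common(1)[0]
    return buyer, n / len(buyers)


def tight_window_repeats(orders, window_days=3, min_count=3):
    """Find (buyer, amount) pairs with >= min_count POs inside window_days calendar days.

    orders: iterable of (buyer, amount, issue_date) tuples.
    Returns (buyer, amount, first_date, last_date, count) for the first
    qualifying window of each pair.
    """
    by_key = defaultdict(list)
    for buyer, amount, issued in orders:
        by_key[(buyer, amount)].append(issued)

    hits = []
    max_span = timedelta(days=window_days - 1)
    for (buyer, amount), dates in by_key.items():
        dates.sort()
        start = 0
        for end in range(len(dates)):
            # Shrink the window from the left until it fits inside max_span.
            while dates[end] - dates[start] > max_span:
                start += 1
            count = end - start + 1
            if count >= min_count:
                hits.append((buyer, amount, dates[start], dates[end], count))
                break
    return hits


# Hypothetical issue dates; only the $4,999 amount and first buyer come from the text.
orders = [
    ("Correctional Health Care", 4999, date(2024, 3, 4)),
    ("Correctional Health Care", 4999, date(2024, 3, 5)),
    ("Correctional Health Care", 4999, date(2024, 3, 6)),
    ("Other Department", 4999, date(2024, 1, 10)),
    ("Other Department", 4999, date(2024, 6, 2)),
]

hits = tight_window_repeats(orders)
buyer, share = top_buyer_share([o[0] for o in orders])
print(hits)
print(f"{buyer}: {share:.0%} of POs in this sample")
```

Neither function decides anything on its own; each only produces a number that a reviewer can weigh against context such as an active emergency.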
The pipeline cross-referenced 500 California nonprofit state-grant recipients against their publicly-filed IRS Form 990 returns via ProPublica. EIN-match sanity validation dropped wrong-EIN false positives (a tiny dormant local chapter being matched against a national org's name). 36 organizations survived as HIGH PRIORITY anomalies. Every number below comes from a public 990 filing. The State Auditor's office decides what warrants follow-up — the pipeline just makes the queue tractable.
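One way to picture the EIN-match sanity check is as a normalization and comparison step that runs before any 990 figures are trusted. This is a sketch under the assumption that both the grant record and the matched filing carry an EIN string; the function names and the sample EINs are hypothetical:

```python
import re


def normalize_ein(raw):
    """Strip punctuation and return a 9-digit EIN string, or None if malformed."""
    digits = re.sub(r"\D", "", raw or "")
    return digits if len(digits) == 9 else None


def ein_matches(grantee_ein, filing_ein):
    """True only when both EINs are well-formed and identical after normalization."""
    a = normalize_ein(grantee_ein)
    b = normalize_ein(filing_ein)
    return a is not None and a == b


# Made-up EINs for illustration.
print(ein_matches("12-3456789", "123456789"))   # same entity, different formatting
print(ein_matches("12-3456789", "98-7654321"))  # name matched, but a different entity
```

A name-only match that fails this comparison would be dropped rather than flagged, which is the case the local-chapter example describes.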
State grants 8.5× reported total revenue
An organization reporting $556K in revenue received $4.75M in state grants. Officer compensation and total expenses both spiked sharply in the same year. Charitable explanations exist (multi-year grants disbursed over time, fiscal sponsorship arrangements). The pattern itself warrants verification by an entity with subpoena authority.
$338K officer compensation at a $1.34M-expense organization
Sector median for officer compensation as a share of expenses is roughly 8-10%. Land Together's 25% ratio is more than 2.5× the sector norm. The org also received $4.46M in state grants while reporting $1.93M total revenue. Multiple flags surviving validation simultaneously is the strongest signal in our Track B audit.
$722K officer comp, year-over-year spike, $1.6M state grants
Triple-flag: HIGH_OFFICER_COMP_SMALL_ORG, HIGH_OFFICER_COMP_RATIO, OFFICER_COMP_YoY_SPIKE. The signature pattern is state grants flowing in and executive compensation spiking the same fiscal year. Every flag survived our wrong-EIN sanity check.
Officer compensation more than doubled in one year
A large, well-known nonprofit. The 990 filing shows compensation to a single officer of $2.07M in 2023, up from $900K the prior year. May be entirely lawful — large public-facing nonprofit CEOs sometimes earn at this level — but a $1.17M YoY increase coinciding with $4.5M in state grants is exactly the pattern Track B is designed to detect.
Total expenses jumped 384% YoY ($6.7M → $32.5M)
An organization that spent $6.7 million in one fiscal year and $32.5 million the next. Some growth is plausible, since large grants do scale organizations, but spending that reached nearly 5× the prior year's level in a single year warrants verification of where the money went.
Officer compensation: $451K → $2.05M YoY
$451,111 to $2,053,022 in officer compensation in a single fiscal year. This is the largest YoY officer-compensation spike in our entire validated dataset. The org received $2.7M in state grants in the same period.
30 more validated anomalies are in data/track_b/validated_report.md in the repo. Every flag references a public 990 filing. The pipeline that surfaced them is reproducible.
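The ratios quoted in the findings above are simple arithmetic on 990 line items. The sketch below recomputes three of them and shows how flags of this kind might be derived; the flag names come from the text, while the cutoff values and function names are illustrative assumptions, not the pipeline's actual thresholds:

```python
def pct_change(prev, curr):
    """Year-over-year change as a percentage of the prior-year value."""
    if prev <= 0:
        raise ValueError("prior-year value must be positive")
    return (curr - prev) / prev * 100


def officer_comp_flags(comp, expenses=None, prior_comp=None,
                       ratio_cutoff=0.20, spike_cutoff_pct=100.0):
    """Return flag names for one filing. Both cutoffs are hypothetical defaults."""
    flags = []
    if expenses and comp / expenses > ratio_cutoff:
        flags.append("HIGH_OFFICER_COMP_RATIO")
    if prior_comp and pct_change(prior_comp, comp) > spike_cutoff_pct:
        flags.append("OFFICER_COMP_YoY_SPIKE")
    return flags


# Figures taken from the findings above.
comp_ratio_pct = 338_000 / 1_340_000 * 100      # officer comp as a share of expenses
grant_multiple = 4_750_000 / 556_000            # state grants vs. reported revenue
comp_spike_pct = pct_change(451_111, 2_053_022)

print(f"comp ratio: {comp_ratio_pct:.1f}%")
print(f"grants / revenue: {grant_multiple:.1f}x")
print(f"officer comp YoY: {comp_spike_pct:.0f}%")
print(officer_comp_flags(338_000, expenses=1_340_000))
print(officer_comp_flags(2_053_022, prior_comp=451_111))
```

Keeping each metric as a plain function makes it straightforward to recheck any flagged number against the filing it cites.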
The patterns the tool surfaces are screening signals. Converting one into an actionable lead means writing a tight, cited, defensible dossier that an oversight body can act on. This is the artifact the State Auditor's confidential channel actually wants — not a webpage, not a tweet, a single document where every claim ties to a public record. We wrote one as a worked example.
Officer compensation reported on Form 990 exceeds the entire state grant by approximately 67%
The complete dossier — at data/dossiers/DOSSIER_trybe_inc.md in the repo — names the entity, cites every claim to a public record, lists the four charitable explanations that the State Auditor's subpoena power could rule out (multi-year grant amortization, fiscal sponsorship, comparable-position salary justification, board-approval process), and recommends the specific oversight channel: submission to the California State Auditor's confidential hotline using one of three pre-drafted tip letter templates we shipped (STATE_AUDITOR_TIP_TEMPLATES.md). This is what depth looks like.
The honest answer to "did you find fraud": we found patterns that warrant review by an entity with subpoena authority. The State Auditor's office is exactly that entity. The dossier above is the artifact you hand to them. WHAT_DEPTH_LOOKS_LIKE.md in the repo gives the 9-step methodology that converts an aggregate-pattern flag into a State Auditor-ready dossier.
A six-stage forensic audit pipeline. ETL → heuristic flagging → external verification → primary LLM synthesis → cross-model ensemble validation → transparent reporting. Built with Bright Data, Gemini, AI/ML API, and ProPublica. Hard budget cap, JSONL cost ledger, dead-end log, reproducible end-to-end.
26,907 grant records (CA Grants Portal), 50,000 purchase orders (DGS), 500 nonprofit 990s (ProPublica). All normalized, deduplicated, persisted to SQLite.
Per-source anomaly detection: just-under-threshold PO clusters, repeating exact amounts, buyer-vendor concentration, nonprofit overhead ratios, exec-comp YoY spikes.
Web Unlocker gets past anti-bot measures on bizfileonline.sos.ca.gov, CCLD, ProPublica, and SAM.gov. SERP API runs 5-variant queries across all flagged entities. Scraping Browser drives the state's public Power BI procurement dashboard, which is where the pipeline surfaced a single named procurement officer signing 5 of 6 just-under-threshold contracts within 17 days. A hard budget cap applies throughout.
Gemini 2.5 Pro distinguishes real anomalies from data quality issues and DBA-trap false positives. Cleared cases get CLEARED with reasoning; survivors get WARRANTS INVESTIGATION.
EIN-match sanity checks drop false positives (e.g., MADD flagged when we matched a $5K-revenue local chapter, not the national org). 102 of 104 nonprofit flags survived; 36 reached HIGH priority.
Auto-generated markdown reports per round. Every finding cites public records. Dead-end log documents what was cleared, so future investigators don't repeat our work.
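The hard budget cap and JSONL cost ledger mentioned for the pipeline can be pictured as a small guard object that every paid call goes through. This is a sketch of the general technique, not the project's implementation; the class names, the record fields, the service labels, and the dollar amounts are all assumptions:

```python
import io
import json


class BudgetExceeded(RuntimeError):
    """Raised when a charge would push total spend past the cap."""


class CostLedger:
    def __init__(self, sink, cap_usd):
        self.sink = sink          # any writable text file-like object
        self.cap_usd = cap_usd
        self.spent_usd = 0.0

    def charge(self, service, cost_usd, note=""):
        # Check before spending, so the cap is never crossed.
        if self.spent_usd + cost_usd > self.cap_usd:
            raise BudgetExceeded(
                f"{service}: {cost_usd:.2f} would exceed cap {self.cap_usd:.2f}"
            )
        self.spent_usd += cost_usd
        entry = {
            "service": service,
            "cost_usd": cost_usd,
            "total_usd": round(self.spent_usd, 6),
            "note": note,
        }
        # One JSON object per line keeps the ledger appendable and greppable.
        self.sink.write(json.dumps(entry) + "\n")


# In-memory sink for the example; a real run would append to a file.
sink = io.StringIO()
ledger = CostLedger(sink, cap_usd=1.00)
ledger.charge("serp_api", 0.40, "entity lookup")
ledger.charge("web_unlocker", 0.40, "registration page")
try:
    ledger.charge("serp_api", 0.40, "follow-up query")
    blocked = False
except BudgetExceeded:
    blocked = True

print(sink.getvalue(), end="")
print("third call blocked:", blocked)
```

Rejecting the call before it is made, rather than logging an overrun afterwards, is what makes the cap hard rather than advisory.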
The path from "anomaly warranting investigation" to confirmed fraud runs through the State Auditor, the DOJ Procurement Collusion Strike Force, and the courts — not through a hackathon submission. Our role is to ship the tool that surfaces candidates and reduces the cost of investigation.
This tool documents patterns that fraud detection systems flag. Those patterns are derived from publicly available sources: the California State Contracting Manual, the filings in the 2019 Caltrans procurement-fraud prosecution, the State Auditor's published risk frameworks, and academic fraud-detection literature. They are well-known to fraud examiners.
Publishing detection methodology supports oversight; it does not create new evasion opportunities. The defender's advantage is that detection systems cross-reference many signals simultaneously — evading all of them is harder than evading any one.
This tool is intended for use by oversight bodies — the California State Auditor, the DOJ Procurement Collusion Strike Force, qui tam attorneys, accountability journalists, and other entities whose mandate is public integrity. It is not intended for use by parties who would game procurement.