55% of "DIRECT" authorization claims in publisher ads.txt files are false. 0.012% of identity-sharing requests carry valid consent on first visit. Approximately 5% of the operational ad-tech data economy is properly authorized.
1,757,362 cross-verified triples. 84 SSP registries (1.56M sellers). 142,000 websites crawled. 21,397 publisher ads.txt files. March 14–23, 2026.
Open evidence.html in any browser. No server required for the narrative and charts.
For live publisher verification (type a domain, see its false claims):
```bash
# Decompress the data first
gunzip false_direct_claims.jsonl.gz
# Start the API
deno run --allow-net --allow-read evidence_api.ts
# Open http://localhost:8890/evidence
```

Or query the API directly:

```bash
curl "http://localhost:8890/api/verify?publisher=cnn.com"
curl "http://localhost:8890/api/ssp?ssp=google.com"
curl "http://localhost:8890/api/ssps"
curl "http://localhost:8890/api/summary"
```

This is not three separate findings. It is one system.
- Authorization is forged. 29% of DIRECT claims are contradicted by the SSP's own registry (seller classified as INTERMEDIARY). Another 26% reference seller IDs that don't exist. Stable across 8 successive SSP expansions and across both curated and independently crawled publisher populations.
- Consent is absent. 0.012% of cookie sync requests carry valid TCF consent on first visit. 77% have no consent parameter at all. The consent banner appears 2–5 seconds after identity has already been shared.
- Identity proliferates. Average ad-tech-enabled site shares user identity with 5.1 companies. The worst shares with 294 in 10 seconds. 721,000 sync requests captured across 186,000 sites.
- The structure. 85% of ad-tech-enabled sites have no ads.txt at all. Of the 15% that do, 55% of DIRECT claims are false. Of the companies actually observed on those pages, 24% operate outside any authorization framework. Net: ~5% of ad-tech activity falls within functioning authorization. Nine years after ads.txt was introduced, the false rate has not converged toward zero.
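The net figure above follows from multiplying the three quoted rates. A minimal sketch of that arithmetic, assuming the rates are independent of one another (the variable names are illustrative and not taken from the dataset):

```python
# Rates quoted in the findings above (rounded percentages).
sites_with_ads_txt = 0.15            # 85% of ad-tech-enabled sites have no ads.txt
direct_claims_not_false = 1 - 0.55   # 55% of DIRECT claims are false (inclusive rate)
companies_in_framework = 1 - 0.24    # 24% of observed companies sit outside any framework

# Assumption: the three rates are independent, so they can simply be multiplied.
net_authorized = sites_with_ads_txt * direct_claims_not_false * companies_in_framework

print(f"{net_authorized:.1%}")  # → 5.1%
```

Using the strict 29% rate in place of the inclusive 55% would give a somewhat higher figure, so the result is sensitive to which false rate is chosen.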
| File | Description |
|---|---|
| evidence.html | Visual evidence brief with interactive verification (4 findings) |
| evidence_api.ts | Deno server: loads data into memory, serves queries |
| false_direct_claims.jsonl.gz | 962,891 false (publisher, SSP, seller_id) triples, CONTRADICTED + PHANTOM only (gzipped) |
| supply_chain_summary.json | Aggregate totals, with two rates reported (strict 29%, inclusive 55%) |
| publisher_profiles.jsonl | Per-publisher ads.txt depth and crawl traffic |
| identity_graph.json | 5,816 sync co-occurrence edges across 201 companies |
| consent_measurement.json | Per-company consent field presence rates |
| crawl_summary.json | Site distribution and geographic breakdown |
| ERRATA.md | Self-audit: what we got wrong and corrected |
- 29% CONTRADICTED (503,387 claims): The SSP's sellers.json explicitly classifies the account as INTERMEDIARY, but the publisher claims DIRECT. No ambiguity.
- 55% inclusive (962,891 claims): Adds phantom seller IDs that don't exist in the registry. Could be stale, fabricated, or (for Google) hidden behind the confidentiality flag.
Both rates are stable across successive SSP expansions (14→24→37→62→84 SSPs) and across both curated (top-1000) and independently crawled (long-tail) publisher datasets.
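The verdicts behind these two rates amount to a lookup of each claimed seller ID in the SSP's registry. The sketch below is one way to express that lookup, not the project's actual code: `build_registry` and `classify_direct_claim` are hypothetical helper names, and the sample registry is invented for illustration. The `sellers`, `seller_id`, and `seller_type` fields follow the published sellers.json format.

```python
def build_registry(sellers_json):
    """Map seller_id -> seller_type from a parsed sellers.json document."""
    return {
        str(entry["seller_id"]): str(entry["seller_type"]).upper()
        for entry in sellers_json.get("sellers", [])
    }


def classify_direct_claim(seller_id, registry):
    """Return the verdict for one DIRECT claim found in a publisher's ads.txt."""
    seller_type = registry.get(str(seller_id))
    if seller_type is None:
        return "PHANTOM"        # seller_id absent from the registry (ambiguous)
    if seller_type == "INTERMEDIARY":
        return "CONTRADICTED"   # the SSP itself says this is not a direct seller
    if seller_type in ("PUBLISHER", "BOTH"):
        return "PLAUSIBLE"      # the SSP's classification is compatible with DIRECT
    raise ValueError(f"unexpected seller_type: {seller_type!r}")


# Invented sample registry, for illustration only.
sample_sellers_json = {
    "sellers": [
        {"seller_id": "1001", "seller_type": "INTERMEDIARY"},
        {"seller_id": "1002", "seller_type": "PUBLISHER"},
        {"seller_id": "1003", "seller_type": "BOTH"},
    ]
}
registry = build_registry(sample_sellers_json)

for claimed_id in ("1001", "1002", "1003", "9999"):
    print(claimed_id, classify_direct_claim(claimed_id, registry))
```

Under this scheme the strict rate counts only CONTRADICTED verdicts, while the inclusive rate also counts PHANTOM.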
16 intermediary accounts appear in more than half of all ads.txt files analyzed. The most ubiquitous (Rubicon seller_id 17960) is in 61% of files. These entries arrive via templates distributed by intermediaries to thousands of publishers.
2,033 (SSP, seller_id) pairs appear in 100+ publisher files each; 80 pairs appear in 1,000+ files. This is not individual publisher configuration — it is industrial-scale template injection.
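A rough way to reproduce this kind of spread count is to tally, for each (SSP, seller_id) pair, how many distinct publishers list it. The sketch below assumes each JSONL record carries `publisher`, `ssp`, and `seller_id` keys; the `seller_id` key name is an assumption, and the sample records are invented. Note that false_direct_claims.jsonl holds only false claims, so counts taken from it may be lower than counts over all ads.txt entries.

```python
import json
from collections import defaultdict


def publishers_per_pair(lines):
    """Count distinct publishers for each (ssp, seller_id) pair in JSONL lines."""
    publishers = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        pair = (record["ssp"], record["seller_id"])  # "seller_id" key is assumed
        publishers[pair].add(record["publisher"])
    return {pair: len(names) for pair, names in publishers.items()}


# Invented sample records, for illustration only.
sample_lines = [
    '{"publisher": "a.example", "ssp": "ssp-one.example", "seller_id": "42"}',
    '{"publisher": "b.example", "ssp": "ssp-one.example", "seller_id": "42"}',
    '{"publisher": "b.example", "ssp": "ssp-two.example", "seller_id": "7"}',
]

counts = publishers_per_pair(sample_lines)
threshold = 2  # the text above uses 100 and 1,000 on the real data
widespread = sorted(pair for pair, n in counts.items() if n >= threshold)
print(widespread)
```

On the real file, the same function can be fed an open file handle, for example `publishers_per_pair(open("false_direct_claims.jsonl"))`.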
```bash
# Decompress first (skip if already done above)
gunzip false_direct_claims.jsonl.gz
# Strict false count (CONTRADICTED only)
grep -c '"CONTRADICTED"' false_direct_claims.jsonl
# → 503,387
# Inclusive false count (CONTRADICTED + PHANTOM)
grep -cE '"CONTRADICTED"|"PHANTOM"' false_direct_claims.jsonl
# → 962,891
# Check a specific publisher (--json-lines parses one JSON object per line; Python 3.8+)
grep '"publisher": "cnn.com"' false_direct_claims.jsonl | python3 -m json.tool --json-lines | head -20
# Top SSPs by false claims
grep -o '"ssp": "[^"]*"' false_direct_claims.jsonl | sort | uniq -c | sort -rn | head -10
```
- ads.txt harvest: 75,216 domains probed (Tranco top-1M + automated crawler piggyback). 12,965 valid ads.txt files recovered. 11,990 publishers with verifiable DIRECT claims.
- sellers.json fetch: 84 SSP registries (1.56M total seller entries). Google's 650K-entry registry is 71% confidential. All registries stored locally with fetch timestamps.
- Cross-verification: For each DIRECT claim, looked up the seller_id in the SSP's sellers.json:
  - CONTRADICTED: SSP explicitly says INTERMEDIARY
  - PHANTOM: seller_id not in registry (ambiguous)
  - PLAUSIBLE: SSP confirms PUBLISHER or BOTH type
  - Deduplicated by (publisher, SSP, seller_id). Malformed seller_ids filtered.
- Crawl observation: Playwright browser crawled 142,630 unique sites (Tranco 1M, tiered scheduling). 2.6M HTTP requests matched against 603 known ad-tech domains (240 companies). Compared observed companies against declared ads.txt entries to measure unauthorized tracking.
- Consent measurement: 721,129 cookie sync URLs parsed for TCF consent parameters. First-visit only.
- Identity graph: Co-occurrence of tracking companies on the same page load. 201 companies, 5,816 weighted edges.
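The consent-measurement step above comes down to inspecting each sync URL's query string. The sketch below checks only whether consent fields are present. It assumes the commonly used `gdpr` and `gdpr_consent` query parameter names for URL-based TCF consent passing; whether the project's parser looks for exactly these names, and how it decides a consent string is "valid", is not specified here. `consent_fields` and the sample URLs are illustrative.

```python
from urllib.parse import parse_qs, urlsplit


def consent_fields(sync_url):
    """Report which TCF-related query parameters a sync URL carries."""
    # keep_blank_values=True so that "gdpr_consent=" still counts as present.
    query = parse_qs(urlsplit(sync_url).query, keep_blank_values=True)
    gdpr_flag = query.get("gdpr", [None])[0]
    consent = query.get("gdpr_consent", [None])[0]
    return {
        "has_gdpr_flag": gdpr_flag is not None,
        "has_consent_param": consent is not None,
        "consent_nonempty": bool(consent),
    }


# Invented sample URLs, for illustration only.
samples = [
    "https://sync.example/match?uid=abc",
    "https://sync.example/match?uid=abc&gdpr=1&gdpr_consent=",
    "https://sync.example/match?uid=abc&gdpr=1&gdpr_consent=EXAMPLESTRING",
]

for url in samples:
    print(consent_fields(url))
```

A non-empty parameter is not the same as valid consent: a fuller check would also decode the consent string and inspect its contents, which this sketch deliberately leaves out.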
- Sample bias: 11,990 publishers from Tranco top-1M. Biased toward popular Western commercial sites.
- Point-in-time: SSPs can reclassify sellers. Registries are March 17–19, 2026 snapshots.
- Phantom ambiguity: 26% of claims are phantom. That's why we report both the strict (29%) and inclusive (55%) rates.
- First-visit consent: The 0.012% rate measures first-visit behavior. Returning users may show higher rates.
- Google confidentiality: 71% of Google's sellers.json is confidential. Excluding Google, the strict rate is 38%.
- 5% estimate: The net authorization figure multiplies three separately measured rates. The individual measurements are solid; the multiplication assumes the rates are independent, which is only approximately true.
Data only. No software warranty. All source data (ads.txt, sellers.json) is publicly served by the respective domains. Verdicts are mechanical cross-reference, not editorial judgment. Verify independently before citing.