News organization acquisition brief · April 2026

policedata.ca / The Ledger

A Canadian police-accountability archive: a private, names-retaining capture layer, a public, anonymized surface, and a purge detector watching every source.

Live · Bilingual · Federated · CC-BY 4.0 data

What you get

A production system that mirrors every public disciplinary record, tribunal decision, and oversight-body report from Canadian police — then detects when those records are removed, altered, or silently edited out of their sources.

Two products, one codebase. A private archive that retains names, full source documents, and long-term forensic provenance (capture → WARC → Wayback → archive.today → IPFS, SHA-256 hashed at every step). A public surface that publishes only anonymized, aggregated views: HMAC officer tokens, quarterly date precision, a k-anonymity floor of 5, a 60-day publication lag, and publication bans honored unconditionally.
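As a rough illustration of how the public-surface rules above could combine, here is a minimal Python sketch. It is not the project's code: the record fields (`agency`, `incident_date`, `captured_on`, `officer_id`, `publication_ban`), the 16-character token truncation, the choice of agency plus quarter as the k-anonymity grouping, and measuring the 60-day lag from the capture date are all assumptions made for the example.

```python
import hashlib
import hmac
from collections import Counter
from datetime import date, timedelta

K_FLOOR = 5                            # k-anonymity floor stated in the brief
PUBLICATION_LAG = timedelta(days=60)   # publication lag stated in the brief


def officer_token(secret: bytes, officer_id: str) -> str:
    """Keyed pseudonym: stable per officer, not reversible without the key."""
    digest = hmac.new(secret, officer_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # truncation length is an arbitrary choice for this sketch


def to_quarter(d: date) -> str:
    """Coarsen a date to quarterly precision, e.g. 2026-02-14 -> '2026-Q1'."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def group_key(record: dict) -> tuple:
    """Quasi-identifiers assumed to be visible on the public surface."""
    return (record["agency"], to_quarter(record["incident_date"]))


def publishable(records: list[dict], today: date, secret: bytes) -> list[dict]:
    """Return anonymized rows that pass the ban, lag, and k-anonymity checks."""
    # Publication bans and the lag are applied first, unconditionally.
    eligible = [
        r for r in records
        if not r["publication_ban"] and today - r["captured_on"] >= PUBLICATION_LAG
    ]
    # Suppress any group smaller than the k-anonymity floor.
    sizes = Counter(group_key(r) for r in eligible)
    return [
        {
            "agency": r["agency"],
            "quarter": to_quarter(r["incident_date"]),
            "officer": officer_token(secret, r["officer_id"]),
        }
        for r in eligible
        if sizes[group_key(r)] >= K_FLOOR
    ]
```

Using a keyed HMAC rather than a plain hash means an outsider cannot rebuild tokens from a list of known officer identifiers without the secret, while the same officer still maps to the same token across records.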

75 Agencies seeded
7 / 15 Tier-A adapters
4 / 4 Purge detectors
740+ Tests

Why a newsroom should care

Canadian police-accountability data is fragmented across dozens of oversight bodies — each with its own publishing cadence, format, and willingness to keep older records online. Historical records are especially vulnerable: when an oversight body is dissolved or reorganized, its archives often move behind a new agency's website, and some quietly disappear. No single Canadian resource currently does all four of:

For a newsroom covering policing, The Ledger collapses 30+ FOI requests a year into a standing source. For a national desk, it's a federal-to-municipal query surface no individual reporter can assemble.

What's shipped

Layer | Status | Notes
Capture infrastructure | Live | Playwright + WARC + Wayback + archive.today + IPFS + SHA-256 ledger
Tier-A extractors (hand-tuned) | 7 of 15 | SIU (ON), BEI (QC), IIO (BC), SIRT-NL, 3× CanLII tribunals
Tier-C LLM fallback | Live | Anthropic Claude + §9.3 review-gated redaction
Purge detection | 4 of 4 | HTTP 4xx, content removal, redirect-to-generic, index delisting
Anonymization | Live | HMAC officer tokens, k-anon floor (k=5), 60-day publication lag
Public frontend | Live | 14 routes, EN/FR parity (drift-guarded), RSS, JSON-LD federation
Admin review UI | Live | Officer merges, redaction templates, corrections triage
Bulk export | Live | Daily JSONL git commit, weekly Parquet, stable JSON Schema
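
To make the four purge signals in the table more concrete, the following Python sketch classifies a re-check of a previously captured URL. It is an illustration under stated assumptions, not the shipped detectors: the `Snapshot` fields, the `generic_urls` set, the order in which signals are checked, and the crude "body shrank below half its earlier size" rule for content removal are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

SHRINK_RATIO = 0.5  # assumed threshold: body under half its earlier size


@dataclass
class Snapshot:
    """One observation of a source URL (all field names are hypothetical)."""
    status: int       # HTTP status of the final response
    final_url: str    # URL after following redirects
    body: bytes       # response body
    in_index: bool    # whether the source's own listing still links to the URL


def classify_purge(url: str, before: Snapshot, after: Snapshot,
                   generic_urls: frozenset) -> Optional[str]:
    """Return the name of the first purge signal that fires, or None."""
    # 1. The record now answers with a client error (404, 410, 403, ...).
    if 400 <= after.status < 500:
        return "http_4xx"
    # 2. The record URL now redirects to a generic landing page.
    if after.final_url != url and after.final_url in generic_urls:
        return "redirect_to_generic"
    # 3. The page still loads but most of its content is gone.
    if len(after.body) < SHRINK_RATIO * len(before.body):
        return "content_removal"
    # 4. The page is intact but no longer linked from the source's index.
    if before.in_index and not after.in_index:
        return "index_delisting"
    return None
```

A real detector would likely compare extracted text rather than raw byte length, since a site redesign can shrink or grow a page without removing the record itself.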

Access is free. The software and the editorial partnership aren't.

The public archive at policedata.ca is open to everyone, forever, under CC-BY 4.0. Daily JSONL + weekly Parquet bulk exports are committed to a public git repo. Anyone can fork the full history. We can't and won't paywall accountability data — the project's PIPEDA journalism posture and its federation partnerships both depend on it.

What is for sale is the software, the editorial partnership, and engineering time. Tiers below sit on top of the free public surface — they don't gate it.

Price list

All figures CAD. Tax extra. Terms negotiable for Canadian journalism nonprofits.

Full acquisition $275,000
One-time · IP transfer · brand + domain included
  • Complete source + schema + fixtures + 75-agency seed set.
  • Domain transfer (policedata.ca) and the running Hetzner deployment.
  • 3-month handoff: weekly engineering sync, on-call fixes, adapter-maintenance shadowing.
  • Editor onboarding + documentation walkthrough.
  • Public archive stays open + CC-BY 4.0 under the acquirer's stewardship: the brand + editorial direction change hands; the public-access commitment does not change.
Editorial partnership $60,000 / year
Annual · co-branded attribution · doesn't restrict anyone else's access
  • "In partnership with [Newsroom]" attribution on every public page.
  • First-look windows on major stories surfacing from new captures — an editorial embargo, not a data embargo. Public release still happens; the partner gets lead time to publish first.
  • Dedicated cross-agency query support for your reporters — the archive's data scientist on call for your data desk.
  • Quarterly editorial sync with the Ledger team.
  • Escape hatch: 90-day termination notice.
Custom engineering + adapters $18,000 / quarter
Per-quarter retainer · ~10 engineering days · additive to public archive
  • Build new Tier-A adapters for sources your newsroom cares about — every adapter shipped expands the public archive.
  • Custom query / report work against the private (names-retaining) archive, delivered as a vetted-access deliverable under your internal legal sign-off.
  • Schema extensions, federation-feed customizations, whatever the newsroom needs.
Pilot / evaluation $15,000
90 days · fee credited toward any tier above
  • Two 1-hour newsroom training sessions on the archive + data-desk onboarding.
  • Two custom queries / reports delivered during the window (you pick the questions).
  • Credit applies 100% to any tier above if converted within 90 days.
  • Public archive access is free regardless — you're paying for the newsroom integration work.

Legal posture

The capture side relies on PIPEDA's journalism exemption (s.4(2)(c)). Captured personal data never leaves the private archive, never exits the Canadian datacentre, and never reaches a public surface un-anonymized. Publication bans are honored unconditionally at every downstream stage. A right-of-reply workflow is live at /corrections — anyone named or cited can file a correction; accepted corrections are logged publicly with editor notes.

Publisher incorporation jurisdiction is open; recommended ON / BC / QC (all three carry anti-SLAPP statutes).

Operational footprint

One Hetzner box (~$60 CAD/month, room to grow). Postgres 16 + MinIO + Meilisearch + Apache 2 on loopback behind TLS. Daily cron drives the capture → extract → score → publish → export chain. Weekly tier-1 crawl (15 agencies), biweekly tier-2 (30), monthly tier-3 (long tail). Editor time is the real recurring cost once volume grows.
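
The tiered crawl cadence can be pictured as a small "what is due today" check run by the daily cron. This Python sketch is only an assumption about how such a check might look: the agency dictionary shape (`name`, `tier`, `last_crawled`) is hypothetical, and "monthly" is approximated as 30 days.

```python
from datetime import date, timedelta

# Re-crawl interval per tier, following the cadence described above;
# "monthly" is approximated as 30 days in this sketch.
CRAWL_INTERVAL = {
    1: timedelta(days=7),    # tier 1: weekly
    2: timedelta(days=14),   # tier 2: biweekly
    3: timedelta(days=30),   # tier 3: monthly
}


def due_for_crawl(agencies: list[dict], today: date) -> list[str]:
    """Return the names of agencies whose tier interval has elapsed."""
    due = []
    for agency in agencies:
        last = agency["last_crawled"]  # None means never crawled
        if last is None or today - last >= CRAWL_INTERVAL[agency["tier"]]:
            due.append(agency["name"])
    return due
```

Keeping the interval table separate from the check makes it easy to promote an agency to a faster tier without touching the scheduling logic.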

Next step

30-minute intro call to scope the right tier for your newsroom. Bring your editorial lead, a data-desk contact, and one piece of unfinished FOI reporting — we'll show the same question answered against the live archive.

Contact via policedata.ca/about or corrections intake.

Public data released under Creative Commons BY 4.0. Private archive access governed by the journalism exemption; names never surface on public URLs. Figures current as of the first production deploy in April 2026: 75 agencies seeded, 804 raw captures, 30 published incidents, 740 tests green.

Brochure version 1.0 · docs/BROCHURE.md in the source repo holds the plain-text equivalent.