Data
The Ledger publishes machine-readable exports alongside the human-facing frontend. Everything here is derived from the public, anonymized tables (pub_incidents, pub_agency_scorecards, purge_events) — never raw names, never internal review queues. All fields are documented in the JSON Schema linked below.
Daily JSONL archive
Every row published through the SPEC §9 anonymization pipeline is dumped to newline-delimited JSON once a day at 08:30 UTC and committed to a public git repository. Forks of the full history are encouraged.
Repository: https://github.com/policedata/archive
| File | Rows describe |
|---|---|
pub_incidents.jsonl | One anonymized disciplinary incident per line. Officer identity is an HMAC-SHA256 token stable within a salt epoch; dates are at quarterly precision. |
pub_agency_scorecards.jsonl | Per-agency transparency grade (A–F) plus finding rate, publication-ban rate, and purge count. Computed daily. |
purge_events.jsonl | Confirmed removals/redirects/content-deletions detected on oversight-body sources. Each row links to the agency and the archived copy where available. |
manifest.json | Export metadata: timestamp, per-file row counts, format version. Consumers should check row counts to detect partial writes before importing. |
Schemas
Byte-stable machine-readable schemas ship alongside every export. Consumers validate against these before import.
| File | What it is |
|---|---|
schema.json | JSON Schema draft 2020-12. $defs per table; additionalProperties: false everywhere so unknown columns fail validation. Embeds the SPEC §9.2 rules as patterns: incident_quarter regex rejects exact dates, k_anon_cohort has a minimum of 5. |
schema.jsonld | JSON-LD @context mapping field names to schema.org and the project vocab URI. Consumers speaking semantic-web tooling can treat the JSONL rows as linked data by applying this context. |
Coming later
- Weekly Parquet snapshot on a public MinIO bucket — faster ingest for pandas / polars / DuckDB than JSONL. Same rows, different wire format.
- Stable redirect URLs for individual purges so feed readers can deep-link. Currently RSS <guid> values are URN-form only.
Federation
Sister accountability projects are welcome to fork or mirror the archive. Schemas are designed to interoperate — agency IDs use stable slugs, officer tokens use a published HMAC algorithm, and incident types follow a documented controlled vocabulary. If you're building something that should pull from the Ledger, the editor welcomes outreach via the contact form.
Known downstream projects: Tracking (In)justice, CBC Deadly Force, Big Local News.
License and attribution
Archive content is released under Creative Commons BY 4.0. Attribution: "policedata.ca / The Ledger" with a link to this page. Every row carries a source_capture_id that traces back to the archived original document via Wayback — verify against that source before republishing.
The live API
No JSON API exists yet — the bulk export is the supported interface. A thin REST layer may land later for lower-latency lookups, but the publication cadence is deliberately lagged (SPEC §9.2 sets a 60-day minimum between disposition and publication), so real-time is not a design goal.