Thunder Bay Transit publishes GTFS schedule and realtime feeds — the same standard behind Google Maps and NextLift. Those apps show where buses are now and discard the data. We store every position, delay, and cancellation as a raw event.
A background job periodically rolls the raw events into the aggregates shown on the Metrics tab. All metrics are measured at timepoint stops (the ones marked timepoint=1 in the GTFS feed).
“Without an explicit SLO, users often develop their own beliefs about desired performance, which may be unrelated to the beliefs held by the people designing and operating the service.”
Google is the company that defined the GTFS transit standard and is known for running complex systems with legendary reliability. The SRE Handbook has strongly influenced how I think about building and operating software systems.
The handbook calls its metrics service level indicators (SLIs) — the same idea as a Key Performance Indicator (KPI), but focused on what the user experiences rather than what the operator reports. An indicator becomes a service level objective (SLO) when stakeholders commit to a target: not just “we track on-time performance” but “we agree 75% is the floor.”
Baseball stats can’t tell you who wins tonight. Transit metrics are the same — they can’t say whether every rider got where they needed to be. What they can do is show whether the system is trending in the right direction over time.
Each service day is sliced into three six-hour windows. Every metric is computed per window, per route. Chunks store raw counts and sums — not percentages. A weekly or system-wide number sums the counts across chunks and divides once at the end, so a busy route with 200 trips outweighs a quiet one with 10 instead of both counting the same.
Once a window closes the chunk is sealed and never changes. Midnight to 6 AM has no chunk — a handful of late-night trips run in that window, but not enough to produce meaningful metrics, so we leave them out.
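The rollup rule above — store counts and sums, divide only at the end — can be sketched like this (the field and route names are hypothetical, not the actual schema):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    route_id: str
    window: str        # e.g. "06:00-12:00"
    on_time: int       # trips arriving within the on-time band
    total: int         # trips observed at timepoints

def on_time_pct(chunks):
    """Aggregate by summing raw counts across chunks, dividing once at the end."""
    on_time = sum(c.on_time for c in chunks)
    total = sum(c.total for c in chunks)
    return on_time / total if total else None

busy = Chunk("3M", "06:00-12:00", on_time=150, total=200)   # 75% on time
quiet = Chunk("14", "06:00-12:00", on_time=10, total=10)    # 100% on time

# Weighted by trips: (150 + 10) / (200 + 10) ~= 76.2%,
# not the naive average of percentages (75 + 100) / 2 = 87.5%.
print(round(on_time_pct([busy, quiet]) * 100, 1))  # 76.2
```

Averaging the two percentages would let the 10-trip route pull the number up as much as the 200-trip route pulls it down; summing counts first keeps every trip weighted equally.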
A trip is "on time" if it arrives no more than 1 minute early and no more than 5 minutes late relative to schedule. This is the standard window used by most North American agencies.
Early departures are penalized because if a bus leaves a stop before the scheduled time, you miss it — that's worse than a bus running late.
Typical range for a mid-size Canadian city: 65–85%. Above 90% is world-class. Below 60% indicates a systemic problem.
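The on-time band is a one-line check. A sketch, with the deviation expressed in seconds (negative = early, positive = late); the constant names are made up for illustration:

```python
# On-time band from the text: at most 1 minute early, at most 5 minutes late.
EARLY_LIMIT_S = -60
LATE_LIMIT_S = 300

def is_on_time(deviation_s: int) -> bool:
    """deviation_s: actual arrival minus scheduled arrival, in seconds."""
    return EARLY_LIMIT_S <= deviation_s <= LATE_LIMIT_S

print(is_on_time(-90))   # 1.5 min early -> False (riders missed it)
print(is_on_time(240))   # 4 min late    -> True
print(is_on_time(301))   # 5:01 late     -> False
```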
This metric is the coefficient of variation (Cv) of headways: it measures how evenly spaced buses are along a route. If buses arrive at perfectly even intervals, Cv is zero — regardless of whether those intervals match the published schedule. In practice, buses bunch up or leave big gaps — the higher the number, the more unpredictable your wait becomes.
Think of it like darts. The bullseye is the average gap between buses. Each actual arrival is a throw. A low Cv means the darts cluster around the bullseye — gaps between buses stay consistent. A high Cv means darts scattered across the board — gaps are all over the place.
Below 0.3 is good — riders perceive the service as regular. A Cv near zero isn’t the goal either — that would be like standing too close to the dartboard, where hitting the bullseye says nothing about your aim. Some variance is inevitable and healthy. Above 0.5, gaps feel random and riders lose trust in the system.
Cv only judges regularity, not whether the bus is on time. A route that runs every 20 minutes when the schedule promised every 10 still has Cv = 0 if those gaps are perfectly even. To see whether the service matches its promise, look at EWT below.
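Cv is just the standard deviation of the gaps divided by their mean. A minimal sketch, taking arrival times in minutes:

```python
from statistics import mean, pstdev

def headway_cv(arrival_times_min):
    """Coefficient of variation of headways: stdev of the gaps / mean gap."""
    gaps = [b - a for a, b in zip(arrival_times_min, arrival_times_min[1:])]
    return pstdev(gaps) / mean(gaps)

# Perfectly even 20-minute gaps: Cv = 0, even if the schedule promised 10.
print(headway_cv([0, 20, 40, 60]))               # 0.0
# Bunched service: gaps of 2, 28, 2, 28 -> same mean gap (15) but high Cv.
print(round(headway_cv([0, 2, 30, 32, 60]), 2))  # 0.87
```

The bunched example lands well above the 0.5 threshold from the text: same average headway, but the wait feels random.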
Excess Wait Time is arguably the most important operational metric here. Transit managers publish a schedule — that schedule is a promise. EWT measures how many extra minutes you actually wait beyond that promise because buses aren’t arriving at regular intervals. It’s the gap between what was committed to and what was delivered.
If a route runs every 15 minutes, you'd expect to wait 7.5 minutes on average. But if two buses arrive together and then nothing comes for 30 minutes, the average headway is still 15 minutes — yet the experience is far worse. EWT captures exactly this gap.
Why does EWT matter more than average delay? Because it counts people, not just buses. A long gap doesn't just mean one late bus — it means every person who showed up during that gap is standing at the stop, waiting. The longer the gap, the more people accumulate, and the longer each of them waits. EWT weights gaps by the number of riders they actually affect, making it a social metric: it measures total human time wasted, not just vehicle timing.
The two timelines below have the same average headway (15 min), but the rider experience is very different:
Bigger gaps hurt more because more riders show up during them. EWT captures that — it weights each gap by how many people it affects, then subtracts the wait you’d expect if buses ran on time.
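One common formulation of this: for riders arriving uniformly at random, the expected wait is E[h²] / (2·E[h]) over the headways h, which automatically weights long gaps by the extra riders who accumulate in them. EWT is then the actual expected wait minus the scheduled one. A sketch under that assumption:

```python
def expected_wait(headways_min):
    """Average wait for riders arriving uniformly at random: E[h^2] / (2*E[h]).
    Squaring the gaps is what weights a long gap by the riders it strands."""
    total = sum(headways_min)
    return sum(h * h for h in headways_min) / (2 * total)

def excess_wait(actual_headways, scheduled_headways):
    return expected_wait(actual_headways) - expected_wait(scheduled_headways)

scheduled = [15, 15, 15, 15]   # every 15 min -> 7.5 min expected wait
bunched = [1, 29, 1, 29]       # same average headway (15 min), but bunched

print(expected_wait(scheduled))                   # 7.5
print(expected_wait(bunched))                     # ~14.03 min
print(round(excess_wait(bunched, scheduled), 2))  # 6.53 extra minutes
```

Same average headway, yet every rider on the bunched route waits six and a half minutes longer on average — exactly the gap the two timelines above illustrate.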
The route finder uses RAPTOR (Round-bAsed Public Transit Optimized Router), an algorithm developed at Microsoft Research for computing optimal multi-leg transit journeys. RAPTOR works directly on the timetable, scanning routes round by round, where each round adds one more vehicle: round 1 finds all destinations reachable by a single bus, round 2 finds journeys with one transfer, and so on. Because it exploits the timetable's structure directly instead of searching a graph, RAPTOR is typically faster than Dijkstra or A* for transit routing.
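The round structure can be illustrated with a toy sketch. This is not the optimized algorithm from the paper — it omits footpaths, target pruning, and per-round labels — and the data model (routes as ordered stop lists, trips as parallel time lists) is invented for the example:

```python
import math
from collections import defaultdict

def raptor(routes, trips, source, target, dep_time, max_rounds=3):
    """Toy RAPTOR sketch.
    routes: route_id -> ordered list of stop ids
    trips:  route_id -> trips on that route, each a list of times parallel
            to the stop list, sorted by departure time
    Returns the earliest known arrival time at `target`, or math.inf."""
    best = defaultdict(lambda: math.inf)  # earliest arrival found per stop
    best[source] = dep_time
    marked = {source}                     # stops improved in the last round

    for _ in range(max_rounds):           # each round allows one more boarding
        new_marked = set()
        for rid, stops in routes.items():
            try:  # earliest point on this route where a marked stop lets us board
                start = min(stops.index(s) for s in marked if s in stops)
            except ValueError:
                continue                  # route serves no marked stop
            trip = None                   # the trip we are currently riding
            for i in range(start, len(stops)):
                stop = stops[i]
                if trip is not None and trip[i] < best[stop]:
                    best[stop] = trip[i]  # improved arrival: mark for next round
                    new_marked.add(stop)
                if best[stop] < math.inf:
                    # hop onto the earliest trip departing after we get here
                    for t in trips[rid]:
                        if t[i] >= best[stop] and (trip is None or t[i] < trip[i]):
                            trip = t
                            break
        marked = new_marked
        if not marked:                    # no stop improved: done
            break
    return best[target]

routes = {"A": ["s1", "s2", "s3"], "B": ["s3", "s4"]}
trips = {
    "A": [[0, 10, 20], [30, 40, 50]],   # two trips along route A
    "B": [[25, 35], [55, 65]],          # two trips along route B
}
# Round 1 rides route A to s3 (arrive t=20); round 2 transfers to route B.
print(raptor(routes, trips, "s1", "s4", dep_time=0, max_rounds=2))  # 35
```

With max_rounds=1 the same query returns infinity: s4 needs a transfer, and each round admits exactly one more boarding — which is the whole idea.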