Converting GTFS frequencies.txt to Exact Departure Times
Converting frequencies.txt to exact departure times requires expanding frequency-based schedule windows into discrete trip instances. The process relies on three core GTFS fields: start_time, end_time, and headway_secs. In Python, this is achieved by parsing the GTFS feed, generating an arithmetic time sequence for each frequency row, and filtering results against the end_time boundary. The core expansion formula is:
departure_time = start_time + (n × headway_secs) for all integers n ≥ 0 where departure_time < end_time.
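As a quick illustration of the formula, the sketch below expands a single window using plain integers (seconds since the start of the service day). The helper names expand_window and fmt are hypothetical and exist only for this example; the 06:00 to 07:00 window with a 900-second headway is made-up sample data.

```python
from typing import List


def expand_window(start_secs: int, end_secs: int, headway_secs: int) -> List[int]:
    """Return every start + n * headway (n >= 0) that is strictly below end."""
    if headway_secs <= 0 or end_secs <= start_secs:
        return []  # invalid window: nothing to expand
    # range() excludes its stop value, which matches departure_time < end_time
    return list(range(start_secs, end_secs, headway_secs))


def fmt(secs: int) -> str:
    """Format seconds as HH:MM:SS; hours may be 24 or more."""
    hours, rem = divmod(secs, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# Sample window: 06:00:00 to 07:00:00 with a 15-minute headway
departures = [fmt(s) for s in expand_window(6 * 3600, 7 * 3600, 900)]
print(departures)  # ['06:00:00', '06:15:00', '06:30:00', '06:45:00']
```

The 07:00:00 value is excluded because it equals end_time.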
Understanding how routing engines interpret these intervals is critical for downstream applications. When exact_times=1, the generated timestamps represent fixed schedule points that must be honored. When exact_times=0, they represent approximate headway intervals that transit apps and routing engines may jitter based on real-time vehicle positions, dwell times, or traffic conditions. For a deeper dive into how these scheduling paradigms impact downstream data pipelines, see our guide on Handling Frequency-Based vs Timetable Schedules.
Core Expansion Logic
The GTFS specification defines frequency-based service as a window of operation rather than a fixed timetable. To convert this into discrete departures, follow this deterministic pipeline:
- Parse Time Strings: GTFS uses HH:MM:SS measured from "noon minus 12h" of the service day (effectively midnight, except on days with daylight saving time changes). Values frequently exceed 24:00:00 (e.g., 25:30:00 for 1:30 AM the next day). Standard datetime.strptime rejects hour values of 24 or more; use a timedelta-based parser instead.
- Calculate Intervals: Divide the window duration by headway_secs and round up to find the number of departures that fit in the window.
- Generate Departures: Iterate from n=0 upward, adding headway_secs to start_time until the result meets or exceeds end_time.
- Apply Filters: Respect the exact_times flag and discard invalid rows (e.g., headway_secs ≤ 0 or missing fields).
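The first two steps can be sketched without any third-party library. This is a minimal illustration, not a full validator; the function names gtfs_time_to_seconds and count_departures are hypothetical.

```python
def gtfs_time_to_seconds(value: str) -> int:
    """Parse a GTFS HH:MM:SS string (hours may be 24 or more) into seconds."""
    parts = value.strip().split(":")
    if len(parts) != 3:
        raise ValueError(f"expected HH:MM:SS, got {value!r}")
    hours, minutes, seconds = (int(p) for p in parts)
    if hours < 0 or not 0 <= minutes < 60 or not 0 <= seconds < 60:
        raise ValueError(f"time component out of range in {value!r}")
    return hours * 3600 + minutes * 60 + seconds


def count_departures(start_secs: int, end_secs: int, headway_secs: int) -> int:
    """Count n >= 0 with start + n * headway < end (ceiling division)."""
    if headway_secs <= 0 or end_secs <= start_secs:
        return 0
    duration = end_secs - start_secs
    return -(-duration // headway_secs)  # ceil(duration / headway) on integers


start = gtfs_time_to_seconds("23:00:00")
end = gtfs_time_to_seconds("25:30:00")   # 1:30 AM on the next calendar day
print(end)                                # 91800
print(count_departures(start, end, 1800))  # 5 departures: 23:00 ... 25:00
```

Rounding up rather than down matters when the window is not an exact multiple of the headway: a 3601-second window with a 900-second headway still admits a fifth departure at offset 3600.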
The official GTFS Schedule Reference defines headway_secs as a positive integer, and a frequency window is only meaningful when end_time is later than start_time. Rows that violate either constraint expand to no departures or to malformed ones, so reject them before they reach a routing graph.
Production-Ready Python Implementation
The following function uses pandas for time parsing and collects departures in a plain Python list before building the output DataFrame. It handles hour values of 24 and above through pd.to_timedelta and can optionally filter on the exact_times flag.
```python
import pandas as pd
from datetime import timedelta
from typing import Optional

OUTPUT_COLUMNS = ['trip_id', 'departure_time', 'exact_times']


def expand_frequencies_to_departures(
    frequencies_df: pd.DataFrame,
    exact_times_filter: Optional[int] = None
) -> pd.DataFrame:
    """
    Expands GTFS frequencies.txt into exact departure times per trip.
    Returns a DataFrame with columns: trip_id, departure_time, exact_times
    """
    if frequencies_df.empty:
        return pd.DataFrame(columns=OUTPUT_COLUMNS)

    freq = frequencies_df.copy()

    # GTFS times are offsets from the start of the service day; pd.to_timedelta
    # accepts hour values >= 24. Unparseable values become NaT and are dropped below.
    freq['start_time'] = pd.to_timedelta(freq['start_time'], errors='coerce')
    freq['end_time'] = pd.to_timedelta(freq['end_time'], errors='coerce')
    freq['headway_secs'] = pd.to_numeric(freq['headway_secs'], errors='coerce')

    # exact_times is optional: an absent column or an empty value means 0
    if 'exact_times' not in freq.columns:
        freq['exact_times'] = 0
    freq['exact_times'] = (
        pd.to_numeric(freq['exact_times'], errors='coerce').fillna(0).astype(int)
    )

    # Drop invalid rows early to prevent ZeroDivisionError or malformed output
    freq = freq.dropna(subset=['trip_id', 'start_time', 'end_time', 'headway_secs'])
    freq = freq[freq['headway_secs'] > 0]

    if exact_times_filter is not None:
        freq = freq[freq['exact_times'] == exact_times_filter]

    expanded = []
    for _, row in freq.iterrows():
        trip_id = row['trip_id']
        start = row['start_time']
        end = row['end_time']
        headway = int(row['headway_secs'])
        exact_flag = int(row['exact_times'])

        # Calculate number of intervals using integer division
        total_seconds = (end - start).total_seconds()
        n_intervals = int(total_seconds // headway)

        # Generate departures strictly less than end_time
        for i in range(n_intervals + 1):
            dep_time = start + timedelta(seconds=i * headway)
            if dep_time < end:
                expanded.append({
                    'trip_id': trip_id,
                    'departure_time': dep_time,
                    'exact_times': exact_flag
                })

    # Pass the columns explicitly so an empty result keeps the expected schema
    return pd.DataFrame(expanded, columns=OUTPUT_COLUMNS)
```
Code Breakdown & Optimization Notes
- Time Parsing: pd.to_timedelta() automatically converts strings like 25:30:00 into 1 days 01:30:00, preserving chronological order across midnight boundaries. This avoids manual string slicing or modulo arithmetic. Refer to the official pandas.to_timedelta documentation for edge-case handling.
- Validation Pipeline: The function drops rows with missing critical fields or non-positive headways early. This prevents runtime exceptions and malformed output.
- Expansion Loop: Using a standard Python list avoids intermediate DataFrame allocations. For feeds with millions of frequency rows, consider processing in chunks to manage RAM.
- Boundary Handling: The condition dep_time < end strictly follows GTFS semantics. If a trip's calculated departure exactly matches end_time, it is excluded.
- Type Safety: Explicit casting to int for headway_secs and exact_times prevents silent float coercion issues common in CSV parsing.
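To see the time-parsing behavior in isolation, the short sketch below parses a few sample strings and converts a result back to GTFS notation. The sample values and the helper format_gtfs_time are hypothetical; errors="coerce" is used here so that malformed input becomes NaT instead of raising.

```python
import pandas as pd

# Sample inputs: a normal time, a post-midnight time, and a malformed value
times = pd.Series(["23:45:00", "25:30:00", "not-a-time"])
parsed = pd.to_timedelta(times, errors="coerce")


def format_gtfs_time(value: pd.Timedelta) -> str:
    """Render a Timedelta as HH:MM:SS, keeping hours of 24 or more intact."""
    total = int(value.total_seconds())
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


print(parsed.isna().tolist())            # [False, False, True]
print(parsed.iloc[0] < parsed.iloc[1])   # True: order survives the midnight rollover
print(format_gtfs_time(parsed.iloc[1]))  # 25:30:00
```

Converting back with a helper like this is useful when the expanded departures need to be written out in GTFS time notation, since the default string form of a Timedelta is 1 days 01:30:00.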
For teams building larger transit data pipelines, robust Python Parsing & Data Normalization patterns like these prevent schema drift when merging frequency-based and fixed-schedule feeds.
Routing Engine & GTFS-RT Integration Context
Expanded frequency tables rarely exist in isolation. They feed directly into routing engines (OpenTripPlanner, Valhalla, R5) and real-time prediction systems. Understanding the downstream impact is essential:
- exact_times=0 (Headway-Based): Routing engines treat generated departures as a baseline. Real-time AVL feeds or GTFS-RT TripUpdate messages can override these times. Without real-time data, an engine may fall back to a uniform-distribution or stochastic delay model.
- exact_times=1 (Fixed Schedule): Engines treat generated times as hard constraints. Missed trips are not backfilled. Passenger-facing apps will display exact departure times rather than "every X minutes."
- GTFS-RT Alignment: When publishing real-time updates, ensure trip_id matches the expanded output exactly. Mismatched IDs cause silent dropouts in live tracking dashboards.
- Transfer Windows: Frequency expansions affect transfer feasibility. With a 15-minute headway and a 3-minute transfer window, a rider can make the next departure only when arriving during 12 minutes of each 15-minute cycle; anyone arriving in the final 3 minutes waits for the following departure. Validate these windows during timetable generation.
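The transfer-window point can be checked directly against an expanded departure list. The sketch below is a simplified illustration, assuming departures are sorted integers in seconds since the start of the service day; wait_for_connection is a hypothetical helper, not part of any routing engine.

```python
from bisect import bisect_left
from typing import List, Optional


def wait_for_connection(
    departures: List[int], arrival_secs: int, min_transfer_secs: int
) -> Optional[int]:
    """Seconds between arrival and the first catchable departure, or None."""
    earliest_boardable = arrival_secs + min_transfer_secs
    idx = bisect_left(departures, earliest_boardable)  # first departure >= earliest
    if idx == len(departures):
        return None  # no later departure in the expanded window
    return departures[idx] - arrival_secs


# 15-minute headway from 06:00 to 07:00: 06:00, 06:15, 06:30, 06:45
departures = list(range(6 * 3600, 7 * 3600, 900))

wait_made = wait_for_connection(departures, 6 * 3600 + 11 * 60, 180)    # arrive 06:11
wait_missed = wait_for_connection(departures, 6 * 3600 + 13 * 60, 180)  # arrive 06:13
wait_none = wait_for_connection(departures, 6 * 3600 + 50 * 60, 180)    # arrive 06:50

print(wait_made)    # 240  -> boards the 06:15 departure
print(wait_missed)  # 1020 -> too late for 06:15, boards 06:30
print(wait_none)    # None -> nothing left in this window
```

Arriving two minutes later turns a 4-minute wait into a 17-minute one, which is why transfer feasibility is worth validating against the expanded times rather than the nominal headway.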
Edge Cases & Compatibility Matrix
| Scenario | Handling Strategy | Impact on Departure Generation |
|---|---|---|
| Overnight Service | start_time=23:00:00, end_time=26:00:00 | Generates departures past midnight. pd.to_timedelta preserves the offset. |
| exact_times=0 | Headway is a guideline | Routing engines apply stochastic delays or real-time AVL data. |
| exact_times=1 | Fixed schedule | Engines treat generated times as hard constraints. Missed trips are not backfilled. |
| headway_secs varies | Multiple rows per trip_id | Process rows sequentially. Overlapping windows require deduplication or priority rules. |
| Missing exact_times | Defaults to 0 per GTFS spec | Treat as approximate headway. Explicitly fill missing values to avoid ambiguity. |
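For the overlapping-window row, one possible deduplication approach is sketched below. The priority rule used here (keep the departure that came from the window with the shorter headway) is an assumption chosen for illustration, not something the GTFS specification prescribes, and the headway_secs column is assumed to have been carried through expansion.

```python
import pandas as pd

# Sample expanded output where two windows of trip T1 both produced 06:30:00
expanded = pd.DataFrame({
    "trip_id": ["T1", "T1", "T1", "T1"],
    "departure_time": pd.to_timedelta(
        ["06:00:00", "06:30:00", "06:30:00", "06:40:00"]
    ),
    "headway_secs": [1800, 1800, 600, 600],
})

# Sort so the preferred row (shorter headway) comes first within each duplicate
# group, then keep only the first row per (trip_id, departure_time) pair.
deduped = (
    expanded
    .sort_values(["trip_id", "departure_time", "headway_secs"])
    .drop_duplicates(subset=["trip_id", "departure_time"], keep="first")
    .reset_index(drop=True)
)

print(len(deduped))                       # 3
print(deduped["headway_secs"].tolist())   # [1800, 600, 600]
```

Whatever rule is chosen, applying it explicitly is preferable to letting duplicate departures flow silently into downstream joins.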
Performance & Memory Considerations
Expanding frequencies can explode dataset size. A single route with 18-hour service and a 120-second headway generates 540 trips. Multiply this across a metro region, and a 5MB frequencies.txt can balloon into a 500MB departure table. To mitigate this:
- Filter Early: Apply exact_times_filter and service date constraints before expansion.
- Use Generators: Yield rows instead of building a full list if streaming to a database or parquet writer.
- Vectorize When Possible: For uniform headways, numpy.arange can replace Python loops, though GTFS's per-trip variability often necessitates row-wise iteration.
- Index Strategically: Post-expansion, index on trip_id and departure_time to accelerate spatial-temporal joins.
- Chunk Processing: For feeds exceeding 2GB, split frequencies.txt by route_id or service_id, expand independently, and concatenate.
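The generator and numpy.arange suggestions can be combined. The sketch below assumes the time columns have already been converted to integer seconds; the column names start_secs and end_secs and the function iter_departure_seconds are hypothetical.

```python
from typing import Iterator, Tuple

import numpy as np
import pandas as pd


def iter_departure_seconds(freq: pd.DataFrame) -> Iterator[Tuple[str, np.ndarray]]:
    """Yield (trip_id, departures in seconds) one frequency row at a time."""
    for row in freq.itertuples(index=False):
        # np.arange excludes the stop value, so departures stay below end_secs
        yield row.trip_id, np.arange(
            row.start_secs, row.end_secs, row.headway_secs, dtype=np.int64
        )


# Sample rows: a morning window and an overnight window (23:00 to 26:00)
freq = pd.DataFrame({
    "trip_id": ["A", "B"],
    "start_secs": [21600, 82800],
    "end_secs": [25200, 93600],
    "headway_secs": [900, 3600],
})

result = {trip_id: secs.tolist() for trip_id, secs in iter_departure_seconds(freq)}
print(result["A"])  # [21600, 22500, 23400, 24300]
print(result["B"])  # [82800, 86400, 90000]
```

Because each row's array is produced lazily, a consumer can write it to a database or parquet file and discard it before the next row is expanded, which keeps peak memory proportional to the largest single window rather than the whole feed.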
Validation Checklist
Before exporting expanded departures, verify:
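Several checks follow directly from the rules described earlier in this article (no missing times, no duplicate departures, exact_times limited to 0 or 1, departures inside their frequency windows) and can be automated. The sketch below is one possible arrangement, assuming both frames already hold Timedelta values; validate_expanded is a hypothetical helper, and the bounds check is deliberately coarse (it compares against the earliest start_time and latest end_time per trip).

```python
from typing import List

import pandas as pd


def validate_expanded(expanded: pd.DataFrame, frequencies: pd.DataFrame) -> List[str]:
    """Return a list of human-readable problems; an empty list means all passed."""
    required = {"trip_id", "departure_time", "exact_times"}
    missing = required - set(expanded.columns)
    if missing:
        return [f"missing columns: {sorted(missing)}"]

    problems = []
    if expanded["departure_time"].isna().any():
        problems.append("null departure_time values")
    if expanded.duplicated(subset=["trip_id", "departure_time"]).any():
        problems.append("duplicate (trip_id, departure_time) pairs")
    if not expanded["exact_times"].isin([0, 1]).all():
        problems.append("exact_times values other than 0 or 1")

    # Coarse window check: earliest start and latest end per trip
    bounds = frequencies.groupby("trip_id").agg(
        lo=("start_time", "min"), hi=("end_time", "max")
    )
    merged = expanded.join(bounds, on="trip_id")
    if merged["lo"].isna().any():
        problems.append("trip_id values absent from frequencies")
    outside = (merged["departure_time"] < merged["lo"]) | (
        merged["departure_time"] >= merged["hi"]
    )
    if outside.any():
        problems.append("departures outside their frequency window")
    return problems


frequencies = pd.DataFrame({
    "trip_id": ["T1"],
    "start_time": pd.to_timedelta(["06:00:00"]),
    "end_time": pd.to_timedelta(["07:00:00"]),
})
good = pd.DataFrame({
    "trip_id": ["T1", "T1"],
    "departure_time": pd.to_timedelta(["06:00:00", "06:30:00"]),
    "exact_times": [0, 0],
})
bad = good.assign(departure_time=pd.to_timedelta(["06:00:00", "07:00:00"]))

print(validate_expanded(good, frequencies))  # []
print(validate_expanded(bad, frequencies))   # ['departures outside their frequency window']
```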
This approach ensures deterministic, spec-compliant departure tables ready for routing engines, timetable generators, or passenger-facing APIs.