Converting GTFS frequencies.txt to Exact Departure Times

Converting frequencies.txt to exact departure times requires expanding frequency-based schedule windows into discrete trip instances. The process relies on three core GTFS fields: start_time, end_time, and headway_secs. In Python, this is achieved by parsing the GTFS feed, generating an arithmetic time sequence for each frequency row, and filtering results against the end_time boundary. The core expansion formula is:

departure_time = start_time + (n × headway_secs) for all integers n ≥ 0 where departure_time < end_time.
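The formula maps directly onto Python's range(), which yields start, start + step, start + 2 × step, ... and stops before the end value. The sketch below assumes the times are already integer seconds after the start of the service day. departures_for_window is a hypothetical helper name, not part of any library.

```python
def departures_for_window(start_secs: int, end_secs: int, headway_secs: int) -> list[int]:
    """Return start + n * headway for n = 0, 1, 2, ... while the result is < end."""
    if headway_secs <= 0:
        raise ValueError("headway_secs must be a positive integer")
    # range() excludes its stop value, which matches the strict "< end_time" rule
    return list(range(start_secs, end_secs, headway_secs))


# 06:00:00 to 07:00:00 every 600 s (10 minutes); 07:00 itself is excluded
deps = departures_for_window(6 * 3600, 7 * 3600, 600)
print([f"{d // 3600:02d}:{d % 3600 // 60:02d}" for d in deps])
# ['06:00', '06:10', '06:20', '06:30', '06:40', '06:50']
```

A window whose end_time is not after its start_time simply yields an empty list.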

Understanding how routing engines interpret these intervals is critical for downstream applications. When exact_times=1, the generated timestamps are the actual scheduled departures and must be honored exactly. When exact_times=0, the feed promises only the headway and not specific departure instants. The generated timestamps are then one plausible realization: transit apps typically present the service as "every X minutes," and actual departures shift with real-time vehicle positions, dwell times, or traffic conditions. For a deeper dive into how these scheduling paradigms impact downstream data pipelines, see our guide on Handling Frequency-Based vs Timetable Schedules.

Core Expansion Logic

The GTFS specification defines frequency-based service as a window of operation rather than a fixed timetable. To convert this into discrete departures, follow this deterministic pipeline:

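  1. Parse Time Strings: GTFS expresses times as H:MM:SS or HH:MM:SS, measured from "noon minus 12h" of the service day (effectively midnight, except on days when daylight saving time changes). Values frequently exceed 24:00:00 (e.g., 25:30:00 for 1:30 AM the next day). Standard datetime.strptime rejects these hours, so parse them into a duration instead, for example with pandas.to_timedelta.
  2. Calculate Intervals: Because departures must fall strictly before end_time, the number of departures in a window is ceil((end_time − start_time) / headway_secs).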
  3. Generate Departures: Iterate from n=0 upward, adding headway_secs to start_time until the result meets or exceeds end_time.
  4. Apply Filters: Respect the exact_times flag and discard invalid rows (e.g., headway_secs ≤ 0 or missing fields).
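The parsing and counting steps can be sketched with the standard library alone. parse_gtfs_time and departure_count are hypothetical helper names; the parser rejects minutes or seconds outside 0 to 59 but otherwise trusts its input.

```python
from datetime import datetime


def parse_gtfs_time(value: str) -> int:
    """Convert a GTFS 'H:MM:SS' or 'HH:MM:SS' string to seconds after the service-day origin.

    Hours may exceed 23 (e.g. '25:30:00'), which datetime.strptime rejects.
    """
    hours, minutes, seconds = (int(part) for part in value.strip().split(":"))
    if hours < 0 or not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError(f"invalid GTFS time: {value!r}")
    return hours * 3600 + minutes * 60 + seconds


def departure_count(start_time: str, end_time: str, headway_secs: int) -> int:
    """Departures strictly before end_time: ceil(duration / headway), in integer math."""
    duration = parse_gtfs_time(end_time) - parse_gtfs_time(start_time)
    if headway_secs <= 0 or duration <= 0:
        return 0  # invalid rows produce no departures
    return (duration + headway_secs - 1) // headway_secs


try:
    datetime.strptime("25:30:00", "%H:%M:%S")
except ValueError:
    print("datetime.strptime rejects 25:30:00")

print(parse_gtfs_time("25:30:00"))                   # 91800
print(departure_count("06:00:00", "07:00:00", 600))  # 6
```

Integer ceiling division avoids floating-point rounding when durations or headways are large.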

The official GTFS Schedule Reference requires headway_secs to be a positive integer, and a window is only meaningful when end_time is later than start_time. Reject rows that violate these constraints before expansion; otherwise they yield empty or nonsensical departures in the routing graph.

Production-Ready Python Implementation

The following function uses pandas for reliable time parsing and a plain Python loop that appends to a list for expansion. It handles GTFS's extended HH:MM:SS format (hours of 24 and above) natively and respects the exact_times flag.

```python
import pandas as pd
from datetime import timedelta
from typing import Optional

OUTPUT_COLUMNS = ['trip_id', 'departure_time', 'exact_times']


def expand_frequencies_to_departures(
    frequencies_df: pd.DataFrame,
    exact_times_filter: Optional[int] = None
) -> pd.DataFrame:
    """
    Expands GTFS frequencies.txt into exact departure times per trip.
    Returns a DataFrame with columns: trip_id, departure_time, exact_times
    """
    if frequencies_df.empty:
        return pd.DataFrame(columns=OUTPUT_COLUMNS)

    freq = frequencies_df.copy()

    # GTFS times count from "noon minus 12h" of the service day (midnight except on
    # DST-change days). pd.to_timedelta accepts hours >= 24; unparseable values become NaT.
    freq['start_time'] = pd.to_timedelta(freq['start_time'], errors='coerce')
    freq['end_time'] = pd.to_timedelta(freq['end_time'], errors='coerce')
    freq['headway_secs'] = pd.to_numeric(freq['headway_secs'], errors='coerce')

    # exact_times is optional; a missing column or empty value means 0 per the GTFS spec
    if 'exact_times' not in freq.columns:
        freq['exact_times'] = 0
    freq['exact_times'] = (
        pd.to_numeric(freq['exact_times'], errors='coerce').fillna(0).astype(int)
    )

    # Drop invalid rows early to prevent ZeroDivisionError or malformed output
    freq = freq.dropna(subset=['trip_id', 'start_time', 'end_time', 'headway_secs'])
    freq = freq[(freq['headway_secs'] > 0) & (freq['end_time'] > freq['start_time'])]

    if exact_times_filter is not None:
        freq = freq[freq['exact_times'] == exact_times_filter]

    expanded = []
    for _, row in freq.iterrows():
        trip_id = row['trip_id']
        start = row['start_time']
        end = row['end_time']
        headway = int(row['headway_secs'])
        exact_flag = int(row['exact_times'])

        # Calculate number of intervals using integer division
        total_seconds = (end - start).total_seconds()
        n_intervals = int(total_seconds // headway)

        # Generate departures strictly less than end_time
        for i in range(n_intervals + 1):
            dep_time = start + timedelta(seconds=i * headway)
            if dep_time < end:
                expanded.append({
                    'trip_id': trip_id,
                    'departure_time': dep_time,
                    'exact_times': exact_flag
                })

    # Passing columns keeps the schema stable even when every row was filtered out
    return pd.DataFrame(expanded, columns=OUTPUT_COLUMNS)
```

Code Breakdown & Optimization Notes

  • Time Parsing: pd.to_timedelta() automatically converts strings like 25:30:00 into 1 day 01:30:00, preserving chronological order across midnight boundaries. This avoids manual string slicing or modulo arithmetic. Refer to the official pandas.to_timedelta documentation for edge-case handling.
  • Validation Pipeline: The function drops rows with missing critical fields or non-positive headways early. This prevents runtime exceptions and malformed output.
  • Expansion Loop: Using a standard Python list avoids intermediate DataFrame allocations. For feeds with millions of frequency rows, consider processing in chunks to manage RAM.
  • Boundary Handling: The condition dep_time < end strictly follows GTFS semantics. If a calculated departure exactly matches end_time, it is excluded.
  • Type Safety: Explicit casting to int for headway_secs and exact_times prevents silent float coercion issues common in CSV parsing.

For teams building larger transit data pipelines, robust Python Parsing & Data Normalization patterns like these prevent schema drift when merging frequency-based and fixed-schedule feeds.

Routing Engine & GTFS-RT Integration Context

Expanded frequency tables rarely exist in isolation. They feed directly into routing engines (OpenTripPlanner, Valhalla, R5) and real-time prediction systems. Understanding the downstream impact is essential:

  • exact_times=0 (Headway-Based): Generated departures are a baseline, not a promise. Real-time AVL feeds or GTFS-RT TripUpdate messages supersede them when available. Without real-time data, behavior varies by engine; some plan against the headway itself rather than the specific generated timestamps.
  • exact_times=1 (Fixed Schedule): Engines treat generated times as hard constraints. Missed trips are not backfilled. Passenger-facing apps will display exact departure times rather than “every X minutes.”
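  • GTFS-RT Alignment: Every instance expanded from a frequencies row shares the same trip_id, so trip_id alone cannot identify an instance. The GTFS-RT TripDescriptor identifies frequency-based trips by trip_id plus start_time (and start_date), and for exact_times=1 the start_time must be a whole multiple of headway_secs after the window's start_time. Keys that do not match the expanded output cause silent dropouts in live tracking dashboards.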
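  • Transfer Windows: Frequency expansions affect transfer feasibility. With a 15-minute headway and a 3-minute minimum transfer time, a connection works only if the feeder arrives at least 3 minutes before a generated departure; otherwise the rider waits for the next departure, up to 15 minutes later. Validate these windows during timetable generation.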
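Because expanded instances share a trip_id, real-time matching needs a composite key. The sketch below builds one from trip_id, start_date, and a GTFS-formatted start_time, and checks the exact_times=1 alignment rule. The function names and the (trip_id, start_date, start_time) tuple layout are hypothetical conveniences, not a GTFS-RT binding.

```python
def format_gtfs_time(total_secs: int) -> str:
    """Format seconds as GTFS HH:MM:SS, keeping hours >= 24 for after-midnight trips."""
    hours, rem = divmod(total_secs, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def instance_key(trip_id: str, start_date: str, departure_secs: int) -> tuple[str, str, str]:
    """Key for one expanded instance: (trip_id, start_date 'YYYYMMDD', start_time).

    Every instance expanded from a frequencies row shares its trip_id, so trip_id
    alone cannot tell them apart.
    """
    return (trip_id, start_date, format_gtfs_time(departure_secs))


def is_on_exact_grid(departure_secs: int, window_start_secs: int, headway_secs: int) -> bool:
    """True if departure = window start + n * headway for some integer n >= 0."""
    offset = departure_secs - window_start_secs
    return offset >= 0 and offset % headway_secs == 0


print(instance_key("T1", "20240105", 91800))  # ('T1', '20240105', '25:30:00')
```

Keeping hours above 23 in the formatted start_time preserves the service-day convention instead of wrapping to the next calendar day.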

Edge Cases & Compatibility Matrix

| Scenario | Handling Strategy | Impact on Departure Generation |
|---|---|---|
| Overnight service | start_time=23:00:00, end_time=26:00:00 | Generates departures past midnight; pd.to_timedelta preserves the offset. |
| exact_times=0 | Headway is a guideline | Routing engines may substitute real-time AVL data or plan against the headway rather than the generated times. |
| exact_times=1 | Fixed schedule | Engines treat generated times as hard constraints. Missed trips are not backfilled. |
| headway_secs varies | Multiple rows per trip_id | Process rows sequentially. Overlapping windows require deduplication or priority rules. |
| Missing exact_times | Defaults to 0 per GTFS spec | Treat as approximate headway. Explicitly fill missing values to avoid ambiguity. |
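The multiple-rows case can be sketched with plain integer seconds. expand_trip_windows and its tuple input are hypothetical, and deduplicating with a set is only one possible rule, not something the GTFS spec prescribes. Because departures stop strictly before end_time, back-to-back windows (one ending at 07:00:00, the next starting there) do not produce a duplicate at the boundary.

```python
from collections import defaultdict


def expand_trip_windows(rows):
    """Expand every frequencies row, grouped by trip_id.

    rows: iterable of (trip_id, start_secs, end_secs, headway_secs), already validated.
    Returns (departures, overlaps):
      departures: {trip_id: sorted list of unique departure seconds}
      overlaps:   [(trip_id, start_secs)] for each window that starts before an
                  earlier window of the same trip has ended
    """
    windows_by_trip = defaultdict(list)
    for trip_id, start, end, headway in rows:
        windows_by_trip[trip_id].append((start, end, headway))

    departures, overlaps = {}, []
    for trip_id, windows in windows_by_trip.items():
        windows.sort()  # process each trip's windows in start_time order
        latest_end = None
        times = set()  # a set drops departures generated by two overlapping windows
        for start, end, headway in windows:
            if latest_end is not None and start < latest_end:
                overlaps.append((trip_id, start))
            latest_end = end if latest_end is None else max(latest_end, end)
            times.update(range(start, end, headway))
        departures[trip_id] = sorted(times)
    return departures, overlaps


# 06:00-07:00 every 10 min, then 07:00-07:30 every 15 min: no duplicate at 07:00
deps, overlaps = expand_trip_windows([("T1", 21600, 25200, 600), ("T1", 25200, 27000, 900)])
print(deps["T1"][-3:], overlaps)  # [24600, 25200, 26100] []
```

Flagging overlaps separately lets a pipeline choose a priority rule instead of silently merging conflicting headways.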

Performance & Memory Considerations

Expanding frequencies can explode dataset size. A single route with 18-hour service and a 120-second headway generates 540 trips (18 × 3,600 s ÷ 120 s). Multiply this across a metro region, and a few megabytes of frequencies.txt can balloon into a departure table orders of magnitude larger. To mitigate this:

  • Filter Early: Apply exact_times_filter and service date constraints before expansion.
  • Use Generators: Yield rows instead of building a full list if streaming to a database or parquet writer.
  • Vectorize When Possible: numpy.arange can replace the inner Python loop for a single row, and because every frequency row expands independently of the others, numpy.repeat can vectorize the whole table at once.
  • Index Strategically: Post-expansion, index on trip_id and departure_time to accelerate spatial-temporal joins.
  • Chunk Processing: For feeds exceeding 2GB, join frequencies.txt to trips.txt (frequencies.txt itself has no route_id or service_id column), split by route_id or service_id, expand each chunk independently, and concatenate.
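A vectorized expansion is sketched below. It assumes times were already converted to integer seconds in hypothetical columns start_secs and end_secs and that invalid rows were dropped (a zero headway would trigger a NumPy division warning). The counts array also gives the output size before the departure arrays are allocated, which makes it a cheap pre-flight check for the size problem described above.

```python
import numpy as np
import pandas as pd


def expand_vectorized(freq: pd.DataFrame) -> pd.DataFrame:
    """Expand validated frequency rows without a Python-level loop.

    Expects columns trip_id, start_secs, end_secs, headway_secs (integer seconds).
    """
    start = freq["start_secs"].to_numpy(dtype=np.int64)
    end = freq["end_secs"].to_numpy(dtype=np.int64)
    headway = freq["headway_secs"].to_numpy(dtype=np.int64)

    # Departures per row: ceil((end - start) / headway), clipped at 0 for empty windows
    counts = np.maximum(-(-(end - start) // headway), 0)

    # Row index for every output departure: row 0 repeated counts[0] times, and so on
    row_idx = np.repeat(np.arange(len(freq)), counts)
    # n = 0, 1, 2, ... restarting at each row: global position minus the row's first position
    n = np.arange(counts.sum()) - np.repeat(np.cumsum(counts) - counts, counts)

    return pd.DataFrame({
        "trip_id": freq["trip_id"].to_numpy()[row_idx],
        "departure_secs": start[row_idx] + n * headway[row_idx],
    })


example = pd.DataFrame({
    "trip_id": ["A", "B"],
    "start_secs": [0, 100],
    "end_secs": [1000, 400],
    "headway_secs": [300, 100],
})
print(expand_vectorized(example))
```

The repeat-and-offset trick trades a little readability for speed; the row-wise function remains easier to audit.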

Validation Checklist

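Before exporting expanded departures, verify:

  • Every frequency row has a positive headway_secs and an end_time later than its start_time.
  • Every departure falls strictly before its window's end_time and lies on the start_time + n × headway_secs grid.
  • No (trip_id, departure_time) pair appears twice, including where windows for the same trip_id meet or overlap.
  • Every departure the formula predicts is present, so each window yields ceil((end_time − start_time) / headway_secs) departures.
  • exact_times is 0 or 1 in every output row, with missing values filled as 0.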
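These checks can be automated. The sketch below works on plain integer seconds; validate_departures and its tuple layouts are hypothetical, and it accepts a departure if it sits on the headway grid of any valid window for its trip_id.

```python
from collections import Counter, defaultdict


def validate_departures(windows, departures):
    """Return a list of problems; an empty list means every check passed.

    windows:    (trip_id, start_secs, end_secs, headway_secs, exact_times) per frequencies row
    departures: (trip_id, departure_secs) per expanded departure
    """
    problems = []
    valid = defaultdict(list)
    for trip_id, start, end, headway, exact in windows:
        if headway <= 0:
            problems.append(f"{trip_id}: headway_secs must be positive, got {headway}")
        elif end <= start:
            problems.append(f"{trip_id}: end_time must be after start_time")
        else:
            valid[trip_id].append((start, end, headway))
        if exact not in (0, 1):
            problems.append(f"{trip_id}: exact_times must be 0 or 1, got {exact}")

    # No (trip_id, departure) pair should appear more than once
    for (trip_id, dep), n in Counter(departures).items():
        if n > 1:
            problems.append(f"{trip_id}@{dep}: duplicated {n} times")

    # Every departure must lie in [start, end) of some window, on that window's grid
    present = defaultdict(set)
    for trip_id, dep in departures:
        present[trip_id].add(dep)
        if not any(s <= dep < e and (dep - s) % h == 0 for s, e, h in valid[trip_id]):
            problems.append(f"{trip_id}@{dep}: outside every window or off the headway grid")

    # Every departure the formula predicts must be present
    for trip_id, wins in valid.items():
        for s, e, h in wins:
            for dep in range(s, e, h):
                if dep not in present[trip_id]:
                    problems.append(f"{trip_id}@{dep}: expected departure missing")
    return problems


print(validate_departures([("T1", 0, 1800, 600, 1)], [("T1", 0), ("T1", 600), ("T1", 1200)]))  # []
```

Returning a list of messages instead of raising on the first failure makes it easier to report every problem in a feed at once.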

This approach ensures deterministic, spec-compliant departure tables ready for routing engines, timetable generators, or passenger-facing APIs.