Optimizing Batch Isochrone Generation with OSRM

Scaling drive-time catchment modeling from single-site feasibility studies to portfolio-wide screening turns ad-hoc routing into a throughput, memory, and topology problem — this page documents how to run OSRM as a deterministic batch engine inside a retail site-selection pipeline.

When a location intelligence team evaluates one candidate storefront, a single routing request is fine. When they screen thousands of parcels nightly, the same request pattern produces latency spikes, fragmented memory in the routing daemon, and inconsistent polygon topology between runs. The fix is architectural: separate graph serving from polygon generation, extract travel-time matrices asynchronously, polygonize on dedicated compute, and gate every output with automated spatial validation before it reaches downstream scoring. This page sits inside the Isochrone Generation & Network Analysis stage of the pipeline and feeds validated catchment polygons forward to demographic enrichment and suitability ranking.

Concept: Why OSRM Has No Native Isochrone

OSRM (Open Source Routing Machine) is a road-network router built for fast point-to-point and matrix queries, using either contraction hierarchies (CH) or multi-level Dijkstra (MLD). Unlike OpenRouteService, it exposes no /isochrones endpoint and has no server-side polygon generator. A batch isochrone in OSRM is therefore a derived product, assembled in three stages:

1. Sample the area around each origin with a regular grid of destination points.
2. Query the OSRM /table endpoint for the origin-to-grid travel-time matrix.
3. Reconstruct the reachable boundary by rasterizing those durations onto a regular grid and extracting isolines (contours) at each target threshold.

This indirection is an advantage at scale. Because the expensive step is a single matrix call per origin chunk rather than thousands of independent routing requests, throughput is bounded by matrix size and worker concurrency rather than per-request HTTP overhead. The contour reconstruction is pure CPU work with no network dependency, so it can be scaled horizontally on nodes separate from the routing daemon. The trade-off is grid resolution: the recovered isochrone polygon is only as faithful as the destination grid is dense, and halving the grid spacing roughly quadruples the number of destinations and with it the matrix size.
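
To make the resolution trade-off concrete, the sketch below builds a square sampling grid and compares point counts at two spacings. `metric_grid` and its parameters are illustrative names, not part of OSRM, and the coordinates are assumed to already be in a metre-based projected CRS.

```python
from typing import List, Tuple


def metric_grid(
    cx: float, cy: float, radius_m: float, spacing_m: float
) -> Tuple[List[Tuple[float, float]], Tuple[int, int]]:
    """Return row-major (x, y) grid points centred on (cx, cy) and the grid shape.

    Assumes cx and cy are already expressed in a metre-based projected CRS.
    """
    if radius_m <= 0 or spacing_m <= 0:
        raise ValueError("radius_m and spacing_m must be positive")
    half = int(radius_m // spacing_m)  # whole cells on each side of the centre
    n = 2 * half + 1                   # odd count keeps the origin on a grid node
    points = [
        (cx + (col - half) * spacing_m, cy + (row - half) * spacing_m)
        for row in range(n)
        for col in range(n)
    ]
    return points, (n, n)


# Same 3 km reach, two spacings: halving the pitch roughly quadruples the count.
coarse, coarse_shape = metric_grid(0.0, 0.0, radius_m=3000.0, spacing_m=300.0)
fine, fine_shape = metric_grid(0.0, 0.0, radius_m=3000.0, spacing_m=150.0)
print(len(coarse), coarse_shape)
print(len(fine), fine_shape)
```

The points would still need reprojecting to longitude/latitude before they are sent to OSRM; that step is left out here.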

For a decision between OSRM and an engine that emits polygons directly, weigh the matrix-and-contour cost against built-in isochrones in Configuring OpenRouteService for Drive-Time Maps, and review the engine-level performance trade-offs in comparing OSRM vs Valhalla for retail catchment analysis.

Architecture Overview

Batch isochrone generation must run independently of any real-time query endpoint to prevent thread starvation and to guarantee idempotent outputs. The routing daemon (osrm-routed) is deployed as a pinned Docker image so routing behavior is byte-reproducible across staging and production. The Python layer never embeds routing logic — it orchestrates matrix extraction, contour reconstruction, validation, and ingestion as discrete, restartable stages.

The Python execution layer relies on aiohttp for non-blocking HTTP and geopandas for vectorized geometry. Isoline extraction uses scipy.ndimage for raster smoothing and skimage.measure.find_contours for the boundary trace. Decoupling matrix extraction from polygonization lets the matrix tier and the contour tier scale on independent node pools.

Configuration Parameters

OSRM’s default car.lua profile models free-flow speeds by road class with no congestion, which tends to overstate urban retail accessibility. Tune the preprocessing chain before any batch run, then keep these parameters under version control so every rebuild is reproducible. The graph is built with osrm-extract --profile <lua>, followed by osrm-partition and osrm-customize to assemble the multi-level Dijkstra (MLD) hierarchy.

Parameter	Stage / scope	Type	Retail default	Notes
`--profile`	`osrm-extract`	path (.lua)	`profiles/retail-car.lua`	Customized from `car.lua` with urban speeds and turn penalties
`--algorithm`	`osrm-routed`	enum	`MLD`	MLD allows live traffic re-customization without a full rebuild
`--max-table-size`	`osrm-routed`	int	`8000`	OSRM documents this as the maximum number of locations in a table query; this page budgets `sources × destinations` against it. Oversize requests return HTTP 400
`max_concurrent`	orchestrator	int	`10`	In-flight `/table` calls; raise only while daemon RSS stays under budget
`chunk_size`	orchestrator	int	`100`	Origins per matrix call; keep `chunk_size × grid_points ≤ max_table_size`
`grid_spacing_m`	grid sampler	float	`150.0`	Destination grid pitch in metres; smaller = sharper boundary, larger matrix
`thresholds_s`	contour	list[int]	`[300, 600, 900]`	Target durations in seconds (5 / 10 / 15-minute bands)
`working_crs`	grid + contour	EPSG	`5070`	Equal-area projection for metric grids; reproject to 4326 on output

Run every grid and contour operation in a projected, metre-based CRS (EPSG:5070, an equal-area Albers projection for the contiguous US) so grid spacing is specified in metres rather than degrees, then transform the final polygons back to EPSG:4326 for storage. Consult the OSRM HTTP API Reference for exact matrix-size limits before scaling chunk_size.
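
One way to keep the orchestrator-side parameters under version control is a small frozen dataclass that refuses to run with inconsistent values. This is a minimal sketch: `BatchConfig` and `validate` are hypothetical names, the defaults simply mirror the table above, and the budget check uses this page's `chunk_size × grid_points ≤ max_table_size` rule.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class BatchConfig:
    """Orchestrator parameters from the table above (hypothetical container)."""

    max_table_size: int = 8000
    max_concurrent: int = 10
    chunk_size: int = 100
    grid_spacing_m: float = 150.0
    thresholds_s: List[int] = field(default_factory=lambda: [300, 600, 900])
    working_crs: str = "EPSG:5070"

    def validate(self, grid_points: int) -> None:
        """Raise ValueError if these settings cannot produce a valid batch."""
        if self.grid_spacing_m <= 0:
            raise ValueError("grid_spacing_m must be positive")
        if self.thresholds_s != sorted(set(self.thresholds_s)):
            raise ValueError("thresholds_s must be strictly increasing")
        cells = self.chunk_size * grid_points
        if cells > self.max_table_size:
            raise ValueError(
                f"chunk_size x grid_points = {cells} exceeds "
                f"max_table_size = {self.max_table_size}"
            )


config = BatchConfig()
config.validate(grid_points=80)  # 100 x 80 = 8000, exactly on the budget
```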

Step-by-Step Python Implementation

The OSRM /table endpoint is a GET request. Coordinates are passed in the URL path as semicolon-separated longitude,latitude pairs; the sources and destinations query parameters carry zero-based indices into that coordinate list. Production batch jobs need strict concurrency control and chunked coordinate management so a single oversized request never exceeds --max-table-size.

python

import asyncio
import aiohttp
import numpy as np
from pyproj import Transformer
from scipy import ndimage
from skimage import measure
from typing import List, Tuple

# Explicit CRS contract: OSRM speaks WGS84 (EPSG:4326); the metric grid and all
# distance-based reasoning happen in an equal-area projection (EPSG:5070).
WGS84 = "EPSG:4326"
EQUAL_AREA = "EPSG:5070"
TO_METRIC = Transformer.from_crs(WGS84, EQUAL_AREA, always_xy=True)
TO_WGS84 = Transformer.from_crs(EQUAL_AREA, WGS84, always_xy=True)


def build_table_url(
    base_url: str,
    coords: List[Tuple[float, float]],
    source_indices: List[int],
    dest_indices: List[int],
) -> str:
    """
    Build an OSRM /table GET URL.
    OSRM coordinates are semicolon-separated lon,lat pairs in the URL path.
    sources and destinations are semicolon-separated zero-based indices.
    """
    coord_str = ";".join(f"{lon},{lat}" for lon, lat in coords)
    src_str = ";".join(str(i) for i in source_indices)
    dst_str = ";".join(str(i) for i in dest_indices)
    return (
        f"{base_url}/table/v1/driving/{coord_str}"
        f"?sources={src_str}&destinations={dst_str}&annotations=duration"
    )


async def fetch_table(
    session: aiohttp.ClientSession,
    semaphore: asyncio.Semaphore,
    url: str,
) -> dict:
    async with semaphore:
        async with session.get(url) as resp:
            resp.raise_for_status()
            return await resp.json()


def chunk_indices(total: int, chunk_size: int):
    """Yield successive index chunks of size chunk_size."""
    for start in range(0, total, chunk_size):
        yield list(range(start, min(start + chunk_size, total)))


async def batch_table_pipeline(
    origins: List[Tuple[float, float]],
    grid_points: List[Tuple[float, float]],
    osrm_base: str,
    max_concurrent: int = 10,
    chunk_size: int = 100,
) -> List[dict]:
    """
    Fetch travel-time matrices from OSRM for chunked origin/grid-point batches.
    Each call queries a slice of origins against all grid points.
    """
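    semaphore = asyncio.Semaphore(max_concurrent)

    async with aiohttp.ClientSession() as session:
        tasks = []
        for origin_chunk in chunk_indices(len(origins), chunk_size):
            # Send only this chunk's origins plus the grid, so the URL stays
            # short and every coordinate in the request is actually used.
            chunk_origins = [origins[i] for i in origin_chunk]
            coords = chunk_origins + grid_points
            src_idx = list(range(len(chunk_origins)))
            dest_idx = list(range(len(chunk_origins), len(coords)))
            url = build_table_url(osrm_base, coords, src_idx, dest_idx)
            tasks.append(fetch_table(session, semaphore, url))

        results = await asyncio.gather(*tasks, return_exceptions=True)

    # Failed chunks are filtered out here, so the returned list can be shorter
    # than the number of chunks and no longer lines up with `origins`.
    return [r for r in results if not isinstance(r, Exception)]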
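
Each successful /table response carries a `code` field and a `durations` matrix with one row per source and one column per destination, where unroutable pairs are null. A minimal sketch of unpacking that payload follows; `durations_from_table` is a hypothetical helper, and the fill value is left to the caller (the example uses a finite value above the largest threshold, which is an assumption rather than an OSRM convention).

```python
from typing import Any, Dict, List


def durations_from_table(payload: Dict[str, Any], fill: float) -> List[List[float]]:
    """Return one duration row per source, replacing null cells with `fill`."""
    if payload.get("code") != "Ok":
        raise ValueError(
            f"OSRM /table failed: {payload.get('code')}: {payload.get('message')}"
        )
    return [
        [fill if cell is None else float(cell) for cell in row]
        for row in payload["durations"]
    ]


# Hand-written payload in the /table response shape; the numbers are made up.
sample = {"code": "Ok", "durations": [[120.5, None, 431.0], [88.0, 0.0, None]]}
rows = durations_from_table(sample, fill=1800.0)
print(rows)
```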

With the duration matrix in hand, the second stage rasterizes travel times onto the metric grid and traces a contour for each threshold. The grid coordinates are projected into EPSG:5070 first, so cell pitch is a real metric distance and the contour level corresponds to a true drive-time band.

python

def durations_to_polygons(
    grid_lonlat: List[Tuple[float, float]],
    durations_s: np.ndarray,        # 1-D array aligned to grid_lonlat order
    grid_shape: Tuple[int, int],    # (rows, cols) of the sampling grid
    thresholds_s: List[int],
) -> dict:
    """
    Rasterize a single origin's durations and extract one isoline per threshold.
    Returns {threshold_seconds: [(lon, lat), ...]} ring coordinates in EPSG:4326.
    """
    xs, ys = TO_METRIC.transform(
        [p[0] for p in grid_lonlat], [p[1] for p in grid_lonlat]
    )
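    cost = np.asarray(durations_s, dtype=float).reshape(grid_shape)
    # Pin unroutable (NaN/inf) cells to a finite ceiling above the largest
    # threshold so the Gaussian filter cannot smear them into reachable cells.
    # The factor of 2 is an arbitrary choice, not an OSRM convention.
    ceiling = 2.0 * max(thresholds_s)
    cost = np.where(np.isfinite(cost), np.minimum(cost, ceiling), ceiling)
    cost = ndimage.gaussian_filter(cost, sigma=1.0)  # damp rasterization noise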

    x_grid = np.asarray(xs).reshape(grid_shape)
    y_grid = np.asarray(ys).reshape(grid_shape)
    out = {}
    for level in thresholds_s:
        rings = []
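        for contour in measure.find_contours(cost, level):
            # find_contours returns fractional (row, col) vertices. Interpolate
            # the projected coordinates bilinearly instead of truncating to
            # integer cells, which would snap the ring onto grid nodes.
            mx = ndimage.map_coordinates(x_grid, contour.T, order=1, mode="nearest")
            my = ndimage.map_coordinates(y_grid, contour.T, order=1, mode="nearest")
            lon, lat = TO_WGS84.transform(mx, my)
            rings.append(list(zip(lon, lat)))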
        out[level] = rings
    return out

Store intermediate geometries as GeoDataFrame objects with explicit EPSG codes so projection state is never inferred. For repeated origin groups, the matrix step is the natural caching boundary — see caching strategies for repeated network queries to avoid recomputing identical /table calls across nightly runs.
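
As a rough illustration of caching at the matrix boundary, the sketch below keys each /table response by a hash of the request URL plus a graph version tag, and stores it as JSON on disk. The function names, the on-disk layout, and the `graph_version` tag are all assumptions made for this example; the linked caching page may take a different approach.

```python
import hashlib
import json
from pathlib import Path
from typing import Optional


def table_cache_key(url: str, graph_version: str) -> str:
    """Hash the full request URL together with a graph/profile version tag."""
    return hashlib.sha256(f"{graph_version}|{url}".encode("utf-8")).hexdigest()


def cache_get(cache_dir: Path, key: str) -> Optional[dict]:
    """Return the cached /table payload, or None on a miss."""
    path = cache_dir / f"{key}.json"
    return json.loads(path.read_text()) if path.exists() else None


def cache_put(cache_dir: Path, key: str, payload: dict) -> None:
    """Persist a /table payload under its cache key."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    (cache_dir / f"{key}.json").write_text(json.dumps(payload))
```

Including the version tag in the key means a rebuilt graph or changed profile never serves durations computed against the old one.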

Edge Cases and Failure Modes

Most invalid isochrones trace back to one of a handful of conditions:

Disconnected road components. Rural origins can fall on a graph fragment that is unreachable from the main network, producing null durations and degenerate contours. Validate connectivity before batch execution; resolution steps are in troubleshooting disconnected road networks in rural areas.
CRS mismatches. Tracing contours directly in EPSG:4326 makes grid spacing vary with latitude, warping the catchment. Always rasterize in the equal-area working_crs and reproject on output, as the code above does.
max-table-size overruns. When a request exceeds the daemon's --max-table-size limit, OSRM rejects it with HTTP 400, and the exception filter in batch_table_pipeline drops that chunk without logging it. Assert the chunk_size × grid_points budget against the daemon limit before dispatch.
Rasterization artifacts at grid edges. Contours can clip against the sampling boundary, leaving open rings. Pad the destination grid by one threshold-radius beyond the expected reach so the boundary always closes inside the sampled area.
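Null matrix cells. OSRM emits null for unroutable pairs. Coerce these to a large finite value above the highest threshold (never 0) before smoothing: 0 makes find_contours draw a phantom band through unreachable cells, while np.inf or NaN is spread into neighbouring reachable cells by the Gaussian filter.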
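
Silent chunk loss is easier to catch if failures stay attached to their chunk index. The following is a minimal sketch of that idea using only asyncio; `gather_chunks` is a hypothetical helper, and the two inner coroutines stand in for real /table requests.

```python
import asyncio
from typing import Awaitable, Dict, List, Tuple


async def gather_chunks(
    tasks: List[Awaitable[dict]],
) -> Tuple[Dict[int, dict], Dict[int, BaseException]]:
    """Run chunk requests and split outcomes by chunk index instead of dropping failures."""
    results = await asyncio.gather(*tasks, return_exceptions=True)
    ok: Dict[int, dict] = {}
    failed: Dict[int, BaseException] = {}
    for idx, result in enumerate(results):
        if isinstance(result, BaseException):
            failed[idx] = result
        else:
            ok[idx] = result
    return ok, failed


async def _demo() -> Tuple[Dict[int, dict], Dict[int, BaseException]]:
    async def good() -> dict:
        return {"code": "Ok"}  # stand-in for a successful /table response

    async def bad() -> dict:
        raise RuntimeError("HTTP 400")  # stand-in for a rejected request

    return await gather_chunks([good(), bad(), good()])


ok, failed = asyncio.run(_demo())
print(sorted(ok), sorted(failed))
```

With the failed indices in hand, the orchestrator can retry those chunks or halt the batch rather than publish catchments for only part of the portfolio.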

Performance and Scaling

Matrix size grows as origins × grid_points, and grid points grow quadratically with grid density, so tuning is a balance between boundary fidelity and request cost. Keep chunk_size × grid_points comfortably under --max-table-size, and raise max_concurrent only while the routing daemon’s resident memory stays within its container budget. Monitor osrm-routed RSS during matrix generation; if it crosses ~80% of the container limit, lower --max-table-size or move repeated origin groups behind a cache.
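
The arithmetic behind that balance can be sketched in a few lines. `plan_chunks` is a hypothetical helper that applies this page's `chunk_size × grid_points ≤ max_table_size` budget and reports how many /table requests a run would need. The grid sizes in the example are illustrative.

```python
import math
from typing import Tuple


def plan_chunks(n_origins: int, n_grid: int, max_table_size: int) -> Tuple[int, int]:
    """Return (chunk_size, request_count) with chunk_size * n_grid <= max_table_size."""
    if n_origins <= 0 or n_grid <= 0:
        raise ValueError("n_origins and n_grid must be positive")
    if n_grid > max_table_size:
        raise ValueError("the grid alone exceeds the table budget; coarsen the grid")
    chunk_size = max_table_size // n_grid
    return chunk_size, math.ceil(n_origins / chunk_size)


# 5,000 origins against a 21 x 21 grid and a 41 x 41 grid, same 8,000-cell budget.
print(plan_chunks(5000, 441, 8000))
print(plan_chunks(5000, 1681, 8000))
```

Under this budget the finer grid allows only a few origins per call, so the request count rises sharply even though the total number of matrix cells grows only about fourfold.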

For very large origin sets, the dominant cost shifts from routing to memory churn in the orchestrator. Strategies for keeping batch memory bounded — streaming matrices, releasing intermediate arrays, and capping in-flight chunks — are covered in reducing memory overhead for 10,000-point batch routing. Run the MLD algorithm so live traffic can be folded in with osrm-customize rather than a full graph rebuild.

Validation and QA Gates

Run these checks immediately after contour extraction and before any polygon reaches downstream stages. Treat them as hard gates — a failed batch should halt, not publish partial results.

Topology integrity. Every output polygon must contain its source coordinate; flag geometries that fail point.within(polygon) for manual review.
Geometry validity. Check polygon.is_valid and repair failures with buffer(0) or make_valid; reject any ring that cannot be closed.
Component isolation. Detect catchments truncated by network gaps or one-way misconfigurations using osmnx or networkx connectivity checks on the source graph.
Nesting monotonicity. A 5-minute band must lie entirely inside the 10-minute band; assert inner.within(outer) across thresholds to catch rasterization errors.
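
The ring-closure, origin-containment, and nesting gates can be illustrated without any geometry library. The sketch below is a simplified stand-in using an even-odd ray-casting test, not the shapely `within` and `is_valid` checks named above. All function names are hypothetical, and the nesting test only checks that the inner ring's vertices fall inside the outer ring, which is necessary but not sufficient.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def ring_is_closed(ring: Sequence[Point]) -> bool:
    """A usable ring has at least four vertices and repeats its first vertex."""
    return len(ring) >= 4 and ring[0] == ring[-1]


def point_in_ring(pt: Point, ring: Sequence[Point]) -> bool:
    """Even-odd ray casting; points exactly on the boundary are not treated specially."""
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def check_catchment(origin: Point, bands: List[Sequence[Point]]) -> List[str]:
    """Check rings ordered from smallest to largest threshold; return problem messages."""
    problems = []
    for i, ring in enumerate(bands):
        if not ring_is_closed(ring):
            problems.append(f"band {i}: ring is not closed")
        elif not point_in_ring(origin, ring):
            problems.append(f"band {i}: origin outside polygon")
    for i in range(len(bands) - 1):
        if not all(point_in_ring(v, bands[i + 1]) for v in bands[i]):
            problems.append(f"band {i}: not nested inside band {i + 1}")
    return problems


inner = [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0), (-1.0, -1.0)]
outer = [(-2.0, -2.0), (2.0, -2.0), (2.0, 2.0), (-2.0, 2.0), (-2.0, -2.0)]
print(check_catchment((0.0, 0.0), [inner, outer]))
```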

Trigger automated alerts when the validation failure rate exceeds 2% per batch. Log raw API responses alongside geometry hashes so a regression can be diffed and rolled back without re-querying OSRM.
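
A small sketch of the two mechanics in that paragraph: a stable geometry hash for the log record and the 2% failure-rate gate. Both function names are hypothetical, and rounding coordinates to six decimal places before hashing is an assumption made so that floating-point noise does not change the digest.

```python
import hashlib
import json
from typing import Sequence, Tuple


def geometry_hash(ring: Sequence[Tuple[float, float]], precision: int = 6) -> str:
    """Stable digest of a ring; coordinates are rounded before hashing."""
    rounded = [[round(x, precision), round(y, precision)] for x, y in ring]
    return hashlib.sha256(json.dumps(rounded).encode("utf-8")).hexdigest()


def batch_should_alert(failed: int, total: int, max_failure_rate: float = 0.02) -> bool:
    """True when the validation failure rate strictly exceeds the threshold."""
    if total <= 0:
        raise ValueError("total must be positive")
    return failed / total > max_failure_rate


print(batch_should_alert(failed=2, total=100))
print(batch_should_alert(failed=3, total=100))
```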

Integration Notes

Validated polygons are the input to the rest of the location intelligence stack. Materialize them in PostGIS with CREATE MATERIALIZED VIEW and a GiST index on the geometry column, then schedule REFRESH MATERIALIZED VIEW CONCURRENTLY so downstream dashboards never read a half-written table. The CONCURRENTLY option requires a unique index on the materialized view, so add one on the site identifier alongside the GiST index. From there, each catchment becomes a trade area ready for demographic enrichment.

The immediate next stage is a spatial join that aggregates population and spend inside each polygon and attaches census block group attributes to it. For multi-modal scenarios that blend drive-time catchments with pedestrian reach, fold custom Lua edge weights in via implementing multi-modal routing for urban retail. Run the batch on a scheduled DAG (Airflow or Prefect) that chains graph validation, matrix extraction, contour generation, and PostGIS ingestion, using source-coordinate hashes against a last_processed table so only new or changed sites trigger recomputation.
