Reducing memory overhead for 10,000+ point batch routing

This page shows how to route 10,000+ origin-destination (O-D) pairs through a self-hosted or hosted routing engine while holding peak resident memory to a fixed budget, so a portfolio-screening job finishes instead of being terminated by the kernel’s OOM killer.

When a retail site-selection pipeline screens a single candidate parcel, one routing request is trivial. When it screens every parcel in a metro nightly, the same code path that worked at pilot scale collapses: Python object allocation, raw JSON response buffering, and eager Shapely geometry construction all grow linearly with batch size, and the worker dies long before the matrix is complete. The fix is not a bigger box — it is a streaming architecture that keeps the live working set constant regardless of how many points enter the job. This task sits one stage below caching repeated network queries: once a cache layer removes redundant solves, this page keeps the remaining unique solves inside a memory envelope.

Prerequisites

Before running the batch routing job, the following must be in place:

Requirement	Purpose	Notes
Python 3.10+	`:=` walrus operator (3.8+) and built-in generic type hints such as `list[tuple]` (3.9+) used in the examples	On interpreters older than 3.8 the chunker must be rewritten as a plain loop
`duckdb` ≥ 0.10	Out-of-core persistence of the travel-time matrix	SQLite or Parquet are drop-in alternatives
`pyarrow`	Zero-copy columnar export of intermediate results	Required for the GeoParquet hand-off
`requests` + `urllib3`	HTTP session with retry/backoff	Any HTTP client with connection pooling works
`shapely` ≥ 2.0	Final geometry materialization only	Not imported in the hot loop
`memory_profiler` (PyPI), `tracemalloc` (stdlib)	Baseline profiling	Profiling is dev-only; strip from production runs
A reachable routing matrix endpoint	The O-D solver itself	A self-hosted OSRM `/table` service or a hosted provider
Input coordinates with a known CRS	Avoid silent lat/lon swaps	Store as EPSG:4326; project only at geometry time

A clean coordinate input is assumed here; the upstream store-coordinate validation rules should have already rejected null islands, swapped axes, and out-of-bounds points before this job runs.

Establishing a memory baseline

Before optimizing anything, locate where memory accumulates. Attach tracemalloc at the script entry point and snapshot allocations at each pipeline stage. In practice, 70–80% of the overhead in a naive batch router comes from three patterns: loading an entire GeoDataFrame into memory, retaining the raw JSON payloads returned by the matrix endpoint, and holding intermediate Shapely geometries while assembling contours.

Run memory_profiler against a representative slice to capture a peak Resident Set Size (RSS) figure, and chart the curve across request batching, response parsing, and spatial operations. Any stage whose memory grows linearly with batch size is the stage to refactor — a correct streaming pipeline shows a flat curve.

python

import tracemalloc

tracemalloc.start()


def report_top_allocations(label: str, limit: int = 5) -> None:
    """Print the largest allocation sites at a pipeline checkpoint."""
    snapshot = tracemalloc.take_snapshot()
    top_stats = snapshot.statistics("lineno")
    print(f"[mem:{label}] top {limit} allocations")
    for stat in top_stats[:limit]:
        print(f"  {stat}")
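To chart memory per stage rather than only at the end, the checkpoint idea can be wrapped in a context manager. This is a minimal sketch: `measure_stage` and `stage_log` are hypothetical names, the list comprehension stands in for real response parsing, and `tracemalloc` only sees Python-heap allocations, so native memory held by DuckDB or the HTTP stack will not appear here.

```python
import tracemalloc
from contextlib import contextmanager

# Hypothetical log of (label, current_bytes, peak_bytes), one entry per stage.
stage_log: list[tuple[str, int, int]] = []


@contextmanager
def measure_stage(label: str):
    """Record Python-heap usage for a single pipeline stage."""
    if not tracemalloc.is_tracing():
        tracemalloc.start()
    tracemalloc.reset_peak()  # Python 3.9+: the peak now refers to this stage only
    try:
        yield
    finally:
        current, peak = tracemalloc.get_traced_memory()
        stage_log.append((label, current, peak))


with measure_stage("parse"):
    parsed = [float(i) for i in range(50_000)]  # stand-in for response parsing

for label, current, peak in stage_log:
    print(f"[mem:{label}] current={current / 1e6:.1f} MB peak={peak / 1e6:.1f} MB")
```

Running the same stages at two or three batch sizes and comparing the logged peaks is usually enough to tell a flat curve from a linear one.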

Configuration and execution parameters

The behaviour of the job is governed by a handful of tunables. Defaults below are sized for a 4 GiB worker routing against an OSRM /table service.

Parameter	Default	Valid range	Effect
`batch_size`	800	100–4,000	O-D pairs per request; lower trims peak RSS and dodges the `--max-table-size` ceiling
`db_memory_limit`	`2GB`	512MB–worker-limit	DuckDB’s spill threshold; keep below the container limit
`retry_total`	3	0–6	HTTP retry attempts before a chunk is skipped or escalated
`backoff_factor`	1.0	0.5–4.0	Exponential backoff base for 429/5xx responses
`request_timeout`	30 s	10–120 s	Per-request socket timeout
`rss_high_water`	0.85	0.7–0.95	Fraction of the container limit that triggers a graceful drain
`defer_geometry`	`True`	bool	Hold flat arrays; build polygons only at export

The single most important lever is batch_size: it bounds both the live working set and, for self-hosted OSRM, the /table request size. Treat it as the knob you tune first when a worker still trips its ceiling.
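One way to keep these tunables honest is to validate them once at start-up. The sketch below is an assumption about how the table might be encoded, not the job's actual configuration loader: `RoutingJobConfig` is a hypothetical name, and the numeric ranges are copied from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutingJobConfig:
    """Hypothetical container for the tunables listed in the table."""

    batch_size: int = 800
    db_memory_limit: str = "2GB"
    retry_total: int = 3
    backoff_factor: float = 1.0
    request_timeout: float = 30.0
    rss_high_water: float = 0.85
    defer_geometry: bool = True

    def __post_init__(self) -> None:
        # Valid ranges taken from the parameter table.
        ranges = {
            "batch_size": (100, 4_000),
            "retry_total": (0, 6),
            "backoff_factor": (0.5, 4.0),
            "request_timeout": (10, 120),
            "rss_high_water": (0.7, 0.95),
        }
        for name, (low, high) in ranges.items():
            value = getattr(self, name)
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low}..{high}")


config = RoutingJobConfig(batch_size=400)  # smaller windows for a tight worker
```

Failing fast on an out-of-range value is cheaper than discovering it as a rejected request or an OOM kill halfway through a nightly run.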

Streaming, persistence, and deferred geometry

Batch routing at scale must run as a streaming pipeline, not an in-memory accumulator. Partition the 10,000+ point dataset into fixed windows, route each window independently, parse the response, and immediately flush the rows to an out-of-core store. Crucially, never append results to a long-lived Python list or dict across iterations: every appended row stays reachable for the life of the job, so the garbage collector cannot reclaim it and the working set grows with each chunk.

When the downstream consumer needs drive-time polygons, decouple network traversal from geometry generation. Fetch the raw node-edge sequences and travel times first, store them as flat numeric columns, and materialize shapely.Polygon objects only at the final export — typically when handing off to demographic enrichment such as a point-in-polygon catchment join or writing GeoParquet for a scoring model. Polygon objects are the heaviest things in the pipeline; keeping them out of the loop is what flattens the memory curve.
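The deferred-geometry idea can be sketched as follows, assuming the boundary of each catchment is available as a sequence of (lon, lat) pairs. `pack_ring`, `unpack_ring`, and `materialize_polygon` are hypothetical helpers; the only Shapely call is the `Polygon` constructor, and it is imported lazily so the hot loop never loads the library.

```python
from array import array


def pack_ring(points: list[tuple[float, float]]) -> array:
    """Flatten [(lon, lat), ...] into one compact array of doubles."""
    flat = array("d")
    for lon, lat in points:
        flat.extend((lon, lat))
    return flat


def unpack_ring(flat: array) -> list[tuple[float, float]]:
    """Rebuild (lon, lat) pairs from the flat array."""
    return list(zip(flat[0::2], flat[1::2]))


def materialize_polygon(flat: array):
    """Build the polygon at export time only."""
    # Lazy import: shapely is loaded at export, not in the routing loop.
    from shapely.geometry import Polygon

    return Polygon(unpack_ring(flat))


# Inside the loop, keep only the flat representation.
ring = pack_ring(
    [(-87.65, 41.85), (-87.60, 41.85), (-87.60, 41.90), (-87.65, 41.85)]
)
```

An `array("d")` holds 8 bytes per coordinate value, whereas a list of tuples of Python floats costs several times that, which is why the flat form is the one worth keeping live.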

python

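import itertools
from collections.abc import Iterable, Iterator

import duckdb


def stream_chunks(coords: Iterable, batch_size: int = 800) -> Iterator[list]:
    """Yield fixed-size windows; only one window is materialized at a time.

    Pass a generator or file reader as `coords` so the full point set
    never has to exist as a single list.
    """
    it = iter(coords)
    # islice pulls at most batch_size items; an empty list ends the loop.
    while chunk := list(itertools.islice(it, batch_size)):
        yield chunk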


def open_store(db_path: str = "routing_matrix.duckdb",
               db_memory_limit: str = "2GB") -> duckdb.DuckDBPyConnection:
    con = duckdb.connect(db_path)
    con.execute(f"PRAGMA memory_limit='{db_memory_limit}';")
    con.execute(
        """
        CREATE TABLE IF NOT EXISTS drive_times (
            origin_id        VARCHAR,
            dest_id          VARCHAR,
            travel_time_sec  INTEGER,
            distance_m       FLOAT
        )
        """
    )
    return con


def persist_chunk(con: duckdb.DuckDBPyConnection, rows: list[tuple]) -> None:
    """Flush one chunk's rows, then let the chunk fall out of scope."""
    if not rows:
        return  # nothing to write; skip the call for an empty chunk
    con.executemany(
        "INSERT INTO drive_times VALUES (?, ?, ?, ?)", rows
    )
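How the pieces fit together in a driver loop, as a rough sketch. To stay runnable without extra dependencies it uses the standard-library `sqlite3` module as a stand-in for the DuckDB store (the prerequisites list SQLite as an alternative), and `fake_solver` is a hypothetical placeholder for the real routing call.

```python
import itertools
import sqlite3
from collections.abc import Iterable, Iterator


def stream_chunks(coords: Iterable, batch_size: int) -> Iterator[list]:
    """Same chunker as above, repeated so this sketch is self-contained."""
    it = iter(coords)
    while chunk := list(itertools.islice(it, batch_size)):
        yield chunk


def fake_solver(chunk: list[tuple[str, float, float]]) -> list[tuple]:
    """Hypothetical stand-in for the routing call: one row per origin."""
    return [(origin_id, "store_1", 600, 5000.0) for origin_id, _lon, _lat in chunk]


con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE drive_times ("
    "origin_id TEXT, dest_id TEXT, travel_time_sec INTEGER, distance_m REAL)"
)

# A generator input means the full point set never exists as one list.
points = ((f"p{i}", -87.6, 41.8) for i in range(2_500))

for chunk in stream_chunks(points, batch_size=800):
    rows = fake_solver(chunk)
    con.executemany("INSERT INTO drive_times VALUES (?, ?, ?, ?)", rows)
    con.commit()
    # `chunk` and `rows` are rebound on the next pass, so only one window is live.

total = con.execute("SELECT count(*) FROM drive_times").fetchone()[0]
print(total)
```

The loop body holds one chunk and one set of rows at a time; nothing outside the database grows with the number of points.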

Resilient API boundaries

Routing endpoints are volatile: timeouts, 429 throttles, and malformed payloads will corrupt a batch job if they are not handled deterministically. Mount a pooled requests.Session with exponential backoff and validate every payload before any geometry is built, so a bad response is dropped at the boundary rather than after it has already cost memory downstream.

python

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def build_session(retry_total: int = 3, backoff_factor: float = 1.0) -> requests.Session:
    session = requests.Session()
    retry = Retry(
        total=retry_total,
        backoff_factor=backoff_factor,
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["POST", "GET"],
    )
    adapter = HTTPAdapter(max_retries=retry, pool_maxsize=16)
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


def fetch_matrix(session, origins, destinations, api_url, timeout=30):
    """Fetch one O-D block. Returns None on timeout so the caller can skip."""
    # Payload shape is provider-specific; adapt the keys to the endpoint in use.
    payload = {"sources": origins, "destinations": destinations}
    try:
        resp = session.post(api_url, json=payload, timeout=timeout)
        resp.raise_for_status()
        data = resp.json()
        if "error" in data:
            raise ValueError(f"Routing API returned: {data['error']}")
        if "durations" not in data:
            raise ValueError("Routing API response has no 'durations' matrix")
        return data["durations"]
    except requests.exceptions.Timeout:
        return None  # log and skip; optionally re-queue or fail over
    except requests.exceptions.RequestException as exc:
        raise RuntimeError(f"Network boundary failure: {exc}") from exc
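Validation at the boundary can go one step further and check the matrix shape before anything is persisted. The sketch below assumes the response carries a row-major `durations` matrix in seconds, with `None` marking pairs the solver could not connect; `durations_to_rows` is a hypothetical helper, and the distance column is left empty because this sketch does not request distances.

```python
from typing import Optional


def durations_to_rows(
    origin_ids: list[str],
    dest_ids: list[str],
    durations: list[list[Optional[float]]],
) -> list[tuple]:
    """Turn one validated durations matrix into drive_times rows."""
    if len(durations) != len(origin_ids):
        raise ValueError(
            f"expected {len(origin_ids)} matrix rows, got {len(durations)}"
        )
    rows = []
    for origin_id, matrix_row in zip(origin_ids, durations):
        if len(matrix_row) != len(dest_ids):
            raise ValueError(
                f"origin {origin_id}: expected {len(dest_ids)} columns, "
                f"got {len(matrix_row)}"
            )
        for dest_id, seconds in zip(dest_ids, matrix_row):
            # Unreachable pairs stay NULL rather than becoming a fake zero.
            travel_time = None if seconds is None else round(seconds)
            rows.append((origin_id, dest_id, travel_time, None))
    return rows


rows = durations_to_rows(["a", "b"], ["x"], [[12.4], [None]])
```

A malformed block raises before any row is written, so a bad response costs one chunk of retries rather than a corrupted table.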

Capping memory for self-hosted engines

When the matrix is served by a self-hosted OSRM or OpenRouteService process, memory limits must be enforced at the container layer, not left to the host. Unbounded thread pools and oversized routing tables will silently eat all available RAM. A hard --memory ceiling does not make the engine degrade gracefully, but it does contain the failure: the kernel kills the one container that overran its budget instead of starving the host and taking neighbouring workloads down with it.

bash

docker run -d --name osrm-routed \
  --memory=4g --memory-swap=4g --memory-swappiness=0 \
  -p 5000:5000 \
  -v /data:/data \
  osrm/osrm-backend osrm-routed --algorithm mld --max-table-size 10000 /data/region.osrm

--max-table-size caps the number of locations (sources plus destinations) accepted in a single /table call. For 10,000 O-D pairs this is exactly why batch_size exists: the job must issue many chunked /table requests rather than one oversized matrix call that the engine will reject.
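The arithmetic behind chunking, as a rough sketch. `table_request_count` is a hypothetical helper, and the constraint it enforces (sources plus destinations per call must not exceed the ceiling) follows the description above; confirm it against the engine's own documentation before relying on it.

```python
import math


def table_request_count(
    n_origins: int,
    n_destinations: int,
    sources_per_call: int,
    destinations_per_call: int,
    max_table_size: int,
) -> int:
    """Number of chunked /table calls needed to cover the full matrix."""
    if sources_per_call + destinations_per_call > max_table_size:
        raise ValueError("a single call would exceed the table-size ceiling")
    source_blocks = math.ceil(n_origins / sources_per_call)
    destination_blocks = math.ceil(n_destinations / destinations_per_call)
    return source_blocks * destination_blocks


# 10,000 parcels against 50 stores, 750 sources + 50 destinations per call.
calls = table_request_count(10_000, 50, 750, 50, max_table_size=10_000)
print(calls)  # 14 source blocks x 1 destination block = 14
```

Halving the per-call source count roughly doubles the number of requests, so the memory saved per chunk is paid for in round trips.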

In Kubernetes, mirror the same envelope with explicit requests and limits, and drain the worker before it crosses rss_high_water:

yaml

resources:
  requests:
    memory: "2Gi"
    cpu: "1"
  limits:
    memory: "4Gi"
    cpu: "2"

When RSS exceeds the high-water fraction, pause new chunk ingestion, flush any pending writes, and recycle the worker — a controlled drain beats a kernel kill that loses in-flight results.
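A minimal sketch of the high-water check, assuming a Linux worker. `current_rss_bytes` and `should_drain` are hypothetical helpers; the RSS reading comes from `/proc/self/statm`, and the container limit is passed in by the caller because where it is exposed depends on the runtime.

```python
import os
from typing import Optional


def current_rss_bytes() -> Optional[int]:
    """Resident set size from /proc (Linux only); None where unavailable."""
    try:
        with open("/proc/self/statm") as fh:
            resident_pages = int(fh.read().split()[1])  # second field = resident pages
    except (OSError, ValueError, IndexError):
        return None
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


def should_drain(rss_bytes: int, limit_bytes: int, rss_high_water: float = 0.85) -> bool:
    """True once RSS exceeds the configured fraction of the container limit."""
    return rss_bytes > rss_high_water * limit_bytes


GIB = 1024 ** 3
limit = 4 * GIB  # assumption: matches the 4Gi limit in the manifest above
rss = current_rss_bytes()
if rss is not None and should_drain(rss, limit):
    print("high-water mark crossed: stop ingesting, flush, recycle")
```

Calling the check once per chunk, between the flush and the next fetch, keeps the drain decision at a point where no results are in flight.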

Failure modes and debugging

Symptom	Likely cause	Fix
OOM kill mid-run, RSS climbs linearly	Results accumulated in a Python list/dict across chunks	Flush each chunk to DuckDB/Parquet; never keep a growing collection
OOM near the end, only on polygon export	Eager Shapely construction in the hot loop	Keep `defer_geometry=True`; build polygons once, at export
HTTP 400 `TooBig` from OSRM	`batch_size` exceeds `--max-table-size`	Lower `batch_size` or raise the daemon’s table ceiling
Cascading 429s, job stalls	No backoff; bursts hammer the endpoint	Use the retry/backoff session; reduce concurrency
Memory fine, results wrong	lat/lon axis swap, no CRS assertion	Assert EPSG:4326 on input; project only at geometry time
RSS plateaus high, never drops	DuckDB `memory_limit` set above the container	Set `db_memory_limit` below the pod limit so it spills to disk

The two failure modes worth committing to memory: a linear RSS curve always means something is being accumulated that should be streamed, and a late-stage spike almost always means geometry is being built too early.

Verification

Confirm the run was both correct and bounded before trusting its output downstream:

Row count: the persisted matrix should contain len(origins) × len(destinations) rows minus any rows the solver legitimately reported as unreachable. Query SELECT count(*) FROM drive_times; against the DuckDB store and reconcile against the expected product.
No nulls in the hot columns: SELECT count(*) FROM drive_times WHERE travel_time_sec IS NULL; should return 0 unless unreachable pairs are expected.
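Peak RSS budget: assert the high-water mark in CI so a regression that reintroduces accumulation fails the build rather than production. A pytest fixture that records tracemalloc.get_traced_memory()[1] and compares it against a threshold catches Python-side accumulation; because tracemalloc only tracks Python-heap allocations, pair it with the process peak RSS (for example resource.getrusage(resource.RUSAGE_SELF).ru_maxrss on Unix) when DuckDB or other native libraries hold a large share of the memory.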
Geometry validity: if polygons were exported, sanity-check that every geometry is_valid and that the union’s bounding box falls inside the study region — the same checks used when validating spatial join accuracy apply here.
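The row-count and null checks above can be folded into one gate. This is a sketch under assumptions: `verify_matrix` is a hypothetical helper, it relies only on `execute(...).fetchone()`, which both `sqlite3` and DuckDB connections provide, and the demonstration uses an in-memory SQLite table as a stand-in for the real store.

```python
import sqlite3


def verify_matrix(con, n_origins: int, n_destinations: int,
                  allowed_unreachable: int = 0) -> dict:
    """Reconcile persisted rows against the expected O-D product."""
    total = con.execute("SELECT count(*) FROM drive_times").fetchone()[0]
    nulls = con.execute(
        "SELECT count(*) FROM drive_times WHERE travel_time_sec IS NULL"
    ).fetchone()[0]
    missing = n_origins * n_destinations - total
    # Missing rows and NULL rows both count against the unreachable allowance;
    # a negative `missing` means duplicates were written.
    ok = missing >= 0 and missing + nulls <= allowed_unreachable
    return {"total": total, "nulls": nulls, "missing": missing, "ok": ok}


con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE drive_times ("
    "origin_id TEXT, dest_id TEXT, travel_time_sec INTEGER, distance_m REAL)"
)
con.executemany(
    "INSERT INTO drive_times VALUES (?, ?, ?, ?)",
    [("a", "x", 60, 900.0), ("a", "y", 75, 1100.0),
     ("b", "x", None, None), ("b", "y", 90, 1400.0)],
)

report = verify_matrix(con, n_origins=2, n_destinations=2, allowed_unreachable=1)
print(report)
```

Setting `allowed_unreachable` from a known count of disconnected pairs keeps the gate strict without failing on legitimately unreachable destinations.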

Together these gates turn a brittle batch script into a resilient, memory-bounded stage that scales with portfolio size instead of with the size of the available RAM.

Related

Caching Strategies for Repeated Network Queries: eliminate the redundant solves before bounding the remaining ones
Optimizing Batch Isochrone Generation with OSRM: the matrix engine this job calls into
Troubleshooting Disconnected Road Networks in Rural Areas: why some O-D pairs come back unreachable
Comparing OSRM vs. Valhalla for Retail Catchment Analysis: choosing the engine behind the batch
