Skip to content
← Writing
10 Feb 2026·5 min

A translation pipeline that survives crashes

Translating tens of thousands of passages with an LLM means long, flaky, expensive runs. Here's the checkpointing and quality-gating that makes it safe.

LLMData engineeringReliability

Translating a library is not a single big API call. It's tens of thousands of small ones, run over hours, against an API that will time out, rate-limit, and occasionally hand back something subtly wrong. Three properties mattered more than raw speed: don't lose work, don't ship bad output, and don't overpay.

Don't lose work: atomic checkpoints

Every translated passage is written to a checkpoint with an atomic write — temp file, then rename, with a backup — so a crash or reboot three hours in resumes exactly where it stopped instead of starting over. The checkpoint also stores enough of the source to detect staleness: if a later re-parse shifts the source text, the cached translation is invalidated and redone rather than silently trusted.

Don't ship bad output: two quality tiers

A purely deterministic linter runs first and free — it catches truncation, untranslated Arabic left in the output, and house-style violations. Only what passes goes to a second, model-based fidelitycheck that looks for the failures a linter can't see: omissions, additions, and distortions relative to the source. Anything flagged is re-translated automatically.

for passage in chapter:
    out = await translate(passage, context=window)   # sliding context
    if not deterministic_lint(out):                  # free, fast
        out = await translate(passage, retry=True)
    checkpoint.save(passage.id, out)                 # atomic write

# second pass: model fidelity gate (omission / addition / distortion)
flagged = await fidelity_check(chapter)
await retranslate(flagged)

Two tiers exist for a reason: the cheap gate removes most problems for nothing, so the expensive gate only runs on what survives.

Don't overpay: cache the stable prefix

The system prompt and glossaries are large and identical across calls, so they're sent as a cached prefix — roughly a 90% reduction on that portion. The output token budget scales with input size so a giant chapter doesn't time out while a short one stays cheap, and concurrency is bounded by a semaphore with per-chapter windowing to avoid latency cliffs on very large sections.

What I'd change

The biggest gap is that quality is still validated mostly by manual audit scripts. The next step is a real evaluation set so fidelity changes are measured, not assumed — and pulling the per-source parsers behind one adapter interface, which is the seed of an open-source ingestion framework.