For family historians, a single misread word can be catastrophic. Faded parish registers and hurriedly scribbled 18th-century wills routinely conceal surnames, locations, and crucial dates beneath layers of age and haste. Misreading “Moor” as “Moon” is all it takes. The result is an incompatible branch grafted silently onto a family tree, and you may not catch it for years. Who hasn’t seen errors perpetuated in online trees?
AI models have made significant strides in transcription accuracy, yet still average a handful of errors per page. These mistakes are rarely wild inventions; more often they are plausible-looking misreadings that don’t raise alarms. That’s what makes them so dangerous.
Canadian historian Mark Humphries tackles this problem in his blog post, “When Models Disagree…Transcription Accuracy Improves Significantly” (subscribe to read). The approach is to pass a document through three distinct AI model families, such as Gemini, Claude, and GPT, then overlay the outputs. Where the models diverge, the system flags a potential error. Each model family has its own architectural blind spots, and disagreement between them is a reliable signal that something deserves a closer look. According to Humphries, cross-model comparison catches roughly three-quarters of all transcription errors. That pushes accuracy close to that of a skilled palaeographer and reduces manual review to only the most problematic passages.
Humphries outlines a fully automated process, but you don’t need that for a single knotty problem. There’s a simpler, poor man’s version. Feed the paragraph, or better yet, the full page to preserve context, through three or more AI models and compare what comes back. If that sounds tedious, hand the comparison itself to an AI. It takes minutes. The discrepancies it surfaces are where your attention should go.
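If you would rather do the comparison yourself, a few lines of Python can do the overlay. What follows is a minimal sketch and not Humphries’s pipeline: it assumes you have already pasted each model’s transcription in as plain text, it uses Python’s standard difflib module to line the words up, and it treats the first transcription as the baseline. The function name, the model labels, and the sample burial entry are all invented for illustration.

```python
import difflib


def flag_disagreements(transcripts):
    """Compare each transcription with the first one, word by word.

    `transcripts` maps a label (any name you like) to that model's text.
    Returns a list of dicts, one per place where a model differs from
    the baseline, sorted by word position in the baseline.
    """
    labels = list(transcripts)
    base_label = labels[0]
    base_words = transcripts[base_label].split()

    flags = []
    for label in labels[1:]:
        words = transcripts[label].split()
        matcher = difflib.SequenceMatcher(a=base_words, b=words, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                continue  # the two models agree here, nothing to review
            flags.append({
                "position": i1,  # word index in the baseline text
                base_label: " ".join(base_words[i1:i2]),
                label: " ".join(words[j1:j2]),
            })
    return sorted(flags, key=lambda flag: flag["position"])


# Invented sample: three hypothetical readings of one burial entry.
readings = {
    "model_a": "John Moor of Ashby yeoman buried 12 March 1743",
    "model_b": "John Moon of Ashby yeoman buried 12 March 1743",
    "model_c": "John Moor of Ashby yeoman buried 17 March 1743",
}

for flag in flag_disagreements(readings):
    print(flag)
```

With this sample the script flags two spots, the surname (“Moor” against “Moon”) and the day of the month (“12” against “17”). Those are the words to check against the original image. A flag tells you only that the models disagree, not which one is right, and a misreading shared by all three will pass unnoticed.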

