A reminder for Legacy Family Tree Webinars subscribers: the end-of-year webinar, The Best Uses of AI for Genealogists, by Steve Little and Mark Thompson, is on Friday at 2 pm ET.
Here are two items AI enthusiasts may also want to review.
The first is the video of the NGS event "Navigating the AI Frontier," now freely available on the NGS YouTube page.
The second is the latest episode of the Family History AI Show. Episode 39 examines fourteen key trends shaping the future of family history AI. Toward the end, the hosts address AI licensing deals and how those deals change internet access patterns.
That set me wondering: will those deals extend to AI companies funding the digitization of the 95% of archival material that isn't yet digitized? When will robotics reach the stage of safely retrieving hard-copy materials, some of them fragile, digitizing them, and having AI check them against criteria to ensure privacy embargoes are respected?
Here, unedited, is Gemini’s response to my query.
The shift toward AI licensing deals is creating a “data rush” that is fundamentally changing how archives are viewed. Where physical records were once seen as a liability or a storage cost, they are now being revalued as high-quality, “clean” training sets for Large Language Models (LLMs).
Will AI Companies Fund Massive Digitization?
The short answer is yes, but selectively. While 95% of archival material remains undigitized, AI companies are primarily interested in “high-signal” data—records that are dense with human knowledge, historical context, or unique languages.
- Targeted Partnerships: We are already seeing "Generative AI Innovation Funds" (like AWS's $1 million grant to digitize the Jane Goodall Institute's archives) where tech giants foot the bill to turn handwritten notes and analog media into AI-ready data.
- The "Clean Data" Premium: As the internet becomes saturated with AI-generated content, "pristine" historical records (government records, census data, and old newspapers) become more valuable. Licensing deals, such as those being tracked by Ithaka S+R, suggest that archives can use these funds to cover the high costs of digitization.
- The 95% Challenge: The hurdle remains the sheer volume. Tech companies will likely fund collections with the broadest utility first (e.g., national archives or major genealogical databases) before moving into niche local histories.
The Rise of Archival Robotics
The technology to safely handle “cold” hardcopy is advancing through a combination of industrial automation and specialized suction-based systems.
| Development Stage | Technology Status (2025) | Capabilities |
| --- | --- | --- |
| Fragile Handling | In Pilot/Active Use | Systems like the University of Innsbruck’s robotic scanners use “air knives” and vacuum suckers to turn pages of 100-year-old fragile documents without human contact. |
| High-Volume Digitization | Mature | Companies like Ripcord use AI-powered robots to remove staples and fasteners 10x faster than humans, processing up to 1 billion pages annually. |
| Complex Media | Developing | Robots are now learning to handle non-standard items like glued cards, folded maps, and varying paper weights using 3D vision and ultrasonic sensors. |
AI Evaluation of Privacy Embargoes
Automated privacy adherence is arguably the most advanced part of the pipeline. The National Archives and Records Administration (NARA) is already piloting AI systems to handle this specific bottleneck.
1. PII Detection and Redaction
AI models are now used to scan digitized text for Personally Identifiable Information (PII). Instead of a human reading every line, the AI flags names, addresses, and social security numbers, applying redactions automatically based on the age of the record (e.g., the “72-year rule” in the US).
2. Declassification and Embargo Logic
AI is being trained on “declassification guides”—complex sets of rules that determine if a document is still sensitive. This allows archives to move beyond “blanket” embargoes (where everything is hidden for 70 years) to a more surgical approach where safe parts of a collection can be released sooner.
3. Ethical AI Governance
A major trend in late 2025 is the “Human in the Loop” (HITL) model. AI performs the first 99% of the heavy lifting—transcription, entity extraction, and privacy flagging—while professional archivists act as “quality controllers” for the 1% of ambiguous cases.
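Gemini's point about age-based redaction can be made concrete. What follows is my own minimal sketch, not how NARA or any archive actually does it. It assumes each record's creation date is known, borrows the 72-year rule that governs public release of US federal census records as a generic embargo period, and masks only Social Security numbers written as 123-45-6789. Names and addresses would need a trained entity-recognition model, which is exactly where the human in the loop comes in. The function names are hypothetical.

```python
import re
from datetime import date

# Hypothetical embargo period, borrowed from the US 72-year census rule.
EMBARGO_YEARS = 72

# Matches only US Social Security numbers in the 123-45-6789 format.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def is_open(record_date: date, today: date, embargo_years: int = EMBARGO_YEARS) -> bool:
    """Return True once the embargo period since record_date has fully elapsed."""
    try:
        release = record_date.replace(year=record_date.year + embargo_years)
    except ValueError:
        # A 29 February date with no leap day in the release year opens on 1 March.
        release = date(record_date.year + embargo_years, 3, 1)
    return today >= release


def redact(text: str, record_date: date, today: date) -> str:
    """Mask SSN-like strings, but only in records still under embargo."""
    if is_open(record_date, today):
        return text
    return SSN_PATTERN.sub("[REDACTED]", text)
```

For example, a record dated 1 April 1950, census day for the 1950 US census, tests as open on 1 April 2022, which is when that census was released. A real pipeline would pass anything the pattern misses to an archivist rather than trust it.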


What’s new?
What does it take to hook the reader of a family history article? 
New this week:
Branch members are invited to attend the annual Christmas Social and Dessert Potluck on Saturday, 13 December, from noon to 2 pm at Room 226, City of Ottawa Archives, 100 Tallwood Drive, Ottawa.
It’s from the