Overview

HALDA

What it is

HALDA is a digital humanities project that preserves and reanimates a series of World War II love letters exchanged between my grandparents. The work runs end to end: careful digitisation of fragile manuscripts, AI‑assisted transcription, post‑correction of OCR errors, thematic analysis of the correspondence, and the creation of GranBot (for Grandma and Grand‑dad), a conversational agent that replies in the style and voice of my grandparents, grounded in the letters themselves.

Why it matters

A story of hope, distance, and everyday life under wartime pressures—made searchable, explorable, and conversational.

Objectives

What We Achieved

Preserve & Access

Create high‑quality digital copies of the original letters and make the content searchable.

Faithful Transcription

Use OCR plus targeted post‑correction to achieve readable, historically faithful text.

Contextual Understanding

Surface key events, people, places, and recurring themes across the correspondence.

Conversational Exploration

Allow readers to “talk to the archive” via GranBot—while making provenance and limitations explicit.

Source Material

What the Archive Contains

Letters

Handwritten and typed letters dated during World War II.

Context

Envelopes, dates, locations, and marginalia that provide critical context.

Supporting Assets

Optional assets: family photo fragments and keepsakes referenced within the letters.

Workflow at a Glance

From Paper to Searchable Conversation

Digitisation

Gentle surface prep and flatbed/photo capture at archival‑quality resolution. File‑naming convention encodes date, sender, recipient, and sequence number for traceability.
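As an illustration of how such a naming convention can be made machine-checkable, here is a minimal sketch. The exact pattern (`YYYY-MM-DD_sender_recipient_seq.ext`), the lower-case labels, and the names `ScanName` and `parse_scan_name` are assumptions made for this example, not the project's actual scheme.

```python
import re
from dataclasses import dataclass
from datetime import date

# Hypothetical pattern: 1943-03-14_grandad_grandma_002.tif
FILENAME_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})_(?P<sender>[a-z]+)_(?P<recipient>[a-z]+)"
    r"_(?P<seq>\d{3,})\.(?P<ext>[a-z0-9]+)$"
)


@dataclass(frozen=True)
class ScanName:
    written: date    # date on the letter
    sender: str
    recipient: str
    sequence: int    # assumed to be the page order within the letter
    ext: str

    def to_filename(self) -> str:
        """Rebuild the filename so that parsing and naming round-trip."""
        return (
            f"{self.written.isoformat()}_{self.sender}_{self.recipient}"
            f"_{self.sequence:03d}.{self.ext}"
        )


def parse_scan_name(filename: str) -> ScanName:
    """Parse a scan filename, rejecting anything outside the convention."""
    m = FILENAME_RE.match(filename)
    if m is None:
        raise ValueError(f"not a recognised scan name: {filename!r}")
    return ScanName(
        written=date.fromisoformat(m["date"]),
        sender=m["sender"],
        recipient=m["recipient"],
        sequence=int(m["seq"]),
        ext=m["ext"],
    )
```

Rejecting nonconforming names at ingest time means a misnamed scan is caught before it reaches OCR, which keeps every later artefact traceable to a page.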

OCR (Optical Character Recognition)

Initial pass using AI‑powered OCR across mixed handwriting/typed pages. Layout detection to separate body text, headers, addresses, and postscripts.

AI Post‑Correction

Custom prompts and lexicons (period spellings, military terms, family names) guide targeted correction. Human‑in‑the‑loop review workflow flags low‑confidence tokens for manual approval.
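The review step can be pictured as a simple split on OCR confidence. This is a sketch only: the `Token` shape, the 0.80 cut-off, and the function name `split_for_review` are assumptions, and real OCR engines report confidence in different ways.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    confidence: float  # assumed to be in the range 0.0 to 1.0


REVIEW_THRESHOLD = 0.80  # assumed cut-off, to be tuned per batch


def split_for_review(tokens, threshold=REVIEW_THRESHOLD):
    """Return (accepted, flagged) lists of (position, token) pairs.

    Tokens below the threshold are never auto-corrected here; they are
    queued for a human reviewer to approve or change.
    """
    accepted, flagged = [], []
    for position, token in enumerate(tokens):
        bucket = flagged if token.confidence < threshold else accepted
        bucket.append((position, token))
    return accepted, flagged
```

Keeping the token position alongside each flagged item lets the reviewer see the word in context on the page image.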

Entity & Theme Analysis

Extraction of people, places, dates, and events. Topic modelling and clustering to reveal themes (separation, rationing, hope, logistics, humour, future plans). Timeline reconstruction of the relationship and wartime movements.
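Timeline reconstruction can be sketched as sorting extracted letter records by date and keeping only the points where a writer's location changes. The record fields and the function name `build_timeline` are hypothetical, and the sketch assumes dates and places have already been extracted and verified.

```python
from datetime import date
from typing import NamedTuple


class LetterRecord(NamedTuple):
    letter_id: str
    written: date
    sender: str
    place: str  # place the letter was written from, as extracted


def build_timeline(records):
    """Return chronological (date, sender, place, letter_id) stops.

    A stop is emitted only when a sender's place differs from the place
    of their previous letter, so repeated letters from one location
    collapse into a single entry.
    """
    last_place = {}
    timeline = []
    for rec in sorted(records, key=lambda r: r.written):
        if last_place.get(rec.sender) != rec.place:
            timeline.append((rec.written, rec.sender, rec.place, rec.letter_id))
            last_place[rec.sender] = rec.place
    return timeline
```

Each stop carries the letter ID that first evidences it, so any point on the timeline can be traced back to a source page.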

GranBot (Conversational Agent)

Retrieval‑augmented generation over the verified letter corpus. Responses emulate tone and style while citing source passages on request. Guardrails prevent speculation beyond the archive and disclose that replies are AI‑generated.
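One way to picture the guardrail is as a gate between the retriever and the reply: a drafted answer is released only if at least one retrieved passage is relevant enough, and every reply carries the AI disclosure. The names, the 0.35 cut-off, and the wording of the messages are assumptions for this sketch, not GranBot's actual implementation.

```python
from dataclasses import dataclass

DISCLOSURE = "This reply is AI-generated in the style of the letters."
MIN_SCORE = 0.35  # assumed relevance cut-off; depends on the retriever used


@dataclass
class Passage:
    letter_id: str
    text: str
    score: float  # relevance score from the retriever, higher is better


def grounded_reply(draft: str, passages: list[Passage],
                   min_score: float = MIN_SCORE) -> dict:
    """Release the draft only when the archive supports it."""
    support = [p for p in passages if p.score >= min_score]
    if not support:
        # Nothing in the corpus backs the answer: decline, do not speculate.
        return {
            "text": "The letters do not cover this, so I cannot answer.",
            "sources": [],
            "disclosure": DISCLOSURE,
        }
    return {
        "text": draft,
        "sources": sorted({p.letter_id for p in support}),
        "disclosure": DISCLOSURE,
    }
```

Returning the source letter IDs with every reply is what makes "cite on request" cheap: the interface only has to decide whether to display them.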

Techniques & Tooling

Methods Under the Hood

OCR & Layout

AI‑based OCR with handwriting support; page‑level layout segmentation.

Post‑Correction

Domain lexicons, edit‑distance heuristics, and transformer‑based contextual fixes.
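The lexicon and edit-distance part of this can be sketched in a few lines. The sketch below covers only that heuristic, not the transformer-based contextual fixes; the function names and the sample lexicon entries are illustrative assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute
        prev = curr
    return prev[-1]


def suggest(token: str, lexicon: set[str], max_distance: int = 1):
    """Return a lexicon entry for an OCR token, or None if unsure.

    Ambiguous cases (two entries equally close) return None so that they
    go to human review instead of being silently 'fixed'.
    """
    word = token.lower()
    if word in lexicon:
        return word
    scored = sorted((levenshtein(word, entry), entry) for entry in lexicon)
    if not scored or scored[0][0] > max_distance:
        return None
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None
    return scored[0][1]


# Hypothetical lexicon entries, for illustration only.
lexicon = {"ration", "billet", "furlough"}
print(suggest("rati0n", lexicon))  # ration
```

A small `max_distance` keeps the heuristic conservative, which matters for historical text where an unusual spelling may be authentic.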

NLP Pipeline

Named‑entity recognition, coreference resolution, topic modelling, and temporal parsing.
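Temporal parsing for letter headings can start from something as small as the sketch below, which handles written-out dates in day-first and month-first order. The two patterns and the function name `parse_letter_date` are assumptions; abbreviated months, two-digit years, and undated letters would need further rules or manual dating.

```python
import re
from datetime import date

MONTHS = {
    name: number
    for number, name in enumerate(
        ["january", "february", "march", "april", "may", "june", "july",
         "august", "september", "october", "november", "december"],
        start=1,
    )
}

# "14th March 1943" and "March 14, 1943"
DAY_FIRST = re.compile(r"\b(\d{1,2})(?:st|nd|rd|th)?\s+([A-Za-z]+),?\s+(\d{4})\b")
MONTH_FIRST = re.compile(r"\b([A-Za-z]+)\s+(\d{1,2})(?:st|nd|rd|th)?,?\s+(\d{4})\b")


def parse_letter_date(text: str):
    """Return the first valid written-out date in text, or None."""
    for pattern, day_first in ((DAY_FIRST, True), (MONTH_FIRST, False)):
        for match in pattern.finditer(text):
            if day_first:
                day, month_name, year = match.groups()
            else:
                month_name, day, year = match.groups()
            month = MONTHS.get(month_name.lower())
            if month is None:
                continue  # the word was not a month name
            try:
                return date(int(year), month, int(day))
            except ValueError:
                continue  # impossible date, likely an OCR misread
    return None
```

Returning `None` for impossible dates, instead of guessing, leaves the ambiguity visible for the errata log.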

RAG for GranBot

Semantic search over chunked letters with strict grounding and citation options.

Versioning

Every page and text revision tracked; each bot reply links to the exact sources used.
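A minimal way to get this kind of traceability is to identify each text revision by a hash of its content, so that a bot reply can point at `page@revision`. The `PageHistory` class, the 12-character ID length, and the reference format are assumptions for this sketch; a version-control system could serve the same purpose.

```python
import hashlib
from dataclasses import dataclass, field


def content_id(text: str) -> str:
    """Short, stable identifier derived from the revision text itself."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


@dataclass
class PageHistory:
    page_id: str
    revisions: list = field(default_factory=list)

    def commit(self, text: str, reviewer: str) -> str:
        """Record a new revision; identical text is not recorded twice."""
        rev_id = content_id(text)
        parent = self.revisions[-1]["rev_id"] if self.revisions else None
        if parent == rev_id:
            return rev_id
        self.revisions.append(
            {"rev_id": rev_id, "parent": parent,
             "text": text, "reviewer": reviewer}
        )
        return rev_id

    def source_ref(self) -> str:
        """Reference a reply can cite, pinned to the current revision."""
        if not self.revisions:
            raise ValueError("page has no revisions yet")
        return f"{self.page_id}@{self.revisions[-1]['rev_id']}"
```

Pinning citations to a revision means a later correction to the transcription does not silently change what an earlier reply was based on.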

Quality & Validation

Checking the Work

Before/After Samples

Side‑by‑side before/after OCR samples for each letter batch.
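A before/after sample can be generated automatically by diffing the raw OCR text against the corrected text at word level. This sketch uses Python's standard `difflib`; the function name `changed_spans` and the sample strings are illustrative assumptions.

```python
import difflib


def changed_spans(before: str, after: str):
    """Return (before_words, after_words) pairs for every edited span."""
    b_words, a_words = before.split(), after.split()
    matcher = difflib.SequenceMatcher(a=b_words, b=a_words, autojunk=False)
    return [
        (" ".join(b_words[i1:i2]), " ".join(a_words[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Made-up sample, for illustration only.
pairs = changed_spans("the rati0n book", "the ration book")
print(pairs)  # [('rati0n', 'ration')]
```

Listing only the changed spans keeps each batch review short while still showing exactly what post-correction altered.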

Errata Log

A running errata log for known ambiguities or contested readings.

Reviewer Sign‑off

Reviewer initials recorded for manual corrections and approvals.

Ethical & Responsible Use

How We Handle Sensitive History

Consent & Privacy

Only letters cleared by the family are included.

Provenance

GranBot indicates which letters inform each answer; users can open the source snippet.

Scope Disclosure

The bot is a stylistic emulation based on the letters; it is not the literal voice of my grandparents.

Redaction Policy

Sensitive details (addresses, living persons, or private data) can be masked in public builds.
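Masking can be sketched as replacing reviewer-marked character spans when producing a public build, while private builds keep the full text. The span format `(start, end, kind)`, the placeholder wording, and the function name `redact` are assumptions; the example address is invented.

```python
def redact(text: str, spans, public: bool = True) -> str:
    """Mask marked spans in public builds; return text unchanged otherwise.

    spans is an iterable of (start, end, kind) character ranges that a
    reviewer has marked as sensitive, for example kind="address".
    """
    if not public:
        return text
    out, cursor = [], 0
    for start, end, kind in sorted(spans):
        if start < cursor or end > len(text) or start > end:
            raise ValueError(f"bad or overlapping span: {(start, end, kind)}")
        out.append(text[cursor:start])
        out.append(f"[{kind} redacted]")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


# Invented example text, for illustration only.
sample = "Write to me at 1 Example Road soon."
start = sample.index("1 Example Road")
marked = [(start, start + len("1 Example Road"), "address")]
print(redact(sample, marked))  # Write to me at [address redacted] soon.
```

Storing redactions as spans alongside the verified text, instead of editing the text itself, keeps the archival transcription intact.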

Outcomes

What You Can Do

Searchable Digital Archive

A searchable digital archive of the correspondence.

Mapped Timeline

A timeline of key events and places, mapped against wartime context.

GranBot

Natural‑language exploration of the archive, for example “How did they plan for after the war?” or “What did they say about rationing?”

What’s Next

Where HALDA Goes From Here

Expand the Corpus

Add further letters and photos as they are discovered.

Handwriting Fine‑tuning

Fine‑tune the handwriting model on recurring hands and letterforms.

Public History & Teaching

Public history exhibits and classroom materials built on the HALDA corpus.

Credits

Project lead: Stephen Midgley • Co‑worker: Anais Lavine du Cadet • Primary sources: Family archive

With love and respect to my grandparents, whose words inspired this work.