Overview
HALDA
What it is
HALDA is a digital humanities project that preserves and reanimates a series of World War II love letters exchanged between my grandparents. The work covers the full pipeline: careful digitisation of fragile manuscripts, AI‑assisted transcription, targeted post‑correction of OCR errors, thematic analysis of the correspondence, and the creation of GranBot (named for Grandma and Grand‑dad), a conversational agent that replies in the style and voice of my grandparents, grounded in the letters themselves.
Why it matters
A story of hope, distance, and everyday life under wartime pressures—made searchable, explorable, and conversational.
Objectives
What We Set Out to Do
Preserve & Access
Create high‑quality digital copies of the original letters and make the content searchable.
Faithful Transcription
Use OCR plus targeted post‑correction to achieve readable, historically faithful text.
Contextual Understanding
Surface key events, people, places, and recurring themes across the correspondence.
Conversational Exploration
Allow readers to “talk to the archive” via GranBot—while making provenance and limitations explicit.
Source Material
What the Archive Contains
Letters
Handwritten and typed letters dating from World War II.
Context
Envelopes, dates, locations, and marginalia that provide critical context.
Supporting Assets
Family photo fragments and keepsakes referenced within the letters; including them is optional.
Workflow at a Glance
From Paper to Searchable Conversation
Digitisation
Gentle surface preparation, followed by flatbed‑scanner or camera capture at archival‑quality resolution. The file‑naming convention encodes date, sender, recipient, and sequence number for traceability.
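A naming convention like this can be checked mechanically. The sketch below assumes a hypothetical pattern, `YYYY-MM-DD_sender_recipient_NNN.ext`; the real HALDA pattern, the field order, and the names `FILENAME_RE`, `ScanName`, and `parse_scan_name` are illustrative assumptions, not the project's actual scheme.

```python
import re
from dataclasses import dataclass
from datetime import date

# Hypothetical convention: <ISO date>_<sender>_<recipient>_<3-digit sequence>.<ext>
FILENAME_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})"
    r"_(?P<sender>[a-z]+)"
    r"_(?P<recipient>[a-z]+)"
    r"_(?P<seq>\d{3})"
    r"\.(?:tif|tiff|jpg|png)$"
)


@dataclass(frozen=True)
class ScanName:
    written_on: date
    sender: str
    recipient: str
    sequence: int


def parse_scan_name(filename: str) -> ScanName:
    """Split a scan filename into its traceability fields, or raise ValueError."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"Filename does not follow the convention: {filename!r}")
    return ScanName(
        written_on=date.fromisoformat(match["date"]),  # also rejects impossible dates
        sender=match["sender"],
        recipient=match["recipient"],
        sequence=int(match["seq"]),
    )
```

Rejecting non-conforming names at ingest time keeps every later artefact (transcript, chunk, citation) traceable to one physical page.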
OCR (Optical Character Recognition)
Initial pass using AI‑powered OCR across mixed handwritten and typed pages. Layout detection separates body text, headers, addresses, and postscripts.
AI Post‑Correction
Custom prompts and lexicons (period spellings, military terms, family names) guide targeted correction. A human‑in‑the‑loop review workflow flags low‑confidence tokens for manual approval.
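The review step can be pictured as a small queue. This is a minimal sketch, assuming the OCR engine reports a per-token confidence between 0 and 1; the `Token` class, the 0.80 threshold, and the helper names are hypothetical, and the sample words are invented rather than taken from the letters.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    text: str
    confidence: float                  # assumed OCR confidence in [0.0, 1.0]
    approved_by: Optional[str] = None  # reviewer initials once a human has checked it


def low_confidence(tokens: List[Token], threshold: float = 0.80) -> List[int]:
    """Return the indices of tokens whose confidence falls below the threshold."""
    return [i for i, tok in enumerate(tokens) if tok.confidence < threshold]


def approve(tokens: List[Token], index: int, reviewer: str,
            corrected_text: Optional[str] = None) -> None:
    """Record a manual decision, optionally replacing the token text."""
    if corrected_text is not None:
        tokens[index].text = corrected_text
    tokens[index].approved_by = reviewer


# Invented sample page: only the middle token needs a human look.
page = [Token("My", 0.97), Token("darlmg", 0.41), Token("girl", 0.92)]
for i in low_confidence(page):
    approve(page, i, reviewer="AB", corrected_text="darling")
```

Storing the reviewer's initials on the token itself means an approval travels with the text into later stages.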
Entity & Theme Analysis
Extraction of people, places, dates, and events. Topic modelling and clustering reveal themes (separation, rationing, hope, logistics, humour, future plans). Timeline reconstruction traces the relationship and wartime movements.
GranBot (Conversational Agent)
Retrieval‑augmented generation over the verified letter corpus. Responses emulate tone and style while citing source passages on request. Guardrails prevent speculation beyond the archive and disclose that replies are AI‑generated.
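The grounding step could look roughly like this. The sketch uses plain word-overlap cosine similarity as a stand-in for a real embedding model, and every name in it (`retrieve`, `grounded_context`, the `min_score` cut-off, the citation ids) is an assumption for illustration; the sample chunk texts are invented placeholders, not quotations from the letters.

```python
import math
import re
from collections import Counter


def _vector(text):
    """Bag-of-words vector; a placeholder for a semantic embedding."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def _cosine(a, b):
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, chunks, top_k=2, min_score=0.1):
    """chunks maps a citation id to verified letter text.

    Returns [(citation_id, score), ...] sorted best first; the list is empty
    when no chunk reaches min_score.
    """
    q = _vector(question)
    scored = [(cid, _cosine(q, _vector(text))) for cid, text in chunks.items()]
    scored = [(cid, s) for cid, s in scored if s >= min_score]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


def grounded_context(question, chunks):
    """Build the cited context passed to the generator, or None if nothing matches."""
    hits = retrieve(question, chunks)
    if not hits:
        return None  # the caller should decline rather than speculate
    return "\n".join(f"[{cid}] {chunks[cid]}" for cid, _ in hits)
```

Returning `None` when nothing clears the threshold gives the calling code an explicit signal to decline, which is one simple way to keep replies inside the archive.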
Techniques & Tooling
Methods Under the Hood
OCR & Layout
AI‑based OCR with handwriting support; page‑level layout segmentation.
Post‑Correction
Domain lexicons, edit‑distance heuristics, and transformer‑based contextual fixes.
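An edit-distance heuristic against a domain lexicon can be sketched as follows. The Levenshtein routine is the standard dynamic-programming version; the `suggest` helper, the `max_distance` of 2, and the sample lexicon are assumptions for illustration and do not describe HALDA's actual rules.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def suggest(token: str, lexicon: set, max_distance: int = 2):
    """Return the closest lexicon entry within max_distance, else None.

    Ties are broken alphabetically because (distance, word) tuples are compared.
    """
    if not lexicon:
        return None
    if token in lexicon:
        return token
    distance, word = min((levenshtein(token, w), w) for w in lexicon)
    return word if distance <= max_distance else None


# Hypothetical lexicon mixing a period spelling with military vocabulary.
LEXICON = {"billet", "furlough", "ration", "to-morrow"}
```

A heuristic like this only proposes candidates; ambiguous cases would still go to the contextual model or a human reviewer.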
NLP Pipeline
Named‑entity recognition, coreference resolution, topic modelling, and temporal parsing.
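Temporal parsing is the piece of this pipeline that feeds the timeline. Here is a minimal sketch, assuming dates appear in day-month-year form such as "3rd March 1944"; the regular expression and the `find_dates` name are illustrative, and real letters would need more patterns (abbreviated months, missing years, weekday-only references).

```python
import re
from datetime import date

MONTHS = {
    name: number
    for number, name in enumerate(
        ["january", "february", "march", "april", "may", "june", "july",
         "august", "september", "october", "november", "december"],
        start=1,
    )
}

# Day (optional ordinal suffix), month name, four-digit year.
DATE_RE = re.compile(r"\b(\d{1,2})(?:st|nd|rd|th)?\s+([A-Za-z]+)\s+(\d{4})\b")


def find_dates(text: str) -> list:
    """Return every recognisable day-month-year date in the text, in reading order."""
    found = []
    for day, month, year in DATE_RE.findall(text):
        month_number = MONTHS.get(month.lower())
        if month_number is None:
            continue  # the middle word was not a month name
        try:
            found.append(date(int(year), month_number, int(day)))
        except ValueError:
            continue  # impossible date such as 31 February
    return found
```

Sorting the returned `date` objects across all letters gives a first-pass chronology that reviewers can then correct.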
RAG for GranBot
Semantic search over chunked letters with strict grounding and citation options.
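Chunking is what makes per-passage citation possible. The sketch below splits a letter into overlapping word windows and gives each one an id; the window size, the overlap, and the `letter-id#n` id format are assumptions chosen for illustration.

```python
def chunk_letter(letter_id: str, text: str, size: int = 50, overlap: int = 10) -> dict:
    """Split text into overlapping word windows keyed by a citation id.

    Each window holds up to `size` words and shares `overlap` words with the
    previous one, so a sentence cut at a boundary still appears whole somewhere.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = {}
    for n, start in enumerate(range(0, len(words), step), start=1):
        chunks[f"{letter_id}#{n}"] = " ".join(words[start:start + size])
        if start + size >= len(words):
            break  # this window already reached the end of the letter
    return chunks
```

Because the id carries the letter identifier, a citation can point the reader back to the source snippet without any extra lookup table.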
Versioning
Every page and text revision is tracked; each bot reply links to the exact sources used.
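One way to tie a reply to exact revisions is to derive an identifier from the text itself. This is only a sketch of that idea, assuming content hashing; the project may well use a different mechanism, and `revision_id`, `BotReply`, and `record_reply` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass, field


def revision_id(page_id: str, text: str) -> str:
    """Derive a stable identifier from the page id and the exact text."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"{page_id}@{digest[:12]}"


@dataclass
class BotReply:
    answer: str
    sources: list = field(default_factory=list)  # revision ids the answer drew on


def record_reply(answer: str, used_pages: list) -> BotReply:
    """used_pages is a list of (page_id, text) pairs that informed the answer."""
    return BotReply(answer, [revision_id(pid, text) for pid, text in used_pages])
```

Any later correction to a page changes its hash, so an old reply keeps pointing at the text it was actually based on.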
Quality & Validation
Checking the Work
Before/After Samples
Side‑by‑side OCR samples, before and after post‑correction, for each letter batch.
Errata Log
A running errata log for known ambiguities or contested readings.
Reviewer Sign‑off
Reviewer initials recorded for manual corrections and approvals.
Ethical & Responsible Use
How We Handle Sensitive History
Consent & Privacy
Only letters cleared by the family are included.
Provenance
GranBot indicates which letters inform each answer; users can open the source snippet.
Scope Disclosure
The bot is a stylistic emulation based on the letters; it is not the literal voice of my grandparents.
Redaction Policy
Sensitive details (addresses, living persons, or private data) can be masked in public builds.
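Masking for public builds can be as simple as a term list applied at export time. The sketch below assumes the sensitive strings have already been identified by a reviewer; the `redact` function, the `public_build` flag, and the `[redacted]` marker are illustrative choices, and the sample address used with it would be invented.

```python
import re


def redact(text: str, sensitive_terms: list, public_build: bool = True,
           mask: str = "[redacted]") -> str:
    """Replace each sensitive term with a mask when producing a public build."""
    if not public_build or not sensitive_terms:
        return text
    # Longest terms first, so a full address is masked before a shorter term inside it.
    ordered = sorted(sensitive_terms, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(term) for term in ordered), re.IGNORECASE)
    return pattern.sub(mask, text)
```

Keeping the unmasked text in the private archive and masking only on export means the redaction list can change without touching the verified transcripts.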
Outcomes
What You Can Do
Searchable Digital Archive
A searchable digital archive of the correspondence.
Mapped Timeline
A timeline of key events and places, mapped against wartime context.
GranBot
Natural‑language exploration—e.g., “How did they plan for after the war?” or “What did they say about rationing?”
What’s Next
Where HALDA Goes From Here
Expand the Corpus
Add further letters and photos as they are discovered.
Handwriting Fine‑tuning
Fine‑tune the handwriting model on recurring hands and letterforms.
Public History & Teaching
Public history exhibits and classroom materials built on the HALDA corpus.
