The Complete Guide to Digitising an Archive

Digitising an archive is one of the most important things a heritage organisation can do — and one of the most easily done badly. Parish registers are browning at the edges. Photographs are fading in acid-tinged envelopes. Audio on magnetic tape is losing signal. The physical carriers of our collective memory are deteriorating, and digitisation is no longer a nice-to-have.

But digitisation only succeeds when planning, capture quality, metadata, and preservation are designed together from the start. Treating them as separate projects — scan first, describe later, think about storage eventually — is how organisations end up with thousands of unnamed image files on a hard drive that no one can search and no funder will support.

This guide maps the full digitisation process as a connected system, not a set of isolated tasks. The framework is straightforward: assess, capture, describe, preserve, publish. Each stage depends on the decisions made in the one before it, and each section below follows that sequence. Follow the “Read more” links for detailed technical guidance on each topic.


Planning your digitisation project

Most digitisation projects that run into trouble can trace their problems back to inadequate planning. Before a scanner is switched on, you need a clear picture of what you hold.

Survey the collection first, noting formats, approximate quantities, condition, and any conservation concerns. Material that is actively deteriorating (fading photographs, sticky-shed audio tape, fragile items showing physical damage) should go to the front of the queue. High-demand material follows. Low-risk administrative records can wait.
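The triage order described above can be sketched as a simple sort over survey entries. This is a minimal illustration, not a prescribed tool: the `SurveyEntry` fields, the three status labels, and the sample references are all hypothetical and would need adapting to your own survey sheet.

```python
from dataclasses import dataclass

# Hypothetical priority tiers, in the order the text suggests:
# deteriorating material first, then high-demand, then low-risk records.
PRIORITY = {"deteriorating": 0, "high_demand": 1, "low_risk": 2}


@dataclass
class SurveyEntry:
    reference: str     # local identifier for the series or box (assumed)
    media_format: str  # e.g. "photograph", "audio tape"
    quantity: int      # approximate item count
    status: str        # one of the PRIORITY keys


def capture_queue(entries):
    """Order entries by priority tier; survey order is kept within a tier."""
    return sorted(entries, key=lambda entry: PRIORITY[entry.status])


survey = [
    SurveyEntry("ADMIN/1", "paper", 1200, "low_risk"),
    SurveyEntry("PHO/3", "photograph", 340, "deteriorating"),
    SurveyEntry("REG/2", "bound volume", 15, "high_demand"),
    SurveyEntry("AUD/7", "audio tape", 60, "deteriorating"),
]

for entry in capture_queue(survey):
    print(entry.reference, entry.media_format, entry.status)
```

Because Python's sort is stable, two entries in the same tier stay in the order they were surveyed, which keeps the queue predictable.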

Establish your standards before you begin: minimum resolution, file naming conventions, metadata schema, and folder structure. Changing these decisions mid-project is expensive and demoralising. The National Archives publishes detailed guidance on digitisation standards for UK institutions. A well-designed digitisation project plan is also essential evidence for grant applications — funders want to see professional-standard methodology before they commit.
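A file naming convention is easiest to keep when it is written down as code that both generates and checks names. The sketch below assumes one possible convention (collection code, five-digit item number, four-digit page number); the pattern and function names are illustrative, not a published standard.

```python
import re

# Hypothetical convention: <COLLECTION>_<item, 5 digits>_<page, 4 digits>.tif
NAME_PATTERN = re.compile(r"[A-Z]{2,6}_\d{5}_\d{4}\.tif")


def master_filename(collection, item, page):
    """Build a master filename that follows the assumed convention."""
    return f"{collection.upper()}_{item:05d}_{page:04d}.tif"


def is_valid_name(name):
    """Return True only if the whole name matches the assumed convention."""
    return NAME_PATTERN.fullmatch(name) is not None


print(master_filename("par", 12, 3))
print(is_valid_name("scan_001.tif"))
```

Zero-padded numbers keep files in the right order in any file browser, and a validator like this can be run over a folder before a batch is signed off.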

Read more: How to Digitise an Archive: A Practical Guide


Choosing digitisation equipment

The right digitisation equipment depends entirely on the range of materials you are scanning.

For most flat, unbound documents and photographs, a flatbed scanner produces professional-quality digital images at 300–600 DPI at reasonable cost. Bound volumes, oversized maps, and fragile items that cannot safely be pressed under a lid call for an overhead book scanner instead. Photographic negatives and slides need a scanner that can capture transmitted light, and a higher resolution than prints because the originals are so small. Microfilm and microfiche require a dedicated scanner with the correct carrier system.

Audio and video material (cassette tapes, reel-to-reel recordings, VHS, cine film) requires specialist capture equipment. Attempting to record audio through a speaker rather than a direct line output is a common and avoidable error that degrades the digital output significantly.

The principle throughout: match the scanner or capture device to the archival material, and capture at the highest quality your equipment allows. Access copies can be generated from masters; masters cannot be reconstructed from degraded access copies. For a detailed breakdown of what to look for when buying, see our guide to choosing document scanning equipment.

Read more: Choosing Document Scanning Equipment for Your Archive


File formats for digital preservation

Choosing the right file formats is one of the most consequential technical decisions in a digitisation project — yet it is often treated as an afterthought.

Use open, standardised, lossless formats for your archival masters. For still images, TIFF is the archival standard: lossless, stable, and widely supported. JPEG introduces compression artefacts and should never be used for a master image file; it is appropriate only for access copies. For text documents, PDF/A is the correct format where content integrity matters. For audio, use Broadcast Wave Format (BWF). For video, keep uncompressed or lightly compressed masters.

Avoid proprietary formats tied to specific software. A file that can only be opened in software that no longer exists is, for preservation purposes, a failed file. PNG is acceptable for access images but TIFF remains the preferred master format for original documents and manuscript material. Ensuring your digital outputs meet WCAG accessibility standards is also important if you plan to publish collections online.
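The format rules above can be turned into a quick screening check on a folder of masters. This is a rough sketch that looks only at file extensions, which is an assumption worth stating: an extension cannot confirm that a PDF really conforms to PDF/A or that a WAV file carries BWF metadata, so anything it passes still needs proper validation. The set contents and function name are illustrative.

```python
from pathlib import Path

# Extensions treated as acceptable for masters in this sketch (assumed mapping):
# TIFF images, WAV containers (for BWF), and PDF (for PDF/A).
MASTER_FORMATS = {".tif", ".tiff", ".wav", ".pdf"}

# Formats the text treats as suitable for access copies only.
ACCESS_ONLY_FORMATS = {".jpg", ".jpeg", ".png"}


def classify_format(filename):
    """Return 'master', 'access-only', or 'review' based on the extension."""
    suffix = Path(filename).suffix.lower()
    if suffix in MASTER_FORMATS:
        return "master"
    if suffix in ACCESS_ONLY_FORMATS:
        return "access-only"
    return "review"  # unknown or proprietary: needs a human decision


for name in ["register_0001.TIF", "register_0001.jpg", "minutes.docx"]:
    print(name, "->", classify_format(name))
```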

Read more: File Formats for Digital Preservation: What to Use and Why


Metadata: from digital images to searchable collection

A folder of TIFF files named scan_0001.tif through scan_4782.tif is not an archive. It is a pile of image files. Without metadata (structured, consistent, descriptive information attached to each record), those digital copies cannot be searched, understood, or exchanged with other systems.

Good metadata answers the questions a researcher will ask: what is this? When does it date from? Who created it? What is its relationship to other items in the collection?
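Those four questions map naturally onto a small, fixed set of fields. The sketch below writes such records to CSV using only the Python standard library; the field names and the sample record are hypothetical, chosen to mirror the questions above rather than to represent any particular schema.

```python
import csv
import io

# Hypothetical minimal schema: one field per question a researcher asks.
FIELDS = ["identifier", "title", "date", "creator", "relation"]


def records_to_csv(records):
    """Serialise metadata records to CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


sample = [
    {
        "identifier": "PAR_00012_0003",
        "title": "Baptism register, page 3",   # what is this?
        "date": "1812",                         # when does it date from?
        "creator": "Parish clerk",              # who created it?
        "relation": "PAR_00012",                # parent item in the collection
    }
]

print(records_to_csv(sample))
```

`csv.DictWriter` raises an error if a record contains a field that is not in `FIELDS`, which is a cheap way to catch inconsistent column names early.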

Metadata creation has traditionally been the slowest, most labour-intensive step in any digitisation project — often taking as long as the scanning itself. Optical character recognition (OCR) converts image files into machine-readable text, which a search index can retrieve. AI classification goes further: reading each document or photograph, identifying what it is, extracting entities, and suggesting descriptive terms.
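To show why machine-readable text matters, here is a toy version of the search index that OCR output feeds. It assumes the OCR step has already happened and that its text is available per file; the sample text is invented, and a production system would use a real search engine rather than this in-memory dictionary.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(ocr_text_by_file):
    """Map each word to the set of files whose OCR text contains it."""
    index = {}
    for filename, text in ocr_text_by_file.items():
        for word in tokenize(text):
            index.setdefault(word, set()).add(filename)
    return index


def search(index, query):
    """Return the files that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Invented OCR output, for illustration only.
ocr_text = {
    "PAR_00012_0003.tif": "Baptism of John Smith, 1812",
    "PAR_00012_0004.tif": "Burial of Mary Smith, 1840",
}
index = build_index(ocr_text)
print(search(index, "John Smith"))
```

Without the OCR step there is nothing to tokenize, which is why an image-only collection cannot be searched by its contents.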

The Archiver handles exactly this bottleneck. Upload your scans (documents, photographs, audio, video) and the AI generates metadata automatically, making the collection searchable from day one. For large archival collections, this cuts the time from scan to described, searchable record by 60 to 80 per cent compared with manual transcription and indexing. Output is available in EAD3, BagIt, and CSV. See how it works for a step-by-step walkthrough, or explore the full feature set.

Request early access to try The Archiver on your own collection.


Storage and backup for digital archives

Digitising a collection and then losing the digital files to a hard drive failure is not hypothetical. It is a common outcome for under-resourced projects that treat storage as an afterthought.

The 3-2-1 rule is the baseline standard for digital preservation storage: three copies of every file, on at least two different storage media types, with at least one copy held off-site. Local storage is fast and convenient for working files but vulnerable to fire, flood, and hardware failure. Cloud storage addresses the off-site requirement and provides redundancy at scale. Most archives need both.
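The three conditions of the 3-2-1 rule are simple enough to check mechanically. The sketch below models each stored copy as a small record and tests all three conditions; the `StoredCopy` fields and the example locations are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredCopy:
    location: str    # free-text description of where the copy lives (assumed)
    media_type: str  # e.g. "hard drive", "LTO tape", "cloud object storage"
    offsite: bool    # True if held away from the main premises


def satisfies_3_2_1(copies):
    """Check: at least 3 copies, on at least 2 media types, at least 1 off-site."""
    enough_copies = len(copies) >= 3
    enough_media = len({copy.media_type for copy in copies}) >= 2
    has_offsite = any(copy.offsite for copy in copies)
    return enough_copies and enough_media and has_offsite


copies = [
    StoredCopy("office NAS", "hard drive", False),
    StoredCopy("external drive in safe", "hard drive", False),
    StoredCopy("cloud bucket", "cloud object storage", True),
]
print(satisfies_3_2_1(copies))
```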

Storage is not a set-and-forget decision. Digital files require active stewardship: periodic integrity checks, migration to new formats as older ones become obsolete, and documented responsibility for who manages what. The National Lottery Heritage Fund (NLHF) digital good practice guidance emphasises that storage and preservation planning should be built into project budgets from the outset.
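A periodic integrity check usually means recording a checksum for every file once, then recomputing and comparing it on a schedule. The sketch below does this with SHA-256 from the Python standard library. The function names and the in-memory manifest are assumptions for illustration; in practice the manifest would be saved alongside the files.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute a file's SHA-256 checksum, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(root, manifest):
    """Return relative paths that are missing or whose checksum has changed."""
    root = Path(root)
    problems = []
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(relative_path)
    return problems


# Demonstration on a temporary folder with one made-up file.
with tempfile.TemporaryDirectory() as tmp:
    master = Path(tmp) / "PAR_00012_0003.tif"
    master.write_bytes(b"example image bytes")
    manifest = build_manifest(tmp)
    clean = verify_manifest(tmp, manifest)       # nothing has changed yet
    master.write_bytes(b"silently corrupted")    # simulate damage
    damaged = verify_manifest(tmp, manifest)

print(clean, damaged)
```

Any path returned by `verify_manifest` should be restored from one of the other copies, which is where the 3-2-1 rule earns its keep.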

Read more: How to Store and Back Up Your Digital Archive


Start your digitisation project

Digitisation is a long game, but every collection starts with a first item. Begin with a sound plan, correct standards, and the right tools — so that the work you do now remains useful, searchable, and reusable in twenty years. For a practical roadmap that breaks the process into manageable phases, see our heritage digitisation project lifecycle roadmap.

The Archiver gives heritage professionals — including community archives and local history groups — an AI-powered platform to upload, classify, describe, and search digital archives without specialist technical infrastructure. Request early access to try The Archiver on your own collection.

For more on what comes after the scan, read why a digitised archive is only half the story. And if you are applying for funding, our guide to writing a digital preservation grant application covers what funders expect.
