
How to Digitise an Archive: A Practical Guide for Heritage Professionals

Britain holds an extraordinary amount of undigitised heritage. Parish records, regimental photograph albums, oral history sound recordings, maps and oversized plans — countless collections sit in filing cabinets and climate-controlled strongrooms, slowly deteriorating. For archivists and heritage professionals, learning how to digitise an archive is no longer optional; it is an act of preservation.

This guide is written specifically for small teams running their first digitisation project — a local history society with a few volunteers, a museum with one part-time collections officer, a council archive starting with a single priority collection. The aim is to get you from “we should digitise this” to a working, described, searchable set of digital records without getting lost in the detail.

If you are planning a larger or more complex programme — multiple collections, mixed media, grant-funded work — our complete guide to digitising an archive is the deeper strategic resource. It covers the full end-to-end system and is designed as a reference you can return to throughout a multi-phase project. This guide is the practical starting point; that one is the comprehensive framework.


Plan your digitisation project before you scan

The biggest mistake in any digitisation project is starting without a plan. Before a single scanner lid goes down, you need to understand what you are dealing with.

Conduct a collection survey. Walk through what you hold and note the range of materials: paper documents, photographs, glass plates, microfilm, audio reels, video cassettes, maps. Note condition — fragile items may need conservation treatment before they can be safely handled.

Prioritise what to digitise first. Not everything needs to be digitised at once. Ask: what is most fragile? What does your audience most want access to? A box of Victorian photographic prints that are actively fading deserves to go ahead of photocopied administrative records from 1997.

Set standards before you begin. The National Archives and JISC both publish digitisation guidance for UK institutions. Decide on your minimum resolution, your filename convention, and your metadata schema before you digitise a single item. Changing these mid-project is expensive.
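
As an illustration of what fixing a filename convention up front can look like, here is a minimal sketch in Python. The pattern itself (collection code, four-digit item number, three-digit page number) is an assumption made for the example, not a published standard, and the function name is hypothetical; adapt both to whatever convention your project agrees on.

```python
import re


def build_filename(collection: str, item: int, page: int, ext: str = "tif") -> str:
    """Build a name such as 'PR-StMary_0012_003.tif'.

    The layout (collection code, zero-padded item, zero-padded page) is an
    illustrative convention only.
    """
    # Restrict the collection code to characters that are safe on every
    # common filesystem: letters, digits and hyphens.
    if not re.fullmatch(r"[A-Za-z0-9-]+", collection):
        raise ValueError(f"unsafe collection code: {collection!r}")
    if item < 1 or page < 1:
        raise ValueError("item and page numbers start at 1")
    # Zero-padding keeps files in the right order when sorted by name.
    return f"{collection}_{item:04d}_{page:03d}.{ext.lower().lstrip('.')}"


print(build_filename("PR-StMary", 12, 3))  # PR-StMary_0012_003.tif
```

Generating names from a function rather than typing them by hand means every volunteer produces the same pattern, and a change of convention happens in one place.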


Choosing digitisation equipment

You do not need expensive equipment to produce good results, but you do need to match your scanner or digital camera to your material. Our guide to choosing document scanning equipment covers this in detail.

Flatbed scanners are suitable for most two-dimensional paper documents and photographs. For general document digitisation, 300 DPI is the accepted minimum; for photographs and detailed maps, 600 DPI or higher is appropriate. For items that cannot be pressed flat — bound volumes, fragile originals — an overhead or book scanner avoids the physical pressure of a flatbed lid.
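
Resolution choices translate directly into pixel dimensions and storage requirements. The short Python sketch below works through the arithmetic for an A4 sheet (210 × 297 mm), assuming uncompressed 24-bit RGB at 3 bytes per pixel; the function names are illustrative, and real TIFF files will differ slightly because of headers and any compression applied.

```python
MM_PER_INCH = 25.4


def pixel_dimensions(width_mm: float, height_mm: float, dpi: int) -> tuple:
    """Return (width, height) in pixels for an original scanned at `dpi`."""
    return (
        round(width_mm / MM_PER_INCH * dpi),
        round(height_mm / MM_PER_INCH * dpi),
    )


def uncompressed_bytes(width_mm: float, height_mm: float, dpi: int,
                       bytes_per_pixel: int = 3) -> int:
    """Estimate raw image size; 3 bytes per pixel assumes 24-bit RGB."""
    width_px, height_px = pixel_dimensions(width_mm, height_mm, dpi)
    return width_px * height_px * bytes_per_pixel


for dpi in (300, 600):
    w, h = pixel_dimensions(210, 297, dpi)
    mb = uncompressed_bytes(210, 297, dpi) / 1_000_000
    print(f"A4 at {dpi} DPI: {w} x {h} px, about {mb:.1f} MB uncompressed")
```

Doubling the resolution quadruples the pixel count, so an A4 page grows from roughly 26 MB at 300 DPI to roughly 104 MB at 600 DPI. That is worth knowing before you budget for storage.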

A digital camera on a copy stand is a flexible alternative, particularly useful for three-dimensional objects or for digitising items in the field. Many institutions digitise large-format items such as maps or architectural plans this way.

For audio and video, cassette tapes and analogue video formats should be digitised via direct capture from the playback device, not re-recorded through a speaker. Broadcast WAV at 48 kHz/24-bit is a widely used standard for audio masters; for video, uncompressed or lightly compressed formats are preferred over consumer formats for archival masters.
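
Once audio has been captured, it is easy to check that each master actually meets the sample rate and bit depth you specified. The sketch below uses Python's standard `wave` module; the function name and default thresholds are assumptions for this example, and it only reads plain PCM WAV headers, so treat it as a quick sanity check rather than full Broadcast WAV validation.

```python
import wave
from pathlib import Path


def meets_audio_spec(path: Path, min_rate: int = 48_000, min_bits: int = 24) -> bool:
    """Return True if a PCM WAV file meets the minimum rate and bit depth."""
    with wave.open(str(path), "rb") as wav:
        sample_rate = wav.getframerate()
        # getsampwidth() reports bytes per sample, so 3 bytes means 24-bit.
        bit_depth = wav.getsampwidth() * 8
    return sample_rate >= min_rate and bit_depth >= min_bits
```

Running a check like this over a whole batch catches a capture device that was silently left on 44.1 kHz/16-bit before the original tapes go back into storage.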


File formats and digital preservation

File formats matter for longevity. For image files, TIFF is the archival master format: lossless, widely supported, and stable over time. JPEG is acceptable for access copies but not for masters, because its lossy compression discards image detail. PDF/A, the ISO-standardised archival profile of PDF, is the usual choice where text content matters.

For a full breakdown of which formats to use and why, see our guide to file formats for digital preservation.

The key principle: avoid proprietary formats that may not be readable in twenty years. US guidance from the Library of Congress and the Federal Agencies Digital Guidelines Initiative (FADGI) aligns closely with UK practice on this point; both recommend open, standardised formats for any digital archive intended to last. The BagIt specification (RFC 8493) is also worth understanding if you plan to package files for long-term deposit or transfer.
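
To give a feel for what a BagIt package contains, here is a simplified Python sketch that copies a folder of scans into a `data/` directory, writes the `bagit.txt` declaration, and records a SHA-256 checksum for every payload file in `manifest-sha256.txt`. The function names are hypothetical and the sketch skips optional parts of the specification such as `bag-info.txt` and tag manifests; for real deposits a maintained tool, such as the Library of Congress bagit-python library, is a safer choice.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large TIFFs do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_minimal_bag(source_dir: Path, bag_dir: Path) -> Path:
    """Create a bare-bones bag from source_dir and return the manifest path."""
    data_dir = bag_dir / "data"
    # Payload files live under data/; copytree creates bag_dir as needed.
    shutil.copytree(source_dir, data_dir)

    # The bag declaration required by RFC 8493.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # One manifest line per payload file: checksum, whitespace, relative path.
    lines = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        relative = path.relative_to(bag_dir).as_posix()
        lines.append(f"{sha256_of(path)}  {relative}\n")

    manifest = bag_dir / "manifest-sha256.txt"
    manifest.write_text("".join(lines), encoding="utf-8")
    return manifest
```

The value of the manifest is that anyone receiving the bag can recompute the checksums and confirm that no file was corrupted or lost in transfer.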


Metadata: the difference between a scan and a searchable archive

A folder of image files named IMG_0034.tif, IMG_0035.tif, IMG_0036.tif is not a digital archive. It is a digital pile. Metadata is what transforms scanned copies into a searchable, usable collection.

At minimum, each item needs: a unique identifier, a title or description, a date or date range, a format description, and relevant subject or person tags. Where possible, record provenance — where did this original document come from, who created it, how did it enter your collection?
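
The minimum record described above can be sketched as a small Python data structure with a completeness check. The class and field names are assumptions chosen for this example rather than part of any cataloguing standard, and provenance is treated as optional to match the "where possible" wording.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ItemRecord:
    identifier: str                 # unique within the collection
    title: str                      # title or short description
    date: str                       # single date or range, e.g. "1887/1891"
    format: str                     # e.g. "Bound volume, paper"
    subjects: list[str] = field(default_factory=list)  # subject or person tags
    provenance: str = ""            # optional: origin and custodial history

    def missing_fields(self) -> list[str]:
        """List the required fields that are empty or whitespace-only."""
        missing = [
            name
            for name in ("identifier", "title", "date", "format")
            if not getattr(self, name).strip()
        ]
        if not self.subjects:
            missing.append("subjects")
        return missing


record = ItemRecord(
    identifier="PR-StMary_0012",
    title="Baptism register",
    date="1887/1891",
    format="Bound volume, paper",
    subjects=["baptisms"],
)
print(record.missing_fields())  # []
```

A check like this can run over a whole spreadsheet export before import, so incomplete records are caught while the originals are still on the desk.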

Optical character recognition (OCR) adds another layer of searchability for paper records. Applying OCR to scanned text makes paper documents full-text searchable without manual transcription. Open source tools such as Tesseract can handle many digitisation projects; commercial OCR tools offer greater accuracy for complex layouts or handwriting.
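
As a rough illustration, the Python sketch below wraps the Tesseract command-line tool. It assumes `tesseract` is installed and on the PATH with the relevant language data; the function names and file paths are hypothetical. The command shape (`tesseract image outputbase -l lang`, with an optional trailing `pdf` to produce a searchable PDF) follows Tesseract's documented usage.

```python
import shutil
import subprocess
from pathlib import Path


def tesseract_command(image: Path, output_base: Path, lang: str = "eng",
                      searchable_pdf: bool = False) -> list:
    """Build the argument list for one Tesseract run."""
    command = ["tesseract", str(image), str(output_base), "-l", lang]
    if searchable_pdf:
        # The 'pdf' config asks Tesseract for a PDF with a hidden text layer.
        command.append("pdf")
    return command


def run_ocr(image: Path, output_base: Path, lang: str = "eng") -> Path:
    """Run OCR and return the path of the plain-text output file."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on PATH")
    subprocess.run(tesseract_command(image, output_base, lang), check=True)
    # Tesseract appends '.txt' to the output base name.
    return Path(str(output_base) + ".txt")
```

OCR is best run on an access copy or directly on the master without modifying it; keep the text output alongside the image, named by the same identifier, so the two never drift apart.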

The Dublin Core schema is a useful starting point for catalogue metadata in heritage collections. If your institution uses a catalogue system, metadata should flow into that system, not live only in a spreadsheet. For photographic material specifically, see our guide to how to catalogue a photograph collection.
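
For teams that want to see what a Dublin Core record looks like in practice, the sketch below serialises a simple Python dictionary as XML using the fifteen elements of the Dublin Core Metadata Element Set. The wrapper element name `record` and the function name are assumptions for this example; a real catalogue system will define its own import format.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The fifteen elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def to_dublin_core(record: dict) -> str:
    """Serialise a dict of Dublin Core values as an XML string.

    A value may be a single string or a list of strings; a list becomes
    one repeated element per entry.
    """
    root = ET.Element("record")  # wrapper name is illustrative only
    for name, value in record.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name!r}")
        values = value if isinstance(value, list) else [value]
        for text in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = text
    return ET.tostring(root, encoding="unicode")


print(to_dublin_core({
    "identifier": "PR-StMary_0012",
    "title": "Baptism register",
    "date": "1887/1891",
    "subject": ["baptisms", "parish registers"],
}))
```

Because every Dublin Core element is optional and repeatable, the same function handles a sparse record with three fields and a rich one with many subjects.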


Storage and digital preservation

Digitisation is not preservation unless files are stored safely. The standard recommendation is the 3-2-1 rule: three copies, on two different types of media, with one copy stored off-site or in the cloud.

Local storage — an external drive, a NAS — is fine for one copy. It is not sufficient on its own. Drives fail. A cloud backup is not a luxury; for digital records of heritage value, it is basic due diligence.
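
The 3-2-1 rule is simple enough to express as a checklist in code. The Python sketch below takes a description of where your copies live and reports which parts of the rule are not yet met; the class, field names and example storage locations are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageCopy:
    label: str     # human-readable name, e.g. "Office NAS"
    media: str     # type of media, e.g. "nas", "external-drive", "cloud"
    offsite: bool  # True if held away from the main premises or in the cloud


def check_3_2_1(copies: list) -> list:
    """Return a list of 3-2-1 rule violations; an empty list means compliant."""
    problems = []
    if len(copies) < 3:
        problems.append("fewer than three copies")
    if len({copy.media for copy in copies}) < 2:
        problems.append("fewer than two types of media")
    if not any(copy.offsite for copy in copies):
        problems.append("no off-site or cloud copy")
    return problems


plan = [
    StorageCopy("Office NAS", "nas", offsite=False),
    StorageCopy("External drive in safe", "external-drive", offsite=False),
]
print(check_3_2_1(plan))
```

The example plan fails on two counts: it has only two copies and neither is off-site. Adding a cloud backup resolves both.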


Making digitised collections accessible

Once you have your digital files, the work of making the collection accessible begins. Traditionally, this meant weeks of manual cataloguing — opening each file, reading or examining it, writing a description, adding tags. For large collections, that work is simply not funded.

The Archiver is built for exactly this problem. Upload your scanned documents, photographs, audio files, and video, and the AI reads, classifies, and describes each item automatically. Handwritten letters get transcribed. Photographs get described. Documents get tagged by subject and entity. The result is a digitally searchable collection from day one — not after six months of cataloguing.

To see how The Archiver handles different types of archival material, explore the features page, or see how it works for a step-by-step walkthrough of the upload-to-search process.


Digitising an archive is a significant undertaking, but it does not have to be overwhelming. Start with a clear plan, use appropriate standards, protect your files with redundant storage, and use tools that do the heavy cataloguing work for you.

Request early access to try The Archiver on your own collection.
