File Formats for Digital Preservation: What to Use and Why

The file format you choose at the start of a digitisation project will shape whether your digital collection survives for decades or becomes inaccessible within years. Choosing the wrong file format is a slow disaster — rarely obvious at the time, but significant when you discover your audio files no longer open or your images were compressed in a way that cannot be corrected.

This guide covers the key file formats for preservation across still image, document, audio, and moving image works, drawing on guidance from the Digital Preservation Coalition and the PRONOM file format registry.

File Format Identification: Start with PRONOM

Before committing to any file format, archivists should consult the PRONOM registry, maintained by The National Archives (TNA) in the UK. PRONOM is the authoritative file format registry for digital preservation practice in the UK. It catalogues formats by version and assigns each one a persistent unique identifier (PUID), alongside information that supports risk assessment. The Library of Congress digital preservation recommendations provide a complementary US perspective that aligns closely with UK practice.
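
As a rough illustration of what signature-based identification involves, here is a minimal Python sketch that guesses a format family from a file's leading bytes. It is not how PRONOM is implemented; production workflows typically use a tool such as DROID, which applies PRONOM's own signatures and returns PUIDs. The function names and the short signature list are assumptions made for this example.

```python
from typing import Optional

# Leading byte signatures for a few formats discussed in this guide.
# This list is illustrative and far smaller than a real registry.
SIGNATURES = [
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"%PDF-", "PDF (PDF/A conformance needs separate validation)"),
]


def sniff_format(header: bytes) -> Optional[str]:
    """Guess a format family from the first bytes of a file."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    # RIFF containers store the form type at byte offset 8.
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "WAV (possibly BWF)"
    return None


def sniff_file(path: str) -> Optional[str]:
    """Read the first 12 bytes of a file and guess its format family."""
    with open(path, "rb") as fh:
        return sniff_format(fh.read(12))
```

A file extension can be wrong or missing, which is why identification tools look at the content itself. A match here only suggests a format family; it says nothing about version or validity.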

The Digital Preservation Coalition’s handbook (available via the DPC Wiki) sets out a clear framework for file format selection: prefer open formats over proprietary ones, and prefer formats with active registry entries and wide software support. These principles underpin everything below.

Still Image Works: TIFF and JPEG

For still image works, the standard split is TIFF for archival masters, JPEG for access copies.

TIFF is a publicly documented format with strong long-term preservation credentials. For preservation it is normally saved uncompressed or with lossless compression. It supports high bit-depth colour and is widely recommended as the target format for analogue-to-digital conversion. Major guidance on archival masters, including that from TNA and the Digital Preservation Coalition, endorses TIFF.

JPEG is a lossy format. Lossy formats discard data on compression, and that data cannot be recovered. Use JPEG to derive access copies from your TIFF masters; never use it as your preservation file format.

PNG is a lossless, openly standardised format suitable for born-digital graphics, screenshots, and flat-colour images. For photographic and document material in a digitisation project, TIFF remains the preferred archival file format.

Document Preservation: PDF/A

For document file formats, PDF/A is the archival standard. It is an ISO-standardised subset of PDF, designed specifically for long-term preservation. A PDF/A file embeds its fonts, colour profiles, and metadata within the file itself, so it is self-contained and can be rendered without relying on external resources.

Plain PDF is acceptable for access copies. Word-processing formats such as DOCX carry a higher obsolescence risk: although the underlying Office Open XML specification is published as an ISO standard, faithful rendering depends heavily on one vendor's software and can change across versions. Convert such files to PDF/A before archiving. The Digital Preservation Coalition’s handbook is clear that format selection should favour open, documented file formats wherever possible.

Audio Works: BWF and WAV

For audio works, BWF (Broadcast Wave Format) is the standard preservation file format. BWF is an extension of WAV that embeds metadata (origination, provenance, and technical information) in a dedicated `bext` chunk within the file itself. Plain WAV without that metadata chunk is also acceptable.
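
To make the relationship between WAV and BWF concrete, the following Python sketch walks the top-level chunks of a RIFF/WAVE stream and reports whether a `bext` chunk is present. It is a simplified illustration using only the standard library: the function names are assumptions for this example, it does not parse the metadata inside the chunk, and it does not handle the RF64 variant used for very large files.

```python
import struct
from typing import BinaryIO, List


def list_riff_chunks(stream: BinaryIO) -> List[str]:
    """Return the top-level chunk IDs of a RIFF/WAVE stream, in order."""
    header = stream.read(12)
    if len(header) < 12 or header[:4] != b"RIFF" or header[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")
    chunks = []
    while True:
        chunk_header = stream.read(8)
        if len(chunk_header) < 8:
            break  # end of stream
        # Each chunk starts with a 4-byte ID and a little-endian 32-bit size.
        chunk_id, size = struct.unpack("<4sI", chunk_header)
        chunks.append(chunk_id.decode("ascii", errors="replace"))
        # Chunk data is padded to an even number of bytes; skip over it.
        stream.seek(size + (size & 1), 1)
    return chunks


def has_bext_chunk(stream: BinaryIO) -> bool:
    """True if the stream carries the Broadcast Wave metadata chunk."""
    return "bext" in list_riff_chunks(stream)
```

Opening a file with `open(path, "rb")` and passing it to `has_bext_chunk` gives a quick check on whether a supposed BWF master actually carries embedded metadata.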

MP3 and other lossy formats are for access copies only. Lossy encoding permanently discards audio data, and that data cannot be recovered from the encoded file. Derive your access copies from BWF or WAV masters and keep the masters untouched. The PREMIS Data Dictionary, hosted by the Library of Congress, provides the standard vocabulary for recording preservation metadata alongside your chosen formats.

Moving Image Works: A Note

For moving image works, the digital preservation community increasingly recommends MKV or MOV containers with the FFV1 lossless codec for archival masters. These are large digital files, but preservation of digital objects at full fidelity justifies the storage cost. Use MP4 (H.264) for access copies.
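
One common way to produce these files is with FFmpeg. The Python sketch below only builds the argument lists for the two transcodes and does not run them. It assumes `ffmpeg` is installed and built with the `libx264` encoder; the function names and file names are illustrative, and real workflows usually add further options suited to the source material.

```python
from typing import List


def ffv1_master_command(source: str, master: str) -> List[str]:
    """Arguments for an FFV1 (version 3) video transcode that copies audio as-is."""
    return [
        "ffmpeg", "-i", source,
        "-c:v", "ffv1", "-level", "3",  # lossless FFV1, version 3
        "-c:a", "copy",                 # keep the audio stream untouched
        master,
    ]


def h264_access_command(master: str, access: str) -> List[str]:
    """Arguments for a lossy H.264/AAC access copy derived from the master."""
    return [
        "ffmpeg", "-i", master,
        "-c:v", "libx264", "-pix_fmt", "yuv420p",  # widely playable H.264
        "-c:a", "aac",
        access,
    ]
```

Either list can be passed to `subprocess.run(command, check=True)`. Note that the access copy is built from the master, not from the original capture file or from another access copy.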

For guidance on where to store your digital files securely, see our guide to digital archive backup and storage. When packaging files for transfer or deposit, the BagIt specification (RFC 8493) provides a standard way to bundle files with fixity checksums.
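
To show what a fixity manifest looks like in practice, here is a minimal Python sketch that writes a `manifest-sha256.txt` for the files under a bag's `data/` directory, one `checksum  path` line per file. It is not a complete BagIt implementation: a valid bag also needs a `bagit.txt` declaration, and this sketch does not create or validate one. The function names are assumptions for this example.

```python
import hashlib
from pathlib import Path
from typing import List


def sha256_of(path: Path, block_size: int = 1 << 20) -> str:
    """Compute a SHA-256 checksum by reading the file in blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def manifest_lines(bag_dir: Path) -> List[str]:
    """Build 'checksum  path' lines for every file under bag_dir/data."""
    payload = bag_dir / "data"
    lines = []
    for file_path in sorted(p for p in payload.rglob("*") if p.is_file()):
        # Paths in the manifest are relative to the bag's base directory.
        relative = file_path.relative_to(bag_dir).as_posix()
        lines.append(f"{sha256_of(file_path)}  {relative}")
    return lines


def write_manifest(bag_dir: Path) -> Path:
    """Write manifest-sha256.txt into the bag's base directory."""
    manifest = bag_dir / "manifest-sha256.txt"
    manifest.write_text("\n".join(manifest_lines(bag_dir)) + "\n", encoding="utf-8")
    return manifest
```

Recomputing the checksums later and comparing them with the manifest shows whether any file has changed since it was packaged.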

Format Choice Depends on Context

The recommendations above are sound general guidance, but format selection is not a one-size-fits-all decision. The right choice depends on the source material, the intended use, the institution’s technical capacity, and the preservation requirements of the collection. A small local history society digitising a photograph album faces different constraints from a national repository ingesting born-digital government records at scale.

Before committing to a format strategy, consider: what types of material are you working with? What are your storage and processing resources? Will you need to serve access copies online, or primarily preserve for long-term deposit? Do your funders or deposit agreements specify particular formats? Your choice of document scanning equipment will also influence which output formats are available to you. Answering these questions first prevents over-engineering for collections that need pragmatic solutions — and under-specifying for collections that demand rigorous preservation treatment.

The following table summarises the typical format split across three tiers — preservation master, mezzanine (working intermediate), and access copy — by material type:

| Material type | Preservation master | Mezzanine / working copy | Access copy |
| --- | --- | --- | --- |
| Photographs and documents | TIFF (uncompressed) | TIFF or high-quality JPEG 2000 | JPEG |
| Text documents | PDF/A | PDF/A | PDF |
| Audio | BWF (Broadcast WAV) | WAV | MP3 or AAC |
| Moving image | MKV/FFV1 or uncompressed MOV | ProRes or high-bitrate H.264 | MP4 (H.264) |

Not every institution will need a mezzanine tier. For smaller collections, a two-tier approach — master and access — is often sufficient and simpler to manage. The key principle is that masters are never modified, and access copies are always derived from the best available source.
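
If you want to record this format plan in a workflow script, the tier table can be expressed as data. The Python sketch below is one hypothetical way to do it; the names `FormatPlan`, `FORMAT_PLANS`, and `tiers_for` are assumptions for this example, and the values simply restate the table.

```python
from typing import Dict, NamedTuple


class FormatPlan(NamedTuple):
    master: str
    mezzanine: str
    access: str


# Values restate the summary table; adjust them to your own policy.
FORMAT_PLANS: Dict[str, FormatPlan] = {
    "photographs and documents": FormatPlan(
        "TIFF (uncompressed)", "TIFF or high-quality JPEG 2000", "JPEG"
    ),
    "text documents": FormatPlan("PDF/A", "PDF/A", "PDF"),
    "audio": FormatPlan("BWF (Broadcast WAV)", "WAV", "MP3 or AAC"),
    "moving image": FormatPlan(
        "MKV/FFV1 or uncompressed MOV", "ProRes or high-bitrate H.264", "MP4 (H.264)"
    ),
}


def tiers_for(material: str, use_mezzanine: bool = True) -> Dict[str, str]:
    """Return the planned format per tier, optionally dropping the mezzanine."""
    plan = FORMAT_PLANS[material]
    tiers = {"master": plan.master, "access": plan.access}
    if use_mezzanine:
        tiers["mezzanine"] = plan.mezzanine
    return tiers
```

Calling `tiers_for("audio", use_mezzanine=False)` gives the two-tier plan a smaller collection might adopt.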

The Two-Format Rule for Long-Term Access

Across all file format types — still image, document, audio, and moving image — the principle is the same: maintain a master file format and an access file format.

The master file format exists for long-term preservation. It is lossless, conforms to open standards, and is registered in a file format registry such as PRONOM. The access copy is derived from the master and used for research requests, online delivery, and sharing. You never modify the master, and you never derive a new master from an access copy.

This split is the foundation of sound preservation planning. For a full picture of how it fits into wider digital preservation strategy, see our features overview.

The Archiver accepts uploads in any file format — TIFF, PDF/A, BWF, WAV, MP4, and more — and handles the cataloguing layer so you can focus on the digitisation work itself. Request early access to try it on your own collection.
