15 January 2026
Why Your Digitised Archive Is Only Half the Story
Over the past two decades, heritage organisations across the UK have invested heavily in digitisation. Millions of photographs, manuscripts, maps, and audio recordings have been scanned, photographed, and converted to digital formats. Preserving fragile materials as digital objects is essential — but it is only half the work.
Here is the uncomfortable truth: a digitised archive without metadata is storage, not access.
The cataloguing gap that no one talks about
A digitised photograph without metadata is just a digital file on a server. Without knowing who took it, when, where, or what it depicts, it is invisible to researchers, educators, and the public. It might as well still be in the box.
Many organisations completed digitisation projects under pressure, with limited cataloguing budgets. The priority was capture — digitise everything before it deteriorates further. Proper cataloguing was deferred. But deferred rarely means done. Staff are stretched. Backlogs grow. And the collection sits in digital limbo.
“We have 40,000 digitised images. About 3,000 have what I’d call usable metadata. The rest have filenames and not much else.” — Local Studies Librarian, Northern England
This is not unusual. It is the rule. And it matters enormously, because metadata is what transforms a digital file into a digital resource that people can actually use.
What is digitised archive metadata — and why does it have layers?
Metadata is not a single thing. Archival metadata exists in distinct types, and understanding those types is the first step towards addressing the gap.
Descriptive metadata is the most visible layer. It answers the basic questions: what is this item, who created it, when, where, and what does it depict? Descriptive metadata includes titles, dates, creators, subjects, descriptions, and format information. Without it, your digital collection cannot be searched, browsed, or indexed by any external system.
Structural metadata describes how digital objects relate to one another — how pages belong to a manuscript, how a series of photographs form a sequence, how a folder of digital records maps to an archival hierarchy. It is what allows a digital archive to reflect the original order and provenance of a collection.
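The idea of structural metadata can be made concrete with a small sketch. This is illustrative only — the field names and layout below are invented for the example, not drawn from any archival standard:

```python
# Illustrative only: a toy representation of structural metadata, mapping
# digitised page images to their place in a manuscript's original order.
# Field names here are hypothetical, not drawn from any standard.

manuscript = {
    "id": "MS-001",
    "type": "manuscript",
    "children": [
        {"id": "MS-001-p001", "type": "page", "order": 1, "file": "ms001_001.tiff"},
        {"id": "MS-001-p002", "type": "page", "order": 2, "file": "ms001_002.tiff"},
    ],
}

def page_sequence(item):
    """Return the digitised files in their original archival order."""
    return [c["file"] for c in sorted(item["children"], key=lambda c: c["order"])]

print(page_sequence(manuscript))  # ['ms001_001.tiff', 'ms001_002.tiff']
```

However it is stored, this is the information structural metadata carries: without it, two TIFF files are just two TIFF files, not pages one and two of a manuscript.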
Administrative metadata covers rights, provenance, and collection management information — who holds copyright, what access conditions apply, and where the item came from. This layer supports governance, compliance, and reuse decisions.
Preservation metadata records the technical history of a digital file: its original format, any migration events, checksums, and the audit trail of changes over time. Preservation metadata is what allows a digital library or repository to demonstrate authenticity and manage long-term digital preservation.
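A checksum is the simplest building block of that audit trail. The sketch below computes one and wraps it in a minimal fixity event; the dictionary keys are illustrative, not actual PREMIS field names:

```python
import hashlib
import datetime

def fixity_record(data: bytes, algorithm: str = "sha256") -> dict:
    """Compute a checksum and return a minimal preservation-metadata event.

    A toy sketch of the kind of fixity information PREMIS-style preservation
    metadata records; the keys below are illustrative, not PREMIS fields.
    """
    digest = hashlib.new(algorithm, data).hexdigest()
    return {
        "eventType": "fixity check",
        "algorithm": algorithm,
        "checksum": digest,
        "recordedAt": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }

record = fixity_record(b"hello")
print(record["checksum"])
```

Re-running the check later and comparing digests is how a repository demonstrates that a file has not degraded or been silently altered since ingest.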
Metadata standards for digital archives
The archival community has developed international standards to make metadata consistent, interoperable, and exchangeable. Every archivist working with digital collections should be familiar with the main ones.
Dublin Core is the most widely adopted metadata schema for digital resources. It defines fifteen core elements — title, creator, subject, description, date, format, and others — that can be applied to almost any digital content. Dublin Core metadata can be expressed in XML, RDF, or HTML, and is supported by virtually every content management system and digital repository.
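To make the element set concrete, here is a minimal sketch that serialises a few Dublin Core elements as XML. The `dc` namespace URI is the real one for the fifteen-element set; the record values are invented for illustration:

```python
import xml.etree.ElementTree as ET

# The official namespace for the fifteen-element Dublin Core set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

def dublin_core_record(fields: dict) -> str:
    """Serialise a dict of Dublin Core element names to an XML fragment."""
    root = ET.Element("record")
    for name, value in fields.items():
        el = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        el.text = value
    return ET.tostring(root, encoding="unicode")

xml = dublin_core_record({
    "title": "High Street, market day",
    "creator": "Unknown photographer",
    "date": "c. 1935",
    "subject": "Markets",
})
print(xml)
```

Even this skeletal record is enough to make an image findable by title, creator, date, and subject — which is the whole point.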
EAD (Encoded Archival Description) is an XML schema developed specifically for archival finding aids. It encodes the hierarchical structure of archival collections in a way that machines can read and that supports interoperability between institutions. EAD is maintained by the Library of Congress and the Society of American Archivists, and it is the standard for sharing descriptive metadata across archival systems.
METS (Metadata Encoding and Transmission Standard) is a Library of Congress standard for encoding structural metadata and administrative metadata about digital objects. METS is commonly used to bundle descriptive, structural, and preservation metadata together into a single XML package for deposit into a repository.
PREMIS (Preservation Metadata: Implementation Strategies) is the international standard specifically for preservation metadata. It defines the information a digital archive needs to support long-term preservation — including provenance, rights, and technical environment.
For a detailed breakdown of EAD and how it works in practice, see our guide to the EAD metadata standard explained.
What happens when metadata is missing or incomplete
The costs of poor archival metadata are practical, not theoretical.
Discoverability collapses. Search engines cannot index what is not described. Internal finding aids cannot surface items without descriptive metadata. Researchers who cannot find materials cannot use them — and cannot cite, share, or build on them.
Reuse is blocked. Grant bodies, educators, publishers, and digital platforms all need clear rights and provenance information before they can reuse digital content. Missing administrative metadata creates legal uncertainty that stops reuse dead.
Preservation is compromised. Without preservation metadata and a reliable audit trail, there is no way to verify that a digital file has not degraded, been corrupted, or been silently modified during migration.
Interoperability fails. Institutions that want to contribute records to aggregators — Europeana, the Archives Hub, local consortia — need metadata that conforms to agreed standards. Non-standard or absent metadata means exclusion from the digital information ecosystem.
How AI is addressing the metadata backlog
The good news for any archivist facing a backlog of under-catalogued digital materials is that AI-powered cataloguing has matured significantly. It is now practical, affordable, and capable of working across all formats.
Modern tools can analyse a digitised photograph and generate descriptive metadata — people, places, objects, approximate date — as a structured draft for the archivist to review. A handwritten manuscript can be transcribed, its contextual metadata extracted, and the result encoded against a metadata standard of the organisation’s choosing. Audio and video recordings can be transcribed, summarised, and indexed using topic modelling and named entity recognition.
Critically, AI does not replace the archivist. It provides a structured first draft — reducing the time cost of cataloguing from hours per item to minutes — while the professional applies judgement, contextual knowledge, and quality assurance. The result is metadata that meets archival standards, at the scale digitisation always demanded.
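The draft-then-review workflow can be sketched in a few lines. Everything here is hypothetical — the draft structure, confidence scores, and threshold are invented for illustration, and real tools will differ:

```python
# A hypothetical sketch of the "draft then review" workflow: fields the
# model is confident about are accepted as a draft; uncertain fields are
# queued for the archivist. The threshold and scores are invented.

REVIEW_THRESHOLD = 0.85

def triage_draft(draft: dict) -> tuple[dict, list[str]]:
    """Split an AI-generated metadata draft into accepted fields and
    fields queued for archivist review, based on a confidence score."""
    accepted, needs_review = {}, []
    for field, (value, confidence) in draft.items():
        if confidence >= REVIEW_THRESHOLD:
            accepted[field] = value
        else:
            needs_review.append(field)
    return accepted, needs_review

draft = {
    "title": ("High Street, market day", 0.95),
    "date": ("c. 1935", 0.60),   # uncertain — send to the archivist
    "subject": ("Markets", 0.91),
}
accepted, queue = triage_draft(draft)
print(queue)  # ['date']
```

The design point is that the machine never publishes unreviewed guesses: low-confidence fields go to a human, and the archivist's judgement remains the final quality gate.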
See our complete guide to digitising an archive for how cataloguing fits into a digitisation workflow from the start.
Why organisations stop at digitisation
It is worth asking honestly why so many organisations reach the scanning stage and then stall. The reasons are structural, not accidental, and understanding them is the first step towards changing the pattern.
Digitisation is visible; cataloguing is not. Scanning produces a tangible, countable output — 500 images, 2,000 pages, 30 hours of audio. It is easy to report to a board, a funder, or a committee. Metadata creation is invisible by comparison. The work is slower, harder to quantify, and produces no object a non-specialist can hold up and point to. Organisations naturally gravitate towards the work that feels most like progress.
Funding structures reinforce the gap. Many grant programmes fund digitisation as a capital project — equipment, scanning, storage — but do not adequately fund the description and cataloguing phases that make digital files usable. When the grant ends, the scanning is complete but the metadata is unfinished. Staff return to other duties. The backlog begins.
Cataloguing requires different skills. Scanning is largely a technical and logistical task. Cataloguing requires subject knowledge, descriptive judgement, familiarity with standards, and an understanding of the collection’s context. Organisations that have the capacity to scan may not have the capacity to describe — and hiring or training for that capacity takes time and investment that is often deferred.
The assumption that “digital” means “done.” There is a persistent institutional habit of treating the creation of a digital file as the end of the process. Once something has been scanned, it feels preserved and available — even when it is neither searchable nor described. This assumption is rarely stated explicitly, but it drives resource allocation decisions that consistently deprioritise metadata.
Thinking about the next stage: labour, systems, and public value
Closing the gap between digitisation and access requires thinking about three things: the labour model, the systems, and the public value case.
Labour. How will description get done at scale? Manual cataloguing is thorough but slow. AI-assisted cataloguing is faster but requires review. Volunteer transcription can work for specific material types but is hard to sustain. Most organisations will need a combination — and the staffing model needs to be planned and funded, not left to emerge on its own.
Systems. Where will metadata live, and how will it be maintained? A spreadsheet is a starting point, not a destination. A collection management system, an archival platform, or a structured database that supports controlled vocabularies, search indexing, and standard export formats is what turns described items into discoverable ones.
Public value. The strongest argument for investing in metadata is the argument for access. A described, searchable collection serves researchers, educators, communities, and the public. An undescribed one serves no one — and costs money to store regardless. Making the public value case explicitly, to boards, funders, and stakeholders, is how cataloguing moves from “we should do that eventually” to “this is core work.”
Making your digital collection actually accessible
Digitisation was necessary. But a collection of digital files without structured metadata is not yet an accessible archive. It is a storage problem waiting to be solved.
The metadata standards exist. The tools to apply them at scale now exist. The question is whether your organisation treats cataloguing as the second half of digitisation — not an optional extra — or continues to defer it.
Your collection has stories to tell. Metadata is what lets people find them.
Explore how Archivers.ai handles metadata generation across formats or request early access to try The Archiver on your own collection.