
Inclusive Data: Metadata and Descriptive Language

A resource to help develop more inclusive metadata descriptions, with a particular focus on controlled vocabularies. Linked to Stack 1 and Stack 2. Reviewed by Dr Kevin Guyan (July 2023) and Bri Watson (July 2023) - thank you both for your insights, recommendations and suggestions.

Published on Jul 04, 2023

‘If metadata and cataloging are “the power to name”, then it is worth asking: Who is doing that naming?’1


Metadata (data about data) makes the digital world go round! It describes, categorises and labels our world, encapsulating a human and computational need to sort and define. Metadata used to describe the world, whether artefacts, social identity, classifications, etc., ascribes meaning that is reflective of the social and cultural values and norms of society, as well as the author. Crucially, information retrieval, whether human or computational, relies on classification systems which are operationalised, rationalised and presented as universal. These systems are designed to support interoperability, but the norms (articulated in standards such as controlled vocabularies, naming conventions, etc.) which conceptualise the world are also the “norms” which construct otherness.

Metadata, especially within the context of cultural heritage, is, as Sullivan and Middleton (2020) put it, ‘shaped by and shape[s] the socio-political landscapes in which they operate’.2 They are intrinsically entangled and ‘implicated in systems of power and privilege’.3 What, how, or who is described is culturally relevant and politically meaningful. Who has the power to describe and who does not, who is included through recognition and excluded through indifference, validates or invalidates lived experiences and identities. Metadata and other information management systems, including databases, are therefore not purely functional or operational; they are steeped in social, cultural and political significance.4

In this sense, thinking critically and deliberately about interventions which disrupt or challenge the normative and sometimes archaic modes of object and subject descriptions in catalogues, etc., is essential to developing a more inclusive knowledge base. It is also a tool for reparative justice across intersectional lines of identity. Examples of harmful, biased and discriminatory language in historic catalogues are not anecdotal; they exist and persist.

Data about data: representation, language, and heritage

Each standard and each category valorizes some point of view and silences another. This is not inherently a bad thing—indeed it is inescapable.5

Metadata and metadata schemas (e.g. Dublin Core, EAD/ISAD(G), CIDOC/CRM, MARC21, etc.) are important for information management and retrieval. They include a wide variety of elements which provide descriptive, technical and administrative information, structured in various syntaxes (e.g. XML, HTML, etc.). Subject headings, or key words, are an important aspect of metadata and their standardisation, as specified in controlled vocabularies, provides the consistency needed to aid search and retrieval across vast collections of material. They are ‘a way to standardize terminology and allow items to be grouped by common subjects for easier discovery and access points’.6 Critically, as Hardesty and Nolan state in ‘Mitigating Bias in Metadata’ (2021), they are also ‘a vital part of how individuals and communities are understood’.7

the terminology used to define a systemically marginalized group [was] determined by those outside of the group, often the terms are out dated or reflect a biased perspective.8

Problems of bias in relation to gender, gender identity, sexual orientation, race, ethnicity, and disability in controlled vocabularies and classification systems are well documented, and indeed challenged.9 This predates the Internet and can most notably be seen with regard to the Library of Congress (LC), whose subject headings and classification form an oft-cited and widely used controlled vocabulary and classification system. As early as the 1970s, for example, The Rainbow Round Table (RRT) of the American Library Association (ALA) challenged the LC’s subject headings and classifications. Indeed, their task force

…were not alone in voicing their criticisms. Sandy Berman's Prejudices and antipathies: A tract on the LC subject heads concerning people also appeared in 1971. Among Berman's many recommendations was the deletion of the cross-reference to "Sexual perversion" for both "Homosexuality" and "Lesbianism", enacted by the Library of Congress in 1972.10

RRT’s site lists a number of projects and initiatives during the 1970s, 80s and 90s that challenged homophobic and sexist vocabularies and classification systems.11 As Terry Cook (2012) notes, the 1970s witnessed a paradigm shift in archival practices, one which recognised the power endowed in traditional archives and one which advocated for the profession to act. Howard Zinn, archivist and historian, noted in 1971 that archival collections are biased

…towards the important and powerful people of the society…we learn most about the rich not the poor, the successful not the failures; the old not the young; the politically active not the politically alienated; men not women; white, not black; free people rather than prisoners; civilians rather than soldiers; officers rather than enlisted men.12

Bias in vocabularies is one thing; not even having records, beyond those which medicalise, criminalise or treat individuals as property, is another. The traditional lack of diverse records within the cultural heritage sector is an ongoing issue, one that some archivists, cultural historians, community archives and heritage groups, and social movement archives have sought to rectify but which largely persists.13

In addition to the lack of representation in archival content, there has been historic under-representation in the archives, libraries and museums professions. As a result, ‘the terminology used to define…systemically marginalized group[s] [has traditionally been] determined by those outside of the group’, which has created terms which can be ‘out dated or reflect a biased perspective’, embedding historic societal intolerance, homophobia, racism, and sexism in our archival stacks and metadata descriptions.14

While working on the Reanimating Data project, we carefully considered the subject terms used to describe the Women, Risk and AIDS Project (WRAP) collection of interviews carried out between 1989 and 1990 across Greater Manchester and London with women between the ages of 16 and 21. Up until 2021, the preferred term for Lesbian in the Humanities and Social Science Electronic Thesaurus (HASSET) controlled vocabulary, managed by the UK Data Archive, was (Female) Homosexual.15 This was, and is, problematic for a number of reasons. First, while some embrace the term homosexual, and even find it affirming, for many it can be offensive since the term has been used to criminalise, denigrate and deride members of the LGBTQ+ community. Second, who among the LGBTQ+ community self-identifies as a female homosexual?16 If terms are not used by the community they purport to represent, then the power to describe becomes the power to other, to categorise as less than.17

Some libraries and archives now carry statements that warn users of potentially harmful or offensive language within catalogues or archival descriptions (whether digital collections or physical ones), but does this approach address or resolve the structural roots of historic exclusion or marginalisation which created these terms in the first place?18 There is no easy answer to historic descriptions and catalogues since what is documented and how it has been documented is of historic value - metadata records become primary sources in and of themselves.

So what can we, as practitioners, archivists, information managers, etc., do? Plenty! As Audre Lorde (1982) states, ‘…revolution is not a one-time event. It is becoming always vigilant for the smallest opportunity to make a genuine change…’.19 Step change action begins with how we create, structure, and develop the systems which provide access to knowledge.20

Writing or developing inclusive metadata requires us to think critically about language, representation, stewardship (versus ownership), historical perspectives and contexts, exclusions and silences from the historic record, and ethics, as well as power dynamics and privileges. Ethics, and a feminist ethics of care, for the individuals and the communities whose cultural heritage is under the archival spotlight, must be a key driver in all metadata and cataloguing decisions. Applying, and seeking out, queer,21 feminist,22 and de-colonial23 approaches to archiving and metadata provides alternative ways of describing, knowing, and thereby belonging. As Jenna Ashton (2017) states,

Feminism in the archive…[is]… a political drive that addresses the intersectional modes through which people identify in terms of gender/sexuality, but also through which they become oppressed […] to address critically structures of power and oppression in order to devise strategies to change or overturn them.24

Addressing ‘structures of power and oppression’ begins with feminist strategies, such as feminist listening, decentering expertise, and radical empathy, all of which are detailed in our first publication, ‘A Feminist Framework for Research’.25

Reflections and Actions: Archiving Empowerment

There is ‘…no political power without control of the archive…effective democratization can always be measured by…the participation in and the access to the archive, its constitution, and its interpretation’.26

Through my work with a number of different digital archiving projects - Reanimating Data’s ‘Feminist Approaches to Youth Sexuality’ and Queer Heritage South - I have learned a lot from the various communities of practice, team members and like-minded scholars, archivists, historians, curators, developers and indeed volunteer metadata writers. What follows are some thoughts and considerations which stem from this collective work:

  • Metadata descriptions are political. Writing metadata is not a neutral act - it reflects the writer, the politics of the day, and the perceived social and cultural norms of society. It is an interpretative act which depends on the knowledge, experiences and outlook of those writing it. Acknowledging the political nature of this work is crucial to developing metadata that is a fair representation of the objects and collections, or the individuals and communities, being described and which is culturally/socially/politically sensitive to the individuals and communities represented by, or in, these objects and collections. Therefore, write with, not for, communities.

  • Metadata should, where possible, be written with, not for, communities. How an individual or a community describes their own objects, or identities, will inevitably be different to how an outsider might. Nuances in material interpretation, hidden cultural knowledge and expertise, unknown histories and cultural contexts may not be evident on the surface to an “outsider”. Language and terminology should come from the community, and while metadata best practice advice is to use controlled vocabularies for things like subject terms, geo-location, etc., we must consider the authors and context of these vocabularies and/or ontologies. For example, The Trans Metadata Collective published a report that outlines best practice for describing trans and gender diverse individuals. They provide a list of Library of Congress subject terms to avoid, including ‘gender identity disorder…[as the]…term treats transness and gender diversity as a medical problem’.27 As much as possible, involve the individuals, collectives and communities whose history and heritage is represented by the objects you are working with. Look to the experts - those whom the metadata purports to represent. To paraphrase D’Ignazio and Klein (2018), we must prioritise the community experts who have a ‘closer and more direct experience of issues of injustice over those that study a data injustice from a distance’.28 Where historic harm has existed, we must engage in conversation with those affected. Words hurt.

Words That Hurt: A Documentary | Brooklyn Public Library

A short documentary film detailing the Brooklyn Public Library’s participation in replacing the Library of Congress subject heading (LCSH) “Illegal Alien” in its catalog. Most libraries use the LCSH to assign subjects for library material. The film focuses on reactions and thoughts from residents in Bushwick, Flatbush and Sunset Park, Brooklyn about the project.

  • Metadata descriptions reflect society and become primary sources. As society’s social norms, perspectives and traditions change (for good or for bad!) so do the words we use and their intentions. Language, and changes to the meaning of specific words, are historically, geographically, and culturally specific. Therefore, when writing metadata, the words we use, omit, or reject are important. Societal shifts, prompted by liberation movements and identity politics of the 1960s and 70s, have created new imaginaries of cultural heritage which afford recognition to communities previously excluded, criminalised, medicalised, oppressed and discriminated against.29 The words used in archival descriptions are reflective of contemporary perspectives and prejudices; in this way they can bring certain lives meaning and push other ‘lives/experiences further into the shadows’.30

    This becomes particularly noticeable when examining the language used by, or employed when referring to, LGBTQ+ communities and individuals. In fact, even seemingly inclusive terms like LGBT, LGBTQ, LGBTQ+, or LGBTQIA+ pose challenges since they may not fully encompass or represent the diverse range of identities within these communities. The choice of one acronym over another is also deeply influenced by political factors, as demonstrated in the current political landscape in the UK. For instance, slogans such as LGB without the T highlight alternative movements seeking to exclude transgender men and women from these political and cultural affiliations.

    In the contemporary moment (2023), LGBTQ+ communities and individuals often use the word “Queer” as a synonym of Gay or LGBTQ+; indeed, it is an identity in its own right. For many this word evokes a sense of inclusion, of empowerment, of a collective and community identity. However, we know this is not universal and it is not necessarily perceived as positive across intergenerational experiences of the word. While some have claimed the word back, many still feel the pain, hurt and humiliation of a word that was used by others to degrade and shame. Another challenge is how such terms resonate across different cultures and borders, as terms can possess differing meanings.

Cambridge Digital Humanities: Mapping the Road to Archival Justice.

Includes a conversation about the nature of metadata descriptions as historically significant documents.

  • Writing and reviewing metadata should be an ongoing process. Continuously engaging in the writing and reviewing of metadata is crucial. Just as identities are not static, metadata should also be subject to evolution. The act of writing, rewriting, and reevaluating metadata is a form of political activism that recognises the fluidity of identity. This is particularly relevant within LGBTQIA+ communities, where, for example, an individual may be gender questioning at the point of data creation but later transition. Such significant life changes necessitate supportive policies that accommodate individuals during and after their transition and that embrace an ethics of care which foregrounds radical empathy as an archival method.

  • Metadata descriptions can create belonging and acceptance. Metadata and subject terms can be powerful, and empowering. The subject terms used in metadata description allow us to find each other in a sea of data, they act as calling cards, as beacons to identify our histories, our community, our selves. Using contemporary words, instead of offensive slurs in historical works, reduces harm. Words matter.
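The points above pull in two directions: descriptions should be reviewed and rewritten over time, yet earlier descriptions are primary sources in their own right. A minimal Python sketch of one way to hold both is given below. Everything in it is hypothetical (the `Record` and `Revision` classes, the `flag_for_review` helper, the identifiers and the sample wording); the review list simply reuses two terms mentioned in this piece and would, in practice, be drawn up with the communities concerned rather than by the cataloguer alone.

```python
import re
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Revision:
    """A superseded description, kept because it is itself a historical source."""
    text: str
    superseded_on: date
    note: str


@dataclass
class Record:
    """A hypothetical catalogue record with a description and its past wordings."""
    identifier: str
    description: str
    history: list[Revision] = field(default_factory=list)

    def revise(self, new_text: str, note: str, when: date) -> None:
        """Replace the description, archiving the previous wording first."""
        self.history.append(Revision(self.description, when, note))
        self.description = new_text


def flag_for_review(records: list[Record], review_terms: list[str]) -> dict[str, list[str]]:
    """Return {identifier: matched terms} for descriptions containing listed terms."""
    flagged: dict[str, list[str]] = {}
    for record in records:
        hits = [
            term
            for term in review_terms
            if re.search(rf"\b{re.escape(term)}\b", record.description, re.IGNORECASE)
        ]
        if hits:
            flagged[record.identifier] = hits
    return flagged


# Illustrative data only.
review_terms = ["illegal alien", "gender identity disorder"]
records = [
    Record("rec-1", "Subject: Illegal alien"),
    Record("rec-2", "Oral history interview, 1989"),
]

print(flag_for_review(records, review_terms))  # prints {'rec-1': ['illegal alien']}
```

A flag here is a prompt for human review and conversation, not an automatic replacement. When a description is revised, `revise` keeps the earlier wording and a note on why it changed, so the record of how something was once described is not lost.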


‘memory workers’…theories, practices and institutions have been deeply complicit in creating and replicating…inequities based on systems of white supremacy, hetero-patriarchy, capitalism and other oppressions for centuries…We have the power to interrupt these systems of oppression, to dismantle memory structures designed around them and in service to them, and to build new liberatory systems, if we have the courage’.31

Histories are written by the victor, or those with the political and social power to document and be documented. Social, civil rights and liberation movements actively seek ways to liberate both the stories of our shared past and ourselves from the injustice(s) of the past. These can only be achieved by dismantling the systems that create structures of power and disempower.

Resources for inclusive metadata

...archival holdings and activities…[have come]…to reflect society more directly, in all its pluralism, diversity, and contingent nature. There was no ‘‘Truth’’ to be found or protected in archives, but many truths, many voices, many perspectives, many stories...32 (Cook, 2013)

Below are some resources created by various projects and initiatives. These resources help us to think beyond the heteronormative, colonial and racist perspectives that many archival holdings and their metadata descriptions have perpetuated and preserved. They encourage us to think about the plurality of voices in the world and intervene in many of the problems of bias and discrimination listed above. This is not an exhaustive list and we welcome your suggestions to help grow this resource. You can also find more related literature on our Zotero library.
