Exploring the Ethical Challenges of Preserving Dormant Platforms. Peer reviewed by Sharon Webb (Sept. 2023)
Using Yahoo Groups as a case study, the essay explores some of the ethical considerations in preserving content from dormant platforms.
As one popular adage goes, “The internet is forever.” Yet as recent history has shown repeatedly, large swaths of cultural memory can be rendered temporarily or permanently inaccessible by forces ranging from technical glitches to corporate fiat.1 For older, largely abandoned platforms, however, archiving this content raises serious ethical questions related to posters’ expectations of privacy, the kinds of identifiable information available, and access to these archives. In this article, I’ll explore these issues through the case study of English-language LGBTQ-related Yahoo Groups, which I actively preserved in 2019 and 2020 prior to their deletion.
Within the English-language Internet, Yahoo! is now likely familiar to most users as an also-ran search portal and email provider, or for its inescapable “yodel” jingle, one of the defining sounds of the dot-com boom. The company initially began life in 1994 as a web directory, eventually expanding into search, email hosting, and social groups. These expansions included Yahoo Groups, which emerged in 2000 out of two different group services Yahoo owned, Yahoo Clubs and eGroups. Compared to traditional mailing lists, Yahoo Groups offered a lot of other tools for members, like file hosting, calendars, and polls. Moreover, Yahoo Groups offered moderators greater control over membership and privacy, which was especially important for groups and individuals who couldn’t easily afford to self-host their content, including LGBTQ folks.
Transgender individuals in particular took advantage of Yahoo Groups’ photo-sharing and file hosting to create private groups, like ftmsurgeryinfo, that became well-known resources for information on medical procedures, while groups like transgendernews recirculated trans-related news items from a variety of different sources for approximately twenty years. The length and age of these groups made them valuable resources, as they contained a wide variety of information and advice collected over many years. Furthermore, transgendernews in particular represented an important alternative to databases of coverage from “official” sources like Nexis Uni, as it preserved content from all over the web, including news sources outside of the mainstream and press releases from now-defunct transgender organizations.
As curator of the Queer Digital History Project (queerdigital.com), an independent community history project documenting pre-2010 LGBTQ digital spaces online, I’d previously focused on identifying and mirroring files and documents that had initially been preserved elsewhere but had not been specifically classified as “queer.” The October 2019 announcement of Yahoo Groups’ impending closure in 2020, however, meant that decisions had to be made quickly with limited pre-existing organizational policy or best practices. Still, as a member of these communities, my goal was to avoid replicating the most common approach to rapid-response web preservation: “archive first, ask questions later.”2
The nature of Yahoo! Groups, as described above, presented particular ethical challenges. The first came in determining the project’s overall scope. Given the sheer size of Yahoo Groups as a platform, it wouldn’t be possible to preserve every group. What would be the criteria for group preservation, and how could they be applied equitably? Secondly, groups and their members had differing norms and expectations around privacy, and most members likely had no expectation that their messages and content would be formally archived. As such, posts contained a variety of personally identifiable information and groups sometimes hosted large collections of images, such as surgery-related photos, not meant for wider audiences. How could content be preserved in a way that took these expectations into account? Lastly, there was the issue of post-archiving access: What were reasonable access restrictions, and how could those be implemented? Compounding these issues was the reality of Yahoo Groups in 2019: a platform where most groups now lay dormant, with understandably absentee moderators and email addresses that hadn’t been checked in years.
Given these challenges, I ultimately modeled my approach on the Protocols for Native American Archival Materials.3 Developed in 2006, the Protocols emphasize tribal ownership over community knowledge and outline best practices for working collaboratively with tribal leadership on preserving and providing access to archival materials. Though not entirely analogous, Yahoo Groups shared some similarities: long-standing Groups held a variety of specialized community knowledge, not all of which was meant for the wider public, and preserving this information in the absence of community collaboration risked reproducing existing power imbalances. Moreover, this knowledge was also deeply connected to posters’ embodied experiences, quite literally in the case of photographs. Thus, the Protocols specifically influenced my work in three ways: carefully defining the collection's scope, prioritizing community say and involvement, and building accountability into access requests.
From the outset, I focused on preserving text instead of images, given the inherent privacy and intellectual property concerns. In a similar vein, I also decided that individuals would have to request access to any of the archives, which required requesters to clearly outline why they wanted access. Lastly, all email addresses were, when possible, automatically redacted from the preserved messages. Though by no means foolproof, this move aimed to reduce posters’ risk of being outed.
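To make the redaction step concrete, here is a minimal sketch of how email addresses might be stripped from message text. It is an illustration only, not the tooling actually used for the project: the pattern, the function name redact_emails, and the placeholder string are all assumptions.

```python
import re

# Hypothetical, deliberately simple pattern for common address shapes.
# It will miss obfuscated forms such as "jane at example dot com",
# which is one reason redaction of this kind is never foolproof.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text: str, placeholder: str = "[email redacted]") -> str:
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_RE.sub(placeholder, text)


if __name__ == "__main__":
    sample = "Write to jane.doe@example.com for the meeting notes."
    print(redact_emails(sample))
```

A pattern like this catches addresses in headers, signatures, and quoted replies, but names, usernames, and other identifying details in the message body would still need separate review.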
Based on an initial investigation into LGBTQ-themed Yahoo Groups that came up using site search, I limited the scope to two emergent categories. The first was what I defined as Public, Non-Active Groups: groups that had no substantive (i.e., non-spam) posts within the last six months and over 100 members at the time of their collection. This focus on larger, non-active public groups eliminated many smaller, geographically specific community groups, which were likely to have more personally identifiable information (PII).
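The criteria for this first category can be read as a simple eligibility check. The sketch below is hypothetical: the GroupInfo fields, the function name, and the approximation of six months as 183 days are assumptions made for illustration, and deciding which posts count as substantive rather than spam is assumed to have been judged by a person beforehand.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class GroupInfo:
    """Hypothetical record of what was known about a group at collection time."""
    name: str
    is_public: bool
    member_count: int
    # Date of the most recent non-spam post, or None if none was found.
    last_substantive_post: Optional[date]


def is_public_non_active(
    group: GroupInfo,
    collected_on: date,
    dormancy_days: int = 183,  # assumption: "six months" as roughly 183 days
    min_members: int = 100,
) -> bool:
    """Return True if a group fits the Public, Non-Active category."""
    if not group.is_public:
        return False
    if group.member_count <= min_members:  # the criterion is "over 100 members"
        return False
    if group.last_substantive_post is None:
        return True
    dormant_for = collected_on - group.last_substantive_post
    return dormant_for > timedelta(days=dormancy_days)


if __name__ == "__main__":
    example = GroupInfo("example-group", True, 250, date(2018, 1, 15))
    print(is_public_non_active(example, collected_on=date(2019, 11, 1)))
```

Groups that fail this check are not automatically excluded from preservation; active or private groups fall under the second category, which depends on moderator consent rather than on thresholds.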
The second category was Public or Private Active Groups, where I specifically reached out to moderators about the possibility of preserving their groups. Functionally, moderators represented a kind of community partner who was not only familiar with in-community privacy norms but could also make members aware of the impending preservation and offer them right of refusal on having their content preserved. I provided moderators who responded to my initial outreach with a donation form, where they submitted a group description, copies of any group-relevant images (such as logos), lists of email addresses to be excluded from preservation, and access restrictions. Access restrictions in particular were meant to give donors more control over who could see the archives: either the QDHP curators could approve requests, or an individual designated by the donor could make that call.
In the end, the project’s limited scope and the necessity of moderator involvement meant only a small portion of all existing LGBTQ-themed Yahoo Groups were preserved. Yet what might otherwise be seen as “data loss” is, in my view, actually an acknowledgement of web archives’ limitations. As Mel Hogan argues, “while there may be no definitive endpoints to digital flows circulating through the Web, the interception of particular nodes, as moments of interruption, can in itself serve to frame the online archive as a moving memory.”4 For LGBTQ folks, I would argue this movement includes moving on: setting aside the goal of preserving every digital trace in favor of first valuing the embodied experiences of the individuals who produced them.