The Chimera of the Self-Hosted Digital Library

By David Whelan on January 12, 2026


Someone posted a long thread to a microblogging site I am on about the self-hosted library. I am not going to link to it because it’s well-meaning and also not a one-off. These posts appear periodically when people with technology acumen experience the epiphany of financial extraction in the information world. They realize that libraries are intermediaries between the creator and the consumer, a role we share with information publishers. They devise a technology stack that would allow libraries to deliver digital resources directly to patrons, just as they do with print books. Their solution is tech-centric because, as it so often is, the technology is the least complicated part of a solution.

This has been top of mind as I get ready to teach my law practice technology course in the spring. It is all well and good to talk about technology and its risks or even how it works. At heart, though, technology is just a set of tools we use to help us complete a process. We could spade a field for crops to grow food to support ourselves but we use plows to enable fewer people to turn over more land. We could listen to oral histories to learn of past events but we read books, where the story can be shared with more people and more consistently, if not always accurately. If you do not know what you are trying to do from a process perspective, you are not ready to apply technology to solve your problem.

Process Matters

I do not know, but I suspect this is why technology-focused solvers are reminded to speak to the community that is impacted by the technology before creating a solution. This often comes up in social spaces but it’s just as true in the workplace. While I appreciate the spirit behind hackathons or courses that develop access to justice apps in 12 weeks, I wonder at the viability of the efforts with so little time available to understand the needs.

One of the things I found fascinating about early Clio conferences is that they would do development sprints at the conference (2016?) using feedback from conference attendees. That sort of connection between software vendors and software users can be very hard to create, even when they work at desks down the hall from one another.

It’s why getting the scope of a project right is so important. It can become a victim of scope creep otherwise, reflecting a failure to understand the process or goal. The burden of scope creep often falls on the technology team, since a failed project will be a failed IT project, but it is just as often because the business side has not explained what they need to do to achieve their goal. In the worst case, it is because the technology people have assumed they understand how the business works.

This can be particularly true in an environment where the person devising the solution is also a participant in the outcomes. I have found this when dealing with lawyers who assume they understand how a law library works because they use one. Or as one judge said to a colleague, “how hard could it be to run a law library?”, given their years of experience penning opinions (but not running budgets, staff, &c.).

It always helps to have technology-literate people on a library team because otherwise things can get lost in the translation. I worked at an organization that was bringing on SharePoint for the first time. I had two of my people go to commercial SharePoint administrator training, since they would be working with some of the back-end functionality related to managed metadata and retention workflows.

We wanted to create some managed metadata lists but the least access rule meant that we had to initiate an IT project to get what we wanted done. Unfortunately, no one on the IT team was given technology training. I mean, SharePoint is just a big document storage server, right? <stares blankly> We spent a long time explaining what we wanted (it was largely just a pick list with a controlled vocabulary) and by the end of a few weeks of “development”, they had created something that failed to be at all workable. I think they might have discovered what a SharePoint list object was but I have a feeling they created a file folder of objects. There was no way to reference this “list” and it was a list in name only.

It is fair to say that one should approach another person’s expertise with a certain degree of humility. Technology people will find that, as a law librarian and director, I expect them to be open to learning about how we work and to accept that, while we are open to change, there are often good reasons to do what we do in the manner in which we do it. Additionally, now that I’m a relatively seasoned director, I have seen systems automated and working and so I have a high set of expectations, since I know that things can work.

The Self-Hosted Library

It may be 40 years since a library could be considered “self-hosted” in that we owned a copy of the information we loaned, and we owned all of the delivery mechanisms involved: shelving, bookmobiles, library card account management, and so on. The self-hosted library in this first quarter-century would increasingly require a wide variety of new expertise, knowledge, ownership, and technology. I’m not aware of any library that has successfully created a digital version of the original library. At heart, a library will almost never fully own all of the information it makes available to the people in its community.

This is not to say that libraries haven’t attempted it. I remember being inspired by Douglas County’s (CO) public library system and their ebook venture. They had the expertise and resources to spin up a resource that worked like other ebook delivery systems. Solving the ebook component would have been a step forward, although still well short of the digital resources libraries make available. But it is the rare library that would attempt that—there is technology plus new staff expertise, let alone the potential legal issues around ownership—even when library budgets are not constrained.

Before anyone starts to imagine or go shopping for the ideal technology stack for a library, they need to wrangle all of the things that libraries do. Not the other way around. To move away from an intermediated information environment, you need to:

gain ownership of, or a license to distribute, information owned by copyright holders. We know from the book publishers’ lawsuit against the Internet Archive, over its lending of digital copies of books it owned, that publishers oppose this sort of delivery of books.
maintain the security of the owned or licensed copy so as not to be a source of piracy (and violate the rights granted by the owner), and maintain the usability of the various formats the information is delivered in.
engage in widespread electronic resource management, swapping content in and out based on licensing and ownership changes. This would involve extensive licensing and contract management that libraries do not currently engage in, because it is managed by information providers like ProQuest, RELX, EBSCO, and Thomson Reuters, who are themselves aggregators and often do not own the content they present to licensees.

And so on. Heck, I don’t need to tell librarians about the library operations that cannot currently be moved to a technology stack. There would be a massive staff or expertise investment required to accomplish a self-hosted digital library. This might be effected by a single, nationwide institutional library or be helped along by a nationwide licensing system that removed the ownership/copyright issue from the equation.

There are about 9,000 public libraries and perhaps 4,000 special libraries (I’m surprised the American Library Association doesn’t have a more exact figure). So whatever digital approach emerged would have to scale across libraries of widely varying sizes. Realistically, it would be something that better resourced libraries or systems would create and then license to their smaller colleagues, or leave them to do without.

If you even got to this point, you would hopefully have already considered the “if we build it, they will come” question. What happens if people just prefer the commercial alternatives? I could imagine a library being successful at launch and then experiencing friction over time because digital information publishing (streaming) isn’t really what we do. If we move to compete with digital information publishers, we will need to meet similar consumer expectations. Otherwise, without a profit driver, government and other funders may question the cost for the benefit to the indirect funders (property tax payers, court filing fee payers, etc.).

Don’t get me wrong. Digital libraries are largely here, if not spread evenly. Many people have healthy relationships with their public or special libraries and never step in the door of a physical library space. Many libraries that have digital offerings have continued to maintain print collections, although I think we are going to see many more special libraries curtailing new print acquisitions. Just as the publishers have slowly shifted away from their print publishing in favor of digital delivery, special libraries can now shift away from print acquisition to solely digital licensing.

But it is still licensing of third-party content, often second-hand, and not ownership. I actually think that licensing digital law library content is currently less expensive than it would be for a law library to try to own and deliver comparable legal information. As we shift away from print acquisitions, though, we also make a greater commitment to digital licenses.

This will inevitably allow the legal publishers to accelerate their cost recovery against our purchasing. I am confident, however, that there remains a near future where more libraries decide to drop major licenses and the cumulative experience will give other libraries more confidence that they, too, can go without. In law libraries, I think this will manifest as a drop from three legal publishing platforms to two, or, better yet, from two to one. I am hopeful that an increasingly practice-ready focus on legal research will also lead to a focus on teaching research strategies over teaching products.

I appreciate the enthusiasm people bring to the self-hosted library idea. It’s good for people to see libraries as so integral (and such good potential, non-partisan, community-centered partners) to the delivery of information in free societies. At the same time, it shows we have a lot of work to do to explain how free-to-you information isn’t free, and that the cost to license, manage, and deliver that information exists. The cost has to be paid, whether by the library to information aggregators or to cloud technology platform providers or to creator unions.

Maybe what I need is to create a year-end “legal research wrapped” infographic that personalizes the notional costs incurred by a law student’s legal research activity. I mean, notional costs are bogus but they could quantify usage in a way students may not consider until they hit law practice. To be honest, it’d never be a law student who would suggest a law library self-host. It’s far more likely to be someone who has not been in a law library as a researcher for a very long time, if ever. There is no way to really prepare for that, other than to remain confident in my own expertise and to explain to them that, of all the challenges to distributing information as a library, technology is often the smallest.