I needed to find a government document. I had seen a link to the document on a site that was not controlled by the US government, so it was not the content I was looking for so much as an authoritative copy. It was unlikely that many other people were looking for it. After emailing a variety of government departments, I got a copy, although the document was not publicly accessible anywhere. The experience made me try to imagine the cost of the long tail of legal information. What do law libraries expend in resources to support it, and do our audiences understand the cost?

The document was an agreement between the Russian Federation and the US and had been executed in 1992. That was before the US State Department archived its web site and before the period when the Library of Congress captured web sites. The document had not been sent to the National Archives and Records Administration, which I learned after searching their extensive online collections and emailing their reference staff. Thanks to a string of helpful people, the State Department’s Treaty section was able to send me a PDF.

In the process, I learned that non-binding agreements like the one I sought do not get treated in the same way as other documents. From what I can tell, they belong to a class of government documents that survives because someone has stashed them somewhere, not due to a specific record retention approach. Something to store in my own reference knowhow.

Three reference interactions. Retreading the resources I’d used and accessing their own internal resources. How to capture that value in relation to the collection when the object may only be needed by one person?

The Long Tail

The idea of the long tail of information has fascinated me for a long time. I date it to the time I read Chris Anderson’s book. The basic idea is that digital information and systems will enable discovery of more information (and products) because you no longer have the constraints of a physical container (store, library).

It builds off the long tail described by the Pareto principle, the 80/20 rule. Every law librarian will be familiar with the concept if not the terminology. And every law library’s collection probably reflects the rule in some form.

There’s disagreement about the theories Mr. Anderson’s book poses. A few years after it was published, there was some discussion about what belonged in the tail. A recent take suggested that, 15 years later, aggregators benefited from the long tail, consumers less, and producers of aggregated items the least of all.

Here’s a simple visualization of the rule, using page views on this web site. The chart below shows that some pages account for a disproportionate share of page views. In fact, about 70 of the nearly 500 objects in this chart account for 80% of the page views. The blue line marks the point where, reading from left to right, cumulative page views reach 80%.

A simple line chart showing the long tail. The orange line reflects web site page views. The most viewed pages are on the left. About 70 pages account for 80% of all page views in the chart.
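
The 80% cutoff in a chart like this can be worked out from a plain list of view counts. What follows is a minimal sketch in Python, not the method used to build the chart: the function name `objects_for_share` and the sample numbers are made up for illustration. It sorts objects from most to least viewed and counts how many are needed before the running total reaches the chosen share.

```python
def objects_for_share(views, share=0.8):
    """Return how many of the most-viewed objects are needed to reach `share` of all views."""
    ranked = sorted(views, reverse=True)  # most viewed first, like the left of the chart
    total = sum(ranked)
    if total == 0:
        return 0
    running = 0
    for count, page_views in enumerate(ranked, start=1):
        running += page_views
        if running >= share * total:
            return count
    return len(ranked)


# Hypothetical page view counts for ten objects (illustration only, not real data).
sample_views = [500, 250, 100, 50, 40, 30, 15, 10, 3, 2]
head = objects_for_share(sample_views)
print(f"{head} of {len(sample_views)} objects account for 80% of views")
```

The same function works for any usage count, such as remote access visits per researcher, as long as there is one number per object or person.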

When we consider our collection and its use by the researchers, we all (librarians and researchers) have a pretty good idea what’s at the left end of the long tail. In my experience, the case law resources and other primary law materials are at the left. Some key secondary texts will also be in that space. The long tail consists of all the other texts, the law journals, the less commonly used primary materials like my non-binding agreement.

Unlike the market of products that the Long Tail book contemplated, the legal publishers are the aggregators and producers in the law library world. The long tail of a law library collection does not contain a lot of small publishers, authors, or content producers. It just contains more and more niche items. And your vendor selections may mean each publisher’s closed garden isolates or distorts content usage.

It’s not just content, though. We can see it with our researchers as well. Here’s an example using data on legal researchers who use remote access. The researchers who use remote access most frequently are at the left. It takes about 250 researchers of nearly 800 to account for 80% of visits in a year.

A chart showing remote access visits of lawyers. The most frequent visitor is at the left. The top 250 lawyers accounted for 80% of visits.

In my search for my government document, I was relying on the 80%. I figured that, like most US government primary materials, I could find a copy on the open web of .gov. I’m not sure if that’s a reasonable expectation or not but it feels like it is. We can find case law, legislation in the form of statutes and regulations, and a wide variety of other government documents for free.

And so can our researchers. That’s the easy part of the law librarian’s world. In fact, what it made me realize is that our researchers are probably not relying on us for a lot of the “head” of the long tail. Lawyers and law students with digital access will probably do a lot of their own research themselves. Even if they rely heavily on reference librarians, intermediated research is likely to still only be a small proportion of their overall research.

In the end, I needed to work through reference librarians. The content object I sought not only wasn’t easily accessible, it wasn’t even where an experienced person might expect it to be. Government reference librarians were the essential ingredient. What does reference impact look like on the long tail?

The Tail Wags the Dog

I felt like it was important to think about how the people intermediating content access – reference librarians generally – fit into usage. And the more I thought about it, it felt as if they weren’t measured. Usage tends to measure people other than our staff.

I want to be clear that I don’t have data for this. The chart I created is a guesstimate of what this might look like. But the idea I’m rolling around in my head is this: how does our reference use of our collection match up to our researchers’ use of it?

We know that our researchers can often access the most commonly used materials directly. We may need to occasionally intermediate their access or direct them. But once they are familiar with the area and the research tools, they will be somewhat self-sufficient. Once I show someone how to search for a case, they can search for 100 more cases without my assistance.

I’ve mentioned before that I think the perspective of a “good researcher” is based on their experience, not years in the profession or access to repetitive information literacy training. As the researcher moves down the long tail of the collection, their familiarity with the content may diminish. Eventually they will reach a point at which they are not aware of what additional content will assist their research.

This is true even if your law library is using a well-organized library catalog and a discovery layer. Just as experience will matter with a legal publisher’s proprietary search interface, it will matter in being able to surface information in a catalog. The quality of the metadata, the knowledge of the researcher about terms of art and relevance, all of this will impact their ability to identify more obscure collection objects on the long tail on their own.

That’s where the reference librarian comes in. They will assist with a portion of those requests. My guess is that reference tends to overlap the long tail but peaks differently before also trailing off. I used my web site page views as the base graphic but here’s sort of what I imagine it looks like. The green line is intended to visualize how often the reference librarian uses the same content as the researcher to answer a legal research question.

A chart showing two lines. The orange line reflects a traditional long tail. The blue dotted line reflects where 80% of the value of the orange line is captured. The green line reflects a hypothetical measurement of reference interactions that rely on the same content as the researcher.

This is something that a law library could capture with planning. We can already get data reports from the legal publishers that reflect the content sources, the collection objects, that researchers use. In a normal year, your reference librarians’ usage would also be in that report. A law firm is probably still able to capture that; for other law libraries, the pandemic may mean the only usage data you’re tracking is that of your staff.

You would also need to capture the print and digital collection objects used in your reference interactions. If a question is answered using a publisher’s database, that can be captured in their proprietary report. Web sites (you could use just the second-level domains: state.gov, loc.gov, congress.gov, uscourts.gov), texts, and other resources (people?) would need to be logged on each reference interaction.
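
As a rough illustration of that kind of logging, here is a small Python sketch. The log format, the function names `source_key` and `tally_sources`, and the sample entries are all assumptions made for the example. It reduces each URL to the last two labels of its host name and leaves non-web resources as typed. That shortcut is naive: it suits domains like state.gov but would mishandle suffixes such as co.uk.

```python
from collections import Counter
from urllib.parse import urlparse


def source_key(resource):
    """Reduce a URL to its last two host labels; leave other resources unchanged."""
    if "://" not in resource:
        return resource.strip()  # a text, a person, or another non-web resource
    host = urlparse(resource).hostname or ""
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host


def tally_sources(interactions):
    """Count the number of reference interactions that relied on each source."""
    counts = Counter()
    for resources in interactions:
        # A set counts each source once per interaction, however many pages were used.
        counts.update({source_key(r) for r in resources})
    return counts


# Hypothetical log: one list of resources per reference interaction.
log = [
    ["https://www.state.gov/", "https://www.loc.gov/"],
    ["https://history.state.gov/", "Print treatise (hypothetical)"],
    ["https://www.congress.gov/"],
]
usage = tally_sources(log)
print(usage.most_common())
```

Counting once per interaction, rather than once per page visited, keeps a single long research session from inflating a source’s total.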

As I say, I don’t know what exactly that line looks like. The peak could be much further to the right in an environment where researchers had a high comfort level with the collection or the collection was small. Or it could be further to the left where, like in a courthouse library, the researchers may have less experience and less regular access or familiarity with the collection.

And it wouldn’t be anywhere near as large as I’ve made it. When you consider that your top legal research content object may be jurisdictional case law, which amounts to hundreds of thousands of transactions, your reference team can’t keep up. The reference interaction content usage line might not ever intersect with the usage line. In fact, I’d be surprised if there are many resources that librarians use more often than all of their researchers combined.

Hidden Costs

One law library management challenge is the cost to maintain the long tail. The least used content is at the far right of the chart. It is the content we are most likely to cancel, if we can, when our resources demand it. As we shorten the tail, the percentage of objects we use to reach 80% of usage will grow.

Is that an argument that law libraries should cut their collection to maximize that percentage of objects? I think most law librarians would argue that the long tail contains potential value in the same way that a boulder on a hillside has potential energy. But it has been years since a law library could justify just-in-case collecting. So regardless of where we stand on collection management with an eye to potential value, eventually we may need to tailor the collection to actual usage. Usage is our current metric that approximates value.

There’s also a cost for the reference professionals though. And that is perhaps what struck me the most as I was thinking about this. Because we don’t track our reference interactions in an integrated way, there is no way to contrast how our reference experts provide value in comparison to the researchers themselves.

We track reference interactions. We know the types of questions being answered and we can largely put them in buckets that help funders or governance boards know what we are talking about. But when we talk about collection usage, at least at my library, there’s no way to overlay librarian usage with researcher usage.
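
If both sets of counts existed, the overlay itself would be simple. The Python sketch below assumes two dictionaries keyed by collection object, one built from researcher usage and one from logged reference interactions. The object names and numbers are invented, and `overlay_usage` and `librarian_led` are hypothetical helper names. It ranks objects by researcher usage, as in the long tail charts, and flags the objects that librarians use more often than researchers do.

```python
def overlay_usage(researcher_counts, librarian_counts):
    """Rank objects by researcher usage and attach librarian usage to each one."""
    objects = set(researcher_counts) | set(librarian_counts)
    rows = [
        (obj, researcher_counts.get(obj, 0), librarian_counts.get(obj, 0))
        for obj in objects
    ]
    # Most researcher usage first; ties are broken by name for a stable order.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


def librarian_led(rows):
    """Return objects that librarians used more often than researchers did."""
    return [obj for obj, researcher_uses, librarian_uses in rows
            if librarian_uses > researcher_uses]


# Invented counts for illustration only.
researchers = {"Case law database": 120000, "Statutes": 45000,
               "Practice treatise": 900, "Treaty series": 3}
librarians = {"Case law database": 400, "Practice treatise": 150,
              "Treaty series": 12, "Non-binding agreements file": 4}

rows = overlay_usage(researchers, librarians)
for obj, researcher_uses, librarian_uses in rows:
    print(f"{obj}: researchers={researcher_uses}, librarians={librarian_uses}")
print("Librarian-led:", librarian_led(rows))
```

With these made-up numbers, the only librarian-led objects sit at the far right of the tail, which is the pattern guessed at earlier rather than anything measured.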

I don’t think the lack of data impacts the collection management decisions. I expect the reference librarians utilize the long tail of the collection in a relatively similar manner to researchers. If you have a resource that reference librarians are using and recommending heavily, but researchers aren’t engaging with, there’s a communication or understanding problem.

There is an opportunity, though, to highlight that reference librarian intermediation is required to adequately use part of the most commonly used objects. It may be a way to juxtapose the cost – salaries, professional development – with the content costs of the underlying collection objects. It may be a way to shift the perspective of the role of the librarian in the law library.

And that may be important if we are dealing with a situation where the decision-makers, who are often legal professionals who feel confident in their own legal research skills, view the reference team only as backup. If the perspective is that reference librarians are invoked only in edge cases (read: “research I can’t complete with my own professional skill set”), it misunderstands the role and the value of the reference team.