A couple of years back, some really interesting survey research was done on law libraries. I was talking to someone about it recently and commented that I thought it had been underutilized, that it contained some real opportunities to learn. But it became clear that the research is not being used because of a misunderstanding about response rates and statistical validity.
This has been common in my experience in law libraries and the legal profession. A group runs a survey. The response rate comes back in the sub-10% range. The lawyers then discount the validity of the information based on that rate. How could 2% of a demographic yield any accurate information?
The answer is that it depends on what that 2% is. You should really read this entire post over on SurveyMonkey about statistical validity, but even if you don’t, here’s a chart.
I’ve cropped it at a population of 10,000, but a population of 1,000,000 only requires 100 more respondents (1,100) than a 10,000-person population does (1,000). In other words, whether you’re trying to survey all the lawyers in your state or all the lawyers, period, you only need a fraction of them to respond.
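If you want to check the chart’s arithmetic yourself, it comes from the standard sample-size formula with a finite population correction. Here’s a minimal sketch in Python, assuming a 95% confidence level and the most conservative proportion (p = 0.5), the convention sample-size tables like SurveyMonkey’s are typically built on; the exact outputs land just under the chart’s rounded figures.

```python
import math

def sample_size(population, margin, z=1.96, p=0.5):
    """Required sample size for a proportion, with finite population correction.

    Assumes 95% confidence (z = 1.96) and the most conservative
    proportion (p = 0.5), the convention behind most sample-size charts.
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2   # sample size for an infinite population
    n = n0 / (1 + (n0 - 1) / population)        # finite population correction
    return math.ceil(n)

for pop in (10_000, 100_000, 1_000_000):
    for margin in (0.10, 0.05, 0.03):
        print(f"N={pop:>9,}  ±{margin:.0%}: n={sample_size(pop, margin):,}")

# N=10,000 at ±3% -> 965; N=1,000,000 at ±3% -> 1,066
# (the SurveyMonkey chart rounds these up to 1,000 and 1,100)
```

Notice how flat the curve is: past a population of roughly 100,000, the required sample size barely moves.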
It should be noted that I’m not a survey or statistics expert. For whatever reason, this idea has stuck with me ever since an undergrad course on research and a story about George Gallup. But I try to understand data that is presented to me, because it is precious if it’s accurate. My only goal in this post is to help fellow professionals who may run into the same problem. I thought validity v. response rate was common knowledge on governance boards and among decision-makers, and apparently it’s not.
It’s not just law libraries. I had the same experience at the American Bar Association when I rewrote the Legal Technology Survey questionnaire back in 2000. The third-party research company did a great job of getting responses (we sent out about 18,000-20,000 surveys for a population of 400,000-odd members), but every discussion seemed to focus on, “well, only 1,000 small firm lawyers answered that. That’s not reliable if there are 250,000 of them.” They missed the point about validity, and I often struggled to educate them on it.
This is perhaps an even bigger issue for law libraries. We have so little data that helps us understand our audience’s demographics. A ±10% margin of error is completely fine for me to make a small change in strategic direction. At ±3%? We could be headed in a whole new direction.
How Big is Your Pool?
That’s why I have tended to note the number of respondents (I use n=, for reasons lost to time) when I share a survey:
- Library Acquisition Patterns, 2018: n=54 academic libraries (but look at Figure 8)
- Downsizing Continues at Law Firm Libraries, 2016: n=73
- Everyday Legal Problems and the Cost of Justice in Canada, 2016: n=3,000, but they’ve split the report into subject areas, so the n will vary
- Library Staff Technology Skills Survey Summary, 2015: n=2,215
- Law Firm Librarians Feel Underused and Underpaid, 2015: n=80
We sometimes hide our (poor) response rates by reporting percentages. It’s the flip side of using the response rate percentage to discount a survey. Take this data from a headline about a 2016 Irish law firm survey:
38% of top 20 Irish law firms suffered data attacks in past 12 months
n=107
So, first off, I’m not sure how you get 107 responses from the top 20 law firms. It turns out that’s a poorly designed headline: the survey had 13 of the top 20, and it was also answered by 17 mid-size and 77 small law firms. The 38% represents 5 law firms (5 of 13 ≈ 38%).
Explain Validity
The challenge, then, is to explain this to the lawyers on law library governance boards, or whoever else makes decisions about your law library. It can be difficult in a couple of ways, depending on whether the lawyers support the research purpose.
The survey that sparked this post was performed by a third-party research organization. The potential survey pool was probably around 75,000-80,000. If we round up to 100,000 and use the SurveyMonkey numbers, we can reach a ±3% margin of error with 1,100 responses.
The survey received double that number of responses. Even if the data had been segmented by practice orientation, it could still have been reliable. The research firm did a great job of distributing the surveys so that the resulting pools of solo and small firm respondents reflected their proportion of the legal profession.
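To put a rough number on “double that”: plugging about 2,200 responses and a pool of about 100,000 (my rounding of the figures above) into the same arithmetic gives a margin of error around ±2%, comfortably tighter than ±3%. Again, a sketch at 95% confidence with p = 0.5:

```python
import math

def margin_of_error(n, population, z=1.96, p=0.5):
    """Margin of error for a proportion, with finite population correction.
    Same assumptions as the earlier sketch: 95% confidence (z = 1.96), p = 0.5."""
    se = math.sqrt(p * (1 - p) / n)                       # standard error of a proportion
    fpc = math.sqrt((population - n) / (population - 1))  # finite population correction
    return z * se * fpc

print(f"±{margin_of_error(2_200, 100_000):.1%}")  # ±2.1%
```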
In the end, a lot of good learning was lost because the decision-makers persuaded themselves that responses from 3% of a population made the survey unreliable. This creates a vacuum for decision-makers. If there’s no data, what do you use? That is particularly perilous with lawyers who, by dint of being library users, think they know how to operate law libraries.
Unfortunately, without data that the governance board can agree matters, they may fall back on anecdata: the times they visited the library, how they use legal research. If we are talking statistical validity, and your governance board or decision-making group is about 10 people, you’re in trouble.
Statistical validity isn’t the be-all and end-all of research surveys. It’s an area that requires expertise, expertise I don’t have, but I am receptive to the outputs of research. Law libraries struggle to get good data to use for decision-making. The last thing we need is to have solid data put on the shelf to gather dust solely because decision-makers didn’t understand validity.