Now that I am no longer distracted by the subject of last week’s entry I can get to the intended topic for my first 2011 blog entry. I should say, though, that I won’t be turning over any kind of new leaf for the new year. For now I’m sticking with the theme I’ve dwelt on already, a theme statistician Howard Wainer expressed concisely:
Whenever we discuss information we must also discuss its accuracy.1
Applied to the library world, the message is: Formal library research and advocacy studies should always explain the strengths and limitations of their data. The best reason for abiding by this principle, in my opinion, is to allow readers to decide how much credence they want to give to conclusions drawn in the studies.
With new library advocacy studies on the horizon, I thought I’d better wrap up any unfinished topics from 2010. Howard Wainer’s advice made me think of the study, Opportunity for All: How the American Public Benefits from Internet Access at U.S. Libraries. Findings from this report have been broadcast widely with barely a mention of how approximate the figures are. Last month IMLS reported using conclusions from the study as input for revisions to the Museum and Library Services Act of 2010. Thankfully, their press release acknowledges the data are estimates, giving the round figure of 77 million as the number of U.S. citizens who used public library computers in 2009. (Use of round figures is a reliable clue that numbers are estimates.)
But where in the ballpark do this and other estimates from the report lie? How much trust can we put in them? Casual readers of the study may believe its figures are completely solid because nearly 48,000 people were surveyed. But high respondent counts do not make study findings sound. Rather, a study’s soundness comes from how data were gathered and the extent to which effective steps were taken, before and after data collection, to counter potential deficiencies in the gathering methods.
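To put a number on that, here is a back-of-the-envelope sketch (my own arithmetic using the textbook simple-random-sampling formula, not a calculation from the report): with a sample the size of the study’s online survey, pure sampling error is tiny, but the formula is completely silent about bias.

```python
import math

# 95% margin of error for an estimated proportion under simple random
# sampling, at the "worst case" value p = 0.5.
def margin_of_error(n, p=0.5, z=1.96):
    return z * math.sqrt(p * (1 - p) / n)

# A huge sample makes *sampling* error tiny...
print(f"{margin_of_error(44_811):.2%}")   # about 0.46%

# ...but the formula knows nothing about *who* answered: a self-selected
# sample of 44,811 can still miss the true value by far more than this.
```

In other words, a large n shrinks only the error that comes from random chance; it does nothing about error that comes from a slanted pool of respondents.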
Data for the Opportunity for All study came, first, from interviews of 3,176 U.S. citizens who were part of a probability sample.2 Due to the sampling method used there is good reason to believe that the interviewees paint a fairly balanced picture of U.S. citizens nationwide (except for a drawback I’ll come back to later). The study also included 44,811 online survey respondents who were volunteers recruited by 401 libraries in the study. Of these libraries, 367 were selected using a probability sample and 34 were self-selected. The sampling method used to select the libraries (not counting the 34) makes them a fairly unbiased representation of the larger universe of all U.S. public libraries. This also means the pool of community residents who could potentially participate in the survey would fairly represent all U.S. residents in communities with libraries.
However, because the online survey respondents were self-selected volunteers, their responses very likely give a slanted reflection of local community residents and also of residents nationwide. Although 44,811 seems like a substantial number, important segments of the population may still have been missed. In survey research, two segments of a population of interest commonly end up neglected: (1) people who decline the invitation to be surveyed; and (2) any important population segment that researchers omitted, intentionally or not, from their roster of survey invitees. Slanted results due to these oversights are called, respectively, nonresponse bias and undercoverage bias. These sources of bias afflict pretty much all surveys, interfering with how faithfully respondents represent the attitudes, beliefs, and behaviors of the full population of interest.
Well aware of bias in self-selected samples, the Opportunity for All researchers applied two statistical adjustments to lessen the inaccuracy of the online survey data: propensity scoring and calibration (post-stratification) weighting. Let’s look at just the first of these. Propensity scoring is a method designed to temper responses from an otherwise biased sample, like the convenience sample of 44,811 respondents.3 Propensity scoring matches up respondents from a biased sample with a sample thought to be unbiased, called a reference survey. In the Opportunity for All study, data from the 3,176 telephone interviewees served as the reference survey.
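To make the mechanics concrete, here is a toy simulation of one common implementation of the idea, inverse-odds propensity weighting. All numbers are invented for illustration, and this is not the study’s actual model: a convenience sample that over-recruits younger people is reweighted against a small probability sample standing in for the telephone interviews.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: library computer use declines with age.
N = 100_000
age = rng.uniform(18, 80, N)
uses_computer = rng.random(N) < 1 / (1 + np.exp(0.08 * (age - 40)))

# Reference survey: a small probability sample (playing the role
# of the telephone interviewees).
ref = rng.choice(N, 3_000, replace=False)

# Convenience sample: self-selection skews heavily toward younger people.
p_selfselect = 0.5 / (1 + np.exp(0.10 * (age - 35)))
web = np.flatnonzero(rng.random(N) < p_selfselect)

# Pool the two samples and fit logistic regression P(in web sample | age).
x = np.concatenate([age[ref], age[web]])
z = np.concatenate([np.zeros(ref.size), np.ones(web.size)])
X = np.column_stack([np.ones_like(x), (x - x.mean()) / x.std()])
w = np.zeros(2)
for _ in range(2_000):                      # plain gradient descent
    p = 1 / (1 + np.exp(-X @ w))
    w -= 0.1 * X.T @ (p - z) / z.size

# Inverse-odds weights shrink the over-represented (young) respondents.
p_web = 1 / (1 + np.exp(-X[z == 1] @ w))
weights = (1 - p_web) / p_web

truth = uses_computer.mean()
raw = uses_computer[web].mean()                        # biased estimate
adjusted = np.average(uses_computer[web], weights=weights)
print(truth, raw, adjusted)    # adjusted lands much nearer the truth
```

With a single covariate this is essentially post-stratification; the actual study weighted on several demographic variables at once, which is where a logistic regression model earns its keep.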
Preliminary studies on the effectiveness of propensity scoring for reducing survey bias have produced encouraging results. Still, to date its relative effectiveness has been demonstrated mainly for specific study populations, for example, registered voters, senior citizens, and online electronics shoppers. How well it works with different specialty populations as well as more generic ones (like community residents whom public libraries serve) has not been explored. Even for populations where it has been shown to be effective, it does not work across the board. Matthias Schonlau of the Rand Corporation found that bias in certain survey item responses, like reported alcohol use and (financial) stock ownership, is not corrected by propensity score adjustments.4
And there’s nothing to say that the logistic regression models researchers devise for calculating propensity scores are sufficient. As Schonlau wrote, “If there is an unobserved variable that guides [sample] selection…in a way unrelated to the propensity variables and an outcome variable of interest, then no weighting scheme will fix the problem.”5
Bottom line, propensity scoring adjustments are not 100% effective. Besides, as used in the Opportunity for All study they are definitely ineffective for resolving one nagging problem: bias in the reference survey. (This is the drawback to the telephone survey I mentioned above.) If this baseline sample is slanted, propensity score weighting is for naught. Although convenient and inexpensive, using telephone surveys as baselines for propensity score adjustment is a bit risky. According to Fannie Cobben and Jelke Bethlehem of Statistics Netherlands, nonresponse and undercoverage bias are stubborn problems with telephone surveys.6 At best, surveys adjusted using propensity scoring can only be as unbiased as the reference survey used as a baseline.
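A stripped-down example shows why (invented numbers, with two population strata standing in for a full propensity model): weighting a sample to match a biased reference simply reproduces the reference’s bias.

```python
# Hypothetical: two age strata, each half of the true population,
# with different rates of library computer use.
true_share = {"young": 0.5, "old": 0.5}
use_rate = {"young": 0.7, "old": 0.3}
truth = sum(true_share[g] * use_rate[g] for g in true_share)     # 0.50

# Suppose the reference survey under-covers older residents (purely
# hypothetical) and therefore sees a 70/30 split instead of 50/50.
ref_share = {"young": 0.7, "old": 0.3}

# Weighting the web sample to match the reference reproduces the
# reference's composition -- and therefore its bias.
adjusted = sum(ref_share[g] * use_rate[g] for g in ref_share)    # 0.58
print(truth, adjusted)
```

No amount of clever weighting can recover the true 50% figure here, because the target the weights aim at is itself off by 20 percentage points.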
The Opportunity for All researchers do explain that their data may well be inaccurate due to bias and other causes. But you have to search through the report’s appendices to hear about this:
Given the entire sample, the margin of error tells us that the estimated proportions will differ no more than this amount of percentage points from their true values in the population under study.7 The margin of error for an estimate based on public access technology users is ±1.0 percent.
Besides sampling design variability, other forms of error are likely introduced in the analyses of data from survey samples. Bias in selection of respondents, measurement error, and violation of modeling assumptions can all have an influence on variance computations. It is recommended, therefore, that the margin of error be interpreted conservatively.8
I am not sure exactly how variance computations fit into the picture. But I suspect they are related to the two types of error that affect surveys: The first type, sampling error, is the chance inaccuracy that remains even with well-crafted sampling techniques, like random sampling. The other type is nonsampling error, which, as the researchers note, includes measurement error (like biased or confusing questions, respondents misrepresenting the truth, survey software glitches, and the like), data analysis mistakes, sample selection bias (already described), and so forth.
Since margin of error estimates do not measure nonsampling error, the researchers’ advice to interpret only the margin of error conservatively is, ironically, too conservative. Readers should have been advised that noise and bias in the data mean that all of the survey findings are subject to some level of inaccuracy. The most inaccurate of these, then, should definitely be interpreted conservatively. How much inaccuracy could be hiding in the survey results and in which specific data might it hide? Well, that’s up to the researchers to say.
1 Wainer, H. (2009). Picturing the uncertain world: How to understand, communicate, and control uncertainty through graphical display, Princeton, NJ: Princeton University Press, p. 121.
2 Landline and cell phone customers were called using a randomized dialing technique and a telephone exchange sample. It would be nice to know whether the exchange sample was biased. In the absence of that information, let’s just consider the overall method as approximating random selection. The report doesn’t provide the total number of calls placed or the survey response rate.
3 Use of propensity scores to lessen bias in survey data is a relatively recent development. Propensity scores were originally used for reworking data from nonexperimental studies to approximate random assignment of subjects to quasi-experimental treatment and control groups. See Rosenbaum, P. R. and Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects, Biometrika, 70 (1), 41-55.
4 Schonlau, M. (2009). Selection bias in web surveys and the use of propensity scores, Sociological Methods and Research, 37(3), p. 311.
5 Schonlau, M. (2009). p. 313.
6 Cobben, F. and Bethlehem, J. (2005). Adjusting undercoverage and non-response bias in telephone surveys, Discussion Paper 05006, Voorburg, The Netherlands: Statistics Netherlands.
7 Margins of error do not work quite the way the researchers describe. By definition, there is a 1 in 20 chance that the true values could stray outside the margin of error of any survey. For more details see Thompson, B. (2006). Foundations of behavioral statistics: An insight-based approach, New York: Guilford Press, pp. 200–206; Hays, W. L. (1973). Statistics for the social sciences, 2nd ed., New York: Holt, Rinehart & Winston, pp. 375–380; or my March 2010 post.
8 Becker, S. et al. (2010). Opportunity for all: How the American public benefits from internet access at U.S. libraries, Washington, DC: Institute of Museum and Library Services, Appendix 2, p. 7.