It’s been a while since I’ve posted here. Writer’s block, I guess. I was hoping to come up with some new angle on library statistics. But to be honest, I haven’t been able to shake the quantitative literacy kick I’ve been on. I believe that quantitative literacy/numeracy is important in this era of data-driven, evidence-based, value-demonstrated librarianship. Especially when much of the data-driving, evidence-basing, and value-demonstrating has been undermined by what I’ll call quantitative deficit disorder. Not only has this disorder gone largely undiagnosed among library advocacy researchers and marketing aficionados, it has also found its way to their audiences. You may even have a colleague nearby who suffers from the disorder.

The most common symptoms among library audiences are these: When presented with research reports, survey findings, or statistical tables or graphs, subjects become listless and unable to concentrate. Within seconds their vision begins to blur. The primary marker of the disorder is an unusually compliant demeanor. Common subject behavior includes visible head-nodding in agreement with all bullet points in data presentations or executive summaries. In severe cases, subjects require isolation from all data-related visual or auditory stimuli before normal cognitive processes will resume.

The only known therapeutic intervention for quantitative deficit disorder is regular exercise consisting of deliberate and repetitive quantitative thinking. Thankfully, this intervention has been proven to be 100% effective! Therefore, I have an exercise to offer to those interested in staving off this disorder.

This exercise is more advanced than others I’ve posted in the past. The researchers who conducted the study I’ve chosen didn’t communicate their quantitative thought processes very clearly, meaning the exercise requires us to fill in several blanks in the study rationale.

The study is a white paper about library return-on-investment from the LibValue project at the University of Tennessee Center for Information and Communication Studies. The aim of the white paper is to determine how valuable library collections and services are in the university grants process.

It turns out that the LibValue study substantially exaggerates benefits attributable to library collections and services. To understand this we need to examine the formulas the researchers used. To begin with, they were guided by the basic formulas for calculating ROI and cost/benefit analysis shown here:

Basic Return-On-Investment and Cost/Benefit Analysis Formulas

The idea is determining the degree to which monetary investments (costs) pay off in monetary earnings (benefits). Properly calculating ROI or cost/benefit ratios requires (1) a thorough identification of relevant earnings (benefits) and investments (costs) and (2) accurate assignment of monetary values to identified benefits and costs.
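To make these two formulas concrete, here is a minimal sketch of the standard calculations. The dollar figures are made up for illustration, not taken from the study:

```python
def roi(earnings, investment):
    """Return on investment: net gain expressed as a fraction of the investment."""
    return (earnings - investment) / investment

def benefit_cost_ratio(benefits, costs):
    """Cost/benefit analysis expressed as a benefits-to-costs ratio."""
    return benefits / costs

# A hypothetical library: $1.5M in attributable benefits on a $1M budget.
print(roi(1_500_000, 1_000_000))                 # 0.5, i.e. a 50% return
print(benefit_cost_ratio(1_500_000, 1_000_000))  # 1.5, i.e. $1.50 back per $1 spent
```

Note that both calculations stand or fall on the two requirements above: which benefits and costs get counted, and what dollar values they are assigned.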

The white paper uses the following ROI formula, which I show here verbatim, although reformatted for readability:

LibValue white paper ROI formula.1

Though more complicated than the basic ROI formula shown above, this formula follows the same template. The return (earnings) part shows up in the numerator and consists of three separate expressions multiplied together—the multi-term fraction on the left and the two simpler terms to the right. The investment part appears in the denominator as the single term total library budget.

The white paper researchers adapted this formula from an earlier ROI study conducted at the University of Illinois at Urbana-Champaign (UIUC).2 The UIUC researchers, in turn, derived their formula from a 2003 ROI study of special libraries.3

Since the white paper doesn’t explain much about the thinking behind the formula (aka model), we have to refer to the UIUC article to understand it. Except those authors didn’t explain why their model ended up constructed as it is, other than to say they adapted the idea from the special libraries study. All we really have to go on is their table below:

Adapted from UIUC article.4 Red annotations correct errors explained below. Row numbering and shading added.

The UIUC model appears in the right column and the special libraries model it was adapted from appears in the left column. Both columns contain formula elements—measures, actually—arranged vertically and connected by what were minus signs (hyphens) followed by an equal sign. The minus signs are typographical errors which I have corrected in red. Based on text in the UIUC article these should be X’s to indicate multiplication (which also matches the white paper model). The top two rows are grayed out to indicate the part of the UIUC model that the white paper researchers decided not to use.

To interpret the table follow either column downward. (Later you can, if you want, trace left to right to see how the UIUC researchers adapted the special libraries entries to academic libraries.) Following the right column beginning with row 3 we see two measures multiplied together: Grant proposal success rate using library resources and average grant income. This gives the intermediate product seen in row 7, average grant income generated using library resources.

From the left column in the table above we see the idea of using an average came from rows 5 and 7 of the special libraries model. Regrettably, the special library researchers mistakenly believed a median can be equated with an average. As if waving a magic wand over one can transform it into the other. Fairy dust aside, an average and a median are very different measures. Considering these to be interchangeable is innumerate.
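A quick sketch with made-up grant figures shows why the two measures can’t be swapped. Grant awards tend to be right-skewed, so a single large grant drags the mean well away from the median:

```python
from statistics import mean, median

# Hypothetical, right-skewed grant awards: one large grant pulls the mean up.
grants = [50_000, 60_000, 75_000, 90_000, 2_000_000]

print(mean(grants))    # 455000, inflated by the outlier
print(median(grants))  # 75000, the typical grant
```

In a distribution like this the “average” is six times the median, which is exactly why treating them as interchangeable is innumerate.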

Returning to the right column of the table, the sequence continues into row 8 where multiple calculations are crowded together, which may explain why the model ended up incorrectly specified. According to calculations in the article the first two terms in row 8 are both divided by library materials budget. I corrected this in the equation shown here:

Corrected equation from rows 7 & 8 in right column of UIUC table (above).

It’s not obvious at first, but there are significant measurement decisions embedded in this formula (and the white paper formula also). Decisions that the UIUC researchers acknowledged only in a couple of cryptic statements:

The model is extended to determine the return on the library budget from grant income.5

Quantifying grant income using awards data is problematic as grants can be multiyear awards with income received over time or all at once, and they can be extended or renegotiated. …Data on grant expenditures were recommended because they are part of the university’s reporting system that accounts for the disposition of each grant.6

Let’s untangle the thinking here and see what we make of it! The first statement announces that, in the numerator of the basic ROI formula, revenue (earnings/return) was replaced with grant income. Conceiving of income as equivalent to revenue is fine as long as researchers and readers alike understand an important distinction between these accounting concepts, a distinction related to timing. Revenue is money earned irrespective of when you actually get your hands on that money. Income is money you have received and have had in your possession at some time. This distinction, and related complications, presented certain measurement challenges that are barely mentioned in the UIUC article.
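A toy example (with hypothetical figures) makes the timing distinction concrete. A multiyear grant awarded in 2006 is all revenue in 2006, but it becomes income only as the money actually arrives:

```python
# Hypothetical 3-year grant awarded in 2006, disbursed in annual installments.
award_amount = 900_000

revenue_by_year = {2006: 900_000}  # earned in full when the grant is awarded
income_by_year = {2006: 300_000, 2007: 300_000, 2008: 300_000}  # received over time

# Over the life of the grant the totals agree; year by year they do not.
assert sum(income_by_year.values()) == revenue_by_year[2006]
```

Which of these two pictures you measure makes a real difference once you start dividing by a single year’s library budget.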

This mention occurs in the second statement above, where the researchers concluded that quantifying (that is, measuring) income using awards data was problematic. But this conclusion doesn’t make sense, since they had already chosen to substitute income for awards data (grant revenue). I think the researchers had a problem with revenue and also with a particular type of income: income from one-time disbursements by grantors in the full amount of grant awards. That is, I believe the researchers’ issue was with library ROI earnings that occurred at a single point in time, whether as revenue or income.

For some (unexplained) reason, the researchers wanted income spread out over time, presumably over the lifetime of the grant. They sought to view all grant income this way whether or not this was how the income actually worked. For example, to the researchers a 3-year $6 million grant award disbursed all at once was equivalent to $500,000 received by the university each quarter or $2 million received annually. To this end, the researchers found a convenient way to spread grant income over time even when it wasn’t spread out in fact: they chose to gather data on grant expenditures as a substitute measure for grant income.
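The annualizing assumption in that example amounts to nothing more than dividing the lump sum by the grant’s lifetime (the figures below restate the example, not study data):

```python
# A 3-year grant disbursed all at once, treated as if received evenly over time.
award = 6_000_000
years = 3

annual_income = award / years         # 2,000,000 per year
quarterly_income = annual_income / 4  # 500,000 per quarter
```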

Thus, the researchers’ measurement decisions had two stages: First, they substituted income data for revenue. Then, they substituted expenditure data for income data. As I say, the only rationale offered for these decisions is the statements quoted above. The LibValue white paper says nothing at all about the measurement decisions the model entails, other than the omission of part of the UIUC model (shaded in the table above).

A related puzzle in both studies is exactly which year(s) the ROI data pertain to. The UIUC article says that grants data were collected from an unspecified 10-year period. A survey in the appendix queried faculty about grant proposals submitted in 2006 and about awards received during the 5-year period ending in 2006. Yet the researchers included this example calculation of their ROI formula indicating the data were from the single year 2006:

Example calculation from UIUC article.7 Shading in the original; red annotation added.

Notice the red-circled measure no. of grants (expended) in year. In the earlier table the ROI formula lists this measure as simply number of grants expended. (Another inconsistency is library [materials] budget in the table which evolved into total library budget in the example.)

Similar mixed messages appear in the LibValue white paper regarding the years the data are for. Again, a survey in the appendix queried faculty about 2007 as well as the prior 5 years. And regression analysis results were reported for data from the unspecified 10-year period. The data tables in the article do not indicate what year(s) the data pertain to. But the data must be annual, rather than multi-year, as the measure from the white paper formula, grants expended each year, implies. And, of course, total library budget in the formula denominator is an annual statistic.

Therefore, we can presume that average grant income (UIUC model) and the average size of grant (white paper model) are annual figures. (Average size of grant means average grant award amount.) But there’s another wrinkle. The averages just named could have been annualized in one of two ways: (1) The averages were calculated from a single year’s grants or (2) they were calculated from multiple years’ grants. Either way poses a measurement challenge related to timing of earnings (grant income) versus investment (budget). If the grant income averages are from a single year, then grants awarded in a given year typically wouldn’t show up as income that same year, although income from prior year grants would. If the averages are from multiple years, somehow the researchers needed to reconcile the multi-year data with annual investment (budget) data.

Spending more time second-guessing the researchers’ measurement decisions is probably not very fruitful here. But you get the idea. Comparing earnings—either annualized over multiple years or for a single year—with a single budget year is a bit iffy. How can we know which grant awards were earned from one year’s library investment versus another year’s? It’s also conceivable that some grant awards were facilitated by more than one year’s library investment.

Let’s set these issues aside, as it’s the researchers’ responsibility to address them thoughtfully. Let’s take a look instead at one other interesting aspect of the formulas. In the table from the UIUC article (above) note that the average in row 7 is multiplied by number of grants expended in row 8. Here’s the calculation—corrected as explained earlier—with a twist I’ve added which I will explain:

A twist added to the equation from rows 7 & 8 of the table from the UIUC article (above).

My added twist is equating the two terms multiplied in the numerator left of the equal sign with a single term in the numerator on the right, total grant income generated using library resources. Here’s my reasoning: If you would, temporarily assume that number of grants expended is the same as number of grant awards received. This is roughly the same as equating income (which was apparently measured as reimbursements received for grant expenditures) with revenue (grant awards). As reported in the white paper, these two counts do happen to be equal for some universities but not for others.8 But we’ll get to that.

This assumption means that the UIUC researchers had to calculate average grant income by taking total grant income and dividing it by number of grants awarded. Here’s why the average in the numerator of the equation shown just above has become a total: Multiplying any average by the number of cases used to calculate that average yields the total amount for all cases—in this instance, total grant income.
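This is just the definitional identity that the mean times the count equals the sum, which a few lines of code (with made-up incomes) make concrete:

```python
# Multiplying any average by the number of cases used to calculate it
# recovers the total amount for all cases.
incomes = [120_000, 240_000, 360_000]  # hypothetical annual grant incomes

average = sum(incomes) / len(incomes)  # 240,000
total = average * len(incomes)

assert total == sum(incomes)  # 720,000 either way
```

So “average grant income × number of grants” is simply total grant income in disguise.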

This same multiplication occurs in the LibValue formula:

Right terms of LibValue formula numerator.

Again, multiplying average size of grant by the number of grants expended (which we’re assuming to be equivalent to number of grants awarded) gives the total grant income:

Right terms of LibValue formula numerator are equivalent to total grant income.

Simplified in this way, it becomes obvious that library earnings/benefits are based on total annual grant income, moderated to an unknown degree by spreading that income over time. Neither the UIUC article nor the LibValue white paper explains why it was necessary to use (annual) averages. But it stands to reason that the researchers had to have access to data on total grant income in order to calculate the averages at all.

Just one more thing to consider. We assumed that the number of grant awards was equal to the number of grants expended. However, this assumption is only partially true. It is true for 3 of the 8 universities in the white paper, as indicated in red here:

Grants expended compared to grant awards for 8 universities in LibValue white paper.9

For the rest of the universities (green in the table) grants expended outpaced grants awarded by factors of about 1.5 to one, three to one, four to one, and six to one. From all of this we can conclude: The right terms in the LibValue ROI formula numerator are at least equal to a university’s total annual grant income, and usually equal to 300% to 600% of total annual grant income. Thus, the white paper considerably exaggerates library return/earnings for the majority of the universities studied (assuming the LibValue model as a whole to be acceptable).
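To see the scale of the exaggeration, consider a hypothetical university where grants expended outpace grants awarded three to one, as for some universities in the white paper (the counts and dollar amounts below are invented for illustration):

```python
# Hypothetical university with a 3-to-1 ratio of grants expended to grants awarded.
average_grant = 400_000
grants_awarded = 50
grants_expended = 150  # multiyear grants keep generating expenditures after the award year

actual_income = average_grant * grants_awarded      # 20,000,000
formula_earnings = average_grant * grants_expended  # 60,000,000

print(formula_earnings / actual_income)  # 3.0: earnings counted at 300% of actual income
```

Whatever else the formula does, multiplying the average grant by the count of grants expended rather than grants awarded triples the earnings term here.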

To determine if and how exaggerated the final ROI ratios are we need to decipher the rest of the formula, including the multi-term fraction in the numerator. Besides this, there is also the question of how defining library benefits as equal to 100% of university grant awards can be justified.

But these are issues we can ruminate on in a sequel to this post. I mean, you can only think quantitatively for so long. Time to give our brains some rest.

============================

1 Tenopir, C. et al. (2010) University Investment in the Library, Phase II: An International Study of the Library’s Value to the Grants Process, p. 7.

2 Luther, J. (2008) University Investment in the Library: What’s the Return, A Case Study at the University of Illinois at Urbana-Champaign.

3 Strouse, R. (2003) Demonstrating Value and Return on Investment, Information Outlook, 14-19.

4 Luther, J. (2008) p. 8.

5 Luther, J. (2008) p. 8.

6 Luther, J. (2008) p. 9.

7 Luther, J. (2008) p. 11.

8 Although it’s possible, for record-keeping reasons, for annual grant income to differ from annual grant expenditures, the white paper researchers tallied earnings based on grant expenditures rather than grant income. This means they also considered the number of grants expended in a given year to be equivalent to the number of grant awards receiving income.

9 Data from Table 1b in Tenopir et al. (2010) p. 9.