In their book *What the Numbers Say*, Derrick Niederman and David Boyum say that the way to good quantitative thinking is practice, practice, practice! In this spirit I offer this post as another exercise for sharpening the reader’s numeracy skills.

A couple of months back I presented a series of statistical charts about large U.S. public library systems. Sticking with the theme of large public libraries, I thought I’d focus on one in particular, The Free Library of Philadelphia. This is because the PEW Charitable Trusts Philadelphia Research Initiative did an up-close analysis of The Free Library in 2012. So this post is a retrospective on that PEW report. Well, actually, on just this graph from the report:

Source: The Library in the City, PEW Charitable Trusts Philadelphia Research Initiative.

The PEW researchers interpreted the chart this way:

Over the last six years, a period in which library visits and circulation grew modestly, the number of computer sessions rose by 80 percent… These numbers only begin to tell the story of how the public’s demands on libraries are changing.¹

The implication is that because demand for technology outgrew demand for traditional services by a factor of 8-to-1, The Free Library should get ready to plug in even more technological devices! This plan may have merit, but the evidence in the chart does not justify it. Those data tell quite a different story when you study them closely. So, let’s do that.

The main problem with the chart is that 80% is an exaggerated figure. It is potentially inflated on its own and is definitely inflated in the context of comparisons made in the chart. Let me begin with the first point.

The percentages in the PEW chart are cumulative rates, that is, rates of change calculated over multiple years. When cumulative rates are based on low initial values, the rates end up artificially inflated. This is clearly the case with a statement in the PEW report about digital downloads at the Phoenix Public Library increasing by more than 800%.² When you see large 3- or 4-digit percentages, chances are the baseline (the denominator in the fraction that the percentage comes from) is too low. These percentages are so exaggerated they aren’t meaningful.

Another example of a measure on the upswing from low initial values is public computers in U.S. public libraries, seen in the chart below. The Institute of Museum and Library Services (IMLS) began collecting data on these in 1998.

The plotted lines indicate cumulative growth over time calculated from 5 different baseline years listed in the chart legend. Each line begins the year following the baseline year. For example, tracking growth based on the count of computers in 1998 begins in 1999 and extends to 2011. Growth based on the 2000 count begins in 2001. And so forth. In a sense, each baseline year begins with zero growth and increases from there.

The legend lists the count of computers for each of these years. (A peculiar use for a legend, I know. But the data are a bit easier to reference there than in a separate table or chart.) The arrow at the right edge of the horizontal axis indicates that in 2011 there were a total of 262,462 public computers in U.S. public libraries.

The calculation of how much growth 262,462 represents depends on what baseline year we choose. Using the 1998 count as the baseline (24,104, brown line) the 2011 count represents 989% growth. Using the 2002 level (99,453, orange line) the growth was 164%. Using 2004 (141,194, green line) gives 86%. And so on.
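
For readers who want to check these percentages, here is a minimal Python sketch of the calculation. It uses only the counts quoted above; the function and variable names are hypothetical, chosen just for this illustration.

```python
# Public computer counts quoted in the post (from the chart legend).
COUNT_2011 = 262_462
BASELINES = {1998: 24_104, 2002: 99_453, 2004: 141_194}


def cumulative_growth_pct(baseline: float, later: float) -> float:
    """Percent change from a baseline count to a later count."""
    return (later - baseline) / baseline * 100


# Same 2011 count, three different baselines, three very different rates.
growth_by_baseline = {
    year: round(cumulative_growth_pct(count, COUNT_2011))
    for year, count in BASELINES.items()
}

for year, pct in growth_by_baseline.items():
    print(f"baseline {year}: {pct}% growth by 2011")
```

The numerator never changes here. Only the denominator does, which is the whole point about baseline choice.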

The earlier the baseline year, the higher the 2011 cumulative rate is. With each baseline year the cumulative rate decreases because the baseline amounts increase steadily as seen in the chart legend. So, gauging overall growth depends on how far back our hindsight stretches.

Now, let’s see how this dynamic plays out with The Free Library data. Computer use in 2005 at the library was not really a project startup, as is obvious from this chart:

Nevertheless, we know the trend involves low early numbers (denominators) and high later numbers (numerators), as that’s the gist of the PEW story to begin with! And we know that with cumulative rates there is leeway in selecting the baseline year. So, for kicks let’s see how the PEW researchers’ choice compares with other baselines.

To do this we need The Free Library’s computer usage data. Unfortunately, I don’t have access to the data from the PEW report. And also unfortunately, there are some irregularities with the official data the library reported to IMLS—computer uses in 2006 and 2007 were apparently under-counted. Presuming the PEW report to be correct, I estimated usage for 2005 through 2007 by extrapolating backward from the 2011 IMLS count using PEW’s 80% figure. By my calculation computer use at the library was roughly 722,000 in 2005. These alternative counts, IMLS-reported versus extrapolated, are shown in this chart:

Using the extrapolated figures up until 2007 and the IMLS data thereafter, here’s what 2012 cumulative rates for this measure look like:

Consistent with the PEW report, in the chart the 2011 value for the 2005 baseline (brown) line is marked at 80%. Of course, this amount increased in 2012 since the rate is cumulative. The black dot (a line with no length) indicates the cumulative rate of growth from 2011 to 2012, which is equivalent to the annual rate of growth in 2012.
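
The back-calculation of the 2005 estimate, and the way a cumulative rate carries forward into 2012, can be sketched roughly as below. This is an illustration under stated assumptions, not the exact procedure used for the charts. The post does not state the 2011 IMLS count, so the sketch derives the count implied by the rounded 722,000 estimate and the 80% figure. The 4.6% annual rate for 2012 is the one cited near the end of the post, and the function name is hypothetical.

```python
PEW_GROWTH_2005_TO_2011 = 0.80   # cumulative rate from the PEW report
ESTIMATED_USES_2005 = 722_000    # the post's rounded back-calculated estimate
ANNUAL_GROWTH_2012 = 0.046       # 2012 annual rate cited later in the post


def baseline_from_later(later_count: float, cumulative_rate: float) -> float:
    """Back out the baseline implied by a later count and a cumulative rate."""
    return later_count / (1 + cumulative_rate)


# 2011 count implied by the rounded estimate (an assumption for illustration,
# not the figure the library reported to IMLS).
implied_uses_2011 = ESTIMATED_USES_2005 * (1 + PEW_GROWTH_2005_TO_2011)

# Dividing the later count by 1.80 recovers the 2005 estimate.
recovered_2005 = baseline_from_later(implied_uses_2011, PEW_GROWTH_2005_TO_2011)

# Cumulative rates compound: chain the 80% with the 2012 annual rate.
cumulative_2012 = (1 + PEW_GROWTH_2005_TO_2011) * (1 + ANNUAL_GROWTH_2012) - 1

print(f"implied 2011 uses: {implied_uses_2011:,.0f}")
print(f"recovered 2005 uses: {recovered_2005:,.0f}")
print(f"cumulative growth 2005 to 2012: {cumulative_2012:.0%}")
```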

So, is the 80% cumulative growth in 2011 exaggerated? Possibly, if a lower figure can be used just as easily. And there is no rule of thumb for choosing from among alternative cumulative rates, making the choice an arbitrary one.

Whatever choice analysts make, it’s hard to make sense of cumulative rates in general. On their own the rates—80%, 67%, 50% and so on—sound high. But how high are they really, with multiple years rolled into them? And how well are readers able to evaluate them? I suspect that they’re like weather forecasts that audiences translate impressionistically into low-medium-high probabilities.

Another drawback with cumulative rates is that, as summary numbers (that is, the values listed at the right ends of the lines in the chart), they hide year-to-year details. Like the dip that occurred in computer use at the Free Library in 2009. Or the big jump in public computer counts reported to IMLS in 1999 seen in the earlier chart.

A more straightforward way to describe growth in performance data is to track it year by year. This would be the ideal route for the PEW study because it eliminates the baseline year dilemma. Plus it provides readers with a more complete understanding of the data.

The next set of charts shows annual rates of growth in public computers in libraries along with rates for visits and circulation from 1999 to 2011. In chart A you can see that rates for public computer counts (green line) begin high due to smaller counts early on. But within a few years they fall pretty much in line with rates for established library services.

Notice in chart B that in the aftermath of the Great Recession total counts of traditional services fell while total public computer uses grew. Still, the point is that over time technology rates settled into the single digits.

The next chart gives annual rates of growth in public computer use at The Free Library. (These same percentages appeared at the left ends of trend lines in the chart above that shows cumulative growth.)

Looking at this chart, would you have guessed that the pattern up to 2011 represents 80% cumulative growth? The annual percentages, on the other hand, do help us understand what is behind the 80% from the PEW report and the 88% in 2012. We can describe the 80% as a trend where, in 5 out of 6 years, growth was at or somewhat above 10%. That is equivalent to an average annual growth of 10.3%. The same translation can be done for the 67% cumulative rate from 2006 to 2012. The 67% amounts to growth of roughly 10% or somewhat higher in 5 out of 7 years, or an average annual growth of 9.0%.
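
One way to translate a cumulative rate into an average annual rate is the compound (geometric) average: the constant yearly rate that would produce the same cumulative growth. The sketch below assumes that this is the kind of average behind the 10.3% figure. The post does not say so explicitly, and the function name is hypothetical.

```python
def average_annual_growth(cumulative_rate: float, years: int) -> float:
    """Constant yearly rate that compounds to the given cumulative rate."""
    return (1 + cumulative_rate) ** (1 / years) - 1


# 80% cumulative growth spread over the 6 years from 2005 to 2011.
avg_2005_to_2011 = average_annual_growth(0.80, 6)
print(f"average annual growth, 2005 to 2011: {avg_2005_to_2011:.1%}")

# Sanity check: compounding that rate for 6 years gets back to about 80%.
rebuilt_cumulative = (1 + avg_2005_to_2011) ** 6 - 1
print(f"compounded back over 6 years: {rebuilt_cumulative:.0%}")
```

Note that simply dividing 80% by 6 would give about 13.3%, which overstates the yearly pace because it ignores compounding.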

As you can see, describing growth trends is a bit of a moving target! The time range selected has a major bearing on the answer. In any case annual rates are easier for the typical reader to comprehend.

There’s another thing to be aware of with trends in growth rates. When we see a dip and then increase, the year following the dip is somewhat exaggerated due to the dip year. This applies to the 14% increase in 2010.
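
A small sketch of why a rebound year looks exaggerated. The counts below are invented purely for illustration (they are not Free Library data). The point is only that the rebound is measured against the dip year's lowered baseline.

```python
import math

# Hypothetical counts for three consecutive years: normal, dip, rebound.
counts = [100_000, 95_000, 108_000]


def pct_change(earlier: float, later: float) -> float:
    """Percent change from an earlier count to a later count."""
    return (later - earlier) / earlier * 100


dip_rate = pct_change(counts[0], counts[1])        # measured from the normal year
rebound_rate = pct_change(counts[1], counts[2])    # measured from the dip year
two_year_rate = pct_change(counts[0], counts[2])   # measured from before the dip

# Average yearly pace across both years, ignoring the dip's detour.
avg_two_year = (math.sqrt(counts[2] / counts[0]) - 1) * 100

print(f"dip year: {dip_rate:.1f}%")
print(f"rebound year: {rebound_rate:.1f}%")
print(f"two-year change: {two_year_rate:.1f}%")
print(f"average per year over the two years: {avg_two_year:.1f}%")
```

In this made-up case the rebound year alone reads as double-digit growth, while the pace over the two years together is much more modest.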

Finally, we come to my second point about the PEW report 80% figure being inflated “in the context of comparisons made in the chart.” (I prefer the term biased over inflated.) The bias comes from comparing relative growth in measures having very different orders of magnitude. As seen in the chart below, annual visit and circulation counts at The Free Library ranged from 5.5 to 7.5 million, whereas public computer uses peaked at a bit above 1 million. So we’re talking about a difference of a factor of 6 or 7 here.

As you may have surmised, the issue here is baselines (denominators) again! A 10% increase in circulation amounts to 700,000 units while a 10% increase in computer uses amounts to 100,000. In this example growth in circulation obviously outpaces growth in computer uses in absolute terms, despite the rates being identical. Comparing relative growth in measures of such different magnitudes is biased because it favors the measure with the smaller baseline. This next chart illustrates this bias:

The measure with the lowest 6-year increase gets all the glory!

The chart shows the net changes for the three measures presented in the PEW bar chart at the beginning of this post. The red annotations show the cumulative growth which each net count represents. (Data submitted by The Free Library to IMLS do not match the PEW chart: Based on IMLS data the 2011 cumulative rate of growth in circulation was 14.4%, not 11%. And the 2011 cumulative rate of growth in installed public computers was 20%, not 27%.)

Due to its low baseline (5 to 6 million lower than the visits and circulation baselines), public computer uses shows stellar growth! Meanwhile, the other two statistics hobble along at what the PEW researchers described as a modest pace.
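
To make the baseline effect concrete, here is a minimal sketch built on the 10% example given earlier (700,000 versus 100,000 units). The round baselines of 7 million and 1 million are the ones implied by that example, not exact Free Library counts, and the function names are hypothetical.

```python
# Round baselines implied by the post's 10% example (illustrative only).
CIRCULATION_BASE = 7_000_000
COMPUTER_USES_BASE = 1_000_000


def absolute_increase(base: int, pct: float) -> int:
    """Number of units added when `base` grows by `pct` percent."""
    return round(base * pct / 100)


def pct_needed(base: int, units: int) -> float:
    """Percent growth `base` needs in order to add `units`."""
    return units / base * 100


added_circulation = absolute_increase(CIRCULATION_BASE, 10)
added_computer_uses = absolute_increase(COMPUTER_USES_BASE, 10)

# Rate computer uses would need to match circulation's 700,000 added units.
rate_to_match = pct_needed(COMPUTER_USES_BASE, added_circulation)

# Rate circulation would need to add only 100,000 units.
rate_for_small_gain = pct_needed(CIRCULATION_BASE, added_computer_uses)

print(f"10% of circulation: {added_circulation:,} units")
print(f"10% of computer uses: {added_computer_uses:,} units")
print(f"computer uses growth needed to add 700,000: {rate_to_match:.0f}%")
print(f"circulation growth needed to add 100,000: {rate_for_small_gain:.1f}%")
```

Identical percentages hide a sevenfold difference in actual units, which is why comparing rates across measures of such different size favors the smaller one.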

How curious that the third-placed statistic was cast as the most impressive in the PEW report! Which makes me wonder whether PEW was intentionally praising the library for reaching a not-very-high bar, since it’s a lot easier to “grow” small programs than large ones. Getting circulation to increase by 14% was no meager accomplishment. Yet, this went unheralded in the PEW report.

Spinning data serves PEW’s agenda, as it does that of others who see the main function of public libraries as portals to the Internet. Of course, seeing through spun data requires un-spinning them. And the only things needed for this are time and sound quantitative thinking.

Incidentally, the story that the PEW bar chart *only began to tell*—about unrelenting demand for technology at libraries—didn’t exactly come true at The Free Library. Perhaps you noticed this in the chart of annual change in computer uses (the chart above with the single green line). The 14% rebound in 2010 occurred in a year when the number of computers remained unchanged. It’s doubtful the rebound can be attributed to longer hours, since the total hours The Free Library was open in 2010 were the lowest in 8 years. In 2012 the library had the second-lowest hours open over the same 8-year span. In 2012 growth in computer uses fell to a modest 4.6% even though the library had added 120 more computers. Looks like just plugging them in is no guarantee after all.

—————————

1 PEW Charitable Trusts Philadelphia Research Initiative, *The Library in the City: Changing Demands and a Challenging Future*, 2012, p. 10.

2 Pew Charitable Trusts Philadelphia Research Initiative, p. 14.