If You Plug Them In They Will Come

In their book What the Numbers Say, Derrick Niederman and David Boyum argue that the way to good quantitative thinking is practice, practice, practice! In that spirit, I offer this post as another exercise for sharpening the reader's numeracy skills.

A couple of months back I presented a series of statistical charts about large U.S. public library systems. Sticking with the theme of large public libraries, I thought I’d focus on one in particular, The Free Library of Philadelphia. This is because the PEW Charitable Trusts Philadelphia Research Initiative did an up-close analysis of The Free Library in 2012. So this post is a retrospective on that PEW report. Well, actually, on just this graph from the report:

PEW Philadelphia Report bar chart. Source: The Library in the City, PEW Charitable Trusts Philadelphia Research Initiative.

The PEW researchers interpreted the chart this way:

Over the last six years, a period in which library visits and circulation grew modestly, the number of computer sessions rose by 80 percent…These numbers only begin to tell the story of how the public’s demands on libraries are changing.1

The implication is that because demand for technology outgrew demand for traditional services by a factor of 8-to-1, The Free Library should get ready to plug in even more technological devices! This plan may have merit, but the evidence in the chart does not justify it. Those data tell quite a different story when you study them closely. So, let’s do that.

The main problem with the chart is that 80% is an exaggerated figure. It is potentially inflated on its own and is definitely inflated in the context of comparisons made in the chart. Let me begin with the first point.

The percentages in the PEW chart are cumulative rates, that is, rates of change calculated over multiple years. When cumulative rates are based on low initial values, the rates end up artificially inflated. This is clearly the case with a statement in the PEW report about digital downloads at the Phoenix Public Library increasing by more than 800%.2  When you see large 3- or 4-digit percentages, chances are the baseline (the denominator in the fraction that the percentage comes from) is too low. These percentages are so exaggerated they aren’t meaningful.

Another example of a measure on the upswing from low initial values is public computers in U.S. public libraries, seen in the chart below. The Institute of Museum and Library Services (IMLS) began collecting data on these in 1998.

Chart: Cumulative growth in public computers in U.S. public libraries, calculated from five baseline years.

The plotted lines indicate cumulative growth over time calculated from 5 different baseline years listed in the chart legend. Each line begins the year following the baseline year. For example, tracking growth based on the count of computers in 1998 begins in 1999 and extends to 2011. Growth based on the 2000 count begins in 2001. And so forth. In a sense, each baseline year begins with zero growth and increases from there.

The legend lists the count of computers for each of these years. (A peculiar use for a legend, I know. But the data are a bit easier to reference there than in a separate table or chart.) The arrow at the right edge of the horizontal axis indicates that in 2011 there were a total of 262,462 public computers in U.S. public libraries.

The calculation of how much growth 262,462 represents depends on what baseline year we choose. Using the 1998 count as the baseline (24,104, brown line) the 2011 count represents 989% growth. Using the 2002 level (99,453, orange line) the growth was 164%. Using 2004 (141,194, green line) gives 86%. And so on.
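If you'd like to verify these figures yourself, here is a minimal Python sketch using the counts cited above; the three baseline counts and the 2011 total are the only inputs:

    # Cumulative growth = (current count - baseline count) / baseline count.
    # Counts cited above, from the IMLS data on public computers.
    count_2011 = 262_462
    baselines = {1998: 24_104, 2002: 99_453, 2004: 141_194}

    for year, baseline in baselines.items():
        growth = (count_2011 - baseline) / baseline
        print(f"Baseline {year} ({baseline:,} computers): {growth:.0%} cumulative growth by 2011")

    # Prints 989%, 164%, and 86%, matching the figures above.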

The earlier the baseline year, the higher the 2011 cumulative rate. With each later baseline year the cumulative rate decreases, because the baseline counts increase steadily, as seen in the chart legend. So, gauging overall growth depends on how far back our hindsight stretches.

Now, let’s see how this dynamic plays out with The Free Library data. Computer use in 2005 at the library was not really a project startup, as is obvious from this chart:

Chart: Public computers at The Free Library of Philadelphia over time.

Nevertheless, we know the trend involves low early numbers (denominators) and high later numbers (numerators), as that’s the gist of the PEW story to begin with! And we know that with cumulative rates there is leeway in selecting the baseline year. So, for kicks let’s see how the PEW researchers’ choice compares with other baselines.

To do this we need The Free Library’s computer usage data. Unfortunately, I don’t have access to the data from the PEW report. And also unfortunately, there are some irregularities with the official data the library reported to IMLS—computer uses in 2006 and 2007 were apparently under-counted. Presuming the PEW report to be correct, I estimated usage for 2005 through 2007 by extrapolating backward from the 2011 IMLS count using PEW’s 80% figure. By my calculation computer use at the library was roughly 722,000 in 2005. These alternative counts, IMLS-reported versus extrapolated, are shown in this chart:

Chart: Computer use at The Free Library, IMLS-reported versus extrapolated counts.
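Here is the backward extrapolation as a short sketch; the 1.3 million figure is the approximate 2011 IMLS count implied by these numbers, so treat it as illustrative rather than exact:

    # If computer use grew 80% cumulatively from 2005 to 2011, the 2005 baseline
    # is the 2011 count divided by 1.80.
    count_2011 = 1_300_000        # approximate 2011 IMLS count (illustrative)
    pew_cumulative_growth = 0.80  # PEW's reported 2005-2011 growth

    count_2005 = count_2011 / (1 + pew_cumulative_growth)
    print(f"Estimated 2005 computer uses: {count_2005:,.0f}")  # roughly 722,000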

Using the extrapolated figures up until 2007 and the IMLS data thereafter, here’s what 2012 cumulative rates for this measure look like:

Chart: Cumulative growth in computer use at The Free Library, calculated from different baseline years.

Consistent with the PEW report, in the chart the 2011 value for the 2005 baseline (brown) line is marked at 80%. Of course, this amount increased in 2012 since the rate is cumulative. The black dot (a line with no length) indicates the cumulative rate of growth from 2011 to 2012, which is equivalent to the annual rate of growth in 2012.

So, is the 80% cumulative growth in 2011 exaggerated? Possibly, if a lower figure can be used just as easily. And there is no rule of thumb for choosing from among alternative cumulative rates, making the choice an arbitrary one.

Whatever choice analysts make, it’s hard to make sense of cumulative rates in general. On their own the rates—80%, 67%, 50% and so on—sound high. But how high are they really, with multiple years rolled into them? And how well are readers able to evaluate them? I suspect that they’re like weather forecasts that audiences translate impressionistically into low-medium-high probabilities.

Another drawback with cumulative rates is that, as summary numbers (that is, the values listed at the right ends of the lines in the chart), they hide year-to-year details. Like the dip that occurred in computer use at the Free Library in 2009. Or the big jump in public computer counts reported to IMLS in 1999 seen in the earlier chart.

A more straightforward way to describe growth in performance data is to track it annually. This would be the ideal route for the PEW study because it eliminates the baseline-year dilemma. Plus it provides readers with a more complete understanding of the data.

The next set of charts shows annual rates of growth in public computers in libraries along with rates for visits and circulation from 1999 to 2011. In chart A you can see that rates for public computer counts (green line) begin high due to smaller counts early on. But within a few years they fall pretty much in line with rates for established library services.

Charts A and B: Annual growth in public computers, visits, and circulation in U.S. public libraries.

Notice in chart B that in the aftermath of the Great Recession total counts of traditional services fell while total public computer uses grew. Still, the point is that over time technology growth rates settled into the single digits.

The next chart gives annual rates of growth in public computer use at The Free Library. (These same percentages appeared at the left ends of trend lines in the chart above that shows cumulative growth.)

Chart: Annual growth in public computer use at The Free Library.

Looking at this chart, would you have guessed that the pattern up to 2011 represents 80% cumulative growth? The annual percentages, on the other hand, do help us understand what is behind the 80% from the PEW report and the 88% in 2012. We can describe the 80% as a trend in which growth was at or somewhat above 10% in 5 out of 6 years, or, equivalently, as average annual growth of 10.3%. The same translation can be done for the 67% cumulative rate from 2006 to 2012: the 67% amounts to growth of roughly 10% or somewhat higher in 5 out of 7 years, or average annual growth of 9.0%.
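The translation from a cumulative rate to an average annual rate is simple if, as I am assuming here, the average is a compound (geometric) annual rate; a plain arithmetic mean of the yearly percentages would give slightly different values:

    # Equivalent average annual rate for a cumulative rate spread over n years,
    # assuming compound growth: (1 + cumulative) ** (1/n) - 1.
    def average_annual_rate(cumulative_rate, years):
        return (1 + cumulative_rate) ** (1 / years) - 1

    print(f"{average_annual_rate(0.80, 6):.1%}")  # 80% over 2005-2011 -> about 10.3% per year
    print(f"{average_annual_rate(0.67, 6):.1%}")  # 67% over 2006-2012 -> about 8.9%, roughly the 9.0% above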

As you can see, describing growth trends is a bit of a moving target! The time range selected has a major bearing on the answer. In any case annual rates are easier for the typical reader to comprehend.

There’s another thing to be aware of with trends in growth rates. When we see a dip and then increase, the year following the dip is somewhat exaggerated due to the dip year. This applies to the 14% increase in 2010.

Finally, we come to my second point about the PEW report's 80% figure being inflated "in the context of comparisons made in the chart."  (I prefer the term biased over inflated.) The bias comes from comparing relative growth in measures of very different magnitudes. As seen in the chart below, annual visit and circulation counts at The Free Library ranged from 5.5 to 7.5 million, whereas public computer uses peaked at a bit above 1 million. So we're talking about a difference factor of 6 or 7 here.

Chart: Annual visits, circulation, and public computer uses at The Free Library, 2005-2012.

As you may have surmised, the issue here is baselines (denominators) again! A 10% increase in circulation amounts to 700,000 units, while a 10% increase in computer uses amounts to 100,000. In this example growth in circulation obviously outpaces growth in computer uses, despite the rates being identical. Comparing relative growth in measures of such different magnitudes is biased because it favors the measure with the smaller baseline. This next chart illustrates the bias:

Chart: Net six-year changes in visits, circulation, and public computer uses at The Free Library. The measure with the lowest six-year increase gets all the glory!

The chart shows the net changes for the three measures presented in the PEW bar chart at the beginning of this post. The red annotations show the cumulative growth which each net count represents. (Data submitted by The Free Library to IMLS do not match the PEW chart: Based on IMLS data the 2011 cumulative rate of growth in circulation was 14.4%, not 11%. And the 2011 cumulative rate of growth in installed public computers was 20%, not 27%.)

Due to its low baseline—5 to 6 million lower than the visits and circulation baselines—public computer uses show stellar growth! Meanwhile, the other two statistics hobble along at what the PEW researchers described as a modest pace.

How curious that the third-place statistic was cast as the most impressive in the PEW report! It makes me wonder whether PEW was intentionally praising the library for clearing a not-very-high bar, since it's a lot easier to "grow" small programs than large ones. Getting circulation to increase by 14% was no meager accomplishment. Yet this went unheralded in the PEW report.

Spinning data serves PEW's agenda, as it does the agendas of others who see the main function of public libraries as portals to the Internet. Of course, seeing through spun data requires un-spinning them. And the only things needed for that are time and sound quantitative thinking.

Incidentally, the story that the PEW bar chart only began to tell—about unrelenting demand for technology at libraries—didn’t exactly come true at The Free Library. Perhaps you noticed this in the chart of annual change in computer uses (the chart above with the single green line). The 14% rebound in 2010 occurred in a year when the number of computers remained unchanged. It’s doubtful the rebound can be attributed to longer hours, since the total hours The Free Library was open in 2010 were the lowest in 8 years. In 2012 the library had the second-lowest hours open over the same 8-year span. In 2012 growth in computer uses fell to a modest 4.6% even though the library had added 120 more computers. Looks like just plugging them in is no guarantee after all.

—————————

1  PEW Charitable Trusts Philadelphia Research Initiative, The Library in the City: Changing Demands and a Challenging Future, 2012, p. 10.

2  Pew Charitable Trusts Philadelphia Research Initiative, p. 14.


Averages Gone Wrong

In this post I’ll be telling a tale of averages gone wrong. I tell it not just to describe the circumstances but also as a mini-exercise in quantitative literacy (numeracy), which is as much about critical thinking as it is about numbers. So if you’re game for some quantitative calisthenics, I believe you’ll find this tale invigorating. Also, you’ll see examples of how simple, unadorned statistical graphs are indispensable in data sleuthing!

Let me begin, though, with a complaint. I think we’ve all been trained to trust averages too much. Early in our school years we acquiesced to the idea of an average of test scores being the fairest reflection of our performance. Later in college statistics courses we learned about a host of theories and formulas that depend on the sacrosanct statistical mean/average. All of this has convinced us that averages are a part of the natural order of things.

But the truth is that the idea of averageness is a statistical invention, or more accurately, a sociopolitical convention.1 There are no such things as an average student, average musician, average automobile, average university, average library, average book, or an average anything. The residents of Lake Wobegon realized this a long time ago!

Occasionally our high comfort level with averages allows them to become conduits for wrong information. Such was the case with the average gone wrong in this table from a Public Library Funding and Technology Access Study (PLFTAS) report:

Table: Average total operating expenditures change. Source: Hoffman, J. et al. 2012, Libraries Connect Communities: Public Library Funding & Technology Access Study 2011-2012, p. 11.

The highlighted percentage for 2009-2010 is wrong. It is impossible for public libraries nationwide to have, on average, lost 42% of their funding in a single year. For that average to be true, almost all of the libraries would have had to endure cuts close to 40%, or else any libraries with smaller cuts (like 20% or less) would have had to be balanced by an equivalent number with more severe cuts (70% or greater). Either way, a groundswell of protests and thousands of news stories about libraries closing down would have appeared. These did not appear, of course. Nor did the official Institute of Museum & Library Services (IMLS) data show funding changes anywhere near the -42% in that table. The Public Libraries in the U.S. Survey data show the average expenditures decrease was -1.7%.2

Various factors could have caused the 2010 PLFTAS percentage to be so far off. I suspect that two of these were an over-reliance on statistical averages and the way the averages were calculated.

Since the percentages in the table describe annual changes, they are rates. Rates, you will recall, are how given numbers compare to base figures, like miles per gallon, visits per capita, or number of influenza cases per 1,000 adults. The rates in the PLFTAS table indicate how each year's average library expenditures compare with the prior year's. The chart title labels the data average total operating expenditures change.

That label is somewhat ambiguous due to use of the terms average and total together. Usually, a number cannot simultaneously be an average and a total. The percentages in the chart are based on a measure named total operating expenditures, which is the sum of staffing, collection, and other expenditures at an individual library outlet. So, total refers to totals provided by the library outlets, not a total calculated by the researchers from data for the entire group of outlets surveyed.

The title’s wording is ambiguous in another, more significant way. To elaborate, let me first abbreviate total operating expenditures as expenditures, making the phrase average expenditures change. Both the chart title and my revised phrase are ambiguous because they can be interpreted in two ways:


Interpretation                      Meaning
Average change in expenditures      Average rate of change in expenditures
Change in average expenditures      Rate of change in average expenditures

Two Interpretations of the Phrase Average Expenditures Change

Tricky, isn’t it? It turns out that percentages from the PLFTAS table fall under the second interpretation, change in average expenditures. That is, the percentages are rates of change in a set of annual averages. The data in the table are the rates while the averages appear elsewhere in the PFTAS reports.3

As explained in my prior post, averages—as well as medians, totals, and proportions—are aggregate measures. Aggregate measures are single numbers that summarize an entire set of data. Thus, we can say more generally that the PLFTAS data are changes in an aggregate measure (an average). Tracking aggregate library measures of one type or another is quite common in library statistics. Here is an example:

Charts: Annual Library Visit Totals and Rates of Change in the Totals. Source: Lyons, R. 2013. Rainy Day Statistics: U.S. Public Libraries and the Great Recession, Public Library Quarterly, 32:2, 106-107.

The upper chart tracks annual visit totals (aggregate measures) and the lower tracks rates of change in these. The annual rate of change in any measure, including aggregate measures, is calculated as follows:

Annual rate of change = (current year value - prior year value) / prior year value x 100%

This is exactly how the PLFTAS researchers calculated their—oops…I almost typed average rates! I mean their rates of change in the averages. They compared how much each year’s average expenditure level changed compared to the prior year.

In the earlier table the alternative interpretation of the phrase average expenditures change is average rate of change in expenditures. This type of average is typically called an average rate, which is short-hand for average rate of change in a given measure. An average rate is an average calculated from multiple rates we already have on hand. For example, we could quickly calculate an average rate for the lower of the two line charts above. The average of the 5 percentages there is 3.0%. For the rates in the PLFTAS table the average is -9.6%. In both instances these averages are 5-year average rates.

However, 5-year rates aren't very useful to us here because they mask the annual details that interested the PLFTAS researchers. We can, though, devise an average rate that does incorporate detailed annual expenditure data. We begin by calculating an individual rate for each of the 6,000-8,000+ library outlets that participated in the PLFTAS studies, following the rate-of-change formula above. We do this for each of the 5 years. Then, for each year, we calculate an average of the 6,000-8,000+ rates. Each of the 5 resulting rates is the average rate of change in total operating expenditures for one year.
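The difference between the two calculations is easiest to see in a small sketch. The dollar figures below are made up (three outlets, two years), not PLFTAS data; they just show that the two summaries can diverge sharply:

    # Made-up expenditures for three outlets: (year 1, year 2) in dollars.
    outlets = {
        "A": (100_000, 105_000),      # small outlet, +5%
        "B": (200_000, 210_000),      # mid-size outlet, +5%
        "C": (5_000_000, 4_000_000),  # large outlet, -20%
    }

    # Average rate of change: compute each outlet's own rate, then average the rates.
    rates = [(y2 - y1) / y1 for y1, y2 in outlets.values()]
    average_rate = sum(rates) / len(rates)

    # Rate of change in the average: average the dollars first, then compute one rate.
    avg_y1 = sum(y1 for y1, _ in outlets.values()) / len(outlets)
    avg_y2 = sum(y2 for _, y2 in outlets.values()) / len(outlets)
    rate_of_average = (avg_y2 - avg_y1) / avg_y1

    print(f"Average rate of change:         {average_rate:.1%}")     # -3.3%
    print(f"Rate of change in the average:  {rate_of_average:.1%}")  # -18.6%

The large outlet dominates the second figure, which is the dollar-weighting effect described in footnote 4.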

Obviously, tracking how thousands of individual cases, on average, change each year is one thing, and tracking how a single aggregate measure like an average or total changes is quite another. The chart below shows how these types of rates differ:

Chart: Three Different Types of Library Expenditure Rates.

The data are for 9000+ libraries that participated in the IMLS Public Libraries in the U.S. Survey in any of the 5 years covered. Notice that rates for the aggregate measures (red and green lines) decrease faster over time than the average rate (blue line). Since thousands of individual rates were tabulated into the average rate, this rate is less susceptible to fluctuations due to extreme values reported by a small minority of libraries.

On the other hand, rates for totals and averages are susceptible to extreme values reported by a small minority, mainly because the calculation units are dollars instead of rates (percentages).4 This susceptibility would usually involve extreme values due to significant funding changes at very large libraries. (A 1% budget cut at a $50 million library would equal the entire budget of a $500,000 library, and a 10% cut would equal a $5 million one!) Or fluctuations could be caused simply by data for two or three very large libraries being missing in a given year. For the PLFTAS studies, the likelihood of non-response by large systems would probably be higher than in the IMLS data.

The other striking thing visible in the line graph above is how trends in rates of change in totals and averages (red and green lines) are nearly identical. So, tracking rates in average funding pretty much amounts to tracking total funding. (Makes sense, since an average is calculated directly from the total.)

Now the question becomes, which type of rate is better for understanding library funding changes—rate of change in an average or an average rate? I honestly cannot say for sure. Clearly, each can slant the outcome in certain ways, although that isn't necessarily a bad thing. It all depends on what features of the data we're hoping to represent.

Regardless, the lesson is that an unexamined average can be very deceptive. For this reason, it’s always smart to study the distribution (spread) of our data closely. As it happens, staring out of the pages of one PLFTAS report is the perfect data distribution for the -42% mystery discussed here. Beginning with the 2009-2010 edition the PLFTAS studies asked library outlets to report how much funding change they experienced annually. The responses appear in the report grouped into the categories appearing in the left column of this table:

Table: Distribution of Reported Changes in 2010 Library Funding. Adapted from: Bertot et al. 2010, 2009-2010 Public Library Funding & Technology Access Survey: Survey Findings and Results.

Presuming the data to be accurate, they are strong evidence that the -42% average decrease could not be right. The mere fact that funding for 25% of the library outlets was unchanged lowers the chances that the average decrease would be -42%. Add to this the percentages in the greater-than-0% categories (top 5 rows) and any possibility of such a severe decrease is ruled out.

This argument is even more compelling when visualized in traditional statistical graphs (rather than some silly infographic layout). The graphs below show the distributions of data from the table above and corresponding tables in the 2011 and 2012 PLFTAS reports.5 The first graphic is a set of bar charts, one for each year the PLFTAS study collected the data:

Charts: Depiction of Distribution of Budget Changes from 2010-2012 as Bar Charts.

Perhaps you recognize this graph as a trellis chart (introduced in my prior post) since the 3 charts share a single horizontal axis. Notice in that axis that the categories from the PLFTAS table above are now sorted low-to-high with 0% in the middle. This re-arrangement lets us view the distribution of the data. Because the horizontal axis contains an ordered numeric scale (left-to-right), these bar charts are actually equivalent to histograms, the graphical tools of choice for examining distributions. The area covered by the adjacent bars in a histogram directly reflects the quantities of data falling within the intervals indicated on the horizontal axis.

From the bar charts we see that the distributions for the 3 years are quite similar. Meaning, for one thing, that in 2010 there was no precipitous drop or anything else atypical. We also see that the 0% category contains the most outlets in every year. After that the intervals 0.1 to 2% and 2.1 to 4% account for the most outlets. Even without summing the percentages above the bars we can visually estimate that a majority of outlets fall within the 0% to 4% range. Summing the 2010 percentages for 5 categories 0% or higher we find that 69% of the outlets fall within this range. For 2011 the sum is also 69% and for 2012 it is 73%.
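That 69% figure also supports a quick bounding argument. Treating the reported figure as an unweighted average of outlet-level changes (a simplifying assumption), even the most extreme scenario for the remaining outlets cannot produce a -42% average:

    # Worst case: the 69% of outlets at 0% or better all sit exactly at 0%,
    # and every other outlet loses its entire budget (-100%).
    share_at_or_above_zero = 0.69
    worst_possible_average = share_at_or_above_zero * 0.0 + (1 - share_at_or_above_zero) * -1.0
    print(f"{worst_possible_average:.0%}")  # -31%, still well short of -42%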

Visually comparing the distributions is easier with the next set of graphs, a line chart and a 3-D area chart. I usually avoid 3-D graphics completely since they distort things so much. (On the horizontal axis, can your eyes follow the 0% gridline beneath the colored slices to the back plane of the display?) Here I reluctantly use a 3-D chart because it does give a nice view of the distributions' outlines, better than the line chart or separate bar charts. So, I hereby rescind my policy of never using 3-D graphics! But I stick by this guiding principle: Does the graphical technique help us understand the data better?

Charts: Depictions of Distribution of Budget Changes from 2010-2012 in a Line Chart and a 3-D Area Chart.

Notice that the horizontal axes in these charts are identical to the horizontal axis in the bar charts. Essentially, the line chart overlays the distributions from the bar charts, confirming how similar these three are. This chart is also useful for comparing specific values within a budget change category or across categories. On the other hand, the closeness of the lines and the numerous data labels interfere with viewing the shapes of the distributions.

Here’s where the 3-D chart comes in. By depicting the distributions as slices the 3-D chart gives a clear perspective on their shapes. It dramatizes (perhaps too much?) the sharp slopes on the negative side of 0% and more gradual slopes on the positive side. Gauging the sizes of the humps extending from 0% to 6% it appears that the bulk of library outlets had funding increases each year.

So, there you have it. Despite reports to the contrary, the evidence available indicates that no drastic drop in public library funding occurred in 2010. Nor did a miraculous funding recovery restore the average to -4% in 2011. (Roughly, this miracle would have amounted to a 60% increase.) Accuracy-wise, I suppose it’s some consolation that in the end these two alleged events did average out!

—————————

1   Desrosières, A. 1998. The Politics of Large Numbers: A History of Statistical Reasoning. Cambridge MA: Harvard University Press. See chapters 2 & 3.
2   Based on IMLS data, the 2009 average expenditures were $1.19 million and the 2010 average was $1.17 million, a 1.7% decrease. Note that I calculated these averages directly from the data. Beginning in 2010, IMLS changed the data appearing in its published tables to exclude libraries outside the 50 states and entities not meeting the library definition. So it was impossible to get comparable totals for 2009 and 2010 from those tables.
3   I corresponded with Judy Hoffman, primary author of the study, who explained the calculation methods to me. The figures necessary for arriving at the annual averages appear in the detailed PLFTAS reports available here.
4   This is something akin to political voting. With the average rate each library outlet submits its vote—the outlet’s individual rate of expenditure change. The range of these will be relatively limited, theoretically from -100% to 100%. In practice, however, very few libraries will experience funding increases higher than 40% or decreases more severe than -40%. Even if a few extreme rates occur, these will be counter-balanced by thousands of rates less than 10%. Therefore, a small minority of libraries with extreme rates (high or low) cannot sway the final results very much.
   With the calculation of annual averages, each library votes with its expenditure dollars. These have a much wider range—from about $10 thousand to $100 million or more. With aggregate measures like totals, means/averages, and medians, each library's vote is essentially weighted in proportion to its funding dollars. Due to the quantities involved, aggregate library measures are affected much more by changes at a few very large libraries than by changes at a host of small libraries.
5    The data are from the sequence of annual reports entitled Public Library Funding and Technology Access Survey: Survey and Findings available at the University of Maryland Information Policy & Access Center.
See table 66, p. 61 in the 2009-2010 report; table 53, p. 54 in the 2010-2011 report; and table 57, p. 65 in the 2011-2012 report.


I Think That I Shall Never See…

This post is about a much discussed question: How did the Great Recession affect U.S. public libraries? I'm not really going to answer the question, as that would amount to a lengthy journal article or two. But I am going to suggest a way to approach the question using data from the Institute of Museum and Library Services (IMLS) Public Libraries in the United States Survey. Plus I'll be demonstrating a handy data visualization tool known as a trellis chart that you might want to consider for your own data analysis tasks. (Here are two example trellis charts in case you're curious. They are explained further on.)

As for the recession question, in the library world most of the discussion has centered on pronouncements made by advocacy campaigns: Dramatic cuts in funding. Unprecedented increases in demand for services. Libraries between a rock and hard place. Doing more with less. And so forth.

Two things about these pronouncements make them great as soundbites but problematic as actual information. First, the pronouncements are based on the presumption that looking at the forest—or at the big picture, to mix metaphors—tells us what we need to know about the trees. But it does not.

In the chart below you can see that the Great Recession had no general, across-the-board effect on public library funding. Some libraries endured severe funding cuts, others more moderate cuts, others lost little or no ground, while the majority of libraries actually had funding increases in the aftermath of the recession.

Chart: Distribution of five-year cumulative changes in inflation-adjusted operating expenditures at U.S. public libraries. Bars to the left of the zero line reflect libraries with decreases; bars to the right, increases. A change of -10% = a 10% decrease. A change of 10% = a 10% increase.

In the chart note that 35% of libraries had 5-year inflation-adjusted cumulative decreases of one size or another. Of these libraries, about half (18% of all libraries) had decreases of 10% or greater and half (17% of all libraries) had decreases of less than 10%. The other 65% of libraries had cumulative increases of some size. Of the libraries with increases, two-thirds (43% of all libraries) had increases of 10% or greater and one-third (22% of all libraries) had increases of less than 10%. By the way, expenditure data throughout this post are adjusted for inflation because using unadjusted (face-value) figures would understate actual decreases and overstate actual increases.1
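For the record, the adjustment itself is simple. This sketch follows the GDP deflator approach described in footnote 1, using the 6.7% cumulative 2007-to-2011 inflation figure given there (a single cumulative factor rather than year-by-year deflators):

    # Re-express 2007 dollars as constant 2011 dollars using cumulative inflation.
    cumulative_inflation_2007_2011 = 0.067  # from footnote 1

    def to_2011_dollars(amount_2007):
        return amount_2007 * (1 + cumulative_inflation_2007_2011)

    print(f"${to_2011_dollars(30_000_000):,.0f}")  # $30M in 2007 is about $32M in 2011 dollars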

The second problem with the advocacy pronouncements as information is their slantedness. Sure, library advocacy is partial by definition. And we promote libraries based on strongly held beliefs about their benefits. So perhaps the sky-is-falling messages about the Great Recession were justifiable in case they actually turned out to be true. Yet many of these messages were contradicted by the available evidence. Most often the messages involved reporting trends seen only at a minority of libraries as if these applied to the majority of libraries. That is essentially what the pronouncements listed above do.

A typical example of claims that contradict actual evidence appeared in the Online Computer Library Center (OCLC) report Perceptions of Libraries, 2010. Data in that report showed that 69% of Americans did not feel the value of libraries had increased during the recession. Nevertheless, the authors pretended that the 31% minority spoke for all Americans, concluding that:

Millions of Americans, across all age groups, indicated that the value of the library has increased during the recession.2

In our enthusiasm for supporting libraries we should be careful not to be dishonest.

But enough about information accuracy and balance. Let’s move on to some nitty-gritty data exploration! For this I want to look at certain trees in the library forest. The data we’ll be looking at are just for urban and county public library systems in the U.S. Specifically, the 44 libraries with operating expenditures of $30 million or more in 2007.3 The time period analyzed will be 2007 to 2011, that is, from just prior to the onset of the Great Recession to two years past its official end.

Statistically speaking, a forest-perspective can still compete with a tree-perspective even with a small group of subjects like this one. Here is a graph showing a forest-perspective for the 44 libraries:

Chart: Median collection expenditures for large U.S. urban libraries.

You may recall that a statistical median is one of a family of summary (or aggregate) statistics that includes totals, means, ranges, percentages/proportions, standard deviations, and the like. Aggregate statistics are forest statistics. They describe a collective as a whole (forest) but tell us very little about its individual members (trees).

To understand subjects in a group we, of course, have to look at those cases in the data. Trellis charts are ideal for examining individual cases. A trellis chart—also known as a lattice chart, panel chart, or small multiples—is a set of statistical graphs that have been arranged in rows and columns. To save space the graphs’ axes are consolidated in the trellis chart’s margins. Vertical axes appear in the chart’s left margin and the horizontal axes in the bottom or top margin or both.

Take a look at the chart below which presents data from agricultural experiments done in Minnesota in the 1930’s. It happens that the data depicted there are famous because legendary statistician R. A. Fisher published them in his classic 1935 book, The Design of Experiments. Viewing the data in a trellis chart helped AT&T Bell Laboratories statistician William Cleveland discover an error in the original data that went undetected for decades. The story of this discovery both opens and concludes Cleveland’s 1993 book Visualizing Data.4

The core message of Cleveland’s book is one I’ve echoed here and here: Good data visualization practices can help reveal things about data that would otherwise remain hidden.5

Trellis chart depicting 1930's agricultural experiments data. Source: www.trellischarts.com.

At the left side of the chart notice that a list of items (these are barley seed varieties) serves as labels for the vertical axes for three graphs in the top row. The list is repeated again as axes labels for the graphs in the second row. On the bottom of the chart repeated numbers (20 to 60) form the horizontal scales for the two graphs in each column. The layout of a trellis chart provides more white space so that the eye can concentrate on the plotted data alone, in this case circles depicting experimental results for 1931 and 1932.

With multiple graphs arranged side by side, a trellis chart makes it easy to see how different cases (aka research subjects) compare on a single measure. The chart below shows more about how this works using library data:

Trellis chart example with library collection expenditures data.

The chart presents collection expenditures as a percent of total operating expenditures from 2007 to 2011. The cases are selected libraries as labeled. Notice how easy it is to identify the line shapes—like the humped lines of Atlanta, Baltimore, Cuyahoga Co., and Hawaii. And the bird-shapes of Brooklyn and Hennepin Co. And the District of Columbia’s inverted bird. Trellis charts make it easy to find similarities among individual trends, such as the fairly flat lines for Baltimore Co., Broward Co., Cincinnati, Denver, and King Co. Nevertheless, the charts presented here are more about identifying distinct patterns in single graphs. Each graph tells a unique story about a given library’s changes in annual statistics.
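If you would like to try the format on your own data, here is a minimal sketch using Python's matplotlib. The library names and percentages are placeholders, not IMLS data; the point is the shared axes and the grid layout:

    import matplotlib.pyplot as plt

    # Placeholder data: collection expenditures as a percent of total, 2007-2011.
    years = [2007, 2008, 2009, 2010, 2011]
    libraries = {
        "Library A": [12.5, 13.0, 12.2, 10.8, 11.4],
        "Library B": [18.3, 18.9, 17.5, 15.2, 15.9],
        "Library C": [9.8, 10.1, 10.6, 9.4, 8.7],
        "Library D": [15.0, 14.2, 13.8, 12.9, 13.3],
    }

    # One small panel per library; shared axes keep the panels directly comparable.
    fig, axes = plt.subplots(2, 2, sharex=True, sharey=True, figsize=(8, 5))
    for ax, (name, values) in zip(axes.flat, libraries.items()):
        ax.plot(years, values, marker="o", color="steelblue")
        ax.set_title(name, fontsize=9)
        ax.set_xticks(years)

    for ax in axes[:, 0]:
        ax.set_ylabel("Collections as % of total")

    fig.suptitle("Trellis chart sketch: collection expenditures as a percent of total")
    fig.tight_layout()
    plt.show()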

Incidentally, the trellis charts to follow have been adapted slightly to accommodate cases with exceptionally high data values. Instead of appearing in alphabetical order with other libraries in the chart, graphs for cases with high values appear in the far right column as shown in this one-row example:

Row from a trellis chart with the high-value graph shaded and labeled in red.

Notice that the graph at the right is shaded with its vertical axis clearly labeled in red, whereas the vertical axes for the non-shaded/black-lettered graphs appear at the left margin of the chart row. In this post all shaded/red-lettered graphs have scaling different from the rest of the graphs in the chart. By using extended scaling just for libraries with high values, the scaling for the rest of the libraries can be left intact.6

With that explanation out of the way, let’s look for some stories about these 44 urban and county libraries beginning with total operating expenditures:

Oper Expend Chart #1

Chart #1 interactive version

Oper Expend Chart #2

Chart #2 interactive version

Total Operating Expenditures. Click text links for interactive charts.

Take a moment to study the variety of patterns among the libraries in these charts. For instance, in chart #1 Brooklyn, Broward County, Cleveland, and Cuyahoga Co. all had expenditure levels that decreased significantly by 2011. Others like Denver, Hawaii, Hennepin Co., Houston, Multnomah Co., and Philadelphia had dips in 2010 (the Great Recession officially ended in June of the prior year) followed by immediate increases in 2011. And others like Boston, Orange Co. CA, San Diego Co., and Tampa had their expenditures peak in 2009 and decrease afterwards.

Now look at collection expenditures in these next two charts. You can see, for instance, that these dropped precipitously over the 4-year span for Cleveland, Los Angeles, Miami, and Queens. For several libraries including Atlanta, Baltimore, and Columbus expenditures dipped in 2010 followed by increases in 2011. Note also other variations like the stair-step upward trend of Hennepin Co., Houston’s bridge-shaped trend, the 2009 expenditure peaks for King Co., Multnomah, San Diego Co., and Seattle, and Chicago’s intriguing sideways S-curve.

Coll Expend Chart #1

Chart #1 interactive version

Coll Expend Chart #2

Chart #2 interactive version

Collection Expenditures. Click text links for interactive charts.

Again, with trellis charts the main idea is visually scanning the graphs to see what might catch your eye. Watch for unusual or unexpected patterns although mundane patterns might be important also. It all depends on what interests you and the measures being viewed.

Once you spot an interesting case you’ll need to dig a little deeper. The first thing to do is view the underlying data since data values are typically omitted from trellis charts. For instance, I gathered the data seen in the single graph below for New York:

Chart: New York Public Library collection expenditures. Investigating a trend begins with gathering detailed data.

The example trellis chart presented earlier showed collection expenditures as a percent of total operating expenditures. This same measure is presented in the next charts for all 44 libraries, including links to the interactive charts. Take a look to see if any trends pique your curiosity.

Coll Expend as pct chart #1

Chart #1 interactive version

Coll Expend as pct chart #2

Chart #2 interactive version

Percent Collection Expenditures. Click text links for interactive charts.

Exploring related measures at the same time can be revealing also. For example, collection expenditure patterns are made clearer by seeing how decreases in these compare to total expenditures. And how collection expenditures as a percentage of total expenditures relate to changes in the other two measures. The charts below make these comparisons possible for the 4 libraries mentioned earlier—Cleveland, Los Angeles, Miami, and Queens:

Multiple collection measures

Chart #1 interactive version

Multiple measures with data values

Chart #2 interactive version

Understanding collection expenditure trends via multiple measures. Chart #1, trends alone. Chart #2, data values visible. Click text links for interactive charts.

The next step is analyzing the trends and comparing relevant figures, with a few calculations (like percentage change) thrown in. Cleveland’s total expenditures fell continuously from 2007 to 2011, with a 20% cumulative decrease. The library’s collection expenditures decreased at nearly twice that rate (39%). As a percent of total expenditures collection expenditures fell from 20.4% to 15.6% over that period. Still, before and after the recession Cleveland outspent the other three libraries on collections.

From 2007 to 2010 Los Angeles’ total expenditures increased by 6% to $137.5 million, then dropped by 18% to $113.1 million. Over the 4-year span this amounted to a 13% decrease. For that same period Los Angeles’ collection expenditures decreased by 45%.

By 2010 Miami’s total expenditures had steadily increased by 38% to $81.8 million. However, in 2011 these fell to $66.7 million, a 17% drop from 2010 level but an increase of 13% over the 2007 level. Miami’s collection expenditures decreased by 78% over from 2007 to 2011, from $7.4 million to $1.6 million.

Total expenditures for Queens increased by 17% from 2007 to 2009, the year the Great Recession ended. However, by 2011 these expenditures dropped to just below 2007 levels, a 2% cumulative loss over the 4 years and a 19% loss from the 2009 level. From 2007 to 2011, though, Queens' collection expenditures declined by 63%, or $7.3 million.

Talk about data telling stories! Three of the 4 libraries saw the percentage of total expenditures spent on collections fall below 6% in the aftermath of the recession. To investigate these figures further we would need to obtain more information from the libraries.

As you can see, trellis charts are excellent tools for traipsing through a data forest, chart by chart and tree by tree. Obviously this phase takes time, diligence, and curiosity. Just 44 libraries and 5 years’ worth of a half-dozen measures produces a lot of data! But the effort expended can produce quite worthwhile results.

If you're curious about other interesting trends, the next two sets of charts show visits and circulation for the 44 urban and county public libraries. Looking quickly, I didn't see much along the lines of unprecedented demand for services. Take a gander yourself and see if any stories emerge. I hope there isn't bad news hiding there. (Knock on wood.)

Visits

Chart #1 interactive version

Visits chart #2

Chart #2 interactive version

Visits. Click text links for interactive charts.

Circ Chart #1

Chart #1 interactive version

Circ Chart #2

Chart #2 interactive version

Circulation. Click text links for interactive charts.

 
—————————

1   The 2007 through 2010 expenditure data presented here have been adjusted for inflation. The data have been re-expressed as constant 2011 dollars using the GDP Deflator method specified in IMLS Public Libraries in the United States Survey: Fiscal Year 2010 (p. 45). For example, because the cumulative inflation rate from 2007 to 2011 was 6.7%, if a library's total expenditures were $30 million in 2007, then for this analysis that 2007 figure was adjusted to $32 million.
   Standardizing the dollar values across the 4-year period studied is the only way to get an accurate assessment of actual expenditure changes. A 2% expenditure increase in a year with 2% annual inflation is really no expenditure increase. Conversely, a 2% expenditure decrease in a year with 2% annual inflation is actually a 4% expenditure decrease.
2   Online Computer Library Center, Perceptions of Libraries, 2010: Context and Community, p. 44.
3   In any data analysis where you have to create categories you end up drawing lines somewhere. To define large urban libraries I drew the line at $30 million total operating expenditures. Then, I based this on inflation adjusted figures as described in footnote #1. So any library with unadjusted total operating expenditures equal to or exceeding $28.2 million in 2007 was included.
4   See anything unusual in the chart? (Hint: Look at the chart labeled Morris.) The complete story about this discovery can be found here. Page down to the heading Barley Yield vs. Variety and Year Given Site. See also William S. Cleveland’s book, Visualizing Data, pp. 4-5, 328-340.
5   Using ordinary graphical tools statistician Howard Wainer discovered a big mistake in data that were 400+ years old! His discovery is described in his 2005 book, Graphic Discovery: A Trout in the Milk and Other Visual Adventures. Wainer uncovered anomalies in data appearing in an article published in 1710 by Queen Anne’s physician, John Arbuthnot. The original data were registered christenings and burials collected in England from 1620 to 1720 at the orders of Oliver Cromwell. See Wainer, H. Graphic Discovery, 2005, pp.1-4.
6    The chart below illustrates how the vertical scale affects the shape of a trend line. The scale in the left graph ranges from $25M to $100M, while the scale of the right graph ranges from $25M to $200M. Because the left graph's scaling is more spacious (a smaller range), its trend line angles are more accentuated.

Different Axes Example



Roughly Wrong

I decided to move right on to my first 2014 post without delay. The reason is the knot in my stomach that developed while viewing the Webjunction webinar on the University of Washington iSchool Impact Survey. The webinar, held last fall, presented a new survey tool designed for gathering data about how public library patrons make use of library technology and what benefits this use provides them.

Near the end of the webinar a participant asked whether the Impact Survey uses random sampling and whether results can be considered to be statistically representative. The presenter explained that the survey method is not statistically representative since it uses convenience sampling (a topic covered in my recent post). And she confirmed that the data only represent the respondents themselves. And that libraries will have no way of knowing whether the data provide an accurate description of their patrons or community.

Then she announced that this uncertainty and the whole topic of sampling were non-issues, saying, “It really doesn’t matter.” She urged attendees to set aside any worries they had about using data from unrepresentative samples, saying these samples portray “real people doing these real activities and experiencing real outcomes.” And that the samples provide “information you can put into use.”

As well-meaning as the Impact Survey project staff may be, you have to remember their goal is selling their product, which they just happen to have a time-limited introductory offer for. Right now the real issues of data accuracy and responsible use of survey findings are secondary or tertiary to the project team. They could have chosen the ethical high road by proactively discussing the strengths and weaknesses of the Impact Survey. And instructing attendees about appropriate ways to interpret the findings. And encouraging their customers to go the extra mile to augment the incomplete (biased) survey with data from other sources.

But this is not part of their business model. You won’t read about these topics on their website. Nor were they included in the prepared Webjunction presentation last fall. If the issue of sampling bias comes up, their marketing tactic is to “comfort” (the presenter’s word) anyone worried about how trustworthy the survey data are.

The presenter gave two reasons for libraries to trust data from unrepresentative samples: (1) A preeminent expert in the field of program evaluation said they should; and (2) the University of Washington iSchool’s 2010 national study compared its convenience sample of more than 50,000 respondents with a smaller representative sample and found the two samples to be pretty much equivalent.

Let’s see whether these are good reasons. First, the preeminent expert the presenter cited is Harry P. Hatry, a pioneer in the field of program evaluation.1  She used this quote by Hatry: “Better to be roughly right than to be precisely ignorant.”2  To understand Hatry’s statement we must appreciate the context he was writing about. He was referring mainly to federal program managers who opted to not survey their users at all rather than attempt to meet high survey standards promoted by the U.S. Office of Management and Budget. Hatry was talking about the black-and-white choice of high methodological rigor versus doing nothing at all. The only example of lower versus higher precision survey methods he mentioned is mail rather than telephone surveys. Nowhere in the article does he say convenience sampling is justified.

The Impact Survey team would have you believe that Hatry is fine with public agencies opting for convenient and cheap data collection methods without even considering the alternatives. Nevertheless, an Urban Institute manual for which Hatry served as advisor, Surveying Clients About Outcomes, encourages public agencies to first consider surveying their complete roster of clientele. If that is not feasible, public agencies should then use a sampling method that makes sure findings "can be projected reliably to the full client base."3  The manual does not discuss convenience sampling as an option.

Data accuracy is a big deal to Hatry. He has a chapter in the Handbook of Practical Program Evaluation about using public agency records in evaluation research. There you can read page after page of steps evaluators should follow to assure the accuracy of the data collected. Hatry would never advise public agencies to collect whatever they can, however they can, and use it however they want regardless of how inaccurate or incomplete it is. But that is exactly the advice of the Impact Survey staff when they counsel libraries that sample representativeness doesn’t really matter.

The Impact Survey staff would like libraries to interpret roughly right to mean essentially right. But these are two very different things. When you have information that is roughly right, that information is also roughly wrong. (Statisticians call this situation uncertainty, and the degree of wrongness, error.) The responsibility of a quantitative analyst here is exactly that of an information professional. She must assess how roughly right/wrong the information is. And then communicate this assessment to users of the information so they can account for this in their decision-making. If they do not consider the degree of error in their data, the analyst and decision-makers are replacing Hatry’s precise ignorance with the more insidious ignorance of over-confidence in unvetted information.4

The second reason the presenter gave for libraries not worrying about convenience samples was an analysis from the 2010 U.S. Impact Public Library Study. She said that study researchers compared their sample of 50,000+ self-selected patrons with another sample they had which they considered to be representative. They found that patterns in the data from the large convenience sample were very similar to those in the small representative sample. She explained, “Once you get enough data you start seeing a convergence between what is thought of as a representative sample…and what happens in a convenience sample.”

So, let me rephrase this. You start by attracting thousands and thousands of self-selected respondents from the population you’re interested in. And you continue getting more and more self-selected respondents added to this. When your total number of respondents gets really large, then the patterns in this giant convenience sample begin to change so that they now match patterns found in a small representative sample drawn from that same population. Therefore, very large convenience samples should be just as good as fairly small representative samples.

Assuming this statistical effect is true, how would this help improve the accuracy of small convenience samples at libraries that sign up for the Impact Survey? Does this statistical effect somehow trickle down to the libraries’ small samples, automatically making them the equivalent of representative samples? I don’t think so. I think that, whatever statistical self-correction occurred in the project’s giant national sample, libraries using this survey tool are still stuck with their small unrepresentative samples.5

While it is certainly intriguing, this convergence idea doesn’t quite jibe with the methodology of the 2010 study. You can read in the study appendix or in my prior post about how the analysis worked in the opposite direction. The researchers took great pains to statistically adjust the data in their convenience sample (web survey) in order to counter its intrinsic slantedness. Using something called propensity scoring they statistically reshaped the giant set of data to align it with the smaller (telephone) sample, which they considered to be representative. All of the findings in the final report were based on these adjusted data. It would be very surprising to learn that they later found propensity scoring to be unnecessary because of some statistical effect that caused the giant sample to self-correct.
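For readers unfamiliar with the technique, here is a rough sketch of what propensity-score weighting generally looks like in code, using scikit-learn. The data frames, column names, and the odds-based weighting formula are illustrative assumptions on my part, not the study's actual procedure:

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    def propensity_weights(web_df, phone_df, covariates):
        """Weights that nudge a convenience (web) sample toward a reference (phone) sample."""
        combined = pd.concat([web_df[covariates], phone_df[covariates]], ignore_index=True)
        # Label which sample each respondent came from: 0 = web, 1 = phone (reference).
        in_reference = np.r_[np.zeros(len(web_df)), np.ones(len(phone_df))]

        # Model the probability of belonging to the reference sample given respondent traits.
        model = LogisticRegression(max_iter=1000).fit(combined, in_reference)
        p_reference = model.predict_proba(web_df[covariates])[:, 1]

        # Respondent types over-represented in the web sample get small weights,
        # under-represented types get large ones.
        weights = p_reference / (1 - p_reference)
        return weights / weights.mean()  # normalize so the weights average to 1

However the adjustment was done, the point stands: it worked from the representative sample toward the convenience sample, not the other way around.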

As you can see, the Impact Survey staff’s justifications for the use of convenience sampling aren’t convincing. We need to rethink the idea of deploying quick-and-easy survey tools for the sake of library advocacy. As currently conceived, these tools require libraries to sacrifice certain of their fundamental values. Gathering and presenting inaccurate and incomplete data is not something libraries should be involved in.

 
—————————

1   The presenter said Hatry “wrote the book on evaluation.” Hatry is legendary in the field of program evaluation. But the book on evaluation has had numerous co-authors and still does. See Marvin Alkin’s 2013 book, Evaluation Roots.
2   The complete quotation is, “I believe that the operational principle for most programs is that it is better to be roughly right than to be precisely ignorant.” Hatry, H.P. (2002). Performance Measurement: Fashions and Fallacies, Public Performance & Management Review, 25:4, 356.
3   Abravanel, M.D. (2003). Surveying Clients About Outcomes, Urban Institute, Appendix C.
4   Yes, convenience samples produce unvetted information. They share the same weakness that focus groups have. Both data collection methods provide real information from real customers. But you take a big risk assuming these customers speak for the entire target group you hope to reach.
5   As I mentioned in my recent post, there is a known statistical effect that can make a library’s convenience sample perfectly match a representative sample drawn from the population of interest. This effect is known as luck or random chance. Just by the luck of the draw your convenience sample could, indeed, end up exactly matching the data from a random sample. The problem is, without an actual random sample to cross-check this with your library will never know whether this has happened. Nor how lucky the library has been!


Wasting Time Bigtime

We all know that the main function of libraries is to make information accessible in ways that satisfy user needs. Following Ranganathan’s Fourth Law of Library Science, library instructions guiding users to information must be clear and simple in order to save the user’s time. This is why library signage avoids exotic fonts, splashy decorations, and any embellishments that can muddle the intended message. Library service that wastes the user’s time is bad service.

So I am baffled by how lenient our profession is when it comes to muddled and unclear presentations of quantitative information in the form of data visualizations. We have yet to realize that the sorts of visualizations that are popular nowadays actually waste the user’s time—bigtime!  As appealing as these visualizations may be, from an informational standpoint they violate Ranganathan’s Fourth Law.

Consider the data visualization shown below from the American Library Association’s (ALA) Digital Inclusion Study:

Digital Inclusion Total Dash

ALA Digital Inclusion Study national-level dashboard. Click to access original dashboard.

This visualization was designed to keep state data coordinators (staff at U.S. state libraries) informed. The coordinators were called upon to encourage local public libraries to participate in a survey conducted last fall for this study. The graphic appears on the project website as a tool for monitoring the survey’s progress state by state.

Notice that the visualization is labeled a dashboard, a data display format popularized by the Balanced Scorecard movement. The idea is a graphic containing multiple statistical charts, each one indicating the status of an important dimension of organizational performance. As Stephen Few observed in his 2006 book, Information Dashboard Design, many dashboard software tools are created by computer programmers who know little to nothing about the effective presentation of quantitative information. Letting programmers decide how to display quantitative data is like letting me tailor your coat. The results will tend towards the Frankensteinian. Few’s book provides several scary examples.

Before examining the Digital Inclusion Study dashboard, I’d like to show you a different example, the graphic appearing below designed by the programmers at Zoomerang and posted on The Center for What Works website. It gives you some idea of the substandard designs that programmers can dream up:1   

What Works Chart

Zoomerang chart posted on http://www.whatworks.org. Click to see larger version.

The problems with this chart are:

  • There are no axis labels explaining what data are being displayed. The data seem to be survey respondents’ self-assessment of areas for improvement based on a pre-defined list in a questionnaire.
  • There is no chart axis indicating scaling. There are no gridlines to assist readers in evaluating bar lengths.
  • Long textual descriptions interlaced between the blue bars interfere with visually evaluating bar lengths.
  • 3D-shading on the blue bars has a visual effect not far from something known as moiré, visual “noise” that makes the eye work harder to separate the visual cues in the chart. The gray troughs to the right of the bars are extra cues the eye must decipher.
  • The quantities at the far right are too far away from the blue bars, requiring extra reader effort. The quantities are located where the maximum chart axis value typically appears. This unorthodox use of the implied chart axis is confusing.
  • The questionnaire items are not sorted in a meaningful order, making comparisons more work.

We should approach data visualizations the way we approach library signage. The visualizations should make the reader’s task quick and easy—something the Zoomerang chart fails at. Here’s a better design:2

What Works Revision

Revision of original (blue) Zoomerang chart posted above. Click to see larger version.
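For readers who want to build a chart along these lines themselves, here is a minimal matplotlib sketch. The item labels and counts are placeholders (the survey figures aren’t reproduced in the text); the essentials are sorted bars, a labeled axis, values placed right next to the bars, and no 3D effects.

```python
# Sketch: a plain horizontal bar chart in the spirit of the revised design.
# Item labels and counts below are placeholders for illustration only.
import matplotlib.pyplot as plt

items = {
    "Questionnaire item A": 42,
    "Questionnaire item B": 35,
    "Questionnaire item C": 28,
    "Questionnaire item D": 19,
    "Questionnaire item E": 12,
}

# Sort so the longest bar sits on top -- easier to scan and compare.
labels, values = zip(*sorted(items.items(), key=lambda kv: kv[1]))

fig, ax = plt.subplots(figsize=(7, 3.5))
ax.barh(labels, values, color="tan")          # pastel shade, per footnote 2
ax.set_xlabel("Number of respondents selecting the item")
for y, v in enumerate(values):
    ax.text(v + 0.5, y, str(v), va="center")  # value right beside each bar
ax.set_xlim(0, max(values) * 1.15)
fig.tight_layout()
plt.show()
```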

WARNING:  Beware of statistical, graphical, and online survey software. Nine times out of ten the companies that create this software are uninformed about best practices in graphical data presentation. (This applies to a range of vendors, from Microsoft and Adobe to upstart vendors that hawk visualization software for mobile devices.) Indiscriminate use of these software packages can cause you to waste the user’s time.

The Digital Inclusion Study dashboard appearing at the beginning of this post wastes the user’s time. Let’s see how. Note that the dashboard contains three charts—a gauge, line chart, and map of the U.S. The titles for these are imprecise, but probably okay for the study’s purposes (assuming the state data coordinators were trained in use of the screen). Still, for people unfamiliar with the project or users returning to this display a year later, the titles could be worded more clearly. (Is a goal different from a target? How about a survey submission versus a completion?)

Understandability is a definite problem with the map’s color-coding scheme. The significance of the scheme is likely to escape the average user. It uses the red-amber-green traffic signal metaphor seen in the map legend (bottom left). With this metaphor green usually represents acceptable/successful performance, yellow/amber, borderline/questionable performance, and red, unacceptable performance.

Based on the traffic signal metaphor, when a state’s performance is close to, at, or above 100%, the state should appear in some shade of green on the map. But you can see that this is not the case. Instead, the continental U.S. is colored in a palette ranging from light reddish to bright yellow. Although Oregon, Washington, Nevada, Michigan, and other states approach or exceed 100%, they are coded orangeish-yellow.3  And states like Colorado, North Carolina, and Pennsylvania, which reported 1.5 to 2 times the target rate, appear in bright yellow.

This is all due to the statistical software reserving green for the highest value in the data, namely, Hawaii’s 357% rate. Generally speaking, color in a statistical chart is supposed to contain (encode) information. If the encoding imparts the wrong message, then it detracts from the informativeness of the chart. In other words, it wastes user time—specifically, time spent wondering what the heck the coding means!
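The underlying fix is to anchor the color scale to the performance threshold rather than to whatever maximum happens to be in the data. A minimal sketch, assuming matplotlib and a handful of hypothetical state rates:

```python
# Sketch: anchor a red-to-green scale at 0-100% of target instead of at the data
# maximum, so any state at or above 100% reads as green. Rates are hypothetical.
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

rates = {"HI": 357, "CO": 180, "OR": 102, "IL": 75, "CA": 70, "AK": 40}

cmap = plt.cm.RdYlGn                                    # red -> yellow -> green
norm = mcolors.Normalize(vmin=0, vmax=100, clip=True)   # 100%+ maps to full green

for state, rate in rates.items():
    r, g, b, _ = cmap(norm(rate))
    print(f"{state}: {rate:3d}% of target -> RGB ({r:.2f}, {g:.2f}, {b:.2f})")
```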

Besides misleading color-coding, the shading in the Digital Inclusion Study dashboard map is too subtle to interpret reliably. (The dull haze covering the entire map doesn’t help.) Illinois’ shading seems to match Alabama’s, Michigan’s, and Mississippi’s, but these three differ from Illinois by 13 – 22 points. At the same time, darker-shaded California is only 5 points lower than Illinois.

The Digital Inclusion map’s interactive feature also wastes time. To compare data for two or more states the user must hover her device pointer over each state, one at a time. And then remember each percentage as it is displayed and then disappears.

Below is a well-designed data visualization that clarifies information rather than making it inaccessible. Note that the legend explains the color-coding so that readers can determine which category each state belongs to. And the colors have enough contrast to allow readers to visually assemble the groupings quickly—dark blue, light blue, white, beige, and gold. Listing the state abbreviations and data values on the map makes state-to-state comparisons easy.

BEA_GDP Map

A well-designed data visualization. Source: U.S. Bureau of Economic Analysis. Click to see larger version.

This map is definitely a time saver!

Now let’s turn to an…er…engaging feature of the ALA dashboard above—the dial/gauge. To the dismay of Stephen Few and others, dials/gauges are ubiquitous in information dashboards despite the fact that they are poor channels for the transmission of information. Almost always these virtual gadgets obscure information rather than reveal it.4  Meaning, again, that they are time wasters.

The gauge in the dashboard above presents a single piece of data—the number 88. It is astonishing that designers of this virtual gadget have put so many hurdles in the way of users trying to comprehend this single number. I hope this bad design comes from ignorance rather than malice. Anyway, here are the hurdles:

  1. The dial’s scaling is all but invisible. The dial is labeled, but only at the beginning (zero) and end (100) of the scale, and in a tiny font. To determine values for the rest of the scale the user must ignore the prominent white lines in favor of the obscured black lines (both types of lines are unlabeled). Then she has to study the spacing to determine that the black lines mark the 25, 50, and 75 points on the dial. The white lines turn out to be superfluous.
  2. The needle is impossible to read. The green portion of the banding causes the red tick-marks to be nearly invisible. The only way to tell exactly where the needle is pointing is by referring to the ‘88’ printed on the dial, a requirement that renders the needle useless.
  3. The uninitiated user cannot tell what is being measured. The text at the center of the image is masked at both edges because it has been squeezed into too small a space. And the gauge’s title is too vague to tell us much. I am guessing that the dial measures completed survey questionnaires as a percentage of some target quantity set for the U.S. public libraries that were polled. (And, honestly, I find it irritating that the 88 is not followed by a percent symbol.)
  4. The time period for the data depicted by the gauge is unspecified. It doesn’t help that the line chart at the right contains no scale values on the horizontal axis. Or, technically, the axis has one scale value—the entirety of 2013. (Who ever heard of a measurement scale with one point on it?) The dial and line chart probably report questionnaires submitted to date, so it would have been especially informative for the programmers to include the current date on the display.
  5. Although the red-amber-green banding seems to be harmless decoration, it actually can lead the reader to false conclusions. Early on in the Digital Inclusion Study survey period, submissions at a rate of, say, 30%, would be coded ‘unacceptable’ even though the rate might be quite acceptable. The same misclassification can occur in the amber region of the dial. Perhaps users should have been advised to ignore the color-coding until the conclusion of the survey period. (See also the discussion of this scheme earlier in this post.)

The graphic below reveals a serious problem with these particular gauges. The graphic is from a second dashboard visible on the Digital Inclusion Study website, one that appears when the user selects any given U.S. state (say, Alaska) from the dashboard shown earlier:

Digital Inclusion Alaska Chart

ALA Digital Inclusion Study state-level dashboard. Click to see larger version.

Notice that this dashboard contains five dials—one for the total submission rate for Alaska (overall) and one for each of four location categories (city, suburban, town, and rural). While the scaling in all five dials spans from 0% to 100%, two of the dials—city and town—depict quantities far in excess of 100%. I’ll skip the questions of how and why the survey submission rate could be so high, as I am uninformed about the logistics of the survey. But you can see that, regardless of the actual data, the needles in these two gauges extend only a smidgen beyond the 100% mark.

Turns out these imitation gauges don’t bother to display values outside the range of the set scaling, which, if you think about it, is tantamount to withholding information.5  Users hastily scanning just the needle positions (real-life instrument dials are designed for quick glances) will get a completely false impression of the data. Obviously, the gauges are unsatisfactory for the job of displaying this dataset correctly.

So now the question becomes, why use these gauges at all? Why not just present the data in a single-row table? This is all the dials are doing anyway, albeit with assorted visual aberrations. Besides, there are other graphical formats capable of displaying these data intelligently. (I won’t trouble you with the details of these alternatives.)
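Here is roughly what that single-row table amounts to. The percentages below are placeholders, since the Alaska figures aren’t spelled out in the text, but notice that values over 100% survive intact instead of being clipped at the end of a dial.

```python
# Sketch: the same information as five gauges, presented as one small table.
# The percentages are placeholders, not the actual Alaska figures.
import pandas as pd

submission_rates = pd.DataFrame(
    [{"Overall": 95, "City": 150, "Suburban": 80, "Town": 120, "Rural": 60}],
    index=["Alaska: % of survey submission target"],
)
print(submission_rates.to_string())
```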

One point about the line chart in the Alaska (state-level) dashboard. Well, two points, actually. First, the weekly survey submission counts should be listed near the blue plotted line—again, to save the user’s time. Second, the horizontal axis is mislabeled. Or, technically, untitled. The tiny blue square and label are actually the chart legend, which has been mislocated. As it is, its location suggests that both chart axes measure survey completions, which makes no sense. The legend pertains only to the vertical axis, not to the horizontal. The horizontal axis represents the survey period measured in weeks. So perhaps the label “Weeks” would work there.

In charts depicting a single type of data (i.e., a single plotted line) there is no need for a color-coded legend at all. This is the sort of detail that software programmers typically know nothing about.

Finally, a brief word about key information the dashboard doesn’t show—the performance thresholds (targets) that states had to meet to earn an acceptable rating. Wouldn’t it be nice to know what these are? They might provide some insight into the wide variation in states’ overall submission rates, which ranged from 12% to 357%. And the curiously high levels seen among the location categories. Plus, including these targets would have required the dashboard designers to select a more effective visualization format instead of the whimsical gauges.

Bottom line, the Digital Inclusion Study dashboard requires a lot of user time to obtain a little information, some of which is just plain incorrect. Maybe this is no big deal to project participants who have adjusted to the visualization’s defects in order to extract what they need. Or maybe they just ignore it. (I’m still confused about the purpose of the U.S. map.)

But this is a big deal in another way. It’s not a good thing when nationally visible library projects model such unsatisfactory methods for presenting information. Use of canned visualizations from these software packages is causing our profession to set the bar too low. And libraries mimicking these methods in their own local projects will be unaware of the methods’ shortcomings. They might even assume that Ranganathan would wholeheartedly approve!

 
—————————

1   Convoluted designs by computer programmers are not limited to data visualizations. Alan Cooper, widely known as the father of Visual Basic, describes how widespread this problem is in his book, The Inmates Are Running the Asylum: Why High Tech Products Drive Us Crazy and How to Restore the Sanity.
2   Any chart with closely spaced bars can be subject to moiré, especially when bold colors are used. Pastel shades, like the tan in this chart, help minimize this.
3   Delaware also falls into this category and illustrates the distortion intrinsic to maps used to display non-spatial measures. (Shale deposit areas by state is a spatial measure; prevalence of obesity by state is a non-spatial measure.) Large states will be visually over-emphasized while tiny states like Delaware and Rhode Island struggle to be seen at all.
4   My favorite example, viewable in Stephen Few’s blog, is how graphic artists add extra realism as swatches of glare on the dials’ transparent covers. These artists don’t think twice about hiding information for the sake of a more believable image.
5   This is extremely bad form—probably misfeasance—on the part of the software companies. More responsible software companies, like SAS and Tableau Software, are careful to warn chart designers when data extend beyond the scaling that chart designers define.

Posted in Data visualization | Leave a comment

Strength in Numbers

I want to tell you about a group of U.S. public libraries that are powerhouses when it comes to providing services to the American public. You might suppose that I’m referring to the nation’s large urban and county systems that serve the densest populations with large collections and budgets. These are the libraries you’d expect to dominate national library statistics. However, there’s a group of libraries with modest means serving moderate size communities that are the unsung heroes in public library service provision. These are libraries with operating expenditures ranging from $1 million to $4.9 million.1   Due to their critical mass combined with their numbers (there are 1,424 of them) these unassuming libraries pack a wallop in the service delivery arena.

Their statistical story is an interesting one. Let me introduce it to you by means of the patchwork graphic below containing 6 charts known as treemaps.

MeasMapsClrD_540

Click to view larger graphic.

From a data visualization standpoint treemaps (and pie charts also) have certain drawbacks that were identified in my prior post.2 Still, treemaps do have their place when used judiciously. And their novelty and color are refreshing. So, let’s go with them!

At first glance, treemaps are not that easy to decipher. Let me offer a hint to begin with and then follow with a fuller explanation. The hint: Notice how prominent the gold rectangles are among the 6 treemaps shown above. As the graph legend indicates, gold represents the $1 million to $4.9 million expenditure group that this post is focused on. (I purposely color-coded them gold!) And the story is about the appearance and meaning of these gold rectangles. Or more exactly, the rectangles representing this expenditure group—however the rectangles are colored. (Below you’ll see I also use monochrome-shaded treemaps to tell the story.)

Now let’s see how treemaps work. Treemaps are like rectangular pie charts in that they use geometrical segments to depict parts-of-a-whole relationships. In other words, treemaps present a categorical breakdown of quantitative data (not the statistician). A single treemap represents 100% of the data and the categories are represented by inset rectangles rather than pie wedges. The sizes of treemap segments reflect the data quantities. In some cases treemaps also use color to represent data quantities, as this green treemap does:

LibbyExpAreaD_450

Number of Libraries by Expenditure Group  
Click to view larger interactive chart.

Before getting to the quantitative aspects of this green chart, let me explain that the ballooned text is an interactive feature of Tableau Public, the statistical software used to generate the chart. If you would, click the treemap now to view the interactive version. At the top right of that chart is a legend indicating how color-shading works. Also, below the treemap is a bar chart displaying percentages data—the same figures visible on the treemap balloons.

In monochrome treemaps like the green one above the largest and darkest rectangle represents the highest number in the data, and the smallest and lightest represents the lowest number. The largest rectangle is always located at the top left of the treemap and the smallest at the bottom right. All treemaps, including the 6 charts above, follow this top-left-to-bottom-right arrangement. But only monochrome treemaps use color-shading in addition to rectangle sizing to portray quantities. With multi-color treemaps color is used instead to encode the data categories, in our case, library expenditure ranges.
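For readers curious how such a chart gets generated, here is a minimal sketch using the third-party squarify package (an assumption on my part; the charts in this post were made with Tableau Public). Rectangle areas are proportional to each category’s share of the total, which is all a treemap really encodes. The category names and counts below are placeholders.

```python
# Sketch: a one-level treemap where each rectangle's area is proportional to
# its category's share of the total. Counts are placeholders for illustration.
import matplotlib.pyplot as plt
import squarify  # third-party package: pip install squarify

counts = {"Category A": 1900, "Category B": 1450, "Category C": 1330,
          "Category D": 1050, "Category E": 800, "Category F": 600}

total = sum(counts.values())
labels = [f"{name}\n{100 * n / total:.1f}%" for name, n in counts.items()]

plt.figure(figsize=(7, 4.5))
squarify.plot(sizes=list(counts.values()), label=labels, alpha=0.8)
plt.axis("off")   # the rectangles carry the information; no axes needed
plt.show()
```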

In all of the treemaps included in this post each inset rectangle represents one of the total operating expenditure ranges listed in the first column of this table:

Table-LibsPopbyExp

Data Source: Public Libraries in the U.S. Survey, 2011, Institute of Museum & Library Services.3

Although these expenditure categories pertain to quantities (dollar ranges), remember that categories are always qualitative, that is, non-numerical.4  To emphasize the fact that the categories are non-numerical, in the table their labels have letter prefixes.

In the green treemap the rectangles are arranged according to the number of public libraries falling within each expenditure range, left to right as described above. However, this order does not apply to the bar chart appearing below the interactive version of the treemap. That bar chart is instead sorted by the category ranges low to high.

The largest rectangle in the treemap is the $50K or less category. Hovering the pointer over the $50K or less rectangle in the interactive chart or looking at the corresponding bar in the bar chart shows this category’s percentage as 21.1%. Similarly, 16.1% of public libraries fall under the next-largest rectangle which represents the $400K – $999.9K category. And the categories with the third and fourth largest rectangles, $1.0M – $4.9M and $100K – $299.9K, account for 15.4% and 14.7% of the total number of libraries. (The complete data appear in the table above.)

This next treemap (below in blue) depicts population data for each expenditure category. To see detailed figures click on the chart. (Again, a bar chart appears below the treemap there.) Hover the pointer over the top left rectangle (the $1.0 million to $4.9 million category) of the interactive treemap. Notice that this category serves the largest population, 88.4 million or nearly 29% of the total U.S. population served by public libraries. Next is the $10.0 million to $49.9 million group, the rectangle just below. Libraries in this category served 83 million or 27% of the population in 2011.

PopbyExpArea_450

Population Served by Expenditure Group
Click to view larger interactive chart.

Now the question is, how do these two measures—U.S. public library counts and population served—relate to each other? The next bar chart offers an answer:

PopLibsbyExpBar_400

Click to view larger interactive chart.

Again, click on the bar chart to see the interactive version. Then, click on the legend (Libraries / Population Served) to highlight one set of bars at a time. You’ll see that from left to right the libraries percentages (green bars) drop, remain fairly level, and then drop again. The population served percentages (blue bars) swoop up from left to right to the $1.0 million to $4.9 million category, step down, up, and then down again.

These trends are not surprising since we know that the smallest libraries serve the smallest communities on the smallest budgets. And that these communities are very numerous. Likewise, the largest libraries serve the largest communities, which are few in number. But it is surprising that the $1.0 million to $4.9 million category serves as large a swath of the population as it does. And that the next highest libraries expenditure-wise, the $5.0 million to $9.9 million group, does not keep up with this first group. (I’m curious about why this would be the case.)

In a moment I’ll get to other key measures for this spectacular $1.0 million to $4.9 million category of libraries. But first I thought I would lay out the population data differently, looking at how libraries are distributed across community sizes independent of library expenditures, just as a reminder of how that distribution works. The chart below shows the distribution of public libraries among 11 population categories labeled on the horizontal axis. Adding up the percentages for the left 5 bars, you see that 77% of public libraries serve communities with less than 25,000 population. Note in the bar chart (and treemap in the interactive version) that the 10K to 24.99K population group contains the most libraries.5

PopOnlyBarD_480

Distribution of Population Served
Click to view larger interactive chart.

Okay. Now let’s look at total operating expenditures by expenditure category in the purple chart here:

OperExpbyExpArea_450

Total Expenditures by Expenditure Category
Click to view larger interactive chart.

In this chart the two left-most rectangles look identical in size, don’t they? Click on the interactive version and you can see that the $10 million to $49.9 million and the $1.0 million to $4.9 million groups each account for more than $3 billion in annual public library operating expenditures. And their expenditure levels are nearly equal. The $10 million to $49.9 million group outspends the $1.0 million to $4.9 million group only by 1.2% ($38 million).

Where service provision is concerned, however, the $1.0 million to $4.9 million libraries shine. First, as seen in the interactive version of the chart below, their 2011 total visits surpassed the $10 million to $49.9 million group by 2% or 18 million. Granted, if we were to combine all libraries with expenditures exceeding $10 million into a single category, that category would win out. But the point here is that the 1,424 members of the $1.0 million to $4.9 million group are able to generate library services at nearly the same level as the largest urban libraries in the country. Without a doubt, the productivity of these moderate size libraries is substantial.

visitsbyexpareaD_450

Total Visits by Expenditure Category
Click to view larger interactive chart.

On circulation, however, the $10 million to $49.9 million libraries out-perform the $1.0 million to $4.9 million group. The former group accounts for 4% more of total U.S. public library circulation than the latter. These larger libraries account for 31.6% of all circulation nationwide, compared to the $1.0 million to $4.9 million group which accounts for 27.6%. (Click on the gray chart below to view these and other figures.)

circbyexpareaD_450

Total Circulation by Expenditure Category
Click to view larger interactive chart.

Yet circulation is the only major output measure where the $1.0 million to $4.9 million libraries play second fiddle to libraries from the other expenditure categories. Besides total visits, our 1,424 libraries excel in total program attendance and public Internet computer users. The next (olive) treemap shows the 7% margin (6.5 million) for total program attendance this group holds over the second-place group.

attenbyexpareaD_450

Total Program Attendance by Expenditure Category
Click to view larger interactive chart.

The final treemap below gives data on public Internet computer users. Again these middling libraries exceed the $10 million to $49.9 million libraries by 2.5% or about 8.3 million computer users. Rather startling that this group of libraries would outpace the large and well-equipped libraries of the nation in the delivery of technology services to communities.

pitsusrbyexpareaD_450

Total Public Computer Users by Expenditure Category
Click to view larger interactive chart.

To recap the data presented here let’s revisit the 6 multi-color treemaps introduced at the beginning of this post. We can see the gold rectangle is the largest among all expenditure groups for population, visits, program attendance, and public Internet computer users. And it is 2nd highest in operating expenditures and circulation.

As I mentioned, the standing of the largest expenditure categories could be enhanced by merging the $10 million to $49.9 million and the $50 million or more categories into a single category. (Of course, any boundary within this wide range of expenditures would be arbitrary.) Even so, the $1.0 million to $4.9 million group would still show a strong presence, leaving its next largest peers, the $5.0 million to $9.9 million category, in the dust. No matter how you slice the data, the $1.0 million to $4.9 million group is a major player in national library statistics. Now we need to think of some appropriate recognition for them…

 
—————————

1   Based on the Public Libraries in the United States Survey, 2011, Institute of Museum and Library Services.
2   It was Willard Brinton who identified the problem in his 1914 book. In my prior post scroll down to the sepia graphic of squares arranged laterally. There you see Brinton’s words, “The eye cannot fit one square into another on an area basis so as to get the correct ratio.” Bingo. With treemaps this is even more problematic since a single quantity in the data can be represented by different-shaped but equivalent rectangles—stubby ones or more elongated ones. You’ll see in the examples that it is impossible to visually determine which of two similarly-sized rectangles is larger. This difficulty also applies to pie wedges.
3   For purposes of this post I used only libraries reporting to the Institute of Museum and Library Services in 2011 that were located in the continental U.S., Alaska, and Hawaii.
4   The expenditure groups are examples of categorical data. Other examples are geographical regions of the U.S. and library expenditure types (collection, staffing, technology, capital, and so forth). Categorical data are also called nominal data or data on a nominal scale.
5   For detailed information about the statistical and geographic distributions of small libraries see the new report, The State of Small and Rural Libraries in the United States, IMLS Research Brief. No. 5., Sept. 2013.

Posted in Data visualization, Measurement, Statistics | Tagged , , | Leave a comment

Quadruple Your Statistical Knowledge In One Easy Lesson

Even with the promises of big data, open data, and data hacking it is important to remember that having more data does not necessarily mean being more informed. The real value of data, whatever its quantity or scope, comes from the question(s) the data can help answer.

There are various reasons any given set of data might or might not provide reliable answers, the most basic being the data’s accuracy. Clever new technologies that scan, scrape, geocode, or mobilize loads of data aren’t much use if the data are wrong. All we end up with is scanned, scraped, geocoded, and mobilized misinformation. Garbage in, garbage out, as they say.

Getting from data to answers requires understanding the meaning of the data and its relevance to our questions. With statistical data much of this meaning and relevance depends on three particular ideas:

  1. How the data were selected
  2. The group/population researchers are trying to learn about
  3. How these two relate

I am here to tell you that if you master these ideas your statistical knowledge will immediately quadruple! Okay. I admit my estimate of learning gain could be inflated (but maybe not). In any case, the importance of these ideas cannot be exaggerated. British statistician T.M.F. Smith called them “the most basic concepts in statistics” in his 1993 presidential address to the Royal Statistical Society. He also said:

In statistics data are interesting not for their own sake but for what they tell us about the phenomenon that they represent, and specifying the target population and the selection mechanism should be the starting point for any act of statistical inference.1

Although you may have already learned these ideas in research and statistics courses, I encourage you to revisit your understanding of them. I say so because your recall may have been colored by misinformation on this topic appearing in library literature and national research projects. I discuss some of this misinformation further on.

In the meantime, let’s explore these ideas using a library example: Suppose we are interested in learning about consumer demand for e-books in the U.S. And we happen to have access to a big datafile of e-book circulation for all U.S. public libraries—titles, authors, call numbers, reserve lists, e-reader devices, patron demographics, and the like. We analyze the data and then issue this pronouncement:

Ebook demand in the U.S. is highest for these genres: romance novels, self-improvement, sci-fi/fantasy, biographies, and politics/current events.

Is our pronouncement correct? Not very. The list of genres is a poor reflection of consumer demand for e-books, first, because our data describe only public library borrowers instead of all e-book consumers. (Our datafile did not tap demand among e-book purchasers.) Second, the list is probably inaccurate for another reason. In the libraries’ collections, the proportions of e-books in the genres are likely to differ from those for all e-books published nationally. Demand for a genre, say e-biographies, that is underrepresented in library holdings will be understated compared with demand for that genre among U.S. consumers as a whole. So, besides giving a slanted view of consumer behavior, the e-book datafile is also slanted in terms of the genres consumers have access to in the first place.

Third, the pronouncement is inaccurate even when we limit our question just to demand among library e-book borrowers. The small number of available e-book copies will have made it impossible for some borrowers to check out the e-books they wanted when they wanted them. This user demand will not necessarily be accounted for in the e-book datafile.

The reasons just given for doubting the pronouncement are all related to how the e-book data were collected in the first place—the selection mechanism using Professor Smith’s term. Understanding how collection methods affect what data can and cannot tell us is the knowledge-quadrupling information I’m talking about. Here is that information in a nutshell:

The way that data are selected either support or detract from the validity of conclusions drawn. Thus, data selection directly affects the accuracy of answers gleaned from the data. Inaccuracy due to data selection, called selection bias, comes from slantedness and/or incompleteness of data. This bias occurs when certain types of subjects/respondents are systematically over- or under-represented in the data. Relying on biased data is usually risky and sometimes irresponsible.
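To see selection bias in action, here is a small simulation with entirely invented numbers: a population in which frequent library visitors rate e-resources higher than everyone else, sampled two different ways.

```python
# Sketch: how a convenience sample that over-represents one subgroup skews an
# estimate. All population values are invented for illustration.
import random

random.seed(42)

# Population: 20% frequent visitors (ratings near 8), 80% everyone else (near 5).
frequent = [random.gauss(8, 1) for _ in range(20_000)]
others   = [random.gauss(5, 1) for _ in range(80_000)]
population = frequent + others

def mean(xs):
    return sum(xs) / len(xs)

# Random (probability) sample of 500 drawn from the whole population.
random_sample = random.sample(population, 500)

# Convenience sample of 500 taken in the library lobby: 80% frequent visitors.
convenience_sample = random.sample(frequent, 400) + random.sample(others, 100)

print(f"True population mean:        {mean(population):.2f}")
print(f"Random sample estimate:      {mean(random_sample):.2f}")
print(f"Convenience sample estimate: {mean(convenience_sample):.2f}")  # biased upward
```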

Most library and information science professionals do not understand selection bias. Nor are they well-informed about survey sampling best practices. And, as I mentioned, some library literature and national projects have misinformed readers about these topics. I’d like to discuss a few examples as a way to clear up some of the confusion.

One example is an article in a peer-reviewed library journal about the generalizability2 of survey findings (also known as external validity). The researchers wondered whether specific user traits were so common that they were very likely true for all academic libraries. User traits is a term I have devised as shorthand for attributes, behaviors, or trends detected in survey or other library data. (It’s not an official term of any sort, nor did the researchers use it.) A trait might be something like:

The average length of time undergraduate students spend in university libraries is markedly shorter for males than for females.

The researchers figured that if a trait like this one were found to be true in surveys conducted at their own and a dozen or so peer libraries, then it should be true across the board for all academic libraries. They proceeded to identify several uniform traits detected in multiple surveys conducted at their own and their peer libraries. (Thus, their study was a survey of surveys.) They ended up advising other academic libraries not to bother studying these traits on behalf of their home institutions. Instead, the other libraries should just assume these traits would hold true exactly as they occurred at the libraries that had already done the surveys.

This is bad advice. The researchers’ sample of library survey results was too limited. They reached out only to the dozen or so libraries that were easily accessible. Choosing study subjects this way is called convenience sampling. Almost always convenience samples are poor representations of the larger group/population of interest. (There is another type of convenience sampling called a self-selected sample. This is when researchers announce the availability of a survey questionnaire and then accept any volunteers who show up to take it. We’ll revisit this type of slanted sampling further on.)

The best way to avoid selection bias in our studies is the use of random (probability) sampling. Random sampling assures that the subjects selected provide a fair and balanced representation of the larger group/population of interest. The only thing we can surmise from a convenience (nonprobability) sample is that it represents the members it is composed of.

Because they used an unrepresentative (nonprobability) sample rather than a representative (probability) sample, the researchers in the example above had no grounds for claiming that their findings applied to academic libraries in general.

Before moving to the next example some background information is necessary. I suspect that library researchers have taken statistics courses where they learned certain statistical rules-of-thumb that, later on, they end up mis-remembering. As you might expect, this leads to trouble.

Statistics textbooks usually talk about two basic types of statistics, descriptive and inferential. Descriptive statistics are summary measures calculated from data, like means, medians, percentages, percentiles, proportions, ranges, and standard deviations. Inferential statistics (also known as statistical inference) have to do with extrapolating from the sample data in order to say something about the larger group/population of interest. This is exactly what we’ve been discussing already, being able to generalize from a sample to a population (see footnote #2). This amounts to inferring that patterns seen in our samples are fair estimates of true patterns in our target groups/populations. Drawing representative samples provides the justification for this inference.
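A concrete way to see the distinction, using invented numbers: the sample proportion is a descriptive statistic, while the confidence interval around it is the inferential step that says something about the whole population. That step is only justified if the sample was drawn at random.

```python
# Sketch: descriptive vs. inferential statistics for one survey question.
# Invented example: 240 of 600 randomly sampled patrons say they borrowed an e-book.
import math

n, successes = 600, 240

# Descriptive: summarizes the sample itself.
p_hat = successes / n                                   # 0.40

# Inferential: a 95% confidence interval for the population proportion, using
# the normal approximation. Valid only for a random (probability) sample.
margin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
print(f"Sample proportion: {p_hat:.3f}")
print(f"95% CI for the population proportion: "
      f"({p_hat - margin:.3f}, {p_hat + margin:.3f})")
```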

Inferential statistics also entail a second type of inference related to how random chance can cause apparent patterns in sample data that are not likely to be true in the larger population. This more esoteric form of inference involves things like hypothesis testing, the null hypothesis, statistical significance, and other convoluted issues I’ve written about before here. It so happens that statistics textbooks often recommend the use of random (probability) sampling in studies when researchers intend to conduct statistical significance testing, a rule-of-thumb that may have confused researchers in this next example.

This example is a study of public library programs published in a peer-reviewed journal in which researchers acknowledged their use of convenience (nonprobability) sampling. It seems they focused on the textbook recommendation I just mentioned in the prior paragraph. They apparently reasoned, “Since it’s bad form to apply statistical significance testing to a convenience sample, we won’t do that. Instead, we’ll stick to descriptive statistics for our sample data.” They proceeded to report means and medians and percentages and so on, but then announced these measures as applicable to U.S. public libraries in general. The researchers abided by the esoteric statistical rule, yet ignored the more mainstream rule. In fact, that rule is important enough to qualify for the knowledge-quadrupling category. It is stated here:

Without representative sampling there is no justification for portraying survey findings as applicable (generalizable) to the larger population of interest.

Library organizations violate this rule every time they post a link to an online survey on their website. Inviting anyone and everyone to respond, they end up with the biased self-selected sample described already. Then, as in the prior example, they report survey findings to users and constituents as if these were accurate. But they are not. Because this practice is so common, it has become respectable. Nevertheless, the promotion of misinformation is unethical and—unless you work in advertising, marketing, public relations, law, or politics—professionally irresponsible.

Textbook rules-of-thumb may have also been a factor in this next example. By way of introduction, recall that we collect a sample only because it is impossible or impractical to poll every member from the group/population of interest. When it is possible to poll all members of the population, the resulting survey is called a census or a complete enumeration. If we have the good fortune of being able to conduct a census, then we do that, of course.

Unfortunately, researchers publishing in a peer-reviewed library journal missed this opportunity even though they happened to have a complete enumeration of their population—an electronic file, actually. Instead of analyzing all of the data in this datafile, for some reason (probably recalling cookbook steps from a statistics course) the researchers decided to extract a random sample from it. Then they analyzed the sample data and wrote the article based only on that data. Because these researchers relied on a portion rather than the entire dataset, they actually reduced the informativeness of the original data. They failed to understand this:

The more the composition of a sample matches the larger group/population, the more accurate measures taken from the sample (means, medians, percentages, and so on) are. As a corollary, a larger representative sample is better than a smaller representative sample because its composition typically matches the larger population more closely. When a sample is the equivalent of a census of the entire population, the sample is perfectly accurate (generally speaking).

And, yes, I should also add:

A small representative sample is better than a large unrepresentative sample. And an unrepresentative sample is possibly better than not conducting the survey at all, but (a) not by very much and (b) only if our luck is reasonably good. If our luck is bad, measures from the unrepresentative sample will be totally wrong, in which case not conducting the survey is the better option. (Better to be guided by no information than wrong information.)
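The corollary and the warning can both be checked by simulation. In the sketch below (invented numbers again, reusing the two-subgroup population idea from earlier), a modest random sample beats a much larger convenience sample on average, because enlarging a biased sample mostly buys you a more precise wrong answer.

```python
# Sketch: a small representative sample vs. a large unrepresentative one.
# All values are illustrative.
import random
import statistics

random.seed(7)
frequent = [random.gauss(8, 1) for _ in range(20_000)]   # 20% of population
others   = [random.gauss(5, 1) for _ in range(80_000)]   # 80% of population
population = frequent + others
true_mean = statistics.fmean(population)

def trial():
    small_random = random.sample(population, 200)                                # n = 200
    big_biased = random.sample(frequent, 4_000) + random.sample(others, 1_000)   # n = 5,000
    return (abs(statistics.fmean(small_random) - true_mean),
            abs(statistics.fmean(big_biased) - true_mean))

errors = [trial() for _ in range(200)]
print(f"Avg error, small random sample (n=200):   {statistics.fmean(e[0] for e in errors):.3f}")
print(f"Avg error, large biased sample (n=5,000): {statistics.fmean(e[1] for e in errors):.3f}")
```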

If your library always has good luck, then it should by all means use an unrepresentative sampling method like convenience sampling. You can explain to the library’s constituents how the library’s consistently good luck justifies this use.

Now onto a final case of unadulterated selection bias from a few years back. I believe the high visibility of this study justifies naming the organizations that were involved in its production. My purpose is to remind readers that the status of an institution and the quality of its data analysis practices are not necessarily related. Which, in a way, is good news since it means humble organizations with limited credentials and resources can learn data analysis best practices and outperform the big guys!

So to the story. This is about the study funded by a $1.2 million grant from the Bill & Melinda Gates Foundation to the Online Computer Library Center (OCLC) entitled, From Awareness to Funding, published in 2008.3  The surveys used in the study were designed and conducted by the internationally acclaimed advertising firm, Leo Burnett USA (a la MadMen!).4

For this study Leo Burnett researchers surveyed two populations, U.S. voters and U.S. elected officials. Survey respondents from the voter group were selected using a particular type of probability sampling. (This is good, at least on the surface.) The resulting sample consisted of 1,900 respondents to an online questionnaire. The elected officials sample was made up of self-selected respondents to invitations mailed to subscribers of Governing, a professional trade journal. In other words, elected officials were selected via a convenience sample. (This is bad.) Nationwide, 84 elected officials completed Leo Burnett USA’s online questionnaire. (This is not so good either.)

Roughly, there are 3,000 counties in the U.S. and 36,500 cities, towns, townships, and villages.5  Let’s say each entity has on average 3 elected officials. Thus, a ballpark estimate of the total count of elected officials in the U.S. is about 118,500. To omit officials in locales with no public library let’s just round the figure down to 100,000.
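The back-of-the-envelope arithmetic, spelled out:

```python
# Ballpark count of U.S. local elected officials used in the text.
counties = 3_000
municipalities = 36_500          # cities, towns, townships, villages (rounded)
officials_per_entity = 3         # assumed average

estimate = (counties + municipalities) * officials_per_entity
print(estimate)                  # 118,500; rounded down to 100,000 in the text
```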

High-powered Leo Burnett USA settled for a very low-powered and quite unreliable sample—84 self-selected officials—to represent a population of about 100,000. The OCLC report acknowledged this deficiency, noting:

Due to the process by which respondents were recruited, they represent a convenience sample that is quantitative but not statistically representative of all local elected officials in the United States.6

Professional standards for marketing researchers caution against misrepresenting the quality of survey samples. The Code of Marketing Research Standards obliges marketing researchers to:

Report research results accurately and honestly… [and to] provide data representative of a defined population or activity and enough data to yield projectable results; [and] present the results understandably and fairly, including any results that may seem contradictory or unfavorable.7

So, the responsible thing for Leo Burnett USA to do was to announce that reporting any details from the 84 respondents would be inappropriate due to the inadequacy of the sample. Or, in light of marketing research professional standards, they could have made an effort to draw a probability sample to adequately represent the 100,000 U.S. elected officials—perhaps stratified by city/town/township/village.

But alas, Leo Burnett USA and presumably the OCLC authors chose a different strategy. First, as a technicality, admit the sample’s deficiency in the report text (their quotation above). Then, ignore both the deficiency and the admission by portraying the data as completely trustworthy. As a result, an entire chapter in the OCLC report is devoted to quite unreliable data. There you will find 18 charts and tables (an example is shown below) with dozens of interesting comparisons between U.S. elected officials and U.S. voters, thoughtfully organized and confidently discussed.

OCLCFundingCh3Chart

Source: From Awareness to Funding, OCLC, Inc. p. 3-5.

So what’s not to like? Well, we might dislike the fact that the whole thing is a meaningless exercise. When you compare data that are accurate (like the purple-circled 19.9 voters figure above) with data that are essentially guesswork (like the blue-circled 19.0 elected officials figure above), the results are also guesswork! This is elementary subtraction which works like this:

WildAssGuessSubraction

The Wild-Ass Answer is off by however much the Wild-Ass Guess is! This isn’t necessarily part of the knowledge-quadrupling information I mentioned. But it’s handy to know. You can see another example here.
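Here is the same point in sketch form, with a hypothetical error attached to the guess:

```python
# Sketch: subtracting a guess from an accurate figure passes the guess's error
# straight through to the answer. The +/-5 error below is hypothetical.
accurate = 19.9          # figure from the probability (voter) sample
guess = 19.0             # figure from the 84-person convenience sample
guess_error = 5.0        # unknown in reality; assume +/-5 for illustration

difference = accurate - guess
print(f"Reported difference: {difference:.1f}")
print(f"Actual difference could be anywhere in "
      f"({difference - guess_error:.1f}, {difference + guess_error:.1f})")
```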

As I said, this 2008 OCLC report was promoted far and wide. And the dubious elected officials data were showcased in the GeekTheLibrary initiative (the $5+ million sequel to the 2008 OCLC study) as shown in these banners that appeared on the initiative’s website:

GeekTheLibElectedOfficialsBanners

Banners posted on http://www.geekthelibrary.org.

Due to selection bias the statements in both banners are pure speculation. Incidentally, the GeekTheLibrary initiative was, shall we say, data-driven. It was designed based on findings from the 2008 OCLC study. We can only hope that there weren’t very many program strategies that relied on the study’s insights into U.S. elected officials.

That, of course, is the problem with unrepresentative survey samples. They are likely to produce unreliable information. If our objective is accurate and unbiased information then these samples are too risky to use. If our objective is going through the motions to appear data-driven and our audiences can’t verify our data on their own, then we can use these samples with no worries.

 
—————————

1   Smith, T.M.F. (1993). Populations and selection: Limitations of statistics, Journal of the Royal Statistical Society – Series A (Statistics in Society), 156(2), 149.
2    Generalizability refers to the extent to which we have ample grounds for concluding that patterns observed in our survey findings are also true for the larger group/population of interest. Other phrases describing this idea are: Results from our sample also apply to the population at large; and we can infer that patterns seen in our sample also exist in the larger population. Keep reading, as these ideas are explained throughout this blog entry!
3   De Rosa, C. and Johnson, J. (2008). From Awareness To Funding: A Study of Library Support in America, Dublin, Ohio: Online Computer Library Center.
4   On its Facebook page Leo Burnett Worldwide describes itself as “one of the most awarded creative communications companies in the world.” In 1998 Time Magazine described founder Leo Burnett as the “Sultan of Sell.”
5   The OCLC study purposely omitted U.S. cities with populations of 200,000 or more. Based on the 2010 U.S. Census there are 111 of these cities. For simplicity, I had already rounded the original count (36,643) of total cities, towns, townships, and villages to 36,500. This 143 adjustment cancels out the 111 largest U.S. cities, making 36,500 a reasonable estimate here.
6   De Rosa, C. and Johnson, J. (2008), 3-1. The phrase “sample that is quantitative” makes no sense. Samples are neither quantitative nor qualitative, although data contained in samples can be either quantitative or qualitative. The study researchers also misunderstand two other statistical concepts: Their glossary definition for convenience sample confuses statistical significance with selection bias. These two concepts are quite different.
7   Marketing Research Association. (2007). The Code of Marketing Research Standards, Washington, DC: Marketing Research Association, Inc., para. 4A; italics added.

Posted in Accountability, Library assessment, Measurement, Research, Statistics | Tagged , , , , , | 1 Comment