Improving Tabular Displays

Embed Size (px)

Citation preview

  • 8/12/2019 Improving Tabular Displays

    1/31

    Improving Tabular Displays, with NAEP Tables as Examples and Inspirations

    Author(s): Howard WainerSource: Journal of Educational and Behavioral Statistics, Vol. 22, No. 1 (Spring, 1997), pp. 1-30Published by: American Educational Research Associationand American Statistical AssociationStable URL: http://www.jstor.org/stable/1165236

    Accessed: 25/10/2010 17:43

    Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available at

    http://www.jstor.org/page/info/about/policies/terms.jsp. JSTOR's Terms and Conditions of Use provides, in part, that unless

    you have obtained prior permission, you may not download an entire issue of a journal or multiple copies of articles, and you

    may use content in the JSTOR archive only for your personal, non-commercial use.

    Please contact the publisher regarding any further use of this work. Publisher contact information may be obtained athttp://www.jstor.org/action/showPublisher?publisherCode=aera.

    Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed

    page of such transmission.

    JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of

    content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms

    of scholarship. For more information about JSTOR, please contact [email protected].

    American Educational Research AssociationandAmerican Statistical Associationare collaborating with

    JSTOR to digitize, preserve and extend access toJournal of Educational and Behavioral Statistics.

    http://www.jstor.org

    http://www.jstor.org/action/showPublisher?publisherCode=aerahttp://www.jstor.org/action/showPublisher?publisherCode=astatahttp://www.jstor.org/stable/1165236?origin=JSTOR-pdfhttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/action/showPublisher?publisherCode=aerahttp://www.jstor.org/action/showPublisher?publisherCode=aerahttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/stable/1165236?origin=JSTOR-pdfhttp://www.jstor.org/action/showPublisher?publisherCode=astatahttp://www.jstor.org/action/showPublisher?publisherCode=aera
  • 8/12/2019 Improving Tabular Displays

    2/31

    Journal of Educational and Behavioral StatisticsSpring 1997, Vol. 22, No. 1, pp. 1-30

    Improving TabularDisplays, With NAEPTables as Examples and InspirationsHowardWainerEducational Testing Service

    Keywords: graphical comprehension, NAEP, tabular displaysThe modem world is rich with data; an inability to effectively utilize thesedata is a realhandicap.Onecommonmodeof data communications theprinteddata table. In this article we provide our guidelinesthe use ofwhich can make tables more effective and evocative data displays. Weusethe National Assessment of EducationalProgress both to provide inspirationfor the development of these guidelines and to illustrate their operation.We also discuss a theoretical structure to aid in the development of testitems to tap students' proficiency in extracting informationfrom tables.

    In his 1786 atlas of England and Wales, William Playfair wrote of theincreasing complexity of modem life. He pointed out that when life wassimpler and datawere less abundant,an understandingof economic structurewas both more difficult to formulate and less importantfor success. But bythe end of the 18th century, this was no longer true. Statistical offices hadbeen established and hadbegun to collect dataon which political andcommer-cial leaders could base their decisions. Yet the complexity of these dataprecluded their easy access by any but the most diligent.Playfair's genius was in surmountingthis difficulty throughhis marvelousinvention of statistical graphsand charts.The complexity of life within 18th-century Britain and the massiveness of available data were but trifles incomparison to today's complex network of data sources and topics. Thesedata are being transformedinto graphic forms at a breathless pace.Today we have the need to clearly and accurately display summaries ofhuge amounts of information.Computingequipment,software, andelectronicnetworks provide the means to summarize information and disseminateresults. What we lack is a broad understandingof how best to do it. In thisarticle we examine the data table as a communicative display and suggestfour steps which, if followed, can allow tables to communicate better.Inthis reportwe focus principallyon thetable because we heartilysubscribeto the notion that although accuratedata can help us to understand he world,they can help only

    if they are properly nterpreted. herecan be no assuranceof a properinterpretation,owever,unlessthe arrangementf the data on the printedpage is clear,logical, complete,andproperly ocused. . .. Incidentally,tis ourconviction, ested n experience,hat anguage lows moreeasilyand

  • 8/12/2019 Improving Tabular Displays

    3/31

    Wainerlogically from the pen of him whose tabulateddata reflect carefulandprecise thinking. Walker& Durost, 1936,p. iii)

    I have two goals in this article. The primaryone is to provide concreteguidelines for constructing more communicative tables. My second goal ismore limited. Because tables are used so often to communicate information,it is generally felt that the ability to understand tables is an importantskillfor a literate citizen. It is not uncommon to find tables used as stimuli withinmany tests of basic reading or mathematical skills. Thus my second goal isto illustrate how improved tabular presentation immediately lends itself toexpanding the range of test questions that can be asked.For reasons that will become clearer by the end of this article, we havechosen to intertwine a general discussion of tabulardisplay with a discussionof the use of tables within the National Assessment of Educational Progress(NAEP)'-both as stimuli in test items and as communicative media inpublished reports.To build any effective display we must have a firm notion of purpose. Wecannot know what the best answers are unless we know what the questionsare. Thus we must first understand what questions will be asked of thedata. Any discussion of data display in the abstract is pointless. To aid inunderstanding the intended purposes for tabular displays, we shall look atthe sorts of test questions that NAEP pairs with data tables. It does not seemfar-fetchedto assumethat the sorts of questionsNAEP expertsbelieve childrenought to be able to answer from tables are the same sorts that everyone elseshould be facile with.This exercise is what naturally led us to the second goal, oftentimesintermingled in this report with the first, of expanding the range of testquestions that can be posed when data are well displayed. The advice offeredhere comes from many sources, filtered throughme. To the extent that I addanything to the wisdom of those who have preceded me on this path, it isin the attention devoted to depicting error.

    TabularPresentationGetting nformationroma table s likeextracting unlightroma cucumber.(Farquhar& Farquhar,891,p. 55)

    The disdain shown by the two 19th-century economists quoted abovereflected a minority opinion at that time. As commonly prepared, tables,spoken of so disparagingly by the Farquhars,remainto a large extent worthyof contempt.Before we explore ways to improve a tabular display, it is wise to beexplicit about the likely audience and goals of the display. In this reportweexamine tables within NAEP that are aimed at three separate audiences:children, the lay public, and education professionals. While it might seemthat this diversity of audience and associated goals ought to yield quitedifferent structures or their displays, it appearsthatthe requirementsof their2

  • 8/12/2019 Improving Tabular Displays

    4/31

  • 8/12/2019 Improving Tabular Displays

    5/31

    Wainerin NAEPreports, lthough heyareoftenexpected o servearchivalpurposesas well. Tablesas Part of NAEPItemsExample 1: 1992 12th Grade Math, Questions 3 and 4

    This exampleshowshow roundingableentriesmakesa difference.Theoriginal able on whichQuestions3 and4 werebasedis:POPULATIONSOFDETROITANDLOS ANGELES

    1920-1970CityYear Detroit Los Angeles1920 950,000 500,0001930 1,500,000 1,050,0001940 1,800,000 1,500,0001950 1,900,000 2,000,0001960 1,700,000 2,500,0001970 1,500,000 2,800,000Thetwo questions omitting he alternatives ffered)were:

    3. HowmanymorepeoplewerelivinginLosAngelesin 1960 than1940?4. Whatwas thefirstyearlisted in whichthepopulation f Los Angeleswas greater hanthe population f Detroit?If we roundto two digits (the nearesthundred housand)and tidy up thedisplaya bit, we get:

    PopulationsForTwo CitiesYears: 1920-1970Units:MillionsYear Detroit Los Angeles1920 1.0 0.51930 1.5 1.11940 1.8 1.51950 1.9 2.01960 1.7 2.51970 1.5 2.8

    4

  • 8/12/2019 Improving Tabular Displays

    6/31

    ImprovingTabularDisplaysThe answerto Question3 is clearly 1 million,and the answer o Question4 is 1950.It awaitsempiricalverificationwhether his is easier thanbeforerevision,butmyintuitionandmy twelve-year-oldon)certainly uggestsso.WhydidIsuggestroundingotwodigits?Letusexplore his in adiscussionof the first ruleof table construction:

    Rule I. Round-a lot This is for threereasons:"*Humanscannotunderstandmorethantwo digits very easily."*Wecan almostneverustifymore han wodigitsof accuracytatistically."*We almost nevercareaboutaccuracyof more than two digits.

    Let us take each of these reasonsseparately.Understanding.Consider the statement"This year's school budget is$27,329,681."Who can comprehend r remember hat?If we rememberanything, t is almostsurelythe translation Thisyear's school budgetisabout27 million dollars."Statisticalustification.Thestandardrror f most statistics s proportionalto 1 over the squareroot of the samplesize.2God did this, and there isnothingwe can do to changeit. Thus supposewe would like to reportacorrelation s .25. If we don't wantto reportsomething hat is inaccurate,we must be surethatthe seconddigit is reasonably ikely to be 5 and not6 or 4. To accomplishthis we need the standard rrorto be less than.005. Butsincethestandardrrors proportionalo 1/ , the obviousalgebra(1/, n- .005, therefore - 1/.005 = 200) yields the inexorable con-clusionthata samplesize of the orderof 2002or40,000 is requiredojustifythepresentationf morethana two-digitcorrelation.A similarargumentanbe made for mostotherstatistics.Whocares?I recently awa tableof average ife expectancieshatproudlyreportedhe mean ife expectancyof a male at birth n Australiao be 67.14years. What does the 4 mean? Each unit in the hundredths igit of thisoverzealous eportageepresents days.Whatpurposes served n knowinga life expectancy o this accuracy?Formostcommunicativenot archival)purposes,67 would havebeenenough.Theeffects of too manydigitsis sufficientlypernicious hatI would liketo emphasize he importance f roundingwith another hortexample.Thefollowing equation is taken from State Court Caseload Statistics: AnnualReport,1976 (CourtStatisticsProject,1976):

    In (DIAC) = -.10729131 + 1.00716993 x In (FIAC),whereDIAC is the annualnumberof case dispositions,and FIAC is theannualnumberof case filings.This is obviouslythe result of a regressionanalysiswith anovergenerousutput ormat.Usingthestandard rrorustifi-cation for roundingwe see that to justifythe eightdigits shownwe would

    5

  • 8/12/2019 Improving Tabular Displays

    7/31

    Wainerneeda standard rror hat s of theorderof .000000005,or a samplesize oftheorderof 4 X 10'6. This is a verylargenumberof cases-the populationof Chinadoesn'tputa dentin it. Theactualn is the number f states,whichallowsone digitof accuracyat most. If we round o one digitandtransformout of the log metricwe arriveat the morestatisticallydefensibleequation

    DIAC= .9 FIAC.This can be translatedntoEnglishas "There reabout90%as manydisposi-tions as filings."Obviously heequation hat s more defensiblestatisticallyis also much easierto understand.My colleagueAl Biderman,who knowsmoreabout courtsthanI do, suggested hat we needed to round urther,othe nearest nteger DIAC= FIAC),and so a more correct tatementwouldbe "Thereare aboutas many dispositionsas filings."A minute'sthoughtaboutthe courtprocessremindsone thatit is a pipelinewithfilingsat oneend and dispositionsat the other.They mustequal one another,and anyvariationn annualstatisticsreflectsonly the vagariesof the calendar.Thesortof numericalophistry emonstratednthefirstequation angivestatisti-cians a badname.3Example2: 1990 8th and 12th Grade ScienceAssessment

    Any redesign askmustfirsttryto developan understandingf purpose.Thepresentationf thedataset in Table1musthave been intended o helpthe readeranswersuchquestionsas:(1) What s thegeneralevel(inhours)of batteryife forthebrandshosen?(2) Howdo thebatterybrands ifferwithrespect otheir ifeexpectancies?What'sthe best one?Theworst?(3) Whatkindsof equipment sebatteries pmostquickly?Leastquickly?

    TABLETablepairedwithItems21 and 22 on the 1990 8th and 12thgradescienceassessmentBatteryLife in HoursBattery Cassette Portable

    Brands Player Radio Flashlight ComputerConstantCharge 5 19 10 3PowerBat 7 24 13 5Servo-Cell 4 21 12 2Never Die 8 28 16 6Electro-Blaster 10 26 15 46

  • 8/12/2019 Improving Tabular Displays

    8/31

    Improving TabularDisplays(4) Are thereany unusual nteractionsbetweenequipmentand batterybrand?

    These areobviouslyparallel o thequestions hatareordinarily ddressednthe analysisof any multifactorialable--overall level, row, column, andinteraction ffects.By characterizinghe information n the tablein this way we areabletoexplicitly lay out areas of questionsthatmightbe askedabout these datain an effort to determine he extent to whichstudentscan understand atapresented n a table.In fact, there were threequestionsthat followed thistable,butonly one asked aboutthedata,and it was parallel o Question2:21. On thebasisof the informationnthetable,whichbranddoyouthinkis the bestall-purposebattery?Assumeall batteries ost the same.)

    The nextquestionasked abouthow the studentmadethis determination:22. Brieflyexplainhow you used the informationn the table to makeyourdecision.

    Beforegoing further, inviteyou to readTable1 carefullyandsee to whatextentyou can answer he fourquestions.Butdon'tpeekaheadThe entries n thistable arealready ounded, o we cango directly o thesecond ruleof tableconstruction:

    Rule II. Order the rows and columns in a way that makes sense.Alphabetical rder s almostneverthe bestwayto go. Threeusefulwaystoorderthedataare:"*bysize.Oftenwe lookmostcarefullyatwhat s ontopand ess carefullyfurtherdown.Put thebiggestthingfirst.Also, orderingby someaspectof the data often reflectsorderingby some hiddenvariablethat canbe inferred."*naturally.Timeis ordered rom thepastto thefuture.Showingdatainthatordermelds well withwhat heviewermightexpect.Thisis alwaysa good idea."*according o interest.If we are especially interested n comparingaparticularet of rowsor columns,putthemadjacent o one another.Table2 is a redoneversionof Table1inwhichbatteriesrows)areorderedby battery ife in a radio,with the longest-lastingbatteryfirst. Typesofequipmentcolumns)areorderedbyhowquickly heyuse upbatteries,eastvoracious irst.From his we see thatby orderingby radiouse we havealsoordered or flashlights.Thereis some minorshufflingwithinthe cassetteplayerandcomputerolumns.Nowthat hetable s ordered, nsweringNAEPQuestion21 is easy,as aremost othermaineffectquestions.Wecan improvematters till furtherby rememberinghe thirdrule:

    7

  • 8/12/2019 Improving Tabular Displays

    9/31

    WainerTABLE 2First revisionof battery ife table(rowsand columnsordered, xtraneouslines removed)

    BatteryLife in HoursBattery Cassette PortableBrands Radio Flashlight Player ComputerNever Die 28 16 8 6Electro-Blaster 26 15 10 4PowerBat 24 13 7 5Servo-Cell 21 12 4 2ConstantCharge 19 10 5 3Rule III. ALL is different and important.Summaries f rowsandcol-umnsare important s a standardor comparison-theyprovidea measureof usualness. What summarywe use to characterize ll dependson thepurpose.Sometimesa sum or a meanis suitable,more often a median.Butwhatever s chosen it shouldbe visuallydifferent rom theindividual ntriesand set spatiallyapart.The summariesmeans)surrounding able 3 makethe row and columneffectsexplicit.Now we notonlysee that theNever Die batteryhebest allaround,butwe havea measureof how muchbetter t is. Wealso see thatacomputeruses batteriesabout6 timesas fast as a radio.Can we go further?Sure. To see how requires hat we considerwhatdistinguishes tablefromagraph.A graphusesspacetoconvey nformation.A table uses a specific iconic representation.We have madetables moreunderstandabley usingspace-making a tablemore like a graph.We canimprove ablesfurther y making hemmoregraphicaltill. A semigraphicaldisplaylike the stem-and-leaf iagram Tukey,1977) is merelya table inwhichthe entriesarenotonly orderedbutarealso spacedaccording o thesize of the gapsbetweenadjacent ows or columns.Therulethenis:

    TABLESecondrevisionof battery ife table(rowand columnmeansshownandemphasized)BatteryLifeinHours

    Battery Cassette Portable BatteryBrands Radio Flashlight Player Computer AveragesNeverDie 28 16 8 6 15Electro-Blaster 26 15 10 4 14PowerBat 24 13 7 5 12Servo-Cell 21 12 4 2 10ConstantCharge 19 10 5 3 9Usageaverages 24 13 7 4 128

  • 8/12/2019 Improving Tabular Displays

    10/31

    Improving TabularDisplaysRule IV. Add spacing to aid perception. If there is a clustering amongrows or columns, space them so that they look clustered. To put this notion

    into practice, consider the next version of Table 1, shown as Table 4.The rows have been spaced according to what appear to be significantgaps (Wainer& Schacht, 1978), and we see thatbatteriesfall into two groups:three relatively strong batteries and two weaker ones. This yields a table thatis about as good as we can do. Now we can see that a battery lasts abouttwice as long in a radio as in a flashlight, and about twice as long in aflashlight as in a cassette player. Moreover, we see clearly that the threebestbatteries yield about 50% more life than the two worst.This brings us to an interesting issue. NAEP Questions 21 and 22 couldbe answered trivially if the table were transformedas we have done in Table4. Should we transformthe table? The way in which we have structuredthetable is not based on the particularquestions that were asked, but ratherongeneral rules for all tables. We would have done it in exactly the same wayhad we not seen the questions. This transformationmerely follows a set ofrules that characterizes good practice. The original table was flawed in thatit didn't conform to standards of good practice.Basing a characterizationof an examinee's ability to understand a datadisplay on a question paired with a flawed display is akin to characterizingsomeone's ability to readby asking questions about a passage full of spellingand grammaticalerrorswhose sentences were orderedhaphazardly.What arewe really testing?One might say that we are examining whetheror not someone can under-stand what is de facto "out there."I have some sympathy with this view, butwhat is the relationship between the ability to understand lliterate prose andthe ability to understand proper prose? If we measure the former, do weknow anything more about the latter? Yet how often do we encounter wellmade displays in the everyday world? Should we be testing what is, or whatshould be?

    TABLE4Third evisionof battery ife table(rowsspacedto accentuatebatteryclusters)BatteryLife in HoursBattery Cassette Portable BatteryBrands Radio Flashlight Player Computer AveragesNeverDie 28 16 8 6 15Electro-Blaster 26 15 10 4 14PowerBat 24 13 7 5 12

    Servo-Cell 21 12 4 2 10ConstantCharge 19 10 5 3 9Usageaverages 24 13 7 4 129

  • 8/12/2019 Improving Tabular Displays

    11/31

    WainerA morepracticalproblems that f a display s properly onstructed,mostcommonlyaskedquestionsareeasilyanswered.That s thenature f graphicsandhuman nformation rocessingability. t is hardero asknontrivial ues-tionsof a well constructedable. This is not an isolated ssue.I will discussit furthern the conclusionof this article.While we cannothope to resolvethese issueshere,I would like to addone vote toward estingliteracywith prosethatis correctlycomposedandtestingnumeracywith datadisplaysthat conform o acceptedstandards fgood practice. f we do otherwisewe maybe ableto connectour testwithcommonpractice,but is that what we wish to do?In theconcluding ectionof thisarticle willdiscuss he kindsof questions

    that can be constructed nd suggesta theoretical tructure hatwill aid infuture ests of this sort.Example3: 1992 4th, 8th, and 12th GradeMathAssessment

    Original Table RevisedTableTen Students'Test scores Ten Students'Test scores

    Student Score Student ScoreA 88 C 91B 65 A 88C 91 H 85D 36E 72 E 72F 57 B 65G 50 I 62H 85 F 57I 62 G 50J 48 J 48

    D 36Mean 65Question9, associatedwith the abovetable,is as follows.9. Thetableabove showsthescoresof 10students n a finalexamination.What s the rangeof thesescores?(thenfouroptions)

    To answer hisquestion,one needs to knowthat therange s thedifferencebetweenthe largestand the smallestentries,find them,and then subtractthem.A properlypreparedable,whichorders he rows by the dataratherthan some arbitraryetter,removesthe need for the second step.4Also,introducingpaceswheretherearedatagaps(invisible n theoriginal able)providesthe opportunityo ask other,deeperquestionsaboutthe structureof thesedata.10

  • 8/12/2019 Improving Tabular Displays

    12/31

    ImprovingTabularDisplaysBig Tables in NAEP Reports

    NAEP reportsare often mother odes of information,but sometimesittakesa considerable mountof effortto mine thatinformation.Onereasonthat such effort s requireds the formatof the datapresentation.t appearsthatsavingspace s sometimesviewedasa more mportant oalthaneffectivecommunication. et us examinea single largetable fromone majorNAEPreportand see how the application f the aforesaid ourrules can increaseits comprehensibility.he tablechosen sharesenoughof its characteristicswithothertablesto allow one example o be broadlygeneralizable.Example4: Table2.12 From Data Compendium or the NAEP 1992Mathematics ssessment f the NationandtheStates

    This table,reproduceds Table5, showstheaveragemathematics erfor-manceof eighth-gradexamineesfromall participatingurisdictionsn the1992 statemathematics ssessmentas a functionof parent's ducation.Alsoincludeds thepercentage f examinees neachstatewhoseparents' ducationis at each of the designated evels. As is customary,he standard rrorsofall figuresarepresentedn parentheses.Before we attempt o revise this table, it is wise to consider its likelypurpose. Why would anyonewant to see data like these? What sorts ofquestionswould such data answer?How easily could the readerof thistableanswer he same sorts of questions hatwereasked of children n theassessment?How hard s it to answera questionanalogous o Question21aboutwhat s thebestall-purpose atteryWhats thebestperformingtate?)?OroneanalogousoQuestion9 about herangeof scoresamong10children(What s therangeof performancesmong he41 participatingtates?)?Anyredesignshouldallow suchobviousquestions o be answered asily.Moregenerally,or thistable,as withmosttwo-waydisplays, hequestionsthat can be answeredare based on the factorspresented,o wit:(1) How did the children n each of thejurisdictionsperform n math?Whichstates did the best?Whichthe worst?How muchvariationsthereamongthe states?How does my statecomparewithothers ikeit?With henationasa whole?What s theclustering mong hestates?(2) Whatis the relationshipbetweenparental ducationand children'smathperformance?(3) Does parental ducationhavethe sameeffect in all jurisdictions?In addition, herearequestionsparallel o thesedealingwiththepercentageof childrenat eachparental ducationevel.(4) How well educatedare the parentsof these children n each of thejurisdictions?Whichstateshave thebesteducatedparents?Whichtheworst?Howmuchvariations thereamongthestates?Howdoes my

    11

  • 8/12/2019 Improving Tabular Displays

    13/31

    Wainerstate compare with others like it? With the nation as a whole? Whatis the clustering among the states?

    (5) Which level of parentaleducation is most common? Which is least?How much parentaleducation is "typical"?(6) Does the distribution of parental education have the same shape inall jurisdictions?After answering the above questions, we would like to be able to knowwhich differences we observe are possible artifacts of sampling fluctuationand which represent real differences in the populations of interest.

    TABLE5OriginalTable2.12fromDataCompendiumor the NAEP 1992MathematicsAssessment of the Nation and the States (p. 83): Average mathematicsproficiency by parents' highest level of education

    Grade 8 - 1992Some Education After Did Not Finish HighGraduated College High School Graduated High School School I Don't KnowPUBULIC Perce.nage Average Percentage Average Prcentagel Average Per rtage Average Percentage Average"SCHOOLS of Students Proficiency of Studens Proficiency of Studenr Proficienc of S ens Proficiency of Students Proficiency

    NATION 40 (1.4) 279 (1.4) 18 (0.6) 270 (1.2) 25 (0.8) 256 (1.4) 8 (0.6) 248 (1.8) 9 (0.5) 251 (1.7)Northeast 38 (3.1) 282 (4.2) 18 (1.1) 267 (3.0) 26 (2.2) 259 (4.2) 8 (0.9) 246 (4.2) 10 (1.2) 250 (3.3)Southeast 35 (1.9) 270 (1.9) 17 (0.8) 263 (2.0) 28 (1.4) 249 (1.9) 12 (1.6) 246 (4.2) 8 (1.0) 248 (4.3)Central 42 (2.7) 283 (2.9) 20 (1.4) 273 (1.6) 26 (1.7) 264 (2.3) 4 (0.7) ** ( ) 7 (0.8) 258 (3.8)West 43 (2.9) 279 (2.6) 18 (1.2) 274 (2.6) 19 (1.5) 252 (2.9) 9 (1.1) 248 (2.4) 11 (0.9) 248 (2.9)STATESAlabama 33 (1.6) 261 (2.5) 18 (0.7) 258 (2.0) 29 (1.1) 244 (1.8) 13 (0.9) 239 (2.0) 7 (0.6) 237 (2.9)Arizona 36 (1.5) 277 (1.5) 22 (1.0) 270 (1.5) 21 (0.9j 256 (1.6) 10 (0.7) 245 (2.5) 12 (0.8) 248 (2.7)Arkansas 30 (1.1) 264 (1.9) 20 (0.8) 264 (1.7) 31 (1.1) 248 (1.6) 11 (0.7) 246 (2.4) 8 (0.6) 245 (2.7)California 39 (1.8) 275 (2.0) 18 (1.0) 266 (2.1) 17 (0.9) 251 (2.1) 10 (0.9) 241 (2.2) 16 (1.1) 240 (2.9)Colorado 46 (1.2) 282 (1.3) 19 (0.9) 276 (1.6) 21 (0.9) 260 (1.5)> 6 (0.6) 250 (2.4) 7 (0.5) 252 (2.6)Connecticut 47 (1.3) 288 (1.0)> 16 (0.8) 272 (1.8) 22 (0.9) 260 (1.8) 6 (0.6) 245 (3.3) 9 (0.6) 251 (2.4)Delaware 39 (1.2) 274 (1.3) 18 (1.0) 268 (2.3) 30 (1.0) 251 (1.7) 6 (0.5) 248 (4.0) 8 (0.9) 248 (3.4)Dist. Columbia 32 (1.0) 244 (1.7) 17 (0.8) 240 (1.9) 29 (0.8) 224 (1.6) 9 (0.7) 225 (3.2) 12 (0.6) 229 (2.2)Florida 39 (1.5) 268 (1.9) 19 (0.7) 266 (1.9) 24 (1.1) 251 (1.8) 8 (0.7) 244 (2.7) 10 (0.7) 244 (3.21Georgia 35 (1.7) 271 (2.1) 18 (0.7) 264 (1.7) 30 (1.2) 250 (1.3) 11 (0.8) 244 (2.2) 6 (0.6) 245 (2.6)Hawaii 38 (1.1) 267 (1.5) 15 (0.9)< 266 (1.9) 25 (1.0) 246 (1.8) 6 (0.5) 242 (3.5) 16 (0.8) 246 (2.1)Idaho 48 (1.2) 281 (0.9) 20 (0.8) 278 (1.3) 19 (0.9) 268 (1.4)> 7 (0.5) 254 (2.3) 6 (0.5) 254 (2.8)Indiana 33 (1.5) 283 (1.5) 21 (0.9) 275 (1.9) 32 (1.1) 260 (1.6) 8 (0.6) 250 (2.6) 6 (0.5) 249 (3.3)Iowa 44 (1.4) 291 (1.2)> 21 (0.8) 285 (1.5) 25 (1.1) 273 (1.3) 4 (0.4) 262 (2.4) 5 (0.4) 266 (2.8)Kentucky 28 (1.4) 278 (1.6))) 19(0.8) 267 (1.6) 32 (0.9) 254 (1.6) 15 (0.9) 246 (1.7) 6 (0.4) 242 (2.8)Louisiana 32 (1.4) 256 (2.5) 20 (0.9) 259 (1.8) 30 (1.3) 242 (1.6) 10 (0.7) 237 (2.4) 7 (0.6) 236 (3.7)Maine 40 (1.5) 288 (1.4) 22 (1.0) 281 (1.5) 26 (1.1) 267 (1.1) 6 (0.5) 259 (2.7) 5 (0.5) 266 (2.6iMaryland 44 (1.7) 278 (1.8) 18 (0.9) 266 (1.9) 25 (1.2) 250 (1.8) 6 (0.8) 240 (3.7) 7 (0.5) 245 (3.8;Massachusetts 48 (1.5) 284 (1.3) 17 (0.8) 272 (1.8) 21 (1.0) 261 (1.4) 7 (0.6) 248 (3.2) 7 (0.6) 248 (2.6)Michigan 38 (1.6) 277 (2.2) 23 (0.9) 271 (2.0) 26 (0.9) 257 (1.7) 6 (0.5) 249 (2.0) 7 (0.6) 248 (3.0oMinnesota 48 (1.3)> 290 (1.0)> 21 (0.9) 284 (1.8) 22 (0.9)"< 270 (1.8)> 3 (0.4) 256 (4.2) 7 (0.6) 268 (3.0)Mississippi 36 (1.7) 254 (1.6) 16 (0.7) 256 (2.0) 29 (1,4) 239 (1.6) 13 (0.8) 234 (1.8) 7 (0.6) 231 (2.8)Missouri 36 (1.3) 280 (1.7) 22 (0.9) 275 (1.5) 29 (1.0) 264 (1.6) 8 (0.7) 254 (2.4) 6 (0.5) 252 (2.9)Nebraska 46 (1.5) 287 (1.2) 20 (1.0) 280 (1.6) 24 (1.2) 267 (1.7) 4 (0.5) 247 (3.3) 6 (0.6) 256 (3.8)New Hampshire 46 (1.5) 287 (1.4) 17 (0.8) 280 (1.5) 24 (1.1) 267 (0.9)>> 6 (0.5) 259 (2.5) 7 (0.5)> 262 (2.5)New Jersey 45 (1.6) 283 (1.8) 18 (0.8) 275 (2.1) 23 (1.2) 259 (2.5) 7 (0.6) 253 (3.8) 8 (0.7) 250 (3.9)New Mexico 34 (1.4) 272 (1.4) 20 (0.7) 264 (1.4) 26 (1.1) 249 (1.4) 11 (0.7) 244 (1.9) 10 (0.6) 245 (2.0):ew York 44 (1.8) 277 (1.9) 18 (1.1) 271 (2.4) 23 (1.0) 256 (2.5) 6 (0.8) 243 (4.2) 10 (1.0) 240 (3.8)North Carolina 36 (1.2) 271 (1.4;> 20 (0.8) 265 (1.6)> 27 (0.9)< 246 (1.7) 10 (0.6) 240 (2.3) 6 (0.5) 240 (3.6)North Dakota 54 (1.2)> 289 (1.1) 18 (0.7) 283 (1.9) 19 (1.3) 271 (1.7) 3 (0.5) 259 (4.5) 5 (0.5) 272 (2.8)Ohio 37 (1.4) 279 (1.8) 19 (0.7) 272 (1.6) 32 (1.1) 260 (2.3) 7 (0.6) 243 (2.6) 5 (0.S) 249 (4.5)Oklahoma 39 (1.4) 277 (1.5) 21 (0.9) 272 (1.9) 26 (1.0) 257 (1.7) 8 (0.7) 254 (2.9) 6 (0.5) 251 (4.3)Pennsylvania 39 (1.8) 282 (1.6) 19 (0.9) 274 (1.9) 30 (1.2) 262 (1.6) 7 (0.8) 252 (2.8) 5 (0.5) 252 (3.81hode Island 43 (1.1) 276 (1.1) 18 (1.5) 271 (1.5) 22 (1.4) 256 (1.6) 8 (0.4) 244 (2.1) 8 (0.6) 239 (2.5)outh Carolina 37 (1.4) 272 (1.5) 16(0.7) 268 (1.7) 31 (0.9) 248 (1.4) 9 (0.6) 248 (2.1) 7 (0.3) 247 (3.0)Tennessee 33 (1.5) 267 (2.1) 21 (0.9) 265 (1.8) 29 (1.0) 251 '1.6) 12 (0.8) 245 (2.0) 5 (0.4) 243 (3.6)Texas 34 (1.6) 281 (2.1)> 18 (0.8)> 272 (1.6) 21 (1.0) 253 (1.6) 16 11.0) 247 (1.7) 11 (0.8) 244 (2.4tah 53 (1.3) 280 (1.0) 22 (1.0) 278 (1.2) 15 (0.8) 258 (1.8) 3 (0.3) 254 (3.2) 7 (0.5) 258 (2.7)Virginia 41 (1.5) 282 (1.5) 18 (0.8) 270 (1.6) 24 (0.9) 252 (1.5) 9 (0.6) 248 (2.1) 8 (0.6) 251 (2.51West Virginia 29 (1.1) 270 (1.5) 18 (0.8) 269 (1.4) 33 (1.1)< 251 (1.2) 13(0.9) 244 (1.8) 7 (0.4) 239 (2.31Wisconsin 38 (2.4) 287 (1.8) 24 (0.8) 282 (1.5) 28 (1.8) 270 (1.9) 5 (0.6) 254 (3.4) 6 (0.6) 255 (4.01Wyoming 42 (0.9) 281 (0.9) 22 (0.8) 278 (1.7) 23 (0.7) 266 (1.1) 5 (0.6) 258 (3.3) 7 (0.5) 260 (2.2)TERRITORIESGuam 28 (1.2) 246 (1.9) 13 (0.7) 244 (2.4) 27 (1.1) 229 (1.9) 10 (0.9) 224 (2.5) 22 (1.2) 226 (2.0)Virgin Islands 23 (1.1) 224 (2.0) 11 (0.8) 232 (2.4) 29 (0.9) 221 (1.9) 14 (0.9) 219 (2.4) 24 (1.0) 217 (1.4)

    The percentages for parents" highest level ofeducation may not add to 100 percent because some students responded I don't know." >>Thevalue for1992 was significanty higher than the value for 1990 at about the 95 perment ertainty level.

  • 8/12/2019 Improving Tabular Displays

    14/31

    ImprovingTabularDisplaysAnswersto all of these questions ie withinthe boundsof Table5, buthow easily can they be extracted?Can we ease the painof this extraction

    througha changein the designof thetable?Let us beginthereal workof theredesignby askingwhy one wouldwantto includethepercentagesn each educational ategory n thesame tableasthe mathematics roficiency,as opposedto placingthemin theirown tableon a facingpage.Themajorreason s thatthepercentages reimportantorcalculating tatemeans.Suchmeans aregiven in othertables,butit wouldseemgoodpractice rememberRuleIII)to include hemhere.Oncetheyarecalculated, hey providea sensible variableon which to orderthe states(ratherhan healphabet-RuleII).Oncethisordering asbeenaccomplishedwecan seeapparentaps nthe states'performance. natural isualmetaphorfor these data gaps is to includematchingphysicalgaps.5The resultingtableof meanproficienciesby stateis shownas Table 6. The percentagedistributionsf parental ducation lided from thistablehave beenset asidefor a parallel able;we shallreturn o these datashortly.We have also movedtheDistrictof Columbianto thesectionof nonstatesthatalso includesGuamand theVirginIslands.All orderings donewithintablesection.Note thatthekey summaries rein boldfacetype.Forease ofmanipulationhe standard rrorshavetemporarily eenremoved romtheirparentheses.Theywill shortlybe removedaltogether.Table6 allows us to answersome of thequestionsphrasednitiallyquiteeasily, especiallythose dealingwith the relativeperformance f the states(Question 1). The usual findingof Midwestern tates havingthe highestaverageperformancendtheSouthern tatesthelowestis seen immediately.Moreover,we see that thereis a 37-pointdifferencebetween the higheststatesand the lowest. Interpreting7 pointsis helpedby rememberinghatthere is an averageincrease of 12 NAEP points/yearbetweenfourthandeighthgradein math.Thus, the 37-pointdifferencecan be interpreted scorrespondingo about a three-yeardifference in averageperformancebetween the best andworstperformingtates.This increases o more thanfouryearswhen one'sgaze shiftsto thethree"otherurisdictions." hegapsdepictedhelp keep our eyes fromblurringwhile examiningsuch a largetable, and they also provideroughgroupingsthat may be suggestiveofexplanatory ypotheses.Note thatthedataabout he varioussectionsof thecountryon the top of Table5 have been removedentirely.This was donebecause heywere theproducts f a different urveyandhavelarger tandarderrors hanthe individualstatesfromwhich those sectionsare composed.Their nclusion ust addedan unnecessaryourceof confusion.Examininghe averageproficiency or thenationat eacheducationevelreveals heunsurprisingesult hatchildrenwhoseparents rebetter ducatedscorehigher n mathematics.n addition,t appears hatchildrenwhodon'tknow theirparents'educationperform lightlybetterthanchildrenwhoseparents id notfinishhighschool.This is suggestiveof agrouping omewhat

    13

  • 8/12/2019 Improving Tabular Displays

    15/31

    TABLE6Reformatted version of Table5 in which standard errors are in separately labeledcolumns, categories of parental education are separated, average stateperformance is shown, and rows are ordered and spaced by average performance

    SomeEducation Did NotPUBLIC Graduated After Graduated FinishSCHOOLS College HighSchool HighSchool HighSchool Idon't Know Average0 so 0 so e se 0 se e so e seNation 279 1.4 270 1.2 256 1.4 248 1.8 251 1.7 267 1.4States

    Iowa 291 1.2 285 1.5 273 1.3 262 2.4 266 2.8 283 1.4North Dakota 289 1.1 283 1.9 271 1.7 259 4.5 272 2.8 283 1.5Minnesota 290 1.0 284 1.8 270 1.8 256 4.2 268 3.0 282 1.6

    Maine 288 1.4 281 1.5 267 1.1 259 2.7 266 2.6 278 1.5Wisconsin 287 1.4 280 1.5 267 0.9 259 2.5 262 2.1 278 1.4New Hampshire 287 1.8 282 1.5 270 1.9 254 3.4 255 4.0 278 2.0Nebraska 287 1.2 280 1.6 267 1.7 247 3.3 256 3.8 277 1.6Idaho 281 0.9 278 1.3 268 1.4 254 2.3 254 2.8 274 1.3

    Wyoming 281 0.9 278 1.7 266 1.1 258 3.3 260 2.2 274 1.3Utah 280 1.0 278 1.2 258 1.8 254 3.2 258 2.7 274 1.3Connecticut 288 1.0 272 1.8 260 1.8 245 3.3 251 2.4 273 1.6Colorado 282 1.3 276 1.6 260 1.5 250 2.4 252 2.6 272 1.6Massachusetts 284 1.3 272 1.8 261 1.4 248 3.2 248 2.6 272 1.6New Jersey 283 1.8 275 2.1 259 2.5 253 3.8 250 3.9 271 2.3

    Pennsylvania 282 1.6 274 1.9 262 1.6 252 2.8 252 3.8 271 1.9Missouri 280 1.7 275 1.5 264 1.6 254 2.4 252 2.9 271 1.8Indiana 283 1.5 275 1.9 260 1.6 250 2.6 249 3.3 269 1.8Ohio 279 1.8 272 1.6 260 2.3 243 2.6 249 4.5 268 2.1Oklahoma 277 1.5 272 1.9 257 1.7 254 2.9 251 4.3 267 1.9

    Virginia 282 1.5 270 1.6 252 1.5 248 2.1 251 2.5 267 1.7Michigan 277 2.2 271 2.0 257 1.7 249 2.0 248 3.0 267 2.1New York 277 1.9 271 2.4 256 2.5 243 4.2 240 3.8 265 2.5Rhode Island 276 1.1 271 1.5 256 1.6 244 2.1 239 2.5 265 1.5Arizona 277 1.5 270 1.5 256 1.6 245 2.5 248 2.7 264 1.8Maryland 278 1.8 266 1.9 250 1.8 240 3.7 245 3.8 264 2.1Texas 281 2.1 272 1.6 253 1.6 247 1.0 244 2.4 264 1.8Delaware 274 1.3 268 2.3 251 1.7 248 4.0 248 3.4 262 1.9Kentucky 278 1.6 267 1.6 254 1.6 246 1.7 242 2.8 261 1.7California 275 2.0 266 2.1 251 2.1 241 2.2 240 2.9 260 2.2South Carolina 272 1.5 268 1.7 248 1.4 248 2.1 247 3.0 260 1.7Florida 268 1.9 266 1.9 251 1.8 244 2.7 244 3.2 259 2.1Georgia 271 2.1 264 1.7 250 1.3 244 2.2 245 2.6 259 1.8New Mexico 272 1.4 264 1.4 249 1.4 244 1.9 245 2.0 259 1.5Tennessee 267 2.1 265 1.8 251 1.6 245 2.0 243 3.6 258 2.0West Virginia 270 1.5 269 1.4 251 1.2 244 1.8 239 2.3 258 1.5North Carolina 271 1.4 265 1.6 246 1.7 240 2.3 240 3.6 258 1.7Hawaii 267 1.5 266 1.9 246 1.8 242 3.5 246 2.1 257 1.9Arkansas 264 1.9 264 1.7 248 1.6 246 2.4 245 2.7 256 1.9Alabama 261 2.5 258 2.0 244 1.8 239 2.0 237 2.9 251 2.2Louisiania 256 2.5 259 1.8 242 1.6 237 2.4 236 3.7 249 2.2Mississippi 254 1.6 256 2.0 239 1.6 234 1.8 231 2.8 246 1.8

    Other JurisdictionsGuam 246 1.9 244 2.4 229 1.9 224 2.5 226 2.0 235 2.0District of Columbia 244 1.7 240 1.9 224 1.6 225 3.2 229 2.2 234 1.9Virgin Islands 224 2.0 232 2.4 221 1.9 219 2.4 217 1.4 222 1.9

  • 8/12/2019 Improving Tabular Displays

    16/31

    Improving TabularDisplaysheterogeneousnparentalducation.A smallplotof meanmathperformanceagainstparents'education Figure1), witha roughreferenceine drawn n,makes the quantitative spect of this relationship learer and providesareasonable nswer o Question2.Scanningdown the first columnof the tableshowsthatthehigher-scoringstatesalso tend to havea greaterproportionf children omingfromhomeswith a parentwhowas a college graduate. utevenamong ustthesechildren(conditioningnparents'ducation),heres still a37-pointdifference etweenthehighest-andlowest-scoringtates.This is partof an answer o the thirdkindof question, lthoughmorecomplete nswers an bebuiltbyconstructinggraphs ike Figure1 for individual tates.Such a graph, hownas Figure2,contradictshe hypothesishat differencesn states'overallperformanceredueto differencesn parents' ducation.Aside frombeingmildlystartlingnitsownright, hisresult educes till furtherhe needto include hepercentageof children n eachparentalducation ategorywithin histable.

    280

    270 '00

    260

    250

    240 Graduated Some Education High School I Don't HighSchoolCollege AfterHigh School Graduate Know DropoutParents' EducationFIGURE 1. A plot showing children's mathematics proficiency and their parents'education. A reference line drawn in shows roughly the relationship between the(mostly) ordered categories of parental education and children'sperformance.15

  • 8/12/2019 Improving Tabular Displays

    17/31

    Wainer300

    290280

    S"".A2 7 0

    260OO

    0 "'..

    250-& ""

    240'230

    220 Graduated Some Education HighSchool I don't HighSchoolCollege AfterHigh School Graduate know DropoutParents' EducationFIGURE . A comparison f theperformance f 8thgraders nmathematicsnIowa,New Jersey,and Mississippi, hownas a function of theirparents'education.Thedifference n theperformance f children n mathematics y state is not solely dueto differencesn parentaleducation.What About the Standard Errors?

    Questions about the statistical significance of these observed differencescan be answered afterdoing a little arithmetic on the standarderrorsincludedwithin the table. A natural question to ask is why that arithmetic hasn'talreadybeen done by the generatorsof the table. One possible answer to thisquestion is that there are too many plausible questions of statistical signifi-cance that might be asked to calculate all of the possible error terms. But,playing devil's advocate, couldn't some conservative errortermbe calculatedthatwould save all of the clutterintroducedby the many columns of standarderrors?The answer to this, simply put, is yes. And the next version of thistable (shown as Table 7) segregates the standarderrors into a separate tableand substitutes instead (for quick and dirty significance judgments) threeestimates of the standard error of the difference between any two entries inthat column. The first is an upperbound on the standard rrorof thedifference.16

  • 8/12/2019 Improving Tabular Displays

    18/31

    ImprovingTabularDisplaysThis is obtained by multiplying the largest value of the standarderrorin thatcolumn by ,J. The second entry, labeled 40 Bonferroni, is the first entrymultiplied by 3.2. This is obtained from the Bonferroni inequality and basedon the idea that a user is interested in making comparisons of his/her ownstate with each of the others. This controls the family of tests protectionbeyond the .05 level. The last entry, labeled 820 Bonferroni, is the first entrymultiplied by 4.0 and controls the family of tests significance for someonewho compares each state with all others. It is likely that this last estimate isunnecessary, since anyone expecting to make that many comparisons willalmost surely want the tighter errorbounds constructed from the individualstandarderrorsand perhaps use more powerful proceduresfor multiple com-parisons (e.g., Benjamini & Hochberg, 1995).6A table augmented with these errorsummaries, but relieved of the burdenof individual accompanying standarderrors, is not only a good deal clearerto look at but, for most prospective users, a good deal easier to use formaking inferences about statistical significance of observed differences.Next, while there was no good reason to combine mathematicsachievementandpercentageof children n eachcategoryinto the sametable,thesepercentagedistributionsare important n theirown right.It was just that theirpresentationwas clearerafterthey were separated nto two tables. To examine this, considerthe two variables, shown as Tables 7 and 8. Table 7 contains just meanmathematicsproficiency; Table 8 just the distributionof childrenacross levelsof parentaleducation. It appearsthatthe benefits associated with housing bothof these variables within the same table are too few to offset the increases inperceptualcomplexity that accrue by mixing them. It seems, however, worth-while to keep them contiguous. Thus we would recommendplacing them onfacing pages. Note that the states in Table 8 are ordered by the state meansfrom Table 7. This facilitates comparisons between the two tables. It alsoraises the interestingquestion of whetherthe increasedease of comprehensionyielded by orderinga table by its contents is more thanoffset by the increaseddifficulty in making comparisons across tables ordered n differentways. Thisissue will be discussed furtherat the end of this section.On both tables we have highlighted unusual entries by putting them inboldface type and boxing them in. Entries that are unusually large are alsoshaded. Entries that are unusually small are boxed but unshaded. We havealso appendeda positive or negative sign as a furtherreminderof the directionof the entry's variation. Thus in Table 7 we see that the average score ofchildren whose parentshad only some post-high school education was unusu-ally high in West Virginia. Similarly, Nebraska's and Connecticut's childrenof high school dropouts scored unusually poorly.The determination of which entries were unusual was made by fitting asimple additivemodel to the dataandexamining the residuals. Those residualsthat stuck out excessively (more than 2 times the squareroot of the mean ofthe squared residuals) were then highlighted. Table 7 goes about as far as

    17

  • 8/12/2019 Improving Tabular Displays

    19/31

    TABLE7Revision of Table 6 with individual standard errors replaced by conservativeestimates, unusual entries highlighted, and a state locator index insertedSomeEducation Did Not

    PUBLIC Graduated After Graduated FinishSCHOOLS College High High High IDon'tSchool School School Know MeanNation 279 270 256 248 251 267States1 Iowa 291 285 273 262 266 2832 NorthDakota 289 283 271 259 272 2833 Minnesota 290 284 270 256 268 282

    4 Maine 288 281 267 259 266 2785 Wisconsin 287 282 270 254 255 2786 NewHampshire 287 280 267 259 262 2787 Nebraska 287 280 267 [ 247- 256 2778 Idaho 281 278 268 254 254 2749 Wyoming 281 278 266 258 260 27410 Utah 280 278 258 254 258 27411 Connecticut 288 272 260 f 245- 251 273

    12 Colorado 282 276 260 250 252 27213 Massachusetts 284 272 261 248 1 248- 27214 NewJersey 283 275 259 253 250 27115 Pennsylvania 282 274 262 252 252 27116 Missouri 280 275 264 254 252 27117 Indiana 283 275 260 250 249 26918 Ohio 279 272 260 243 249 26819 Oklahoma 277 272 257 254 251 26720 Virginia 282 270 252 248 251 26721 Michigan 277 271 257 249 248 26722 New York 277 271 256 243 240- 26523 RhodeIsland 276 271 256 244 239- 26524 Arizona 277 270 256 245 248 26425 Maryland 278 266 250 240 245 26426 Texas 281 272 253 247 244 26427 Delaware 274 268 251 248 248 26228 Kentucky 278 267 254 246 242 26129 California 275 266 251 241 240 26030 SouthCarolina 272 268 248 248 247 26031 Florida 268 266 251 244 244 25932 Georgia 271 264 250 244 245 25933 New Mexico 272 264 249 244 245 25934 Tennessee 267 265 251 245 243 25835 West Virginia 270 269+ 251 244 239 25836 NorthCarolina 271 265 246 240 240 25837 Hawaii 267 266 246 242 246 25738 Arkansas 264 264 248 1 246 + 245 25639 Alabama 261 258 244 239 237 25140 Louisiania 256 259 242 237 236 24941 Mississippi 254 256 239 234 231 246

    OtherJurisdictions42 Guam 246 244 229 224 226 23543 District f Columbia 244 240 224 225 229 23444 Virgin slands 224 232 221 219 217 222

    Error terms for comparisonsMax Std error of diff 3.5 3.4 3.5 6.4 6.4 3.540 Bonferroni 11.3 11.0 11.3 20.7 20.7 11.3820 Bonferroni 14.0 13.6 14.0 25.6 25.6 14.0

  • 8/12/2019 Improving Tabular Displays

    20/31

    TABLE8A parallel of Table 7 including instead percentage distribution of parentaleducationSomeEducation Did Not

    PUBLIC Graduated After Graduated FinishSCHOOLS College High High High I Don'tSchool School School KnowNation 40 18 25 8 9States1 Iowa 44 21 25 4 52 NorthDakota 64 + 18 19 3 53 Minnesota 48 21 22 3 7

    4 Maine 40 22 26 6 55 Wisconsin 38 24 28 5 66 New Hampshire 46 17 24 6 77 Nebraska 46 20 24 4 68 Idaho 48 20 19 7 69 Wyoming 42 22 23 5 710 Utah 53+ 22 15- 3 711 Connecticut 47 16 22 6 9

    12 Colorado 46 19 21 6 713 Massachusetts 48 17 21 7 714 NewJersey 45 18 23 7 815 Pennsylvania 39 19 30 7 516 Missouri 36 22 29 8 617 Indiana 33 21 32 8 618 Ohio 37 19 32 7 519 Oklahoma 39 21 26 8 620 Virginia 41 18 24 9 821 Michigan 38 23 26 6 722 New York 44 18 23 6 1023 Rhode Island 43 18 22 8 824 Arizona 36 22 21 10 1225 Maryland 44 18 25 6 726 Texas 34 18 21 16 1127 Delaware 39 18 30 6 828 Kentucky 28- 19 32 15 629 California 39 18 17 - 10 1630 South Carolina 37 16 31 9 731 Florida 39 19 24 8 1032 Georgia 35 18 30 11 633 New Mexico 34 20 26 11 1034 Tennessee 33 21 29 12 535 West Virginia 29- 18 33 13 736 NorthCarolina 36 20 27 10 637 Hawaii 38 15 25 6 1638 Arkansas 30- 20 31 11 839 Alabama 33 18 29 13 740 Louisiania L 32- 20 30 10 741 Mississippi 36 16 29 13 7

    Other Jurisdictions42 Guam 28 13 27 10 2243 District f Columbia 32 17 29 9 1244 Virgin slands 23 11 29 14 24Error terms for comparisonsMax Std error of diff 3.4 2.1 2.5 1.4 1.740 Bonferroni 11.0 6.8 8.1 4.5 5.5820 Bonferroni 13.6 8.4 10.0 5.6 6.8

  • 8/12/2019 Improving Tabular Displays

    21/31

    Wainerwe might expect in displaying the results to answer all of the questions aboutachievement scores phrased earlier.Last, the individual standard errors that were previously housed in theoriginal table have been combined and piled into two tables of standarderrors matching Tables 7 and 8. These are available from the author [email protected]. I believe that these will be so rarelyconsulted that it isn'tworth using up extrapages here. Futureexperience will informthisjudgment,and I am preparedto change the format if I am wrong.Thus we have found that separating variables that are only tangentiallyrelated into separatetables yields increasedcomprehensibility.Once the sepa-ration is completed, the tables should be structuredaccording to the fourrules specified earlier.The questions posed at the beginning of this section,which characterize the most plausible reasons why anyone would want tosee these data, are all answered more easily from these revised tables.What about order?Clearly, if we wish to compare data values on differentvariables from the same set of states it is often helpful if those dataareorderedin the same way in those different tables. This is currentlyaccomplished byordering all tables alphabetically. Is this a good idea? I think that there areseveral alternatives.The most attractive one to me is to order each table asan independent entity, to be looked at and understoodon its own. Secondaryanalyses that require combining information from several tables should bedone from a different data source than the table; almost surely it should besome electronic database that would allow easy subsequent manipulations.But if we are to think of the tables as the first available archive, there maybe an argument for ordering all tables on a similar topic in the same way,so that various pieces of information about a particularstate can be pickedout easily. If so, alphabetical ordering is only one possibility among many.Is it the best one? Alphabetical ordering has only one thing going for it: Itmakes locating a specific state easier.7Its principaldrawbackis that it usuallyobscures the structurethat the table was constructed to inform us about. Ifa set of tables like those thatgrew out of Table 5 areconstructed and orderedby overall performance (instead of alphabetically), we have made finding aparticular state a bit more difficult.s I believe that this is a small cost incomparison to the gain in comprehensibility.But even this can be amelioratedthrough the inclusion of a "locator table." All we need to do is number thejurisdictions in the table sequentially from 1 to 44, as was done in the firstcolumn of Tables 7 and 8, and then have a small, alphabetically orderedlocator index table (Table9) thatconnects alphabeticallyordered state namesto row numbers in the empirically ordered tables.CompoundTables

    Table 7 is a rectangulararrayshowing a single dependent variable, meanmathematics proficiency, as a function of two independentvariables, parents'20

  • 8/12/2019 Improving Tabular Displays

    22/31

    Improving TabularDisplaysTABLE9Alphabetically ordered locator index table of the states in Tables 7 and 8, to beused in case of an emergency loss of any particular jurisdictionState Position State PositionAlabama 39Arizona 24Arkansas 38California 29Colorado 12Connecticut 11Delaware 27Florida 31Georgia 32Hawaii 37Idaho 8Indiana 17Iowa 1Kentucky 28Louisiana 40Maine 4Maryland 25Massachusetts 13Michigan 21Minnesota 3

    Mississippi 41Missouri 16Nebraska 7

    New York 22New Jersey 14New Mexico 33New Hampshire 5NorthDakota 2NorthCarolina 36Ohio 18Oklahoma 19Pennsylvania 15RhodeIsland 23SouthCarolina 30Tennessee 34Texas 26Utah 10Virginia 20WestVirginia 35Wisconsin 6Wyoming 9OtherJurisdictionsDistrictof Columbia 43Guam 42VirginIslands 44

    education and geographic location. As we have seen, when properlydesignedsuch a table can be a clear and evocative communicator of information.Unfortunately,cleardesign is too commonly abandoned n favor of compoundtables when multivariateor multilevel data are to be displayed. Such tablesare very hard to understand. In fact, in a survey of education policymakers,Hambleton and Slater (1995) found that such compound tables were the mostfrequentlymisunderstood of any datadisplay in NAEP executive summaries.In one such display, parallel in structureto Table 10, more than half of theeducation professionals answered a simple data extraction questionincorrectly.To illustrate the negative effects of a compound table, consider Table 10,a somewhat tidied up version of Table 2.3 from the 1992 NAEP Reading

    21

  • 8/12/2019 Improving Tabular Displays

    23/31

    CU

    C4 um) "( cis... .

    00r-. ), 00 W, ) " 't" It 0" "

    -o - ,:> i cf c 1 C'i N- i C-i EaC 00

    0: rA

    C 0CC C/ nP" cn U

    :?- 0 - -c--4. ---el 40- - CiCi0 C, , C, C, C, ,C,4 00 , , 000, -, .zo

    6 ddd d d "" ""

    U . . . . *.Cd

    v" = ,

  • 8/12/2019 Improving Tabular Displays

    24/31

    ImprovingTabularDisplaysAssessment. This display can be improved by breaking it up into smallerdisplays. For example, the average proficiencies are best shown as a two-way table by themselves (see Table 11). This table shows quantitatively themodest size of the region effects and the much larger grade effects. Thereare no large interactions, and so no entries are boxed in. Of course, an evenmore evocative image could be obtained by subtractingout the grade meansand then plotting the residuals. Such a table would make clear just howdifferent the Southeast is. Similar tables containing the percentageof childrenat each of the NAEP reading levels could also be constructed.Why is it that it is often easier to understand several simple displays thanone compound one? To understanda display involves two distinct phases ofperception (Bertin, 1973/1983) which are characterized by two questions:What are the components of data that are being reported? What are therelations among them?The first phase is easy if the horizontaland verticalcomponents areunitary,for example, grade level versus region. It becomes more difficult if they arenot, for example, level of proficiency and average proficiency and percentversus region and grade. The second phase of perception is addressed by thefour rules, but it is made more difficult if the first phase is complex. Inthis report we have suggested that unless the simultaneous presentation ofmultidimensional information is critical to understanding,comprehension isaided by keeping the components of data simple andpresentingtables paired.This was illustrated earlier when we separated the percentage distributionsand the achievement scores into separate tables.

    Discussion and ConclusionsJust as there areno good or bad tests, neither are theregood or poor graphsnor good or poor tables. Their value depends on the uses to which they are

    put. Some constructions answer the questions one is entitled to ask, andTABLE11Reformattedersionof Table10 in whichonly regionalproficienciesby gradelevel are included. t is arranged o emphasize he two-way tructure f thedata. It is borderedby meansand maineffectsAverage Proficiency Regional Regional

    Region Grade Grade Grade12 Means EffectsNortheast 223 263 293 260 4Central 221 264 294 260 4West 215 260 292 256 0Southeast 214 254 284 251 -5GradeMeans 216 260 291 256GradeEffects -40 4 3523

  • 8/12/2019 Improving Tabular Displays

    25/31

    Wainerothers do not. By makingthe hierarchy f possible questionsexplicit,weemphasize he fact thatone cannot ook at a graphor tableas one looksata painting r a traffic ignal.One does notpassivelyreadagraph; nequeriesit. And one must knowhow to ask usefulquestions.What are the questionsthatcan be asked?To some extent we exploredthis in the secondsection.Ingeneral heyare thesamequestions hatwouldbe asked of data fromany factorialexperiment:What are the row effects?What arethe columneffects? Whatarethe interactions?How do the rowsand columnsgroupas functionsof theseeffects?The goal of effective displayis to ease the viewer's task in answeringthesequestions.We have found hatwisely ordering, ounding,ummarizing,andspacinggo a long way towardaccomplishinghis. In addition,we mustconfront he likely use of a table head-onbefore ncludingvariousmixturesof variables nto it. Addingextrastuffalwaysaffectscomprehensibility,ndwe mustmakethe triagedecision betweensaving spaceby combining woor more ables ntooneandcommunicatinglearly. t has beenourexperiencethatbreakingup complex displays sensiblyoftencommunicatesmoreeffi-ciently,em for em, thana large compound able.

    Measuring NumeracyEarlierwe showedthat f tablesthatareused as stimuliwithina test itemareprepared roperly,hequestionsassociatedwith themareusuallyreducedin difficulty,often dramatically.This does not mean that the practiceofaskingsuchquestionsoughtto be discontinued,nymorethanwe advocatecontinuingo usepoorlyconstructedables omakesuchquestionsesstrivial.Thetest's usefulnessas a learningnstrument ouldbe enhanced f it servedas a model for how tablesoughtto be prepared s well as illustratinghedepthof informationeadilyavailable rom well preparedables.Wellpreparedables will also allow us to constructquestions hatprobethedeepstructure f thedata n awaythat s toodifficultwithpoorlypreparedtables.Whataresuchquestions ike?To answer hiswe need a littletheory.And,to illustrate histheory,we will use thebatteryife item fromthe 1990ScienceAssessmentntroducedarlierandreproducedereas Table 12. Thisis identical o Table4, shownearlier,exceptthatfourunusualentrieshavebeenindicatedbyboxing hem n.A shadedbox withapositivesignindicatesa higherthanexpectedentry;an unshadedbox witha negativesign meansa lowerthanexpectedentry.Ehrenberg1977) calls the abilityto understandatapresentedn a table"numeracy."his termmayhavebroader pplication, utwe shalluse it inthis narrow ontext for the nonce.Howcanwe measure omeone'sproficiencyn understandinguantitativephenomenahat arepresentedn a tabularway (an individual'snumeracy)?Obviously hereareNAEPtest itemswritten hatpurporto do exactlythis;the items describedearlierare some typicalexamples.We can do better

    24

  • 8/12/2019 Improving Tabular Displays

    26/31

    ImprovingTabularDisplaysTABLE12Revisionof Table4 withunusualentrieshighlighted

    Battery Life in HoursBattery Cassette Portable BatteryBrands Radio Flashlight Player Computer AveragesNeverDiel 28 + 16 8 6 15Electro-Blaster26 15 10 [ 4- 1 14PowerBat 24 13 7 5 12Servo-Cell 21 12 4 2 10Constantharge19 10 5 3+ 9

    Usage averages 24 13 7 4 12with the guidance of a formal theory of graphic communication (Wainer,1980, 1992).Rudimentsof a Theoryof Numeracy

    Fundamentalto the measurementof numeracy is the broaderissue of whatkinds of questions tables can be used to answer. My revisions of Bertin's(1973/1983) three levels of questions are:"*Elementary-level questions involve data extraction, for example, Howlong does a Servo-Cell last in a cassette player?"*Intermediate-levelquestions involve trends seen in partsof the data, forexample, How much longer is a battery likely to last in a radio than ina portable computer?"*Overall-level questions involve the deep structure of the data beingpresented in their totality, usually comparing trends and seeing group-ings, for example, Which two appliances show the same pattern ofbattery usage?, or, Which brands of batteries show the same patternofbattery life?

    They are often used in combination; for example, Zabell (1976) referredtotheir use in the detection of outliers-unusual data points. To accomplishthis objective we need a sense of what is usual (i.e., a trendat the intermediatelevel), and then we look for points that do not conform to this trend (theelementary level). Such questions are hard to answer from a raw table suchas Table I but are trivial in Table 11, where such interactions(this time froman additive model) are highlighted.Note that although these levels of questions involve an increasingly broadunderstandingof the data, they do not necessarily imply an increase in theempirical difficulty of the questions.9Reading a table at the intermediate level is clearly different from readinga table at the elementary level; a concept of trend requires the notion of25

  • 8/12/2019 Improving Tabular Displays

    27/31

  • 8/12/2019 Improving Tabular Displays

    28/31

    ImprovingTabularDisplaysconstructinghedisplay.Whileelementary-leveluestionsare bestansweredwith a table,intermediate-nd overall-levelquestionsmaybe easierwithagraph.However,as we havedemonstrated, ell preparedablescanbe usefulat higher evels.Myexperiences that est itemsassociatedwithtables endto bequestionsof the first kind,althoughoften they are compounded hrough he use ofnontabularomplexity.This is not an isolatedpractice onfined o themea-surement f numeracy.n thetestingof verbal easoningt is commonpracticeto make a reasoningquestionmore difficultsimply by usingmorearcanevocabulary.Thispractice tems from thefact that t is almost mpossible owritequestions hatare moredifficult han hequestioners smart.Whenwetryto test the upperreachesof reasoningability,we mustfind itemwriterswho aremore clever still.Of course,whenwe recorda certainevel of performancey anexamineeon a table-basedtem,we canonlyinfera lowerboundon someone'snumer-acy;'0a better ableof the samedataought omake he itemeasier.Similarly,a more numerate udiencemakes a tableappearmoreefficacious.

    SoftwareThere s anenormouswealthof softwareavailable o maketables.I havefoundthat the versatilityof spreadsheet rogramss especiallyuseful. Alltables n thisarticlewereprepared singMicrosoft'sEXCEL'"on a Macin-tosh computer.Suchsoftwareallows prettycompletecontrolof fonts,typesizes,borders, ndshading.Orderingows strivial; rderingolumns equiresa little work.Transformingatafroma spreadsheetable into the graphofyour choice is easily accomplishedwithin EXCEL'"for most commongraphicforms. For moreesotericformatsthe dataare easily moved intospecial-purpose rograms.The realpowerof spreadsheetsmergeswhencalculations f some com-plexityarerequired.This lifts the spreadsheetrombeingmerely handytobeingessential orpreparingables.Theidentificationf unusualdatapointsin a two-waytablerequirescalculatingrow and columneffects and thensubtractinghemout.Determiningwhichgapsin a univariate atastringarelikely to be worthemphasizing equiresordering he data,calculating hegaps as well as a vector of inverselogistic weights,and combiningandsummarizinghem.All of thesetaskscanbedoneon thefly withina spread-sheet. Speciallydesignedtable softwaredoes not always measureup inthis regard.Before one can use this or any softwareon NAEPdata,one must firstextract hosedatafromrather omplexNAEPdatafiles andtransformheminto a formatacceptable o a spreadsheet.This formerlyoneroustaskhasbeen eased considerablyhrough he development f some special-purposesoftware NAEPEX) hat s distributedwiththeNAEPSecondary-UseDataproducts.NAEPEXallows the userto define,extract,and analyzesubsets

    27

  • 8/12/2019 Improving Tabular Displays

    29/31

    Wainerof NAEP data in a relatively painless manner.Furtherdetails about NAEPEXare contained in its user guide (Rogers, 1995).

    SummingUpTables are used for many purposes within NAEP: as stimuli in test items,as containers to archive data, and as a communicative medium. Believingthat the archival purpose is anachronistic,we focused our attention on rulesfor building tables to facilitate their efficacy as communicative devices. Wefound that the same four rules apply to the simplest tables used as stimuliwithin the assessment and to the most complex tables aimed at scientists.While the rules areobjective and as such can be applied througha completely

    automatic procedure,humanjudgment and wisdom are still required.Beforeapplying the rules, one must decide on the most likely prospective uses forthe data in the table and include only those data that facilitate those uses.Of foremost importance is the notion that we are typically not looking ata table to simply extract a number. To become involved in a problem and tounderstand it is to shift from extracting individual entries to understandingquantitativephenomena. The construction of efficacious data displays aimsto promotethis transition,allowing thereadera gracefulchange fromspectatorto participant.Notes

    Thisworkwas sponsoredby the NationalCenter orEducation tatistics hroughContactNumberR999B40013to the Educational estingService,HowardWainer,Principalnvestigator. lthough ampleased o expressmy gratitudeorthissupport,I mustreexpress he usual caveatthat all opinionsexpressedhere arethose of theauthor nddo notnecessarily eflect heviewsofeitherNCESor theU.S.Government.I am delightedto be able to thankJeremyFinn for his critical and constructivecommentson thisworkas it developed.Of course,he shouldn'tbe heldresponsibleforwhathasresultedromhisgoodadvice. wouldalso like tothankBrentBridgeman,JohnMazzeo,KeithReid-Green,LindaSteinberg, ndanunusuallyhelpfulassociateeditorandtwo sharp-eyed,houghtful,butanonymous efereesfor theircommentson anearlierdraft.Last,my gratitudeo JohnTukeyfor his helpfulsuggestionsonthe choice of an errorterm for large tables.This articlewas abstractedrom aconsiderablyongertechnicalreport Wainer,1995);readers nterested n receivinga copy of thatreportcan request t fromMarthaThompsonat Educational estingService([email protected]).'NAEPis a congressionallymandated urveyof the educational chievement fAmerican tudentsandof changes n thatachievement crosstime.Thissurveyhasbeenoperationalor nearly25 yearsandutilizestechnically ophisticatedamplingandassessmentmethodology.The resultsof NAEPare madeavailable o boththeprofessional ndlay publiccontinuously ndarecitedwith increasing requency sevidence in publicdebatesabouteducationalopics.NAEP'sresults recomplex,consisting, stheydo,of (a)outcomesonachievementtests of complex characteron a varietyof subjects;(b) attitudeand behavioralinformationromthe children, eachers,and othersassociatedwith the children's28

  • 8/12/2019 Improving Tabular Displays

    30/31

    ImprovingTabularDisplaysschooling;and (c) detaileddemographicnformation boutthe childrenwho tookthe assessment nstruments. hese dataarereportedn a varietyof ways thatvarywiththecharacterf thedata, heirprospective udience, nd hepurposes f thedata.2Onecan easily constructpathological xceptions(e.g., a samplemean fromaCauchydistribution),utformostnormal ituationshisis a prettygood general ule.

    3I sometimeshear romcolleagues hatmy ideas aboutrounding re too radical-thatsuch extremeroundingwould be "OK f we knew that a particularesultwasfinal. But ourfinal resultsmaybe usedby someoneelse as intermediaten furthercalculations.Too-early oundingwould result n unnecessary ropagation f error."Keepin mindthattablesare forcommunication,otarchiving.Round he numbersand, f youmust, nserta footnoteproclaiminghat heunroundedetailsareavailablefrom the author.Thensit backandwait for the delugeof requests.40f course, eachers'gradebooks areusuallyalphabeticalnd so yield tables iketheoriginal.ButI suspectmany eachers myself ncluded)nowuseelectronicgradebooks which arealphabetizedor ease of dataentryandhavea secondversionforretrieval.Thisdiscussion s aboutretrieval."5Theseapsweredeterminedo be largish hrough onsideration f boththeirsizeand their ocation.A big gap in the tails is not as unlikelyas one of similarsize inthe middle.In this instancewe used inverse ogistic weightson the gaps to adjustfor location(Wainer& Schacht,1978).6Choosinghe maximummaybetoo conservativeormanyusers.Twoalternatives

    may be considered.The first is shrinking he maximum nwardbased upon thestabilityof the estimatesof the standard rror. n this instance he standard rrorsarebasedonabout30degreesof freedom.Thiswouldsuggest omemodest hrinkage.If thedegreesof freedomwere3 or 300, quitedifferentdecisionswouldbe reached.The second alternative s replacingMAX(se) with a more average figure-forexample,

    I se2/n.k=lThis second alternative seems especially attractivein this instance, since thedistribution of standard errors across states is not too far from the nulldistributionxpected roma chi-square ariablewith 30 degreesof freedom.The issues surroundingthe best choice of errorterm is a bit afield from ourpurpose, and so we shall be content to raise it and leave its resolution toother accounts.

    7Although not that easy. I have discovered, to my chagrin, that the two-letter state abbreviations do not yield the same alphabetic ordering as thefull state names.8An especially difficult task is finding out that the state you are lookingfor did not participate in the assessment.9Althoughone small empiricalstudy among3rd-, 4th-, and 5th-gradechildren (Wainer, 1980) showed that, on average, item difficulty increasedwith level and graphicacy increased with age.

    29

  • 8/12/2019 Improving Tabular Displays

    31/31

    Wainer"?Its like trying to decide on Mozart's worth as a composer on the basisof a performance of his works by Spike Jones on the washboard.

    ReferencesBenjamini,Y.,&Hochberg,Y.(1995).Controllinghefalsediscovery ate:A practicaland powerful approachto multiple testing. Journal of the Royal Statistical Society,Series B, 57, 289-300.Bertin,J. (1983). Semiologyof graphics(W.Berg & H. Wainer,Trans.).Madison:Universityof WisconsinPress.(Originalworkpublished1973)Clark,N. (1987).Tablesandgraphics s a formof exposition.ScholarlyPublishing,10(1), 24-42.Court Statistics Project. (1976). State court caseload statistics: Annual report, 1976.Williamsburg, A: NationalCenter or StateCourts.Ehrenberg, . S. C. (1977).Rudiments f numeracy.ournalof theRoyalStatisticalSociety, Series A, 140, 277-297.Farquhar,A. B., & Farquhar,H. (1891). Economic and industrial delusions: Adiscourse of the case for protection. New York:Putnam.Hambleton, R. K., & Slater, S. C. (1995). Are NAEP executive summary reportsunderstandable to policy-makers and educators? (Researchreport).Amherst:Uni-versityof Massachusetts.Playfair, W. (1786). The commercial and political atlas. London: Corry.Rogers, A. (1995). NAEPEX:NAEP data extraction program user guide. Princeton,NJ: Educational estingService.Tukey,J. W.(1977).Exploratory ata analysis.Reading,MA:Addison-Wesley.Wainer,H. (1980).A testof graphicacynchildren.AppliedPsychologicalMeasure-ment, 4, 331-340.Wainer,H. (1992). Understanding raphs and tables. EducationalResearcher,21(1), 14-23.Wainer,H. (1993).Tabular resentation.Chance,6, 52-56.Wainer, H. (1995). A study of display methodsfor NAEP results: I. Tables (Tech.

    Rep.No. 95-1). Princeton,NJ:Educational estingService.Wainer,H., & Schacht,S. (1978). Gapping.Psychometrika,3, 203-212.Walker,H. M., & Durost, W. N. (1936). Statistical tables: Their structure and use.New York:Bureauof Publications,TeachersCollege,ColumbiaUniversity.Zabell,S. (1976).Arbuthnot,Heberden nd the Bills of Mortality Tech.Rep.No.40). Chicago:The Universityof Chicago,Departmentf Statistics.Author

    HOWARDWAINER s PrincipalResearchScientist,EducationalTestingService,Princeton,NJ08541; [email protected]. e specializes n statisticalgraphicsandpsychometrics.ReceivedMarch6, 1995RevisionreceivedAugust28, 1995AcceptedOctober26, 1995