The Traditional Future: A Computational Theory of Library Research

Andrew Abbott

Andrew Abbott is Gustavus F. and Ann M. Swift Distinguished Service Professor in the Department of Sociology and the College at the University of Chicago; e-mail: a-abbott@uchicago.edu.

I argue that library-based research should be conceived as a particular kind of research system, in contrast to more familiar systems like standard social scientific research (SSSR). Unlike SSSR, library-based research is based on nonelicited sources, which are recursively used and multiply ordered. It employs the associative algorithms of reading and browsing as opposed to the measurement algorithms of SSSR. Unlike SSSR, it is nonstandardized, nonsequential, and artisanally organized, deriving crucial power from multitasking. Taken together, these facts imply that, as a larger structure, library-based research has a neural net architecture as opposed to the von Neumann architecture of SSSR. This architecture is probably optimal, given library-based research's chief aim, which is less finding truth than filling a space of possible interpretations. From these various considerations it follows that faster is not necessarily better in library-based research, with obvious implications for library technologization. Other implications of this computational theory of library research are also explored.

Among the most important questions raised by the current revolution in libraries is that of the effect of new library technologies on library-based scholarship as a whole. Surprisingly, there is almost no serious theoretical reflection on this topic. Most writers focus their attention only on the new techniques themselves: the research tasks that are newly possible or that can now be accomplished faster than ever before. No one asks whether there are sound theoretical reasons for thinking that faster is better or that the newly possible work will lead to improvement in library-based scholarship as an enterprise.1

Indeed, library-based scholarship as an overall enterprise has seen relatively little study. There is serious empirical study of various search strategies employed by library users with both physical and digital materials. There are articles studying the research habits of individual scholars in the library-based disciplines. And there are manuals teaching students and others how to do library-based research, at least in the minimal sense of finding information and expert opinion. But there is almost nothing--in the library literature at least--about how the myriad library-research activities of individual scholars actually come together into a corporate enterprise, much less about the possible overall effects of the revolution in academic libraries on the nature and quality of that enterprise. Indeed, it is precisely because we don't have a clear theory of how expert library-based scholarship is itself created, sifted, and maintained that our predictions about the effects of the digital revolution on library-based expert scholarship are for the most part hanging in midair.2

Nor is there much written about this topic by academics in the fields most affected by the revolution in libraries. There are library research how-to manuals for graduate students in scattered fields, a fact that indicates that humanists as well as some social scientists do sometimes teach library research skills to their graduate students, introducing them to the critical reading of sources and to the major bibliographical and archival guides for their fields.3 But seasoned library researchers do not seem to write much about library research methods in general. There are no books on cutting-edge library methodology, no equivalents to journals like Sociological Methods and Research or The Journal of Economic Methodology where quantitative social scientists present their latest techniques. For example, one would expect historians to be among the most assiduous writers on library research methods. But of the 45 articles in JSTOR's 71-journal history collection whose abstracts contain the word "library" or "libraries," none is about the practice of library research. Lest it be thought that "library" is too general a term, there are only eight articles in the JSTOR history collection with the word "bibliography" in their abstracts, and only nine with both the words "reading" and "sources." None of these articles is explicitly about the creating of a bibliography or the reading of sources.

In part, this lack of attention may reflect the belief of historians (and correlatively of their colleagues in musicology, literature, and the other library-based research fields) that the more global parts of library research "methodology"--how to assemble sources, how to maintain records and files, how to assess which areas of a project need further library work, how to predict where hidden sources may be found, how and when to draw conclusions from material--are not really "library research" proper. These other things are taken to be disciplinary knowledge, practices to be taught in seminars and in the direct supervision of dissertations. They don't seem to belong to the library per se, but to "research design" or some other category. Now, such an argument presumes a strict division of labor between librarians and scholars: scholars think up disciplinary questions and imagine the various kinds of information that can be assembled into new answers, while librarians array information and help the scholars search the array for what they need. But no experienced library-based researcher really believes in this division of labor. Questions, answers, sources, and information are simultaneously in play in a library research project. There is no separation of design and execution. So it remains surprising, given the importance of libraries to the library-based disciplines, that there is nowhere in them a body of theoretical or even empirical speculation about the nature of library-based scholarship as a general social form: how it is that each individual library project comes together into a whole and how it is that many such projects come together into something we call knowledge.

As for the sociologists, whose business it is to study such social forms, they too have said little. The sociologists of science have been almost completely preoccupied with the natural sciences and their laboratories, ignoring even the social sciences, much less the humanities. Looking at sociology more broadly, the 56 articles in JSTOR's sociology section that have the words "library" or "libraries" in their abstracts include none that is about what we might call the sociology of advanced library research. There is simply no sociological writing about library research.4

This extraordinary disattention to the theory and practice of library research is all the more surprising given that there are quite a few theoretical reasons for expecting the present revolution in libraries to have very powerful effects on the scholarship accomplished in and through libraries. For one thing, electronic consortia like JSTOR have brought to nonelite universities vast holdings that used to be the privilege of the elite, a development that could raise or lower the average level of scholarship depending on our assumptions about the impact of an individual researcher's caliber on his output. For another, the vast increase of easily identifiable and retrievable material has swelled reference and citation lists, possibly making it much harder to reach consensus in subfields. For yet another, the huge decline in the cost of accessing materials has probably meant--on the classical two-factor model of production--that today's scholars spend more time accessing scholarship and less time reading it than did their predecessors, a change that could have large implications for scholarly quality. One could develop many such arguments.

Evaluating these hypotheses, however, is a difficult matter. First of all, we lack an agreed-upon outcome variable. What exactly do we mean by good library scholarship overall? Most measures of scholarly productivity at the individual level boil down to bean-counting, either of publication or of citations, and no one with in-depth knowledge of any substantive field thinks that either of these measures has much concept validity. But even if we were to have a valid outcome variable, we don't really have a theory of how advanced library research actually works. Yet such a theory is required if we are to make predictions about how changes in library technologies might actually affect scholarship overall. We do, to be sure, have some ideas about what scholars do in libraries as individual users. But we don't have a theory of how those activities are tied together to make a successful scholarly community. Most of the models for such processes, again, concern the natural sciences, where the Popperian, Kuhnian, and other models are familiar.5

In short, there is no truly formal or theoretical consideration of library research as an enterprise and, consequently, no sound basis on which to form a view of whether the current transformation of libraries is good or bad for scholarship. In this paper, I will undertake the first task in order to draw some conclusions about the second. I begin with a brief sketch of standard social scientific methods. By first discussing a reasonably well-known and well-thought-through system of research, I hope to establish what are the parts of a research system and what are the parameters that determine its functioning. With that framework in hand, I then turn to library research, which I define largely through its contrasts with this other, more familiar body of knowledge procedures, showing how it differs in sources, practices, structures, and aims. This discussion culminates in the argument that the two sets of research practices represent different forms of computation. By pursuing this metaphor, I move the discussion onto neutral grounds to escape the usual polemics about libraries. The paper closes by drawing out the implications of such a computational theory of library research for the future of both library research and library policy.

A word of definition and clarification is useful before beginning. By the phrase "library research," I do not refer to all usage of the library, but only to advanced scholarly usage. Undergraduates may be the most common users of the library because of their huge numbers, but they do not need the immense holdings characteristic of scholarly libraries. And within "advanced scholarly usage," I am referring only to those branches of scholarship whose principal mode of production has been the use of library materials. I am thus talking for the most part about the humanities and the humanistic social sciences: scholars of the various languages and literatures, historians, musicologists, art historians, philosophers, and members of those branches of sociology, anthropology, and political science that draw heavily on library data (historical sociology, for example). Of course, scientists use libraries. But their main mode of production is not library research. I am here interested only in those branches of scholarship that rely heavily on libraries for their "data."

Because there is so little prior work, there is no way to avoid confusing the empirical and the normative in what follows. In part, this is a confusion inevitable in any writing about methods. We would not describe standard social scientific methods purely in terms of what social scientists do in practice, but rather in terms of what they ought to do in theory. At the same time, those of us who teach those methods know that in practice we have to teach our students not only precepts a good methodologist should follow in the abstract, but also empirical rules of thumb that can guide everyday practice. For library research, we lack the abstract precepts, making do at best with the empirical rules of thumb, and in most cases lacking even those: How big a bibliography is big enough, for example? (Or too big?) Indeed, one way of understanding what I am doing is to say that I am trying to provide the prescriptive theory--the "ought" theory--of library research by trying to theorize what library research actually "does," i.e., what it ought to do when it is--empirically--being a best version of itself. This may be confusing at times, but it is an inevitable concomitant of the early stages of inquiry.

Standard Social Science Methods

Let me begin by sketching a better-theorized body of research method, one that can serve as a foil against which to develop my concept of library research. I shall use standard social scientific methods for this purpose. Of course, the picture I draw here will be stark and unnuanced. But that is another price of thinking theoretically, at least at the outset. By standard research or standard methods I mean here methods as understood within the broad range of the quantitative social sciences. I will cover the basics of these research methods under three headings: Sources, Practices, and Structures.

To begin with sources. Standard social science elicits its data.6 This elicitation can be by surveys or by interviews. It is most often active elicitation, although much social science is built on data that is either collected on a routine basis (like census data) or simply passively piled up as a part of record-keeping for commercial or other purposes. Data that is actively elicited is standardized and formalized in various ways: it can be selected according to the rules of sampling, for example, and it can be precoded via forced choice instruments. After "cleaning," it can be used directly or further aggregated via data reduction techniques like factor analysis and clustering.

These gathered and prepared sources are then subject to the various practices of research. The data are first translated in terms of a set of concepts and measures, which have usually, indeed, governed much of the process of elicitation. These concepts and measures are typically widely shared across a literature, like the notion of stress in studies of social support or like the use of years-in-school as a variable to indicate education. Substantial subliteratures form around the task of improving these concepts and measures, an improvement that may mean better stability over time, or better portability across datasets, or greater plausibility in terms of theory.

Once couched in terms of shared concepts and indicators, the translated source data--which have now been redacted into variables--become subject to various methodologies. The majority of these methodologies in social science have the aim of expressing some one (dependent) variable as a function of the rest (independent variables), typically as a linear combination of them. The choice of methodology is to some extent determined by the nature of the dependent variable, although, conversely, that variable can usually be transformed to fit a preferred methodology--dichotomized, categorized, logit-transformed, and so on. These methodologies are for the most part completely routinized recipes for analysis; one "writes a model," puts the data in, and results come out. But, all the same, there is plenty of room for modifying these recipes through the handling of the various challenges that data always present to the stringent assumptions of the statistical techniques. Seemingly mechanical in theory, these methodologies nonetheless require a subtle and artful hand in practice.
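
As a concrete illustration of the "write a model, put the data in, and results come out" recipe, here is a minimal sketch in which a dependent variable is expressed as a linear combination of independent variables and estimated by ordinary least squares. The variables, coefficients, and sample size are invented for illustration and are not drawn from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical elicited data: two independent variables and one dependent variable.
x1 = rng.normal(size=200)                      # e.g., years in school
x2 = rng.normal(size=200)                      # e.g., a second standardized indicator
y = 1.0 + 0.5 * x1 + 0.2 * x2 + rng.normal(scale=0.3, size=200)

# "Write a model": y as a linear combination of the independent variables.
X = np.column_stack([np.ones_like(x1), x1, x2])

# "Put the data in, and results come out": a routinized least-squares recipe.
coefficients, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coefficients)                            # estimated intercept and slopes
```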

The underlying logic of all of these practices--loosely but nonetheless strongly held by most people working in standard social science research--is a modified version of the Popperian model of conjectures and refutations.7 The scholarly intervention is regarded as making a plausible conjecture about the way the world is and then evaluating it against data. If the conjecture is not rejected, then because of its theoretical plausibility it can be added to and possibly reconciled with our stock of conjectures to this point. In a loose sense, that is, the basis of standard social scientific methods is a correspondence between our model of the world (that a group of independent variables determine a dependent one in a certain way) and the way the numbers fall out in practice.

Adding a conjecture to our stock of conjectures is often a simple matter: by adducing a new conjecture, an article may set limits on the range of a causal relationship or, conversely, show its viability in new realms. The question of reconciliation with existing conjectures is a more vexed one, however. New results are often inconclusive, and new methodologies can produce results not so much contradictory to earlier ones as incommensurable with them. The problem of reconciliation thus raises the question of how our standard research practices are embedded within a larger structure of research. How, that is, are researchers and their individual research chained together into a larger enterprise?

The first larger structure of standard research is the enormous corpus of data--both formally elicited and passively collected--that the social sciences have used over the years. Much of this is collected in places like the Bureau of the Census, the Inter-University Consortium for Political and Social Research (ICPSR), and the National Opinion Research Center (NORC) that maintain data archives. Other data remains in researchers' papers. Sometimes data is published in one form or another, in print or online. What is important for our present purposes is that, in the main, this corpus of data is not really systematized and ordered; there is no quantitative equivalent to the historians' National Union Catalogue of Manuscript Collections, for example. The main characteristic of the larger data structure of social science is this unordered, unsystematized quality. It is just a vast pile of used datasets.8

A second structural quality of the standard research world is specialization and division of labor. Division of labor can obtain at the level of the project; there can be interviewers and coders and analysts and PIs. But it can also obtain at the level of the discipline. There are specialists in sampling and in particular quantitative methodologies just as there are specialists in this or that research area. This is an obvious fact and requires no further comment.

A third structural quality of standard research is that it is to a considerable extent characterized by a sequential logic. Things have to happen in a certain order. You gather data before you analyze it. You validate your measures before you apply them. You select data with a certain question in mind. Beyond the level of the project, this sequentiality continues. Broad propositions tend to be succeeded by more specific and limited ones. Subparts of large questions must be resolved before attacks on the large questions can produce credible results. To be sure, quantitative social science as a general enterprise is typically advancing on many fronts simultaneously. But within particular research traditions, a sequential logic usually applies, as we see from the common belief in cumulativity. In any specific literature of standard research, early pieces are felt to be less specified, less methodologically careful, less definite. Later results are more specific, more rigorous, more defined. Within such traditions, indeed, later studies often self-consciously replicate earlier studies, even while extending or specifying them.

The final quality of standard research taken as a structure is its organization around a search for truth. As I noted earlier, "truth" here means in practice a correspondence between the way we predict the numbers to be, given our theoretical ideas, and the way the numbers actually are when we have gone out and measured the world. This means that standard research is ultimately a form of prediction and search. The truth is thought to be out there in the real world (a, b, and c cause x), and our model is a hypothesis about what that truth is (maybe we think b and c cause x). We measure reality according to our model, and then reality tells us whether we found the truth or not (in this case, that we are a little off in our guess about where truth is).

Standard methods are thus ultimately a formalized version of blind man's bluff; we make educated guesses about where the truth is and then get told whether our guesses are right or wrong. Fundamental to this game is our belief that the truth is somewhere out there in the world to be discovered. There is a "true state of affairs." Our inability to find it may be a problem, but the true state of affairs exists and can in principle be found.

One can disagree with various parts of this picture, and certainly one could make it much more precise. But overall it is an acceptable thumbnail sketch of how standard research operates in practice in the social sciences. Let me summarize it quickly. The sources of standard research lie most often in actively elicited data, which is often standardized or concatenated in the process of being collected. The practices of standard research begin with the application of measures and terminologies that are standardized, widely shared (or, at least in principle, sharable), and usually fairly rigid and specified. They then continue with the application of routine methodological recipes that evaluate the conjectures of researchers by comparing them to the state of the real world. The recipes either accept or reject the conjectures. The larger structures of this standard research world comprise first the enormous collection of used data, which is not particularly systematized or ordered. They comprise second the qualities of sequentiality and division of labor. And they comprise third an overall organization of research around the search for a true state of affairs, which is taken to be "out there" in the real world, but possibly very difficult to find.

Library Research

Let me now turn to a similar analysis of library research. As I noted earlier, this is a much less organized and defined system of materials and practices. But we can characterize library research by looking again at sources, practices, and structures, using the sketch just given of standard methods as a guide to the analysis. If, as a result, library research seems a little too perfectly opposed to what I have called standard research, we can regard that as a heightening of differences for ease of comprehension, not as a claim that some awful chasm divides the two. In fact, they interpenetrate considerably.9 Note, finally, that it should be recalled from the introduction that the phrase "library research" means use of library materials by expert scholars and, in particular, by scholars in those disciplines for which use of library materials is the primary mode of intellectual production--historians, professors of literature, and so on.

The differences start at the beginning, with sources. Library research uses not elicited data, but recorded data--things in libraries. Some of this is passive records of the kind we have earlier seen: routine census data or annual reports of companies, governments, and other organizations. But much of it is author-produced primary material of various types: novels, autobiographies, religious tracts, philosophical discourses, films, travelogues, ethnographic reports, and so on. What is important about all of this primary material is that it was not elicited by the researcher. It is simply there--created by its authors or originators and deposited one way or another in the library. In this sense, the only analogous material in standard research is passively collected quantitative data.

But this recorded primary material is only part of the data for library research. An immense portion of the sources of library research consists of prior library research (and indeed prior nonlibrary research as well). Moreover, library research uses this prior work in a very different way than does standard research. In standard research, previous work is of interest largely for its output--the conjectures that it authorized or rejected. In library research, prior research is used for all sorts of things in addition to its output. Indeed, it is often ground up into pieces: its primary data can be redefined and reused, its interpretations can be stolen and metamorphosed, its priorities deformed and redirected, its arguments ransacked for irrelevancies that are changed into major new positions. Although it is by custom called "secondary material," the prior research work recorded in the library is, to all intents and purposes, yet another form of primary data. We can label this peculiar and intensive use of prior research with a word from computer science. Library research, we can say, is recursive; it can operate on itself.

So the sources of library research are quite different from those of standard research: they are not elicited by researchers, and they are, in the sense just defined, to a considerable extent recursive. Moreover, the vast corpus of stuff that makes up the data of library researchers is ordered in a number of important ways. It is classified--not only by its author and publisher and date and other facts of provenance, but above all by its subject headings and, in particular, by the most important of these: the call number that gives it a physical location. Unlike the data of the standard researchers, the data of the library researcher is embodied in physical artifacts, a fact to which I shall return below. (This is of course changing at the moment, but we are considering the system as it has evolved to the present.)

But subject headings are not the only forms of classification and ordering in the library. To subject headings are added back-of-the-book indexes and bibliographic notes, subject bibliographies, encyclopedias and handbooks and other reference works, bibliographical guides, and so on. Most of this indexing and assembling is done by human minds, not by the concordance indexing that drives most of our current search engines.10 Indeed, an enormous amount of this indexing is implicit in the contents of the data artifacts themselves: one way of understanding any given book based on library research is as a kind of index to a particular set of other library materials.

In short, library research materials have an order imposed on them quite different from the order present in the elicited data of standard methods. In elicited data, the analyst imposes order on what he perceives to be mere human activity by applying certain accepted conventions of measurement and conceptualization. And once a dataset is gathered and used, it goes on a stack of datasets that is not further ordered, classified, or indexed. (There are a few counterexamples, but they prove the rule.)

But no one would take the materials in a library as uncognized activity needing to be ordered by certain conventions of measurement and coding. Library materials are already cognized and ordered in dozens of ways. Each book is a particular selection of things by a human agent; and, beyond the books themselves, indexers have created dozens of mappings--by no means all the same--of myriad idiosyncratic subsets of the materials in the library. The sources for library research are, in short, fundamentally different from those of standard research, above all because of this huge amount of indexing and preorganization, which far surpasses the straightforward application of measurement and coding conventions that is characteristic of standard research. Let me underline that I am not speaking of one, single comprehensive order. There are multiple such orders, and deliberately so, an obvious contrast with the strain of standard research toward consistent definitions.

In summary, the sources of library research consist of recorded materials, which include prior library research (which thus can be used recursively) and which are ordered by a large number of multiple and cross-cutting indexes that govern myriads of subsets of their contents. It helps to have a simple term to refer to library materials: I shall call them "texts."

With these texts, library researchers undertake quite different practices than do their standard researcher colleagues. In the first place, library researchers to a great extent lack the well-defined and widely shared concepts and measures that are fundamental to the practices of the standard researchers. The only strictly defined terms in library work are those of certain established large-scale indexes, so-called controlled vocabularies; at the monograph and reference-book level, such controlled vocabularies are created de novo for each new artifact. And the steady shift of terminologies as language drifts inevitably over time--particularly with respect to more complex concepts--limits the efficacy of the major controlled vocabularies. Some library fields have highly specific and enduring terminologies, to be sure: musicology and historical linguistics are examples. But the vast majority of library research does not involve use of widely shared, well-defined, and stable concepts, nor any other idea of "measure" analogous to that in standard methods.

The chief practices of library scholars with texts are reading and browsing. It is these that are, in fact, the analogue of the standard researchers' measurement, since it is by reading and browsing that library research scholars extract what they want from texts. By pointing to reading and browsing as methodologies, I want to make them unfamiliar, less taken for granted. We need to see the exact analogy between a standard researcher, who "measures" the social world using a fairly limited vocabulary of shared concepts and indicators, and a library researcher, who browses or reads a text using his or her own--and possibly idiosyncratic--interpretive armamentarium.

To understand reading and browsing as the analogues of the measurement and methodology of standard research, it is useful to borrow language from computer science. Measurement, in computer science terms, employs a fairly simple algorithm. A measurement algorithm takes social reality as input and returns a number or category. The shared--or at least (in principle) sharable--nature of the algorithm means that its output is independent of who runs it.
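
A minimal sketch of such a measurement algorithm, using an invented codebook for the years-in-school example mentioned earlier: because the codebook is fixed and shared, the output does not depend on who runs it.

```python
# Hypothetical shared codebook: bands of years-in-school mapped to categories.
EDUCATION_CODEBOOK = [
    (range(0, 12), "less than high school"),
    (range(12, 16), "high school or some college"),
    (range(16, 31), "college degree or more"),
]

def measure_education(years_in_school):
    """A measurement algorithm: social reality in, a number or category out."""
    for band, label in EDUCATION_CODEBOOK:
        if years_in_school in band:
            return label
    return "unknown"

print(measure_education(14))   # the same input yields the same category for any user
```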

Browsing and reading constitute this kind of "measurement" only in a very limited sense. To the extent that we think of a text as having a single fixed meaning, invariant with respect to any differences in the readers, reading the text should return that meaning. In such a case we could think of reading as pure measurement. But texts that have such fixed meaning almost never occur in natural language; they can exist only in things like computer programming that have perfectly controlled vocabulary and syntax. Most texts have multiple and ambiguous meanings, and no texts outside controlled languages have meanings that are invariant with respect to readers.

Reading and browsing--the two are simply different levels of the same thing--thus belong to a different family of algorithms than does measurement. They are association algorithms, in which input is taken from text and combined with reader-internal data to produce an output. They are thus inherently nonreplicable because of their dependence on data internal to the reader or browser. A useful way of imagining this is to think about the book-reader technology as compared with the site-surfer technology. In the site-surfer technology, hyperlinks are hard-coded into the page and direct every reader to specific preconnected pages. In the book-reader technology, hyperlinks are generated dynamically in the act of reading. They arise by the conjunction of knowledge in the mind of the reader with potential meanings in the body of the text. Such a system is obviously intensely dependent on the richness of prior knowledge in the minds of readers. And although we can, through things like general examinations, force a certain level of basic background knowledge into the minds of young scholar readers, there will remain quite large random differences in this background knowledge even between fairly closely comparable scholars. Consequently, there will be substantial variation in the outputs of the reading process even between two such scholars. Reading is thus profoundly different from measurement as a research practice, since the latter has replicability as one of its most important qualities.11
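
By contrast with the measurement sketch above, an association algorithm of the kind described here takes the text and the reader's internal knowledge together as input, so two readers of the same passage return different "hyperlinks." The passage and the background-knowledge dictionaries below are invented stand-ins.

```python
def read(text, background_knowledge):
    """An association algorithm: conjunctions of words in the text with
    whatever the reader already happens to know about them."""
    links = []
    for word in text.lower().split():
        if word in background_knowledge:
            links.append((word, background_knowledge[word]))
    return links

passage = "polio relief organizations employed many occupational therapists"
reader_a = {"polio": "virtually disappeared as an American problem in the 1950s"}
reader_b = {"therapists": "an occupation whose work jurisdictions were contested"}

print(read(passage, reader_a))   # different readers, different output from the same text
print(read(passage, reader_b))
```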

One can "read" with differing levels of attention to detail. Skimming is what we call reading when we pay very little attention to detail. Browsing is what we call reading when we disregard not so much the details of a text as its composed order. Browsing is an association algorithm that ignores the continuous order of the text or, more commonly, that is applied to things that are not continuous composed texts in the first place but that have other kinds of order built into them. One can browse a continuous text by flipping through it here and there, but one more often browses things that have an order that is not through-composition. One browses an index or a bibliography, which is ordered alphabetically by main topic and/or author. Or one browses a handbook or other reference work, which is ordered by main topics in some structural or functional relation to one another. Or one browses a shelf, which is ordered by call number. In each case, that is, browsing brings together a prepared mind and a highly ordered source that is (usually) not a continuous text.

To some extent, browsing is analogous to what are called hashing algorithms in searching systems; it takes large blocks of material and disregards their detailed order or inspects them on the basis of simple data checks. At other times, browsing operates via simple association of random elements in the object browsed with random elements in the reader's mind. Each random connection is associated with a probability that it will be useful, and those above a certain level are retained. What is central to all forms of browsing is thus the coming together of a highly organized but not necessarily continuous source object with an equally highly (but quite differently) organized mind. From this is expected to emerge a substantial collection of productive but random combinations.12
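
Read literally, the mechanism just described (random conjunctions of elements of the browsed object with elements of the reader's mind, retained only when their chance of being useful clears some level) can be sketched in a few lines. The shelf, the reader's preoccupations, and the retention threshold below are all invented for illustration.

```python
import random

random.seed(0)

def browse(shelf, mind, attempts=20, threshold=0.8):
    """Pair random browsed elements with random elements of the reader's mind,
    keeping only the conjunctions whose simulated usefulness clears the threshold."""
    retained = []
    for _ in range(attempts):
        conjunction = (random.choice(shelf), random.choice(mind))
        if random.random() > threshold:        # chance that this conjunction proves useful
            retained.append(conjunction)
    return retained

shelf = ["trade directory", "annual report", "county atlas", "diocesan register"]
mind = ["jurisdictional conflict", "cohort succession", "marital demography"]
print(browse(shelf, mind))                     # a few productive but random combinations
```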

As I have noted, the role of internal knowledge in reading and browsing implies a crucial difference from the measurement that is their equivalent in standard methodology; they are not replicable. Two readers don't get quite the same output from reading a book, and there is no real attempt in library research fields to correct this by improving measures, controlling terminologies, and so on. There is thus no real equality between an English professor presenting a reading of a novel to a class and a sociology professor discussing quantitative indicators of education. The second is interested in and hopes to produce replicability. The first regards replicability as both unachievable and undesirable.

Another equally important difference between the "methods" of library research and those of standard research is that the former lack sequentiality. Even at the single text level, library researchers read straight through only rarely. While some library researchers read background sources straight through at the beginning of a project, it is much more common for a project to begin out of a variety of types of sources of varying levels of detail and relevance, which have been read in no particular order. There is no equivalent in a library research project to the Idea-Question-Data-Method-Result sequence of the standard research program. To be sure, even the latter is, in practice, something of a rationalization after the fact, but in library research there is no attempt to create even such an imposed, retrospective order.

For example, there is no right order in which to read the original sources for a book on, say, the passing of the British Reform Bill of 1832, although of course any library researcher would find the major secondary source--J.R.M. Butler's magisterial book--on the first bibliographical pass.13 Should you read the Parliamentary debates first? or the private correspondence of Earl Grey? or the diaries of the important Tory magnates? It is possible that three quite different but equally important works could be written on the subject starting from those three different beginnings. The sequence does not matter. The rule of thumb in library research is usually to read most heavily--at any given time--in the area of the largest hole remaining in the argument. The result of that rule is that sources are read in wildly different orders in comparable projects.

But the lack of standardization and the lack of sequentiality do not exhaust the differences between library and standard research practices. Library research is also different in that it is customarily artisanal. Each project is done by a single scholar. This obviously goes hand in hand with the lack of standardization. The unity that a project has is the unity of its researcher; since his is the mind that reads and interprets, his is the mind that browses, his is the mind that ultimately assembles readings and interpretations and browsings into a work of scholarship. Those of us who do this kind of work have all tried to use research assistants. And nearly all of us have given up on them except for those very narrow portions of projects where we can make use of fixed terminologies. No research assistant I have ever hired to compile a bibliography has come up with one half as good as the ones I can make for myself. They simply don't have the same contents in their minds and hence can't perform to my satisfaction the simple associative task that is creating a bibliography.

The downside of artisanality is familiar enough; it slows production. Historically, this has been one of the crucial forces driving the social sciences toward the research practices that I have here called standard research. But artisanality has an important upside, which is that it permits an extremely productive form of multitasking. It is best to show this with an example. In a recent library-based project I chose to do my own coding of the lives of every occupational therapist working in Illinois in 1956. And because the relevant source did not permit immediate extraction of exactly and only what I wanted, I was forced to scan large amounts of interesting but slightly irrelevant material: lives of occupational therapists in other states, aspects of career data that I wasn't coding, addresses, and so on. In the eight hours that I spent on the task, I let my browsing self run in the background like a virus checker. What it picked up--that is, what I acquired in addition to the coded careers that I wanted--were signs of crucial changes in the population of organizations employing occupational therapists, indications of a separate military career trajectory for occupational therapists, two possible hypotheses about the intersection of social class and occupational therapy, and a firm grasp on the marital demography of occupational therapists. Even if my research assistant hadn't wasted his own multitasking capabilities by listening to music (as he usually would, I am convinced), he doesn't know that Easter Seals--one of the common employers of occupational therapists--was a polio relief organization in the 1950s and that polio virtually disappeared as an American problem in that decade, two facts that, taken together, show that one of occupational therapy's crucial work jurisdictions was under threat. He doesn't know that, in 1956, there was a mental hospital in Anna, Illinois, which in a complicated way was the key to my marital pattern insight. I saw those things only because I had the requisite knowledge, left over from past projects or--in the polio case--simply from having lived through the period involved.

It should at once be noted that another seasoned researcher might have seen other things and not these. But as we will see below, that doesn't, in fact, matter. What does matter is that because a single prepared mind does all the work in the typical library project, the prospects for productive multitasking are very, very high. This is all foregone in the standard research project with its often considerable division of labor.

So far, I have discussed the library research practices of reading and browsing, with their qualities of nonstandardization, nonsequentiality, and artisanality. And I have emphasized the multitasking permitted by artisanality. Reading and browsing, I have argued, are the analogues of conceptualization and measurement in standard research, which are--by contrast--based on standardization and sequentiality and which consequently permit, and indeed take advantage of, division of labor, both within projects and across projects. What then is the library analogue of methodology proper--of regression, or log-linear analysis, or event history methods, the various statistical techniques of the standard researchers? And what is the equivalent of the logical foundation of standard research practices on conjectures and refutations?

The quick answer is that there is no such analogue. There is no family of fixed recipes by which library scholars produce their final output. We can at best give a general name to the process by which library researchers assemble their various materials into written texts. I shall give that process the label of "colligation," a term of William Whewell.14 It denotes the inductive assemblage of a set of facts under a general conception of some kind. A classic example is Jacob Burckhardt's colligation of the various changes in thirteenth century Italian city states under the heading of Renaissance. Whewell famously attempted a general theory of such induction, but it has had few followers and no successors. Indeed, much of nineteenth-century German historiography aspired to a quite different theory of historical writing. According to Ranke's celebrated dictum, history was a matter of search and discovery, a finding out of what had actually happened--wie es eigentlich gewesen. This is exactly the model of standard research discussed earlier--not the imagination of a new whole out of diverse parts, but the discovery of a given truth out there in the world.15

If we list the kinds of colligations that are legitimate products of library research, we see at once that the practice of library researchers for the last century has followed Whewell rather than Ranke: the pursuit of a findable and fixed truth is not an accurate summary of or model for professional history or for any other of the library research disciplines. To be sure, one body of legitimate library research does consist of what I will call Rankean investigations: investigations aiming to exploit new primary sources and to add to our collection of known--that is, ordered and located--facts. An example would be a family reconstruction study of a particular English village. A much larger, second class of work is the rewriting or remaking of past colligations into newer shapes conforming to the ever-changing cultural norms and questions of the present. Ranke himself provides an example; he rewrote the earlier historiography of the Middle Ages as Marc Bloch was to rewrite him, Georges Duby to rewrite Bloch, and so on. Often, such works rest on Rankean investigations, but they put those new facts to even newer uses. A third and even more adventurous class of work undertakes not reinterpretation but whole new colligations, pulling together old facts and interpretations into whole new "things." We see this in the rapid development of the concept of "women writers" over the last thirty years, which has driven not only reinterpretations of canonical writers like George Eliot and Edith Wharton but also has led to Rankean investigations into writers hitherto ignored like Mary Webb and Charlotte Yonge, all in the service of creating the new colligation of "women writers."

It is, to be sure, no news to anyone that there are not formal recipes for producing these three kinds of colligations: Rankean investigations, reinterpretations, and recolligations. There are not even clear genres for writing them: within history, for example, one can think of narratives that fall into all three of these classes. The same is true of biographies and quantitative works on historical topics. Nor is it clear that there is anything that corresponds to the conjectures and refutations logic underlying standard methods. There is a loose sense that library-based works should be organized around questions, but those questions can take many forms. It is perhaps better to say that there is a taste in library-based work, a taste for reinterpretation that is clever and insightful but at the same time founded in evidence and argument. I shall return to this problem of the criteria for successful colligation below.

Let me then summarize this discussion of the practices of library research before I go on to the notion of the larger structures of library research and their qualities. I have shown so far that the sources of library research are nonelicited, to a considerable extent recursive, and multiply ordered. I have shown that the basic practices of library research are associative algorithms like reading and browsing. This dependence on associative production implies the nonstandardization, nonsequentiality, and artisanality of library research practices and confers on them an especially powerful form of multitasking. I turn now to the larger structures and qualities of the library research enterprise.

The first larger structure of the library research world is, of course, the collection of all the sources--that is, libraries. I have already noted some of the characteristics of libraries as repositories of sources--their multiply ordered and recursive quality. I should also note here one particularly important physical quality of the library. Physical libraries contain records of their own past orderings, which indeed constitute one of the basic data types for the discipline of history. This is true not only of indexes and other orderings but, in a far more important way, of the physical artifacts or books themselves. A book is a representation of an interpretation at a given moment and cannot be modified by later interventions. There is no equivalent in the online world. Record copies could be created in principle, but they can easily be modified either through accident or malice. At present, they simply do not have the stability of print.16

Since library research is so dependent on associative rather than measurement algorithms, the second central structure of the library research world is the collection of prepared artisans: that is, the scholars in the several disciplines. Preparation is the central matter here. The library research enterprise, taken as a whole, depends on having workers who are prepared for their task; otherwise, reading and browsing don't work. Library research depends on prepared minds--people who have passed laborious general exams that cram their heads full of facts and interpretations that will provide the steady flow of hyperlinks as they read texts in the library. Note the policy implication: that if we abolish these kinds of exams and memorization, we vastly decrease the overall power of library research as an enterprise. Sadly, some of this decrease has already taken place by means of various otherwise admirable reforms in graduate programs.

Finally, and most important, the artisanal research system characteristic of library research is massively parallel rather than sequential. This is perhaps the most profound difference between library and standard research. As we have seen, there is a pretty clear--if often imperfect--sequential logic to standard research. Both the individual research project and the cumulative research enterprise have as their ideal a process that is specifically progressive: a logical ordering of tasks in the individual research project and a cumulative ordering of results in the collective body of research projects taken together. Of course, there is some degree of parallelism in the standard research system. There are research projects on individual career changes, for example, going forward at the same time as are studies on the mobility of whole classes. But the underlying logic of the system, and certainly its ideals, are sequential.

This is emphatically not the case in library research. Nobody thinks that the great book on Jane Austen has to wait for the great book on Pride and Prejudice. These things are taken to be unorderable in principle, in the sense that one could imagine either one of these great books being written first and exercising a determining effect on the other. Indeed, a library research community would be disturbed if that were not the case. There is absolutely no order to the topics investigated in library research, and only in the most degenerate cases do we have the sequentiality of the standard methods. To be sure, some part of library research consists of Rankean investigations that have a certain kind of cumulativity, a kind of simple piling up of brute new facts. But very little of the action in the disciplines of history or of literature is merely about piling up facts. The action lies in using new facts to leverage new reinterpretations and radical recolligations. These last have no logical or cumulative order whatever, and, indeed, one of the standard gambits in library-based fields is precisely to overturn some implicit ordering of results in favor of some other possible ordering.

The importance of this parallel quality of library-based research can be seen if we think about it computationally. Standard research can best be imagined--in the ideal at least--as a species of classical von Neumann programming. It conceives of research as a MAIN program that has various subprograms contributing to it and called from it as the need arises. It presupposes well-defined terms that are sharable across program units. It allows--indeed encourages--specialization of subroutines and subcalculations. It is governed by an overall sequentiality. And it aims, ultimately, at the successful performance of a fairly simply optimizable search task, which is the discovery of a truth that is taken to be out there in the world but hidden by various amounts of misinformation and randomness.

Library research is fundamentally quite different from this. It is a massively parallel system in which individual, largely uncoordinated processors are taking inputs idiosyncratically from other processors and from the stack of prior information and are then employing idiosyncratic knowledge and concepts of their own to turn out new outputs that in turn become inputs to other individual processors like themselves. It has no sequentiality, no subroutines, no common variables. And although the "Rankean investigations" portion of it may be optimizing the search for a truth that is assumed to be out there in the world but hidden by misinformation and randomness, the rest of it is doing something quite different.
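
The contrast between the two architectures can be caricatured in code. The sketch below is only a toy with invented names: the first function runs a fixed gather-measure-model pipeline; the second lets uncoordinated "processors" each read an idiosyncratic slice of the stack of prior work and add a new item to it, which then becomes input for later processors.

```python
import random

random.seed(1)

def standard_project(world):
    """A sequential, von Neumann-style pipeline: each stage feeds the next."""
    data = random.sample(world, 5)              # gather
    estimate = sum(data) / len(data)            # measure and model
    return round(estimate, 2)                   # result

def library_community(sources, scholars=6):
    """Uncoordinated parallel processors reading from and writing to a shared stack."""
    stack = list(sources)
    for scholar in range(scholars):
        slice_read = random.sample(stack, k=min(3, len(stack)))    # idiosyncratic input
        stack.append(f"interpretation {scholar} of {slice_read}")  # output becomes input
    return stack

print(standard_project(list(range(100))))
print(library_community(["source A", "source B", "prior monograph"])[-1])
```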

There is a concept in computation for that kind of a computing architecture: the concept of a neural net. And once we recognize that library research as traditionally practiced has a neural net architecture, we are suddenly on very new ground. For one thing, this means that, contrary to widely held views, library research is every bit as "technological" a research system as is standard research. Neural nets are quite capable of performing all the basic tasks we expect computers to perform: most notably, they can remember and converge on and hence possibly discover patterns. You don't need an elaborate structure to discover patterns; you don't need accepted terms, conventional measures, and stable, recipe-based methodologies. You can do without sequentiality and--by implication--even cumulation altogether. You just need the right input-output weighting patterns for the individual nonsequential processors, dispensing thereby with common definitions and variables systemwide. And you need to strongly prepare the artisan-processors, loading them up with the stuff that will make all the materials they read come alive with blue hyperlinks.
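
The claim that the right input-output weighting patterns alone can remember and converge on patterns is what the simplest neural-net models show. The sketch below uses a Hopfield-style associative memory as one standard illustration; the choice of this particular model, and all the sizes and seeds, are mine rather than the article's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Store two random +/-1 patterns in a weight matrix via the Hebbian rule.
patterns = rng.choice([-1, 1], size=(2, 50))
weights = sum(np.outer(p, p) for p in patterns).astype(float)
np.fill_diagonal(weights, 0)

# Corrupt the first pattern, then let purely local updates (no central program) run.
state = patterns[0].copy()
state[rng.choice(50, size=10, replace=False)] *= -1

for _ in range(5):                              # a few asynchronous sweeps
    for i in rng.permutation(50):
        state[i] = 1 if weights[i] @ state >= 0 else -1

print("pattern recovered:", bool(np.array_equal(state, patterns[0])))
```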

So the first basic conclusion about library research is this: It is not a low-tech system designed for people who can't think rigorously. It's actually a quite high-tech computational architecture that relies heavily on well-trained individuals. That they work in what seem like random ways and random orders on the stack of prior knowledge and interpretations is just part of the architecture; it's not a desperate intellectual problem. You don't need replicability and cumulation and all that other apparatus of discovery. You need well-trained scholars, a strongly ordered stack of material, and a willingness to tolerate randomness.

The main structural quality of library research is, therefore, parallelism. And, as I have just noted, a parallel architecture can produce patterns as effectively as a sequential, von Neumann one. But can we say that the aim of library research is, in the last analysis, to search for true patterns out there in the world, as is the case with the standard research system? Other than for the "Rankean investigations" part of the library research system, I think the answer to this question is no, and that the real reason for the difference between the architectures of standard and library research is that the library research system does not really aim at the search for a truth out there in the world, but at something quite different. This in turn will mean that optimizing the library research system is not the same as optimizing the standard research system and, in particular, that making the library research system "more efficient" will not necessarily improve its overall ability to do what we want it to do.

In general, the disciplines that sustain library research as their primary mode of research are not fields that are organized around the pursuit of a truth to which one comes closer and closer. The universe of possible interpretations of Pride and Prejudice is in principle infinite, as is even the universe of possible interpretations of Jane Austen as a biographical human being, even if the date when Jane Austen the biological individual died is something specific and finite that can be established as a matter of truth. Obviously, the discipline of English literature is more interested in those types of things to be said about Jane Austen that are infinite than in those that are finite. The specifiable date of her death is uninteresting compared to the infinitely evolving meanings of Pride and Prejudice. This does not mean that canons for rigorous thinking about the latter are not possible; any extensive reading of work in literary studies will persuade one quickly to the contrary. But the computational task of the algorithm that is literary research taken as a whole is not the task of finding, as efficiently as possible, the truth about Pride and Prejudice or even about Jane Austen. The task is rather something like "maximally filling the space of possible interpretations" or "not losing sight for too long of any given region of the space of possible interpretations"--or something like that. That is, the computational criterion we must optimize has something to do with comprehensiveness and richness rather than with rapidity of convergence.

A similar argument applies to all library research-based fields: literary studies, musicology, art history, history, and the library-based parts of sociology, political science, and anthropology. In all of them, the overall thing library researchers aim to optimize is not a "truth" but a richness and plenitude of interpretations. At any given time, one or another school may focus attention in one part of the space. But, in the long run, unvisited regions are always returned to cultivation, and plenitude is again and again achieved. In practice, this may look like rediscovering the wheel, but it is, I believe, the ideal of a set of disciplines whose focus is less on the true than on the meaningful.

Although specifying such a criterion of meaningfulness or plenitude is, of course, a long task, I should underscore my hypothesis that the reason library research takes the shape I have outlined here (a neural net of highly trained processors making local adjustments in the web of meaning) is that this is the optimal way to produce knowledge about the propagation of meaning in human systems. Indeed, we employ the same strategy when we study meaning "in the wild" rather than recorded in the library, that is, when we do anthropological ethnography. Anthropology studies the same general form of data (unformed, raw experience) as does standard social science, but it does with that data not what standard research does but rather what library research does: it puts individual human processors (ethnographers) into raw unformed experience and asks them to return the results of their observations to a general stock of ethnographies, making it part of the input that will guide new ethnographies of the future.

The ultimate reason that a network of human processors does better in the pursuit of meaning than do divided labor systems with shared vocabularies and recipe methods is that the extraordinary multiplicity of meaning cannot easily be captured by the rigidly limited vocabularies of variables in standard methods. We know, for example, that the young Bronislaw Malinowski met Edith Wharton, then nearing the end of her life, at her villa on the French Riviera in the late 1920s. That's a Rankean fact. But one could imagine this fact appearing in any of a half-dozen library research-based books: a biography of Malinowski, a study of the American encounter with Europe, a work on the succession of cohorts in arts and literature, an analysis of the relation of anthropology and fiction, an examination of Wharton's relations with men, and a study of the evolution of the French Riviera as a cultural center. None of these is the "right" colligation of that fact. Each of them pulls it in a different direction. If we want a body of research that will take up all these possibilities of interpretation, we cannot employ a method that strips out all but one meaning from the start. And that is exactly what standard research does.

The result of these different approaches in the two research systems is different patterns of knowledge development. Standard research tends to evolve in tightly organized literatures with widely shared conventions on variables, methods, and problems. As they try to take on a greater and greater range of problems, these literatures run up against other literatures coming into the same areas but with different conventions. There usually results a clash and a starting over. Standard literatures are thus somewhat self-limiting.17 By contrast, library research fields do not really have coherent literatures with extensive conventions. Their coverage of the space of possible works is much more interwoven and interpenetrating than that of the coherent, separable traditions of standard work. Works of library research are forgotten gradually and as individual works rather than being hemmed in and starved as whole traditions.

In short, the reason library research looks the way it does is not that we haven't had the tools to be efficient about it but rather that library research aims to accomplish something rather different than does standard research. It is not interested in creating a model of reality based on fixed meanings and then asking observed reality to judge whether this model is right or wrong. It does not ultimately seek a correspondence between what it argues and a "real world." Rather, it seeks to contribute to an evolving conversation about what human activity means. Its premise is that the real world has no inherent or single meaning but that it becomes what we make it. Individual works can best contribute to that conversation if they combine a coherence of individual vision with a tolerance of reinterpretation. They require solidity in themselves but also must facilitate their own reuse in other contexts. The system of knowledge so produced aims to find the largest possible universe of human meanings. On the way, it will turn up oceans of Rankean facts. But they are just a means to another end.

Implications

Having shown that library research is a "technological" (in the sense of cognitively legitimate) approach to thinking about human affairs, I would now like to turn to the implications of my argument for libraries and library research going forward.

The first, and by far most important, of these is that because library research is not aimed at finding things, in particular at finding correspondences between models and the world, but rather is aimed at space-filling or some other criterion of plenitude, it is by no means clear that increasing the efficiency of library research will improve its overall quality. For example, we cannot automatically assume that increasing the speed of access to library materials by orders of magnitude has improved the quality of library-based research. This would follow at once if convergence on correspondence between model and reality were the aim of library research, but given that it is not, there's no necessary reason why faster should be better. In fact, given that browsing is reduced by efficient access, my argument implies that, other things being equal, faster is probably worse.

Indeed, this skepticism holds for many other technological improvements of library research. For example, an application of elementary economic theory tells us that the dramatic lowering of item access costs in terms of time (or, put another way, the dramatically increased productivity of a given unit of time spent looking for materials) has almost without question meant that library researchers devote more time to discovery and access (relative to reading) than did library researchers of decades ago.18 That a substantial decrease in reading would help library research seems most unlikely.

A second general implication involves the overall quality of the stack of prior material on which library researchers draw. Since on my model recursive use of prior material is central to new colligation in library research, a general lowering of stack quality, or an increasing inability to differentiate quality material in the stack, is a serious problem for library research. But a variety of forces, not all of them technological, have probably lowered the quality of this stack. The most powerful such force is vastly lowered barriers to entry, both from technologically mediated increases in availability of sources and from productivity enhancers like canned software for statistical analysis and, increasingly, for the automated analysis of texts. On the reasonable assumption that the level of a scholar's academic placement, in terms of access to traditional library resources, is not completely uncorrelated with his or her ability, lowered barriers to entry probably mean lowered quality. This mechanism is furthered by the proliferation of journals, which makes good work harder to find; by the decreasing resources invested in peer review, which makes standards hard to maintain; and by the expansion of academic publishing to earlier and earlier phases of the professional life course, which fills the library with intellectual juvenilia.

To understand the full workings of this mechanism, however, requires a more detailed theory of library research than I can sketch here. A hidden assumption of the neural net analogy is that we can specify the "weights" assigned to prior scholarship and data by artisanal researchers as they take in material from the library. I have emphasized the importance of the interaction between external inputs and the input that comes from a scholar's prior knowledge. But for a neural net to work, the ideas and interpretations that come out of this interaction must themselves be "weighted" as they are assembled into a new colligation; not all of them are of equal importance. For the library-research "nets" of the various disciplines to succeed in filling the space of possible interpretations and revisiting past interpretations on some finite basis, it is no doubt necessary that these weights be constrained to some range of values. These constraints are no doubt implicit in the tacit knowledge that disciplinary practitioners have in mind when they insist that library research can be taught only in seminars or direct dissertation supervision. They are also probably related to the phenomenon, universal in library-based disciplines, of selecting some somewhat arbitrary (and slowly changing) set of texts as a canon that will have special weight in the process of scholarship.

Until we can specify more exactly how this tacit knowledge works to produce the mixture of rigor, traditionalism, innovation, and recolligation that is characteristic of work in the library-based disciplines, we cannot directly predict the effect of decreasing stack quality on that knowledge. But it seems likely that its effects will be as problematic as they have been in the more "scientific" disciplines, which are affected by many of the same forces, although via different particular mechanisms.19

A third general implication of my argument concerns systematic loss of randomness. As anyone who has worked recently in optimization knows, stripping the randomness out of a computing system is a bad idea. Harnessing randomness is what optimization is all about today. (Even algorithms designed for convergence make extensive use of randomness, and it is clear that library research in particular thrives on it.) But it is evident that much of the technologization of libraries is destroying huge swaths of randomness. First, the reduction of access to a relatively small number of search engines, with fairly simple-minded indexing systems based on concordance indexing, has meant a vast decrease in the randomness of retrieval. Everybody who asks the same questions of the same sources gets the same answers. (Although this effect is, to be sure, undercut by the paradoxical fact that asking the same question of a source a few days later can produce dramatically different results because of altered algorithms, improved OCR readers, new data, and so on.) The centralization and simplification of access tools thus has major and dangerous consequences. This comes even through reduction of temporal randomness. In major indexes without cumulations (the Readers' Guide, for example), substantial randomness was introduced by the fact that researchers in different periods tended to see different references. With complete cumulations, that variation is gone. The same things always rise to the top. As Merton's theory predicts and as the Salganik-Watts experiments have shown, this rising arises as much through arbitrary piling-on as it does through recognition of quality.20
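The claim that even convergence-oriented algorithms harness randomness can be illustrated with a small sketch: a greedy search gets trapped on a local bump, while simulated annealing, which deliberately accepts some random worse moves, escapes it. The objective function and all parameters below are arbitrary choices of mine for illustration, not anything drawn from the argument itself.

```python
# A minimal sketch, assuming a toy one-dimensional objective: randomness in
# the acceptance rule (simulated annealing) finds better optima than a
# purely greedy search started from the same point.
import math
import random

def objective(x):
    """A bumpy function: many local maxima, global maximum near x = 0."""
    return -x * x + 10 * math.cos(3 * x)

def greedy_climb(x, step=0.5, iters=5000):
    """Accept a move only if it improves the objective; easily trapped on a local bump."""
    for _ in range(iters):
        candidate = x + random.uniform(-step, step)
        if objective(candidate) > objective(x):
            x = candidate
    return x

def anneal(x, step=0.5, iters=5000, t0=5.0):
    """Occasionally accept worse moves, with probability shrinking as the temperature cools."""
    for i in range(iters):
        temperature = t0 * (1 - i / iters) + 1e-9
        candidate = x + random.uniform(-step, step)
        delta = objective(candidate) - objective(x)
        if delta > 0 or random.random() < math.exp(delta / temperature):
            x = candidate
    return x

random.seed(1)
start = 6.0  # start far from the global maximum
xg = greedy_climb(start)
xa = anneal(start)
print(f"greedy search ends at x = {xg:.2f}, objective = {objective(xg):.2f}")
print(f"annealing ends at     x = {xa:.2f}, objective = {objective(xa):.2f}")
```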

Also going is a huge amount of random variation introduced by the physical character of library artifacts. Books must be divided into pages, pages into lines. Each of these divisions creates a random emphasis: how many of us, thumbing through a dictionary, have been caught by two or three head words before we get where we want, and led thereby to some minor discovery! The same kinds of random emphases are created by physical shelving: the importance of books at (varying) eye-heights, the importance of books at the ends of stacks that are visible from the corridor as one walks by, and so on. All of this, ultimately, disappears in the Googlification of the library. (It could be artificially imposed, to be sure, but the mistaken ethic of efficiency militates against it.) Yet, in fact, all of this randomization introduced by the physical nature of the artifacts is probably quite important in the computational architecture of library research. Indeed, it is physical proximity that produces the famous episodes of serendipity with which library researchers love to confute opponents of the physical library. But these stories, which emphasize the extraordinary nature of the one unique book pulled by accident off a nearby shelf, convey a mistaken impression. As I have argued earlier, browsing and the consequent production of serendipitous insight are a constant presence in library work, not an exceptional one. But that constant background browsing only works because the library is a highly ordered physical and indexed system that is cut by thousands of random cuts. It is this superposition of random cuts on a highly ordered substrate that makes library browsing so constantly productive.

This argument makes it clear why "efficient" search is actually dangerous. The more technology allows us to find exactly what we want, the more we lose this browsing power. Library research, as any real adept knows, consists in the first instance in knowing, when you run across something interesting, that you ought to have wanted to look for it in the first place. Library research is almost never a matter of looking for known items. But looking for known items is the true, indeed the only, glory of the technological library. The technological library thus helps us do something faster, but it is something we almost never want to do; furthermore, it strips us in the process of much of the randomness-in-order on which browsing naturally feeds. In this sense, the technologized library is a disaster. (I have tried to insist that my university library design its new remote access system for rarely used materials so that it delivers the wrong item one out of twenty times. My librarians are skeptical.)
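A toy simulation can make the "random cuts on a highly ordered substrate" point explicit. In the sketch below (entirely my own construction, with invented call numbers and subjects), books sit on a shelf in strict subject order; exact retrieval returns only the requested item, while a physical browse exposes a random window of neighboring, and therefore related, items.

```python
# A minimal sketch, assuming an invented shelf of 1000 books ordered by
# subject: random "cuts" around a target expose related but unsought items,
# while exact retrieval exposes only the target itself.
import random

random.seed(0)

# The "highly ordered substrate": neighboring call numbers are neighboring subjects.
shelf = [f"call {i:04d}: subject {i // 10}" for i in range(1000)]

def exact_retrieval(target_index):
    """The technologized library: you get exactly what you asked for, nothing else."""
    return [shelf[target_index]]

def physical_browse(target_index, max_offset=8):
    """The physical library: a random window around the target catches the eye."""
    lo = max(0, target_index - random.randint(1, max_offset))
    hi = min(len(shelf), target_index + random.randint(1, max_offset))
    return shelf[lo:hi]

target = 500
print("exact search yields :", exact_retrieval(target))
print("browsing also yields:", [b for b in physical_browse(target) if b != shelf[target]])
```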

There are other dangers in the shift to concordance and other simplified forms of indexing as opposed to human-based subject indexing. There is still no automated indexing system that compares with human indexing as a means of creating new meanings and connections. Concordance indexing is a blunt instrument indeed. Even the newer "word cloud" index systems have many pathologies, as was discovered thirty years ago when anthropologists like Roy D'Andrade first started using them.21 They're very visual, which appeals to a new generation, but their actual connection with the meaning systems they index is often problematic. And unless they are changed from passive clustering systems to actively intelligent systems, they are all subject to the problem noted above: that they can only deliver the same set of things to whoever queries them similarly. That they do this quickly and effectively just means that much less randomness, that many fewer occasions for new insight. Of course, they do permit certain kinds of discoveries. Concordance-cloud techniques were used almost forty years ago to discover the order in which the works of Plato were written.22 But the order in which the works of Plato were written, although an interesting Rankean fact, is not the question that drives the publication of new books about Plato year after year.
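The determinism complained of here, that a concordance index can only return the same set of things to whoever queries it similarly, is easy to see in a minimal sketch. The tiny corpus and query below are invented for illustration; the mechanism, an index built solely from the words occurring in the texts (as described in note 10), is the point.

```python
# A minimal sketch of concordance indexing, assuming an invented three-document
# corpus: identical queries always return identical results, with no randomness.
from collections import defaultdict

corpus = {
    "doc1": "plato wrote dialogues about justice and the soul",
    "doc2": "the chronology of the dialogues is disputed",
    "doc3": "austen wrote novels about marriage and money",
}

# Build the concordance index: word -> set of documents containing it.
index = defaultdict(set)
for doc_id, text in corpus.items():
    for word in text.split():
        index[word].add(doc_id)

def query(words):
    """Return the documents containing all query words, sorted for stable output."""
    results = set(corpus)
    for w in words:
        results &= index.get(w, set())
    return sorted(results)

# Two "different researchers" asking the same question get exactly the same answer.
print(query(["plato", "dialogues"]))
print(query(["plato", "dialogues"]))
```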

The Plato story is a parable of the new library. It is indeed true that the new technologies enable us to do many things faster than ever before. It is indeed true that those technologies enable us to do some kinds of things that we have never done before. But neither of these things means that the current technology really revolutionizes library research. It is a wonderful new tool when well handled, but most of its direct effects on library research are mixed or deleterious. And the ideology behind much of it (that it will suddenly enable unskilled researchers to produce high-quality work) is simple anti-intellectualism.

Conclusion

I have argued two basic things in this paper. First, I have shown that library research is a fully legitimate form of inquiry, a computational architecture every bit as "scientific" as standard research with its more familiar design. Second, I have shown that, given what library research aims to do and how it actually works, most of the moves toward the technologization of library practices are either neutral or harmful to the enterprise as it has been conducted. But I do not wish to close with this doom-and-gloom scenario. I do think that library researchers have to defend their resources against the technologists, who have no idea of what library research is or what it aims for, and against the administrators, who see in the false technological argument an intellectual justification for the huge savings they hope to realize by decommissioning libraries. But, on the other hand, I take heart from the most important single statistic to emerge from my own 5,700-respondent survey of library use at my university library.23 We created an index of use of physical materials that included things like taking a book out, browsing the stacks, finding a useful book in the reference department, recalling a book, and so on. And we created another index of cutting-edge electronic use: consulting an online bibliographical tool, downloading data from a government data Web site, using an online reference system, and so on. And, much to everyone's surprise, the correlation between these two things was not only substantial and positive at the group level (graduate students did both of these things much more than did undergraduates) but also at the individual level; in fact, the correlation was about 0.5. There is thus no evidence of substitution of one kind of use for another. Quite the reverse: among the young people using our library, it is the heavy physical users of the library who use electronic resources the most, and the heavy electronic users who use physical resources the most. What this says plainly is that there are heavy "research library" users and lighter, "study hall" users. And the heavy users use whatever they can get their hands on; scholarship advances on electronic and physical fronts at once. It's obvious, once you think of it; a good student will pursue all means to success. It's the bad students who take the easy way home.
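For readers who want the arithmetic of that claim spelled out, the sketch below simulates the kind of calculation involved: two usage indices per respondent and their individual-level Pearson correlation. The data are simulated under an assumed common "research intensity" factor; they are not the survey's actual responses, and the setup is offered only to show how a correlation near 0.5 can signal complementary rather than substituting uses.

```python
# A hedged sketch, assuming simulated respondents: two usage indices that share
# a common underlying factor correlate positively at the individual level.
import numpy as np

rng = np.random.default_rng(42)
n = 5700  # the survey's reported number of respondents

# Assumed model: a shared "research intensity" plus independent noise in each index.
research_intensity = rng.normal(size=n)
physical_index = research_intensity + rng.normal(scale=1.0, size=n)
electronic_index = research_intensity + rng.normal(scale=1.0, size=n)

r = np.corrcoef(physical_index, electronic_index)[0, 1]
print(f"individual-level correlation: {r:.2f}")  # about 0.5 under these assumptions
```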

What this means for policy is very simple, if very expensive. If you are going to have a serious library research community, you have to have both a physical library and a technological one. The new technology is not a panacea, more a useful extension. While it provides wonderful benefits to the many scholars not lucky enough to work at universities with great physical collections, and while it enables some things never before allowed, it does not in fact "revolutionize" library research. The technologized library may, of course, sweep all before it. But that victory would entail the loss of something far better than the technologized library can produce.


Notes

1. By "library research" in this paper I refer to research conducted by disciplinary experts (such as musicologists, literature professors, historians, and political scientists) that is primarily based on materials collected in libraries. I do not mean research about how libraries themselves function. The paper is, in fact, a theoretical essay about the latter topic (with respect to a particular group of users: disciplinary experts), but to address that topic I need a term for the ensemble of discipline-based expert research conducted in libraries. "Library-based scholarship" is correct but cumbersome. So I shall use the phrase "library research" throughout, opposing it implicitly to terms like "survey research" or "ethnography."

2. There are only a handful of how-to volumes about library research in my university's 7.5-million-volume library: perhaps a dozen simple manuals directed at college students as well as Thomas Mann's more advanced Oxford Guide to Library Research (New York: Oxford University Press, 2005), which, although absolutely superb as a guide to identifying and finding materials, does not cover the library research process as a whole. There are, of course, many books designed to teach the reader how to look for expert information on some topic. Such books exemplify what is perhaps the dominant assumption behind much information thinking: that truth (or the "true" expert judgment) is out there somewhere in the library, and the task is to find it with minimum difficulty. Typical titles are Student Guide to Research in the Digital Age: How to Locate and Evaluate Information Sources and Find It Fast. Such books obviously offer no help in theorizing how it is that expert library workers create knowledge in the first place; they assume that knowledge ex ante, out there for the finding.

3. Examples are T.L. Martinson, Introduction to Library Research in Geography (1972); A.E. Simpson, Guide to Library Research in Public Administration (1976); R.K. Baker, Introduction to Library Research in French Literature (1978); L.F. Place et al., Aging and the Aged: An Annotated Bibliography and Library Research Guide; L.L. Richardson, Introduction to Library Research in German Studies (1984); S.E. Sebring, Introduction to Library Research in Women's Studies (1985); and J.M. Weeks, Introduction to Library Research in Anthropology (1991). (The latter five are all in a Westview Press series of Guides to Library Research.) Generally aimed at advanced undergraduates and beginning graduate students, all of these books are in effect slimmed-down, specialized versions of the ALA Guide to Reference Books (which indeed is mentioned in all but one of them), usually coupled with some useful advice about the idiosyncrasies of the Library of Congress classification system and other indexing tools. The same is true of Downs's old standard How to Do Library Research (Urbana: University of Illinois Press, 1966 and later editions). None of these is really a manual for an expert or even an advanced student, although, paradoxically, every one of them contains a far larger range of reference tools than would be in the working knowledge of even the greatest experts in the specialties involved. This paradox captures nicely the enormous difference between "finding information" and "doing research."

4. If there is little about the method of library research, there is even less about its theory. At time of writing (26 April 2007), a check of Google revealed five uses of the phrase "theory of library research." One of them is in a tongue-in-cheek guide to the simplest forms of library usage for Duke University chemistry majors. The rest are references to the work of the present writer. There are twelve entries for "library research theory"; all appear to be artifacts combining the last words of one sentence with the first word of another ("...library research. Theory..."). For a recent review of the sociology of science, see S. Sismondo, An Introduction to Science and Technology Studies (Malden, Mass.: Blackwell, 2004). The standard journal in the field is Social Studies of Science. The reader will scan it in vain for articles on library-based knowledge.

5. K.R. Popper, Conjectures and Refutations (New York: Basic, 1962); T.S. Kuhn, The Structure of Scientific Revolutions, 2nd ed. (Chicago: University of Chicago Press, 1970); Sismondo, An Introduction to Science and Technology Studies. According to the Popperian model (Popper 1962), science proposes conjectures, which are then tested against real-world data and either refuted or not. Knowledge at any given time is made up of non-refuted conjectures. Kuhn (1970) insists that "real-world data" are to some extent theory-defined, and that "paradigms" (bundles of theory, data, and practices) are not able to see their own refutation, as Popper's theory requires. See Sismondo 2004 for an introduction to these theories.

6. I apologize to those for whom "data" must be plural. My usage here (singular for data seen collectively and plural for data seen as disparate facts) is standard in the social sciences at this point.

7. Popper, Conjectures and Refutations, 33-65.

8. There are attempts to change this at present, but they face enormous difficulties because of the incommensurability of datasets and, more important, of their internal structure. Generalized data archiving is at present in the same situation as were books around the time of the standardization of the LC classification; the metadata standards we seek at present are the equivalents of authoritative standards for descriptive bibliography.

9. Several "standard research" readers of this manuscript have objected that "of course we also have and do those things" (that is to say, the kinds of sources, practices, and structures I argue characterize library research). As an empirical statement, this is of course true. Standard researchers do plenty of pattern searching and random access and other things that I shall argue characterize library research. But these are not part of the ideal they teach their students, nor are they part of the organizing reality of their research programs or of the criteria by which they judge proposals when serving as members of funding panels. In those activities, they are quite clear about enforcing the formal picture given in the preceding section. In fact, then, their reason for claiming that "we do it too" is to assert overall jurisdiction over "scientific method" and to assert that their brand of it is the only one. It is the central assertion of this paper that that claim is false.

10. Keywords, in the classical sense, are a short number of (subject) index words that are assigned by a human coder to a particular text. They may or may not occur in that text, and they are, typically, part of a controlled vocabulary that enables the retrieval of effectively concentrated bibliographies. Since they are often assigned by authors themselves, they amount to authorial steering of future readers. Obviously, keyword indexing in this sense contains far more information for the scholar than does indexing by simple words that occur in a text, even when this latter is supplemented by quantity information. I use the name "concordance indexing" for this latter type of indexing by words in the text, which confusingly has been called keyword out of context (KWOC) indexing even while the original sense of "keyword" still survived. There is nothing "key" about the keywords in KWOC indexing. Calling concordance indexing "keyword indexing" is like calling oleomargarine butter.

11. D.W. King and C. Tenopir, "Using and Reading Scholarly Literature," Annual Review of Information Science and Technology 34 (1999): 423-77; C. Tenopir and D.W. King, Towards Electronic Journals (Washington, D.C.: SLA, 2000); C. Tenopir and D.W. King, Communication Patterns of Engineers (Piscataway, N.J.: IEEE Press, 2004). There is an enormous and quite rich literature in information science on reading, much of it summarized in King and Tenopir (1999) and Tenopir and King (2000, 2004). The vast majority of it concerns scientists, engineers, and physicians, whose use of published information is radically different from that of the humanists and social scientists who are the library researchers here discussed. More disturbing, however, most of this work presupposes a theory of knowledge as independent bits of information that has been systematically dismantled by sociologists and philosophers of knowledge and science over the last fifty years. It applies only to that part of knowledge that consists of sheer facts, what will here be called Rankean facts (see footnote 15). As a model of more general knowledge systems, the "knowledge bits" theory is clearly inadequate.

12. R.E. Rice, M. McCreadie, and S.-J.L. Chang, Accessing and Browsing Information and Communication (Cambridge, Mass.: MIT Press, 2001). There is a substantial literature on browsing in information science. Its approach is generally more individualized and psychological than the approach taken here. Also, it does not generally focus on browsing by experts and therefore does not focus on the centrality of antecedent knowledge in the browsing process.

13. J.R.M. Butler, The Passing of the Great Reform Bill (London: Longmans, 1914).

14. W. Whewell, The Philosophy of the Inductive Sciences (London: Parker, 1847); C.B. McCullagh, "Colligation and Classification in History," History and Theory 17 (1978): 267-84; A. Abbott, "Event Sequence and Event Duration," Historical Methods 17 (1984): 192-204; E.O. Wilson, Consilience (New York: Knopf, 1998). The term "colligation" (from Whewell) did have a faint afterlife in the literature on the philosophy of history. The Whewellian approach has been more recently revived by E.O. Wilson.

15. L. Krieger, Ranke: The Meaning of History (Chicago: University of Chicago Press, 1977). The standard source on Ranke's historiography is Krieger. His translation of what he calls "the most famous statement in all historiography" (1977: 4) is:

History has had assigned to it the task of judging the past, of instructing the present for the benefit of ages to come. The present study does not assume such a high office: it wants to show only what actually happened (wie es eigentlich gewesen).

16. A computer science colleague has objected that checksums are widely used to verify the stability of computer files and that, in fact, computer files therefore have greater stability than texts. True enough. But that argument ignores the problem of the creation of a central and credible verifying authority for such checksums and the defense of such an authority from political and secret manipulation, one of the many hurdles to be surmounted before the online world can have the credibility and authority provided willy-nilly by the physicality of print. The issue can be seen as a more general ideological one. The ease of updating in the online world leads us all to indulge ourselves in the core fantasy of a society founded on the ideology of progress: that the present is always better than the past. There is, in fact, no a priori reason to think this is true (or false). But our devout belief in its truth leads us to rewrite the past with complete abandon. The physical nature of library artifacts prevents that.

17. A. Abbott, Chaos of Disciplines (Chicago: University of Chicago Press, 2001); A. Abbott, "Seven Types of Ambiguity," in Time Matters (Chicago: University of Chicago Press, 2001).

18. Allen Renear (personal communication) has reported some empirical work confirming this prediction.

19. F. Rodell, "Goodbye to Law Reviews," Virginia Law Review 23 (1936): 40-41. A classic case of the technology-induced degradation of a knowledge system is law reviews, which were overwhelmed with pseudo-scholarship in part because of citation indexing. Fred Rodell's still-famous article has lost none of its sting in seventy years: "And then there is the probative or if-you're-from-Missouri-just-look-at-this type [of footnote]. ... It is [this] probative footnote that is so often made up of nothing but a long list of cases that the writer has had some stooge look up and throw together for him.... Any article that has to be explained or improved by being cluttered up with little numbers until it looks like the Acrosses and Downs of a crossword puzzle has no business being written."

20. This is the effect nicknamed the "Matthew effect" by Merton in a famous article. The Salganik-Watts Web-based experimental studies look at teenagers' piling-on to arbitrarily labeled "good bands." R.K. Merton, "The Matthew Effect in Science," Science NS 159 (1968): 56-63; M.J. Salganik, P.S. Dodds, and D.J. Watts, "Experimental Study of Inequality and Unpredictability in an Artificial Cultural Market," Science NS 311 (2006): 854-56.

21. M.L. Burton, "Mathematical Anthropology," Annual Review of Anthropology 2 (1973): 189-99.

22. L.I. Boneva, "A New Approach to a Problem of Chronological Seriation Associated with the Works of Plato," in Mathematics in the Archaeological and Historical Sciences, ed. F.R. Hodson, D.G. Kendall, and P. Tautu (Edinburgh: Edinburgh University Press, 1971), 173-85.

23. A. Abbott, "The University Library," Appendix to the Report of the Provost's Task Force on the Future of the University Library, University of Chicago, 2006. Available online at www.lib.uchicago.edu/e/about/abbott-report.html and www.lib.uchicago.edu/e/about/abbott-appendix.html. [Accessed 29 September 2008].
