
BIG: An Agent for Resource-Bounded Information Gathering and Decision Making

Victor Lesser, Bryan Horling, Frank Klassner, Anita Raja, Thomas Wagner, Shelley XQ. Zhang

Multi-Agent Systems Laboratory, Department of Computer Science, University of Massachusetts, Amherst, MA 01003
[email protected], http://mas.cs.umass.edu

Department of Computing Sciences, Villanova University, 800 Lancaster Ave., Villanova, PA 19085
[email protected]

Abstract

The World Wide Web has become an invaluable information resource but the explosion of available information has made web search a time consuming and complex process. The large number of information sources and their different levels of accessibility, reliability and associated costs present a complex information gathering control problem. This paper describes the rationale, architecture, and implementation of a next generation information gathering system – a system that integrates several areas of Artificial Intelligence research under a single umbrella. Our solution to the information explosion is an information gathering agent, BIG, that plans to gather information to support a decision process, reasons about the resource trade-offs of different possible gathering approaches, extracts information from both unstructured and structured documents, and uses the extracted information to refine its search and processing activities.

This material is based upon work supported by the Department of Commerce, the Library of Congress, and the National Science Foundation under Grant No. EEC-9209623, and by the National Science Foundation under Grant No. IRI-9523419 and the Department of the Navy and Office of the Chief of Naval Research, under Grant No. N00014-95-1-1198. The content of the information does not necessarily reflect the position or the policy of the Government or the National Science Foundation and no official endorsement should be inferred.

Article published in Artificial Intelligence 118 (1-2) (2000) 197–244


1 Introduction and Motivation

The vast amount of information available today on the World Wide Web (WWW) has great potential to improve the quality of decisions and the productivity of consumers. However, the WWW's large number of information sources and their different levels of accessibility, reliability, completeness [5], and associated costs present human decision makers with a complex information gathering planning problem that is too difficult to solve without high-level filtering of information. In many cases, manual browsing through even a limited portion of the relevant information obtainable through advancing information retrieval (IR) and information extraction (IE) technologies [6,39,12,40] is no longer effective. The time/quality/cost tradeoffs offered by the collection of information sources and the dynamic nature of the environment lead us to conclude that the user should not serve as the detailed controller of the information gathering (IG) process [45].

Our solution to the information explosion is to integrate different Artificial Intelligence (AI) technologies, namely scheduling, planning, text processing, information extraction, and interpretation problem solving, into a single information gathering agent, BIG (resource-Bounded Information Gathering) [42,43], that takes the role of the human information gatherer. In response to a query, BIG locates, retrieves, and processes information to support a decision process. To evaluate our generic approach, we have instantiated it for the software domain. BIG's area of expertise is in helping clients select software packages to purchase. For example, a client may instruct BIG to recommend a database package for Windows 98, and specify desired product attributes as well as constraints on such things as the amount of money they are willing to pay for such a product, and on the amount of time and money to spend locating information about database products. The client may also control how BIG searches by specifying a preference for information precision versus coverage. A preference for coverage will result in more products being discovered, but with less information about each product. A preference for greater precision results in BIG spending more resources to construct very accurate models of products by gathering additional corroborating information. In response to a query, BIG plans, locates, and processes relevant information, returning a recommendation to the client along with the supporting data.

The complexity of our objective mandates a high level of sophistication in the design of BIG's components. Indeed, several are complex problem solvers in their own right. A planner and associated task assessor are responsible for translating a client's information need into a set of goals and for generating plans to achieve those goals. In the example above, the planner would generate plans detailing the alternative ways to fetch database product information and the alternative ways to process the information. To support reasoning about time/quality/cost trade-offs, and thus a range of different resource/solution paths, the planner enumerates several distinct plans for achieving the goals and describes them statistically in three dimensions, duration, quality, and cost, via discrete probability distributions. Another sophisticated problem solving component, the Design-to-Criteria [60–62] (DTC) agent scheduler, examines the possible solution paths within the plan, selects a set of actions to carry out and schedules the actions, coping with an exponential scheduling problem in real-time through the use of approximation and goal directed focusing. The resulting single-agent schedule contains parallelism and overlapping executions when the primitive actions entail non-local processing, such as requests that are issued over the network.

As BIG retrieves documents, another problem solver, an Information Extraction (IE) system [25], works in conjunction with a set of semantic, syntactic, and site-specific tools to analyze the unstructured text documents. Information from this analysis is used for decision making and refinement of other information gathering goals.

Other complex components in BIG include a framework for modeling domain tasks, a web server information database, and a task assessor to assist in translating the problem solver's domain plans into a domain independent representation appropriate for use by the Design-to-Criteria scheduler and other high-level components. We will return to the agent architecture (see Figure 3) in greater detail in Section 3.

Fig. 1. BIG's User Interface

Let us consider a high level example to illustrate some of BIG's capabilities and to set a context for further discussion. A client is interested in finding a word processing program for the Macintosh. The client submits goal criteria that describe desired software characteristics and specifications for BIG's search-and-decide process. A snapshot of the system's user specification form is given in Figure 1.

The search parameters are: duration importance of 100%, soft time deadline of 10 minutes, hard cost limitation of $5, and in terms of precision versus coverage, 50% of the weight is given to precision and 50% to coverage. This translates to a preference for a fast search / decision process, possibly achieved by trading-off cost and quality for the fast search. This also indicates that the user wants the process to take ten minutes or less and cost no more than $5 if the search process incurs expenses when gathering information. (There is no cost associated with accessing data in the experiments reported in this paper, thus the cost constraint specified by the user does not alter BIG's behavior. However, as detailed in [44], in experiments with BIG involving situations where accessing data from selected sites incurs cost, the cost constraints specified by the user are accounted for in BIG's information gathering process.) The user also expresses no preference for coverage or precision; BIG can trade-off one in favor of the other. The product parameters are: product price $200 or less, platform: Macintosh. Additional product evaluation features are discussed in more detail in Section 5.5. The client is an experienced home-office user who desires a result relatively quickly and does not want to spend much money on the search, and who is primarily concerned with getting the most power for the dollar in the product.

BIG's task assessor uses the supplied criteria to determine which information gathering activities are likely to lead to a solution. Candidate activities include document retrieval from known word processing software makers such as Corel and Microsoft, as well as from consumer sites containing software reviews. Other activities pertain to document processing options for retrieved text. For a given document, there is a range of processing possibilities, each with different costs and advantages. For example, the heavyweight information extractor pulls data from free format text, fills templates and associates certainty factors with the extracted items. In contrast, the simple and inexpensive pattern matcher attempts to locate items within the text via simple grep-like behavior. BIG's task assessor handles the process of laying out these problem solving options by emitting a task structure that describes the alternative ways to perform tasks and quantifies them statistically via discrete probability distributions in terms of quality, cost, and duration.

These problem solving options are then considered and weighed by the scheduler; it performs a quality/cost/time trade-off analysis and determines an appropriate course of action for BIG. The schedule is executed; multiple retrieval requests are issued and documents are retrieved and processed. In this case, data extracted from documents at the Corel site is integrated with data extracted from reviews at the Benchin site to form a product description object (model) of Corel WordPerfect. Additional search and processing leads to the discovery of 14 other competing products. The decision maker, based on the product models constructed, indicates that the product that best satisfies the user's specifications is Corel WordPerfect 3.5. BIG returns this recommendation to the client along with the gathered information, corresponding extracted data, and certainty metrics about its extraction and decision processes.

Fig. 2. BIG's Final Decision for Sample Run

The primary distinguishing characteristics of this research are:

Active Search and Discovery of Information BIG does not rely entirely upon a pre-specified set of sites from which to gather information. BIG also utilizes general URL search engines and sites / information sources discovered during previous problem solving sessions. Most importantly, uncertainty in the extracted information and the absence of crucial information drives further search. We provide examples in Section 4.4.

Resource-bounded Search and Analysis BIG problem solves to meet real-time deadlines, cost constraints, and precision, coverage and quality preferences. BIG reasons about which actions to take to produce the desired result and plans accordingly. This is accomplished through the use of the Design-to-Criteria scheduler and by employing an end-to-end, rather than reactive, control process. These issues are discussed in Sections 3, 4, and 5.

Opportunistic and Top-down Control BIG blends opportunistic, reactive, problem solving behaviors with the end-to-end scheduling view required to meet real-time deadlines and other performance constraints. This enables BIG to work within a high-level, structured plan without sacrificing the dynamism needed to respond to uncertainties or inconsistencies that arise in models derived from gathered information. We will discuss the details in Section 5.4.


Information Extraction and Fusion The ability to reason with gathered information, rather than simply displaying it for the user, is critical in the next generation of information gathering systems. BIG uses research-level extraction technology to convert free format text into structured data; the data is then incorporated and integrated into product models that are examined by BIG's decision process, resulting in a product recommendation. Details are provided in Sections 3, 4, and 5.

Incorporation of Extracted Information In addition to building product models, extracted information is incorporated in BIG's search as it unfolds. For example, competitor products discovered during the search are included in BIG's information structures, possibly resulting in new goals to pursue additional information on these products. We provide further details in Sections 3 and 4, and cover an example in Section 5.

This approach to web-based information gathering (IG) is motivated by several observations. The first observation is that a significant portion of human IG is itself an intermediate step in a much larger decision-making process [45]. For example, a person preparing to buy a car may search the Web to find out what car models are available, examine crash test results, obtain dealer invoice prices, or examine reviews and reliability statistics. In this information search process, the human gatherer first plans to gather information and reasons, perhaps at a superficial level, about the time/quality/cost trade-offs of different possible gathering actions before actually gathering information. For example, the gatherer may know that the Microsoft CarPoint site has detailed and varied information on the relevant models, but that it is sometimes slow, relative to the Kelley Blue Book site, which has less varied information. Accordingly, a gatherer pressed for time may choose to browse the Kelley site over CarPoint, whereas a gatherer with unconstrained resources may choose to browse-and-wait for information from the slower CarPoint site. Human gatherers also typically use information learned during the search to refine and recast the search process; perhaps while looking for data on the new Honda Accord a human gatherer would come across a positive review of the Toyota Camry and would then broaden the search to include the Camry. Thus, the human-centric process is both top-down and bottom-up: structured, but also opportunistic. A detailed discussion on the specifics of this type of opportunistic problem solving is presented in Section 5.4. [9] provides a further exposition on the issue of exercising and balancing various types of top-down and bottom-up control.

The second observation that shapes our solution is that Web-based IG is an instance of an interpretation problem. Interpretation is the process of constructing high-level models from low-level data using feature-extraction methods that can produce evidence that is incomplete or inconsistent. In our current domain this corresponds to a situation where the software product descriptions generated from the raw web documents may not contain all the desired information, or duplicate information from different sources may be contradictory. Coming from disparate sources of information of varying quality, these pieces of uncertain evidence must be carefully combined in a well-defined manner to provide support for the interpretation models under consideration.

In recasting web-based IG as an interpretation problem, we face a search problem characterized by a generally combinatorially explosive state space. In the IG task, as in other interpretation problems, it is impossible to perform an exhaustive search to gather information on a particular subject, or even in many cases to determine the total number of instances of the general subject that is being investigated. We first argue that an IG solution needs to support constructive problem solving [11,10] in which potential answers (e.g. models of products) to a user's query are incrementally built up from features extracted from raw documents and compared for consistency or suitability against other partially-completed answers. Secondly, any solution to this IG problem needs to support reasoning about tradeoffs among resource or time constraints, the quality of the selected item, and the quality of the search, analysis and decision processes. Because of the need to conserve time, it is important for an interpretation-based IG system to be able to save and exploit information about pertinent objects learned from earlier forays into the WWW.

In connection with this incremental model-building process, an interpretation-based IG problem solution must also support sophisticated scheduling to achieve interleaved data-driven and expectation-driven processing. Processing for interpretation must be driven by expectations of what is reasonable, but expectations in turn must be influenced by what is found in the data. For example, during a search to find information on word processors for Windows 98, with the goal of recommending some package to purchase, an agent finding Excel in a review article that also contains Word might conclude based on IE-derived expectations that Excel is a competitor word processor. However, scheduling of methods to resolve the uncertainties stemming from Excel's missing features would lead to additional gathering for Excel, which in turn would associate Excel with spreadsheet features and would thus change the expectations about Excel (and drop it from the search when enough of the uncertainty is resolved). Where possible, scheduling should permit parallel invocation of IE methods or requests for WWW documents.

Thus far we have outlined a large information gathering system designed to leverage the strengths of several AI subfields to address the complex task of using a large and unstructured information source like the Internet to facilitate decision making. In the remainder of this paper, we discuss related research in Section 2, and present the BIG agent architecture and its key components in Section 3. We then present a detailed execution trace of BIG in Section 4. In Section 5 we present other interesting research issues addressed by BIG using details from actual BIG runs. We also demonstrate in this section the flexibility of the architecture to different user objectives and software genres through empirical results. Conclusions and future directions are presented in Section 6.


2 Related Research

2.1 Web Based Information Assistance

The exponential growth of the Web has not gone unnoticed by the research and commercial communities. The general solution, as one researcher so aptly put it [22], is to "move up the information food chain," in other words, to build higher-level information processing engines that utilize existing tools like generalized search engines (e.g., Infoseek and AltaVista). We first look at three approaches used in information processing circles. One class of work toward this end is the meta search engine. Meta search engines typically issue parallel queries to multiple search engines like AltaVista and Infoseek, customizing the human client's query for each search engine and using advanced features of the search engines where available. Examples of this include SavvySearch [32] and MetaCrawler [22]; commercial meta search products are also available [33,64]. Some of these tools supplement the IR technology of the search engines; for example, if a particular advanced query technique such as phrase matching is missing from the search engine, MetaCrawler will retrieve the documents emitted from the search engine and perform its own phrase techniques on the documents. Other supplementary features provided in meta search engines include clustering of candidate documents. Since meta search engines build on the services offered by several URL search engines, their results generally offer wider Internet coverage. However, since their output is often just a list of URLs generated by the same processing techniques used in URL search engines, it tends to suffer from the same problem as the output from URL search engines themselves: too much raw data.

A second class of related work is the personal information agent [2,52]. Rather than making single queries to a large number of sites, these agents begin from one or more specific points on the Web and selectively pursue links in search of relevant information. (The colloquial term "spidering" includes this directed traversal along with more undirected search strategies.) They are concept-driven, obtaining their area of interest either through hard-coded rules, explicit questionnaires or simple learning techniques. These systems are not as fast as the meta search systems, but their design goal has a somewhat different focus. Personal information agents are typically used to obtain a small number of highly relevant documents for the user to read, either all at once or continuously over an extended time period. Thus, because of the potential overhead for both link traversal and dynamic document processing these systems tend to sacrifice speed for document quality.

The third class of work addressing the information food chain is the shopping agent. Shopping agents typically locate and retrieve documents containing prices for specified products, extract the prices, and then report the gathered price information to the client. For example, the original BargainFinder [37] and the more recent Shopbot [21] both work to find the best available prices for music CDs. These tools often differ from meta search engines and personal information agents in that they typically do not search the web to locate the shopping sites; instead, the systems designers develop a library containing known shopping sites and other information such as how to interact with a particular store's local search engine. Some shopping agents also integrate some of the functionality offered by the personal information agents. For example, the commercial Jango [35] shopping agent locates reviews as well as extracting price and very specific product features from vendor web sites. Research in these systems often focuses on how to autonomously learn rules (akin to wrappers [49]) for interacting with each store's forms and for processing the output, in contrast to having a human manually encode the rules.

Consider the different attempts to move up the information food chain. The meta search engines provide information coverage, independence from the nuances of particular search engines, and speed. They also provide a measure of robustness since they are not tied to a particular search engine. Personal information agents combine IR techniques with simple heuristics to qualify documents for the client's review. Shopping agents provide information processing facilities to support the human client's information gathering objective, for example, to find the best price for a music CD. Our work extends these ideas by combining many of the characteristics that make these systems individually effective within their areas of expertise.

Like the meta search engines, BIG can use multiple different web search tools to locate information on the web. In contrast to the meta search engines, BIG learns about products over time and reasons about the time/quality tradeoffs of different web search options. Akin to the personal information agents, BIG gathers documents by actively searching the Web (in conjunction with web search engines). BIG, however, goes one step further by also performing information extraction on the data it retrieves. Like the shopping agents, BIG gathers information to support a decision process. However, BIG differs from shopping agents in the complexity of its decision process and in the complexity of its information processing facilities. Through IE technologies, BIG processes free format text and identifies and extracts product features like prices, disk requirements, and support policies.

2.2 Database Research

Some aspects of BIG relate closely to issues raised by heterogeneous database systems (HDBS) [59,36]. Such databases must potentially gather data from multiple sources, which may each have different performance, content and cost. At a high level, these two problems are thus very similar. Both BIG and HDBS aim to provide transparent access to a heterogeneous set of information sources from a single access point. BIG, however, has additional concerns which HDBS typically do not address. BIG's set of information sources is more dynamic than a typical HDBS, and is composed of a mixture of search engine and single-point items. The information BIG deals with is also unstructured and noisy. As more information sources become available which are designed to be accessed by agents, HDBS techniques may become more applicable to the overall problem we are addressing.

Some of BIG's problem solving and scheduling activities are analogous to techniques used in database query optimization. The query optimization process in a centralized database system is concerned with how best to structure information requests and manage the processing which must take place on the resulting data. In a distributed database system, a query optimizer has the additional burden of possibly choosing from among several information sources and processing locations which each have different benefits and drawbacks [29,27,3]. These operations are analogous to the scheduling activity done in BIG, which makes similar decisions. Both tasks must consider such issues as expected server performance, data structure, activity parallelism and how best to manage the retrieved information. An important difference between a conventional query optimizer and BIG's scheduling process is the amount of user input involved. BIG uses the Design-to-Criteria agent scheduler, which takes into account the user's preferences when generating the schedule. For instance, one user may be willing to spend a lot of money in exchange for a very short but high quality search, whereas another may be willing to spend more time to save money. DTC allows several metrics to affect its behavior, allowing a degree of customization not permitted by typical database query optimization. These tradeoffs will be covered in more detail in later sections and are presented more fully in [60–62].

2.3 Other Related Issues

Technologies developed in mainstream information retrieval research may also help BIG find and extract information more reliably. Metadata information, such as RDF/PICS/XML [51], allows web page authors to provide concise information in a format sufficiently structured to simplify interpretation. Widespread adoption of these formats would greatly improve the effectiveness of programs like BIG. Other technologies, facilitating general inter-application (e.g. Z39.50 [1]) and inter-agent (e.g. KQML [24]) communication, can also assist by providing the standards necessary for simple information transfer. In some sense, HTTP currently fills this role, but more suitable protocols exist for the task at hand. A practical drawback with these new techniques is that they have not yet become widespread enough to make them viable. If and when standards such as RDF become widely accepted, it seems clear that systems like BIG will be able to make more effective use of available information.

Grass and Zilberstein's work [26] is closely related to our basic approach, but differs in that the decision process is centered around a Bayesian network and their approach to scheduling is more reactive. BIG is also related to the WARREN [15] multi-agent portfolio management system, which retrieves and processes information from the Web. However, BIG differs in its reasoning about the trade-offs of alternative ways to gather information, its ambitious use of gathered information to drive further gathering activities, its bottom-up and top-down directed processing, and its explicit representation of sources-of-uncertainty associated with both inferred and extracted information. BIG shares some characteristics with database-centric, structured-resource approaches like TSIMMIS [28], SIMS [4], and the Information Manifold [47], but differs in that its focus is on resource-bounded information extraction and assimilation coupled with discovery.

The time/quality/cost trade-off aspect of our work is conceptually similar to [31,30,14,54] and formal methods [23,26] for reasoning about gathering information, except that our trade-off analysis focuses on problem solving actions (including text processing) and other agent activities rather than concentrating only on the trade-offs of different information resources, i.e., our work addresses both the agent control level and information value.

With respect to the development of digital libraries, our research is aimed at partially automating the function of a sophisticated research librarian, as in [63]. This type of librarian is often not only knowledgeable in library science but also may have a technical background relevant to the interests of the research domain. In addition to locating relevant documents for their clients, such librarians often distill the desired information from the gathered documents for their clients. They often need to make decisions based on resource concerns such as the trade-offs between billable hours and solution quality and the resource time/quality/cost constraints specified by a given client; or whether certain periodicals are available in-house, and if not, how long it will take to get them and what they will cost. We see the partial automation of a sophisticated librarian as a natural step in the evolutionary development of a fully automated digital library.

BIG also relates to research in interfaces and dialogues between human users and agents [53], though the extraction of software requirements from the user and the agent/user interaction is not the focus of this research. In the future, we envision a dynamic human/agent interface in which the client can provide online guidance to BIG to help focus the search and decision processes. In fact, BIG's architecture was designed partly to support such activities, and it is one of the strengths of the flexible control paradigm used in BIG.


Fig. 3. The BIG Agent Architecture

3 The BIG Agent Architecture

The overall BIG agent architecture is shown in Figure 3. The agent is comprised of several sophisticated components that are complex problem-solvers and research subjects in their own right. The most important components, or component groups, follow in rough order of their invocation in the BIG agent.

Server and Object Information Databases The object database stores information objects constructed by BIG during an information gathering session. Objects represent entities from the application domain (e.g. software packages, cars) generated by BIG, about which it will make a decision. Objects may be incompletely specified; field values may be uncertain through lack of information or because of contradictory information. These uncertainties are explicitly represented as sources of uncertainty data structures (SOUs) [7,8]. This enables BIG to plan to find information that either corroborates the current information or reduces conflicts with the current information, thereby decreasing the degree of uncertainty. The object database is also used to store information from previous searches; thus BIG can learn and improve/refine its knowledge over time.

The server information database contains numerous records identifying both primary (e.g., a review site) and secondary (e.g., URL search engine) information sources on the Internet. Within each record are stored the pertinent characteristics of a particular source, which consist of such things as its quality measures, retrieval time and cost, and relevant keywords, among others. The server database is used by the task assessor to help generate its initial sketch of information gathering options and again during the actual search process by the RESUN planner.

Both the server and object databases grow dynamically at runtime. At the start of the experimental runs described in Section 5.6, the server database is seeded with a small number (10-20) of generic information sources (e.g., vendor sites and search engines), while the object database is empty. New sources are added to the server database as they are discovered, and new characteristics about known sources (i.e. average response time, file size, references) are used to update existing entries. The object database is grown in a similar manner by adding new products as they are found and revising records of known products. The information in these databases is part of a feedback loop, which improves the quality of data available to BIG for each query it processes. The server database is further augmented by an off-line spider process which fills the database with sources that meet general, easily checked characteristics (i.e. keyword matching, minimum textual content).
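To make the preceding description more concrete, the sketch below shows one plausible, simplified shape for an object database record (a product object whose fields carry values and explicit SOUs) and a server database record. It is illustrative only and does not reproduce BIG's actual data structures; all field names and values are assumptions.

# Illustrative sketch (not BIG's actual code): a product object with explicit
# sources of uncertainty (SOUs) and a server record with source characteristics.
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class ObjectField:
    value: Optional[str] = None                      # e.g. "$159.95"
    sous: List[str] = field(default_factory=list)    # e.g. ["uncertain-support-sou"]

@dataclass
class ProductObject:
    name: str
    fields: Dict[str, ObjectField] = field(default_factory=dict)

    def unresolved_sous(self) -> List[str]:
        # SOUs the planner could try to resolve with further gathering.
        return [sou for f in self.fields.values() for sou in f.sous]

@dataclass
class ServerRecord:
    url: str
    avg_response_secs: float      # observed retrieval time
    quality: float                # estimated quality of the source
    cost: float                   # access cost, if any
    keywords: List[str] = field(default_factory=list)

# Hypothetical example: a partially specified product object.
wp = ProductObject("Corel WordPerfect 3.5", {
    "price": ObjectField("$159.95"),
    "platform": ObjectField("Macintosh", ["uncertain-support-sou"]),
    "processor": ObjectField(None, ["partial-support-sou"]),
})
print(wp.unresolved_sous())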

Blackboard Component The Blackboard functions as a multileveled database for the information the system has discovered and produced thus far. Unlike the object database mentioned above, the blackboard is a runtime specific tool: it is more efficient to access, but the information will be lost when the system is shut down. Useful information from the blackboard is therefore saved into the object database for future use. Our current blackboard organization has four levels: User-Goal, Decision, Object, and Document, in order of decreasing abstraction. The layered hierarchy allows for explicit modeling of concurrent top-down and bottom-up processing, while maintaining a clear record of supporting and contradictory evidence. The information at a given level is derived from the level(s) below it, and it in turn supports the hypotheses at higher levels. For example, when evaluating the appropriateness of a particular decision hypothesis, the system examines the reliability of the text extraction processes used to generate the properties of the object. The objects themselves are each supported by the various documents from which they were generated. Figure 4 shows the four-level structure of our current blackboard and examples of the types of objects which are stored there.

In this example, the Corel Wordperfect 3.5 product object in the object level provides supporting evidence to the Corel Wordperfect 3.5 recommendation made at the decision level. There are several kinds of SOUs shown associated with the "Object" and "Document" levels in the figure; these SOUs help identify those objects which potentially require further processing. The "partial-support-sou", for example, indicates that there are important features, such as platform or processor, missing from this object. The problem solver would at some point notice this deficiency and attempt to resolve the uncertainty by retrieving and processing related documents.

Task AssessorThe task assessoris responsiblefor formulating an initial infor-mationgatheringplan andfor revising the plan asnew informationis learned.Thetaskassessormanagesthehigh-level view of theinformationgatheringpro-cessandbalancestheend-to-end,top-down constraintsof theDesign-to-Criteriaschedulerandtheopportunisticbottom-upRESUNplanner(bothdiscussedbe-low). It heuristicallygeneratesa network of high-level planalternativesthatarereasonable,giventheuser’s goal specificationandthedesiredperformanceob-jectives,in termsof timedeadlineandinformationcoverage,precisionandqual-ity preferences.


[Figure 4 shows example entries at each blackboard level:]

User-Goal level: Price: $200; Platform: Mac

Decision level: Product: Corel Wordperfect 3.5; Action: Buy; Confidence: 0.775

Object level: Product name: Corel WordPerfect 3.5; Price: $159.95; Platform: Macintosh; Overall quality: 1.87; partial-support-sou: missing important features; uncertain-support-sou: uncertain about some features

Document level: Genre: Word Processing; Document URL: http:search.outpost.com/search...; Text-content: <page text>; no-support-sou: missing all information besides name; no-explanation-sou: no object supported by this document

Fig. 4. BIG's Blackboard Structure

The TÆMS [18] task modeling language is used to hierarchically model the information gathering process and enumerate alternative ways to accomplish the high-level gathering goals. The task structures probabilistically describe the quality, cost, and duration characteristics of each primitive action and specify both the existence and degree of any interactions between tasks and primitive methods. TÆMS task structures are stored in a common repository and serve as a domain independent medium of exchange for the domain independent agent control component. In the single agent implementation of BIG, TÆMS is primarily used to coordinate and communicate between the scheduler (below), the task assessor, and the RESUN planner.
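As a simplified illustration of these statistical annotations (an assumed rendering, not the actual TÆMS encoding used by BIG), a primitive method can be described by discrete probability distributions over quality, cost, and duration, from which expected values are computed for trade-off reasoning:

# Simplified sketch of a TÆMS-style primitive method annotated with discrete
# distributions over quality, cost, and duration; all numbers are illustrative.
from typing import Dict

def expected(dist: Dict[float, float]) -> float:
    # Expected value of a discrete distribution {outcome: probability}.
    return sum(outcome * p for outcome, p in dist.items())

# A hypothetical retrieval-and-extract method described in three dimensions.
query_to_macmall = {
    "quality":  {0.0: 0.1, 8.0: 0.6, 10.0: 0.3},     # failure, typical, best case
    "cost":     {0.0: 1.0},                          # a free site in this sketch
    "duration": {20.0: 0.5, 40.0: 0.4, 90.0: 0.1},   # seconds
}

for dimension, dist in query_to_macmall.items():
    print(dimension, "expected =", expected(dist))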

Design-to-Criteria Scheduler Design-to-Criteria [60–62] is a domain independent, real-time, flexible computation [30,14,54] approach to task scheduling. The Design-to-Criteria task scheduler reasons about quality, cost, duration and uncertainty trade-offs of different courses of action and constructs custom satisficing schedules for achieving the high-level goal(s). The scheduler provides BIG with the ability to reason about the trade-offs of different possible information gathering and processing activities, in light of the client's goal specification and behavior preferences, and to select a course of action that best fits the client's needs in the current problem solving context. The scheduler receives the TÆMS models generated by the task assessor as input, produces a schedule in soft real-time [62], and returns the generated schedule to the RESUN planner for execution. (For a typical BIG task structure, having 25-30 primitive actions, scheduling time is on the order of 10 seconds on a Digital Alphastation 6000.)

The resulting schedule may contain segments of parallel activities when the primitive actions entail non-local processing, e.g., issuing requests over the network. The non-local activities can be embedded within primitive actions or explicitly modeled as primitive actions with two components, one for initiation and one for polling to gather results, separated by propagation delays. This enables the agent to exploit parallelism where possible and where the performance of the parallel activities will not adversely affect the duration estimates associated with its activities. (This distinction is important because the duration estimates associated with actions are constructed assuming the dedicated efforts of the agent. In cases where multiple activities are performed in parallel, and the activities require 100% of the local processor, performance degradation will affect the actual run times of activities and result in schedules that do not perform as expected.)

In summary, the scheduler is what enables BIG to address real-time deadlines and to trade-off different aspects of solution quality (e.g., precision, coverage). The scheduler does not simply trade-off time and cost; it determines how the process should be accomplished and the appropriate time allocations given to operations or particular classes of operations (e.g., information search and retrieval versus text processing).

RESUN Planner The RESUN [7–9] (pronounced "reason") blackboard based planner/problem solver directs information gathering activities. The planner receives an initial action schedule from the scheduler and then handles information gathering and processing activities. The strength of the RESUN planner is that it identifies, tracks, and plans to resolve sources-of-uncertainty (SOUs) associated with blackboard objects, which in this case correspond to gathered information and hypotheses about the information. For example, after processing a software review, the planner may pose the hypothesis that Corel Wordperfect is a Windows 98 word processor, but associate a SOU with that hypothesis that identifies uncertainty associated with the extraction technique used. The planner may then decide to resolve that SOU by using a different extraction technique or finding corroborating evidence elsewhere. RESUN's ability to represent uncertainty as symbolic, explicit factors that can influence the confidence levels it maintains for hypotheses provides the cues for an opportunistic control mechanism to use in making context-sensitive decisions. For example, they might be used to adaptively engage in more unrestricted Web retrieval when a reference to a previously unknown product is encountered, or to engage in differential diagnosis to discriminate between two software products' competitive features.

This hints at an interesting integration issue. RESUN's control mechanism is fundamentally opportunistic: as new evidence and information is learned, RESUN may elect to work on whatever particular aspect of the information gathering problem seems most fruitful at a given time. This behavior is at odds with the end-to-end resource-addressing trade-off centric view of the real-time [62] Design-to-Criteria scheduler, a view necessary for BIG to meet deadlines and address time and resource objectives. Currently RESUN achieves a subset of the possible goals specified by the task assessor, but selected and sequenced by the scheduler. However, this can leave little room for opportunism if the goals are very detailed, i.e., depending on the level of abstraction RESUN may not be given room to perform opportunistically at all. Improving the opportunism via a two-way interface between RESUN and the task assessor is an area of future work (Section 6). Experiments with different cost models for web sites and scheduling with different trade-offs, using the current interface model, are presented in [44].

To work effectively, BIG must be able to perform search and discovery on the Web. The search space size and dynamism of this environment require an agent to 1) respond to data driven opportunities and uncertainties that arise during problem solving, and 2) meet real-time deadlines, address resource limitations, and trade-off solution quality for time spent searching. The RESUN planner and Design-to-Criteria scheduler combine to provide these capabilities. If the environment were static, a simple script would be sufficient to control the agent's search process.

Web Retrieval Interface The retriever tool is the lowest level interface between the problem solving components and the Web. The retriever fills retrieval requests by either gathering the requested URL or by interacting with both general (e.g., InfoSeek) and site specific search engines.

Document Classifiers To more effectively utilize the processing power available to it and decrease the probability of analyzing unrelated information, BIG prunes the set of documents to be processed through a series of filtering steps that impose progressively increasing processing demands and quality standards. During each stage, a test is performed which will prevent the document from reaching the next stage if it fails. At the lowest level is a simple keyword search in the retrieved document's content. If the document fails to contain any of the supplied keywords it will fail the test. This is followed by a more sophisticated check by a Naive Bayes classifier, which is covered in detail in Section 5.1. The Naive Bayes document classifier performs statistical text classification and is provided with a set of positive and negative training documents as input. Before performing classification, the classifier indexes the data by reading the training documents and archiving a "model" containing their statistics. A document which passes these checks is then placed on BIG's blackboard. Documents selected from the blackboard will then be processed by one or more of the text extraction knowledge sources. The exact set of extractors applied to the document is governed by the document's source, or if the source is unknown to BIG, all extractors are used. This final filtering stage is responsible for the fine grained culling of information. Pertinent details from each document are used to augment the known set of products, while the remaining content is discarded.
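The staged filtering described above can be sketched roughly as follows. This is an illustrative Python sketch of a keyword test followed by a Naive Bayes check, not the classifier BIG actually uses (see Section 5.1); the training documents, smoothing, and decision rule are all assumptions.

# Illustrative sketch of staged document filtering: a cheap keyword test first,
# then a Naive Bayes check built from positive and negative training documents.
import math
from collections import Counter

def train_nb(positive_docs, negative_docs):
    # Archive per-class word statistics (the "model") from the training documents.
    model = {}
    for label, docs in (("pos", positive_docs), ("neg", negative_docs)):
        counts = Counter(word for doc in docs for word in doc.lower().split())
        total = sum(counts.values())
        model[label] = (counts, total, len(counts) + 1, math.log(len(docs)))
    return model

def log_score(model, label, doc):
    counts, total, vocab, log_prior = model[label]
    score = log_prior
    for word in doc.lower().split():
        score += math.log((counts[word] + 1) / (total + vocab))   # Laplace smoothing
    return score

def passes_filters(doc, keywords, model):
    # Stage 1: reject documents containing none of the supplied keywords.
    if not any(k.lower() in doc.lower() for k in keywords):
        return False
    # Stage 2: keep only documents the classifier considers on-topic.
    return log_score(model, "pos", doc) > log_score(model, "neg", doc)

model = train_nb(["word processor review for the macintosh"],
                 ["football scores and league tables"])
print(passes_filters("a short review of this word processor", ["word processor"], model))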

Information Extractors The ability to process retrieved documents and extract structured data is essential both to refine search activities and to provide evidence to support BIG's decision making. For example, in the software product domain, extracting a list of features and associating them with a product and a manufacturer is critical for determining whether the product in question will work in the user's computing environment, e.g., RAM limitations, CPU speed, OS platform, etc. BIG uses several information extraction techniques to process unstructured, semi-structured, and structured information. (The widespread adoption of XML and other structuring specifications for web documents will help to simplify the problem of processing web-based information.) Documents in general are used by BIG in two different capacities: product descriptions and reviews. Different technologies optimized for use on either of these document classes are used to process the two types of documents. We determine the type of some documents by analyzing the site of origin. For those documents with unknown type, both review and description technologies are applied to them.

The information extractors are implemented as knowledge sources in BIG's RESUN planner and are invoked after documents are retrieved and posted to the blackboard. The information extractors are:

textext-ks This knowledge source processes unstructured text documents using the BADGER [56] information extraction system to extract particular desired data. The extraction component uses a combination of learned, domain-specific extraction rules, domain knowledge, and knowledge of sentence construction to identify and extract the desired information. The BADGER text extractor utilizes knowledge gained from a training corpus as well as a lexicon/dictionary of domain words and their classifications in a semantic hierarchy. This component is a heavy-weight NLP-style extractor that processes documents thoroughly and identifies uncertainties associated with extracted data.

Our main contribution in this area is how the extracted information is made useful to the rest of the system by means of back-end processing. The back-end takes the extractions made by the system and provides the degree of belief for each extraction. The degree of belief indicates the level of confidence that the extraction is accurate and is a function of the number of positive and negative training examples covered by all the rules that support a particular extraction. Using the degrees of belief as thresholds, we determine which of the extractions are valid and also compute the certainty measure of the entire template (see the illustrative sketch following this list of extractors). Also, the processed information supports opportunistic control in the sense that newly discovered information could lead to the examination of a completely different part of the solution space than before.

grep-ks This featherweight KS scans a given text document looking for a keyword that will fill the slot specified by the planner. For example, if the planner needs to fill a product name slot and the document contains "WordPerfect", this KS will identify WordPerfect as the product, via a dictionary, and fill the product description slot.

cgrepext-ks Given a list of keywords, a document and a product description object, this middleweight KS locates the context of the keyword (similar to paragraph analysis), does a word-for-word comparison with a built-in semantic definitions thesaurus and fills in the object accordingly. The cgrep knowledge source uses a lexicon/dictionary similar to that of BADGER.

tablext-ks This specialized KS extracts tables from html documents, processes the entries, and fills product description slots with the relevant items. This KS is built to extract tables and identify table slots for particular sites. For example, it knows how to process the product description tables found at the Benchin review site.

quick-ks This fast and highly specialized KS is constructed to identify and extract specific portions of regularly formatted html files. The quick-ks utility essentially acts as a wrapper to certain web sites. It has knowledge about the information structure each site employs, and can efficiently extract pertinent information from these sources. The primary drawback to such a technique is the inherent difficulty in constructing such wrappers. In our system, a human expert must inspect the sites in question, deduce the structure of the pages, and then encode rules to extract the desired information. Clearly this is a labor intensive process, and one which must be repeated for each web site to be targeted and each time a targeted web site alters its format. We chose to employ this technique because a relatively small number of sites could be targeted to produce a significant amount of high quality information. Recent research in [48] has shown methods which can be employed to simplify wrapper construction and revision, which could significantly reduce the amount of effort this technology requires, thus making it more viable in a large scale system.
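The degree-of-belief computation mentioned under textext-ks is characterized here only as a function of the positive and negative training examples covered by the rules supporting an extraction. The sketch below assumes one plausible such function (a simple coverage ratio) and an arbitrary acceptance threshold, purely to illustrate how extractions might be filtered and a template certainty derived:

# Illustrative sketch only: the exact belief function is not specified in the
# paper, so a simple coverage ratio over supporting rules is assumed here.

def degree_of_belief(supporting_rules):
    # supporting_rules: list of (positives_covered, negatives_covered) pairs,
    # one pair per extraction rule that supports the extraction.
    pos = sum(p for p, n in supporting_rules)
    neg = sum(n for p, n in supporting_rules)
    return pos / (pos + neg) if (pos + neg) else 0.0

def filter_extractions(extractions, threshold=0.7):
    # Keep extractions whose belief clears the (assumed) threshold and report an
    # overall certainty for the filled template -- here simply the mean belief.
    kept = {slot: degree_of_belief(rules)
            for slot, rules in extractions.items()
            if degree_of_belief(rules) >= threshold}
    certainty = sum(kept.values()) / len(kept) if kept else 0.0
    return kept, certainty

# Hypothetical extractions for one product template.
extractions = {
    "platform": [(12, 1), (5, 0)],   # well supported by its rules
    "price":    [(2, 3)],            # weakly supported, likely rejected
}
print(filter_extractions(extractions))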

Decision Maker After product information objects are constructed, BIG moves into the decision making phase. In the future, BIG may determine during decision making that it needs more information, perhaps to resolve a source-of-uncertainty associated with an attribute that is the determining factor in a particular decision; however, currently BIG uses the information at hand to make a decision. We discuss the decision process in greater detail in Section 5.5; the decision is based on a utility calculation that takes into account the user's preferences and weights assigned to particular attributes of the products and the confidence level associated with the attributes of the products in question. Note that we do not rigorously evaluate the final decisions that BIG produces in this paper, as we feel the issue is highly subjective. Any selected product falling within or closest to the user's desired parameters is considered a valid choice.
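As a minimal sketch of the kind of utility calculation described above (the actual computation is detailed in Section 5.5), the following assumes a weighted, confidence-discounted attribute score; the weights, scores, and product names are illustrative only:

# Minimal sketch (not BIG's actual decision code): score each product model by
# combining user attribute weights, attribute scores, and attribute confidences.

def product_utility(attributes, weights):
    # attributes: {name: (score_in_[0,1], confidence_in_[0,1])}
    # weights:    {name: relative importance supplied by the user}
    total_weight = sum(weights.values()) or 1.0
    return sum(weights.get(name, 0.0) * score * confidence
               for name, (score, confidence) in attributes.items()) / total_weight

# Hypothetical product models and the 60%/40% price/quality preference of Section 4.1.
candidates = {
    "Corel WordPerfect 3.5": {"price": (0.9, 0.95), "quality": (0.7, 0.80)},
    "Hypothetical Writer":   {"price": (0.6, 0.90), "quality": (0.8, 0.50)},
}
weights = {"price": 0.6, "quality": 0.4}
best = max(candidates, key=lambda name: product_utility(candidates[name], weights))
print(best)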

All of these components are implemented and integrated in BIG. The construction, adaptation, and integration of these components was a non-trivial process. BIG is a large, complex, problem-solving agent that incorporates many areas of AI research under a single umbrella. The culmination of these efforts in BIG has produced an interesting research tool, but the integration has also influenced and refined the research directions pertaining to the individual components as well.

BIG is implemented in C++, Perl, Common-Lisp, and Java. It is run on an Alphastation 6000 with 512 megabytes of RAM and requires non-trivial computing resources. However, in terms of performance, little time has been spent optimizing the system (excepting the DTC scheduler), and optimization could reduce the overhead involved with running BIG and improve BIG's ability to make better use of allocated run-time, i.e., it would be able to search more or extract more given the same resource allocation.

4 Execution Trace

We now describe a short sample run of the BIG system based on the high level example described in the introduction (see Figures 1 and 2), to better illustrate the mechanisms used both within and between BIG's components. The client is a student who uses the system to find a word processing package which will most closely satisfy a set of requirements and constraints. For clarity of presentation we describe the example trace in the following sequential stages: querying, planning, scheduling, retrieval, extraction and decision-making. It is important to note that the details given below are a representative example of BIG's problem solving techniques, and that the specific sequence of actions is highly dependent on the particular constraints and environment characteristics BIG encounters.

4.1 Query Formulation

Query processing is initiated when the client specifies and submits the search criteria, which include the duration and cost of the search as well as desired product attributes such as price, quality features and system requirements. In this example the client is looking for a word processing package for a Macintosh costing no more than $200, and would like the search process to take ten minutes and the search cost to be less than five dollars. The client also describes the importance of product price and quality by assigning weights to these product categories; in this case the client specified that the relative importance of price to quality was 60% to 40%, respectively. Product quality is viewed as a multi-dimensional attribute with features like usefulness, future usefulness (this relates to the openness of the software product to be compatible with newer versions of supporting software and operating systems), stability, value, ease of use, power and enjoyability constituting the different dimensions. Such characteristics are observable through specialized analysis techniques used during the extraction phase. As seen in Figure 1, these qualities are all equally weighted at 50 units. These are assigned relative weights of importance. The client specifies the relative importance of product coverage and precision as 20% and 80% respectively.


[Figure 5 depicts the TÆMS task structure for this run: the top-level task Satisfy_User_Query with subtasks Get_Information, Benchin_Review, and Make_Decision, lower-level tasks such as Get_Basic_Information, Get_More_Object, Get_Extra_Information, Look_For_Materials, and Detail_Product_Information, their sum, seq_sum, and min quality accumulation functions, enables edges, and the numbered execution order of the scheduled methods.]

Fig. 5. BIG's TÆMS Task Structure for the Short Run

4.2 Plan Construction

Once the query is specified, the task assessor starts the process of analyzing the client specifications. Using its knowledge of RESUN's problem solving components and its own satisficing top-down approach to achieve the top level goal, it generates a TÆMS task structure that it finds most capable of achieving the goal given the criteria (a task structure here is akin to an integrated network of alternative process plans for achieving a particular goal). Although not used in this example, knowledge learned in previous problem solving instances may be utilized during this step by querying the database of previously discovered objects and incorporating this information into the task structure [42].

Figure 5 shows the TÆMS task structure produced by the task assessor in response to the client's query and the current information in the object and server information databases. The top level task is to satisfy the user's query, and it has three subtasks: Get-Information, Benchin-Review, and Make-Decision. The three subtasks represent different aspects of the information gathering and recommendation process, namely, finding information and building product models, finding reviews for the products, and evaluating the models to make a decision. The three subtasks are related to Satisfy-User-Query via a seq_sum() quality-accumulation-function (qaf), which defines how quality obtained at the subtasks is combined at the parent task. Some qafs, like seq_sum(), specify the sequence in which to perform subtasks in addition to the different combinations that may be employed (the seq stands for "sequence"). Seq_sum() specifies that all of the subtasks must be performed, in order, and that the quality of the parent task is a sum of the qualities of its children. The formal details of TÆMS are presented in [17,16], the evolving specification is at [57], and other TÆMS examples appear in [41,61,62].

The Get-Information task has two children, also governed by a seq_sum(). The dotted edge leading from Get-Basic-Information to Get-Extra-Information is an enables non-local-effect (task interaction) denoting that Get-Basic-Information must produce quality before Get-Extra-Information can be performed. In this case, it models the notion that product models must be constructed before any time can be spent doing optional or extra activities like improving the precision of the result or increasing the information coverage (discussed in Section 5.2). Choice in this task structure occurs any time tasks are grouped under a sum() qaf (there are many other qafs that entail choice, but they are not used in this example). For example, Look-for-Materials has six subtasks under a sum(), which means that any combination of these subtasks may be performed and in any order (barring deadlines on individual tasks or task interactions), i.e., the power-set minus the empty-set may be performed. Likewise with the children of Get-More-Objects and Detail-Product-Information. Alternative choices about where to search, how many places to search, which methods to employ while searching, which information extraction technologies to use, the number of reviews to gather for products, and so forth are all modeled in TÆMS. This is also what gives BIG the ability to target its performance for particular situations. For example, in a situation where a result is desired by a tight deadline, the Design-to-Criteria scheduler will analyze the task structure and find a solution path that "best" trades-off quality for duration and cost. There is another element of choice in BIG: the level of abstraction used in the creation of the TÆMS task structure – a task assessor component determines which options are important to enumerate and the granularity of what is included in a leaf node (primitive action).

[Footnote: The full range of task interactions expressible in TÆMS was not exploited by the task assessor component in modeling the planner's activities. One set of interactions, involving facilitation/hindering, we hope to use in future versions of BIG. These relationships allow us to model the fact that the degree of quality produced by a primitive task will affect in a positive/negative way the behavior of other primitive tasks. Another aspect of TÆMS that could be potentially useful in modeling IG activities is the ability to represent different outcomes associated with a task, each of which can have different types of task relationships.]
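To make the accumulation semantics concrete, the following minimal Python sketch models tasks with quality-accumulation functions and an enables constraint. It is an illustration only, not the TÆMS or BIG code; the node names mirror Figure 5, but the quality values are invented.

# Minimal illustrative model of a TAEMS-style task structure (not the real TAEMS library).
class Method:
    """Primitive action with an (assumed) quality value once executed."""
    def __init__(self, name, quality=0.0):
        self.name, self.quality = name, quality

    def accrued_quality(self):
        return self.quality

class Task:
    """Task whose quality is accumulated from its children via a qaf."""
    def __init__(self, name, qaf, children):
        self.name, self.qaf, self.children = name, qaf, children

    def accrued_quality(self):
        qs = [c.accrued_quality() for c in self.children]
        if self.qaf in ("sum", "seq_sum"):   # seq_sum also constrains ordering (not modeled here)
            return sum(qs)
        if self.qaf == "min":
            return min(qs)
        raise ValueError(f"unknown qaf {self.qaf}")

# Hypothetical quality values for a few of Figure 5's methods.
send_query = Method("Send_Query_Macmall", quality=1.0)
get_back   = Method("Get_Back_Macmall",   quality=1.0)
mqmd_9     = Method("MQMD_9",             quality=3.0)

get_basic = Task("Get_Basic_Information", "sum", [send_query, get_back, mqmd_9])
get_extra = Task("Get_Extra_Information", "sum", [Method("Get_More_Detail_2", 2.0)])
get_info  = Task("Get_Information", "seq_sum", [get_basic, get_extra])

# enables non-local effect: Get_Extra_Information may only run once
# Get_Basic_Information has produced quality.
assert get_basic.accrued_quality() > 0
print(get_info.accrued_quality())   # 7.0 under these assumed values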

4.3 Schedule Generation

Once generated, the task structure is then passed to the scheduler which makes use of the client's time and cost constraints to produce a viable run-time schedule of execution. Comparative importance rankings of the search quality, cost and duration supplied by the client are also used during schedule creation. The sequence of primitive actions chosen by the scheduler for this task structure is also shown in Figure 5. The numbers near particular methods indicate their assigned execution order. The scheduled time and actual execution time of each method are shown in Table 1. The differences in these columns are attributable to two different sources of imprecision. The first is simply the local variance associated with web-related activities (while the web exhibits strong statistical trends during the course of a day, e.g., increasing delay time around mid-day, there may be local variance that is difficult to predict).

The second source of imprecision is the balanced interface between the Design-to-Criteria scheduler and the opportunistic RESUN planner. To give room for RESUN to respond to data that is extracted during the search process, some of the primitive actions scheduled by DTC are not actually primitive actions. Some of the actions are instead abstractions or black boxes denoting bundles of activities. This enables RESUN to determine particular bindings as appropriate given the evolution of data during the problem solving episode, i.e., to be data driven within the confines of the activities scheduled by DTC. One view is that DTC defines a high-level policy for RESUN that defines the major steps, and resource allocations to these, of the information gathering process. This interface is what enables BIG to respond to SOUs (sources of uncertainty) associated with extracted information and to make decisions during the search about which information to gather or which extraction processes to run – all while still staying within the time and resource guidelines set by the scheduler. Thus, at one level the schedule can be thought of as a specification of a policy that governs RESUN's activities.

4.4 Information Retrieval and Extraction

The schedule is then passed to the RESUN planner/executor to begin the process of information gathering. Retrieval in this example begins by submitting a query to a known information source, MacZone (www.zones.com), a computer retailer. While this information is being retrieved, a second query is made to another retailer site, Cyberian Outpost (www.outpost.com), and a third query is made to the MacMall (www.macmall.com) site. Generally, queries to such sites result in a list of URLs, where each URL is accompanied by a small amount of text describing the full document. This information is combined with the query text and any other knowledge the agent has about the document, such as recency, length, number of incoming links, etc., to form a document description object that is then put on the RESUN blackboard for consideration by other knowledge sources. The query to MacZone results in 56 document descriptions being placed on the blackboard, the query to Cyberian Outpost results in 78 document descriptions being placed on the blackboard, while the MacMall query results in an additional 86 document descriptions being added. Out of these candidate document descriptions, 13 documents are chosen for Medium Quality Medium Duration (MQMD) 9 processing. This choice is made heuristically by examining the keywords contained in the URL label and via a preference for certain web sites (those that have yielded useful results in the past). To identify documents most likely to yield product descriptions, other heuristics, such as document recency and length, could also be used.
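The heuristic selection of candidate documents can be pictured with a small sketch like the one below. The keyword list, preferred-site bonus, and weights are invented for illustration; they are not BIG's actual values.

# Illustrative ranking of candidate document descriptions (keywords and weights are assumptions).
PREFERRED_SITES = {"outpost.com", "macmall.com", "zones.com"}   # sites that paid off in the past
QUERY_TERMS = {"word", "processor", "macintosh"}

def score_description(url: str, summary: str) -> float:
    text = (url + " " + summary).lower()
    score = sum(1.0 for term in QUERY_TERMS if term in text)      # keyword evidence
    if any(site in url for site in PREFERRED_SITES):
        score += 1.5                                               # preference for known-good sites
    return score

candidates = [
    ("http://search.outpost.com/search/proddesc.cfm?item=16776", "Corel WordPerfect 3.5 word processor"),
    ("http://www.example.com/scanner-ocr", "OCR scanning software"),   # hypothetical URL
]
ranked = sorted(candidates, key=lambda d: score_description(*d), reverse=True)
chosen = ranked[:13]          # BIG selected 13 documents in the trace above
print(chosen[0][0])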

Method Name                        Schedule Time   Execution Time
Scheduling                               -                8
Send_Query_maczone                       1                1
Send_Query_cybout                        2                1
Send_Query_macmall                       1                0
SlackMyTime                             27               27
Get_Back_maczone                        19               19
Get_Back_cybout                         20               22
Get_Back_macmall                        19                8
Medium_Quality_Duration_9               72               67
High_Quality_Duration_5                 51               49
Get_More_Detail_2                       34               10
Get_More_Detail_2                       35               58
Get_More_Detail_5                       76               76
User-Review-Method                     127              144
Benchin-Review-Method                  137              137
Make-Decision                            1                2
Total Time (request time 600)          622              629

Table 1. Time Used for Scheduling versus Actual Execution Time in Seconds

The thirteen documents are then retrieved and run through a document classifier to determine if they are indeed word processor products; four documents are rejected by the classifier. Two of the rejected documents are translation packages, one is a description of a scanning OCR software, and the other product is a speech recognition package. These documents contain enough non-word-processor-related verbiage to enable the classifier to correctly reject them as word processing products. The nine remaining (un-rejected) documents are posted on the blackboard for further consideration and processing. For example, one of these documents is: http://search.outpost.com/search/proddesc.cfm?item=16776; a Medium Quality Medium Duration (MQMD) text extraction process is performed on the document. The process involves using quickext-ks and cgrep-ks in sequence to create an information object that models the product. A further example of the type of information which is stored on the blackboard can be seen in Figure 4. After quickext-ks runs, the following object is posted:

Product Name : Corel WordPerfect 3.5
Price        : $159.95
DiskSpace    : 6MB
Processing Accuracy (Rating):
  PRODUCTID=0.8 PRICE=1.0 DISKSPACE=0.8

The cgrep-ks finds extra information about the product's processor, platform, and other miscellaneous requirements. It also finds corroborating product name and price information; this increases the processing accuracy of these slots. (The processing accuracy values are a function of the quality of the documents and extractors used to derive the information. When obtained from a single source, the values are normalized; concurring information from different sources results in the individual ratings being added to form the joint rating.) After applying cgrep-ks:

Product Name : Corel WordPerfect 3.5
Price        : $159.95
DiskSpace    : 6MB
Processor    : -
Platform     : macintosh power_macintosh system_7_or_higher
misc requirement : (cd ram)
Processing Accuracy (Rating):
  PRODUCTID=1.4 PRICE=1.6 PROCESSOR=0.0 DISKSPACE=0.8 PLATFORM=2.0 MISCREQ=1.2
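The bookkeeping behind these accuracy values can be sketched as follows. The object layout and the 0.6 rating assumed for the cgrep-ks extractions are illustrative assumptions; only the "add ratings when sources concur" rule comes from the description above.

# Simplified slot-accuracy bookkeeping: concurring extractions add their ratings (an assumption
# based on the parenthetical description above, not the actual BIG data structures).
def merge_extraction(obj: dict, accuracy: dict, slot: str, value, rating: float):
    if slot not in obj or obj[slot] is None:
        obj[slot] = value
        accuracy[slot] = rating                               # first (normalized) observation
    elif obj[slot] == value:
        accuracy[slot] = accuracy.get(slot, 0.0) + rating     # corroboration: ratings are added
    # contradictory values are left for decision-time resolution (see Section 5.5)

product, acc = {}, {}
merge_extraction(product, acc, "PRODUCTID", "Corel WordPerfect 3.5", 0.8)   # quickext-ks
merge_extraction(product, acc, "PRICE", "$159.95", 1.0)
merge_extraction(product, acc, "PRODUCTID", "Corel WordPerfect 3.5", 0.6)   # cgrep-ks concurs (assumed rating)
merge_extraction(product, acc, "PRICE", "$159.95", 0.6)
print(acc)   # {'PRODUCTID': 1.4, 'PRICE': 1.6} under these assumed ratings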

Eight other products are found and posted on the blackboard during the execution of the MQMD 9 method. (Information is available about the quality and processing duration of documents that could be used to decide which documents should be selected for processing; in the current version of BIG we did not implement this feature.) Similarly, the method High Quality High Duration (HQHD) 5 retrieves six documents, rejects one, processes five documents and posts five more products on the blackboard. At this point the system has a total of 14 competing product objects on the blackboard which require more discriminating information to make accurate comparisons. The system has, in effect, discovered 14 word processing products.

Those objects which are upgrades are immediately filtered out since the client did not specify an interest in product upgrades. Also, those products which are certainly not for the Macintosh platform are discarded. Subsequent efforts are focused on the remaining six products.

The three methods Get More Detail 2, Get More Detail 2 and Get More Detail 5 make queries to "yahoo" and "infoseek" about the remaining products and find some review documents. A review process knowledge source is applied to every review document to extract information. The extracted information is added to the object, but not combined with existing data for the given object (discrepancy resolution of extracted data is currently handled at decision time). For each review processed, each of the extractors generates a pair, denoted <ProductQuality, SearchQuality>, in the information objects pictured below. ProductQuality (PQuality) denotes the quality of the product as extracted from the review (in light of the client's goal criteria), and SearchQuality (SQuality) denotes the quality of the source producing the review. For example, if a review raves about a set of features of a given product, and the set of features is exactly what the client is interested in, the extractor will produce a very high value for the ProductQuality member of the pair. Currently the SourceQuality is determined based on the reference number of the document (see Section 5.5): the more widely a document is referenced, the more highly it is rated.

For example, the following four documents are found:

- http://www.mpp.com/mediasoft/keystone/cwp7off.htm
- http://www.osborne.com/whatsnew/corelwp.htm
- http://www.cdn-news.com/database/main/1997/2/24/0224010N.html
- http://www.corel.com/products/wordperfect/

These documents are processed as review documents for the product "Corel WordPerfect 3.5" and the resultant product object is:

Product Name : Corel WordPerfect 3.5
Price        : $159.95
DiskSpace    : 6MB
Processor    : -
Platform     : macintosh power_macintosh system_7_or_higher
misc requirement : (cd ram)
Processing Accuracy (Rating):
  PRODUCTID=3.8 PRICE=1.6 PROCESSOR=0.0 DISKSPACE=0.8 PLATFORM=2.0 MISCREQ=1.8
Review Consistence:
  (((PQUALITY 2) (SQUALITY 3))
   ((PQUALITY 1.2857143) (SQUALITY 2))
   ((PQUALITY 1.2857143) (SQUALITY 2))
   ((PQUALITY 2) (SQUALITY 2)))

Actually, not all of these four documents are product reviews; one of them is a list of all Corel WordPerfect products. This is caused by the general weakness of search engines and natural language processing technologies. In this case, the only consequence of the incorrect categorization of the document is that we obtain no information after we process it with the review extraction knowledge sources. Thus it is necessary to get information from some specific product review sites. The User-Review-Method method queries the Benchin site, producing four reviews which are processed. The document http://www.benchin.com/$index.wcgi/prodrev/1112163, which includes 60 users' reviews, is selected and processed for the product "Microsoft Word 6.01," Word 6.01 being one of the six competing products still under consideration. The new review information ((PQUALITY 2.857143) (SQUALITY 3)) is added to the "Microsoft Word 6.01" object. Similarly, Benchin-Review-Method sends queries to the Benchin star review site (which uses a star rating system that is simple to process), producing review information for four different products.

Fig. 6. BIG's Final Decision for Sample Run

4.5 Decision-Making

After this phase, the final decision making process begins by first pruning the set of product objects which have insufficient information to make accurate comparisons. The data for the remaining objects is then assimilated. Discrepancies are resolved by generating a weighted average of the attribute in question where the weighting is determined by the quality of the source. The detailed decision-making process is described in Section 5.5. The final decision is shown in Figure 6. The decision confidence is not very high because there is a competing candidate, "Nisus Writer 5.1 CD ROM with Manual," whose overall quality is only slightly less than that of "Corel WordPerfect." This close competition degrades the decision confidence because only slight variations in search or extraction activities could have resulted in a different decision.

As will be seen in the next two sections, this information gathering process can change significantly based on the specific product specifications, the amount of time that the user is willing to have the system search, and the coverage, precision and quality criteria that are specified. Though not emphasized in the description, the amount of money the user is willing to spend accessing information sources that charge on a per-access basis can also be factored into the generation of an information gathering plan. Experiments and examples appear in [44].

5 Research Issues in BIG's Design and Performance

In this section we present and discuss empirical results that demonstrate the flexibility and extensibility of the BIG approach to information gathering for decision support. Sections 5.1 and 5.7 address the issue of domain specific knowledge and generality in BIG. Section 5.1 discusses the importance of document classification in improving system performance by filtering out inappropriate documents. Section 5.7 shows that with little additional training, new software genres can be added to BIG's library of expertise. Section 5.2 demonstrates BIG's flexibility with respect to precision and coverage. The section shows that appropriate generation of a TÆMS task structure, by the task assessor, allows the Design-to-Criteria scheduler to evaluate precision and coverage trade-offs and to meet the client objectives with respect to these. Sections 5.3 and 5.4 discuss how information fusion and opportunism manifest in BIG's information gathering activities. Section 5.5 details the process that BIG uses to evaluate client requirements and make its final product selection. Section 5.6 demonstrates empirically that the system successfully adapts its processing to address client search requirements.

It is important to note that the following sections do not attempt to evaluate system performance through comparison with an oracle, in which the best possible answer has been found by the oracle and the system is evaluated based on the proximity of its final answer to the solution returned by the oracle. Given that optimal answers are difficult to obtain in this environment, and that the overall objective is to supplement a decision process, the performance metric by which BIG is evaluated is generally whether or not the results are reasonable for a given search/query. In almost all of the situations that we have examined, BIG produces answers that are considered reasonable by a human decision maker for the given search and product criteria.

5.1 The Importance of Document Classification

Until recently, BIG has been plagued by an interesting extraction problem when dealing with products that are complementary to the class of products in which a client is interested. For example, when searching for word processors, BIG is likely to come across supplementary dictionaries, word processor tutorials, and even document exchange programs like Adobe Acrobat. These products are misleading because their product descriptions and reviews often contain terminology that is very similar to the terminology used to describe members of the target class. When BIG processes one of these misleading documents, it gets distracted and future processing is wasted in an attempt to find more information about a product that is not even a member of the target class. For example, if BIG encounters a reference to Adobe Acrobat when searching for word processors, and then elects to retrieve the product description for Acrobat, the extraction techniques are likely to yield data that seems to describe a word processor. Subsequently, BIG may elect to gather more information on Acrobat, further degrading the overall efficiency of the system. Experiments indicate that this type of distraction can be reduced through the use of a document classifier before text extraction is performed on candidate documents. Documents that do not seem to be members of the target class are rejected and text extraction is not performed on them – thus no new distracting information objects are added to BIG's blackboard.

Figure 7 provides a sample of our initial results. BIG was run in three different modes: 1) BIG alone, 2) BIG with the use of a simple grep-like pattern-matching filter to classify documents, 3) BIG with the use of a Naive Bayes document classifier [13] and the simple grep filter. The grep-like filter examines the document for instances of terms that describe the software genre in question, e.g., "word processor." These terms are hand produced for each query genre – in essence, hardwired into the system. In contrast, the document classifier is trained using positive and negative examples – it learns term-based similarity and difference measures. In all three modes, BIG has decided that it has time to process 13 documents in total for the given search parameters. When filtering and classification of documents results in certain documents being rejected (rows two and three in Figure 7), a larger corpus of documents is examined (44 and 74, respectively) to obtain the target number of documents.

[Figure 7 summarizes the results of the three configurations, including the top-ranked candidate products for each run; its contents are not recoverable in this text rendering.]

Fig. 7. Advantages of Document Classification


In the first run, shown in the figure, neither the filter nor the classifier is used. All documents retrieved are processed by the information extractors. None of the top five objects in this test case are members of the target product class – they are all related to word processors but none of them is actually a word processing product. Clearly, BIG does very poorly when relying on outside sources like vendors' search engines to classify products. In the second run, the simple grep-like filter is used to check documents before processing; 31 documents are rejected by the filter and the overall results are a little better. There are word processing products among the candidates, but the selected product is not a word processor. In the last run, both classifier and filter are used to check documents; 53 documents are rejected. All of the top-ranked candidates are word processing products and the top product, "ClarisWorks Office 5.0," is an integrated office suite that includes a word processing package.

Clearly, document pre-classification is necessary to filter retrieved documents before they are used to produce product objects. Vendor search engines are typically keyword based and are therefore prone to return numerous products that are not members of the target class but are instead related or supplementary products. Improving the classification of documents and widening the training corpus for the classifier are areas of future development.
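The two-stage gate (a hand-built term filter followed by a trained classifier) can be approximated in a few lines of Python. The terms, training documents, and the tiny hand-rolled Naive Bayes below are illustrative stand-ins; BIG's actual filter, classifier [13], and corpus are separate components.

# Illustrative two-stage document gate: a grep-like term filter plus a tiny Naive Bayes
# classifier trained on hand-labeled examples (terms and training texts are invented).
import math
from collections import Counter

GENRE_TERMS = {"word processor", "word processing"}       # hand-produced per query genre

def grep_filter(doc: str) -> bool:
    d = doc.lower()
    return any(term in d for term in GENRE_TERMS)

class NaiveBayes:
    def __init__(self, labeled_docs):
        self.counts = {True: Counter(), False: Counter()}
        self.docs = {True: 0, False: 0}
        for text, label in labeled_docs:
            self.docs[label] += 1
            self.counts[label].update(text.lower().split())

    def predict(self, text: str) -> bool:
        words = text.lower().split()
        vocab = len(set(self.counts[True]) | set(self.counts[False]))
        best, best_lp = None, -math.inf
        for label in (True, False):
            total = sum(self.counts[label].values())
            lp = math.log(self.docs[label] / sum(self.docs.values()))
            for w in words:                                  # add-one smoothing
                lp += math.log((self.counts[label][w] + 1) / (total + vocab))
            if lp > best_lp:
                best, best_lp = label, lp
        return best

train = [("corel wordperfect word processor for macintosh", True),
         ("adobe acrobat document exchange and pdf viewer", False),
         ("nisus writer word processing package", True),
         ("speech recognition dictation software", False)]
clf = NaiveBayes(train)

def accept(doc: str) -> bool:
    return grep_filter(doc) and clf.predict(doc)   # only accepted documents reach text extraction

print(accept("claris works office with a word processor"), accept("adobe acrobat pdf viewer"))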

The classifier can be applied to other domains; however, it requires a new training corpus. This is true of other text processing knowledge sources and document classifiers as well – components based on statistical properties of text (akin to IR tf/idf statistics) require training corpora in order to apply them to a different domain. While such training requires hands-on person hours, it is reasonable to assume that a library of such classifications for new domains could be compiled over time, allowing the capabilities to grow as needed.

To explore this issue, we added new genres to BIG's library of expertise using a simple procedure. A query for the new genre, e.g., image editing software, is given to BIG. BIG then gathers information on image editing software by submitting various specified keyword queries to general search engines and by looking at software makers and review sites. Of course, when BIG retrieves the documents, they are filtered out by BIG's existing set of document classifiers. However, this process yields a large pool of documents that can then be classified by hand and used to train the document classifiers on the new genre. Using this process, it is possible to integrate a new software genre in a little less than an hour's time. The text extraction tools are generic enough to handle the new genre and no new training documents or additions to the lexicon were required. Currently, no special tools are being used to automate this process of integration. The performance of the system on the new genre is described in the experimental results in Section 5.7. As part of our future work, we foresee developing mechanisms to allow users to provide feedback about the correctness of the decision process and about which products selected by the system are in the ball-park for a new genre. This information can be used incrementally as we get new or more users who access this genre.

5.2 Precision versus Coverage

Precision versus coverage is an issue often discussed in literature relating to information gathering or information retrieval. In the BIG context, once a satisfactory amount of information has been processed to support a high quality decision process, the issue becomes how best to spend the remaining time, cost, or other most constrained resource. One alternative is to spend the time gathering more information about other products, i.e., discovering new products and building models of them. Another alternative is to spend the time discovering new information about the existing products in order to increase the precision of the product models. Both alternatives can lead to higher quality decision processes since both expand the range of information on which the decision is based.

BIG supports both of these behaviors, and a range of behaviors in between the binary extremes of 100% emphasis on precision and 100% emphasis on coverage. BIG clients specify a precision/coverage preference via a percentage value that defines the amount of "unused" time (if there is any) that should be spent improving product precision. The remainder is spent trying to discover and construct new products. For example, if a client specifies 30%, this expresses the idea that 30% of any additional time should be spent improving precision and 70% should be spent discovering new products.

BIG achieves this trade-off behavior in two ways: by planning and scheduling for it a priori, and by responding opportunistically to the problem solving context within the constraints of the schedule. Scheduling for the precision/coverage trade-off is accomplished by relating the precision and coverage specification to quality for the Design-to-Criteria scheduler and giving the scheduler a set of options from which to choose a course of action. In Figure 5, Get-Extra-Information has two subtasks, Get-More-Objects and Detail-Product-Information, denoting the two different ends of the spectrum. Get-More-Objects represents the coverage end and Detail-Product-Information represents the precision end. The sum() quality accumulation function under the parent task, Get-Extra, models that the scheduler may choose from either side depending on the quality, cost, duration, and certainty characteristics of the primitive actions under each. Client precision/coverage preference is related to quality for the primitive actions under these tasks, e.g., the actions pertaining to precision receive higher quality when increased weight is given to precision. This approach enables the scheduler to reason about these extra activities, and their value, and relate them to the other problem solving options from a unified perspective. Thus, the overall value of pre-allocating "extra" time to coverage or precision is also considered in light of the other candidate activities.

[Footnote: The particular values associated with the qualities of primitive actions are not critical provided that the relative relationships among qualities of different actions are consistent with the domain. The purpose of the quality attributes and qafs is to give the scheduler a sound basis for making trade-offs among the quality, cost and time characteristics of different schedules.]
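One way to picture how the client's percentage preference is folded into the task structure is to scale the (otherwise arbitrary) quality values of the actions on each side of the spectrum. The sketch below uses invented base qualities and durations and a simple quality-per-time ordering; it is only an approximation of what the Design-to-Criteria scheduler actually reasons about.

# Illustrative mapping from the client's precision preference to the relative quality of actions
# under Detail-Product-Information (precision) and Get-More-Objects (coverage).
# Base qualities and durations are invented; only their relative relationships matter.
def weight_extra_actions(precision_pref: float, actions):
    """precision_pref in [0,1]: fraction of unused time to spend improving precision."""
    weighted = []
    for name, kind, base_quality, duration in actions:
        w = precision_pref if kind == "precision" else (1.0 - precision_pref)
        weighted.append((name, base_quality * w, duration))
    return sorted(weighted, key=lambda a: a[1] / a[2], reverse=True)   # quality per unit time

extra = [("Get_More_Detail_2", "precision", 4.0, 40),
         ("Get_More_Detail_5", "precision", 6.0, 75),
         ("MQMD_6",            "coverage",  5.0, 55),
         ("LQLD_7",            "coverage",  2.0, 30)]

print(weight_extra_actions(0.9, extra)[0][0])   # precision-heavy: a Get_More_Detail action leads
print(weight_extra_actions(0.1, extra)[0][0])   # coverage-heavy: a retrieval/processing action leads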

BIG can also work opportunistically to improve coverage or precision, as is described in Section 5.4. A third option, not currently implemented, is for BIG to revise its problem solving options and reschedule as new information is gained and the context (state of the blackboard, environment, time remaining, etc.) changes. This would enable BIG to react opportunistically but to do so wholly in the context of reasoning about the quality, cost, duration, certainty trade-offs of its options from a unified perspective.

#   DRatio   Scheduled   Execution   T.P.   #P.   A.C.    P.A.    D.C.
1   0.1      629         587         33     7     1.86    1.38    0.85
    0.5      622         720         14     6     3.83    1.47    0.89
    0.9      651         685          8     3     7.0     2.12    0.89
2   0.1      629         656         33     8     1.75    1.32    0.85
    0.5      622         686         14     4     3.0     1.5     1
    0.9      652         522          7     1     7.0     2.12    1
3   0.1      629         702         29     7     1.71    1.47    0.85
    0.5      622         606         15     6     2.33    1.52    1
    0.9      651         572          7     2     4.5     1.7     0.99

Key: # is the run number, DRatio = preference for precision, Scheduled = total execution time as predicted by model and anticipated by scheduler, Execution = actual execution time, T.P. = total product objects constructed, #P = total products passed to decision process, A.C. = average coverage per object, P.A. = extraction processing accuracy per object, D.C. = overall decision process confidence.

Table 2. Trading-Off Precision and Coverage

Table 2 shows BIG's ability to trade off precision and coverage. In providing this data, we are not attempting to argue that any particular trade-off between the two is better than another, only that such a trade-off exists. We feel this characteristic is interesting both because of the way BIG implements and exhibits the behavior, and because of the ramifications it has on how users can control a search process. The table contains data for three sets of runs, for the same query and with the same criteria settings (only the precision setting is varied). In each run, three trials are performed, each with a different precision preference setting, namely 10%, 50%, and 90% respectively. Since network performance varies during execution, and there is some element of stochastic behavior in BIG's selection of equally ranked documents, no two trials are identical even if they have the same preference settings. Note the general trends in the different runs. As more weight is given to increasing precision, the number of products (T.P.) decreases, as does the number of products used in the decision process (#P). The difference between these two values is that some product objects lack sufficient information to be included in the decision process and some of the product objects turn out to relate to products that do not meet the client's specification (e.g., wrong hardware platform, wrong product genre, price too high, etc.); an extreme example of this is in run number two in the third trial, where only one product is produced. As the number of products decreases with more weight given to precision, the average information coverage per object (A.C.) increases, as does the information extraction / processing accuracy (P.A.). The decision confidence also generally increases, particularly in runs two and three, though this item takes into account the total coverage represented by the products as well as the precision of the product models, so its increase is not proportional to the other increases.

Schedules for the 10% and 90% precision runs (respectively) are shown in Figures 8 and 9. The schedules show the sequence of primitive actions and their start times (as expected values rather than distributions). The schedules diverge on or around time 36, where the schedule in Figure 8 begins a series of Medium Quality Duration and Low Quality Duration activities that retrieve and process additional product related documents. The postfixed integers on the method names (e.g., method10_Medium_Quality_Duration_6) denote the number of documents (e.g., 6) that will be retrieved by the method. This series of steps results in the production of nearly twenty additional product description objects. In contrast, around that same time, the schedule in Figure 9 begins a series of Get_More_Detail actions that seek to find information about existing product objects.

From an end user perspective, the precision/coverage specification enables clients to express preferences for one solution class over another. For a client who needs a speedy result, and has an accordingly short deadline, the preference specification may result in a slight difference at best. However, for a client with more generous time resources, the difference can be pronounced.

5.3 Information Fusion

We use the term information fusion to denote the process of integrating information from different sources into a single product object; the information may be complementary, but also contradictory or incomplete.

Start Time (expected value)   Method Name
  0     method6_Send_Query_maczone
  1     method4_Send_Query_cybout-product
  3     method2_Send_Query_warehouse
  4     method0_Send_Query_macmall
  6     Idle (awaiting results)
 31     method7_Get_Back_maczone
 32     Idle (awaiting results)
 33     method5_Get_Back_cybout-product
 35     method3_Get_Back_warehouse
 36     method1_Get_Back_macmall
 37     method10_Medium_Quality_Duration_6
 93     method9_Medium_Quality_Duration_8
165     method21_Medium_Quality_Duration_6
221     method20_Medium_Quality_Duration_8
296     method18_Low_Quality_Duration_7
346     method15_Get_More_Detail_2
386     method14_Get_More_Detail_2
429     method22_Benchin-Review-Method
631     method23_Make-Decision

Fig. 8. Schedule for 10% / 90% Precision to Coverage

Start Time (expected value)   Method Name
  0     method6_Send_Query_maczone
  1     method4_Send_Query_cybout-product
  3     method2_Send_Query_warehouse
  4     method0_Send_Query_macmall
  6     Idle (awaiting results)
 31     method7_Get_Back_maczone
 32     Idle (awaiting results)
 33     method5_Get_Back_cybout-product
 35     method3_Get_Back_warehouse
 36     method1_Get_Back_macmall
 37     method9_Medium_Quality_Duration_8
111     method21_Medium_Quality_Duration_6
165     method15_Get_More_Detail_2
206     method14_Get_More_Detail_2
247     method13_Get_More_Detail_5
347     method11_User-Review-Method
508     method22_Benchin-Review-Method
646     method23_Make-Decision

Fig. 9. Schedule for 90% / 10% Precision to Coverage

There are several aspects to the fusion issue. The most straightforward type of fusion is information addition – where a document provides the value for a slot that is not yet filled. A more interesting type of fusion is dealing with contradictory single-value information, e.g., two documents reporting different prices for a product, or two documents identifying a different production company for the product. When BIG encounters this fusion issue, the item with the highest associated degree of belief is used. (In the future, we hope to explore the use of RESUN's opportunistic control to handle this situation by trying to retrieve additional information to resolve the conflict.) Another issue is how to integrate different opinions about the product. The latter is done in BIG by associating two metrics with each review document, one representing information or site quality, and one representing the quality of the product as expressed in the review. This dual then conceptually represents a value / density pair – the information quality metric determines the weight given to the product quality metric when comparing different metrics for different reviews. To illustrate BIG's fusion process, consider the following partial trace.
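A minimal sketch of the two fusion cases just described, assuming plain Python tuples for slot values and invented belief numbers (BIG's blackboard objects and RESUN machinery are of course richer than this):

# Illustrative fusion of extracted values: addition fills empty slots, contradictions keep the
# value with the highest degree of belief, and review opinions are weighted by source quality.
def fuse_slot(current, candidate):
    """current/candidate are (value, belief) pairs; keep the better-supported value."""
    if current is None:
        return candidate                      # information addition: fill an empty slot
    return candidate if candidate[1] > current[1] else current

def fuse_reviews(pairs):
    """pairs of (product_quality, source_quality): source quality acts as the weight."""
    total_w = sum(sq for _, sq in pairs)
    return sum(pq * sq for pq, sq in pairs) / total_w if total_w else 0.0

price = None
price = fuse_slot(price, ("$159.95", 1.0))     # first observation
price = fuse_slot(price, ("$179.95", 0.4))     # contradictory value with weaker belief: ignored
print(price)                                   # ('$159.95', 1.0)

reviews = [(2.0, 3.0), (1.2857143, 2.0), (1.2857143, 2.0), (2.0, 2.0)]   # pairs from the trace above
print(round(fuse_reviews(reviews), 2))         # source-quality-weighted product quality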

In this example, BIG is searching for word processor products for the Macintosh. In response to a general query about word processing products, the MacMall retail site returns a list of URLs. URL A, from Figure 10, is selected by BIG for retrieval and processed. BIG extracts "Dramatica Pro 2.0" from the document as the title of the software package; it also extracts that "Screenplay" (Inc.) is the maker and that the package sells for a price of $289.99. (Dramatica is actually a product contained in our corpus of word processor class documents used to train the document classifier. Thus, the pursuit of Dramatica as a word processing package is valid from BIG's perspective, though the classification of Dramatica as a word processor is perhaps debatable.) The result of this extraction is the partial product object shown in Figure 11(a).

URL_A  http://www.cc-inc.com/sales/detail.asp?dpno=79857&catalog_id=2
URL_B  http://www.freedombuilders.com//dramatica.htm
URL_C  http://st2.yahoo.com/screenplay/dpro30mac.html
URL_D  http://www.heartcorps.com/dramatica/questions_and_answers/dramatica10.htm
URL_E  http://www.zdnet.com/macuser/mu_0796/reviews/review12.html
URL_F  http://www.macaddict.com/issues/0797/rev.dramaticapro.html

Fig. 10. URLs for Documents Retrieved During Processing

The values in the Processing Accuracy slots are certainty factors denoting the quality and certainty of the extraction process that filled the respective slots. Since the document provides very little additional information about Dramatica, BIG associates an uncertain-support SOU with the object. Because the product object is a promising area of exploration, relative to everything else on the blackboard, BIG decides to attempt to resolve the SOU. Toward that end, it queries Infoseek about Dramatica, resulting in a long list of URLs that are combined with their descriptive text to create candidate document description objects which are added to the blackboard. BIG selects and retrieves a subset of these, starting with URL B, which is a detailed description of the product. Processing the description results in the addition of platform specifications to the product object, namely that it runs on Windows 95 and Apple Macintosh systems. The description also contains sufficient verbiage that it is analyzed using a keyword-based review processing heuristic that looks for positive and negative phrases and rates products accordingly, weighing the product features by the user preference for such features. Though the verbiage praises the product, it is given a rating of -.57 because the review does not praise the product for the features in which the client is interested. In other words, even though the review is positive, it does not make specific reference to the product features in which the client is interested – such as a specific platform or program characteristic – and thus it is given a negative value to denote that the product is below average quality-wise. However, since the document in question is not widely referenced by other documents, it is given a low information quality (source quality) rating and the negative review (product quality) rating will thus have little weight when compared to other sources. The product object after this step is shown in Figure 11(b).
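The keyword-based review heuristic can be sketched roughly as follows. The phrase lists, feature weights, penalty rule, and scoring scale are invented for illustration; the -.57 rating in the trace comes from BIG's actual heuristics, not from this toy version.

# Toy version of the keyword-based review rating: positive/negative phrases are mapped to
# product features, and each hit is weighted by how much the client cares about that feature.
PHRASES = {
    "fast learning curve": ("ease_of_use", +1.0),
    "easy to use":         ("ease_of_use", +1.0),
    "buggy":               ("stability",  -1.0),
    "powerful outlining":  ("power",      +1.0),
}

def rate_review(text: str, client_weights: dict) -> float:
    text = text.lower()
    score, max_score = 0.0, 0.0
    for phrase, (feature, polarity) in PHRASES.items():
        w = client_weights.get(feature, 0.0)
        max_score += abs(w)
        if phrase in text:
            score += polarity * w
        elif polarity > 0:
            score -= 0.5 * w   # assumed penalty: client-valued features the review never praises
    return score / max_score if max_score else 0.0   # normalized to roughly [-1, 1]

client = {"ease_of_use": 50, "stability": 50, "power": 50}   # equal weights, as in Figure 1
# Prints a negative rating because most client-valued features go unmentioned in the review.
print(round(rate_review("A powerful outlining tool, though somewhat buggy.", client), 2))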

In response to the continued existence of the uncertain-support SOU, BIG decides to gather more information. It selects and retrieves URL C, URL D, URL E, and URL F, in that sequence. Space precludes presenting an exhaustive sequence of product object transformations as information is integrated into the object. Figure 11(c) is the result after processing the review at URL D. Note the elevation of the product's overall quality rating and the increase in the various rating criteria like ease-of-use and stability. For free format reviews such as this one (in contrast to sites that employ consistent numerical rating systems), these metrics are determined by a set of heuristics that examine the text for certain positive or negative expressions.

The remaining documents are retrieved, processed, and integrated in a similar fashion. The product object after processing all of the selected documents is shown in Figure 11(d). The final product object is subsequently compared to other product objects during the decision process (see Section 5.5). While this example results in the construction of a fairly complete product object, the objects used in the final decision process are not all at the same level of completeness. Some objects may contain less information (but not much) and some may contain more product details or more review summarization statistics. The decision process takes into account the quantity and quality of the information pertaining to the objects.

5.4 Opportunism

As discussed, opportunism in the BIG system currently occurs within the boundaries of the initial schedule. The primitive actions seen by the scheduler are often abstractions of sets of operations that BIG plans to perform, thus enabling BIG to respond opportunistically during the execution of these actions to newly gathered data or changes in the environment. To illustrate, consider a simple example where BIG is gathering documents to recommend a word processor. A portion of the schedule (without the numerical detail), produced to address the specified resource constraints, follows:

--------------------------------------------------------------------------------------------
... | Get_Back_macmall | MQMD_method | Get_More_Detail_1 | Benchin-Review-Method | Make-Decision | ...
--------------------------------------------------------------------------------------------

As a consequence of executing the schedule, documents are retrieved from the MacMall site and processed using medium quality, medium duration text extraction techniques (meaning a set of simple and more sophisticated extractors), denoted by the MQMD method in the schedule. The product name, "Nisus Writer 5.1 CD ROM with Manual," is extracted from one of the documents and is posted as an object on the blackboard.

[Figure 11 shows the Dramatica product object at four stages of processing; the panel contents are not recoverable in this text rendering.]
(a) Initial Product Object
(b) Product Object After Two Documents
(c) Intermediate Product Object
(d) Final Product Object

Fig. 11. Evolution of the Dramatica Product Object

Since the product name is the only information that could be extracted from the document at hand, a no-support SOU is attached to the object, signifying the need to obtain more detailed information on the product in order for it to be used in the final decision process.

As BIG actively pursues and plans to resolve SOUs, method Get More Detail 1 is selected for execution to resolve the SOU. The method looks for objects which contain the no-support SOU and tries to find more information on related products by retrieving and extracting documents. In this particular example, Get More Detail 1 queries InfoSeek with the keywords "Nisus Writer," resulting in the production of a set of candidate URLs and partial document descriptions. BIG decides to retrieve and process the review located at URL [65]. Text processing of this document leads to the discovery of two new potential competing products, namely "Mac Publishing" and "WordPerfect," thus two more objects with the product name slots filled are posted to the blackboard accompanied by no-support SOUs as the product objects are essentially empty at this time.

BIG now has the following options: 1) it can continue with its original schedule, which entails executing the Benchin-Review-Method to gather reviews for the Nisus Writer product, or 2) it can make an opportunistic change in its plans and find more information on objects which contain unresolved no-support SOUs. In this case, this would mean executing the Get More Detail 1 method for the Mac Publishing and WordPerfect objects. The choice is determined by the precision versus coverage specification (Section 5.2) as well as how much time is available to perform this extra processing before the deadline.

In this particular scenario, the latter choice is made and BIG decides to find more information on the new products rather than follow the original schedule. Method Get More Detail 1 is executed and the Cyberian Outpost retail site is queried for information on the two products. The query for "Mac Publishing" returns no supporting information and the certainty that it is a valid word processing product is decreased. The query for "WordPerfect," on the other hand, is supported by the document http://srch.outpost.com/search/proddesc.cfm?item=30271 and thus the belief that the product is a word processing product is unchanged. Processing of the document produces new information about the product, shown in Figure 12.

PRODUCTID   Corel WordPerfect 3.5 - ACADEMIC
PRICE       29.95
MISCREQ     UNINITIALIZED
SUPPORT     1
SOURCE      http://srch.outpost.com/search/proddesc.cfm?item=30271

Fig. 12. Information Produced

The information is incorporated into the product object and BIG continues processing its initially scheduled activities. However, BIG may later elect to work on the WordPerfect product object again as it is now a valid candidate product.

5.5 BIG's Decision Process

The decision maker knowledge source decides which product should be recommended to the user after the search process is completed. Generally, the decision maker looks at each product and calculates a total score representing the overall level of consistency with the client's query. As there are several features for one product, such as price, quality, and hardware (platform), the score weighs each feature based on how important it is to the client (see Figure 1). The rating for a feature is calculated from review sets in one of two ways: 1) for reviews of a certain class, in which reviewers give stars or otherwise numerically (or ordinally) rank products according to certain classes, the ratings serve as a set of utility weights for these data points; 2) for reviews that do not include numerical or ordinal values, the documents are processed with a heuristic that attempts to assign such ratings based on keywords or anti-keywords that appear in the text. For instance, if "fast learning curve" is used to describe the product, it is a positive indicator for the ease of use feature of the product, while "buggy product" would be a negative indicator for the stability feature. The formula used to calculate the overall score of a product is as follows:

overall score = price score * price weight + quality score * quality weight + hardware score * hardware weight
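As a concrete instance of this formula, the few lines below use the 60%/40% price/quality split from the query in Section 4.1; the feature scores and the hardware weight are hypothetical values, not taken from an actual BIG run.

# Direct transcription of the overall-score formula; weights and feature scores are hypothetical.
def overall_score(price_score, quality_score, hardware_score, weights):
    return (price_score    * weights["price"] +
            quality_score  * weights["quality"] +
            hardware_score * weights["hardware"])

weights = {"price": 0.6, "quality": 0.4, "hardware": 1.0}
print(overall_score(price_score=3.5, quality_score=2.85, hardware_score=1.0, weights=weights))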

Since the information comes from different sources, there may be inconsistencies, and different sources may have different relative quality or confidence measures. The value of information in our system is determined by the value of the source; information from a high quality source is considered to be closer to the truth. To combine inconsistent information from different sites, we classify information sources into one of three categories: high, medium or low quality. The classification of a known information source is based on human knowledge and prior experience with this source. Our rating system for unknown sources currently employs a URL reference search to rank sites, similar to the Usenet based approach outlined in [58]. Several web search engines offer a service which allows the user to search for web sites which link to a certain page. This essentially allows us to quantify how often a web site is referenced by others. Our heuristic ranks sites based on the assumption that sites which are more useful and credible will be referenced more often than less credible ones. This trait offers two important qualities: it is independent, since the rating is not dictated by any one person or company, and it is also quite generic. Although we do not do so, it would also be possible to augment this rating with either user feedback (e.g., "I typically don't agree with this reviewer's point of view") or data from a centralized web rating service. This rating is also used in the initial selection of which documents to process. For each kind of information source, there is a quality measure distribution table that describes the relationship between the information from this source and the possible truth values. These quality measure distribution tables were constructed in an ad hoc fashion, based on hand reviewing of documents from different categories and experience running the system. These tables help provide more accurate interpretations of data from those information sources, by taking the observed rating and transposing it into a distribution of possible ratings, weighted by the source's quality. The net effect of this mapping is to add some measure of skepticism to the system, based on the quality of the site. If, for instance, a highly rated site gives a certain product a rating of 5, BIG is more apt to believe that the review is accurate, as compared to a lower rated site which offers the same product rating.

For example, the product "Corel WordPerfect," based on a review from site A, is highly rated (it is given a product quality of 4). The review from site B gives it a slightly lower rating of 3. Site A itself is known to be a medium quality site with a source quality of 2, while B has a higher quality rating of 3. The quality measure tables for sites A and B are shown in Figures 13 and 14, respectively.

Observed          Interpreted Quality Distribution
Review Quality    5     4     3     2     1
5                 0.1   0.2   0.5   0.1   0.1
4                 0.0   0.3   0.2   0.3   0.2
3                 0.0   0.1   0.3   0.3   0.3
2                 0.0   0.0   0.5   0.5   0.3
1                 0.0   0.0   0.3   0.6   0.6

Fig. 13. Review Quality Interpretation Table for Site A (Source Quality 2)

Observed          Interpreted Quality Distribution
Review Quality    5     4     3     2     1
5                 0.7   0.2   0.1   0.0   0.0
4                 0.2   0.6   0.2   0.0   0.0
3                 0.0   0.2   0.7   0.1   0.0
2                 0.0   0.0   0.2   0.7   0.1
1                 0.0   0.0   0.0   0.3   0.7

Fig. 14. Review Quality Interpretation Table for Site B (Source Quality 3)

Based on the review quality from site A and site B and their quality measures, the decision maker gets a quality score distribution of: [(4, 0.25) (3, 0.45) (2, 0.2) (1, 0.1)]. This means there is a 25% probability that the quality of "Corel WordPerfect" is 4, a 45% probability it is 3, a 20% probability it is 2, and a 10% probability it is 1. The expected quality score of "Corel WordPerfect" is therefore 2.85. Thus, for each product, the decision maker has a quality score distribution and an expected score. The product with the highest expected score is recommended to the client and the score distributions are used to calculate the confidence of this decision.
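These numbers follow directly from the two interpretation tables. The short sketch below maps each observed rating to the interpreted distribution from its site's table and averages the two distributions; the simple averaging rule is inferred from this worked example rather than quoted from the BIG implementation.

# Reproduces the Corel WordPerfect example: site A (source quality 2) observed a 4, site B
# (source quality 3) observed a 3. Each observation is replaced by the interpreted distribution
# from its site's table (Figs. 13 and 14) and the distributions are averaged (rule inferred).
SITE_A_ROW_4 = {5: 0.0, 4: 0.3, 3: 0.2, 2: 0.3, 1: 0.2}
SITE_B_ROW_3 = {5: 0.0, 4: 0.2, 3: 0.7, 2: 0.1, 1: 0.0}

def combine(dists):
    scores = sorted({s for d in dists for s in d}, reverse=True)
    return {s: round(sum(d.get(s, 0.0) for d in dists) / len(dists), 2) for s in scores}

dist = combine([SITE_A_ROW_4, SITE_B_ROW_3])
expected = round(sum(score * p for score, p in dist.items()), 2)
print(dist)       # {5: 0.0, 4: 0.25, 3: 0.45, 2: 0.2, 1: 0.1}
print(expected)   # 2.85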

In addition to the decision and product information, the agent also gives the evaluation of this decision to the client. Since there are many factors contributing to the evaluation of the decision, it is difficult to represent the decision evaluation as a single number. We choose decision coverage and precision as the two main characteristics of the decision.

Decision Coverage is a 3-dimensional vector:

(1) Total Product Number: Indicates how many products the agent has found; the more products the agent finds, the higher the quality of the decision.

(2) Candidate Product Number: Describes the number of competing products used as candidates in the final decision; the more products that are considered for the decision, the higher the quality of the decision.

(3) Information Coverage: Reflective of the number of documents the agent has processed.

Decision Precision is a 4-dimensional vector:

(1) Average Coverage: Indicates the average number of documents supporting each candidate product.

(2) Information Quality: Describes the distribution of high-quality sources, medium-quality sources, and low-quality sources, respectively.

(3) Process Accuracy: Measures how accurately the agent processes documents. Since the information extraction process is not perfect for any document, the extraction tool provides a degree of belief for every item it returns. For example, textext-ks may find that the operating system for "Corel WordPerfect" is "mac," with a degree of belief of 0.8. The process accuracy is the average of the degrees of belief of all items.

(4) Decision Confidence: Measures how confident the agent feels that the product it recommended to the client is the best product it found. This is computed from the quality distributions of the discovered products. For example, if product A has a distribution of [(5, 0.3) (4, 0.6) (3, 0.1)] and product B has a score distribution of [(5, 0.1) (4, 0.3) (3, 0.3) (2, 0.2)], product A is recommended because it has a higher expected score. The probability that B is better than A is 0.1 * (0.6 + 0.1) + 0.3 * (0.1) = 0.1, so the confidence of this decision is 1 - 0.1 = 0.9 (a small sketch of this computation appears after this list).
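The confidence arithmetic in item (4) can be checked with a few lines, assuming, as the worked example implicitly does, that the two products' quality distributions are independent.

# Confidence that the recommended product A is at least as good as its closest competitor B,
# computed as 1 - P(B > A) under an independence assumption, reproducing the example above.
def prob_better(dist_b, dist_a):
    """P(B > A) for discrete score distributions given as {score: probability}."""
    return sum(pb * pa for b, pb in dist_b.items() for a, pa in dist_a.items() if b > a)

A = {5: 0.3, 4: 0.6, 3: 0.1}
B = {5: 0.1, 4: 0.3, 3: 0.3, 2: 0.2}
confidence = 1 - prob_better(B, A)
print(round(confidence, 2))   # 0.9, matching the worked example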

Using this decision evaluation data allows the client to analyze the final decision with a more critical eye. An additional tool that we have not yet implemented is an appropriate interface for a client to access the raw data that was used by the system in making its decision.

5.6 Performance Under Varying Time Constraints

Table 3 illustrates how the system operates under different time constraints. The experiments cover searches looking for word processing products. The search and product criteria are the same for all runs; only the time allotted for the search varies. The intent of this section is to show that the system can effectively exploit the time allocated to it by the user to complete its search, and that in most cases its intended schedule closely approximates the actual execution time.

The first four columns of data provide information about the duration of each search. User Time denotes the user's target search time; the value in parentheses represents the upper bound on how far over the target search time the scheduler was permitted to go in order to achieve a good quality/cost/duration trade-off. (Utility in these cases is linearly decreasing between the expressed deadline and 10% above the expressed deadline; see the footnote before Table 3.) Scheduled denotes the expected total duration of the schedule produced by the Design-to-Criteria scheduler and Execution denotes the actual duration of the discovery and decision process. The difference in these values stems from the high variance of web-related activities and reflects issues like changes in network bandwidth during the search, slow downs at remote sites, and so forth. The statistical characterizations of these activities are also often imperfect, though they are improved over time. Given the variances involved, we are satisfied with the relationship between expectations and reality.
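The soft-deadline model just described can be written as a small utility function. The flat-then-linear shape is taken from the description in the text; normalizing utility to [0, 1] and assuming it reaches zero at the 10% overrun bound are illustrative assumptions.

# Soft deadline: utility is flat until the user's target time, then decreases linearly,
# reaching zero at 10% beyond the target (normalization and zero-point are assumptions).
def deadline_utility(finish_time: float, target: float, overrun_fraction: float = 0.10) -> float:
    hard_limit = target * (1 + overrun_fraction)
    if finish_time <= target:
        return 1.0
    if finish_time >= hard_limit:
        return 0.0
    return (hard_limit - finish_time) / (hard_limit - target)   # linear decrease in the slack window

# e.g., the 622-second schedule against a 600-second request seen in Table 1
print(deadline_utility(590, 600), deadline_utility(622, 600), deadline_utility(660, 600))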

The next four columns denote the number of considered products (#C.P.), the total number of products found (#T.P.), aggregate information coverage (I.C.), and average information coverage per product object (A.C.). These values reflect the number and qualities of the information sources used to generate the final decision. Given additional time, BIG will adjust its searching behavior in an attempt to find both more sources of information and more supporting information for previously discovered products. The results of this behavior can be seen in the correlation between longer running time and larger information coverage values; these values represent the total number of documents found and the average number of supporting documents a product has, respectively. As one would expect, the larger number of information sources also serves to increase both the number of known products and the size of the subset selected for consideration, which in turn affects the confidence BIG has in its final decision.

This soft approach to deadlines was taken to address client preferences. Despite requests to use a hard deadline model, clients were often dissatisfied if much better results were possible for slightly more time and the scheduler had selected an option that stayed within the expressed deadline.
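
The post-deadline utility just described (full value up to the user's deadline, decreasing linearly to the 10% overrun bound) can be sketched as follows. This is an illustrative reading of that description, not the Design-to-Criteria scheduler's internal criteria model; in particular, the assumption that utility reaches zero exactly at the overrun bound is ours.

    def post_deadline_utility(finish_time, deadline, overrun=0.10):
        """Utility multiplier under the soft-deadline model: 1.0 up to the
        deadline, decreasing linearly to 0.0 at deadline * (1 + overrun).
        The zero floor at the overrun bound is an assumption for illustration."""
        limit = deadline * (1.0 + overrun)
        if finish_time <= deadline:
            return 1.0
        if finish_time >= limit:
            return 0.0
        return (limit - finish_time) / (limit - deadline)

    # For the 300 (330) second runs in Table 3:
    print(post_deadline_utility(311, 300))  # about 0.63
    print(post_deadline_utility(330, 300))  # 0.0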


User Time   Run   Scheduled   Execution   #C.P.   #T.P.   I.C.   A.C.    P.A.   D.C.
300 (330)    1       311         367        4       10     12     1.5    1.6    1
             2       308         359        3       10     16     1.3    1.4    1
             3       305         279        3       10     11     1.3    1.5    1
             4       311         275        3       11     13     1.67   1.5    1
             5       321         286        4       10     12     1.5    1.6    1
             6       321         272        3       10     12     1.3    1.6    0.84
             7       262         327        3       11     12     1.67   1.5    1
             8       262         337        3       10     11     1.3    1.5    1
             9       262         301        2       11     10     1.0    1.4    1
            10       259         292        2       11     11     1.5    1.5    1
         average     302         310        3       10.4   12     1.4    1.5    0.98
            s.d.      33          35        0.67     0.5    1.6    0.2    0.07   0.05
600 (660)    1       658         760        6       17     45     4.0    1.7    0.99
             2       658         608        4       17     44     6.75   1.8    1.0
             3       645         732        5       20     46     5.4    2      1.0
             4       649         809       10       28     49     3.1    1.8    0.96
             5       649         730        7       17     42     4.3    1.8    0.84
             6       653         774        4       23     55     6.5    2.3    0.99
             7       653         671        4       18     35     5.3    2.1    0.99
             8       653         759        6       18     41     4.8    2.2    0.84
             9       653         760        5       28     50     5.4    2.2    0.94
            10       653         852        5       18     42     4.6    2.0    0.85
         average     652         746        5.6     20     45     5.0    2.0    0.95
            s.d.       4          68        1.8      4.4    5.6    1.1    0.2    0.06
900 (990)    1       951         975        5       37     61     5.8    2.2    0.99
             2       968         956        8       30     55     4.1    2.1    1
             3       914         919        8       23     64     4.0    1.9    1
             4       960         796        6       34     64     5.3    1.9    0.96
             5       960        1026        9       24     32     4.1    1.9    0.99
             6       987         968        8       27     60     4.4    2.1    0.94
             7       987        1102        8       27     63     5.5    1.7    0.94
             8       987         896        5       32     69     5.4    2.1    0.84
             9       987         918        7       32     66     5.1    2.0    0.84
            10       978        1289       14       39     79     3.9    2.0    1
         average     968         985        7.8     31     61     4.8    2.0    0.95
            s.d.      23         134        2.6      5.3   12      0.7    0.14   0.06

Key: User Time = user's preferred search time (linearly decreasing utility post-deadline in this case), Scheduled = total execution time as predicted by the model and anticipated by the scheduler, Execution = actual execution time, I.C. = information coverage, #T.P. = total product objects constructed, #C.P. = total products passed to the decision process, A.C. = average coverage per object, P.A. = extraction processing accuracy per object, D.C. = overall decision process confidence, s.d. = standard deviation.

Table 3. Different Time Allotments Produce Different Results.

The last two columns describe how confident the system is in the information extraction and decision making processes. Process accuracy (P.A.), supplied in part by the information processing tools, is the degree of belief that the actual extracted information is correctly categorized and placed in the information objects. Decision confidence (D.C.), generated by the decision maker, reflects the likelihood that the selected product is the optimal choice given the set of products considered. This value is based on the quality distributions of each product, and represents the chance that the expected quality is correct. It should be noted that decision confidence is therefore not dependent on execution time or processing effort.

Our query for the test runs is that of a client looking for a word processing package for the Macintosh costing no more than $200, who would like the search process to take 300/600/900 seconds and the search cost to be less than five dollars. The client specifies the relative importance of price to quality to be 60/40 and the relative importance of coverage to confidence to be 50/50.

Looking at the results, one can see that the process accuracies for the 300 second runs are consistently lower than those for the 600 and 900 second runs, which are roughly the same. Process accuracy is affected by the amount of available evidence, in that matching information from different sources increases the perceived accuracy of the data. Since the latter two runs have similar average coverage values, one would expect similar levels of information matching, and thus similar levels of process accuracy. By the same logic, one can see why the process accuracy for the 300 second runs would be consistently lower, given their lower levels of average coverage.

The decision confidence value is affected by both the number of products considered and their respective attributes and qualities. BIG first selects a product based on its attributes and the user's preferences. It then calculates the decision confidence by determining the probability that the selected product is the optimal choice, given the available subset of products. In the 300 second runs, the total number of considered products is fairly low, which increases the chance that the pool of products is heterogeneous. In such a population, it is more likely that a single candidate will stand out from the others, which explains the large percentage of perfect scores in the shortest run. When BIG is given more time to find more products, the chance that individual candidates will sharply contrast is reduced. Greater average coverage affects this contrast by increasing the likelihood that product candidates will be fully specified. This typically gives the candidate set a higher quality rating, which makes the population more homogeneous. It is this blurring across attribute dimensions which reduces BIG's confidence in the final decision.

Two interesting cases in this last column are worth explaining in more detail. In the sixth 300 second run, one can see that the decision confidence was calculated to be 0.84, much lower than in other runs in the same set. This was due to the fact that two of the three products considered were actually the same product, but one was an academic version. These two products had relatively similar quality ratings, which were significantly higher than that of the remaining product, which caused BIG to have a lower confidence in its decision. The second anomaly occurs in the tenth run of the 900 second scenario. In this case, 14 products were considered for selection. Of the group, 11 had a price higher than $400, two were above $200, and the remaining product was roughly $70 with good coverage of the user's desired characteristics. This large price discrepancy led the selected product to have a much higher quality rating than the competition, which led to the high decision confidence.

5.7 Experiments in Different Domains

Besides the word processing domain, we have also experimented with BIG in three other domains: image editors, html editors, and database systems. For each domain, we spent approximately one hour collecting and classifying documents to form a corpus for the Bayes classifier. As we discussed in Section 5.1, this process can be automated by equipping BIG with the ability to learn from user feedback.
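
For readers unfamiliar with this training step, the sketch below shows, under simplifying assumptions (bag-of-words features, add-one smoothing, and our own function names and toy corpus), how a small hand-labeled corpus of the kind described above can be turned into a Naive Bayes document classifier. It is illustrative only and is not the classifier used in BIG.

    import math
    from collections import Counter, defaultdict

    def train_naive_bayes(labeled_docs):
        """labeled_docs: list of (text, label) pairs.  Returns class priors and
        per-class word likelihoods with add-one (Laplace) smoothing."""
        class_docs = defaultdict(list)
        for text, label in labeled_docs:
            class_docs[label].append(text.lower().split())
        vocab = {w for docs in class_docs.values() for doc in docs for w in doc}
        priors, likelihoods = {}, {}
        for label, docs in class_docs.items():
            priors[label] = len(docs) / len(labeled_docs)
            counts = Counter(w for doc in docs for w in doc)
            total = sum(counts.values())
            likelihoods[label] = {w: (counts[w] + 1) / (total + len(vocab)) for w in vocab}
        return priors, likelihoods

    def classify(text, priors, likelihoods):
        """Most probable class for a document; log-space avoids underflow,
        and words outside the training vocabulary are ignored."""
        scores = {}
        for label in priors:
            score = math.log(priors[label])
            for w in text.lower().split():
                if w in likelihoods[label]:
                    score += math.log(likelihoods[label][w])
            scores[label] = score
        return max(scores, key=scores.get)

    # Hypothetical miniature corpus for the html editor domain:
    corpus = [("wysiwyg html editor for building web pages", "relevant"),
              ("step by step book about databases", "irrelevant")]
    priors, likelihoods = train_naive_bayes(corpus)
    print(classify("visual web page editor", priors, likelihoods))  # relevant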

Figure 15 shows the results of experiments in each of these domains. The query criteria for each domain specifies that a relevant software package for the Windows platform is needed, with a search deadline of 10 minutes (600 seconds); the query was repeated 10 times for each domain. All items in Figure 15 also appear in Table 3 and have been explained in Section 5.6. The DTC scheduler generates the same schedule for all queries because the search and product criteria are the same and the object database is cleared after each run (BIG does not have knowledge about these products a priori, and knowledge that is learned during one trial is removed before the next trial).

The search for html editor products takes longer to execute than the other two queries because there are more available products (#C.P. and #T.P.) in this domain. Thus some methods take longer than expected, because their execution times are related to the number of product descriptions that are either retrieved or actively being considered as candidates (such as the "Search For Reviews" method).

domain         row       Scheduled  Execution  #C.P.  #T.P.   I.C.   A.C.   P.A.   D.C.
image editor   average      541        520      4      7      28.2   5.3    0.8    1
               s.d.           0          8      0      0       1.9   0.5    0      0
               final decision: product name: Adobe ImageReady; platform: windows_nt, mac, win95; price: 188.98; occurrence: 10/10
html editor    average      541        595      8     12      49.1   5.1    0.9    0.84
               s.d.           0         59      0.6    0       3.8   0.7    0.05   0.04
               final decision: product name: Adobe Pagemill 3.0; platform: windows_nt, mac, win95; price: 79.98; occurrence: 8/10
database       average      541        548      4.6   10.2    43.6   8      1      0.97
               s.d.           0         19      0.8    1.17    5.2   1.8    0.06   0.07
               final decision: product name: FileMaker Pro 4.0; platform: win95, mac, windows_3.1; price: 159.98; occurrence: 7/10

Fig. 15. Experiments in Three Different Domains.

Key: Scheduled = total execution time as predicted by the model and anticipated by the scheduler, Execution = actual execution time, #C.P. = total candidate products passed to the decision process, #T.P. = total product objects constructed, I.C. = information coverage, A.C. = average coverage per object, P.A. = extraction processing accuracy per object, D.C. = overall decision process confidence; final decision = the product most frequently recommended by the system in 10 runs; occurrence = the frequency with which that product was recommended by the system; s.d. = standard deviation.

The final decision presents the product that is most frequently recommended by the system in the 10 runs. The occurrence indicates how many times it is recommended in those 10 runs. Different products are recommended in different runs because the system does not have sufficient time to exhaustively process all available documents, so it randomly selects equally rated documents for processing. The selected document set therefore varies from one run to the next, so the system may have different information, which in turn may support an alternate final decision.

For the image editor domain, the Adobe product ImageReady was recommended each time because there are fewer candidate products in this domain and because ImageReady is the best among the candidate products; the other products' prices are about $300, which is much higher than the price of ImageReady. For the html editor domain, out of 10 runs, the system recommended Adobe Pagemill 3.0 eight times, and in the other two runs it recommended Symantec's Visual Page 2.0 and WebSpice respectively. All three products have similar functionality and price. In the database domain, the system recommended FileMaker Pro seven times, MS Access once, and MSFT Access for Win 95 Step by Step twice. The first two products are reasonable selections, but the last one is actually a book. Our classifier fails to reject this alternative since there were many database-related keywords in the description and it had a very low price compared with other real software products.


The techniques and technologies employed by BIG are also applicable to decision-making tasks in other domains, such as the car purchasing domain mentioned earlier. The core components of BIG, including the RESUN planner, Design-to-Criteria scheduler, task assessor, and databases, are entirely domain independent and may be used without modification to address different questions. The blackboard architecture is also domain-independent, but a designer would need to enumerate different characteristics for objects placed on the blackboard. In the car domain, for instance, the "hard drive" and "platform" traits are not pertinent; elements such as "engine" or "model" would be more appropriate. The most demanding aspects requiring changes in a new domain relate to text processing. Both the NLP-style text extraction (textext-ks) and the document classifier rely on a one-time training session on a domain corpus to achieve their results. The wrapper text extraction utility (quickext-ks) is also domain dependent, and would require new rules to correctly process the different web sites used in the domain. The other text extractors (grep-ks, cgrep-ks, tablext-ks) are domain-independent.
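
As a purely hypothetical illustration of the enumeration step mentioned above, the fragment below shows one way per-domain object attributes might be declared and missing attributes flagged for further gathering. The names and the dictionary representation are ours and do not correspond to BIG's actual blackboard data structures.

    from dataclasses import dataclass

    @dataclass
    class DomainSchema:
        """Attributes a product object should carry in a given domain."""
        name: str
        attributes: tuple

    SOFTWARE = DomainSchema("software", ("product_name", "platform", "price", "reviews"))
    CARS = DomainSchema("cars", ("model", "engine", "price", "reviews"))

    def unresolved_attributes(extracted, schema):
        """Attributes not yet filled in for an object; in BIG such gaps would
        surface as sources-of-uncertainty that drive further gathering."""
        return [a for a in schema.attributes if a not in extracted]

    partial_car = {"model": "example sedan", "price": 18000}
    print(unresolved_attributes(partial_car, CARS))  # ['engine', 'reviews']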

There is an interesting question whether more extensive training of the classifier would have solved the book-misclassification problem noted above. We could also have added special heuristics to reflect the fact that products priced so low for the genre were likely either not complete software packages or not software at all. Shareware, freeware, and other low-cost but potentially viable solutions further complicate the issue.


6 Strengths, Limitations, and Future Directions

The combination of the different AI components in BIG and the view of information gathering as an interpretation task has given BIG some very strong abilities. In contrast to most other work done in this area, BIG performs information fusion, not just document retrieval. That is, BIG retrieves documents, extracts attributes from the documents (converting unstructured text to structured data), and integrates the extracted information from different sources to build a more complete model of the product in question. The use of the RESUN interpretation-style planner enables BIG to reason about the sources-of-uncertainty associated with particular aspects of product objects and to plan to resolve these uncertainties by gathering and extracting more information that serves as either corroborating or negating evidence. Though this feature was not brought out in our simple trace, it is a definite strength when operating in a realm of uncertain information combined with uncertain extraction techniques.

BIG is also quite scalable, because of the way it can filter and focus a large amount of information into a concise, useful representation. BIG can obtain and process information from a large variety of sources on the web and, given sufficient time, can process as much information as it is able to find during its search. The hierarchical structure of the blackboard allows BIG to use the data effectively because it is condensed and abstracted as it rises through the levels. The key phrase here, however, is given sufficient time. BIG is still a single entity system, so while the higher levels of the blackboard are manageable, the lower levels can accumulate a large number of objects, which can be slow to process. A proposed solution to this problem is to make BIG a multi-agent system. In such a system, multiple BIG agents could independently search for information, conceptually branching the lower levels of the blackboard. Not only would this solution increase information throughput through parallelism, but communication between the agents could also help focus the system as a whole.

Another feature of BIG not fully detailed in this paper is the use of the Design-to-Criteria scheduler to reason about the quality, cost, time, and certainty trade-offs of different candidate actions. The use of the scheduler enables BIG to address deadlines and search resource constraints, a feature that is particularly important given the scope of the search space, the uncertainty involved, and the very real requirement for information systems to address time and resource constraints. Relatedly, while the issue of planning for information cost constraints is not stressed in this paper, we feel that in the future the cost of accessing particular information sources will need to be taken into account by information gathering agents. Examples of the use of cost in BIG's IG process are presented in [44].

Also not a focus of this paper is BIG's ability to learn from prior problem solving instances [42]. Information objects (along with their associated sources-of-uncertainty) can be stored and used to supplement subsequent search activities. In this fashion, BIG gains from prior problem solving instances, but it also refines and modifies the product models over time by resolving previously unresolved SOUs and gathering new information about the products.

In terms of limitations and extensibility, many of the components used in the system, such as the web retrieval interface and some of the information extractors like grep-ks and tablext-ks, are generic and domain independent. However, certain aspects of the system require domain specific knowledge; adapting BIG to operate in another domain, perhaps the auto-purchase domain, would require the addition of specific knowledge about the particular domain. For example, as discussed in Section 5.7, information extractors like the information extraction system, cgrepext-ks, and quickext-ks require supervised training to learn extraction rules and make use of semantic dictionaries to guarantee a certain level of performance. Additionally, both the server and object databases, being persistent stores of the system's past experiences, are inherently domain dependent, rendering most of this knowledge useless and possibly distracting when used in other scenarios.

Another possible limitation with the current incarnation of BIG is the use of text extraction technology to convert unstructured text to structured data. The text extraction techniques are sometimes fragile, particularly when asked to extract data from a document not belonging to the class of document on which the system was trained. The use of a document classifier greatly improves the situation, but information extraction remains a non-trivial issue. The use of XML and other data structuring mechanisms on the web will help alleviate this issue. Interestingly, because RESUN represents and works to resolve sources-of-uncertainty, the limitations and sometimes erroneous output of the text extraction tools are not nearly as problematic as they might seem at first glance. Given sufficient time for search, the planner will usually recover from misdirections stemming from poor information extraction.

Our future interests lie in improving the integration of the top-down view of the Design-to-Criteria scheduler and the opportunistic bottom-up view of the RESUN planner. Currently, the scheduler's primary role in the system is to produce the initial schedule. However, as the control structure evolves, we foresee a feedback loop in which the RESUN planner and the task assessor pose what-if type questions to the scheduler to support high-level decisions about which actions to perform next. A stronger two-way interface will also support more opportunistic problem solving strategies by enabling the problem solver to respond to changes and evaluate the value of changing its planned course of action. We see this as particularly valuable in light of the uncertainty in the information gathering domain and the high-order combinatorics of the trade-off decision process. In this secondary role, the scheduler becomes the trade-off expert employed by the task assessor/problem solver to guide agent activities during execution. Another important direction is to exploit user feedback about the appropriateness of the system's decision and the documents/sites that support that decision. We feel this will allow us both to strengthen the processing of existing software genres and to more seamlessly integrate other software genres that have not been trained for. Finally, we would like to move BIG into a multi-agent system involving mobile agents. Our group [34] has a long history of developing distributed problem solving and multi-agent systems [19,20,38,55,9,46], and we are interested in exploring multi-agent coordination via a group of agents descended from the current BIG agent.

7 Acknowledgments

We would like to thank the reviewers for their thoughtful comments and editing suggestions that served to improve the overall quality of this paper.

Because of the large number of people involved with this effort we would like to acknowledge and identify the contributions of the authors and other individuals. Authors are listed alphabetically, except for the PI.

Victor Lesser had overall responsibility for the architecture definition and provided project management throughout the effort. Bryan Horling developed the web interface for simple document retrieval and search engine usage, implemented the server information and object databases, implemented quickext-ks (the wrapper-style information extraction utility), created the web spider that supplements BIG's other knowledge sources, and constructed the GUI for BIG. Frank Klassner contributed by facilitating the integration of the RESUN planner, implementing the blackboard data structures, and by providing project guidance in the early stages of the work. Anita Raja focused primarily on the extraction issues and implemented the grep-ks, cgrep-ks, and table-ks knowledge sources. Raja also modified the BADGER extraction system to be compatible with BIG's domain, enhanced BADGER's back-end processing, and integrated the document classifier into the system. Shelley XQ. Zhang designed and implemented the task assessor and the decision maker, constructed RESUN's control plans, constructed the interface between RESUN and the Design-to-Criteria Scheduler, and contributed to the rest of the system integration. Zhang also handled all of the experimental work and was primarily responsible for the project and modifications during its later stages. Tom Wagner contributed to the project management, to the overall development of the architecture, to the selection and preliminary testing of the individual components of BIG, and provided TÆMS and scheduling support.

We would also like to thank Professor Norman Carver for his original development of the RESUN planner used in BIG and his contributions to the interface between the planner and the scheduler. The ideas behind the task assessor are partly attributable to Carver as well.

Jonathan Aseltine, formerly of the UMASS NLP group, also deserves credit for his assistance in the modification of the BADGER information extraction system. Thanks also to David Fisher, also of the NLP group, for his time in making the BADGER system available to us.

In terms of the intellectual underpinnings of this project, thanks are also in order to Mike Chia for his thoughts on the application and hand generated proof-of-concepts, as well as to Tim Oates and M. V. Nagendra Prasad who, along with Victor Lesser, developed the foundational multi-agent scenario [50] of this application in 1994. Appreciation also goes to Keith Decker and Tom Wagner for their efforts on the pre-BIG prototypes and hand-run scenarios, and to Mia Stern, who, along with Tom Wagner, studied the state of the Internet and agent information gathering and provided intellectual support for the BIG objectives in 1994.

Thanks also to Peter Amstutz and Kyle Rawlins, undergraduates, for their work that was not used in this article, and to Manik Ahuja, also an undergraduate, who did a portion of the preliminary text mark-up work.

References

[1] ANSI/NISO Z39.50-1992, American National Standard Information Retrieval Application Service Definition and Protocol Specification for Open Systems Interconnection, 1992.

[2] Autonomy Agentware technology white paper. http://www.agentware.com/main/tech/whitepaper.htm.

[3] S. Adali, K.S. Candan, Y. Papakonstantinou, and V.S. Subrahmanian. Query processing in distributed mediated systems. Proceedings of the 1996 ACM SIGMOD Conference on Management of Data, June 1996.

[4] Y. Arens, C. Knoblock, and W. Shen. Query reformulation for dynamic information integration. Journal of Intelligent Information Systems, Special Issue on Intelligent Information Integration, 6(2&3):99–130, 1996.

[5] C. Mic Bowman, Peter B. Danzig, Udi Manber, and Michael F. Schwartz. Scalable Internet Resource Discovery: Research Problems and Approaches. Communications of the ACM, 37(8):98–114, 1994.

[6] J. P. Callan, W. Bruce Croft, and S. M. Harding. The INQUERY retrieval system. In Proceedings of the 3rd International Conference on Database and Expert Systems Applications, pages 78–83, 1992.

[7] Norman Carver and Victor Lesser. A new framework for sensor interpretation: Planning to resolve sources of uncertainty. In Proceedings of the Ninth National Conference on Artificial Intelligence, pages 724–731, August 1991.

[8] Norman Carver and Victor Lesser. A planner for the control of problem solving systems. IEEE Transactions on Systems, Man, and Cybernetics, Special Issue on Planning, Scheduling and Control, (6):1519–1536, 1993.

[9] Norman Carver and Victor Lesser. The DRESUN testbed for research in FA/C distributed situation assessment: Extensions to the model of external evidence. In Proceedings of the First International Conference on Multiagent Systems, June 1995.

[10] W. Clancey. From GUIDON to NEOMYCIN and HERACLES in Twenty Short Lessons. AI Magazine, 7(3):40–60, 1986.

[11] W. Clancey and C. Bock. Representing Control Knowledge as Abstract Tasks and Metarules. Technical Report KSL 85-16, Knowledge Systems Laboratory, Computer Science Department, Stanford University, 1986.

[12] J. Cowie and W.G. Lehnert. Information extraction. Communications of the ACM, 39(1):80–91, 1996.

[13] Naive Bayes document classifier. Supplement to Machine Learning by Tom Mitchell, 1997, McGraw Hill. http://www.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html.

[14] T. Dean and M. Boddy. An analysis of time-dependent planning. In Proceedings of the Seventh National Conference on Artificial Intelligence, pages 49–54, St. Paul, Minnesota, August 1988.

[15] K. Decker, A. Pannu, K. Sycara, and M. Williamson. Designing behaviors for information agents. In Proceedings of the 1st Intl. Conf. on Autonomous Agents, pages 404–413, Marina del Rey, February 1997.

[16] Keith S. Decker. Environment Centered Analysis and Design of Coordination Mechanisms. PhD thesis, University of Massachusetts, 1995.

[17] Keith S. Decker. Task environment centered simulation. In M. Prietula, K. Carley, and L. Gasser, editors, Simulating Organizations: Computational Models of Institutions and Groups. AAAI Press/MIT Press, 1996.

[18] Keith S. Decker and Victor R. Lesser. Quantitative modeling of complex environments. International Journal of Intelligent Systems in Accounting, Finance, and Management, 2(4):215–234, December 1993. Special issue on "Mathematical and Computational Models of Organizations: Models and Characteristics of Agent Behavior".

[19] Keith S. Decker and Victor R. Lesser. Designing a family of coordination algorithms. In Proceedings of the First International Conference on Multi-Agent Systems, pages 73–80, San Francisco, June 1995. AAAI Press. Longer version available as UMass CS-TR 94–14.

[20] K.S. Decker, V.R. Lesser, M.V. Nagendra Prasad, and T. Wagner. MACRON: an architecture for multi-agent cooperative information gathering. In Proceedings of the CIKM-95 Workshop on Intelligent Information Agents, Baltimore, MD, 1995.

[21] Robert Doorenbos, Oren Etzioni, and Daniel Weld. A scalable comparison-shopping agent for the world-wide-web. In Proceedings of the First International Conference on Autonomous Agents, pages 39–48, Marina del Rey, California, February 1997.

[22] Oren Etzioni. Moving up the information food chain: Employing softbots on the world wide web. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, pages 1322–1326, Portland, OR, August 1996.

[23] Oren Etzioni, Steve Hanks, Tao Jiang, Richard Karp, Omid Madani, and Orli Waarts. Optimal information gathering on the internet with time and cost constraints. In Proceedings of the Thirty-seventh IEEE Symposium on Foundations of Computer Science (FOCS), Burlington, VT, October 1996.

[24] T. Finin, R. Fritzson, D. McKay, and R. McEntire. KQML as an agent communication language. In Proceedings of the Third International Conference on Information and Knowledge Management (CIKM'94). ACM Press, November 1994.

[25] D. Fisher, S. Soderland, J. McCarthy, F. Feng, and W. Lehnert. Description of the UMass Systems as Used for MUC-6. In Proceedings of the 6th Message Understanding Conference, Columbia, MD, 1996.

[26] Joshua Grass and Shlomo Zilberstein. Value-Driven Information Gathering. In Proceedings of the AAAI Workshop on Building Resource-Bounded Reasoning Systems, Providence, RI, 1997.

[27] Laura Haas, Donald Kossmann, Edward Wimmers, and Jun Yang. Optimizing queries across diverse data sources. VLDB, 1997.

[28] J. Hammer, H. Garcia-Molina, K. Ireland, Y. Papakonstantinou, J. Ullman, and J. Widom. Information Translation, Mediation, and Mosaic-based Browsing in the TSIMMIS System. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 1995.

[29] Waqar Hasan, Daniela Florescu, and Patrick Valduriez. Open issues in parallel query optimization. SIGMOD Record, (3):28–33, 1996.

[30] Eric Horvitz, Gregory Cooper, and David Heckerman. Reflection and action under scarce resources: Theoretical principles and empirical study. In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, August 1989.

[31] Eric Horvitz and Jed Lengyel. Flexible Rendering of 3D Graphics Under Varying Resources: Issues and Directions. In Proceedings of the AAAI Symposium on Flexible Computation in Intelligent Systems, Cambridge, Massachusetts, November 1996.

[32] Adele Howe and Daniel Dreilinger. A Meta-Search Engine that Learns Which Engines to Query. AI Magazine, 18(2), 1997.

[33] Inforia Quest. http://www.inforia.com/quest/iq.htm.

[34] The Multi-Agent Systems Laboratory at the University of Massachusetts Amherst. http://mas.cs.umass.edu.

[35] Jango. http://www.jango.com/.

[36] Peter D. Karp. A Strategy for Database Interoperation. Journal of Computational Biology, (4):573–583, 1995.

[37] Bruce Krulwich. The BargainFinder Agent: Comparison price shopping on the Internet. In Joseph Williams, editor, Bots and Other Internet Beasties. SAMS.NET, 1996. http://bf.cstar.ac.com/bf/.

[38] S. Lander and V.R. Lesser. Negotiated search: Organizing cooperative search among heterogeneous expert agents. In Proceedings of the Fifth International Symposium on Artificial Intelligence Applications in Manufacturing and Robotics, December 1992.

[39] Leah Larkey and W. Bruce Croft. Combining classifiers in text categorization. In Proceedings of the 19th International Conference on Research and Development in Information Retrieval (SIGIR'96), pages 289–297, Zurich, Switzerland, 1996.

[40] W.G. Lehnert and B. Sundheim. A performance evaluation of text analysis technologies. AI Magazine, 12(3):81–94, 1991.

[41] Victor Lesser, Michael Atighetchi, Bryan Horling, Brett Benyo, Anita Raja, Regis Vincent, Thomas Wagner, Ping Xuan, and Shelley XQ. Zhang. A Multi-Agent System for Intelligent Environment Control. In Proceedings of the Third International Conference on Autonomous Agents (Agents99), 1999.

[42] Victor Lesser, Bryan Horling, Frank Klassner, Anita Raja, Thomas Wagner, and Shelley XQ. Zhang. BIG: A resource-bounded information gathering agent. In Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI-98), July 1998. See also UMass CS Technical Reports 98-03 and 97-34.

[43] Victor Lesser, Bryan Horling, Frank Klassner, Anita Raja, Thomas Wagner, and Shelley XQ. Zhang. A next generation information gathering agent. In Proceedings of the Fourth International Conference on Information Systems, Analysis, and Synthesis, pages 41–50, July 1998. In conjunction with the World Multiconference on Systemics, Cybernetics, and Informatics (SCI'98), Volume 2. Also available as UMASS CS TR 1998-72.

[44] Victor Lesser, Bryan Horling, Anita Raja, Thomas Wagner, and Shelley XQ. Zhang. Sophisticated Information Gathering in a Marketplace of Information Providers. Computer Science Technical Report TR-99-57, University of Massachusetts at Amherst, October 1999. Under review.

[45] Victor Lesser and Shlomo Zilberstein. Intelligent Information Gathering for Decision Models. Computer Science Technical Report TR-96-35, University of Massachusetts at Amherst, 1996.

[46] Victor R. Lesser. Reflections on the nature of multi-agent coordination and its implications for an agent architecture. Autonomous Agents and Multi-Agent Systems, 1(1):89–111, 1998.

[47] Alon Levy, Anand Rajaraman, and Joann J. Ordille. Query-answering algorithms for information agents. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, pages 40–47, Portland, OR, August 1996.

[48] Ion Muslea, Steve Minton, and Craig Knoblock. A hierarchical approach to wrapper induction. In Proceedings of the Third Annual Conference on Autonomous Agents, pages 190–197, Seattle, WA, May 1999. ACM Press.

[49] Ion Muslea, Steve Minton, and Craig Knoblock. STALKER: Learning Extraction Rules for Semistructured, Web-based Information Sources. In Proceedings of the AAAI Workshop on AI and Information Integration, 1998.

[50] T. Oates, M. V. Nagendra Prasad, and V. R. Lesser. Cooperative Information Gathering: A Distributed Problem Solving Approach. Computer Science Technical Report 94–66, University of Massachusetts, 1994. Journal of Software Engineering, Special Issue on Developing Agent Based Systems, 1997.

[51] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. Recommendation, World Wide Web Consortium, February 1999. http://www.w3.org/TR/PR-rdf-syntax/.

[52] Robosurfer. http://www.robosurfer.com/.

[53] Charles Rich and Candace L. Sidner. Collagen: When agents collaborate with people. In Proceedings of the First International Conference on Autonomous Agents (Agents97), pages 284–291, 1997.

[54] Stuart J. Russell and Shlomo Zilberstein. Composing real-time systems. In Proceedings of the Twelfth International Joint Conference on Artificial Intelligence, pages 212–217, Sydney, Australia, August 1991.

[55] Tuomas Sandholm. An implementation of the contract net protocol based on marginal cost calculations. In Proceedings of the Eleventh National Conference on Artificial Intelligence, pages 256–262, Washington, July 1993.

[56] S. Soderland, D. Fisher, J. Aseltine, and W.G. Lehnert. Crystal: Inducing a conceptual dictionary. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, pages 1314–1321, 1995.

[57] TÆMS white paper, 1999. http://mas.cs.umass.edu/research/taems/white/.

[58] Loren Terveen, Will Hill, Brian Amento, David McDonald, and Josh Creter. A system for sharing recommendations. Communications of the ACM, (3):59–62, March 1997.

[59] Anthony Tomasic, Louiqa Raschid, and Patrick Valduriez. A data model and query processing techniques for scaling access to distributed heterogeneous databases in Disco. IEEE Transactions on Computers, 1997. Special issue on Distributed Computing Systems.

[60] Thomas Wagner, Alan Garvey, and Victor Lesser. Complex Goal Criteria and Its Application in Design-to-Criteria Scheduling. In Proceedings of the Fourteenth National Conference on Artificial Intelligence, pages 294–301, July 1997. Also available as UMASS CS TR-1997-10.

[61] Thomas Wagner, Alan Garvey, and Victor Lesser. Criteria-Directed Heuristic Task Scheduling. International Journal of Approximate Reasoning, Special Issue on Scheduling, 19(1-2):91–118, 1998. A version also available as UMASS CS TR-97-59.

[62] Thomas Wagner and Victor Lesser. Design-to-Criteria Scheduling: Real-Time Agent Control. Computer Science Technical Report TR-99-58, University of Massachusetts at Amherst, October 1999. Under review.

[63] M.P. Wellman, E.H. Durfee, and W.P. Birmingham. The digital library as community of information agents. IEEE Expert, June 1996.

[64] Zurfrider. http://www.zurf.com/.

[65] http://macworld.zdnet.com/pages/april.97/reviews.3334.html.