Knowledge Discovery in Databases and Information Retrieval
In
Knowledge Management Systems
Anne Marie Donovan
April 22, 2003
Knowledge Management Systems, LIS 385T
The University of Texas at Austin
School of Information
KDD and IR 2
Introduction
The processes of Knowledge Discovery in Databases (KDD) and Information
Retrieval (IR) appear deceptively simple when viewed from the perspective of
terminological definition. Fayyad, Piatetsky-Shapiro, and Smyth (1996) define KDD as
"the nontrivial process of identifying valid, novel, potentially useful, and ultimately
understandable patterns in data" (p. 30). The closely related process of IR is defined by
Rocha (2001) as "the methods and processes for searching relevant information out of
information systems that contain extremely large numbers of documents" (1.1). In
execution, however, these processes are not simple at all, especially when executed to
satisfy specific personal or organizational Knowledge Management (KM) requirements
or as the core functionality of Knowledge Management Systems (KMS).
The potential validity or usefulness of an individual data element or pattern of
data elements may change dramatically from individual to individual, organization to
organization, or task to task. Relevance is a highly contextual and personal data
characteristic, changing even as the IR process is underway and information requirements
are incrementally met. Making retrieved data or a description of data patterns generally
understandable is also highly problematic. Data that may appear relevant and easily
understandable in one retrieval context may be completely unintelligible in another, even
to the same audience. KDD and IR are, in fact, highly complex processes that are
strongly affected by a wide range of factors. These factors include the needs and
information seeking characteristics of system users as well as the tools and methods used
to search and retrieve, the structure and size of the data set or database, and the nature of
the data itself.
KDD and IR: An Historical Perspective
Origins
Information professionals often describe the KDD and IR processes in the context
of specific types of Database Management Systems (DBMS). Devarakonda (2001)
divides DBMS into four types: simple data without query, simple data with query,
complex data without query, and complex data with query. An example of the first type,
simple data without query, is a filing system, including files that may exist only in paper
form. The second, third and fourth types are exemplified by Relational DBMS
(RDBMS), Object-Oriented DBMS (OODBMS), and Object-Relational DBMS
(ORDBMS), respectively (Devarakonda, 2001, ORDBMS). The type of database that is
queried significantly affects the processes of knowledge discovery (KD) and IR.
Because an RDBMS of some type forms the core of almost all KMS,
improvement of RDBMS functionality for KD and IR has been a crucial part of KMS
refinement for the past three decades. The relatively recent introduction of OODBMS to
KMS has created many new KD and IR problem sets for researchers. These challenges
have been met, thus far, primarily through the introduction of certain features of RDBMS
to OODBMS. The result has been the development of a small group of ORDBMS that
combine the best KD and IR features of RDBMS and OODBMS (ORDBMS).
Information professionals familiar with traditional filing systems are acutely
aware of the limitations imposed on KD and IR by their pre-set filing structure. Although
technically a database, this type of DBMS does not lend itself to automated searching, but
only to browsing or search by pre-designated subject categories and file descriptions
(e.g., library card catalogs). The difficulties presented for KD and IR by simple filing
structures were initially replicated in computer-supported file structures and were only
alleviated with the introduction of the Relational Database Model (RDM) by E. F. Codd
in 1970 (Devarakonda, 2001, RDBMS).
Introduction of the RDM resulted in rapid adoption of RDBMS for information
organization and control across a broad range of commercial and social organizations as
well as the development of increasingly effective data collection and storage
technologies. RDBMS permitted much more flexibility in data organization and retrieval
than traditional data filing systems, but traditional IR methods did not permit flexibility in
the characterization of user needs or the delineation of search parameters (Rocha, 2001,
1.2). The result, of course, was increasing numbers of organizations that possessed very
large and continually growing databases but only rudimentary tools for KD and IR. Two
areas of research focus in information management developed in response to this
problem: data warehousing and data mining.
Data warehousing, defined by Fayyad et al. as "collecting and 'cleaning'
transactional data to make it available for online analysis and decision support" (1996, p.
30), focuses on the methodical collection and pre-processing of data for specific
analytical uses. The data is subject-oriented, time-stamped, and integrated to permit
interactive analysis in support of decision-making processes. A data warehouse normally
integrates data from a variety of sources, "thus enriching the data and broadening the
context and value of the information" (Rauber et al., 2002, Data Warehousing and
OLAP).
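The collect, clean, and integrate sequence described above can be illustrated with a minimal Python sketch. The record layouts, field names, and sample values are hypothetical and are not drawn from any of the cited systems; a production warehouse would rely on dedicated extraction and loading tools rather than in-memory lists.

```python
from datetime import datetime

# Hypothetical transactional records from two source systems (invented for illustration).
SALES = [
    {"cust": " Alice ", "amount": "120.50", "ts": "2003-01-15T10:30:00"},
    {"cust": "BOB", "amount": "n/a", "ts": "2003-01-16T09:00:00"},
]
SUPPORT = [
    {"customer_name": "alice", "tickets": 2},
    {"customer_name": "Bob", "tickets": 0},
]


def clean_sale(record):
    """Normalize one raw sales record; return None if it cannot be repaired."""
    try:
        amount = float(record["amount"])
    except ValueError:
        return None  # drop records whose amount is not numeric
    return {
        "customer": record["cust"].strip().lower(),  # subject key, normalized
        "amount": amount,
        "timestamp": datetime.fromisoformat(record["ts"]),  # time-stamped
    }


def build_warehouse(sales, support):
    """Integrate cleaned sales with support data, keyed by customer."""
    tickets = {row["customer_name"].strip().lower(): row["tickets"] for row in support}
    warehouse = []
    for raw in sales:
        sale = clean_sale(raw)
        if sale is None:
            continue
        sale["tickets"] = tickets.get(sale["customer"])  # enrichment from a second source
        warehouse.append(sale)
    return warehouse


warehouse = build_warehouse(SALES, SUPPORT)
for row in warehouse:
    print(row)
```

In this sketch the unparseable record is discarded during cleaning, and the surviving record is enriched with data from the second source, which mirrors the "broadening the context" benefit noted by Rauber et al.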
Data mining, defined by Fayyad et al. as "the application of specific algorithms to a data
set for the purpose of extracting data patterns" (1996, p. 28), focuses on improving the
utility of large data sets as well as IR response. Data mining, in particular the algorithms
used in data mining, has received the lion's share of attention in the development of Decision
Support Systems (DSS) and RDBMS research because results are often immediately applicable in
high-payoff decision-making industries such as insurance, sales, and financial and
medical services.
Inspirations and Intentions for the Technology
Rocha describes the ultimate goal of IR as the production or recommendation of
relevant information to users (2001, 1.2). We can ascribe the same motivation to the
development of KDD systems and methods in general, particularly with regard to the
refinement of DBMS. Research in data collection, storage, and retrieval has focused on
issues specifically related to the improvement of KD and IR functionality. Among the
topics given special attention have been data translation, change detection, integration,
duplication, summarization, aggregation, and timeliness (Widom, 1995).
Research has also focused on the need to improve automation in KD and IR,
especially in the areas of data selection and pre-processing, data transformation, and data
interpretation and evaluation (Fayyad et al., 1996, p. 28). However, increased automation
in KD and IR requires increased attention to the methods used for data collection and
storage as well as the statistical foundations of the search and retrieval processes (p. 29).
Despite this complication, it is clear that manual analysis of billions of records
and hundreds of fields is impractical and that automated data handling will be even more
in demand as requirements for on-the-fly analysis and more flexible presentation of
search results increase (p. 28).
KDD and IR: Application to KMS
Technological Systems and Processes
Interface, interaction, and ubiquity. The relationship of KDD and IR to KMS is
intimate: all KMS rely in some form on the aggregation of data for search and retrieval.
Historically, improvements in the utility of KMS have depended in large part on
improvements in KDD and IR functionality. Fayyad et al. describe KDD as "the overall
process of knowledge discovery from data, including how the data is stored and accessed,
how algorithms can be scaled to massive data sets and still run efficiently, how results
can be interpreted and visualized, and how the overall human-machine interaction can be
modeled and supported" (1996, p. 29). This comprehensive list of KDD
processes, which encompasses IR, also serves to describe the core functionality of most
KMS (pp. 30-31). Research issues that have arisen in the development of DBMS and the
study of KDD are also closely related to the development and deployment of KMS.
Among these are: data collection and pre-processing; continually increasing volumes of
data; increasingly complex forms of data; identifying and extracting useful knowledge
from extremely large repositories; means for identifying knowledge of value about as
well as in the data set; extracting knowledge from data and presenting that knowledge in
usable forms (pp. 30-31).
The development of highly specialized DBMS for data warehousing and the
continual refinement of data mining methods and technologies have been motivated in
large part by the deployment of KMS throughout industry. Many KMS are simply
elaborated RDBMS integrated with IR and communication systems. More sophisticated
KMS may also add collaborative work tools. Decisions related to data mining, including
model functions, model representation, and preference criteria, are an elemental part of
KMS development and deployment (pp. 31-32). Data mining tasks (classification,
forecasting, clustering, description, deviation detection, link analysis, and visualization;
Piatetsky-Shapiro, 1998, Slide 17) and search algorithms are fundamentally affected by
the focus and purpose of an organization's KMS.
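One of the data mining tasks listed above, deviation detection, can be sketched in a few lines of Python. This is a minimal illustration that assumes a simple z-score rule; the threshold, the function name, and the sample claim amounts are hypothetical, and operational systems typically use more robust statistical models.

```python
from statistics import mean, stdev


def find_deviations(values, threshold=2.0):
    """Return (index, value) pairs whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing deviates
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical insurance claim amounts; one value is far from the rest.
claims = [102.0, 98.5, 101.2, 99.8, 100.4, 97.9, 250.0, 100.9]
outliers = find_deviations(claims)
print(outliers)
```

Whether a flagged value is fraud, error, or a legitimate exception remains an organizational judgment, which is why the purpose of the KMS shapes how such a task is configured.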
System architecture. The characteristics of the underlying DBMS
determine the architecture of KD and IR systems. RDBMS are composed of many
relations in the form of two-dimensional tables of rows and columns containing related
tuples. The rows (tuples) are called records, and the columns (fields in the record) are
called attributes. Each column is accorded a specific data type. The type of data stored in
an RDBMS has traditionally been constrained to ensure that there are no ambiguous
tuples in the database (Devarakonda, 2001, RDBMS), although in the case of very
complex data types, for example scientific data, programmers have overcome the
constraints of the DBM by employing Binary Large Objects (BLOBs) to store data in a
database. This "solution" creates its own set of problems, however. BLOBs are usually
much larger than a single block of storage in a database, a characteristic that undermines
the efficiency of the database. As well, because of their size, and because BLOBs in a
single database may contain a variety of data types and compound data, the data content
of the BLOB is not visible to the database. The opacity of data content means that a user
cannot perform a high-level search across the BLOBs in a database (Wallace, Benschop,
and Köhntopp, 1999).
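The opacity of BLOB content can be illustrated with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the table, the column names, and the JSON payload are hypothetical, and SQLite stands in for the larger RDBMS products the passage has in mind.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (id INTEGER PRIMARY KEY, site TEXT, reading BLOB)")

# Compound scientific data is serialized by the application and stored as one BLOB.
payload = json.dumps({"temps": [21.5, 22.1], "unit": "C"}).encode("utf-8")
conn.execute("INSERT INTO sample (site, reading) VALUES (?, ?)", ("station-7", payload))

# A typed column is visible to the database and can be searched directly.
rows = conn.execute("SELECT id FROM sample WHERE site = ?", ("station-7",)).fetchall()

# The BLOB comes back as uninterpreted bytes; only the application knows its structure.
blob = conn.execute("SELECT reading FROM sample WHERE id = 1").fetchone()[0]
decoded = json.loads(blob.decode("utf-8"))
print(type(blob).__name__, decoded["unit"])
```

Finding every row whose readings exceed some temperature would require fetching and decoding each BLOB in application code, because the database holds no attribute for the values inside it.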
RDBMS use Structured Query Language (SQL) for data definition, modification,
querying and constraint specification. Queries can range from simple single-table queries
to complicated multi-table queries. A commonly used RDBMS is Microsoft Access, but
the existence of a standard query language allows data to be migrated easily from one
RDBMS to another (Devarakonda, 2001, RDBMS). Although the structure of RDBMS
renders them incapable of handling complex data types such as spatial data, images, or
number arrays without the use of BLOBs, it does permit rapid data access and large
storage capacities.
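The four uses of SQL named above (data definition, modification, querying, and constraint specification) can be shown together in a short sketch. It uses Python's built-in sqlite3 module rather than any product named in the text, and the tables, columns, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Data definition, including constraint specification.
conn.executescript("""
CREATE TABLE author (
    author_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
CREATE TABLE report (
    report_id INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    year      INTEGER CHECK (year >= 1900),
    author_id INTEGER NOT NULL REFERENCES author(author_id)
);
""")

# Data modification.
conn.execute("INSERT INTO author (author_id, name) VALUES (1, 'Analyst A')")
conn.execute("INSERT INTO author (author_id, name) VALUES (2, 'Analyst B')")
conn.executemany(
    "INSERT INTO report (title, year, author_id) VALUES (?, ?, ?)",
    [("Claims review", 2001, 1), ("Sales forecast", 2002, 1), ("Fraud summary", 2002, 2)],
)
conn.execute("UPDATE report SET year = 2003 WHERE title = 'Fraud summary'")

# Simple single-table query.
titles_2002 = [row[0] for row in conn.execute(
    "SELECT title FROM report WHERE year = 2002 ORDER BY title")]

# Multi-table query: join reports to their authors and count per author.
reports_per_author = conn.execute("""
    SELECT a.name, COUNT(*)
    FROM report AS r JOIN author AS a ON a.author_id = r.author_id
    GROUP BY a.name
    ORDER BY a.name
""").fetchall()

# The declared constraint rejects a report that points at a nonexistent author.
try:
    conn.execute("INSERT INTO report (title, year, author_id) VALUES ('Orphan', 2003, 99)")
    constraint_enforced = False
except sqlite3.IntegrityError:
    constraint_enforced = True

print(titles_2002, reports_per_author, constraint_enforced)
```

Because the statements are ordinary SQL, much of this sketch would carry over to another RDBMS with modest changes, although dialect differences (the PRAGMA line, for example, is specific to SQLite) mean that migration is rarely friction-free in practice.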
The data management limitations of RDBMS led to the development of
OODBMS. In OODBMS, internal data structure is hidden so that external operations can
be performed on the data as an Abstract Data Type (ADT). RDBMS and OODBMS are
fundamentally different in the way they handle data relationships; OODBMS represent
relationships explicitly, which improves data access performance. Nonetheless,
OODBMS are plagued by poor query performance and problems of database scalability
(Devarakonda, 2001, OODBMS).
ORDBMS, a relatively recent innovation, are designed to incorporate the best
features of both RDBMS and OODBMS. Data is stored in tables, but some entries may
have richer data structure; as in OODBMS, these entries are called ADTs. Because the
data is stored in rows and columns, the ORDBMS maintains a relational data model,
although it must be heavily modified to support object-oriented programming. In
essence, the object-relational model adds a new object-oriented layer to support rich data
types on top of the relational database model. ORDBMS support query and handle data
objects; they can also be built on a massive scale. These features make ORDBMS
particularly useful for the development of KMS for handling complex data types.
System configuration and deployment. A primary concern for many organizations
during the configuration and deployment of KD and IR systems has been the creation of
data and query context. Some efforts to create context have been retrospective. Lee and
Hwang (2002) describe the process of extracting and visualizing semantic metadata from
databases. This process, called relational database reverse engineering (RDRE), “extracts
a conceptual model from an existing relational database by analyzing data instances as
well as metadata” (Lee and Hwang, 2002, Conclusion). RDRE has been especially useful
in creating shared "conceptual schema" for multiple databases (Introduction). A
conceptual schema describes the database in terms of data items and relationships
between data items in a form "suitable for human presentation" (Introduction) thereby
enhancing KD and IR. The ability to discover and describe data relationships within and
between databases allows organizations to profile and map information in their data
warehouses in ways that were previously unimaginable. Mapping and profiling of data
not only creates discovery and retrieval context to enhance data reuse, it can also reveal
entirely new possibilities for use. A well-defined database reengineering project enables an
organization to integrate the masses of transactional data that lie in its data warehouse
with information collected from other enterprise systems or from outside the company.
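A much-reduced version of the reverse engineering idea can be sketched with SQLite's catalog metadata. This is not Lee and Hwang's method: it reads only metadata (table, column, and foreign key declarations) and does not analyze data instances as RDRE does. The function name and the sample tables are hypothetical.

```python
import sqlite3


def extract_conceptual_schema(conn):
    """Derive entities, attributes, and relationships from SQLite catalog metadata."""
    entities = {}
    relationships = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    )]
    for table in tables:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        entities[table] = [col[1] for col in conn.execute(f'PRAGMA table_info("{table}")')]
        # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            relationships.append((table, fk[3], fk[2], fk[4]))
    return entities, relationships


# Hypothetical legacy database to be described.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE purchase (
    purchase_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id),
    total       REAL
);
""")

entities, relationships = extract_conceptual_schema(conn)
for child, child_col, parent, parent_col in relationships:
    print(f"{child}.{child_col} -> {parent}.{parent_col}")
```

The resulting entity and relationship lists are the raw material for a human-readable conceptual schema; relationships that were never declared as foreign keys would be missed by this sketch, which is one reason RDRE also examines the data itself.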
Another common method for creating data and query context for enterprise data
warehouses is the establishment of mechanisms for creating context during data creation
and collection or during query construction. Many personal KMS provide robust
mechanisms for data contextualization through the addition of metadata or by data
structuring. KMS such as PeopleGarden (Xiong and Donath, 1999) extract social
context for data during the processes of data collection and data exchange. Extending IR
throughout the social network of an organization, as is done by Answer Garden
(Ackerman, 1998; Ackerman & Malone, 1990; Ackerman & McDonald, 1996), is
another method for providing query context for KD and IR.
Technology transition in organizations. Institutions that have pioneered the use of
KDD and IR, especially in the form of data mining, have traditionally been those that rely
heavily on knowledge-based decisions for their success. Because their operations have
historically relied heavily on data collection, these organizations normally have a large
quantity of accessible, relevant, historical and current data. They also anticipate a high
payoff for making rapid, correct decisions based on their collected data and they actively
seek a technological advantage in knowledge management. Financial institutions such as
banking and investment firms, healthcare and insurance organizations, and businesses
that rely heavily on marketing and customer relations are emblematic of sectors that have
aggressively pursued technological innovations in KD and IR (Piatetsky-Shapiro, 1998,
Slides 28-31).
The development of Decision Support Systems (DSS) based on electronic data
processing (EDP) was an early application of database technology to KM in large
enterprises. In many cases, however, technological strides in data collection (hardware
and software) rapidly outpaced the enterprises' ability to understand and manage the data
that was being collected and stored. Information was often plentiful without being
relevant and extensive data warehouses often proved inadequate for applied decision
making (Bass, 1983, p. 189).
Another difficulty faced by organizations that relied on large databases for
decision support was the danger that decisions would be made based on data that was
poorly contextualized or poorly understood. Managers faced with a complex decision
process might misinterpret the applicability of a data set to the problem or fail to
investigate the existence of contradictory data (Calvert, 1993, p. 91). The less contextual
the data, the more easily it may be misinterpreted or misapplied.
Organizational Systems and Processes
The introduction of automated KDD and IR changed the fundamental nature of
knowledge work, organizational architectures, management practices, and
communication flows in organizations. The introduction of Web-served data collection,
query and delivery has also significantly affected these systems. In particular, the
expansive application of KDD and IR technologies and techniques to information
management for distributed or "flattened" organizations has resulted in KM becoming a
ubiquitous "industrial" product in many business sectors.
Two aspects of knowledge work profoundly affected by the pervasive use of
KDD and IR technologies have been knowledge creation and communication in the
context of collaboration. The enhancement of collaborative possibilities in knowledge
work created by distributed KDD and IR has had significant social effects in
organizations and among individuals. The problem of creating shared context for data
collection, retrieval, and delivery in distributed DBMS has already been mentioned.
Equally difficult are the incitement of collaboration and the creation of networks of trust
among the dispersed users of distributed DBMS.
The creation of massive, increasingly powerful DBMS and more effective KDD
and IR technologies and techniques has also raised many complex social issues outside
business processes. One significant social concern is the increasingly pervasive
collection of detailed individual data that is enabled by sophisticated DBMS. Many
individuals enjoy the convenience offered by the maintenance of personal information in
commercial databases, but are unaware of the privacy implications inherent in the
services these databases enable. Many individuals are faced with a daily choice:
convenience and service or security and privacy?
KDD and IR: Looking to the Future
KDD and IR research problems
The demands of commercial KM markets drive the lifecycle of KD and IR
systems. The creation of highly dimensional, massive data sets and the increasing
sophistication of users and complexity of database uses have directed KDD research in
specific directions. High priority research topics include: problems of statistical
significance and missing data; the understandability of data patterns; the management of
changing data and data integration; and the manipulation of non-standard, multi-media,
and object oriented data (Fayyad, Piatetsky-Shapiro, & Smyth, 1996, pp. 33-34).
Research and development in IR is equally market driven. In 1995, Croft
published a "top ten" list of IR research issues based on his experiences in the area of
industrial and government research priorities as a member of the National Science
Foundation (NSF) Center for Intelligent Information Retrieval (CIIR) (¶ 3). These
research priorities, derived from surveys of companies that use and sell IR systems, still
resonate today:
1. Integrated solutions (standardized architectures and common platforms; the
integration of database management and IR systems with multimedia capabilities)
2. Distributed IR (retrieval systems that can work in distributed, wide-area
network environments)
3. Efficient, flexible indexing and retrieval (including ability to handle a wide
variety of data formats)
4. Automatic query expansion (to overcome vocabulary mismatch between users
and databases)
5. Interfaces and browsing (interfaces that support a range of functions including
query formulation, presentation of retrieved information, feedback, and browsing
in a conceptually simple way)
6. Routing and filtering (many companies considered data routing to be the main
function required for a text-based DBMS, with IR being a secondary function)
7. Effective retrieval (companies are particularly interested in techniques that
produce significant improvements in precision but still avoid occasional major
retrieval mistakes)
8. Multimedia retrieval (techniques for accessing image, video and sound
databases without text descriptions)
9. Information extraction (techniques to identify database entities, attributes and
relationships in full text)
10. Relevance feedback (improved algorithms and models for automatic relevance
feedback) (Croft, 1995)
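Item 4 in the list above, automatic query expansion, lends itself to a brief illustration. The sketch below assumes a hand-built thesaurus and a naive whitespace tokenizer; the thesaurus entries, the documents, and the function names are hypothetical, and the systems Croft surveyed would derive expansion terms by more sophisticated means.

```python
# Hypothetical thesaurus mapping a query term to related vocabulary.
THESAURUS = {
    "car": {"automobile", "vehicle"},
    "doctor": {"physician"},
}

# Hypothetical document collection.
DOCS = {
    1: "automobile insurance claims rose",
    2: "library card catalog",
    3: "used car sales forecast",
}


def expand_query(terms, thesaurus):
    """Return the original terms plus any related terms found in the thesaurus."""
    expanded = set(terms)
    for term in terms:
        expanded |= thesaurus.get(term, set())
    return expanded


def search(terms, docs):
    """Return sorted ids of documents containing at least one query term."""
    wanted = set(terms)
    return sorted(doc_id for doc_id, text in docs.items()
                  if wanted & set(text.lower().split()))


plain_hits = search(["car"], DOCS)
expanded_hits = search(expand_query(["car"], THESAURUS), DOCS)
print(plain_hits, expanded_hits)
```

The unexpanded query misses the document that says "automobile" rather than "car", which is the vocabulary mismatch the list item describes. Expansion improves recall at some risk to precision, a trade-off that bears on item 7.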
New developments
KD and IR problems for Web resources. The rapid growth of the Web and
increasing reliance on the Web for the collection and delivery of data for KM has created
new problems in KD and IR as well as bringing some older problems to the fore. Among
the problems are: standardization of data collection and pre-processing; huge volumes
of continually changing data; complex, streaming, and multi-media data; identifying and
extracting useful knowledge from Web resources; a lack of consistent data models and
context; a lack of available descriptive information; the problem of presenting knowledge
in usable forms; and the rapid development of more time-sensitive, multi-media
applications for Web resources.
Many of these problems reflect the inadequacy of current methods for Web
resource KD and IR. Data collection is presently performed primarily by automated Web
crawlers. Pre-processing consists of link-based ranking or human indexing and
categorization. The identification and extraction of useful knowledge from Web
resources is dependent on highly inefficient keyword searches on natural language text or
on imprecise topical directories or topical Web sites. Retrieved knowledge can be
viewed only in its native format (with a plugin) or sometimes only as derived HTML.
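Link-based ranking, mentioned above as a pre-processing step, can be sketched as a simplified PageRank-style iteration. This is an illustrative approximation rather than the algorithm of any particular search engine; the damping factor, the iteration count, and the four-page link graph are assumptions made for the example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by repeatedly passing each page's score along its outgoing links."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical crawl result: page -> pages it links to.
LINKS = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = link_rank(LINKS)
best = max(scores, key=scores.get)
print(best, round(sum(scores.values()), 6))
```

The ranking reflects only who links to whom and says nothing about what a page means, which is the limitation that motivates the semantic markup efforts discussed next.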
A variety of research and development projects are underway to enable more
efficient, automated KD and IR for Web resources. Among the best known efforts are
those that seek to apply semantic markup to Web resources to enable machine
understanding and processing and inference analysis. Related projects seek to develop
intelligent search engines and agents to exploit the semantic statements created by this
markup, while still others are creating ontologies to provide context for these search
engines and agents (Shah et al., 2002).
Other researchers are examining improved methods for automated data and
context collection (data pre-processing), the provision of value-added services such as
query routing, the development of integrated query and knowledge delivery systems, and
the establishment of social accounting metrics to provide context for humans (Smith,
2002, p. 52). Another major area of research focuses on leveraging historical information
about individual and group Web browsing experience and patterns to enable more
efficient KD and IR (Chakrabarti et al., 1998, Abstract). Rauber et al. (2002) provide an
evocative description of the potential for enhanced KD and IR that is as yet untapped,
"With [such] a repository of Web data, as well as the metadata associated with the
documents and domains, we have a powerful source of information that goes beyond the
content of Web pages …. in order for the most useful analyses to yield answers to project
questions and issues, a different perspective of the Web and Web archives is needed, a
perspective focusing not solely on content, but on the wealth of information
automatically associated with each object on the Web" (Introduction). Capturing an
understanding of how other individuals have discovered, retrieved, and used Web content
provides invaluable context for users who are accessing that content for the first time.
Integration with Other Technologies
Enhanced presentation for the Web. The need for better integration of KDD and
IR systems with delivery and presentation technologies has already been mentioned and it
is a need that cannot be overstated. This is particularly true in the case of information
presentation on the Web. Considerable research is underway in the area of reformatting
data for discovery and presentation through Web-enabled devices. Another area of
research focus is differentiated service for different devices that would enable variable
visualization of retrieved information depending on a user's needs and device
characteristics. Researchers in the field of adaptive graphics, "a unifying framework that
allows visual representations of information to be customized and mixed together into
new ones," have proposed content pre-viewing, interactive content, selective presentation,
and customized views of Web-served content (Boier-Martin, 2003, pp. 6-9) as areas ripe
for progressive research. Many of these researchers refer to the work of Turner Whitted,
who in 1998 suggested the use of computer displays as "wallpaper" for interactive
information exchange to enable pervasive collaboration and information retrieval (1999,
p. 6).
KDD and IR for pervasive computing. Achieving what Cherniack, Franklin, and
Zdonik term “ubiquitous data access” (2001, slide 7) presents several unique challenges
in system integration. Many of these challenges reflect data management problems.
Among these are: the resolution of context-dependent data (e.g., data push/data pull delivery
issues); synchronization of data from multiple, distributed sensors and collectors; the
efficient renewal of data streams; effecting profile-driven data management; dealing with
location-aware, mobile devices; and the enabling of service mobility and service
discovery (slides 8-27).
The next generation
Research trends and priorities suggest a number of substantial advances in next
generation KDD and IR systems. We can expect them to enable the solving of business
problems, not data analysis problems. They will embed knowledge discovery engines
and integrate access to enterprise and external data on the back-end. Most
importantly, they will integrate the knowledge discovery process with knowledge
delivery tools (Piatetsky-Shapiro, 1998, Slide 7). We can also expect next generation
KDD and IR systems to manage information retrieval contextually, allow contextual
query/continuous query, enable KD in virtual networks of peer-to-peer databases, and
interpolate or extrapolate for missing data (Cherniack et al., 2001, slides 115-138).
To enable mobile and pervasive computing applications, future KDD and IR
systems will also have to be able to characterize information resources, recognize
individual users, provide variable means to exchange knowledge between users and
information sources (push and pull of information), adapt to the user community, and
enable the reuse and recombination of information as well as its exchange (Rocha, 2001,
1.2). The most fundamental and difficult of these challenges will be information
characterization.
Conclusion: On the Bleeding Edge
One might reasonably ask if the KDD and IR systems described above fall in the
realm of science or science fiction. The answer is, assuredly, in the realm of science,
although science fiction has often been influential in application development. This
answer is supported by a brief examination of the KDD and IR research being funded by
the Defense Advanced Research Projects Agency (DARPA) (the folks who brought us
the Internet) under the auspices of the federal Total Information Awareness (TIA)
Program. This research covers substantially new database technologies, architectures,
population techniques, search algorithms, and data models.
One funded project, Genisys, has the goal of producing technology to enable
ultra-large, all-source information repositories (DARPA, 2003b, Program Strategy).
Unlike RDBMS in use today, Genisys-developed DBMS will require no prior data
modeling; support automated restructuring and projection of data; store data in context of
time and space; and develop a large, distributed system architecture for managing a huge
volume of raw data input, analysis results, and feedback (DARPA, 2003a, TIA System:
Program Strategy). Programs such as Genisys are building aggressively on a foundation
of 30 years of research in KDD and IR technology and techniques. Although these
initiatives raise new social as well as technical problems, they also suggest the possibility
of substantially new applications for these technologies.
The difficulties of contextualizing and interpreting data for KM have increased
many-fold in the past decade. New technologies for data collection and storage have led
to ever-larger data warehouses containing hugely complex data types, a development
that has greatly complicated data discovery, retrieval, visualization, and sharing within
organizations. A growing need to incorporate increasingly disparate data sources from
outside the organization has transformed enterprise KM from a cluster of internal
management problems into a problem set that also encompasses an organization's
relationships with clients and competitors, as well as its ability to participate in lucrative
cooperative ventures. Enterprises now seek to use information technology to support not
just individual problem solving, but entire decision making processes.
KD and IR have become tools that not only enhance human decision-making but
that also compensate for inherent weaknesses in human decision making processes. The
result has been the development of powerful new EDP applications in knowledge
discovery, KM, and enterprise decision making, especially in the areas of collaborative
ventures, market forecasting, the management of customer relations, and fraud or crime
detection. If these technologies are to progress even further, however, researchers must
deal with the essential task of describing (characterizing) our growing wealth of
information resources (online and offline). Only when we are able to visualize
meaningfully the vast extent of our available information resources will we be able to
develop new approaches to KD and IR. The fundamental problems in KM today relate to
our inability to find and understand the information we already possess, not to an
inability to collect and manipulate new data. It is in the development of better KD and IR
tools that the future of KM and KMS lies.
References
Ackerman, M. S. (1998, July). Augmenting the organizational memory: A field study of Answer Garden. ACM Transactions on Information Systems, 16(3), 203-204. Retrieved March 28, 2003 from http://doi.acm.org/10.1145/290159.290160
Ackerman, M. S., & Malone, T. W. (1990, April). Answer Garden: A tool for growing organizational memory. ACM SIGOIS Bulletin, 11(2-3), 31-39. Retrieved March 28, 2003 from http://doi.acm.org/10.1145/91474.91485
Ackerman, M. S., & McDonald, D. W. (1996). Answer Garden 2: Merging organizational memory with collaborative help. Proceedings of the ACM Conference on Computer-Supported Cooperative Work 1996 (CSCW96 Boston, MA). Retrieved March 28, 2003 from http://doi.acm.org/10.1145/240080.240203
Bass, B. M. (1983). Organizational decision making. In L. L. Cummins, E. Kirby Warren, & J. F. Mee (Eds.), The Irwin series in management and the behavioral sciences. Homewood, IL: Richard D. Irwin.
Boier-Martin, I. M. (2003, January/February). Adaptive graphics. In T. Rhyne (Ed.) Visualization Viewpoints, IEEE Computer Graphics and Applications, 23(1), 6-10. Retrieved April 5, 2003 from http://www.research.ibm.com/people/i/imartin/papers/visviewpoints.pdf
Calvert, G. (1993). Highwire management: Risk-taking tactics for leaders, innovators, and trailblazers. San Francisco, CA: Jossey-Bass Publishers.
Chakrabarti, S., Srivastava, S., Subramanyam, M., & Tiwari, M. (1998). Using Memex to archive and mine community Web browsing experience. Paper presented at the 9th International World Wide Web Conference, Amsterdam, May 15-19, 2000. Retrieved April 12, 2003 from http://www9.org/w9cdrom/98/98.html
Croft, W. B. (1995, November). What do people want from information retrieval?: The top 10 research issues for companies that use and sell IR systems. D-Lib Magazine. Retrieved April 5, 2003 from http://sunsite.anu.edu.au/mirrors/dlib/dlib/november95/11croft.html
DARPA. (2003a). Genisys. Retrieved from the DARPA Information Awareness Office Web site at: http://www.darpa.mil/iao/Genisys.htm
DARPA. (2003b). Total Information Awareness System. Retrieved from the DARPA Information Awareness Office Web site at: http://www.darpa.mil/iao/TIASystems.htm
Devarakonda, R. (2001, March). Object-relational database systems - The road ahead. ACM Crossroads Student Magazine. Retrieved April 12, 2003 from www.acm.org/crossroads/xrds7-3/ordbms.html
Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996, November). The KDD process for extracting useful knowledge from volumes of data. Communications of the ACM, 39(11), 27-34. Retrieved March 03, 2003 from http://wwwhome.cs.utwente.nl/~mpoel/colleges/dwdm/ACM_artikelen/fayyad2.pdf
Lee, D., & Hwang, Y. (2002, March 1). Extracting semantic metadata and its visualization. ACM Crossroads Student Magazine. Retrieved March 27, 2003 from www.acm.org/crossroads/xrds7-3/smeva.html
Piatetsky-Shapiro, G. (1998, December 4). Data mining and knowledge discovery tools: The next generation. Retrieved February 27, 2003 from kdnuggets.com at http://www.kdnuggets.com/gpspubs/dama-nextgen-98/index.htm
Rauber, A., Aschenbrenner, A., Witvoet, O., Bruckner, R. M., & Kaiser, M. (2002, December). Uncovering information hidden in Web archives: A glimpse at Web analysis building on data warehouses. D-Lib Magazine, 8(12). Retrieved March 28, 2003 from http://www.dlib.org/dlib/december02/rauber/12rauber.html
Rocha, L. M. (2001). TalkMine: A soft computing approach to adaptive knowledge recommendation [Electronic version]. In V. Loia & S. Sessa (Eds.), Studies in fuzziness and soft computing: Vol. 75. Soft computing agents: New trends for designing autonomous systems. (pp. 89-116). New York: Springer. Retrieved March 28, 2003 from http://www.c3.lanl.gov/~rocha/softagents.html
Shah, U., Finin, T., Joshi, A., Cost, R. S., & Mayfield, J. (2002, November). Information retrieval on the Semantic Web. Paper presented at The ACM Conference on Information and Knowledge Management, November 2002. Retrieved March 28, 2003 from http://www.csee.umbc.edu/~finin/papers/cikm02/cikm02.pdf
Smith, M. (2002). Tools for navigating large social cyberspaces. Communications of the ACM, 45(4), 51-55. Retrieved March 28, 2003 from http://delivery.acm.org/10.1145/510000/505272/p51-smith.html?key1=505272&key2=5541680501&coll=GUIDE&dl=GUIDE&CFID=9914049&CFTOKEN=12943474
Wallace, N., Benschop, O., & Köhntopp, K. (1999). What is a BLOB? php.faqts. Retrieved May 1, 2003 from http://www.faqts.com/knowledge_base/view.phtml/aid/416/fid/876
Whitted, T. (1999, July/August). Draw on the Wall. IEEE Computer Graphics and Applications, 19(4), 6-9. Retrieved April 8, 2003 from ieeexplore.ieee.org at: http://ieeexplore.ieee.org/iel5/38/16795/00773957.pdf?isNumber=16795&arnumber=773957&prod=JNL&arSt=6&ared=9&arAuthor=Whitted%2C+T.
Widom, J. (1995, November). Research problems in data warehousing. Proceedings of the 4th International Conference on Information and Knowledge Management (CIKM). Retrieved March 28, 2003 from http://www.ischool.utexas.edu/~i385tkms/readings/Widom-1995-ResearchProblems.pdf
Xiong, R., & Donath, J. (1999). PeopleGarden: Creating data portraits for users. CHI Letters, 1(1), 37-44. Retrieved April 8, 2003 from http://smg.media.mit.edu/papers/Xiong/pgarden_uist99.pdf