An Evaluation of Text Mining Tools as Applied to Selected Scientific and Engineering Literature

Walter J. Trybula, Ph.D., IEEE Fellow; Ronald E. Wyllys, Ph.D. ASIS 2000 – Chicago, Illinois

Page 1: An Evaluation of Text Mining Tools as Applied to Selected Scientific and Engineering Literature

[Graphic labels: Data, Information, Knowledge, Wisdom]

An Evaluation of Text Mining Tools as Applied to Selected Scientific and Engineering Literature

Walter J. Trybula, Ph.D., IEEE Fellow
Ronald E. Wyllys, Ph.D.

ASIS 2000 – Chicago, Illinois
November 14, 2000

Page 2

[email protected] 14 November 2000

Introduction

• Data volume is growing and sources of information are more diverse

• There is a need to evaluate this information

• There are tools that claim to be able to find information in textbases

• An investigation of existing tools would provide a measure of their ability.

• If such tools worked, it might be possible to discover new knowledge, which Swanson described as Undiscovered Public Knowledge.
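Swanson's idea of undiscovered public knowledge is often illustrated with an "ABC" pattern: some documents connect A with B, others connect B with C, and no single document connects A with C. The sketch below only illustrates that pattern; it is not the method of any tool evaluated here. The function name `candidate_links` and the toy term sets are assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations

def candidate_links(docs):
    """Return {(a, c): bridging_terms} for term pairs that never co-occur
    in a single document but share at least one intermediate term."""
    cooccur = defaultdict(set)
    for terms in docs:
        for t in terms:
            cooccur[t] |= set(terms) - {t}
    links = {}
    for a, c in combinations(sorted(cooccur), 2):
        if c in cooccur[a]:
            continue  # already linked directly in some document
        bridges = cooccur[a] & cooccur[c]
        if bridges:
            links[(a, c)] = bridges
    return links

# Toy data (hypothetical): each set holds the key terms of one document.
docs = [
    {"fish oil", "blood viscosity"},
    {"blood viscosity", "raynaud's syndrome"},
]
print(candidate_links(docs))
```

A real textbase would need term normalization and some filtering of very common terms before such pairs become meaningful to a domain expert.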

Page 3

Objective/Goals

• Provide a means of testing the existing instruments to determine their ability to “find” knowledge.

• Determine if any of these instruments provide useful insight to the data.

• Evaluate the findings of domain experts to determine if the instruments are helpful.

• Develop recommendations based on the results of the experiments.

Page 4

Overview of Process

• Selected a technical area with known commonality (lithography masks).

• Collected the most recent reports available.

• Compiled the results into a textbase for analysis by text mining tools.

• Had domain experts evaluate the results.

• Analyzed their conclusions and drew recommendations for future directions.

Page 5

Example of Commonality

[Figure: cross-section diagrams of four lithography mask types, each shown for 90nm lines and spaces]

• EUV Mask: Si wafer or ULE substrate; reflective multilayer stack of 40 pairs (Mo-Si = 13.4nm, Mo-Be = 11.3nm), with Si (2.8nm) and Mo (4.1nm) layers, 6.9nm per pair; Si top layer ~5nm thick; patterned absorbers 60nm - 100nm thick (e.g., Al, Cr, TaSi, TiW); ~0.3 aspect ratio (l/w), 360nm L/S at mask with 100nm absorber thickness.

• Ion Beam Mask: SOI wafer with stencil pattern "opening"; Si membrane 3 µm thick with 500nm thick carbon layer; 1° retrograde; Fields 1 to 4 (Field 1: 12.5 mm x 12.5 mm) with support strut; 9.7 aspect ratio (l/w), 360nm L/S at mask.

• 1X X-Ray Masks: etched Si wafer (100 mm, 4"); SiC membrane; Pyrex ring support; TaX absorber ~400nm thick; labeled dimensions 2.0 µm and 525 µm; ~4.5 aspect ratio (l/w).

• Scalpel Mask: etched Si wafer (200mm format); SiNx membrane 100nm thick; tungsten (W) 25nm thick and chrome (Cr) 15nm thick scatterer; membrane window 1.1 mm; labeled dimensions 0.2 mm and 725 µm; ~0.1 aspect ratio (l/w), 360nm L/S at mask.
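The aspect ratios on the mask diagrams can be roughly reproduced by dividing the patterned layer thickness by the line width at the mask. The sketch below makes several assumptions that the slide does not state: that "3 m" on the ion beam diagram means 3 µm, that the ion beam figure uses membrane plus carbon layer, and that the Scalpel figure uses the combined W and Cr scatterer thickness. The helper name `aspect_ratio` is hypothetical.

```python
def aspect_ratio(thickness_nm, feature_width_nm):
    """Aspect ratio = patterned layer thickness / feature width at the mask."""
    return thickness_nm / feature_width_nm

# (layer thickness in nm, line/space width at the mask in nm)
# Thickness choices are assumptions; see the paragraph above.
masks = {
    "EUV": (100, 360),              # 100nm absorber, 360nm L/S at mask
    "Ion beam": (3000 + 500, 360),  # assumed 3 µm membrane + 500nm C layer
    "1X X-ray": (400, 90),          # ~400nm absorber, 1X so 90nm at mask
    "Scalpel": (25 + 15, 360),      # assumed W + Cr scatterer thickness
}

for name, (thickness, width) in masks.items():
    print(f"{name}: {aspect_ratio(thickness, width):.1f}")
```

Under these assumptions the EUV, ion beam, and Scalpel values match the slide labels (0.3, 9.7, 0.1), and the X-ray value comes out near 4.4, close to the "~4.5" shown.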

Page 6

Selection of Information

• Information from leading researchers was collected.

– Asian efforts on X-ray technology.

– U.S. efforts on X-ray technology.

– European efforts on Ion Projection Lithography.

– U.S. efforts on Electron Projection Lithography.

– U.S. efforts on Extreme Ultraviolet (EUV) technology.

• The data consisted of each group's annual update on technology progress, prepared for a yearly review.

• All reports, presentations, and data were assembled into a single textbase for analysis.
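As a rough illustration of assembling a single textbase, the sketch below gathers plain-text files from a directory tree into a list of records that keep track of their source. It assumes that reports, presentations, and data have already been converted to `.txt` files, which the slides do not describe; the function name `build_textbase` and the record layout are hypothetical.

```python
from pathlib import Path

def build_textbase(root, pattern="*.txt"):
    """Collect plain-text files under `root` into a list of records.

    Each record keeps the relative source path so that any finding can be
    traced back to the document it came from.
    """
    records = []
    for path in sorted(Path(root).rglob(pattern)):
        records.append({
            "source": str(path.relative_to(root)),
            # Replace undecodable bytes rather than failing on mixed encodings.
            "text": path.read_text(encoding="utf-8", errors="replace"),
        })
    return records
```

Keeping the source path with each record matters here because the collected material came from different regions and programs, each with its own terminology and format.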

Page 7

Sources of Data

Concerns:

– Language

– Terminology

– Program (format)

Asia, Europe, U.S.

Page 8

Text Mining Tools

• Selected three types of Text Mining Instruments available for desktop operation.

– Key terms identified with pointers to text

– Excerpt presentation format

– Hierarchical tree-structure presentation

• Did not include Self-Organizing Maps (SOMs)

• Included a search engine for baseline evaluation of the results (AltaVista).
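To make the first instrument type concrete, the sketch below shows one simple way to produce key terms with pointers back to the text: count non-stopword terms and record which documents contain them. This is an illustrative assumption, not the algorithm of any evaluated product; the stopword list, the function name `key_terms`, and the sample documents are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# Small illustrative stopword list; a real tool would use a larger one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "with"}

def key_terms(docs, top_n=5):
    """Return [(term, count, [doc_ids])] for the most frequent terms."""
    counts = Counter()
    pointers = defaultdict(set)
    for doc_id, text in docs.items():
        # Words of two or more characters, allowing internal hyphens.
        for word in re.findall(r"[a-z][a-z\-]+", text.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
                pointers[word].add(doc_id)
    return [(t, n, sorted(pointers[t])) for t, n in counts.most_common(top_n)]

# Hypothetical two-document textbase.
docs = {
    "euv.txt": "The EUV mask uses a multilayer reflective stack on the substrate.",
    "xray.txt": "The X-ray mask uses a membrane and an absorber on the substrate.",
}
for term, count, where in key_terms(docs, top_n=3):
    print(term, count, where)
```

Raw frequency also surfaces uninformative words such as "uses", which is one reason the output of such tools still needs review by a domain expert.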

Page 9

Text Mining Tools

Text Mining Tool that returns Key Terms

Page 10

Text Mining Tools

Text Mining Tool that returns Excerpts
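An excerpt-style presentation can be approximated with a keyword-in-context view, where each match is shown with a little surrounding text. The sketch below assumes that approach only for illustration; the slides do not say how the evaluated tool selected its excerpts. The function name `excerpts` and the `width` parameter are hypothetical.

```python
import re

def excerpts(text, term, width=30):
    """Yield a snippet with up to `width` characters on each side of
    every case-insensitive match of `term` in `text`."""
    for match in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        yield text[start:end].strip()

# Hypothetical sample text.
sample = ("The absorber is patterned on the membrane. "
          "A thicker absorber raises the aspect ratio.")
for snippet in excerpts(sample, "absorber", width=15):
    print("...", snippet, "...")
```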

Page 11

Text Mining Tools

Text Mining Tool that returns Hierarchy

Page 12

Results

• No method provided any novel results. There was some difficulty with mixed-format documents.

• Domain experts were required to evaluate the output and determine the importance of the delivered information.

• Graphical presentation of information was preferred over simple text.

• The search engine provided many pointers to occurrences of search terms.

• There was no evidence that this approach provided any novel knowledge.

Page 13

Conclusions

• Text Mining instruments are in a developmental stage and need refinement to be more useful.

• Text Mining instruments must be able to handle data in various formats, e.g., documents, spreadsheets, and presentations.

• Without a defined goal of what data will be delivered, there is no commonality among the various instruments.

• Experts had difficulty retrieving information that was known to be present, because of the methodology used to evaluate information in the textbase.

• There must be a cohesive direction provided for the development of these instruments.

Page 14

Recommendations: Future Directions – Information Needs

• An Instrument that evaluates the text in the textbase and provides an accurate representation of the information contained therein.

• An Instrument that provides this information in a manner that can be accurately and quickly evaluated by the intended user.

• An Instrument that draws the best elements from existing work and provides information based on proven methodologies. (In rapidly evolving technologies, efforts in one area may ignore developments in others. This is not acceptable.)

Page 15

Data Mining Process

Start with existing methodology.

Page 16

Text Mining Process

Develop new methodology from existing ones.

Page 17

Future Directions – Instrument Needs

• There needs to be a cohesive direction for future work. The existing development must draw on the knowledge developed in the Library Science field.

• Can build from Data Mining to derive Text Mining functionality. A key concern will remain the method of presenting the results.

• Need to have some agreement on the purpose of the Text Mining Instruments:

– What is the purpose of “mining” text?

– What kind of user will there be?

– What is the anticipated outcome?

• Consider the application of the latest software developments, e.g., Groove, Napster, for information sharing.

Page 18

Challenges

• Establish a “goal” for the results of Text Mining. What will be accomplished?

• Drive toward widespread application, e.g., desktop and handheld applications.

• Incorporate the latest hardware developments, e.g., distributed and parallel processing and wireless communications.

• Deliver what the intended user needs.

• Don’t reinvent the “wheel”

– Have the Library Science, the Information Science, and the Computer Science people work together.

Page 19

Acknowledgements

• Dean Brooke Sheldon, Sanda Erdelez, Mary Lynn Rice-Lively (GSLIS, University of Texas at Austin).

• John Konopka of IBM.

• The International SEMATECH team including Scott Mackay, Mark Mason, Phil Seidel, David Stark.

• The various technology champions for their efforts in providing the latest technology information.