
Data Mining: Making Data Meaningful


Computer: Industry Trends

The Palomar Observatory in San Diego County, California, records thousands of images of the sky, including millions of faint points of light from the distant reaches of space. It would take hundreds of people thousands of hours poring over images to identify which light source is a star, which is a galaxy, and so on. And humans wouldn’t be able to identify some of the most distant light sources.

However, NASA’s Jet Propulsion Laboratory (JPL) has a data-mining tool, called Skicat (Sky Image Cataloging and Analysis Tool), which can analyze digitized data from the observatory, recognize patterns, and make the proper identifications much more quickly and much less expensively.

This is just one of the ways that research groups, large businesses, government agencies, and other organizations are using improved mining technologies and techniques to discover meaningful patterns in huge databases. The technology is also looking for geological patterns in earthquake-prone areas, predicting bad credit risks, and anticipating inventory demands.

And now, data mining has been refined to the point where even people who aren’t highly trained statisticians can use this complex data-analysis tool.

INCREASED POPULARITY

Data mining’s increased popularity is due partly to technological improvements that permit faster, more effective analyses of databases. Data-mining techniques, such as neural networks, have become more effective. A neural network is a processing device, either an algorithm or hardware, whose design was motivated by the design of the human brain. Neural networks exhibit interconnectivity and parallelism, “learn” from examples, and are able to generalize.
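To make "learning from examples" concrete, here is a minimal sketch of a perceptron, the simplest single-unit neural network. It is not how Skicat or any product named in this article works; the function names and the toy data are assumptions chosen purely for illustration.

```python
# Minimal perceptron: one artificial "neuron" that learns from labeled examples.
# All names and the toy data are illustrative assumptions, not from the article.

def train_perceptron(examples, epochs=20, lr=0.1):
    """examples: list of ((x1, x2), label) pairs with label 0 or 1."""
    w = [0.0, 0.0]  # one weight per input
    b = 0.0         # bias term
    for _ in range(epochs):
        for (x1, x2), label in examples:
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = label - pred  # 0 when correct, +1 or -1 when wrong
            # Nudge the weights toward the correct answer.
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

def predict(model, point):
    w, b = model
    return 1 if w[0] * point[0] + w[1] * point[1] + b > 0 else 0

# Hypothetical training set: two measurements per object, label 0 or 1.
training = [((1.0, 1.0), 0), ((2.0, 1.5), 0), ((1.5, 2.0), 0),
            ((6.0, 6.5), 1), ((7.0, 6.0), 1), ((6.5, 7.5), 1)]
model = train_perceptron(training)

# Generalization: classify points that were never in the training set.
print(predict(model, (0.5, 0.5)), predict(model, (8.0, 8.0)))
```

The weights are adjusted only when a prediction is wrong, which is the sense in which the model "learns"; classifying the two unseen points is a small instance of generalizing.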

Vendors have improved their products’ precision by combining a variety of data-mining techniques. In the past, they generally used only one technique in their products. Angoss International (http://www.angoss.com) recently released a data-mining product for businesses, KnowledgeSeeker, that uses both tree-based models and neural networks. Tree-based models organize data into branched systems that show how various pieces of data relate to each other.
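As a rough illustration of the tree-based idea, the sketch below finds the single best split (a one-branch "decision stump") in a small table. It is not KnowledgeSeeker's algorithm; the customer records, field names, and the `best_split` helper are all hypothetical.

```python
# Toy decision stump: choose the one split that best separates the records.
# The data, field names, and helper are hypothetical, for illustration only.

def best_split(rows, features, target):
    """Return (feature, threshold, errors) for the split with the fewest errors."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [r[target] for r in rows if r[f] <= t]
            right = [r[target] for r in rows if r[f] > t]
            # Each branch predicts its majority label; count the mismatches.
            errors = sum(min(side.count(True), side.count(False))
                         for side in (left, right))
            if best is None or errors < best[2]:
                best = (f, t, errors)
    return best

customers = [
    {"income": 20, "late_payments": 4, "defaulted": True},
    {"income": 35, "late_payments": 3, "defaulted": True},
    {"income": 50, "late_payments": 0, "defaulted": False},
    {"income": 28, "late_payments": 1, "defaulted": False},
    {"income": 60, "late_payments": 5, "defaulted": True},
    {"income": 42, "late_payments": 0, "defaulted": False},
]

feature, threshold, errors = best_split(
    customers, ["income", "late_payments"], "defaulted")
print(feature, threshold, errors)  # late_payments 1 0
```

A full tree would repeat this search inside each branch, producing the branched structure the article describes.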

Powerful processors now let computers (including many desktop units) run complex mining algorithms and search large databases quickly. In addition, better graphics technology lets users see the results of data mining on easy-to-read graphs, charts, and so on. These two factors have made data mining a valuable tool for users who aren’t sophisticated statisticians.

Data-mining products are also just now becoming Web-enabled. So, companies that, for example, access databases through corporate intranets can now use data-mining tools.

Because of these factors, the technology is beginning to be used in so many settings that data-mining vendors are beginning to splinter into niches based on industry and function, said Erick Brethenoux, the Paris-based research director for the Gartner Group, a market research firm. For instance, SAS Institute (http://www.sas.com) is emerging as a favorite of statisticians, while HNC Software (http://www.hnc.com) is achieving dominance among risk analysts.

A CAUTIONARY NOTE

Despite data mining’s value, users should realize that the technology provides only a guide, not a gospel. Users must be wary of finding meaningless statistical patterns that don’t indicate a cause and an effect or that don’t accurately predict the future.

For example, David J. Leinweber, managing director of First Quadrant Corp., an investment management firm, said he used data mining and found that, statistically speaking, the best predictor of the performance of Standard & Poor’s 500 Index of stocks is the price of butter in Bangladesh. Of course, he said, the two have no causal relationship, so the correlation is meaningless. This, he said, is an example of the type of “stupid data-mining tricks” users must be careful about.
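The effect Leinweber describes can be reproduced with pure noise. The sketch below is an assumption-laden illustration, not his analysis: it generates one random "target" series and 1,000 unrelated random candidates, then reports the candidate that happens to correlate best. With only 10 data points and that many candidates, a strong-looking correlation turns up by chance alone.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical setup: every series is independent random noise.
target = [rng.gauss(0, 1) for _ in range(10)]
candidates = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(1000)]

correlations = [pearson(c, target) for c in candidates]
best_r = max(correlations, key=abs)  # strongest correlation, positive or negative

print(f"best correlation among 1000 unrelated series: {best_r:.2f}")
```

None of the candidates has any relationship to the target, so the "best predictor" found here would be worthless on new data, which is why a mined pattern should be checked against data that was not used to find it.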

Analysts expect vendors to begin releasing data-mining tools for specific applications. One emerging area is text mining, which can analyze, for example, customer comments on survey sheets. Brethenoux predicts that multimedia mining (which could, for example, analyze patterns in photos) will emerge within five years.

Meanwhile, researchers are working on ways to accelerate data mining by trading accuracy for speed.
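The article does not say which techniques those researchers are pursuing. One common way to trade accuracy for speed, shown here purely as an assumed example, is to analyze a random sample instead of scanning every record. The data and function names below are made up for illustration.

```python
import random

# Hypothetical "large" table: 100,000 order sizes (made-up values).
orders = [i % 100 for i in range(100_000)]

def exact_mean(values):
    """Scan every record: exact, but cost grows with the table size."""
    return sum(values) / len(values)

def sampled_mean(values, sample_size, seed=0):
    """Scan only a random sample: faster, at the price of a small error."""
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)
    return sum(sample) / sample_size

exact = exact_mean(orders)
estimate = sampled_mean(orders, 1_000)  # touches only 1% of the records

print(f"exact={exact}, estimate from 1% sample={estimate:.2f}")
```

The sample-based answer reads a hundredth of the data and lands close to the exact value; the sample size sets how much accuracy is given up.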

And in the not-too-distant future, data mining will become a common data-analysis tool on many desktops, said Herb Edelstein, president of Two Crows Corp., a data-mining consulting firm. This process has already started, as even people who don’t conduct complex statistical analysis are now beginning to use data mining.

Joe Mullich is a freelance technology writer based in Glendale, California. Contact him at [email protected].


Editor: Lee Garber, Computer, 10662 Los Vaqueros Circle, PO Box 3014, Los Alamitos, CA 90720-1314; [email protected]

JPL’s Skicat can identify distant light sources more quickly and inexpensively.