DATA MINING - Federal Efforts Cover a Wide Range of Uses [Implementing Data Mining Systems]


  • 8/9/2019 DATA MINING - Federal Efforts Cover a Wide Range of Uses [Implementing Data Mining Systems]


    GAO United States General Accounting Office

    Report to the Ranking Minority Member, Subcommittee on Financial Management, the Budget, and International Security, Committee on Governmental Affairs, U.S. Senate

    May 2004 DATA MINING

    Federal Efforts Cover a Wide Range of Uses

    GAO-04-548


    Federal agencies are using data mining for a variety of purposes, ranging from improving service or performance to analyzing and detecting terrorist patterns and activities. Our survey of 128 federal departments and agencies on their use of data mining shows that 52 agencies are using or are planning to use data mining. These departments and agencies reported 199 data mining efforts, of which 68 are planned and 131 are operational. The figure here shows the most common uses of data mining efforts as described by agencies. Of these uses, the Department of Defense reported the largest number of efforts aimed at improving service or performance, managing human resources, and analyzing intelligence and detecting terrorist activities. The Department of Education reported the largest number of efforts aimed at detecting fraud, waste, and abuse. The National Aeronautics and Space Administration reported the largest number of efforts aimed at analyzing scientific and research information. For detecting criminal activities or patterns, however, efforts are spread relatively evenly among the agencies that reported having such efforts.

    In addition, out of all 199 data mining efforts identified, 122 used personal information. For these efforts, the primary purposes were improving service or performance; detecting fraud, waste, and abuse; analyzing scientific and research information; managing human resources; detecting criminal activities or patterns; and analyzing intelligence and detecting terrorist activities.

    Agencies also identified efforts to mine data from the private sector and data from other federal agencies, both of which could include personal information. Of 54 efforts to mine data from the private sector (such as credit reports or credit card transactions), 36 involve personal information. Of 77 efforts to mine data from other federal agencies, 46 involve personal information (including student loan application data, bank account numbers, credit card information, and taxpayer identification numbers).

    Top Six Purposes of Data Mining Efforts in Departments and Agencies

    Both the government and the private sector are increasingly using data mining, that is, the application of database technology and techniques (such as statistical analysis and modeling) to uncover hidden patterns and subtle relationships in data and to infer rules that allow for the prediction of future results. As has been widely reported, many federal data mining efforts involve the use of personal information that is mined from databases maintained by public as well as private sector organizations.

    GAO was asked to survey data mining systems and activities in federal agencies. Specifically, GAO was asked to identify planned and operational federal data mining efforts and describe their characteristics.

    www.gao.gov/cgi-bin/getrpt?GAO-04-548

    To view the full product, including the scope and methodology, click on the link above. For more information, contact Linda Koontz at (202) 512-6240 or [email protected].

    Highlights of GAO-04-548, a report to the Ranking Minority Member, Subcommittee on Financial Management, the Budget, and International Security, Committee on Governmental Affairs, U.S. Senate

    May 2004

    DATA MINING

    Federal Efforts Cover a Wide Range of Uses


    Page i GAO-04-548 Data Minin

    Contents

    Letter
        Results in Brief
        Background
        Agencies Identified Numerous Data Mining Efforts with Various Aims
        Summary

    Appendixes
        Appendix I: Objective, Scope, and Methodology
        Appendix II: Surveyed Departments and Agencies
        Appendix III: Departments and Agencies Reporting No Data Mining Efforts
        Appendix IV: Inventories of Efforts

    Tables
        Table 1: Top Six Purposes of Data Mining Efforts in Departments and Agencies and Number of Efforts Reported
        Table 2: Department of Agriculture's Inventory of Data Mining Efforts
        Table 3: Department of Commerce's Inventory of Data Mining Efforts
        Table 4: Department of Defense's Inventory of Data Mining Efforts
        Table 5: Department of Education's Inventory of Data Mining Efforts
        Table 6: Department of Energy's Inventory of Data Mining Efforts
        Table 7: Department of Health and Human Services' Inventory of Data Mining Efforts
        Table 8: Department of Homeland Security's Inventory of Data Mining Efforts
        Table 9: Department of the Interior's Inventory of Data Mining Efforts
        Table 10: Department of Justice's Inventory of Data Mining Efforts
        Table 11: Department of Labor's Inventory of Data Mining Efforts
        Table 12: Department of State's Inventory of Data Mining Efforts
        Table 13: Department of Transportation's Inventory of Data Mining Efforts
        Table 14: Department of the Treasury's Inventory of Data Mining Efforts
        Table 15: Department of Veterans Affairs' Inventory of Data Mining Efforts
        Table 16: Environmental Protection Agency's Inventory of Data Mining Efforts
        Table 17: Export-Import Bank of the United States' Inventory of Data Mining Efforts
        Table 18: Federal Deposit Insurance Corporation's Inventory of Data Mining Efforts
        Table 19: Federal Reserve System's Inventory of Data Mining Efforts
        Table 20: National Aeronautics and Space Administration's Inventory of Data Mining Efforts
        Table 21: Nuclear Regulatory Commission's Inventory of Data Mining Efforts
        Table 22: Office of Personnel Management's Inventory of Data Mining Efforts
        Table 23: Pension Benefit Guaranty Corporation's Inventory of Data Mining Efforts
        Table 24: Railroad Retirement Board's Inventory of Data Mining Efforts
        Table 25: Small Business Administration's Inventory of Data Mining Efforts

    Figures
        Figure 1: Top Six Purposes of Data Mining Efforts That Involve Personal Information
        Figure 2: Top Six Purposes of Data Mining Efforts That Involve Private Sector Data
        Figure 3: Top Six Purposes of Data Mining Efforts That Involve Data from Other Federal Agencies


    Abbreviations

    CARDS Counterintelligence Analytical Research Data System
    CG Coast Guard
    CI-AIMS Counterintelligence Automated Investigative Management System
    DHHS Department of Health and Human Services
    DOD Department of Defense
    DOE Department of Energy
    DOT Department of Transportation
    EFTPS Electronic Federal Tax Payment System
    EOS Earth Observing System
    FARS Fatality Analysis Reporting System
    FDA Food and Drug Administration
    GENESIS Global Environmental and Earth Science Information System
    GSFC Goddard Space Flight Center
    HR Human Resources
    HRSA Health Resources and Services Administration
    MATRIX Multistate Anti-terrorism Information Exchange System
    NASA National Aeronautics and Space Administration
    NVO National Virtual Observatory
    OIG Office of Inspector General
    OLAP On-line Analytical Processing
    RSST Real Estate Stress Test
    SAA Spectral Analysis Automation
    SAS Safety Automated System
    SMARTS Statistical Management Analysis and Reporting Tool System
    SWC Space Warfare Center
    TIMS Technical Information Management System
    TOP Treasury Offset Program
    VA Veterans Affairs
    VHA Veterans Health Administration
    VISN Veterans Integrated Service Network

    This is a work of the U.S. government and is not subject to copyright protection in the United States. It may be reproduced and distributed in its entirety without further permission from GAO. However, because this work may contain copyrighted images or other material, permission from the copyright holder may be necessary if you wish to reproduce this material separately.


    To address our objective to identify and describe operational and planned data mining systems and activities in federal agencies, we surveyed chief information officers or comparable officials at 128 federal departments and agencies to determine whether the agencies had operational and planned data mining systems or activities.2 We then conducted telephone interviews with the reported system managers to obtain information on the characteristics of the identified data mining efforts. To verify the information we received, we sent follow-up letters to agencies that responded as well as to those that did not respond, we asked responsible officials to verify the information, and we performed random assessments of the means that these officials used to verify the information.

    In addition, we conducted a search of technical literature and periodicals to develop a comprehensive list of federal government data mining efforts and then compared these efforts with data mining efforts reported by federal agencies. If the data mining efforts on our lists were not reported on the survey, we contacted the appropriate chief information officers and, with their concurrence, added the efforts.

    We performed our work from May 2003 to April 2004 in accordance with generally accepted government auditing standards. Additional details on our scope and methodology are provided in appendix I.

    2That is, we asked about both systems explicitly dedicated to data mining and activities using automated tools to mine databases that are part of other systems. In this report, we use the word "efforts" to refer to both systems and activities, unless otherwise specified.

    Results in Brief

    Federal agencies are using data mining for a variety of purposes, ranging from improving service or performance to analyzing and detecting terrorist patterns and activities. Our survey of 128 federal departments and agencies on their use of data mining shows that 52 agencies are using or are planning to use data mining. These departments and agencies reported 199 data mining efforts, of which 68 were planned and 131 were operational. The most common uses of data mining efforts were described by agencies as

    improving service or performance;

    detecting fraud, waste, and abuse;

    analyzing scientific and research information;

    managing human resources;

    detecting criminal activities or patterns; and

    analyzing intelligence and detecting terrorist activities.

    The Department of Defense reported having the largest number of data mining efforts aimed at improving service or performance and at managing human resources. Defense was also the most frequent user of efforts aimed at analyzing intelligence and detecting terrorist activities, followed by the Departments of Homeland Security, Justice, and Education.

    The Department of Education reported the largest number of efforts aimed at detecting fraud, waste, and abuse, while the National Aeronautics and Space Administration targets most of its data mining efforts (21 out of 23) toward analyzing scientific and research information. Data mining efforts for detecting criminal activities or patterns, however, were spread relatively evenly among the reporting agencies.

    In addition, out of all 199 data mining efforts identified, 122 used personal information. For these efforts, the primary purposes were detecting fraud, waste, and abuse; detecting criminal activities or patterns; analyzing intelligence and detecting terrorist activities; and increasing tax compliance.

    Agencies also identified efforts to mine data from the private sector and data from other federal agencies, both of which could include personal information. Of 54 efforts to mine data from the private sector (such as credit reports or credit card transactions), 36 involve personal information. Of 77 efforts to mine data from other federal agencies, 46 involve personal information (including student loan application data, bank account numbers, credit card information, and taxpayer identification numbers).
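    The shares implied by these counts can be checked directly. The following is a minimal Python sketch that uses only the counts stated in this report; the names COUNTS and personal_info_share are hypothetical, introduced purely for illustration, and do not describe any GAO or agency tool.

```python
# Counts stated in this report, as (efforts involving personal information, total efforts).
# COUNTS and personal_info_share are hypothetical names used only for this illustration.
COUNTS = {
    "all reported efforts": (122, 199),
    "efforts mining private sector data": (36, 54),
    "efforts mining other federal agency data": (46, 77),
}


def personal_info_share(with_personal: int, total: int) -> float:
    """Return the percentage of efforts in a group that involve personal information."""
    if total <= 0:
        raise ValueError("total must be a positive count")
    if not 0 <= with_personal <= total:
        raise ValueError("with_personal must be between 0 and total")
    return 100.0 * with_personal / total


for group, (personal, total) in COUNTS.items():
    share = personal_info_share(personal, total)
    print(f"{group}: {personal} of {total} ({share:.1f}%)")
```

    Run as written, this prints shares of about 61.3, 66.7, and 59.7 percent, respectively, so in each of the three groups a majority of the reported efforts involve personal information.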

    Background

    Data mining enables corporations and government agencies to analyze massive volumes of data quickly and relatively inexpensively. The use of this type of information retrieval has been driven by the exponential growth in the volumes and availability of information collected by the public and private sectors, as well as by advances in computing and data storage capabilities. In response to these trends, generic data mining tools are increasingly available for, or built into, major commercial database applications. Today, mining can be performed on many types of data, including those in structured, textual, spatial, Web, or multimedia forms. Data mining is becoming a big business; Forrester Research has estimated that the data mining market is passing the billion dollar mark.

    Although the use and sophistication of data mining have increased in both the government and the private sector, data mining remains an ambiguous term. According to some experts, data mining overlaps a wide range of analytical activities, including data profiling, data warehousing, online analytical processing, and enterprise analytical applications.3 Some of the terms used to describe data mining or similar analytical activities include "factual data analysis" and "predictive analytics." We surveyed technical literature and developed a definition of data mining based on the most commonly used terms found in this literature. Based on this search, we define data mining as the application of database technology and techniques, such as statistical analysis and modeling, to uncover hidden patterns and subtle relationships in data and to infer rules that allow for the prediction of future results. We used this definition in our initial survey of chief information officers; these officials found the definition sufficient to identify agency data mining efforts.

    Data mining has been used successfully for a number of years in the private and public sectors in a broad range of applications. In the private sector, these applications include customer relationship management, market research, retail and supply chain analysis, medical analysis and diagnostics, financial analysis, and fraud detection. In the government, data mining was initially used to detect financial fraud and abuse. For example, data mining has been an integral part of GAO audits and investigations of federal government purchase and credit card programs.4 Data mining and related technologies are also emerging as key tools in Department of Homeland Security initiatives.

    3Lou Agosta, "Data Mining Is Dead, Long Live Predictive Analytics!" (Forrester Research, Oct. 30, 2003), http://www.forrester.com/Research/LegacyIT/0,7208,33030,00.html (downloaded Jan. 26, 2004).

    4For more information on the uses of data mining in GAO audits, see U.S. General Accounting Office, Data Mining: Results and Challenges for Government Programs, Audits, and Investigations, GAO-03-591T (Washington, D.C.: Mar. 25, 2003).


    Data Mining Poses Privacy Challenge

    Since the terrorist attacks of September 11, 2001, data mining has been seen increasingly as a useful tool to help detect terrorist threats by improving the collection and analysis of public and private sector data. In a recent report on information sharing and analysis to address the challenges of homeland security, it was noted that agencies at all levels of government are now interested in collecting and mining large amounts of data from commercial sources.5 The report noted that agencies may use such data not only for investigations of known terrorists, but also to perform large-scale data analysis and pattern discovery in order to discern potential terrorist activity by unknown individuals. Such use of data mining by federal agencies has raised public and congressional concerns regarding privacy.

    One example of a large-scale development effort launched in the wake of the September 11 attacks is the Multistate Anti-terrorism Information Exchange System, known as MATRIX. MATRIX, currently used in five states,6 provides the capability to store, analyze, and exchange sensitive terrorism-related and other criminal intelligence data among agencies within a state, among states, and between state and federal agencies. Information in MATRIX databases includes criminal history records, driver's license data, vehicle registration records, incarceration records, and digitized photographs. Public awareness of MATRIX and of similar large-scale data mining or data mining-like projects has led to concerns about the government's use of data mining to conduct a mass "dataveillance"7 (a surveillance of large groups of people) to sift through vast amounts of personally identifying data to find individuals who might fit a terrorist profile.

    5Creating a Trusted Information Network for Homeland Security (New York City: The Markle Foundation, December 2003), http://www.markletaskforce.org/Report2_Full_Report.pdf (downloaded Mar. 8, 2004).

    6Five states are currently participating in the MATRIX pilot project: Connecticut, Florida, Michigan, Ohio, and Pennsylvania.

    7Roger Clarke, "Information Technology and Dataveillance," Communications of the ACM, vol. 31, issue 5 (New York City: ACM Press, May 1988), http://www.anu.edu.au/people/Roger.Clarke/DV/CACM88.html (downloaded Mar. 5, 2004). Clarke defines mass dataveillance as the systematic use of personal data systems in the investigation or monitoring of the actions or communications of groups of people.


    Agencies Identified Numerous Data Mining Efforts with Various Aims

    Of 128 federal departments and agencies surveyed for information on their planned and operational data mining efforts (listed in app. II), 52 agencies reported 199 data mining efforts, and 69 agencies reported that they were not engaged in data mining and were not planning such efforts (listed in app. III). Of the 199 data mining efforts, 68 were planned and 131 were operational. Seven agencies did not respond to our survey.10 Appendix IV lists the 199 data mining efforts reported, along with key characteristics.

    Agencies described the most common purposes of data mining efforts as

    improving service or performance;

    detecting fraud, waste, and abuse;

    analyzing scientific and research information;

    managing human resources;

    detecting criminal activities or patterns; and

    analyzing intelligence and detecting terrorist activities.

    As shown in table 1, the Department of Defense reported the largest number of efforts aimed at improving service or performance (with 19 out of 65 reported efforts) and at managing human resources (with 14 out of 17 efforts). Defense was also the most frequent user of efforts aimed at analyzing intelligence and detecting terrorist activities, with 5 of 14 efforts, followed by the Departments of Homeland Security and Justice, with 4 and 3 efforts, respectively. The Department of Education has the largest number of efforts aimed at detecting fraud, waste, and abuse (9 out of 24 efforts reported). The National Aeronautics and Space Administration accounts for 21 of the 23 identified efforts for analyzing scientific and research information. Efforts are spread relatively evenly among the agencies that reported using data mining efforts for detecting criminal activities or patterns. Table 1 summarizes the top six uses of data mining efforts among the responding agencies.

    10Agencies that did not respond to our survey are (1) the Central Intelligence Agency; (2) the Corporation for National and Community Services; (3) the Department of the Army, Department of Defense; (4) the Equal Employment Opportunity Commission; (5) the National Park Service, Department of the Interior; (6) the National Security Agency, Department of Defense; and (7) the Rural Utilities Service, Department of Agriculture.

    Table 1: Top Six Purposes of Data Mining Efforts in Departments and Agencies and Number of Efforts Reported

    Source: GAO analysis of agency-provided data.

    Department or agency
    Improving service or performance
    Detecting fraud, waste, and abuse
    Analyzing scientific and research information
    Managing human resources
    Detecting criminal activities or patterns
    Analyzing intelligence and detecting terrorist activities

    Department of Agriculture 8 1

    Department of Commerce

    Department of Defense 19 1 1 14 1

    Department of Education 6 9 3

    Department of Energy 3

    Department of Health and Human Services 4 1

    Department of Homeland Security 5 2 2

    Department of the Interior 1

    Department of Justice 1 1 3

    Department of Labor 3 1

    Department of State 2

    Department of Transportation 1

    Department of the Treasury 4 1 2

    Department of Veterans Affairs 5 5 1

    Environmental Protection Agency 1

    Export-Import Bank of the United States 1

    Federal Deposit Insurance Corporation 1

    Federal Reserve System 1

    National Aeronautics and Space Administration 1 1 21

    Nuclear Regulatory Commission 1

    Office of Personnel Management 1

    Pension Benefit Guaranty Corporation 2

    Railroad Retirement Board 1

    Small Business Administration 1

    Total 65 24 23 17 15 14


    Some data mining purposes focus on human activities and therefore are inherently likely to involve personal information; examples of these purposes are detecting fraud, waste, and abuse; detecting criminal activities or patterns; managing human resources; and analyzing intelligence. The following are examples of data mining efforts for each of these purposes:

    Detecting fraud, waste, and abuse. The Veterans Benefits Administration's C & P Payment Data Analysis effort mines veterans' compensation and pension data for evidence of fraud.

    Detecting criminal activities or patterns. The Department of Education's Title IV Identity Theft Initiative effort focuses on identity theft cases involving education loans.

    Managing human resources. The U.S. Air Force's Oracle HR (Human Resources) uses data mining to provide information on promotions, pay grades, clearances, and other information relevant to human resources planning.

    Analyzing intelligence and detecting terrorist activities. The Defense Intelligence Agency's Verity K2 Enterprise mines data from the intelligence community and Internet sources to identify foreign terrorists or U.S. citizens connected to foreign terrorism activities.

    On the other hand, other categories of efforts do not necessarily focus on human activities or involve personal information, such as many of the efforts aimed at analyzing scientific and research information. The National Aeronautics and Space Administration, for example, mines large, complex earth science data sets to find patterns and relationships to detect hidden events (the system is called Machine Learning and Data Mining for Improved Data Understanding of High Dimensional Earth Sensed Data).

    Similarly, many efforts aimed at improving service or performance (the most frequently cited purpose of data mining efforts) do not involve personal information. For example, the Department of the Navy's Supply Management System Multidimensional Cubes system includes a data warehouse containing data on every ship part that has been ordered since the 1980s, with multidimensional information on each part. The Navy uses data mining to calculate failure rates and identify needed improvements; according to the Navy, this system reduces downtime on ships by improving parts replacement.


    However, some efforts aimed at improving service or performance do involve personal information. For example, the Veterans Administration's VISN (Veterans Integrated Service Network) 16 Data Warehouse is mined for a variety of information, including patient visits, laboratory tests, and pharmacy records, to provide management with health care system performance information.

    Overall, 122 of the 199 data mining efforts involve personal information. Figure 1 shows the top six purposes of these efforts, as well as their distribution.

    Figure 1: Top Six Purposes of Data Mining Efforts That Involve Personal Information

    [Bar chart of the number of data mining efforts by purpose: detecting fraud, waste, and abuse, 33; improving service or performance, 24; detecting criminal activities or patterns, 15; increasing tax compliance, 15; analyzing intelligence and detecting terrorist activities, 10; managing human resources, 7. Source: GAO analysis of agency data.]

    Of the 199 data mining efforts, 54 use or plan to use data from the private sector. Of these, 36 involve personal information. The personal information from the private sector included credit reports and credit card transaction records. Figure 2 shows the distribution of the top six purposes of the 54 efforts involving data from the private sector.


    Figure 2: Top Six Purposes of Data Mining Efforts That Involve Private Sector Data

    [Bar chart of the number of data mining efforts by purpose: improving service or performance, 14; detecting fraud, waste, and abuse, 9; analyzing scientific and research information, 8; analyzing intelligence and detecting terrorist activities, 5; detecting criminal activities or patterns, 4; improving safety, 4. Source: GAO analysis of agency data.]

    Of the 199 data mining efforts, 77 efforts use or plan to use data from other federal agencies. Of the 77 efforts, 46 involve personal information. The personal information from other federal agencies included student loan application data, bank account numbers, credit card information, and taxpayer identification numbers. Figure 3 shows the top six uses for the 77 efforts involving data from other federal agencies and their distribution.


    Figure 3: Top Six Purposes of Data Mining Efforts That Involve Data from Other Federal Agencies

    [Bar chart of the number of data mining efforts by purpose: improving service or performance, 20; managing human resources, 13; detecting criminal activities or patterns, 12; analyzing intelligence and detecting terrorist activities, 7; analyzing scientific and research information, 6; detecting fraud, waste, and abuse, 5. Source: GAO analysis of agency data.]

    Summary

    Driven by advances in computing and data storage capabilities and by growth in the volumes and availability of information collected by the public and private sectors, data mining enables government agencies to analyze massive volumes of data. Our survey shows that data mining is increasingly being used by government for a variety of purposes, ranging from improving service or performance to analyzing and detecting terrorist patterns and activities.

    Although this survey provides a broad overview of the emerging uses of data mining in the federal government, more work is needed to shed light on the privacy implications of these efforts. In future work, we plan to examine selected federal data mining efforts and their implications.

    As agreed with your office, unless you publicly announce the contents of the report earlier, we plan no further distribution until 30 days from the report date. At that time, we will send copies of this report to the Chairmen and Ranking Minority Members of the House Committee on Government Reform; Subcommittee on Civil Service and Agency Organization, House Committee on Government Reform; Select Committee on Homeland Security, House of Representatives; Senate Committee on Governmental Affairs; and the Subcommittee on Oversight of Government Management, the Federal Workforce and the District of Columbia, Senate Committee on Governmental Affairs. We will also make copies available to others on request. In addition, this report will be available at no charge on the GAO Web site at http://www.gao.gov.

    If you have any questions concerning this report, please call me at (202) 512-6240 or Mirko J. Dolak, Assistant Director, at (202) 512-6362. We can also be reached by e-mail at [email protected] and [email protected], respectively. Key contributors to this report were Camille M. Chaires, Barbara S. Collier, Orlando O. Copeland, Nancy E. Glover, Stuart M. Kaufman, Lori D. Martinez, Morgan F. Walts, and Marcia C. Washington.

    Sincerely yours,

    Linda D. Koontz
    Director, Information Management Issues


    Appendix I

    Objective, Scope, and Methodology

    Our objective was to identify and describe planned and operational federal data mining efforts. As a first step in addressing this objective, we developed a definition of data mining. Because this expression has a range of meanings, we surveyed the technical literature to develop a definition based on the most commonly used terms found in this literature. We defined data mining as the application of database technology and techniques, such as statistical analysis and modeling, to uncover hidden patterns and subtle relationships in data and to infer rules that allow for the prediction of future results. In our initial survey of chief information officers, these officials found the definition sufficient to identify agency data mining efforts.

    We then surveyed chief information officers or comparable officials at 128 federal departments and agencies (see app. II) and asked them to identify whether their agency had operational and planned data mining efforts. We achieved a 95 percent response rate. Of the 121 agencies that responded, 69 reported that they did not have any data mining efforts (see app. III). We followed up with these 69 agencies and gave them another opportunity to report data mining efforts.
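    As a quick consistency check on the counts in this paragraph, the response rate and the number of agencies reporting data mining efforts can be worked out directly. This is a sketch that assumes each responding agency falls into exactly one of two groups: those reporting at least one effort and those reporting none.

```latex
\begin{aligned}
\text{response rate} &= \frac{121}{128} \approx 0.945 \approx 95\ \text{percent},\\
\text{agencies reporting efforts} &= 121 - 69 = 52.
\end{aligned}
```

    The result of 52 agrees with the number of departments and agencies that the survey identified as using or planning to use data mining.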

    To obtain information on the characteristics of the identified operational or planned data mining efforts, we conducted structured telephone interviews¹ with the identified system owners or activity managers. The interviews were designed to obtain detailed information about each data mining system, including the purpose and size, the use of personal information, and the use of data from the private sector or other federal organizations. We pretested the structured interview to ensure relevance and clarity.

    We aggregated these data by agency and sent them back to the chief information officer, comparable official, or their designee and asked that they review the characteristics for completeness and accuracy. One of the 52 departments and agencies that reported data mining systems, the Department of Homeland Security, has not responded to our request to review the reported data for completeness and accuracy.

    ¹In a structured interview, the interviewer asks the same questions of numerous individuals or individuals representing numerous organizations in a precise manner, offering each interviewee the same set of possible responses.


    We performed random assessments of the means that these officials used to verify the information. Based on these assessments, we concluded that the agencies' verification methods were reasonable and that, as a result, we could rely on the accuracy of the reported data. We also conducted a search of technical literature and periodicals to develop a list of federal government data mining efforts and then compared the efforts on this list with the data mining efforts reported by federal agencies. If the data mining efforts on our list were not reported on the survey, we contacted the chief information officer or comparable official to determine whether that data mining effort should be included in our survey.

    Because this was not a sample survey, there are no sampling errors. However, the practical difficulties of conducting any survey may introduce errors, commonly referred to as nonsampling errors. For example, difficulties in how a particular question is interpreted, in the sources of information that are available to respondents, or in how the data are entered into a database or analyzed can introduce unwanted variability into the survey results. We took steps in the development of the structured interview, the data collection, and the data analysis to minimize these nonsampling errors. Among these steps, we pretested the structured interview instrument, contacted nonresponding agencies as well as agencies not identifying data mining efforts, and sent the aggregated data to the agency chief information officer for review.

    We conducted our work from May 2003 to April 2004 in accordance with generally accepted government auditing standards.


    Appendix II

    Surveyed Departments and Agencies

    Department of Agriculture

    Agricultural Marketing Service

    Agricultural Research Service

    Animal and Plant Health Inspection Service

    Cooperative State Research, Education, and Extension Service

    Farm Service Agency

    Food and Nutrition Service

    Food Safety and Inspection Service

    Foreign Agricultural Service

    Forest Service

    National Agricultural Statistics Service

    Natural Resources Conservation Service

    Risk Management Agency

    Rural Utilities Service

    Department of Commerce

    Bureau of the Census

    Economic Development Administration

    International Trade Administration

    National Oceanic and Atmospheric Administration

    U.S. Patent and Trademark Office


    Department of Defense

    Missile Defense Agency

    Defense Advanced Research Projects Agency

    Defense Commissary Agency

    Defense Contract Audit Agency

    Defense Contract Management Agency

    Defense Information Systems Agency

    Defense Intelligence Agency

    Defense Legal Services Agency

    Defense Logistics Agency

    Defense Security Cooperation Agency

    Defense Security Service

    Defense Threat Reduction Agency

    Department of the Air Force

    Department of the Army

    Department of the Navy

    National Geospatial-Intelligence Agency

    National Security Agency

    U.S. Marine Corps

    Department of Education


    Department of Energy

    Bonneville Power Administration

    Southeastern Power Administration

    Southwestern Power Administration

    Western Area Power Administration

    Department of Health and Human Services

    Administration for Children and Families

    Agency for Healthcare Research and Quality

    Centers for Disease Control and Prevention

    Centers for Medicare and Medicaid Services

    Food and Drug Administration

    Health Resources and Services Administration

    Indian Health Service

    National Institutes of Health

    Program Support Center

    Department of Homeland Security

    Border and Transportation Security Directorate

    Bureau of Citizenship and Immigration Services

    Emergency Preparedness and Response Directorate

    Information Analysis and Infrastructure Protection Directorate

    Management Directorate


    Science and Technology Directorate

    U.S. Coast Guard

    U.S. Secret Service

    Department of Housing and Urban Development

    Department of the Interior

    Bureau of Indian Affairs

    Bureau of Land Management

    Bureau of Reclamation

    Minerals Management Service

    National Park Service

    Office of Surface Mining Reclamation and Enforcement

    U.S. Fish and Wildlife Service

    U.S. Geological Survey

    Department of Justice

    Bureau of Alcohol, Tobacco, Firearms, and Explosives

    Drug Enforcement Administration

    Federal Bureau of Investigation

    Federal Bureau of Prisons

    U.S. Marshals Service

    Department of Labor

    Department of State


    Department of Transportation

    Federal Aviation Administration

    Federal Highway Administration

    Federal Motor Carrier Safety Administration

    Federal Railroad Administration

    Federal Transit Administration

    National Highway Traffic Safety Administration

    Department of the Treasury

    Bureau of Engraving and Printing

    Bureau of the Public Debt

    Financial Management Service

    Internal Revenue Service

    Office of the Comptroller of the Currency

    Office of Thrift Supervision

    U.S. Mint

    Department of Veterans Affairs

    Veterans Benefits Administration

    Veterans Health Administration

    Agency for International Development

    Central Intelligence Agency

    Corporation for National and Community Service


    Environmental Protection Agency

    Equal Employment Opportunity Commission

    Executive Office of the President

    Export-Import Bank of the United States

    Federal Deposit Insurance Corporation

    Federal Energy Regulatory Commission

    Federal Reserve System

    Federal Retirement Thrift Investment Board

    General Services Administration

    Legal Services Corporation

    National Aeronautics and Space Administration

    National Credit Union Administration

    National Labor Relations Board

    National Science Foundation

    Nuclear Regulatory Commission

    Office of Management and Budget

    Office of Personnel Management

    Peace Corps

    Pension Benefit Guaranty Corporation

    Railroad Retirement Board

    Securities and Exchange Commission


    Small Business Administration

    Smithsonian Institution

    Social Security Administration

    U.S. Postal Service


    Appendix III

    Departments and Agencies Reporting No Data Mining Efforts

    The following 69 departments and agencies reported that they have no operational or planned data mining efforts:

    Department of Agriculture

    Agricultural Marketing Service

    Agricultural Research Service

    Animal and Plant Health Inspection Service

    Cooperative State Research, Education, and Extension Service

    Farm Service Agency

    Foreign Agricultural Service

    Forest Service

    National Agricultural Statistics Service

    Food Safety and Inspection Service

    Department of Commerce

    Economic Development Administration

    Bureau of the Census

    International Trade Administration

    Department of Commerce Headquarters

    National Oceanic and Atmospheric Administration

    Department of Defense

    Defense Contract Audit Agency

    Missile Defense Agency

    Defense Legal Services Agency


    Defense Security Service

    Defense Threat Reduction Agency

    Defense Logistics Agency

    Defense Advanced Research Projects Agency

    Defense Contract Management Agency

    Defense Security Cooperation Agency

    Department of Energy

    Bonneville Power Administration

    Southeastern Power Administration

    Southwestern Power Administration

    Western Area Power Administration

    Department of Health and Human Services

    Centers for Medicare and Medicaid Services

    Administration for Children and Families

    National Institutes of Health

    Indian Health Service

    Department of Homeland Security

    Science and Technology Directorate

    Management Directorate

    Bureau of Citizenship and Immigration Services

    Department of Homeland Security Headquarters


    Department of Housing and Urban Development

    Department of the Interior

    Bureau of Reclamation

    Bureau of Land Management

    U.S. Geological Survey

    Fish and Wildlife Service

    Office of Surface Mining Reclamation and Enforcement

    Bureau of Indian Affairs

    Department of the Interior Headquarters

    Department of Justice

    Bureau of Alcohol, Tobacco, Firearms, and Explosives

    Department of Transportation

    Federal Aviation Administration

    Federal Transit Administration

    Federal Railroad Administration

    Federal Motor Carrier Safety Administration

    Federal Highway Administration

    Department of the Treasury

    Comptroller of the Currency

    Bureau of the Public Debt

    Office of Thrift Supervision


    Department of the Treasury Headquarters

    Bureau of Engraving and Printing

    Agency for International Development

    Executive Office of the President

    Federal Energy Regulatory Commission

    Federal Retirement Thrift Investment Board

    General Services Administration

    Legal Services Corporation

    National Credit Union Administration

    National Labor Relations Board

    National Science Foundation

    Office of Management and Budget

    Peace Corps

    Securities and Exchange Commission

    Smithsonian Institution

    Social Security Administration

    U.S. Postal Service


    Appendix IV

    Inventories of Efforts

    The following tables present selected information from our survey of 128 major federal departments and agencies on their use of data mining. The tables list the purpose of each data mining effort, whether the system is planned or operational, and whether the system uses personal information, data from the private sector, or data from other federal agencies. The survey shows that 52 departments and agencies are using or are planning to use data mining. These departments and agencies reported 199 data mining efforts, of which 68 were planned and 131 were operational.

    Table 2: Department of Agriculture's Inventory of Data Mining Efforts

    Organization/system name | Description | Purpose | Status | Personal information | Private sector data | Other agency data

    Department of Agriculture Headquarters

    Travel Data Mart | Will consolidate employee travel information from financial and travel systems. Will allow for a governmentwide e-travel system and provide the department with information on the financial ramifications of its travel. | Improving service or performance | Planned | Yes | No | No

    Financial Statements Data Warehouse | Is used in the production of consolidated financial statements. Provides information for products that are used to satisfy external reporting requirements, such as Office of Management and Budget and Department of the Treasury requirements. | Financial management | Operational | No | No | No

    Financial Data Warehouse | Is the department's internal financial management reporting system. Data mining is done for ad hoc and on-demand reports. | Financial management | Operational | Yes | No | No

    Food and Nutrition Service

    Grantee Monitoring Activities, Southeast Regional Office | Assists in monitoring the financial status of grant holders. Grantees are required to provide expenditure reports, and analysis is performed quarterly that matches stated draws to the actual draws from the U.S. Treasury. | Improving service or performance | Operational | Yes | No | No


    Grantee Monitoring Activities, Mountain Plains Regional Office | Assists in monitoring the management and distribution of Indian funds for major food benefit programs, such as food stamps, in 10 grantee states. | Improving service or performance | Operational | Yes | No | No

    Grantee Monitoring Activities, Southwest Regional Office | Maximizes on-site monitoring efforts by confirming the accuracy of grantee accounting. Reduces on-site time, maximizes time to complete reviews, and has achieved a 50 percent travel savings. | Improving service or performance | Operational | Yes | No | No

    Grantee Monitoring Activities, Midwest Regional Office | Will be a reporting system to provide reports and automate the audit process. Plans are to acquire data mining tools to review and compare budgets, reports, and plans. | Improving service or performance | Planned | No | No | Yes

    Grantee Monitoring Activities, Northeast Regional Office | Supports on-site reviews of analyses to confirm financial report information. | Improving service or performance | Operational | Yes | Yes | No

    Integrated Program Accounting System Data Integrity | Will create ad hoc reporting centers to validate accounting information. | Improving service or performance | Planned | No | No | No

    Natural Resources Conservation Service

    National Resource Inventory Used for Statistical Analysis of Past Soil Survey Databases | Is a trending database that tracks more than 200 resource issues, such as monitoring erosion. Also processes statistical technology. | Improving service or performance | Operational | No | No | No

    Risk Management Agency

    CAE | Is part of a congressionally mandated project to assist the Risk Management Agency in controlling fraud, waste, and abuse in the Federal Crop Insurance Corporation program. | Detecting fraud, waste, and abuse | Operational | Yes | Yes | Yes

    Source: Department of Agriculture.


    Table 3: Department of Commerce's Inventory of Data Mining Efforts

    Organization/system name | Description | Purpose | Status | Personal information | Private sector data | Other agency data

    U.S. Patent and Trademark Office

    Compensation Projection Model in the Enterprise Data Warehouse | Generates and makes available compensation projection data, both salary and benefits, on current employees and on planned hires. It also accounts for planned attritions. | Managing human resources | Operational | Yes | No | Yes

    Source: Department of Commerce.

    Table 4: Department of Defense's Inventory of Data Mining Efforts

    Organization/system name | Description | Purpose | Status | Personal information | Private sector data | Other agency data

    Defense Commissary Agency

    DeCA Electronic Records Management and Archive System | Will be a corporate information system for managing unstructured data. It will allow for electronic record keeping, document management, and automated receipt processes. | Improving service or performance | Planned | Yes | Yes | Yes

    Corporate Decision Support System/Commissary Operations Management System | Mines data to produce analytical data on commissary operations. Provides information such as what items stores are selling and helps determine whether cashiers are being honest. | Improving service or performance | Operational | No | No | No

    Defense Information Systems Agency

    Enterprise Business Intelligence System | Will replace the current management information environment, which includes operations, reporting, billing, statistics, and other management information activities. | Improving service or performance | Planned | No | No | No


    Defense Intelligence Agency

    Insight Smart Discovery | Will be a data mining knowledge discovery tool to work against unstructured text. Will categorize nouns (names, locations, events) and present information in images. | Analyzing intelligence and detecting terrorist activities | Planned | Yes | No | Yes

    Verity K2 Enterprise | Mines data from the intelligence community and Internet searches to identify foreign terrorists or U.S. citizens connected to foreign terrorism activities. | Analyzing intelligence and detecting terrorist activities | Operational | Yes | Yes | Yes

    PATHFINDER | Is a data mining tool developed for analysts that provides the ability to analyze government and private sector databases rapidly. It can compare and search multiple large databases quickly. | Analyzing intelligence and detecting terrorist activities | Operational | Yes | No | Yes

    Autonomy | Is a large search engine tool that is used to search hundreds of thousands of word documents. Is used for the organization and knowledge discovery of intelligence. | Analyzing intelligence and detecting terrorist activities | Operational | No | No | Yes

    Department of the Air Force

    ANG Data Warehouse Guardian | Will be used to measure military readiness. It incorporates information on all disciplines to provide management information needed to assess military readiness. | Measuring military readiness | Planned | Yes | No | No

    Integrated Space Warfare Center (SWC) Information System | Will be an internal database containing information on all development/execution activities within the SWC. Will be used by all management and analyst personnel to track and align the center's activities to warfighter needs and to report on execution status, financial status, schedule status, and performance measurements. | Improving service or performance | Planned | Yes | No | No

    Safety Automated System (SAS) | Will query databases to find automation mishaps. Governed by Directive 920124 and will allow for the investigation and reporting of identified automation mishaps. | Improving safety | Planned | Yes | No | No


    Enterprise Business System | Will support strategic planning, assist in building scientific and technical budgets for the Air Force, and serve as a launch point for all new programs. Research and development case files will be maintained for 75 years; the activity indexes, catalogs, and tracks these files. | Improving service or performance | Planned | No | No | Yes

    Genomic and Proteomic Results Analysis | Analyzes National Institutes of Health's genetic data. | Analyzing scientific and research information | Operational | No | No | Yes

    IG Corporate Information System | Enhances combat readiness and mission capabilities for Air Combat Command units and commanders. It assists in preparing for and conducting inspections. | Improving service or performance | Operational | Yes | No | No

    Computer Network Defense System | Evaluates network activities to create rules for intrusion detection system signature sets. | Improving information security | Operational | No | No | No

    FAME | Will serve as a central repository for Air Force manpower information. Will track manpower and unit authorization funding. | Managing human resources | Planned | No | No | Yes

    Resource Wizard | Serves as a manpower tracking system. Tracks positions and captures data for specific funding purposes. | Improving service or performance | Operational | No | No | No

    Government Purchase Card | Is used in overseeing purchases made by Air Force personnel with government-provided credit cards. | Detecting fraud, waste, and abuse | Operational | Yes | Yes | No

    Ambulatory Data System Queries | Tracks the initial diagnosis of patients with the results of further testing and diagnosis. Allows for early notification of diseases and injuries. | Monitoring public health | Operational | Yes | No | No

    Modus Operandi Database | Is an investigative tool used to identify and track trends in criminal behavior. It links characteristics of crimes and provides details on crime scenes and other crime factors. | Detecting criminal activities or patterns | Operational | Yes | No | No


    Executive Decision Support System | Takes data from all functional metric balances. Processes charts and graphs to identify trends and to make sure goals are accomplished. | Improving service or performance | Operational | No | No | No

    Inspire | Is a tool that assists in providing a narrative description of all research and development that is being conducted within the Air Force. Provides cost and milestone information on research and development projects. | Performing strategic planning | Operational | Yes | No | Yes

    Discoverer | Is used to manage personnel records, including individual aliases and histories. | Managing human resources | Operational | Yes | No | No

    Requirements and Concepts System | Will serve as a repository for new system projects and system requirements. It will be available for consultation for information on all project requests and identified requirements. | Improving service or performance | Planned | No | No | No

    Business Objects | Is a commercial off-the-shelf tool that is used to analyze and report on human resources activities. | Managing human resources | Operational | Yes | No | Yes

    THRMIS | Uses commercial off-the-shelf software to maintain a data warehouse of integrated inventory and manpower data for the Total Force: active duty (officer and enlisted), Air Force Reserve, Air National Guard, and civilians. Is used to assess and analyze the health of the Air Force. | Managing human resources | Operational | Yes | No | No

    SAS | Is a Web-enabled personnel data system that gives authorized users worldwide the ability to tabulate demographic data on recruitment, promotion, and retention. | Managing human resources | Operational | Yes | No | No

    Oracle HR | Is a personnel management system that manages information for promotions, pay grades, clearances, and other information relevant to human resources. | Managing human resources | Operational | Yes | No | No


    Health Modeling and Informatics Division Data Mart | Provides information and decision support to the Air Force headquarters surgeon general for decision making, policy development, and resource allocation. It also provides performance information and analysis to medical field units in support of performance measurement objectives. | Improving service or performance | Operational | Yes | No | No

    FIRST EDV (BRIO) | Will deal with Air Force budgets and other components of its financial environment. Historical analyses and trend analyses will be performed on the budget process. | Improving service or performance | Planned | No | Yes | No

    IG World | Is used to store and track data and requirements, such as lodging and augmentee requirements, for the PAC inspector general. | Improving service or performance | Operational | Yes | No | No

    Department of Defense Headquarters

    Automated Continuing Evaluation System | Will be used to improve personnel security continuing evaluation efforts within the Department of Defense (DOD) by identifying issues of security concern between the normal reinvestigation cycle for those who hold DOD security clearances and have signed a consent form that is still in effect. | Managing human resources | Operational | Yes | Yes | Yes

    Department of the Navy

    Human Resource Trend Analysis | Is used to improve Navy readiness. Data on personnel manning levels are mined to ensure that each Navy unit has the correct number of training personnel aboard. | Managing human resources | Operational | No | No | No


    U.S. Naval Academy | Allows for the assessment of academic performance of midshipmen. It includes demographic information, information on grades, participation in sports, leadership positions, etc. It is an extension of the registrar's system and is mined for comparisons and trends. | Managing human resources | Operational | Yes | No | No

    Navy Training Master Planning System | Provides overall Navy training information to assist in delivering Navy training in the most efficient manner. Pertinent data from multiple databases are consolidated into a single database that is mined. | Managing human resources | Operational | Yes | Yes | No

    DHAMS Multidimensional Cubes | Is a database that contains information on the time and attendance of 3,000 mariners across 120 ships. Allows managers to look at what people were doing at a particular time, to look across the fleet as a whole, and to compare ship activities. | Improving service or performance | Operational | No | No | No

    National Cargo Tracking Plan, Cargo Tracking Division | Is used to conduct predictive analysis for counterterrorism, small weapons of mass destruction proliferation, narcotics, alien smuggling, and other high-interest activities involving container shipping activity. | Analyzing intelligence and detecting terrorist activities | Operational | No | Yes | No

    Supply Management System Multidimensional Cubes | Reduces downtime on ships by allowing for the analysis of ship parts information. The data warehouse contains data on every part that has been ordered since the 1980s and has multidimensional information on each part. Failure rates can be calculated and improvements can be identified. | Improving service or performance | Operational | No | No | No


    Type Commanders Readiness Management System
    Is designed to provide a fully integrated environment for online analytical processing of readiness indicators. Examples of readiness indicators include status of supplies available, equipment in operation, health status, and capabilities of the crew.
    Measuring military readiness
    Operational No No Yes

    FATHOM (APMC Human Resources)
    Will be an internal program and project tool used to improve staffing, recruiting, and managing day-to-day operations.
    Managing human resources
    Planned Yes No No

    Navy Training Quota Management System
    Is used for planning and forecasting training needs based on skill requirements.
    Improving service or performance
    Operational No No Yes

    National Geospatial-Intelligence Agency

    OLAP (On-Line Analytical Processing)
    Will provide aggregations of imagery system performance data for management officers and senior source decision makers to characterize system performance and contribution to intelligence issues of national priority.
    Improving service or performance
    Planned No No No

    CITO Data Mining
    Will evaluate and identify imagery system performance trends for optimization, monitoring, or reengineering.
    Improving service or performance
    Planned No No No

    Information Relevance Prototype
    Will establish an information relevancy prototype to serve as a framework for community evaluation of commercial information relevance approaches, methods, and technology. The term information relevance refers to the ability of users to receive or extract, then display and describe, information with measurable satisfaction according to their need.
    Improving service or performance
    Planned No No No

    U.S. Marine Corps

    Operational Data Store Enterprise
    Is used for workforce planning.
    Managing human resources
    Operational Yes No No


    Global Combat Support Systems Marine Corps
    Will be a physical implementation of the IT enterprise architecture designed to support both improved and enhanced marine air/ground task force combat service support functions and commander and combatant commander joint task force combatant support information requirements. Data mining will allow for interoperability with legacy Marine Corps systems and allow for a shared data environment.
    Improving service or performance
    Planned No Yes No

    Total Force Data Warehouse
    Is a system whose primary purpose is workforce planning and workforce policy decision making. It contains current (after 30 days) and historical workforce data.
    Managing human resources
    Operational Yes No No

    Marine Corps Recruiting Information Support System
    Is a Web-based information system used for managing assets and tracking enlisted and officer accessions into the Marine Corps.
    Managing human resources
    Operational Yes No No

    Source: Department of Defense.


    Table 5: Department of Education's Inventory of Data Mining Efforts

    Organization/system name; Description; Purpose; Status; Features: Personal information, Private sector data, Other agency data

    Citizenship of PLUS Loan Borrowers National Student Loan Data Systems
    Looks for issues regarding citizenship among its PLUS loan borrowers. Flags records based on selected criteria and requests additional information from schools.
    Improving service or performance
    Operational Yes Yes Yes

    Foreign Schools Initiatives National Student Loan Data System/Central Processing
    Is a proactive investigation effort that looks at whether financial aid was granted to individuals attending foreign institutions during periods of nonenrollment.
    Detecting criminal activities or patterns
    Operational Yes No Yes

    Professional Judgment Practices: Title IV Pell Grants, National Student Loan Data
    Used to determine when professional judgment has been exercised for special situations where families cannot afford college expenses.
    Improving service or performance
    Operational Yes Yes Yes

    Title IV Applicant Death Database Match
    Compares Department of Education data with the Social Security Administration's death database to detect fraud or criminal activity.
    Detecting fraud, waste, and abuse
    Operational Yes No Yes

    Title IV Loans with No Applications
    Will compare information from the Free Application for Federal Student Aid Program with the Federal Family Education Loan Program to identify fraud.
    Detecting fraud, waste, and abuse
    Planned Yes No No

    OIG Project Strikeback
    Compares Department of Education and Federal Bureau of Investigation data for anomalies. Also verifies personal identifiers.
    Analyzing intelligence and detecting terrorist activities
    Operational Yes No Yes

    Accuracy of U.S. Department of Education Personal Data
    Audits and verifies personal information that is contained in the Department of Education's personal data system.
    Detecting fraud, waste, and abuse
    Operational Yes No Yes

    Impact of Cohort Default Rate Redefinition National Student Loan Data System
    Audits data to determine the impact of legislation that extended the college loan repayment default period from 180 to 270 days.
    Legislative impact
    Operational Yes No No


    CheckFree Software/Purchase Card Program
    Takes monthly billing information from the Bank of America to create reports on purchases, purchase quantity, and frequency of purchases. Data are mined for instances of fraud or abuse.
    Detecting fraud, waste, and abuse
    Operational Yes Yes No

    Improper Pell Grant Payment Activity
    Will compare Pell Grants issued with the amounts received and look at the eligibility of grant recipients.
    Detecting fraud, waste, and abuse
    Planned Yes No No

    Title IV Identity Theft Initiative
    Helps identify patterns and trends in identity theft cases involving loans for education. Provides an investigative resource for victims of identity theft.
    Detecting criminal activities or patterns
    Operational Yes No No

    Title IV Applicant Use of Multiple Addresses/Central Processing System
    Reviews addresses listed on Title IV applications to see if they are valid. For example, jails or employment addresses are not considered valid addresses.
    Improving service or performance
    Operational Yes No Yes

    Lapsed Funds/Improper Draw of Federal Grant Proceeds
    Identifies funds that remain in the grants and payment processing system beyond the time period for allocating the funds.
    Improving service or performance
    Operational No No No

    Decision Support System with Online Analytical Processing Query
    Will support the department's performance-based initiative. Will allow custom queries of schools from state and local databases for demographics and test scores.
    Improving service or performance
    Planned No No No

    Grant Administration and Payment System
    Assists in managing grant activities and aids in detecting instances of fraud or abuse in grant activities.
    Detecting fraud, waste, and abuse
    Operational Yes Yes Yes

    Budget Execution Support
    Uses information in the National Student Loan Data System and a sample drawn from it to estimate cohort distributions for financial activities related to the Federal Family Education Loan Program pursuant to the Credit Reform Act.
    Financial management
    Operational Yes No No

    Pell Grant Model Assumptions
    Provides estimates on the total cost of the Pell Grant program. It uses data from previous years and makes assumptions for future years.
    Financial management
    Operational No No No


    National Student Loan Data System
    Compiles student loan information from the guaranteeing agencies. Is used for eligibility tracking and to calculate default rates.
    Detecting fraud, waste, and abuse
    Operational Yes No Yes

    Loan Model Assumptions
    Estimates the cost of loan programs. Also analyzes loan default behavior.
    Financial management
    Operational Yes No Yes

    Office of the Inspector General (OIG) Projects: Tumbleweed/Snowball
    Is part of an OIG investigation to determine potential fraud of financial aid grants primarily in New Hampshire.
    Detecting criminal activities or patterns
    Operational Yes No Yes

    Central Processing System
    Processes applications for student aid. Contains data on more than 13 million applications. Data are mined for demographic trends.
    Detecting fraud, waste, and abuse
    Operational Yes No No

    Direct Loan Services System
    Is used to track the life of student direct loans and to monitor loan repayments.
    Improving service or performance
    Operational Yes Yes Yes

    CheckFree Software/Travel Card Program
    Uses monthly billing information from Bank of America to create reports on travel expenditures to look for improper use of travel cards.
    Detecting fraud, waste, and abuse
    Operational Yes Yes No

    Source: Department of Education.


    Table 6: Department of Energy's Inventory of Data Mining Efforts

    Organization/system name; Description; Purpose; Status; Features: Personal information, Private sector data, Other agency data

    Counterintelligence Automated Investigative Management System (CI-AIMS)
    Is an investigative management system used by Department of Energy (DOE) field sites to track investigative cases on individuals or countries that threaten DOE assets. Information stored in this database is also used to assist federal and state law enforcement agencies in support of national security.
    Detecting criminal activities or patterns
    Operational Yes No No

    Autonomy
    Will be used to mine a myriad of intelligence-related databases within the intelligence community to uncover criminal or terrorist activities relating to DOE assets.
    Detecting criminal activities or patterns
    Planned Yes No No

    Counterintelligence Analytical Research Data System (CARDS)
    Is used to log briefings and debriefings given to DOE employees who travel to foreign countries or interact with foreign visitors to DOE facilities. Data are mined to identify potential threats to DOE assets.
    Detecting criminal activities or patterns
    Operational Yes No Yes

    Source: Department of Energy.


    Table 7: Department of Health and Human Services' Inventory of Data Mining Efforts

    Organization/system name; Description; Purpose; Status; Features: Personal information, Private sector data, Other agency data

    Agency for Healthcare Research and Quality

    National Patient Safety Network
    Will contain reports on adverse medical events that are filed by hospitals. The planned network's purpose is to remove patient personal identifiers and other items that may violate certain rules and create a warehouse that can be used by registered and unregistered users to evaluate and implement patient safety and quality measures. The network will be used to create tools that hospitals can use for making quality improvements.
    Improving service or performance
    Planned No No No

    Centers for Disease Control and Prevention

    BioSense
    Enhances the nation's capability to rapidly detect bioterrorism events.
    Analyzing intelligence and detecting terrorist activities
    Operational No Yes Yes

    Department of Health and Human Services Headquarters

    DHHS Blood Monitoring Program
    Monitors the country's blood supply by keeping an inventory of red blood cells and platelets, and tracks blood supply shortages, including their nature and size.
    Monitoring public health
    Operational No Yes No

    Food and Drug Administration

    Mission Accomplishment and Regulatory Compliance Services System
    Is a comprehensive redesign and reengineering of two core mission-critical legacy systems at the Food and Drug Administration (FDA) that support the regulatory functions that primarily take place in FDA's field offices.
    Monitoring food or drug safety
    Operational No Yes Yes


    Turbo Establishment Inspection Report
    Provides a standardized database of citations of regulations and statutes and helps investigators in preparing reports. It will collect data on specific observations uncovered during inspections and provide a more uniform format nationwide that will allow for electronic searches and statistical analysis to be performed by citation.
    Improving safety
    Operational No Yes No

    Phonetic Orthographic Computer Analysis
    Is a search engine that provides results indicating how similar two drug names are on a phonetic and orthographic basis. Its purpose is to help in the safety evaluation of proposed proprietary names to reduce drug name confusion after an application is approved by the FDA.
    Improving safety
    Operational No Yes No

    MPRIS Data Warehouse
    Will provide data to support end-user ad hoc query analysis and standard reporting needs. It will provide the foundation for a central reporting repository that can be used to populate business-specific data marts.
    Improving service or performance
    Planned No No No

    Development and Deployment of Advanced Analytical Tools for Drug Safety Risk Assessment
    Will develop advanced software tools for quantitative analysis of drug safety data. Medical officers and safety evaluators will use these tools.
    Analyzing scientific and research information
    Planned Yes Yes Yes

    Add data mining capability to CFSAN Adverse Event Reporting System
    Is a comprehensive system for tracking, reviewing, and reporting adverse event incidents involving foods, cosmetics, and dietary supplements. Integrating and centralizing the system and eliminating patchwork systems make information on these adverse events available to federal, state, and local governments as well as to industry and the public in a more timely and efficient manner.
    Monitoring food or drug safety
    Planned Yes Yes Yes


    Health Resources and Services Administration

    HRSA Geospatial Data Warehouse
    Is a data warehouse that primarily collects programmatic, demographic, and statistical data.
    Improving service or performance
    Operational No Yes Yes

    Program Support Center

    Employee Assistance Program Analysis
    Uses information from a database of employee assistance program case information that does not contain client personal identifiers. Data are mined for quality assurance and program management information that is used to enhance the quality and cost effectiveness of services.
    Improving service or performance
    Operational No No No

    Source: Department of Health and Human Services.

    Table 8: Department of Homeland Security's Inventory of Data Mining Efforts

    Organization/system name; Description; Purpose; Status; Features: Personal information, Private sector data, Other agency data

    Border and Transportation Security Directorate

    Workforce Profile Data Mart
    Contains payroll and personnel data and is mined for workforce trends.
    Managing human resources
    Operational Yes No Yes

    Customs Integrated Personnel Payroll System Data Mart
    Is a Customs data mart contained within the Department of Homeland Security's workforce profile data mart. Personnel and payroll data are mined for workforce trends.
    Managing human resources
    Operational Yes No Yes

    Internal Affairs Treasury Enforcement Communications System Audit Data Mart
    Assists the Internal Affairs group by mining criminal activity data to ascertain how Customs employees are using the Treasury Enforcement System.
    Detecting criminal activities or patterns
    Operational Yes No Yes

    Operations Management Reports Data Mart
    Assists in managing the operation of all ports of entry for incoming carriers, people, and cargo. Helps in making resource (people and equipment) allocation and operational improvement decisions.
    Improving service or performance
    Operational No No Yes


    Automated Export System Data Mart
    Mines data on export trade in the U.S. and produces reports on historical shipping and receiving trends.
    Improving service or performance
    Operational No Yes Yes

    Seized Property/Forfeitures, Penalties, and Fines Case Management Data Mart
    Mines data to ensure data quality and review work assignments. The system has two components: one that processes legal cases like a law firm, and a second that serves as property and inventory control by tracking seized property.
    Improving service or performance
    Operational Yes No No

    Incident Data Mart
    Will look through incident logs for patterns of events. An incident is an event involving a law enforcement or government agency for which a log was created (e.g., traffic ticket, drug arrest, or firearm possession). The system may look at crimes in a particular geographic location, particular types of arrests, or any type of unusual activity.
    Analyzing intelligence and detecting terrorist activities
    Planned Yes Yes Yes

    Case Management Data Mart
    Assists in managing law enforcement cases, including Customs cases. Reviews caseloads, status, and relationships among cases.
    Analyzing intelligence and detecting terrorist activities
    Operational Yes Yes Yes

    Emergency Preparedness and Response Directorate

    Enterprise Data Warehouse
    Will take data from multiple, disparate systems and integrate the data into one reporting environment. The objective of the effort is to allow for the reduction of data within the agency and to provide an enterprise view of information necessary to drive critical business processes and decisions. Data on internal human resources, all aspects of disaster management, infrastructure, equipment location, etc., will be used.
    Disaster response and recovery
    Planned Yes Yes Yes

    Information Analysis and Infrastructure Protection Directorate

    Analyst Notebook I2
    Correlates events and people to specific information.
    Analyzing intelligence and detecting terrorist activities
    Operational Yes Yes No


    Automatic Message Handling System (Verity)
    Automatically takes messages from external agencies and routes them to appropriate recipients.
    Analyzing intelligence and detecting terrorist activities
    Planned No No Yes

    U.S. Coast Guard

    Readiness Management System
    Assists in ensuring readiness for all Coast Guard missions.
    Improving service or performance
    Operational Yes No No

    CG Info
    Provides one-stop shopping for Coast Guard information. It is the central location and common interface for the entire Coast Guard to gain near real-time access to data from multiple, disparate Coast Guard information systems. It provides a single interface for users to view mission-critical support data.
    Improving service or performance
    Operational Yes No Yes

    U.S. Secret Service

    Criminal Investigation Division Data Mining
    Mines data in suspicious activity reports received from banks to find commonalities in data to assist in strategically allocating resources.
    Detecting criminal activities or patterns
    Operational Yes No Yes

    Source: Department of Homeland Security.


    Table 9: Department of the Interior's Inventory of Data Mining Efforts

    Source: Department of the Interior.

    Organization/system name; Description; Purpose; Status; Features: Personal information, Private sector data, Other agency data

    Minerals Management Service

    Data Mining of the Technical Information Management System (TIMS) Database
    Is a corporate database for oil and gas leases. The database is mined in support of policy development. One area of data mining is identification of leases that will be abandoned in the near future. Data mini