WHITE PAPER

Text Mining for Safety
Develop your untapped reserves of unstructured data for health, safety and environmental improvements

Text Mining for Safety · 2017-11-28 · TEXT MINING FOR SAFETY ii Content for this paper, Data Mining for Safety, was provided by Bill Tuzin, Principal Consultant and Solutions Architect



Table of Contents

Executive summary
Current challenges
Technology for achieving significant improvement
    Improve the quality of data collected
    Increase the utilization of the incident management system
    Automate manual processes
    Use data-based analytics to support decisions
Conclusion
About SAS


Content for this paper, Data Mining for Safety, was provided by Bill Tuzin, Principal Consultant and Solutions Architect for the Oil and Gas Industry Sales Group at SAS. This paper is a result of his contributions to the Society of Petroleum Engineers.


Executive summary

To ensure the overall health of the enterprise and its safety and environmental conditions, companies prepare and file detailed incident (and near-incident) reports. Monitoring these events and performing timely analysis leads to effective remediation efforts. Many incident management systems are cumbersome and unavailable at the job site, causing some events to go undocumented. In addition, event details that could help reduce future occurrences are often not captured or reviewed. All of these conditions reduce the value and quality of the information entered into the system, making it harder to maintain a safe workplace. Proven technology provided by SAS can help you take prompt action by easing the burden of capturing incident data, improving the quality of the data, and speeding analysis and management notification.

Current challenges

Good corporate citizenship requires effective programs to maintain the highest standards of employee health, safety and environmental (HSE) protection. This effort is especially intense for oil and gas companies because of the risks inherent in the exploration, refining and distribution of petrochemical products. Many policies, practices and systems are implemented to protect employees and the environment from harm. The HSE performance of the enterprise is monitored by the incident management systems and hazard observation programs (IM/HO). These programs generally capture workers’ input as it applies to incidents; incidents without consequence; and observations of behaviors and hazardous conditions.

In addition to reporting on past HSE performance and alerting management to new hazards, IM/HO programs serve to prevent incidents by reinforcing worker behavior regarding hazard awareness. HSE professionals monitor the volume of IM/HO input in the belief that a drop-off in volume indicates poor behavior that will result in a significant incident. This is based upon industrial safety theory, which holds that each significant accident is preceded by upward of 600 less-significant incidents.1 Thus, to be effective, IM/HO must have high volumes of input from the field. Only a few organizations have implemented IM/HO that supports both high-transaction volume and data that can be analyzed for new hazards. Many factors inhibit the efficacy of these systems, including:

• Cumbersome data entry. Structured data collected through check boxes or “yes” or “no” fields is easy to analyze and report on. But filling out lengthy questionnaires with many structured data fields makes the data entry process very cumbersome.

1 Bird, Frank E. and George L. Germain. Loss Control Management: Practical Loss Control Leadership. Revised Edition. Det Norske Veritas (USA) Inc.


• Data entry skills. Workers with the greatest exposure to risks are often the least skilled when it comes to data entry. Many times it is the job of supervisors and managers to perform the data entry task. This practice delays the recording of information in the system. It also means that the official record of the incident is a second-hand account.

• Workplace environment. The environments with the greatest exposure to risks are usually the least suitable environments for data entry into computer terminals. Paper-based systems are suitable for data capture but seldom get digitized for inclusion in analysis and detailed reporting.

• Confusing workflow. Online systems often require numerous screens of data entry, which can overwhelm the occasional user.

• User frustration. When data input users get frustrated, they try to complete the task at hand as fast as possible. This is often accomplished by entering data as quickly as possible, taking as many shortcuts as allowed by the system’s workflow.

• Data quality. Systems typically do not cross-check for accuracy between the structured and unstructured data. Thus, an injury incident accurately described in the text field may be misclassified as an environmental spill in the structured field.

• Incident system limitations. Some systems allow only a single incident consequence to be specified and secondary consequences do not get recorded. For example, a truck accident with an injury and fuel spill may be classified as an injury incident. The environmental aspect would not appear in standard reports driven by the structured data.

• Delays in data capture. It is common for IM/HO data to be entered at the end of a shift, week or month. Delays in data entry impede analysis and remediation response.

• Recollection decay. The longer the time lapse between the incident and the data input, the more details are forgotten and go unrecorded.

• Delays in analysis. Prompt data entry allows prompt analysis and faster identification of new trends and new risks. Some companies, however, can compile and report incident statistics only on a quarterly basis.


Technology for achieving significant improvement

The risk mitigation decision process depends heavily on worthwhile data and diligent analysis of that data to determine future risk mitigation actions. The key underlying principle is that prompt action requires prompt reporting and analysis to identify new issues and trends. Success is hampered by the many causes identified above that affect timeliness, quality and user participation. SAS has proven technology that can make HSE programs more effective by improving the quality of data collected, increasing the utilization of the incident management system, automating manual processes and using data-based analytics to support decisions.

Improve the quality of data collected

The IM/HO system must strike a fine balance between ease of use and collection of useful data. User input screens that present many structured data fields, with select-from-list input options, are viewed by many users as cumbersome. Those fields that are not mandatory often go unused, resulting in an incomplete description of the incident. Of course, those fields could be made mandatory but that would further affect the frequency of use and system adoption in the field. Systems that emphasize free-form text and minimize structured data fields are perceived as being user friendly, but place burdens on HSE professionals to analyze and classify the incident based upon the user’s textual input.

The ideal solution would relieve the data input user’s burden to specify many structured data fields, while also providing the structured data definitions that are needed by HSE personnel for reporting and analysis. SAS has proven technologies that can interpret textual data reliably to determine the appropriate values for the structured data fields. This capability has been honed in numerous installations where customer complaints are analyzed and identification of emerging trends happens up to 70 percent faster than with traditional methods.

Real-world analysis of a sampling of approximately 1,000 incident records provided more than 90 percent success in projecting the classification of incident severity. Additional system tuning may provide similar results for data fields such as the nature of damages, probable damage estimates, asset IDs, etc. These capabilities would free the incident-creation user from specifying all of the structured data fields that HSE professionals rely on, resulting in a system that is user-friendly.
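To make the idea of predicting a structured field from free text concrete, the following is a minimal sketch of a multinomial naive Bayes classifier written in plain Python. It is not the SAS method, which this paper does not describe in detail. The training records, the "high" and "low" severity labels, and all function names are invented for illustration, and a usable model would be trained on far more records, such as the roughly 1,000 mentioned above.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(records):
    """Count words per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in records:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the most probable label, using Laplace (add-one) smoothing."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training records, invented for this sketch.
TRAINING = [
    ("worker hospitalized after fall from scaffold", "high"),
    ("fracture to arm, worker hospitalized", "high"),
    ("minor cut to finger, first aid given", "low"),
    ("small bruise, first aid given on site", "low"),
]

model = train(TRAINING)
suggested = predict(model, "worker hospitalized with fracture")
print(suggested)
```

In a workflow like the one described here, the predicted label would be offered as a suggestion for the structured field rather than written to the record without review.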

Because SAS® software is available as a callable service (SaaS), this interpretation function can be integrated into the workflow processes of many incident management systems. Thus, a first report can be created and interpreted, and structured data elements can be suggested, before the first approval step of the incident workflow process.


SAS® data mining capabilities were developed to analyze both structured and unstructured data to predict an outcome. The greatest success has been in the financial arena where SAS models provide real-time credit scoring. Similar real-time scoring could be applied to new incidents recorded into the incident management system. Records that do not meet the minimum criteria in the textual description would be met with a request to amend the input. For example, trip-and-fall incidents would be examined to ensure the objects that caused the trips are identified. An incident record that states “Harry tripped and fell” would be scored as unacceptable, and the user would be prompted to identify what Harry tripped over. The prompt evaluation of text would allow follow-up questions to be asked quickly before the passage of time diminishes a person’s recollection, resulting in a higher quality of incident data.
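The trip-and-fall example can be sketched as a simple rule-based check. This is a deliberately simplified stand-in for the statistical scoring described above: the cue words, the preposition pattern and the function name `review_description` are all assumptions made for illustration, not part of any SAS product.

```python
import re

# Hypothetical cue words; a real deployment would derive these from prior records.
TRIP_CUES = re.compile(r"\b(tripped|trip|slipped|stumbled)\b", re.IGNORECASE)

# Assumption: an object is considered named when a preposition such as
# "over" or "on" is followed by at least one more word.
OBJECT_CUES = re.compile(r"\b(over|on|against)\s+\w+", re.IGNORECASE)


def review_description(text):
    """Return (acceptable, follow_up_question) for an incident description."""
    if TRIP_CUES.search(text) and not OBJECT_CUES.search(text):
        return False, "What did the person trip over?"
    return True, None


print(review_description("Harry tripped and fell"))
print(review_description("Harry tripped over a hose and fell"))
```

Running the check at the moment of entry is what allows the follow-up question to be asked while the details are still fresh.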

SAS analytical models can analyze textual data and determine structured data field values, which can validate the user-specified structured data field values. For example, if the textual description of an incident indicated a release of product A, then the models would validate that product A was specified in the structured data fields for environmental release information.
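The product A example amounts to comparing what the text mentions with what the structured fields record. A minimal sketch follows, assuming a hypothetical list of known product names and simple case-insensitive substring matching; a production model would need more robust entity recognition, since substring matching can confuse names that share a prefix.

```python
# Hypothetical product list, invented for this sketch.
KNOWN_PRODUCTS = ["product A", "product B"]


def find_mentioned_products(description, known_products):
    """Return the known product names that appear in the description."""
    text = description.lower()
    return {p for p in known_products if p.lower() in text}


def validate_release_field(description, structured_products, known_products):
    """Return products mentioned in the text but missing from the structured field."""
    mentioned = find_mentioned_products(description, known_products)
    recorded = {p.lower() for p in structured_products}
    return sorted(p for p in mentioned if p.lower() not in recorded)


description = "Valve failure caused a release of Product A near tank 4"
missing = validate_release_field(description, [], KNOWN_PRODUCTS)
print(missing)
```

An empty result means the structured field agrees with the text; a non-empty result would prompt the user to confirm or correct the release information.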

Figure 1: Portal technology provides rapid access to summary data and detailed reports.

Increase the utilization of the incident management system

For many systems, witness statements are collected on paper and scanned images of the documents are attached to the incident record. The contents of these statements are not analyzed other than through the diligence of the HSE professional monitoring the incident. Optical character recognition (OCR) software can digitize the contents of the witness statements to allow data mining to be applied to the statements, thus providing additional input for further analysis.


A more user-friendly data capture system will promote additional use by field personnel. For paper-based systems, the data contained on those documents can be digitized by using OCR software. Once digitized, the input would be available for analysis and reporting. Users who see management respond to the input they provided will be encouraged to continue to use the system.

The value is a consistent, automated process to analyze and classify hazard observations. These same observations precede more significant incidents that HSE professionals monitor to create a safer workplace.

Automate manual processes

For many manufacturing and service companies, customer complaints provide the feedback that drives decisions for product quality. This is analogous to monitoring the IM/HO data as a means to measure the efficacy of preventive programs and managed systems. Customer complaints and issues are handled by call centers where the conversations are recorded. These are the calls on which you hear that “your conversation is being recorded for quality purposes.” The quality purpose is generally tied directly to textual data mining activities that analyze the conversations to discover new quality issues.

Voice capture of conversations is performed by vendors, including CallMiner, Verint, NICE Systems, Witness Systems and more. SAS can read the audio output from any system you might be using once the audio signal is converted to textual format. The information provided by the voice capture includes the categories created by phonetic index search, metadata about the call and the call transcriptions.2 Once transcribed, SAS Text Miner can interpret the transcription to determine the classifications and categories for the incident.

2 Tune into the Voice of Your Customer with Voice Mining. A SAS white paper.


Figure 2: Analysis of conversations via SAS Text Miner.

For IM/HO systems, this voice-recording capability could be used as an alternative to the computer data input screens where the identification of hazards and creation of incident records (First Report) are accomplished. Users would call into an emergency response center where a few prompts would collect critical information, such as who is reporting and where they are located, prior to allowing the user to explain the details of the incident.

Transcription will make the recording available for processing by SAS Analytics, where a company-specific taxonomy is used to interpret colloquialisms. Models, built upon analysis of prior records, would predict the structured data elements based upon the interpretation of the text. Once classified, the IM/HO system can fulfill its normal workflow process. Incomplete records, or recordings that could not be interpreted, would trigger an alert to get HSE personnel involved right away.
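One simple way to picture a company-specific taxonomy is as a mapping from colloquial terms to standard terms that is applied to the transcript before classification. The sketch below assumes such a mapping; the entries and the function name `normalize` are invented for illustration and do not come from any real taxonomy.

```python
import re

# Hypothetical taxonomy entries (colloquial term -> standard term),
# invented for this sketch.
TAXONOMY = {
    "cherry picker": "aerial lift",
    "doghouse": "driller's cabin",
    "roughneck": "drilling crew member",
}


def normalize(transcript, taxonomy=TAXONOMY):
    """Replace colloquial terms with standard terms, longest phrases first."""
    result = transcript
    for term in sorted(taxonomy, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        result = pattern.sub(taxonomy[term], result)
    return result


print(normalize("The roughneck slipped near the Doghouse"))
```

Replacing longer phrases first avoids a short entry rewriting part of a longer one before the longer one has been matched.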


Figure 3: Measure change by comparing periodic frequency of term usage.

One of the processes applied to textual data is the grouping of like records into clusters. Less informative words such as “and,” “in,” “the,” “of” and “to” are ignored and emphasis is placed on more useful words, such as common and proper nouns and verbs. The resulting clusters provide a synopsis of the content of the textual data fields. Thousands or tens of thousands of records can be synopsized into a few clusters, yielding an insight that can be digested quickly by the HSE professional. As new incidents are analyzed and compared to historical clusters, any shift in the clusters will signify changing trends or new issues that warrant attention. The clustering process is automated and reveals the key terms used in the text. The result is the equivalent of the first level of a root-cause analysis without any manual involvement in the process.
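Full clustering is beyond a short example, but two of the ideas above can be sketched directly: ignoring less informative words, and detecting a shift by comparing term usage between periods. The sketch below is a simplified stand-in for cluster comparison, not the SAS clustering process. The stop-word list extends the five words named above with a few assumed additions, and the threshold and sample records are invented.

```python
import re
from collections import Counter

# The first five words are named in the text; the rest are assumed additions.
STOP_WORDS = {"and", "in", "the", "of", "to", "a", "on", "was", "from"}


def term_frequencies(records):
    """Return each term's share of all non-stop-word tokens in the records."""
    counts = Counter()
    for text in records:
        counts.update(
            w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS
        )
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()} if total else {}


def rising_terms(historical, current, min_increase=0.1):
    """Return terms whose share rose by at least min_increase, largest first."""
    old, new = term_frequencies(historical), term_frequencies(current)
    changes = {t: new[t] - old.get(t, 0.0) for t in new}
    return sorted(
        (t for t, d in changes.items() if d >= min_increase),
        key=lambda t: (-changes[t], t),
    )


# Hypothetical incident descriptions for two reporting periods.
historical = ["worker slipped on the stairs", "worker slipped in the yard"]
current = ["hose leak in the yard", "leak from the hose", "hose leak on stairs"]
print(rising_terms(historical, current))
```

Terms that rise sharply between periods are candidates for the changing trends or new issues that warrant attention.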

A best practice developed by SAS involves the continual monitoring of this information and providing portals where this information is accessed by users. The portals use graphical displays, graphs and reports complete with the capability to drill down to examine the record details. Reports contrasting current cluster contents with historical trends would pinpoint changing conditions.


Figure 4: Dashboard of cluster analysis from the incident description field.

Use data-based analytics to support decisions

Incident management systems monitor the health of the enterprise and identify current risk exposures. Both provide critical input into decisions regarding actions that will be taken to mitigate risk. As new risks are identified, mitigation steps may include revisions to existing programs or adding new preventative programs. Existing risk mitigation programs are continually monitored to measure the effectiveness of the program. SAS analytical capabilities assist in deciding what actions should be taken, what preventative programs should be deployed and what changes should be made to existing programs.

A risk-reduction program common to the oil and gas industry is the use of audits to measure preparedness and awareness of hazards. Audits that are not focused on the current hazards of the enterprise are a failure and undermine overall HSE programs. SAS Text Miner can analyze the topics and topic emphasis within audits, as well as the makeup of the current risks facing the enterprise as reflected in the incident management system. Discrepancies between the emphasis of the audits and the current risks indicate either misplaced audit emphasis or new risk exposures. In either case, an adjustment is needed to increase the value of the audit as a risk mitigation process.
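Once topics have been extracted from both sources, the comparison of emphasis reduces to comparing two sets of proportions. The sketch below assumes that topic counts are already available for audits and for incidents; the topic names, counts, threshold and function names are invented for illustration.

```python
def emphasis(topic_counts):
    """Convert raw topic counts into shares that sum to 1."""
    total = sum(topic_counts.values())
    if total == 0:
        return {}
    return {topic: n / total for topic, n in topic_counts.items()}


def emphasis_gaps(audit_counts, incident_counts, threshold=0.15):
    """Return topics whose audit and incident shares differ by more than threshold.

    A positive gap means incidents emphasize the topic more than audits do.
    """
    audits, incidents = emphasis(audit_counts), emphasis(incident_counts)
    gaps = {}
    for topic in sorted(set(audits) | set(incidents)):
        gap = incidents.get(topic, 0.0) - audits.get(topic, 0.0)
        if abs(gap) > threshold:
            gaps[topic] = round(gap, 2)
    return gaps


# Hypothetical topic counts, invented for this sketch.
audit_counts = {"falls": 50, "spills": 30, "vehicles": 20}
incident_counts = {"falls": 20, "spills": 30, "vehicles": 50}
print(emphasis_gaps(audit_counts, incident_counts))
```

In this invented data, audits concentrate on falls while incidents concentrate on vehicles, which is the kind of discrepancy that would call for an adjustment.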

An outcome of the incident management system is often a corrective action. In many cases, the incident management system and corrective action system may be totally separate systems or may be segmented, depending on the type of corrective action that is created. SAS can unify these disparate data sources to provide unified reporting that also can be provided to users via the portal.


Text mining should also be applied to the corrective action records and root cause systems for the purpose of identifying consistency between the incident, the root causes and the corrective actions. Many times the corrective actions do not address the root cause of the incident, resulting in a repeat incident. Better alignment of the content of those three systems for a single incident will decrease the likelihood of a repeat incident.
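A crude way to gauge alignment between the three records of one incident is to measure how many terms they share. The sketch below uses Jaccard similarity over non-stop-word terms. This measure is an assumption chosen for simplicity, not the method used by SAS, and the stop-word list and sample records are invented.

```python
import re

# Assumed stop-word list for this sketch.
STOP_WORDS = {"and", "in", "the", "of", "to", "a", "on", "was", "by"}


def terms(text):
    """Return the set of non-stop-word tokens in the text."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}


def jaccard(a, b):
    """Jaccard similarity of two term sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def alignment(incident, root_cause, corrective_action):
    """Return pairwise term overlap between the three records of one incident."""
    i, r, c = terms(incident), terms(root_cause), terms(corrective_action)
    return {
        "incident_vs_root_cause": jaccard(i, r),
        "root_cause_vs_action": jaccard(r, c),
        "incident_vs_action": jaccard(i, c),
    }


# Hypothetical records for a single incident.
scores = alignment(
    "hose leak at loading rack",
    "worn hose gasket",
    "replace worn hose gasket",
)
print(scores)
```

A low score between the root cause and the corrective action would flag a record for review, since that is the case in which a repeat incident is most likely.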

Figure 5: A bubble map of clusters by consequence, sized by frequency, with the reddish color shift indicating risk.

Unified reports indicating the current status of corrective actions would eliminate duplicate corrective action efforts and the confusion that surrounds incidents that recur faster than the cycle time needed for correction. SAS can integrate with virtually any data source and has standard integration into many of the most popular ERP systems.

Conclusion

Current-day technologies enable speedier and easier collection of safety and environmental information. With the removal of data collection barriers, more data points will be processed more quickly. Empowered by insights previously hidden in paper-based or administrative systems, companies can become more responsive and agile. Faster fact-based decision making will lead to reductions in risk and accidents.


About SAS

SAS is the leader in business analytics software and services, and the largest independent vendor in the business intelligence market. Through innovative solutions delivered within an integrated framework, SAS helps customers at more than 45,000 sites improve performance and deliver value by making better decisions faster. Since 1976 SAS has been giving customers around the world THE POWER TO KNOW®.


SAS Institute Inc. World Headquarters +1 919 677 8000 To contact your local SAS office, please visit: www.sas.com/offices

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies. Copyright © 2009, SAS Institute Inc. All rights reserved. 104074_542521.0709