An Approach to Automated Techniques for Data Extraction and Integrity Validation of CT Dosimetry Reports

Jaron Chong
Diagnostic Radiology Resident, PGY-1
Department of Radiology
McGill University Health Centre
CAR ASM 2011 Educational Exhibit EE137


Submitted to the Canadian Association of Radiologists Meeting 2011.


Why Dose?

With the ever-increasing utilization of Computed Tomography (CT) scanners, there is increasing pressure for greater accountability, reporting, and analysis of CT dosimetry. Recent focusing events such as the Cedars-Sinai incident in California and the Jacoby Roth incident at an E.R. in California have resulted in vendor recalls, redesigns, and heightened public anxiety and attention regarding radiation dose.

General concern about dose has reached public awareness, from thyroid shields on the Dr. Oz Show to prime-time TV reports on the dangers of CT and the Fukushima nuclear reactor. While most reports are over-sensationalized, our collective ability to control and respond to these issues has been limited by how we use dose reports.


Anatomy of a Dose Report

Each Dose Report is a DICOM Screen Capture, a bitmap representation of a dose estimate generated by the scanner. The images are generally stored losslessly and are quite large, occupying approximately 528 kB per page of report. These reports are usually stored under series number #999 or #501.


Changing Our Mindset

At the time, this solution of storing information as a bitmap was geared towards providing an audit trail — metaphorical receipts in shoeboxes. From a technical perspective, the information is all there, but from a practical perspective, we have previously lacked the tools to address some important questions ranging from alerting radiologists to acute overdose events to more subtle issues like the comparison and evaluation of new scanners, protocols and quantifying the radiation dose exposure through the lifetime of a patient.

Information is only as good as it is accessible and accessibility has been poor.

[Shoebox of Receipts]


The Core Principles of Audits

Structured Data

Complete

Fast

Accurate

Routine

Performing and comparing approaches to audits should be guided by core principles that will determine the best approach.

Audits should promote structured textual data, to allow for proper analysis.

The process should be fast and accurate, and it should be complete, covering as many records as possible.

Finally, audits should be made routine and integrated into day-to-day practice as much as possible.

Of these, it is important to emphasize the first two: structured data and completeness.


Structured Data

Only with structured information can we enable routine real-time dosimetry alerts and longitudinal analyses of the tens of thousands of scans we perform annually. Only by creating textual, analyzable data can we appropriately monitor and improve our protocols in a fashion responsive to our patients.

While there have been some attempts to generate structured data, such as the development of the Radiation Dose Structured Report, it remains very difficult to access dosimetry data. Meta-information on scanning parameters and clinical indications is often hidden in the DICOM headers or other virtual silos of information.


Complete Audits

To guarantee safety, we require complete audits. There is a palpable difference between “In a sample of 100 head CTs, none were found to exceed recommended radiation doses” and “In our institution, no head CTs exceeded recommended radiation doses.” One statement is academic, while the other is one you could take to court. In business, you would never audit a company by looking at only 10% of its financial records. Why should radiology be any different?


The Problem With Dose Reports

Achieving these principles has not been, and will not be, easy. Because the information is locked into an image, dosimetry audits have traditionally been labour-intensive. The vast majority of studies examining dose are done prospectively, with limited achievable sample sizes in mind.

As for retrospective studies, these are usually limited to time spans that can be manually transcribed, and often come months if not years after the scans have been completed.

Audits are slow, labour-intensive, complex, and seem out of the way for the typical radiologist.


Optical Character Recognition

However, since 2010 we have had software tools capable of performing Optical Character Recognition (OCR) on dose reports. DoseUtility, made available by Dr. David Clunie and packaged by Dr. Tessa Cook into a server-hosted platform called RADIANCE, presented at RSNA 2010, fundamentally changes the way we can go about performing dose audits.

This novel software has been custom-built to be fast and accurate at the crucial step of converting the bitmap Dose Report into text. Using this software, it is now possible either to analyze a set of DICOM Dose Reports manually in very large batches, or to have a dedicated server OCR dose reports automatically and convert them into a structured, queryable database.

We call these two techniques: Semi-Auto OCR and Auto OCR.


Audit Techniques

Manual: What has traditionally been done. A sub-population of dose reports is sampled, and Dose Report values are manually entered into a spreadsheet or database for analysis. Accurate and cross-compatible with multiple scanners, but slow and labour-intensive.

Semi-Automated OCR: Similar to a manual audit but done at a much larger scale. DICOM files of the Dose Reports are collected in bulk and, using DoseUtility, OCR’d into textual data. Conventional tools like SPSS, Excel, and Access, and more advanced tools like Python/PyDICOM, are used to analyze the data.

Automated OCR: Pioneered by Dr. Tessa Cook and first publicly demonstrated with the RADIANCE platform at RSNA 2010, it involves setting up a dedicated server that receives bitmap Dose Reports and DICOM headers and processes them into a structured, queryable MySQL database.

RDSR: Stands for Radiation Dose Structured Report, a text-based structured document (a DICOM Structured Report) with theoretical cross-vendor format support. Only some newer scanners are now generating RDSR. Few if any tools exist for interested radiologists to analyze RDSRs.


Comparison of Audit Techniques

Measure                 Manual   Semi-Auto OCR   Auto OCR   RDSR
Speed                   0        ++              +++        +++
Accuracy                +++      +++             +++        +++
Completion              0        +++             +++        +++
Multi-Vendor            +++      ++              ++         +/???
Ease of Implementation  +++      +++             +          +/+++
Availability Now        +++      +++             +++        0

Comparing audit techniques, it is readily apparent that computer-assisted audits represent a significant improvement in speed and completeness. However, among the computer-assisted methods, RDSR is vendor-specific and has no available tools for analysis. Between Auto and Semi-Auto OCR, the choice comes down to whether you have the local IT support needed to implement a fully automated server.


RDSR: The R’s Aren’t For Ready

Radiation Dose Structured Reports without a doubt represent the future of CT dosimetry. Painstakingly formulated over the past decade, RDSRs represent an industry-standardized way of reporting dose that is importable to dose registries. However, there are numerous disadvantages at present:

1) RDSR is not available on all scanners: Only the latest scanners from vendors have enabled the functionality to export these reports. In some cases, this is only scanners installed in the past 2 years.

2) RDSR ignores our archives: All of the information collected in our nearly 10-year-old digital archives is ignored, including the earliest dose history of our patients.

3) RDSR is a ‘black-box’ with no tools: For those of us lucky enough, RDSR is being archived on our PACS (e.g. Series #997 Dose Record) but as a ‘black-box’ without the tools for analysis. It is possible to export RDSR files from PACS, but there is no incentive for vendors to provide us with tools to analyze these files, let alone RDSR’s made from the hardware of other vendors. For institutions with multiple vendors, achieving interoperability is challenging.

Once RDSR becomes the norm, it will revolutionize the way we think about dose reports. But this may only come after 2 or 3 more hardware upgrade cycles which conservatively could be another 5, possibly 10 years away.


Can You Do Full Auto?

Without a doubt, having a fully automated audit system is at present the optimal solution. The difficulty is with implementation.

Fully automated monitoring requires:

1) Setting up a ‘XAMPP’ server, which is an Apache/PHP/MySQL server, analogous to setting up your own webserver. This requires dedicated hardware and support services, which not all institutions have or can easily obtain.

2) Even if it is technically possible to do so, there are also political hurdles to approving such a monitoring solution on your live clinical PACS network.

The RADIANCE group has made tremendous strides in delivering documentation for setting up such a server, but it still requires significant technical expertise.


Semi-Automated Audit Protocol

[Flowchart: START -> Collect Raw DICOMs -> (a) OCR Dose Information from Dose Report Image -> Filter/Validate; (b) Extract Meta-Information (DICOM headers) -> Filter/Validate; (a) and (b) -> Merge Information -> Processed Table -> END]

The Semi-Automated Audit Protocol, which is the method this document will describe, revolves around using the OCR software as a manual tool instead of part of an automated server. In doing so, we avoid the setup of a server, but experience the slight inconvenience of having to run each processing step manually.

In practice, the most time-consuming step is the collection of raw DICOM files. Depending on whether you have the assistance of your PACS administrator, and whether your PACS can export Dose Reports in batch, Semi-Auto can offer nearly all of the speed of Auto, while avoiding the technical difficulty of establishing a dedicated server.


If it is possible for you to perform the first step, the remainder of the protocol (subjecting the DICOM Dose Reports to Optical Character Recognition and extracting information from the DICOM headers) can be done on any desktop computer with freely available software.

The merging of the two streams of information can be done with Microsoft Access, part of a standard MS Office suite. Analysis of the processed table can be done with traditional statistics tools like the commercial IBM SPSS or the open-source PSPP.

Full web links to the software mentioned are detailed at the end of this presentation.


#1: Collect the DICOM Dose Reports

The methodology for collecting the DICOM files of the Dose Reports will vary depending on your particular PACS. If possible, seek the assistance of your PACS administrator to do a bulk export of all relevant Dose Report series.

If such support is not possible, most PACS implementations have the ability to export complete DICOMs. Relevant studies can be searched for and exported. If given the option, try to export only the Dose Report and not the actual clinical acquisition to save disk space.

For GE scanners, dose reports are labeled Series #999; for Siemens, Series #501; and for Philips, Series #1. On our own PACS (InteleViewer, Montreal, QC) we found that querying in 2-4 day intervals was necessary to stay under the maximum results threshold. Studies were exported and saved under corresponding folders labeled “YYYY-MM”.
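The series numbers and query intervals above lend themselves to a few small helpers. The following is a minimal Python sketch rather than part of our published scripts: the function names, the dictionary layout, and the 3-day default window are illustrative assumptions.

```python
from datetime import date, timedelta

# Dose Report series numbers by vendor, as listed above.
DOSE_SERIES_BY_VENDOR = {"GE": 999, "Siemens": 501, "Philips": 1}


def is_dose_report(vendor, series_number):
    """Return True if the series number matches the vendor's Dose Report series."""
    return DOSE_SERIES_BY_VENDOR.get(vendor) == int(series_number)


def query_windows(start, end, days=3):
    """Split [start, end] into consecutive windows of at most `days` days.

    Short windows help keep each PACS query under the maximum results threshold.
    """
    windows = []
    cursor = start
    while cursor <= end:
        last = min(cursor + timedelta(days=days - 1), end)
        windows.append((cursor, last))
        cursor = last + timedelta(days=1)
    return windows


def month_folder(study_date):
    """Name of the export folder for a study date, in the form YYYY-MM."""
    return study_date.strftime("%Y-%m")
```

Each window returned by `query_windows` would be entered as one date-range query on the PACS, and `month_folder` gives the folder in which to save the exported studies.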


#2: OCR Dose Reports

To recognize the Dose Reports, we used DoseUtility, a cross-platform Java application. After starting up the utility, click Import and select the desired month of DICOMs. For settings:

uncheck ‘Show tabular layout’
uncheck ‘Process only dose series’
uncheck ‘Show only dose summary’

These settings are necessary to generate a complete text-only report that contains both the dose summary as well as individual series details. Click Report. Copy, Paste, and Save the Report into Notepad.

Repeat as necessary.


#3: Extract DICOM Headers

OCR alone is not sufficient. Some very useful data is contained only within the DICOM headers and needs to be extracted and correlated with the OCR record.

Most notable among these are Protocol Names, Additional Patient History, Referring Physicians, and CT Console Names.

In order to extract DICOM header information, we used custom-built scripts written in Python/PyDICOM, which are available on our website. These scripts recursively go through the folders and generate monthly tabular reports.
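For readers who want a sense of what such a script involves, here is a minimal sketch. It is not the script from our website: it assumes a current pydicom release, the function names are hypothetical, and it assumes the CT console name is stored in the StationName field.

```python
import csv
import os

# DICOM keywords for the header fields of interest. StationName is assumed
# here to be the field holding the CT console name.
HEADER_FIELDS = [
    "AccessionNumber",
    "StudyDate",
    "SeriesNumber",
    "ProtocolName",
    "AdditionalPatientHistory",
    "ReferringPhysicianName",
    "StationName",
]


def header_row(dataset, fields=HEADER_FIELDS):
    """Build one table row from a dataset, using '' for missing fields."""
    return [str(dataset.get(field, "")) for field in fields]


def walk_headers(root):
    """Yield a row for every readable DICOM file under `root`."""
    import pydicom  # third-party; imported lazily so header_row works without it
    from pydicom.errors import InvalidDicomError

    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(folder, name)
            try:
                # Skip pixel data: only the headers are needed.
                dataset = pydicom.dcmread(path, stop_before_pixels=True)
            except InvalidDicomError:
                continue  # not a DICOM file
            yield header_row(dataset)


def write_report(root, out_path):
    """Write a tab-delimited header report for one month's folder."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(HEADER_FIELDS)
        writer.writerows(walk_headers(root))
```

Calling `write_report` once per “YYYY-MM” folder would produce one tabular report per month.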


#4: Filter / Validate / Remove Duplicates

At every juncture, various checks should be performed to ensure images are being correctly recognized, that the case records are valid, and that any duplications are being removed so as to not artificially inflate statistics and frequencies.

Relevant to validation is the removal of any externally imported studies that are not native to your site. Duplicates can be removed using Excel or SPSS by sorting the rows of the tables and searching for records with identical values in variables such as Accession Number and Acquisition Date/Time.
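The same duplicate check can be scripted. The sketch below is one possible approach in Python, not the Excel or SPSS procedure itself; the column names `AccessionNumber` and `AcquisitionDateTime` are assumed names for the fields in your own table.

```python
def remove_duplicates(rows, keys=("AccessionNumber", "AcquisitionDateTime")):
    """Return (unique_rows, duplicate_rows), keeping the first row for each key.

    `rows` is a list of dicts, one per record; `keys` names the columns that
    together identify a record.
    """
    seen = set()
    unique, duplicates = [], []
    for row in rows:
        key = tuple(row.get(k, "").strip() for k in keys)
        if key in seen:
            duplicates.append(row)
        else:
            seen.add(key)
            unique.append(row)
    return unique, duplicates
```

Keeping the discarded rows in a separate list makes it possible to inspect them before trusting the filtered table.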


#5: Merge

Merging the two data streams is accomplished by (1) importing both filtered tables into MS Access, (2) joining the two tables on a common index, usually the Accession Number, (3) building a query that combines fields from both the OCR table and the DICOM-header table, and (4) exporting the results to a tab-delimited or Excel file for final analysis.
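For those who prefer a scripted merge, the same join can be sketched with Python's built-in sqlite3 module standing in for MS Access. This is a substitute technique, not the Access workflow described above, and the table and column names are illustrative assumptions.

```python
import sqlite3


def merge_tables(ocr_rows, header_rows):
    """Join OCR rows and header rows on accession number.

    ocr_rows: iterable of (accession, total_dlp)
    header_rows: iterable of (accession, protocol_name)
    Returns a list of (accession, protocol_name, total_dlp) tuples.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE ocr (accession TEXT, total_dlp REAL)")
        conn.execute("CREATE TABLE header (accession TEXT, protocol TEXT)")
        conn.executemany("INSERT INTO ocr VALUES (?, ?)", ocr_rows)
        conn.executemany("INSERT INTO header VALUES (?, ?)", header_rows)
        cursor = conn.execute(
            "SELECT ocr.accession, header.protocol, ocr.total_dlp "
            "FROM ocr INNER JOIN header ON ocr.accession = header.accession "
            "ORDER BY ocr.accession"
        )
        return cursor.fetchall()
    finally:
        conn.close()
```

An inner join drops records that appear in only one table, so comparing the row counts before and after the merge is a quick check on how many records failed to match.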


#6: Analysis of the Processed Table

Analysis is conducted as it would be for any other study. Spreadsheet data can be manipulated directly in Excel, or more powerful statistical analyses can be done with a dedicated statistics program like IBM SPSS or the open-source PSPP.
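As a simple illustration of the kind of summary such an analysis might start with, the Python sketch below groups records by protocol and reports the count, median, and maximum dose-length product (DLP). The function name and the (protocol, DLP) row layout are assumptions made for the example.

```python
from collections import defaultdict
from statistics import median


def summarize_by_protocol(rows):
    """Return {protocol: (count, median DLP, max DLP)} for (protocol, dlp) pairs."""
    grouped = defaultdict(list)
    for protocol, dlp in rows:
        grouped[protocol].append(float(dlp))
    return {
        protocol: (len(values), median(values), max(values))
        for protocol, values in grouped.items()
    }
```

The maximum column is a quick way to spot outlying examinations that deserve a closer look.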


Tips / Pitfalls

ACCESSION NUMBERS ARE NOT UNIQUE:

Depending on the practices of your institution, a single accession number may be shared across several different examinations or repeat examinations. Your analyses should take this into account.

DUPLICATES ARE INSIDIOUS:

As a corollary of the above pitfall, if you use the Accession Number as your unique identifier, duplication is possible, especially after the matching step. Be thorough with your filtering for duplicates and, if possible, identify them using SPSS and its duplicate cases function.

BEWARE OF DELIMITERS:

There may be numerous inconsistencies in the way technicians enter Study Descriptions. If certain characters such as “quotation marks” or <Tab> are used, your imports and exports may be damaged. It is worth grossly inspecting your data in Excel before committing to a full analysis.

PROTOCOL DIVERSITY:

It goes without saying that the generalizations you make will rest upon accurately categorizing and grouping types of protocols. If a protocol manual is available, become familiar with it to learn your institution’s vocabulary; if not, consult your staff to determine which protocols deserve to be grouped together.

ONLY SOME VENDORS ARE SUPPORTED:

DoseUtility currently supports GE, Siemens, and Philips dose reports. While active development is still in progress, CT scans and Dose Reports from other hardware vendors are not guaranteed to work.

CHANGE IS A CONSTANT:

Be wary while interpreting trends in dosimetry because protocols will vary by site and across time. In addition, subtle changes in settings can result in unexpected findings in your dataset. Full consultation with your site’s physicists and technicians is recommended to draw a better understanding of why doses are the way they are.
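The delimiter pitfall above can be avoided when tables are written and read by script rather than by hand. The following Python sketch uses the standard csv module, which quotes any field containing a tab or quotation mark so that it survives a round trip; the function names are illustrative.

```python
import csv
import io


def to_tsv(rows):
    """Serialize rows as tab-delimited text, quoting fields that need it."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", quoting=csv.QUOTE_MINIMAL)
    writer.writerows(rows)
    return buffer.getvalue()


def from_tsv(text):
    """Parse tab-delimited text produced by to_tsv back into rows."""
    return [row for row in csv.reader(io.StringIO(text), delimiter="\t")]
```

By contrast, joining fields with a bare tab character and splitting on tabs when reading would break any Study Description that itself contains a tab.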


Post-Processing: Advanced Analysis

Beyond the level of this presentation, it is also possible to use the Python programming language and a module, PyDICOM, to do much more sophisticated analyses. For example, our institution’s audit is able to take into account overlapping series by reading in the minimum and maximum ranges and the radiation dose of each series. These kinds of calculations are quite onerous for manual audits but can be determined algorithmically by computer. All this is made possible by structured data.
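To give a flavour of the overlap calculation, here is a minimal Python sketch based on interval merging. It is an illustration under stated assumptions, not our institution's actual algorithm: each series is reduced to a hypothetical (start, end) pair of scan positions in a common unit such as millimetres.

```python
def merge_ranges(ranges):
    """Merge overlapping (start, end) scan ranges into disjoint intervals."""
    merged = []
    # Normalize each pair so start <= end, then sweep in order of start.
    for start, end in sorted((min(a, b), max(a, b)) for a, b in ranges):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_length(ranges):
    """Total length covered by at least one series."""
    return sum(end - start for start, end in merge_ranges(ranges))


def rescanned_length(ranges):
    """Summed series lengths minus covered length.

    A stretch scanned n times contributes n - 1 times its length.
    """
    total = sum(abs(b - a) for a, b in ranges)
    return total - covered_length(ranges)
```

A non-zero `rescanned_length` indicates that some anatomy was covered by more than one series, which matters when per-series doses are summed for a patient.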


Conclusion

Audits can be fast. Audits can be easy. There is no excuse not to do them. We now have the tools to do fast, accurate, and complete audits.

If you have access to the IT expertise to implement a fully automated audit system like RADIANCE, you can develop a very powerful database for quality control and research purposes.

If you do not have access to IT expertise, we hope this presentation has introduced you to some of the tools for significantly more powerful semi-automated audits, which with a little effort can achieve results comparable to fully automated techniques.

These techniques have allowed us to do a complete dosimetry audit of a year’s worth of 40,000+ CTs in 2 weeks…


More Information

For more detailed information on how you can perform an audit at your institution, as well as a copy of this presentation, visit:

http://www.nationaldosimetry.ca

Happy Auditing!


Contact Information

Jaron Chong

Radiology Resident PGY-1

Diagnostic Radiology

McGill University

e: [email protected]

e: [email protected]