Comparative Analysis of Algorithmic Approaches for Auto-Coding with ICD-10-AM and ACHI
Rajvir Kaur
Master of Research
Authors
Rajvir Kaur, Jeewani Anupama Ginige
Introduction
• Electronic Health Records (EHRs): digitised versions of paper-based medical records
• What is Clinical Coding?
• Assignment of alphanumeric codes to the diagnoses and procedures documented in medical records
• Codes are assigned manually by clinical coders
• Uses:
• Funding and insurance claim processing
• Research
• Government planning and policy making
Image Credit: https://medium.com/@Petuum/automated-icd-coding-using-deep-learning-1e9170652175
Classification system in different countries
• Country-specific classification systems:
ICD-10-CM (Clinical Modification)
ICD-10-CA (Canadian Modification)
ICD-10-GM (German Modification)
ICD-10-AM (Australian Modification)
• ICD-10-AM is also used in Ireland, Singapore and Saudi Arabia
Image Credit: https://www.slideshare.net/EduardoPorras2
Challenges in manual coding
• Complexity of codes
ICD-9: 3,882 codes
ICD-10: approx. 70,000 codes
• A coder completes 15-42 records per day
• Annual cost: 25 billion dollars (U.S.)
• Training and recruitment costs
• Highly prone to errors
Image Credit: http://bestptbilling.com/how-to-reduce-icd-10-transition-pain-for-physical-therapy-practice-owners/
“Boy, this new system is so confusing. Your ICD-9 code says that you’re here for a sprained ankle, but your ICD-10 code says it’s complete and irreversible skeletal failure.”
Our Contribution
• We focus on:
• ICD-10-AM and ACHI classification systems
• Comparing and analysing various approaches against standard evaluation criteria
• Our research concentrates on two body systems, covering two ICD-10-AM chapters and two ACHI chapters:
• Digestive System:
Chapter 11: Diseases of the digestive system (ICD-10-AM)
Chapter 10: Procedures on digestive system (ACHI)
• Respiratory System:
Chapter 10: Diseases of the respiratory system (ICD-10-AM)
Chapter 7: Procedures on respiratory system (ACHI)
Ethics Approval
• Western Sydney University Ethics No.: H12628190
• Dataset:
• Total 190 clinical records (Gold Standard)
• Collected from hospitals across Australia
• Archived by the National Centre for Classification in Health (NCCH)
Sample data
• Paper-based records converted to an electronic version:
• PDF or image files converted to tabular format
• Text narratives created
• Information extracted from the medical records includes:
Principal Diagnoses (PDx)
Additional Diagnoses (ADx)
Smoking-related diagnoses
Diabetes condition
Supplementary conditions
Past Medical History (PMHx)
Family Medical History
Principal Procedure
Additional Procedure
Type of anaesthesia
Ventilation details
Allied health intervention
Dataset
• 190 original records
• 45 additional records covering similar digestive and respiratory diseases and interventions (15 digestive system, 30 respiratory system)
• Total clinical records: 190 + 45 = 235

Dataset   Digestive system records   Respiratory system records
Data190   116                        74
Data235   131                        104
Overview of the Proposed work
Clinical text processing using ICD-10-AM/ACHI:
• TASK 1: ICD-10-AM/ACHI chapter classification (Digestive System or Respiratory System)
• TASK 2: ICD-10-AM/ACHI code assignment using Pattern Matching, Rule-Based and Machine Learning approaches

Clinical Text Processing Approaches and Techniques
• Pattern Matching: Regular expressions → Evaluation
• Rule-based: Pre-processing (1. Sentence splitting, 2. Abbreviation expansion, 3. Tokenisation, 4. Spell check) → Defining rules → Evaluation
• Machine Learning: Pre-processing (1. Sentence splitting, 2. Abbreviation expansion, 3. Tokenisation, 4. Spell check, 5. Stop word removal, 6. Negation detection) → Feature extraction (1-gram, 2-gram, 3-gram, 4-gram) → Classification (SVM, Naïve Bayes, Decision Tree, Random Forest, AdaBoost, kNN, MLP) → Evaluation
• Evaluation metrics for all approaches: 1. Precision, 2. Recall, 3. F-score, 4. Accuracy, 5. Hamming Loss, 6. Jaccard Similarity
Pattern Matching
• Simplest approach
• Searches for a text string within the text
• Matches character for character
• Uses regular expressions, e.g. a single pattern matching bronchi, bronchus, bronchial and bronchitis
Example: “A 51-year-old patient has serious cough but no sign of pneumonia” (keywords: cough, pneumonia)
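The keyword search above can be sketched in Python with the standard `re` module. This is a minimal illustration, assuming one hand-written pattern per concept; the `PATTERNS` dictionary and `match_keywords` function are hypothetical names, not the authors' implementation.

```python
import re

# Hypothetical patterns, one per clinical concept.
# r"\bbronch\w*" covers bronchi, bronchus, bronchial and bronchitis.
PATTERNS = {
    "bronch*": re.compile(r"\bbronch\w*", re.IGNORECASE),
    "cough": re.compile(r"\bcough\w*", re.IGNORECASE),
    "pneumonia": re.compile(r"\bpneumonia\b", re.IGNORECASE),
}


def match_keywords(text):
    """Return {concept: [matched strings]} for every pattern found in text."""
    found = {}
    for concept, pattern in PATTERNS.items():
        hits = pattern.findall(text)
        if hits:
            found[concept] = hits
    return found


if __name__ == "__main__":
    sentence = "A 51-year-old patient has serious cough but no sign of pneumonia"
    print(match_keywords(sentence))
```

On the example sentence this reports both "cough" and "pneumonia", even though pneumonia is negated: pure character matching has no notion of context, which is what the later pre-processing steps address.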
Rule-based approach
• Uses logical expressions and Boolean operations:
if (logical expression) then (category)
Example: generating a rule from ICD-10 code K05.2
K05.2 Acute periodontitis
Includes:
Acute pericoronitis
Parodontal abscess
Periodontal abscess
Excludes:
acute apical periodontitis (K04.4)
periapical abscess (K04.7)
periapical abscess with sinus (K04.6)
Rule:
If document contains
acute periodontitis OR
acute pericoronitis OR
parodontal abscess OR
periodontal abscess
AND document contains NONE of
acute apical periodontitis,
periapical abscess,
periapical abscess with sinus
then assign code K05.2
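The K05.2 rule can be sketched in Python as an inclusion list and an exclusion list. This is a minimal sketch assuming case-insensitive whole-phrase matching; `K05_2_RULE`, `contains_term` and `apply_rule` are hypothetical names, and like the rule itself it does not handle negated mentions.

```python
import re

# Hypothetical encoding of the K05.2 rule: inclusion and exclusion terms.
K05_2_RULE = {
    "code": "K05.2",
    "include": ["acute periodontitis", "acute pericoronitis",
                "parodontal abscess", "periodontal abscess"],
    "exclude": ["acute apical periodontitis", "periapical abscess",
                "periapical abscess with sinus"],
}


def contains_term(text, term):
    """Case-insensitive whole-phrase search for term in text."""
    return re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE) is not None


def apply_rule(text, rule):
    """Return the rule's code if any inclusion term occurs and no exclusion term does."""
    included = any(contains_term(text, t) for t in rule["include"])
    excluded = any(contains_term(text, t) for t in rule["exclude"])
    return rule["code"] if included and not excluded else None
```

For example, `apply_rule("Diagnosis: acute pericoronitis", K05_2_RULE)` returns "K05.2", while a document that also mentions a periapical abscess gets no code. Each further code would need its own hand-written include/exclude lists, which is the main cost of the rule-based approach.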
Machine Learning
[Figure: machine learning (ML)]
Image Credit: https://www.newtium.com/Software/Predictive
Data Preprocessing
1. Abbreviation Expansion
Admission Date: ****   Discharge Date: ****
Presenting Problems
Respiratory - cough
PRINCIPAL DIAGNOSIS
Infective exacerbation of bronchiectasis
Acute-on-chronic Type 2 respiratory failure
Summary of Progress
Dear Doctor,
Thank you for your ongoing care of ****, who presented to **** hospital on **** with SOB, cough and chest pain, on a background of bronchiectasis. The patient was admitted under the care of Dr **** (Respiratory) for management of infective exacerbation of bronchiectasis.
Background
Bronchiectasis
- Known to Dr **** (Respiratory)
- Bronchiectasis diagnosed 20 years ago, secondary to childhood pertussis
Left ventricular failure
- Known to Dr **** (Cardiology)
Cough, SOB, Pleuritic chest pain
Abbreviation   Full form
COPD           Chronic obstructive pulmonary disease
SBO            Small bowel obstruction
IHD            Ischaemic heart disease
SOB            Shortness of breath
HTN            Hypertension
T2DM           Type 2 diabetes mellitus
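Dictionary-based abbreviation expansion can be sketched with a single regular expression built from the table above. This assumes case-sensitive whole-token matching so that ordinary lower-case words are left untouched; `ABBREVIATIONS` and `expand_abbreviations` are hypothetical names, and a real system would need a much larger, context-aware dictionary because many clinical abbreviations are ambiguous.

```python
import re

# Abbreviation dictionary copied from the table above (illustrative, not exhaustive).
ABBREVIATIONS = {
    "COPD": "chronic obstructive pulmonary disease",
    "SBO": "small bowel obstruction",
    "IHD": "ischaemic heart disease",
    "SOB": "shortness of breath",
    "HTN": "hypertension",
    "T2DM": "type 2 diabetes mellitus",
}

# Case-sensitive, whole-token alternation of all abbreviations.
ABBR_RE = re.compile(r"\b(" + "|".join(map(re.escape, ABBREVIATIONS)) + r")\b")


def expand_abbreviations(text):
    """Replace each known abbreviation with its full form."""
    return ABBR_RE.sub(lambda m: ABBREVIATIONS[m.group(1)], text)
```

Applied to the sample record, "with SOB, cough and chest pain" becomes "with shortness of breath, cough and chest pain".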
Data Preprocessing
2. Spell Check
Used: NLTK and PyEnchant Python libraries
Australian English   American English
oesophagus           esophagus
tumour               tumor
anaemia              anemia
anaesthetic          anesthetic
ischaemic            ischemic
diarrhoea            diarrhea
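One way to handle the spelling table is to normalise American spellings to the Australian forms used by ICD-10-AM. The pure-Python sketch below assumes that goal and uses only the six pairs listed; `US_TO_AU` and `to_australian` are hypothetical names. Detecting genuine misspellings would additionally need a dictionary, for example PyEnchant's `enchant.Dict("en_AU")` with its `check()` and `suggest()` methods, provided an Australian English dictionary is installed.

```python
import re

# American -> Australian spellings from the table above (illustrative, not exhaustive).
US_TO_AU = {
    "esophagus": "oesophagus",
    "tumor": "tumour",
    "anemia": "anaemia",
    "anesthetic": "anaesthetic",
    "ischemic": "ischaemic",
    "diarrhea": "diarrhoea",
}

WORD_RE = re.compile(r"[A-Za-z]+")


def to_australian(text):
    """Replace known American spellings with Australian ones, keeping an initial capital."""
    def fix(match):
        word = match.group(0)
        au = US_TO_AU.get(word.lower())
        if au is None:
            return word
        return au.capitalize() if word[0].isupper() else au
    return WORD_RE.sub(fix, text)
```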
Data Preprocessing
3. Stop word removal
‘again’, ‘about’, ‘there’, ‘once’, ‘during’, ‘out’, ‘they’, ‘own’, ‘an’, ‘some’, ‘its’, ‘yours’, ‘such’, ‘into’, ‘most’, ‘itself’, ‘other’, ‘off’, ‘am’, ‘who’, ‘as’, ‘him’, ‘each’, ‘themselves’, ‘until’, ‘we’, ‘these’, ‘your’, ‘his’, ‘through’, ‘me’, ‘her’, ‘more’, ‘himself’, ‘this’, ‘down’, ‘should’, ‘our’, ‘their’, ‘while’, ‘above’, ‘both’, ‘up’, ‘ours’, ‘she’, ‘all’, ‘when’, ‘at’, ‘any’, ‘before’, ‘them’, ‘same’, ‘yourselves’, ‘because’, ‘what’, ‘over’, ‘why’, ‘now’, ‘he’, ‘you’, ‘herself’, ‘just’, ‘ourselves’, ‘hers’, ‘yourself’, ‘how’, ‘theirs’, ‘further’, ‘doing’, ‘where’, ‘too’, ‘whom’, ‘those’
Not removed: no, not, nil, never
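Stop word removal that keeps negation cues can be sketched as a set difference. The stop list below is the one on this slide; `remove_stop_words` is a hypothetical name. Subtracting `NEGATION_WORDS` is a no-op for this list but matters if a fuller list is swapped in, since NLTK's English stop word list does contain "no" and "not".

```python
# Stop words listed on the slide above.
STOP_WORDS = set("""
again about there once during out they own an some its yours such into most
itself other off am who as him each themselves until we these your his through
me her more himself this down should our their while above both up ours she all
when at any before them same yourselves because what over why now he you
herself just ourselves hers yourself how theirs further doing where too whom those
""".split())

# Negation cues are never removed, so negation detection can still see them.
NEGATION_WORDS = {"no", "not", "nil", "never"}
EFFECTIVE_STOP_WORDS = STOP_WORDS - NEGATION_WORDS


def remove_stop_words(tokens):
    """Drop stop words (case-insensitively) while keeping negation cues."""
    return [t for t in tokens if t.lower() not in EFFECTIVE_STOP_WORDS]
```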
Data Preprocessing
4. Negation Detection
Example (negation cue: “no”; keywords: cough, pneumonia):
The patient is suffering from serious cough but no evidence of pneumonia.
Negated finding: (pneumonia, ‘True’) → do not assign a code
Non-negated finding: (cough, ‘True’) → assign a code
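The slides do not say which negation algorithm was used. The sketch below follows the general idea of NegEx (Chapman et al.): a negation cue negates the keywords that follow it within a short window, until a scope terminator such as "but". The cue list, terminators, window size and the name `detect_negation` are all hypothetical simplifications, and keywords are assumed to be single words.

```python
import re

# A much-reduced, hypothetical cue list in the spirit of NegEx.
NEGATION_CUES = ["no evidence of", "no sign of", "no", "not", "nil", "never", "without"]
SCOPE_TERMINATORS = {"but", "however", "although"}  # words that end a negation scope
WINDOW = 5  # maximum number of tokens a cue can negate


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def detect_negation(sentence, keywords):
    """Return {keyword: is_negated} for the single-word keywords found in sentence."""
    tokens = tokenize(sentence)
    # Try longer cues first so "no evidence of" wins over "no".
    cues = sorted((tokenize(c) for c in NEGATION_CUES), key=len, reverse=True)
    negated = set()
    i = 0
    while i < len(tokens):
        cue = next((c for c in cues if tokens[i:i + len(c)] == c), None)
        if cue is None:
            i += 1
            continue
        start = i + len(cue)
        for j in range(start, min(start + WINDOW, len(tokens))):
            if tokens[j] in SCOPE_TERMINATORS:
                break
            negated.add(j)
        i = start
    result = {}
    for kw in keywords:
        positions = [k for k, t in enumerate(tokens) if t == kw]
        if positions:
            result[kw] = all(p in negated for p in positions)
    return result
```

On the example sentence this returns `{"cough": False, "pneumonia": True}`: only pneumonia is negated, so only cough would be coded.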
Feature Extraction
Bag of words representation
X: The infant was admitted to the hospital for bronchiolitis with worse cough and wheeze
Y: The old male presented for vomiting and diarrhoea
Word counts over X and Y combined (lower-cased):
admitted 1, and 2, bronchiolitis 1, cough 1, diarrhoea 1, for 2, hospital 1, infant 1, male 1, old 1, presented 1, to 1, the 3, vomiting 1, was 1, wheeze 1, with 1, worse 1
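The bag-of-words counts above, and the 1-gram to 4-gram features listed in the overview, can be sketched with `collections.Counter`. `ngrams` and `bag_of_ngrams` are hypothetical names; in practice scikit-learn's `CountVectorizer(ngram_range=(1, 4))` produces a comparable document-term matrix, though its default tokeniser differs slightly.

```python
import re
from collections import Counter


def ngrams(text, n):
    """Lower-cased word n-grams of a text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(docs, n_max=1):
    """Count 1..n_max-grams per document and build a shared, sorted vocabulary."""
    counts = [Counter(g for n in range(1, n_max + 1) for g in ngrams(d, n)) for d in docs]
    vocabulary = sorted(set().union(*counts))
    vectors = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, vectors


if __name__ == "__main__":
    X = "The infant was admitted to The hospital for bronchiolitis with worse cough and wheeze"
    Y = "The old male presented for vomiting and diarrhoea"
    vocab, vecs = bag_of_ngrams([X, Y])
    # Summing the two document vectors gives the combined counts on the slide.
    print({t: vecs[0][k] + vecs[1][k] for k, t in enumerate(vocab)})
```

Each record becomes one count vector over the shared vocabulary; raising `n_max` adds word pairs such as "vomiting and" as extra features.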
Classification
Seven classifiers:
Support Vector Machine (SVM)
Naïve Bayes (NB)
Decision Tree (DT)
Random Forest (RF)
AdaBoost
k-Nearest Neighbor (kNN)
Multi Layer Perceptron (MLP)
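As an illustration of one of the seven classifiers, the sketch below is a from-scratch multinomial Naïve Bayes with add-one smoothing for the Task 1 chapter decision (digestive vs. respiratory). The slides do not state which library or settings were used; scikit-learn's `MultinomialNB` would be the usual choice. The class name `SimpleMultinomialNB` and the toy training sentences are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


class SimpleMultinomialNB:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for wc in self.word_counts.values() for w in wc}
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior
            denom = self.totals[label] + len(self.vocab)
            for w in tokenize(doc):
                if w in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best


# Hypothetical toy training data.
TRAIN_DOCS = [
    "infective exacerbation of bronchiectasis with cough",
    "bronchiolitis with cough and wheeze",
    "small bowel obstruction with vomiting",
    "vomiting and diarrhoea",
]
TRAIN_LABELS = ["respiratory", "respiratory", "digestive", "digestive"]
```

For example, `SimpleMultinomialNB().fit(TRAIN_DOCS, TRAIN_LABELS).predict("cough and wheeze")` returns "respiratory".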
Evaluation
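• Yi: ground-truth labels of record i
• Zi: predicted labels of record i
• N: number of records
• M: set of all labels

Confusion matrix (rows: ground truth; columns: predicted):
                        Predicted Positive     Predicted Negative
Ground Truth Positive   True Positive (TP)     False Negative (FN)
Ground Truth Negative   False Positive (FP)    True Negative (TN)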
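The metric formulas on this slide did not survive extraction. The sketch below computes the six metrics under commonly used definitions, which are an assumption: micro-averaged precision, recall and F-score over all (record, label) pairs, exact-match accuracy, Hamming loss over the N × |M| label decisions, and mean per-record Jaccard similarity. `multilabel_metrics` is a hypothetical name; scikit-learn offers `precision_recall_fscore_support`, `accuracy_score`, `hamming_loss` and `jaccard_score` for the same purpose.

```python
def multilabel_metrics(Y, Z, M):
    """Y, Z: lists of ground-truth / predicted label sets (one per record); M: set of all labels."""
    N = len(Y)
    pairs = list(zip(Y, Z))
    tp = sum(len(y & z) for y, z in pairs)
    fp = sum(len(z - y) for y, z in pairs)
    fn = sum(len(y - z) for y, z in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Exact-match (subset) accuracy: Zi must equal Yi.
    accuracy = sum(y == z for y, z in pairs) / N
    # Fraction of the N x |M| label decisions that are wrong.
    hamming_loss = sum(len(y ^ z) for y, z in pairs) / (N * len(M))
    # Mean of |Yi ∩ Zi| / |Yi ∪ Zi| (1.0 when both sets are empty).
    jaccard = sum(len(y & z) / len(y | z) if y | z else 1.0 for y, z in pairs) / N
    return {"precision": precision, "recall": recall, "f_score": f_score,
            "accuracy": accuracy, "hamming_loss": hamming_loss, "jaccard": jaccard}
```

When every record has exactly one of two labels, as in Task 1, these definitions give Hamming loss = 1 − accuracy and Jaccard similarity = accuracy, which matches the pattern in the Task 1 results (e.g. accuracy 0.9474, Hamming loss 0.05263).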
Results: TASK 1 ICD-10-AM/ACHI Chapter Classification
[Figure: Task 1 assigns each record to the gastrointestinal class (Digestive System) or the respiratory class (Respiratory System)]

Classifier               Dataset  Precision  Recall  F-score  Accuracy  Hamming Loss  Jaccard Similarity
Multi Layer Perceptron   Data190  0.95       0.95    0.95     0.9474    0.05263       0.94736
                         Data235  0.87       0.87    0.87     0.8723    0.12765       0.87234
Support Vector Machine   Data190  0.93       0.92    0.92     0.9211    0.07894       0.92105
                         Data235  0.98       0.98    0.98     0.9787    0.02127       0.97872
Naïve Bayes              Data190  0.89       0.87    0.86     0.8684    0.13157       0.86842
                         Data235  0.88       0.87    0.87     0.8723    0.12765       0.87234
Decision Tree            Data190  0.76       0.55    0.42     0.5526    0.44736       0.55263
                         Data235  0.84       0.81    0.8      0.8085    0.19148       0.80851
Random Forest            Data190  0.84       0.84    0.84     0.8421    0.15789       0.84211
                         Data235  0.9        0.89    0.89     0.8936    0.10638       0.89361
k-Nearest Neighbor       Data190  0.85       0.84    0.84     0.8421    0.15789       0.84211
                         Data235  0.89       0.89    0.89     0.8936    0.10638       0.89361
AdaBoost                 Data190  0.88       0.87    0.87     0.8684    0.13157       0.86842
                         Data235  0.9        0.89    0.89     0.8936    0.10638       0.89361
TASK 2: ICD-10-AM/ACHI Code Assignment
• Pattern Matching and Rule-Based: training-testing split not required
• Machine Learning: training-testing split required

Number of medical records in the test data (20%):
Dataset   Digestive system   Respiratory system   Total
Data190   22                 16                   38
Data235   26                 21                   47
Results: TASK 2 ICD-10-AM/ACHI Code Assignment
[Figure: Precision, Recall, F-score, Accuracy, Hamming Loss (HL) and Jaccard Similarity (JS) of Pattern Matching and Rule-Based on Data190 and Data235]

Approach          Dataset  Precision  Recall  F-score  Accuracy  Hamming Loss  Jaccard Similarity
Pattern Matching  Data190  0.7953     0.4184  0.5277   0.4027    0.043         0.4365
                  Data235  0.8029     0.409   0.5201   0.3945    0.0405        0.4255
Rule-Based        Data190  0.7913     0.6916  0.7257   0.6053    0.1728        0.5803
                  Data235  0.792      0.6872  0.7222   0.6011    0.1745        0.5768
TASK 2 Results: Machine Learning
Classifier      Dataset  Precision  Recall   F-score  Accuracy  Hamming Loss  Jaccard Similarity
kNN             Data190  0.76798    0.45175  0.54361  0.44051   0.03706       0.44776
                Data235  0.89308    0.55191  0.65373  0.54143   0.01955       0.52697
SVM             Data190  0.62534    0.63168  0.57465  0.44051   0.67841       0.42014
                Data235  0.72891    0.61722  0.61821  0.49643   0.35158       0.48805
Naïve Bayes     Data190  0.58333    0.25586  0.33523  0.25389   0.01392       0.27135
                Data235  0.66666    0.30773  0.39793  0.29717   0.02453       0.32365
Random Forest   Data190  0.81421    0.81329  0.79115  0.66831   0.23514       0.65517
                Data235  0.92392    0.92019  0.91412  0.86118   0.09458       0.82945
AdaBoost        Data190  0.92062    0.85015  0.87305  0.79201   0.08776       0.74537
                Data235  0.91407    0.91295  0.90351  0.84462   0.11271       0.79245
Decision Tree   Data190  0.62938    0.29488  0.37559  0.29073   0.02192       0.29
                Data235  0.63475    0.34756  0.38689  0.34537   0.00942       0.33055
MLP             Data190  0.68001    0.46388  0.51485  0.38567   0.34411       0.36667
                Data235  0.57679    0.46974  0.40582  0.40993   0.24057       0.3913

Data190 results use the 4-gram feature set; Data235 results use the 2-gram feature set.
Comparison of approaches
[Figure: comparison of the Pattern Matching, Rule-based and Machine Learning approaches (scores from 0 to 1) on Data190 and Data235]
Conclusion and Future Work
• Conclusion:
• With the adoption of EHRs and more complex classification systems, there is a need to automate the clinical coding workflow
• Computer-assisted coding has the capability to overcome the challenges of manual coding
• The machine learning approach is capable of predicting correct ICD-10-AM and ACHI codes
• Future Work:
• Work with large-scale data
• Extend the work to other chapters of the ICD-10-AM and ACHI classification systems
• Apply deep learning and hybrid approaches to computer-assisted coding
Thank you