An introduction to analysing a SNOMED CT coded dataset using a FHIR terminology server

Matt Cordell
Terminology Specialist

A quick introduction to SNOMED CT, FHIR & Ontoserver

SNOMED CT

• Much larger than most other code systems traditionally used in healthcare (ICD, ICPC etc.)

• Primary purpose is recording clinical notes with the specificity clinicians require, and supporting interoperability. Its structure also supports secondary uses (analytics).

• Codes carry no intrinsic meaning; they are simply identifiers, e.g. 278285008|Left hemiplegia| and 278284007|Right hemiplegia|.

• Concepts in the terminology are associated by a range of relationships, forming an ontology.

• Expression Constraint Language (ECL) – language that supports sophisticated queries against the terminology.

FHIR

• Latest Interoperability standard from HL7, supporting modern RESTful practices. (ValueSets)

Ontoserver

• Provides FHIR based access to terminology, including ECL support

• Made available for use throughout Australia via the National Clinical Terminology Service (NCTS)

ECL in 90 seconds

<396234004|Infective arthritis|
All subtypes of Infective arthritis

<64572001|Disease|: 116676008|Associated morphology| = 23583003|Inflammation|
All diseases associated with inflammation

<928000|Musculoskeletal disorder|: 246075003|Causative agent| = <<49872002|Virus|
Musculoskeletal disorders with some viral involvement
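The difference between the `<` (subtypes only) and `<<` (the concept itself plus its subtypes) operators can be illustrated on a toy hierarchy. This is only a sketch: the concept names and the `PARENTS` table are illustrative placeholders, not an extract of SNOMED CT, and a real terminology server evaluates these operators for you.

```python
# Toy is-a hierarchy: child -> list of parents.
# Names are illustrative placeholders, not a real SNOMED CT extract.
PARENTS = {
    "Arthritis": [],
    "Infective arthritis": ["Arthritis"],
    "Bacterial arthritis": ["Infective arthritis"],
    "Viral arthritis": ["Infective arthritis"],
}


def descendants(concept):
    """ECL '<': every subtype of the concept, excluding the concept itself."""
    found = set()
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for child, parents in PARENTS.items():
            if current in parents and child not in found:
                found.add(child)
                frontier.append(child)
    return found


def descendants_or_self(concept):
    """ECL '<<': the concept itself plus every subtype."""
    return descendants(concept) | {concept}


subtypes_only = descendants("Infective arthritis")
with_self = descendants_or_self("Infective arthritis")
```

Here `subtypes_only` holds the two child concepts, while `with_self` also includes "Infective arthritis" itself.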

What might a SNOMED CT dataset look like?

Rows: 500,000
Unique conditions: 24,647
Unique medications: 10,128

* Randomly generated synthetic dataset

Index   Sex  DoB         PostCode  Condition  Medication
0       F    26/04/1998  B03       102930000  7086011000036102
1       F    24/01/1953  E00       49512000   1112071000168105
2       M    7/09/1943   E00       277627005  5604011000036100
3       M    1/01/1966   E00       3109008    3231000036108
4       F    14/02/1957  E00       723409007  6286011000036105
5       M    14/08/1961  E00       3272007    761951000168100
6       F    28/01/1986  C04       86225009   921045011000036104
7       F    15/06/1983  C04       163577001  NaN
8       F    23/05/1967  C04       191737008  927853011000036101
…       …    …           …         …          …
499998  M    16/01/1984  B09       443919007  36227011000036103
499999  M    28/03/1995  B09       723913009  5081011000036108
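One practical point when loading a dataset like this with pandas: SNOMED CT identifiers can be up to 18 digits long, and a column containing a missing value (such as row 7) is otherwise read as float64, which cannot hold such identifiers exactly. The sketch below assumes the data arrives as a CSV file with the column names shown in the table; the file layout is an assumption, since the slides do not say how the dataset is stored.

```python
import io

import pandas as pd

# Two rows in the layout of the table (CSV format is an assumption);
# the second row has no medication.
csv_text = """Index,Sex,DoB,PostCode,Condition,Medication
6,F,28/01/1986,C04,86225009,921045011000036104
7,F,15/06/1983,C04,163577001,
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    index_col="Index",
    # Codes are identifiers, not quantities, so keep them as strings.
    dtype={"Condition": str, "Medication": str},
)

# Why strings: float64 cannot represent an 18-digit identifier exactly.
original = 921045011000036104
survives_float = int(float(original)) == original  # False
```

Keeping the codes as strings also means they compare directly with the codes returned in a ValueSet expansion, which are strings in FHIR JSON.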

Basic outline of approach to SNOMED CT analytics

o Define aggregation categories using SNOMED CT Expression Constraint Language (ECL)

o Identify all the codes that match each category, using Ontoserver to perform ValueSet expansions.

o Store the results of each expansion in a Hash Set for fast lookup.

o Use the Sets to filter our dataset, and optionally create human readable labels.

o Use standard analytic approaches to report and visualise the data.

Populate Set with ECL

• Create a GET request with the ECL parameter

• Parse the JSON response to a FHIR Value Set

• Iterate through the Value Set and populate the Hash Set with just the codes.

• Return the Hash Set.

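import requests  # for REST calls
from fhir.resources.valueset import ValueSet

def PopulateSetWithECL(ecl):
    """Expand an ECL expression on Ontoserver and return the matching codes as a set."""
    endpoint = 'https://ontoserver.csiro.au/stu3-latest'
    expandAPI = '/ValueSet/$expand'
    sctValueSetUrl = 'http://snomed.info/sct?fhir_vs=ecl/'
    # requests URL-encodes the ECL when it builds the query string
    urlParam = {'url': sctValueSetUrl + ecl}
    response = requests.get(endpoint + expandAPI, params=urlParam)
    response.raise_for_status()  # fail early on an HTTP error
    j = response.json()
    # As on the slide; recent fhir.resources releases may need ValueSet(**j) instead
    vs = ValueSet(j)
    _set = set()
    # 'contains' is absent when the expansion is empty
    for e in vs.expansion.contains or []:
        _set.add(e.code)
    return _set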
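The same steps can also be sketched without a FHIR model library, by reading the expansion straight from the JSON. The helper names below (`build_expand_url`, `codes_from_expansion`) are hypothetical, and the sample response is hand-written and trimmed to the fields used. `count` and `offset` are standard `$expand` parameters, but how a particular server pages or limits large expansions is an assumption to check against its documentation.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SCT_ECL_PREFIX = "http://snomed.info/sct?fhir_vs=ecl/"


def build_expand_url(endpoint, ecl, count=1000, offset=0):
    """Build a $expand request URL for an ECL-defined implicit ValueSet."""
    query = urlencode(
        {"url": SCT_ECL_PREFIX + ecl, "count": count, "offset": offset}
    )
    return f"{endpoint}/ValueSet/$expand?{query}"


def codes_from_expansion(value_set_json):
    """Collect the codes from a (flat) ValueSet expansion given as a dict."""
    contains = value_set_json.get("expansion", {}).get("contains", [])
    return {entry["code"] for entry in contains}


# Hand-written sample response, trimmed to the fields used here.
sample = {
    "resourceType": "ValueSet",
    "expansion": {
        "total": 2,
        "contains": [
            {"system": "http://snomed.info/sct", "code": "278285008",
             "display": "Left hemiplegia"},
            {"system": "http://snomed.info/sct", "code": "278284007",
             "display": "Right hemiplegia"},
        ],
    },
}

url = build_expand_url("https://ontoserver.csiro.au/stu3-latest", "<<73211009")
codes = codes_from_expansion(sample)
```

Separating URL construction from parsing keeps both pieces testable without a network call; the actual request would still be made with `requests.get(url)` or similar.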

Creating Health Condition Labels

o A list of tuples, each tuple consisting of an ECL definition and label

o Iterate through this list

o Create the Hash Set based on the ECL

o Create Boolean filter for concepts that match the Set

o Label accordingly in a new “Category” column.

healthCategories = [
    ('<<106028002', 'Musculoskeletal problems'),
    ('<<106048009', 'Respiratory problems'),
    ('<<195967001', 'Asthma'),
    ('<<363346000', 'Cancer'),
    ('<<13645005', 'COPD'),
    ('<<73211009', 'Diabetes mellitus'),
    ('<<106063007', 'Cardiovascular problems'),
    ('<<249578005', 'Kidney problems'),
    ('<<74732009', 'Mental illness'),
    ('<<40733004', 'Infectious disease'),
    ('<<414022008', 'Blood disease')]

# Default label for rows that match no category (inferred from the output table)
codeSet["Category"] = "Other Condition"

for category in healthCategories:
    categorySet = PopulateSetWithECL(category[0])
    # Boolean mask of rows whose Condition code (held as a string) is in the expansion
    filter = codeSet["Condition"].isin(categorySet)
    codeSet.loc[filter, "Category"] = category[1]

Index   Sex  Condition  Medication         Category
0       F    102930000  7086011000036102   Other Condition
1       F    49512000   1112071000168105   Mental illness
2       M    277627005  5604011000036100   Cancer
…       …    …          …                  …
499998  M    443919007  36227011000036103  Mental illness
499999  M    723913009  5081011000036108   Mental illness

codeSet.groupby(['Category','Sex']).size()

Category                  Sex  Count
Blood disease             F    7741
                          M    3295
Cancer                    F    1909
                          M    3298
Cardiovascular problems   F    13716
                          M    10481
Diabetes mellitus         F    18463
                          M    10362
Infectious disease        F    1435
                          M    368
Kidney problems           F    531
                          M    356
Mental illness            F    106980
                          M    104910
Musculoskeletal problems  F    1817
                          M    1400
Other Condition           F    107163
                          M    105340
Respiratory problems      F    230
                          M    205
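For reporting, the grouped counts can be pivoted into one row per category with a column per sex. The following is a small sketch on a toy frame; the rows are illustrative only and are not taken from the synthetic dataset.

```python
import pandas as pd

# Toy stand-in for the labelled dataset (illustrative rows only).
codeSet = pd.DataFrame({
    "Sex": ["F", "M", "F", "F", "M"],
    "Category": ["Cancer", "Cancer", "Diabetes mellitus",
                 "Cancer", "Diabetes mellitus"],
})

# Same grouping as on the slide: one count per (Category, Sex) pair.
counts = codeSet.groupby(["Category", "Sex"]).size()

# Pivot Sex into columns; pairs with no rows become 0 rather than NaN.
table = counts.unstack(fill_value=0)
table["Total"] = table.sum(axis=1)
```

The resulting `table` can be passed directly to `table.plot.bar()` or exported, which is often more convenient than the long-form series.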

Category Overlap

Overlap managed by:

• Categories ordered by priority
    o Later categories overwrite; or
    o Only label unlabelled

• Build disjointness into ECL

<<106048009|Respiratory|
MINUS (
    <<363346000|Cancer|
    OR <<106028002|Musculoskeletal|
    OR <<40733004|Infectious|
)

The right choice is use-case dependent, especially where double counting is a concern.
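The two priority-based strategies can be sketched side by side in pandas. The codes and expansions below are hypothetical placeholders, chosen so that one code falls into both categories; they are not real SNOMED CT identifiers.

```python
import pandas as pd

# Placeholder codes, not real SNOMED CT identifiers.
df = pd.DataFrame({"Condition": ["A", "B", "C", "D"]})

# Hypothetical expansions, in priority order; "B" is in both.
category_sets = [
    ("Respiratory problems", {"A", "B"}),
    ("Cancer", {"B", "C"}),
]

# Strategy 1: later categories overwrite earlier ones.
df["Overwrite"] = "Other Condition"
for label, codes in category_sets:
    mask = df["Condition"].isin(codes)
    df.loc[mask, "Overwrite"] = label

# Strategy 2: only label rows that are still unlabelled,
# so the earliest matching category wins.
df["FirstMatch"] = "Other Condition"
for label, codes in category_sets:
    mask = df["Condition"].isin(codes) & (df["FirstMatch"] == "Other Condition")
    df.loc[mask, "FirstMatch"] = label
```

Row "B" ends up as "Cancer" under the first strategy and "Respiratory problems" under the second, which is exactly the difference that matters when categories overlap.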

Counting Opioids

o Again, iterate through this list as before, adding an “Opioid” label

opioids = [
    ('<34841011000036108', 'dihydrocodeine'),
    ('<21821011000036104', 'codeine'),
    ('<21705011000036108', 'pholcodine'),
    ('<21232011000036101', 'buprenorphine'),
    ('<21357011000036109', 'methadone'),
    ('<135971000036102', 'tapentadol'),
    ('<21258011000036102', 'fentanyl'),
    ('<21259011000036105', 'oxycodone'),
    ('<21252011000036100', 'morphine'),
    ('<21486011000036105', 'tramadol'),
    ('<21901011000036101', 'dextropropoxyphene'),
    ('<34839011000036106', 'pethidine'),
    ('<1247191000168104', 'sufentanil')]

for opioid in opioids:
    OpioidSet = PopulateSetWithECL(opioid[0])
    # Boolean mask of rows whose Medication code is in the expansion
    filter = codeSet["Medication"].isin(OpioidSet)
    codeSet.loc[filter, "Opioid"] = opioid[1]

Index   Sex  Medication         Opioid
65      M    7349011000036100   oxycodone
219     M    1070441000168107   codeine
648     F    1048081000168105   buprenorphine
...     ...  ...                ...
499738  F    34022011000036100  methadone
499802  M    785911000168101    fentanyl
499951  M    36062011000036104  dextropropoxyphene
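An alternative to filtering once per ingredient is to invert the expansions into a single code-to-label dictionary and label every row in one pass. This is a design variation rather than the approach shown on the slide, and the codes below are placeholders standing in for the sets returned by `PopulateSetWithECL`.

```python
# Hypothetical expansions keyed by label; placeholder codes stand in
# for the sets returned by PopulateSetWithECL.
expansions = {
    "codeine": {"1001", "1002"},
    "oxycodone": {"2001"},
}

# Invert to one code -> label dictionary. If a code appeared in two
# expansions, the later label would win.
code_to_label = {
    code: label
    for label, codes in expansions.items()
    for code in codes
}

medications = ["2001", "9999", "1002"]
labels = [code_to_label.get(code) for code in medications]
```

With a DataFrame, the same dictionary can be applied with `codeSet["Medication"].map(code_to_label)`, which leaves unmatched rows as missing values.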

Opioids

Using AMT’s “Concrete domain” in ECL

/* High dose: 200 mg or greater */
<30497011000036103|medicinal product|:
    {
        30364011000036101|has Au BoSS| = 1817011000036100|aspirin|,
        700000111000036105|Strength| >= #200,
        177631000036102|has unit| = 700000801000036102|mg/each|
    },
    [1..1] 700000081000036101|has intended active ingredient| = ANY

Example match: 53798011000036101|Ecotrin 650 mg enteric tablet|

/* Low dose: less than 200 mg */
<30497011000036103|medicinal product|:
    {
        30364011000036101|has Au BoSS| = 1817011000036100|aspirin|,
        700000111000036105|Strength| < #200,
        177631000036102|has unit| = 700000801000036102|mg/each|
    },
    [1..1] 700000081000036101|has intended active ingredient| = ANY

/* Combination aspirin products */
<21719011000036107|aspirin (MP)|:
    [2..*] 700000081000036101|has intended active ingredient| = ANY
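Because the high-dose and low-dose expressions differ only in the comparison and threshold, they can be generated from one template before being sent for expansion. The sketch below reuses the identifiers from the slide; the template constant and the `aspirin_dose_ecl` helper are hypothetical names introduced for illustration.

```python
# Template built from the slide's expression; doubled braces are literal
# braces in str.format, single-brace fields are substituted.
ASPIRIN_SINGLE_INGREDIENT = """\
<30497011000036103|medicinal product|:
  {{
    30364011000036101|has Au BoSS| = 1817011000036100|aspirin|,
    700000111000036105|Strength| {operator} #{threshold},
    177631000036102|has unit| = 700000801000036102|mg/each|
  }},
  [1..1] 700000081000036101|has intended active ingredient| = ANY"""

# Numeric comparison operators defined by ECL.
NUMERIC_OPERATORS = {"=", "!=", "<", "<=", ">", ">="}


def aspirin_dose_ecl(operator, threshold_mg):
    """Return the aspirin strength ECL for a chosen comparison and threshold."""
    if operator not in NUMERIC_OPERATORS:
        raise ValueError(f"unsupported comparison operator: {operator}")
    return ASPIRIN_SINGLE_INGREDIENT.format(
        operator=operator, threshold=threshold_mg
    )


high_dose = aspirin_dose_ecl(">=", 200)
low_dose = aspirin_dose_ecl("<", 200)
```

Generating both expressions from one threshold keeps the two categories complementary if the cut-off is later changed.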

“Concrete Domain” expansions

High Dose – 28 concepts

o Solprin 300 mg dispersible tablet

o Disprin Direct 300 mg chewable tablet

o Alka-Seltzer Lemon-Lime 324 mg effervescent tablet

Low Dose – 27 concepts

o Spren 100 mg tablet

o Cardasa 100 mg enteric tablet

o Aspirin Low Dose (Nyal) 100 mg enteric tablet

Combination Products – 54 concepts

o Clopidogrel/Aspirin 75/100 (AN) tablet

o Duoprel 75/100 tablet

o Action Cold and Flu effervescent tablet

Additional Resources

SNOMED CT ECL Specification: snomed.org/ecl

Shrimp Browser: ontoserver.csiro.au/shrimp

Agency ECL examples: github.com/AuDigitalHealth/ecl-examples

Supplementary Jupyter Notebook: bit.ly/SNOMED_HDA19

Contact us

Help Centre: 1300 901 001

Email: [email protected]

Website: healthterminologies.gov.au

Twitter: twitter.com/AuDigitalHealth
