COMBINING HUMAN & MACHINE INTELLIGENCE TO SUCCESSFULLY INTEGRATE BIOMEDICAL DATA
TIMOTHY DANFORD | TAMR, INC.

Using Human+Machine Intelligence to Integrate Biomedical Data

THE DATA INTEGRATION PROBLEM

● flat files: every file has its own columns

● bioinformatics: every tool has its own file format

● graph data: RDF, OWL, “knowledge graphs”

● proprietary / legacy formats: SAS, DBF

● relational databases: inconsistent data models

Biomedical Data Integration is a Constantly Moving Target

THE BIOMEDICAL DATA INTEGRATION PROBLEM

● Fundamentally, many scientific analyses are tabular
  o rows are ‘entities’
  o columns are ‘attributes’

● graphs (paths) and hierarchies (part/whole) are other shapes

● tables emphasize the independence of entities and attributes

Tabular Datasets are a Core Data Shape
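To make the tabular view concrete, here is a minimal Python sketch (an illustration, not Tamr's method) that pivots graph-style (subject, predicate, object) triples, as an RDF source might provide, into a table with one row per entity and one column per attribute. The sample identifiers and attribute names are invented for the example.

```python
from collections import defaultdict

def triples_to_table(triples):
    """Pivot (subject, predicate, object) triples into rows of entities and columns of attributes."""
    rows = defaultdict(dict)
    for subject, predicate, obj in triples:
        rows[subject][predicate] = obj  # a later value for the same attribute overwrites an earlier one
    columns = sorted({p for attrs in rows.values() for p in attrs})
    # Attributes an entity lacks become None, so every row has the same columns.
    table = [[entity] + [attrs.get(c) for c in columns]
             for entity, attrs in sorted(rows.items())]
    return ["entity"] + columns, table

# Hypothetical graph-shaped input about two samples.
triples = [
    ("sample:S1", "tissue", "liver"),
    ("sample:S1", "donor_age", "54"),
    ("sample:S2", "tissue", "lung"),
]
header, table = triples_to_table(triples)
print(header)
for row in table:
    print(row)
```

The missing `donor_age` for `sample:S2` shows up as an empty cell, which is one way the independence of entities and attributes surfaces once data is forced into a table.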

THE BIOMEDICAL DATA INTEGRATION PROBLEM

● Column-oriented: Find the matching attributes

● Row-oriented: Discover duplicate entities

Data Integration Proceeds In Two Directions
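Both directions can be sketched with Python's standard library. This is a deliberately naive illustration, not Tamr's algorithm: `difflib.SequenceMatcher` string similarity stands in for a learned model, and the thresholds, column names, and records are invented.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (difflib's ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_attributes(source_columns, target_columns, threshold=0.6):
    """Column-oriented: map each source column to its most similar target column, if close enough."""
    mapping = {}
    for src in source_columns:
        best = max(target_columns, key=lambda tgt: similarity(src, tgt))
        if similarity(src, best) >= threshold:
            mapping[src] = best
    return mapping

def find_duplicates(records, key, threshold=0.85):
    """Row-oriented: return index pairs of records whose `key` values look alike."""
    return [(i, j) for i, j in combinations(range(len(records)), 2)
            if similarity(records[i][key], records[j][key]) >= threshold]

print(match_attributes(["patient_id", "Sex"], ["PatientID", "Gender", "Age"]))
records = [{"name": "Jane Smith"}, {"name": "Jane Smyth"}, {"name": "Robert Chen"}]
print(find_duplicates(records, "name"))
```

`Sex` stays unmapped because string similarity cannot tell that it means the same thing as `Gender`; that is the kind of question worth routing to a subject-matter expert. Comparing every pair of records is quadratic in the number of records, so practical de-duplication usually limits comparisons with some form of blocking.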

THE DATA INTEGRATION PROBLEM

● One solution: hire or train data curators who understand the subject area

● Benefits: accuracy

● Problems
  o Low bandwidth
  o Difficult to scale to larger problems
  o Recording decisions
  o Consistency between curators

Data Curation Teams Do Not Scale
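Two of the problems above, recording decisions and keeping curators consistent, suggest what a shared decision log needs at a minimum. The sketch below is hypothetical: the `Decision` fields and the attribute names are invented, and it only flags source attributes that different curators mapped to different targets.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Decision:
    """One recorded curation decision (hypothetical record layout)."""
    curator: str
    source_attribute: str
    target_attribute: str

def find_conflicts(decisions):
    """Return source attributes that curators mapped to more than one target, with those targets."""
    targets = defaultdict(set)
    for d in decisions:
        targets[d.source_attribute].add(d.target_attribute)
    return {src: sorted(t) for src, t in targets.items() if len(t) > 1}

log = [
    Decision("alice", "pt_sex", "SEX"),
    Decision("bob", "pt_sex", "GENDER"),
    Decision("alice", "visit_dt", "VISITDATE"),
]
print(find_conflicts(log))
```

Keeping decisions as data, rather than in curators' heads or email threads, is what makes a conflict like `pt_sex` detectable at all.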

THE DATA INTEGRATION PROBLEM

● Build an automated or rules-based system to perform data integration

● Benefits: scale

● Problems
  o Accuracy, edge cases
  o Programmers do not scale
  o Out-of-band communication
  o Expensive to maintain
  o Brittle in the face of new data

Rule-based Integration Is Brittle
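A toy Python example of why exact-match rules are brittle. The rule table and column names are hypothetical; the point is that every new variant of a header needs another rule written by a programmer.

```python
# Hypothetical hand-written rules mapping source column names to a unified schema.
RULES = {
    "patient_id": "SUBJECT_ID",
    "sex": "SEX",
    "dob": "BIRTH_DATE",
}

def apply_rules(columns):
    """Map columns using exact-match rules; return the mapping and any unrecognized columns."""
    mapped, unmapped = {}, []
    for col in columns:
        target = RULES.get(col.strip().lower())
        if target is None:
            unmapped.append(col)  # someone has to write a new rule for this header
        else:
            mapped[col] = target
    return mapped, unmapped

# The source the rules were written for maps cleanly...
print(apply_rules(["patient_id", "sex", "dob"]))
# ...but a new source with slightly different headers falls straight through.
print(apply_rules(["PatientID", "Gender", "Date of Birth"]))
```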

TAMR AUTOMATES DATA INTEGRATION

● Solution: combine rules learned from data with questions to experts

● Modern machine learning techniques
  o semi-supervised learning
  o active learning

● Benefits
  o speed of an automated system
  o accuracy of human experts
  o auditability
  o responds well to changing requirements

Use Probabilistic Rules with Active Learning
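Uncertainty sampling is one common form of active learning: the model asks the expert about the examples it is least sure of. The Python sketch below is an illustration under stated assumptions, not Tamr's implementation: a one-feature logistic regression scores candidate record pairs by a precomputed similarity score, and a hypothetical `expert` function stands in for a human giving yes/no answers. It does not cover semi-supervised learning.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(xs, ys, epochs=2000, lr=0.5):
    """Fit p(match | x) = sigmoid(w*x + b) by batch gradient descent on log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b

def most_uncertain(pool, w, b, k):
    """Uncertainty sampling: the k pool items whose predicted probability is closest to 0.5."""
    return sorted(pool, key=lambda x: abs(sigmoid(w * x + b) - 0.5))[:k]

def expert(x):
    """Hypothetical stand-in for a subject-matter expert's answer (1 = same entity)."""
    return 1 if x >= 0.7 else 0

# Similarity scores of candidate pairs: two labeled to start, the rest unlabeled.
labeled_x, labeled_y = [0.1, 0.95], [0, 1]
pool = [0.2, 0.35, 0.5, 0.6, 0.65, 0.75, 0.8, 0.9]
for round_ in range(3):
    w, b = train_logistic(labeled_x, labeled_y)
    for x in most_uncertain(pool, w, b, k=2):
        pool.remove(x)
        labeled_x.append(x)
        labeled_y.append(expert(x))  # the expert sees only the pairs the model is unsure about
    print(f"round {round_}: asked about {labeled_x[-2:]}, boundary ~ {-b / w:.2f}")
```

Each round spends expert time only near the current decision boundary, which is how this style of system aims to keep the accuracy of human judgment without asking about every pair.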

TAMR AUTOMATES DATA INTEGRATION

● Build a unified schema and link it to source attributes

● Engage subject matter experts to answer questions

● Automate data transformation

● Eliminate redundant records with de-duplication

Tamr Combines Machine Learning and Expert Feedback
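The schema-linking, transformation, and de-duplication steps can be sketched in a few lines of Python. Everything here is hypothetical: the unified attributes, source names, and value rules are written by hand, whereas the slides describe mappings learned with expert input, and de-duplication is reduced to exact matching on a single key.

```python
# Hypothetical unified schema: each unified attribute links to one attribute per source.
UNIFIED_SCHEMA = {
    "subject_id": {"site_a": "PatientID", "site_b": "subj"},
    "sex": {"site_a": "Gender", "site_b": "sex"},
}

# Hypothetical per-attribute transformations into a common representation.
SEX_TERMS = {"male": "M", "m": "M", "female": "F", "f": "F"}
TRANSFORMS = {
    "subject_id": lambda v: v.strip().upper(),
    "sex": lambda v: SEX_TERMS.get(v.strip().lower(), "U"),
}

def unify(source, record):
    """Rename a source record's attributes to the unified schema and transform the values."""
    out = {}
    for attr, links in UNIFIED_SCHEMA.items():
        raw = record.get(links[source])
        out[attr] = TRANSFORMS[attr](raw) if raw is not None else None
    return out

def deduplicate(records, key="subject_id"):
    """Keep the first unified record seen for each key value (exact-match de-duplication)."""
    seen, unique = set(), []
    for r in records:
        if r[key] not in seen:
            seen.add(r[key])
            unique.append(r)
    return unique

rows = [
    unify("site_a", {"PatientID": "p001 ", "Gender": "Female"}),
    unify("site_b", {"subj": "P001", "sex": "f"}),
    unify("site_b", {"subj": "P002", "sex": "male"}),
]
print(deduplicate(rows))
```

The first two rows only collapse into one because the transformation step normalized `"p001 "` to `"P001"` beforehand; de-duplication depends on the earlier steps.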

TAMR BUILDS LASTING VALUE

● 80% of clinical data today goes unused

● Clinical Data Warehouses capture legacy data

● Improved analytics = better trials, less $$

Advanced Analytics, Better Clinical Trials

[Diagram: SAS; Faster Regulatory Filings; Better Clinical Analytics; Data Mining for New Indications]

CASE STUDY: CLINICAL STUDY DATA

● Clinical study data integration is motivated by a single schema: CDISC
  o mandated by FDA for data submission
  o common schema for clinical data warehouses

● Mostly performed by SAS scripting today

● Tamr learns attribute mappings and transformations from human feedback

An Example: Clinical Study Data Integration
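As an illustration of turning human feedback into reusable transformations, here is a small Python sketch that builds a value mapping from expert-confirmed examples for a target such as the CDISC SDTM `SEX` variable, whose controlled terminology includes `M`, `F`, and `U`. The `ValueMapper` class and its case/whitespace normalization are assumptions made for this example; it memorizes confirmed answers rather than generalizing the way a learned model would.

```python
def normalize(value):
    """Canonical form for comparing raw values: lowercase, single spaces, no edge whitespace."""
    return " ".join(value.lower().split())

class ValueMapper:
    """Hypothetical helper: raw-value -> standard-term mappings learned from expert answers."""

    def __init__(self):
        self.learned = {}

    def learn(self, raw, standard):
        # One expert answer covers every raw value with the same normalized form.
        self.learned[normalize(raw)] = standard

    def transform(self, values):
        """Map values; also return the raw values that still need an expert's answer."""
        mapped, needs_review = [], []
        for v in values:
            term = self.learned.get(normalize(v))
            mapped.append(term)
            if term is None:
                needs_review.append(v)
        return mapped, needs_review

mapper = ValueMapper()
mapper.learn("Male", "M")
mapper.learn("Female", "F")
mapped, review = mapper.transform(["male", " FEMALE", "Unknown", "Male "])
print(mapped)
print(review)
```

Only `"Unknown"` is sent back for review; once an expert answers it (for example with `mapper.learn("Unknown", "U")`), later files containing that value map without further questions.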

Thank You