CPTR RDST Data Platform Concept September 22, 2014


  • CPTR RDST Data Platform Concept September 22, 2014

  • Outline

    • C-Path overview and examples of data projects

    • Knowledge sharing concept

    • RDST approach

    • Examples of RDST data types

    • Database architecture

    • Next steps – timeline

    CPTR-RDST Data Platform 2014 Workshop Slides 2

  • C-Path Consortia

    Coalition Against Major Diseases UNDERSTANDING DISEASES OF THE BRAIN

    Critical Path to TB Drug Regimens TESTING DRUG COMBINATIONS

    Multiple Sclerosis Outcome Assessments Consortium DRUG EFFECTIVENESS IN MS

    Polycystic Kidney Disease Consortium NEW IMAGING BIOMARKERS

    Patient-Reported Outcome Consortium DRUG EFFECTIVENESS

    Electronic Patient-Reported Outcome Consortium DRUG EFFECTIVENESS

    Predictive Safety Testing Consortium DRUG SAFETY

    Seven global consortia developing novel drug development tools

    Biomarkers

    Clinical Outcome Assessment Instruments

    Clinical Trial Simulation Tools

    In vitro tools

    Data Standards


  • C-Path Online Data Repository

    Current C-Path examples

    • CAMD – AD Clinical Trial Simulation Tool

    • PKD – Biomarker Qualification Project

    • MSOAC – New Outcome Assessment Instrument for MS

    CDC TB study data now available

  • Datasets contributed to C-Path for consortia projects

    Consortium                            Therapeutic Area                  # of Studies  Total Number of Subjects  Number of Data Contributors
    Coalition Against Major Diseases      Alzheimer's disease               27            7340                      11
    Coalition Against Major Diseases      Parkinson's disease               7             2597                      2
    Critical Path to TB Drug Regimens     Tuberculosis                      10            2495                      5
    MS Outcome Assessments Consortium     Multiple sclerosis                6             4700                      4
    Polycystic Kidney Disease             Polycystic kidney disease         5             2941                      4
    Predictive Safety Testing Consortium  Normal healthy volunteer-kidney   1             172                       1
    Predictive Safety Testing Consortium  Skeletal-muscular (non-clinical)  38            1766                      6
    Predictive Safety Testing Consortium  Hepato-toxicity (non-clinical)    43            2340                      7
    Predictive Safety Testing Consortium  Nephro-toxicity (non-clinical)    14            941                       8

  • Value of data sharing, data standards & data pooling

    Start Point

    • Nine member companies agreed to share data from 24 Alzheimer’s disease (AD) trials

    • The data were not in a common format

    • The data were remapped to the CDISC AD standard and pooled (24 studies, >6500 patients)

    Result

    • A new clinical trial simulation tool was created and became the first model endorsed by the FDA and EMA

    • Researchers are using the database to advance research

  • Model endorsed by FDA and EMA

    Access to AD data available to qualified researchers

  • Future TB model


  • Rapid DST TB Data Sharing Platform Architecture Concept


  • CPTR TB Drug Resistance DB

    Data Platform to Inform Assay Development

    How do we build this system?

    Linking Global TB Sequence Researchers

    • TB sequence community inputs

    • Expert review to advance investigational biomarkers to validated status

    • Can use separate or consolidated DBs

    • FDA compliant CDISC architecture for regulatory submission DB

    Diagram components:

    • Approved members: academic labs, reference labs, commercial companies, others…

    • Sequence repository: anonymized sequence data, clinical annotation, phenotypic methods, user friendly cloud interface

    • Investigational DB

    • Expert Panel Review

    • Approved biomarkers

    • Validated DR biomarkers

    • Analysis files generated (label repeated at several points in the diagram)

  • How do we accomplish this?

    • Clear objective

    – Improved research resource to enable development of new rapid diagnostics for TB

    • Future objective

    – With sustainability funding: resource for clinicians

    • Build on previous efforts for TB and for data sharing

    – Apply technology product development discipline

    – Design to handle wide range of data types

    – Quality criteria and defined process for incoming data

    – Lean, efficient and well managed implementation

    – Expandable / adaptable / flexible

    – Great usability

    • Strong alignment with anticipated analysis use cases


  • Related efforts: TBDReaMDB

    http://www.tbdreamdb.com/index.html

  • Example of future objective: Stanford HIV database

    http://hivdb.stanford.edu/

  • Product development discipline

    • Detailed, documented requirements

    • Early prototyping

    • Design to requirements

    • Staged development with clear milestones

    • Extensive testing

    – Verification of all features and functions

    – Usability

    – Performance

    – Scalability

    • Phased rollout

    – RDST members

    – Qualified external researchers

    • Ongoing support and enhancements based on user feedback

    Requirements → Prototype → Design → Build → Test → Deploy → Support and enhance

  • RDST Data: multiple data types


    Need to incorporate multiple types of data

    • sequence data

    • SNP reports

    • resistance test data

    • clinical trial/study/registry data

    • external information resources

    • any other data that may be necessary

    These data need to be analyzed together to find and validate correlations
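As a rough illustration of what "find and validate correlations" could look like once genotypic and phenotypic data sit side by side, the sketch below tabulates a mutation call against a phenotypic resistance result and derives sensitivity and specificity. The isolate records, the field names (`mutation_present`, `resistant`) and the choice of metrics are all assumptions made for illustration; the slides do not specify an analysis method.

```python
from collections import Counter

# Hypothetical isolates: each pairs a genotypic call with a phenotypic DST result.
ISOLATES = [
    {"isolate": "A", "mutation_present": True, "resistant": True},
    {"isolate": "B", "mutation_present": True, "resistant": True},
    {"isolate": "C", "mutation_present": True, "resistant": True},
    {"isolate": "D", "mutation_present": False, "resistant": True},
    {"isolate": "E", "mutation_present": True, "resistant": False},
    {"isolate": "F", "mutation_present": False, "resistant": False},
    {"isolate": "G", "mutation_present": False, "resistant": False},
    {"isolate": "H", "mutation_present": False, "resistant": False},
]


def two_by_two(isolates):
    """Count isolates in each cell of the mutation-by-phenotype table."""
    counts = Counter((row["mutation_present"], row["resistant"]) for row in isolates)
    return {
        "mut_resistant": counts[(True, True)],
        "mut_susceptible": counts[(True, False)],
        "wt_resistant": counts[(False, True)],
        "wt_susceptible": counts[(False, False)],
    }


def sensitivity_specificity(table):
    """Treat the mutation as a test for phenotypic resistance."""
    resistant = table["mut_resistant"] + table["wt_resistant"]
    susceptible = table["mut_susceptible"] + table["wt_susceptible"]
    sensitivity = table["mut_resistant"] / resistant if resistant else None
    specificity = table["wt_susceptible"] / susceptible if susceptible else None
    return sensitivity, specificity


table = two_by_two(ISOLATES)
sensitivity, specificity = sensitivity_specificity(table)
print(table, sensitivity, specificity)
```

A real analysis would also need confidence intervals and a defined reference phenotypic method; those are left out here to keep the sketch short.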

  • RDST Data: genotypic data example

    @M00347:61:000000000-A9B8J:1:1101:15324:1677 1:N:0:1

    TCTTGATCGCGAGTTCGCGGCCCGGGGTGAGCACCCAGGTGAGCGGGAAATGCGTGGTGTCGTGGTAGCTGACGTCGACGATGCCGTGGCGGTATTCGAGGTCTGTGAACGTGTCGTCGTCGAGGAAGTTCTGCAGCACCAGCAGCGGATC

    +

    11>>1@BF1>>11AEF00000AA////A//A1AB/?/GAEC1GBE??///FFG/?E?EFHF/F?A?EG1BDBC/FCGGCCCC-.1011/0//?/333333/030333044300///B1111/00//>22222211111111101111100
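The four lines above make up one FASTQ record: an identifier line starting with `@`, the base calls, a `+` separator, and one quality character per base. The sketch below parses such a record and decodes the quality characters, assuming the common Phred+33 encoding (the slide does not state the encoding). The example uses only the first ten bases and quality characters of the read shown above; `FastqRecord` and `parse_fastq_record` are hypothetical names.

```python
from typing import List, NamedTuple


class FastqRecord(NamedTuple):
    read_id: str
    sequence: str
    qualities: List[int]


def parse_fastq_record(lines, offset=33):
    """Parse one four-line FASTQ record; offset=33 assumes Phred+33 encoding."""
    header, sequence, separator, quality = (line.strip() for line in lines)
    if not header.startswith("@"):
        raise ValueError("FASTQ header must start with '@'")
    if not separator.startswith("+"):
        raise ValueError("third line of a FASTQ record must start with '+'")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    return FastqRecord(header[1:], sequence, [ord(char) - offset for char in quality])


# Truncated copy of the read from the slide (first ten positions only).
example = [
    "@M00347:61:000000000-A9B8J:1:1101:15324:1677 1:N:0:1",
    "TCTTGATCGC",
    "+",
    "11>>1@BF1>",
]
record = parse_fastq_record(example)
print(record.read_id, record.qualities)
```

Checks like the length comparison are the kind of basic quality criterion an intake step could apply before a file enters any shared pipeline.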

  • RDST Data: SNP report example

    SNP reports

  • RDST Data: phenotypic data example

    resistance test data

    https://tbdr.org/cgi/tbdr

  • RDST Data: clinical data example (hypothetical data)

    clinical trial data

    STUDYID  DOMAIN  USUBJID  AGE  SEX  RACE                       ARM
    19       DM      10001    27   F    WHITE                      Ethambutol 5 Times Per Week
    19       DM      10002    63   M    WHITE                      Moxifloxacin 3 Times Per Week
    19       DM      10003    42   M    BLACK OR AFRICAN AMERICAN  Moxifloxacin 5 Times Per Week
    19       DM      10004    30   F    ASIAN                      Moxifloxacin 5 Times Per Week
    19       DM      10005    29   M    BLACK OR AFRICAN AMERICAN  Moxifloxacin 3 Times Per Week
    19       DM      10006    35   M    BLACK OR AFRICAN AMERICAN  Ethambutol 3 Times Per Week
    19       DM      10007    46   F    UNKNOWN                    Ethambutol 3 Times Per Week
    19       DM      10008    34   F    BLACK OR AFRICAN AMERICAN  Moxifloxacin 5 Times Per Week
    19       DM      10009    55   M    BLACK OR AFRICAN AMERICAN  Ethambutol 3 Times Per Week
    19       DM      10010    42   M    ASIAN                      Moxifloxacin 5 Times Per Week
    19       DM      10011    23   F    BLACK OR AFRICAN AMERICAN  Ethambutol 3 Times Per Week
    19       DM      10012    47   F    WHITE                      Ethambutol 3 Times Per Week
    19       DM      10013    25   F    BLACK OR AFRICAN AMERICAN  Moxifloxacin 5 Times Per Week
    19       DM      10014    21   M    WHITE                      Ethambutol 3 Times Per Week
    19       DM      10015    79   M    WHITE                      Moxifloxacin 3 Times Per Week
    19       DM      10016    27   F    ASIAN                      Moxifloxacin 3 Times Per Week
    19       DM      10017    37   M    BLACK OR AFRICAN AMERICAN  Ethambutol 3 Times Per Week
    19       DM      10018    28   M    BLACK OR AFRICAN AMERICAN  Moxifloxacin 3 Times Per Week

    STUDYID  DOMAIN  USUBJID  MBTESTCD  MBTEST                        MBORRES                               MBSPEC                VISIT
    13       MB      10001    AFB       Acid Fast Bacilli             NEGATIVE                              SPONT SPUTUM          WEEK 8
    13       MB      10001    ORGANISM  Organism Present              NEGATIVE FOR TUBERCULOSIS             SPONT SPUTUM          WEEK 4
    13       MB      10001    MTBINH    M.tuberculosis INH Resistant  POSITIVE                              NON-OVERNIGHT SPUTUM  SCREENING
    15       MB      10001    ORGANISM  Organism Present              NEGATIVE FOR TUBERCULOSIS             SPONT SPUTUM          WEEK 4
    13       MB      10002    AFB       Acid Fast Bacilli             NEGATIVE                              SPONT SPUTUM          WEEK 8
    13       MB      10002    ORGANISM  Organism Present              POSITIVE FOR M. TUBERCULOSIS COMPLEX  SPONT SPUTUM          SCREENING
    15       MB      10002    ORGANISM  Organism Present              POSITIVE FOR M. TUBERCULOSIS COMPLEX  SPONT SPUTUM          SCREENING
    13       MB      10003    ORGANISM  Organism Present              NEGATIVE FOR TUBERCULOSIS             INDUCED SPUTUM        WEEK 4
    15       MB      10004    ORGANISM  Organism Present              NEGATIVE FOR TUBERCULOSIS             SPONT SPUTUM          WEEK 4

    STUDYID  DOMAIN  USUBJID  MOTESTCD  MOTEST           MOORRES  MOSTRESC  MOLOC       VISIT      MODY
    17       MO      10001    CAVIT     Cavitation       Y        Y         LUNG, LEFT  SCREENING  -4
    17       MO      10002    CAVIT     Cavitation       Y        Y         LUNG, LEFT  SCREENING  -5
    17       MO      10002    PLEURALD  Pleural Disease  N        N         LUNG, LEFT  SCREENING  1
    17       MO      10004    PLEURALD  Pleural Disease  N        N         LUNG, LEFT  SCREENING  -8
    17       MO      10005    CAVIT     Cavitation       N        N         LUNG, LEFT  SCREENING  -9
    17       MO      10005    CAVIT     Cavitation       N        N         LUNG, LEFT  SCREENING  -15
    17       MO      10006    CAVIT     Cavitation       Y        Y         LUNG, LEFT  SCREENING  1
    17       MO      10006    PLEURALD  Pleural Disease  N        N         LUNG, LEFT  SCREENING  -7
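Because every SDTM-style domain in the example carries `USUBJID`, findings can be linked back to demographics with a simple key lookup. The sketch below does this for a few DM and MB rows copied from the hypothetical data above. Joining on `USUBJID` alone is an assumption made for illustration (the example rows carry different `STUDYID` values across domains); `join_on_subject` is a hypothetical helper, not part of any CDISC tooling.

```python
# A few rows copied from the hypothetical DM and MB tables above.
dm_rows = [
    {"USUBJID": "10001", "AGE": 27, "SEX": "F", "ARM": "Ethambutol 5 Times Per Week"},
    {"USUBJID": "10002", "AGE": 63, "SEX": "M", "ARM": "Moxifloxacin 3 Times Per Week"},
]
mb_rows = [
    {"USUBJID": "10001", "MBTESTCD": "AFB", "MBORRES": "NEGATIVE", "VISIT": "WEEK 8"},
    {"USUBJID": "10002", "MBTESTCD": "AFB", "MBORRES": "NEGATIVE", "VISIT": "WEEK 8"},
    {
        "USUBJID": "10002",
        "MBTESTCD": "ORGANISM",
        "MBORRES": "POSITIVE FOR M. TUBERCULOSIS COMPLEX",
        "VISIT": "SCREENING",
    },
]


def join_on_subject(dm, findings):
    """Attach each subject's demographics to every finding recorded for them."""
    dm_by_subject = {row["USUBJID"]: row for row in dm}
    joined = []
    for finding in findings:
        demographics = dm_by_subject.get(finding["USUBJID"])
        if demographics is None:
            continue  # finding without a matching demographics row is skipped
        joined.append({**demographics, **finding})
    return joined


joined = join_on_subject(dm_rows, mb_rows)
for row in joined:
    print(row["USUBJID"], row["ARM"], row["MBTESTCD"], row["MBORRES"], row["VISIT"])
```

In a pooled database spanning several studies, the key would more plausibly be the pair of study and subject identifiers.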

  • RDST Data: TB strain summary table

    external resources

    http://www.ncbi.nlm.nih.gov/genome/genomes/166

  • RDST data platform: design to handle multiple data types

    Diagram summary:

    • Subject-level clinical trial data: one table per variable (VAR1, VAR2, VAR3, …), one row per subject (s1, s2, …), one column per time point (1 to 7)

    • Sequence data by strain (Strain 1, Strain 2, Strain 3, …), e.g. ACAAGATGCCATTGTCCCGCT…

    • Surveillance data over time (OBS1, VAR1, TEST1 tables), with values missing for some subjects and time points

    • Inputs: clinical trial data, surveillance data, genotypic data, phenotypic data

    • Apply CDISC Data Standards

    • Aggregated Research Database

    • Data analysis

  • Key success factors for incoming data

    • Buy in for data contributions

    – Survey and prioritize

    – Proactive engagement

    – Recognition and incentives for contributions

    • Clearly defined quality criteria

    – Develop and vet during initial data survey & prioritization

    • Consistent process for incoming data processing

    – Unified pipeline for incoming sequence data

    – Ability to apply CDISC standards to create efficient database (vs large number of small data buckets)

    • Ongoing curation and quality control


  • Quality criteria and defined process for incoming sequence data

    RDST unified pipeline for sequence data

    Incoming FASTQ plus associated SNP report → RDST unified pipeline → New SNP report generated with RDST unified pipeline

  • TB clinical data mapping to CDISC

    We do this today for CPTR

    Map to CDISC domains (CDISC variables, controlled terminology):

    • Data Element: Phase of TB treatment → Exposure (EX): USUBJID EXTRT EXDOS EXDOSU

    • Data Element: TB Symptoms → Clinical Events (CE): USUBJID CETERM CEPRESP CEOCCUR

    • Data Element: Tuberculin Skin Test Result → Skin Response (SR): USUBJID SRTESTCD SRTEST SRORRES SRORRESU

    Data Element: Tuberculin Skin Test Result. Definition: The number of millimeters in diameter of the induration, or raised hardening, at the tuberculin skin test site. Permissible value set: mm of induration.

    Example SR record:

    USUBJID  SRTESTCD  SRTEST               SRORRES  SRORRESU
    12345    INDURDIA  Induration Diameter  16       mm

    Principles:

    • Preserve, do not change the data content

    • A place for everything, everything in its place

    • Capture the smallest usable elements of data
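The tuberculin skin test example can be read as a small mapping function: a contributed value such as "16 mm" is split into its smallest usable elements (result and unit) and placed into Skin Response (SR) variables without altering the content. The sketch below assumes the incoming value is a free-text string of the form "<number> mm"; the function name `map_skin_test` and that input format are hypothetical, while the SR variable names and the `INDURDIA` test code are taken from the slide.

```python
import re


def map_skin_test(usubjid, raw_result):
    """Map a raw induration measurement such as '16 mm' to an SR-style record."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*mm\s*", raw_result)
    if match is None:
        # Values outside the permissible set are rejected rather than guessed at.
        raise ValueError(f"not a millimeter measurement: {raw_result!r}")
    return {
        "USUBJID": usubjid,
        "SRTESTCD": "INDURDIA",
        "SRTEST": "Induration Diameter",
        "SRORRES": match.group(1),  # original result kept as reported
        "SRORRESU": "mm",
    }


sr_record = map_skin_test("12345", "16 mm")
print(sr_record)
```

Keeping `SRORRES` as the reported text, rather than converting it to a number, follows the "preserve, do not change the data content" principle; a numeric copy could be derived separately.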

  • Rapid DST Data Platform


    • Can we apply CDISC standards to TB genotypic and phenotypic data?

    • What is the benefit of doing this?


  • CDISC Study Data Tabulation Model (SDTM) domains for classification of data elements

    • Interventions: Exposure, Con Meds, Substance Use

    • Events: Adverse Events, Medical History, Deviations, Clinical Events, Disposition

    • Findings: ECG, Incl/Excl Exceptions, PK Concentrations, Vital Signs, Microbiology Spec., Questionnaire, Drug Accountability, Subject Characteristics, Labs, Microbiology Suscept., PK Parameters, Physical Exam

    • Findings About

    • Special Purpose: Demographics, Subject Elements, Subject Visits, Comments

    • Trial Design: Trial Elements, Trial Arms, Trial Visits, Trial Incl/Excl, Trial Summary

  • Data Mapping: SNP report example


    PFORREF – reference result (can apply to nucleotides or amino acids, depends on value in PFTEST)

    PFORRES – experimental result (can apply to nucleotides or amino acids, depends on value in PFTEST)

    PFRESCAT – category of result (is this a nonsense or missense mutation? frameshift? etc.)

    PFGENTYP – type of feature we’re looking at (gene, sector, protein, etc.)

    PFGENRI – region of interest (it is defined as the specific gene or locus being looked at)

    PFSTRESC – standard result of the analysis. Usually uses HGVS nomenclature
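To make the variable definitions above concrete, the sketch below turns one row of a SNP report into a record using the PF variables listed on the slide. The input fields, the gene name `geneX`, and the values chosen for `PFTEST`, `PFGENTYP` and `PFRESCAT` are assumptions for illustration and are not taken from CDISC controlled terminology. The standardized result uses the HGVS pattern for a coding-sequence substitution (`c.<position><ref>><alt>`).

```python
def snp_to_pf(usubjid, gene, position, ref, alt, category):
    """Build a PF-style record for a single-nucleotide substitution."""
    return {
        "USUBJID": usubjid,
        "PFTEST": "Nucleotide",           # assumed value: result refers to nucleotides
        "PFGENTYP": "GENE",               # assumed value: feature type
        "PFGENRI": gene,                  # region of interest
        "PFORREF": ref,                   # reference result
        "PFORRES": alt,                   # experimental result
        "PFRESCAT": category.upper(),     # e.g. missense, nonsense, frameshift
        "PFSTRESC": f"c.{position}{ref}>{alt}",  # HGVS-style substitution
    }


# Hypothetical SNP report row: gene, coding position, reference and observed base.
pf_record = snp_to_pf("10001", "geneX", 100, "A", "G", "missense")
print(pf_record)
```

Insertions, deletions and amino-acid-level results would need their own HGVS patterns; only the simplest substitution case is shown.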


  • Rapid DST Data Platform


    Three primary categories of data

    • Data as received from contributors

    • Quality checked, processed, standardized data

    – Master copy

    – Full complement of data for RDST consortium use

    – Authorized subset for external researchers (as broad as possible within sharing terms and conditions imposed by each data contributor)

    • Analysis data extracts and reports to support research
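One way to picture the "authorized subset for external researchers" is as a filter over the master copy driven by each contributor's sharing terms. The sketch below is only an assumption about how that could be modeled: the `external_access` flag, the contributor names and the record layout are hypothetical, and real data use agreements would carry more nuanced conditions.

```python
# Hypothetical sharing terms recorded per data contributor.
SHARING_TERMS = {
    "lab_a": {"external_access": True},
    "lab_b": {"external_access": False},
}

# Hypothetical master-copy records, each tagged with its contributor.
RECORDS = [
    {"record_id": 1, "contributor": "lab_a"},
    {"record_id": 2, "contributor": "lab_b"},
    {"record_id": 3, "contributor": "lab_c"},  # no sharing terms on file
]


def external_subset(records, sharing_terms):
    """Return only records whose contributor explicitly allows external access."""
    allowed = []
    for record in records:
        terms = sharing_terms.get(record["contributor"], {})
        if terms.get("external_access", False):
            allowed.append(record)
    return allowed


consortium_view = RECORDS  # full complement for RDST consortium use
external_view = external_subset(RECORDS, SHARING_TERMS)
print([record["record_id"] for record in external_view])
```

Defaulting to exclusion when no terms are on file is a deliberate choice in this sketch: a record is shared externally only when its contributor has explicitly agreed.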


  • Rapid DST Data Platform

    • Lean, efficient and well managed implementation

    • Expandable / adaptable / flexible

    • Great usability

    • Strong alignment with anticipated analysis use cases

  • Rapid DST Data Platform

    Next Steps

  • Next steps: the art of the start

    RDST Data Sharing Platform Timeline v4 (monthly axis, September 2014 to October 2017)

    Stages: Build and deploy, Expanded access, Sustainability

    Phases: PHASE 1, PHASE 2, PHASE 3

    C-Path Milestones (items numbered x.y)

    • 1.1 Governance Model

    • 1.2 Value proposition, DUA updates and Communication Plan

    • 1.3 Req’s, Arch and Design

    • 1.4 Request early data

    • 1.5 Dev Ph 1, Dev Ph 2, Dev Ph 3

    • 1.5 Test Ph 1, 1.7 Test Ph 2, 1.8 Test Ph 3

    • 1.6 Load early data

    • 1.9 Prep for production

    • 1.6/2.2/3.3 Prepare and load contributed data in Data Platform as it becomes available

    • 2.1/3.2 Monitor performance and usage

    • 2.3/3.4 Review and approve access requests as they are submitted

    • 2.4 Perform phase 1 program assessment

    • 2.5 Expand Capacity

    • 2.6 Enable for external researchers

    • 2.7 Beta Test

    • 2.8/3.1 Pursue sustainability funding

    • 3.5 Perform Phase 2 program assessment

    • 3.6/3.8 Release 2 Dev/Test

    • 3.7 Expand Capacity

    • 3.9 Pursue funding to support clinical use

    Milestone markers: Data Platform Available for Consortium Members; Data Platform Available for external researchers; Sustainability Funding Secured

    FIND Milestones (items numbered x.y.z)

    • 1.1.1 Inventory of available DBs

    • 1.1.2 Prepare data packages for inclusion in Data Platform

    • 1.1.2 Early Data Packages available for inclusion in Data Platform

    • 1.1.3 Input to C-Path on design of Data Platform

    • 1.2.1 Form Expert Panel and support Data Platform development and use

    • 1.2.2 Develop criteria for determination of resistance mutations

    • 1.2.2 Defined Criteria for determination of resistance mutations

    • 1.2.3 Develop algorithms for interpretation of genotypic data

    • 1.2.3 Published algorithm for interpretation of genotypic data

    • 1.3.1 Develop guidelines/criteria for clinical validation of assays to detect/interpret resistance mutation

    • 1.3.1 WHO report on guidelines/criteria for validation of assays to detect and interpret resistance mutations

    • 1.4.1 Support for development of access models and tools for broad access

    • 1.5.1/1.5.2 Support for sustainable business model and review process

    Other chart labels (unnumbered)

    • Request early data

    • Assist with development of Value Proposition and Communications Plan

  • Rapid DST Data Platform


    • Big job in front of us

    • We are not starting from scratch

    • We have lots of help

    We can do this!


  • www.c-path.org

  • Rapid DST Data Platform

    Backup

  • Rapid DST Data Platform

    • Lean, efficient and well managed implementation

    • Expandable / adaptable / flexible

    • Great usability

  • Rapid DST Data Platform

    Diagram components:

    • Incoming data storage: FASTQ data files

    • Investigational DB with user access levels (data team, RDST, external)

    • User friendly cloud interface

    • Labels: internal, external

  • Rapid DST Data Platform

    Strong alignment with anticipated analysis use cases