

HL7 Clinical-Genomics SIG

Genotype Topic, V0.1

VOCABULARIES

This document is derived from the complete Genotype Topic Ballot Document and contains the GeneticLocus walk-through section, including class property tables as well as vocabulary tables

Please introduce your comments / contributions directly in the text while keeping the Word "Track Changes" option on!

June 12, 2007

Please send comments to: HL7-CG listserv: [email protected]

If you do not subscribe to our listserv, please send your comments to the group facilitator: Amnon Shabo (Shvo)1, Ph.D. at [email protected].

1 IBM Research Lab in Haifa.


Vocabulary Discussion:

In a typical HL7 model, vocabulary tables list the various codes (or external terminologies & markups) that should be used in specific attributes of the model's classes. For more efficient discussion of the Genotype Topic vocabularies, it might be useful to distinguish between the following types of vocabularies in the Genotype Topic models (e.g., GeneticLocus):

1. Additions to existing HL7 vocabularies (e.g., ObservationInterpretation)
2. Unique genomic vocabularies, e.g., zygosity, listing genomic observations or methods that cannot be incorporated into existing HL7 vocabulary
3. Refinement to existing HL7 vocabularies (e.g., ObservationMethod has the code "Nucleic Acid Sequence Based Analysis" which might need refinement)
4. Link to value sets drawn from external terminologies (e.g., LOINC, NCI), ontologies (e.g., GO), reference databases (e.g., dbSNP) and so forth
5. Bioinformatics markup (e.g., MAGE, BSML)

In the context of Clinical Genomics, a unique vocabulary contribution of the Genotype Topic models could be in listing codes for the different types of Genotype-Phenotype relationship, both for known phenotypes (knowledge) and observed phenotypes (patient-specific data). This vocabulary could be proposed as an addition to the HL7 ActRelationship.typeCode vocabulary and could be utilized in the GeneticLocus model, in the association between the genomic observations and the phenotype observation (currently this type is fixed to the generic HL7 notion of "pertinent information").

The following is an attempt to group the vocabularies in this document by the above types:

Type 1: SequenceVariation.interpretationCode, Expression.value

Type 2: GenotypeLocus.code, GenotypeLocus.value, LocusAssociatedObservation.code, LocusAssociatedObservation.value, Sequence.value, SequenceProperty.code, SequenceProperty.value, SequenceVariationProperty.code, SequenceVariationProperty.value, Expression.code, ExpressionProperty.code, ExpressionProperty.value, reference.typeCode (associations between different genes, alleles and loci)

Type 3: IndividualAllele.methodCode, Sequence.methodCode, ExpressionProperty.methodCode

Type 4: GenotypeLocus.value, IndividualAllele.value, SequenceVariation.value, Polypeptide.value, DeterminantPeptide.value

Type 5: Sequence.code, SequenceVariation.value


GeneticLocus Overview

The GeneticLocus model describes data relating to a genetic locus (position of a particular given sequence in a genome or linkage map), which we propose to be the basic unit of genomic information exchange in healthcare. This model is not meant to be a biological model; rather it is aimed at the needs of healthcare with the vision of personalized medicine in mind. It could also serve the needs of clinical research conducted within healthcare enterprises. It is the result of our effort to look for the commonalities in each genomic-oriented storyboard that we have been working on (i.e., Tissue Typing, Cystic Fibrosis, BRCA and Pharmacogenomics). Thus, this model contains a subset of the overall Clinical Genomics DIM as it focuses on a single locus. It appears as a CMET in both the Clinical Genomics DSTU and the HL7 Common Domains so that every HL7 group that needs to convey high-resolution genomic data could utilize it. The Genetic Locus might be derived and further constrained by its main subject (e.g., human, animal, and viral) or by type of genomic data (e.g., DNA, Expression and Proteomics). The trial use of this draft standard will guide the derivation process in a way that is most useful to its early adopters.

Main Characteristics:

The entry point to this model is a GeneticLocus observation which could be associated with a pair of alleles on paternal and maternal homologous chromosomes.

Core observations associated directly with each allele are Sequence, Sequence Variation and Expression. These core classes are also the ones which encapsulate raw genomic data.

In addition, sequence, sequence variation and expression data could be associated directly with the GeneticLocus observation if data is not available at the allelic level granularity.

In addition to the core classes, other classes hold extracted and derived data such as certain types of variations, expression main results, references to a set of loci (e.g., a haplotype that this gene belongs to), and proteomics data (e.g., determinant peptides).

Both the GeneticLocus and IndividualAllele observations can recurse to represent relationships to genes/alleles from other loci (e.g., in a biological pathway).

The Sequence class could recurse as well to allow the representation of the translational path, i.e., DNA, RNA and Protein sequences, derived from each other.

The following figure shows a bird's eye view of the GeneticLocus model:


[Figure 2: A bird's eye view of the GeneticLocus model. The GeneticLocus Model - Focal Areas: The GeneticLocus and its Alleles; Sequence Variations; Expression Data; Sequence and Proteomics; Clinical Phenotypes]

Common Attributes:

The reader should refer to the HL7 RIM (Reference Information Model) documentation for a complete set of class & attribute descriptions. Nevertheless, for convenience we give here the definitions of the three most common attributes we use in the Genotype classes2:

Act.id: A unique identifier for the Act

Act.code: A code specifying the particular kind of Act that the Act-instance represents within its class (e.g., for clinical observation-acts: physical examination, serum potassium, etc.)

Observation.value: Information that is assigned or determined by the observation action (Note: this attribute holds the actual observation result)

2 Note that these classes are refinements of the RIM Observation or Procedure classes, which are both a type of the RIM core class – Act.

The Use of the 'id', 'code' and 'value' Attributes:

The use of these attributes in the various Genotype classes depends on the extent to which the data has been personalized and how different the results are from the known genome. It is also different in those classes that encapsulate raw genomic data.

For example, in the IndividualAllele class, in the case that the patient's allele was fully sequenced and found to be slightly different than the one registered in GenBank or other reference databases, there is no external code to place in one of those attributes; rather, the IndividualAllele class is associated with the Sequence class where the individual sequence could be placed in the value attribute.

If it is a new allele indeed, temporary identifiers could be placed in the id attribute until it is registered externally. If, however, that is a known allele, then the 'value' attribute can be populated with the appropriate code from GenBank for example. In this case there isn't much point in populating the Sequence class as it can be retrieved from GenBank, but for self-containment purposes in a specific implementation, it could be that the GenBank sequence will be copied and placed in the Sequence class.

Overall, the 'id' attribute should be used to uniquely identify that specific instance (possibly using the LSID format). The 'code' attribute should identify the kind of data stored in the 'value' attribute, and the 'value' attribute should hold the actual data, for example, a genotype characteristic (e.g., heterozygous) or an external gene code from GenBank or dbSNP. In the 'encapsulating' classes (e.g., Sequence, Expression, etc.) the 'value' attribute should hold the bioinformatics markup itself. In the latter case, the code should hold an indication of the exact bioinformatics format used to populate the 'value' attribute.
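The id/code/value convention described above can be illustrated with a minimal Python sketch. This is an illustration only: the class name `GenotypeObservation`, the helper `describe` and the sample LSID strings are assumptions made for this example and are not part of the HL7 Genotype models. The codes ZYGO, HETERO and BSML_CONSTRAINED are taken from the vocabulary tables in this document.

```python
from dataclasses import dataclass


# Hypothetical illustration only: the class name and the sample identifiers
# are assumptions, not part of the HL7 Genotype models.
@dataclass
class GenotypeObservation:
    id: str     # unique instance identifier (possibly in LSID format)
    code: str   # the kind of data stored in 'value'
    value: str  # the actual data: a characteristic, an external code, or markup


def describe(obs: GenotypeObservation) -> str:
    """Return a one-line summary showing how 'code' qualifies 'value'."""
    return f"{obs.id}: {obs.code} = {obs.value}"


# A genotype characteristic: 'code' names the characteristic, 'value' holds the result.
zygosity = GenotypeObservation(
    id="urn:lsid:example.org:observation:1",  # made-up LSID for illustration
    code="ZYGO",
    value="HETERO",
)

# An encapsulating class: 'code' names the markup format, 'value' holds the markup.
sequence = GenotypeObservation(
    id="urn:lsid:example.org:observation:2",  # made-up LSID for illustration
    code="BSML_CONSTRAINED",
    value="<Bsml>...</Bsml>",  # placeholder, not real sequence markup
)

print(describe(zygosity))
```

In both instances the consumer reads 'code' first in order to know how to interpret 'value'.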

Structural Attributes:

Each class in the Genotype model also has two mandatory structural attributes, classCode and moodCode, both of which have fixed or default values: the classCode attribute has a default value of OBS (i.e., Observation) and the moodCode is fixed to EVN (event), designating that these observations and procedures have already happened and are not merely ordered or intended. In a later phase of our work, this attribute might be relaxed to allow the ordering of specific genomic data (when genomic testing becomes routine in healthcare practice). Nevertheless, the ordering of a genetic test is currently done through a single test code, and the genetic testing lab performs the tests as described in its test catalog. Thus, these HL7 models are payloads of observations that have been completed.
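As a rough sketch of the difference between a default and a fixed structural attribute, the following Python fragment lets classCode be overridden while rejecting any moodCode other than EVN. The class name `StructuralAttributes` and the validation approach are assumptions made for illustration; HL7 tooling expresses such constraints in the model itself rather than in application code.

```python
from dataclasses import dataclass


# Hypothetical sketch: the class name and the validation approach are assumptions.
@dataclass(frozen=True)
class StructuralAttributes:
    classCode: str = "OBS"  # default value: Observation (may be overridden)
    moodCode: str = "EVN"   # fixed value: event (the act has already happened)

    def __post_init__(self) -> None:
        # moodCode is fixed, so any other value is rejected in this sketch.
        if self.moodCode != "EVN":
            raise ValueError("moodCode is fixed to EVN in the Genotype models")


defaults = StructuralAttributes()
print(defaults.classCode, defaults.moodCode)
```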

Color legend: Red classes are "encapsulating" classes; Blue classes are "bubbled-up" classes.

Important Note: All vocabularies presented in the model walk-through are considered an informative part of the ballot as they are still under development.


The GeneticLocus Model Walk-Through

Entry Point and Locus/Gene/Allele Classes:

GeneticLocus

Important note: The name ‘GeneticLocus’ refers to ALL genomic data and aspects of a specific locus along a chromosomal or mitochondrial DNA.

The GeneticLocus class is the entry point of the GeneticLocus model. A genetic locus is a position in a genome (or linkage map) of a particular given sequence. The given sequence could be that of a gene, allele, marker, etc. An example of a locus is 6p21.3 in cytogenetic nomenclature. This locus is for chromosome 6, short arm (p), region 2, band 1, subband 3. This is a specification for a position on a chromosome, not for what is actually found at that position. Note that the HL7 specs recognize several nomenclatures for specifying a locus. A GeneticLocus class can be associated with zero to many IndividualAllele classes. Possible use cases: (1) two alleles are associated, representing the two variants on the paternal and maternal chromosomes; (2) only one allele is associated with the locus (in cases of insufficient data or interest in one allele only); (3) no alleles are associated with the locus in cases where the locus' alleles have not been determined but there is a need to represent data related to the locus such as expression data, variations and even sequences; and (4) multiple alleles are available, for example, in cases of tumor tissues where several acquired (somatic) variants are encountered.
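The reading of the 6p21.3 example above can be sketched as a small Python parser. The names `CytogeneticLocus` and `parse_locus` are hypothetical, and the pattern covers only the simple form used in the example (chromosome, arm, one-digit region, one-digit band, optional subband); it is not a full implementation of cytogenetic nomenclature.

```python
import re
from typing import NamedTuple, Optional


class CytogeneticLocus(NamedTuple):
    chromosome: str
    arm: str                # "p" (short arm) or "q" (long arm)
    region: str
    band: str
    subband: Optional[str]  # None when the notation has no subband


# Handles only the simple "<chromosome><arm><region><band>[.<subband>]" form.
_PATTERN = re.compile(r"^(\d{1,2}|X|Y)([pq])(\d)(\d)(?:\.(\d+))?$")


def parse_locus(text: str) -> CytogeneticLocus:
    """Split a simple cytogenetic locus string into its components."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"unsupported locus notation: {text!r}")
    return CytogeneticLocus(*match.groups())


print(parse_locus("6p21.3"))
```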

GeneticLocus Attributes:

Attribute | Purpose | Vocabulary
id | Instance identifier | External Standard Format: LSID
code | Code designating the presence of information about alleles | [HL7 Vocabulary candidate – see table 1] This attribute identifies the type of data found in the value attribute. There are two main options: (1) If there are alleles associated with this genotype, then use the code "ALLELIC" to indicate that the gene identifier placed in the value attribute designates the locus in general while its specific alleles will be designated in the value attribute of the IndividualAllele object(s). (2) If there aren't any alleles associated with this genotype and the intention is to identify a gene and possibly associate expression data directly with it, then the code should be populated with "NON-ALLELIC" and the value attribute should hold the gene identifier taken from a recognized reference database (e.g., GenBank).
text | Free-text description, including a reference mechanism. |
effectiveTime | Time at which the aggregation of observations in a GeneticLocus instance holds (is effective) for the patient. This is similar to the effectiveTime of a clinical document which contains various observations in its body, each of which might have a different effectiveTime value. |
confidentialityCode | A code that controls the disclosure of information about this GeneticLocus observation. | HL7 Vocabulary: "Confidentiality" with possible extensions from the Clinical Genomics group (TBD).
uncertaintyCode | A code indicating whether this observation statement as a whole, with its subordinate components, has been asserted to be uncertain in any way. | HL7 Vocabulary: "ActUncertainty" with possible extensions from the Clinical Genomics group (TBD).
value | Populated according to the code attribute | [HL7 Vocabulary candidate – see table 2]
methodCode | The method(s) by which this data about a genetic locus was compiled. | HL7 Vocabulary: "ObservationMethod" with possible extensions from the Clinical Genomics group (TBD).

Vocabulary Table 1: GeneticLocus.code

Level | Code | Definition
1 | ALLELIC | This code indicates that the GeneticLocus.value attribute only designates the locus in general without identifying its alleles.
1 | NON-ALLELIC | The value attribute will hold the gene/locus code.

Vocabulary Table 2: GenotypeLocus.value

Level | Code | Definition
1 | IDENTIFIER | Abstract
2 | (LOCUS) | Gene/locus identifier or chromosomal locus identifier drawn from a recognized reference database such as GenBank, HUGO, etc.
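The two options for the GeneticLocus code attribute amount to a simple rule, sketched below in Python. The function name `genetic_locus_code` and the representation of associated alleles as a list of identifiers are assumptions made for this illustration.

```python
from typing import List


def genetic_locus_code(individual_alleles: List[str]) -> str:
    """Pick the GeneticLocus.code value from Vocabulary Table 1.

    'individual_alleles' stands in for the IndividualAllele objects
    associated with the locus (here just their identifiers).
    """
    # With alleles present, the locus value designates the locus in general
    # and each allele is identified in its own IndividualAllele.value.
    if individual_alleles:
        return "ALLELIC"
    # Without alleles, the locus value itself carries the gene/locus identifier.
    return "NON-ALLELIC"


print(genetic_locus_code(["HLA00398"]))
print(genetic_locus_code([]))
```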

o reference

The reference class represents a related gene that is on a different locus, and still has significant interrelation with the source gene (this is a recursive association of GeneticLocus).


Note that it is possible to either expand the associations of the referred allele, or just indicate its id, assuming that it is detailed elsewhere (and accessible using its id).

The association class called reference has a typeCode attribute currently set to REFR but we are developing a new vocabulary with codes like FUNCTIONAL, PHYSICAL, SIGNALING, and METABOLIC_PATHWAY that will be described in subsequent ballot cycles.

o AssociatedObservation

The AssociatedObservation class associated with GeneticLocus is a placeholder for various observations related to a locus, for example, a Copy Number value that represents the number of copies of this gene or allele. The class has a shadow associated with the IndividualAllele class (see below). Both classes share the same vocabulary at this point but might be separated in future versions. The code attribute holds the type of Observation, e.g., COPY_NUMBER, and the value holds the actual result. Another example would be code=ZYGOSITY, and value could then be either HOMOZYGOTE or HETEROZYGOTE. See tables 3 & 4 for more possible codes.

Vocabulary Table 3: AssociatedObservation.code (relating to GeneticLocus and IndividualAllele)

Level | Code | Definition
1 | ZYGO | The value attribute will hold the zygosity of this genotype. Zygosity determination (SN)
1 | INHER_MODE | Refers to the expression state of a gene that controls phenotype. Options for searching by Inheritance
1 | GENE_FAM | The value attribute will hold a gene's family name like HLA, CYP. A set of genes coding for diverse proteins which, by virtue of their high degree of sequence similarity, are believed to have evolved from a single ancestral gene (NCI_The).
1 | COPY_NUM | The value attribute will hold the gene's copy number. The number of copies of a given gene present in a cell or nucleus. An increase in gene dosage can result in the formation of higher levels of gene product, provided that the gene is not subject to autogenous regulation (MeSH). Zero indicates allele loss.
1 | ALLELE_FREQ | The proportion of one particular allele in the total of all ALLELES for one genetic locus in a breeding POPULATION (SN)
1 | HETZYG_FREQ | A measure of genetic variation in a population calculated as the mean frequency of heterozygotes over all loci.
1 | MOTIF | The value attribute will hold the repeating motif in a sequence.
1 | LINKAGE | The proximity of two or more markers (e.g., genes, RFLP markers) on a chromosome; the closer the markers, the lower the probability that they will be separated during DNA repair or replication processes (binary fission in prokaryotes, mitosis or meiosis in eukaryotes), and hence the greater the probability that they will be inherited together.
1 | PENETRANCE | The probability of expressing a phenotype given a genotype. Penetrance is described as either “complete” or “incomplete.” For example, individuals who carry the gene for tuberous sclerosis have an 80% chance of expressing the disorder.

Table 4 lists the vocabulary from which codes are drawn to populate the value attribute. The abstract codes in this vocabulary are the codes from table 3 and thus these two vocabularies will be maintained in synch.

Vocabulary Table 4: AssociatedObservation.value (relating to GeneticLocus and IndividualAllele)

Level | Code | Definition
1 | ZYGO | Abstract
2 | HOMO | An individual in which both alleles at a given locus are identical (MeSH)
2 | HETERO | An individual having different alleles at one or more loci in homologous chromosome segments (MeSH)
2 | LOH | Loss of Heterozygosity as a result of loss of the parental alleles.
1 | INHER_MODE | Abstract
2 | SEMI_DOM | The dominant and recessive traits blend into a middle ground. This is because heterozygous individuals only produce half the amount of the trait.
2 | CO-DOM | Neither phenotype is dominant. Instead, the individual expresses BOTH phenotypes.
2 | DOM | This allele is dominant (use this code only in association with an IndividualAllele object). Genes that influence the PHENOTYPE both in the homozygous and the heterozygous state (MeSH)
2 | REC | This allele is recessive (use this code only in association with an IndividualAllele object). Or: Genes that influence the PHENOTYPE only in the homozygous state (MeSH)
2 | PLE_GENE | A gene affecting more than one (apparently unrelated) characteristic of the phenotype (IUPAC Compendium of Chemical Terminology)
2 | EPIGENETIC | NCI:C21051 Changes in the regulation of the expression of gene activity without alteration of genetic structure
1 | GENE_FAM | Abstract
2 | MULTI_FAM | A set of genes descended by duplication and variation from some ancestral gene. Such genes may be clustered together on the same chromosome or dispersed on different chromosomes (MeSH)
1 | COPY_NUM | Abstract
2 | | The copy number
1 | ALLELE_FREQ | Abstract
2 | | The proportion of one particular allele in the total of all ALLELES for one genetic locus in a breeding POPULATION (SN)
1 | HETZYG_FREQ | Abstract
2 | | A measure of genetic variation in a population calculated as the mean frequency of heterozygotes over all loci.
1 | MOTIF | Abstract
2 | | The repeating motif in "size alleles" listed in the associated IndividualAllele classes in this instance, e.g. "CAG".
1 | LINKAGE | Abstract
2 | | Distance is measured in centimorgans (cM).
1 | PENETRANCE | Abstract
2 | | The probability of expressing a phenotype given a genotype. Penetrance is described as either “complete” or “incomplete.” For example, individuals who carry the gene for tuberous sclerosis have an 80% chance of expressing the disorder.
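The synchronization between the code vocabulary (table 3) and the value vocabulary (table 4) could be checked along the following lines. This Python sketch is an assumption about how an implementer might use the tables, not part of the specification; the names `VALUE_VOCABULARY` and `is_valid_pair` are hypothetical. Abstract codes whose values are not coded (e.g., a copy number) are treated as unconstrained. GENE_FAM is also left unconstrained here, because table 3 mentions family names such as HLA while table 4 lists MULTI_FAM.

```python
from typing import Dict, Optional, Set

# Abstract (level 1) codes mapped to their coded level 2 values from table 4.
# None means the value is not drawn from a coded list in this sketch.
VALUE_VOCABULARY: Dict[str, Optional[Set[str]]] = {
    "ZYGO": {"HOMO", "HETERO", "LOH"},
    "INHER_MODE": {"SEMI_DOM", "CO-DOM", "DOM", "REC", "PLE_GENE", "EPIGENETIC"},
    "GENE_FAM": None,  # left unconstrained here (an assumption, see lead-in)
    "COPY_NUM": None,
    "ALLELE_FREQ": None,
    "HETZYG_FREQ": None,
    "MOTIF": None,
    "LINKAGE": None,
    "PENETRANCE": None,
}


def is_valid_pair(code: str, value: str) -> bool:
    """Check that 'value' is allowed for the abstract code placed in 'code'."""
    if code not in VALUE_VOCABULARY:
        return False
    allowed = VALUE_VOCABULARY[code]
    return allowed is None or value in allowed


print(is_valid_pair("ZYGO", "HETERO"))
print(is_valid_pair("ZYGO", "DOM"))
```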

IndividualAllele

The GenotypeLocus class is associated with 0 to many alleles represented by the IndividualAllele class. The IndividualAllele class identifies the specific allele instance (using the id attribute) and optionally specifies its external code (if known) and the method by which it was identified.

For example, representing an HLA allele might look like the following XML portion (skeletal structure):

<GeneticLocus>
  <component>
    <individualAllele>
      <value code="HLA00398" codeSystemName="IMGT/HLA Database" displayName="HLA B*8101"/>
    </individualAllele>
  </component>
</GeneticLocus>

Important note: The term 'Individual Allele' doesn't refer necessarily to a known variant of the gene, rather it refers to the patient data regarding the locus that might well contain personal variations (e.g., SNPs w/unknown-significance).
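A consumer of the skeletal XML portion shown above might read the allele codes as in the following Python sketch, which uses the standard library `xml.etree.ElementTree` module. It assumes the skeleton exactly as given, without XML namespaces or the other elements a complete HL7 instance would carry; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The skeletal structure from the text, reproduced for a self-contained example.
SKELETON = """
<GeneticLocus>
  <component>
    <individualAllele>
      <value code="HLA00398" codeSystemName="IMGT/HLA Database" displayName="HLA B*8101"/>
    </individualAllele>
  </component>
</GeneticLocus>
""".strip()

root = ET.fromstring(SKELETON)

# Collect the external code of every IndividualAllele under the locus.
allele_values = root.findall("./component/individualAllele/value")
allele_codes = [value.get("code") for value in allele_values]

for value in allele_values:
    print(value.get("code"), value.get("codeSystemName"), value.get("displayName"))
```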


IndividualAllele Attributes:

Attribute | Purpose | Vocabulary
id | Instance identifier | External Standard Format: LSID
text | Free-text description, including a reference mechanism |
value | The allele identifier, if known (due to new variations found, this allele might not be identified yet, as is often the case in HLA alleles) | External recognized reference database.
methodCode | The method by which this allele was identified | See table 6.

Vocabulary Table 5: IndividualAllele.value

Level | Code | Definition
1 | ALLELE_IDENT | Abstract
2 | | Allele identifier or chromosomal locus identifier drawn from a recognized reference database. Note: in case of a "size allele" this attribute could hold the actual size (e.g., 11 or 11.1). In this case, the motif which is common to all size alleles will be designated in the class LocusAssociatedObservation.

Vocabulary Table 6: IndividualAllele.methodCode

Level | Code | Definition
1 | MSI_TEST | Microsatellite Instability by PCR. Polymerase Chain Reaction/Fragment Analysis 83892 x2 Digestion; 83898 x10 Amplification; 83894 x2 Gel separation; 83912 Interpretation and report
1 | ALLELE_TRANSC | MOLECULAR DIAGNOSTICS; MUTATION IDENTIFICATION BY ALLELE SPECIFIC TRANSCRIPTION, SINGLE SEGMENT, EACH SEGMENT, CPT code: 83905
1 | ALLELE_TRANSL | MOLECULAR DIAGNOSTICS; MUTATION IDENTIFICATION BY ALLELE SPECIFIC TRANSLATION, SINGLE SEGMENT, EACH SEGMENT, CPT code: 83906

The CDC NCHS/NHANES database codes for the type of assay used to determine the variation:

Illumina
RFLP
Allele specific PCR
Sequenom
Single Base Extension (SBE)
Snapshot
SSCP
Other

o AssociatedObservation

This is a shadow of the AssociatedObservation class. Please refer to the description of that class in association with the GeneticLocus class for more details about the use of this class (also see the general discussion of the differences between AssociatedObservation and AssociatedProperty).

o reference

The reference class represents a related allele of a different locus that still has significant interrelation with the source allele (this is a recursive association of IndividualAllele). See the equivalent class in association with GeneticLocus for more details.

Encapsulating Classes:

Sequence

The Sequence class is a generalization of all types of bio sequences (i.e., DNA, RNA, Protein) encapsulating the raw sequencing results of the DNA, and the derived sequences of the resultant RNA and protein molecules.

The Sequence class has a recursive relation that makes it possible to nest an RNA sequence within a DNA sequence, and a protein sequence within an RNA sequence. The relationship type is DRIV (derivation) as the nested Sequence classes are meant to be placeholders for sequences that were computed from the first Sequence class which is the only "encapsulating" class in this path (by 'first' we mean the one that is associated directly with the IndividualAllele class).
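The recursive nesting of Sequence classes can be pictured with the following Python sketch. The class layout, the field name `derived` and the function `translational_path` are assumptions made for illustration; the molecule codes DNA, RNA and PROT are taken from vocabulary table 9, and the sequence values are placeholders rather than real markup.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Sequence:
    molecule: str                         # "DNA", "RNA" or "PROT"
    value: str                            # placeholder for the sequence markup
    derived: Optional["Sequence"] = None  # nested Sequence, linked by DRIV


def translational_path(first: Sequence) -> Iterator[str]:
    """Walk the DRIV (derivation) chain starting at the first Sequence."""
    current: Optional[Sequence] = first
    while current is not None:
        yield current.molecule
        current = current.derived


# Only the first (DNA) sequence is the "encapsulating" one; the nested
# sequences are placeholders for data computed from it.
protein = Sequence("PROT", "...")
rna = Sequence("RNA", "...", derived=protein)
dna = Sequence("DNA", "...", derived=rna)

print(" -> ".join(translational_path(dna)))
```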

Sequence Attributes:

Attribute | Purpose | Vocabulary
id | Instance identifier | External Standard Format: LSID
code | Type of sequencing markup (e.g., BSML) used to populate the value attribute | [HL7 Vocabulary candidate] Optional codes: BSML_CONSTRAINED
text | Free-text description, including a reference mechanism |
effectiveTime | The time at which the sequencing observation holds (is effective) for the patient |
value | Contains the actual allele-related sequence, following a constrained sequencing markup like BSML | External Content Model, controlled by HL7
methodCode | The method of sequencing | [HL7 Vocabulary candidate] e.g., see table 7 for possible codes for DNA sequencing.

Vocabulary Table 7: Sequence.methodCode

Level | Code | Definition
1 | MSI_TEST | Microsatellite Instability by PCR - Polymerase Chain Reaction/Fragment Analysis. For example, 83892 x2 Digestion; 83898 x10 Amplification; 83894 x2 Gel separation; 83912 Interpretation and report.
1 | SEQ_ANAL | A multistage process that includes the determination of a sequence, its fragmentation and analysis, and the interpretation of the resulting sequence information (MeSH). For example, CPT codes: 83891, 83894x14, 83898x14, 83904x24, 83912.
1 | INV_DETECT | Inversion detection. CPT codes: 83891, 83892, 83894, 83896, 83897, 83912

o AssociatedProperty

The AssociatedProperty class is a placeholder for various properties that relate to the sequence data, which are supposed to be extracted (bubbled-up) from the raw sequence data encapsulated in the Sequence class. This class is basically a code-value pair allowing the association of multiple properties with the core Sequence class, which sets the context of these property observations in terms of identification and time, for example. See the discussion about the differences between associated observations and properties further on. Table 8 lists the vocabulary from which codes are drawn to populate the code attribute, while table 9 lists the vocabulary from which codes are drawn to populate the value attribute. The two vocabularies are synchronized in the sense that table 8 codes are the abstract codes in table 9, and each of them defines the vocabulary (nested within the abstract code) used when that abstract code is selected to populate the code attribute.

Vocabulary Table 8: AssociatedProperty.code (relating to Sequence)

Level | Code | Definition
1 | TYPE | The value attribute will hold the type of molecule.
1 | CLASSIFICATION | The value attribute will hold an indication whether this sequence is known or novel.

Vocabulary Table 9: AssociatedProperty.value (relating to Sequence)

Level | Code | Definition
1 | TYPE | Abstract
2 | DNA | Nucleotide sequence, derived from genomic DNA (MO)
2 | RNA | Nucleotide sequence, derived from transcribed RNA (MO)
2 | PROT | Polypeptide sequence, derived from translated protein (MO)
1 | CLASSIFICATION | Abstract
2 | KNOWN | Sequence corresponds to a previously-identified allele
2 | NOVEL | Sequence does not appear to correspond to a named allele

SequenceVariation

The class SequenceVariation is a generalization of all variation types, i.e., in all molecules (DNA, RNA, Protein) and of all types within each molecule (e.g., in DNA: SNP, Mutation, etc.). Note: This class replaces the Polymorphism hierarchy in previous versions.

SequenceVariation Attributes:

Attribute | Purpose | Vocabulary
id | Instance identifier | External Standard Format: LSID
code | Type of sequencing markup (e.g., BSML) used to populate the value attribute or type of external terminology code like LOINC Genetic Naming | [HL7 Vocabulary candidate] Optional codes: BSML_CONSTRAINED or LOINC Genetic Naming
text | Free-text description, including a reference mechanism |
effectiveTime | The time at which the variation observation holds (is effective) for the patient |
value | The actual variation, following a constrained sequencing markup like BSML with variation markup, or drawn from an external terminology like LOINC Genetic Naming (in the case the variation is a mutation). | External Content Model, controlled by HL7
interpretationCode | The interpretation of the variation. | [HL7 Vocabulary candidate] (proposed to RIM Harmonization, see table 10 for the proposed vocabulary)

Vocabulary Table 10: SequenceVariation.interpretationCode

Level | Code | Definition
1 | DELETERIOUS | A variation in the DNA that is associated with disease(s).
1 | INDIFFERENT | A variation in the DNA which is a non-disease causing variation found frequently in the general population and thus is interpreted as being NOT associated with disease(s).
1 | UNKNOWN.SIGNIFICANCE | A variation in the DNA that may or may not be associated with disease(s).

o AssociatedProperty

The class AssociatedProperty is a placeholder for various properties that relate to a sequence variation, for example, position, length, region, reference and more. It replaces the distinct observations we had in previous versions of the Genotype model. This class is basically a code-value pair allowing the association of multiple properties with the core variation class, which sets the context of these property observations in terms of identification and time, for example. See the discussion about the differences between associated observations and properties further on. Table 11 lists the vocabulary from which codes are drawn to populate the code attribute, while table 12 lists the vocabulary from which codes are drawn to populate the value attribute. The two vocabularies are synchronized in the sense that table 11 codes are the abstract codes in table 12, and each of them defines the vocabulary (nested within the abstract code) used when that abstract code is selected to populate the code attribute.

Vocabulary Table 11: AssociatedProperty.code (relating to SequenceVariation)

Level Code Definition

1 TYPE The value attribute will hold the type of variation

1 POS The value attribute will hold the reference point where the variation starts

1 LEN The value attribute will hold the length of the variation

1 REF The value attribute will hold an identifier for the reference (drawn from a reference database like GenBank)

1 REGION The value attribute will hold the variation region.

1 POSITION.GENE The value attribute will indicate whether the variation occurred in the intron, exon, etc.

1 POSITION.GENOME The value attribute will indicate whether the variation occurred in the locus, translocated, etc.

1 SPLI The position where the intron is excised (SO)

1 CRYP_SP_AC Mutation creates a new (functional) splice site (SO)

Vocabulary Table 12: AssociatedProperty.value (relating to SequenceVariation)

Level Code Definition

1 TYPE Abstract

2 SNP SV is a single base change from “wild type” sequence. Or: SNPs are single base pair positions in genomic DNA at which different sequence alternatives (alleles) exist in normal individuals in some population(s), wherein the least frequent allele has an abundance of 1% or greater (SO). definition_reference: http://www.cgr.ki.se/cgb/groups/brookes/Articles/essence_of_snps_article.pdf

3 ANONYMOUS_SNP SNPs that have no known effect on gene function. Thought to be the most common type of SNPs and possibly valuable as markers for linkage disequilibrium studies, when they are relatively close to the gene being sought.

3 C_SNP SNPs present in the actual gene-coding region of a chromosome. They have a higher probability of influencing propensity to disease or drug response than SNPs found outside gene regions.

3 CANDIDATE_SNP Particular SNPs thought to have a functional effect.

3 P_SNP If a cSNP or an rSNP leads to an altered amino acid, which in turn leads to altered protein function or expression and an observable change in the organism’s phenotype, the SNP may be labeled a pSNP.

3 R_SNP These SNPs affect regulatory regions that govern gene expression. Thought to be relatively uncommon and potentially as valuable as cSNPs.

3 SYNONYMOUS_SNP TBD

3 TAG_SNP This SNP was identified as a Tag SNP in the HapMap project of a SNP haplotype that was observed in the subject of the GeneticLocus instance and specified in a GeneticLoci instance that this instance is part of.

2 MUT Any detectable and heritable change in the genetic material that causes a change in the GENOTYPE and which is transmitted to daughter cells and to succeeding generations (MeSH)

3 INS SV is the addition of one or more nucleotide base pairs. Or: A region of sequence identified as having been inserted (SO)

3 INDELS indels are described as a deletion ("del"), followed by an insertion ("ins"). (From HUGO)

3 DEL SV is the loss of one or more nucleotide base pairs. Or: The sequence that is deleted (SO)

3 INV A continuous nucleotide sequence is inverted in the same position (SO)

3 TRANSL A region of nucleotide sequence that has translocated to a new position (SO)

3 SUB Any change in genomic DNA caused by a single event (SO)

3 TRANSV Change of a pyrimidine nucleotide, C or T, into a purine nucleotide, A or G, or vice versa (SO). definition_reference: http://www.ebi.ac.uk/mutations/recommendations/mutevent.html

3 GENE_DUPLICATION A chromosomal structural change resulting in the doubling of a section of the genome of prokaryotes and eukaryotes. The size of the duplicated segment may vary considerably. Duplications may be interchromosomal, with the duplicate segment incorporated into another chromosome, or intrachromosomal, with the duplicate region present in the same chromosome. (Rieger et al., Glossary of Genetics: Classical and Molecular, 5th ed) The production of a tandem repeat of a DNA sequence by unequal crossing over or by an accident of replication. Inclusion of two copies of the same genetic material in a genome; an important step in diversification of genomes, as in the evolution of the (non-allelic) hemoglobin chains from a common ancestor. (HUGO) [NCI Thesaurus: C16607]

3 GENE_CONVERSIONS A gene conversion is a nonreciprocal transfer of genetic information between two homologous sequences. As a result of a gene conversion the sequence of (part of) a gene can be copied from a highly similar sequence residing elsewhere in the genome. (HUGO)

3 FUSED_GENE A hybrid gene created by joining portions of two different genes (to produce a new protein) or by joining a gene to a different promoter (to alter or regulate gene transcription). These can be created by chromosomal aberration or by laboratory methods. (HUGO) [NCI Thesaurus:C28510]

3 DNA REP An increased number of repeats of a genomic, tandemly repeated DNA sequence from one generation to the next (MeSH)

2 TRANSLOCATION This variation type indicates that the variation stemmed from a translocation.

1 POSITION.GENE Abstract

2 INTRON SV occurs within a segment of DNA that is transcribed, but removed from within the transcript by splicing together the sequences (exons) on either side of it. (SO)

2 EXON SV occurs within the region of the genome that codes for portion of spliced messenger RNA (SO:0000234); may contain 5'-untranslated region (SO:0000204), all open reading frames (SO:0000236) and 3'-untranslated region (SO:0000205). (SO)

2 UTR SV occurs within the 5’untranslated region or the 3’untranslated region (SO)

2 PROMO SV occurs within the region on a DNA molecule involved in RNA polymerase binding to initiate transcription. (SO)

2 ENH Cis-acting DNA sequences which can increase transcription of genes. Enhancers can usually function in either orientation and at various distances from a promoter (MeSH)

1 POSITION.GENOME Abstract

2 NORMAL_LOCUS SV occurs within the natural locus of the Genotype in the genome

2 ECTOPIC SV occurs out of place from the natural locus (may be redundant with translocation, below)

2 TRANSLOCATION SV is part of a region of nucleotide sequence that has translocated to a new position. (SO)

1 LEN Abstract

2 Size value Size of the sequence

2 Allele name/pattern Allele name based on size
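The synchronization between the two tables (each code from Table 11 selects the nested vocabulary that constrains the value in Table 12) can be sketched as a simple lookup. This is only an illustrative sketch: the dictionary layout, the names NESTED_VOCABULARY, FREE_FORM_CODES and is_consistent, and the decision to treat POS, LEN, REF, REGION, SPLI and CRYP_SP_AC as free-form values are assumptions of the example rather than part of the specification, and only a subset of the Table 12 codes is listed.

```python
from dataclasses import dataclass

# Subset of Table 12, keyed by the abstract (level 1) codes of Table 11.
# Level 2 codes map to their level 3 children. The dictionary layout is an
# assumption of this sketch, not part of the specification.
NESTED_VOCABULARY = {
    "TYPE": {
        "SNP": ["ANONYMOUS_SNP", "C_SNP", "CANDIDATE_SNP", "P_SNP",
                "R_SNP", "SYNONYMOUS_SNP", "TAG_SNP"],
        "MUT": ["INS", "INDELS", "DEL", "INV", "TRANSL", "SUB", "TRANSV"],
        "TRANSLOCATION": [],
    },
    "POSITION.GENE": {"INTRON": [], "EXON": [], "UTR": [], "PROMO": [], "ENH": []},
    "POSITION.GENOME": {"NORMAL_LOCUS": [], "ECTOPIC": [], "TRANSLOCATION": []},
}

# Table 11 codes whose values are treated as free-form here (an assumption).
FREE_FORM_CODES = {"POS", "LEN", "REF", "REGION", "SPLI", "CRYP_SP_AC"}


@dataclass(frozen=True)
class AssociatedProperty:
    """A code-value pair attached to a core class such as SequenceVariation."""
    code: str
    value: str


def is_consistent(prop: AssociatedProperty) -> bool:
    """Check that the value is drawn from the vocabulary nested under the code."""
    if prop.code in FREE_FORM_CODES:
        return True
    nested = NESTED_VOCABULARY.get(prop.code)
    if nested is None:
        return False  # the code is not one of the Table 11 codes
    allowed = set(nested)
    for children in nested.values():
        allowed.update(children)
    return prop.value in allowed
```

For example, under these assumptions is_consistent(AssociatedProperty("TYPE", "DEL")) holds, while the same value under the code POSITION.GENE does not, because DEL is not nested within that abstract code.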

o TagSNP

The characteristic of a SNP as a 'Tag SNP' is one of the SequenceVariation properties; the TagSNP class was therefore removed from this version of the model and the term was added to the vocabulary.

Expression

The class Expression is a generalization of all types of expression data (RNA and protein). Its code attribute identifies the type of expression data it carries. This class is one of the Genotype's encapsulating classes, that is, it holds in its value attribute portions of relevant bioinformatics markup (e.g., MAGE-ML for gene expression data), complying with constrained schemas of the full-fledged markups. In such a case, the code attribute holds the exact reference to the contained bioinformatics schema which the value's content should comply with. Note that the association cardinality between this class and its source class IndividualAllele is zero to many. The idea here is to be able to represent gene expression over multiple experiments for the same allele under perhaps differing clinical environments and with differing expression. If this association is traversed several times, it is important to populate the id and effectiveTime attributes so that each object of that class is clearly distinguished and uniquely identified.

Expression Attributes:

Attribute Purpose Vocabulary

id Instance identifier External Standard Format: LSID

code Type of expression markup (e.g., MAGE) used to populate the value attribute [HL7 Vocabulary candidate] Optional codes: MAGE_CONSTRAINED

text Free-text description, including a reference mechanism

effectiveTime The time at which the expression observation holds (is effective) for the patient

value Contains the actual allele-related expression profile, following a constrained expression markup like MAGE

External Content Model, controlled by HL7

methodCode The method by which this expression assay was done

o AssociatedProperty

The AssociatedProperty class is a placeholder for various properties that relate to expression data, for example, normalized intensity, qualitative indication, p-value and more, which are supposed to be extracted (bubbled-up) from the raw expression data encapsulated in the Expression class. The AssociatedProperty class replaces the distinct observations we had in previous versions of the model. This class is basically a code-value pair allowing the association of multiple properties with the core expression class, which sets the context of these property observations in terms of identification and time, for example. See the discussion about the differences between associated observations versus properties further on. Table 13 lists the vocabulary from which codes are drawn to populate the code attribute, while Table 14 lists the vocabulary from which codes are drawn to populate the value attribute. The two vocabularies are synchronized in the sense that the Table 13 codes are the abstract codes in Table 14, and each of them defines the vocabulary (nested within the abstract code) used when that abstract code is selected to populate the code attribute.

Vocabulary Table 13: AssociatedProperty.code (relating to Expression)

Level Code Definition

1 TYPE The value attribute will hold the type of molecule.

1 NORMAL_INT The value attribute will hold the normalized intensity extracted from the raw expression data.

1 QUAL_EXP The value attribute will hold a qualitative indication (e.g., Affymetrix Present/Absence call) extracted from the raw expression data.

1 P_VAL The value attribute will hold the p value extracted from the raw expression data.

Vocabulary Table 14: AssociatedProperty.value (relating to Expression)

Level Code Definition

1 TYPE Abstract

2 RNA Expression of RNA was measured. RNA refers to mRNA or other RNA synthesized on a DNA or RNA template by an RNA polymerase. (SO)

2 PROT Expression of the protein was measured. A protein refers to one or more polypeptides which may, or may not, be covalently bonded, and which assume a native secondary and tertiary structure. (SO)

1 QUAL_EXP Abstract

2 PRES Expression level was sufficient to be detected by the method used to measure expression (MO)

2 ABS Expression level was below that detected by the method used to measure expression (MO)

Vocabulary Table 15: Expression.methodCode

Level Code Definition

1 MICRO_SINGLE For Affy and associated platforms like NimbleGen, etc.

1 MICRO_DUAL For all Axon scanner based two-dye platforms, Agilent, and oligo set makers, cDNAs

1 QUANT_MRNA This covers dual labeled probes, molecular beacons, scorpion probes, and double stranded intercalating dyes

Other Classes and Proteomics:

GeneticLoci:

The Haplotype class was replaced by the GeneticLoci class, which is a generalization of a haplotype and other sets of loci. The class could carry essential data about the loci or just hold an id of a Genetic Loci instance complying with the HL7 Clinical Genomics Genetic Loci model (see further on in this document).

GeneticLoci Attributes:

Attribute Purpose Vocabulary

id Instance identifier External Standard Format: LSID

code Type of the genetic loci set, e.g., Haplotype, Genetic Profile, etc. [HL7 Vocabulary candidate]

text Free-text description, including a reference mechanism

effectiveTime The time at which the genetic loci observation holds (is effective) for the patient

value The loci set identifier External Terminology for loci set identifiers (if any)

o Translocation:

This class was removed and the term was added to the vocabulary of the AssociatedProperty class associated with the SequenceVariation class.

Polypeptide & DeterminantPeptide:

The Sequence class could be associated with its resulting or corresponding polypeptides (represented by the Polypeptide class) as well as with the determinant peptides (see footnote 4; represented by the DeterminantPeptide class). Note that the Sequence class has a recursive relation, so it is possible to nest an RNA sequence within a DNA sequence, and a protein sequence within an RNA sequence. The Polypeptide could then be associated with the protein sequences or directly with any of the above levels. Also, it is possible to associate the DeterminantPeptide with the Polypeptide class or directly with the Sequence class. Both classes (Polypeptide and DeterminantPeptide) could be associated with several instances of the ClinicalPhenotype classes.

The proteomics classes in this model represent protein data that one is deriving from the sequences (by means of computational biology) and is not measuring directly in the subject. The latter could be represented as regular lab results (using the HL7 Lab specs), which could be referenced in the GeneticLocus instance as if they were phenotype observations.

A common case for the use of proteomics in this model could be as follows: checking whether an amino acid change would result from the variant; if so, whether the new amino acid is of a different size or charge state that would likely change the shape of the active region of the protein; how far the change is from the active site; whether the change is in a regulatory region; and so forth. These observations could then be associated with phenotypic data. For example, consider the following case described in OMIM:

“Despite the dramatic responses to EGFR inhibitors in patients with non-small cell lung cancer, most patients ultimately have a relapse. Kobayashi et al. (2005) reported a patient with EGFR-mutant, gefitinib-responsive, advanced non-small cell lung cancer who had a relapse after 2 years of complete remission during treatment with gefitinib. The DNA sequence of the EGFR gene in his tumor biopsy specimen at relapse revealed the presence of a second mutation {131550.0006}. Structural modeling and biochemical studies showed that this second mutation led to the gefitinib resistance.” (OMIM *131550)

Polypeptide & DeterminantPeptide Attributes:

Attribute Purpose Vocabulary

id Instance identifier External Standard Format: LSID

text Free-text description, including a reference mechanism

effectiveTime The time at which the observation holds (is effective) for the patient

value The polypeptide or determinant peptide identifier Codes from an External Terminology for protein identifiers like SwissProt, PDB, PIR and HUPO

4 By 'determinant peptides' we mean those peptides which influence the functionality of the protein translated from the allele's DNA sequence.

To Phenotype and Beyond…

ClinicalPhenotype

The 'wrapper' class ClinicalPhenotype is associated with the SequenceVariation class and includes a choice of populating an ObservedClinicalPhenotype class internally (i.e., within the Genotype instance), externally (ExternalObservedClinicalPhenotype) or pointing to a KnownClinicalPhenotype. Note that this class is just a 'wrapper' (i.e., classCode = "ActContainer") of the actual phenotypes and thus has no attributes of its own.

ObservedClinicalPhenotype and ExternalObservedClinicalPhenotype

The difference between ObservedClinicalPhenotype and ExternalObservedClinicalPhenotype is that the former represents a phenotype that the genomic data creator chose to include within the genotype instance itself, while the latter represents a phenotype that exists externally to the genotype instance, for example residing in the patient record's problem list. The ExternalObservedClinicalPhenotype class has an id (a mandatory attribute; see footnote 5) that serves as a pointer to the target observation. Note that the typeCode attribute is set to the ActRelationshipExternalReference domain, which is an "x_" value set (see footnote 6) dedicated to representing types of external references. For example, it could be set to "XCRPT", which means that the source observation is an excerpt from the target observation. In our case, this could be used when you want to refer to a clinical observation in a patient's medical record but would also like to show a few of its essential data items in the GeneticLocus instance (in the ClinicalPhenotype object).
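As a rough illustration of how the id of an external phenotype acts as a pointer, the following sketch models the two parts of an HL7 Instance Identifier (root and extension, as described in footnote 5) and resolves them against a lookup table. The class name InstanceIdentifier, the PROBLEM_LIST table, the resolve function and the OID and extension values are all made up for the example; a real system would query the organization identified by the root.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InstanceIdentifier:
    """Two-part identifier: organization OID (root) plus local id (extension)."""
    root: str
    extension: str


# Hypothetical stand-in for a patient record store, keyed by (root, extension).
# The OID and the local identifier below are made up for illustration.
PROBLEM_LIST = {
    ("2.999.1.1", "obs-0042"): "Clinical observation held in the patient's problem list",
}


def resolve(ref: InstanceIdentifier) -> Optional[str]:
    """Return the target observation for the identifier, or None if unknown."""
    return PROBLEM_LIST.get((ref.root, ref.extension))
```

The point of the sketch is only that neither part is sufficient alone: the root scopes the lookup to one organization and the extension selects the object within it.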

KnownClinicalPhenotype

The difference between KnownClinicalPhenotype and ObservedClinicalPhenotype is that while an observed phenotype is a phenotype that was observed in the patient, KnownClinicalPhenotype is in fact a piece of knowledge, i.e., it is known to be associated with the genomic data but not necessarily in the specific patient to whom the genomic data belongs. The class KnownClinicalPhenotype has a definitional mood (i.e., taken from or pointing to a description of such a disease in some form of a master file, ontology, lookup table, etc., separate from the patient data) to reflect the difference from the event mood of the ObservedClinicalPhenotype class.

5 The HL7 id attribute is of type II (Instance Identifier), which includes the root and extension child elements. The root represents the OID of the organization where the object resides and the extension represents the local identifier of that object within the scope of that organization. This combination allows information systems to resolve the id value and link to the 'remote' object, in this case the clinical observation representing a phenotype in the context of genomic data.

6 This x_ value set is an aggregation of codes from the entire ActRelationshipType Vocabulary.

pertinentInformation

This ActRelationship class (named pertinentInformation) represents the association of a genomic observation with a number of phenotype (clinical) observations. Its mandatory attribute typeCode holds the semantics of this association. It is defined as <=PERT, which means that any code in the PERT sub-hierarchy of the HL7 ActRelationshipType Vocabulary is permitted here. For a full list of these codes, please refer to the Vocabulary chapter in the HL7 V3 Ballot Package. Appendix E lists codes from all branches of the ActRelationshipType Vocabulary (PERT and others) that seem possible candidates for use in Clinical Genomics. This work is still in progress and we might propose this list, as well as our own codes, as a new x_domain in the ActRelationshipType Vocabulary.

Miscellaneous Issues:

Association types (ActRelationship typeCode) are consistent with the following principles:

o Encapsulating classes are components (COMP typeCode) of the IndividualAllele and GeneticLocus classes, while...

o Bubbled-up classes are derivations (DRIV typeCode) of encapsulating classes;

o Clinical phenotypes are pertinent to (PERT typeCode) genomic classes; (Note: in previous versions this was fixed to "caused by" but that was too restrictive. The code CAUS is part of the PERT sub-vocabulary and thus could still be used, but other codes are available as well)

o Non patient-specific data items are defined (INST typeCode) by classes with mood code = DEF (definitional), that is, defined and described outside of the patient medical records, in some kind of master file, dictionary, ontology, etc.;

o Alleles in one GeneticLocus refer (REFR typeCode) to alleles in another locus. Nevertheless, we develop a vocabulary for such types of relationship, which will be proposed for RIM harmonization as a new domain in ActRelationshipType Vocabulary (see table 16 for a first draft).

Vocabulary Table 16: reference.typeCode

Level Code Definition

1 PATHWAY The source and the target alleles are part of the same biological pathway.

1 M.A.PHENOTYPE The source and the target alleles are part of a multi-allelic phenotype.

1 TRANSLOCATED The target allele is a copy of the source allele that has been translocated to a different locus.

Observation Interpretation vs. Observation Relationships

There is a subtle distinction between observation interpretation codes and ActRelationship types between genomic and clinical observations. The observation interpretation is in fact a piece of knowledge, not necessarily shown as phenotypes of that genotype in the patient. If the interpretation code is DELETERIOUS (see table 8) then the observation could be complemented by associating it to the KnownAssociatedDisease class, which also holds non-patient-specific data in DEF mood code. In contrast, the phenotype is an actual observation valid for this patient, which is associated with a genomic observation.

Bioinformatics formats:

In general, we use bioinformatics formats in the Genotype model to encapsulate raw genomic data such as sequencing, expression and proteomic data. To enable the embedding of such data accepted from labs that work with bioinformatics formats, it is possible to assign specific XML portions to the Sequence and Expression value attributes (as well as to SequenceVariation).

This encapsulation of 'foreign' markup is made possible by the use of the HL7 ED (Encapsulated Data) Data Type, which is defined as follows:

“ED holds data that is primarily intended for human interpretation or for further machine processing which is outside the scope of HL7. ED includes unformatted or formatted written language, multimedia data, or structured information as defined by a different standard (e.g., XML-signatures.)”

The use of the XML bioinformatics markups is restricted, that is, not all tags are allowed, only a subset which relates to a specific patient and includes the information pertinent to healthcare. The restrictions on those external XML standards are specified elsewhere, but a draft of a constrained BSML schema for sequencing data is presented in Appendix C. For more details about the rationale behind this mixture of HL7 and bioinformatics markup, see the section "Coexistence of HL7 Classes and Bioinformatics Markup".

Validation:

The use of external markup in HL7 messages requires that a receiver of an HL7 instance that contains a Genotype instance carry out a 'double-validation' process: the first step is to validate the instance against the HL7 message specification (of which the Genotype schema is a part) and the second step is to validate the content of those value attributes against their respective content models. The valid content models of the Sequence and Expression value attributes will be an integral part of the entire Genotype specification, but at this point they are still considered informative.
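The two-step process could be organized along the following lines. This is a minimal sketch under stated assumptions: the function double_validate and its parameters are hypothetical, the validators are passed in as plain callables that return lists of error messages (a real receiver would plug in XML Schema validation for the HL7 message and for each constrained content model), and the element names Sequence, Expression and value are used only for illustration and are not taken from the actual Genotype schema. Stopping after a failed first step is a design choice of the sketch.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

MessageValidator = Callable[[str], List[str]]
ContentValidator = Callable[[ET.Element], List[str]]


def double_validate(
    instance_xml: str,
    validate_message: MessageValidator,
    content_validators: Dict[str, ContentValidator],
) -> List[str]:
    """Return all validation errors; an empty list means both steps passed."""
    # Step 1: validate the whole instance against the HL7 message specification.
    errors = list(validate_message(instance_xml))
    if errors:
        return errors  # do not inspect encapsulated content of an invalid message

    # Step 2: validate each encapsulated value against its own content model.
    root = ET.fromstring(instance_xml)
    for class_name, validate_content in content_validators.items():
        for element in root.iter(class_name):
            value = element.find("value")
            if value is not None:
                errors.extend(f"{class_name}: {msg}" for msg in validate_content(value))
    return errors


def toy_message_check(xml_text: str) -> List[str]:
    """Placeholder for step 1: only checks that the instance is well-formed."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    return []


def toy_bsml_check(value: ET.Element) -> List[str]:
    """Placeholder for step 2: only checks that a Bsml child element is present."""
    if value.find("Bsml") is not None:
        return []
    return ["value does not contain a Bsml element"]
```

A receiver would call double_validate(instance, toy_message_check, {"Sequence": toy_bsml_check}) and treat an empty result as a pass; the toy checks stand in for real schema validation against the HL7 message specification and the constrained BSML or MAGE schemas.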

Associated Properties / Observations and the Harmonization Proposals:

In previous versions we coped with the reluctance of both the HL7 RIM Harmonization process and the HL7 Clinical Genomics group to nail down common attributes of genomic observations by adding new classes and attributes to the RIM. We did so by elaborating on the SequenceVariation and Expression properties and creating two new Observation classes (SequenceVariationProperty and ExpressionProperty) to be placeholders for each of these properties. For example, the proposed 'length' property of a possible new SequenceVariation RIM class could be represented by an object of the SequenceVariationProperty class, which only has code and value attributes. The code indicates that this observation describes the length of the variation and the value attribute holds the length itself. The assumption is that this observation is an integral part of the parent observation with the same effective time. It could be identified only by going through the source variation object. In contrast, we also had the LocusAssociatedObservation class, which is a placeholder for associated observations such as copy number, zygosity, dominancy and gene family. These observations are independent observations that do have an id, effective time and method code.

In this update #1 of the approved DSTU, the associated properties/observation classes were consolidated into two classes: AssociatedProperty and AssociatedObservation. Instead of having specific class names (e.g., SequenceVariationProperty), all core classes now have these two generic classes coming off them. This makes the model simpler (but puts the burden on the parser to understand the context of each associated property/observation).

The basic difference between associated properties and associated observations is that an associated property should have been (and eventually may be) part of the parent class attributes. It is an inherent part of the parent observation and thus doesn't have an id, time stamp, method, performer, etc.; it inherits all these attributes from its parent. An associated observation, in contrast, is an independent observation and a component of its parent class.
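The distinction can be made concrete with a small data-structure sketch. The Python class and field names below (CoreObservation, effective_time, method_code, context_of) are assumptions chosen for illustration, as are the sample code values; only the split between a bare code-value pair and a fully identified observation is taken from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class AssociatedProperty:
    """Inherent part of the parent: code and value only, no identity of its own."""
    code: str
    value: str


@dataclass
class AssociatedObservation:
    """Independent observation: carries its own id, time and method."""
    id: str
    effective_time: str
    code: str
    value: str
    method_code: Optional[str] = None


@dataclass
class CoreObservation:
    """Stand-in for a core class such as SequenceVariation or Expression."""
    id: str
    effective_time: str
    properties: List[AssociatedProperty] = field(default_factory=list)
    observations: List[AssociatedObservation] = field(default_factory=list)

    def context_of(self, prop: AssociatedProperty) -> Tuple[str, str]:
        """A property is identified and timed only through its parent."""
        if prop not in self.properties:
            raise ValueError("property does not belong to this observation")
        return (self.id, self.effective_time)
```

With this layout, a consumer that needs the time of a property has to go through the parent object, whereas an associated observation can be read on its own.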
