

ENVIRONMETRICS

Environmetrics 2009; 20: 743–750

Published online 11 February 2009 in Wiley InterScience

(www.interscience.wiley.com) DOI: 10.1002/env.975

Statistics and biomedical informatics in forensic sciences

Jana Zvárová1,2,*,†

1 EuroMISE Centre of Charles University and Academy of Sciences of the Czech Republic, Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic

2 Institute of Computer Science of the Academy of Sciences of the Czech Republic, v.v.i., Pod Vodárenskou věží 2, 182 07 Prague 8, Czech Republic

SUMMARY

Statistics and biomedical informatics play a very important role in forensic sciences. Statistics can be regarded as the science of uncertainty. It is therefore natural that statistics should be applied to evidence used for legal purposes or for the identification of an individual, as uncertainty is a feature of any legal or identification process in which decisions are made on the basis of evidence. Biomedical informatics brings new approaches to handling genetic and other data that support decision-making processes. We show the simultaneous application of statistical and biomedical informatics approaches in the field of forensic dentistry and in calculating the weight of evidence for forensic DNA profiles. Copyright © 2009 John Wiley & Sons, Ltd.

key words: biomedical informatics; biomedical statistics; genetic information; forensic dentistry

1. INTRODUCTION

Modern information and communication technologies have strongly influenced healthcare and the processes of identification of an individual. The healthcare sector has to face an enormous acceleration in the appearance of new knowledge, in the development of new technologies, technical devices and new drugs, as well as the spread of new diseases. Patient benefit, professional competence and service excellence can be facilitated by the achievements of information and communication technology. Healthcare in general is not the fastest sector in utilizing information technology to its full benefit. Due to new achievements in the electronic health record (EHR), methods of identification of an individual and telemedicine, we can expect great changes in healthcare and forensic services. Today’s healthcare environments use EHRs that are shared between computer systems and that may be distributed over many locations and between organizations, in order to provide information to internal users and payers and to respond to external requests. With the increasing mobility of populations, patient data accumulate in different places, but they need to be accessible in an organized manner on a national and even global scale. Large amounts of information may be accessed via remote workstations and complex networks supporting one or more organizations, potentially within a national information infrastructure.

*Correspondence to: J. Zvárová, EuroMISE Centre, Institute of Computer Science AS CR v.v.i., Pod Vodárenskou věží 2, 182 07 Prague 8, Czech Republic.

†E-mail: [email protected]

Copyright © 2009 John Wiley & Sons, Ltd.

Received 12 May 2008

Accepted 1 October 2008


2. BIOMEDICAL INFORMATICS

Medical informatics and bioinformatics are two interdisciplinary areas located at the intersection of informatics with medicine and biology (genomics), respectively. Historically, medical informatics and bioinformatics have been separate, and researchers from the two disciplines have collaborated only occasionally, see Kohane (2000). In recent years bioinformatics, the younger discipline, has grown enormously due to its contribution to genomic research, see Martin-Sanchez et al. (2004). Medical informatics has now existed for more than 40 years. Despite major advances in the science and technology of health care, the medical informatics discipline still seems to have the potential to improve and facilitate the handling of the ever-changing and ever-broadening mass of information concerning the etiology, prevention and treatment of diseases as well as the maintenance of health. Its very broad field of interest covers many interdisciplinary research topics with consequences for patient care and education. There have been different views on informatics. One definition declares informatics to be the discipline that deals with information (Gremy, 1989). However, there are also other approaches. We should recall that the term informatics was adopted in the 1960s in some European countries (e.g. Germany and France) to denote what in other countries (e.g. the USA) was known as computer science (Moehr, 1989). In the 1960s, the term informatics was also used in Russia for the discipline concerned with bibliographic information processing (the Russian origins of this concept are also mentioned in Colens (1986)). These different views on informatics led to different views on medical informatics. Haux (1997) initiated a broad discussion on the medical informatics discipline. Zvárová (1997) viewed medical informatics as a structure divided into four information rings and their intersections with the field of medicine, which also comprises healthcare. These information rings are displayed in Figure 1.

The basic information ring displays different forms of information derived from data and knowledge. The information methodology ring covers methodological tools for information processing (e.g. theory of measurement, probability and mathematical statistics, linguistics, logic, artificial intelligence, decision theory). The information technology ring covers technical and biological tools for processing, transmitting and storing information in practice. The information interface ring covers interface methodologies developed for the effective use of today’s information technologies. For better storing and searching of information, theories of databases and knowledge bases have been developed. Information transmission (telematics) is closely connected with methodologies such as coding theory, data protection, networking and standardization. Better information processing using computers relies strongly on computer science disciplines, e.g. theory of computing, programming languages, parallel computing and numerical methods. The biomedical community seeks to remove the walls between biological information and medical information. The interoperability of biological and medical information for all appropriately authorized users creates imperatives, opportunities and challenges. Biomedical informatics, the intersection of informatics with the field of biomedicine, is starting to show its significance as an interdisciplinary science developed on the basis of the interaction of information sciences with biology, medicine and health care, in accordance with the attained level of information technology.

Figure 1. Structure of informatics. This figure is available in colour online at www.interscience.wiley.com/journal/env

3. IDENTIFICATION OF AN INDIVIDUAL

The introduction of DNA evidence at the end of the 1980s was rightly heralded as a breakthrough for the identification of an individual, especially in criminal justice and in natural and man-made disasters. DNA profiling technology has advanced impressively, and appropriate methods for interpreting DNA evidence have also generally improved. However, the potential for crucial mistakes and misunderstandings remains. Although DNA evidence is typically very powerful, the circumstances under which it might not lead to satisfactory conclusions about identification or relatedness are not widely appreciated.

Let us consider the following scenario. After a natural disaster (e.g. a tsunami or flood), a stain for DNA typing and other evidence have been collected from human remains. We wish to evaluate the uncertainty of the proposition that the stain comes from a missing person, given the evidence. Let $I$ denote the non-DNA evidence (e.g. dental evidence) and let $G_H$ and $G_M$ denote the DNA typing results of the stain from the human remains and of the missing person (from available health documentation), respectively. In this simple case we consider two hypotheses:

$H_0$: The missing person left the stain in the human remains,

$H_1$: Some unknown person left the stain in the human remains.


The evidence is interpreted in terms of odds. We investigate the odds in favour of $H_0$. Using Bayes’ theorem we can write the posterior odds in favour of $H_0$ as

\[
\frac{P(H_0 \mid G_M, G_H, I)}{P(H_1 \mid G_M, G_H, I)} = \frac{P(H_0 \mid I)}{P(H_1 \mid I)} \times \frac{P(G_M, G_H \mid H_0, I)}{P(G_M, G_H \mid H_1, I)}
\]

where the first ratio on the right-hand side is called the prior odds in favour of $H_0$ (before the DNA typing results) and the second is the likelihood ratio (LR). Multiplying these two terms gives the posterior odds in favour of $H_0$ (after the DNA typing results).

The likelihood ratio LR is an effective tool for assigning the weight of the DNA evidence. The larger the value of LR, the greater the support for $H_0$, that is, that the missing person left the stain in the human remains. Examples of verbal equivalents for ranges of values of LR can be found in Evett and Weir (1998). A value of LR between 1 and 10 indicates limited support, whereas a value of LR larger than 1000 is considered very strong support. In what follows we deal with the calculation of the likelihood ratio. The results apply to a single locus. In practice we often have DNA profiles at several loci. We can obtain the overall likelihood ratio as the product of the likelihood ratios for the single loci, assuming independence between loci.
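The odds form of Bayes' theorem and the product rule over loci can be illustrated with a short Python sketch. The function names and all numerical values below are hypothetical and chosen only for illustration; the verbal labels cover just the two ranges quoted above, and other ranges are deferred to Evett and Weir (1998).

```python
from math import prod


def overall_likelihood_ratio(locus_lrs):
    """Multiply single-locus LRs, assuming independence between loci."""
    return prod(locus_lrs)


def posterior_odds(prior_odds, likelihood_ratio):
    """Odds form of Bayes' theorem: posterior odds = prior odds * LR."""
    return prior_odds * likelihood_ratio


def odds_to_probability(odds):
    """Convert odds in favour of H0 into a probability of H0."""
    return odds / (1.0 + odds)


def verbal_support(lr):
    """Verbal label for the two LR ranges quoted in the text only."""
    if lr > 1000:
        return "very strong support"
    if 1 < lr <= 10:
        return "limited support"
    return "see Evett and Weir (1998) for the verbal equivalent"


# Hypothetical single-locus likelihood ratios and prior odds.
locus_lrs = [20.0, 12.5, 8.0]
lr = overall_likelihood_ratio(locus_lrs)
odds = posterior_odds(prior_odds=0.5, likelihood_ratio=lr)
print(lr, odds, odds_to_probability(odds), verbal_support(lr))
```

The prior odds would in practice come from the non-DNA evidence $I$ (e.g. dental evidence); here they are simply assumed.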

First we consider the case in which there is only one contributor to the stain. Assuming $G_H = G_M$ and $P(G_M \mid H_0, I) = P(G_M \mid H_1, I)$, the likelihood ratio for the single stain reduces to

\[
LR = \frac{1}{P(G_H \mid G_M, H_1, I)} \tag{1}
\]
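The reduction to Equation (1) can be spelled out in three steps. The derivation below is a sketch; besides the two conditions stated above, it assumes that, if the missing person left the stain, the stain genotype is certain to equal the missing person's genotype (i.e. no typing error), which the text does not state explicitly.

\[
LR = \frac{P(G_M, G_H \mid H_0, I)}{P(G_M, G_H \mid H_1, I)}
   = \frac{P(G_H \mid G_M, H_0, I)\, P(G_M \mid H_0, I)}{P(G_H \mid G_M, H_1, I)\, P(G_M \mid H_1, I)}
   = \frac{P(G_H \mid G_M, H_0, I)}{P(G_H \mid G_M, H_1, I)}
   = \frac{1}{P(G_H \mid G_M, H_1, I)}
\]

The second equality factorizes the joint probabilities, the third cancels $P(G_M \mid H_0, I) = P(G_M \mid H_1, I)$, and the last uses $P(G_H \mid G_M, H_0, I) = 1$ when $G_H = G_M$.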

The expression in the denominator is the probability that we would observe the genotype $G_H$ if some person other than the missing person left the stain. Its value depends mainly on the other evidence $I$ and on additional assumptions. The simplest situation is to assume that the missing person and the true source of the stain are unrelated and that their genotypes are independent. Then the genotype of the missing person, $G_M$, does not influence the uncertainty about the genotype $G_H$ of the true source. The denominator of Equation (1) then simplifies to $P(G_H \mid H_1, I)$, which is the expected proportion of the genotype $G_H$ in the population under Hardy-Weinberg equilibrium (Evett and Weir, 1998).

It equals

\[
P(G_H \mid H_1, I) =
\begin{cases}
p_i^2 & \text{for } G_H = A_i A_i, \\
2 p_i p_j & \text{for } G_H = A_i A_j,\ i \neq j,
\end{cases} \tag{2}
\]

where $p_i$ is the proportion of the observed allele $A_i$ (at a specified locus) in the reference population. The first line holds for a homozygous locus and the second for a heterozygous one.
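As a minimal sketch, Equations (1) and (2) can be combined into a single-locus likelihood ratio for the unrelated case. The allele names and proportions are hypothetical, and the function names are ours rather than those of the forensic package.

```python
def hwe_genotype_probability(allele_props, genotype):
    """Expected genotype proportion under Hardy-Weinberg equilibrium, Equation (2).

    allele_props: dict mapping allele name -> proportion in the reference population
    genotype: pair of allele names, e.g. ("A1", "A2")
    """
    first, second = genotype
    p_first, p_second = allele_props[first], allele_props[second]
    if first == second:
        return p_first ** 2          # homozygous locus
    return 2.0 * p_first * p_second  # heterozygous locus


def single_locus_lr(allele_props, genotype):
    """Equation (1) for an unrelated alternative source: LR = 1 / P(G_H | H1, I)."""
    return 1.0 / hwe_genotype_probability(allele_props, genotype)


# Hypothetical allele proportions at one locus.
allele_props = {"A1": 0.1, "A2": 0.2, "A3": 0.7}
print(single_locus_lr(allele_props, ("A1", "A1")))
print(single_locus_lr(allele_props, ("A1", "A2")))
```

Rarer genotypes give a smaller denominator and hence a larger likelihood ratio.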

A more complicated situation arises when the missing person and the true source of the stain are unrelated but are members of the same subpopulation; for example, they may come from the same town or the same ethnic group. The population is characterized by the coancestry coefficient, which describes the variation in allele proportions among subpopulations. The match probabilities for $A_i A_i$ homozygotes and $A_i A_j$ heterozygotes are derived in Balding and Nichols (1994).
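For orientation, the match probabilities of Balding and Nichols (1994) are usually quoted in the following form, sketched here in Python. The symbol theta for the coancestry coefficient and the function name are our own choices, not notation from the text, and the numerical values are hypothetical.

```python
def balding_nichols_match_probability(theta, p_i, p_j=None):
    """Match probability with coancestry coefficient theta.

    Homozygote A_i A_i (p_j is None):
        [2*theta + (1-theta)*p_i] * [3*theta + (1-theta)*p_i] / [(1+theta)*(1+2*theta)]
    Heterozygote A_i A_j:
        2 * [theta + (1-theta)*p_i] * [theta + (1-theta)*p_j] / [(1+theta)*(1+2*theta)]
    """
    if not 0.0 <= theta < 1.0:
        raise ValueError("theta must lie in [0, 1)")
    denominator = (1.0 + theta) * (1.0 + 2.0 * theta)
    if p_j is None:
        numerator = (2.0 * theta + (1.0 - theta) * p_i) * (3.0 * theta + (1.0 - theta) * p_i)
    else:
        numerator = 2.0 * (theta + (1.0 - theta) * p_i) * (theta + (1.0 - theta) * p_j)
    return numerator / denominator


# Hypothetical allele proportions; theta = 0 recovers Equation (2).
print(balding_nichols_match_probability(0.0, 0.1))
print(balding_nichols_match_probability(0.03, 0.1))
print(balding_nichols_match_probability(0.03, 0.1, 0.2))
```

With theta = 0 the expressions reduce to the Hardy-Weinberg proportions of Equation (2); a positive theta raises the match probability and therefore lowers the likelihood ratio.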

When the two individuals are related, the relationship is characterized by the kinship coefficients $k_i$, which denote the probability that the two people share $i$ alleles identical by descent, $i = 0, 1, 2$.

The kinship coefficients for the most commonly encountered relationships are presented in Table 1.

Sometimes the stain consists of DNA from more than one contributor. The evaluation of DNA mixtures allowing for the effect of a structured population is discussed in Curran et al. (1999), Hu and Fung (2004) and Zoubkova and Zvárová (2004). However, Curran et al. (1999) consider unconditional probabilities instead of conditional ones. We have incorporated the calculations described above in the package forensic (Marusiakova, 2007) for the R system for statistical computing. The package can be downloaded from the Comprehensive R Archive Network (CRAN) at http://cran.R-project.org/.

Table 1. Kinship coefficients for common close relationships

Relationship              k0      k1      k2
Parent–child              0       1       0
Siblings                  1/4     1/2     1/4
Half-siblings             1/2     1/2     0
Uncle–nephew              1/2     1/2     0
Grandparent–grandchild    1/2     1/2     0
Cousins                   3/4     1/4     0
Second cousins            15/16   1/16    0
Unrelated                 1       0       0
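To illustrate how the kinship coefficients of Table 1 enter a single-contributor match probability, the sketch below conditions on the number of alleles shared identical by descent. It assumes Hardy-Weinberg proportions and no subpopulation correction; this is a standard textbook form and not necessarily the computation performed by the forensic package. The dictionary keys, function name and allele proportions are hypothetical.

```python
from fractions import Fraction as F

# Kinship coefficients (k0, k1, k2) taken from Table 1.
KINSHIP = {
    "parent-child": (F(0), F(1), F(0)),
    "siblings": (F(1, 4), F(1, 2), F(1, 4)),
    "half-siblings": (F(1, 2), F(1, 2), F(0)),
    "uncle-nephew": (F(1, 2), F(1, 2), F(0)),
    "grandparent-grandchild": (F(1, 2), F(1, 2), F(0)),
    "cousins": (F(3, 4), F(1, 4), F(0)),
    "second cousins": (F(15, 16), F(1, 16), F(0)),
    "unrelated": (F(1), F(0), F(0)),
}


def relative_match_probability(relationship, p_i, p_j=None):
    """P(G_H | G_M, H1, I) when the alternative source is a relative.

    Conditions on sharing 2, 1 or 0 alleles identical by descent (IBD):
      2 IBD: the genotypes match with probability 1
      1 IBD: the non-shared allele must match by chance
      0 IBD: the whole genotype must match by chance (Equation (2))
    """
    k0, k1, k2 = KINSHIP[relationship]
    if p_j is None:                      # homozygote A_i A_i
        one_ibd = p_i
        none_ibd = p_i ** 2
    else:                                # heterozygote A_i A_j
        one_ibd = (p_i + p_j) / 2.0
        none_ibd = 2.0 * p_i * p_j
    return float(k2 + k1 * one_ibd + k0 * none_ibd)


# Hypothetical allele proportions p_i = 0.1, p_j = 0.2.
for name in KINSHIP:
    print(name, 1.0 / relative_match_probability(name, 0.1, 0.2))
```

For the "unrelated" row the expression reduces to Equation (2), which gives a quick consistency check on the implementation.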

4. DENTAL IDENTIFICATION

Dental identifications have always played a key role in natural and man-made disaster situations. They take two main forms. First, the most frequently performed examination is a comparative identification, which is used to establish that the remains of a decedent and a person represented by ante mortem (before death) dental records are the same individual. The degree of certainty is high. Information from the body assessment or the circumstances usually contains clues as to who has died. Second, in those cases where ante mortem records are not available and no clues exist as to the possible identity, a post mortem (after death) dental profile is completed by the forensic dentist. The profile suggests characteristics of the individual that are likely to narrow the search for the ante mortem materials.

Dental identification of humans is carried out for a number of different reasons (criminal, burial, social, etc.) and in a variety of situations. The bodies of victims of violent crimes, fires, motor vehicle accidents and workplace incidents can be disfigured to such an extent that identification by a family member is neither reliable nor desirable. Persons who have been deceased for some time prior to discovery and those found in water also present unpleasant and difficult visual identifications. Because of the lack of a comprehensive fingerprint and DNA database, dental identification continues to be crucial. Dental structures can provide useful indicators for the individual’s identification.

The jaws of victims may be exposed and the mandible disarticulated. Using standardized Interpol forms and protocols, a dental chart is compiled and a full mouth survey is made using 14 dental X-ray images. Polaroid photographs are then taken at various magnifications to record any dental anomalies or unique features. The ‘hard’ copies of the radiographs, photographs and the dental charting are then reconciled to ensure that no errors have been made in recording the post mortem dental evidence. The dental autopsy is the slowest step in the identification process and, because of its effect on facial structures, it is the last of the investigative procedures.

Figure 2. Interactive DentCross component. This figure is available in colour online at www.interscience.wiley.com/journal/env

In practice, victim identification uses a special form, the Disaster Victim Identification (DVI) Form designed by Interpol (2005). An electronic version of this form exists and is called DVI System International. The programme covers all parts of the paper-based DVI forms. The user may select different languages for the user interface.

A range of conclusions can be reached when reporting a dental identification. The American Board of Forensic Odontology (1994) recommends that these be limited to the following four conclusions:

Positive identification: the ante mortem and post mortem data match in sufficient detail, with no unexplainable discrepancies, to establish that they are from the same individual.

Possible identification: the ante mortem and post mortem data have consistent features but, because of the quality of either the post mortem remains or the ante mortem evidence, it is not possible to establish identity positively.

Insufficient evidence: the available information is insufficient to form the basis for a conclusion.

Exclusion: the ante mortem and post mortem data are clearly inconsistent.

The forensic dentist produces the post mortem record by careful charting and written descriptions of the dental structures and radiographs. Once the post mortem record is complete, a comparison between the two records can be carried out.
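The mechanical part of such a comparison can be sketched as a tooth-by-tooth tabulation of the two charts. The data layout (tooth number mapped to a free-text status), the status labels and the function name are hypothetical and are not taken from the DVI forms or from any existing system; the sketch only sorts teeth into groups and deliberately does not derive one of the four conclusions, which remains an expert judgment.

```python
def compare_dental_charts(ante_mortem, post_mortem):
    """Tabulate agreement between two tooth charts (dicts: tooth number -> status).

    Returns a dict with three sorted lists of tooth numbers:
      consistent     - recorded in both charts with the same status
      discrepant     - recorded in both charts with different statuses
      not_comparable - recorded in only one of the two charts
    """
    result = {"consistent": [], "discrepant": [], "not_comparable": []}
    for tooth in sorted(set(ante_mortem) | set(post_mortem)):
        am_status = ante_mortem.get(tooth)
        pm_status = post_mortem.get(tooth)
        if am_status is None or pm_status is None:
            result["not_comparable"].append(tooth)
        elif am_status == pm_status:
            result["consistent"].append(tooth)
        else:
            result["discrepant"].append(tooth)
    return result


# Hypothetical charts for illustration only.
ante_mortem = {11: "sound", 16: "filling", 26: "crown", 36: "sound"}
post_mortem = {11: "sound", 16: "filling", 26: "missing", 46: "sound"}
print(compare_dental_charts(ante_mortem, post_mortem))
```

Teeth listed as discrepant still have to be assessed by the forensic dentist to decide whether the discrepancy is explainable.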

5. IDENTIFICATION SUPPORTED BY DENTCROSS COMPONENT

Development of the EHR at the EuroMISE Centre started in the year 2000, based on inspiration and experience from existing CEN/TC251 standards and several European projects, for example van Ginneken et al. (1999). The suggested solutions were implemented in a pilot application named Multimedia Distributed Record (MUDR), see Hanzlíček et al. (2005), Spidlen et al. (2005). Practical experience from the evaluation of the MUDR system showed that the core of the system is well prepared to serve as a dedicated application server allowing multiple clients to connect and manipulate stored data. However, implementing this system in the resource-limited outpatient department of a general practitioner is very difficult. Therefore, we developed a new system named MUDRLite, simplified in its data storage and enhanced in its user interface. The first practical implementation of the MUDRLite system was in the area of dentistry, see Zvárová et al. (2008). A specific requirement for the implemented EHR instance was an advanced form of user interface for data entry and presentation. The new technique was significantly enhanced by the implementation of an interactive DentCross component, see Figure 2. Graphically, DentCross looks like a dental arch photograph and X-ray image (i.e. root canal or implant picture).

6. CONCLUSIONS

MUDR EHR with the DentCross component is an advanced tool for collecting data and creating a longitudinal EHR in dentistry. In forensic dentistry it can collect both DNA and dental evidence. It can strongly support the process of identification of an individual based on methods of probability and mathematical statistics. Clearly, individuals with numerous and complex dental treatments are often easier to identify in forensic dentistry than individuals with little or no restorative treatment. The interactive DentCross component helps us to analyse secret findings. PBI, tooth movement, calculus and periodontal pockets can also be detected; however, these types of examination cannot be performed on victims. On the other hand, bone resorption can also be detected and could help in comparison with an X-ray image.

ACKNOWLEDGEMENTS

This work was supported by the project MSM 0021620839 of MSMT.

REFERENCES

American Board of Forensic Odontology. 1994. Body identification guidelines. Journal of the American Dental Association 125: 1244–1254.

Balding DJ, Nichols RA. 1994. DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Science International 64: 125–140.

Colens MF. 1986. Origins of medical informatics. Western Journal of Medicine 145: 778–785.

Curran JM, Triggs CM, Buckleton J, Weir BS. 1999. Interpreting DNA mixtures in structured populations. Journal of Forensic Sciences 44: 987–995.

Evett IW, Weir BS. 1998. Interpreting DNA evidence; statistical genetics for forensic scientists. Sinauer: Sunderland.

Gremy F. 1989. Crisis of meaning and medical informatics education: a burden and/or a relief? Methods of Information in Medicine 28: 189–195.

Hanzlíček P, Spidlen J, Heroutova H, Nagy M. 2005. User interface of MUDR electronic health record, in Baud R. et al. (eds) Int J Med Inform 74, 21-227.

Haux R. 1997. Aims and tasks of medical informatics. International Journal of Medical Informatics 44: 3–10.

Hu YQ, Fung WK. 2004. Interpreting DNA mixtures with related contributors in subdivided populations. Scandinavian Journal of Statistics 31: 115–284.

Interpol. 2005. Disaster victim identification guide. Lyon, France: Interpol; Available at http://www.interpol.com/public/disastervictim/default.asp

Martin-Sanchez F, Iakovidis I, Norager S, Maojo V, de Groen P, Van der Lei J, Jones T, Abraham-Fuchs K, Apweiler R, Babic A, Baud R, Breton V, Cinquin P, Doupi P, Dugas M, Eils R, Engelbrecht R, Ghazal P, Jehenson P, Kulikowski C, Lampe K, De Moor G, Orphanoudakis S, Rossing N, Sarachan B, Sousa A, Spekowius G, Thireos G, Zahlmann G, Zvárová J, Hermosilla I, Vicente FJ. 2004. Synergy between medical informatics and bioinformatics: facilitating genomic medicine for future healthcare. Journal of Biomedical Informatics 37: 30–42.

Marusiakova M. 2007. Forensic: Statistical Methods in Forensic Genetics. R package version 0.2.

Moehr JR. 1989. Teaching medical informatics: teaching on the seams of disciplines, cultures, traditions. Methods of Information in Medicine 28: 273–280.

Kohane S. 2000. Bioinformatics and clinical informatics: the imperative to collaborate. Journal of the American Medical Informatics Association 7(5): 512–516.

Spidlen J, Pies M, Teuberova Z, Nagy M, Hanzlíček P, Zvárová J, Dostalova T. 2005. MUDRLite - an electronic health record applied to dentistry by the usage of a dental-cross component EMBEC05. IFMBE Proceeding 11: 1077–1081.

van Ginneken AM, Stam H, van Mulligen EM, de Wilde M, van Mastrigt R, van Bemmel JH. 1999. ORCA: the versatile CPR. Methods of Information in Medicine 38: 332–338.

Zoubkova K, Zvárová J. 2004. Statistical methods in forensic genetics, Master’s Thesis, Charles University, Prague.

Zvárová J. 1997. On the medical informatics structure. International Journal of Medical Informatics 44: 75–82.

Zvárová J, Dostalova T, Hanzlíček P, Teuberova Z, Nagy M, Pies M, Seydlova M, Eliasova H, Simkova H. 2008. Electronic Health Record for Forensic Dentistry. Methods of Information in Medicine 47: 8–13.
