Exploring Your Personal Genome with Free, Online Bioinformatics Tools by Shannon Bohle, BA, MLIS, CDS (Cantab), FRAS, AHIP .org 2014 Tech Conference


Page 1: Exploring your personal genome with free, online bioinformatics tools

Exploring Your Personal Genome with Free, Online

Bioinformatics Tools by

Shannon Bohle, BA, MLIS, CDS (Cantab), FRAS, AHIP

.org 2014 Tech Conference

Page 2: Exploring your personal genome with free, online bioinformatics tools

What is the future of genomic sciences and bioinformatics? Ethical considerations of newborn screening: privacy, inaccuracy, discrimination, eugenics

Video: Gattaca (1997): http://www.youtube.com/watch?v=1Q67bMYOm7E

Page 3: Exploring your personal genome with free, online bioinformatics tools

Reduced cost: The $1,000 genome
“Illumina’s DNA Supercomputer Ushers in the $1,000 Human Genome” (January 14, 2014)
http://www.businessweek.com/articles/2014-01-14/illuminas-dna-supercomputer-ushers-in-the-1-000-human-genome

+ Genome sequencing at birth:
“Baby DNA Analysis Ushers in Brave New World of Treatment” (January 16, 2014)
http://www.bloomberg.com/news/2014-01-16/baby-dna-analysis-ushers-in-brave-new-world-of-treatment-health.html

= Big industry
“Illumina and a Billionaire Want to Jump-Start Genomics Upstarts” (February 17, 2014)
http://www.businessweek.com/articles/2014-02-17/illimuna-and-billionaire-yuri-milner-to-aid-genomics-startups

The future of genomic sciences and bioinformatics is NOW.

Page 4: Exploring your personal genome with free, online bioinformatics tools

Presentation Overview: Predictive Pathology

Hopefully you will learn a great deal today about the biological basis of disease. Specifically, we will discuss the following pathways in which disease can occur:

• At conception, chromosomes from both parents combine to pass on genetic material to a child. Sometimes errors occur during crossing over (at points called chiasmata) when egg and sperm cells form, and these variations arise de novo rather than being inherited.
• Chromosomal abnormalities like an addition, deletion, translocation, inversion, or insertion can be inherited. A common example of a chromosomal abnormality is Down syndrome, where there is an additional copy of chromosome 21.
• Also at conception, because chromosomes contain DNA, the genetic code (called the genotype) is transferred, and with it the basis for specific traits (called phenotypes). Genotypes are always present, while the associated traits may be expressed (dominant) or hidden (recessive) in an individual. Dominant traits express themselves down the family line, and recessive traits can skip generations. A common example of an autosomal recessive heritable disease is sickle cell anemia.
• During childhood and adulthood, factors like the environment (such as exposure to chemicals), diet, exercise, aging, et cetera can also damage genes, mutating them, and this may lead to disease. The branch of study examining context-dependent, non-inherited factors is called epigenetics. An example of this is protein misfolding.
• Inherited and de novo (chiasmic, protein misfolding, and epigenetically caused) variations can be studied in detail at the level of either proteins (which are made of amino acids) or DNA (which is made of nucleotides). Therefore, the genomes of plants, animals, and other forms of life have been sequenced to try to understand and control biology, specifically biological function. The field of functional genomics designs technology tools that aid in diagnosis when biology malfunctions. About 40-60% of genes in a sequenced genome are related to biological function. Under different conditions, genes may be expressed as proteins in novel, transient ways. These gene expressions are difficult to detect.

Trained professionals identify specific biomarkers, like JAK2, that have a high association with diseases. Knowing these in advance can sometimes influence a person’s lifestyle choices, such as having children, diet, and medical decisions. Because bioinformatics is a very new field, a genetic counselor should interpret test results to provide patients with guidance on two items: first, their level of risk by percentage, and second, the level of confidence scientists have that a specific biomarker actually causes a disease. Scientists determine this by looking across species, through phylogenetics. But most importantly they learn about the genetic basis of human disease by using bioinformatics tools to compare the DNA of patients who share the same disease and by creating cell lines. That is why projects like the Personal Genome Project not only benefit the individual participant, but also contribute to advances in medicine and personalized medicine. “Personalized medicine is an emerging practice of medicine that uses an individual's genetic profile to guide decisions made in regard to the prevention, diagnosis, and treatment of disease” (NLM’s GHR glossary).

Having your genome sequenced provides an overview of your genetic background as well as the state of your genes at a given time.

Page 5: Exploring your personal genome with free, online bioinformatics tools

Courtesy of the Genetics & Public Policy Center with support from The Pew Charitable Trusts

Page 6: Exploring your personal genome with free, online bioinformatics tools

GENETICS (GENES/CHROMOSOMES)
A Short Overview of Biological Inheritance (Heredity) Described Through Cell Biology

[Diagram labels: CELL, GENOME, CHROMOSOME, GENE, DNA, AMINO ACIDS]

Genome: all hereditary genetic material of an organism
Chromosome: DNA, protein, and RNA found in cells
Gene: strands of 5’ to 3’ DNA (promoters, exons, introns) (Humans have about 22,000 genes)
Allele: one of 2 or more variants of each gene (one inherited from each parent)
Genotype: coded information
2 Types:
Homozygote: same alleles – AA, aa
Heterozygote: different alleles – Aa
Phenotype: physical manifestation of a characteristic
Dominant Trait: expressed
Recessive Trait: not expressed unless both alleles are recessive

a) Autosomal Recessive: Two abnormal copies must be present to get the disorder
b) X-linked Recessive: Females are usually carriers only

Image Courtesy of Mayo Clinic: http://www.mayoclinic.org/procedure/genetic-testing/multimedia/genetic-disorders/sls-20076216

If you have a genetic disorder or are a carrier, will your children inherit it?

NOT NECESSARILY. SEE AN MD OR GENETIC COUNSELOR.
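Why "not necessarily"? A Punnett square makes the odds concrete. The sketch below is an illustration only, assuming a single gene with a dominant allele "A" and a recessive allele "a"; the function name `punnett` is made up for this example, and real inheritance is often more complicated, which is why a genetic counselor should be consulted.

```python
from collections import Counter
from itertools import product


def punnett(parent1, parent2):
    """Return genotype probabilities for one gene, given each parent's two alleles."""
    # Each parent passes on one of their two alleles with equal chance,
    # so every pairing in the 2 x 2 square is equally likely.
    crosses = ["".join(sorted(pair)) for pair in product(parent1, parent2)]
    counts = Counter(crosses)
    return {genotype: n / len(crosses) for genotype, n in counts.items()}


# Two carriers (Aa x Aa) of an autosomal recessive condition.
carriers = punnett("Aa", "Aa")
for genotype, probability in sorted(carriers.items()):
    print(genotype, probability)
```

For two carriers, each child has a 25% chance of inheriting two recessive copies (aa), a 50% chance of being a carrier (Aa), and a 25% chance of inheriting neither recessive copy (AA).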

Page 7: Exploring your personal genome with free, online bioinformatics tools

GENETICS (GENES/CHROMOSOMES)
Mitosis v. Meiosis

Some chromosome abnormalities are not inherited. De novo variants appear for the first time in an individual. They can occur in recombination or “crossing over” during mitosis or meiosis.

Image Credit: OpenStax College. "Laws of Inheritance." Connexions. February 24, 2014.

http://cnx.org/content/m44479/1.3.

Mitosis occurs with somatic cells. It results in two cells that are duplicates of the original cell. In other words, one cell with 46 chromosomes becomes two cells with 46 chromosomes each. This kind of cell division occurs throughout the body, except in the reproductive organs. This is how most of the cells that make up our body are made and replaced. Mutations that arise during mitosis are not passed on to children.

Meiosis occurs with germ cells. It results in cells with half the number of chromosomes (in humans, 23 instead of the normal 46). These are the eggs and sperm. Mutations that arise during meiosis can be passed on to children in their stem cells. During gestation, the stem cells gain specificity as somatic cells of various types and germ cells to become a male or female child.

Source: http://www.genome.gov/11508982#6

Chiasma: During meiosis, chromosomal material crosses over.

Page 8: Exploring your personal genome with free, online bioinformatics tools

Video: Cell Division and the Cell Cycle

http://www.youtube.com/watch?v=Q6ucKWIIFmg

Page 9: Exploring your personal genome with free, online bioinformatics tools

BIOCHEMISTRY (PROTEINS)
A Short Overview of Molecular Biology and Bioinformatics

Page 10: Exploring your personal genome with free, online bioinformatics tools
Page 11: Exploring your personal genome with free, online bioinformatics tools

Video: Central dogma of molecular biology (1958): replication, transcription and translation

Variations (mutations) can occur during these processes, sometimes causing diseases that can be passed on to children.

http://www.youtube.com/watch?v=Q_WRFw8KQk4
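The transcription and translation steps can be sketched in a few lines of Python. This is a simplified illustration, not a model of the cell: it assumes the input is the coding strand of DNA read 5' to 3', it uses the standard genetic code, and the two short sequences are invented for the example. The function names `transcribe` and `translate` are this sketch's own.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna):
    """Transcription: the mRNA matches the coding strand, with U in place of T."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Translation: read three bases at a time until a stop codon ('*')."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Invented sequences that differ by a single base (G -> T in the second codon).
reference = "ATGGTCTGA"
variant = "ATGTTCTGA"
print(translate(transcribe(reference)))  # valine (V) in second position
print(translate(transcribe(variant)))    # phenylalanine (F) in second position
```

A single changed base is enough to swap valine (V) for phenylalanine (F), which is the kind of change the "V617F" naming describes.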

Page 12: Exploring your personal genome with free, online bioinformatics tools

http://www.youtube.com/watch?v=D3fOXt4MrOM

Video animation: The central dogma of molecular biology

“DNA: The Secret of Life” by PBS

Page 13: Exploring your personal genome with free, online bioinformatics tools

After proteins are formed they fold into various shapes based on their chemical makeup. Misfolding is a second cause of de novo variants. Misfolding sometimes causes disease, and can be passed on to children. A linear analysis of the amino acid chain in a protein cannot anticipate which amino acids end up near each other once the protein folds, so 3D modeling is used.

Video: Protein folding
“Simulating How Proteins Self-Assemble, Or Fold” by Stanford University
http://www.youtube.com/watch?v=Pjt1Q2ZZVjA

Page 14: Exploring your personal genome with free, online bioinformatics tools

When cells go bad, control decisions must be made that regulate the micro-“society.” Reform or Remove?

DNA ligase, an enzyme (shown left, in color), repairs mistakes in DNA. Some proteins, like p53 (shown below), enforce cell death (apoptosis). p53 malfunction is one cause of cancer, where cells with mutations grow out of control.

The Life Cycle of DNA

Page 15: Exploring your personal genome with free, online bioinformatics tools

Sir John Gurdon: Epigenetics Founder & Nobel Laureate "for the discovery that mature cells can be reprogrammed to become pluripotent"

Turning back the clock on disease: Mature, specialized cells can be reverted to their embryonic stem cell state.

University of Cambridge, 2012, the year Gurdon won the Nobel Prize

Xenopus

Page 16: Exploring your personal genome with free, online bioinformatics tools

Protein-Protein Interaction
How proteins interact with one another is key to understanding their function in the body. Only about 1% of the human genome codes for our roughly 20,000 proteins. Function is largely determined by how proteins interact.

Epigenetics
“Epigenetic mechanisms are affected by several factors and processes including development in utero and in childhood, environmental chemicals, drugs and pharmaceuticals, aging, and diet. DNA methylation is what occurs when methyl groups, an epigenetic factor found in some dietary sources, can tag DNA and activate or repress genes. Histones are proteins around which DNA can wind for compaction and gene regulation. Histone modification occurs when the binding of epigenetic factors to histone “tails” alters the extent to which DNA is wrapped around histones and the availability of genes in the DNA to be activated. All of these factors and processes can have an effect on people’s health and influence their health possibly resulting in cancer, autoimmune disease, mental disorders, or diabetes among other illnesses.”

Image and description credit: National Institutes of Health

Page 17: Exploring your personal genome with free, online bioinformatics tools

Comparative Genomics and Phylogeny
To locate new disease markers and learn how pathogens function, it is helpful to examine ultra-conserved regions in cross-species protein & nucleic acid production, because these are most often linked to important bodily functions, disease and health.

(See: 1) Kumar S, Sanderford M, Gray VE, Ye J, Liu L. Evolutionary diagnosis method for variants in personal exomes. Nature Methods (2012) 9(9):855-6. doi:10.1038/nmeth.2147.
2) Liu L, Kumar S. (2013) Evolutionary Balancing is Critical for Correctly Forecasting Disease Associated Amino Acid Variants. Molecular Biology and Evolution 30:1252-1257 (Epub 2013 March 5))

About 5%-10% of the human genome consists of regulatory motifs, conserved across species, that turn genes “on” and “off” to control gene expression, in addition to the 1% used for coding proteins.

Page 18: Exploring your personal genome with free, online bioinformatics tools

Visualization of a Phylogenetic Tree Using MEGA 6

Newick notation: ((((Cucumis sativus,Ricinus communis), Solanum lycopersicum), Medicago truncatula),(Arabidopsis thaliana,Capsella rubella));
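Newick notation nests parentheses to show which species share a branch; sibling branches are separated by commas and the whole tree ends with a semicolon. The following is a minimal sketch of a reader for topology-only Newick strings (no branch lengths or quoted names). It is an illustration only, not how MEGA 6 parses trees, and the function names are this example's own.

```python
def parse_newick(text):
    """Parse a topology-only Newick string into nested tuples of leaf names."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("Newick strings end with a semicolon")
    text = text[:-1]
    pos = 0

    def peek():
        # Skip whitespace, then report the next character ("" at the end).
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1
        return text[pos] if pos < len(text) else ""

    def parse_node():
        nonlocal pos
        if peek() == "(":
            pos += 1  # consume "("
            children = [parse_node()]
            while peek() == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            if peek() != ")":
                raise ValueError("expected ')' at position %d" % pos)
            pos += 1  # consume ")"
            return tuple(children)
        # Otherwise this is a leaf: read the name up to the next delimiter.
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        name = text[start:pos].strip()
        if not name:
            raise ValueError("missing leaf name at position %d" % start)
        return name

    tree = parse_node()
    if peek() != "":
        raise ValueError("unexpected text after the tree")
    return tree


def leaf_names(tree):
    """List the species (leaves) of a parsed tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    names = []
    for child in tree:
        names.extend(leaf_names(child))
    return names


TREE_TEXT = ("((((Cucumis sativus,Ricinus communis), Solanum lycopersicum), "
             "Medicago truncatula),(Arabidopsis thaliana,Capsella rubella));")
tree = parse_newick(TREE_TEXT)
print(leaf_names(tree))
```

The nested tuples mirror the tree: the two species in the innermost tuple are each other's closest relatives in this diagram.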

Page 19: Exploring your personal genome with free, online bioinformatics tools

“Proteins are clustered on branches on the basis of the similarity of their amino acid sequences. The phylogenetic representation tends to cluster structurally (and sometimes functionally) related proteins. Drugs targeting a specific protein are more likely to be active against other proteins on the same branch. Distinct phylogenetic branches are highlighted with distinct colours (in the case of the malignant brain tumour (MBT) family, where only a few MBT domains are actually binding methyl-lysines, the red colour coding indicates the branch where all known methyl-lysine-binding domains are clustered). We assembled protein families by looking for domains associated with 'writing', 'reading' and 'erasing' acetyl and methyl marks in the Human Protein Reference Database, and by complementing the list with data from the literature, as well as data from the Pfam protein family database and the SMART (Simple Modular Architecture Research Tool) database. The phylogeny outlined in the trees is derived from multiple sequence alignments of the domain after which the family was named (full-length sequences were used for acetyltransferases as the catalytic domain is not always clearly defined for this family). If a domain is present multiple times in a protein, the protein is shown multiple times in the corresponding tree, followed by the sequential iteration of the domain in parenthesis for example, L3MBTL(2) corresponds to the second MBT domain of the protein L3MBTL. If multiple variants with insertions or deletions were reported for a gene, the variant number according to Swiss-Prot nomenclature is indicated after a hyphen: for example, TRIM33-2 in the tree of bromodomain-containing proteins corresponds to the second Swiss-Prot variant of the TRIM33 (tripartite motif-containing protein 33) bromodomain. For each tree, a seed alignment was derived from available protein structures by aligning residues that were superimposed in the three-dimensional space. 
Additional sequences were appended by aligning them to the closest seed sequence.”

http://www.nature.com/nrd/journal/v11/n5/fig_tab/nrd3674_F2.html

Phylogenetic trees of epigenetic protein families.

Page 20: Exploring your personal genome with free, online bioinformatics tools

Mega-genomics and Next Generation Sequence Analysis

Page 21: Exploring your personal genome with free, online bioinformatics tools

Sequencing human DNA: The Human Genome Project and the Personal Genome Project

First Human Genomes Sequenced:
1) Dr. J. Craig Venter
2) Dr. James D. Watson: Molecular Biology Founder & Nobel Laureate
3) Personal Genome Project
4) Hundred Person Wellness Project
5) UK Personal Genome Project

Cold Spring Harbor Laboratory, 2006

Genome-Wide Association Studies (GWAS) compare human genomes to one another to look for similarities and differences that might cause disease.

Page 22: Exploring your personal genome with free, online bioinformatics tools

Current understanding of the human genome, categorized by function of each gene product, given both as number of genes and as percentage of all genes.

Image description and credit: Mikael Häggström (Wikimedia Commons)

Our understanding of function within the human genome is incomplete. More samples are needed for improved results.

Page 23: Exploring your personal genome with free, online bioinformatics tools

The Cost Reduction for Sequencing Genomes Greatly Outpaced Moore’s Law

Page 24: Exploring your personal genome with free, online bioinformatics tools

State Direct-to-Consumer Testing Statutes and Regulations Courtesy of the Genetics & Public Policy Center with support from The Pew Charitable Trusts

Page 25: Exploring your personal genome with free, online bioinformatics tools

Limitations of GINA

“The Genetic Information Nondiscrimination Act, known as GINA, does not apply to three types of insurance — life, disability and long-term care — that are especially important to people who may have serious inherited diseases … The American Medical Association’s code of ethics states that ‘it may be necessary’ for doctors to maintain a separate file for genetic test results so the information is not sent to insurers.”

-- “Fearing Punishment for Bad Genes,” The New York Times, April 7, 2014. http://www.nytimes.com/2014/04/08/science/fearing-punishment-for-bad-genes.html?_r=1

Genetic Information Nondiscrimination Act (GINA) of 2008: http://www.genome.gov/24519851

Page 26: Exploring your personal genome with free, online bioinformatics tools

Henrietta Lacks: The Ethics of Cell Line Development and Research

Henrietta Lacks, 1945. Image courtesy of The Lacks Family. (Source: Wikipedia).

Do you own your DNA?

Page 27: Exploring your personal genome with free, online bioinformatics tools

Testing Companies

23andMe; 454 Life Sciences; Advanced Healthcare, Inc; AIBioTech; Ancestry DNA; Atlas Sports Genetics; Athleticode; Biologis Personal Genomics Service; Bioresolve; Counsyl; Complete Genomics; deCODE Genetics; deCODEme.com; DNA-CARDIOCHECK; DNA DTC; DNATraits; Eastern Biotech & Life Sciences; easyDNA; EnteroLab; Family Tree DNA; Future Genetics; Geenitesti; Genelex

GenePlanet; Genetic Health; Genetic Technologies; Genetic Testing Laboratories; Geneyouin; Genographic Project; Genotek; Gentle Labs; Graceful Earth; HealthCheckUSA; HelloGene / HelloGenome; Holistic Health; IDNA.com; i-gene; Illumina; Indian Biosciences; InoLife Technologies; Interleukin Genetics; JCVI; Knome; Lumigenix; Map My Gene

MapMyGenome; meragenome.com; MyGene23; Navigenics; Oxford Nanopore Technologies; Pacific Biosciences; Pathway Genomics; Pediatrix Medical Group; Perkin Elmer Genetics; Personal Genome Project; Personalis; PHENOM Biosciences; Positive Bioscience; Sequenom; SNPedia; Test Country; Ubiome; Viaguard/Accu-metrics; vuGene; Xcode Life Sciences

As of March 10, 2014, 23andMe had 650,000+ genotyped customers

Page 28: Exploring your personal genome with free, online bioinformatics tools

Screenings
More than 420 Conditions and Traits are Screened for During Genetic Testing.

CONDITIONS
• CANCER
• LIVER
• HEART
• HEARING
• SIGHT
• DIABETES
• PSYCHIATRIC/PSYCHOLOGICAL
• REPRODUCTIVE / STD (FERTILITY)
• REGULATORY FUNCTIONS (BREATHING, SLEEP, WEIGHT, RENAL)
• ADDICTION (ALCOHOL, DRUG)
• IMMUNE SYSTEM (HIV, AIDS)
• MUSCULO-SKELETAL (MARFAN’S)
• PHARMACOGENOMICS/DRUG EFFICACY (CANCER, WARFARIN)
• NEUROLOGICAL (PARKINSON’S, ALZHEIMER’S, MS)
• SKIN

ABILITIES & PHYSICAL TRAITS
• INTELLIGENCE
• ENDURANCE
• EYE & HAIR COLOR

NCBI Resources
https://www.ncbi.nlm.nih.gov/variation
http://www.ncbi.nlm.nih.gov/guide/genetics-medicine
http://www.ncbi.nlm.nih.gov/books/NBK1116
http://www.ncbi.nlm.nih.gov/medgen
http://www.ncbi.nlm.nih.gov/mesh

Other Resources
http://www.omim.org
http://www.orpha.net
http://www.genome.gov
http://www.dnapolicy.org

See handout for specific tests

Asclepius

Page 29: Exploring your personal genome with free, online bioinformatics tools

How to Submit Your DNA for Sequencing and Analysis with the Personal Genome Project

Basic eligibility:
1. US citizen age 21 or older
2. Additional details: http://www.personalgenomes.org/harvard/protocols

How it Works: We will be using an existing volunteer’s genome for this presentation.

Steps:
1. Provide Open Consent (form)
2. Supply Medical History (form)
3. Donate DNA Samples (saliva, hair, blood, tissue) by self-collection or at a designated facility
4. Samples Sent to Lab (blood=DNA, tissue=exome, saliva=microbiome); tissue samples may be used to develop cell lines for research purposes
5. Harvard’s PGP Team Analyzes Data for Anomalies and Creates a Personalized Health Prognosis Report
6. The PGP Team Publishes Your Information Online (Your data is associated with a volunteer number, but your name can also be used if you would like to do this)
7. Safety follow-up monitoring by email
8. Additional details: http://www.personalgenomes.org/harvard/howitworks

Page 30: Exploring your personal genome with free, online bioinformatics tools

Volunteer huA90CE6

John Lauerman In His Own Words

Whole Genome Sequence (WGS) Analysis

http://youtu.be/YGIxMYiPLOU

Page 31: Exploring your personal genome with free, online bioinformatics tools

Volunteer huA90CE6 = Case Study: John Lauerman (Harvard Analysis)
JAK2-V617F and APOE-C130R variations

Step 1 Create a C:\data folder and download John Lauerman’s genome from the PGP website: https://my.pgp-hms.org/profile/huA90CE6. Examine the variant report on the same page.

Page 32: Exploring your personal genome with free, online bioinformatics tools

Locating and Interpreting Errors: Cytogenetic Location

JAK2-V617F is located on the short arm of chromosome 9 (9p, 9pLOH). Source: Kralovics R, Passamonti F, Buser AS, Teo SS, Tiedt R, Passweg JR, Tichelli A, Cazzola M, Skoda RC. A gain-of-function mutation of JAK2 in myeloproliferative disorders. N Engl J Med. 2005 Apr 28;352(17):1779-90.

There are 22 numbered chromosomes plus X and Y. The first part of a cytogenetic location is the chromosome number (or X or Y). The second part is the letter p or q, where p is the “short arm” and q is the “long arm.” The position is usually designated by two digits (representing a region and a band), which are sometimes followed by a decimal point and one or more additional digits (representing sub-bands within a light or dark area). http://ghr.nlm.nih.gov/handbook/howgeneswork/genelocation
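The naming scheme is regular enough to take apart with a short script. The sketch below is an illustration under the assumptions stated in the text (chromosome, arm, one region digit, one band digit, optional sub-band); real cytogenetic notation has more forms than this pattern accepts, and the function name `parse_location` is invented for this example.

```python
import re

# chromosome (1-22, X, Y) + arm (p or q) + region digit + band digit + optional sub-band
LOCATION = re.compile(
    r"^(?P<chromosome>[1-9]|1[0-9]|2[0-2]|X|Y)"
    r"(?P<arm>[pq])"
    r"(?P<region>\d)(?P<band>\d)"
    r"(?:\.(?P<sub_band>\d+))?$"
)


def parse_location(location):
    """Split a cytogenetic location such as '9p24' into its named parts."""
    match = LOCATION.match(location)
    if match is None:
        raise ValueError("not a recognised cytogenetic location: %r" % location)
    return match.groupdict()


print(parse_location("9p24"))      # the JAK2 location used in this talk
print(parse_location("17q21.31"))  # a second example string with a sub-band
```

For "9p24" this reads as chromosome 9, short arm (p), region 2, band 4.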

LIST OF COMMON ERRORS BY CHROMOSOME NUMBER: http://ghr.nlm.nih.gov/chromosomes

9pLOH
Janus kinase 2 – Cytogenetic Location: 9p24
http://ghr.nlm.nih.gov/chromosome/9

Human Gene JAK2
Transcript (Including UTRs) Position: chr9:4,985,245-5,128,183 Size: 142,939 Total Exon Count: 25 Strand: +
Coding Region Position: chr9:5,021,988-5,126,791 Size: 104,804 Coding Exon Count: 23
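The "Size" values follow directly from the positions: both ends are counted, so size = end - start + 1. A minimal sketch, with function names invented for this example, that parses the browser-style position strings shown above and checks the arithmetic:

```python
def parse_position(position):
    """Split a string like 'chr9:4,985,245-5,128,183' into (chromosome, start, end)."""
    chromosome, span = position.split(":")
    start_text, end_text = span.split("-")
    start = int(start_text.replace(",", ""))
    end = int(end_text.replace(",", ""))
    return chromosome, start, end


def span_size(start, end):
    """Number of bases in a range that includes both its start and end base."""
    return end - start + 1


transcript = parse_position("chr9:4,985,245-5,128,183")
coding = parse_position("chr9:5,021,988-5,126,791")
print(span_size(*transcript[1:]))  # transcript, including UTRs
print(span_size(*coding[1:]))      # coding region only
```

The two results match the sizes listed for JAK2: 142,939 bases for the transcript and 104,804 for the coding region.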

Page 33: Exploring your personal genome with free, online bioinformatics tools

JAK2-V617F

Human Reference Genome - “Normal” JAK2 using the UCSC Genome Browser (UCSCGB)

Right click over JAK2 and choose “Get DNA for JAK2.” Then, in the popup window, choose “get DNA.” Using the shift key, highlight all the information. “Save As” JAK2. Open the file with Notepad to see JAK2 in more detail.

http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&position=chr9%3A4985245-5128183

C:\Users\Shannon\Desktop\LRITA\LRITA_2014

Page 34: Exploring your personal genome with free, online bioinformatics tools

We will examine a volunteer’s “Variant” JAK2 with two free bioinformatics tools using Windows.

At the end of the talk there will be a list of additional tools for non-Windows systems like Linux, Mac, and iPad.

PGA: Personal Genome Analyzer from Archivopedia

BLAST: National Center for Biotechnology Information (NCBI) (Web-based)

Page 35: Exploring your personal genome with free, online bioinformatics tools

Volunteer huA90CE6 -- Case Study: John Lauerman (Looking Closer with PGA & PyMOL)

SETTING UP PYTHON (Win 7, 64-bit)

Step 1 Download and install Python: https://www.python.org/download/releases/2.7.5
Windows X86-64 MSI Installer (2.7.5)

Step 2 Download and install the PyMOL extension for a free 3D molecule viewer: http://www.lfd.uci.edu/~gohlke/pythonlibs/#pymol (pymol-1.7.1.0.win-amd64-py2.7.exe)

Find the PyMOL application file in C:\Python27\PyMOL\ and create a shortcut. Drag the shortcut to the desktop. Double click the icon on the desktop to run PyMOL.

Note: Installing the extension may open a command prompt window to compile.

Step 3 Download and install the wxPython extension: http://downloads.sourceforge.net/wxpython/wxPython3.0-win64-3.0.0.0-py27.exe
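After the three installs, a quick check can confirm that Python can find the extensions. This is a small sketch rather than part of the original setup; it assumes wxPython is imported under the module name `wx` and PyMOL under `pymol`, and the helper name `check_modules` is made up for this example.

```python
from __future__ import print_function

import importlib
import sys


def check_modules(names):
    """Try to import each module and report whether the import succeeded."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, found in sorted(check_modules(["wx", "pymol"]).items()):
        print(name, "found" if found else "missing")
```

A "missing" line usually means the installer was run against a different Python installation than the one running the script.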

Page 36: Exploring your personal genome with free, online bioinformatics tools

Volunteer huA90CE6 -- Case Study: John Lauerman (Looking Closer with PGA)

Step 4

Use the Python-driven tool designed for this project to convert an isolated chromosome in your whole genome sequence from TSV to FASTA and SQL formats in under 5 minutes.

Note: The following sources were used to create the tool: Search engine - http://wiki.personal-genome.org/index.php?title=Talk:MtDNA_haplogroup

Human reference genome (rCRS) - http://www.ncbi.nlm.nih.gov/nuccore/251831106?report=fasta

• Insert: Browse your hard drive for the Volunteer’s Whole Genome Sequence. File name: huA90CE6--GS000006909-ASM.tsv
• Insert: Enter a single chromosome number you wish to examine: 1-22, X, or Y; or leave blank for whole genome. [Enter 9]
• Insert: Enter an exact location or leave at defaults if you wish to scan the whole chromosome or whole genome. [Use Default]
• Check mark “Generate FA” for FASTA
• Check mark “Generate SQL” for SQL
• Click the PROCESS BUTTON.
• Go to C:\data for the converted files in FASTA and SQL formats.
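For readers unfamiliar with the output format: a FASTA file is a header line starting with ">" followed by the sequence wrapped onto short lines. The sketch below only illustrates that layout; it is not PGA's code, it does not read the TSV input (whose column layout is not described in these slides), and the header text and sequence are placeholders.

```python
def format_fasta(header, sequence, width=60):
    """Return FASTA text: one '>' header line, then the sequence wrapped at `width`."""
    lines = [">" + header]
    for start in range(0, len(sequence), width):
        lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


# Placeholder data: a made-up header and an 80-base repeat, not real genome content.
demo = format_fasta("demo_chr9_fragment", "ACGT" * 20)
print(demo, end="")
```

The 80-base placeholder prints as a header, one full 60-base line, and one 20-base line.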

Page 37: Exploring your personal genome with free, online bioinformatics tools

Volunteer huA90CE6 -- Case Study: John Lauerman (Looking Closer with PGA)

UNDER DEVELOPMENT

After a single search of a whole genome or chromosome, use PGA to view the FASTA file in the “View FASTA” window. Or, view the exact location of variants simply by clicking on the “Variants” tab.

This image shows some variants in John Lauerman’s Chr1 compared to the Human Reference Genome.


Future plans include adding reports with graphs and other visualizations.

Page 38: Exploring your personal genome with free, online bioinformatics tools

Volunteer huA90CE6 Case Study: John Lauerman

(Looking Closer with BLAST)

Step 5 Use the generated FASTA file to perform a BLASTn search.

In this case, John Lauerman’s Chr9 file was used (after using PGA, it is located in C:\data with a .fa extension).
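A whole-chromosome FASTA file is large, so it can help to pull out only the stretch of interest before pasting it into the BLASTn search box. The following is a minimal sketch of a FASTA reader and a region slicer; it assumes positions are 1-based and inclusive, uses a tiny made-up sequence instead of a real file, and the function names are this example's own. Whether coordinates in a PGA-generated file line up with reference genome positions is not stated in these slides, so treat any real coordinates as something to verify.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file or file-like object."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def slice_region(sequence, start, end):
    """Return bases start..end, counting from 1 and including both ends."""
    return sequence[start - 1:end]


# Made-up FASTA text standing in for a file opened with open(path).
demo_file = io.StringIO(">demo\nACGTAC\nGTTT\n")
records = list(read_fasta(demo_file))
header, sequence = records[0]
print(header, slice_region(sequence, 3, 6))
```

With a real file, `open(path)` would replace the `io.StringIO` stand-in.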

Page 39: Exploring your personal genome with free, online bioinformatics tools

Volunteer huA90CE6 Case Study: John Lauerman

(Looking Closer with BLAST)

Page 40: Exploring your personal genome with free, online bioinformatics tools

Free Tools for Other Platforms

CGATools (MacOS or LINUX only)
Download Complete Genomics Analysis Tools software and User Guide documentation: http://cgatools.sourceforge.net
CGA Tools 1.8.0 Software:
CGA Tools 1.8.0 User Guide: http://cgatools.sourceforge.net/docs/1.8.0/cgatools-user-guide.pdf

Illumina’s MyGenome App
Requires iOS 6.1 or later. Compatible with iPad.
http://www.illumina.com/clinical/clinical_informatics/mygenome_app.ilmn

Complete Genomics’ Genome Voyager
http://www.completegenomics.com/analysis-tools/voyager

Complete Genomics’ List of Third Party Tools: http://www.completegenomics.com/analysis-tools/third-party-tools

PyMOL for Linux and Mac:
http://www.pymolwiki.org/index.php/Linux_Install
http://www.pymolwiki.org/index.php/MAC_Install

Page 41: Exploring your personal genome with free, online bioinformatics tools

Create your own database
Analyzing Collections of Whole Human Genomes Through Multiple Sequence Alignments and Analysis

Using a MySQL database, it is possible to import many whole human genome sequences from the PGP project by following the example in Slide #36 using PGA.

In silico human genome scientific studies can be conducted for the following applications:
● disease biomarker identification
● pharmacogenetics

TO GET STARTED
1. Determine the minimum and ideal sample sizes (number of volunteer DNA sequences) for significance in your study (usually 10,000). The PGP aims for a collection of 100,000 sequenced genomes.
2. Consider the needed space allocation.
● Each unzipped TSV file of an entire genome is about 1.3 MB
3. Consider needed time for conversion and import into a MySQL database.
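To show what a multi-genome variant database makes possible, here is a small sketch that stores variants from several volunteers and asks which ones are shared. It is an illustration under loud assumptions: Python's built-in sqlite3 stands in for MySQL so the example runs without a server, the table layout is invented and is not the SQL that PGA generates, and every row is made-up data.

```python
import sqlite3


def build_demo_database():
    """Create an in-memory database with a hypothetical 'variants' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE variants ("
        "volunteer_id TEXT, chromosome TEXT, position INTEGER, "
        "reference TEXT, alternate TEXT)"
    )
    # Made-up rows for illustration; not taken from any real genome.
    rows = [
        ("volunteer_A", "9", 1000, "G", "T"),
        ("volunteer_B", "9", 1000, "G", "T"),
        ("volunteer_B", "9", 2000, "C", "A"),
        ("volunteer_C", "1", 500, "A", "G"),
    ]
    conn.executemany("INSERT INTO variants VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def shared_variants(conn, min_carriers=2):
    """Return variants carried by at least `min_carriers` different volunteers."""
    query = (
        "SELECT chromosome, position, reference, alternate, "
        "COUNT(DISTINCT volunteer_id) "
        "FROM variants "
        "GROUP BY chromosome, position, reference, alternate "
        "HAVING COUNT(DISTINCT volunteer_id) >= ? "
        "ORDER BY chromosome, position"
    )
    return conn.execute(query, (min_carriers,)).fetchall()


conn = build_demo_database()
print(shared_variants(conn))
```

In a real study the same kind of query, run over volunteers who share a diagnosis, is one starting point for biomarker identification.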

Page 42: Exploring your personal genome with free, online bioinformatics tools

WHAT’S NEXT?

Archivopedia LLC and Real Data, Inc are discussing a possible collaboration to provide sequencing and analysis of personal genomic data. This includes further development of the bioinformatics tool, the Personal Genome Analyzer (free basic version), and a service plan for personalized medical services.

BUSINESS PLAN: What would be different about us?
• Profit sharing with participants if cell lines are developed based upon their DNA samples
• FDA Friendly: Results delivery and counseling provided by a Staff Physician
• Confidential: No insurance reporting (all participants must be self-pay)
• Private: Sequences are not published on the internet, which protects against the mosaic effect
• Free tool plus a new subscription service for medical professionals
• Fast turnaround
• Illumina 2500 sequencers for human genome sequencing
• Storage in 2 encrypted supercomputers to help prevent data breaches
• Expertise creating novel algorithms & performing data analysis, text mining, AI, NLP
• Probability value (p value) reporting percentages of likelihood of pathogenesis
• Reporting and visualization tools integrating a variety of free online tools and databases
• Electronic Health Record (EHR) integration to bridge research and clinical domains and to help achieve meaningful use objectives

Page 43: Exploring your personal genome with free, online bioinformatics tools

Create & Deliver Ordered Lab Reports

Sequence

Compare & Compute

Perform Text Mining on Medical Literature

(MESH, Informatics Taxonomies)

Samples

Isolate, Replicate & Monetize Cell Lines for Research

$$$

Workflow Overview (Simplified)

National Human Genome Research Institute’s Undiagnosed Diseases Program & FDA pharmacogenomics list

ID Biomarkers & Publish Academic Papers

Search Variant Databases
Everyone has about 3-4 million variants. Which are pathogenic and which are not?
Key: Look for nucleotide changes in exon regions that code for different amino acids than the human reference genome because these can affect function.

Databases of Known Human Variants
• NCBI Variant Databases
• Database of Genomic Variants
• Cancer Genome Consortium
• Cancer Genome Atlas
• Online Mendelian Inheritance in Man
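The "key" above can be expressed as a simple check: translate the reference codon and the variant codon and see whether the amino acid changes. This is a minimal sketch using the standard genetic code; it looks at one codon in isolation, says nothing about whether a change is actually pathogenic, and the function name and labels are this example's own.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_change(reference_codon, variant_codon):
    """Label a codon change and return (label, reference amino acid, variant amino acid)."""
    ref_aa = CODON_TABLE[reference_codon.upper()]
    var_aa = CODON_TABLE[variant_codon.upper()]
    if ref_aa == var_aa:
        label = "synonymous"   # same amino acid, protein unchanged
    elif var_aa == "*":
        label = "nonsense"     # new stop codon, protein cut short
    elif ref_aa == "*":
        label = "stop-loss"    # stop codon removed, protein extended
    else:
        label = "missense"     # a different amino acid is coded
    return label, ref_aa, var_aa


print(classify_change("GTC", "GTT"))  # third base changed, still valine
print(classify_change("GTC", "TTC"))  # valine replaced by phenylalanine
print(classify_change("TGG", "TGA"))  # tryptophan replaced by a stop
```

Missense and nonsense changes in exons are the ones worth looking up in the variant databases listed above.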

$$$

Page 44: Exploring your personal genome with free, online bioinformatics tools

Selected Bibliography

ALBERTS, B. (1983). Molecular biology of the cell. New York, Garland Pub.
CAREY, N. (2013). Epigenetics revolution: how modern biology is rewriting our understanding of genetics, disease, and inheritance.
CHURCH, G. M., & REGIS, E. (2012). Regenesis: how synthetic biology will reinvent nature and ourselves. New York, Basic Books.
SCHRODINGER, E. (2012). What is life?: the physical aspect of the living cell. Cambridge, Univ. Press.
SKLOOT, R. (2010). The immortal life of Henrietta Lacks. New York, Crown Publishers.
VENTER, J. C. (2007). A life decoded: my genome, my life. New York, Viking.
WATSON, J. D. (1968). The double helix; a personal account of the discovery of the structure of DNA. New York, Atheneum. [SIGNED FIRST EDITION]
WATSON, J. D. (2008). Molecular biology of the gene. San Francisco, Pearson/Benjamin Cummings.
ZVELEBIL, M. J., & BAUM, J. O. (2008). Understanding bioinformatics. New York, Garland Science.

Page 45: Exploring your personal genome with free, online bioinformatics tools

Credits

Personal Genome Project (Harvard)
MITx: 7.00x: Introduction to Biology - The Secret of Life (14 weeks): Eric Lander (MIT, Harvard)
Bioinformatic Methods I | Coursera (6 weeks): Nicholas Provart (University of Toronto)
Bioinformatic Methods II | Coursera (6 weeks): Nicholas Provart (University of Toronto)
Illumina
Gattaca (screenshot)
Genetics & Public Policy Center
Mayo Clinic
Stanford University
Mega 6
JMOL
NIH
UCSC Genome Database
PyMOL
CGA Tools
Complete Genomics
MyGenome App
NCBI – BLAST
PBS
MG-RAST
Nature
Real Data, Inc.
Dr. Frank N. Kautzmann III, PhD.
John Lauerman
Tracy Kovach
Mikael Häggström
Database of Genomic Variants
National Human Genome Research Institute
International Society of Genetic Genealogy

Personal Genome Analyzer: Architect: S. Bohle, Programmers: D. Yount, W. McCready

Page 46: Exploring your personal genome with free, online bioinformatics tools

Contact Information

Archivopedia.com