Efficient and Accurate Clustering for Large-Scale Genetic Mappingveronika/Efficient and... · 2014....

Efficient and Accurate Clustering for Large-Scale Genetic Mapping

V. Strnadová (Neeley) , Aydın Buluç , Jarrod Chapman , Joseph Gonzalez ,

John Gilbert , Stefanie Jegelka , Daniel Rokhsar , Leonid Oliker

* ++ § ¶

*,++ * *, ¶ §

§++ *,§, ¶ *

Lawrence Berkeley National Labs, UC Santa Barbara, UC Berkeley, Joint Genome Institute

Motivation

• High-throughput sequencing methods have produced a flood of inexpensive genetic information

• Genetic maps are important to breeding studies but genetic mapping software is prohibitively slow on large data sets

The Genetic Mapping Problem

𝑖1 𝑖2 𝑖3 𝑖4 𝑖5 𝑖6

𝑚1 A B - - A -

𝑚2 A B A A B A

𝑚3 A A - - - B

𝑚4 A - B - B B

𝑚5 B - B A - A

𝑚6 A A B A - -

𝑚7 - - - A B B

𝑚8 A B A B - A

𝑚9 A B - B - -

𝑚10 B B B - A A

𝑚11 A A A A B B

𝑚12 B - A B A -

𝑚13 B B - A A -

𝑚14 - - - B A A

𝑚15 B - - A A B

(missing data)

𝑖1 𝑖2 𝑖3 𝑖4 𝑖5 𝑖6

𝑚1 A B - - A -

𝑚2 A B A A B A

𝑚3 A A - - - B

𝑚4 A - B - B B

𝑚5 B - B A - A

𝑚6 A A B A - -

𝑚7 - - - A B B

𝑚8 A B A B - A

𝑚9 A B - B - -

𝑚10 B B B - A A

𝑚11 A A A A B B

𝑚12 B - A B A -

𝑚13 B B - A A -

𝑚14 - - - B A A

𝑚15 B - - A A B

(missing data)

𝑚8𝑚9

𝑚15

𝑚4𝑚5

𝑚7𝑚10𝑚11𝑚12

𝑚13𝑚14

Linkage group 1

Linkage group 2

cluster

𝑖1 𝑖2 𝑖3 𝑖4 𝑖5 𝑖6

𝑚1 A B - - A -

𝑚2 A B A A B A

𝑚3 A A - - - B

𝑚4 A - B - B B

𝑚5 B - B A - A

𝑚6 A A B A - -

𝑚7 - - - A B B

𝑚8 A B A B - A

𝑚9 A B - B - -

𝑚10 B B B - A A

𝑚11 A A A A B B

𝑚12 B - A B A -

𝑚13 B B - A A -

𝑚14 - - - B A A

𝑚15 B - - A A B

(missing data)

𝑚8𝑚9

𝑚15

𝑚4𝑚5

𝑚7𝑚10𝑚11𝑚12

𝑚13𝑚14

𝑚15

Linkage group 1

Linkage group 2

cluster

Linkage group 1 Linkage group 2

𝑚11

𝑚13

𝑚12

𝑚10

𝑚5𝑚14

𝑖1 𝑖2 𝑖3 𝑖4 𝑖5 𝑖6

𝑚1 A B - - A -

𝑚2 A B A A B A

𝑚3 A A - - - B

𝑚4 A - B - B B

𝑚5 B - B A - A

𝑚6 A A B A - -

𝑚7 - - - A B B

𝑚8 A B A B - A

𝑚9 A B - B - -

𝑚10 B B B - A A

𝑚11 A A A A B B

𝑚12 B - A B A -

𝑚13 B B - A A -

𝑚14 - - - B A A

𝑚15 B - - A A B

(missing data)

𝒎𝟏

𝒎𝟐

𝒎𝟖𝒎𝟗

𝒎𝟏𝟓

𝒎𝟑

𝒎𝟒𝒎𝟓

𝒎𝟔

𝒎𝟕𝒎𝟏𝟎𝒎𝟏𝟏𝒎𝟏𝟐

𝒎𝟏𝟑𝒎𝟏𝟒

Linkage group 1

Linkage group 2

cluster

𝑚15

Linkage group 1 Linkage group 2

𝑚11

𝑚13

𝑚12

𝑚10

𝑚5𝑚14

The Need for Large-Scale Clustering in Genetic Mapping

• Hundreds of thousands of genetic markers available, but current software can only handle up to ~10,000 markers

• A major bottleneck is the linkage-group-finding phase

• Popular mapping tools all handle this phase the same way, with an 𝑂(𝑀2) clustering algorithm for 𝑀 markers

• Hundreds of thousands of genetic markers available, but current software can only handle up to ~10,000 markers

• A major bottleneck is the linkage-group-finding phase

• Popular mapping tools all handle this phase the same way, with an 𝑂(𝑀2) clustering algorithm for 𝑀 markers

Our solution: A fast, scalable clustering algorithm tailored to genetic marker data

The Need for Large-Scale Clustering in Genetic Mapping

Standard Approach to Genetic Marker Clustering

cluster

𝑖1 𝑖2 𝑖3 𝑖4 𝑖5 𝑖6

𝑚1 A B - - A -

𝑚2 A B A A B A

𝑚3 A A - - - B

𝑚4 A - B - B B

𝑚5 B - B A - A

𝑚6 A A B A - -

𝑚7 - - - A B B

𝑚8 A B A B - A

𝑚9 A B - B - -

𝑚10 B B B - A A

𝑚11 A A A A B B

𝑚12 B - A B A -

𝑚13 B B - A A -

𝑚14 - - - B A A

𝑚15 B - - A A B

𝑚8𝑚9

𝑚15

𝑚4𝑚5

𝑚7𝑚10𝑚11𝑚12

𝑚13𝑚14

Linkage group 1

Linkage group 2

𝑚1𝑚2

𝑚8𝑚9

𝑚15

𝑚4 𝑚5𝑚6

𝑚7𝑚10𝑚11

𝑚12

𝑚13𝑚14

Linkage group 1

Linkage group 2

𝑖1 𝑖2 𝑖3 𝑖4 𝑖5 𝑖6

𝑚1 A B - - A -

𝑚2 A B A A B A

𝑚3 A A - - - B

𝑚4 A - B - B B

𝑚5 B - B A - A

𝑚6 A A B A - -

𝑚7 - - - A B B

𝑚8 A B A B - A

𝑚9 A B - B - -

𝑚10 B B B - A A

𝑚11 A A A A B B

𝑚12 B - A B A -

𝑚13 B B - A A -

𝑚14 - - - B A A

𝑚15 B - - A A B

(1) Compute the similarity between all 𝑂(𝑀2) pairs of markers, producing a complete graph with 𝑀 vertices

• Similarity function is the “LOD score”, a logarithm of odds that two markers are genetically linked

• (2) Cut all edges below a LOD threshold

• (3) The resulting connected components = linkage groups

𝑚1𝑚2

𝑚8𝑚9

𝑚15

𝑚4 𝑚5𝑚6

𝑚7𝑚10𝑚11

𝑚12

𝑚13𝑚14

Linkage group 1

Linkage group 2

𝑖1 𝑖2 𝑖3 𝑖4 𝑖5 𝑖6

𝑚1 A B - - A -

𝑚2 A B A A B A

𝑚3 A A - - - B

𝑚4 A - B - B B

𝑚5 B - B A - A

𝑚6 A A B A - -

𝑚7 - - - A B B

𝑚8 A B A B - A

𝑚9 A B - B - -

𝑚10 B B B - A A

𝑚11 A A A A B B

𝑚12 B - A B A -

𝑚13 B B - A A -

𝑚14 - - - B A A

𝑚15 B - - A A B

• (2) Cut all edges below a LOD threshold

• (3) The resulting connected components = linkage groups

LOD ScoreCompares the likelihood of obtaining test data if the two markers are indeed linked, to the likelihood of observing the same data purely by chance:

𝐿𝑂𝐷(𝑚𝑖 , 𝑚𝑗) = log10𝑃(𝑙𝑖𝑛𝑘𝑎𝑔𝑒𝑖𝑗)

𝑃(𝑛𝑜 𝑙𝑖𝑛𝑘𝑎𝑔𝑒𝑖𝑗)

Formally,

𝐿𝑂𝐷 = log10(1 − 𝜃𝑖𝑗 )

𝑁𝑅𝑖𝑗𝜃𝑖𝑗𝑅𝑖𝑗

0.5𝑅𝑖𝑗+𝑁𝑅𝑖𝑗

Where:

𝑅𝑖𝑗 = number of recombinant offspring

𝑁𝑅𝑖𝑗 = number of nonrecombinant offspring 𝜃𝑖𝑗 = recombination fraction, i.e. 𝑅

𝑅+𝑁𝑅

𝑚𝑖 A B - - A -

𝑚𝑗 A B A A B A

𝐿𝑂𝐷(𝑚𝑖 , 𝑚𝑗) = log10𝑃(𝑙𝑖𝑛𝑘𝑎𝑔𝑒𝑖𝑗)

𝑃(𝑛𝑜 𝑙𝑖𝑛𝑘𝑎𝑔𝑒𝑖𝑗)

Formally,

𝑅𝑖𝑗𝜃𝑖𝑗𝑅𝑖𝑗

0.5𝑅𝑖𝑗+ 𝑅𝑖𝑗

Where:

𝑅𝑖𝑗 = number of nonrecombinant offspring 𝜃𝑖𝑗 = recombination fraction, i.e. 𝑅𝑖𝑗

𝑅𝑖𝑗+ 𝑅𝑖𝑗

𝑚𝑖 A B - - A -

𝑳𝑶𝑫(𝒎𝒊,𝒎𝒋) = 𝐥𝐨𝐠𝟏𝟎(𝟏 − 𝟏 𝟑)

𝟐( 𝟏 𝟑)𝟏

𝟎. 𝟓𝟑= 𝟎. 𝟎𝟕𝟒

Formally,

Where:

𝑅𝑖𝑗 = number of nonrecombinant offspring 𝜃𝑖𝑗 = recombination fraction, i.e. 𝑅𝑖𝑗

𝑅𝑖𝑗+ 𝑅𝑖𝑗

𝑚𝑖 A B - - A -

(2) Cut all edges below a LOD threshold

𝑖1 𝑖2 𝑖3 𝑖4 𝑖5 𝑖6

𝑚1 A B - - A -

𝑚2 A B A A B A

𝑚3 A A - - - B

𝑚4 A - B - B B

𝑚5 B - B A - A

𝑚6 A A B A - -

𝑚7 - - - A B B

𝑚8 A B A B - A

𝑚9 A B - B - -

𝑚10 B B B - A A

𝑚11 A A A A B B

𝑚12 B - A B A -

𝑚13 B B - A A -

𝑚14 - - - B A A

𝑚15 B - - A A B

𝑚1𝑚2

𝑚8𝑚9

𝑚15

Linkage group 2

𝑚4 𝑚5𝑚6

𝑚7𝑚10𝑚11

𝑚12

𝑚13𝑚14

Linkage group 1

(2) Cut all edges below a LOD threshold

(3) The resulting connected components = linkage groups

𝑖1 𝑖2 𝑖3 𝑖4 𝑖5 𝑖6

𝑚1 A B - - A -

𝑚2 A B A A B A

𝑚3 A A - - - B

𝑚4 A - B - B B

𝑚5 B - B A - A

𝑚6 A A B A - -

𝑚7 - - - A B B

𝑚8 A B A B - A

𝑚9 A B - B - -

𝑚10 B B B - A A

𝑚11 A A A A B B

𝑚12 B - A B A -

𝑚13 B B - A A -

𝑚14 - - - B A A

𝑚15 B - - A A B

𝑚1𝑚2

𝑚8𝑚9

𝑚15

Linkage group 2

𝑚4 𝑚5𝑚6

𝑚7𝑚10𝑚11

𝑚12

𝑚13𝑚14

Linkage group 1

Our Approach: The BubbleCluster Algorithm

Primary assumption: Clusters have a “linear structure”

Our Approach: The BubbleCluster Algorithm

Primary assumption: Clusters have a “linear structure”

Key idea: Maintain a set of representative or “sketch” points which reveal the cluster structure

LOD threshold

Input: set of markers, LOD threshold 𝜏, non-missing threshold 𝜂, low-quality threshold 𝑐, cluster size threshold 𝜎

Output: set of clusters C and set of representative points R

Phase I: Build initial set of clusters and set of representative points using high-quality markers (those with at least 𝜂 non-missing entries)

Phase II: Add low quality markers (less than 𝜂 non-missing entries) to intialset of clusters

Phase III: Attempt to merge small clusters with large

The BubbleCluster Algorithm: Overview

The BubbleCluster Algorithm

Iteration i:find 𝑟𝑀𝐴𝑋 ∶= 𝑟𝑗 for which 𝐿𝑂𝐷(𝑚, 𝑟𝑗) is

maximal;set 𝐶𝑀𝐴𝑋 ≔ 𝐶𝐾 ∈ 𝐶 containing 𝑟𝑀𝐴𝑋

𝐿𝑂𝐷(𝑚, 𝑟𝑗)

𝑟𝑗𝐶1 𝐶2

If (𝐿𝑂𝐷 𝑚, 𝑟𝑀𝐴𝑋 < 𝑳𝑶𝑫_𝒕𝒉𝒓𝒆𝒔𝒉𝒐𝒍𝒅)

𝒎𝐿𝑂𝐷(𝑚, 𝑟𝑀𝐴𝑋)

𝑟𝑀𝐴𝑋𝐶1 𝐶2

If (𝐿𝑂𝐷 𝑚, 𝑟𝑀𝐴𝑋 < 𝑳𝑶𝑫_𝒕𝒉𝒓𝒆𝒔𝒉𝒐𝒍𝒅)𝐶 = 𝐶 ∪ {𝑚}

𝑟𝑀𝐴𝑋𝐶1 𝐶2

Else If ( IS_INTERIOR 𝑟𝑀𝐴𝑋 )

𝑟𝑀𝐴𝑋

𝐶1𝐶2

𝐿𝑂𝐷(𝑚, 𝑟𝑀𝐴𝑋)

Else If ( IS_INTERIOR 𝑟𝑀𝐴𝑋 )𝐶𝑀𝐴𝑋 = 𝐶𝑀𝐴𝑋 ∪ {𝑚}

𝑟𝑀𝐴𝑋

𝐶1 = 𝐶𝑀𝐴𝑋𝐶2

Else If ( IS_EXTERIOR 𝑚, 𝑟𝑀𝐴𝑋 )

𝐶2𝐶1 = 𝐶𝑀𝐴𝑋

𝑟𝑀𝐴𝑋

Else If ( IS_EXTERIOR 𝑚, 𝑟𝑀𝐴𝑋 )Add 𝑚 to representative points of 𝐶𝑀𝐴𝑋Add 𝑚 to 𝐶𝑀𝐴𝑋

𝑟𝑀𝐴𝑋

Else // 𝑚 is interior to the outer point 𝑟𝑀𝐴𝑋

𝑟𝑀𝐴𝑋

Else // 𝑚 is interior to the outer point 𝑟𝑀𝐴𝑋Add 𝑚 to 𝐶𝑀𝐴𝑋

𝑟𝑀𝐴𝑋

𝐶1 𝐶2

If 𝑚 has a LOD score above the threshold to two clusters,

𝐶𝑁𝐸𝑊 = 𝐶1 ∪ 𝐶2

If 𝑚 has a LOD score above the threshold to two clusters, Then merge the clusters and add m to the merged cluster

End of Phase IStop when all markers with at least 𝜂 non-missing entries have been processed

Running time: 𝑶 |𝜢| 𝐥𝐨𝐠𝟐 𝚮 + |𝜢||𝑹|where: |𝛨| = size of high-quality marker set,

|𝑅| = size of representative point set

Phase II: add low-quality markers

Phase III: merge small clusters

BubbleCluster Parameters

LOD threshold

Non-missing threshold: determines how many markers are high-quality

Recall the LOD score: 𝐿𝑂𝐷 = log10(1 − 𝜃)𝑁𝑅𝜃𝑅

0.5𝑅+𝑁𝑅

Highest achievable LOD: lim𝑅→0log10

(1 − 𝜃)𝑁𝑅𝜃𝑅

0.5𝑅+𝑁𝑅= 𝑁𝑅 log10 2

𝑚𝑖 A B - - A -

LOD threshold

Recall the LOD score: 𝐿𝑂𝐷 = log10(1 − 𝜃𝑖𝑗 )

LOD threshold

Example LOD: log10(1 − 1 3)

2( 1 3)1

0.53= 0.074

𝑚𝑖 A B - - A -

LOD threshold

Highest achievable LOD: lim𝑅𝑖𝑗→0log10

(1 − 𝜃𝑖𝑗) 𝑅𝑖𝑗𝜃𝑖𝑗𝑅𝑖𝑗

0.5𝑅+ 𝑅𝑖𝑗

= 𝑅𝑖𝑗log10 2

𝑚𝑖 A B - - A -

Evaluation Metric: 𝐹-score

Given a golden standard clustering, the 𝐹-score measures the quality of another clustering by comparing it to the golden standard

Range: 0 – 1

The 𝐹-score is a harmonic mean of precision and recallAn 𝐹-score of 1 indicates perfect precision and perfect recall for every golden

standard cluster

Results: BubbleCluster on Real Data Sets

Dataset Size Time F-Score

Barley 64K 15 sec 0.9993

Switchgrass 113K 8.9 min 0.9745

Switchgrass 548K 1.9 hrs 0.9894

Wheat 1.58M 1.22 hrs N/A *

* Results under review at Genome Biology

Comparison of Clustering Algorithms for Simulated Data

ClusteringMethod

12.5K Markers 25K Markers

F-score Time F-score Time

JoinMap 0.99964 14 min 0.99982 46 min

MSTMap 0.99964 4.5 min 0.99982 20 min

PIC 0.47024 11 sec (+ 4min)

0.60782 44 sec (+ 16.5min)

BubbleCluster 0.99964 6 sec 0.99982 15 sec

Simulated data created with Nicholas Tinker’s Spaghetti Software.

12.5 25 50 100 200 400

Dataset Size (in thousands of markers)

error, 65% missing

error, 35% missing

runtime, 65% missing

runtime, 35% missing

Scaling Results for Bubble Cluster on Simulated Data

Effect of the LOD threshold

LOD threshold 5 10 15 20 25 30

F-Score 0.6225 0.9999 0.9999 0.9999 0.9999 0.9999

Time (s) 48.6 67.0 70.9 82.0 106 170

Fixed missing entry threshold, increasing LOD threshold

200K markers, 300 individuals, 35% missing rate

𝜏1𝜏2 vs.

Effect of the missing data threshold

Fixed LOD threshold, increasing non-missing entry threshold

Non-missing threshold

132 166 172 179 186 192

F-Score 0.9999 0.9999 0.9992 0.9930 0.9610 0.8948

Time (s) 82.0 84.6 82.7 83.0 81.7 82.0

200K markers, 300 individuals, 35% missing rate

𝜏 𝜏𝜏vs.

ConclusionBy exploiting the structure underlying genetic marker clusters, we were able to design a fast clustering algorithm tailored to genetic marker data

𝐶1 𝐶2

ConclusionBy exploiting the structure underlying genetic marker clusters, we were able to design a fast clustering algorithm tailored to genetic marker data

𝐶1 𝐶2While remaining highly accurate, we outperform popular existing tools in both runtime and scalability

Clustering Method 12.5K Markers 25K Markers

F-score Time F-score Time

JoinMap 0.99964 14 min 0.99982 46 min

MSTMap 0.99964 4.5 min 0.99982 20 min

PIC 0.47024 11 sec (+ 4 min)

0.60782 44 sec (+ 16.5 min)

BubbleCluster 0.99964 6 sec 0.999982 15 sec

Future Work

• Use representative points as starting point for ordering phase

Future Work

• Use representative points as starting point for ordering phase

• Provide a more thorough theoretical analysis of achievable clustering as well as order accuracy given assumptions about error and missing data rates

• Develop efficient and accurate, large-scale genetic mapping software

Thank You

Code for BubbleCluster soon available at: www.ucsb.edu/~veronika

Backup Slides

Choosing LOD and non-missing thresholdGoal: minimize 𝑷(mistake) and maximize 𝑭- score

Let 𝑝 = 𝑃 𝐿𝑂𝐷 𝑚𝑖 , 𝑚𝑗 > 𝐿𝑂𝐷𝑡ℎ𝑟𝑒𝑠ℎ𝑜𝑙𝑑 𝑥𝑖 ∈ 𝐶𝑖 , 𝑥𝑗 ∈ 𝐶𝑗 , 𝑖 ≠ 𝑗)

Let 𝜏 = LOD threshold

By definition of the LOD score, 𝑝 =1

10𝜏

Let 𝑛𝑐𝑜𝑚𝑝 = number of LOD comparisons we make

Then, if we want to ensure that (1 − 𝑝)𝑛𝑐𝑜𝑚𝑝< 1 − 𝜀 then we need:

𝜏 > log10(1

1 − (1 − 𝜀) 1𝑛𝑐𝑜𝑚𝑝)

At the same time, we want to include a marker in the high-quality set only if we expect that it will achieve a LOD of 𝜏 or greater with another marker, requiring:

𝑛𝑛𝑚 > 𝜏(1 − 𝜇) log10 2

Where 𝜇 is the missing rate and 𝑛𝑛𝑚 is the number of non-missing entries in the marker

Evaluation Metric for Cluster Quality

F-score (range: 0 to 1)• Given a “golden standard clustering”, the F-score measures the

quality of a clustering as follows:

• The F-score between a golden standard cluster 𝑔 and a test cluster 𝑐is a harmonic mean of precision 𝑃 and recall 𝑅:

𝐹𝑠𝑐𝑜𝑟𝑒 𝑔, 𝑐 =2𝑃𝑅

𝑃 + 𝑅• The overall F-score between the golden standard clustering G and a

test clustering C is a weighted average of the F-scores for each golden standard cluster 𝑔:

𝑜𝑣𝑒𝑟𝑎𝑙𝑙_𝐹𝑠𝑐𝑜𝑟𝑒 𝐺, 𝐶 =1

𝑔 ∋ 𝐺

𝑔 ∗ max𝑐 ∋𝐶𝐹𝑠𝑐𝑜𝑟𝑒(𝑔, 𝑐)

Local Linearity Assumption

Although the LOD score does not obey the triangle inequality, we assume that it does at close ranges and with enough data

𝑳𝑶𝑫(𝒎, 𝒓𝟏)𝑳𝑶𝑫(𝒎, 𝒓𝟐)

𝒓𝟏𝒓𝟐𝑳𝑶𝑫(𝒓𝟐, 𝒓𝟏)

𝑳𝑶𝑫 𝒎, 𝒓𝟐 < 𝑳𝑶𝑫 𝒎, 𝒓𝟏&&

𝑳𝑶𝑫 𝒎, 𝒓𝟐 < 𝑳𝑶𝑫 𝒓𝟏, 𝒓𝟐&&

𝑳𝑶𝑫 𝒎, 𝒓𝟏 > 𝑳𝑶𝑫 𝒓𝟏, 𝒓𝟐

𝒎 is a new boundary point

The Effect of the LOD threshold

The standard approach to clustering can be viewed as single linkage clustering

Time complexity: 𝑂(𝑀2)

LOD 10

𝑚15𝑚2𝑚1 𝑚8 𝑚9

𝑚4𝑚3 𝑚5 𝑚6 𝑚7 𝑚10 𝑚11 𝑚12 𝑚13 𝑚14

A high LOD threshold ensures that the only edges that remain in the completely connected graph are between markers that are extremely likely to be genetically linked

LOD 10

𝑚15𝑚2𝑚1 𝑚8 𝑚9

𝑚4𝑚3 𝑚5 𝑚6 𝑚7 𝑚10 𝑚11 𝑚12 𝑚13 𝑚14

Example: LOD threshold = 10

LOD 10

𝑚15𝑚2𝑚1 𝑚8 𝑚9

𝑚4𝑚3 𝑚5 𝑚6 𝑚7 𝑚10 𝑚11 𝑚12 𝑚13 𝑚14

𝑚1𝑚2

𝑚8𝑚9

𝑚15

Linkage group 2

𝑚4 𝑚5𝑚6

𝑚7𝑚10𝑚11

𝑚12

𝑚13𝑚14

Linkage group 1

LOD 10

𝑚15𝑚2𝑚1 𝑚8 𝑚9

𝑚4𝑚3 𝑚5 𝑚6 𝑚7 𝑚10 𝑚11 𝑚12 𝑚13 𝑚14

𝑚1𝑚2

𝑚8𝑚9

𝑚15

Linkage group 2

𝑚4 𝑚5𝑚6

𝑚7𝑚10𝑚11

𝑚12

𝑚13𝑚14

Linkage group 1

LOD 10

𝑚15𝑚2𝑚1 𝑚8 𝑚9

𝑚4𝑚3 𝑚5 𝑚6 𝑚7 𝑚10 𝑚11 𝑚12 𝑚13 𝑚14

𝑚1𝑚2

𝑚8𝑚9

𝑚15

Linkage group 2

𝑚4 𝑚5𝑚6

𝑚7𝑚10𝑚11

𝑚12

𝑚13𝑚14

Linkage group 1

Efficient and Accurate Clustering for Large-Scale Genetic Mappingveronika/Efficient and... · 2014....

Documents

THE USE OF GENETIC MARKERS IN PLANT BREEDING. Use of Molecular Markers Clonal identity, Family structure, Population structure, Phylogeny (Genetic Diversity)

Genetic variation and DNA markers in forensic analysis

Cattle Breeding Strategies using Genetic Markers as a

A review of molecular genetic markers and analytical

Molecular genetic markers associated with salt …applications.emro.who.int/imemrf/arab_j_biotec_2007_10_2...Genetic markers associated with salt tolerance in sorghum Arab J. Biotech.,

Molecular markers – a tool for exploring genetic diversity

Types of genetic markers multilocus markers (RAPD, AFLP, minisatelitte DNA fingerprinting) single-locus markers (allozymes, microsatelittes, SNPs) dominant

Sleep Disorder Trends, Epigenetic Markers, And Genetic

Identification of Novel Genetic Markers Associated with ... · Identification of Novel Genetic Markers Associated with Clinical Phenotypes of ... of Novel Genetic Markers Associated

Original Article Immunohistochemical and genetic markers ... · Original Article Immunohistochemical and genetic markers to distinguish hemangiopericytoma and meningioma Penglai Zhao1*,

Genetic Markers in AML

Use of Organelle Markers to Study Genetic Diversity in Soybean · 2013-01-10 · Use of Organelle Markers to Study Genetic Diversity in Soybean ... different markers application,

Genetic markers involved in inflammatory digestive pathology

DEVELOPMENT OF GENETIC MARKERS PANEL TO PREDICT …

Genetic distance estimated by RAPD markers and its

Little genetic differentiation as assessed by uniparental markers

Molecular Markers and Genetic Diversity in Neotropical Felids

Therapygenetics: Using genetic markers to predict response to

PLANT GENETIC MARKERS Plant Biotechnology Dr.Ir. Sukendah, MSc

Expression patterns, molecular markers and genetic ... · Expression patterns, molecular markers and genetic diversity of insect-susceptible and resistant Barbarea genotypes by comparative