Contact information
[Figure: pipeline flowchart. Labels: Config. File, Reference Genome, Num. of Processors, Raw Sequence Reads; Preprocessing; Read Splitting; MrsFAST Alignments; BWA Alignment; Count Reads; Disc. Reads; Read Depth (RD); Read Pairs (RP); Split Reads (SR); SNPs and INDELs (GATK); P.A.M. (Merged CNVs); Filtering; Annotation; Summary of Variants]
Open Source Tools to Exploit DNA Sequence Data from Livestock Species
(1) Sequencing  (2) Prepare Files  (3) Divide and Conquer Alignment  (4) Variants Calling  (5) Merger and Filtration  (6) Results
(1) DNA Resequencing
Cattle       Breed      Number of Animals  Millions of base pairs (Mbp)
Bos indicus  Brahman     7                  116.98
Bos indicus  Gir         6                  146.87
Bos indicus  Nelore      8                  170.44
Bos taurus   Angus      17                 1027.09
Bos taurus   Holstein   32                  718.95
Bos taurus   Jersey      8                  144.6
Bos taurus   Limousin    7                  155.47
Bos taurus   Romagnola   4                   92.57
The Problem:
DNA sequence data allows researchers to identify variations within the genomes of individuals of a species that could impact important production traits; however, tools for identifying these variations are often designed specifically for mouse or human studies.
Proposed Solution:
We are designing an open-source pipeline for the analysis of DNA sequencing studies involving non-model organisms and agricultural species. The pipeline will be free for academic and research use, and the source code for all written programs will be released with the final package. Additionally, the results of the analysis will be provided to the end-user in easily accessible forms, such as Excel spreadsheets, text summaries and PDF plots.
In order to make management of each step more flexible, we have implemented a configuration file control system for running the pipeline. The configuration file contains the locations of sequence files ("fastq" files) and the locations of installed pipeline programs and scripts. In order to speed up computation time, the pipeline can make use of multiple processor cores and high performance computing architectures.
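As an illustration of the configuration-file idea, the sketch below reads a small INI-style file with Python's standard `configparser` module. The file layout, the section and key names, and all paths are assumptions made for this example; the poster does not describe the pipeline's actual configuration format.

```python
import configparser

# Hypothetical configuration text: the section names, key names and paths
# are illustrative assumptions, not the pipeline's real format.
EXAMPLE_CONFIG = """
[inputs]
fastq_files = sample1_1.fastq, sample1_2.fastq
reference_genome = reference.fa

[programs]
bwa = /usr/local/bin/bwa
mrsfast = /usr/local/bin/mrsfast

[resources]
num_processors = 8
"""


def load_pipeline_config(text):
    """Parse the configuration text into a plain dictionary."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # Split the comma-separated list of fastq files into individual paths.
    fastq = [p.strip() for p in parser["inputs"]["fastq_files"].split(",")]
    return {
        "fastq_files": fastq,
        "reference_genome": parser["inputs"]["reference_genome"],
        "programs": dict(parser["programs"]),
        "num_processors": parser.getint("resources", "num_processors"),
    }


config = load_pipeline_config(EXAMPLE_CONFIG)
print(config["fastq_files"], config["num_processors"])
```

Keeping program locations and input files in one place means a step can be rerun on another machine by editing the file rather than the scripts.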
Sequence files ("fastq" files) are first split into smaller chunks and aligned to the reference genome using as many processors as allowed. Alignment is a process that compares raw sequence data back to the original reference genome and identifies the part of the genome that the sequence likely originated from. Because this pipeline splits up the sequence reads into smaller, manageable chunks, it is able to use multiple processors to speed up alignment (this strategy is called "Divide and Conquer").
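The "Divide and Conquer" strategy can be sketched as follows. This is a minimal illustration, not the pipeline's own code: it assumes the common four-lines-per-read FASTQ layout, and `align_chunk` is a hypothetical placeholder that only counts reads where a real pipeline would launch an aligner on each chunk.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def split_fastq(lines, reads_per_chunk):
    """Yield lists of FASTQ lines holding at most reads_per_chunk reads each.

    Assumes every read occupies exactly four lines.
    """
    it = iter(lines)
    while True:
        chunk = list(islice(it, 4 * reads_per_chunk))
        if not chunk:
            return
        yield chunk


def align_chunk(chunk):
    """Hypothetical stand-in for an aligner call; returns the read count."""
    return len(chunk) // 4


def divide_and_conquer(lines, reads_per_chunk, num_processors):
    """Split the reads into chunks and process the chunks concurrently."""
    chunks = list(split_fastq(lines, reads_per_chunk))
    with ThreadPoolExecutor(max_workers=num_processors) as pool:
        # pool.map preserves chunk order in its results.
        return list(pool.map(align_chunk, chunks))


# Five toy reads, split into chunks of two reads each.
records = []
for i in range(5):
    records += [f"@read{i}", "ACGT", "+", "IIII"]

read_counts = divide_and_conquer(records, reads_per_chunk=2, num_processors=2)
print(read_counts)
```

Threads are used here only to keep the sketch short; when each worker starts an external aligner process, the alignment work itself still runs on separate processor cores.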
Aligned sequence data is then processed using several programs in order to identify Copy Number Variants (CNVs), Single Nucleotide Polymorphisms (SNPs) and Insertion/Deletions (INDELs). CNVs are called using three algorithms used by the Human 1000 Genomes Project [2] (SR, RP and RD), whereas SNPs and INDELs are called by the Broad Institute's Genome Analysis Toolkit (GATK).
The results of the three CNV calling algorithms are merged using an algorithm we call "Precision Aware Merger" (P.A.M.) that takes into account the unique advantages and disadvantages of each method. The SNP and INDEL calls are further filtered in order to improve quality.
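The poster does not describe how P.A.M. weighs the three methods, so the sketch below is not that algorithm. It only shows the general shape of the merging problem under a simple assumption: calls that overlap on the same chromosome are combined into one region, and the methods supporting each region are recorded. The tuple layout and the example coordinates are hypothetical.

```python
def merge_cnv_calls(calls):
    """Merge overlapping CNV calls from several methods.

    calls: iterable of (chrom, start, end, method) tuples.
    Returns a list of dicts with the merged interval and supporting methods.
    This is a plain interval union, not the P.A.M. algorithm.
    """
    merged = []
    # Sorting by chromosome, then start, lets us merge in a single pass.
    for chrom, start, end, method in sorted(calls):
        last = merged[-1] if merged else None
        if last and last["chrom"] == chrom and start <= last["end"]:
            last["end"] = max(last["end"], end)
            last["methods"].add(method)
        else:
            merged.append(
                {"chrom": chrom, "start": start, "end": end, "methods": {method}}
            )
    return merged


# Hypothetical calls from the three CNV methods named in the text.
calls = [
    ("chr1", 100, 500, "RD"),
    ("chr1", 450, 900, "RP"),
    ("chr1", 2000, 2500, "SR"),
    ("chr2", 100, 200, "RD"),
]
merged = merge_cnv_calls(calls)
for region in merged:
    print(region["chrom"], region["start"], region["end"], sorted(region["methods"]))
```

Recording which methods support each region is the information a precision-aware scheme would need in order to weight calls by how reliable each method is.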
CNV, SNP and INDEL calls are then cross-referenced against information known about the genome in order to see if they impact genes or other functional genetic regions (this is called "Annotation"). Afterwards, the results are tabulated and presented to the user in spreadsheets and plots.
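At its core, annotation is an interval-overlap test between variant calls and known genomic features. The sketch below assumes inclusive coordinates and uses made-up gene names and positions; it illustrates the idea and is not the pipeline's annotation code.

```python
def annotate(variants, genes):
    """Map each variant to the names of the genes it overlaps.

    variants: list of (chrom, start, end) tuples, inclusive coordinates.
    genes: list of (name, chrom, start, end) tuples, inclusive coordinates.
    """
    annotations = {}
    for v_chrom, v_start, v_end in variants:
        hits = [
            name
            for name, g_chrom, g_start, g_end in genes
            # Two inclusive intervals overlap when each starts before the other ends.
            if g_chrom == v_chrom and v_start <= g_end and g_start <= v_end
        ]
        annotations[(v_chrom, v_start, v_end)] = hits
    return annotations


# Hypothetical gene models and variant calls.
genes = [("GENE_A", "chr1", 1000, 5000), ("GENE_B", "chr1", 8000, 9000)]
variants = [
    ("chr1", 4500, 4500),  # a SNP inside GENE_A
    ("chr1", 6000, 8500),  # a CNV reaching into GENE_B
    ("chr2", 100, 200),    # a call with no overlapping gene
]
result = annotate(variants, genes)
for variant, hits in result.items():
    print(variant, hits)
```

The nested loop is adequate for a small example; genome-scale annotation would normally use sorted intervals or an interval tree.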
(2) Prepare Files
(3) Divide and Conquer Alignment
(4) Variant calling
(5) Merger and Filtration
(6) Final Results
Derek M. Bickhart1, Jana L. Hutchison1, Lingyang Xu2,3, Jiuzhou Song3, George E. Liu2
1 USDA, ARS, Animal Improvement Program Laboratory, BARC
2 USDA, ARS, Bovine Functional Genomics Laboratory, BARC
3 University of Maryland, Department of Animal and Avian Sciences, College Park, MD
A Test Run: 100 Sequenced Bulls
The Human Genome Project, which was the first major DNA sequencing project for a large eukaryotic genome, lasted 10 years and cost approximately $3 billion. The cost to sequence new genomes has dropped precipitously since that project concluded, and current price estimates hover around $6,000 to $8,000 per individual sequenced. Much of this price reduction is due to new methods of sequencing and improved instrumentation (e.g. the HiSeq 2000; pictured right inset).
Our pipeline was designed for the explicit purpose of analyzing DNA sequence data from organisms that already have a finished reference genome project. As of the writing of this poster, this includes 6 agricultural animal species and 17 agricultural plant species, with many more to follow shortly. Researchers resequence individuals of a species that already has a completed reference genome in order to identify variations in DNA sequence within each individual's genome. These variations can be linked to disease susceptibility (powdery mildew susceptibility in Arabidopsis (a)) or productive traits (white coat color in sheep (b)).
[Figure: copy number of selected genes in samples BINE, BTAN1, BTAN2, BTAN3, BTHO and DTTRACE; axis: Copy Number; one gene group is labeled "Antimicrobial peptides". Genes shown: MGC134093_a, MGC134093_b, PAG15, GIMAP4, GIMAP7, PAG21, CATHL4, TUBA1B, BNBD10_b, PAG6, BNBD10_a, TAP, LAP, ISG12(A), GIMAP1, DEFB5, LOC100126815, KRTAP9-2, DEFB1, BNBD-4, FBXO16, IFNB1, IFNB3, SUHW2, LOC780876]
Analysis: Copy Number Indicates Expansion of Immune System Genes
Table: Our dataset was composed of eight different breeds of cattle from two cattle subspecies: Bos indicus (or "zebu") and Bos taurus.
Each individual was sequenced to at least 6X coverage, and several animals were sequenced to greater depth in order to provide a contrast.
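Sequencing coverage is commonly estimated as the total number of sequenced bases divided by the genome size. The sketch below applies that relationship; the read length (100 bp), the genome size (2.7 billion bp) and the target of 6X are round illustrative assumptions, not figures reported for this dataset.

```python
import math


def mean_coverage(num_reads, read_length, genome_size):
    """Average coverage = total sequenced bases / genome size."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Smallest number of reads that reaches the target average coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


# Illustrative assumptions: 100 bp reads and a 2.7 billion bp genome.
READ_LENGTH = 100
GENOME_SIZE = 2_700_000_000

n_reads = reads_needed(6, READ_LENGTH, GENOME_SIZE)
coverage = mean_coverage(n_reads, READ_LENGTH, GENOME_SIZE)
print(n_reads, coverage)
```

This is an average over the whole genome; actual depth at any one position varies, which is one reason read-depth methods can detect copy number changes.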
Figure: A summary of all variants currently detected using our pipeline shows a clear difference between indicus and taurus subspecies in the number of SNPs identified. Insertions and deletions vary far more among breeds than subspecies and may reflect smaller phenotypic differences.
Summary of Variants Detected by the Pipeline
Figure: Association of genetic features with variants (Annotation) has revealed several interesting biological features [1]. Antimicrobial peptides, which serve as a first line of defense for the immune system, are often duplicated. This may indicate an evolutionary "arms race" between the animal and its environment as bacteria evolve to resist antimicrobial compounds over time.
MorefInformation
Conclusion
References
[1] Bickhart, et al. 2012. Copy Number Variation of Individual Cattle Genomes using Next-Generation Sequencing. Genome Research. 22: 778-790.
[2] Mills, et al. 2011. Mapping Copy Number Variation by Population-Scale Genome Sequencing. Nature. 470: 59-65.
Project Source Code (Alpha Stage): sourceforge.net/projects/cosvard/
Derek Bickhart, USDA ARS AIPL
derek.bickhart@ars.usda.gov
Phone: (301) 504-8592
This work was supported by NRI/AFRI grant no. 2011-67015-30183 from the USDA NIFA