
Contact information

[Figure: pipeline flowchart. Inputs: Config. File, Reference Genome, Num. of Processors, Raw Sequence Reads. Stages shown: Preprocessing, Read Splitting, MrsFAST Alignment, BWA Alignment, Count Reads, Disc. Reads, Read Depth (RD), Read Pair (RP), Split Read (SR), SNP and INDEL (GATK), P.A.M. (Merged CNVs), Filtering, Annotation, Summary of Variants.]

Open Source Tools to Exploit DNA Sequence Data from Livestock Species

(1) Sequencing  (2) Prepare Files  (3) Divide and Conquer Alignment  (4) Variant Calling  (5) Merger and Filtration  (6) Results

(1) DNA Resequencing

Cattle Breed             Number of Animals   Millions of base pairs (Mbp)
Bos indicus Brahman      7                   116.98
Bos indicus Gir          6                   146.87
Bos indicus Nelore       8                   170.44
Bos taurus Angus         17                  1027.09
Bos taurus Holstein      32                  718.95
Bos taurus Jersey        8                   144.6
Bos taurus Limousin      7                   155.47
Bos taurus Romagnola     4                   92.57

The Problem:

DNA sequence data allows researchers to identify variations within the genomes of individuals of a species that could impact important production traits; however, tools designed to identify these variations are often designed specifically for mouse or human studies.

Proposed Solution:

We are designing an open-source pipeline for the analysis of DNA sequencing studies involving non-model organisms and agricultural species. The pipeline will be free for academic and research use, and the source code for all written programs will be released with the final package. Additionally, the results of the analysis will be provided to the end-user in easily accessible forms, such as Excel spreadsheets, text summaries and PDF plots.

In order to make management of each step more flexible, we have implemented a configuration file control system for running the pipeline. The configuration file contains the locations of sequence files ("fastq" files) and the locations of installed pipeline programs and scripts. In order to speed up computation time, the pipeline can make use of multiple processor cores and high performance computing architectures.
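To make the idea of a configuration file concrete, here is a minimal sketch of how such a file could be read. The poster does not show the pipeline's actual file format, so the INI layout, the section names, the key names and all paths below are hypothetical.

```python
import configparser

# Hypothetical configuration text: the sections, keys and paths are
# illustrative assumptions, not the pipeline's actual format.
EXAMPLE_CONFIG = """
[inputs]
fastq_files = /data/run1/animal1_1.fastq, /data/run1/animal1_2.fastq
reference_genome = /data/ref/reference.fa

[programs]
aligner = /usr/local/bin/bwa

[resources]
processors = 8
"""

def load_config(text):
    """Parse the configuration text into a plain dictionary."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    fastq_entry = parser["inputs"]["fastq_files"]
    return {
        # Comma-separated list of sequence file locations
        "fastq_files": [path.strip() for path in fastq_entry.split(",")],
        "reference_genome": parser["inputs"]["reference_genome"],
        # Location of an installed program used by the pipeline
        "aligner": parser["programs"]["aligner"],
        # Number of processor cores the run is allowed to use
        "processors": parser.getint("resources", "processors"),
    }

config = load_config(EXAMPLE_CONFIG)
print(config["fastq_files"], config["processors"])
```

Keeping file locations and the processor count in one place means a run can be moved to another machine by editing the file rather than the scripts.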

Sequence files ("fastq" files) are first split into smaller chunks and aligned to the reference genome using as many processors as allowed. Alignment is a process that compares raw sequence data back to the original reference genome and identifies the part of the genome that the sequence likely originated from. Because this pipeline splits up the sequence reads into smaller, manageable chunks, it is able to use multiple processors to speed up alignment (this strategy is called "Divide and Conquer").
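The splitting step can be sketched as follows. This is an illustration only, not the pipeline's code: it assumes the common four-lines-per-read FASTQ layout, and align_chunk is a hypothetical placeholder that counts reads where a real pipeline would launch an aligner on the chunk.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

def chunk_fastq(lines, reads_per_chunk):
    """Yield lists of FASTQ lines, each holding up to reads_per_chunk reads.

    Assumes every read occupies exactly four lines.
    """
    iterator = iter(lines)
    while True:
        chunk = list(islice(iterator, 4 * reads_per_chunk))
        if not chunk:
            return
        yield chunk

def align_chunk(chunk):
    """Hypothetical placeholder for aligning one chunk; only counts its reads."""
    return len(chunk) // 4

def divide_and_conquer(lines, reads_per_chunk, processors):
    """Split the reads into chunks and process the chunks concurrently."""
    chunks = list(chunk_fastq(lines, reads_per_chunk))
    with ThreadPoolExecutor(max_workers=processors) as pool:
        # pool.map returns results in the same order as the chunks
        return list(pool.map(align_chunk, chunks))

# Five toy reads, four lines each
fastq_lines = []
for i in range(5):
    fastq_lines += [f"@read{i}", "ACGT", "+", "IIII"]

reads_per_chunk_counts = divide_and_conquer(fastq_lines, reads_per_chunk=2, processors=2)
print(reads_per_chunk_counts)
```

Because each chunk is independent, the work scales with the number of processors allowed; the per-chunk results are combined afterwards.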

Aligned sequence data is then processed using several programs in order to identify Copy Number Variants (CNVs), Single Nucleotide Polymorphisms (SNPs) and Insertion/Deletions (INDELs). CNVs are called using three algorithms used by the Human 1000 Genomes Project [2] (SR, RP and RD), whereas SNPs and INDELs are called by the Broad Institute's Genome Analysis Toolkit (GATK).

The results of the three CNV calling algorithms are merged using an algorithm we call "Precision Aware Merger" (P.A.M.) that takes into account the unique advantages and disadvantages of each method. The SNP and INDEL calls are further filtered in order to improve quality.
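The poster does not describe how P.A.M. works internally, so the sketch below is not that algorithm. It shows only the simplest form of merging calls from several methods: overlapping intervals on the same chromosome are joined, and each merged region records which methods supported it. The tuple layout and the coordinates are assumptions made for illustration.

```python
def merge_cnv_calls(calls):
    """Merge overlapping CNV calls and track the supporting methods.

    calls: iterable of (chrom, start, end, method) tuples.
    Returns a list of dicts with keys chrom, start, end, methods.
    This is a plain interval union, not the P.A.M. algorithm.
    """
    merged = []
    # Sorting by chromosome and start lets us merge in a single pass
    for chrom, start, end, method in sorted(calls):
        last = merged[-1] if merged else None
        if last and last["chrom"] == chrom and start <= last["end"]:
            # Overlaps the previous region: extend it and note the method
            last["end"] = max(last["end"], end)
            last["methods"].add(method)
        else:
            merged.append({"chrom": chrom, "start": start, "end": end,
                           "methods": {method}})
    return merged

# Toy calls from the three method types named above (coordinates invented)
toy_calls = [
    ("chr1", 100, 500, "RD"),
    ("chr1", 450, 900, "RP"),
    ("chr1", 2000, 2500, "SR"),
    ("chr2", 100, 200, "RD"),
]
merged_regions = merge_cnv_calls(toy_calls)
for region in merged_regions:
    print(region["chrom"], region["start"], region["end"], sorted(region["methods"]))
```

A precision-aware merger would presumably go further, for example by weighting each method according to how well it resolves breakpoints, but those details are not given here.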

CNV, SNP and INDEL calls are then cross-referenced against information known about the genome in order to see if they impact genes or other functional genetic regions (this is called "Annotation"). Afterwards, the results are tabulated and presented to the user in spreadsheets and plots.
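Annotation of this kind comes down to an interval-overlap test between variant calls and known features. The sketch below assumes simple (chromosome, start, end) tuples; the gene names and coordinates are invented placeholders, and a real annotation step would read them from a genome annotation file.

```python
def annotate(variants, genes):
    """List the genes that each variant overlaps.

    variants: list of (chrom, start, end)
    genes:    list of (chrom, start, end, name)
    Coordinates are treated as closed intervals.
    """
    annotated = []
    for chrom, start, end in variants:
        hits = [name for g_chrom, g_start, g_end, name in genes
                if g_chrom == chrom and start <= g_end and g_start <= end]
        annotated.append((chrom, start, end, hits))
    return annotated

# Hypothetical genes and variants, for illustration only
toy_genes = [
    ("chr1", 1000, 5000, "GENE_A"),
    ("chr1", 4500, 8000, "GENE_B"),
    ("chr2", 100, 900, "GENE_C"),
]
toy_variants = [
    ("chr1", 4800, 4800),    # single-base variant inside both chr1 genes
    ("chr1", 9000, 9500),    # intergenic region
    ("chr2", 50, 150),       # partially overlaps GENE_C
]
annotation = annotate(toy_variants, toy_genes)
for chrom, start, end, hits in annotation:
    print(chrom, start, end, hits if hits else "no overlapping gene")
```

This version compares every variant with every gene, which is fine for a toy example; for genome-scale call sets a sorted or indexed lookup would be the usual choice.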

(2) Prepare Files

(3) Divide and Conquer Alignment

(4) Variant Calling

(5) Merger and Filtration

(6) Final Results

Derek M. Bickhart (1), Jana L. Hutchison (1), Lingyang Xu (2,3), Jiuzhou Song (3), George E. Liu (2)

(1) USDA, ARS, Animal Improvement Program Laboratory, BARC
(2) USDA, ARS, Bovine Functional Genomics Laboratory, BARC
(3) University of Maryland, Department of Animal and Avian Sciences, College Park, MD

A Test Run: 100 Sequenced Bulls

The Human Genome Project, which was the first major DNA sequencing project for a large eukaryotic genome, lasted 10 years and cost approximately $3 billion. The cost to sequence new genomes has dropped precipitously since that project concluded, and current price estimates hover around $6,000 to $8,000 per individual sequenced. Much of this price reduction is due to new methods of sequencing and improved instrumentation (e.g. the HiSeq 2000; pictured right inset).

Our pipeline was designed for the explicit purpose of analyzing DNA sequence data from organisms that already have a finished reference genome project. As of the writing of this poster, this includes 6 agricultural animal species and 17 agricultural plant species, with many more to shortly follow. The reason why researchers resequence individuals of a species that already has a completed reference genome is to identify variations in DNA sequence within that individual's genome. These variations can be linked to disease susceptibility (powdery mildew susceptibility in Arabidopsis (a)) or productive traits (white coat color in sheep (b)).

[Figure: copy number per gene across samples BINE, BTAN1, BTAN2, BTAN3, BTHO and DTTRACE. Genes shown: MGC134093_a, MGC134093_b, PAG15, GIMAP4, GIMAP7, PAG21, CATHL4, TUBA1B, BNBD10_b, PAG6, BNBD10_a, TAP, LAP, ISG12(A), GIMAP1, DEFB5, LOC100126815, KRTAP9-2, DEFB1, BNBD-4, FBXO16, IFNB1, IFNB3, SUHW2, LOC780876. Scale label: Copy Number (tick marks 0 to 50 and 50 to 250). Group label: Antimicrobial peptides.]

Analysis: Copy Number Indicates Expansion of Immune System Genes

Table: Our dataset was composed of eight different breeds of cattle from two cattle subspecies: Bos indicus (or "zebu") and Bos taurus.

Each individual was sequenced to at least 6X coverage, and several animals were sequenced to greater depth in order to provide a contrast.
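Fold coverage is commonly estimated as the total number of sequenced bases divided by the genome size. The short calculation below illustrates that arithmetic only; the read count, the read length and the genome size of roughly 2.7 billion base pairs are assumed round numbers, not values taken from this dataset.

```python
def fold_coverage(num_reads, read_length_bp, genome_size_bp):
    """Estimate average fold coverage as total sequenced bases / genome size."""
    return num_reads * read_length_bp / genome_size_bp

# Assumed, illustrative inputs (not measurements from the poster's animals)
ASSUMED_GENOME_SIZE_BP = 2_700_000_000   # rough size of a cattle-sized genome
ASSUMED_READ_LENGTH_BP = 100
ASSUMED_NUM_READS = 162_000_000

coverage = fold_coverage(ASSUMED_NUM_READS, ASSUMED_READ_LENGTH_BP,
                         ASSUMED_GENOME_SIZE_BP)
print(f"Estimated coverage: {coverage:.1f}X")
```

Under these assumptions, doubling the number of reads doubles the estimated coverage, which is how some animals can be sequenced to greater depth than others.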

Figure: A summary of all variants currently detected using our pipeline shows a clear difference between indicus and taurus subspecies in the number of SNPs identified. Insertions and deletions vary far more among breeds than subspecies and may reflect smaller phenotypic differences.

Summary of Variants Detected by the Pipeline

Figure: Association of genetic features with variants (Annotation) has revealed several interesting biological features [1]. Antimicrobial peptides, which serve as a first line of defense for the immune system, are often duplicated. This may indicate an evolutionary "arms-race" between the animal and environment as bacteria evolve to resist antimicrobial compounds over time.

More Information


Conclusion

References

[1] Bickhart, et al. 2012. Copy Number Variation of Individual Cattle Genomes using Next-Generation Sequencing. Genome Research. 22: 778-790
[2] Mills, et al. 2011. Mapping Copy Number Variation by Population-Scale Genome Sequencing. Nature. 470: 59-65

Project Source Code (Alpha Stage): sourceforge.net/projects/cosvard/

Derek Bickhart
USDA ARS AIPL

derek.bickhart@ars.usda.gov

Phone: (301) 504-8592

This work was supported by NRI/AFRI grant no. 2011-67015-30183 from the USDA NIFA
