prasoon-kumar
DESCRIPTION
Modern applications demand high scalability and ease of development from their datastores. Relational databases were once the only option, but a host of NoSQL options is now available. This whitepaper discusses the advantages of MongoDB and the considerations involved in migrating to it.
Introduction
The relational database has been the foundation of enterprise data management for over thirty years. But the way we build and run applications today, coupled with unrelenting growth in new data sources and growing user loads, is pushing relational databases beyond their limits. As a result, many applications are beginning to migrate completely to a NoSQL database such as MongoDB, or to move to a hybrid model that uses both an RDBMS and MongoDB to persist different parts of the same system. This whitepaper discusses how MongoDB, the leading NoSQL database, offers a new and better approach to persisting data than relational databases.
Table of Contents
MongoDB advantages
Relational Database challenges
MongoDB Solution
    Agility
    Performance
    Scalability
    High Availability
Detailed Explanation
    From rigid tables to flexible and dynamic BSON documents
    Application integration
    Atomicity in MongoDB
    Migration tools from relational to MongoDB
    MongoDB MMS
Conclusion
MongoDB advantages
- Dynamic schema: rapid development, without the long up-front table and foreign-key design required by a relational datastore.
- Horizontal scaling on commodity hardware: a production deployment can manage several terabytes in a single collection (the MongoDB counterpart of a table) without being constrained by the addition of new fields or by data growth. Sharding enables linear, scale-out growth without exhausting the budget.
- Replica sets: an easy-to-set-up, multi-data-center disaster recovery plan (DRP) and high availability (HA) solution, which helps meet regulatory requirements.
- High write throughput: MongoDB's architecture suits systems that must sustain a heavy insert load. Where an operation needs transaction-like behavior, you can use findAndModify (which is slower) or an application-level two-phase commit.
- Full-text search: MongoDB supports full-text search indexes, so no additional search infrastructure needs to be deployed.
- Developer-oriented queries: queries are written with the developer's own programming language constructs, which lets developers write elegant queries.
- Indexing: a variety of index types is available. Especially useful is the built-in geospatial (location) index, which can be used to build location-aware applications.
- Map/Reduce: built-in support is available if you need it.
Relational Database challenges
Relational databases face challenges in the following broad categories:
1. Data types: they are not designed to support unstructured, semi-structured or polymorphic data.
2. Volume of data: when it comes to supporting petabytes of data, trillions of records and/or millions of queries per second, RDBMSs struggle because of inherent design limitations.
3. Agile development: an RDBMS is ill suited to iterative development, because the data model must be frozen before any development can start. If that sounds like the waterfall model, it is. Short development cycles and new workloads can also pose challenges in any project or product that uses an RDBMS.
4. New architectures: because RDBMSs were designed 30 years ago, most still lack built-in horizontal scaling. They struggle to scale out across commodity servers and cloud infrastructure.
MongoDB Solution Agility MuchofthedataweusetodayhascomplexstructuresthatcanbemodeledandrepresentedmoreefficientlyusingJSON(JavaScriptObjectNotation)documents,ratherthantables.MongoDBstoresJSONdocumentsinabinaryrepresentationcalledBSON(BinaryJSON).BSONencodingextendsthepopularJSONrepresentationtoincludeadditionaldatatypessuchasint,longandfloatingpoint.
By contrast, trying to map the object representation of the data to the tabular representation of an RDBMS slows down development. Adding Object Relational Mappers (ORMs) can create additional complexity by reducing the flexibility to evolve schemas and to optimize queries to meet new application requirements.
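The int/long distinction is easier to see at the byte level. The following is a minimal, hand-rolled Python sketch of BSON encoding for a small subset of types: string, 32-bit int, 64-bit int, double, boolean, embedded document and array. It follows the layout given in the public BSON specification. It is an illustration only, since real applications rely on the encoder shipped with their MongoDB driver. The function names `encode_cstring`, `encode_element` and `encode_document` are hypothetical.

```python
import struct

# Hypothetical helper names; a minimal sketch of the BSON layout, not a driver.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def encode_cstring(text: str) -> bytes:
    """BSON field names are null-terminated UTF-8 strings."""
    return text.encode("utf-8") + b"\x00"


def encode_element(name: str, value) -> bytes:
    """Encode one field as: type byte + field name + value bytes."""
    key = encode_cstring(name)
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return b"\x08" + key + (b"\x01" if value else b"\x00")
    if isinstance(value, int):
        if INT32_MIN <= value <= INT32_MAX:
            return b"\x10" + key + struct.pack("<i", value)  # int32
        # int64 ("long"); values outside the int64 range raise struct.error
        return b"\x12" + key + struct.pack("<q", value)
    if isinstance(value, float):
        return b"\x01" + key + struct.pack("<d", value)  # 64-bit double
    if isinstance(value, str):
        data = value.encode("utf-8") + b"\x00"
        return b"\x02" + key + struct.pack("<i", len(data)) + data
    if isinstance(value, dict):
        return b"\x03" + key + encode_document(value)  # embedded sub-document
    if isinstance(value, list):
        # Arrays are encoded as documents whose keys are "0", "1", "2", ...
        as_doc = {str(i): item for i, item in enumerate(value)}
        return b"\x04" + key + encode_document(as_doc)
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")


def encode_document(doc: dict) -> bytes:
    """A document is: int32 total length + elements + a trailing 0x00 byte."""
    body = b"".join(encode_element(k, v) for k, v in doc.items())
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


if __name__ == "__main__":
    small = encode_document({"qty": 5})
    big = encode_document({"qty": 5_000_000_000})
    # Byte 4 is the type byte of the first element: 16 (int32) vs 18 (int64).
    print(small[4], big[4])
```

Running it prints `16 18`. The same field name receives a different BSON type byte depending on whether the value fits in 32 bits, which is the kind of type information plain JSON cannot carry.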
Performance
MongoDB achieves its performance through in-memory caching, better data locality (related data is stored together in a single document) and in-place updates.
Scalability
MongoDB uses sharding, a method of storing data across multiple machines, to support deployments with very large data sets and high-throughput operations.
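As a rough illustration of range-based partitioning, the sketch below routes documents to shards by looking up their shard-key value in a sorted list of chunk boundaries. The boundaries, the shard names and the `route` and `distribute` functions are hypothetical. In a real sharded cluster, the query routers (mongos) and the balancer manage chunks and their placement automatically.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical chunk layout: chunk i covers [CHUNK_LOWER_BOUNDS[i], next bound)
# of the shard key and currently lives on CHUNK_OWNERS[i].
CHUNK_LOWER_BOUNDS = [float("-inf"), 1000, 5000, 20000]
CHUNK_OWNERS = ["shardA", "shardB", "shardC", "shardA"]


def route(shard_key_value: int) -> str:
    """Return the shard that owns the chunk containing this shard-key value.
    Lower bounds are inclusive, so 1000 belongs to the second chunk."""
    chunk_index = bisect_right(CHUNK_LOWER_BOUNDS, shard_key_value) - 1
    return CHUNK_OWNERS[chunk_index]


def distribute(shard_key_values) -> Counter:
    """Count how many documents each shard would receive."""
    return Counter(route(value) for value in shard_key_values)


if __name__ == "__main__":
    print(route(42), route(1000), route(7500), route(25000))
    print(distribute(range(0, 30000, 100)))
```

The first line printed is `shardA shardB shardC shardA`. Because each lookup is a binary search over chunk boundaries, routing stays cheap as the number of chunks grows. Adding capacity amounts to moving chunks to new shards.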
High Availability
Replica sets provide high availability through automatic failover. Failover allows a secondary member to become primary if the primary is unavailable. In most situations, failover does not require manual intervention.
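The following Python sketch models the failover rule in a deliberately simplified form. When the primary becomes unreachable, a reachable secondary is promoted, but only while a strict majority of the replica set's members can still see each other. The `Member` class and `elect_primary` function are hypothetical. MongoDB's actual election protocol also considers factors such as how current each member's replicated data is and the per-member voting configuration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Member:
    name: str
    priority: int       # higher priority is preferred; 0 means never primary
    reachable: bool     # can the rest of the set see this member?
    is_primary: bool = False


def elect_primary(members: List[Member]) -> Optional[str]:
    """Simplified failover: promote a reachable secondary only if a strict
    majority of members is reachable, so that the two sides of a network
    split can never both elect a primary."""
    reachable = [m for m in members if m.reachable]
    if len(reachable) * 2 <= len(members):
        return None  # no majority: nobody may act as primary
    current = [m for m in reachable if m.is_primary]
    if current:
        return current[0].name  # primary is still healthy; no failover
    candidates = [m for m in reachable if m.priority > 0]
    if not candidates:
        return None
    winner = max(candidates, key=lambda m: m.priority)
    for m in members:
        m.is_primary = m is winner  # demote the old primary, promote the winner
    return winner.name


if __name__ == "__main__":
    replica_set = [
        Member("db1", priority=2, reachable=False, is_primary=True),
        Member("db2", priority=1, reachable=True),
        Member("db3", priority=1, reachable=True),
    ]
    print(elect_primary(replica_set))  # db2
```

The majority check explains why replica sets are usually deployed with an odd number of voting members. With three members, losing any one still leaves a majority that can elect a new primary without manual intervention.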
Detailed Explanation
The following sections expand on some of the points covered above.
From rigid tables to flexible and dynamic BSON documents MuchofthedataweusetodayhascomplexstructuresthatcanbemodeledandrepresentedmoreefficientlyusingJSON(JavaScriptObjectNotation)documents,ratherthantables.MongoDBstoresJSONdocumentsinabinaryrepresentationcalledBSON(BinaryJSON).
BSON encoding extends the popular JSON representation with additional data types. For example, it distinguishes int, long and floating-point numbers, which JSON represents with a single number type. With sub-documents and arrays, JSON documents also align with the structure of objects at the application level, which makes it easy for developers to map the data used in the application to its associated document in the database. By contrast, trying to map the object representation of the data to the tabular representation of an RDBMS slows down development. Adding Object Relational Mappers (ORMs) can create additional complexity by reducing the flexibility to evolve schemas and to optimize queries to meet new application requirements.
The project team should start the schema design process by considering the application's requirements, and should model the data in a way that takes advantage of the document model's flexibility. In schema migrations, it may be tempting simply to mirror the relational database's flat schema in the document model. However, this approach negates the advantages enabled by the document model's rich, embedded data structures. For example, data in a parent-child relationship spread across two RDBMS tables would commonly be collapsed (embedded) into a single document in MongoDB. With a well-designed document schema and document-level atomicity, many JOINs become unnecessary, and JOINs are a frequent source of performance problems in relational databases. Further advantages of this model are:
- An aggregated document can be accessed with a single call to the database, rather than having to JOIN multiple tables to answer a query. A MongoDB document is physically stored as a single object, requiring only a single read from memory or disk. RDBMS JOINs, on the other hand, require multiple reads from multiple physical locations.
- Because documents are self-contained, distributing the database across multiple nodes (a process called sharding) becomes simpler and makes it possible to achieve massive horizontal scalability on commodity hardware. The DBA no longer needs to worry about the performance penalty of executing cross-node JOINs (should they even be possible in the existing RDBMS) to collect data from different tables.
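A minimal Python sketch of the parent-child collapse might look like the block below. It assumes a hypothetical `orders`/`order_lines` table pair joined on `order_id`. The child rows are grouped by their foreign key and embedded as an array inside the parent document, so one read returns the whole aggregate. Using the parent's primary key as `_id` is an assumption made here for simplicity, and `embed_children` is a hypothetical helper.

```python
import json
from collections import defaultdict

# Hypothetical rows from two relational tables linked by a foreign key.
orders = [
    {"order_id": 1, "customer": "Ada", "status": "shipped"},
    {"order_id": 2, "customer": "Lin", "status": "pending"},
]
order_lines = [
    {"order_id": 1, "sku": "A-100", "qty": 2},
    {"order_id": 1, "sku": "B-200", "qty": 1},
    {"order_id": 2, "sku": "A-100", "qty": 5},
]


def embed_children(parents, children, key, field):
    """Collapse a parent-child table pair into one document per parent,
    with the child rows embedded as an array (the foreign key is dropped
    because nesting already expresses the relationship)."""
    grouped = defaultdict(list)
    for child in children:
        child = dict(child)           # copy so the input rows are not mutated
        parent_id = child.pop(key)
        grouped[parent_id].append(child)
    documents = []
    for parent in parents:
        doc = {"_id": parent[key]}    # assumption: reuse the primary key as _id
        doc.update({k: v for k, v in parent.items() if k != key})
        doc[field] = grouped.get(parent[key], [])
        documents.append(doc)
    return documents


if __name__ == "__main__":
    for doc in embed_children(orders, order_lines, "order_id", "lines"):
        print(json.dumps(doc))
```

The output is one JSON document per line, which is the layout the mongoimport tool reads by default (without `--jsonArray`). A transformation script of this kind can therefore feed an import directly.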
Application integration EaseofuseanddeveloperproductivityaresomeofMongoDBscoredesigngoals.OnefundamentaldifferencebetweenaSQLbasedRDBMSandMongoDBisthattheMongoDBinterfaceisimplementedasmethods(orfunctions)withintheAPIofaspecificprogramminglanguage,asopposedtoacompletelyseparatelanguagelikeSQL.This,coupledwiththeaffinitybetweenMongoDBsBSONdocumentmodelandthedatastructuresusedinobjectorientedprogramming,makesapplicationintegrationsimple.MongoDBhasidiomaticdriversforthemostpopularlanguages,includingoveradozendevelopedandsupportedbyMongoDB(e.g.,Java,Python,.NET,PERL)andmorethan30communitysupporteddrivers.
Atomicity in MongoDB
Relational databases typically have well-developed features for data integrity, including ACID transactions and constraint enforcement. Rightly, users do not want to sacrifice data integrity as they move to new types of databases. With MongoDB, users can maintain many capabilities of relational databases, even though the technical implementation of those capabilities may be different; we have already seen this with JOINs.
MongoDB write operations are atomic at the document level, including the ability to update embedded arrays and sub-documents atomically. By embedding related fields within a single document, users get integrity guarantees for that data comparable to those of a traditional RDBMS, which must instead coordinate costly ACID operations and maintain referential integrity across separate tables. Document-level atomicity in MongoDB ensures complete isolation while a document is updated: any error causes the operation to roll back, and clients receive a consistent view of the document.
Despite the power of single-document atomic operations, some cases may require multi-document transactions. There are multiple approaches to this. One is the findAndModify command, which allows a document to be updated atomically and returned in the same round trip. findAndModify is a powerful primitive on top of which users can build other, more complex transaction protocols. For example, users frequently build atomic soft-state locks, job queues, counters and state machines that help coordinate more complex behaviors. Another alternative is to implement a two-phase commit to provide transaction-like semantics.
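An atomic "find, update and return" primitive is enough to build a job queue, as the self-contained Python sketch below shows. `InMemoryCollection` and its `find_and_modify` method are hypothetical stand-ins. A single lock makes each call atomic, playing the role that document-level atomicity plays on the server. Against a real deployment, the corresponding PyMongo call would be `Collection.find_one_and_update`, with `return_document=ReturnDocument.AFTER` to get the updated document back.

```python
import threading
from typing import Callable, Dict, List, Optional


class InMemoryCollection:
    """Toy stand-in for a collection: one lock makes each find-and-modify
    atomic, mimicking single-document atomicity for this sketch."""

    def __init__(self, docs: List[Dict]):
        self._docs = docs
        self._lock = threading.Lock()

    def find_and_modify(self, predicate: Callable[[Dict], bool],
                        update: Callable[[Dict], None]) -> Optional[Dict]:
        """Atomically find the first matching document, apply `update`
        and return a copy of the modified document (or None)."""
        with self._lock:
            for doc in self._docs:
                if predicate(doc):
                    update(doc)
                    return dict(doc)
            return None


def claim_next_job(jobs: InMemoryCollection, worker: str) -> Optional[Dict]:
    """Job-queue pattern: flip one 'queued' job to 'running' and record the
    owner. Matching and updating happen in one atomic step, so two workers
    can never claim the same job."""
    def mark_running(doc: Dict) -> None:
        doc["state"] = "running"
        doc["owner"] = worker
    return jobs.find_and_modify(lambda d: d["state"] == "queued", mark_running)


def run_workers(jobs: InMemoryCollection, n_workers: int) -> List[int]:
    """Run several threads that drain the queue; return the claimed job ids."""
    claimed: List[int] = []
    claimed_lock = threading.Lock()

    def loop(worker: str) -> None:
        while (job := claim_next_job(jobs, worker)) is not None:
            with claimed_lock:
                claimed.append(job["_id"])

    threads = [threading.Thread(target=loop, args=(f"w{n}",)) for n in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return claimed


if __name__ == "__main__":
    queue = InMemoryCollection([{"_id": i, "state": "queued"} for i in range(100)])
    ids = run_workers(queue, 4)
    print(len(ids), len(set(ids)))  # 100 100: every job claimed exactly once
```

The same claim-by-state-transition idea underlies the locks, counters and state machines mentioned above. A two-phase commit builds on it by moving a separate transaction document through explicit states.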
Migration tools from relational to MongoDB
Many users create their own scripts that transform source data into a hierarchical JSON structure, which can then be imported into MongoDB using the mongoimport tool. Extract, Transform, Load (ETL) tools are also commonly used when migrating data from relational databases to MongoDB. A number of ETL vendors, including Informatica, Pentaho and Talend, have developed MongoDB connectors that enable a workflow in which data is extracted from the source database, transformed into the target MongoDB schema, staged, and then loaded into document collections. Many migrations involve running the existing RDBMS in parallel with the new MongoDB database while incrementally transferring production data:
1. As records are retrieved from the RDBMS, the application writes them back out to MongoDB in the required document schema.
2. Consistency checkers, for example using MD5 checksums, can be used to validate the migrated data.
3. All newly created or updated data is written to MongoDB only.
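Step 2 can be sketched as follows, assuming both sides can be read as Python dicts keyed by `_id`. Each document is rendered as canonical JSON (sorted keys, fixed separators) and hashed with MD5. Any `_id` whose checksums differ, or that exists on only one side, is reported. The function names are hypothetical, and MD5 serves here only as a fast change detector, not for security. A real checker would also need to normalize types that a driver may return differently (for example int versus float, or dates).

```python
import hashlib
import json
from typing import Dict, Iterable, List


def canonical_md5(doc: Dict) -> str:
    """MD5 of a canonical JSON rendering: sorted keys (recursively) and fixed
    separators, so logically equal documents hash identically regardless of
    key order. Non-JSON values are stringified via default=str."""
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def find_mismatches(expected: Iterable[Dict], migrated: Iterable[Dict]) -> List:
    """Compare documents rebuilt from the RDBMS with those read back from the
    new store, keyed by _id (assumed to be of one comparable type). Returns
    the sorted _ids that are missing, unexpected or whose checksums differ."""
    source = {d["_id"]: canonical_md5(d) for d in expected}
    target = {d["_id"]: canonical_md5(d) for d in migrated}
    all_ids = set(source) | set(target)
    return sorted(i for i in all_ids if source.get(i) != target.get(i))


if __name__ == "__main__":
    expected = [
        {"_id": 1, "name": "Ada", "lines": [{"sku": "A-100", "qty": 2}]},
        {"_id": 2, "name": "Lin", "lines": []},
    ]
    migrated = [
        {"lines": [{"qty": 2, "sku": "A-100"}], "_id": 1, "name": "Ada"},
        {"_id": 2, "name": "Lin", "lines": [{"sku": "X"}]},
    ]
    print(find_mismatches(expected, migrated))  # [2]
```

Document 1 matches even though its keys arrive in a different order, because the canonical rendering removes that difference before hashing.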
MongoDB MMS
The MongoDB Management Service (MMS), available both in the cloud and on premises, can be used for performance monitoring and for backup and restore.
Conclusion
MongoDB is the leading NoSQL database. It is increasingly used in hybrid applications alongside RDBMSs such as MySQL and, in some cases, as a standalone database.