MySpace Chief Data Architect Christa Stelzmuller's slides from her talk to the Silicon Valley SQL Server User Group in June 2009. Read about it on the Ginneblog: http://bit.ly/YLzle
THE MYSPACE DATA ARCHITECTURE: SCALING FOR RAPID AND SUSTAINABLE GROWTH
SPEAKER: CHRISTA STELZMULLER, MYSPACE CHIEF DATA ARCHITECT
SILICON VALLEY SQL SERVER USER GROUP, JUNE 2009
MARK GINNEBAUGH, USER GROUP LEADER
http://www.meetup.com/The-SiliconValley-SQL-Server-User-Group/
Christa Stelzmuller
- Chief Data Architect at MySpace since Oct 2006
- Formerly at Yahoo!
  - Engineering Manager
  - Data Architect for the Yahoo! Music Team
- Specializes in very large databases with high volumes of transactions
- Tonight's Topic: The MySpace Data Architecture: Scaling for Rapid and Sustainable Growth
Data Services Organization
- Operations
  - Storage
  - Database
- Development
  - Database
  - Search
  - ETL & Infrastructure
  - Warehousing
  - Mining
High Level Architecture
Scaling the Database Tier
- Scale out, not up
  - Functional separation
  - Horizontal partitioning within functions
- Design Principles
  - Decoupled and isolated
  - Flexibility and predictability in scaling according to usage
  - Distributed transaction load
  - Improved administration
Functional Separation
- Logical Segments
  - Profiles
    - Core user generated data
    - User relationships to features
  - Mail
    - User-to-user communication data
  - Features
    - Content specific or feature specific, not user specific
  - Search & Browse
    - Read only
    - Redundant denormalized stores
Functional Separation
- Infrastructure Segments
  - Security
    - Signup & Login
    - Spam fighting
  - Shared
    - Globally queryable core user data
  - SSIS & Dispatcher
    - Database-to-database communication (ETL)
    - Messaging based (dispatcher)
    - Package based (SSIS)
  - Distribution
    - Replication
Horizontal Partitioning
- Inter-database Partitioning Approaches
  - Divide by primary access pattern (key based)
    - Range based schemes
    - Modulo based schemes
  - Write Master/Read Slave
    - Dedicated write master with replicated read slaves
    - Dedicated write master with non-replicated slaves
    - Disparate masters with non-replicated slaves
- Intra-database Partitioning Approaches
  - Vertical table partitioning
  - More horizontal table partitioning!
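The key-based schemes above can be sketched in a few lines. This is a minimal illustration of range- versus modulo-based routing, not MySpace's actual code; the constants and function names (`NUM_PROFILE_DBS`, `RANGE_SIZE`, `route_modulo`, `route_range`) are hypothetical.

```python
NUM_PROFILE_DBS = 487  # per the slides: 487 profile databases at the time

def route_modulo(user_id: int, num_dbs: int = NUM_PROFILE_DBS) -> int:
    """Modulo scheme: spreads keys evenly across databases,
    but adding a database remaps most existing keys."""
    return user_id % num_dbs

RANGE_SIZE = 1_000_000  # hypothetical: 1M users per database

def route_range(user_id: int, range_size: int = RANGE_SIZE) -> int:
    """Range scheme: new databases are added as new ranges fill,
    so existing keys never move."""
    return user_id // range_size

# Usage: two different databases may be chosen for the same key
# depending on the scheme.
db_by_modulo = route_modulo(123_456_789)
db_by_range = route_range(123_456_789)
```

The trade-off matches the slide's "growing 1 every 3 days" pattern: a range scheme lets new databases come online without rebalancing, while a modulo scheme balances load but makes expansion expensive.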
How distributed are we?
- Logical Segments
  - Profiles: 487 databases, growing by 1 every 3 days
  - Mail: 487 databases, growing by 1 every 3 days
  - Search & Browse: 24 databases, stable
  - Features: 88 databases, growing by 2 every month
- Infrastructure Segments
  - Security: 6 databases, stable
  - Shared: 8 databases, stable
  - SSIS & Dispatcher: 30 databases, stable
  - Distribution: 5 databases, stable
Challenges with Scaling Out
- Data Integrity
  - Service Broker/Dispatcher
  - Tier Hopper
- Read/Write Volatility
  - Prepopulator
  - Transaction Manager
  - Targeted Persistent Cache Implementations
- Administering all those servers
  - Self-tuning intelligent systems
Service Dispatcher
- Service Broker
  - Enabled asynchronous transactions intra- and inter-database
  - Only allows for unicast messaging, requiring a physical route between each service and database
- Solution was to extend Service Broker's functionality
  - Centralizes route management from individual databases by utilizing custom gateways
  - Enables multicast messaging
  - Abstracts complex Service Broker components for rapid development
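The gateway idea above, one published message fanned out to every subscribed destination rather than one Service Broker route per database pair, can be illustrated with an in-process toy. This is a sketch of the multicast pattern only, not Service Broker or MySpace's dispatcher; all names are invented.

```python
from collections import defaultdict

class Dispatcher:
    """Toy multicast gateway: routes are registered centrally, and one
    published message is delivered to every subscriber on the topic,
    instead of requiring a point-to-point route per destination."""

    def __init__(self):
        self._routes = defaultdict(list)  # topic -> list of handlers

    def subscribe(self, topic, handler):
        self._routes[topic].append(handler)

    def publish(self, topic, message):
        # Multicast: every subscriber on the topic receives the message.
        for handler in self._routes[topic]:
            handler(message)

# Usage: a cache tier and a search tier both subscribe to one event.
received = []
bus = Dispatcher()
bus.subscribe("user.updated", lambda m: received.append(("cache", m)))
bus.subscribe("user.updated", lambda m: received.append(("search", m)))
bus.publish("user.updated", {"user_id": 42})
```

Centralizing the route table in one place is what removes the N-squared route management problem the slide describes.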
Service Dispatcher
Tier Hopper
- Problem
  - Database-initiated changes needed to be synchronized with cache
  - Database-initiated events needed to be exchanged with non-DB systems
- Solution was to build a service to meet these needs
  - Service Broker, SQL-CLR, and Windows Service
  - Completely asynchronous
  - Currently centralized
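The asynchronous handoff described above can be sketched with a queue and a background worker standing in for the Windows-service role. This is an illustration of the pattern only (the real system uses Service Broker and SQL-CLR); the names and the in-memory cache are hypothetical.

```python
import queue
import threading

events = queue.Queue()  # stand-in for the persistent event queue
cache = {}              # stand-in for an external, non-DB cache tier

def on_db_change(key, value):
    """Called from the database side (the SQL-CLR role in the talk);
    it only enqueues, so the originating transaction is never blocked."""
    events.put((key, value))

def worker():
    """Out-of-band consumer (the Windows-service role): drains events
    and synchronizes the non-DB system with the database change."""
    while True:
        item = events.get()
        if item is None:   # shutdown sentinel
            break
        key, value = item
        cache[key] = value
        events.task_done()

t = threading.Thread(target=worker)
t.start()
on_db_change("user:42:displayname", "Tom")
events.put(None)  # signal shutdown after the event is queued
t.join()
```

Because the producer and consumer are fully decoupled by the queue, a slow cache never backs up into database transactions, which is the point of "completely asynchronous" on the slide.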
Tier Hopper
Prepopulator
- Problem
  - Web-server-brokered updates of cache from the databases put unnecessary pressure on databases for relatively static objects
  - Multi-directional data flows are subject to race conditions, which put extra pressure on the database to resolve
- Solution was to build a "pump" to feed cache
  - Decoupled, pull-based
  - Expensive transformation business logic is hosted here instead of in the databases
  - Manages complex joining of data to build objects
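A minimal sketch of the pull-based pump: a decoupled process reads from the databases, performs the expensive joins once, and writes finished objects into cache. The dictionaries standing in for databases, and every name here, are hypothetical illustrations, not MySpace's implementation.

```python
# Stand-ins for two partitioned data stores.
profiles = {42: {"name": "Tom", "age": 35}}
friends = {42: [7, 9]}
cache = {}

def build_profile_object(user_id):
    """Expensive join/transformation logic lives in the pump,
    not in the databases."""
    p = profiles[user_id]
    return {
        "user_id": user_id,
        "name": p["name"],
        "friend_count": len(friends.get(user_id, [])),
    }

def prepopulate(user_ids):
    """One-directional flow (DB -> pump -> cache): pulling on the pump's
    schedule avoids the race conditions of web-server-brokered,
    multi-directional cache updates."""
    for uid in user_ids:
        cache[f"profile:{uid}"] = build_profile_object(uid)

prepopulate([42])
```

For relatively static objects, paying the join cost once per refresh cycle instead of once per cache miss is what takes the pressure off the databases.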
Transaction Manager
- Problem
  - Web-server-initiated writes had no resiliency to outages
  - No atomicity for transactions that crossed different databases or disparate data stores
- Solution was to move write handling from web servers to a different tier
  - Asynchronous, persistent-queue-backed writes
  - Supports DR and multi-data-center scenarios
  - Supports writes to multiple storage platforms
  - Supports business logic work items for extending logic within the transaction
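The persistent-queue-backed write idea can be sketched as: persist the work item first, acknowledge, then apply it to each storage platform from the durable log. A journal file stands in for the persistent queue here; the class and its methods are hypothetical, not the real Transaction Manager.

```python
import json
import os
import tempfile

class TransactionManager:
    """Toy sketch: writes are journaled before being applied, so they
    survive a database outage and can fan out to multiple stores."""

    def __init__(self, journal_path):
        self.journal_path = journal_path

    def submit(self, work_item):
        # Durability before acknowledgement: the write is safe on disk
        # even if every downstream store is currently unreachable.
        with open(self.journal_path, "a") as f:
            f.write(json.dumps(work_item) + "\n")

    def drain(self, stores):
        # Replay the journal against every storage platform; a store
        # that was down at submit time still receives the write here.
        with open(self.journal_path) as f:
            for line in f:
                item = json.loads(line)
                for store in stores:
                    store(item)

# Usage: submit survives independently of when/where it is applied.
path = os.path.join(tempfile.mkdtemp(), "journal.log")
tm = TransactionManager(path)
tm.submit({"table": "mail", "op": "insert", "id": 1})
applied = []
tm.drain([applied.append])
```

Moving this logic out of the web servers is what gives the slide's resiliency: a web server can crash after `submit` without losing the write.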
Evolution of Reads/Writes
(diagram: from volatile, less resilient reads/writes to persistent, resilient ones)
Self-tuning Systems
- History of Major Problems
  - CPU spikes
  - Excessive IO consumption
- Causes
  - Fragmentation
  - Outdated statistics
- Solution was to create a process that addressed fragmentation and statistics in a controlled fashion
Self-tuning Systems
- Data collection
  - Every fifteen minutes, performance data is captured from all the servers and aggregated in a data warehouse
  - Baselines are established for each farm and for each server
- Auto-Response
  - Top ten worst offenders
  - Fix CPU
Self-tuning Systems
- Index defragmentation
  - Nightly reorganizing or reindexing of fragmented objects
  - Intelligent and limited updates based on object analysis
- Statistics Updates
  - Nightly updates of statistics based on a row modification threshold of 15%
  - Prioritizes most-modified tables first
  - Includes internal system tables
  - Recompiles dependent procedures
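The selection logic for the nightly statistics pass can be sketched as: take tables whose rows have changed by at least 15% since the last update, and handle the most modified first. The table names and row counters below are invented for illustration; in SQL Server this data would come from modification counters, not a Python list.

```python
MOD_THRESHOLD = 0.15  # per the slides: 15% row modification

# Hypothetical snapshot of per-table modification counters.
tables = [
    {"name": "Messages", "rows": 1_000_000, "rows_modified": 400_000},
    {"name": "Profiles", "rows": 2_000_000, "rows_modified": 100_000},
    {"name": "Friends",  "rows": 500_000,   "rows_modified": 200_000},
]

def stats_update_candidates(tables, threshold=MOD_THRESHOLD):
    """Return tables due for a statistics update, most modified first."""
    due = [t for t in tables if t["rows_modified"] / t["rows"] >= threshold]
    return sorted(due, key=lambda t: t["rows_modified"] / t["rows"],
                  reverse=True)

# Usage: Profiles (5% modified) is skipped; the other two are queued
# in priority order.
order = [t["name"] for t in stats_update_candidates(tables)]
```

Prioritizing by modification ratio bounds the nightly window: the tables most likely to be producing bad query plans get fresh statistics first.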
Database Ecosystem
Other Challenges
- Managing Growth
  - Data growth (datafile vs. database)
  - Transaction Log
- Balancing IO
  - SAN hot spots
  - Evenly distribute reads and writes
Backups & Disaster Recovery
- Multi-Tier Backups
  - Daily snaps on production Inservs, retention 3 days
  - Remote Copy between Production & Near Line
  - Production data replicated to Near Line Inservs daily
  - Daily snaps on Near Line Inservs, retention 5 days
  - Snap Verify
- Multi-Tier DR
  - Hot: transactions replicated
  - Warm: block-level replication
  - Cold: snaps
Database & Storage Stats

Volume, Server, DB Stats:
- Total Volumes: 2,989
- Total Servers: 669
- Total Databases: 1,512
- Total Database Files: 17,715

                        Production   Near Line
Total Space (TB)        2,331.94     1,745.64
Total Used Space (TB)   1,333.3      904.99
Total Free Space (TB)   998.66       839.28
Total Disks             15,120       2,560
Database & Storage Stats

MySpace DB   Avg Connections/Server   Avg Requests/sec/Server
Profile      6,800                    1,100
Mail         4,400                    775
Shared       2,000                    1,600
Features     800                      400
Security     4,800                    3,700
Search       300                      500
Browse       80                       500
Dispatcher   6                        1,200
Database & Storage Stats
- 6 GB/s data transfer rate
- 70% writes and 30% reads
- 600,000 to 750,000 IOPS across all frames
- 170 Mb/s data replication over IP from production to backup (40-45 TB synced per day)
- 10 Brocade 48k Director switches with 256 ports per switch (2,560 total ports)
- 8 Brocade 7500 FCIP switches with 16 ports per switch (128 total ports and 16 1GE ports)
Upcoming Meetings
Silicon Valley SQL Server User Group
- July 21, 2009
  - Peter Myers, Solid Quality Mentors
  - Taking Your Application Design to the Next Level with Data Mining
- August 18, 2009
  - Elizabeth Diamond, DesignMind
  - Architecting a Data Warehouse: A Case Study
www.bayareasql.org
Join our LinkedIn Group
- Name of Group: Silicon Valley SQL Server User Group
- Purpose:
  - Networking
  - SQL Server news and discussions
  - Meeting announcements / availability of slide decks
  - Job posts and search
- Join here: http://www.linkedin.com/groupInvitation?gid=1774133&sharedKey=6697B472F26D
To learn more or inquire about speaking opportunities, please contact:
Mark Ginnebaugh, User Group Leader [email protected]