SATURN 2018
14th Annual SEI Architecture Technology User Network Conference
MAY 7–10, 2018 | PLANO, TEXAS

FamilySearch’s Family Tree Web Application
Replacing Relational Database Technology and Transitioning to Cloud-Hosted Computing

Randy A. Ynchausti
Software Architect
Email: [email protected]
Twitter: @RandyYnchausti

FamilySearch’s Family Tree Web Application © 2018 Carnegie Mellon University



Outline

• FamilySearch Web Site
• Family Tree Web Application
• Replacing RDBMS and Cloud-Hosted Computing Project
  - Scope and Motivation
  - Three Architecture and Design Challenges
  - Results
• Summary


FamilySearch Web Site


Notable Statistics

Statistic                                 Value (Billion)
Searchable Names in Historical Records    6.2
Historical Record Images Online           1.25
Published Indexed Records Per Year        0.271
Web Page Views Per Day                    11.2


Family Tree Web Application


Notable Statistics

Statistic                           Value (Billion)
Total Tree Persons                  1.18
Tree Persons Added Per Month        0.38
Total Sources                       0.915
Sources Added Per Month             0.005
Sources Attached to Tree Persons    1.28


Replacing Relational DBMS and Cloud-Hosted Computing Project

Project Scope and Motivation

Given:

• Virtualized private (on-premises) data center
• Tens of services and hundreds of nodes
• Hundreds of database server licenses
• Many development teams
• Continuous delivery
• Budget events
• …

We want:

1. Security
2. Availability
3. Scalability
4. Performance
5. Affordability

Utility tree (quality attribute: refinements):

• Affordability: Consumption-Based, Capital Ownership Cost
• Performance: Transaction Throughput, Data Latency
• Availability: Hardware Failure, Service Failure
• Security: Data Confidentiality, Data Integrity
• Scalability: Horizontal Approach, Elastic

Scenarios (each prefixed by its pair of H/M/L ratings from the slide):

• (L, L) Monthly total cost is the operating cost of the resources used by the application.
• (M, L) The resources the system needs to operate efficiently can change daily based on load projections and other factors.
• (L, L) The operating budget is at least nine times the capital budget of the system.
• (L, M) The Family Tree database does not constrain the total number of transactions the system can process per hour.
• (M, H) The data the system uses to draw a nine-generation pedigree is retrieved in one second or less.
• (H, L) A power outage in an availability zone causes the system to redirect users to the services running in another availability zone within one minute of the outage event.
• (H, M) Redundant services in an availability zone provide 99.99% service availability.
• (H, H) Access-controlled data is secure 99.9999% of the time.
• (M, L) Additional resources can be provisioned and deployed within 10 minutes.
• (L, H) The system can automatically expand and contract its resources to accommodate fluctuating workloads.
• (M, M) Adding resources allows the system to service a corresponding percentage increase in users and workload.
• (H, H) 99.9999% of all data is accurate and consistent over its entire life cycle in the system.
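The percentage targets in the scenarios above are easier to reason about as a time budget. The following is a minimal sketch of that arithmetic, not something taken from the slides: the function name `downtime_budget_minutes` is hypothetical, and measuring over a 365-day year is an assumption, since the slide does not state a measurement window.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years (assumed window)


def downtime_budget_minutes(availability_percent: float,
                            window_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the downtime allowed in the window at the given availability.

    Hypothetical helper for illustration; the slides do not define
    how availability is measured.
    """
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be a percentage between 0 and 100")
    return window_minutes * (100.0 - availability_percent) / 100.0


if __name__ == "__main__":
    # The two targets that appear in the scenarios: 99.99% and 99.9999%.
    for target in (99.99, 99.9999):
        minutes = downtime_budget_minutes(target)
        print(f"{target}% -> {minutes:.2f} minutes per year")
```

Under these assumptions, 99.99% leaves about 52.56 minutes per year, while 99.9999% leaves about 0.53 minutes (roughly 32 seconds), which gives a sense of how much stricter the second kind of target is.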

[Chart legend: 1 hr peak; Max Capacity; Linear (1 hr peak)]
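The legend above indicates a linear trend line fitted to the 1-hour peak and compared against maximum capacity. The chart's data points are not recoverable here, so the sketch below uses made-up numbers purely to show the kind of projection such a chart supports. The names `fit_line` and `periods_until_capacity` are hypothetical, and ordinary least squares is an assumption about how the trend line was produced.

```python
from typing import Optional, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) points of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def periods_until_capacity(xs: Sequence[float], ys: Sequence[float],
                           capacity: float) -> Optional[float]:
    """Return the x value at which the fitted trend reaches capacity.

    Returns None when the trend is flat or falling, because the line
    then never crosses the capacity ceiling.
    """
    slope, intercept = fit_line(xs, ys)
    if slope <= 0:
        return None
    return (capacity - intercept) / slope


if __name__ == "__main__":
    # Hypothetical data: period index and peak-hour load (arbitrary units).
    periods = [0, 1, 2, 3]
    peak_load = [100, 120, 140, 160]
    print(periods_until_capacity(periods, peak_load, capacity=200))
```

With these invented numbers the load grows by 20 units per period from a base of 100, so the fitted line reaches a capacity of 200 at period 5.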


Project Scope and Motivation (Revision 2)

[Architecture diagram: Diagram 1 of …]

Diagram labels: Publish-Subscribe; Public API; Throttling; Conclusion Tree; Web Client Platform API; Tree Foundation; Web Client; Tree Data; Amazon Cloud; Change Message Queue; Message Queue; Amazon SQS; Admin Labels; Postgres RDS; DMC; Extract, Transform, Load; Private Data Center

Key: Process; Communication / Data Flow; Service; Relational Database; Responsibility Division; N-Node NoSQL Database; Batch Data Transfer; User / Consumer Application; Context; Queue


Three Architecture and Design Challenges


1) Performance differences between the existing and target technology

RDBMS
• Parallel query execution
• Parallel index build
• Replication aborting longer-running queries

NoSQL
• Operation and maintenance

2) Weak transactional semantics in the target NoSQL technology

Patterns that helped us form a solution:

• Event Sourcing
• Commutative Replicated Data Type
• Command Query Responsibility Segregation
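The slide names these patterns without showing how they fit together. The following is a minimal sketch of one way they can combine when the data store offers weak transactional guarantees; it is not FamilySearch's design. All names (`Event`, `EventLog`, `project_sources`) and the attach/detach domain are hypothetical. Writes are appended to a log as immutable events, and a separate read model is derived from the log with a merge rule that gives the same result regardless of event order (two-phase set semantics, where a detach always wins).

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Event:
    """An immutable fact recorded by the command side (hypothetical schema)."""
    kind: str        # "attach" or "detach"
    person_id: str
    source_id: str


class EventLog:
    """Command side: an append-only log; events are never updated in place."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def events(self) -> List[Event]:
        return list(self._events)


def project_sources(events: Iterable[Event]) -> Dict[str, Set[str]]:
    """Query side: fold events into a read model {person_id: {source_id}}.

    Attaches and detaches are collected into two grow-only sets, so the
    result does not depend on the order in which events are applied.
    """
    attached = set()
    detached = set()
    for event in events:
        key = (event.person_id, event.source_id)
        if event.kind == "attach":
            attached.add(key)
        elif event.kind == "detach":
            detached.add(key)
        else:
            raise ValueError(f"unknown event kind: {event.kind!r}")
    view: Dict[str, Set[str]] = {}
    for person_id, source_id in attached - detached:
        view.setdefault(person_id, set()).add(source_id)
    return view


if __name__ == "__main__":
    log = EventLog()
    log.append(Event("attach", "p1", "s1"))
    log.append(Event("attach", "p1", "s2"))
    log.append(Event("detach", "p1", "s1"))
    # Every delivery order yields the same read model.
    views = [project_sources(order) for order in permutations(log.events())]
    print(all(view == {"p1": {"s2"}} for view in views))
```

The trade-off in this simplified merge rule is that a detached source can never be re-attached under the same identifier. Richer replicated data types lift that restriction at the cost of tracking more metadata per event.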


3) Deploying and operating application services using a cloud service provider platform

[Diagram labels: Blueprint; Developers; Java; JavaScript; GitHub; Developers / Administrators; End Users; JIRA; Change Tracking; xMatters; Splunk; AppDynamics; Janitor; APICA; Domain Traffic Manager; Beanstalk; CloudFormation; Simple Systems Manager; Provisioning Service; System; Elastic Load Balancer; Service; AWS Console]

Results


We achieved our schedule objective

• Production cutover was about 16 months after project launch

• Complete RDBMS transition took about six additional months


We achieved our scalability objective for the Tree database

[Chart legend: Peak DB Transactions; Capacity; Linear (Peak DB Transactions)]

Chart annotations:

• Convert from R3-2XL to R4-4XL Instance Type
• 33 to 24 Nodes
• DSE 4.8.7 to DSE 5.0.9
• SSD to EBS Storage
• 36 to 27 Nodes
• 27 to 33 Nodes

We achieved our performance objective for the Tree web client

[Chart: Web Client-Facing Service Latency. Vertical axis: Latency (Milliseconds), 0 to 5000. Horizontal axis: Sunday, with ticks from 3/13/2016 to 10/23/2016. Series: 7:00:00 PM; 2:30:00 PM; 12:00:00 PM; 8:00:00 AM]

We achieved our affordability, scalability, and security objectives via our cloud platform approach

Statistic                          Average
Code Check-Ins / Builds Per Day    250
Deploys to Production Per Day      161
Deploy Time in Minutes             10
Auto-scale Events Per Day          1820

Summary


The miracle occurred and it’s working!


Thank you
