Addressing Vendor Weaknesses in User-Space
ROBERT TREAT, OmniTI
Highload++ 2011
@robtreat2
xzilla.net
+Robert Treat
Monday, October 3, 11
Who Am I?
OmniTI - Internet Scalability Consultants
Lead Database Operations
“Large Scale”
High Transactions
TB+ Data
Mission Critical
Who Am I?
Database Operations @ OmniTI
Postgres
MySQL
Oracle
& More
Postgres for Scalability
Traditional RDBMS
Highly Extensible
Runs Everywhere
Talks To Everything
“BSD” Licensed
15+ Years Development
Open Development Community
The Bloat Problem
Data Footprint Can Be Critical To Performance
Size On Disk Affects The Needs Of RAM, Disk Speed, Storage
“Bloat” is unused, wasted disk space, taken up by the database, but not needed for actual data storage
Why?
MVCC Architecture
Multiversion Concurrency Control (MVCC) allows Postgres to offer high concurrency even during significant database read/write activity. MVCC specifically offers behavior where "readers never block writers, and writers never block readers".
MVCC Architecture
• Oracle
• MySQL (InnoDB)
• Informix
• Firebird
• MSSQL (optional)
• CouchDB
“Bloat” Manifests Differently, But Is Common
• MongoDB (deletes, some updates)
  • dump/restore
  • mongod --repair
  • db.runCommand( { compact : 'mycollectionname' } )
• Lucene (updates)
• Hadoop / HDFS (small files)
Postgres MVCC Architecture
• Implemented Postgres 6.5
  • 1999, Vadim Mikheev
• MVCC Unmasked
  • http://momjian.us/main/writings/pgsql/mvcc.pdf
Postgres MVCC Architecture
• Postgres maintains global transaction counters
• Keeps track of transaction counter per row for
  • creating transaction
  • removing transaction
• Using these counters, Postgres allows different transactions to see different rows, based on visibility rules.
Transaction Reading An Old Row Doesn’t Block Transaction Writing A Row
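The visibility rules above can be sketched in a few lines. This is a simplified model, not Postgres source: the names `RowVersion`, `Snapshot` and `is_visible` are made up for illustration, every finished transaction is assumed to have committed, and a transaction's view of its own changes is ignored.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class RowVersion:
    """One physical row version, stamped with transaction counters."""
    data: str
    create_xid: int                    # transaction that created this version
    expire_xid: Optional[int] = None   # transaction that removed it, if any


@dataclass
class Snapshot:
    """What one reader treats as committed when it starts (hypothetical model)."""
    xid: int                # the reader's own transaction counter
    in_progress: Set[int]   # transactions still running at snapshot time

    def committed_before(self, xid: int) -> bool:
        # Assumption: every older transaction that is not running has committed.
        return xid < self.xid and xid not in self.in_progress


def is_visible(row: RowVersion, snap: Snapshot) -> bool:
    """Visible if the creator is committed and the remover (if any) is not."""
    if not snap.committed_before(row.create_xid):
        return False
    if row.expire_xid is None:
        return True
    return not snap.committed_before(row.expire_xid)


# Transaction 38 replaces a row that transaction 32 created.
old = RowVersion("v1", create_xid=32, expire_xid=38)
new = RowVersion("v2", create_xid=38)

reader_before = Snapshot(xid=35, in_progress=set())   # started before 38
reader_during = Snapshot(xid=40, in_progress={38})    # 38 still running
reader_after = Snapshot(xid=45, in_progress=set())    # 38 has finished

for name, snap in [("before", reader_before),
                   ("during", reader_during),
                   ("after", reader_after)]:
    seen = [r.data for r in (old, new) if is_visible(r, snap)]
    print(name, seen)
```

Each reader sees exactly one version of the row, and none of them has to wait for the writer.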
MVCC Architecture
INSERT
• user_id X42 | Create 32 | Expire (empty)
DELETE
• user_id X42 | Create 32 | Expire 38
UPDATE
• OLD (delete): user_id X69 | Create 43 | Expire 56   <~~ DEAD ROW
• NEW (insert): user_id X69 | Create 43 | Expire (empty)   <~~ VISIBLE ROW
Clean Up / Bloat
Speed Up SQL Commands By Dealing With Clean Up Later
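A toy heap shows why this design produces bloat. In this sketch (hypothetical `Table` class, not Postgres code) SQL commands only stamp or append row versions and never remove anything. It assumes every transaction has committed and no old readers remain, so every expired version counts as dead.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RowVersion:
    user_id: int
    payload: str
    create_xid: int
    expire_xid: Optional[int] = None


class Table:
    """Toy heap: commands stamp or append versions, nothing is removed."""

    def __init__(self) -> None:
        self.heap: List[RowVersion] = []

    def _live(self, user_id: int) -> RowVersion:
        for row in self.heap:
            if row.user_id == user_id and row.expire_xid is None:
                return row
        raise KeyError(user_id)

    def insert(self, xid: int, user_id: int, payload: str) -> None:
        self.heap.append(RowVersion(user_id, payload, create_xid=xid))

    def delete(self, xid: int, user_id: int) -> None:
        # The row stays on disk; it is only stamped with the removing xid.
        self._live(user_id).expire_xid = xid

    def update(self, xid: int, user_id: int, payload: str) -> None:
        # UPDATE = expire the old version + insert a new one.
        self.delete(xid, user_id)
        self.insert(xid, user_id, payload)

    def dead_rows(self) -> int:
        # Assumption: all transactions committed, no old readers remain.
        return sum(1 for r in self.heap if r.expire_xid is not None)


t = Table()
t.insert(32, 42, "first user")
t.delete(38, 42)
t.insert(43, 69, "second user")
t.update(56, 69, "second user, edited")

print(len(t.heap), "versions on disk,", t.dead_rows(), "of them dead")
```

Only one row is still live, but three versions occupy space until something cleans them up.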
How Postgres Deals With Bloat
• Heap-Only-Tuples (HOT)
  • On-The-Fly, Per Page Cleanup
  • Marks Given Row’s Space Reusable
  • Update Only
• VACUUM
  • Non-Blocking Bulk Cleanup
  • Removes End-Of-File Pages
  • “autovacuum” Process Monitors Tables
Problems With Automatic Cleanup
• HOT
  • Update Only
  • Doesn’t Work When Changing Index Data
• VACUUM
  • Must Wait For Long Transactions To Complete
  • Costs I/O, Can Only Work So Fast
  • Can’t Remove Non End-Of-File Pages
  • Leaves A “High Water Mark”
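The "must wait for long transactions" point can be illustrated with a small model. The function `removable` and its parameters are hypothetical, and the rule is a simplification: a dead row is treated as reclaimable only if it expired before the oldest transaction still running.

```python
from typing import Iterable, List, Optional, Tuple

Row = Tuple[int, Optional[int]]  # (create_xid, expire_xid)


def removable(rows: List[Row], running_xids: Iterable[int], next_xid: int) -> List[Row]:
    """Rows a vacuum pass may reclaim in this simplified model."""
    # With nothing running, everything expired so far is fair game.
    horizon = min(running_xids, default=next_xid)
    return [r for r in rows if r[1] is not None and r[1] < horizon]


rows = [(10, 20), (15, 50), (30, 60), (70, None)]

idle_system = removable(rows, running_xids=[], next_xid=100)
long_transaction = removable(rows, running_xids=[40], next_xid=100)

print(len(idle_system), "rows reclaimable on an idle system")
print(len(long_transaction), "rows reclaimable while transaction 40 is open")
```

One long-running transaction is enough to pin most of the dead rows in place, no matter how often vacuum runs.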
Dealing With Bloat - The Hard Way
• VACUUM FULL / CLUSTER
  • The Good
    • Reclaims All “Dead Rows”
  • The Bad
    • Exclusive Lock
    • Rewrite All Data In Tables
    • Needs Working Space
    • Heavy I/O
Monitoring Your Bloat
• check_postgres.pl
  • Nagios plugin
  • Compares physical size to row size estimates
  • http://bucardo.org/wiki/Check_postgres
• “bloat report”
  • Script to measure table/index bloat
  • Compares physical size to row size estimates
  • http://labs.omniti.com/labs/pgtreats/browser/trunk/tools/
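Both tools compare physical size to a size estimated from the rows. A rough sketch of that idea follows; it is not the formula either tool uses. The function name and inputs are made up, page and row headers are ignored, and 8192 bytes is the default Postgres block size.

```python
import math
from dataclasses import dataclass


@dataclass
class BloatEstimate:
    expected_pages: int
    wasted_pages: int
    wasted_bytes: int


def estimate_bloat(actual_pages: int, row_count: int, avg_row_bytes: int,
                   page_bytes: int = 8192) -> BloatEstimate:
    """Compare pages on disk with pages the live rows should need."""
    # Simplification: per-page and per-row header overhead is ignored.
    rows_per_page = max(1, page_bytes // avg_row_bytes)
    expected = math.ceil(row_count / rows_per_page)
    wasted = max(0, actual_pages - expected)
    return BloatEstimate(expected, wasted, wasted * page_bytes)


# Hypothetical table: 10,000 pages on disk, 400,000 rows of about 100 bytes.
report = estimate_bloat(actual_pages=10_000, row_count=400_000, avg_row_bytes=100)
print(report)
```

In practice the inputs would come from table statistics, so the result is only as good as those estimates.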
Dealing With Bloat In Userspace
• Solving MVCC Bloat Is A “Hard Problem”
  • Even a good solution would be hard to implement in core
• Can we build a tool in user space?
  • Develop solution quicker
  • Easier to deploy and maintain
  • Provide a prototype for future development
Dealing With Bloat Redux
• Updating A Row Rewrites Data To New Location
• Use Vacuum To Mark Old Rows “Reusable”
• Update Row To Rewrite Data At “Front” Of Page
• Use Vacuum To Reclaim Space From End Of File
• Put A Script On It
  • https://labs.omniti.com/pgtreats/trunk/tools/compact_table
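The steps above can be modelled on a toy table file. This is a sketch of the idea only, not the compact_table script: pages are lists of slots, `None` stands for space that vacuum has already marked reusable, and "updating" a row is modelled as moving it into the first free slot nearer the front.

```python
from typing import List, Optional, Tuple

Page = List[Optional[str]]  # fixed number of slots; None = reusable space


def truncate_tail(pages: List[Page]) -> int:
    """What plain VACUUM can give back: only empty pages at the end of the file."""
    removed = 0
    while pages and all(slot is None for slot in pages[-1]):
        pages.pop()
        removed += 1
    return removed


def first_free_slot(pages: List[Page], before_page: int) -> Optional[Tuple[int, int]]:
    """Find reusable space in any page ahead of `before_page`."""
    for p in range(before_page):
        for s, slot in enumerate(pages[p]):
            if slot is None:
                return p, s
    return None


def compact(pages: List[Page]) -> int:
    """Move rows off the last page into free space up front, then truncate."""
    removed = truncate_tail(pages)
    while pages:
        last = len(pages) - 1
        emptied = True
        for s, row in enumerate(pages[last]):
            if row is None:
                continue
            target = first_free_slot(pages, last)
            if target is None:
                emptied = False       # no room left ahead of the tail
                break
            p, t = target
            pages[p][t] = row         # the "update" writes the new version up front
            pages[last][s] = None     # the old version becomes reusable
        if not emptied:
            break
        removed += truncate_tail(pages)
    return removed


def fragmented() -> List[Page]:
    return [["a", None, None], [None, "b", None], [None, None, "c"], [None, None, None]]


vacuum_only = fragmented()
print("vacuum alone frees", truncate_tail(vacuum_only), "page")

compacted = fragmented()
print("compaction frees", compact(compacted), "pages, leaving", compacted)
```

Vacuum alone stops at the high water mark set by row "c"; moving the tail rows forward is what lets the file shrink.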
Dealing With Bloat Redux
• “Compact Table”
  • Requires Lots of Time, I/O
  • Often Causes Heavy Index Bloat
  • Heavy Concurrency Bloats Faster Than We Can Recover It
Dealing With Bloat For Real!
• Enter “pg_reorg”
  • Vacuum / Cluster Replacement
  • Command Line Tool
  • Online Table Rewrite
  • Uses Minimal Locking
  • Developed By NTT
  • BSD Licensed
  • C Code
  • http://pgfoundry.org/projects/reorg/
How pg_reorg Works
• Create a log table for changes
• Create triggers on the old table to log changes (I/U/D)
• Create a new table with a copy of all data in old table
• Create all indexes on the new table
• Apply all changes from the log table to the new table
• MODIFY THE SYSTEM CATALOGS INFORMATION ABOUT THE TABLE FILES (!!!)
• Drop old table, leaving the new table in its place
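The sequence above can be sketched with plain dictionaries. This is a toy model of the flow, not pg_reorg's code: `Catalog`, `online_rebuild` and the change format are invented for illustration, index creation is skipped, and the concurrent writes are simply replayed from a list.

```python
from typing import Dict, List, Optional, Tuple

Change = Tuple[str, int, Optional[str]]  # (op, key, value); op is "I", "U" or "D"


class Catalog:
    """Stand-in for the system catalogs: maps a table name to its storage."""

    def __init__(self) -> None:
        self.storage: Dict[str, Dict[int, str]] = {}


def apply_change(rows: Dict[int, str], change: Change) -> None:
    op, key, value = change
    if op == "D":
        rows.pop(key, None)
    elif value is not None:
        rows[key] = value


def online_rebuild(catalog: Catalog, name: str, concurrent_writes: List[Change]) -> None:
    old = catalog.storage[name]
    log: List[Change] = []                 # log table for changes

    def trigger(change: Change) -> None:   # trigger on the old table
        apply_change(old, change)          # the write still hits the old table
        log.append(change)                 # and is recorded for later replay

    new = dict(old)                        # new table with a copy of all data
    # (index creation on the new table is skipped in this model)

    for change in concurrent_writes:       # writes arriving during the copy
        trigger(change)

    for change in log:                     # apply logged changes to the new table
        apply_change(new, change)

    catalog.storage[name] = new            # catalog swap: name now points at new storage
    # The old storage is no longer referenced, which models dropping the old table.


catalog = Catalog()
catalog.storage["orders"] = {1: "a", 2: "b", 3: "c"}
online_rebuild(catalog, "orders", [("U", 2, "B"), ("D", 3, None), ("I", 4, "d")])
print(catalog.storage["orders"])
```

The catalog swap is a single assignment here; in the real tool it is the step the slide shouts about, because it edits system tables directly.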
Dealing With Bloat For Real!
Open Source Code
The Power Is In Your Hands
Look At Code
Examine the SQL
(User Space Is Really Visible)
TEST!
Dealing With Bloat For Real!
What Does Testing Look Like?
Create Some Tables, Create Artificial Bloat, run pg_reorg
WIN!
Dealing With Bloat For Real!
Test In “Prod”
Find Some Bloated Tables, Make Backup Of Tables, Cross Fingers, pg_reorg
WIN!
Dealing With Bloat For Real!
Eventually You Have To Use It On Something That Matters
pg_reorg In The Real World
• Production Database (OLTP)
  • 540GB Size
  • 2000 TPS (off-peak time, multiple statements)
  • Largest Table (pre-reorg) 127GB
• Rebuild Stats
  • 5.75 Hours To Rebuild
  • Reclaimed 52GB Disk Space
  • No outages reported for Website/APIs
pg_reorg In The Real World
YAY!
Return Of The Jedi
“your overconfidence is your weakness.”
-Luke Skywalker
“your faith in your friends is yours.”
-Emperor Palpatine
Sometimes You Can Have Both
Trust in NTT’s Code == faith in friends
Success in production == overconfidence
When Good pg_reorgs Go Bad!
WARNING: unexpected attrdef record found for attr 61 of rel orders
WARNING: 1 attrdef record(s) missing for rel orders
Yes, On A Production System
Yes, Trying To Take 1000’s of Orders Per Second
When Good pg_reorgs Go Bad!
create table test ( a int4, b int4 default 2112, c bool);
Postgres internals track defaults / constraints based on column position “2”, not column name “b”
If you drop column “a” and then do pg_reorg, column “c” is now column “2”, and default 2112 is on boolean
This Is Fair - pg_reorg hacks the system tables
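The failure can be reproduced in a toy model. `TableDef` and `naive_rebuild` are hypothetical names, not Postgres or pg_reorg internals. The model relies on one fact: a dropped column keeps its position as a hole, so positions only shift when the table is rebuilt without the hole.

```python
from typing import Dict, List, Optional, Tuple

Column = Tuple[str, str]  # (name, type)


class TableDef:
    """Toy table definition with defaults keyed by 1-based column position."""

    def __init__(self, columns: List[Optional[Column]], defaults: Dict[int, int]) -> None:
        self.columns: List[Optional[Column]] = list(columns)  # None = dropped column
        self.defaults: Dict[int, int] = dict(defaults)

    def drop_column(self, name: str) -> None:
        # Dropping leaves a hole, so the other columns keep their positions.
        for i, col in enumerate(self.columns):
            if col is not None and col[0] == name:
                self.columns[i] = None
                return
        raise KeyError(name)

    def default_for(self, name: str) -> Optional[int]:
        for i, col in enumerate(self.columns):
            if col is not None and col[0] == name:
                return self.defaults.get(i + 1)
        raise KeyError(name)


def naive_rebuild(table: TableDef) -> TableDef:
    """Rebuild without the dropped columns but keep the old position-keyed defaults."""
    live = [c for c in table.columns if c is not None]
    return TableDef(live, table.defaults)


test = TableDef([("a", "int4"), ("b", "int4"), ("c", "bool")], {2: 2112})
test.drop_column("a")
print("after drop:    b ->", test.default_for("b"), " c ->", test.default_for("c"))

rebuilt = naive_rebuild(test)
print("after rebuild: b ->", rebuilt.default_for("b"), " c ->", rebuilt.default_for("c"))
```

After the rebuild the integer default has moved from “b” to the boolean column “c”, which matches the situation described on the slide.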
When Good pg_reorgs Go Bad!
Basic Fix: Drop All Defaults And Recreate
Alternative Fix: Hack System Catalogs Some More
Haven’t we had enough system catalog hacking for now?
When Good pg_reorgs Go Bad!
“now, if you'll excuse me, I'll go away and have a heart attack.”
What Next?
Report Problem To Mailing List
Submit A Patch
Ultimately The Problem Is Fixed
Everyone’s Happy?
Hackers Discussion
Postgres Development Community Is Funny
Sometimes Hard To Get Them To Recognize Problems
Not Everyone Sees Online Rebuild As A Big Problem
In All Fairness, Not Everyone Has This Problem
Hackers Discussion
Hackers Meeting 2011, Discussion On Internal Queuing System
Could Be Used As Underlying Basis For On-Line Rebuilding
Until Then...
pg_reorg Is A Great Tool!
Best Option For Difficult Situation
Just Be Careful!
THANKS!
Highload++
NTT
OmniTI
Postgres Community
Momjian, Depesz, Patel, Kocoloski
xzilla.net
@robtreat2
+Robert Treat