Inside the PostgreSQL Project Infrastructure. Dave Page (PostgreSQL Core Team; Senior Software Architect, EnterpriseDB), 25th March 2010.

Inside the PostgreSQL Project Infrastructure

DESCRIPTION

Dave Page

This talk will give an insight into the infrastructure behind the PostgreSQL project - what servers do we have and what do they all do? How do we monitor and manage them? How does the website cope with the Slashdot effect on release days? We will also look at how things are likely to change as the Sysadmin and Web teams work to modernize and improve our infrastructure for the future.

Page 1: Inside the PostgreSQL Project Infrastructure

Inside the PostgreSQL Project Infrastructure

Dave Page, 25th March 2010

Dave Page
PostgreSQL Core Team
Senior Software Architect, EnterpriseDB

Page 2: Inside the PostgreSQL Project Infrastructure

In the beginning...

Date: Tue, 23 Apr 1996 16:06:10 -0400 (EDT)
From: "Marc G. Fournier" <[email protected]>
Subject: Re: [PG95]: postgres95 TODO list posted on the web
To: Chad Robinson <[email protected]>
cc: Jolly Chen <[email protected]>, [email protected]

...

If it helps, I’d be willing to setup a cvs database, including appropriate accounts for a core few developers that patches can go through.

From there, it wouldn’t be too hard to do a weekly "distribution" that is ftpable.

I don’t know enough about the server backend to offer much more then that :(

Page 3: Inside the PostgreSQL Project Infrastructure

The first server

Hosted in Toronto, Canada

Page 4: Inside the PostgreSQL Project Infrastructure

Early services

• Mailing lists

• CVS repository

• FTP site

Page 5: Inside the PostgreSQL Project Infrastructure

14 Years later...

• 20+ Physical servers

• 35+ Virtual Machines

• Hosted in:
– France
– Panama City
– Austria
– Canada
– USA (4 independent locations)

Page 6: Inside the PostgreSQL Project Infrastructure

Current services

• FTP site
• Website
• Source control – CVS and Git
• Mailing lists
• Wiki
• Mailing list archives
• Website/archives search
• pgFoundry
• Commitfest management server
• Buildfarm and Hudson servers
• Development servers

… and more!

Page 7: Inside the PostgreSQL Project Infrastructure

OS “zoo”

• Primarily using FreeBSD jails:
– Easy to back up
– Easy to relocate
– Per-function jails

• Also running:
– Ubuntu
– Slackware
– CentOS
– Windows

Page 8: Inside the PostgreSQL Project Infrastructure

Problems

• Little FreeBSD experience in the community.

• FreeBSD Ports are hard to upgrade, especially with lots of jails.

• Hosting companies don't tend to like FreeBSD – and we can't be too picky!

• No centralised management or deployment.

Page 9: Inside the PostgreSQL Project Infrastructure

New infrastructure

• Runs Debian virtual machines under KVM on Debian hosts.

• Pre-built packages set up base hosts and VMs.

• Management system automates:
– VM creation and configuration.
– Addition and removal of user accounts and SSH keys.
– Package installation and upgrades.
– Detection of unexpected user accounts or unauthorised services.
– Setup and configuration of Nagios and Munin monitoring.
– Auto-backup configuration.
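The slides do not show how the management system detects unexpected user accounts. One plausible approach, sketched here under that assumption, is to compare the accounts found on a host against a centrally managed list. The function names, the `EXPECTED` set and the sample passwd contents are all hypothetical.

```python
def parse_passwd(text):
    """Extract login names from /etc/passwd-style text (name is the first field)."""
    names = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        names.append(line.split(":", 1)[0])
    return names


def find_unexpected_accounts(expected, actual):
    """Return accounts present on a host but missing from the managed list."""
    return sorted(set(actual) - set(expected))


# Hypothetical data: the managed account list and one host's passwd contents.
EXPECTED = {"root", "postgres", "alice"}
PASSWD = """\
root:x:0:0:root:/root:/bin/bash
postgres:x:105:110::/var/lib/postgresql:/bin/bash
alice:x:1000:1000::/home/alice:/bin/bash
mallory:x:1001:1001::/home/mallory:/bin/bash
"""

unexpected = find_unexpected_accounts(EXPECTED, parse_passwd(PASSWD))
print(unexpected)
```

Anything the comparison returns would be a candidate for an alert rather than automatic removal, since a new account may simply not have been registered yet.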

Page 10: Inside the PostgreSQL Project Infrastructure

Monitoring

• Nagios

• Munin

• Smokeping

• Auto-backup

• Google Analytics

Page 11: Inside the PostgreSQL Project Infrastructure

Nagios

• 64 Hosts

• 514 Services

• Service checks include:
– Service availability – NTP, SSH, HTTP, FTP, RSYNC etc.
– Utilisation – disk usage, logged in users, processes, mail queue
– Management – software update availability
– “Our stuff” – buildfarm status, search indexer, database backups
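A utilisation check such as disk usage can be sketched as a Nagios-style plugin. Nagios plugins report state through their exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) plus a one-line status message. The thresholds and function names below are illustrative assumptions, not the project's actual checks.

```python
import shutil

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_disk_usage(percent_used, warn=80.0, crit=90.0):
    """Map a disk usage percentage to an (exit code, status line) pair."""
    if percent_used >= crit:
        return CRITICAL, f"DISK CRITICAL - {percent_used:.1f}% used"
    if percent_used >= warn:
        return WARNING, f"DISK WARNING - {percent_used:.1f}% used"
    return OK, f"DISK OK - {percent_used:.1f}% used"


def check_disk(path="/", warn=80.0, crit=90.0):
    """Measure usage of the filesystem holding `path` and classify it."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    if usage.total == 0:
        return UNKNOWN, "DISK UNKNOWN - filesystem reports zero size"
    percent_used = 100.0 * usage.used / usage.total
    return classify_disk_usage(percent_used, warn, crit)


code, message = check_disk("/")
print(message)
```

A real plugin would finish by calling `sys.exit(code)` so that Nagios can read the state from the process exit status.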

Page 12: Inside the PostgreSQL Project Infrastructure

Nagios

Page 13: Inside the PostgreSQL Project Infrastructure

Munin

• Monitors resource trends:
– Disk usage
– Network utilisation
– Processes
– Sendmail/Postfix stats
– CPU/Memory utilisation
– Apache stats

Page 14: Inside the PostgreSQL Project Infrastructure

Munin

• CPU usage – postgresql01.managed.contegix.com

Page 15: Inside the PostgreSQL Project Infrastructure

Smokeping

• Monitors network latency to various hosts from the Conova Communications data centre in Salzburg, Austria

Page 16: Inside the PostgreSQL Project Infrastructure

Auto-backup

• Automatically backs up changes to key configuration files to Subversion.
– Gives us a simple backup of config files
– Allows us to trace the history of changes to a file

• Alerts the sysadmins to changes to monitored files
– Helps us see what the other team members are doing
– Acts as a simple Intrusion Detection System
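The change-alerting half of this idea can be sketched without Subversion by comparing content hashes between runs. This swaps in SHA-256 snapshots for the version-control commit the slide describes, so treat it as an illustration of the detection step only; the function names and the sample file are hypothetical.

```python
import hashlib
import os
import tempfile


def snapshot(paths):
    """Return {path: sha256 hex digest} for each path that is an existing file."""
    digests = {}
    for path in paths:
        if os.path.isfile(path):
            with open(path, "rb") as handle:
                digests[path] = hashlib.sha256(handle.read()).hexdigest()
    return digests


def changed_files(previous, current):
    """List paths that were added, removed or modified between two snapshots."""
    return sorted(
        path
        for path in set(previous) | set(current)
        if previous.get(path) != current.get(path)
    )


# Demonstration on a temporary file standing in for a monitored config file.
with tempfile.TemporaryDirectory() as tmp:
    conf = os.path.join(tmp, "example.conf")
    with open(conf, "w") as handle:
        handle.write("max_connections = 100\n")
    before = snapshot([conf])
    with open(conf, "w") as handle:
        handle.write("max_connections = 200\n")
    after = snapshot([conf])
    alerts = changed_files(before, after)
    print(len(alerts))
```

Committing each changed file to a repository instead of only hashing it is what adds the backup and history described on the slide.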

Page 17: Inside the PostgreSQL Project Infrastructure

Google Analytics

• Monitors website utilisation.

• Helps us understand how the website is used.

• Can be hampered by disabled scripting support in browsers, common with computer geeks!

Page 18: Inside the PostgreSQL Project Infrastructure

Google Analytics

Page 19: Inside the PostgreSQL Project Infrastructure

FTP Site

• Primary site: ftp.postgresql.org

• 62 regional mirrors in 39 countries

• Mirrors may also serve content via:
– HTTP (supported)
– RSYNC (unsupported)

• Content includes main FTP site, and pgFoundry downloads

Page 20: Inside the PostgreSQL Project Infrastructure

FTP Mirror monitoring

• All mirrors are checked daily by the 'mirrorbot'

• The mirrorbot checks that content is up to date:

– Fresh mirrors have a DNS hostname, e.g. ftp.uk.postgresql.org
– Fresh mirrors are listed on the website for users to choose

• Out of date or broken mirrors:

– Are automatically removed from the website and DNS.
– Are reported to their maintainers via email.
– Are automatically purged from the system if un-fixed.
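The daily mirrorbot run described above can be sketched as a simple classification pass. The slides do not say how freshness is measured or how long a broken mirror is tolerated, so the last-sync timestamp, the 48-hour limit, the seven-check purge threshold and the `Mirror` record are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Mirror:
    host: str
    last_sync: datetime      # when the mirror last synced from the primary
    failed_checks: int = 0   # consecutive daily checks that found it stale


def run_daily_check(mirrors, now, max_age=timedelta(hours=48), purge_after=7):
    """Split mirrors into fresh, stale and purged host lists for one daily run."""
    fresh, stale, purged = [], [], []
    for mirror in mirrors:
        if now - mirror.last_sync <= max_age:
            mirror.failed_checks = 0
            fresh.append(mirror.host)        # keep in DNS and on the website
        else:
            mirror.failed_checks += 1
            if mirror.failed_checks >= purge_after:
                purged.append(mirror.host)   # drop from the system entirely
            else:
                stale.append(mirror.host)    # delist and email the maintainer
    return fresh, stale, purged


# Hypothetical mirrors using reserved example.org hostnames.
now = datetime(2010, 3, 25, 12, 0)
mirrors = [
    Mirror("mirror-a.example.org", now - timedelta(hours=3)),
    Mirror("mirror-b.example.org", now - timedelta(days=5)),
    Mirror("mirror-c.example.org", now - timedelta(days=30), failed_checks=6),
]
fresh, stale, purged = run_daily_check(mirrors, now)
print(fresh, stale, purged)
```

Resetting `failed_checks` when a mirror recovers means only consecutive failures count towards a purge.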

Page 21: Inside the PostgreSQL Project Infrastructure

Website infrastructure

• Developed following the great 8.0 Slashdotting incident.

• Capable of handling high-load scenarios on release days.

• Minimised points of failure.

Page 22: Inside the PostgreSQL Project Infrastructure

wwwmaster.postgresql.org

• Dynamic, master server.

• Runs custom-built PHP framework for:
– Static page rendering (general content)
– Dynamic page rendering (docs, news, events etc)
– Form processing

• Dynamic content stored in PostgreSQL.

• Static version of content generated hourly by a spider, and pushed via RSYNC to the static servers.

• Users redirected back to static servers where possible.

Page 23: Inside the PostgreSQL Project Infrastructure

www.postgresql.org

• Static, slave servers

• Serve HTML, CSS, images and files such as PDFs.

• Currently 2 servers.

• Geographically diverse.

• Round-robin load balanced via DNS.

• Monitoring system dynamically removes servers from DNS within minutes of a failure.
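Round-robin DNS with automatic removal of failed servers can be modelled in a few lines. This is a toy model of the behaviour, not the project's monitoring code: `RoundRobinPool`, `prune_failed` and the health callback are hypothetical names, and the addresses come from the reserved documentation ranges.

```python
from collections import deque


class RoundRobinPool:
    """Toy model of a DNS record set that is served in rotating order."""

    def __init__(self, addresses):
        self._records = deque(addresses)

    def remove(self, address):
        """Drop an address from the record set if it is present."""
        if address in self._records:
            self._records.remove(address)

    def answer(self):
        """Return the current record order, then rotate for the next query."""
        records = list(self._records)
        self._records.rotate(-1)
        return records


def prune_failed(pool, addresses, is_up):
    """Remove every address whose health check fails."""
    for address in addresses:
        if not is_up(address):
            pool.remove(address)


servers = ["192.0.2.10", "198.51.100.20"]
pool = RoundRobinPool(servers)
first = pool.answer()
second = pool.answer()

# Simulate a monitoring run in which the first server has failed.
prune_failed(pool, servers, is_up=lambda address: address != "192.0.2.10")
third = pool.answer()
print(first, second, third)
```

How quickly clients stop using a removed server in practice depends on the TTL of the DNS records, which the slides do not state.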

Page 24: Inside the PostgreSQL Project Infrastructure

Website problems

• PHP framework is complex, and understood by few.

• Adding new dynamic content can require significant effort to build administration pages.

• The framework includes lots of features and code we thought we needed, but then never used.

• Spider can take hours to process the entire site.

• Spidering the site is very inefficient.

Page 25: Inside the PostgreSQL Project Infrastructure

New website infrastructure

• Slimmed down and vastly simplified framework, built using Django and Python.

• Django's administration module makes it easy to add and manage content.

• Spider and static slaves will be replaced with Varnish cache servers:
– Pages dynamically cached from wwwmaster on first request.
– Last available content served if wwwmaster goes down.
– Cache invalidation of individual or groups of pages as changed on wwwmaster.
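The three cache behaviours listed above can be modelled in plain Python. This is a sketch of the behaviour only, not Varnish configuration; `StaleServingCache`, `fetch_from_master` and the sample page are hypothetical.

```python
class StaleServingCache:
    """Cache that fills on first request and serves stale copies during outages."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable: path -> page body, raises on failure
        self._fresh = {}         # pages currently valid for normal serving
        self._last_good = {}     # last successfully fetched copy of each page

    def get(self, path):
        if path in self._fresh:
            return self._fresh[path]
        try:
            body = self._fetch(path)
        except Exception:
            if path in self._last_good:
                return self._last_good[path]   # origin down: serve stale copy
            raise
        self._fresh[path] = body
        self._last_good[path] = body
        return body

    def invalidate(self, prefix):
        """Invalidate one page or a group of pages sharing a path prefix."""
        for path in [p for p in self._fresh if p.startswith(prefix)]:
            del self._fresh[path]


# Hypothetical origin standing in for wwwmaster.
state = {"up": True, "fetches": 0}
pages = {"/docs/index.html": "docs v1"}


def fetch_from_master(path):
    if not state["up"]:
        raise ConnectionError("wwwmaster unreachable")
    state["fetches"] += 1
    return pages[path]


cache = StaleServingCache(fetch_from_master)
first = cache.get("/docs/index.html")           # miss: fetched from the master
second = cache.get("/docs/index.html")          # hit: served from the cache
pages["/docs/index.html"] = "docs v2"
cache.invalidate("/docs/")                      # content changed on the master
state["up"] = False
during_outage = cache.get("/docs/index.html")   # master down: stale v1 served
state["up"] = True
after_recovery = cache.get("/docs/index.html")  # master back: v2 fetched
print(first, second, during_outage, after_recovery)
```

Keeping a separate last-good copy is what lets invalidated pages still be served while the master is unreachable.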

Page 26: Inside the PostgreSQL Project Infrastructure

Questions?

Thank you.