Federated PostgreSQL
Who Am I?
@jim_mlodgenski
Co-organizer of NYC PUG (www.nycpug.org) and Philly PUG (www.phlpug.org)
CTO, OpenSCG (www.openscg.com)
http://nyc.pgconf.us
What is a federated database?
A federated database system is a type of meta-database management system (DBMS), which transparently maps multiple autonomous database systems into a single federated database. The constituent databases are interconnected via a computer network and may be geographically decentralized. ... There is no actual data integration in the constituent disparate databases as a result of data federation.
- Wikipedia
How does PostgreSQL do it?
Uses Foreign Data Wrappers (FDW)
Used with SQL/MED
New ANSI SQL 2003 extension
Management of External Data
Standard way of handling remote objects in SQL databases
Wrappers are used by SQL/MED to access remote data sources
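The SQL/MED objects are normally created in a fixed order: wrapper, server, user mapping, foreign table. The following is a minimal sketch of that order using postgres_fdw, the wrapper for remote PostgreSQL servers that ships as a contrib module. The server name remote_pg, the host, the database, the credentials, and the customers table are placeholders invented for illustration and are not taken from the talk.

```sql
-- 1. Install the wrapper (postgres_fdw is a contrib module).
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- 2. Describe the remote server. Host, port and dbname are placeholders.
CREATE SERVER remote_pg
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'remote.example.com', port '5432', dbname 'test');

-- 3. Map the local role to remote credentials (placeholder values).
CREATE USER MAPPING FOR CURRENT_USER
    SERVER remote_pg
    OPTIONS (user 'remote_user', password 'secret');

-- 4. Declare the local shape of the remote table (hypothetical table).
CREATE FOREIGN TABLE customers (
    id   integer,
    name text
) SERVER remote_pg
  OPTIONS (schema_name 'public', table_name 'customers');

-- The foreign table can now be queried like a local one.
SELECT id, name FROM customers;
```

Other wrappers follow the same four steps, but each defines its own OPTIONS names, as the examples for Oracle, MongoDB and Hive in this deck show.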
Types of Foreign Data Wrappers
SQL
NoSQL
File
Miscellaneous
PostgreSQL
SQL Wrappers
Oracle
MySQL
Informix
Firebird
SQLite
JDBC
ODBC
SQL Wrappers
CREATE SERVER oracle_server
    FOREIGN DATA WRAPPER oracle_fdw
    OPTIONS (dbserver 'ORACLE_DBNAME');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER oracle_server
    OPTIONS (user 'scott', password 'tiger');

CREATE FOREIGN TABLE fdw_test (
    userid   numeric,
    username text,
    email    text
) SERVER oracle_server
  OPTIONS (schema 'scott', table 'fdw_test');

postgres=# select * from fdw_test;
 userid | username |       email
--------+----------+-------------------
      1 | scott    | [email protected]
(1 row)
NoSQL Wrappers
MongoDB
CouchDB
MonetDB
Redis
Neo4j
Tycoon
NoSQL Wrappers
CREATE SERVER mongo_server
    FOREIGN DATA WRAPPER mongo_fdw
    OPTIONS (address '192.168.122.47', port '27017');

CREATE FOREIGN TABLE databases (
    _id  NAME,
    name TEXT
) SERVER mongo_server
  OPTIONS (database 'mydb', collection 'pgData');

test=# select * from databases ;
           _id            |    name
--------------------------+------------
 52fd49bfba3ae4ea54afc459 | mongo
 52fd49bfba3ae4ea54afc45a | postgresql
 52fd49bfba3ae4ea54afc45b | oracle
 52fd49bfba3ae4ea54afc45c | mysql
 52fd49bfba3ae4ea54afc45d | redis
 52fd49bfba3ae4ea54afc45e | db2
(6 rows)
File Wrappers
Delimited files
Fixed length files
JSON files
File Wrappers
CREATE SERVER pg_load FOREIGN DATA WRAPPER file_fdw;
CREATE FOREIGN TABLE leads (
    first_name   text,
    last_name    text,
    company_name text,
    address      text,
    city         text,
    county       text,
    state        text,
    zip          text,
    phone1       text,
    phone2       text,
    email        text,
    web          text
) SERVER pg_load
  OPTIONS (filename '/tmp/us-500.csv', format 'csv', header 'TRUE');

test=# select first_name || ' ' || last_name as full_name, email from leads limit 3;
     full_name     |             email
-------------------+-------------------------------
 James Butt        | [email protected]
 Josephine Darakjy | [email protected]
 Art Venere        | [email protected]
(3 rows)
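Two practical points sit around the file wrapper example. file_fdw is a contrib module, so it has to be installed before CREATE SERVER can reference it, and the file is read again on every scan. The sketch below shows the install step and one possible way to snapshot the file into an ordinary table that can be indexed. The names leads_local and leads_local_state_idx are hypothetical.

```sql
-- Must run before CREATE SERVER ... FOREIGN DATA WRAPPER file_fdw.
CREATE EXTENSION IF NOT EXISTS file_fdw;

-- Hypothetical follow-up: copy the foreign table into a local table
-- so repeated queries do not re-parse /tmp/us-500.csv.
CREATE TABLE leads_local AS
    SELECT * FROM leads;

-- A local table can carry indexes, which the foreign table cannot.
CREATE INDEX leads_local_state_idx ON leads_local (state);
```

The snapshot goes stale when the file changes, so this only suits data that is loaded once or refreshed on a schedule.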
Miscellaneous Wrappers
Hadoop
LDAP
S3
WWW
PG-Strom
Hadoop Wrapper
CREATE SERVER hive_server
    FOREIGN DATA WRAPPER hive_fdw
    OPTIONS (address '127.0.0.1', port '10000');

CREATE USER MAPPING FOR PUBLIC SERVER hive_server;

CREATE FOREIGN TABLE order_line (
    ol_w_id        integer,
    ol_d_id        integer,
    ol_o_id        integer,
    ol_number      integer,
    ol_i_id        integer,
    ol_delivery_d  timestamp,
    ol_amount      decimal(6,2),
    ol_supply_w_id integer,
    ol_quantity    decimal(2,0),
    ol_dist_info   varchar(24)
) SERVER hive_server
  OPTIONS (table 'order_line');

INSERT INTO item_sale_month
SELECT ol_i_id as i_id,
       EXTRACT(YEAR FROM ol_delivery_d) as year,
       EXTRACT(MONTH FROM ol_delivery_d) as month,
       sum(ol_amount) as amount
FROM order_line
GROUP BY 1, 2, 3;
Hadoop Wrapper
Hadoop foreign tables can also be writable
CREATE FOREIGN TABLE audit (
    audit_id bigint,
    event_d  timestamp,
    "table"  varchar,
    action   varchar,
    "user"   varchar
) SERVER hive_server
  OPTIONS (table 'audit', flume_port '44444');
INSERT INTO audit VALUES (nextval('audit_id_seq'), now(), 'users', 'SELECT', 'scott');
Hadoop Wrapper
It also works with HBase tables
CREATE FOREIGN TABLE hive_hbase_table (
    key   varchar,
    value varchar
) SERVER localhive
  OPTIONS (table 'hbase_table', hbase_address 'localhost', hbase_port '9090', hbase_mapping ':key,cf:val');

INSERT INTO hive_hbase_table VALUES ('key1', 'value1');
INSERT INTO hive_hbase_table VALUES ('key2', 'value2');
UPDATE hive_hbase_table SET value = 'update' WHERE key = 'key2';
DELETE FROM hive_hbase_table WHERE key='key1';
SELECT * from hive_hbase_table;
WWW Wrapper
CREATE SERVER www_fdw_server_google_search
    FOREIGN DATA WRAPPER www_fdw
    OPTIONS (uri 'https://ajax.googleapis.com/ajax/services/search/web?v=1.0');
CREATE USER MAPPING FOR current_user SERVER www_fdw_server_google_search;
CREATE FOREIGN TABLE www_fdw_google_search (
    q                  text,
    GsearchResultClass text,
    unescapedUrl       text,
    url                text,
    visibleUrl         text,
    cacheUrl           text,
    title              text,
    titleNoFormatting  text,
    content            text
) SERVER www_fdw_server_google_search;

select url,substring(title,1,25)||'...',substring(content,1,25)||'...'
from www_fdw_google_search
where q='postgresql fdw';
                          url                          |           ?column?           |           ?column?
-------------------------------------------------------+------------------------------+------------------------------
 http://wiki.postgresql.org/wiki/Foreign_data_wrappers | Foreign data wrappers - [...]

[...] '2011-01-01';
                            QUERY PLAN
------------------------------------------------------------------
 Foreign Scan on public.bird_strikes  (cost=100.00..134.54 rows=427 width=40)
   Output: airport, flight_date
   Remote SQL: SELECT airport, flight_date FROM public.bird_strikes WHERE ((flight_date > '2011-01-01 00:00:00'::timestamp without time zone))
(3 rows)
PostgreSQL Wrapper
Sends built-in immutable functions
test=# explain verbose select airport, flight_date from bird_strikes where flight_date > '2011-01-01' and length(airport) < 10;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Foreign Scan on public.bird_strikes  (cost=100.00..135.24 rows=142 width=40)
   Output: airport, flight_date
   Remote SQL: SELECT airport, flight_date FROM public.bird_strikes WHERE ((flight_date > '2011-01-01 00:00:00'::timestamp without time zone)) AND ((length(airport) < 10))
(3 rows)
PostgreSQL Wrapper
Writable (INSERT, UPDATE, DELETE)
test=# explain verbose update bird_strikes set airport = 'Unknown' where record_id = 313339;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Update on public.bird_strikes  (cost=100.00..111.05 rows=1 width=964)
   Remote SQL: UPDATE public.bird_strikes SET airport = $2 WHERE ctid = $1
   ->  Foreign Scan on public.bird_strikes  (cost=100.00..111.05 rows=1 width=964)
         Output: aircraft_type, 'Unknown'::character varying, altitude, aircraft_model, num_wildlife_struck, impact_to_flight, effect, location, flight_num, flight_date, record_id, indicated_damage, freeform_en_route, num_engines, airline, origin_state, phase_of_flight, precipitation, wildlife_collected, wildlife_sent_to_smithsonian, remarks, reported_date, wildlife_size, sky_conditions, wildlife_species, when_time_hhmm, time_of_day, pilot_warned, cost_out_of_service, cost_other, cost_repair, cost_total, miles_from_airport, feet_above_ground, num_human_fatalities, num_injured, speed_knots, ctid
         Remote SQL: SELECT aircraft_type, altitude, aircraft_model, num_wildlife_struck, impact_to_flight, effect, location, flight_num, flight_date, record_id, indicated_damage, freeform_en_route, num_engines, airline, origin_state, phase_of_flight, precipitation, wildlife_collected, wildlife_sent_to_smithsonian, remarks, reported_date, wildlife_size, sky_conditions, wildlife_species, when_time_hhmm, time_of_day, pilot_warned, cost_out_of_service, cost_other, cost_repair, cost_total, miles_from_airport, feet_above_ground, num_human_fatalities, num_injured, speed_knots, ctid FROM public.bird_strikes WHERE ((record_id = 313339)) FOR UPDATE
(5 rows)
PostgreSQL Wrapper
Writes are transactional
test=# select airport from bird_strikes where record_id = 313339;
 airport
---------
 Unknown
(1 row)

test=# BEGIN;
BEGIN
test=# update bird_strikes set airport = 'UNKNOWN' where record_id = 313339;
UPDATE 1
test=# ROLLBACK;
ROLLBACK
test=# select airport from bird_strikes where record_id = 313339;
 airport
---------
 Unknown
(1 row)
Limitations
Aggregates are not pushed down
test=# explain verbose select count(*) from bird_strikes;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Aggregate  (cost=220.92..220.93 rows=1 width=0)
   Output: count(*)
   ->  Foreign Scan on public.bird_strikes  (cost=100.00..212.39 rows=3413 width=0)
         Output: aircraft_type, airport, altitude, aircraft_model, num_wildlife_struck, impact_to_flight, effect, location, flight_num, flight_date, record_id, indicated_damage, freeform_en_route, num_engines, airline, origin_state, phase_of_flight, precipitation, wildlife_collected, wildlife_sent_to_smithsonian, remarks, reported_date, wildlife_size, sky_conditions, wildlife_species, when_time_hhmm, time_of_day, pilot_warned, cost_out_of_service, cost_other, cost_repair, cost_total, miles_from_airport, feet_above_ground, num_human_fatalities, num_injured, speed_knots
         Remote SQL: SELECT NULL FROM public.bird_strikes
(5 rows)
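When the wrapper behaves as in the plan above and counts rows locally, one common workaround is to define the aggregate as a view on the remote server and point a foreign table at that view, so only the result crosses the network. This is a sketch under assumptions: the view name bird_strikes_count, the column name strike_count, and the server name remote_pg are invented here, and the OPTIONS names are those of postgres_fdw.

```sql
-- Run on the REMOTE PostgreSQL server: wrap the aggregate in a view.
CREATE VIEW bird_strikes_count AS
    SELECT count(*) AS strike_count
    FROM public.bird_strikes;

-- Run on the LOCAL server: expose the remote view as a foreign table.
-- remote_pg is a placeholder for the existing postgres_fdw server.
CREATE FOREIGN TABLE bird_strikes_count (
    strike_count bigint          -- count(*) returns bigint
) SERVER remote_pg
  OPTIONS (schema_name 'public', table_name 'bird_strikes_count');

-- Only a single row travels back from the remote server.
SELECT strike_count FROM bird_strikes_count;
```

The cost is one remote view per aggregate, so the approach fits a handful of known reports better than ad hoc queries.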
Limitations
ORDER BY, GROUP BY, LIMIT not pushed down
test=# explain verbose select flight_num from bird_strikes order by flight_date limit 5;
                                  QUERY PLAN
-------------------------------------------------------------------------------------------
 Limit  (cost=169.66..169.67 rows=5 width=40)
   Output: flight_num, flight_date
   ->  Sort  (cost=169.66..172.86 rows=1280 width=40)
         Output: flight_num, flight_date
         Sort Key: bird_strikes.flight_date
         ->  Foreign Scan on public.bird_strikes  (cost=100.00..148.40 rows=1280 width=40)
               Output: flight_num, flight_date
               Remote SQL: SELECT flight_num, flight_date FROM public.bird_strikes
(8 rows)
Limitations
Joins not pushed down
test=# explain verbose select s.name, b.flight_date
test-#   from bird_strikes b, state_code s
test-#  where b.location = s.abbreviation and flight_date > '2011-01-01';
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Hash Join  (cost=239.88..349.95 rows=1986 width=40)
   Output: s.name, b.flight_date
   Hash Cond: ((s.abbreviation)::text = (b.location)::text)
   ->  Foreign Scan on public.state_code s  (cost=100.00..137.90 rows=930 width=64)
         Output: s.id, s.name, s.abbreviation, s.country, s.type, s.sort, s.status, s.occupied, s.notes, s.fips_state, s.assoc_press, s.standard_federal_region, s.census_region, s.census_region_name, s.census_division, s.census_devision_name, s.circuit_court
         Remote SQL: SELECT name, abbreviation FROM public.state_code
   ->  Hash  (cost=134.54..134.54 rows=427 width=40)
         Output: b.flight_date, b.location
         ->  Foreign Scan on public.bird_strikes b  (cost=100.00..134.54 rows=427 width=40)
               Output: b.flight_date, b.location
               Remote SQL: SELECT location, flight_date FROM public.bird_strikes WHERE ((flight_date > '2011-01-01 00:00:00'::timestamp without time zone))
(11 rows)
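If both tables live on the same remote server, the join itself can be moved there by hand: define it as a remote view and map a single foreign table onto the view. The sketch below assumes a postgres_fdw server named remote_pg. The view name bird_strikes_by_state, the column alias state_name, and its text type are illustrative guesses rather than details from the talk.

```sql
-- Run on the REMOTE server: the join executes next to the data.
CREATE VIEW bird_strikes_by_state AS
    SELECT s.name AS state_name,
           b.flight_date
    FROM public.bird_strikes b
    JOIN public.state_code s ON b.location = s.abbreviation;

-- Run on the LOCAL server. remote_pg is a placeholder server name;
-- the type of state_name is assumed.
CREATE FOREIGN TABLE bird_strikes_by_state (
    state_name  text,
    flight_date timestamp
) SERVER remote_pg
  OPTIONS (schema_name 'public', table_name 'bird_strikes_by_state');

-- A simple comparison like this one is the kind of predicate the
-- earlier plans show being sent to the remote side.
SELECT state_name, flight_date
FROM bird_strikes_by_state
WHERE flight_date > '2011-01-01';
```

Running EXPLAIN VERBOSE on the final query is the way to confirm what the Remote SQL actually contains on a given installation.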
Limitations (Gotcha)
Sometimes the foreign tables don't act like tables
test=# SELECT l.*, w.lat, w.lng
         FROM leads l, www_fdw_geocoder_google w
        WHERE w.address = l.address || ',' || l.city || ',' || l.state;
 first_name | last_name | company_name | address | city | county | state | zip | phone1 | phone2 | email | web | lat | lng
------------+-----------+--------------+---------+------+--------+-------+-----+--------+--------+-------+-----+-----+-----
(0 rows)
Limitations (Gotcha)
                                  QUERY PLAN
-------------------------------------------------------------------------------------------
 Merge Join  (cost=187.47..215.47 rows=1000 width=448)
   Output: l.first_name, l.last_name, l.company_name, l.address, l.city, l.county, l.state, l.zip, l.phone1, l.phone2, l.email, l.web, w.lat, w.lng
   Merge Cond: ((((((l.address || ','::text) || l.city) || ','::text) || l.state)) = w.address)
   ->  Sort  (cost=37.64..38.14 rows=200 width=384)
         Output: l.first_name, l.last_name, l.company_name, l.address, l.city, l.county, l.state, l.zip, l.phone1, l.phone2, l.email, l.web, (((((l.address || ','::text) || l.city) || ','::text) || l.state))
         Sort Key: (((((l.address || ','::text) || l.city) || ','::text) || l.state))
         ->  Foreign Scan on public.leads l  (cost=0.00..30.00 rows=200 width=384)
               Output: l.first_name, l.last_name, l.company_name, l.address, l.city, l.county, l.state, l.zip, l.phone1, l.phone2, l.email, l.web, ((((l.address || ','::text) || l.city) || ','::text) || l.state)
               Foreign File: /tmp/us-500.csv
               Foreign File Size: 81485
   ->  Sort  (cost=149.83..152.33 rows=1000 width=96)
         Output: w.lat, w.lng, w.address
         Sort Key: w.address
         ->  Foreign Scan on public.www_fdw_geocoder_google w  (cost=0.00..100.00 rows=1000 width=96)
               Output: w.lat, w.lng, w.address
               WWW API: Request
(16 rows)
Limitations (Gotcha)
CREATE OR REPLACE FUNCTION google_geocode(
    OUT first_name   text,
    OUT last_name    text,
    OUT company_name text,
    OUT address      text,
    OUT city         text,
    OUT county       text,
    OUT state        text,
    OUT zip          text,
    OUT phone1       text,
    OUT phone2       text,
    OUT email        text,
    OUT web          text,
    OUT lat          text,
    OUT lng          text)
RETURNS SETOF RECORD AS $$
DECLARE
    r     record;
    f_adr text;
    l_lat text;
    l_lng text;
BEGIN
    FOR r IN SELECT * FROM leads LOOP
        f_adr := r.address || ',' || r.city || ',' || r.state;

        EXECUTE 'SELECT lat, lng FROM www_fdw_geocoder_google WHERE address = $1'
           INTO l_lat, l_lng
          USING f_adr;

        SELECT r.first_name, r.last_name, r.company_name, r.address, r.city, r.county, r.state, r.zip, r.phone1, r.phone2, r.email, r.web, l_lat, l_lng
          INTO first_name, last_name, company_name, address, city, county, state, zip, phone1, phone2, email, web, lat, lng;
        RETURN NEXT;
    END LOOP;
END $$ LANGUAGE plpgsql;
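The function replaces the join with one lookup per row of leads, so it is called like a table. A short usage sketch follows; the table name leads_geocoded is hypothetical.

```sql
-- Call the set-returning function in the FROM clause.
SELECT first_name, last_name, city, state, lat, lng
FROM google_geocode();

-- Hypothetical: keep the results so the web service is not
-- queried again for every later report.
CREATE TABLE leads_geocoded AS
    SELECT * FROM google_geocode();
```

Because a PL/pgSQL function using RETURN NEXT builds its whole result set before returning, adding LIMIT to the outer query does not reduce the number of web requests. Every row in leads still triggers one.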
Writing a new FDW
Might not need to write one if the data source has an HTTP interface (the WWW wrapper may already cover it)
Use the Blackhole FDW as a template: https://bitbucket.org/adunstan/blackhole_fdw
Writing a new FDW
Datum
blackhole_fdw_handler(PG_FUNCTION_ARGS)
{
    ...

    /* these are required */
    fdwroutine->GetForeignRelSize = blackholeGetForeignRelSize;
    fdwroutine->GetForeignPaths = blackholeGetForeignPaths;
    fdwroutine->GetForeignPlan = blackholeGetForeignPlan;
    fdwroutine->BeginForeignScan = blackholeBeginForeignScan;
    fdwroutine->IterateForeignScan = blackholeIterateForeignScan;
    fdwroutine->ReScanForeignScan = blackholeReScanForeignScan;
    fdwroutine->EndForeignScan = blackholeEndForeignScan;

    /* remainder are optional - use NULL if not required */
    /* support for insert / update / delete */
    fdwroutine->AddForeignUpdateTargets = blackholeAddForeignUpdateTargets;
    fdwroutine->PlanForeignModify = blackholePlanForeignModify;
    fdwroutine->BeginForeignModify = blackholeBeginForeignModify;
    fdwroutine->ExecForeignInsert = blackholeExecForeignInsert;
    fdwroutine->ExecForeignUpdate = blackholeExecForeignUpdate;
    fdwroutine->ExecForeignDelete = blackholeExecForeignDelete;
    fdwroutine->EndForeignModify = blackholeEndForeignModify;

    /* support for EXPLAIN */
    fdwroutine->ExplainForeignScan = blackholeExplainForeignScan;
    fdwroutine->ExplainForeignModify = blackholeExplainForeignModify;

    /* support for ANALYSE */
    fdwroutine->AnalyzeForeignTable = blackholeAnalyzeForeignTable;

    PG_RETURN_POINTER(fdwroutine);
}
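The C handler only becomes usable once it is registered on the SQL side. A minimal sketch of that registration follows. It assumes the compiled shared library is installed as blackhole_fdw in the server's library directory. The server name blackhole_server and the table blackhole_test are made up for illustration, and the template project's own install script may differ.

```sql
-- Expose the C handler function to SQL.
-- '$libdir/blackhole_fdw' assumes the library was installed under that name.
CREATE FUNCTION blackhole_fdw_handler()
    RETURNS fdw_handler
    AS '$libdir/blackhole_fdw', 'blackhole_fdw_handler'
    LANGUAGE C STRICT;

-- Register the wrapper and point it at the handler.
CREATE FOREIGN DATA WRAPPER blackhole_fdw
    HANDLER blackhole_fdw_handler;

-- From here on it is used like any other wrapper (hypothetical names).
CREATE SERVER blackhole_server
    FOREIGN DATA WRAPPER blackhole_fdw;

CREATE FOREIGN TABLE blackhole_test (
    id      integer,
    payload text
) SERVER blackhole_server;
```

In a packaged extension these statements usually live in the extension's SQL script, so that users only need to run CREATE EXTENSION.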
Future
Even more Wrappers
Check Constraints on Foreign Tables
Allows partitioning
Joins
Custom Scan API
Probably will not be the way to do this, but progress being made
Questions?
[email protected]
@jim_mlodgenski