FOREIGN DATA WRAPPER ENHANCEMENTS June 17, 2015 PostgreSQL Developers Unconference Clustering Track Shigeru HANADA, Etsuro Fujita

Page 1: Foreign Data Wrapper Enhancements

FOREIGN DATA WRAPPER ENHANCEMENTS
June 17, 2015
PostgreSQL Developers Unconference, Clustering Track
Shigeru HANADA, Etsuro Fujita

Page 2: Foreign Data Wrapper Enhancements

Who are we
• Shigeru HANADA
  • From Tokyo, Japan
  • Working on FDW since 2010
  • Implemented initial FDW API and postgres_fdw
• Etsuro Fujita
  • From Tokyo, Japan
  • Working on Postgres for 10 years
  • Interested in FDW enhancements

Page 3: Foreign Data Wrapper Enhancements

Agenda
• Past enhancements proposed for 9.5
  • Inheritance support (Committed)
  • Join push-down (Committed)
  • Join push-down for postgres_fdw (Returned with feedback)
  • Update push-down (Returned with feedback)
  • Possible remote query optimization in 9.5
• Ideas for further enhancement
  • Sort push-down
  • Aggregate push-down
  • More aggressive join push-down
• Discussions

Page 4: Foreign Data Wrapper Enhancements

PAST ENHANCEMENTS PROPOSED FOR 9.5

Page 5: Foreign Data Wrapper Enhancements

Inheritance support
• Outline
  • Allow a foreign table to participate in an inheritance tree
  • A way to implement sharding
• Example

postgres=# explain verbose select * from parent ;
                                QUERY PLAN
---------------------------------------------------------------------------
 Append  (cost=0.00..270.00 rows=2001 width=4)
   ->  Seq Scan on public.parent  (cost=0.00..0.00 rows=1 width=4)
         Output: parent.a
   ->  Foreign Scan on public.ft1  (cost=100.00..135.00 rows=1000 width=4)
         Output: ft1.a
         Remote SQL: SELECT a FROM public.t1
   ->  Foreign Scan on public.ft2  (cost=100.00..135.00 rows=1000 width=4)
         Output: ft2.a
         Remote SQL: SELECT a FROM public.t2
(9 rows)
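A setup that could produce a plan of this shape might look like the sketch below. The table names parent, ft1, ft2, t1 and t2 come from the plan above; the server names, host names, database name and credentials are assumptions made up for illustration, and the exact layout used for the slide is not known.

```sql
-- Minimal sketch, assuming PostgreSQL 9.5+ with postgres_fdw available.
-- shard1/shard2, the hosts, dbname, and the user/password are hypothetical.
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER shard1 FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'shard1.example.com', dbname 'postgres');
CREATE SERVER shard2 FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'shard2.example.com', dbname 'postgres');

CREATE USER MAPPING FOR CURRENT_USER SERVER shard1
    OPTIONS (user 'app', password 'secret');
CREATE USER MAPPING FOR CURRENT_USER SERVER shard2
    OPTIONS (user 'app', password 'secret');

-- Local parent table; it holds no rows itself in a sharding setup.
CREATE TABLE parent (a int);

-- Each foreign table inherits the parent's columns and points at one shard.
CREATE FOREIGN TABLE ft1 () INHERITS (parent)
    SERVER shard1 OPTIONS (schema_name 'public', table_name 't1');
CREATE FOREIGN TABLE ft2 () INHERITS (parent)
    SERVER shard2 OPTIONS (schema_name 'public', table_name 't2');

-- Scanning the parent now appends the local scan and both foreign scans.
EXPLAIN VERBOSE SELECT * FROM parent;
```

The remote tables public.t1 and public.t2 must already exist on the respective servers with a matching column a.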

Page 6: Foreign Data Wrapper Enhancements

Update push-down
• Outline
  • Send the whole UPDATE/DELETE statement to the remote side when it has the same semantics there
• Example

Current:

postgres=# explain verbose update foo set a = a + 1 where a > 10;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Update on public.foo  (cost=100.00..139.78 rows=990 width=10)
   Remote SQL: UPDATE public.foo SET a = $2 WHERE ctid = $1
   ->  Foreign Scan on public.foo  (cost=100.00..139.78 rows=990 width=10)
         Output: (a + 1), ctid
         Remote SQL: SELECT a, ctid FROM public.foo WHERE ((a > 10)) FOR UPDATE
(5 rows)

Patched:

postgres=# explain verbose update foo set a = a + 1 where a > 10;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Update on public.foo  (cost=100.00..139.78 rows=990 width=10)
   ->  Foreign Update on public.foo  (cost=100.00..139.78 rows=990 width=10)
         Remote SQL: UPDATE public.foo SET a = (a + 1) WHERE ((a > 10))
(3 rows)

Page 7: Foreign Data Wrapper Enhancements

Update push-down, cont.
• Issues
  • FDW APIs for update push-down
    • Called from nodeModifyTable.c or nodeForeignscan.c?
  • Update push-down for an update on a join
    • "UPDATE foo ... FROM bar ..." (both foo and bar are remote)
• Further enhancements
  • INSERT/UPSERT push-down

Page 8: Foreign Data Wrapper Enhancements

Join push-down
• Outline
  • Join foreign tables on the remote side if it is safe
• Example

fdw=# EXPLAIN (VERBOSE) SELECT tbalance FROM pgbench_branches b JOIN pgbench_tellers t USING(bid);
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Foreign Scan  (cost=100.00..101.00 rows=50 width=4)
   Output: t.tbalance
   Relations: (public.pgbench_branches b) INNER JOIN (public.pgbench_tellers t)
   Remote SQL: SELECT r.a1 FROM (SELECT l.a9 FROM (SELECT bid a9 FROM public.pgbench_branches) l) l (a1) INNER JOIN (SELECT r.a11, r.a10 FROM (SELECT bid a10, tbalance a11 FROM public.pgbench_tellers) r) r (a1, a2) ON ((l.a1 = r.a2))
(4 rows)

Page 9: Foreign Data Wrapper Enhancements

Join push-down, cont.
• Issues
  • Implement postgres_fdw to handle join APIs
  • Centralize deparsing of the remote query
    • Should we use the parse tree rather than planner information to generate the join query?
    • A generic SQL deparser would help porting to FDWs for other RDBMSs

Page 10: Foreign Data Wrapper Enhancements

Possible remote query optimization in 9.5
• When we run the following query:

SELECT c.grade, max(s.score) max_score
  FROM scores s LEFT JOIN classes c
    ON c.class_id = s.class_id
 WHERE c.subject = 'Math'
 GROUP BY c.grade
HAVING max(s.score) > 50
 ORDER BY c.grade DESC;

“scores” and “classes” are foreign tables

Page 11: Foreign Data Wrapper Enhancements

Possible remote query optimization in 9.5
• When we run the following query:

SELECT c.grade, max(s.score) max_score
  FROM scores s LEFT JOIN classes c
    ON c.class_id = s.class_id
 WHERE c.subject = 'Math'
 GROUP BY c.grade
HAVING max(s.score) > 50
 ORDER BY c.grade DESC;

We can push down the red portions of the query, that is, the parts that appear in the generated remote query: the join, the WHERE clause, and the ORDER BY.

Generated remote query:

SELECT c.grade, s.score
  FROM scores s LEFT JOIN classes c
    ON c.class_id = s.class_id
 WHERE c.subject = 'Math'
 ORDER BY c.grade DESC;

Page 12: Foreign Data Wrapper Enhancements

Possible remote query optimization in 9.5

postgres=# EXPLAIN SELECT c.grade, max(s.score) max_score
postgres-#   FROM scores s LEFT JOIN classes c
postgres-#     ON c.class_id = s.class_id
postgres-#  WHERE c.subject= 'Math'
postgres-#  GROUP BY c.grade
postgres-# HAVING max(s.score) > 50
postgres-#  ORDER BY c.grade DESC;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 GroupAggregate  (cost=27.92..27.94 rows=1 width=8)
   Group Key: c.grade
   Filter: (max(s.score) > 50)
   ->  Sort  (cost=27.92..27.92 rows=1 width=8)
         Sort Key: c.grade DESC
         ->  Hash Join  (cost=20.18..27.91 rows=1 width=8)
               Hash Cond: (s.class_id = c.class_id)
               ->  Seq Scan on scores s  (cost=0.00..6.98 rows=198 width=8)
               ->  Hash  (cost=20.12..20.12 rows=4 width=8)
                     ->  Seq Scan on classes c  (cost=0.00..20.12 rows=4 width=8)
                           Filter: (subject = 'Math'::text)
(11 rows)

Page 13: Foreign Data Wrapper Enhancements

IDEAS FOR FURTHER ENHANCEMENT

Page 14: Foreign Data Wrapper Enhancements

Ideas for further enhancement
• Sort push-down
• Aggregate push-down
• More aggressive join push-down
• 2PC support (out of scope of this session)
  • Will be discussed in Ashutosh’s session on June 19.

Page 15: Foreign Data Wrapper Enhancements

Sort push-down
• Outline
  • Mark a ForeignScan as sorted
• Efficacy
  • Avoid unnecessary sorting on the local side
  • Use a ForeignScan directly as an input of a MergeJoin
• How to implement
  • Add an extra ForeignPath with pathkeys
  • Estimate the cost of the pre-sorted path
  • Sort the result of a foreign scan
    • add ORDER BY, in RDBMS FDWs
    • choose a pre-sorted file, in file-based FDWs

Page 16: Foreign Data Wrapper Enhancements

Sort push-down
• Issues
  • How can we limit candidates of sort keys?
    • No brute-force approach
    • Introduce FOREIGN INDEX to represent generic remote indexes?
    • Introduce FDW-specific catalogs?
    • Extract key information from ORDER BY, JOIN, GROUP BY?
  • How can we ensure that the semantics of ordering are identical?
    • Even between PostgreSQL servers, we have collation issues.
    • Is it OK to leave it to DBAs?
    • Limiting to non-character data types seems a way to go for the first cut.
  • Can we use pre-sorted join results as a sorted path?
    • A MergeJoin as the root node of the remote query means the result is sorted by the join key, but we cannot be certain of that plan even if we run EXPLAIN before the query.
    • Any ideas?
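The collation concern can be reproduced on a single server. The sketch below sorts the same three strings under two collations; the "C" collation is always available, while "en_US" is an assumption and may be missing or named differently on a given installation.

```sql
-- Byte-order comparison: uppercase ASCII letters sort before lowercase ones.
SELECT x
  FROM (VALUES ('a'), ('B'), ('c')) AS v(x)
 ORDER BY x COLLATE "C";
-- returns B, a, c

-- Linguistic comparison. "en_US" is assumed to exist here; the exact
-- result depends on the locale library of the server.
SELECT x
  FROM (VALUES ('a'), ('B'), ('c')) AS v(x)
 ORDER BY x COLLATE "en_US";
-- typically returns a, B, c
```

If the local and remote servers disagree in this way, a remotely sorted character column cannot safely feed a local MergeJoin, which is why restricting the first cut to non-character types is attractive.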

Page 17: Foreign Data Wrapper Enhancements

Aggregate push-down
• Outline
  • Replace an Aggregate/GroupAggregate/HashAggregate plan node with a ForeignScan which produces aggregated results
• Efficacy
  • Reduce the amount of data transferred
  • Off-load the overhead of aggregation
• How to implement
  • New FDW API for aggregation hooking
  • Implement the API in each FDW
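To make the data-transfer saving concrete, the sketch below reuses the foreign table ft1 (mapped to remote public.t1) from the inheritance example. The first remote query is the one shown in that plan; the second is a hypothetical query that an aggregate push-down implementation might send, not output from any existing patch.

```sql
-- Local query against the foreign table.
SELECT a, count(*) FROM ft1 GROUP BY a;

-- Without aggregate push-down, the remote side returns every row and
-- the local server groups and counts them:
--   Remote SQL: SELECT a FROM public.t1

-- With aggregate push-down (hypothetical remote query), only one row
-- per group crosses the network:
--   Remote SQL: SELECT a, count(*) FROM public.t1 GROUP BY a
```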

Page 18: Foreign Data Wrapper Enhancements

Aggregate push-down
• Issues
  • GROUP BY requires identical semantics for grouping keys.
    • We have a similar issue to sort push-down.
  • How can we map functions to remote ones?
    • ROUTINE MAPPING is defined in the SQL standard, but it doesn’t seem well-designed.

Page 19: Foreign Data Wrapper Enhancements

More aggressive join push-down
• Outline
  • Send local data to the remote side and join it there, in one of the following ways:
    • VALUES expression in FROM clause
    • per-table replication, with logical replication, Slony-I, etc.
• Efficacy
  • Reduce the amount of data transferred from remote to local
    • Limited to cases where a small local table is joined with a huge remote table and the join produces a small result

Page 20: Foreign Data Wrapper Enhancements

More aggressive join push-down
• How to implement
  • Replace the reference to a small local table with VALUES()
  • Use a remote replicated table as an alternative
• Issues
  • How can we construct the VALUES() expression?
  • How can we know a table is replicated on the remote side?

SELECT *
  FROM huge_remote_table h
  JOIN
       (VALUES (1, 'foo'), (2, 'bar')) AS s (id, name)
    ON s.id;

The VALUES list is generated by scanning the small local table.
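One possible way to build such a VALUES() expression is plain SQL string construction, sketched below. The table name small_local_table and its columns are assumptions for illustration; a real FDW would more likely build the text in C while deparsing the remote query.

```sql
-- Hypothetical small local table matching the (id, name) shape used above.
CREATE TEMP TABLE small_local_table (id int NOT NULL, name text);
INSERT INTO small_local_table VALUES (1, 'foo'), (2, 'bar');

-- %s emits the integer as-is (safe only because id is a non-null integer);
-- %L quotes the text value as an SQL literal and emits NULL for nulls.
SELECT 'VALUES ' || string_agg(format('(%s, %L)', id, name), ', ' ORDER BY id)
       AS values_expr
  FROM small_local_table;
-- values_expr: VALUES (1, 'foo'), (2, 'bar')
```

The resulting string grows linearly with the local table, which is one reason the technique only pays off when the local side is small.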

Page 21: Foreign Data Wrapper Enhancements

DISCUSSIONS