25
PostgreSQL Materialized Views and Active Record David Roberts

PostgreSQL Materialized Views with Active Record

Embed Size (px)

DESCRIPTION

How to use PostgreSQL Materialized Views with to simplify code and improve performance of your Ruby on Rails application

Citation preview

Page 1: PostgreSQL Materialized Views with Active Record

PostgreSQL Materialized Views

and Active Record

David Roberts

Page 2: PostgreSQL Materialized Views with Active Record

The ProblemHow do you quickly filter data

represented by multiple ActiveRecord associations and calculations?

Page 3: PostgreSQL Materialized Views with Active Record

Data ModelCity%

&%Philly%&%Boston%

Technology%&%Ruby%&%Python%

Club%&%Philly.rb%

Talk%&%PostgreSQL%Materialized%Views%

Feedback%&%“Super%rad%talk!!”%&%“The%whole%world%is%now%dumber”%

Author%&%David%Roberts%

Page 4: PostgreSQL Materialized Views with Active Record

View all Commentsclass Feedback < ActiveRecord::Base belongs_to :talk INVALID_COMMENTS = ['', 'NA', 'N/A', 'not applicable']

scope :filled_out, -> { where.not(comment: INVALID_COMMENTS) }end

Feedback.filled_out

Feedback Load (2409.6ms) SELECT "feedbacks".* FROM "feedbacks" WHERE ("feedbacks"."comment" NOT IN ('', 'NA', 'N/A', 'not applicable'))

Page 5: PostgreSQL Materialized Views with Active Record

Highest Scoring Talk with Valid Comments

Feedback.filled_out \ .select('talk_id, avg(score) as overall_score') \ .group('talk_id').order('overall_score desc') \ .limit(10)

# 665msSELECT talk_id, avg(score) as overall_score FROM "feedbacks" WHERE ("feedbacks"."comment" NOT IN ('', 'NA', 'N/A', 'not applicable')) GROUP BY talk_id ORDER BY overall_score desc LIMIT 10;

Page 6: PostgreSQL Materialized Views with Active Record

Highest Scoring Talk with Valid Comments

results = Feedback.filled_out \ .select('talk_id, avg(score) as overall_score') \ .group('talk_id').order('overall_score desc') \ .limit(10)

results.first.inspect

=> "#<Feedback id: nil, talk_id: 24510>"

Page 7: PostgreSQL Materialized Views with Active Record

Highest Scoring Talks in PA by Authors named Parker

Feedback.filled_out.joins(talk: [:author, { club: :city }] ) \ .select('feedbacks.talk_id, avg(feedbacks.score) as overall_score') \ .where("cities.state_abbr = ?", 'PA') \ .where("authors.name LIKE '%?%'", 'Parker') \ .group('feedbacks.talk_id') \ .order('overall_score desc') \ .limit(10)

# 665msSELECT feedbacks.talk_id, avg(feedbacks.score) as overall_score FROM "feedbacks" INNER JOIN "talks" ON "talks"."id" = "feedbacks"."talk_id" INNER JOIN "authors" ON "authors"."id" = "talks"."author_id" INNER JOIN "clubs" ON "clubs"."id" = "talks"."club_id" INNER JOIN "cities" ON "cities"."id" = "clubs"."city_id" WHERE ("feedbacks"."comment" NOT IN ('', 'NA', 'N/A', 'not applicable')) AND (cities.state_abbr = 'PA') AND (authors.name LIKE '%Parker%') GROUP BY feedbacks.talk_id ORDER BY overall_score desc LIMIT 10;

Page 8: PostgreSQL Materialized Views with Active Record

What’s Wrong with these Examples?

• Long ugly queries

• Slow queries are bad for Web Applications

• Fighting ActiveRecord framework

• SQL aggregates are difficult to access

• Returned object no longer corresponds to model

• Repetitive code to setup joins and Filter invalid comments

Page 9: PostgreSQL Materialized Views with Active Record

Using Database Views

to clean up ActiveRecord models

Page 10: PostgreSQL Materialized Views with Active Record

Views• Uses a stored / pre-defined Query

• Uses live data from corresponding tables when queried

• Can reference data from many tables

• Great for hiding complex SQL statements

• Allows you to push functionality to database

Page 11: PostgreSQL Materialized Views with Active Record

But this is a Ruby talk!

Page 12: PostgreSQL Materialized Views with Active Record

class CreateTalkView < ActiveRecord::Migration def up connection.execute <<-SQL CREATE VIEW v_talks_report AS SELECT cities.id as city_id, cities.name as city_name, cities.state_abbr as state_abbr, technologies.id as technology_id, clubs.id as club_id, clubs.name as club_name, talks.id as talk_id, talks.name as talk_name, authors.id as author_id, authors.name as author_name, feedback_agg.overall_score as overall_score FROM ( SELECT talk_id, avg(score) as overall_score FROM feedbacks WHERE feedbacks.comment NOT IN ('', 'NA', 'N/A', 'not applicable') GROUP BY talk_id ) as feedback_agg INNER JOIN talks ON feedback_agg.talk_id = talks.id INNER JOIN authors ON talks.author_id = authors.id INNER JOIN clubs ON talks.club_id = clubs.id INNER JOIN cities ON clubs.city_id = cities.id INNER JOIN technologies ON clubs.technology_id = technologies.id SQL end

def down connection.execute 'DROP VIEW IF EXISTS v_talks_report' endend

Page 13: PostgreSQL Materialized Views with Active Record

Encapsulate in ActiveRecord Modelclass TalkReport < ActiveRecord::Base # Use associations just like any other ActiveRecord object belongs_to :author belongs_to :talk belongs_to :club belongs_to :city belongs_to :technology # take advantage of talks has_many relationship delegate :feedbacks, to: :talk

self.table_name = 'v_talks_report'

# views cannot be changed since they are virtual def readonly true endend

Page 14: PostgreSQL Materialized Views with Active Record

Highest Scoring Talks with Valid Comments

results = TalkReport.order(overall_score: :desc) \ .limit(10)

results.first.inspect => "#<TalkReport city_id: 16, city_name: \"Burlington\", state_abbr: \"VT\", technology_id: 2, club_id: 73, club_name: \"Python - Burlington\", talk_id: 6508, talk_name: \"The SQL protocol is down, reboot the optical syste...\", author_id: 100, author_name: \"Jerrell Gleichner\", overall_score: #<BigDecimal:7ff3f80f1678,'0.4E1',9(27)>>"

Page 15: PostgreSQL Materialized Views with Active Record

Highest Scoring Talks in PA by Authors named Parker

TalkReport.where(state_abbr: 'PA') \ .where("author_name LIKE '%Parker%'") \ .order(overall_score: :desc).limit(10)

Page 16: PostgreSQL Materialized Views with Active Record

Using Materialized Views

to Improve Performance

Page 17: PostgreSQL Materialized Views with Active Record

Materialized Views• Acts similar to a Database View, but results persist for

future queries

• Creates a table on disk with the Result set

• Can be indexed

• Ideal for capturing frequently used joins and aggregations

• Allows optimization of tables for updating and Materialized Views for reporting

• Must be refreshed to be updated with most recent data

Page 18: PostgreSQL Materialized Views with Active Record

class CreateTalkReportMv < ActiveRecord::Migration def up connection.execute <<-SQL CREATE MATERIALIZED VIEW mv_talks_report AS SELECT cities.id as city_id, cities.name as city_name, cities.state_abbr as state_abbr, technologies.id as technology_id, clubs.id as club_id, clubs.name as club_name, talks.id as talk_id, talks.name as talk_name, authors.id as author_id, authors.name as author_name, feedback_agg.overall_score as overall_score FROM ( SELECT talk_id, avg(score) as overall_score FROM feedbacks WHERE feedbacks.comment NOT IN ('', 'NA', 'N/A', 'not applicable') GROUP BY talk_id ) as feedback_agg INNER JOIN talks ON feedback_agg.talk_id = talks.id INNER JOIN authors ON talks.author_id = authors.id INNER JOIN clubs ON talks.club_id = clubs.id INNER JOIN cities ON clubs.city_id = cities.id INNER JOIN technologies ON clubs.technology_id = technologies.id; CREATE INDEX ON mv_talks_report (overall_score); SQL end

def down connection.execute 'DROP MATERIALIZED VIEW IF EXISTS mv_talks_report' endend

Page 19: PostgreSQL Materialized Views with Active Record

ActiveRecord Model Changeclass TalkReport < ActiveRecord::Base # No changes to associations belongs_to …

self.table_name = 'mv_talks_report'

def self.repopulate connection.execute("REFRESH MATERIALIZED VIEW #{table_name}") end # Materialized Views cannot be modified def readonly true endend

Page 20: PostgreSQL Materialized Views with Active Record

Highest Scoring Talks99% reduction in runtime

577msFeedback.filled_out \ .select('talk_id, avg(score) as score') \ .group('talk_id').order('score desc').limit(10)

1msTalkReport.order(overall_score: :desc).limit(10)

Page 21: PostgreSQL Materialized Views with Active Record

Highest Scoring Talks in PA by Authors named “Parker”

400msFeedback.filled_out.joins(talk: [:author, { club: :city }] ) \ .select('feedbacks.talk_id, avg(feedbacks.score) as overall_score') \ .where("cities.state_abbr = ?", 'PA') \ .where("authors.name LIKE '%?%'", 'Parker') \ .group('feedbacks.talk_id') \ .order('overall_score desc') \ .limit(10)

19msTalkReport.where(state_abbr: 'PA') \ .where("author_name LIKE '%?%'", 'Parker') \ .order(overall_score: :desc).limit(10)

95% reduction in runtime

Page 22: PostgreSQL Materialized Views with Active Record

Why Use Materialized Views in your Rails application?

• ActiveRecord models allow for easy representation in Rails

• Capture commonly used joins / filters

• Allows for fast, live filtering and sorting of complex associations or calculated fields

• Push data intensive processing out of Ruby to Database

• Make use of advanced Database functions

• Can Optimize Indexes for reporting only

• When Performance is more important than Storage

Page 23: PostgreSQL Materialized Views with Active Record

Downsides

• Requires PostgreSQL 9.3

• Entire Materialized View must be refreshed to update

• Bad when Live Data is required

• For this use case, roll your own Materialized View using standard tables

Page 24: PostgreSQL Materialized Views with Active Record

Downsides• Migrations are painful!

• Recommend writing in SQL, so no using scopes

• Entire Materialized View must be dropped and redefined for any changes to the View or referring tables

• Hard to read and track what changed

Page 25: PostgreSQL Materialized Views with Active Record

Resources

• Source Code used in talk

• https://github.com/droberts84/materialized-view-demo

• PostgreSQL Documentation

• https://wiki.postgresql.org/wiki/Materialized_Views