8/10/2019 High Volume Scheduling and Job Management With PostgreSQL
High-Volume Scheduling and Job Management with PostgreSQL
Leonardo Meira, Software Engineer
April 3rd, 2014
- Has been in production for years, running hundreds and thousands of scripts simultaneously and hundreds of millions of scripts in total
- PostgreSQL is the basis for our high-volume, non-traditional queuing system
- Log manager
- Open Source
System
Agenda
- Context (What does Fiksu do?)
- The Problem
- The Solution
- Wrap-up
- Fiksu's Open Source Projects
- Questions
Fiksu is a mobile a[…] technology company.
Fiksu makes it easy for mobile app marketers to acquire the users they need to grow their business.
Agenda: The Problem
Problem
- Massive data retrieval needs
- Jobs running periodically, guaranteed to […] cloud environment
- Machine failures handled gracefully
- Load balancing
- Quickly diagnose troubles when failures occur
Agenda: The Solution
Network Application Framework
- Framework for building applications
- Schedules, runs and monitors processes across a distributed network
- Like a distributed cron, but cloud prepared
- A front-end UI to control it all
A cloud-prepared distributed cron
- Efficient queue management of applications
- Machine node redundancy
- Application run dependencies
- Run restrictions preventing clashing scripts from executing at the same time
  - Machine-wide or cluster-wide clash checks
- Load balancing
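A clash check of this kind comes down to counting running jobs in the same run group, either on one machine or across the cluster, and comparing the count with the group's limit. The sketch below is a minimal in-memory illustration, not the N/Af code. The run group name and limit mirror the application_run_group_name and application_run_group_limit columns shown in the schema slides. `ClashScope` is a hypothetical stand-in for application_run_group_restriction_id, whose real values the slides do not list.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class ClashScope(Enum):
    # Hypothetical stand-in for application_run_group_restriction_id;
    # the real restriction values are not given in the slides.
    MACHINE = "machine"  # count only jobs running on the same machine
    CLUSTER = "cluster"  # count jobs running anywhere in the cluster


@dataclass(frozen=True)
class RunningJob:
    run_group_name: Optional[str]
    machine_id: int


@dataclass(frozen=True)
class Candidate:
    run_group_name: Optional[str]
    run_group_limit: Optional[int]
    scope: ClashScope


def would_clash(candidate: Candidate, running: Iterable[RunningJob], machine_id: int) -> bool:
    """True if starting `candidate` on `machine_id` would exceed its run group limit."""
    if candidate.run_group_name is None or candidate.run_group_limit is None:
        return False  # no run group restriction configured
    same_group = [j for j in running if j.run_group_name == candidate.run_group_name]
    if candidate.scope is ClashScope.MACHINE:
        same_group = [j for j in same_group if j.machine_id == machine_id]
    return len(same_group) >= candidate.run_group_limit
```

Two runners could pass this check at the same moment. A real cluster-wide check therefore presumably has to run under some lock in PostgreSQL, and an advisory lock is one option.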
A nice UI
- Historical process runs
- Scheduling reports
- Debugging
- Log/Machine/Queue management
- Machine health
Based on Af
- Ruby on Rails scripting framework
- Open sourced in 2013 (https://github.com/fiksu-public/af)
- Provides command line parsing
- Tight integration with PostgreSQL
- Modified log4r to make log management …
- Strong application component module
Tight integration with PostgreSQL
- Automatic management of pg_stat_activity.application_name
  - Provided by our gem pg_application_name
- Helpers for advisory locking
  - Provided by our gem pg_advisory_locker
- Bulk data management
  - More of a supplement to ActiveRecord's lack of bulk operations
  - Provided by our gem bulk_data_methods
Network Application Framework
- Open sourced in 2013 (https://github.com/fiksu-public/naf)
- Has been in production for years, hundreds of … across hundreds of machines
- Manages script run times and distribution (think multi-machine/distributed cron)
- Cloud machine watchdog, alarming and logging
- AWS EC2 + RDS for ease
  - RDS is of the PostgreSQL variety
Overview: A running system
[Diagram: Runner Machine 1 and Runner Machine 2, each running a Runner that manages four scripts (Script1 to Script8 in total), connected to the NAF DB with script configuration, script schedules, script queues, logs and alarming.]
Runners
- One per machine
- Responsible for:
  - Schedule management
  - Queuing of periodic jobs
  - Job starting and process management
  - Queue/clash management
  - Load balancing
  - Machine watchdogging
  - Seamless code version upgrades
Runner startup
- Clean up old processes
- Remove invalid jobs
- Wind down any other runner
Runner main loop
- If we have been marked down, kill all local jobs and exit
- If any local jobs have died, clean them up
- If we can start any new jobs, start them
- Update our last_seen_at so other runners know we are alive
- If no machine has checked schedules in the last minute:
  - Mark ourselves as the leader
  - Queue any scripts that have asked to run in this time period
  - Mark any other runners down if they haven't updated their last_seen_at within … minutes
  - Mark that we checked the schedules
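The leader-election and mark-down steps above can be sketched as a single pass over in-memory records named after the naf.machines columns. This is a minimal sketch under assumptions, not the runner itself. Dead-job cleanup and job starts are omitted, `MachineRow` and `queue_due_scripts` are hypothetical, and the four-minute mark-down timeout is an assumption because the slide's value is not legible. In the real system these reads and writes go through PostgreSQL, and the "has anyone checked schedules?" test must be atomic. Otherwise two runners could both act as leader in the same minute. This single-threaded sketch ignores that.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List, Optional

# The slide says "in the last minute" for schedule checks.
SCHEDULE_CHECK_INTERVAL = timedelta(minutes=1)
# Assumption: the mark-down timeout is not legible in the source.
MARK_DOWN_AFTER = timedelta(minutes=4)


@dataclass
class MachineRow:
    """Subset of the naf.machines columns used by the loop."""
    id: int
    last_seen_alive_at: Optional[datetime] = None
    last_checked_schedules_at: Optional[datetime] = None
    marked_down: bool = False
    marked_down_by_machine_id: Optional[int] = None
    marked_down_at: Optional[datetime] = None


def runner_tick(me: MachineRow, machines: List[MachineRow], now: datetime,
                queue_due_scripts: Callable[[datetime], None]) -> bool:
    """One pass of the main loop; returns False when this runner should exit."""
    if me.marked_down:
        return False  # a real runner would kill its local jobs before exiting
    # Cleaning up dead local jobs and starting new ones is omitted here.
    me.last_seen_alive_at = now  # heartbeat for the other runners

    checks = [m.last_checked_schedules_at for m in machines
              if m.last_checked_schedules_at is not None]
    if checks and now - max(checks) < SCHEDULE_CHECK_INTERVAL:
        return True  # someone already acted as leader this minute

    # Act as leader for this period.
    queue_due_scripts(now)
    for other in machines:
        if other is me or other.marked_down or other.last_seen_alive_at is None:
            continue
        if now - other.last_seen_alive_at > MARK_DOWN_AFTER:
            other.marked_down = True
            other.marked_down_by_machine_id = me.id
            other.marked_down_at = now
    me.last_checked_schedules_at = now
    return True
```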
Schema: machines
Table "naf.machines"
 Column                    | Type
---------------------------+-----------------------------
 id                        | integer
 created_at                | timestamp without time zone
 updated_at                | timestamp without time zone
 server_address            | inet
 server_name               | text
 short_name                | text
 server_note               | text
 enabled                   | boolean
 thread_pool_size          | integer
 last_checked_schedules_at | timestamp without time zone
 last_seen_alive_at        | timestamp without time zone
 marked_down               | boolean
 marked_down_by_machine_id | integer
 marked_down_at            | timestamp without time zone
 log_level                 | text
 deleted                   | boolean
Machines UI
Schema: applications/schedules
Table "naf.applications"
 Column              | Type
---------------------+-----------------------------
 id                  | integer
 created_at          | timestamp without time zone
 updated_at          | timestamp without time zone
 deleted             | boolean
 application_type_id | integer
 command             | text
 title               | text
 short_name          | text
 log_level           | text
Table "naf.application_schedules"
 Column                               | Type
--------------------------------------+-----------------------------
 id                                   | integer
 created_at                           | timestamp without time zone
 updated_at                           | timestamp without time zone
 enabled                              | boolean
 visible                              | boolean
 application_id                       | integer
 application_run_group_restriction_id | integer
 application_run_group_name           | text
 application_run_group_limit          | integer
 run_interval                         | integer
 priority                             | integer
 enqueue_backlogs                     | boolean
 run_interval_style_id                | integer
 application_run_group_quantum        | integer
Applications UI
Application Schedules UI
Runners: queue management
- Historical jobs in partitioned tables
  - Old tables dropped every month or so
- Queued jobs in their own table
  - We'll describe why a normal queue is not satisfactory
- Running jobs in their own table
  - In memory and hot; could be replaced by a Dyn… thing, but PostgreSQL does a great job for hundreds of machines
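Monthly partitioning makes retention cheap. Dropping an expired child table removes a month of history at once, instead of a large DELETE followed by vacuum work. The sketch below shows only the naming and retention logic. It assumes one child table per month named historical_jobs_YYYYMM, which is a hypothetical convention; the real layout is managed by the Partitioned gem and may differ. The function names are also illustrative.

```python
from datetime import date, datetime

BASE = "historical_jobs"  # hypothetical child-table prefix


def partition_for(created_at: datetime) -> str:
    """Name of the monthly child table a job created at `created_at` would land in."""
    return f"{BASE}_{created_at.year:04d}{created_at.month:02d}"


def _month_number(year: int, month: int) -> int:
    # Months since year 0, so month arithmetic crosses year boundaries correctly.
    return year * 12 + (month - 1)


def partitions_to_drop(existing: list, today: date, keep_months: int = 3) -> list:
    """Child tables older than the newest `keep_months` months (current month included)."""
    current = _month_number(today.year, today.month)
    expired = []
    for name in existing:
        suffix = name[len(BASE) + 1:]  # the YYYYMM part
        year, month = int(suffix[:4]), int(suffix[4:6])
        if current - _month_number(year, month) >= keep_months:
            expired.append(name)
    return sorted(expired)
```

With `keep_months=3`, a run on 3 April 2014 keeps the April, March and February tables and reports everything older for dropping.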
Schema: Historical Jobs
Table "naf.historical_jobs"
 Column                               | Type
--------------------------------------+-----------------------------
 id                                   | bigint
 created_at                           | timestamp without time zone
 updated_at                           | timestamp without time zone
 application_id                       | integer
 application_type_id                  | integer
 command                              | text
 application_run_group_restriction_id | integer
 application_run_group_name           | text
 application_run_group_limit          | integer
 priority                             | integer
 started_on_machine_id                | integer
 failed_to_start                      | boolean
 started_at                           | timestamp without time zone
 pid                                  | integer
 finished_at                          | timestamp without time zone
 exit_status                          | integer
 termination_signal                   | integer
 state                                | text
 request_to_terminate                 | boolean
 marked_dead_by_machine_id            | integer
 marked_dead_at                       | timestamp without time zone
 log_level                            | text
 machine_runner_invocation_id         | integer
 application_schedule_id              | integer
Schema: Queued Jobs
Table "naf.queued_jobs"
 Column                               | Type
--------------------------------------+-----------------------------
 id                                   | bigint
 created_at                           | timestamp without time zone
 updated_at                           | timestamp without time zone
 application_id                       | integer
 application_type_id                  | integer
 command                              | text
 application_run_group_restriction_id | integer
 application_run_group_name           | text
 application_run_group_limit          | integer
 priority                             | integer
 application_schedule_id              | integer
Schema: Running Jobs
Table "naf.running_jobs"
 Column                               | Type
--------------------------------------+-----------------------------
 id                                   | bigint
 created_at                           | timestamp without time zone
 updated_at                           | timestamp without time zone
 application_id                       | integer
 application_type_id                  | integer
 command                              | text
 application_run_group_restriction_id | integer
 application_run_group_name           | text
 application_run_group_limit          | integer
 started_on_machine_id                | integer
 started_at                           | timestamp without time zone
 pid                                  | integer
 request_to_terminate                 | boolean
 marked_dead_by_machine_id            | integer
 marked_dead_at                       | timestamp without time zone
 log_level                            | text
 tags                                 | text[]
 application_schedule_id              | integer
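The three job tables suggest a lifecycle in which a job moves from queued to running while its history accumulates in historical_jobs. The slides do not spell out the exact write sequence, so the following is an in-memory sketch under two assumptions. First, every job gets a historical record at enqueue time that shares its id with the queued and running rows. Second, the state labels are hypothetical, as are the class and method names. The point it illustrates is that the queued and running tables only ever hold live work, so they stay small and hot however large the history grows.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class HistoricalJob:
    """Stand-in for a naf.historical_jobs row (subset of columns)."""
    id: int
    command: str
    priority: int
    started_on_machine_id: Optional[int] = None
    pid: Optional[int] = None
    exit_status: Optional[int] = None
    state: str = "queued"  # hypothetical state labels


class JobTables:
    def __init__(self) -> None:
        self._next_id = itertools.count(1)
        self.historical: Dict[int, HistoricalJob] = {}  # grows forever (partitioned in N/Af)
        self.queued: Dict[int, int] = {}   # id -> priority; only jobs waiting to run
        self.running: Dict[int, int] = {}  # id -> machine id; only jobs running now

    def enqueue(self, command: str, priority: int) -> int:
        job_id = next(self._next_id)
        self.historical[job_id] = HistoricalJob(job_id, command, priority)
        self.queued[job_id] = priority
        return job_id

    def start(self, job_id: int, machine_id: int, pid: int) -> None:
        del self.queued[job_id]            # leaves the queue...
        self.running[job_id] = machine_id  # ...and joins the running set
        job = self.historical[job_id]
        job.started_on_machine_id, job.pid, job.state = machine_id, pid, "running"

    def finish(self, job_id: int, exit_status: int) -> None:
        del self.running[job_id]  # only the history keeps finished jobs
        job = self.historical[job_id]
        job.exit_status, job.state = exit_status, "finished"
```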
Affinities
- Think of them as puzzle pieces
- Machines have affinity slots
  - Slots can be required
- Applications have affinity tabs
- An application can run on any machine that has slots that match all of its tabs
- A machine will only run applications that have tabs matching all of its required slots
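The two matching rules can be written as set inclusions. Every tab must find a slot, and every required slot must be filled by a tab. The following is a minimal sketch that uses plain strings for affinity names. The names in the example and the `MachineSlots`/`can_run` helpers are hypothetical. The required flag corresponds to the `required` column on a machine's affinity slot.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class MachineSlots:
    slots: FrozenSet[str]                   # affinity slots this machine offers
    required: FrozenSet[str] = frozenset()  # slots flagged as required


def can_run(machine: MachineSlots, app_tabs: FrozenSet[str]) -> bool:
    """Puzzle-piece check: every tab finds a slot, and every required slot is filled by a tab."""
    tabs_fit = app_tabs <= machine.slots
    required_filled = machine.required <= app_tabs
    return tabs_fit and required_filled
```

Under these rules, an application with no tabs can run on any machine that has no required slots. A machine with a required slot is reserved for applications that explicitly ask for that affinity.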
Affinity parameters
- Machines have an allocatable number of slots of a specific affinity
- Applications will allocate a certain number of slots at run start (or not run on that machine)
- Load balancing
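The all-or-nothing allocation described above might look like the following. A dictionary of remaining units per affinity stands in for the affinity_parameter values, and `Decimal` mirrors their numeric column type. The function names are hypothetical, and this is a sketch rather than N/Af's implementation.

```python
from decimal import Decimal
from typing import Dict

Units = Dict[str, Decimal]


def try_allocate(available: Units, demand: Units) -> bool:
    """Reserve `demand` on this machine only if every affinity has enough units left."""
    if any(available.get(name, Decimal(0)) < amount for name, amount in demand.items()):
        return False  # not enough capacity: the job does not start on this machine
    for name, amount in demand.items():
        available[name] -= amount
    return True


def release(available: Units, demand: Units) -> None:
    """Return the units when the job finishes."""
    for name, amount in demand.items():
        available[name] = available.get(name, Decimal(0)) + amount
```

Each machine advertises only a finite number of units. Once one machine is full, further jobs can only start on machines with capacity left, which is one way this scheme yields load balancing.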
Affinities Diagram (puzzle pieces)
[Diagram: Machine1 and applications App1, App2 and App3 drawn as puzzle pieces, annotated with affinity parameter values of 1.]
Schema: affinities
Table "naf.affinities"
 Column                     | Type
----------------------------+-----------------------------
 id                         | integer
 created_at                 | timestamp without time zone
 updated_at                 | timestamp without time zone
 selectable                 | boolean
 affinity_classification_id | integer
 affinity_name              | text
 affinity_short_name        | text
 affinity_note              | text
Table "naf.machine_affinity_s…"
 Column             | Type
--------------------+-----------------------------
 id                 | integer
 created_at         | timestamp without time zone
 machine_id         | integer
 affinity_id        | integer
 affinity_parameter | numeric
 required           | boolean
Table "naf.application_schedule_affinity_tabs"
 Column                  | Type
-------------------------+-----------------------------
 id                      | integer
 created_at              | timestamp without time zone
 application_schedule_id | integer
 affinity_id             | integer
 affinity_parameter      | numeric
The Queue Fetcher
- When this machine has an empty run slot, fill it with the highest-priority job from the queue that:
  - Has all of the required affinities of this machine
  - Demands only affinities this machine has
  - Has affinity parameters that fit the allocatable units on this machine
  - Is not restricted by the run group restrictions
  - Has had all of its prerequisites complete
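The fetcher's conditions combine into one filter over the queue, applied in priority order. The following is a minimal in-memory sketch, not the N/Af implementation. The data classes are hypothetical, lower priority numbers are assumed to be more urgent, and ties fall back to the job id. The sketch also suggests why a plain FIFO queue falls short. A job at the head of the queue may be ineligible on this particular machine, so it has to be skipped rather than allowed to block everything behind it.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional, Set


@dataclass
class QueuedJob:
    id: int
    priority: int  # assumption: a lower number is more urgent
    tabs: Dict[str, int] = field(default_factory=dict)  # affinity -> units to allocate
    run_group: Optional[str] = None
    run_group_limit: Optional[int] = None
    prerequisites: FrozenSet[int] = frozenset()  # job ids that must have completed


@dataclass
class MachineCapacity:
    free: Dict[str, int]  # affinity slot -> units still allocatable
    required: FrozenSet[str] = frozenset()


def eligible(job: QueuedJob, machine: MachineCapacity,
             running_per_group: Dict[str, int], completed: Set[int]) -> bool:
    tab_names = set(job.tabs)
    if not machine.required <= tab_names:
        return False  # job lacks one of this machine's required affinities
    if not tab_names <= set(machine.free):
        return False  # machine lacks an affinity the job demands
    if any(machine.free[name] < units for name, units in job.tabs.items()):
        return False  # not enough allocatable units left
    if (job.run_group is not None and job.run_group_limit is not None
            and running_per_group.get(job.run_group, 0) >= job.run_group_limit):
        return False  # run group restriction would be exceeded
    return job.prerequisites <= completed  # all prerequisites finished


def fetch_next(queue: List[QueuedJob], machine: MachineCapacity,
               running_per_group: Dict[str, int], completed: Set[int]) -> Optional[QueuedJob]:
    """Pop the most urgent job this machine may run, skipping ineligible ones."""
    for job in sorted(queue, key=lambda j: (j.priority, j.id)):
        if eligible(job, machine, running_per_group, completed):
            for name, units in job.tabs.items():
                machine.free[name] -= units  # allocate affinity units
            queue.remove(job)
            return job
    return None
```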
Agenda: Wrap-up
Overview: Runners
[Diagram: Application 1 to Application N; Machine 1 to Machine Y, each with its own Runner (Runner 1 to Runner Y).]
Conclusion
- N/Af has been in production for years, running hundreds and thousands of scripts simultaneously and hundreds of millions of scripts in total
- PostgreSQL is the basis for our high-volume, non-traditional queuing system
Agenda: Fiksu's Open Source Projects
Gems
- N/Af
- Af
- Partitioned
- Bulk Data Methods
- PG Advisory Locker
- PG Application Name
N/Af
- https://github.com/fiksu-public/naf
- A network application framework that leverages PostgreSQL to deliver high-volume, distributed and redundant job scheduling and management
Af
- https://github.com/fiksu-public/af
- Application framework that supports:
  - Command line options integrated into instance (and …) variables
  - Logging via log4r
  - PostgreSQL advisory locking via the pg_advisory_locker gem
  - PostgreSQL database connection updates via the pg_application_name gem
  - Threads and message passing
  - Application components adding loggers and command line options
Partitioned
- https://github.com/fiksu/partitioned
- Adds assistance to ActiveRecord for manipulating (reading, creating, updating) an ActiveRecord model that represents data that may be in one of many database tables (determined by the model's data)
- Supports the creation and deletion of child tables and partitioning support infrastructure
- Supports bulk inserts and updates
Bulk Data Methods
- https://github.com/fiksu/bulk_data_me…
- MixIn used to extend ActiveRecord classes, implementing bulk insert and update operations through {#create_many} and {#update…}
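Bulk insert comes down to sending many rows in one statement instead of issuing one INSERT per row. The following is a minimal sketch of that idea. It is not the gem's create_many, whose internals the slides do not show. It builds a multi-row INSERT with psycopg2-style `%s` placeholders, and `build_bulk_insert` is a hypothetical helper name. The table and column names must come from trusted code, because they are interpolated into the SQL rather than passed as parameters.

```python
from typing import Any, List, Sequence, Tuple


def build_bulk_insert(table: str, columns: Sequence[str],
                      rows: Sequence[Sequence[Any]]) -> Tuple[str, List[Any]]:
    """One INSERT ... VALUES (...), (...) statement plus its flattened parameters."""
    if not rows:
        raise ValueError("no rows to insert")
    width = len(columns)
    for row in rows:
        if len(row) != width:
            raise ValueError("row width does not match column count")
    one_row = "(" + ", ".join(["%s"] * width) + ")"
    values = ", ".join([one_row] * len(rows))
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES {values}"
    params = [value for row in rows for value in row]  # row-major order matches placeholders
    return sql, params
```

With psycopg2, `cur.execute(sql, params)` would then send every row in a single round trip.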
PG Advisory Locker
- https://github.com/fiksu/pg_advisory_lo…
- Helper for calling the PostgreSQL functions pg_advisory_lock, pg_try_advisory_lock and pg_advisory_unlock
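PostgreSQL advisory locks are keyed by a single bigint (or a pair of integers). A lock named after, say, a run group therefore needs a stable mapping from a string to a signed 64-bit value. pg_advisory_lock waits for the lock, pg_try_advisory_lock returns true or false immediately, and pg_advisory_unlock releases it. A session-level lock that is never released is held until the connection closes. The hashing scheme and helper names below are illustrative assumptions, not what pg_advisory_locker does.

```python
import hashlib
from typing import Tuple


def advisory_key(name: str) -> int:
    """Stable signed 64-bit key for a lock name (first 8 bytes of its SHA-256)."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def try_lock_query(name: str) -> Tuple[str, Tuple[int]]:
    """SQL and parameters for a non-blocking lock attempt; the query returns a boolean."""
    return "SELECT pg_try_advisory_lock(%s)", (advisory_key(name),)


def unlock_query(name: str) -> Tuple[str, Tuple[int]]:
    """SQL and parameters that release a lock taken with the same name."""
    return "SELECT pg_advisory_unlock(%s)", (advisory_key(name),)
```

With psycopg2, `cur.execute(*try_lock_query("run_group:import"))` followed by `cur.fetchone()[0]` would report whether this session obtained the lock.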
Agenda: Questions
Thank You!
Want to talk?
www.fiksu.com
@fiksu
Learn more:
https://github.com/fiksu-public/naf