High Volume Scheduling and Job Management With PostgreSQL

  • 8/10/2019 High Volume Scheduling and Job Management With PostgreSQL

    1/52

    High-Volume Scheduling and Job Management with PostgreSQL

    Leonardo Meira, Software Engineer

    April 3rd, 2014

    System
    - Has been in production for years, running hundreds and thousands of scripts simultaneously and hundreds of millions of scripts in total
    - PostgreSQL is the basis for our high-volume, non-traditional queuing system
    - Log manager
    - Open Source

    Agenda
    - Context (What does Fiksu do?)
    - The Problem
    - The Solution
    - Wrap-up
    - Fiksu's Open Source Projects
    - Questions

    Fiksu is a mobile app marketing technology company.
    Fiksu makes it easy for mobile app marketers to acquire the users they need to grow their business.

    Agenda
    - The Problem

    Problem
    - Massive data retrieval needs
    - Jobs running periodically, guaranteed to […] cloud environment
    - Machine failures handled gracefully
    - Load balancing
    - Quickly diagnose troubles when failures […]

    Agenda
    - The Solution

    Network Application Framework
    - Framework for building applications
    - Schedules, runs and monitors processes across a distributed network.
    - Like a distributed cron, but cloud prepared
    - A front-end UI to control it all

    A cloud prepared distributed cron
    - Efficient queue management of applications […]
    - Machine node redundancy
    - Application run dependencies
    - Run restrictions preventing clashing scripts from executing at the same time
      - machine or cluster wide clash check
    - Load balancing
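    The clash check above can be sketched in a few lines. This is a minimal illustration, not NAF's actual code: the scope names, the function name, and the shape of the `running` list are assumptions made for the example. It only shows the difference between counting clashing jobs on one machine and counting them across the whole cluster.

```python
from typing import Iterable, Tuple

# Hypothetical scope names; the slide only says the check can be
# machine-wide or cluster-wide.
MACHINE, CLUSTER = "machine", "cluster"


def would_clash(running: Iterable[Tuple[str, int]], group: str, limit: int,
                scope: str, machine_id: int) -> bool:
    """Return True if starting one more job in `group` would exceed `limit`.

    `running` holds (run_group_name, machine_id) pairs for jobs in progress.
    """
    if scope not in (MACHINE, CLUSTER):
        raise ValueError(f"unknown scope: {scope}")
    # Cluster scope counts every job in the group; machine scope counts
    # only the jobs running on `machine_id`.
    count = sum(
        1 for name, where in running
        if name == group and (scope == CLUSTER or where == machine_id)
    )
    return count >= limit
```

    With a limit of 1 and machine scope, two machines can each run one copy of the same script; with cluster scope only one copy runs anywhere.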

    A nice UI
    - Historical process runs
    - Scheduling reports
    - Debugging
    - Log/Machine/Queue management
    - Machine health

    Based on Af
    - Ruby on Rails scripting framework
    - Open sourced in 2013 (https://github.c[…]public/af)
    - Provides command line parsing
    - Tight integration with PostgreSQL
    - Modified log4r to make log management […]
    - Strong application component module

    Tight integration with PostgreSQL
    - Automatic management of pg_stat_activity.application_name
      - Provided by our gem pg_application_name
    - Helpers for advisory locking
      - Provided by our gem pg_advisory_locker
    - Bulk data management
      - More of a supplement to ActiveRecord's lack […]
      - Provided by our gem bulk_data_methods

    Network Application Framework
    - Open sourced in 2013 (https://github.com/fiksu-p[…])
    - Has been in production for years, hundreds of m[…] across hundreds of machines
    - Manages script run times and distribution (think m[…] multi-machine/distributed cron)
    - Cloud machine watchdog, alarming and logging
    - AWS EC2 + RDS for ease […]
      - RDS is of the PostgreSQL variety

    Overview -- A running system
    [Diagram labels: NAF DB (script configuration, script schedules, script queues, logs, alarming); Runner Machine 1 and Runner Machine 2, each with a Runner and four scripts (Script1 to Script8 in total); a truncated third box ("Servic[…]").]

    Runners
    - One per machine
    - Responsible for
      - Schedule management
      - Queuing of periodic jobs
      - Job starting and process management
      - Queue/clash management
      - Load balancing
      - Machine watchdogging
      - Seamless code version upgrades

    Runner's start up
    - Clean old processes
    - Remove invalid jobs
    - Wind down other runners

    Runner's main loop
    - If we have been marked down, kill all local jobs and exit
    - If any local jobs have died, clean them up
    - If we can start any new jobs, start them
    - Mark ourselves last_seen_at so other runners know we are alive
    - If no machine has checked schedules in the last minute
      - Mark ourselves as the leader
      - Queue any scripts that have asked to run in this time period
      - Mark any other runners down if they haven't updated their last_seen_at in […] minutes
      - Mark that we checked the schedules
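    The loop above can be sketched as a single "tick" over in-memory records. This is a simplified illustration, not the NAF implementation: the field names follow the naf.machines columns shown in this deck, while `runner_tick`, the action strings, and the five-minute runner timeout are assumptions (the slide leaves that timeout unspecified).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# "The last minute" comes from the slide; the runner timeout is a
# hypothetical value chosen for this sketch.
SCHEDULE_CHECK_INTERVAL = timedelta(minutes=1)
RUNNER_TIMEOUT = timedelta(minutes=5)


@dataclass
class Machine:
    id: int
    last_seen_alive_at: Optional[datetime] = None
    last_checked_schedules_at: Optional[datetime] = None
    marked_down: bool = False
    marked_down_by_machine_id: Optional[int] = None


def runner_tick(me: Machine, machines: Dict[int, Machine],
                now: datetime) -> List[str]:
    """Run one pass of the loop for `me`; return the actions taken, in order."""
    if me.marked_down:
        return ["kill_local_jobs", "exit"]
    actions = ["reap_dead_jobs", "start_new_jobs"]
    me.last_seen_alive_at = now  # heartbeat so other runners see us alive

    checks = [m.last_checked_schedules_at for m in machines.values()
              if m.last_checked_schedules_at is not None]
    if not checks or now - max(checks) >= SCHEDULE_CHECK_INTERVAL:
        # Nobody has checked schedules recently, so this runner leads.
        actions += ["lead", "queue_due_schedules"]
        for other in machines.values():
            if other.id == me.id or other.marked_down:
                continue
            seen = other.last_seen_alive_at
            if seen is None or now - seen > RUNNER_TIMEOUT:
                other.marked_down = True
                other.marked_down_by_machine_id = me.id
                actions.append(f"mark_down:{other.id}")
        me.last_checked_schedules_at = now
    return actions
```

    In a real deployment the "has anyone checked schedules" test and the leader claim would need to happen atomically in the database (for example inside one transaction or under an advisory lock), otherwise two runners could both become leader in the same minute.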

    Schema -- machines

    Table "naf.machines"
              Column           |            Type
    ---------------------------+-----------------------------
     id                        | integer
     created_at                | timestamp without time zone
     updated_at                | timestamp without time zone
     server_address            | inet
     server_name               | text
     short_name                | text
     server_note               | text
     enabled                   | boolean
     thread_pool_size          | integer
     last_checked_schedules_at | timestamp without time zone
     last_seen_alive_at        | timestamp without time zone
     marked_down               | boolean
     marked_down_by_machine_id | integer
     marked_down_at            | timestamp without time zone
     log_level                 | text
     deleted                   | boolean

    Machines UI

    Schema -- applications/schedules

    Table "naf.applications"
           Column        |            Type
    ---------------------+-----------------------------
     id                  | integer
     created_at          | timestamp without time zone
     updated_at          | timestamp without time zone
     deleted             | boolean
     application_type_id | integer
     command             | text
     title               | text
     short_name          | text
     log_level           | text

    Table "naf.application_schedules"
                    Column                |            Type
    --------------------------------------+-----------------------------
     id                                   | integer
     created_at                           | timestamp without time zone
     updated_at                           | timestamp without time zone
     enabled                              | boolean
     visible                              | boolean
     application_id                       | integer
     application_run_group_restriction_id | integer
     application_run_group_name           | text
     application_run_group_limit          | integer
     run_interval                         | integer
     priority                             | integer
     enqueue_backlogs                     | boolean
     run_interval_style_id                | integer
     application_run_group_quantum        | integer

    Applications UI

    Application Schedules UI

    Runner's Queue management
    - Historical jobs in partitioned tables
      - old tables dropped every month or so
    - Queued jobs in their own table
      - we'll describe why a normal queue is not satisfactory
    - Running jobs in their own table
      - in-memory hot; can be replaced by a Dyn[…] thing, but PostgreSQL does a great job for hundreds of machines
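    To illustrate why partitioning makes "drop old tables every month or so" cheap, here is a small sketch that names monthly partitions and picks the ones that have aged out. The monthly naming scheme, the `historical_jobs` prefix, and the three-month retention are assumptions for the example; the slides do not say how NAF keys its partitions.

```python
from datetime import date
from typing import Iterable, List

BASE = "historical_jobs"  # hypothetical partition name prefix


def partition_name(day: date) -> str:
    """Name of the monthly partition that holds rows created on `day`."""
    return f"{BASE}_{day.year:04d}_{day.month:02d}"


def partitions_to_drop(existing: Iterable[str], today: date,
                       keep_months: int = 3) -> List[str]:
    """Return partitions older than the `keep_months` most recent months."""
    # Count months since year 0 so the subtraction can cross year boundaries.
    oldest_kept = today.year * 12 + (today.month - 1) - (keep_months - 1)
    cutoff = partition_name(date(oldest_kept // 12, oldest_kept % 12 + 1, 1))
    # Zero-padded names sort chronologically, so string comparison is enough.
    return sorted(name for name in existing if name < cutoff)
```

    Each returned name would then be removed with a single DROP TABLE, which avoids the row-by-row DELETE and vacuum cost of trimming one large table.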

    Schema -- Historical Jobs

    Table "naf.historical_jobs"
                    Column                |            Type
    --------------------------------------+-----------------------------
     id                                   | bigint
     created_at                           | timestamp without time zone
     updated_at                           | timestamp without time zone
     application_id                       | integer
     application_type_id                  | integer
     command                              | text
     application_run_group_restriction_id | integer
     application_run_group_name           | text
     application_run_group_limit          | integer
     priority                             | integer
     started_on_machine_id                | integer
     failed_to_start                      | boolean
     started_at                           | timestamp without time zone
     pid                                  | integer
     finished_at                          | timestamp without time zone
     exit_status                          | integer
     termination_signal                   | integer
     state                                | text
     request_to_terminate                 | boolean
     marked_dead_by_machine_id            | integer
     marked_dead_at                       | timestamp without time zone
     log_level                            | text
     machine_runner_invocation_id         | integer
     application_schedule_id              | integer

    Schema -- Queued Jobs

    Table "naf.queued_jobs"
                    Column                |            Type
    --------------------------------------+-----------------------------
     id                                   | bigint
     created_at                           | timestamp without time zone
     updated_at                           | timestamp without time zone
     application_id                       | integer
     application_type_id                  | integer
     command                              | text
     application_run_group_restriction_id | integer
     application_run_group_name           | text
     application_run_group_limit          | integer
     priority                             | integer
     application_schedule_id              | integer

    Schema -- Running Jobs

    Table "naf.running_jobs"
                    Column                |            Type
    --------------------------------------+-----------------------------
     id                                   | bigint
     created_at                           | timestamp without time zone
     updated_at                           | timestamp without time zone
     application_id                       | integer
     application_type_id                  | integer
     command                              | text
     application_run_group_restriction_id | integer
     application_run_group_name           | text
     application_run_group_limit          | integer
     started_on_machine_id                | integer
     started_at                           | timestamp without time zone
     pid                                  | integer
     request_to_terminate                 | boolean
     marked_dead_by_machine_id            | integer
     marked_dead_at                       | timestamp without time zone
     log_level                            | text
     tags                                 | text[]
     application_schedule_id              | integer

    Affinities
    - Think of them as puzzle pieces
    - Machines have affinity slots
      - Slots can be required
    - Applications have affinity tabs
    - An application can run on any machine that has slots that match all of its tabs
    - A machine will only run applications that have tabs matching all of its required slots
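    The two matching rules can be written as a pair of subset checks. This is a minimal sketch under assumed data shapes: a machine is a mapping from affinity name to a "required" flag, and an application is a set of affinity names. The function name and the affinity names in the usage example are made up for illustration.

```python
from typing import Dict, Set


def can_run(machine_slots: Dict[str, bool], app_tabs: Set[str]) -> bool:
    """Decide whether an application fits a machine.

    `machine_slots` maps affinity name -> True if the slot is required.
    `app_tabs` is the set of affinity names the application demands.
    """
    available = set(machine_slots)
    required = {name for name, is_required in machine_slots.items()
                if is_required}
    # Rule 1: every tab of the application must find a slot on the machine.
    # Rule 2: every required slot of the machine must be filled by a tab.
    return app_tabs <= available and required <= app_tabs
```

    For example, a machine with an optional "normal" slot and a required "gpu" slot accepts an application with the tab "gpu", but rejects one that has only "normal" (the required slot stays empty) and one that also demands "east" (no such slot).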

    Affinity parameters
    - Machines have an allocate-able number of slots of a specific affinity
    - Applications will allocate a certain number of slots at run start (or not run on that machine)
    - Load balancing
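    Affinity parameters behave like a small resource budget per machine. The sketch below shows one way the all-or-nothing allocation at run start could work; `try_allocate`, `release`, and the "cpu" and "memory_gb" affinity names are hypothetical, and a real implementation would do this against the database rather than a dictionary.

```python
from typing import Dict


def try_allocate(free: Dict[str, float], demand: Dict[str, float]) -> bool:
    """Reserve `demand` from `free` in place; all or nothing.

    Returns False, leaving `free` untouched, if any affinity is short,
    which corresponds to "or not run on that machine".
    """
    if any(free.get(name, 0) < amount for name, amount in demand.items()):
        return False
    for name, amount in demand.items():
        free[name] -= amount
    return True


def release(free: Dict[str, float], demand: Dict[str, float]) -> None:
    """Give the reserved amounts back when the job finishes."""
    for name, amount in demand.items():
        free[name] = free.get(name, 0) + amount
```

    Because a machine stops accepting jobs once its budget is spent, the queue naturally spills over to other machines, which is one simple form of load balancing.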

    Affinities Diagram (puzzle pieces)
    [Diagram: puzzle-piece figure with the labels Machine1, App1, App1, App2, App3.]

    Schema -- affinities

    Table "naf.affinities"
               Column           |            Type
    ----------------------------+-----------------------------
     id                         | integer
     created_at                 | timestamp without time zone
     updated_at                 | timestamp without time zone
     selectable                 | boolean
     affinity_classification_id | integer
     affinity_name              | text
     affinity_short_name        | text
     affinity_note              | text

    Table "naf.machine_affinity_s[…]"
           Column       |            Type
    --------------------+-----------------------------
     id                 | integer
     created_at         | timestamp without time zone
     machine_id         | integer
     affinity_id        | integer
     affinity_parameter | numeric
     required           | boolean

    Table "naf.application_schedule_affinity_tabs"
             Column          |            Type
    -------------------------+-----------------------------
     id                      | integer
     created_at              | timestamp without time zone
     application_schedule_id | integer
     affinity_id             | integer
     affinity_parameter      | numeric

    The Queue Fetcher
    - When this machine has an empty run slot, fill it with the next job with the highest priority from the queue that:
      - Has all of the required affinities of this machine
      - This machine has all the affinities demanded […]
      - Whose affinity parameters match allocable u[…] machine
      - Is not restricted by the run group restrictions
      - Whose prerequisites have completed
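    The fetcher's filter chain can be sketched as a scan over the queue in priority order. This is an in-memory illustration, not NAF's query: it assumes a lower priority number means "run first", it omits the affinity parameter check, and the `QueuedJob` fields other than the column-derived names are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set


@dataclass
class QueuedJob:
    id: int
    priority: int                      # assumption: lower number runs first
    tabs: Set[str] = field(default_factory=set)   # affinities demanded
    run_group_name: Optional[str] = None
    run_group_limit: Optional[int] = None
    prerequisite_ids: Set[int] = field(default_factory=set)


def fetch_next(queue: Iterable[QueuedJob],
               machine_slots: Dict[str, bool],
               running_per_group: Dict[str, int],
               finished_ids: Set[int]) -> Optional[QueuedJob]:
    """Return the best job this machine may start now, or None."""
    available = set(machine_slots)
    required = {name for name, is_required in machine_slots.items()
                if is_required}
    for job in sorted(queue, key=lambda j: (j.priority, j.id)):
        if not job.tabs <= available:
            continue  # machine lacks an affinity the job demands
        if not required <= job.tabs:
            continue  # job lacks an affinity the machine requires
        if (job.run_group_limit is not None and
                running_per_group.get(job.run_group_name, 0)
                >= job.run_group_limit):
            continue  # run group restriction would be violated
        if not job.prerequisite_ids <= finished_ids:
            continue  # a prerequisite has not completed yet
        return job
    return None
```

    This is why a plain first-in, first-out queue does not fit: the head of the queue may be ineligible for this particular machine while a later job is runnable, so the fetcher has to filter before it takes.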

    Agenda
    - Wrap-up

    Overview -- A running system
    [Diagram: the same running-system overview shown earlier in the deck (NAF DB, two runner machines with their scripts, and a truncated third box).]

    Overview -- Runners
    [Diagram labels: Application 1, Application 2, Application 3, ..., Application N; Machine 1 with Runner 1, Machine 2 with Runner 2, ..., Machine Y with Runner Y.]

    Conclusion
    - N/Af has been in production for years, running hundreds and thousands of scripts simultaneously and hundreds of millions of scripts in total
    - PostgreSQL is the basis for our high-volume, non-traditional queuing system

    Agenda
    - Fiksu's Open Source Projects

    Gems
    - N/Af
    - Af
    - Partitioned
    - Bulk Data Methods
    - PG Advisory Locker
    - PG Application Name

    N/Af
    - https://github.com/fiksu-public/naf
    - A network application framework that leverages PostgreSQL to deliver high volume, distributed and redundant job scheduling and management

    Af
    - https://github.com/fiksu-public/af
    - Application framework that supports:
      - Command line options integrated into instance (and […]) variables
      - Logging via log4r
      - PostgreSQL advisory locking via pg_advisory_locker
      - PostgreSQL database connection updates via pg_application_name gem
      - Threads and message passing
      - Application components adding loggers and command line options

    Partitioned
    - https://github.com/fiksu/partitioned
    - Adds assistance to ActiveRecord for manipulating (reading, creating, updating) an ActiveRecord object that represents data that may be in one of many database tables (determined by the Model's data).
    - Supports the creation and deletion of child tables and partitioning support infrastructure.
    - Supports bulk inserts and updates

    Bulk Data Methods
    - https://github.com/fiksu/bulk_data_me[…]
    - MixIn used to extend ActiveRecord classes implementing bulk insert and update operations through {#create_many} and {#update_many}
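    The idea behind a bulk insert is to send many rows in one multi-row INSERT statement instead of one statement per row. The sketch below only builds such a statement; it is not the gem's Ruby API. `build_bulk_insert` is a hypothetical helper, the `%s` placeholders assume a DB-API driver with the "format" parameter style, and the table and column names must come from trusted code because they are interpolated directly.

```python
from typing import Any, Dict, List, Sequence, Tuple


def build_bulk_insert(table: str,
                      rows: Sequence[Dict[str, Any]]) -> Tuple[str, List[Any]]:
    """Build one multi-row INSERT and its flat parameter list."""
    if not rows:
        raise ValueError("rows must not be empty")
    columns = list(rows[0])
    params: List[Any] = []
    for row in rows:
        if set(row) != set(columns):
            raise ValueError("every row must have the same columns")
        # Emit values in the column order fixed by the first row.
        params.extend(row[c] for c in columns)
    one_row = "(" + ", ".join(["%s"] * len(columns)) + ")"
    values = ", ".join([one_row] * len(rows))
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES {values}"
    return sql, params
```

    One round trip for a thousand rows is typically far cheaper than a thousand round trips, which matters when a scheduler queues many jobs at once.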

    PG Advisory Locker
    - https://github.com/fiksu/pg_advisory_lo[…]
    - Helper for calling the PostgreSQL functions pg_advisory_lock, pg_try_advisory_lock, and pg_advisory_unlock
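    PostgreSQL advisory locks are keyed by a bigint, so a helper usually needs a stable way to turn a readable lock name into a 64-bit key. The sketch below shows one possible mapping and the three SQL calls it would feed; the hashing scheme and both function names are assumptions for illustration and say nothing about how the gem derives its keys.

```python
import hashlib
import struct
from typing import Dict


def advisory_lock_key(name: str) -> int:
    """Map a lock name to a signed 64-bit integer (PostgreSQL bigint)."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    # ">q" unpacks the first 8 bytes as a big-endian signed 64-bit value.
    return struct.unpack(">q", digest[:8])[0]


def lock_statements(name: str) -> Dict[str, str]:
    """SQL for blocking lock, non-blocking try, and unlock on `name`."""
    key = advisory_lock_key(name)
    return {
        "lock": f"SELECT pg_advisory_lock({key})",
        "try": f"SELECT pg_try_advisory_lock({key})",
        "unlock": f"SELECT pg_advisory_unlock({key})",
    }
```

    The blocking form waits until the lock is free, while the try form returns false immediately, which suits a runner that should skip work another runner already holds.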

    Agenda
    - Questions

    Thank You!

    Want to talk?

    www.fiksu.com

    @fiksu

    Learn more:

    https://github.com/fiksu-public/naf