Lecture 7: Generate-Test-Aggregate in Coq

NII Lectures Series

Frédéric Loulergue

Université d'Orléans – LIFO – PaMDA Team

October-November 2013

F. Loulergue SyDPaCC – Lecture 7 October-November 2013 1 / 20

Outline

1 Introduction

2 GTA in Coq

3 Summary

Generate-Test-Aggregate

Generate-Test-Aggregate (K. Emoto, S. Fischer, Z. Hu)

User's perspective:

- G: Generator to produce all candidates

- T: Tester for validity check of candidates

- A: Aggregator to build the final result

Optimization:

- Filter embedding: G T A ⇒ G A

- Semiring fusion: G A ⇒ D&C

Papers: [1, 2]

GTA: Example

The Knapsack problem

Given:

- a knapsack that can hold items up to a total volume v

- a set of items, each characterised by a value and a volume

Find the most valuable selection of items that fits in the knapsack.

Items and knapsack

- pens: ¥200, 1 dl

- scissors: ¥350, 1 dl

- phone: ¥5000, 2 dl

Knapsack: 3 dl

GTA Programming: Generator

Choose an adequate generator

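A generator:

- produces a multiset (bag) of candidates

- should generate all necessary candidates

- may generate invalid candidates

The Knapsack problem

- G generates all the subsets of the set of items

- $G\ \{\text{pens}, \text{scissors}, \text{phone}\} = \lbag \{\}, \{\text{pens}\}, \{\text{scissors}\}, \{\text{phone}\}, \{\text{pens}, \text{scissors}\}, \{\text{scissors}, \text{phone}\}, \{\text{pens}, \text{phone}\}, \{\text{pens}, \text{scissors}, \text{phone}\} \rbag$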
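
The generator for the knapsack example can be sketched in a few lines of Python. This is only an illustration, not the Coq development: the dictionary `ITEMS` and the function name `generate` are hypothetical, and the bag of candidates is simply represented as a list.

```python
from itertools import combinations

# Hypothetical encoding of the items: name -> (value in yen, volume in dl).
ITEMS = {"pens": (200, 1), "scissors": (350, 1), "phone": (5000, 2)}

def generate(items):
    """Return the bag (here: a list) of all selections of `items`."""
    names = list(items)
    return [frozenset(chosen)
            for size in range(len(names) + 1)
            for chosen in combinations(names, size)]

candidates = generate(ITEMS)
```

For n items the bag holds 2^n candidates, which is why the naive program is exponential.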

GTA Programming: Tester

Choose an adequate tester

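- it removes invalid candidates according to a predicate

- it should be based on a homomorphism: the predicate has the form ok ∘ hom

The Knapsack problem

- T is based on the predicate:

$p \equiv s \mapsto \sum_{i \in s} \mathit{volume}(i) \le 3$

- $T\ \lbag \{\}, \{\text{pens}\}, \{\text{scissors}\}, \{\text{phone}\}, \{\text{pens}, \text{scissors}\}, \{\text{scissors}, \text{phone}\}, \{\text{pens}, \text{phone}\}, \{\text{pens}, \text{scissors}, \text{phone}\} \rbag = \lbag \{\}, \{\text{pens}\}, \{\text{scissors}\}, \{\text{phone}\}, \{\text{pens}, \text{scissors}\}, \{\text{scissors}, \text{phone}\}, \{\text{pens}, \text{phone}\} \rbag$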

GTA Programming: Tester

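For better performance

$p \equiv s \mapsto \bigoplus_{i \in s} \mathit{volume}'(i) \le 3$

where

- $\mathit{volume}'(i) = \min(\mathit{volume}(i),\ 3 + 1)$

- $x \oplus y = \min(x + y,\ 3 + 1)$

Capping at 3 + 1 keeps the range of the sum finite: {0, ..., 4}.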
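
The capped sum can be checked on a small scale in Python. This is a sketch under stated assumptions: the cap 3 + 1 acts as an upper bound, so it is written with `min`; the names `capped_volume`, `oplus`, `p_plain` and `p_capped` are hypothetical.

```python
from functools import reduce
from itertools import product

CAPACITY = 3
CAP = CAPACITY + 1          # every total above the capacity collapses to CAP
M = range(CAP + 1)          # the finite carrier {0, ..., 4}

def capped_volume(v):
    return min(v, CAP)

def oplus(x, y):
    """Capped addition on M, with 0 as the unit."""
    return min(x + y, CAP)

def p_plain(volumes):
    return sum(volumes) <= CAPACITY

def p_capped(volumes):
    return reduce(oplus, map(capped_volume, volumes), 0) <= CAPACITY

# oplus is associative on M, so (M, oplus, 0) is a finite monoid.
associative = all(oplus(oplus(x, y), z) == oplus(x, oplus(y, z))
                  for x, y, z in product(M, repeat=3))
# Both predicates agree on all short lists of small volumes.
agree = all(p_plain(vs) == p_capped(vs)
            for n in range(4) for vs in product(range(6), repeat=n))
```

The predicate is unchanged, but its homomorphism now has a range of only five elements, which is what makes the later table-based program small.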

GTA Programming: Aggregator

Choose an adequate aggregator

It produces a summary of the valid candidates

The Knapsack problem

Examples:

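- the best candidate

- the two best candidates

- the number of candidates

- . . .

$A\ \lbag \{\}, \{\text{pens}\}, \{\text{scissors}\}, \{\text{phone}\}, \{\text{pens}, \text{scissors}\}, \{\text{scissors}, \text{phone}\}, \{\text{pens}, \text{phone}\} \rbag = (\{\text{scissors}, \text{phone}\},\ ¥5350)$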
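
As an illustration, the "best candidate" aggregator can be sketched in Python over the seven valid candidates. The names `ITEMS`, `VALID`, `value` and `aggregate` are hypothetical, and the bag is again a plain list.

```python
# Hypothetical encoding of the items: name -> (value in yen, volume in dl).
ITEMS = {"pens": (200, 1), "scissors": (350, 1), "phone": (5000, 2)}

# The valid candidates left by the tester (everything except the full selection).
VALID = [frozenset(), frozenset({"pens"}), frozenset({"scissors"}),
         frozenset({"phone"}), frozenset({"pens", "scissors"}),
         frozenset({"scissors", "phone"}), frozenset({"pens", "phone"})]

def value(selection):
    """Total value of a selection."""
    return sum(ITEMS[name][0] for name in selection)

def aggregate(candidates):
    """Summarise the bag by its most valuable candidate and that value."""
    best = max(candidates, key=value)
    return best, value(best)

best_selection, best_value = aggregate(VALID)
```

Other aggregators from the list above only change the summary, for example `len(VALID)` for the number of candidates.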

GTA Programming

- The naive program G T A has complexity $O(2^n \cdot n)$, where n is the number of items

- with the GTA framework: $O(n)$

Two transformations:

- Filter embedding: G T A ⇒ G A

- Semiring fusion: G A ⇒ D&C
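
To make the effect of the two transformations concrete, here is a hedged Python sketch for the knapsack instance. It is not the Coq development: it assumes the max-plus semiring as aggregator (so it returns only the best value, not the selection), represents an element of the lifted semiring as a list indexed by the capped volume 0..4, and all names (`naive`, `leaf`, `otimes`, `fused`) are hypothetical.

```python
from functools import reduce
from itertools import combinations

NEG_INF = float("-inf")     # zero of the max-plus semiring
ITEMS = [("pens", 200, 1), ("scissors", 350, 1), ("phone", 5000, 2)]
CAPACITY = 3
CAP = CAPACITY + 1          # capped volumes range over 0..CAP

def naive(items):
    """Generate all selections, test the volume, aggregate the best value."""
    best = NEG_INF
    for k in range(len(items) + 1):
        for sel in combinations(items, k):
            if sum(vol for _, _, vol in sel) <= CAPACITY:
                best = max(best, sum(val for _, val, _ in sel))
    return best

def leaf(item):
    """Table for one item: either left out (volume 0) or taken."""
    _, val, vol = item
    table = [NEG_INF] * (CAP + 1)
    table[0] = 0
    m = min(vol, CAP)
    table[m] = max(table[m], val)
    return table

def otimes(f, g):
    """Lifted multiplication: combine two tables by ranging over M x M."""
    h = [NEG_INF] * (CAP + 1)
    for m1, a in enumerate(f):
        for m2, b in enumerate(g):
            m = min(m1 + m2, CAP)
            h[m] = max(h[m], a + b)
    return h

UNIT = [0] + [NEG_INF] * CAP   # table of the empty selection

def fused(items):
    """One pass over the items, then keep the entries that pass the test."""
    table = reduce(otimes, map(leaf, items), UNIT)
    return max(table[:CAPACITY + 1])

result_naive = naive(ITEMS)
result_fused = fused(ITEMS)
```

Because `otimes` is associative, the `reduce` could equally be evaluated as a divide-and-conquer over chunks of the item list; each combination costs a constant amount of work for a fixed capacity.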

A verified version of GTA

The Goal

- to allow the user to write GTA programs

- type class mechanism to automatically derive an efficient divide-and-conquer program

- prove the fusion theorems in Coq

- extraction towards:

  - OCaml + BSML

  - Scala + Spark

Joint work with

- Kento Emoto (Kyushu Institute of Technology)

- Julien Tesson (LACL, Université Paris-Est Créteil)

- Frédéric Dabrowski (LIFO, Université d'Orléans)

Status

This is ongoing work!

✓ to allow the user to write GTA programs

✓ type class mechanism to automatically perform semiring fusion and filter-embedding fusion

✗ prove the fusion theorems in Coq

✗ type class mechanism to automatically derive a parallel BSML program from a GTA specification

Writing GTA programs and automatic fusion in Coq

DEMO

GTA in Coq

We had to define data structures, for example bags.

- the multiset in the Coq standard library is not abstract: it is a specific implementation that uses functions to represent multisets

- the multiset axiomatisation in the CoLoR library is nice but does not use type classes

How to axiomatise:

- try to find a minimal axiomatisation so that its realisation will not be too complicated

- then derive additional results from this axiomatisation

Automatic parallelisation

Basically we need:

- an automatic way to derive parallel versions of homomorphisms

- but in a generalised setting with user-defined equivalence relations rather than Coq's = equality

⇒ ongoing work
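
Why an equivalence relation is needed can be seen on a toy example, sketched here in Python rather than Coq. The assumptions are loud ones: bags are represented as lists, `union` is a hypothetical bag union that appends the shorter list to the longer one, and `bag_equiv` is the user-defined equivalence (equality up to permutation). The sequential and the divide-and-conquer evaluations then give different lists that represent the same bag.

```python
from functools import reduce

def union(a, b):
    """Bag union on list representations; associative only up to permutation."""
    return a + b if len(a) >= len(b) else b + a

def bag_equiv(a, b):
    """User-defined equivalence: same elements with the same multiplicities."""
    return sorted(a) == sorted(b)

def hom_seq(f, op, unit, xs):
    """Sequential evaluation of the homomorphism, left to right."""
    return reduce(op, map(f, xs), unit)

def hom_dc(f, op, unit, xs):
    """Divide-and-conquer evaluation; the two halves are independent."""
    if not xs:
        return unit
    if len(xs) == 1:
        return f(xs[0])
    mid = len(xs) // 2
    return op(hom_dc(f, op, unit, xs[:mid]), hom_dc(f, op, unit, xs[mid:]))

def single(x):
    return [x]

xs = [3, 1, 2]
seq = hom_seq(single, union, [], xs)
dc = hom_dc(single, union, [], xs)
```

With plain equality the two results would count as different, so a correctness statement for the parallel version has to be phrased with respect to `bag_equiv`.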

Proof of the fusion theorem

(Excerpt from [1], p. 267)

Proof. The following calculation combines previous observations and definitions to show the claimed identity.

$$
\begin{aligned}
& \mathit{aggregate}\ (\mathit{filter}\ (\mathit{ok} \circ \mathit{hom})\ \mathit{ls}) \\
={} & \quad \{\ \text{Partition, individual aggregation}\ \} \\
& \bigoplus_{m \in M,\ \mathit{ok}\ m = \mathit{True}} \mathit{aggregate}\ (\mathit{filter}\ ((m \equiv) \circ \mathit{hom})\ \mathit{ls}) \\
={} & \quad \{\ \text{Definition of } f_{\mathit{ls}} \text{, and Definition 7}\ \} \\
& \mathit{postprocess}_M\ \mathit{ok}\ f_{\mathit{ls}} \\
={} & \quad \{\ \text{Definition 6, homomorphic equations for } f_{\mathit{ls}}\ \} \\
& \mathit{postprocess}_M\ \mathit{ok}\ (\mathit{aggregate}_M\ \mathit{ls})
\end{aligned}
$$

Our main result combines the theorems from Sections 4 and 5. It allows, under certain conditions, to transform specifications in GTA form into efficient parallel algorithms.

Main Theorem 3 (Filter-embedding Semiring Fusion). Given a set $A$, a finite monoid $(M, \odot)$, a monoid homomorphism $\mathit{hom}$ from $([A], +\!+)$ to $(M, \odot)$, a semiring $(S, \oplus, \otimes)$, a semiring homomorphism $\mathit{aggregate}$ from $(\lbag [A] \rbag, \uplus, \times_{+\!+})$ to $(S, \oplus, \otimes)$, a function $\mathit{ok} : M \to \mathit{Bool}$, and a polymorphic semiring generator $\mathit{generate}$, the following equation holds:

$$
\begin{aligned}
& \mathit{aggregate} \circ \mathit{filter}\ (\mathit{ok} \circ \mathit{hom}) \circ \mathit{generate}_{\uplus,\ \times_{+\!+}}\ (\lambda x \to \lbag [x] \rbag) \\
={} & \mathit{postprocess}_M\ \mathit{ok} \circ \mathit{generate}_{\oplus_M,\ \otimes_M}\ (\lambda x \to \mathit{aggregate}_M\ \lbag [x] \rbag)
\end{aligned}
$$

Proof. Combining Theorems 1 and 2.

Filter-embedding Semiring Fusion is not restricted to parallel algorithms. It can be used to calculate efficient programs from specifications that use arbitrary polymorphic semiring generators.

It is worth noting that it is possible to remove the finiteness requirement for monoids and define a lifted semiring of finite mappings of unbounded and unknown size. We require the finiteness only in order to be able to describe the complexity of the resulting parallel algorithms more accurately.

If the generator happens to be a list homomorphism, like sublists, then associativity of list concatenation allows the resulting program to be executed in parallel by distributing the input list evenly among available processors. The complexity of a derived program using sublists as generator is linear in the size of the input list and quadratic in the size of the range $M$ of the homomorphic predicate because the semiring multiplication of the lifted semiring $S^M$, which is used to combine all list elements, can be implemented by ranging over $M \times M$.

6 A More Complex Application

In this section we describe how to use our framework to derive an efficient parallel implementation for a practical problem in statistics. We further demonstrate how to extend the derived basic program incrementally.

Proof of the fusion theorem

Currently: the finiteness condition has been removed, so

- we need to deal with a Map data structure instead of tuples

- the proofs of associativity and distributivity for the semiring are difficult and not yet done

- the user may not be aware that the finiteness optimisation is necessary to obtain an efficient version

We will consider another version: M is finite
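
The difference between the two versions can be illustrated with a small Python sketch. It assumes the max-plus semiring and represents an element of the lifted semiring as a dictionary (standing in for a Map) in which absent keys mean the semiring zero. The item list and the names `leaf`, `otimes_map`, `run` and `postprocess` are hypothetical.

```python
from functools import reduce

CAPACITY = 3
CAP = CAPACITY + 1
ITEMS = [(10 * i, i) for i in range(1, 11)]   # hypothetical (value, volume) pairs

def leaf(value, m):
    """Mapping for one item: left out (key 0) or taken (key m)."""
    table = {0: 0}
    table[m] = max(table.get(m, value), value)
    return table

def otimes_map(f, g, mop):
    """Lifted multiplication on finite mappings; `mop` is the monoid operator."""
    h = {}
    for m1, a in f.items():
        for m2, b in g.items():
            m = mop(m1, m2)
            h[m] = max(h[m], a + b) if m in h else a + b
    return h

def run(mop):
    """Combine all items; mop(vol, 0) normalises a volume into the monoid."""
    leaves = [leaf(val, mop(vol, 0)) for val, vol in ITEMS]
    return reduce(lambda f, g: otimes_map(f, g, mop), leaves, {0: 0})

def postprocess(table):
    """Keep the entries whose key passes the test and take the best value."""
    return max(v for m, v in table.items() if m <= CAPACITY)

plain = run(lambda x, y: x + y)               # unbounded monoid (int, +)
capped = run(lambda x, y: min(x + y, CAP))    # finite monoid {0, ..., CAP}
```

Both mappings give the same answer after post-processing, but the one built with the unbounded monoid has one key per reachable total volume, whereas the capped one never exceeds `CAP + 1` keys and could be stored in a fixed-size tuple.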

Finite and tuples with dependent types?

DEMO

Summary

GTA framework:

- usable for writing and deriving sequential functions

- soon able to derive parallel BSML versions

- not yet completely dependable (some proofs are still admitted) ... but hopefully soon

In the future:

- Scala extraction for Cloud Computing

- additional examples and a library of generators/testers

- . . .

Thank you!

Coming soon

- https://traclifo.univ-orleans.fr/SYDPACC

- packages for the OPAM (OCaml) package manager

Bibliography

[1] K. Emoto, S. Fischer, and Z. Hu. Generate, Test, and Aggregate – A Calculation-based Framework for Systematic Parallel Programming with MapReduce. In ESOP, volume 7211 of LNCS, pages 254–273. Springer, 2012. doi:10.1007/978-3-642-28869-2_13.

[2] K. Emoto, S. Fischer, and Z. Hu. Filter-embedding semiring fusion for programming with MapReduce. Formal Asp. Comput., 24(4-6):623–645, 2012. doi:10.1007/s00165-012-0241-8.
