98
Bracmat A guided tour Bart Jongejan 2013

Bracmat

  • Upload
    elton

  • View
    29

  • Download
    3

Embed Size (px)

DESCRIPTION

Bracmat. A guided tour Bart Jongejan 2013. The name Applications Core Methods Why Bracmat? Code examples Documentation Development Download Finale. 1741 Country on planet Nazar, inhabited by juniper trees with good facilities for astronomy, transcendental philosophy and mining. - PowerPoint PPT Presentation

Citation preview

Page 1: Bracmat

Bracmat

A guided tourBart Jongejan

2013

Page 2: Bracmat

The nameApplications

Core MethodsWhy Bracmat?

Code examplesDocumentation

DevelopmentDownload

Finale

Page 3: Bracmat

Bracmat(brachiat. – w. branches)

1741Country on planet Nazar, inhabited by juniper trees with good facilities for astronomy, transcendental philosophy and mining.Niels Klim, by Ludvig Holberg (1684-1754).

2013Software for analysis and transformation of uncharted and complex data.

Page 4: Bracmat

Examples of Applications

HTML cleaningvalidation of text corporaextraction of tabular data from textsemantic analysis of textautomatic workflow creationcomputer algebrainvestigation of email chains

Page 5: Bracmat

HTML cleaning

ensure standard header and footercheck linksadd closing tagswarn if element not allowed in contextremove or translate disallowed

attributestranslate deprecated elements (font,

center)remove redundant elements (small big)

Page 6: Bracmat

Validation of text corpora

Dutch corpora (the Netherlands/Flanders):

CGN(2006), MWE (2007), D-COI (2008), DPC (2010), Lassi (2011), SoNaR (2012)

XML wellformedness, tag usage, sampling, visualisation for manual tasks, statistics, tabular parts of reports.

Page 7: Bracmat

Data extraction from text

Page 8: Bracmat
Page 9: Bracmat

"Skal jeg tisse mere af diabetes?"(“Do I have to urinate more because of

diabetes?”)

First: tokenizer, tagger (opennlp), parser (mate-tools)

Then: using patterns, find relation and concepts in parse tree. Result:

Semantic analysis

Polyuri DUE TO diabetes mellitus

Page 10: Bracmat

Automatic Workflow Creation

Page 11: Bracmat

Computer algebra

Face tracker: frame-by-frame video analysis

Head gestures: velocity, acceleration∝ muscle forcey: pixel position

x: frame #

a: head position

b: head velocity

c: head accelerati

on

Page 12: Bracmat

Computer algebra

Solve three equations → acceleration( c. ( -1*St2^3 + 2*St*St2*St3 + St2*St4*period + -1*St^2*St4 + -1*St3^2*period ) ^ -1* ( -1*Sh*St2^2 + Sh*St*St3 + St*St2*Sth + St2*St2h*period + -1*St3*Sth*period + -1*St^2*St2h ))

return ( Sh*(St*St3 - St2*St2) + Sth*(St*St2 - St3*period) + St2h*(St2*period - St*St) )/ ( St2*(2*St*St3 - St2*St2 + St4*period) - St*St*St4 - St3*St3*period );

Bracmat solution Java code

Page 13: Bracmat

Received: from [192.38.108.156] (unknown [192.38.108.156])by mailgate.sc.ku.dk (Postfix) with ESMTP;Thu, 11 Oct 2012 16:28:59 +0200 (CEST)

From: [email protected]: hans Keller <[email protected]>Date: Thu, 11 Oct 2012 16:28:59 +0200MIME-Version: 1.0Subject: Bracmat, GitHubCC: [email protected]: <[email protected]>X-Confirm-Reading-To: [email protected]: 1Priority: normalX-mailer: Pegasus Mail for Windows (4.63)Content-type: Multipart/Alternative; boundary="Alt-Boundary-3336.14371857"

--Alt-Boundary-3336.14371857

Email chains ( "Bracmat, GitHub" , "Bart Jongejan" , 2012 10 11 14 28 59 200 , (.) ) ( Bracmat , "Bart Jongejan" , 2012 10 11 18 8 17 200 , ( . "Re: Bracmat" , "Hans Keller" , 2012 10 12 6 45 15 200 , ( . "Re: Bracmat"

Page 14: Bracmat

Core methods

composition

normalization

pattern matching

procedural logic

Page 15: Bracmat

complex expression

CompositionCompose complex expressions from

simpler ones.

binary operator

anotherexpressionexpression

Page 16: Bracmat

NormalizationAutomatically derive canonical

expressions from unnormalized ones.

arbitrary expression

canonical expression

Page 17: Bracmat

Deconstruct complex expressions into simpler ones using pattern matching.

Pattern matching

complex expression

? ?

pattern

simple expressions

Page 18: Bracmat

Procedural logic

complex expression

? ?

pattern

& )

|

(

Page 19: Bracmat

WHY?How does a test particle move, given a set of basis vectors and a specific metric?

→ symbolic algebra

Page 20: Bracmat
Page 21: Bracmat

Symbolic manipulations easy, but MANY.

Pen and paper: doubts about correctness.

Computer: no errors.

1986: First version of Bracmat composes and normalises algebraic expressions.

1988: Pattern matching and procedural logic

Page 22: Bracmat

All Bracmat expressions are binary trees:

x^

2

+

a y

*

^3

x ^2 + a * y ^3)(

Page 23: Bracmat

Code examples

Page 24: Bracmat

1+23a+a+a3*ab+aa+b

{?}{!}{?}{!}{?}{!}

keyboard inputprompt

answer follows

answer

concise

canonical order

“a” is a symbol, not a

variable

3, 3*a and a+b are canonical forms of1+2, a+a+a and b+a, respectively.

non standard order

Page 25: Bracmat

Operators (initially):* multiplication+ addition^ exponentiation\L taking a logarithm\D taking a derivative

NO operators for subtraction and division:

a - b = a+(-1*b)a / b = a*(b^-1)

Page 26: Bracmat

Bracmat expressions autonomously seek toward stable states.Comparison: garbage falling on dump.Small things slide down through the voids.

Chemicals interact.Fumes disappear.Finally all is quiet.

This is the “Normal state”.

Page 27: Bracmat

Landfill expression:

landfill=ashtray+5*bag+barbie+12*bottle+9*cork+stone+television

Truck’s contents:

truck=apple+3*bag+paper+phone

Emptying the truck in the landfill:

(!landfill + !truck) : ?landfill

Landfill’s new stable state after a while:

apple+ashtray+8*bag+barbie+12*bottle+9*cork+paper+phone+stone+television

Page 28: Bracmat

Landfill: not nice, but unwieldy & repulsive.Good News: there are gems in the landfill.

If Hengki wants to obtain gems, he needs to:

recognise valuableitems

and pick up thosevaluable itemsJonathan McIntosh, 2004

Page 29: Bracmat

Hengki’s program !landfill: ?junk

+ ?n*((ken|barbie):?gem)

+ ?morejunk

&

& !junk+!morejunk+(!n+-1)*!gem : ?landfill

|

scan the

landfill

doll patter

n

if you see doll, take

it

after doll seen, go on with next step

add doll to H.’s

possessions

and don’t return it to the landfill

if no doll seen, landfill and Hengki’s possessions remain

unchanged

!HengkiStuff+!gem:?HengkiStuff

most of it

Page 30: Bracmat

Four new binary operators:

= : & |

and two prefixes: ?

!

bind rhs to symbol on lhsmatch lhs (subject) with rhs

(pattern)do rhs if lhs succeeds

do rhs if lhs fails

capture a value and bind it to the adjacent symbol.

produce the value that is bound to the adjacent

symbol

Page 31: Bracmat

= : & | and \D evaluated away (normally).Dynamic forces that shake and break rubble.

, and . do always persist through evaluation.Residual forces that keep things in place.

Whitespace + * ^ and \L can persist, e.g.: y x → y x y+x → x+y y*x → x*yBut: "" a, 0+a, 1*a → a, a, a

Page 32: Bracmat

Examples of data structures that don’t change when (re)evaluated.x^2,y^2,100(.1 0 0)(.0 0 -1)(.0 1 0)BUT:(1 0 0)(0 0 -1)(0 1 0) → 1 0 0 0 0 -1 0 1 0Because blank, comma and dot are binary operators, this sentence is a perfect Bracmat expression.

3 algebraic expressions separated by

commas9 numbers in a matrix

Lists built with whitespace, + and

* are always flattened!

Page 33: Bracmat

{?} Because blank, comma and dot are binary operators, the sentence you are reading is a perfect Bracmat expression.

{!} Because blank , comma and dot are binary operators , the sentence you are reading is a perfect Bracmat expression.

Page 34: Bracmat

Logical expansion of application domain of Bracmat as:

“Software for analysis and transformation of uncharted and complex data.”

textual

Example:Check sentence syntax with Bracmat patterns:

Page 35: Bracmat

(S=!NP !VP)& (NP=!DET !N)& (VP=!V|!V !NP)& (DET=a|the)& (N=woman|man)& (V=shoots|kisses)& ( a man kisses the woman:!S & put$"That's grammatical!\n" | put$"not grammatical\n" )

non-terminals

terminals

rule application

screen output if success

screen output if failure

Page 36: Bracmat

Operator $ applies function to argument.

Only few built-in functions, e.g.:

get

put

lst

str

Function application:

str$(I m p l o d e) → Implode

write a result to file, screen or string

concatenate a tree into a single string

get input from file, keyboard or string

serialize a variable to file, screen or

string

Page 37: Bracmat

Define your own functions.

E.g. syntax checker:

check= S NP VP DET N V. (S=!NP !VP) & (NP=!DET !N) & (VP=!V|!V !NP) & (DET=a|the) & (N=woman|man) & (V=shoots|kisses) & !arg:!S

= only evaluates lhs.

before dot: declaration of local

variables

'check' succeeds if match ok

after dot: function body

.

Page 38: Bracmat

Call check with a sentence as argument:

{?} check$(a woman shoots)&okay|no{!} okay{?} check$(a man a man shoots)&T|F{!} F

Page 39: Bracmat

PARSE TREE

concept 1

concept 2

relation (‘attribute’)

( ROOT. (VERB.Skal.skal) (subj.PRON.jeg.jeg) ( vobj . (VERB.tisse.tisse) ( dobj . (ADJ.mere.mere) ( pobj . (ADP.af.af) (nobj.NOUN.diabetes.diabetes)

) ) ) (pnct.X."?"."?"))

Page 40: Bracmat

location of

concept 2

relation (‘attribute’)

location of concept 1 (fragmented)

Relation Pattern

Combine fragmen

ts

[…] | (its.hasTree) $ ( !arg . ( = (VERB.?.skal|skulle) ? ( vobj . ((VERB.?.?):?a) ( dobj . ?b (pobj.(ADP.af.af) ?LC2)

) ) ) ) & !a (dobj.!b):?LC1 )& "DUE TO"

Whatever matches this

Why “!a” ?

… must also match this.

Page 41: Bracmat

(its.hasTree)$ (!LC1 . ( = (VERB.?.tisse) (dobj.(ADJ.?.mere) ?) ) )

→ (28442001.Polyuri)

concept 1

Concept 1 pattern

Page 42: Bracmat

concept 2

Concept 2 pattern

(its.hasLemma)$ (!LC2.(=sukkersyge ?|diabetes ?))

→ (73211009."diabetes mellitus ")

Page 43: Bracmat

(attribute."DUE TO")

( concept1. "Clinical Finding". 28442001.Polyuri)

( concept2. "Clinical Finding". 73211009."diabetes mellitus ")

Page 44: Bracmat

"Is sinning sincere?":?Mytext

& 0:?Bi & @( !Mytext : ? ( %?One %?Two ? & (!One !Two)+!Bi:?Bi & ~ ) )| lst$Bi

Initialise bigram accumulator

String pattern matchin

g

subject pattern

embedded instructions

at least one byte

fail! (backtrack)

accumulate

any number of bytes, even none

Page 45: Bracmat

(Bi= (" " i)+ 2*(" " s)+ (I s)+ (c e)+ (e "?")+ (e r)+ 3*(i n)+ 2*(n " ")+ (n c)+ (r e)+ (s " ")+ 2*(s i));

Page 46: Bracmat

0:?Bi & "из фрагментов текстов":?Mytext

& @( !Mytext : ? ( (%?One & utf$!One) (%?Two & utf$!Two) ? & (!One !Two)+!Bi:?Bi & ~ ) )| lst$Bi

Require UTF-8 character

Page 47: Bracmat

Bi= (" " т)+ (" " ф)+ (а г)+ (в " ")+ (г м)+ (е к)+ (е н)+ (з " ")+ (и з)+ (к с)+ (м е)+ (н т)+ 2*(о в)+ (р а)+ (с т)+ (т е)+ 2*(т о)+ (ф р);

Page 48: Bracmat

Parse 0n1n

Example of recursive pattern.{?} AB= ( "" | 0 !AB 1 )

{?} 0 0 1 1:!AB & good | bad{!} good

left hand side of | is

”nothing”. So this matches zero 0's and

1's.recurse

Page 49: Bracmat

Parse 0n1n2n

{?} AB= ( "":?C | 0 !AB 1 & 2 !C:?C )

{?} ABC=!AB !C

{?} 0 0 1 1 2 2:!ABC & good | bad{!} good

if zero 0's and 1's then also zero

2's

for each nested pair of 0 and 1, add a 2

to Cafter parsing n 0's

and n 1's, C contains n 2's

Page 50: Bracmat

Documentation

http://jongejan.dk/bart/bracmat.html

Most complete documentation.

http://rosettacode.org/wiki/Category:Bracmat

Over 170 examples that can be compared with implementations in other programming languages.

Page 51: Bracmat

Development

Evolution at moderate pace.Great variety of snippets in valid.bra:

guards against unexpected and unwanted behavioural changes and tests all C-code.

Behaviour described in file help (precursor to bracmat.html).

Changes are logged.

Page 52: Bracmat

Download

Open source since 3 June 2003 (GPL).

http://cst.dk/download/bracmat/Source code spanning period 1986-2012.

https://github.com/BartJongejan/BracmatAlways the latest source code.

Page 53: Bracmat

Finale

garbage collection most programming languages

gem collectionBracmat

“Staten bruger dem imidlertid til at undersøge Metalgruberne; thi ligesaa slet som de see hvad der er oven paa Jorden, saa fortreffeligen see de det der er inden i den.”

Page 54: Bracmat

Finale

In Bracmat, trees are first class citizens.

Trees have autonomous behaviour.

Using pattern matching, the State controls trees.

The State itself consists of trees.

Page 55: Bracmat

Finale

Ease of use, clarity and expressive power of Bracmat’s patterns can compete with RE, SQL, XQuery and Prolog.

Pattern matching as a primitive is strangely absent in popular programming languages, having died out with Snobol (1962 ~1990).

Modelling data in Bracmat is a small step away from understanding and controlling it.

Page 56: Bracmat

Contents Part 2

Bracmat expressionsExpression evaluation

Bracmat in use

Page 57: Bracmat

Bracmat expressions

Code and DataNumbersStringsListsStructuresBooleansFunctionsSpecial symbols

ArraysObjectsHash Tables

Page 58: Bracmat

Code and Data

No sharp distinction:

"I’m stable" → "I’m stable"997^1/2 → 997^1/2(not so) stable → not so stable998^1/2 → 2^1/2*499^1/2i*i:>0&pos|"not pos" → "not pos"

Page 59: Bracmat

Numbers

Arbitrary-precision arithmetic.Rational numbers.No floating point!

2/3 + -1/6 → 1/21/99+-1/100+1/101 → 10001/9999002^216091+-1 → 746093103…815528447 (65050 digits, 31st Mersenne prime)

Page 60: Bracmat

Strings

Strings cannot contain null-bytes, otherwise no restrictions.

A

ру́lсский

"2^216091+-1"

"A string can extend overmultiple lines"

Page 61: Bracmat

Lists

Sums, products and sentences.

x + 4 + a + 8p * q * zThis list has five elements

Lists inside lists:a + 5 * e ^ (i * pi + x) + b

Page 62: Bracmat

Lists (continued)

a + b : ?x + b + ?y { y := 0 }a * b : ?x * b * ?y { y := 1 }a b : ?x b ?y { y := ""}

0, 1 and the empty string "" are identity (or neutral) elements in sums, products and sentences respectively.

Page 63: Bracmat

Structures

(uni . "Københavns Universitet")(institut . CST)(publications.(2011. pub1 pub2) (2012. pub3 pub4) (2013. pub5 pub6 pub7 ))

Page 64: Bracmat

Booleans

There is no separate boolean type.Each node in a Bracmat expression

has a success/failure status flag.

a success~ failure1+2 successful evaluation to 3a:b failing match operation

Page 65: Bracmat

Functions

(=a b. !arg:(?a.?b) & (!b.!a))

$ (jeg.går)

→ går.jeg

local variablesparameter

returned value

function application

argument

result

Page 66: Bracmat

Special symbols

e Euler’s constant, the basis of the natural logarithm, 2.7182…

x \D (e ^ x) → e ^ xi unit imaginary number

i * i → -1pi the ratio of a circle's

circumference to its diameter e ^ (i * pi) → -1

Page 67: Bracmat

Arrays

An array “A” is not a variable, it is the stack of all variables named “A”.

tbl$(A,100) create array A size 100117:?(41 $ A) assign to element A[41]!(41$A) inspect element A[41]tbl$(A,0) delete array A

Page 68: Bracmat

Objects

Objects are the only Bracmat expressions that can change.

(language=(iso639=) (spkrs=))& new$language:?Danish& da:?(Danish..iso639)& 6M:?(Danish..spkrs)

Page 69: Bracmat

Hash Tables

built-in object typeeffectively store&search key-value pairs.

new$hash:?H& (H..insert)$(Danmark.!Danish)& (H..find)$Danmark

→ Danmark.(=(iso639=da) (spkrs=6M))

Page 70: Bracmat

Expression Evaluation

DefinitionProgram flowPattern matchingMacro evaluationλ – calculusNormalization

Page 71: Bracmat

Definition

Right hand side of = operator is not evaluated.

J = ? ?#x ?;

double=.!arg+!arg;double$7 → 14(=.!arg+!arg)$7 → 14

pattern definition

function definition

anonymous function

Page 72: Bracmat

Program flowEvaluation from left to right, depth first.

firstthis & thenthis ifnotthis | thenthis

whl'(body)

fun$arg (or fun'arg)

!subroutine

Page 73: Bracmat

Pattern matching

subject : pattern

Pattern doesn’t evolve, side effects possible. Primary result: subject/success or failure.& operator: escape to normal evaluation.@ prefix indicates string pattern matching.

Normal evaluator

Pattern matching evaluator

Page 74: Bracmat

@( kabbatus : ? (?%x & rev$!x:?y) !y ? )

string pattern matching

string subject

escape to

normal evaluatio

n

side effect: assignment

embedded pattern

matching operationnormally evaluated

Page 75: Bracmat

No regular expressions. Instead, use @(string:pattern)

or (tree:pattern)

regex pattern

nesting & recursion

no/yes yes

named variables no/yes yesnon-string subject no yesgreedy yes/

nono

Page 76: Bracmat

Regex: DO {stay in string world}UNTIL (regex clear as mud)THEN use other tool

Bracmat: tokenize input string → treeWHILE(more and more interesting)

{ pattern match tree & make more interesting tree }

A list, to begin with

XPathXQuer

ySQLLINQCQL

Page 77: Bracmat

Macro evaluation

Replace variable by the value of the variable.

X=6;i=0;mltpls=; ' ( whl ' ( !i+1:<10:?i & !mltpls ""$X*!i:?mltpls ) ): ?code;

Lhs: empty string

Lhs: empty string

Page 78: Bracmat

Macro after evaluation.

lst$code;

(code= = whl ' ( !i+1:<10:?i & !mltpls 6 *!i:?mltpls ) );

' replaced by =

""$X replaced by 6

Page 79: Bracmat

Macros

(1) speed up performancetight loopspattern matching over long lists,

(2) incrementally build the ”Greatest Common Pattern”

that matches a given set of data structures. (Tree kernel)

(3) create or change code at run time

Page 80: Bracmat

λ - calculus

The lambda abstraction

(λx.x)y

translates to

/('(x.$x))$y

For factorial and Fibonacci the hard way, see

http://rosettacode.org/wiki/Y_combinator#Bracmat

Page 81: Bracmat

Normalization

. , WS + * ^ \L

flatten

identity "" 0 1

sort

combine

Page 82: Bracmat

Normalization: flatten

(a,b,c),(d,e),f,g → a,b,c,d,e,f,g , , , , → a ,a , , , b , b c d e f g c , d , e , f g

Because of associativity

Page 83: Bracmat

Normalization: identity

"" a b c "" "" d "" → a b c d

0+a+b+c+0+0+d+0 → a+b+c+d

1*a*b*c*1*1*d*1 → a*b*c*d

Because "" 0 and 1 are neutral elements

Page 84: Bracmat

Normalization: sort

f+a+b+g+c+e+d → a+b+c+d+e+f+g f*a*b*g*c*e*d → a*b*c*d*e*f*g e*a*b*c*d*f*g

3*b*c+a*b+a*e^(k+j)*i↓

a*b+3*b*c+i*e^(j+k)*a

Because + and * are commutative

Page 85: Bracmat

Normalization: combine

eat+the+the+zuppe → eat+2*the+zuppe

2*a*b+(a+-1*b)^2 → a^2+b^2

Because multiplication is distributive over

addition

Page 86: Bracmat

Bracmat in useSurface AppearanceCreate a new programCanonical lay-outedit-save-reformat-reopen loopRequirementsBuildRun codeMemoryUnicode supportSGML, XML, HTML support

Page 87: Bracmat

Surface Appearance

Operators =.,|&: +*^\L\D'$_

Prefixes [~/#<>%@`?! and !!. Also -

Parentheses and quotes override default.

Comments: { yes, not stored in memory! }

Layout: free, convert to canonical with lst

white space!

Page 88: Bracmat

Create a new program

(1) From scratch. Easy: no boiler plate.

(2) Wizzard project.bra creates skeleton:description,program stub,facility to help you keep your code nicely formatted.

Page 89: Bracmat

Canonical lay-out

Principles:- reasonable margins- indented- pair of parentheses aligned (horizont|vertic)ally- heading operator aligned w. parentheses

Some coding errors become conspicuous.

Page 90: Bracmat

edit-save-reformat-reopen loop

Use combination of terminal window or DOS-prompt and auto-reloading text editor.

Edit your code, save, type !r in Bracmat, (auto)reload program in editor.

Page 91: Bracmat

{?} (diet =. !arg:? (fish|meat|chicken) ? & carnivore |

{1} !arg:? (yoghurt|milk|cheese) ? & veggie

{1} | !arg:? (fruit|nuts) ? & vegan | dontknow

{1} )

{!} diet

S 0,00 sec

{?} lst$diet

Page 92: Bracmat

(diet=

. !arg:? (fish|meat|chicken) ? & carnivore | !arg : ? (yoghurt|milk|cheese) ? & veggie | !arg:? (fruit|nuts) ?&vegan | dontknow);{!} diet

Page 93: Bracmat

Requirements

C-compiler with standard libraries32-bit or 64-bit OSbracmat.c and xml.c (< 20 000 lines in

all)nothing more

Page 94: Bracmat

Build

E.g. compile and link with GNU-C:gcc -std=c99 -pedantic -Wall -O3 -static -DNDEBUG -o bracmat bracmat.c xml.c

Windows, dynamically linked, 32 bit: 100Kb

Linux, statically linked, 64 bit: 1 Mb

Page 95: Bracmat

Run code

w/0 parameters: interactive (REPL) each parameter is evaluated, left to

right.Bracmat as embedded s/w:

JNI – call Bracmat from JavaLinked with C or C++ code, Bracmat can be called and can even call C-functions.

Page 96: Bracmat

Memory

NodesVery compact, 4,8,12,16, … bytes per nodeReference counting

Structure sharingUnreferenced nodes freed

Variablesdynamically scoped (lex.bra → lexical)

Page 97: Bracmat

Unicode

UTF8 assumed, fall back to ISO-8859-1Ø ø in Unicode, UTF-8:utf$low$chu$216 → 248

Ø ø in ISO-8859-1:asc$low$chr$216 → 248

upp$ру́lсский → РУ́lССКИЙ

Page 98: Bracmat

SGML, XML, HTML

Robust and fast parsing, done in C. Transformation to Bracmat expression. No nesting of elements. Separate step.XML/HTML entities → UTF-8

get$("<p class='Q' enabled >Sams&oslash;<br/</p>",MEM,HT,ML)

→ (p.(class.Q) (enabled.)) Samsø (br.,) (.p.)