Upload
elton
View
29
Download
3
Embed Size (px)
DESCRIPTION
Bracmat. A guided tour Bart Jongejan 2013. The name Applications Core Methods Why Bracmat? Code examples Documentation Development Download Finale. 1741 Country on planet Nazar, inhabited by juniper trees with good facilities for astronomy, transcendental philosophy and mining. - PowerPoint PPT Presentation
Citation preview
Bracmat
A guided tourBart Jongejan
2013
The nameApplications
Core MethodsWhy Bracmat?
Code examplesDocumentation
DevelopmentDownload
Finale
Bracmat(brachiat. – w. branches)
1741Country on planet Nazar, inhabited by juniper trees with good facilities for astronomy, transcendental philosophy and mining.Niels Klim, by Ludvig Holberg (1684-1754).
2013Software for analysis and transformation of uncharted and complex data.
Examples of Applications
HTML cleaningvalidation of text corporaextraction of tabular data from textsemantic analysis of textautomatic workflow creationcomputer algebrainvestigation of email chains
HTML cleaning
ensure standard header and footercheck linksadd closing tagswarn if element not allowed in contextremove or translate disallowed
attributestranslate deprecated elements (font,
center)remove redundant elements (small big)
Validation of text corpora
Dutch corpora (the Netherlands/Flanders):
CGN(2006), MWE (2007), D-COI (2008), DPC (2010), Lassi (2011), SoNaR (2012)
XML wellformedness, tag usage, sampling, visualisation for manual tasks, statistics, tabular parts of reports.
Data extraction from text
"Skal jeg tisse mere af diabetes?"(“Do I have to urinate more because of
diabetes?”)
First: tokenizer, tagger (opennlp), parser (mate-tools)
Then: using patterns, find relation and concepts in parse tree. Result:
Semantic analysis
Polyuri DUE TO diabetes mellitus
Automatic Workflow Creation
Computer algebra
Face tracker: frame-by-frame video analysis
Head gestures: velocity, acceleration∝ muscle forcey: pixel position
x: frame #
a: head position
b: head velocity
c: head accelerati
on
Computer algebra
Solve three equations → acceleration( c. ( -1*St2^3 + 2*St*St2*St3 + St2*St4*period + -1*St^2*St4 + -1*St3^2*period ) ^ -1* ( -1*Sh*St2^2 + Sh*St*St3 + St*St2*Sth + St2*St2h*period + -1*St3*Sth*period + -1*St^2*St2h ))
return ( Sh*(St*St3 - St2*St2) + Sth*(St*St2 - St3*period) + St2h*(St2*period - St*St) )/ ( St2*(2*St*St3 - St2*St2 + St4*period) - St*St*St4 - St3*St3*period );
Bracmat solution Java code
Received: from [192.38.108.156] (unknown [192.38.108.156])by mailgate.sc.ku.dk (Postfix) with ESMTP;Thu, 11 Oct 2012 16:28:59 +0200 (CEST)
From: [email protected]: hans Keller <[email protected]>Date: Thu, 11 Oct 2012 16:28:59 +0200MIME-Version: 1.0Subject: Bracmat, GitHubCC: [email protected]: <[email protected]>X-Confirm-Reading-To: [email protected]: 1Priority: normalX-mailer: Pegasus Mail for Windows (4.63)Content-type: Multipart/Alternative; boundary="Alt-Boundary-3336.14371857"
--Alt-Boundary-3336.14371857
Email chains ( "Bracmat, GitHub" , "Bart Jongejan" , 2012 10 11 14 28 59 200 , (.) ) ( Bracmat , "Bart Jongejan" , 2012 10 11 18 8 17 200 , ( . "Re: Bracmat" , "Hans Keller" , 2012 10 12 6 45 15 200 , ( . "Re: Bracmat"
Core methods
composition
normalization
pattern matching
procedural logic
complex expression
CompositionCompose complex expressions from
simpler ones.
binary operator
anotherexpressionexpression
NormalizationAutomatically derive canonical
expressions from unnormalized ones.
arbitrary expression
canonical expression
Deconstruct complex expressions into simpler ones using pattern matching.
Pattern matching
complex expression
? ?
pattern
simple expressions
Procedural logic
complex expression
? ?
pattern
& )
|
(
WHY?How does a test particle move, given a set of basis vectors and a specific metric?
→ symbolic algebra
Symbolic manipulations easy, but MANY.
Pen and paper: doubts about correctness.
Computer: no errors.
1986: First version of Bracmat composes and normalises algebraic expressions.
1988: Pattern matching and procedural logic
All Bracmat expressions are binary trees:
x^
2
+
a y
*
^3
x ^2 + a * y ^3)(
Code examples
1+23a+a+a3*ab+aa+b
{?}{!}{?}{!}{?}{!}
keyboard inputprompt
answer follows
answer
concise
canonical order
“a” is a symbol, not a
variable
3, 3*a and a+b are canonical forms of1+2, a+a+a and b+a, respectively.
non standard order
Operators (initially):* multiplication+ addition^ exponentiation\L taking a logarithm\D taking a derivative
NO operators for subtraction and division:
a - b = a+(-1*b)a / b = a*(b^-1)
Bracmat expressions autonomously seek toward stable states.Comparison: garbage falling on dump.Small things slide down through the voids.
Chemicals interact.Fumes disappear.Finally all is quiet.
This is the “Normal state”.
Landfill expression:
landfill=ashtray+5*bag+barbie+12*bottle+9*cork+stone+television
Truck’s contents:
truck=apple+3*bag+paper+phone
Emptying the truck in the landfill:
(!landfill + !truck) : ?landfill
Landfill’s new stable state after a while:
apple+ashtray+8*bag+barbie+12*bottle+9*cork+paper+phone+stone+television
Landfill: not nice, but unwieldy & repulsive.Good News: there are gems in the landfill.
If Hengki wants to obtain gems, he needs to:
recognise valuableitems
and pick up thosevaluable itemsJonathan McIntosh, 2004
Hengki’s program !landfill: ?junk
+ ?n*((ken|barbie):?gem)
+ ?morejunk
&
& !junk+!morejunk+(!n+-1)*!gem : ?landfill
|
scan the
landfill
doll patter
n
if you see doll, take
it
after doll seen, go on with next step
add doll to H.’s
possessions
and don’t return it to the landfill
if no doll seen, landfill and Hengki’s possessions remain
unchanged
!HengkiStuff+!gem:?HengkiStuff
most of it
Four new binary operators:
= : & |
and two prefixes: ?
!
bind rhs to symbol on lhsmatch lhs (subject) with rhs
(pattern)do rhs if lhs succeeds
do rhs if lhs fails
capture a value and bind it to the adjacent symbol.
produce the value that is bound to the adjacent
symbol
= : & | and \D evaluated away (normally).Dynamic forces that shake and break rubble.
, and . do always persist through evaluation.Residual forces that keep things in place.
Whitespace + * ^ and \L can persist, e.g.: y x → y x y+x → x+y y*x → x*yBut: "" a, 0+a, 1*a → a, a, a
Examples of data structures that don’t change when (re)evaluated.x^2,y^2,100(.1 0 0)(.0 0 -1)(.0 1 0)BUT:(1 0 0)(0 0 -1)(0 1 0) → 1 0 0 0 0 -1 0 1 0Because blank, comma and dot are binary operators, this sentence is a perfect Bracmat expression.
3 algebraic expressions separated by
commas9 numbers in a matrix
Lists built with whitespace, + and
* are always flattened!
{?} Because blank, comma and dot are binary operators, the sentence you are reading is a perfect Bracmat expression.
{!} Because blank , comma and dot are binary operators , the sentence you are reading is a perfect Bracmat expression.
Logical expansion of application domain of Bracmat as:
“Software for analysis and transformation of uncharted and complex data.”
textual
Example:Check sentence syntax with Bracmat patterns:
(S=!NP !VP)& (NP=!DET !N)& (VP=!V|!V !NP)& (DET=a|the)& (N=woman|man)& (V=shoots|kisses)& ( a man kisses the woman:!S & put$"That's grammatical!\n" | put$"not grammatical\n" )
non-terminals
terminals
rule application
screen output if success
screen output if failure
Operator $ applies function to argument.
Only few built-in functions, e.g.:
get
put
lst
str
Function application:
str$(I m p l o d e) → Implode
write a result to file, screen or string
concatenate a tree into a single string
get input from file, keyboard or string
serialize a variable to file, screen or
string
Define your own functions.
E.g. syntax checker:
check= S NP VP DET N V. (S=!NP !VP) & (NP=!DET !N) & (VP=!V|!V !NP) & (DET=a|the) & (N=woman|man) & (V=shoots|kisses) & !arg:!S
= only evaluates lhs.
before dot: declaration of local
variables
'check' succeeds if match ok
after dot: function body
.
Call check with a sentence as argument:
{?} check$(a woman shoots)&okay|no{!} okay{?} check$(a man a man shoots)&T|F{!} F
PARSE TREE
concept 1
concept 2
relation (‘attribute’)
( ROOT. (VERB.Skal.skal) (subj.PRON.jeg.jeg) ( vobj . (VERB.tisse.tisse) ( dobj . (ADJ.mere.mere) ( pobj . (ADP.af.af) (nobj.NOUN.diabetes.diabetes)
) ) ) (pnct.X."?"."?"))
location of
concept 2
relation (‘attribute’)
location of concept 1 (fragmented)
Relation Pattern
Combine fragmen
ts
[…] | (its.hasTree) $ ( !arg . ( = (VERB.?.skal|skulle) ? ( vobj . ((VERB.?.?):?a) ( dobj . ?b (pobj.(ADP.af.af) ?LC2)
) ) ) ) & !a (dobj.!b):?LC1 )& "DUE TO"
Whatever matches this
…
Why “!a” ?
… must also match this.
(its.hasTree)$ (!LC1 . ( = (VERB.?.tisse) (dobj.(ADJ.?.mere) ?) ) )
→ (28442001.Polyuri)
concept 1
Concept 1 pattern
concept 2
Concept 2 pattern
(its.hasLemma)$ (!LC2.(=sukkersyge ?|diabetes ?))
→ (73211009."diabetes mellitus ")
(attribute."DUE TO")
( concept1. "Clinical Finding". 28442001.Polyuri)
( concept2. "Clinical Finding". 73211009."diabetes mellitus ")
"Is sinning sincere?":?Mytext
& 0:?Bi & @( !Mytext : ? ( %?One %?Two ? & (!One !Two)+!Bi:?Bi & ~ ) )| lst$Bi
Initialise bigram accumulator
String pattern matchin
g
subject pattern
embedded instructions
at least one byte
fail! (backtrack)
accumulate
any number of bytes, even none
(Bi= (" " i)+ 2*(" " s)+ (I s)+ (c e)+ (e "?")+ (e r)+ 3*(i n)+ 2*(n " ")+ (n c)+ (r e)+ (s " ")+ 2*(s i));
0:?Bi & "из фрагментов текстов":?Mytext
& @( !Mytext : ? ( (%?One & utf$!One) (%?Two & utf$!Two) ? & (!One !Two)+!Bi:?Bi & ~ ) )| lst$Bi
Require UTF-8 character
Bi= (" " т)+ (" " ф)+ (а г)+ (в " ")+ (г м)+ (е к)+ (е н)+ (з " ")+ (и з)+ (к с)+ (м е)+ (н т)+ 2*(о в)+ (р а)+ (с т)+ (т е)+ 2*(т о)+ (ф р);
Parse 0n1n
Example of recursive pattern.{?} AB= ( "" | 0 !AB 1 )
{?} 0 0 1 1:!AB & good | bad{!} good
left hand side of | is
”nothing”. So this matches zero 0's and
1's.recurse
Parse 0n1n2n
{?} AB= ( "":?C | 0 !AB 1 & 2 !C:?C )
{?} ABC=!AB !C
{?} 0 0 1 1 2 2:!ABC & good | bad{!} good
if zero 0's and 1's then also zero
2's
for each nested pair of 0 and 1, add a 2
to Cafter parsing n 0's
and n 1's, C contains n 2's
Documentation
http://jongejan.dk/bart/bracmat.html
Most complete documentation.
http://rosettacode.org/wiki/Category:Bracmat
Over 170 examples that can be compared with implementations in other programming languages.
Development
Evolution at moderate pace.Great variety of snippets in valid.bra:
guards against unexpected and unwanted behavioural changes and tests all C-code.
Behaviour described in file help (precursor to bracmat.html).
Changes are logged.
Download
Open source since 3 June 2003 (GPL).
http://cst.dk/download/bracmat/Source code spanning period 1986-2012.
https://github.com/BartJongejan/BracmatAlways the latest source code.
Finale
garbage collection most programming languages
gem collectionBracmat
“Staten bruger dem imidlertid til at undersøge Metalgruberne; thi ligesaa slet som de see hvad der er oven paa Jorden, saa fortreffeligen see de det der er inden i den.”
Finale
In Bracmat, trees are first class citizens.
Trees have autonomous behaviour.
Using pattern matching, the State controls trees.
The State itself consists of trees.
Finale
Ease of use, clarity and expressive power of Bracmat’s patterns can compete with RE, SQL, XQuery and Prolog.
Pattern matching as a primitive is strangely absent in popular programming languages, having died out with Snobol (1962 ~1990).
Modelling data in Bracmat is a small step away from understanding and controlling it.
Contents Part 2
Bracmat expressionsExpression evaluation
Bracmat in use
Bracmat expressions
Code and DataNumbersStringsListsStructuresBooleansFunctionsSpecial symbols
ArraysObjectsHash Tables
Code and Data
No sharp distinction:
"I’m stable" → "I’m stable"997^1/2 → 997^1/2(not so) stable → not so stable998^1/2 → 2^1/2*499^1/2i*i:>0&pos|"not pos" → "not pos"
Numbers
Arbitrary-precision arithmetic.Rational numbers.No floating point!
2/3 + -1/6 → 1/21/99+-1/100+1/101 → 10001/9999002^216091+-1 → 746093103…815528447 (65050 digits, 31st Mersenne prime)
Strings
Strings cannot contain null-bytes, otherwise no restrictions.
A
ру́lсский
"2^216091+-1"
"A string can extend overmultiple lines"
Lists
Sums, products and sentences.
x + 4 + a + 8p * q * zThis list has five elements
Lists inside lists:a + 5 * e ^ (i * pi + x) + b
Lists (continued)
a + b : ?x + b + ?y { y := 0 }a * b : ?x * b * ?y { y := 1 }a b : ?x b ?y { y := ""}
0, 1 and the empty string "" are identity (or neutral) elements in sums, products and sentences respectively.
Structures
(uni . "Københavns Universitet")(institut . CST)(publications.(2011. pub1 pub2) (2012. pub3 pub4) (2013. pub5 pub6 pub7 ))
Booleans
There is no separate boolean type.Each node in a Bracmat expression
has a success/failure status flag.
a success~ failure1+2 successful evaluation to 3a:b failing match operation
Functions
(=a b. !arg:(?a.?b) & (!b.!a))
$ (jeg.går)
→ går.jeg
local variablesparameter
returned value
function application
argument
result
Special symbols
e Euler’s constant, the basis of the natural logarithm, 2.7182…
x \D (e ^ x) → e ^ xi unit imaginary number
i * i → -1pi the ratio of a circle's
circumference to its diameter e ^ (i * pi) → -1
Arrays
An array “A” is not a variable, it is the stack of all variables named “A”.
tbl$(A,100) create array A size 100117:?(41 $ A) assign to element A[41]!(41$A) inspect element A[41]tbl$(A,0) delete array A
Objects
Objects are the only Bracmat expressions that can change.
(language=(iso639=) (spkrs=))& new$language:?Danish& da:?(Danish..iso639)& 6M:?(Danish..spkrs)
Hash Tables
built-in object typeeffectively store&search key-value pairs.
new$hash:?H& (H..insert)$(Danmark.!Danish)& (H..find)$Danmark
→ Danmark.(=(iso639=da) (spkrs=6M))
Expression Evaluation
DefinitionProgram flowPattern matchingMacro evaluationλ – calculusNormalization
Definition
Right hand side of = operator is not evaluated.
J = ? ?#x ?;
double=.!arg+!arg;double$7 → 14(=.!arg+!arg)$7 → 14
pattern definition
function definition
anonymous function
Program flowEvaluation from left to right, depth first.
firstthis & thenthis ifnotthis | thenthis
whl'(body)
fun$arg (or fun'arg)
!subroutine
Pattern matching
subject : pattern
Pattern doesn’t evolve, side effects possible. Primary result: subject/success or failure.& operator: escape to normal evaluation.@ prefix indicates string pattern matching.
Normal evaluator
Pattern matching evaluator
@( kabbatus : ? (?%x & rev$!x:?y) !y ? )
string pattern matching
string subject
escape to
normal evaluatio
n
side effect: assignment
embedded pattern
matching operationnormally evaluated
No regular expressions. Instead, use @(string:pattern)
or (tree:pattern)
regex pattern
nesting & recursion
no/yes yes
named variables no/yes yesnon-string subject no yesgreedy yes/
nono
Regex: DO {stay in string world}UNTIL (regex clear as mud)THEN use other tool
Bracmat: tokenize input string → treeWHILE(more and more interesting)
{ pattern match tree & make more interesting tree }
A list, to begin with
XPathXQuer
ySQLLINQCQL
Macro evaluation
Replace variable by the value of the variable.
X=6;i=0;mltpls=; ' ( whl ' ( !i+1:<10:?i & !mltpls ""$X*!i:?mltpls ) ): ?code;
Lhs: empty string
Lhs: empty string
Macro after evaluation.
lst$code;
(code= = whl ' ( !i+1:<10:?i & !mltpls 6 *!i:?mltpls ) );
' replaced by =
""$X replaced by 6
Macros
(1) speed up performancetight loopspattern matching over long lists,
(2) incrementally build the ”Greatest Common Pattern”
that matches a given set of data structures. (Tree kernel)
(3) create or change code at run time
λ - calculus
The lambda abstraction
(λx.x)y
translates to
/('(x.$x))$y
For factorial and Fibonacci the hard way, see
http://rosettacode.org/wiki/Y_combinator#Bracmat
Normalization
. , WS + * ^ \L
flatten
identity "" 0 1
sort
combine
Normalization: flatten
(a,b,c),(d,e),f,g → a,b,c,d,e,f,g , , , , → a ,a , , , b , b c d e f g c , d , e , f g
Because of associativity
Normalization: identity
"" a b c "" "" d "" → a b c d
0+a+b+c+0+0+d+0 → a+b+c+d
1*a*b*c*1*1*d*1 → a*b*c*d
Because "" 0 and 1 are neutral elements
Normalization: sort
f+a+b+g+c+e+d → a+b+c+d+e+f+g f*a*b*g*c*e*d → a*b*c*d*e*f*g e*a*b*c*d*f*g
3*b*c+a*b+a*e^(k+j)*i↓
a*b+3*b*c+i*e^(j+k)*a
Because + and * are commutative
Normalization: combine
eat+the+the+zuppe → eat+2*the+zuppe
2*a*b+(a+-1*b)^2 → a^2+b^2
Because multiplication is distributive over
addition
Bracmat in useSurface AppearanceCreate a new programCanonical lay-outedit-save-reformat-reopen loopRequirementsBuildRun codeMemoryUnicode supportSGML, XML, HTML support
Surface Appearance
Operators =.,|&: +*^\L\D'$_
Prefixes [~/#<>%@`?! and !!. Also -
Parentheses and quotes override default.
Comments: { yes, not stored in memory! }
Layout: free, convert to canonical with lst
white space!
Create a new program
(1) From scratch. Easy: no boiler plate.
(2) Wizzard project.bra creates skeleton:description,program stub,facility to help you keep your code nicely formatted.
Canonical lay-out
Principles:- reasonable margins- indented- pair of parentheses aligned (horizont|vertic)ally- heading operator aligned w. parentheses
Some coding errors become conspicuous.
edit-save-reformat-reopen loop
Use combination of terminal window or DOS-prompt and auto-reloading text editor.
Edit your code, save, type !r in Bracmat, (auto)reload program in editor.
{?} (diet =. !arg:? (fish|meat|chicken) ? & carnivore |
{1} !arg:? (yoghurt|milk|cheese) ? & veggie
{1} | !arg:? (fruit|nuts) ? & vegan | dontknow
{1} )
{!} diet
S 0,00 sec
{?} lst$diet
(diet=
. !arg:? (fish|meat|chicken) ? & carnivore | !arg : ? (yoghurt|milk|cheese) ? & veggie | !arg:? (fruit|nuts) ?&vegan | dontknow);{!} diet
Requirements
C-compiler with standard libraries32-bit or 64-bit OSbracmat.c and xml.c (< 20 000 lines in
all)nothing more
Build
E.g. compile and link with GNU-C:gcc -std=c99 -pedantic -Wall -O3 -static -DNDEBUG -o bracmat bracmat.c xml.c
Windows, dynamically linked, 32 bit: 100Kb
Linux, statically linked, 64 bit: 1 Mb
Run code
w/0 parameters: interactive (REPL) each parameter is evaluated, left to
right.Bracmat as embedded s/w:
JNI – call Bracmat from JavaLinked with C or C++ code, Bracmat can be called and can even call C-functions.
Memory
NodesVery compact, 4,8,12,16, … bytes per nodeReference counting
Structure sharingUnreferenced nodes freed
Variablesdynamically scoped (lex.bra → lexical)
Unicode
UTF8 assumed, fall back to ISO-8859-1Ø ø in Unicode, UTF-8:utf$low$chu$216 → 248
Ø ø in ISO-8859-1:asc$low$chr$216 → 248
upp$ру́lсский → РУ́lССКИЙ
SGML, XML, HTML
Robust and fast parsing, done in C. Transformation to Bracmat expression. No nesting of elements. Separate step.XML/HTML entities → UTF-8
get$("<p class='Q' enabled >Samsø<br/</p>",MEM,HT,ML)
→ (p.(class.Q) (enabled.)) Samsø (br.,) (.p.)