caring about sharing: little b, a language for building modular models aneil mallavarapu department of systems biology harvard medical school alife-boston wednesday june 15 th , 2005


caring about sharing: little b, a language for building modular models

aneil mallavarapu

department of systems biology

harvard medical school

alife-boston

wednesday june 15th, 2005

the motivation

• today, models are monolithic and used only by a small cadre of computational biologists

• how can models become a part of everyday scientific life, as gene sequences have become?

• we need a computational framework for building models in a modular and incremental way

how can we make McModelling a reality?

what is a model?

• a formal description of a system of interacting parts

• which enables some useful analysis, for example…

• mechanistic simulation: ODE/PDE, stochastic, boolean, multivalued discrete, hybrid, etc.

• steady-state analysis: flux-balance, metabolic control, null-cline analysis

• statistical analysis: bayes nets

i’ll show you “little b”,

• a programming language built in LISP,

• designed to enable modular description of biological systems,

• and write mathematical models for you.

modularity and extensibility

[figure: layered architecture, where an arrow from x to y means “x is used by y”]

• common lisp (ansi x3j13): widely supported standard; free & commercial compilers, tools, libraries; core language syntax, types and functions; integer, bignum, float, complex, string, list, array types; support for classes, structures, functions

• little b core language + syntax: object-oriented data structures & syntax + rule-based logic; define, defcon, defprop, defrule, defmethod; { } – infix operator, [ ] – object operator, . – field-access operator

• symbolic math: unit, dimension, base-dimension, gauss, quantity, polynomial, rational-polynomial, var, tvar, cvar, dvar…

• units & dimensions

• biology: data structures and logical rules for building biochemical models; reactant, reactant-type, reaction, reaction-type, location, membrane, compartment, aggregate, enzymatic-reaction…

• models

• tools: MATLAB, SBML?, PI?, GUI tools?, Database?

toy egf receptor model - parts:

• egf

• egfr

• mapkkk, mapkkk*

• mapkk, mapkk*

• mapk, mapk*

• “egfr+egf”

“under the hood”

toy egf receptor model - reactions:

[figure: reaction diagram linking egf, egfr and “egfr+egf” to mapkkk / mapkkk*, mapkk / mapkk* and mapk / mapk*]

toy egf receptor model - mechanisms:

[figure: a mechanism is provided for the K-dependent steps between mapk__ and mapk__*]

toy egf receptor model - mathematics:

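• given a reaction with n LHS reactants, $R_i$, with stoichiometries $s_i$:

$$s_1 R_1 + \dots + s_i R_i + \dots + s_n R_n$$

• where the reaction occurs in a location of size $Z$ (which may be a volume, area or length).

• reaction rate, $T$ (moles / size-units / seconds):

$$T = k \times [R_1]^{s_1} \times \dots \times [R_i]^{s_i} \times \dots \times [R_n]^{s_n}$$

• reaction rate in moles/seconds $= T \times Z$

• each LHS reactant is consumed at $d[R_i]/dt = \dots - s_i \times T \times Z / C_i \dots$

• where $C_i$ is the size of the compartment containing $R_i$

e.g., $2A \rightarrow B$

$$T = k[A]^2$$

$$d[A]/dt = \dots - 2T \dots$$

$$d[B]/dt = \dots + T \dots$$

e.g., $A + B \rightarrow C$, where the reaction occurs in a membrane of size $Z_{membrane}$ and $B$ sits in a compartment of size $Z_{compartment}$

$$T = k[A][B]$$

$$d[A]/dt = \dots - T \dots$$

$$d[B]/dt = \dots - T \times Z_{membrane} / Z_{compartment} \dots$$

$$d[C]/dt = \dots + T \dots$$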
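the rate law on this slide can be checked numerically. the following is a minimal python sketch, not little b code; the function names `mass_action_rate` and `lhs_derivative_term` are hypothetical and the numbers are made up for illustration. it reproduces the dimerization example, where T = k[A]^2 and A is consumed at 2T.

```python
from math import prod

def mass_action_rate(k, concentrations, stoichiometries):
    """T = k * product of [R_i] ** s_i over the LHS reactants."""
    return k * prod(c ** s for c, s in zip(concentrations, stoichiometries))

def lhs_derivative_term(T, s_i, Z, C_i):
    """Contribution of one reaction to d[R_i]/dt for an LHS reactant.

    Z is the size of the location where the reaction occurs,
    C_i the size of the compartment containing R_i.
    """
    return -s_i * T * Z / C_i

# Dimerization of A in a single compartment (Z == C_A),
# with illustrative values k = 0.5 and [A] = 2.0.
T = mass_action_rate(0.5, [2.0], [2])
dA = lhs_derivative_term(T, s_i=2, Z=1.0, C_i=1.0)
```

for a membrane reaction, passing the membrane size as `Z` and the compartment size as `C_i` gives the Zmembrane / Zcompartment scaling from the second example.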

the mass action rate-method calculates T, the rate of the reaction:

the rate-method is modular

which can be substituted:

implemented as a function

or the adventurous can build their own…
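the idea of a substitutable rate-method can be illustrated outside little b. this python sketch is an assumption about the general pattern, not the little b implementation: a reaction carries a reference to a rate function, mass action is the default, and a user-written function with the same call shape can replace it. the names `mass_action`, `constant_flux` and `reaction_rate` are hypothetical.

```python
from math import prod

def mass_action(reaction, concs):
    """Default rate-method: k times the product of [R_i] ** s_i."""
    return reaction["k"] * prod(
        concs[name] ** s for name, s in reaction["lhs"].items()
    )

def constant_flux(reaction, concs):
    """A deliberately trivial user-written rate-method that ignores concentrations."""
    return reaction["k"]

def reaction_rate(reaction, concs):
    """Look up the reaction's rate-method, falling back to mass action."""
    method = reaction.get("rate_method", mass_action)
    return method(reaction, concs)

reaction = {"k": 2.0, "lhs": {"A": 1, "B": 1}}
concs = {"A": 2.0, "B": 1.5}

default_T = reaction_rate(reaction, concs)   # uses mass_action
reaction["rate_method"] = constant_flux      # substitute the rate-method
custom_T = reaction_rate(reaction, concs)    # uses constant_flux
```

because the rate-method is just a function attached to the reaction, swapping it does not require touching the description of the reaction itself.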

now imagine….

• libraries of such components have been previously defined by experts, and are available

  – over the web

  – in a database in your lab

  – in your own personal collection

• b enables these parts to be combined

let’s describe a situation composed of predefined parts:

[figure: a dish containing cell-a, with egf, egfr, mapkkk, mapkk, mapk, mapkkk*, ES complex (mapkkk*-mapkk), mapkk*, ES complex (mapkk*-mapk) and mapk*]

b builds symbolic mathematical expressions:

“object-oriented syntax meets symbolic math” enables programmers and theorists to write & debug functions which translate between the world of objects and the world of mathematical expressions.
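as a rough illustration of translating from objects to mathematical expressions, the python sketch below builds a mass action rate expression as text from a description of a reaction. it is not little b's symbolic math subsystem; the naming scheme `[type.location]` and both function names are assumptions made for this example.

```python
def concentration_symbol(reactant_type, location):
    """Name the concentration variable for a reactant-type in a location."""
    return f"[{reactant_type}.{location}]"

def mass_action_expression(k_symbol, lhs, location):
    """Build the text of k * [R_1]^s_1 * ... from (reactant_type, stoichiometry) pairs."""
    factors = [k_symbol]
    for reactant_type, s in lhs:
        symbol = concentration_symbol(reactant_type, location)
        factors.append(symbol if s == 1 else f"{symbol}^{s}")
    return " * ".join(factors)

# Rate expression for a reaction consuming two A in location c1.
expr = mass_action_expression("k1", [("A", 2)], "c1")
```

a real symbolic system would build expression objects that can be simplified and differentiated; strings are used here only to keep the sketch short.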


the symbolic math subsystem is an extensible toolkit for theoreticians to express mathematical concepts:

• units, dimensions

• quantities, gaussian distributions

• polynomials

• rational-polynomials

system is extensible:

possible additions:

• radical expressions

• matrices

• poisson distributions

• …others?

an extensible system of units and dimensions
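to show what a system of units and dimensions buys you, here is a small python sketch of dimension checking. it is an illustration under assumptions, not little b's design: the `Quantity` class, its dimension names ("amount", "length") and the sample values are all hypothetical.

```python
class Quantity:
    """A number tagged with dimension exponents, e.g. {"amount": 1, "length": -3}."""

    def __init__(self, value, dims):
        self.value = value
        # Drop zero exponents so equal dimensions compare equal.
        self.dims = {d: e for d, e in dims.items() if e != 0}

    def _combine(self, other, sign):
        dims = dict(self.dims)
        for d, e in other.dims.items():
            dims[d] = dims.get(d, 0) + sign * e
        return dims

    def __mul__(self, other):
        return Quantity(self.value * other.value, self._combine(other, +1))

    def __truediv__(self, other):
        return Quantity(self.value / other.value, self._combine(other, -1))

    def __add__(self, other):
        # Adding quantities of different dimensions is an error.
        if self.dims != other.dims:
            raise ValueError("dimension mismatch")
        return Quantity(self.value + other.value, self.dims)

amount = Quantity(6.0, {"amount": 1})
volume = Quantity(2.0, {"length": 3})
concentration = amount / volume
```

adding `amount + volume` raises `ValueError`, which is the kind of modelling mistake a dimension-aware system can catch before any simulation is run.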


ok… back to the model:


set initial conditions… and perform numerical integration in matlab

shorthand for setting the initial condition of all reactants of a particular type
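the two steps on this slide can be sketched in python as a stand-in for the matlab workflow. everything here is an assumption for illustration: the state is a dictionary keyed by (reactant-type, location), `set_initial_by_type` mimics the shorthand for setting all reactants of one type, and a forward euler loop replaces a real ODE solver. the model integrated is the earlier dimerization example, with A consumed at 2T and B produced at T, in two hypothetical locations c1 and c2.

```python
def set_initial_by_type(state, reactant_type, value):
    """Set the initial condition of every reactant of one type, in all locations."""
    for (rtype, location) in state:
        if rtype == reactant_type:
            state[(rtype, location)] = value

def derivatives(state, k):
    """Mass action ODE terms for the dimerization of A into B in each location."""
    d = {key: 0.0 for key in state}
    for (rtype, location) in state:
        if rtype == "A":
            T = k * state[("A", location)] ** 2
            d[("A", location)] -= 2 * T
            d[("B", location)] += T
    return d

def euler(state, k, dt, steps):
    """Forward Euler integration; returns a new state dictionary."""
    state = dict(state)
    for _ in range(steps):
        d = derivatives(state, k)
        for key in state:
            state[key] += dt * d[key]
    return state

state = {("A", "c1"): 0.0, ("A", "c2"): 0.0, ("B", "c1"): 0.0, ("B", "c2"): 0.0}
set_initial_by_type(state, "A", 1.0)
final = euler(state, k=1.0, dt=0.01, steps=100)
```

a useful sanity check on any such integration is conservation: here [A] + 2[B] stays constant in each location.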

extend the model with a phosphatase:

[figure: the mapkkk, mapkk and mapk / mapk* cascade, extended with mkp and mkp-gene]

ion-channel example:

[figure: ion and ion-channel in cell-a and cell-b, within a dish]

multicellular / multicompartmental

[figure: nucleus, er and mito; membrane apposition]


aggregates:

• an aggregate is a biochemical species which is composed of some number of other molecules

[figure: R, S1 and S2 assembling into the complex “RS12”]

dimerizing-aggregate calculates every pairwise reaction-type which leads to formation of the complex
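one possible reading of "every pairwise reaction-type which leads to formation of the complex" is sketched below in python. this is an assumption about the enumeration, not the little b algorithm: it treats every part as able to bind every other part or partial complex, does not handle repeated parts, and uses a hypothetical dotted naming scheme for intermediates.

```python
from itertools import combinations

def name(subset):
    """Name a (partial) complex by its sorted parts, e.g. 'R.S1'."""
    return ".".join(sorted(subset))

def pairwise_assembly_reactions(parts):
    """List every reaction joining two disjoint pieces on the way to the full complex."""
    parts = sorted(set(parts))
    # All proper, non-empty subsets: single parts and partial complexes.
    subsets = [
        frozenset(c)
        for n in range(1, len(parts))
        for c in combinations(parts, n)
    ]
    reactions = []
    for x, y in combinations(subsets, 2):
        if x.isdisjoint(y):
            left, right = sorted([name(x), name(y)])
            reactions.append(f"{left} + {right} -> {name(x | y)}")
    return sorted(reactions)

reactions = pairwise_assembly_reactions(["R", "S1", "S2"])
```

for three parts this gives three reactions forming pairs and three reactions completing the full complex; the count grows quickly with the number of parts, which is why generating them automatically is attractive.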

situation-independent encoding of reactions

• location-class

• location

• reaction-type

• reactant-type

• location-requirement

• reaction

• reactant

reactant-type / reactant

• reactant-type is used to describe types of (bio)chemical species

(define ion [simple-reactant-type :location-class compartment])

(define ion-channel [simple-reactant-type :location-class membrane])

• reactant is used to describe a population of molecules of a particular reactant-type in a particular location:

(define c1 [compartment])

(define m1 [membrane])

A.(in cell.inner) :#= [reactant A c1]   ; molecules of reactant-type A in c1

R.(in cell.membrane) :#= [reactant R m1]   ; molecules of reactant-type R in m1

reaction-type / reaction

• a reaction-type describes the logical requirements for a reaction:

2 A → B

(define A [simple-reactant-type :location-class compartment])

(define B [simple-reactant-type :location-class compartment])

(define RT1 [reaction-type {2 A} {B} compartment])

• a reaction is a reaction-type in a particular location:

• e.g., if A.(in c1) exists, then [reaction RT1 c1] will be created

• if [reaction RT1 c1] exists, then B.(in c1) will be created

the three arguments of [reaction-type {2 A} {B} compartment]:

• {2 A}: what is required for the reaction to proceed + stoichiometry

• {B}: what is required if the reaction does proceed + stoichiometry

• compartment: class of location in which the reaction happens
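the two rules (a reactant implies a reaction, a reaction implies its products) amount to forward chaining. the python sketch below illustrates that idea under simplifying assumptions: stoichiometry, location classes and membrane sides are ignored, and the second reaction-type RT2 with product C is hypothetical, added only to show the inference cascading.

```python
def infer(reaction_types, reactants):
    """Forward-chain until no new reactions or reactants appear.

    reaction_types: list of (name, lhs_types, rhs_types)
    reactants: set of (reactant_type, location)
    """
    reactants = set(reactants)
    reactions = set()
    locations = {location for _, location in reactants}
    changed = True
    while changed:
        changed = False
        for rt_name, lhs, rhs in reaction_types:
            for location in locations:
                present = all((t, location) in reactants for t in lhs)
                if present and (rt_name, location) not in reactions:
                    # All LHS reactants exist here, so the reaction is created...
                    reactions.add((rt_name, location))
                    # ...and the reaction's existence creates its products.
                    reactants.update((t, location) for t in rhs)
                    changed = True
    return reactions, reactants

reaction_types = [("RT1", ["A"], ["B"]), ("RT2", ["B"], ["C"])]
reactions, reactants = infer(reaction_types, {("A", "c1")})
```

starting from A in c1 alone, the loop creates RT1 in c1, then B in c1, which in turn triggers RT2 and C. nothing is created in locations where the required reactants are absent.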

membrane reactions

[figure: a membrane separating a “:c1” side from a “:c2” side]

ligand binding: L on the “:c1” side binds R in the membrane to form RL

[reaction-type {R + L.(required :c1)} {RL} membrane]

membrane transport: C in the membrane moves I from the “:c1” side to the “:c2” side

[reaction-type {C + I.(required :c1)} {C + I.(required :c2)} membrane]

inversion:

[reaction-type {R} {R.(required :inverse)} membrane]
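the way a side requirement such as :c1 or :c2 might be turned into a concrete location can be sketched in python. this is an assumed interpretation for illustration only: the membrane record, its field names and the compartment names "outside" and "cytoplasm" are hypothetical, and the :inverse requirement is not modelled.

```python
def resolve(requirement, membrane):
    """Map a location requirement to a concrete location for one membrane.

    None means "in the membrane itself"; "c1" / "c2" name its two sides.
    """
    if requirement is None:
        return membrane["name"]
    return membrane[requirement]

def instantiate(side, membrane):
    """Turn (reactant_type, requirement) pairs into (reactant_type, location) pairs."""
    return [(rtype, resolve(req, membrane)) for rtype, req in side]

membrane = {"name": "m1", "c1": "outside", "c2": "cytoplasm"}

# Membrane transport: C stays in the membrane, I moves from side c1 to side c2.
transport_lhs = [("C", None), ("I", "c1")]
transport_rhs = [("C", None), ("I", "c2")]

lhs_reactants = instantiate(transport_lhs, membrane)
rhs_reactants = instantiate(transport_rhs, membrane)
```

the reaction-type itself never names "outside" or "cytoplasm"; those appear only when it is instantiated for a particular membrane, which is what keeps the description situation-independent.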

engineer’s “modularity” ≠ biochemist’s “modularity”

• circuits & software: hierarchical composition (component)

• biochemistry: “flat spaghetti” composition (pathway)

how can we name objects?

s

e

pr

by user definition

2-step mechanism implies “r.es.1”

[simple-reactant-type (_id rs.es.1)]

by hierarchical definition

A

B

{A + B}

[aggregate {A + B}]

by composition

A

B

C

A B C

B A C

by structure

B

A C

graph-based reactants

molecular complexes may be defined according to coarse-grained structure:

e.g., scaffold (S) and two kinases (K1, K2)

atomic reactant-types are defined with user-generated symbols (as before), but also include sites of interaction

reactant-types representing multimeric complexes are described using graphs

[figure: scaffold s with sites 1 and 2; kinases k1 and k2, each with sites sh3 and ps; ps states u and p]

G = {V, E} where V = vertices, E = edges

V = s.(site 1), s.(site 2), k1.(site :sh3) …

E = s.(site 1)–k.(site :sh3), s.(site 2)–k.(site :sh3), … k1.(site :ps)–:u

scaffold bound to kinases where kinase1 is unphosphorylated and kinase2 is phosphorylated
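the graph description of this complex can be written down directly as data. the python sketch below is an illustration, not little b syntax; it assumes that k1 binds scaffold site 1 and k2 binds site 2 (the slide does not say which kinase sits on which site), and the helper name `bound_partners` is hypothetical.

```python
# Vertices are (molecule, site) pairs.
vertices = {
    ("s", 1), ("s", 2),
    ("k1", "sh3"), ("k1", "ps"),
    ("k2", "sh3"), ("k2", "ps"),
}

# Edges are unordered pairs of bound sites (assumed pairing, see above).
edges = {
    frozenset({("s", 1), ("k1", "sh3")}),
    frozenset({("s", 2), ("k2", "sh3")}),
}

# Site states: u = unphosphorylated, p = phosphorylated.
states = {("k1", "ps"): "u", ("k2", "ps"): "p"}

def bound_partners(molecule, edges):
    """Return the molecules directly bound to the given molecule."""
    partners = set()
    for edge in edges:
        a, b = tuple(edge)
        if a[0] == molecule:
            partners.add(b[0])
        elif b[0] == molecule:
            partners.add(a[0])
    return partners

scaffold_partners = bound_partners("s", edges)
```

because the complex is identified by its structure rather than by a user-chosen name, two descriptions that list the same vertices, edges and states in a different order compare equal as sets.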

some thoughts on language design

“I think conventional languages are for the birds. They're just extensions of the von Neumann computer, and they keep our noses in the dirt of dealing with individual words and computing addresses, and doing all kinds of silly things like that, things that we've picked up from programming for computers; we've built them into programming languages; we've built them into Fortran; we've built them in PL/1; we've built them into almost every language.”

John Backus

“Programs must be written for people to read, and only incidentally for machines to execute”

Abelson & Sussman, Structure and Interpretation of Computer Programs

“Intellectually, it is just as worthwhile to design a language programmers will love as it is to design a horrible one that embodies some idea you can publish a paper about.”

Paul Graham, Five Questions about Language Design

programming languages as a medium of communication:

from human to computer

• human: familiarity, brevity, comprehensibility

• computer: uniformity, non-redundancy, computability, code safety

the core language

little b core language + syntax (built on common lisp, ansi x3j13):

• object-oriented data structures & syntax + rule-based logic

• define, defcon, defprop, defrule, defmethod

• { } – infix operator, [ ] – object operator, . – field-access operator

future

• generalized scaffold and multisite phosphorylation models

• markov chain-based model for representing protein-DNA interactions

• concepts for sharing stochastic models

• implementation of shareable mapk, nfkb models with scaffold

• gui tools

modular extensible model building

• separation of biological understanding from mechanism and mathematical assumptions

• automated model construction from reusable parts

• extensible libraries

• physical units and dimensions

• free, open source software

http://littleb.org

thank you

• jeremy gunawardena

• craig muir, millennium pharmaceuticals

• matt thomson

• vlado gelev

some current approaches

Mathematical notation: Hoffmann A, Levchenko A, Scott ML, Baltimore D. The IkappaB-NF-kappaB signaling module: temporal control and selective gene activation. Science. 2002 Nov 8;298(5596):1241-5.

SBML: MAPK Scaffold Model. Levchenko A, Bruck J, Sternberg PW. Scaffold proteins may biphasically affect the levels of mitogen-activated protein kinase signaling and reduce its threshold properties. Proc Natl Acad Sci U S A. 2000 May 23;97(11):5818-23.

little b was inspired by work in qualitative physics

• a branch of artificial intelligence which aimed to emulate human-like qualitative reasoning about the physical world (see Kuipers, Forbus, CML)

• a scenario is described in terms of different types of objects and their relationships

• “water is in a pot”

• “a flame is under the pot”

• what happens?

• the computer needs to be able to compute the implications of a scenario

• based on general rules for reasoning about physical systems