6

BIO-MATHEMATICAL CONCEPTS IN - Institutional Repositoryumpir.ump.edu.my/id/eprint/16090/1/Bio-Mathematical... · In this research, some new bio-mathematical concepts involved in the

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: BIO-MATHEMATICAL CONCEPTS IN - Institutional Repositoryumpir.ump.edu.my/id/eprint/16090/1/Bio-Mathematical... · In this research, some new bio-mathematical concepts involved in the
Page 2: BIO-MATHEMATICAL CONCEPTS IN - Institutional Repositoryumpir.ump.edu.my/id/eprint/16090/1/Bio-Mathematical... · In this research, some new bio-mathematical concepts involved in the

on il of

te, T. ;osan n of 88.

the ·ocess after

3Eng. nologi l with )mi cal ·ersity 3 and lD in nologi doing which tent of

and

i ·

s been nt of : since 9d h er

from 1laysia, .duated iemical JMIST, :>). Her . odeling ystems, process

and , also rnditing

safety MP &

ved the M.Eng

ineering 'eknologi she is a

same current

nicrobial Dduction, ioreactor

BIO-MATHEMATICAL CONCEPTS IN DNA SPLICING SYSTEM

Fong Wan Heng 1, Nor Haniza Sarmin2 and Yuhani Yusof3

11bnu Sina Institute for Fundamental Science Studies

2•

3Department of Mathematics, Faculty of Science

Universiti Teknologi Malaysia, 81310 UTM Skudai, Johor, Malaysia

ABSTRACT

Splicing system is a formal characterization of the generative capacity of specified enzymatic activities operating on specified set of double-stranded DNA molecules. The formalism of this splicing system is illustrated under the framework of Formal Language Theory which is a branch of applied discrete mathematics and theoretical computer science. The mathematical fonnalism that will be used in this research is the splicing system concept that was first initiated by Head in 1987. In this research, some new bio-mathematical concepts involved in the DNA splicing system will be presented. These concepts include factor of a language and constant ofa language. The regularity of these concepts will also be presented with their proofs.

1. INTRODUCTION

Splicing system was first introduced by Head in 1987. A splicing system comprises of the set A of alphabets in Deoxyribonucleic Acid (DNA), set 1 of initial strings, and sets B and C of triples, also known as the rules. The set of alphabets include Adenine (A) , Guanine (G), Cytosine IC) and Thymine (I) , which are the bases of DNA in their nucleotide chain. When restriction enzymes are added to DNA molecules, these molecules will be cleaved at certain restriction site of the molecule. After the cleavage, the resulting fragments can be adjoined together to form new strings of DNA molecules. This process is also called the ligating process. The language "hich results from a splicing system is called a splicing language.

The relation between DNA splicing system and Fonnal Language Theory can be shown through a mathematical modeling of splicing system. DNA molecules are modeled as strings in Formal Language

147

Theory, while enzymatic operations are modeled as a set of splicing rules . The recombinant behavior of DNA is modeled as a language (normally denoted as L) in a splicing system.

Splicing languages are regular, but not all regular languages are splicing languages (Gatterdam, 1989). In the following two sections, the factor of a language, constant of a language and their regularity will be discussed using some concepts of automaton.

2. FACTOR OF LANGUAGE

Let A be a finite set to serve as the alphabet. l is a regular language and Fac(L) is the set of all factors of L. The definition of Fac(L) is given in the following .

Definition 2.1: Fac(l) Fac(l) = {x in A*: uxv in L for some u, v in A*} .

Suppose that L(M} is the language recognized by the automaton M'. We can show that L(M} = Fac(L) as given in the following theorem.

Theorem 2.1 l(M} = Fac(L).

Proof. First, let w E L(M). There are p E P and q E Q for

which pwq is a recognition path in M'. Since p is accessible in M and q is coaccessible in M, there are states i EI and I E T, u, v E A* for which upwqv is a labeled path from state i to state I. Thus uwv E L(M) =

l . Hence W E Fac(L) . Next, let w EFac(L). Then there exist u, v in A• such

that uwv E l = L(M). So, there are i E ! , t E T, for which uwv is a labeled path from state i to state t. Let p , q E Q for which upwqv is a labeled path from state i to

Page 3: BIO-MATHEMATICAL CONCEPTS IN - Institutional Repositoryumpir.ump.edu.my/id/eprint/16090/1/Bio-Mathematical... · In this research, some new bio-mathematical concepts involved in the

state t. Then pwq is a recognition path in M'. So wEL(M). o

Since the next few theorems will involve the concept of regularity, the definition of a regular language is given in the following.

Definition 2.2 (Linz, 2001): Regular A language L is called regular if and only if there exist a deterministic finite accepter M such that L = L(M).

Regular expressions are constructed from primitive constituents by repeatedly applying certain recursive rules. The definitions of primitive regular expressions and regular expressions are given next.

Definition 2.3 (Linz, 2001): Primitive Regular Expressions, Regular Expressions

Let I be a given alphabet. Then (I) ¢ ,}.,and a EI are all regular expressions. These are called primitive regular expressions. (2) If r 1 and r2 are regular expressions, so are r 1+r2, r 1 •r2,

r 1 • and (r1)(r2). (3) A string is a regular expression if and only if it can be derived from the primitive regular expressions by a finite number of applications of the rules in (2).

For the definition of regular language, deterministic finite accepter is used instead of deterministic finite automaton since deterministic finite automaton (Kelly, 1995) is also known as deterministic finite accepter in (Linz, 2001 ).

The following theorem is on the regularity of Fac(L) given that l is a regular language.

Theorem 2.2 If l is regular language, then Fac(l) is regular.

Proof. Since l is regular, let L be recognized by the trimmed

automaton M = (Q, I, 7), where Q is the set of states, I is the set of initial states, and Tis the set of terminal states. So L(M) = l. Let M' be the automaton (Q, Q, Q) where all states are both initial and final. By Theorem 2.1, l(M) = Fac(L). So, Fac(L) is recognized by the automaton M'. Thus Fac(L) is regular. o

However, the converse of Theorem 2.2 is not true as shown using a counterexample in Theorem 2.3.

Theorem 2.3 If Fae (L) is regular, then Lis not necessarily regular.

Proof. Let A = {a,b} . l = {w in A•: lwla = lwlb} is not

regular, but Fac(l) = A* is regular. For any wE A•= Fac(L) , let lwl0 2: lwl6, then wb ... b EL, where the number of bs is equal to lwla - lwl&· Thus lwb ... bla = lwb .. . bl&, wEL but L is not regular. Therefore, Fac(L) is regular

does not imply that L is regular. o

In the next section, some characteristics regarding constant of a language are given.

3. CONSTANT OF A LANGUAGE

Let Con(l) be the set of all constant factors of L. The definition of Con(L) is given as follows.

Definition 3.1 Con(L) Con(l) = {x in Fac(l): pxq and uxv in L imply that p.n and uxq are also in L}.

We can show that any word containing an element in Con(l) is also in Con(L) as given in the following theorem.

Theorem 3.1 If c E Con(L) , then w = xcy E Con(L).

Proof. If pwq EL and uwv EL, then pwq = pcxyq El and 1111·1·

= uxcyv El. Since c E Con(L) , pxcyv = pwv El and uxcyq = uwq E L. Thus, WE Con(l). o

The language recognized by the automaton P' are the words which are not constant relative to the language l. This is shown in the following theorem.

Theorem 3.2 l = (P') = {WE A· : w is not constant relative to L}.

Proof. First, we show L(P')<:;;;; ={wEA · : w is not constant

relative tol}. Let wEL (P'), (p1'p2 )~(qpq2 ) in?',

qi* qi, P1 ~q1 , /J2 ~q2 EM. Since q1 :tq1 q1 ~11,q2 ~r2 where exactly one of {1j,12} is terminal and one is not. Note that there are u, v in A' for which i ~ p 1, i ~ p 2 • Without loss of generality,

and (q1,q2

Case 2: that is, r'

iv1 P2 wq2u2r (Pi •P2)w(q1 and (qpq

2)

Therefore, w

Theorem condition for 1

Theorem 3.3 lf L is regular I

Proof.

Let Mbe a recognizing l \ l. Construct p,

w~ere £' s(Q x .

is 111 £' where(p P'=(QxQ,Qx

1

Theorem 3.2, l(P' L} · Thus the set Consequently Con i

. However, Con( mg regular as eorem J.4.

eorem 3.4

Con(L) isregular,

oof.

Let A ={a, b}. • t regular. Let 11' E

ab, U = OJ. I' '= bb p d pwq EE. that j~ ababab :£ Th ' . us

let 11 be terminal and let r2 be not terminal. Since q is fn gene ' ive · can

coaccessible then there is a word t E A• for which numberfa's == /w q2 ~r3 with 13 terminal. Thus O+/w/h + . i~tj,i~r3 • So uws, vwt EL. If w were a constant, then vws would lie in l, which would contradict For b,bival

0 6 r2 being not terminal. Thus, w is not constant relative to + /wL, + and the

l. WEA*,icon(L),a ber ofj == /w/ +

Next, we show {wEA*: w is not constant relative t == O+/w/, J, whi~h L} <:;;;; L(P'). Let w be not constant. Then ther bwbL b£ since the are u1 wu2 EL and v1 wv2 EL which are both accepted b) number ?'s == /w/ + L(M) for which either .u1wv2 1£.L or v1wu2 1£.L. Thus There Con(Lf)= there exist n11p1wq1u21"1v1p 2 wq2 v212 •

Case l: If u1wv2 1£.L, there exist iu1p1wq1v2r,r:tl1 that is, r is not terminal. From iv1 p2 wq2 v2t2 ar iu1p 1wq1v2r,q1 * q2 • Therefore in P', there exist a pa (p"p2 )w(qpq2), where(p1'p2 ) is an initial state of

Page 4: BIO-MATHEMATICAL CONCEPTS IN - Institutional Repositoryumpir.ump.edu.my/id/eprint/16090/1/Bio-Mathematical... · In this research, some new bio-mathematical concepts involved in the

state t. Then pwq is a recognition path in M'. So wEL(M').o

Since the next few theorems will involve the concept of regularity, the defi nition of a regular language is given in the following.

Definition 2.2 (Linz, 2001): Regular A language L is called regular if and only if there exist a deterministic finite accepter M such that L = L(M).

Regular expressions are constructed from primitive constituents by repeatedly applying certain recursive rules. The definitions of primitive regular expressions and regular express ions are given next.

Definition 2.3 (Linz, 2001): Primitive Regular Expressions, Regular Expressions

Let I be a given alphabet. Then ( I ) ¢,A. and a E I are all regular expressions. These are called primitive regular expressions. (2) If r1 and r2 are regular express ions, so are r 1+r2, r1•r2,

r 1 • and (r1)(r2). (3) A string is a regular express ion if and only if it can be derived from the primitive regular express ions by a finite number of applications of the rules in (2) .

For the definition of regular language, deterministic finite accepter is used instead of deterministic finite automaton since deterministic finite automaton (Kelly, 1995) is also known as deterministic finite accepter in (L inz, 2001).

The following theorem is on the regularity of Fac(L) given that L is a regular language.

Theorem 2.2 lf L is regular language, then Fac(L) is regular.

Proof. Since L is regular, let L be recognized by the trimmed

automaton M = (Q, 1, 1), where Q is the set of states, f is the set of initial states, and Tis the set of terminal states. So L(M) = L. Let M' be the automaton (Q, Q, Q) where all states are both initial and final. By Theorem 2.1, L(M') = Fac(L). So, Fac(L) is recognized by the automaton M'. Thus Fac(L) is regular. o

However, the converse of Theorem 2.2 is not true as shown using a counterexample in Theorem 2.3.

Theorem 2.3 If Fae (L) is regular, then Lis not necessarily regular.

Proof. Let A = {a,b}. L = {w in A·: lwla = lwlb} is not

regular, but Fac(L) =A• is regular. For any WE A•= F ac(L ), let lwl0 2: lwlb, then wb ... b EL, where the number of bs is equal to lwla - lwlb· Thus lwb ... bla = lwb ... blb, wEL but L is not regular. Therefore, Fac(L) is regular

does not imply that L is regular. o

In the next section, some characteristics reg constant of a language are given.

3. CONSTANT OF A LANGUAGE

Let Con(L) be the set of all constant factors of The definition of Con(L) is given as follows.

Definition 3.1 Con(L) Con(L) = {x in Fac(L): pxq and uxv in L imply that and uxq are also in L}.

We can show that any word containing an element Con(L) is also in Con(L) as given in the folio theorem.

Theorem 3.1 If c E Con(L ), then w = xcy E Con(L ).

Proof. If pwq EL and uwv EL, then pwq = pcxyq El and

= uxcyvE L. Since CE Con(L) , pxcyv = pwl'EL uxcyq = uwq E L. Thus, w E Con(L ). o

The language recognized by the automaton P' are words which are not con tant relative to the language This is shown in the following theorem.

Theorem 3.2 L = (P') = { w EA•: w is not constant relative to Ll.

Proof. First, we show L(P')~ ={wE A · : w is not co

relative to L}. Let w EL (P') , (pl' p 2 ) ~(C/pC/2 ) in qi * qz, Pi ~qi, P2 ~ q2 E M. Since q, ~ 'l i __ c__Hj , q 2 ~r2 where exactly one of {V:I terminal and one is not. Note that there are u, vinl which i ~Pi, i ~ p 2 • Without loss of gene let 'i be terminal and let r2 be not terminal. Since q coaccessible then there is a word t E A' for w

q2 ~ r3 with r3 terminal. i~l],i~r3 . So uws, vwt EL constant, then vws would lie in L , which would con r2 being not terminal. L.

Next, we show {wEA·: w is not constant relative L} ~ L(P') . Let w be not constant. Then are ui wu2 EL and vi wv2 E L which are both accepted L(M) for which either ui wv2 ~ L or vi wu2 ~ L . there exist iuipi wqiu/p ivi p2wq2v212 .

Case l: If ui wv2 ~ L , there exist that is, r is not terminal. From ivip2wq2v/2

iui Pi wqi v2r, qi * q2 • Therefore in P' , there exist a (pl' p 2 )w( qi' q2 ) , where (pi' pi) is an initial state of

Case that is,

IV1P2 wqz (p., P2) and (qi' Therefor

Theo conditio

beore If l is re

on equ

How being r Theore

In

For

-o+I If WE

Page 5: BIO-MATHEMATICAL CONCEPTS IN - Institutional Repositoryumpir.ump.edu.my/id/eprint/16090/1/Bio-Mathematical... · In this research, some new bio-mathematical concepts involved in the

·ding

of L.

:itpxv

1ent in owing

id uwv L and

are the c1age L .

:onstant

1) in P' ,

qi* q2 I\ , r2 } is in A• for

:nerality, 1ce q is r which

Thus v were a ontradict ~lative to

~\ative to ien there :epted by L. Thus,

12r,r * t 2 ,

2v2 t2 and ist a path .tate of P'

and (q1>q2)isaterminalstateofP'. Thus,wEL(P').

Case 2: If v1 wu2 "L, there exist iv1p 2 wq2u2r ', r' * t1 ,

that is, r' is not terminal. From iu1 p 1 wq1u2t1 and iv1p 2 wq2u2r',q1 * q2 • Therefore in P', there exist a path (pl> p 2 )w( qi> q2 ) , where (pl> p 2 ) is an initial state of P' and (q1>q 2 ) is a tenninal state of P'. Thus, w E L(P'). Therefore, w E L(P ') in both cases. o

Theorem 3.3 states that Con(L) 1s a necessary condition for L being regular.

Theorem 3.3 lf Lis regular language, then Con(L) is regular.

Proof. Let M be a minimal intrinsic deterministic automaton

recognizing L where M = (Q, 1, T, E). Therefore L(M) = L. Construct P = M x M = (Q x Q, 1 x 1, T x T, E') where E's;;;; (Q x Q) x A x (Q x Q), (p1, P2 )----"----t( q1, q2) is in E' where (p1, a, q1) E E and (p2 , a, qi) EE . Define P' = (Qx Q,Qx Q, {(p,q) E Q x Q: p * q},E'). From Theorem 3.2, L(P') = {wEA· : w is not constant relative to L}. Thus the set of non-constants, A• I Con(L) is regular. Consequently Con(L) is regular. o

However, Con(L) is not a sufficient condition for L being regular as shown using a counterexample in Theorem 3 .4.

Theorem 3.4 If Con(L) is regular, then L is not necessarily regular.

Proof. LetA = {a, b}. The setE = {winA' :lwL = lwjb} is

not regular. Let WE Con(L) , uwv E E andpwq E E. Let w =ab, u = aa, v = bb, p = abab, q = abab. From uwv E E and pwq EE, that is aaabbb and ababababab, uwq = aaababab "E. Thus ab" Con(L) and in fact, Con(L) =¢.

1yi. 1y., Tn general, we can show that for aL awbL b E E,

the number of a's = jwjb + jwj" + 0 and the number of b's

=O+jwjb +lwla · 1y.. 1)1>

For bL bwaL a EE , the number of a's

=O+iw10,+jwjband the number of b 's =lwla +lwlb +O.

If w E A , w E Con(L),aL awaL a "E since the number of a's = jwjb + jwj

0 + jwjb and the number of

b's = 0 + lwl,, + 0, which has too many a's. Similarly, bL bwbL b" E since the number of a's = 0 + jwj

0 + 0 and

the number of b's =Ha+ jwj" + jwj0

which has too many b's. Therefore Con(L)) = ¢. o

149

CONCLUSION

In this research, some new bio-mathematical concepts in DNA splicing system are introduced. These include factor of a language and constant of a language. Since all splicing languages are regular, but not all regular languages are splicing languages, the regularity of the new concepts are discussed through theorems and their proofs .

ACKNOWLEDGEMENT

The authors would like to thank Ibnu Sina Institute for the financial funding to this symposium. The authors would also like to thank the Research Management Centre, Universiti Teknologi Malaysia, Skudai, Johor, Malaysia for the financial funding through Vote No. 77924 and 78482. The third author would also like to acknowledge Universiti Malaysia P AHANG for the study leave and Ministry of Higher Education for SLAI scholarship.

REFERENCES

Gatterdam, R. W. , Splicing Systems and Regularity, international Journal of Computer Math. , vol. 31 , pp. 63-67, 1989.

Head, T. , Formal Language Theory and DNA: An Analysis of the Generative Capacity of Specific Recombinant Behaviors, Bulletin of Mathematica/ Biology, vol. 49, pp. 737-759, 1987.

Kelly, D. , Automata and Formal Languages , Prentice-Hall Inc. , USA, 1995 .

Linz, P. , An introduction lo Formal Languages and Automata, Jones and Bartlett Publishers lnc. , USA, 200 I.

NOMENCLATURE

A* :

1 : B, C:

Con(L) Fac(L)

L L(M)

lwla p-q

strings obtained by concatenating zero or more symbols from A set of initial strings rules in splicing system the set of all constant factors of L the set of all factors of L language language recogni zed by an automaton M ' number of as in w state p to state q

Page 6: BIO-MATHEMATICAL CONCEPTS IN - Institutional Repositoryumpir.ump.edu.my/id/eprint/16090/1/Bio-Mathematical... · In this research, some new bio-mathematical concepts involved in the

Fong Wan Heng received the B.Sc. (Industrial Mathematics) in 2003 , M.Sc. (2004) and Ph.D (2008) in Mathematics from Universiti Teknologi Malaysia.

She is a lecturer at Ibnu Sina Institute for Fundamental Science Studies, Universiti Teknologi

Malaysia, Skudai, Johor. Her current interests include Group Theory and DNA Splicing System.

Nor Haniza Sarmin received the B.Sc. (1989), M.A. (1990) and Ph .D (I 998) in Mathematics from State University of New York, Binghamton, USA.

She is an Associate Professor at Department of Mathematics, Faculty of Science, Universiti Teknologi Malaysia, Skudai, Johor. Her current interests include Group Theory and DNA Splicing System.

Yuhani Yusof received the B.Sc. (Industrial Mathematics) in 2003 and M. Sc. (Mathematics) in 2005 from Universiti Teknologi Malaysia.

She is a Ph .D (Mathematics) student in Universiti Teknologi Malaysia, Skudai , Johor. Her current interest is DNA Splicing System.

M [

ABSTRA< The aim c based pa1 properties. (LDPE)/ta1 starch con with the polyethyle used as p thus, degr blown fit mechanic< out by Fe tensile t respectivE adverse e starch ble the blend two mate the films to increa: starch fit fungi em Keywon aid, tapic

1. INTR Approxi are proc tradition of post househo and Ya designec more ec ofmam issues, popular

petrole1 bi ode gr