22
1 What is a language? An alphabet is a well defined set of characters. The character is typically used to represent an alphabet. A string : a finite sequence of alphabet symbols, can be , the empty string (Some texts use as the empty string) A language, L, is simply any set of strings (infinite or finite)over a fixed alphabet. can be { }, the empty language.

1 What is a language? An alphabet is a well defined set of characters. The character ∑ is typically used to represent an alphabet. A string : a finite

Embed Size (px)

Citation preview

Page 1: 1 What is a language?  An alphabet is a well defined set of characters. The character ∑ is typically used to represent an alphabet.  A string : a finite

1

What is a language?

An alphabet is a well defined set of characters. The character ∑ is typically used to represent an alphabet.

A string : a finite sequence of alphabet symbols, can be , the empty string (Some texts use as the empty string)

A language, L, is simply any set of strings (infinite or finite)over a fixed alphabet. can be { }, the empty language.

Page 2: 1 What is a language?  An alphabet is a well defined set of characters. The character ∑ is typically used to represent an alphabet.  A string : a finite

2

What is a language? (cont’d)

Examples:Alphabet: A-Z Language: EnglishAlphabet: ASCII Language: C++

Page 3: 1 What is a language?  An alphabet is a well defined set of characters. The character ∑ is typically used to represent an alphabet.  A string : a finite

3

Suppose = {a,b,c}. Some languages over could be:

{aa,ab,ac,bb,bc,cc} {ab,abc,abcc,abccc,. . .} { } { } {a,b,c, …

Page 4: 1 What is a language?  An alphabet is a well defined set of characters. The character ∑ is typically used to represent an alphabet.  A string : a finite

4

What is a language? (cont’d)

}

} , ,

Alphabet Languages

{0,1} {0,10,100,1000,10000

{0,1,00,11,000,111,

{a,b,c} {abc,Aabbcc,Aaab,bbccc}

Page 5: 1 What is a language?  An alphabet is a well defined set of characters. The character ∑ is typically used to represent an alphabet.  A string : a finite

5

Regular Languages

Formally describe tokens in the language– Regular Expressions– NFA– DFA

Page 6: 1 What is a language?  An alphabet is a well defined set of characters. The character ∑ is typically used to represent an alphabet.  A string : a finite

6

Regular Expressions

A Regular Expression is a set of rules , techniques for constructing sequences of Symbols (Strings) From an Alphabet.

If A is a regular expression, then L(A) is the language defined by that regular expression.

L(“c”) is the language with the single word “c”.L(“i” “f”) is the language with just “if” in it.

Page 7: 1 What is a language?  An alphabet is a well defined set of characters. The character ∑ is typically used to represent an alphabet.  A string : a finite

7

Regular Expressions (cont’d)

L(“if” | “then” | “else”) is the language with just the words “if”, “then”, and “else”.

L((“0” | “1”)(“0” | “1”)) is the language consisting of “00”, “01”, “10” and “11”.

Page 8: 1 What is a language?  An alphabet is a well defined set of characters. The character ∑ is typically used to represent an alphabet.  A string : a finite

8

Regular Expressions (cont’d)

Let Σ Be an Alphabet, r a Regular Expression Then L(r) is the Language That is Characterized by the Rules of r

Page 9: 1 What is a language?  An alphabet is a well defined set of characters. The character ∑ is typically used to represent an alphabet.  A string : a finite

9

Rules

fix alphabet Σ is a regular exp. (denotes the language {}) If a is in Σ , a is a regular expression (that denotes the

language {a} if r and s are regular exps. denoting L(r) and L(s)

respectively, then so are: (r) | (s) is a regular expression ( denotes the language L(r) L(s) (r)(s) is a regular expression ( denotes the language L(r)L(s) ) (r)* is a regular expression (denotes the language ( L(r)* )

Page 10: 1 What is a language?  An alphabet is a well defined set of characters. The character ∑ is typically used to represent an alphabet.  A string : a finite

10

Example

L = {A, B, C, D } D = {1, 2, 3}

A | B | C | D = L

(A | B | C | D ) (A | B | C | D ) = L2

(A | B | C | D )* = L*

(A | B | C | D ) ((A | B | C | D ) | ( 1 | 2 | 3 )) = L (L D)

Page 11: 1 What is a language?  An alphabet is a well defined set of characters. The character ∑ is typically used to represent an alphabet.  A string : a finite

11

Regular Expression Operation

There are three basic operations in regular expression :– Alternation (union) RE1 | RE2

– Concatenation (concatenation) RE1 RE2

– Repetition (closure) RE* (zero or more RE’s)

Page 12: 1 What is a language?  An alphabet is a well defined set of characters. The character ∑ is typically used to represent an alphabet.  A string : a finite

12

Regular Expression Operation

If P and Q are regular expressions over , then so are:• P | Q (union)

If P denotes the set {a,…,e}, Q denotes the set {0,…,9} then P + Q denotes the set {a,…,e,0,…,9}

• PQ (concatenation)If P denotes the set {a,…,e}, Q denotes the set {0,…,9} then PQ

denotes the set {a0,…,e0,a1,…,e9}• Q* (closure)

If Q denotes the set {0,…,9} then Q* denotes the set {0,…,9,00,…99,…}

Page 13: 1 What is a language?  An alphabet is a well defined set of characters. The character ∑ is typically used to represent an alphabet.  A string : a finite

13

Examples

If = {a,b} (a | b)*b b(a│b)*

Page 14: 1 What is a language?  An alphabet is a well defined set of characters. The character ∑ is typically used to represent an alphabet.  A string : a finite

14

Regular Expression Overview

Expression Meaning

Empty patterna Any pattern represented by ‘a’ab Strings with pattern ‘a’ followed by ‘b’a|b Strings consisting of pattern ‘a’ or ‘b’a* Zero or more occurrences of patterns in ‘a’a+ One or more occurrences of patterns in ‘a’a3 Patterns in ‘a’ repeated exactly 3 times

Page 15: 1 What is a language?  An alphabet is a well defined set of characters. The character ∑ is typically used to represent an alphabet.  A string : a finite

15

A regular expression R describes a set of strings of characters denoted L(R)

L(R) = the language defined by R– L(abc) = { abc }– L(hello|goodbye) = { hello, goodbye }– L(1(0|1)*) = all binary numbers that start with a 1

Each token can be defined using a regular expression

Page 16: 1 What is a language?  An alphabet is a well defined set of characters. The character ∑ is typically used to represent an alphabet.  A string : a finite

16

RE Notational Shorthand

R+ one or more strings of R: R(R*) R? optional R: (R|) [abcd] one of listed characters: (a|b|c|d) [a-z] one character from this range: (a|b|c|

d...|z) [^ab] anything but none of the listed chars [^a-z] any character not from this range

Page 17: 1 What is a language?  An alphabet is a well defined set of characters. The character ∑ is typically used to represent an alphabet.  A string : a finite

17

Regular Expression, R– a– ab– a|b– (ab)*– (a| )b– digit = [0-9]– posint = digit+

Strings in L(R)– “a”– “ab”– “a”, “b”– “”, “ab”, “abab”, ...– “ab”, “b”– “0”, “1”, “2”, ...– “8”, “412”, ...– “23”, “34”, ...

Page 18: 1 What is a language?  An alphabet is a well defined set of characters. The character ∑ is typically used to represent an alphabet.  A string : a finite

18

More Examples

All Strings that start with “tab” or end with bat”:

tab{A,…,Z,a,...,z}*|{A,…,Z,a,....,z}*bat All Strings in which {1,2,3} exist in ascending

order:

{A,…,Z}*1 {A,…,Z}*2 {A,…,Z}*3 {A,…,Z}*

Page 19: 1 What is a language?  An alphabet is a well defined set of characters. The character ∑ is typically used to represent an alphabet.  A string : a finite

19

Defining Our Language

The first thing we can define in our language are keywords. These are easy:

if | else | while | find | …

When we scan a file, we can either have a single token represent all keywords, or else break them down into groups, such as “commands”, “types”, etc.

Page 20: 1 What is a language?  An alphabet is a well defined set of characters. The character ∑ is typically used to represent an alphabet.  A string : a finite

20

Language Def (cont’d)

Next we will define integers in a language:

digit = “0” | “1” | “2” | “3” | “4” | “5” | “6” | “7” | “8” | “9”integer = {digit}+

Note that we can abbreviate ranges using the dash (“-”). Thus, digit = 0-9Relation = ‘<’ | ‘<=’ | ‘>’ | ‘>=’ | ‘<>’ | ‘=’Floating point numbers are not much more complicated:

float = {digit}+ “.” {digit}+

Page 21: 1 What is a language?  An alphabet is a well defined set of characters. The character ∑ is typically used to represent an alphabet.  A string : a finite

21

Language Def (cont’d)

Identifiers are strings of letters, underscores, or digits beginning with a non-digit. Letter = a-z | A-Z digit = 0-9 Identifier = ({letter})({letter} | “_” | {digit})*

Page 22: 1 What is a language?  An alphabet is a well defined set of characters. The character ∑ is typically used to represent an alphabet.  A string : a finite

22

Real-world example

What is the regular expression that defines all phone numbers?

∑ = { 0-9 }Area = {digit}3

Exchange = {digit}3

Local = {digit}4

Phone_number = “(” {Area} “)” {Exchange} {Local}