13 Data Structure for Language Processing


Chapter - 2

Data structures for Language processing

Agenda
Classification
Search Data Structure
  Fixed Size Record
  Variable Size Record
  Hybrid Record
Other Organization
  Tree Representation
  Hashed Representation
Allocation Data Structure
  Stack & Extended Stack
  Heap

Classification

1. Based on nature: linear and non-linear
   e.g. Linear = array, stack, etc. Non-linear = tree, graph, etc.

2. Based on purpose: search and allocation
   e.g. Search = binary search tree. Allocation = stacks, heaps.

3. Based on lifetime: whether used during language processing or during target program execution
   e.g. Language processing = object-based data model. Target program = hash tables.

Search Data Structures

A search data structure (or search structure) is a set of entries, each entry accommodating the information concerning one entity. Each entry is assumed to contain a key field, which forms the basis for search.

Search Data Structures
Fixed Size Record
Variable Size Record
Hybrid Record

Fixed Size Record

Each entry has the same type and size, e.g. an array.

Variable Size Record

The type and size of each record can be different.

Hybrid Record

Each entry has both a fixed-length part and a variable-length part.

Entry Format
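As a rough illustration of how these entry formats differ, the sketch below packs a fixed-size record and a hybrid record with Python's `struct` module. The field choices (an 8-byte name, a 2-byte value, a list of dimensions) and the helper names `pack_fixed`, `pack_hybrid` and `unpack_hybrid` are assumptions made for this example; they do not come from the slides.

```python
import struct

# Fixed-size record: every entry uses the same layout and the same size.
FIXED = struct.Struct("<8sH")    # assumed fields: 8-byte name, 2-byte value

def pack_fixed(name, value):
    # Short names are padded with zero bytes by the "8s" format.
    return FIXED.pack(name.encode("ascii"), value)

# Hybrid record: a fixed-length header followed by a variable-length part.
HEADER = struct.Struct("<8sB")   # assumed fields: 8-byte name, 1-byte count

def pack_hybrid(name, dims):
    header = HEADER.pack(name.encode("ascii"), len(dims))
    body = struct.pack(f"<{len(dims)}H", *dims)   # one 2-byte item per dimension
    return header + body

def unpack_hybrid(record):
    raw_name, count = HEADER.unpack_from(record, 0)
    dims = struct.unpack_from(f"<{count}H", record, HEADER.size)
    return raw_name.rstrip(b"\0").decode("ascii"), list(dims)
```

Every fixed record occupies `FIXED.size` bytes, while the length of a hybrid record depends on the count stored in its header, so a reader must decode the fixed part first to know how much of the variable part follows.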

Generic Search Procedure for locating the entry of a symbol

Binary Search Organization
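A minimal sketch of searching a table in binary search organization, assuming the entries are kept sorted by their key field. The `(symbol, info)` tuple layout and the name `binary_search` are hypothetical choices for this example.

```python
def binary_search(entries, symbol):
    """Return the entry whose key equals symbol, or None if it is absent.

    entries must be a list of (symbol, info) tuples sorted by symbol.
    """
    low, high = 0, len(entries) - 1
    while low <= high:
        mid = (low + high) // 2        # probe the middle entry
        key = entries[mid][0]
        if key == symbol:
            return entries[mid]
        if key < symbol:
            low = mid + 1              # symbol can only be in the upper half
        else:
            high = mid - 1             # symbol can only be in the lower half
    return None


table = [("alpha", 1), ("beta", 2), ("delta", 3), ("gamma", 4)]
```

Each probe halves the part of the table that can still contain the symbol, which is why the table has to stay sorted as entries are added.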

Hash Tables

h is the hashing function.
S is the symbol for the entry.
S(e) is the symbol in the current entry e.

Hashing Function

A hashing function is used to make the search faster.
It transforms a source symbol, or a group of symbols, into a number so that comparison and searching become faster.
Hashing does not change the original meaning of the symbols; it only transforms them into another form.
The size used for the transformation is decided in advance.
If a message is shorter than that size, a folding operation is performed.
In folding, the message is padded with 0s to bring it up to that size.
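The idea can be sketched as follows. This is only one possible hashing function, not the one the slides had in mind: the table size, the 8-byte width, the truncation of longer symbols, and the step of adding 2-byte pieces together are all assumptions for illustration. The name `h` follows the slide's notation.

```python
TABLE_SIZE = 11   # assumed number of slots in the table
WIDTH = 8         # assumed pre-decided size of a symbol, in bytes

def h(symbol):
    """Map a symbol to a slot number in range(TABLE_SIZE)."""
    data = symbol.encode("ascii")[:WIDTH]   # longer symbols are truncated (assumption)
    data = data.ljust(WIDTH, b"\0")         # shorter symbols are padded with 0s
    total = 0
    for i in range(0, WIDTH, 2):
        # Combine the padded symbol piece by piece into one number.
        total += int.from_bytes(data[i:i + 2], "big")
    return total % TABLE_SIZE               # reduce to a valid slot number
```

The same symbol always yields the same slot, so a search can go straight to that slot instead of comparing against every entry.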

Properties of a good hashing function

Collision in hashing

Different symbols can produce the same number. This is a collision, and if it is not handled the search no longer finds the right entry.
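To make the problem concrete, the sketch below uses a deliberately weak hashing function under which "ab" and "ba" collide. `NaiveTable` keeps one entry per slot, so the second symbol overwrites the first; `ChainedTable` keeps a list of entries per slot, which is the idea behind overflow chaining. All names and the table size are hypothetical.

```python
TABLE_SIZE = 7   # assumed number of slots

def h(symbol):
    # Weak on purpose: symbols with the same letters always collide.
    return sum(symbol.encode("ascii")) % TABLE_SIZE

class NaiveTable:
    """One entry per slot, no collision handling."""
    def __init__(self):
        self.slots = [None] * TABLE_SIZE

    def add(self, symbol, info):
        self.slots[h(symbol)] = (symbol, info)   # silently overwrites

    def find(self, symbol):
        entry = self.slots[h(symbol)]
        if entry is not None and entry[0] == symbol:
            return entry[1]
        return None

class ChainedTable:
    """Each slot holds a chain of all entries that hashed to it."""
    def __init__(self):
        self.slots = [[] for _ in range(TABLE_SIZE)]

    def add(self, symbol, info):
        self.slots[h(symbol)].append((symbol, info))

    def find(self, symbol):
        for key, info in self.slots[h(symbol)]:
            if key == symbol:
                return info
        return None
```

After adding "ab" and then "ba" to a `NaiveTable`, looking up "ab" fails because its slot now holds "ba". The `ChainedTable` finds both, at the cost of scanning the chain in that slot.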

Thus, to deal with collisions, we have various collision handling techniques:

1. Rehashing technique
2. Overflow chaining technique

Allocation Data Structure

Important Allocation Data Structures

Stack & Extended Stack
Heap

Stacks

Extended Stack model

An extended stack is needed for handling variable-length records. A record consists of a set of consecutive stack entries. In addition to base and top, a new pointer, Previous, is used.
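One way to read this model is sketched below. It assumes that the first cell of every record stores a link to the start of the previous record, and that a `record_base` pointer tracks the start of the topmost record. The slides do not give these details, so the class `ExtendedStack` and its fields are a hypothetical interpretation rather than the slide's exact design.

```python
class ExtendedStack:
    def __init__(self, capacity):
        self.cells = [None] * capacity
        self.base = 0             # index of the first cell of the stack
        self.top = -1             # index of the last occupied cell
        self.record_base = None   # start of the topmost record, if any

    def push_record(self, values):
        """Allocate a record of len(values) entries plus one link cell."""
        need = 1 + len(values)
        if self.top + need >= len(self.cells):
            raise OverflowError("stack is full")
        link = self.top + 1
        self.cells[link] = self.record_base   # the "previous" link
        self.record_base = link
        for value in values:
            link += 1
            self.cells[link] = value
        self.top = link

    def pop_record(self):
        """Remove the topmost record as a whole and return its entries."""
        if self.record_base is None:
            raise IndexError("stack is empty")
        start = self.record_base
        values = self.cells[start + 1:self.top + 1]
        self.top = start - 1                  # release all cells of the record
        self.record_base = self.cells[start]  # follow the link to the previous record
        return values
```

Because the link records where the previous record begins, a record of any length can be removed in one step without knowing its size in advance.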

Heaps

Use of Heap in Memory management

Repeated allocation and deallocation of memory areas creates holes in memory.
Memory management keeps track of these holes and reallocates them.
This improves the performance and speed of allocating and deallocating memory.
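A minimal sketch of how holes might be tracked, assuming a first-fit policy over a sorted list of free areas and merging of adjacent holes on release. The class `Heap` and its methods are hypothetical; real memory managers differ in many details.

```python
class Heap:
    def __init__(self, size):
        # Free list of (start, length) holes, kept sorted by start address.
        self.free = [(0, size)]

    def allocate(self, n):
        """Return the start address of an area of n units, or None."""
        for i, (start, length) in enumerate(self.free):
            if length >= n:                      # first hole that is big enough
                if length == n:
                    del self.free[i]
                else:
                    self.free[i] = (start + n, length - n)
                return start
        return None

    def release(self, start, n):
        """Give an area back and merge it with neighbouring holes."""
        self.free.append((start, n))
        self.free.sort()
        merged = []
        for hole_start, hole_len in self.free:
            if merged and merged[-1][0] + merged[-1][1] == hole_start:
                prev_start, prev_len = merged[-1]
                merged[-1] = (prev_start, prev_len + hole_len)
            else:
                merged.append((hole_start, hole_len))
        self.free = merged
```

With a heap of 100 units and three allocations of 30, releasing only the first leaves holes of 30 and 10 units, so a request for 40 fails even though 40 units are free in total. Releasing the second area as well merges the two adjacent holes into one of 60 units, and the request then succeeds.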